import tensorflow as tf
M = tf.Variable([0.01],tf.float32)
b = tf.Variable([1.0],tf.float32)
#inputs and outputs
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32) # actual value of y which we already know
Yp = M * x + b # y predicted value
#loss
squareR = tf.square(Yp - y)
loss = tf.reduce_sum(squareR)
#optimize
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4, 5], y: [1.9, 2.4, 3.7, 4.9, 5.1]})
print(sess.run([M, b]))
Output:
[array([ 0.88999945], dtype=float32), array([ 0.93000191], dtype=float32)]
Problem:
When I change the values of x and y to
x:[100,200,300,400,500],y:[19,24,37,49,51]
then the output is:
[array([ nan], dtype=float32), array([ nan], dtype=float32)]
Please help me find the slope and y-intercept of the linear model.
Adding some print statements to your training loop, we can see what's going on during training:
import numpy as np

for i in range(1000):
    _, mm, bb = sess.run([train, M, b], {x: [100, 200, 300, 400, 500], y: [19, 24, 37, 49, 51]})
    print(mm, bb)
    if np.isnan(mm):
        break
print(sess.run([M, b]))
The output:
[ 1118.01000977] [ 4.19999981]
[-12295860.] [-33532.921875]
[ 1.35243170e+11] [ 3.68845632e+08]
[ -1.48755065e+15] [ -4.05696309e+12]
[ 1.63616896e+19] [ 4.46228634e+16]
[ -1.79963571e+23] [ -4.90810521e+20]
[ 1.97943407e+27] [ 5.39846559e+24]
[ -2.17719537e+31] [ -5.93781625e+28]
[ 2.39471499e+35] [ 6.53105210e+32]
[-inf] [-inf]
[ nan] [ nan]
That output means your training is diverging: with inputs this large, each gradient step overshoots and the parameters blow up. In this case, lowering the learning rate is one possible way to fix the problem.
Lowering the learning rate to 0.000001 works; these are the learned M and b after 1000 iterations:
[array([ 0.11159456], dtype=float32), array([ 1.01534212], dtype=float32)]
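For reference, the only change needed in the original code is the learning rate passed to the optimizer; rescaling the inputs is another option (a minimal sketch, not part of the original post):

optimizer = tf.train.GradientDescentOptimizer(0.000001)
train = optimizer.minimize(loss)
# Alternative: keep the 0.01 learning rate but feed rescaled data, e.g. x and y divided by 100:
# sess.run(train, {x: [1, 2, 3, 4, 5], y: [0.19, 0.24, 0.37, 0.49, 0.51]})
# The learned M and b then describe the rescaled data.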
Related
I have tried the example with Keras, but not with an LSTM. My model is an LSTM in TensorFlow, and I want to predict the output as class labels, the way a Keras model does with predict_classes.
The Tensorflow model I am trying is something like this:
seq_len=10
n_steps = seq_len-1
n_inputs = x_train.shape[2]
n_neurons = 50
n_outputs = y_train.shape[1]
n_layers = 2
learning_rate = 0.0001
batch_size =100
n_epochs = 1000
train_set_size = x_train.shape[0]
test_set_size = x_test.shape[0]
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_outputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,activation=tf.nn.sigmoid, use_peepholes = True) for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
outputs = outputs[:,n_steps-1,:]
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
I am encoding the labels with sklearn's LabelEncoder as:
encoder_train = LabelEncoder()
encoder_train.fit(y_train)
encoded_Y_train = encoder_train.transform(y_train)
y_train = np_utils.to_categorical(encoded_Y_train)
This converts the labels to a binary (one-hot) matrix.
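For reference, a minimal sketch (with made-up labels, not the actual data) of what this encoding step produces:

from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

labels = ['ball', 'bat', 'ball']            # hypothetical string labels
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)     # -> array([0, 1, 0])
one_hot = np_utils.to_categorical(encoded)  # -> [[1., 0.], [0., 1.], [1., 0.]]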
When I tried to predict the output I got the following:
actual==> [[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[0. 1. 0.]
[0. 1. 0.]]
predicted==> [[0.3112209 0.3690182 0.31357136]
[0.31085992 0.36959863 0.31448898]
[0.31073445 0.3703295 0.31469804]
[0.31177694 0.37011752 0.3145326 ]
[0.31220382 0.3692756 0.31515726]
[0.31232828 0.36947766 0.3149037 ]
[0.31190437 0.36756667 0.31323162]
[0.31339088 0.36542615 0.310322 ]
[0.31598282 0.36328828 0.30711085]]
What I was expecting is the class label implied by the encoding, as the Keras model gives it. See the following:
predictions = model.predict_classes(X_test, verbose=True)
print("REAL VALUES:",reverse_category(Y_test,axis=1))
print("PRED VALUES:",predictions)
print("REAL COLORS:")
print(encoder.inverse_transform(reverse_category(Y_test,axis=1)))
print("PREDICTED COLORS:")
print(encoder.inverse_transform(predictions))
The output is something like the following:
REAL VALUES: [1 1 1 ... 1 2 1]
PRED VALUES: [2 1 1 ... 1 2 2]
REAL COLORS:
['ball' 'ball' 'ball' ... 'ball' 'bat' 'ball']
PREDICTED COLORS:
['bat' 'ball' 'ball' ... 'ball' 'bat' 'bat']
Please let me know what I can do in the TensorFlow model to get the result in terms of the original encoding.
I am using TensorFlow 1.12.0 on Windows 10.
You are trying to map the predicted class probabilities back to class labels. Each row in the list of output predictions contains the three predicted class probabilities. Use np.argmax to obtain the one with the highest predicted probability in order to map to the predicted class label:
import numpy as np
predictions = [[0.3112209, 0.3690182, 0.31357136],
[0.31085992, 0.36959863, 0.31448898],
[0.31073445, 0.3703295, 0.31469804],
[0.31177694, 0.37011752, 0.3145326 ],
[0.31220382, 0.3692756, 0.31515726],
[0.31232828, 0.36947766, 0.3149037 ],
[0.31190437, 0.36756667, 0.31323162],
[0.31339088, 0.36542615, 0.310322 ],
[0.31598282, 0.36328828, 0.30711085]]
np.argmax(predictions, axis=1)
Gives:
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In this case, class 1 is predicted 9 times.
As noted in the comments: this is exactly what Keras does under the hood, as you'll see in the source code.
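If you also want the original string labels back, the same LabelEncoder that produced the encoding can invert the argmax result (assuming the encoder_train object from the question is still available):

predicted_classes = np.argmax(predictions, axis=1)        # -> array([1, 1, ..., 1])
predicted_labels = encoder_train.inverse_transform(predicted_classes)
print(predicted_labels)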
Using TensorFlow version 1.3.0 in Python 3.5.2. I'm trying to mimic the functionality of the DNNClassifier in the Iris data tutorial on the TensorFlow website, and am running into difficulties. I'm importing a CSV file with about 155 rows of data and 15 columns, breaking the data into training and test sets (where I try to classify either a positive or negative movement), and receive an error when I begin to train my classifier. Here's how the data is set up:
import numpy as np
import pandas as pd
import tensorflow as tf

#import values from csv
mexicof1 = pd.read_csv('Source/mexicoR.csv')
#construct pandas dataframe
mexico_df = pd.DataFrame(mexicof1)
#start counting from mexico.mat.2.nrow.mexico.mat...1.
mexico_dff = pd.DataFrame(mexico_df.iloc[:,1:16])
mexico_dff.columns = ['tp1_delta','PC1','PC2','PC3','PC4','PC5','PC6','PC7', \
                      'PC8', 'PC9', 'PC10', 'PC11', 'PC12', 'PC13', 'PC14']
#binary assignment for positive/negative values
for i in range(0,155):
    if(mexico_dff.iloc[i,0] > 0):
        mexico_dff.iloc[i,0] = "pos"
    else:
        mexico_dff.iloc[i,0] = "neg"
#up movement vs. down movement classification set up
up = np.asarray([1,0])
down = np.asarray([0,1])
mexico_dff['tp1_delta'] = mexico_dff['tp1_delta'].map({"pos": up, "neg": down})
#Break into training and test data
#data: independent values
#labels: classification
mexico_train_DNN1data = mexico_dff.iloc[0:150, 1:15]
mexico_train_DNN1labels = mexico_dff.iloc[0:150, 0]
mexico_test_DNN1data = mexico_dff.iloc[150:156, 1:15]
mexico_test_DNN1labels = mexico_dff.iloc[150:156, 0]
#Construct numpy arrays for test data
temptrain = []
for i in range(0, len(mexico_train_DNN1labels)):
    temptrain.append(mexico_train_DNN1labels.iloc[i])
temptrainFIN = np.array(temptrain, dtype = np.float32)
temptest = []
for i in range(0, len(mexico_test_DNN1labels)):
    temptest.append(mexico_test_DNN1labels.iloc[i])
temptestFIN = np.array(temptest, dtype = np.float32)
#set up NumPy arrays
mTrainDat = np.array(mexico_train_DNN1data, dtype = np.float32)
mTrainLab = temptrainFIN
mTestDat = np.array(mexico_test_DNN1data, dtype = np.float32)
mTestLab = temptestFIN
Doing this gives me data that looks like the following:
#Independent value output
mTestDat
Out[289]:
array([[-0.08404002, -3.07483053, 0.41106853, ..., -0.08682428,
0.32954004, -0.36451185],
[-0.31538665, -2.23493481, 1.97653472, ..., 0.35220796,
0.09061374, -0.59035355],
[ 0.44257978, -3.04786181, -0.6633662 , ..., 1.34870672,
0.43879321, 0.26306254],
...,
[ 2.38574553, 0.09045095, -0.09710167, ..., 1.20889878,
0.00937434, -0.06398607],
[ 1.68626559, 0.65349185, 0.23625408, ..., -1.16267788,
0.45464727, -1.14916229],
[ 1.58263958, 0.1223636 , -0.12084256, ..., 0.7947616 ,
-0.47359121, 0.28013545]], dtype=float32)
#Classification labels (up or down movement) output
mTestLab
Out[290]:
array([[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 1., 0.],
[ 0., 1.],
[ 1., 0.],
........
[ 1., 0.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.]], dtype=float32)
After following the tutorial with this setup, I can run the code as far as the classifier.train() call, at which point it stops and gives me the following error:
# Specify that all features have real-value data
feature_columns = [tf.feature_column.numeric_column("x", shape=[mexico_train_DNN1data.shape[1]])]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
hidden_units=[10, 20, 10],
optimizer = tf.train.AdamOptimizer(0.01),
n_classes=2) #representing either an up or down movement
train_input_fn = tf.estimator.inputs.numpy_input_fn(
x = {"x": mTrainDat},
y = mTrainLab,
num_epochs = None,
shuffle = True)
#Now, we train the model
classifier.train(input_fn=train_input_fn, steps = 2000)
File "Source\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\canned\head.py", line 174, in _check_labels
(static_shape,))
ValueError: labels shape must be [batch_size, labels_dimension], got (128, 2).
I'm not sure why I'm encountering this error, any help is appreciated.
You're using one-hot ([1, 0] or [0, 1]) encoded labels where DNNClassifier expects a plain class label (i.e. 0 or 1). To decode a one-hot encoding on the last axis, use
class_labels = np.argmax(one_hot_vector, axis=-1)
Note that for the binary case it might be quicker to do
class_labels = one_hot_vector[..., 1].astype(np.int32)
though the performance difference won't be massive, and I'd probably use the more general version in case you add another class later.
In your case, after you've generated your numpy labels, just add
mTrainLab = np.argmax(mTrainLab, axis=-1)
mTestLab = np.argmax(mTestLab, axis=-1)
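After this change the labels have shape (num_examples,) and hold integer class ids, which is what DNNClassifier with n_classes=2 expects. A minimal sketch with dummy data (not the actual arrays):

import numpy as np

one_hot = np.array([[0., 1.], [1., 0.], [0., 1.]], dtype=np.float32)
class_ids = np.argmax(one_hot, axis=-1)
print(one_hot.shape, class_ids.shape)  # (3, 2) (3,)
print(class_ids)                       # [1 0 1]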
This is my first attempt at TensorFlow: I am building a linear regression model with multiple inputs.
The problem is that the result is always NaN, and I suspect it is because I am a complete noob at matrix operations in numpy and tensorflow (MATLAB background, hehe).
Here is the code:
import numpy as np
import tensorflow as tf
N_INP = 2
N_OUT = 1
# Model params
w = tf.Variable(tf.zeros([1, N_INP]), name='w')
b = tf.Variable(tf.zeros([1, N_INP]), name='b')
# Model input and output
x = tf.placeholder(tf.float32, [None, N_INP], name='x')
y = tf.placeholder(tf.float32, [None, N_OUT], name='y')
linear_model = tf.reduce_sum(x * w + b, axis=1, name='out')
# Loss as sum(error^2)
loss = tf.reduce_sum(tf.square(linear_model - y), name='loss')
# Create optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss, name='train')
# Define training data
w_real = np.array([-1, 4])
b_real = np.array([1, -5])
x_train = np.array([[1, 2, 3, 4], [0, 0.5, 1, 1.5]]).T
y_train = np.sum(x_train * w_real + b_real, 1)[np.newaxis].T
print('Real X:\n', x_train)
print('Real Y:\n', y_train)
# Create session and init parameters
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Training loop
train_data = {x: x_train, y: y_train}
for i in range(1000):
    sess.run(train, train_data)
# Eval solution
w_est, b_est, curr_loss, y_pred = sess.run([w, b, loss, linear_model], train_data)
print("w: %s b: %s loss: %s" % (w_est, b_est, curr_loss))
print("y_pred: %s" % (y_pred,))
And here is the output:
Real X:
[[ 1. 0. ]
[ 2. 0.5]
[ 3. 1. ]
[ 4. 1.5]]
Real Y:
[[-5.]
[-4.]
[-3.]
[-2.]]
w: [[ nan nan]] b: [[ nan nan]] loss: nan
y_pred: [ nan nan nan nan]
You need to add keep_dims=True inside your definition of linear_model. That is,
linear_model = tf.reduce_sum(x * w + b, axis=1, name='out',keep_dims=True)
The reason is that otherwise the result is "flattened" to shape (4,), and subtracting y (shape (4, 1)) broadcasts it into a (4, 4) matrix instead of the element-wise difference you want, so the loss no longer measures the per-example error.
For example,
'x' is [[1,2,3],
[4,5,6]]
tf.reduce_sum(x, axis=1) is [6, 15]
tf.reduce_sum(x, axis=1, keep_dims=True) is [[6], [15]]
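To see why the flattened shape causes trouble here, a small numpy sketch (not from the original post) of the broadcasting that happens when y has shape (N, 1):

import numpy as np

pred_flat = np.array([6., 15.])        # shape (2,), like reduce_sum without keep_dims
pred_kept = np.array([[6.], [15.]])    # shape (2, 1), like reduce_sum with keep_dims=True
y = np.array([[5.], [14.]])            # shape (2, 1)
print((pred_flat - y).shape)           # (2, 2) -- broadcast into a matrix, wrong loss
print((pred_kept - y).shape)           # (2, 1) -- element-wise, as intended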
I have the following code based on the MNIST example. It is modified in two ways:
1) I'm not using a one-hot-vector, so I simply use tf.equal(y, y_)
2) My results are binary: either 0 or 1
import tensorflow as tf
import numpy as np
# get the data
train_data, train_results = get_data(2000, 2014)
test_data, test_results = get_data(2014, 2015)
# setup a session
sess = tf.Session()
x_len = len(train_data[0])
y_len = len(train_results[0])
# make placeholders for inputs and outputs
x = tf.placeholder(tf.float32, shape=[None, x_len])
y_ = tf.placeholder(tf.float32, shape=[None, y_len])
# create the weights and bias
W = tf.Variable(tf.zeros([x_len, 1]))
b = tf.Variable(tf.zeros([1]))
# initialize everything
sess.run(tf.initialize_all_variables())
# create the "equation" for y in terms of x
y_prime = tf.matmul(x, W) + b
y = tf.nn.softmax(y_prime)
# construct the error function
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_prime, y_)
# setup the training algorithm
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# train the thing
for i in range(1000):
    rand_rows = np.random.choice(train_data.shape[0], 100, replace=False)
    _, w_out, b_out, ce_out = sess.run([train_step, W, b, cross_entropy], feed_dict={x: train_data[rand_rows, :], y_: train_results[rand_rows, :]})
    print("%d: %s %s %s" % (i, str(w_out), str(b_out), str(ce_out)))
# compute how many times it was correct
correct_prediction = tf.equal(y, y_)
# find the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: test_data, y_: test_results}))
for i in range(0, len(test_data)):
    res = sess.run(y, {x: [test_data[i]]})
    print("RES: " + str(res) + " ACT: " + str(test_results[i]))
The accuracy is always 0.5 (because my test data has about as many 1s as 0s). The values of W and b always seem to increase, probably because the values of cross_entropy are always a vector of all zeros.
When I try to use this model for prediction, the predictions are always 1:
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
What am I doing wrong here?
You seem to be predicting a single scalar rather than a vector. The softmax op produces a vector-valued prediction for each example, and this vector must always sum to 1. When the vector contains only one element, that element must always be 1, which is why your predictions are always 1. If you want to use a softmax for this problem, you could use [1, 0] as the output target where you currently use [0], and [0, 1] where you currently use [1]. Alternatively, you could keep a single number but change the output layer to a sigmoid instead of a softmax, and change the cost function to the sigmoid-based cost function as well.
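A minimal sketch of the sigmoid option, assuming a single 0/1 target per row in y_ and TF 1.x-style keyword arguments (an illustration, not a drop-in patch for the exact version above):

# single-output logistic regression instead of a one-element softmax
y_prime = tf.matmul(x, W) + b                   # logits, shape [batch, 1]
y = tf.nn.sigmoid(y_prime)                      # probability of class 1
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y_prime))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# threshold at 0.5 to get a hard 0/1 prediction
correct_prediction = tf.equal(tf.cast(y > 0.5, tf.float32), y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))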
I'd like to add a max-norm constraint to several of the weight matrices in my TensorFlow graph, a la Torch's renorm method.
If the L2 norm of any neuron's weight matrix exceeds max_norm, I'd like to scale its weights down so that their L2 norm is exactly max_norm.
What's the best way to express this using TensorFlow?
Here is a possible implementation:
import tensorflow as tf

def max_norm_regularizer(threshold, axes=1, name="max_norm", collection="max_norm"):
    def max_norm(weights):
        clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)
        clip_weights = tf.assign(weights, clipped, name=name)
        tf.add_to_collection(collection, clip_weights)
        return None  # there is no regularization loss term
    return max_norm
Here's how you would use it:
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.framework import arg_scope
with arg_scope(
        [fully_connected],
        weights_regularizer=max_norm_regularizer(1.5)):
    hidden1 = fully_connected(X, 200, scope="hidden1")
    hidden2 = fully_connected(hidden1, 100, scope="hidden2")
    outputs = fully_connected(hidden2, 5, activation_fn=None, scope="outs")

max_norm_ops = tf.get_collection("max_norm")

[...]

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for X_batch, y_batch in load_next_batch():
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            sess.run(max_norm_ops)
This creates a 3-layer neural network and trains it with max-norm regularization at every layer (with a threshold of 1.5). I just tried it, and it seems to work. Hope this helps! Suggestions for improvements are welcome. :)
Notes
This code is based on tf.clip_by_norm():
>>> x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
>>> print(x.eval())
[[ 0. 0.]
[ 3. 4.]
[ 30. 40.]
[ 300. 400.]]
>>> clip_rows = tf.clip_by_norm(x, clip_norm=10, axes=1)
>>> print(clip_rows.eval())
[[ 0. 0. ]
[ 3. 4. ]
[ 6. 8. ] # clipped!
[ 6.00000048 8. ]] # clipped!
You can also clip columns if you need to:
>>> clip_cols = tf.clip_by_norm(x, clip_norm=350, axes=0)
>>> print(clip_cols.eval())
[[ 0. 0. ]
[ 3. 3.48245788]
[ 30. 34.82457733]
[ 300. 348.24578857]]  # col 1 clipped!
Using Rafał's suggestion and TensorFlow's implementation of clip_by_norm, here's what I came up with:
def renorm(x, axis, max_norm):
    '''Renormalizes the sub-tensors along axis such that they do not exceed norm max_norm.'''
    # This elaborate dance avoids empty slices, which TF dislikes.
    rank = tf.rank(x)
    bigrange = tf.range(-1, rank + 1)
    dims = tf.slice(
        tf.concat(0, [tf.slice(bigrange, [0], [1 + axis]),
                      tf.slice(bigrange, [axis + 2], [-1])]),
        [1], rank - [1])
    # Determine which columns need to be renormalized.
    l2norm_inv = tf.rsqrt(tf.reduce_sum(x * x, dims, keep_dims=True))
    scale = max_norm * tf.minimum(l2norm_inv, tf.constant(1.0 / max_norm))
    # Broadcast the scalings
    return tf.mul(scale, x)
It seems to have the desired behavior for 2-dimensional matrices and should
generalize to tensors:
> x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
> print x.eval()
[[ 0. 0.] # rows have norms of 0, 5, 50, 500
[ 3. 4.] # cols have norms of ~302, ~402
[ 30. 40.]
[ 300. 400.]]
> print renorm(x, 0, 10).eval()
[[ 0. 0. ] # unaffected
[ 3. 4. ] # unaffected
[ 5.99999952 7.99999952] # rescaled
[ 6.00000048 8.00000095]] # rescaled
> print renorm(x, 1, 350).eval()
[[ 0. 0. ] # col 0 is unaffected
[ 3. 3.48245788] # col 1 is rescaled
[ 30. 34.82457733]
[ 300. 348.24578857]]
Take a look at the clip_by_norm function, which does exactly this. It takes a single tensor as input and returns a scaled-down tensor.
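If you only need the clipping as an explicit operation rather than wrapped in a regularizer, a minimal sketch (assuming a weight variable W whose rows you want to constrain) could be:

max_norm = 1.5
clipped = tf.clip_by_norm(W, clip_norm=max_norm, axes=1)  # row-wise L2 norm <= max_norm
clip_W = tf.assign(W, clipped)
# run after each training step, e.g. sess.run(clip_W)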