TensorFlow Softmax Regression Always Predicts 1 - python

I have the following code based on the MNIST example. It is modified in two ways:
1) I'm not using a one-hot-vector, so I simply use tf.equal(y, y_)
2) My results are binary: either 0 or 1
import tensorflow as tf
import numpy as np
# get the data
train_data, train_results = get_data(2000, 2014)
test_data, test_results = get_data(2014, 2015)
# setup a session
sess = tf.Session()
x_len = len(train_data[0])
y_len = len(train_results[0])
# make placeholders for inputs and outputs
x = tf.placeholder(tf.float32, shape=[None, x_len])
y_ = tf.placeholder(tf.float32, shape=[None, y_len])
# create the weights and bias
W = tf.Variable(tf.zeros([x_len, 1]))
b = tf.Variable(tf.zeros([1]))
# initialize everything
sess.run(tf.initialize_all_variables())
# create the "equation" for y in terms of x
y_prime = tf.matmul(x, W) + b
y = tf.nn.softmax(y_prime)
# construct the error function
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_prime, y_)
# setup the training algorithm
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# train the thing
for i in range(1000):
    rand_rows = np.random.choice(train_data.shape[0], 100, replace=False)
    _, w_out, b_out, ce_out = sess.run([train_step, W, b, cross_entropy],
                                       feed_dict={x: train_data[rand_rows, :], y_: train_results[rand_rows, :]})
    print("%d: %s %s %s" % (i, str(w_out), str(b_out), str(ce_out)))
# compute how many times it was correct
correct_prediction = tf.equal(y, y_)
# find the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: test_data, y_: test_results}))
for i in range(0, len(test_data)):
    res = sess.run(y, {x: [test_data[i]]})
    print("RES: " + str(res) + " ACT: " + str(test_results[i]))
The accuracy is always 0.5 (because my test data has about as many 1s as 0s). The values of W and b always seem to increase, probably because the values of cross_entropy are always a vector of all zeros.
When I try and use this model for prediction, the predictions are always 1:
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
RES: [[ 1.]] ACT: [ 0.]
RES: [[ 1.]] ACT: [ 1.]
What am I doing wrong here?

You seem to be predicting a single scalar, rather than a vector. The softmax op produces a vector-valued prediction for each example. This vector must always sum to 1. When the vector only contains one element, that element must always be 1. If you want to use a softmax for this problem, you could use [1, 0] as the output target where you are currently using [0] and use [0, 1] where you are currently using [1]. Another option is you could keep using just one number, but change the output layer to sigmoid instead of softmax, and change the cost function to be the sigmoid-based cost function as well.
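If you go with the second option (keep a single 0/1 target), here is a minimal sketch of the sigmoid variant, reusing the placeholders and variables from the question and assuming a TF 1.x-style API (keyword arguments logits=/labels=):
y_prime = tf.matmul(x, W) + b                      # logits, shape [None, 1]
y = tf.nn.sigmoid(y_prime)                         # probability that the label is 1
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=y_prime, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# Threshold the probability at 0.5 before comparing with the 0/1 targets.
correct_prediction = tf.equal(tf.round(y), y_)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))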

Related

Different predictions in Keras with the same input and same weights but different batch sizes

I was trying to optimize my Code when I encountered this strange behaviour with the following model in Keras:
# random_minibatches of form (state, action, reward, state_next, done)
random_minibatch = random.sample(list_of_samples, batch_size)
# A state is a list in the form of [x, y]
next_states = [temp[3] for temp in random_minibatch]
# Reshape the next_states for the model
next_states = np.reshape(next_states, [-1, 2])
next_states_preds = model.predict(next_states)
for i, (_, _, _, state_next, _) in enumerate(random_minibatch):
    state_next = np.reshape(state_next, [1, 2])
    pred = model.predict(state_next)

    print("inputs: {} ; {}".format(next_states[i], state_next))
    print(pred)
    print(next_states_preds[i])
    print("amax: {} ; {}".format(np.amax(pred), np.amax(next_states_preds[i])))
    print()
and a simple model:
model = Sequential()
model.add(layers.Dense(16, activation="relu", input_dim=2))
model.add(layers.Dense(32, activation="relu"))
model.add(layers.Dense(8))
model.compile(loss="mse", optimizer=Adam(lr=0.00025))
next_states is a list of lists in the form of [[x1, y1], [x2, y2], ...]
and state_next is a list in the form of [x, y]
As you can see, next_states contains every state_next from the for loop, so the input to my model is the same. The only difference is that the first time I put the whole list of lists into the model at once, and the second time I put the lists in one by one.
My problem is that I get different outputs for the same input.
An example of the printed output would be:
inputs: [39 -7] ; [39 -7]
[0. 0. 0. 0. 0. 5.457102 0. 0.]
[[0. 0. 0. 0. 0. 5.4571013 0. 0.]]
amax: 5.457101345062256 ; 5.457101821899414
So at this point I'm not sure whether I misunderstood something or just did something wrong somewhere. I would be very glad if someone could help me with this strange behaviour.

Tensorflow predict the class of output

I have tried the example with Keras, but it was not with an LSTM. My model uses an LSTM in TensorFlow, and I want to predict the output in the form of classes, the way the Keras model does with predict_classes.
The Tensorflow model I am trying is something like this:
seq_len=10
n_steps = seq_len-1
n_inputs = x_train.shape[2]
n_neurons = 50
n_outputs = y_train.shape[1]
n_layers = 2
learning_rate = 0.0001
batch_size =100
n_epochs = 1000
train_set_size = x_train.shape[0]
test_set_size = x_test.shape[0]
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_outputs])
layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,activation=tf.nn.sigmoid, use_peepholes = True) for layer in range(n_layers)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
outputs = outputs[:,n_steps-1,:]
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
I am encoding the labels with sklearn's LabelEncoder as:
encoder_train = LabelEncoder()
encoder_train.fit(y_train)
encoded_Y_train = encoder_train.transform(y_train)
y_train = np_utils.to_categorical(encoded_Y_train)
The labels are converted to a one-hot (binary) matrix.
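For illustration, here is roughly what that encoding step produces on a small hypothetical label list (the actual label set is not shown in the question):
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

example_labels = ['ball', 'bat', 'ball', 'cat']   # hypothetical labels
enc = LabelEncoder()
enc.fit(example_labels)
encoded = enc.transform(example_labels)           # e.g. array([0, 1, 0, 2])
one_hot = np_utils.to_categorical(encoded)        # one-hot rows such as [1., 0., 0.]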
When I tried to predict the output I got the following:
actual==> [[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]
[0. 1. 0.]
[0. 1. 0.]]
predicted==> [[0.3112209 0.3690182 0.31357136]
[0.31085992 0.36959863 0.31448898]
[0.31073445 0.3703295 0.31469804]
[0.31177694 0.37011752 0.3145326 ]
[0.31220382 0.3692756 0.31515726]
[0.31232828 0.36947766 0.3149037 ]
[0.31190437 0.36756667 0.31323162]
[0.31339088 0.36542615 0.310322 ]
[0.31598282 0.36328828 0.30711085]]
What I was expecting was the label, based on the encoding done, as the Keras model gives it. See the following:
predictions = model.predict_classes(X_test, verbose=True)
print("REAL VALUES:",reverse_category(Y_test,axis=1))
print("PRED VALUES:",predictions)
print("REAL COLORS:")
print(encoder.inverse_transform(reverse_category(Y_test,axis=1)))
print("PREDICTED COLORS:")
print(encoder.inverse_transform(predictions))
The output is something like the following:
REAL VALUES: [1 1 1 ... 1 2 1]
PRED VALUES: [2 1 1 ... 1 2 2]
REAL COLORS:
['ball' 'ball' 'ball' ... 'ball' 'bat' 'ball']
PREDICTED COLORS:
['bat' 'ball' 'ball' ... 'ball' 'bat' 'bat']
Kindly let me know what I can do in the TensorFlow model to get the result in terms of the encoding done.
I am using TensorFlow 1.12.0 and Windows 10.
You are trying to map the predicted class probabilities back to class labels. Each row in the list of output predictions contains the three predicted class probabilities. Use np.argmax to obtain the one with the highest predicted probability in order to map to the predicted class label:
import numpy as np
predictions = [[0.3112209,  0.3690182,  0.31357136],
               [0.31085992, 0.36959863, 0.31448898],
               [0.31073445, 0.3703295,  0.31469804],
               [0.31177694, 0.37011752, 0.3145326 ],
               [0.31220382, 0.3692756,  0.31515726],
               [0.31232828, 0.36947766, 0.3149037 ],
               [0.31190437, 0.36756667, 0.31323162],
               [0.31339088, 0.36542615, 0.310322  ],
               [0.31598282, 0.36328828, 0.30711085]]
np.argmax(predictions, axis=1)
Gives:
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
In this case, class 1 is predicted 9 times.
As noted in the comments: this is exactly what Keras does under the hood, as you'll see in the source code.
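If you also want the original string labels rather than the integer indices, here is a small follow-up sketch, assuming encoder_train is the fitted LabelEncoder from the question:
pred_classes = np.argmax(predictions, axis=1)                 # integer class indices
pred_labels = encoder_train.inverse_transform(pred_classes)   # back to the original labels
print(pred_labels)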

linear regression using tensorflow

import tensorflow as tf
M = tf.Variable([0.01],tf.float32)
b = tf.Variable([1.0],tf.float32)
#inputs and outputs
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32) # actual value of y which we already know
Yp = M * x + b # y predicted value
#loss
squareR = tf.square(Yp - y)
loss = tf.reduce_sum(squareR)
#optimize
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4, 5], y: [1.9, 2.4, 3.7, 4.9, 5.1]})
print(sess.run([M, b]))
output
[array([ 0.88999945], dtype=float32), array([ 0.93000191], dtype=float32)]
Problem:
When I change the values of x and y to
x:[100,200,300,400,500],y:[19,24,37,49,51]
then the output is:
[array([ nan], dtype=float32), array([ nan], dtype=float32)]
Please help me get the slope and y-intercept of the linear model.
Adding some print statements to your training loop, we can see what's going on during training:
import numpy as np

for i in range(1000):
    _, mm, bb = sess.run([train, M, b], {x: [100, 200, 300, 400, 500], y: [19, 24, 37, 49, 51]})
    print(mm, bb)
    if np.isnan(mm):
        break
print(sess.run([M, b]))
The output:
[ 1118.01000977] [ 4.19999981]
[-12295860.] [-33532.921875]
[ 1.35243170e+11] [ 3.68845632e+08]
[ -1.48755065e+15] [ -4.05696309e+12]
[ 1.63616896e+19] [ 4.46228634e+16]
[ -1.79963571e+23] [ -4.90810521e+20]
[ 1.97943407e+27] [ 5.39846559e+24]
[ -2.17719537e+31] [ -5.93781625e+28]
[ 2.39471499e+35] [ 6.53105210e+32]
[-inf] [-inf]
[ nan] [ nan]
That output means your training is diverging. In this case, lowering the learning rate is one of the possible approaches to fix the problem.
Lowering the learning rate to 0.000001 works; these are the learned M and b after 1000 iterations:
[array([ 0.11159456], dtype=float32), array([ 1.01534212], dtype=float32)]
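For completeness, a minimal sketch of the only change needed in the question's code (everything else stays the same):
# A smaller step size keeps gradient descent from diverging on the
# larger-magnitude inputs; 0.000001 is the value used above.
optimizer = tf.train.GradientDescentOptimizer(0.000001)
train = optimizer.minimize(loss)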

Tensorflow always returns the same result

I wrote code with TensorFlow to train and test a convolutional neural network.
I use my own JPEG images as input.
The image files are loaded in batches by my own code, input_batch.py.
After loading, the convolutions are applied by convpool.py.
But now my results on the test data always come back with the same values.
Also during training, some batches' convolution results come out identical.
I also looked at this issue, Tensorflow predicts always the same result,
but the solutions from that issue could not be applied to my code.
My results always look like this:
step 0, training accuracy 0.2
result: [[ 5.76441448e-18 1.00000000e+00]
[ 5.76441448e-18 1.00000000e+00]
[ 5.76441448e-18 1.00000000e+00]
[ 5.76441448e-18 1.00000000e+00]
[ 5.76441448e-18 1.00000000e+00]
[ 5.76441448e-18 1.00000000e+00]
[ 5.76434913e-18 1.00000000e+00]
[ 5.85150709e-18 1.00000000e+00]
[ 2.83430459e-17 1.00000000e+00]
[ 0.00000000e+00 1.00000000e+00]]
test result:[[ 0. 1.]]actual result:[ 1. 0.]
test result:[[ 0. 1.]]actual result:[ 1. 0.]
test result:[[ 0. 1.]]actual result:[ 0. 1.]
test result:[[ 0. 1.]]actual result:[ 0. 1.]
test result:[[ 0. 1.]]actual result:[ 0. 1.]
test result:[[ 0. 1.]]actual result:[ 1. 0.]
test result:[[ 0. 1.]]actual result:[ 1. 0.]
test result:[[ 0. 1.]]actual result:[ 0. 1.]
test result:[[ 0. 1.]]actual result:[ 1. 0.]
test result:[[ 0. 1.]]actual result:[ 1. 0.]
and here is my code:
import tensorflow as tf
import input_batch
import input
import convpool
import matplotlib.pyplot as plt
import numpy as np
FLAGS = tf.app.flags.FLAGS
sess = tf.Session()
x_image = tf.placeholder("float", shape=[None,FLAGS.width,FLAGS.height,FLAGS.depth])
y_ = tf.placeholder("float", shape=[None,FLAGS.num_class])
# x_image=tf.reshape(x,[-1,FLAGS.width,FLAGS.height,FLAGS.depth])
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def spp_layer(x, n_bin, output_depth):
    (a, b, c, d) = x.get_shape()
    h = int(b + (n_bin - 1)) / n_bin
    w = int(c + (n_bin - 1)) / n_bin
    return tf.reshape(tf.nn.max_pool(x, ksize=[1, h, w, 1], strides=[1, h, w, 1], padding='SAME'),
                      [-1, n_bin * n_bin, output_depth])
W_conv1 = weight_variable([11, 11, 3 , 96])
b_conv1 = bias_variable([96])
W_conv2 = weight_variable([5, 5, 96, 256])
b_conv2 = bias_variable([256])
W_fc1 = weight_variable([14*14* 256, 4096])
b_fc1 = bias_variable([4096])
W_fc2 = weight_variable([4096, 2])
b_fc2 = bias_variable([2])
keep_prob = tf.placeholder("float")
y_conv_train = convpool.train(x_image,W_conv1,b_conv1,W_conv2,b_conv2,W_fc1,b_fc1,W_fc2,b_fc2,keep_prob)
cross_entropy = -tf.reduce_mean(y_*tf.log(tf.clip_by_value(y_conv_train,1e-10,1.0)))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv_train,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,"float"))
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for i in range(50):
    batch = input_batch.get_data_jpeg(sess, 'train', 10)
    if i % 1 == 0:
        train_accuracy = sess.run(accuracy, feed_dict={x_image: batch[0], y_: batch[1], keep_prob: 1.0})
        train_result = sess.run(y_conv_train, feed_dict={x_image: batch[0], y_: batch[1], keep_prob: 1.0})
        # print('result : ', sess.run(W_fc2))
        print("step %d, training accuracy %g" % (i, train_accuracy))
        print('result:', train_result)
    sess.run(train_step, feed_dict={x_image: batch[0], y_: batch[1], keep_prob: 0.5})
# ############################test###################################
for i in range(10):
    input.initialization()
    testinput = input.get_data_jpeg(sess, 'eval')
    test_img = testinput.x_data
    (i_x, i_y, i_z) = testinput.x_size
    testimg = tf.reshape(test_img, [-1, i_x, i_y, i_z])
    testresult = convpool.train(testimg, W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2, 1.0)
    result = sess.run(testresult)
    print("test result:" + str(result) + " actual result:" + str(testinput.y_data))
#convpool.py
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
def train(input, W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2, keep_prob):
    h_conv1 = tf.nn.relu(tf.nn.conv2d(input, W_conv1, strides=[1, 4, 4, 1], padding='SAME') + b_conv1)
    h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
    h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # print(h_pool2)
    h_pool2_flat = tf.reshape(h_pool2, [-1, 14*14*256])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    return y_conv
#input_batch.py
import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import random
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('data_dir', 'C://Users/sh/PycharmProjects/test2/data',
                           """Directory where to write event logs """
                           """and checkpoint.""")
FLAGS.width = 224
FLAGS.height = 224
FLAGS.depth = 3
FLAGS.num_class = 2
batch_index = 0
filenames = []
FLAGS.imsize = FLAGS.height * FLAGS.width * FLAGS.depth
def get_filenames(data_set):
    global filenames
    labels = []
    with open(FLAGS.data_dir + '/labels.txt') as f:
        for line in f:
            inner_list = [elt.strip() for elt in line.split(',')]
            labels += inner_list
    for i, label in enumerate(labels):
        list = os.listdir(FLAGS.data_dir + '/' + data_set + '/' + label)
        for filename in list:
            filenames.append([label + '/' + filename, i])
    random.shuffle(filenames)
def get_data_jpeg(sess, data_set, batch_size):
    global batch_index, filenames
    if len(filenames) == 0:
        get_filenames(data_set)
    max = len(filenames)
    begin = batch_index
    end = batch_index + batch_size
    if end >= max:
        end = max
        batch_index = 0
    x_data = np.array([])
    y_data = np.zeros((batch_size, FLAGS.num_class))
    index = 0
    for i in range(begin, end):
        with tf.gfile.FastGFile(FLAGS.data_dir + '/' + data_set + '/' + filenames[i][0], 'rb') as f:
            image_data = f.read()
        decode_image = tf.image.decode_jpeg(image_data, channels=FLAGS.depth)
        resized_image = tf.image.resize_images(decode_image, [FLAGS.height, FLAGS.width], method=1)
        image = sess.run(resized_image)
        x_data = np.append(x_data, np.asarray(image.data, dtype='float32')) / 255
        y_data[index][filenames[i][1]] = 1
    batch_index += batch_size
    try:
        x_data = x_data.reshape(batch_size, FLAGS.height, FLAGS.width, FLAGS.depth)
    except:
        return None, None
    return x_data, y_data

Renormalize weight matrix using TensorFlow

I'd like to add a max norm constraint to several of the weight matrices in my TensorFlow graph, ala Torch's renorm method.
If the L2 norm of any neuron's weight matrix exceeds max_norm, I'd like to scale its weights down so that their L2 norm is exactly max_norm.
What's the best way to express this using TensorFlow?
Here is a possible implementation:
import tensorflow as tf
def max_norm_regularizer(threshold, axes=1, name="max_norm", collection="max_norm"):
    def max_norm(weights):
        clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)
        clip_weights = tf.assign(weights, clipped, name=name)
        tf.add_to_collection(collection, clip_weights)
        return None  # there is no regularization loss term
    return max_norm
Here's how you would use it:
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.framework import arg_scope
with arg_scope(
        [fully_connected],
        weights_regularizer=max_norm_regularizer(1.5)):
    hidden1 = fully_connected(X, 200, scope="hidden1")
    hidden2 = fully_connected(hidden1, 100, scope="hidden2")
    outputs = fully_connected(hidden2, 5, activation_fn=None, scope="outs")

max_norm_ops = tf.get_collection("max_norm")
[...]
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for X_batch, y_batch in load_next_batch():
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            sess.run(max_norm_ops)
This creates a 3 layer neural network and trains it with max norm regularization at every layer (with a threshold of 1.5). I just tried it, seems to work. Hope this helps! Suggestions for improvements are welcome. :)
Notes
This code is based on tf.clip_by_norm():
>>> x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
>>> print(x.eval())
[[ 0. 0.]
[ 3. 4.]
[ 30. 40.]
[ 300. 400.]]
>>> clip_rows = tf.clip_by_norm(x, clip_norm=10, axes=1)
>>> print(clip_rows.eval())
[[ 0. 0. ]
[ 3. 4. ]
[ 6. 8. ] # clipped!
[ 6.00000048 8. ]] # clipped!
You can also clip columns if you need to:
>>> clip_cols = tf.clip_by_norm(x, clip_norm=350, axes=0)
>>> print(clip_cols.eval())
[[ 0. 0. ]
[ 3. 3.48245788]
[ 30. 34.82457733]
[ 300. 348.24578857]]
# clipped!
Using Rafał's suggestion and TensorFlow's implementation of clip_by_norm, here's what I came up with:
def renorm(x, axis, max_norm):
    '''Renormalizes the sub-tensors along axis such that they do not exceed norm max_norm.'''
    # This elaborate dance avoids empty slices, which TF dislikes.
    rank = tf.rank(x)
    bigrange = tf.range(-1, rank + 1)
    dims = tf.slice(
        tf.concat(0, [tf.slice(bigrange, [0], [1 + axis]),
                      tf.slice(bigrange, [axis + 2], [-1])]),
        [1], rank - [1])
    # Determine which columns need to be renormalized.
    l2norm_inv = tf.rsqrt(tf.reduce_sum(x * x, dims, keep_dims=True))
    scale = max_norm * tf.minimum(l2norm_inv, tf.constant(1.0 / max_norm))
    # Broadcast the scalings
    return tf.mul(scale, x)
It seems to have the desired behavior for 2-dimensional matrices and should
generalize to tensors:
> x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
> print x.eval()
[[ 0. 0.] # rows have norms of 0, 5, 50, 500
[ 3. 4.] # cols have norms of ~302, ~402
[ 30. 40.]
[ 300. 400.]]
> print renorm(x, 0, 10).eval()
[[ 0. 0. ] # unaffected
[ 3. 4. ] # unaffected
[ 5.99999952 7.99999952] # rescaled
[ 6.00000048 8.00000095]] # rescaled
> print renorm(x, 1, 350).eval()
[[ 0. 0. ] # col 0 is unaffected
[ 3. 3.48245788] # col 1 is rescaled
[ 30. 34.82457733]
[ 300. 348.24578857]]
Take a look at the clip_by_norm function, which does exactly this. It takes a single tensor as input and returns a scaled-down tensor.
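As a minimal sketch of how that suggestion could be wired up (assuming a 2-D weight variable named weights whose rows should stay within max_norm, in the spirit of the first answer above):
clipped = tf.clip_by_norm(weights, clip_norm=max_norm, axes=1)  # rescale rows whose L2 norm exceeds max_norm
clip_op = tf.assign(weights, clipped)
# Run clip_op after each training step, e.g. sess.run(clip_op).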
