Video classification using many to many LSTM in TensorFlow

Video classification using many to many LSTM in TensorFlow - python

I have to build a binary classifier to predict whether the input video contains an action or not.
The input to the model will be of shape: [batch, frames, height, width, channel]
Here, batch is number of videos, frames is number of images in that video (It's fixed for every video), height is number of rows in that image, width is number of columns in that image, and channel is RGB colors.
I found in Andrej Karpathy blog that many to many Recurrent Neural Network is best for this application: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Thus, I need to implement this in TensorFlow:
I learned how to implement LSTM using this tutorial: https://github.com/nlintz/TensorFlow-Tutorials/blob/master/07_lstm.py#L52
But, it is implementing many to one LSTM and predicting output and reducing loss using only last tensor: outputs[-1]
And, I want to predict output using many tensors (let say 4) and reduce loss using them.
Here's my implementation:
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
# Training Parameters
batch = 5 # number of examples
frames = time_step_size = 20
height = 60
width = 80
channel = 3
lstm_size = 240
num_classes = 2
# Creating random data
input_x = np.random.normal(size=[batch, frames, height, width, channel])
input_y = np.zeros((batch, num_classes))
B = np.ones(batch)
input_y[:,1] = B
X = tf.placeholder("float", [None, frames, height, width, channel], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')
with tf.name_scope('Model'):
XR = tf.reshape(X, [-1, height*width*channel]) # shape=(?, 14400)
X_split3 = tf.split(XR, time_step_size, 0) # 20 tensors of shape=(?, 14400)
lstm = rnn.BasicLSTMCell(lstm_size, forget_bias=1.0, state_is_tuple=True)
outputs, _states = rnn.static_rnn(lstm, X_split3, dtype=tf.float32) # 20 tensors of shape=(?, 240)
logits = tf.layers.dense(outputs[-1], num_classes, name='logits') # shape=(?, 2)
prediction = tf.nn.softmax(logits)
# Define loss and optimizer
with tf.name_scope('Loss'):
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
with tf.name_scope('optimizer'):
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
train_op = optimizer.minimize(loss_op)
# Evaluate model (with test logits, for dropout to be disabled)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
with tf.Session() as sess:
tf.global_variables_initializer().run()
logits_output = sess.run(logits, feed_dict={X: input_x})
print(logits_output.shape) # shape=(5, 2)
sess.run(train_op, feed_dict={X: input_x, Y: input_y})
loss, acc = sess.run([loss_op, accuracy], feed_dict={X: input_x, Y: input_y})
print("Loss: ", loss) # loss: 1.46626135e-05
print("Accuracy: ", acc) # Accuracy: 1.0
Problems:
1. I need help to implement many to many LSTM and predict output after certain frames (let say 4), but, I am only using last tensor outputs[-1] to reduce loss. There are 20 tensors, one for each frames or time_step_size. If I transform every 5th tensor: outputs[4], outputs[9], outputs[14], outputs[-1], I will get 4 logits. So, how I am going to reduce loss on all four of them?
2. One more problem is, I have to implement binary classifier, but I only have video of action I want to identify. So, the input_y is one hot representation of labels in which 1st column is always 0 and 2nd column is always 1 (action I have to identify), and I don’t have any example video in which 1st column's value is 1. Do you think it will work?
3. Why in above implement, in only one iteration the accuracy is 1?
Thanks

For 1., Dense takes any number of batch dimensions, so you should be able to transform into logits from all of the steps in one go (then likewise operate on a batch until you get a final loss for each step, then aggregate e.g. by taking the mean).
For 2. and 3., it seems like you need to find some negative examples. There's a literature on "positive and unlabeled (PU)" learning and "one-class classification" which may help.

Related

linear regression by tensorflow gets noticeable mean square error

I am new to tensorflow and I am trying to implement a simple feed-forward network for regression, just for learning purposes. The complete executable code is as follows.
The regression mean squared error is around 6, which is quite large. It is a little unexpected because the function to regress is linear and simple 2*x+y, and I expect a better performance.
I am asking for help to check if I did anything wrong in the code. I carefully checked the matrix dimensions so that should be good, but it is possible that I misunderstand something so the network or the session is not properly configured (like, should I run the training session multiple times, instead of just one time (the code below enclosed by #TRAINING#)? I see in some examples they input data piece by piece, and run the training progressively. I run the training just one time and input all data).
If the code is good, maybe this is a modeling issue, but I really don't expect to use a complicated network for such a simple regression.
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error
# inputs are points from a 100x100 grid in domain [-2,2]x[-2,2], total 10000 points
lsp = np.linspace(-2,2,100)
gridx,gridy = np.meshgrid(lsp,lsp)
inputs = np.dstack((gridx,gridy))
inputs = inputs.reshape(-1,inputs.shape[-1]) # reshpaes the grid into a 10000x2 matrix
feature_size = inputs.shape[1] # feature_size is 2, features are the 2D coordinates of each point
input_size = inputs.shape[0] # input_size is 10000
# a simple function f(x)=2*x[0]+x[1] to regress
f = lambda x: 2 * x[0] + x[1]
label_size = 1
labels = f(inputs.transpose()).reshape(-1,1) # reshapes labels as a column vector
ph_inputs = tf.placeholder(tf.float32, shape=(None, feature_size), name='inputs')
ph_labels = tf.placeholder(tf.float32, shape=(None, label_size), name='labels')
# just one hidden layer with 16 units
hid1_size = 16
w1 = tf.Variable(tf.random_normal([hid1_size, feature_size], stddev=0.01), name='w1')
b1 = tf.Variable(tf.random_normal([hid1_size, label_size]), name='b1')
y1 = tf.nn.relu(tf.add(tf.matmul(w1, tf.transpose(ph_inputs)), b1))
# the output layer
wo = tf.Variable(tf.random_normal([label_size, hid1_size], stddev=0.01), name='wo')
bo = tf.Variable(tf.random_normal([label_size, label_size]), name='bo')
yo = tf.transpose(tf.add(tf.matmul(wo, y1), bo))
# defines optimizer and predictor
lr = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss = tf.losses.mean_squared_error(ph_labels,yo)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
predictor = tf.identity(yo)
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
_, c = sess.run([optimizer, loss], feed_dict={lr:0.05, ph_inputs: inputs, ph_labels: labels})
# TRAINING
# gets the regression results
predictions = np.zeros((input_size,1))
for i in range(input_size):
predictions[i] = sess.run(predictor, feed_dict={ph_inputs: inputs[i, None]}).squeeze()
# prints regression MSE
print(mean_squared_error(predictions, labels))

You're right, you understood the problem by yourself.
The problem is, in fact, that you're running the optimization step only one time. Hence you're doing one single update step of your network parameter and therefore the cost won't decrease.
I just changed the training session of your code in order to make it work as expected (100 training steps):
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
_, c = sess.run(
[optimizer, loss],
feed_dict={
lr: 0.05,
ph_inputs: inputs,
ph_labels: labels
})
print("Train step {} loss value {}".format(i, c))
# TRAINING
and at the end of the training step I go:
Train step 99 loss value 0.04462708160281181
0.044106700712455045

Tensorflow CNN model always predicts same class

I have been trying to develop a CNN model for image classification. I am new to tensorflow and getting help from the following books
Learning.TensorFlow.A.Guide.to.Building.Deep.Learning.Systems
TensorFlow For Machine Intelligence by Sam Abrahams
For the past few weeks I have been working to develop a good model but I always get the same prediction. I have tried many different architectures but no luck!
Lately I decided to test my model with CIFAR-10 dataset and using the exact same model as given in the Learning Tensorflow book. But the outcome was same (same class for every image) even after training for 50K steps.
Here is highlight of my model and code.
1.) Downloaded CIFAR-10 image sets, converted them into tfrecord files with labels(labels are string for each category of CIFAR-10 in the tfrecord file) each for training and test set.
2) Reading the images from tfrecord file and generating random shuffle batch of size 100.
3) Converting the label from string to the integer32 type from 0-9 each for given category
4) Pass the training and test batches to the network and getting the output of [batch_size , num_class] size.
5) Train the model using Adam optimizer and softmax cross entropy loss function (Have tried gradient optimizer as well)
7) evaluate the model for test batches before and after the training.
8) Getting the same prediction for entire data set (But different every time I re run the code to try again)
Is there something wrong I am doing here? I would appreciate if someone could help me out with this problem.
Note - My approach of converting images and labels into tfrecord could be unusual but believe me I have come up with this idea from the books I mentioned earlier.
My code for the problem:
import tensorflow as tf
import numpy as np
import _datetime as dt
import PIL
# The glob module allows directory listing
import glob
import random
from itertools import groupby
from collections import defaultdict
H , W = 32 , 32 # Height and weight of the image
C = 3 # Number of channels
sessInt = tf.InteractiveSession()
# Read file and return the batches of the input data
def get_Batches_From_TFrecord(tf_record_filenames_list, batch_size):
# Match and load all the tfrecords found in the specified directory
tf_record_filename_queue = tf.train.string_input_producer(tf_record_filenames_list)
# It may have more than one example in them.
tf_record_reader = tf.TFRecordReader()
tf_image_name, tf_record_serialized = tf_record_reader.read(tf_record_filename_queue)
# The label and image are stored as bytes but could be stored as int64 or float64 values in a
# serialized tf.Example protobuf.
tf_record_features = tf.parse_single_example(tf_record_serialized,
features={'label': tf.FixedLenFeature([], tf.string),
'image': tf.FixedLenFeature([], tf.string), })
# Using tf.uint8 because all of the channel information is between 0-255
tf_record_image = tf.decode_raw(tf_record_features['image'], tf.uint8)
try:
# Reshape the image to look like the input image
tf_record_image = tf.reshape(tf_record_image, [H, W, C])
except:
print(tf_image_name)
tf_record_label = tf.cast(tf_record_features['label'], tf.string)
'''
#Check the image and label
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sessInt, coord=coord)
label = tf_record_label.eval().decode()
print(label)
image = PIL.Image.fromarray(tf_record_image.eval())
image.show()
coord.request_stop()
coord.join(threads)
'''
# creating a batch to feed the data
min_after_dequeue = 10 * batch_size
capacity = min_after_dequeue + 5 * batch_size
# Shuffle examples while feeding in the queue
image_batch, label_batch = tf.train.shuffle_batch([tf_record_image, tf_record_label], batch_size=batch_size,
capacity=capacity, min_after_dequeue=min_after_dequeue)
# Sequential feed in the examples in the queue (Don't shuffle)
# image_batch, label_batch = tf.train.batch([tf_record_image, tf_record_label], batch_size=batch_size, capacity=capacity)
# Converting the images to a float to match the expected input to convolution2d
float_image_batch = tf.image.convert_image_dtype(image_batch, tf.float32)
string_label_batch = label_batch
return float_image_batch, string_label_batch
#Count the number of images in the tfrecord file
def number_of_records(tfrecord_file_name):
count = 0
record_iterator = tf.python_io.tf_record_iterator(path = tfrecord_file_name)
for record in record_iterator:
count+=1
return count
def get_num_of_samples(tfrecords_list):
total_samples = 0
for tfrecord in tfrecords_list:
total_samples += number_of_records(tfrecord)
return total_samples
# Provide the input tfrecord names in a list
train_filenames = ["./TFRecords/cifar_train.tfrecord"]
test_filename = ["./TFRecords/cifar_test.tfrecord"]
num_train_samples = get_num_of_samples(train_filenames)
num_test_samples = get_num_of_samples(test_filename)
print("Number of Training samples: ", num_train_samples)
print("Number of Test samples: ", num_test_samples)
'''
IMP Note : (Batch_size * Training_Steps) should be at least greater than (2*Number_of_samples) for shuffling of batches
'''
train_batch_size = 100
# Total number of batches for input records
# Note - Num of samples in the tfrecord file can be determined by the tfrecord iterator.
# Batch size for test samples
test_batch_size = 50
train_image_batch, train_label_batch = get_Batches_From_TFrecord(train_filenames, train_batch_size)
test_image_batch, test_label_batch = get_Batches_From_TFrecord(test_filename, test_batch_size)
# Definition of the convolution network which returns a single neuron for each input image in the batch
# Define a placeholder for keep probability in dropout
# (Dropout should only use while training, for testing dropout should be always 1.0)
fc_prob = tf.placeholder(tf.float32)
conv_prob = tf.placeholder(tf.float32)
#Helper function to add learned filters(images) into tensorboard summary - for a random input in the batch
def add_filter_summary(name, filter_tensor):
rand_idx = random.randint(0,filter_tensor.get_shape()[0]-1) #Choose any random number from[0,batch_size)
#dispay_filter = filter_tensor[random.randint(0,filter_tensor.get_shape()[3])]
dispay_filter = filter_tensor[5] #keeping the index fix for consistency in visualization
with tf.name_scope("Filter_Summaries"):
img_summary = tf.summary.image(name, tf.reshape(dispay_filter,[-1 , filter_tensor.get_shape()[1],filter_tensor.get_shape()[1],1] ), max_outputs = 500)
# Helper functions for the network
def weight_initializer(shape):
weights = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(weights)
def bias_initializer(shape):
biases = tf.constant(0.1, shape=shape)
return tf.Variable(biases)
def conv2d(input, weights, stride):
return tf.nn.conv2d(input, filter=weights, strides=[1, stride, stride, 1], padding="SAME")
def pool_layer(input, window_size=2 , stride=2):
return tf.nn.max_pool(input, ksize=[1, window_size, window_size, 1], strides=[1, stride, stride, 1], padding='VALID')
# This is the actual layer we will use.
# Linear convolution as defined in conv2d, with a bias,
# followed by the ReLU nonlinearity.
def conv_layer(input, filter_shape , stride=1):
W = weight_initializer(filter_shape)
b = bias_initializer([filter_shape[3]])
return tf.nn.relu(conv2d(input, W, stride) + b)
# A standard full layer with a bias. Notice that here we didn’t add the ReLU.
# This allows us to use the same layer for the final output,
# where we don’t need the nonlinear part.
def full_layer(input, out_size):
in_size = int(input.get_shape()[1])
W = weight_initializer([in_size, out_size])
b = bias_initializer([out_size])
return tf.matmul(input, W) + b
## Model fro the book learning tensorflow - for CIFAR data
def conv_network(image_batch, batch_size):
# Now create the model which returns the output neurons (eequals to the number of labels)
# as a final fully connecetd layer output. Which we can use as input to the softmax classifier
C1 , C2 , C3 = 30 , 50, 80 # Number of output features for each convolution layer
F1 = 500 # Number of output neuron for FC1 layer
#Add original image to tensorboard summary
add_filter_summary("Original" , image_batch)
# First convolutaion layer with 5x5 filter size and 32 filters
conv1 = conv_layer(image_batch, filter_shape=[3, 3, C, C1])
pool1 = pool_layer(conv1, window_size=2)
pool1 = tf.nn.dropout(pool1, keep_prob=conv_prob)
add_filter_summary("conv1" , pool1)
# Second convolutaion layer with 5x5 filter_size and 64 filters
conv2 = conv_layer(pool1, filter_shape=[5, 5, C1, C2])
pool2 = pool_layer(conv2, 2)
pool2 = tf.nn.dropout(pool2, keep_prob=conv_prob)
add_filter_summary("conv2" , pool2)
# Third convolution layer
conv3 = conv_layer(pool2, filter_shape=[5, 5, C2, C3])
# Since at this point the feature maps are of size 8×8 (following the first two poolings
# that each reduced the 32×32 pictures by half on each axis).
# This last pool layer pools each of the feature maps and keeps only the maximal value.
# The number of feature maps at the third block was set to 80,
# so at that point (following the max pooling) the representation is reduced to only 80 numbers
pool3 = pool_layer(conv3, window_size = 8 , stride=8)
pool3 = tf.nn.dropout(pool3, keep_prob=conv_prob)
add_filter_summary("conv3" , pool3)
# Reshape the output to feed to the FC layer
flatterned_layer = tf.reshape(pool3, [batch_size,
-1]) # -1 is to specify to use all the dimensions remaining in the input (other than batch_size).reshape(input , )
fc1 = tf.nn.relu(full_layer(flatterned_layer, F1))
full1_drop = tf.nn.dropout(fc1, keep_prob=fc_prob)
# Fully connected layer 2 (output layer)
final_Output = full_layer(full1_drop, 10)
return final_Output, tf.summary.merge_all()
# Now that architecture is created , next step is to create the classification model
# (to predict the output class of the input data)
# Here we have used Logistic regression (Sigmoid function) to predict the output because we have only rwo class.
# For multiple class problem - softmax is the best prediction function
# Prepare the inputs to the input
Train_X , img_summary = conv_network(train_image_batch, train_batch_size)
Test_X , _ = conv_network(test_image_batch, test_batch_size)
# Generate 0 based index for labels
Train_Y = tf.to_int32(tf.argmax(
tf.to_int32(tf.stack([tf.equal(train_label_batch, ["airplane"]), tf.equal(train_label_batch, ["automobile"]),
tf.equal(train_label_batch, ["bird"]),tf.equal(train_label_batch, ["cat"]),
tf.equal(train_label_batch, ["deer"]),tf.equal(train_label_batch, ["dog"]),
tf.equal(train_label_batch, ["frog"]),tf.equal(train_label_batch, ["horse"]),
tf.equal(train_label_batch, ["ship"]), tf.equal(train_label_batch, ["truck"]) ])), 0))
Test_Y = tf.to_int32(tf.argmax(
tf.to_int32(tf.stack([tf.equal(test_label_batch, ["airplane"]), tf.equal(test_label_batch, ["automobile"]),
tf.equal(test_label_batch, ["bird"]),tf.equal(test_label_batch, ["cat"]),
tf.equal(test_label_batch, ["deer"]),tf.equal(test_label_batch, ["dog"]),
tf.equal(test_label_batch, ["frog"]),tf.equal(test_label_batch, ["horse"]),
tf.equal(test_label_batch, ["ship"]), tf.equal(test_label_batch, ["truck"]) ])), 0))
# Y = tf.reshape(float_label_batch, X.get_shape())
# compute inference model over data X and return the result
# (using sigmoid function - as this function is the best to predict two class output)
# (For multiclass problem - Softmax is the bset prediction function)
def inference(X):
return tf.nn.softmax(X)
# compute loss over training data X and expected outputs Y
# Cross entropy function is the best suited for loss calculation (Than the squared error function)
# Get the second column of the input to get only the features
def loss(X, Y):
return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=X, labels=Y))
# train / adjust model parameters according to computed total loss (using gradient descent)
def train(total_loss, learning_rate):
return tf.train.AdamOptimizer(learning_rate).minimize(total_loss)
# evaluate the resulting trained model with dropout probability (Ideally 1.0 for testing)
def evaluate(sess, X, Y, dropout_prob):
# predicted = tf.cast(inference(X) > 0.5 , tf.float32)
#print("\nNetwork output:")
#print(sess.run(inference(X) , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
# Inference contains the predicted probability of each class for each input image.
# The class having higher probability is the prediction of the network. y_pred_cls = tf.argmax(y_pred, dimension=1)
predicted = tf.cast(tf.argmax(X, 1), tf.int32)
#print("\npredicted labels:")
#print(sess.run(predicted , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
#print("\nTrue Labels:")
#print(sess.run(Y , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32))
# calculate the mean of the accuracies of the each batch (iteration)
# No. of iteration Iteration should cover the (test_batch_size * num_of_iteration ) >= (2* num_of_test_samples ) condition
total_accuracy = np.mean([sess.run(batch_accuracy, feed_dict={conv_prob:1.0 , fc_prob:1.0}) for i in range(250)])
print("Accuracy of the model(in %): {:.4f} ".format(100 * total_accuracy))
# create a saver class to save the training checkpoints
saver = tf.train.Saver(max_to_keep=10)
# Create tensorboard sumamry for loss function
with tf.name_scope("summaries"):
loss_summary = tf.summary.scalar("loss", loss(Train_X, Train_Y))
#merged = tf.summary.merge_all()
# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
log_writer = tf.summary.FileWriter('./logs', sess.graph)
total_loss = loss(Train_X, Train_Y)
train_op = train(total_loss, 0.001)
#Initialise all variables after defining all variables
tf.global_variables_initializer().run()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(Train_Y))
print(sess.run(Test_Y))
evaluate(sess, Test_X, Test_Y,1.0)
# actual training loop------------------------------------------------------
training_steps = 50000
print("\nStarting to train model with", str(training_steps), " steps...")
to1 = dt.datetime.now()
for step in range(1, training_steps + 1):
# print(sess.run(train_label_batch))
sess.run([train_op], feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # Pass the dropout value for training batch to the placeholder
# for debugging and learning purposes, see how the loss gets decremented thru training steps
if step % 100 == 0:
# print("\n")
# print(sess.run(train_label_batch))
loss_summaries, img_summaries , Tloss = sess.run([loss_summary, img_summary, total_loss],
feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # evaluate total loss to add it in summary object
log_writer.add_summary(loss_summaries, step) # add summary for each step
log_writer.add_summary(img_summaries, step)
print("Step:", step, " , loss: ", Tloss)
if step%2000 == 0:
saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
print("\n")
evaluate(sess, Test_X, Test_Y,1.0)
saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
to2 = dt.datetime.now()
print("\nTotal Trainig time Elapsed: ", str(to2 - to1))
# once the training is complete, evaluate the model with test (validation set)-------------------------------------------
# Restore the model file and perform the testing
#saver.restore(sess, "./Models/BookLT3_CIFAR-15000")
print("\nPost Training....")
# Performs Evaluation of model on batches of test samples
# In order to evaluate entire test set , number of iteration should be chosen such that ,
# (test_batch_size * num_of_iteration ) >= (2* num_of_test_samples )
evaluate(sess, Test_X, Test_Y,1.0) # Evaluate multiple batch of test data set (randomly chosen by shuffle train batch queue)
evaluate(sess, Test_X, Test_Y,1.0)
evaluate(sess, Test_X, Test_Y,1.0)
coord.request_stop()
coord.join(threads)
sess.close()
Here is the screenshot of my Pre training result:
Here is the screenshot of the result during training:
Hereis the screenshot of the Post training result

I did not run the code to verify that this is the only issue, but here is one important issue. When classifying, you should use one-hot encoding for your labels. Meaning that if you have 3 classes, you want your labels to be [1, 0, 0] for class 1, [0, 1, 0] for class 2, [0, 0, 1] for class 3. Your approach of using 1, 2, and 3 as labels leads to various issues. For examples, the network is penalized more for predicting class 1 versus predicting class 2 for an image from class 3. TensorFlow functions like tf.nn.softmax_cross_entropy_with_logits work with such representations.
Here is the basic example of correctly using one_hot labels to compute loss: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
Here is how the one_hot label is constructed for mnist digits:
https://github.com/tensorflow/tensorflow/blob/438604fc885208ee05f9eef2d0f2c630e1360a83/tensorflow/contrib/learn/python/learn/datasets/mnist.py#L69

Invalid Argument Error Tensorflow Placeholder

import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
n_steps = 3 # number of time steps in RNN
n_inputs = 1 # number of inputs received by RNN Cell at each time step
n_neurons = 10 # number of RNN cells in hidden layer
n_outputs = 3 # number of outputs given out by RNN
n_layers = 3 # number of layers in network
n_epochs = 1000 # number of epochs for RNN training
learning_rate = 0.01 # learning rate for training step
X_train = np.array([[[1.],
[2.],
[3.]],
[[4.], # training data for input sequence
[5.],
[6.]],
[[7.],
[8.],
[9.]]])
y_train = np.array([[4., 5., 6.],
[7., 8., 9.], # training data for output sequence
[10., 11., 12.]])
X_train.reshape((3, 3, 1)) # reshape X training data
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) # placeholder for input sequence
y = tf.placeholder(tf.float32, [None, n_outputs]) # placeholder for output sequence
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units = n_neurons, reuse =
True) # create hidden layer of 10 basic RNN cells
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype = tf.float32) # create layer using basic RNN Cell and input sequence X using dynamic RNN method (get outputs and final state)
logits = tf.layers.dense(states, n_outputs) # logits (and final prediction) of RNN
prediction = logits # final prediction of RNN
rsme = tf.square(y - prediction) # squared deviations of predictions form targets
loss = tf.reduce_mean(rmse) # mean of all squared deviations (cost function)
training_op = tf.train.GradientDescentOptimizer(learning_rate =
learning_rate).minimize(loss) # training step using Gradient Descent Optimizer
tf.global_variables_initializer().run()
for _ in range(n_epochs):
sess.run(training_op , feed_dict = {X: X_train, y: y_train}) # run training operation iteratively
In the above code, I am trying to predict the last 3 elements of a sequence given the first 3 elements using a dynamic Recurrent Neural Network with basic RNN cells. It has an input layer with 3 neurons and a hidden layer containing 10 recursive neurons and an output layer of 3 neurons. But, it is giving an 'Invalid Argument Error', saying that I am feeding a tensor with a negative dimension of (-1, 3, 1) to a placeholder of shape (?, 3, 1). I am not able to fix this error even after plenty of desperate googling. Can someone please help me fix this error.
Thanks in advance for the help!

Simple Tensorflow Multilayer Neural Network Not Learning

I am trying to write a two layer neural network to train a class labeler. The input to the network is a 150-feature list of about 1000 examples; all features on all examples have been L2 normalized.
I only have two outputs, and they should be disjoint--I am just attempting to predict whether the example is a one or a zero.
My code is relatively simple; I am feeding the input data into the hidden layer, and then the hidden layer into the output. As I really just want to see this working in action, I am training on the entire data set with each step.
My code is below. Based on the other NN implementations I have referred to, I believe that the performance of this network should be improving over time. However, regardless of the number of epochs I set, I am getting back an accuracy of about ~20%. The accuracy is not changing when the number of steps are changed, so I don't believe that my weights and biases are being updated.
Is there something obvious I am missing with my model? Thanks!
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.01
epochs = 200
n_input = 150
n_hidden = 75
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden]))
b1 = tf.Variable(tf.truncated_normal([n_output]))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden]))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output]))
# step function
def returnPred(x,w0,w1,b0,b1):
z1 = tf.add(tf.matmul(x, w0), b0)
a2 = tf.nn.relu(z1)
z2 = tf.add(tf.matmul(a2, w1), b1)
h = tf.nn.relu(z2)
return h #return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
model = tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
sess.run(model,feed_dict={x: inputs, y: labels }) #train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy

I changed your optimizer to AdamOptimizer (in many cases it performs better than GradientDescentOptimizer).
I also played a bit with the parameters. In particular, I took smaller std for your variable initialization, decreased learning rate (as your loss was unstable and "jumped around") and increased epochs (as I noticed that your loss continues to decrease).
I also reduced the size of the hidden layer. It is harder to train networks with large hidden layer when you don't have that much data.
Regarding your loss, it is better to apply tf.reduce_mean on it so that loss would be a number. In addition, following the answer of ml4294, I used softmax instead of sigmoid, so the loss looks like:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y))
The code below achieves accuracy of around 99.9% on the training data:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.002
epochs = 400
n_input = 150
n_hidden = 60
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden],stddev=0.2,seed=0))
b1 = tf.Variable(tf.truncated_normal([n_output],stddev=0.2,seed=0))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden],stddev=0.2,seed=0))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output],stddev=0.2,seed=0))
# step function
def returnPred(x,w0,w1,b0,b1):
z1 = tf.add(tf.matmul(x, w0), b0)
a2 = tf.nn.relu(z1)
z2 = tf.add(tf.matmul(a2, w1), b1)
h = tf.nn.relu(z2)
return h #return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y)) # calculate loss between prediction and actual
model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
sess.run([model,loss],feed_dict={x: inputs, y: labels }) #train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy

Just a suggestion in addition to the answer provided by Miriam Farber:
You use a multi-dimensional output label ([0., 1.]) for the classification. I suggest to use the softmax cross entropy tf.nn.softmax_cross_entropy_with_logits() instead of the sigmoid cross entropy, since you assume the outputs to be disjoint softmax on Wikipedia. I achieved much faster convergence with this small modification.
This should also improve your performance once you decide to increase your output dimensionality from 2 to a higher number.

I guess you have some problem here:
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
It should look smth like that:
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y))
Did't look at you code much, so if this would't work out you can check udacity deep learning course or forum they have good samples of that are you trying to do.
GL

Reusing the same layers for training and testing, but creating different nodes

I'm trying to (re)train AlexNet (based on the code found here) for a particular binary classification problem. Since my GPU is not very powerful, I settled on a batch size of 8 for training. This size determines the shape of the input tensor (8,227,227,3). However, one can use a larger batch size for the testing process, since there is no backprop involved.
My question is, how could I reuse the already trained hidden layers to create a different network on the same graph specifically for testing?
Here's a snippet of what I have tried to do:
NUM_TRAINING_STEPS = 200
BATCH_SIZE = 1
LEARNING_RATE = 1e-1
IMAGE_SIZE = 227
NUM_CHANNELS = 3
NUM_CLASSES = 2
def main():
graph = tf.Graph()
trace = Tracer()
train_data = readImage(filename1)
test_data = readImage(filename2)
train_labels = np.array([[0.0,1.0]])
with graph.as_default():
batch_data = tf.placeholder(tf.float32, shape=(BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS) )
batch_labels = tf.placeholder(tf.float32, shape=(BATCH_SIZE, NUM_CLASSES) )
logits_training = createNetwork(batch_data)
loss = lossLayer(logits_training, batch_labels)
train_prediction = tf.nn.softmax(logits_training)
print 'Prediction shape: ' + str(train_prediction.get_shape())
optimizer = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
test_placeholder = tf.placeholder(tf.float32, shape=(1, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS) )
logits_test = createNetwork(test_placeholder)
test_prediction = tf.nn.softmax(logits_test)
with tf.Session(graph=graph) as session:
tf.initialize_all_variables().run()
for step in range(NUM_TRAINING_STEPS):
print 'Step #: ' + str(step+1)
feed_dict = {batch_data: train_data, batch_labels : train_labels}
_, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
feed_dict = {batch_data:test_data, test_placeholder:test_data}
logits1, logits2 = session.run([logits_training,logits_test],feed_dict=feed_dict)
print (logits1 - logits2)
return
I'm only training with a single image, just to evaluate whether network is actually being trained and if the values of logits1 and logits2 are the same. They are not, by several orders of magnitude.
createNetwork is a function which loads the weights for AlexNet and builds the model, based on the code for the myalexnet.py script found on the page to which I linked.
I've tried to replicate the examples from the Udacity course on Deep Learning, in particular, assignments 3 and 4.
If anyone could figure out how I could use the same layers for training and testing, I would be very grateful.

Use shape=Nonefor your placeholders: placeholder doc
This way you can feed any shape of data. Another (worse) option is to recreate your graph for testing with the shapes that you need, and load the ckpt that was created during training.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Video classification using many to many LSTM in TensorFlow - python

Related

linear regression by tensorflow gets noticeable mean square error

Tensorflow CNN model always predicts same class

Invalid Argument Error Tensorflow Placeholder

Simple Tensorflow Multilayer Neural Network Not Learning

Reusing the same layers for training and testing, but creating different nodes

Categories

Resources