Understanding dimension of input to pre-defined LSTM

Understanding dimension of input to pre-defined LSTM - python

I am trying to design a model in tensorflow to predict next words using lstm.
Tensorflow tutorial for RNN gives pseudocode how to use LSTM for PTB dataset.
I reached to step of generating batches and labels.
def generate_batches(raw_data, batch_size):
global data_index
data_len = len(raw_data)
num_batches = data_len // batch_size
#batch = dict.fromkeys([i for i in range(num_batches)])
#labels = dict.fromkeys([i for i in range(num_batches)])
batch = np.ndarray(shape=(batch_size), dtype=np.float)
labels = np.ndarray(shape=(batch_size, 1), dtype=np.float)
for i in xrange(batch_size) :
batch[i] = raw_data[i + data_index]
labels[i, 0] = raw_data[i + data_index + 1]
data_index = (data_index + 1) % len(raw_data)
return batch, labels
This code gives batch and labels size (batch_size X 1).
These batch and labels can also be size of (batch_size x vocabulary_size) using tf.nn.embedding_lookup().
So, the problem here is how to proceed next using the function rnn_cell.BasicLSTMCell or using user defined lstm model? What will be the input dimension to LSTM cell and how will it be used with num_steps?
Which size of batch and labels is useful in any scenario?

The full example for PTB is in the source code. There are recommended defaults (SmallConfig, MediumConfig, and LargeConfig) that you can use.

Related

Multi dimensional input multi dimensional output rnn keras data preprocessing

I want to create a RNN model in Keras. In each time-step the input has 9 element and the output has 4 element.
input_size = (304414,9)
target_size = (304414,4)
How can I create a dataset of sliding windows over the time-series.

You can use this code by considering windows size and stride
for idx in range(0, input.shape[0] - window_size - 1, stride):
input.append(input_data[idx + 1: idx + 1 + window_size, :])
input = np.reshape( input, (len(input), input[0].shape[0], input[0].shape[1]))

Solved: How to combine tf.gradients with tf.data.dataset and keras models

I'm trying to build a workflow that uses tf.data.dataset batches and an iterator. For performance reasons, I am really trying to avoid using the placeholder->feed_dict loop workflow.
The process I'm trying to implement involves grad-cam (which requires the gradient of the loss with respect to the final convolutional layer of a CNN) as an intermediate step, and ideally I'd like to be able to try it out on several Keras pre-trained models, including non-sequential ones like ResNet.
Most implementations of grad-cam that I've found rely on hand-crafting the CNN of interest in tensorflow. I found one implementation, https://github.com/jacobgil/keras-grad-cam, that is made for keras models, and following that example, I get
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
The problem is that gradient_function([batch]) returns a numpy array whose value is determined by the first batch, so that heatmap doesn't change with subsequent evaluations.
I've tried replacing K.function with a Model in various ways, but nothing seems to work. I usually end up either with an error suggesting that grads evaluates to None or that one model or another is expecting a feed_dict and not receiving one.
Is this code salvageable? Is there a better way to do this besides looping through the data several times (once to get all the grad-cams and then again once I have them) or using placeholders and feed_dicts?
Edit:
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
# other operations on heatmap and batch ...
# ...
output_function = K.function(model.input, [node1, ..., nodeN])
for batch in range(n_batches):
outputs1, ... , outputsN = output_function(batch)
Gives me the desired outputs for each batch.

Yes, K.function returns numpy arrays because it evaluates the symbolic computation in your graph. What I think you should do is to keep everything symbolic up to K.function, and after getting the gradients, perform all computations of the Grad-CAM weights and final saliency map using numpy.
Then you can iterate on your dataset, evaluate gradient_function on a new batch of data, and compute the saliency map.
If you want to keep everything symbolic, then you should not use K.function to produce the gradient function, but use the symbolic gradient (the output of K.gradient, without lambda) and convolutional feature maps (conv_output) and perform the saliency map computation on top of that, and then build a function (using K.function) that takes the model input, and outputs the saliency map.
Hope the explanation is enough.

Implemented LSTM inference code shows different test loss compared to using lasagne.

I have difficulty implementing LSTM inference code using matlab. I extraced weight from trained model using theano and lasagne. And then, i coded LSTM inference(test) code using extraced weights and biases.
The loss of implemented inference code shows about 5, but the test loss of theano and lasagne shows about 1.5.
I think implemented inference matlab code is correct and the code is like this.
for i=1:100
input1 = input(i,:);
input_gate=activation_sigmoid(mtimes(input1,W_in_to_ingate)+mtimes(hidden,W_hid_to_ingate)+reshape(b_ingate,1,512));
forget=activation_sigmoid(mtimes(input1,W_in_to_forgetgate)+mtimes(hidden,W_hid_to_forgetgate)+reshape(b_forgetgate,1,512));
output=activation_sigmoid(mtimes(input1,W_in_to_outgate)+mtimes(hidden,W_hid_to_outgate)+reshape(b_outgate,1,512));
cell=activation_tanh(mtimes(input1,W_in_to_cell)+mtimes(hidden,W_hid_to_cell)+reshape(b_cell,1,512));
ct=input_gate.*cell+forget.*ct;
hidden=activation_tanh(ct).*output;
result=softmax(reshape(mtimes(hidden,W)+reshape(b,1,83),83,1));
new_loss=-(log(result(answer(1,i)+1)));
loss = loss+new_loss;
end
loss=loss/100;
I have no idea which part is incorrect compared to lasagne LSTM source code.
The code using theano and lasagne is described below.
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
l_forward_2 = lab.LSTMLayer(
l_in,
num_units=N_HIDDEN,
grad_clipping=GRAD_CLIP,
peepholes=False,
nonlinearity=activation,
method=method) ### batch_size*SEQ_LENGTH*N_HIDDEN
l_shp = lasagne.layers.ReshapeLayer(l_forward_2, (-1, N_HIDDEN)) ## (batch_size*SEQ_LENGTH, N_HIDDEN)
l_out = lasagne.layers.DenseLayer(l_shp, num_units=vocab_size, W = lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
batchsize, seqlen, _ = l_in.input_var.shape
l_shp1 = lasagne.layers.ReshapeLayer(l_out, (batchsize, seqlen, vocab_size))
l_out1 = lasagne.layers.SliceLayer(l_shp1, -1, 1)
test_output = lasagne.layers.get_output(l_out, deterministic=True)
test_loss = T.nnet.categorical_crossentropy(test_output,target.flatten()).mean()
val_fn = theano.function([l_in.input_var, target], test_loss, allow_input_downcast=True)
The input of LSTM layer and target label are made like this.
def gen_data(pp, batch_size,SEQ_LENGTH, data, return_target=True):
x = np.zeros((batch_size,SEQ_LENGTH,vocab_size)) ###### 128*100*85
y = np.zeros((batch_size, SEQ_LENGTH))
for n in range(batch_size):
# ptr = n
for i in range(SEQ_LENGTH):
x[n,i,char_to_ix[data[pp[n]*SEQ_LENGTH+i]]] = 1.
y[n,i] = char_to_ix[data[pp[n]*SEQ_LENGTH+i+1]]
return x, np.array(y,dtype='int32')
I have been struggled with this problem about a month. I really need your help.
Thank you.

Tensorflow CNN model always predicts same class

I have been trying to develop a CNN model for image classification. I am new to tensorflow and getting help from the following books
Learning.TensorFlow.A.Guide.to.Building.Deep.Learning.Systems
TensorFlow For Machine Intelligence by Sam Abrahams
For the past few weeks I have been working to develop a good model but I always get the same prediction. I have tried many different architectures but no luck!
Lately I decided to test my model with CIFAR-10 dataset and using the exact same model as given in the Learning Tensorflow book. But the outcome was same (same class for every image) even after training for 50K steps.
Here is highlight of my model and code.
1.) Downloaded CIFAR-10 image sets, converted them into tfrecord files with labels(labels are string for each category of CIFAR-10 in the tfrecord file) each for training and test set.
2) Reading the images from tfrecord file and generating random shuffle batch of size 100.
3) Converting the label from string to the integer32 type from 0-9 each for given category
4) Pass the training and test batches to the network and getting the output of [batch_size , num_class] size.
5) Train the model using Adam optimizer and softmax cross entropy loss function (Have tried gradient optimizer as well)
7) evaluate the model for test batches before and after the training.
8) Getting the same prediction for entire data set (But different every time I re run the code to try again)
Is there something wrong I am doing here? I would appreciate if someone could help me out with this problem.
Note - My approach of converting images and labels into tfrecord could be unusual but believe me I have come up with this idea from the books I mentioned earlier.
My code for the problem:
import tensorflow as tf
import numpy as np
import _datetime as dt
import PIL
# The glob module allows directory listing
import glob
import random
from itertools import groupby
from collections import defaultdict
H , W = 32 , 32 # Height and weight of the image
C = 3 # Number of channels
sessInt = tf.InteractiveSession()
# Read file and return the batches of the input data
def get_Batches_From_TFrecord(tf_record_filenames_list, batch_size):
# Match and load all the tfrecords found in the specified directory
tf_record_filename_queue = tf.train.string_input_producer(tf_record_filenames_list)
# It may have more than one example in them.
tf_record_reader = tf.TFRecordReader()
tf_image_name, tf_record_serialized = tf_record_reader.read(tf_record_filename_queue)
# The label and image are stored as bytes but could be stored as int64 or float64 values in a
# serialized tf.Example protobuf.
tf_record_features = tf.parse_single_example(tf_record_serialized,
features={'label': tf.FixedLenFeature([], tf.string),
'image': tf.FixedLenFeature([], tf.string), })
# Using tf.uint8 because all of the channel information is between 0-255
tf_record_image = tf.decode_raw(tf_record_features['image'], tf.uint8)
try:
# Reshape the image to look like the input image
tf_record_image = tf.reshape(tf_record_image, [H, W, C])
except:
print(tf_image_name)
tf_record_label = tf.cast(tf_record_features['label'], tf.string)
'''
#Check the image and label
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sessInt, coord=coord)
label = tf_record_label.eval().decode()
print(label)
image = PIL.Image.fromarray(tf_record_image.eval())
image.show()
coord.request_stop()
coord.join(threads)
'''
# creating a batch to feed the data
min_after_dequeue = 10 * batch_size
capacity = min_after_dequeue + 5 * batch_size
# Shuffle examples while feeding in the queue
image_batch, label_batch = tf.train.shuffle_batch([tf_record_image, tf_record_label], batch_size=batch_size,
capacity=capacity, min_after_dequeue=min_after_dequeue)
# Sequential feed in the examples in the queue (Don't shuffle)
# image_batch, label_batch = tf.train.batch([tf_record_image, tf_record_label], batch_size=batch_size, capacity=capacity)
# Converting the images to a float to match the expected input to convolution2d
float_image_batch = tf.image.convert_image_dtype(image_batch, tf.float32)
string_label_batch = label_batch
return float_image_batch, string_label_batch
#Count the number of images in the tfrecord file
def number_of_records(tfrecord_file_name):
count = 0
record_iterator = tf.python_io.tf_record_iterator(path = tfrecord_file_name)
for record in record_iterator:
count+=1
return count
def get_num_of_samples(tfrecords_list):
total_samples = 0
for tfrecord in tfrecords_list:
total_samples += number_of_records(tfrecord)
return total_samples
# Provide the input tfrecord names in a list
train_filenames = ["./TFRecords/cifar_train.tfrecord"]
test_filename = ["./TFRecords/cifar_test.tfrecord"]
num_train_samples = get_num_of_samples(train_filenames)
num_test_samples = get_num_of_samples(test_filename)
print("Number of Training samples: ", num_train_samples)
print("Number of Test samples: ", num_test_samples)
'''
IMP Note : (Batch_size * Training_Steps) should be at least greater than (2*Number_of_samples) for shuffling of batches
'''
train_batch_size = 100
# Total number of batches for input records
# Note - Num of samples in the tfrecord file can be determined by the tfrecord iterator.
# Batch size for test samples
test_batch_size = 50
train_image_batch, train_label_batch = get_Batches_From_TFrecord(train_filenames, train_batch_size)
test_image_batch, test_label_batch = get_Batches_From_TFrecord(test_filename, test_batch_size)
# Definition of the convolution network which returns a single neuron for each input image in the batch
# Define a placeholder for keep probability in dropout
# (Dropout should only use while training, for testing dropout should be always 1.0)
fc_prob = tf.placeholder(tf.float32)
conv_prob = tf.placeholder(tf.float32)
#Helper function to add learned filters(images) into tensorboard summary - for a random input in the batch
def add_filter_summary(name, filter_tensor):
rand_idx = random.randint(0,filter_tensor.get_shape()[0]-1) #Choose any random number from[0,batch_size)
#dispay_filter = filter_tensor[random.randint(0,filter_tensor.get_shape()[3])]
dispay_filter = filter_tensor[5] #keeping the index fix for consistency in visualization
with tf.name_scope("Filter_Summaries"):
img_summary = tf.summary.image(name, tf.reshape(dispay_filter,[-1 , filter_tensor.get_shape()[1],filter_tensor.get_shape()[1],1] ), max_outputs = 500)
# Helper functions for the network
def weight_initializer(shape):
weights = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(weights)
def bias_initializer(shape):
biases = tf.constant(0.1, shape=shape)
return tf.Variable(biases)
def conv2d(input, weights, stride):
return tf.nn.conv2d(input, filter=weights, strides=[1, stride, stride, 1], padding="SAME")
def pool_layer(input, window_size=2 , stride=2):
return tf.nn.max_pool(input, ksize=[1, window_size, window_size, 1], strides=[1, stride, stride, 1], padding='VALID')
# This is the actual layer we will use.
# Linear convolution as defined in conv2d, with a bias,
# followed by the ReLU nonlinearity.
def conv_layer(input, filter_shape , stride=1):
W = weight_initializer(filter_shape)
b = bias_initializer([filter_shape[3]])
return tf.nn.relu(conv2d(input, W, stride) + b)
# A standard full layer with a bias. Notice that here we didn’t add the ReLU.
# This allows us to use the same layer for the final output,
# where we don’t need the nonlinear part.
def full_layer(input, out_size):
in_size = int(input.get_shape()[1])
W = weight_initializer([in_size, out_size])
b = bias_initializer([out_size])
return tf.matmul(input, W) + b
## Model fro the book learning tensorflow - for CIFAR data
def conv_network(image_batch, batch_size):
# Now create the model which returns the output neurons (eequals to the number of labels)
# as a final fully connecetd layer output. Which we can use as input to the softmax classifier
C1 , C2 , C3 = 30 , 50, 80 # Number of output features for each convolution layer
F1 = 500 # Number of output neuron for FC1 layer
#Add original image to tensorboard summary
add_filter_summary("Original" , image_batch)
# First convolutaion layer with 5x5 filter size and 32 filters
conv1 = conv_layer(image_batch, filter_shape=[3, 3, C, C1])
pool1 = pool_layer(conv1, window_size=2)
pool1 = tf.nn.dropout(pool1, keep_prob=conv_prob)
add_filter_summary("conv1" , pool1)
# Second convolutaion layer with 5x5 filter_size and 64 filters
conv2 = conv_layer(pool1, filter_shape=[5, 5, C1, C2])
pool2 = pool_layer(conv2, 2)
pool2 = tf.nn.dropout(pool2, keep_prob=conv_prob)
add_filter_summary("conv2" , pool2)
# Third convolution layer
conv3 = conv_layer(pool2, filter_shape=[5, 5, C2, C3])
# Since at this point the feature maps are of size 8×8 (following the first two poolings
# that each reduced the 32×32 pictures by half on each axis).
# This last pool layer pools each of the feature maps and keeps only the maximal value.
# The number of feature maps at the third block was set to 80,
# so at that point (following the max pooling) the representation is reduced to only 80 numbers
pool3 = pool_layer(conv3, window_size = 8 , stride=8)
pool3 = tf.nn.dropout(pool3, keep_prob=conv_prob)
add_filter_summary("conv3" , pool3)
# Reshape the output to feed to the FC layer
flatterned_layer = tf.reshape(pool3, [batch_size,
-1]) # -1 is to specify to use all the dimensions remaining in the input (other than batch_size).reshape(input , )
fc1 = tf.nn.relu(full_layer(flatterned_layer, F1))
full1_drop = tf.nn.dropout(fc1, keep_prob=fc_prob)
# Fully connected layer 2 (output layer)
final_Output = full_layer(full1_drop, 10)
return final_Output, tf.summary.merge_all()
# Now that architecture is created , next step is to create the classification model
# (to predict the output class of the input data)
# Here we have used Logistic regression (Sigmoid function) to predict the output because we have only rwo class.
# For multiple class problem - softmax is the best prediction function
# Prepare the inputs to the input
Train_X , img_summary = conv_network(train_image_batch, train_batch_size)
Test_X , _ = conv_network(test_image_batch, test_batch_size)
# Generate 0 based index for labels
Train_Y = tf.to_int32(tf.argmax(
tf.to_int32(tf.stack([tf.equal(train_label_batch, ["airplane"]), tf.equal(train_label_batch, ["automobile"]),
tf.equal(train_label_batch, ["bird"]),tf.equal(train_label_batch, ["cat"]),
tf.equal(train_label_batch, ["deer"]),tf.equal(train_label_batch, ["dog"]),
tf.equal(train_label_batch, ["frog"]),tf.equal(train_label_batch, ["horse"]),
tf.equal(train_label_batch, ["ship"]), tf.equal(train_label_batch, ["truck"]) ])), 0))
Test_Y = tf.to_int32(tf.argmax(
tf.to_int32(tf.stack([tf.equal(test_label_batch, ["airplane"]), tf.equal(test_label_batch, ["automobile"]),
tf.equal(test_label_batch, ["bird"]),tf.equal(test_label_batch, ["cat"]),
tf.equal(test_label_batch, ["deer"]),tf.equal(test_label_batch, ["dog"]),
tf.equal(test_label_batch, ["frog"]),tf.equal(test_label_batch, ["horse"]),
tf.equal(test_label_batch, ["ship"]), tf.equal(test_label_batch, ["truck"]) ])), 0))
# Y = tf.reshape(float_label_batch, X.get_shape())
# compute inference model over data X and return the result
# (using sigmoid function - as this function is the best to predict two class output)
# (For multiclass problem - Softmax is the bset prediction function)
def inference(X):
return tf.nn.softmax(X)
# compute loss over training data X and expected outputs Y
# Cross entropy function is the best suited for loss calculation (Than the squared error function)
# Get the second column of the input to get only the features
def loss(X, Y):
return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=X, labels=Y))
# train / adjust model parameters according to computed total loss (using gradient descent)
def train(total_loss, learning_rate):
return tf.train.AdamOptimizer(learning_rate).minimize(total_loss)
# evaluate the resulting trained model with dropout probability (Ideally 1.0 for testing)
def evaluate(sess, X, Y, dropout_prob):
# predicted = tf.cast(inference(X) > 0.5 , tf.float32)
#print("\nNetwork output:")
#print(sess.run(inference(X) , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
# Inference contains the predicted probability of each class for each input image.
# The class having higher probability is the prediction of the network. y_pred_cls = tf.argmax(y_pred, dimension=1)
predicted = tf.cast(tf.argmax(X, 1), tf.int32)
#print("\npredicted labels:")
#print(sess.run(predicted , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
#print("\nTrue Labels:")
#print(sess.run(Y , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32))
# calculate the mean of the accuracies of the each batch (iteration)
# No. of iteration Iteration should cover the (test_batch_size * num_of_iteration ) >= (2* num_of_test_samples ) condition
total_accuracy = np.mean([sess.run(batch_accuracy, feed_dict={conv_prob:1.0 , fc_prob:1.0}) for i in range(250)])
print("Accuracy of the model(in %): {:.4f} ".format(100 * total_accuracy))
# create a saver class to save the training checkpoints
saver = tf.train.Saver(max_to_keep=10)
# Create tensorboard sumamry for loss function
with tf.name_scope("summaries"):
loss_summary = tf.summary.scalar("loss", loss(Train_X, Train_Y))
#merged = tf.summary.merge_all()
# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
log_writer = tf.summary.FileWriter('./logs', sess.graph)
total_loss = loss(Train_X, Train_Y)
train_op = train(total_loss, 0.001)
#Initialise all variables after defining all variables
tf.global_variables_initializer().run()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(Train_Y))
print(sess.run(Test_Y))
evaluate(sess, Test_X, Test_Y,1.0)
# actual training loop------------------------------------------------------
training_steps = 50000
print("\nStarting to train model with", str(training_steps), " steps...")
to1 = dt.datetime.now()
for step in range(1, training_steps + 1):
# print(sess.run(train_label_batch))
sess.run([train_op], feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # Pass the dropout value for training batch to the placeholder
# for debugging and learning purposes, see how the loss gets decremented thru training steps
if step % 100 == 0:
# print("\n")
# print(sess.run(train_label_batch))
loss_summaries, img_summaries , Tloss = sess.run([loss_summary, img_summary, total_loss],
feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # evaluate total loss to add it in summary object
log_writer.add_summary(loss_summaries, step) # add summary for each step
log_writer.add_summary(img_summaries, step)
print("Step:", step, " , loss: ", Tloss)
if step%2000 == 0:
saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
print("\n")
evaluate(sess, Test_X, Test_Y,1.0)
saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
to2 = dt.datetime.now()
print("\nTotal Trainig time Elapsed: ", str(to2 - to1))
# once the training is complete, evaluate the model with test (validation set)-------------------------------------------
# Restore the model file and perform the testing
#saver.restore(sess, "./Models/BookLT3_CIFAR-15000")
print("\nPost Training....")
# Performs Evaluation of model on batches of test samples
# In order to evaluate entire test set , number of iteration should be chosen such that ,
# (test_batch_size * num_of_iteration ) >= (2* num_of_test_samples )
evaluate(sess, Test_X, Test_Y,1.0) # Evaluate multiple batch of test data set (randomly chosen by shuffle train batch queue)
evaluate(sess, Test_X, Test_Y,1.0)
evaluate(sess, Test_X, Test_Y,1.0)
coord.request_stop()
coord.join(threads)
sess.close()
Here is the screenshot of my Pre training result:
Here is the screenshot of the result during training:
Hereis the screenshot of the Post training result

I did not run the code to verify that this is the only issue, but here is one important issue. When classifying, you should use one-hot encoding for your labels. Meaning that if you have 3 classes, you want your labels to be [1, 0, 0] for class 1, [0, 1, 0] for class 2, [0, 0, 1] for class 3. Your approach of using 1, 2, and 3 as labels leads to various issues. For examples, the network is penalized more for predicting class 1 versus predicting class 2 for an image from class 3. TensorFlow functions like tf.nn.softmax_cross_entropy_with_logits work with such representations.
Here is the basic example of correctly using one_hot labels to compute loss: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
Here is how the one_hot label is constructed for mnist digits:
https://github.com/tensorflow/tensorflow/blob/438604fc885208ee05f9eef2d0f2c630e1360a83/tensorflow/contrib/learn/python/learn/datasets/mnist.py#L69

Predicting the next word using the LSTM ptb model tensorflow example

I am trying to use the tensorflow LSTM model to make next word predictions.
As described in this related question (which has no accepted answer) the example contains pseudocode to extract next word probabilities:
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
loss = 0.0
for current_batch_of_words in words_in_dataset:
# The value of state is updated after processing each batch of words.
output, state = lstm(current_batch_of_words, state)
# The LSTM output can be used to make next word predictions
logits = tf.matmul(output, softmax_w) + softmax_b
probabilities = tf.nn.softmax(logits)
loss += loss_function(probabilities, target_words)
I am confused about how to interpret the probabilities vector. I modified the __init__ function of the PTBModel in ptb_word_lm.py to store the probabilities and logits:
class PTBModel(object):
"""The PTB model."""
def __init__(self, is_training, config):
# General definition of LSTM (unrolled)
# identical to tensorflow example ...
# omitted for brevity ...
# computing the logits (also from example code)
logits = tf.nn.xw_plus_b(output,
tf.get_variable("softmax_w", [size, vocab_size]),
tf.get_variable("softmax_b", [vocab_size]))
loss = seq2seq.sequence_loss_by_example([logits],
[tf.reshape(self._targets, [-1])],
[tf.ones([batch_size * num_steps])],
vocab_size)
self._cost = cost = tf.reduce_sum(loss) / batch_size
self._final_state = states[-1]
# my addition: storing the probabilities and logits
self.probabilities = tf.nn.softmax(logits)
self.logits = logits
# more model definition ...
Then printed some info about them in the run_epoch function:
def run_epoch(session, m, data, eval_op, verbose=True):
"""Runs the model on the given data."""
# first part of function unchanged from example
for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
m.num_steps)):
# evaluate proobability and logit tensors too:
cost, state, probs, logits, _ = session.run([m.cost, m.final_state, m.probabilities, m.logits, eval_op],
{m.input_data: x,
m.targets: y,
m.initial_state: state})
costs += cost
iters += m.num_steps
if verbose and step % (epoch_size // 10) == 10:
print("%.3f perplexity: %.3f speed: %.0f wps, n_iters: %s" %
(step * 1.0 / epoch_size, np.exp(costs / iters),
iters * m.batch_size / (time.time() - start_time), iters))
chosen_word = np.argmax(probs, 1)
print("Probabilities shape: %s, Logits shape: %s" %
(probs.shape, logits.shape) )
print(chosen_word)
print("Batch size: %s, Num steps: %s" % (m.batch_size, m.num_steps))
return np.exp(costs / iters)
This produces output like this:
0.000 perplexity: 741.577 speed: 230 wps, n_iters: 220
(20, 10000) (20, 10000)
[ 14 1 6 589 1 5 0 87 6 5 3 5 2 2 2 2 6 2 6 1]
Batch size: 1, Num steps: 20
I was expecting the probs vector to be an array of probabilities, with one for each word in the vocabulary (eg with shape (1, vocab_size)), meaning that I could get the predicted word using np.argmax(probs, 1) as suggested in the other question.
However, the first dimension of the vector is actually equal to the number of steps in the unrolled LSTM (20 if the small config settings are used), which I'm not sure what to do with. To access to the predicted word, do I just need to use the last value (because it's the output of the final step)? Or is there something else that I'm missing?
I tried to understand how the predictions are made and evaluated by looking at the implementation of seq2seq.sequence_loss_by_example, which must perform this evaluation, but this ends up calling gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which doesn't seem to be included in the github repo, so I'm not sure where else to look.
I'm quite new to both tensorflow and LSTMs, so any help is appreciated!

The output tensor contains the concatentation of the LSTM cell outputs for each timestep (see its definition here). Therefore you can find the prediction for the next word by taking chosen_word[-1] (or chosen_word[sequence_length - 1] if the sequence has been padded to match the unrolled LSTM).
The tf.nn.sparse_softmax_cross_entropy_with_logits() op is documented in the public API under a different name. For technical reasons, it calls a generated wrapper function that does not appear in the GitHub repository. The implementation of the op is in C++, here.

I am implementing seq2seq model too.
So lets me try to explain with my understanding:
The outputs of your LSTM model is a list (with length num_steps) of 2D tensor of size [batch_size, size].
The code line:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
will produce a new output which is a 2D tensor of size [batch_size x num_steps, size].
For your case, batch_size = 1 and num_steps = 20 --> output shape is [20, size].
Code line:
logits = tf.nn.xw_plus_b(output, tf.get_variable("softmax_w", [size, vocab_size]), tf.get_variable("softmax_b", [vocab_size]))
<=> output[batch_size x num_steps, size] x softmax_w[size, vocab_size] will output logits of size [batch_size x num_steps, vocab_size].
For your case, logits of size [20, vocab_size]
--> probs tensor has same size as logits by [20, vocab_size].
Code line:
chosen_word = np.argmax(probs, 1)
will output chosen_word tensor of size [20, 1] with each value is the next prediction word index of current word.
Code line:
loss = seq2seq.sequence_loss_by_example([logits], [tf.reshape(self._targets, [-1])], [tf.ones([batch_size * num_steps])])
is to compute the softmax cross entropy loss for batch_size of sequences.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Understanding dimension of input to pre-defined LSTM - python

The full example for PTB is in the source code. There are recommended defaults (SmallConfig, MediumConfig, and LargeConfig) that you can use.

Related

Multi dimensional input multi dimensional output rnn keras data preprocessing

Solved: How to combine tf.gradients with tf.data.dataset and keras models

Implemented LSTM inference code shows different test loss compared to using lasagne.

Tensorflow CNN model always predicts same class

Predicting the next word using the LSTM ptb model tensorflow example

Categories

Resources