I have trained a simple long short-term memory (lstm) model in lasagne following the recipie here:https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py
Here is the architecture:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
# We now build the LSTM layer which takes l_in as the input layer
# We clip the gradients at GRAD_CLIP to prevent the problem of exploding gradients.
l_forward_1 = lasagne.layers.LSTMLayer(
l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
l_forward_2 = lasagne.layers.LSTMLayer(
l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
# The l_forward layer creates an output of dimension (batch_size, SEQ_LENGTH, N_HIDDEN)
# Since we are only interested in the final prediction, we isolate that quantity and feed it to the next layer.
# The output of the sliced layer will then be of size (batch_size, N_HIDDEN)
l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)
# The sliced output is then passed through the softmax nonlinearity to create probability distribution of the prediction
# The output of this stage is (batch_size, vocab_size)
l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size, W = lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
# Theano tensor for the targets
target_values = T.ivector('target_output')
# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(l_out)
# The loss function is calculated as the mean of the (categorical) cross-entropy between the prediction and target.
cost = T.nnet.categorical_crossentropy(network_output,target_values).mean()
# Retrieve all parameters from the network
all_params = lasagne.layers.get_all_params(l_out)
# Compute AdaGrad updates for training
print("Computing updates ...")
updates = lasagne.updates.adagrad(cost, all_params, LEARNING_RATE)
# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([l_in.input_var, target_values], cost, updates=updates, allow_input_downcast=True)
compute_cost = theano.function([l_in.input_var, target_values], cost, allow_input_downcast=True)
# In order to generate text from the network, we need the probability distribution of the next character given
# the state of the network and the input (a seed).
# In order to produce the probability distribution of the prediction, we compile a function called probs.
probs = theano.function([l_in.input_var],network_output,allow_input_downcast=True)
and the model is trained via:
for it in xrange(data_size * num_epochs / BATCH_SIZE):
try_it_out() # Generate text using the p^th character as the start.
avg_cost = 0;
for _ in range(PRINT_FREQ):
x,y = gen_data(p)
#print(p)
p += SEQ_LENGTH + BATCH_SIZE - 1
if(p+BATCH_SIZE+SEQ_LENGTH >= data_size):
print('Carriage Return')
p = 0;
avg_cost += train(x, y)
print("Epoch {} average loss = {}".format(it*1.0*PRINT_FREQ/data_size*BATCH_SIZE, avg_cost / PRINT_FREQ))
How can I save the model so I do not need to train it again? With scikit I generally just pickle the model object. However I am unclear on the analogous process with Theano / lasagne.
You can save the weights with numpy:
np.savez('model.npz', *lasagne.layers.get_all_param_values(network_output))
And load them again later on like this:
with np.load('model.npz') as f:
param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(network_output, param_values)
Source: https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py
As for the model definition itself: One option is certainly to keep the code and regenerate the network, before setting the pretrained weights.
You can save the model parameters and the model by Pickle
import cPickle as pickle
import os
#save the network and its parameters as a dictionary
netInfo = {'network': network, 'params': lasagne.layers.get_all_param_values(network)}
Net_FileName = 'LSTM.pkl'
# save the dictionary as a .pkl file
pickle.dump(netInfo, open(os.path.join(/path/to/a/folder/, Net_FileName), 'wb'),protocol=pickle.HIGHEST_PROTOCOL)
After saving your model, it can be retrieved by pickle.load:
net = pickle.load(open(os.path.join(/path/to/a/folder/,Net_FileName),'rb'))
all_params = net['params']
lasagne.layers.set_all_param_values(net['network'], all_params)
I've had success using dill in combination with the numpy.savez function:
import dill as pickle
...
np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
with open('model.dpkl','wb') as p_output:
pickle.dump(network, p_output)
To import the pickled model:
with open('model.dpkl', 'rb') as p_input:
network = pickle.load(p_input)
with np.load('model.npz') as f:
param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(network, param_values)
Related
I am trying to understand deep-learning with pytorch. I read the pytorch tutorial: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html, and its written:
''torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.
For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.
If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.''
I am not sure to understand what it means.
Indeed, I have made a simple feed-forward neural network (cf below code), on which I used a really small dataset (the idea is to learn how it works without mini-batches first, not really to have anything useful), and hence don't need to use mini-batches. Hence, I introduce directly all the sample at each epochs. If I understand it correctly, I should add 'train_data = train_data.unsqueeze(0) '. But I am not sure where, as it seems to change the data size to 1. Also, it works without adding this line so why should I really use it?
Any help would be highly appreciated!
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pandas as pd
import os
import numpy as np
# Download data
#...
# Construct network
len_input = len(data[0])
len_output = nbr_class
print('There is %d classes used for classification'%nbr_class)
#defining a new class: Net, that extended nn.Module
#setup the “skeleton” of our network architecture
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
#creation of fully connected. A fully connected neural network layer is represented by the nn.Linear object,
#with the first argument in the definition being the number of nodes in layer l and the next argument being
#the number of nodes in layer l+1
self.fc1 = nn.Linear(len_input, 200)
self.fc2 = nn.Linear(200, 200)
self.fc3 = nn.Linear(200, len_output)
def forward(self, x):
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return F.log_softmax(x)
# Create initial network
epochs = 3000
model = Net()
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Train & test network
def train(data, target, epoch):
model.train() # set the model in "training mode"
# run the main training loop
#no need batch size as small number of sample! like this the process is more exact
#zero our gradients before calling .backward() which is necessary for new sum of gradients
optimizer.zero_grad()
target_pred = model(data)
loss = criterion(target_pred, target)
#Propagate the gradients back through the network
loss.backward()
#update the weights
#tell our optimizer to “step”, meaning that the weights will be updated using the calculated gradients
#according to our rule. perform model parameter update (update weights)
optimizer.step()
# for graphing puposes
loss_array.append(loss.data[0])
if epoch in list(range(0,20000,100)):
print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.data[0]))
def test(epoch, test_data, test_target):
#eval mode to turn Dropout and BatchNorm off
model.eval()
test_loss = 0
correct = 0
test_target_pred = model(test_data)
criterion = nn.NLLLoss()
# sum up batch loss
test_loss += criterion(test_target_pred, test_target).data[0]
pred = test_target_pred.data.max(1)[1] # get the index of the max log-probability
correct += pred.eq(test_target.data).sum()
if epoch in list(range(0,20000,100)):
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_target), 100. * correct / len(test_target)))
if __name__ == '__main__':
for epoch in range(epochs):
train(data = train_data, target=train_target, epoch=epoch)
test(epoch, test_data, test_target)
I have been trying to develop a CNN model for image classification. I am new to tensorflow and getting help from the following books
Learning.TensorFlow.A.Guide.to.Building.Deep.Learning.Systems
TensorFlow For Machine Intelligence by Sam Abrahams
For the past few weeks I have been working to develop a good model but I always get the same prediction. I have tried many different architectures but no luck!
Lately I decided to test my model with CIFAR-10 dataset and using the exact same model as given in the Learning Tensorflow book. But the outcome was same (same class for every image) even after training for 50K steps.
Here is highlight of my model and code.
1.) Downloaded CIFAR-10 image sets, converted them into tfrecord files with labels(labels are string for each category of CIFAR-10 in the tfrecord file) each for training and test set.
2) Reading the images from tfrecord file and generating random shuffle batch of size 100.
3) Converting the label from string to the integer32 type from 0-9 each for given category
4) Pass the training and test batches to the network and getting the output of [batch_size , num_class] size.
5) Train the model using Adam optimizer and softmax cross entropy loss function (Have tried gradient optimizer as well)
7) evaluate the model for test batches before and after the training.
8) Getting the same prediction for entire data set (But different every time I re run the code to try again)
Is there something wrong I am doing here? I would appreciate if someone could help me out with this problem.
Note - My approach of converting images and labels into tfrecord could be unusual but believe me I have come up with this idea from the books I mentioned earlier.
My code for the problem:
import tensorflow as tf
import numpy as np
import _datetime as dt
import PIL
# The glob module allows directory listing
import glob
import random
from itertools import groupby
from collections import defaultdict
H , W = 32 , 32 # Height and weight of the image
C = 3 # Number of channels
sessInt = tf.InteractiveSession()
# Read file and return the batches of the input data
def get_Batches_From_TFrecord(tf_record_filenames_list, batch_size):
# Match and load all the tfrecords found in the specified directory
tf_record_filename_queue = tf.train.string_input_producer(tf_record_filenames_list)
# It may have more than one example in them.
tf_record_reader = tf.TFRecordReader()
tf_image_name, tf_record_serialized = tf_record_reader.read(tf_record_filename_queue)
# The label and image are stored as bytes but could be stored as int64 or float64 values in a
# serialized tf.Example protobuf.
tf_record_features = tf.parse_single_example(tf_record_serialized,
features={'label': tf.FixedLenFeature([], tf.string),
'image': tf.FixedLenFeature([], tf.string), })
# Using tf.uint8 because all of the channel information is between 0-255
tf_record_image = tf.decode_raw(tf_record_features['image'], tf.uint8)
try:
# Reshape the image to look like the input image
tf_record_image = tf.reshape(tf_record_image, [H, W, C])
except:
print(tf_image_name)
tf_record_label = tf.cast(tf_record_features['label'], tf.string)
'''
#Check the image and label
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sessInt, coord=coord)
label = tf_record_label.eval().decode()
print(label)
image = PIL.Image.fromarray(tf_record_image.eval())
image.show()
coord.request_stop()
coord.join(threads)
'''
# creating a batch to feed the data
min_after_dequeue = 10 * batch_size
capacity = min_after_dequeue + 5 * batch_size
# Shuffle examples while feeding in the queue
image_batch, label_batch = tf.train.shuffle_batch([tf_record_image, tf_record_label], batch_size=batch_size,
capacity=capacity, min_after_dequeue=min_after_dequeue)
# Sequential feed in the examples in the queue (Don't shuffle)
# image_batch, label_batch = tf.train.batch([tf_record_image, tf_record_label], batch_size=batch_size, capacity=capacity)
# Converting the images to a float to match the expected input to convolution2d
float_image_batch = tf.image.convert_image_dtype(image_batch, tf.float32)
string_label_batch = label_batch
return float_image_batch, string_label_batch
#Count the number of images in the tfrecord file
def number_of_records(tfrecord_file_name):
count = 0
record_iterator = tf.python_io.tf_record_iterator(path = tfrecord_file_name)
for record in record_iterator:
count+=1
return count
def get_num_of_samples(tfrecords_list):
total_samples = 0
for tfrecord in tfrecords_list:
total_samples += number_of_records(tfrecord)
return total_samples
# Provide the input tfrecord names in a list
train_filenames = ["./TFRecords/cifar_train.tfrecord"]
test_filename = ["./TFRecords/cifar_test.tfrecord"]
num_train_samples = get_num_of_samples(train_filenames)
num_test_samples = get_num_of_samples(test_filename)
print("Number of Training samples: ", num_train_samples)
print("Number of Test samples: ", num_test_samples)
'''
IMP Note : (Batch_size * Training_Steps) should be at least greater than (2*Number_of_samples) for shuffling of batches
'''
train_batch_size = 100
# Total number of batches for input records
# Note - Num of samples in the tfrecord file can be determined by the tfrecord iterator.
# Batch size for test samples
test_batch_size = 50
train_image_batch, train_label_batch = get_Batches_From_TFrecord(train_filenames, train_batch_size)
test_image_batch, test_label_batch = get_Batches_From_TFrecord(test_filename, test_batch_size)
# Definition of the convolution network which returns a single neuron for each input image in the batch
# Define a placeholder for keep probability in dropout
# (Dropout should only use while training, for testing dropout should be always 1.0)
fc_prob = tf.placeholder(tf.float32)
conv_prob = tf.placeholder(tf.float32)
#Helper function to add learned filters(images) into tensorboard summary - for a random input in the batch
def add_filter_summary(name, filter_tensor):
rand_idx = random.randint(0,filter_tensor.get_shape()[0]-1) #Choose any random number from[0,batch_size)
#dispay_filter = filter_tensor[random.randint(0,filter_tensor.get_shape()[3])]
dispay_filter = filter_tensor[5] #keeping the index fix for consistency in visualization
with tf.name_scope("Filter_Summaries"):
img_summary = tf.summary.image(name, tf.reshape(dispay_filter,[-1 , filter_tensor.get_shape()[1],filter_tensor.get_shape()[1],1] ), max_outputs = 500)
# Helper functions for the network
def weight_initializer(shape):
weights = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(weights)
def bias_initializer(shape):
biases = tf.constant(0.1, shape=shape)
return tf.Variable(biases)
def conv2d(input, weights, stride):
return tf.nn.conv2d(input, filter=weights, strides=[1, stride, stride, 1], padding="SAME")
def pool_layer(input, window_size=2 , stride=2):
return tf.nn.max_pool(input, ksize=[1, window_size, window_size, 1], strides=[1, stride, stride, 1], padding='VALID')
# This is the actual layer we will use.
# Linear convolution as defined in conv2d, with a bias,
# followed by the ReLU nonlinearity.
def conv_layer(input, filter_shape , stride=1):
W = weight_initializer(filter_shape)
b = bias_initializer([filter_shape[3]])
return tf.nn.relu(conv2d(input, W, stride) + b)
# A standard full layer with a bias. Notice that here we didn’t add the ReLU.
# This allows us to use the same layer for the final output,
# where we don’t need the nonlinear part.
def full_layer(input, out_size):
in_size = int(input.get_shape()[1])
W = weight_initializer([in_size, out_size])
b = bias_initializer([out_size])
return tf.matmul(input, W) + b
## Model fro the book learning tensorflow - for CIFAR data
def conv_network(image_batch, batch_size):
# Now create the model which returns the output neurons (eequals to the number of labels)
# as a final fully connecetd layer output. Which we can use as input to the softmax classifier
C1 , C2 , C3 = 30 , 50, 80 # Number of output features for each convolution layer
F1 = 500 # Number of output neuron for FC1 layer
#Add original image to tensorboard summary
add_filter_summary("Original" , image_batch)
# First convolutaion layer with 5x5 filter size and 32 filters
conv1 = conv_layer(image_batch, filter_shape=[3, 3, C, C1])
pool1 = pool_layer(conv1, window_size=2)
pool1 = tf.nn.dropout(pool1, keep_prob=conv_prob)
add_filter_summary("conv1" , pool1)
# Second convolutaion layer with 5x5 filter_size and 64 filters
conv2 = conv_layer(pool1, filter_shape=[5, 5, C1, C2])
pool2 = pool_layer(conv2, 2)
pool2 = tf.nn.dropout(pool2, keep_prob=conv_prob)
add_filter_summary("conv2" , pool2)
# Third convolution layer
conv3 = conv_layer(pool2, filter_shape=[5, 5, C2, C3])
# Since at this point the feature maps are of size 8×8 (following the first two poolings
# that each reduced the 32×32 pictures by half on each axis).
# This last pool layer pools each of the feature maps and keeps only the maximal value.
# The number of feature maps at the third block was set to 80,
# so at that point (following the max pooling) the representation is reduced to only 80 numbers
pool3 = pool_layer(conv3, window_size = 8 , stride=8)
pool3 = tf.nn.dropout(pool3, keep_prob=conv_prob)
add_filter_summary("conv3" , pool3)
# Reshape the output to feed to the FC layer
flatterned_layer = tf.reshape(pool3, [batch_size,
-1]) # -1 is to specify to use all the dimensions remaining in the input (other than batch_size).reshape(input , )
fc1 = tf.nn.relu(full_layer(flatterned_layer, F1))
full1_drop = tf.nn.dropout(fc1, keep_prob=fc_prob)
# Fully connected layer 2 (output layer)
final_Output = full_layer(full1_drop, 10)
return final_Output, tf.summary.merge_all()
# Now that architecture is created , next step is to create the classification model
# (to predict the output class of the input data)
# Here we have used Logistic regression (Sigmoid function) to predict the output because we have only rwo class.
# For multiple class problem - softmax is the best prediction function
# Prepare the inputs to the input
Train_X , img_summary = conv_network(train_image_batch, train_batch_size)
Test_X , _ = conv_network(test_image_batch, test_batch_size)
# Generate 0 based index for labels
Train_Y = tf.to_int32(tf.argmax(
tf.to_int32(tf.stack([tf.equal(train_label_batch, ["airplane"]), tf.equal(train_label_batch, ["automobile"]),
tf.equal(train_label_batch, ["bird"]),tf.equal(train_label_batch, ["cat"]),
tf.equal(train_label_batch, ["deer"]),tf.equal(train_label_batch, ["dog"]),
tf.equal(train_label_batch, ["frog"]),tf.equal(train_label_batch, ["horse"]),
tf.equal(train_label_batch, ["ship"]), tf.equal(train_label_batch, ["truck"]) ])), 0))
Test_Y = tf.to_int32(tf.argmax(
tf.to_int32(tf.stack([tf.equal(test_label_batch, ["airplane"]), tf.equal(test_label_batch, ["automobile"]),
tf.equal(test_label_batch, ["bird"]),tf.equal(test_label_batch, ["cat"]),
tf.equal(test_label_batch, ["deer"]),tf.equal(test_label_batch, ["dog"]),
tf.equal(test_label_batch, ["frog"]),tf.equal(test_label_batch, ["horse"]),
tf.equal(test_label_batch, ["ship"]), tf.equal(test_label_batch, ["truck"]) ])), 0))
# Y = tf.reshape(float_label_batch, X.get_shape())
# compute inference model over data X and return the result
# (using sigmoid function - as this function is the best to predict two class output)
# (For multiclass problem - Softmax is the bset prediction function)
def inference(X):
return tf.nn.softmax(X)
# compute loss over training data X and expected outputs Y
# Cross entropy function is the best suited for loss calculation (Than the squared error function)
# Get the second column of the input to get only the features
def loss(X, Y):
return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=X, labels=Y))
# train / adjust model parameters according to computed total loss (using gradient descent)
def train(total_loss, learning_rate):
return tf.train.AdamOptimizer(learning_rate).minimize(total_loss)
# evaluate the resulting trained model with dropout probability (Ideally 1.0 for testing)
def evaluate(sess, X, Y, dropout_prob):
# predicted = tf.cast(inference(X) > 0.5 , tf.float32)
#print("\nNetwork output:")
#print(sess.run(inference(X) , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
# Inference contains the predicted probability of each class for each input image.
# The class having higher probability is the prediction of the network. y_pred_cls = tf.argmax(y_pred, dimension=1)
predicted = tf.cast(tf.argmax(X, 1), tf.int32)
#print("\npredicted labels:")
#print(sess.run(predicted , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
#print("\nTrue Labels:")
#print(sess.run(Y , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32))
# calculate the mean of the accuracies of the each batch (iteration)
# No. of iteration Iteration should cover the (test_batch_size * num_of_iteration ) >= (2* num_of_test_samples ) condition
total_accuracy = np.mean([sess.run(batch_accuracy, feed_dict={conv_prob:1.0 , fc_prob:1.0}) for i in range(250)])
print("Accuracy of the model(in %): {:.4f} ".format(100 * total_accuracy))
# create a saver class to save the training checkpoints
saver = tf.train.Saver(max_to_keep=10)
# Create tensorboard sumamry for loss function
with tf.name_scope("summaries"):
loss_summary = tf.summary.scalar("loss", loss(Train_X, Train_Y))
#merged = tf.summary.merge_all()
# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
log_writer = tf.summary.FileWriter('./logs', sess.graph)
total_loss = loss(Train_X, Train_Y)
train_op = train(total_loss, 0.001)
#Initialise all variables after defining all variables
tf.global_variables_initializer().run()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(Train_Y))
print(sess.run(Test_Y))
evaluate(sess, Test_X, Test_Y,1.0)
# actual training loop------------------------------------------------------
training_steps = 50000
print("\nStarting to train model with", str(training_steps), " steps...")
to1 = dt.datetime.now()
for step in range(1, training_steps + 1):
# print(sess.run(train_label_batch))
sess.run([train_op], feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # Pass the dropout value for training batch to the placeholder
# for debugging and learning purposes, see how the loss gets decremented thru training steps
if step % 100 == 0:
# print("\n")
# print(sess.run(train_label_batch))
loss_summaries, img_summaries , Tloss = sess.run([loss_summary, img_summary, total_loss],
feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # evaluate total loss to add it in summary object
log_writer.add_summary(loss_summaries, step) # add summary for each step
log_writer.add_summary(img_summaries, step)
print("Step:", step, " , loss: ", Tloss)
if step%2000 == 0:
saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
print("\n")
evaluate(sess, Test_X, Test_Y,1.0)
saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
to2 = dt.datetime.now()
print("\nTotal Trainig time Elapsed: ", str(to2 - to1))
# once the training is complete, evaluate the model with test (validation set)-------------------------------------------
# Restore the model file and perform the testing
#saver.restore(sess, "./Models/BookLT3_CIFAR-15000")
print("\nPost Training....")
# Performs Evaluation of model on batches of test samples
# In order to evaluate entire test set , number of iteration should be chosen such that ,
# (test_batch_size * num_of_iteration ) >= (2* num_of_test_samples )
evaluate(sess, Test_X, Test_Y,1.0) # Evaluate multiple batch of test data set (randomly chosen by shuffle train batch queue)
evaluate(sess, Test_X, Test_Y,1.0)
evaluate(sess, Test_X, Test_Y,1.0)
coord.request_stop()
coord.join(threads)
sess.close()
Here is the screenshot of my Pre training result:
Here is the screenshot of the result during training:
Hereis the screenshot of the Post training result
I did not run the code to verify that this is the only issue, but here is one important issue. When classifying, you should use one-hot encoding for your labels. Meaning that if you have 3 classes, you want your labels to be [1, 0, 0] for class 1, [0, 1, 0] for class 2, [0, 0, 1] for class 3. Your approach of using 1, 2, and 3 as labels leads to various issues. For examples, the network is penalized more for predicting class 1 versus predicting class 2 for an image from class 3. TensorFlow functions like tf.nn.softmax_cross_entropy_with_logits work with such representations.
Here is the basic example of correctly using one_hot labels to compute loss: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
Here is how the one_hot label is constructed for mnist digits:
https://github.com/tensorflow/tensorflow/blob/438604fc885208ee05f9eef2d0f2c630e1360a83/tensorflow/contrib/learn/python/learn/datasets/mnist.py#L69
I have recently started using TensorFlow (TF), and I have come across a problem that I need some help with. Basically, I've restored a pre-trained model, and I need to modify the weights and biases of one of its layers before I retest its accuracy. Now, my problem is the following:
how can I change the weights and biases using the assign method in TF? Is modifying the weights of a restored modeled even possible in TF?
Here is my code:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data # Imports the MINST dataset
# Data Set:
# ---------
mnist = input_data.read_data_sets("/home/frr/MNIST_data", one_hot=True)# An object where data is stored
ImVecDim = 784# The number of elements in a an image vector (flattening a 28x28 2D image)
NumOfClasses = 10
g = tf.get_default_graph()
with tf.Session() as sess:
LoadMod = tf.train.import_meta_graph('simple_mnist.ckpt.meta') # This object loads the model
LoadMod.restore(sess, tf.train.latest_checkpoint('./'))# Loading weights and biases and other stuff to the model
# ( Here I'd like to modify the weights and biases of layer 1, set them to one for example, before I go ahead and test the accuracy ) #
# Testing the acuracy of the model:
X = g.get_tensor_by_name('ImageIn:0')
Y = g.get_tensor_by_name('LabelIn:0')
KP = g.get_tensor_by_name('KeepProb:0')
Accuracy = g.get_tensor_by_name('NetAccuracy:0')
feed_dict = { X: mnist.test.images[:256], Y: mnist.test.labels[:256], KP: 1.0 }
print( 'Model Accuracy = ' )
print( sess.run( Accuracy, feed_dict ) )
In addition to an existing answer, tensor update can be performed via tf.assign function.
v1 = sess.graph.get_tensor_by_name('v1:0')
print(sess.run(v1)) # 1.0
sess.run(tf.assign(v1, v1 + 1))
print(sess.run(v1)) # 2.0
Thanks for everyone who responded. I'd just like to put the pieces together. This is the code the helped me accomplish what I want:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data # Imports the MINST dataset
# Data Set:
# ---------
mnist = input_data.read_data_sets("/home/frr/MNIST_data", one_hot=True)# An object where data is stored
ImVecDim = 784# The number of elements in a an image vector (flattening a 28x28 2D image)
NumOfClasses = 10
g = tf.get_default_graph()
with tf.Session() as sess:
LoadMod = tf.train.import_meta_graph('simple_mnist.ckpt.meta') # This object loads the model
LoadMod.restore(sess, tf.train.latest_checkpoint('./'))# Loading weights and biases and other stuff to the model
wc1 = g.get_tensor_by_name('wc1:0')
sess.run( tf.assign( wc1,tf.multiply(wc1,0) ) )# Setting the values of the variable 'wc1' in the model to zero.
# Testing the acuracy of the model:
X = g.get_tensor_by_name('ImageIn:0')
Y = g.get_tensor_by_name('LabelIn:0')
KP = g.get_tensor_by_name('KeepProb:0')
Accuracy = g.get_tensor_by_name('NetAccuracy:0')
feed_dict = { X: mnist.test.images[:256], Y: mnist.test.labels[:256], KP: 1.0 }
print( 'Model Accuracy = ' )
print( sess.run( Accuracy, feed_dict ) )
Yes it is possible. Your weights and biases are already loaded after you loaded the meta graph. You need to find their names (see the list_variables function) and then assign them to a Python variable.
For that, use tf.get_variable with the variable name. You might have to set reuse=True on your variable scope. See this answer for more detail on reusing variables.
Once you have them as a weights variable, you can call sess.run(weights.assign(...)).
An update to this for Tensorflow 2.4 using a different example than OP's.
# Step 0 - Init
model = # some tf.keras.Model
model_folder = # path to model files
ckpt_obj = tf.train.Checkpoint(model=model)
ckpt_obj.restore(save_path=tf.train.latest_checkpoint(str(model_folder))).expect_partial()
# Step 1 - Loop over all layers
for layer in model.layers:
# Step 2 - Loop over submodules of a layer
for submodule in layer.submodules:
# Step 3 - Find a particular type of submodule (alternative use submodule.name=='SomeName')
if type(submodule) == tfp.layers.Convolution3DFlipout: # kernel=N(loc,scale) --> N=Normal distro
# Step 4 - Extract numpy weights using .get_weights()
## Note: Different compared to submodule.weights which returns a tensor that shall also have a name e.g. wc1:0
weights = submodule.get_weights() # [scale, rho, bias] --> kernel=N(loc,scale=tfp.bijectors.Softplus(rho)) --> output=input*kernel + bias
# Step 5 - Set weights as a new numpy array of your choice
weights[1] = np.full(weights[1].shape, -np.inf)
# Step 6 - Update weights
submodule.set_weights(weights)
input = tf.random.normal((1,100,100,100,1)) # 3D input with batch=1, channels=1
_ = model(input)
Problem statement
I am trying to train a dynamic RNN in TensorFlow v1.0.1 on Linux RedHat 7.3 (problem also manifests on Windows 7), and no matter what I try, I get the exact same training and validation error at every epoch, i.e. my weights are not updating.
I appreciate any help you can offer.
Example
I tried to reduce this to a minimum example that shows my issue, but the minimum example is still pretty large. I based the network structure largely on this gist.
Network definition
import functools
import numpy as np
import tensorflow as tf
def lazy_property(function):
attribute = '_' + function.__name__
#property
#functools.wraps(function)
def wrapper(self):
if not hasattr(self, attribute):
setattr(self, attribute, function(self))
return getattr(self, attribute)
return wrapper
class MyNetwork:
"""
Class defining an RNN for labeling a time series.
"""
def __init__(self, data, target, num_hidden=64):
self.data = data
self.target = target
self._num_hidden = num_hidden
self._num_steps = int(self.target.get_shape()[1])
self._num_classes = int(self.target.get_shape()[2])
self._weight_and_bias() # create weight and bias tensors
self.prediction
self.error
self.optimize
#lazy_property
def prediction(self):
"""Defines the recurrent neural network prediction scheme."""
# Dynamic LSTM.
network = tf.contrib.rnn.BasicLSTMCell(self._num_hidden)
output, _ = tf.nn.dynamic_rnn(network, data, dtype=tf.float32)
# Flatten and apply same weights to all time steps.
output = tf.reshape(output, [-1, self._num_hidden])
prediction = tf.nn.softmax(tf.matmul(output, self.weight) + self.bias)
prediction = tf.reshape(prediction,
[-1, self._num_steps, self._num_classes])
return prediction
#lazy_property
def cost(self):
"""Defines the cost function for the network."""
cross_entropy = -tf.reduce_sum(self.target * tf.log(self.prediction),
axis=[1, 2])
cross_entropy = tf.reduce_mean(cross_entropy)
return cross_entropy
#lazy_property
def optimize(self):
"""Defines the optimization scheme."""
learning_rate = 0.003
optimizer = tf.train.RMSPropOptimizer(learning_rate)
return optimizer.minimize(self.cost)
#lazy_property
def error(self):
"""Defines a measure of prediction error."""
mistakes = tf.not_equal(tf.argmax(self.target, 2),
tf.argmax(self.prediction, 2))
return tf.reduce_mean(tf.cast(mistakes, tf.float32))
def _weight_and_bias(self):
"""Returns appropriately sized weight and bias tensors for the output layer."""
self.weight = tf.Variable(tf.truncated_normal(
[self._num_hidden, self._num_classes],
mean=0.0,
stddev=0.01,
dtype=tf.float32))
self.bias = tf.Variable(tf.constant(0.1, shape=[self._num_classes]))
Training
Here is my training process. The all_data class just holds my data and labels, and uses a batch generator class to spit out batches for training when I call all_data.train.next() and all_data.train_labels.next(). You can reproduce with any batch generation scheme you like, and I can add the code if you think it is relevant; I felt like this was getting too long as it is.
tf.reset_default_graph()
data = tf.placeholder(tf.float32,
[None, all_data.num_steps, all_data.num_features])
target = tf.placeholder(tf.float32,
[None, all_data.num_steps, all_data.num_outputs])
model = MyNetwork(data, target, NUM_HIDDEN)
print('Training the model...')
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print('Initialized.')
for epoch in range(3):
print('Epoch {} |'.format(epoch), end='', flush=True)
for step in range(all_data.train_size // BATCH_SIZE):
# Generate the next training batch and train.
d = all_data.train.next()
t = all_data.train_labels.next()
sess.run(model.optimize,
feed_dict={data: d, target: t})
# Update the user periodically.
if step % summary_frequency == 0:
print('.', end='', flush=True)
# Show training and validation error at the end of each epoch.
print('|', flush=True)
train_error = sess.run(model.error,
feed_dict={data: d, target: t})
valid_error = sess.run(model.error,
feed_dict={
data: all_data.valid,
target: all_data.valid_labels
})
print('Training error: {}%'.format(100 * train_error))
print('Validation error: {}%'.format(100 * valid_error))
# Check testing error after everything.
test_error = sess.run(model.error,
feed_dict={
data: all_data.test,
target: all_data.test_labels
})
print('Testing error after {} epochs: {}%'.format(epoch + 1, 100 * test_error))
For a simple example, I generated random data and labels, where data has shape [num_samples, num_steps, num_features], and each sample has a single label associated with the whole thing:
data = np.random.rand(5000, 1000, 2)
labels = np.random.randint(low=0, high=2, size=[5000])
I then converted my labels to one-hot vectors and tiled them so that the resulting labels tensor was the same size as the data tensor.
Results
No matter what I do, I get results like this:
Training the model...
Initialized.
Epoch 0 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch 1 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch 2 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Testing error after 3 epochs: 49.000000953674316%
Where I have exactly the same error at every epoch. Even if my weights were randomly walking around this should change. For the example shown here, I used random data with random labels, so I do not expect much improvement, but I do expect some change, and I am getting the exact same results every epoch. When I do this with my actual data set, I get the same behavior.
Insight
I hesitate to include this in case it proves to be a red herring, but I believe that my optimizer is calculating cost function gradients of None. When I tried a different optimizer and attempted to clip the gradients, I went ahead and used tf.Print to output the gradients as well. The network crashed with an error that tf.Print could not handle None-type values.
Attempted fixes
I have tried the following things, and the problem persists in all cases:
Using different optimizers, e.g. AdamOptimizer with and without modifications to the gradients (clipping).
Adjusting batch sizes.
Using many more and many fewer hidden nodes.
Running for more epochs.
Initializing my weights with different values assigned to stddev.
Initializing my biases to zeros (using tf.zeros) and to different constants.
Using weights and biases that are defined within the prediction method and are not member variables of the class, and a _weight_and_bias method that is defined as a #staticmethod like in this gist.
Determining logits in the prediction function instead of softmax predictions, i.e. predictions = tf.matmul(output, self.weights) + self.bias, and then using tf.nn.softmax_cross_entropy_with_logits. This requires some reshaping because the method wants its labels and targets given with shape [batch_size, num_classes], so the cost method becomes:
(line added to get code to format...)
#lazy_property
def cost(self):
"""Defines the cost function for the network."""
targs = tf.reshape(self.target, [-1, self._num_classes])
logits = tf.reshape(self.predictions, [-1, self._num_classes])
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=targs, logits=logits)
cross_entropy = tf.reduce_mean(cross_entropy)
return cross_entropy
Changing which size dimension I leave as None when I create my placeholders as suggested in this answer, which requires a bit of rewriting in the network definition. Basically setting size = [all_data.batch_size, -1, all_data.num_features] and size = [all_data.batch_size, -1, all_data.num_classes].
Using tf.contrib.rnn.DropoutWrapper in my network definition and passing a dropout value set to 0.5 in training and 1.0 in validation and testing.
The problem went away when I used
output = tf.contrib.layers.flatten(output)
logits = tf.contrib.layers.fully_connected(output, some_size, activation_fn=None)
instead of flattening my network output, defining weights, and performing the tf.matmul(output, weight) + bias manually. I then used logits (instead of predictions in the question) in my cost function with
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=target,
logits=logits)
If you want to get the network prediction, you will still need to do prediction = tf.nn.softmax(logits).
I have no idea why this helped, but the network would not train even on random made-up data until I made these changes.
I am following the tutorial at TF Slim. However at
loss = slim.losses.sum_of_squares(predictions, targets)
I seem to be getting AttributeError: 'module' object has no attribute 'sum_of_squares'. I have installed TF version 0.12head running on Ubuntu 16.04, CPU version. Complete code I am running is follows:
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time
from datasets import dataset_utils
# Main slim library
slim = tf.contrib.slim
def regression_model(inputs, is_training=True, scope="deep_regression"):
"""Creates the regression model.
Args:
inputs: A node that yields a `Tensor` of size [batch_size, dimensions].
is_training: Whether or not we're currently training the model.
scope: An optional variable_op scope for the model.
Returns:
predictions: 1-D `Tensor` of shape [batch_size] of responses.
end_points: A dict of end points representing the hidden layers.
"""
with tf.variable_scope(scope, 'deep_regression', [inputs]):
end_points = {}
# Set the default weight _regularizer and acvitation for each fully_connected layer.
with slim.arg_scope([slim.fully_connected],
activation_fn=tf.nn.relu,
weights_regularizer=slim.l2_regularizer(0.01)):
# Creates a fully connected layer from the inputs with 32 hidden units.
net = slim.fully_connected(inputs, 32, scope='fc1')
end_points['fc1'] = net
# Adds a dropout layer to prevent over-fitting.
net = slim.dropout(net, 0.8, is_training=is_training)
# Adds another fully connected layer with 16 hidden units.
net = slim.fully_connected(net, 16, scope='fc2')
end_points['fc2'] = net
# Creates a fully-connected layer with a single hidden unit. Note that the
# layer is made linear by setting activation_fn=None.
predictions = slim.fully_connected(net, 1, activation_fn=None, scope='prediction')
end_points['out'] = predictions
return predictions, end_points
with tf.Graph().as_default():
# Dummy placeholders for arbitrary number of 1d inputs and outputs
inputs = tf.placeholder(tf.float32, shape=(None, 1))
outputs = tf.placeholder(tf.float32, shape=(None, 1))
# Build model
predictions, end_points = regression_model(inputs)
# Print name and shape of each tensor.
print "Layers"
for k, v in end_points.iteritems():
print 'name = {}, shape = {}'.format(v.name, v.get_shape())
# Print name and shape of parameter nodes (values not yet initialized)
print "\n"
print "Parameters"
for v in slim.get_model_variables():
print 'name = {}, shape = {}'.format(v.name, v.get_shape())
def produce_batch(batch_size, noise=0.3):
xs = np.random.random(size=[batch_size, 1]) * 10
ys = np.sin(xs) + 5 + np.random.normal(size=[batch_size, 1], scale=noise)
return [xs.astype(np.float32), ys.astype(np.float32)]
x_train, y_train = produce_batch(200)
x_test, y_test = produce_batch(200)
plt.scatter(x_train, y_train)
def convert_data_to_tensors(x, y):
inputs = tf.constant(x)
inputs.set_shape([None, 1])
outputs = tf.constant(y)
outputs.set_shape([None, 1])
return inputs, outputs
# The following snippet trains the regression model using a sum_of_squares loss.
ckpt_dir = '/tmp/regression_model/'
with tf.Graph().as_default():
tf.logging.set_verbosity(tf.logging.INFO)
inputs, targets = convert_data_to_tensors(x_train, y_train)
# Make the model.
predictions, nodes = regression_model(inputs, is_training=True)
# Add the loss function to the graph.
loss = slim.losses.sum_of_squares(predictions, targets)
# The total loss is the uers's loss plus any regularization losses.
total_loss = slim.losses.get_total_loss()
# Specify the optimizer and create the train op:
optimizer = tf.train.AdamOptimizer(learning_rate=0.005)
train_op = slim.learning.create_train_op(total_loss, optimizer)
# Run the training inside a session.
final_loss = slim.learning.train(
train_op,
logdir=ckpt_dir,
number_of_steps=5000,
save_summaries_secs=5,
log_every_n_steps=500)
print("Finished training. Last batch loss:", final_loss)
print("Checkpoint saved in %s" % ckpt_dir)
Apparently for some reason, it has been removed in the latest build as can be seen at GitHub Repo. I switched to loss = slim.losses.mean_squared_error(predictions, targets) which should serve the purpose I assume.