Why is my ANN not learning? - python

I'm doing electricity load forecasting using a simple feedforward neural network. Following is my code:
...
num_periods = 24
f_horizon = 48 #forecast horizon
...
#RNN designing
tf.reset_default_graph()
inputs = num_periods #input vector size
hidden = 100
output = num_periods #output vector size
learning_rate = 0.01
seed = 128
x = tf.placeholder(tf.float32, [None, inputs])
y = tf.placeholder(tf.float32, [None, output])
weights = {
'hidden': tf.Variable(tf.random_normal([inputs, hidden], seed=seed)),
'output': tf.Variable(tf.random_normal([hidden, output], seed=seed))
}
biases = {
'hidden': tf.Variable(tf.random_normal([1,hidden], seed=seed)),
'output': tf.Variable(tf.random_normal([1,output], seed=seed))
}
hidden_layer = tf.add(tf.matmul(x, weights['hidden']), biases['hidden'])
hidden_layer = tf.nn.relu(hidden_layer)
output_layer = tf.matmul(hidden_layer, weights['output']) + biases['output']
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = output_layer, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
init = tf.initialize_all_variables() #initialize all the variables
epochs = 1000 #number of iterations or training cycles, includes both the FeedFoward and Backpropogation
mape = []
...
for st in state.values():
    print("State: ", st, end='\n')
    with tf.Session() as sess:
        init.run()
        for ep in range(epochs):
            sess.run([optimizer, cost], feed_dict={x: x_batches[st], y: y_batches[st]})
    print("\n")
Here is the output I'm getting for the NSW state:
As we can see, the cost increases continuously with the epochs. Why is this happening?

You are using the wrong loss: forecasting electricity load is a regression problem, while cross entropy is only for classification.
Something like mean squared error should work instead.
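For example, keeping the rest of the graph from the question unchanged, the loss line could be swapped roughly like this (a sketch, not tested on the original data):
# Regression loss: mean squared error between the network output and the targets
cost = tf.reduce_mean(tf.square(output_layer - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)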

Related

How do I use TensorFlow Neural Network output

After running the code below, I get values for accuracy and I can get the values for all the Ws and bs. My question is: how do I use the output to classify things in the future, and how do I save the model?
#########################################
################ SETUP ##################
training_epochs = 500
n_neurons_in_h1 = 60
n_neurons_in_h2 = 60
learning_rate = 0.01
n_features = 3
n_classes = 3
X = tf.placeholder(tf.float32, [None, n_features], name='features')
Y = tf.placeholder(tf.float32, [None, n_classes], name='labels')
W1 = tf.Variable(tf.truncated_normal([n_features, n_neurons_in_h1], mean=0, stddev=1 / np.sqrt(n_features)), name='weights1')
b1 = tf.Variable(tf.truncated_normal([n_neurons_in_h1],mean=0, stddev=1 / np.sqrt(n_features)), name='biases1')
y1 = tf.nn.tanh((tf.matmul(X, W1)+b1), name='activationLayer1')
W2 = tf.Variable(tf.random_normal([n_neurons_in_h1, n_neurons_in_h2],mean=0,stddev=1/np.sqrt(n_features)),name='weights2')
b2 = tf.Variable(tf.random_normal([n_neurons_in_h2],mean=0,stddev=1/np.sqrt(n_features)),name='biases2')
y2 = tf.nn.sigmoid((tf.matmul(y1,W2)+b2),name='activationLayer2')
Wo = tf.Variable(tf.random_normal([n_neurons_in_h2, n_classes], mean=0, stddev=1/np.sqrt(n_features)), name='weightsOut')
bo = tf.Variable(tf.random_normal([n_classes], mean=0, stddev=1/np.sqrt(n_features)), name='biasesOut')
a = tf.nn.softmax((tf.matmul(y2, Wo) + bo), name='activationOutputLayer')
cross_entropy = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(a),reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(a, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="Accuracy")
#########################################
############### GET DATA ################
(Removed for sake of conciseness)
Data is in the form [var1, var2, var3, label]. It is split 80% into training and 20% into testing.
#########################################
################ RUN ####################
# initialization of all variables
initial = tf.global_variables_initializer()
#creating a session
with tf.Session() as sess:
    sess.run(initial)
    # training loop over the number of epochs
    batchsize=10
    for epoch in range(training_epochs):
        for i in range(len(tr_features)):
            start=i
            end=i+batchsize
            x_batch=tr_features[start:end]
            y_batch=tr_labels[start:end]
            # feeding training data/examples
            sess.run(train_step, feed_dict={X:x_batch , Y:y_batch})
            i+=batchsize
        # feeding testing data to determine model accuracy
        y_pred = sess.run(tf.argmax(a, 1), feed_dict={X: ts_features})
        y_true = sess.run(tf.argmax(ts_labels, 1))
        acc = sess.run(accuracy, feed_dict={X: ts_features, Y: ts_labels})
        # print accuracy for each epoch
        print('epoch', epoch, acc)
        print('---------------')
        print(y_pred, y_true)
You can use the tf.train.Saver() class to save your model (weights + architecture). The save method of this class will save everything, and the restore method will load the file created by the save method, so you can predict without having to retrain.
Look at the docs: Save and Restore - TensorFlow
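A minimal sketch of that workflow, assuming the graph-building code from the question has already run in the same script (save_path and new_features are made-up names):
saver = tf.train.Saver()
save_path = "./my_model/model.ckpt"  # hypothetical checkpoint path

# After (or during) training, save all variables:
with tf.Session() as sess:
    sess.run(initial)
    # ... the training loop from above ...
    saver.save(sess, save_path)

# Later, with the same graph defined, restore the weights and classify new data:
with tf.Session() as sess:
    saver.restore(sess, save_path)
    predicted_classes = sess.run(tf.argmax(a, 1), feed_dict={X: new_features})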

What do I need to save and restore for LSTM model in Tensorflow?

I am new to TensorFlow and I am training an LSTM-RNN in TensorFlow.
I need to save the model so that I can restore it and run it on the test data again.
I am not sure what to save: do I need to save sess, or do I need to save pred?
When I save sess, restore it, and test on the test data as
one_hot_predictions, accuracy, final_loss = sess.run(
    [pred, accuracy, cost],
    feed_dict={
        x: X_test,
        y: one_hot(y_test)
    }
)
I get an error that pred is unknown.
Since I am new to TensorFlow, I am not sure what to save and what to restore so that I can test with new data.
X_train = load_X(X_train_path)
X_test = load_X(X_test_path)
y_train = load_y(y_train_path)
y_test = load_y(y_test_path)
# proof that it actually works for the skeptical: replace labelled classes with random classes to train on
#for i in range(len(y_train)):
# y_train[i] = randint(0, 5)
# Input Data
training_data_count = len(X_train) # 4519 training series (with 50% overlap between each serie)
test_data_count = len(X_test) # 1197 test series
n_input = len(X_train[0][0]) # num input parameters per timestep
n_hidden = 34 # Hidden layer num of features
n_classes = 6
#updated for learning-rate decay
# calculated as: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
decaying_learning_rate = True
learning_rate = 0.0025 #used if decaying_learning_rate set to False
init_learning_rate = 0.005
decay_rate = 0.96 #the base of the exponential in the decay
decay_steps = 100000 #used in decay every 60000 steps with a base of 0.96
global_step = tf.Variable(0, trainable=False)
lambda_loss_amount = 0.0015
training_iters = training_data_count *300 # Loop 300 times on the dataset, ie 300 epochs
batch_size = 512
display_iter = batch_size*8 # To show test set accuracy during training
#Utility functions for training:
def LSTM_RNN(_X, _weights, _biases):
    # model architecture based on "guillaume-chevalier" and "aymericdamien" under the MIT license.
    _X = tf.transpose(_X, [1, 0, 2]) # permute n_steps and batch_size
    _X = tf.reshape(_X, [-1, n_input])
    # Rectified Linear Unit activation function used
    _X = tf.nn.relu(tf.matmul(_X, _weights['hidden']) + _biases['hidden'])
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(_X, n_steps, 0)
    # Define two stacked LSTM cells (two recurrent layers deep) with tensorflow
    lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True)
    lstm_cells = tf.contrib.rnn.MultiRNNCell([lstm_cell_1, lstm_cell_2], state_is_tuple=True)
    outputs, states = tf.contrib.rnn.static_rnn(lstm_cells, _X, dtype=tf.float32)
    # A single output is produced, in style of "many to one" classifier, refer to http://karpathy.github.io/2015/05/21/rnn-effectiveness/ for details
    lstm_last_output = outputs[-1]
    # Linear activation
    return tf.matmul(lstm_last_output, _weights['out']) + _biases['out']
def extract_batch_size(_train, _labels, _unsampled, batch_size):
    # Fetch a "batch_size" amount of data and labels from "(X|y)_train" data.
    # Elements of each batch are chosen randomly, without replacement, from X_train with corresponding label from Y_train
    # unsampled_indices keeps track of sampled data ensuring non-replacement. Resets when remaining datapoints < batch_size
    shape = list(_train.shape)
    shape[0] = batch_size
    batch_s = np.empty(shape)
    batch_labels = np.empty((batch_size, 1))
    for i in range(batch_size):
        # Loop index
        # index = random sample from _unsampled (indices)
        index = random.choice(_unsampled)
        batch_s[i] = _train[index]
        batch_labels[i] = _labels[index]
        _unsampled.remove(index)
    return batch_s, batch_labels, _unsampled
def one_hot(y_):
    # One hot encoding of the network outputs
    # e.g.: [[5], [0], [3]] --> [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    y_ = y_.reshape(len(y_))
    n_values = int(np.max(y_)) + 1
    return np.eye(n_values)[np.array(y_, dtype=np.int32)] # Returns FLOATS
# Graph input/output
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Graph weights
weights = {
'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
'out': tf.Variable(tf.random_normal([n_hidden, n_classes], mean=1.0))
}
biases = {
'hidden': tf.Variable(tf.random_normal([n_hidden])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
pred = LSTM_RNN(x, weights, biases)
# Loss, optimizer and evaluation
l2 = lambda_loss_amount * sum(
    tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables()
) # L2 loss prevents this overkill neural network from overfitting the data
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred)) + l2 # Softmax loss
if decaying_learning_rate:
    learning_rate = tf.train.exponential_decay(init_learning_rate, global_step*batch_size, decay_steps, decay_rate, staircase=True)
    #decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps) #exponentially decayed learning rate
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost,global_step=global_step) # Adam Optimizer
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
#Train the network:
test_losses = []
test_accuracies = []
train_losses = []
train_accuracies = []
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
init = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
sess.run(init)
# Perform Training steps with "batch_size" amount of data at each loop.
# Elements of each batch are chosen randomly, without replacement, from X_train,
# restarting when remaining datapoints < batch_size
step = 1
time_start = time.time()
unsampled_indices = range(0,len(X_train))
while step * batch_size <= training_iters:
    #print (sess.run(learning_rate)) #decaying learning rate
    #print (sess.run(global_step)) # global number of iterations
    if len(unsampled_indices) < batch_size:
        unsampled_indices = range(0, len(X_train))
    batch_xs, raw_labels, unsampled_indicies = extract_batch_size(X_train, y_train, unsampled_indices, batch_size)
    batch_ys = one_hot(raw_labels)
    # check that encoded output is same length as num_classes, if not, pad it
    if len(batch_ys[0]) < n_classes:
        temp_ys = np.zeros((batch_size, n_classes))
        temp_ys[:batch_ys.shape[0], :batch_ys.shape[1]] = batch_ys
        batch_ys = temp_ys
    # Fit training using batch data
    _, loss, acc = sess.run(
        [optimizer, cost, accuracy],
        feed_dict={
            x: batch_xs,
            y: batch_ys
        }
    )
    train_losses.append(loss)
    train_accuracies.append(acc)
    # Evaluate network only at some steps for faster training:
    if (step*batch_size % display_iter == 0) or (step == 1) or (step * batch_size > training_iters):
        # To not spam console, show training accuracy/loss in this "if"
        print("Iter #" + str(step*batch_size) + \
              ": Learning rate = " + "{:.6f}".format(sess.run(learning_rate)) + \
              ": Batch Loss = " + "{:.6f}".format(loss) + \
              ", Accuracy = {}".format(acc))
        # Evaluation on the test set (no learning made here - just evaluation for diagnosis)
        loss, acc = sess.run(
            [cost, accuracy],
            feed_dict={
                x: X_test,
                y: one_hot(y_test)
            }
        )
        test_losses.append(loss)
        test_accuracies.append(acc)
        print("PERFORMANCE ON TEST SET: " + \
              "Batch Loss = {}".format(loss) + \
              ", Accuracy = {}".format(acc))
    step += 1
print("Optimization Finished!")
EDIT:
I can save the model as
print("Optimization Finished!")
save_path = saver.save(sess, "/home/test/venv/TFCodes/HumanActivityRecognition/model.ckpt")
Then I tried to restore, and that works. But I don't know how to test with the test data.
My restore code is:
X_test = load_X(X_test_path)
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/home/nyan/venv/TFCodes/HumanActivityRecognition/model.ckpt.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    print("Model restored.")
    all_vars = tf.trainable_variables()
    for i in range(len(all_vars)):
        name = all_vars[i].name
        values = sess.run(name)
        print('name', name)
        #print('value', values)
        print('shape', values.shape)
    result = sess.run(prediction, feed_dict={X: X_test})
    print("loss:", l, "prediction:", result, "true Y:", y_data)
    # print char using dic
    result_str = [idx2char[c] for c in np.squeeze(result)]
    print("\tPrediction str:", ''.join(result_str))
The output is
Model restored.
('name', u'Variable_1:0')
('shape', (36, 34))
('name', u'Variable_2:0')
('shape', (34, 6))
('name', u'Variable_3:0')
('shape', (34,))
('name', u'Variable_4:0')
('shape', (6,))
('name', u'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0')
('shape', (68, 136))
('name', u'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0')
('shape', (136,))
('name', u'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0')
('shape', (68, 136))
('name', u'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0')
('shape', (136,))
Traceback (most recent call last):
File "restore.py", line 74, in <module>
result = sess.run(prediction, feed_dict={X: X_test})
NameError: name 'prediction' is not defined
How do I test the restored model?
What I find easiest is the tf.saved_model.simple_save() function. It saves the computation graph you use, the weights, and the input and output tensors to a SavedModel (a .pb file plus the variable files).
You can later restore this model, or even put it on ml-engine or use TF Serving.
An example code snippet with a Keras model, applied to YOLO:
inputs = {"image_bytes": model.input,
"shape": image_shape}
outputs = {"boxes": boxes,
"scores": scores,
"classes": classes}
tf.saved_model.simple_save(sess, "saved_model/", inputs, outputs)
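To load such a model back in a later session, something like the following sketch should work under TF 1.x (the tensor names are assumptions; you can list the real ones with `saved_model_cli show --dir saved_model/ --all`):
import tensorflow as tf

export_dir = "saved_model/"  # directory written by simple_save above

with tf.Session(graph=tf.Graph()) as sess:
    # Restores both the graph and the weights under the default "serve" tag
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    graph = tf.get_default_graph()
    image_input = graph.get_tensor_by_name("image_bytes:0")  # hypothetical tensor name
    boxes_out = graph.get_tensor_by_name("boxes:0")          # hypothetical tensor name
    # result = sess.run(boxes_out, feed_dict={image_input: some_image})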

Can someone help me with writing a piece of code in Python regarding neural networks and the MNIST dataset?

For a school project I have analysed the code below, but I want to add a feature to it: when the neural network is done training, I want to give it an image of a handwritten digit from MNIST (let's say an 8) so that it can try to identify the number 8. Because I'm totally new to coding and machine learning, although I really like it and want to learn more, I could not figure out myself how such code should look. Can someone help me?
The code is written in Python:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
learning_rate = 0.0001
batch_size = 100
update_step = 10
layer_1_nodes = 500
layer_2_nodes = 500
layer_3_nodes = 500
output_nodes = 10
network_input = tf.placeholder(tf.float32, [None, 784])
target_output = tf.placeholder(tf.float32, [None, output_nodes])
layer_1 = tf.Variable(tf.random_normal([784, layer_1_nodes]))
layer_1_bias = tf.Variable(tf.random_normal([layer_1_nodes]))
layer_2 = tf.Variable(tf.random_normal([layer_1_nodes, layer_2_nodes]))
layer_2_bias = tf.Variable(tf.random_normal([layer_2_nodes]))
layer_3 = tf.Variable(tf.random_normal([layer_2_nodes, layer_3_nodes]))
layer_3_bias = tf.Variable(tf.random_normal([layer_3_nodes]))
out_layer = tf.Variable(tf.random_normal([layer_3_nodes, output_nodes]))
out_layer_bias = tf.Variable(tf.random_normal([output_nodes]))
l1_output = tf.nn.relu(tf.matmul(network_input, layer_1) + layer_1_bias)
l2_output = tf.nn.relu(tf.matmul(l1_output, layer_2) + layer_2_bias)
l3_output = tf.nn.relu(tf.matmul(l2_output, layer_3) + layer_3_bias)
ntwk_output_1 = tf.matmul(l3_output, out_layer) + out_layer_bias
ntwk_output_2 = tf.nn.softmax(ntwk_output_1)
cf = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=ntwk_output_1,
                                                            labels=target_output))
ts = tf.train.GradientDescentOptimizer(learning_rate).minimize(cf)
cp = tf.equal(tf.argmax(ntwk_output_2, 1), tf.argmax(target_output, 1))
acc = tf.reduce_mean(tf.cast(cp, tf.float32))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_epochs = 10
    for epoch in range(num_epochs):
        total_cost = 0
        for _ in range(int(mnist.train.num_examples / batch_size)):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            t, c = sess.run([ts, cf], feed_dict={network_input: batch_x, target_output: batch_y})
            total_cost += c
        print('Epoch', epoch, 'completed out of', num_epochs, 'loss:', total_cost)
    print('Accuracy:', acc.eval({network_input: mnist.test.images, target_output: mnist.test.labels}))
with tf.Session() as sess:
    number_prediction = tf.argmax(ntwk_output_2, 1)
    number_prediction = sess.run(number_prediction, feed_dict={network_input: yourImageNdArray})
    print("your prediction : ", number_prediction)
What you need to know:
ntwk_output_2 is the output of the neural net, which gives you 10 probabilities - you take the greatest one with tf.argmax (tf.argmax does not return the max value, but its position)
sess.run is responsible for running your TensorFlow graph and evaluating the tensor given in the first parameter
you also need to feed the network the image you want to predict via feed_dict
Hope that helps!
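A tiny illustration of the tf.argmax point (a sketch in TF 1.x with a made-up probability vector):
import tensorflow as tf

probs = tf.constant([[0.1, 0.7, 0.2]])
with tf.Session() as sess:
    print(sess.run(tf.argmax(probs, 1)))      # [1]   -> position of the largest probability
    print(sess.run(tf.reduce_max(probs, 1)))  # [0.7] -> the value itself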
The problem is you are not saving your model at any point during the training process.
You can do this during training:
ckpt_path = path_to_save_model
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_epochs = 10
    for epoch in range(num_epochs):
        total_cost = 0
        for _ in range(int(mnist.train.num_examples / batch_size)):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            t, c = sess.run([ts, cf], feed_dict={network_input: batch_x, target_output: batch_y})
            total_cost += c
        print('Epoch', epoch, 'completed out of', num_epochs, 'loss:', total_cost)
        if (epoch+1) % 10 == 0:
            saver.save(sess, ckpt_path)
    print('Accuracy:', acc.eval({network_input: mnist.test.images, target_output: mnist.test.labels}))
For running the trained model you can do the following:
with tf.Session() as sess:
    meta_graph = [i for i in os.listdir(ckpt_path) if i.endswith('.meta')]
    saver = tf.train.import_meta_graph(os.path.join(ckpt_path, meta_graph[0]))
    saver.restore(sess, tf.train.latest_checkpoint(ckpt_path))
    #img = read your image here
    pred = sess.run(ntwk_output_2, feed_dict={network_input: img})
    output = np.argmax(pred)
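Note that ntwk_output_2 and network_input above are Python variables from the graph-building code, so this only works if that code has already run in the same script. If you restore in a fresh process, you would instead look the tensors up by name, which requires naming them when the graph is built. A sketch under that assumption:
# Assumes the graph was originally built with explicit names, e.g.:
#   network_input = tf.placeholder(tf.float32, [None, 784], name='network_input')
#   ntwk_output_2 = tf.nn.softmax(ntwk_output_1, name='ntwk_output_2')
graph = tf.get_default_graph()
network_input = graph.get_tensor_by_name('network_input:0')
ntwk_output_2 = graph.get_tensor_by_name('ntwk_output_2:0')
pred = sess.run(ntwk_output_2, feed_dict={network_input: img})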
For more reference you can follow this link

tensorflow rnn nan error

I want to train an RNN model that connects an article and an image. The input and the output are two arrays.
I define the parameters of the RNN as follows:
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 128
n_steps = 168 # timesteps
n_hidden = 512 # hidden layer num of features
output = 200
The image is 128×168 and the article vector has length 200.
cost = tf.reduce_mean(pow(pred-y,2)/2)
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
For the end result, I want to train a network that transforms an image into an article. However, when I try to train the model, the cost is returned as NaN.
Here is the code:
# coding=utf-8
from __future__ import print_function
from tensorflow.contrib import rnn
import scipy.io as scio
import tensorflow as tf
import numpy as np
import os
TextPath = 'F://matlab_code//readtxt//ImageTextVector.mat';
ImageDirPath = 'F://matlab_code//CVPR10-LLC//features//1';
Text = scio.loadmat(TextPath)
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 128 #
n_steps = 168 # timesteps
n_hidden = 512 # hidden layer num of features
output = 200 #
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, output])
weights = {
'out': tf.Variable(tf.random_normal([n_hidden, output]))
}
biases = {
'out': tf.Variable(tf.random_normal([output]))
}
def RNN(x, weights, biases):
    lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
pred = RNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(pow(pred-y,2)/2)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
init = tf.global_variables_initializer()
train_count = 0
with tf.Session() as sess:
    sess.run(init)
    step = 0
    while step * batch_size < training_iters:
        iter = step * batch_size
        batch_x = []
        batch_y = []
        while iter < (step+1) * batch_size:
            ImagePath = ImageDirPath + '//' + Text['X'][train_count][0][0] + '.mat'
            if os.path.exists(ImagePath):
                batch_xx = []
                batch_yy = []
                Image = scio.loadmat(ImagePath)
                i = 0
                while i < 21504:
                    batch_xx.append(Image['fea'][i][0])
                    i = i + 1
                batch_yy = Text['X'][train_count][1][0]
                batch_xx = np.array(batch_xx)
                batch_x = np.hstack((batch_x, batch_xx))
                batch_y = np.hstack((batch_y, batch_yy))
            iter = iter + 1
            train_count = train_count + 1
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        batch_y = batch_y.reshape((batch_size, output))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step * batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss))
        step += 1
    print("Optimization Finished!")
When you pass a tensor containing NaN values to the LSTM, the values in the LSTM's cells will be "forced" to NaN, because any numerical operation between a number and NaN yields NaN. Check whether your data contain NaN values, or just use numpy.nan_to_num to fill in your NaN data.
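A quick way to check and to patch the data before feeding it, assuming batch_x and batch_y are the NumPy arrays from the training loop:
import numpy as np

# Check for NaNs in the batches fed to the graph
print("NaNs in batch_x:", np.isnan(batch_x).any())
print("NaNs in batch_y:", np.isnan(batch_y).any())

# Replace NaN (and +/-inf) with finite numbers before feeding the network
batch_x = np.nan_to_num(batch_x)
batch_y = np.nan_to_num(batch_y)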

Strange NaN values for loss function (MLP) in TensorFlow

I hope you can help me. I'm implementing a small multilayer perceptron using TensorFlow and a few tutorials I found on the internet. The problem is that the net is able to learn something, and by this I mean that I am able to somehow optimize the value of the training error and get decent accuracy, and that's what I was aiming for. However, I am recording with TensorBoard some strange NaN values for the loss function. Quite a lot, actually. Here you can see my latest TensorBoard recording of the loss function output. Please note all those triangles followed by discontinuities - those are the NaN values; note also that the general trend of the function is what you would expect it to be.
Tensorboard report
I thought that a high learning rate could be the problem, or maybe a net that's too deep, causing the gradients to explode, so I lowered the learning rate and used a single hidden layer (this is the configuration of the image above, and the code below). Nothing changed, I just caused the learning process to be slower.
Tensorflow Code
import tensorflow as tf
import numpy as np
import scipy.io, sys, time
from numpy import genfromtxt
from random import shuffle
#shuffles two related lists #TODO check that the two lists have same size
def shuffle_examples(examples, labels):
    examples_shuffled = []
    labels_shuffled = []
    indexes = list(range(len(examples)))
    shuffle(indexes)
    for i in indexes:
        examples_shuffled.append(examples[i])
        labels_shuffled.append(labels[i])
    examples_shuffled = np.asarray(examples_shuffled)
    labels_shuffled = np.asarray(labels_shuffled)
    return examples_shuffled, labels_shuffled
# Import and transform dataset
dataset = scipy.io.mmread(sys.argv[1])
dataset = dataset.astype(np.float32)
all_labels = genfromtxt('oh_labels.csv', delimiter=',')
num_examples = all_labels.shape[0]
dataset, all_labels = shuffle_examples(dataset, all_labels)
# Split dataset into training (66%) and test (33%) set
training_set_size = 2000
training_set = dataset[0:training_set_size]
training_labels = all_labels[0:training_set_size]
test_set = dataset[training_set_size:num_examples]
test_labels = all_labels[training_set_size:num_examples]
test_set, test_labels = shuffle_examples(test_set, test_labels)
# Parameters
learning_rate = 0.0001
training_epochs = 150
mini_batch_size = 100
total_batch = int(num_examples/mini_batch_size)
# Network Parameters
n_hidden_1 = 50 # 1st hidden layer of neurons
#n_hidden_2 = 16 # 2nd hidden layer of neurons
n_input = int(sys.argv[2]) # number of features after LSA
n_classes = 2;
# Tensorflow Graph input
with tf.name_scope("input"):
    x = tf.placeholder(np.float32, shape=[None, n_input], name="x-data")
    y = tf.placeholder(np.float32, shape=[None, n_classes], name="y-labels")
print("Creating model.")
# Create model
def multilayer_perceptron(x, weights, biases):
    with tf.name_scope("h_layer_1"):
        # First hidden layer with SIGMOID activation
        layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
        layer_1 = tf.nn.sigmoid(layer_1)
    #with tf.name_scope("h_layer_2"):
        # Second hidden layer with SIGMOID activation
        #layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
        #layer_2 = tf.nn.sigmoid(layer_2)
    with tf.name_scope("out_layer"):
        # Output layer with SIGMOID activation
        out_layer = tf.add(tf.matmul(layer_1, weights['out']), biases['bout'])
        out_layer = tf.nn.sigmoid(out_layer)
    return out_layer
# Layer weights
with tf.name_scope("weights"):
    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.01, dtype=np.float32)),
        #'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.05, dtype=np.float32)),
        'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes], stddev=0.01, dtype=np.float32))
    }
# Layer biases
with tf.name_scope("biases"):
    biases = {
        'b1': tf.Variable(tf.random_normal([n_hidden_1], dtype=np.float32)),
        #'b2': tf.Variable(tf.random_normal([n_hidden_2], dtype=np.float32)),
        'bout': tf.Variable(tf.random_normal([n_classes], dtype=np.float32))
    }
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
with tf.name_scope("loss"):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
with tf.name_scope("adam"):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.initialize_all_variables()
# Define summaries
tf.scalar_summary("loss", cost)
summary_op = tf.merge_all_summaries()
print("Model ready.")
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    board_path = sys.argv[3]+time.strftime("%Y%m%d%H%M%S")+"/"
    writer = tf.train.SummaryWriter(board_path, graph=tf.get_default_graph())
    print("Starting Training.")
    for epoch in range(training_epochs):
        training_set, training_labels = shuffle_examples(training_set, training_labels)
        for i in range(total_batch):
            # example loading
            minibatch_x = training_set[i*mini_batch_size:(i+1)*mini_batch_size]
            minibatch_y = training_labels[i*mini_batch_size:(i+1)*mini_batch_size]
            # Run optimization op (backprop) and cost op
            _, summary = sess.run([optimizer, summary_op], feed_dict={x: minibatch_x, y: minibatch_y})
            # Write log
            writer.add_summary(summary, epoch*total_batch+i)
    print("Optimization Finished!")
    # Test model
    test_error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
    accuracy = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(accuracy, np.float32))
    test_error, accuracy = sess.run([test_error, accuracy], feed_dict={x: test_set, y: test_labels})
    print("Test Error: " + test_error.__str__() + "; Accuracy: " + accuracy.__str__())
    print("Tensorboard path: " + board_path)
I'll post the solution here just in case someone gets stuck in a similar way. If you look at that plot very carefully, all of the NaN values (the triangles) appear on a regular basis, as if at the end of every loop something causes the output of the loss function to just go NaN.
The problem is that, at every loop, I was feeding a mini-batch of "empty" examples. The problem lies in how I declared my inner training loop:
for i in range(total_batch):
Now what we'd like here is to have Tensorflow go through the entire training set, one minibatch at a time. So let's look at how total_batch was declared:
total_batch = int(num_examples / mini_batch_size)
That is not quite what we'd want to do - as we want to consider the training set only. So changing this line to:
total_batch = int(training_set_size / mini_batch_size)
Fixed the problem.
It is worth noting that TensorFlow seemed to ignore those "empty" batches, computing NaN for the loss but not updating the gradients - that's why the trend of the loss was that of a net that is learning something.
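As a tiny illustration of why an "empty" batch yields NaN, assuming the loss is a mean over the batch: the mean of zero elements is 0/0.
import numpy as np

empty_losses = np.array([], dtype=np.float32)
print(empty_losses.mean())  # nan (NumPy also warns): averaging zero elements is 0/0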
