I am trying to write a two layer neural network to train a class labeler. The input to the network is a 150-feature list of about 1000 examples; all features on all examples have been L2 normalized.
I only have two outputs, and they should be disjoint--I am just attempting to predict whether the example is a one or a zero.
My code is relatively simple; I am feeding the input data into the hidden layer, and then the hidden layer into the output. As I really just want to see this working in action, I am training on the entire data set with each step.
My code is below. Based on the other NN implementations I have referred to, I believe that the performance of this network should be improving over time. However, regardless of the number of epochs I set, I am getting back an accuracy of about ~20%. The accuracy is not changing when the number of steps are changed, so I don't believe that my weights and biases are being updated.
Is there something obvious I am missing with my model? Thanks!
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.01
epochs = 200
n_input = 150
n_hidden = 75
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden]))
b1 = tf.Variable(tf.truncated_normal([n_output]))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden]))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output]))
# step function
def returnPred(x,w0,w1,b0,b1):
z1 = tf.add(tf.matmul(x, w0), b0)
a2 = tf.nn.relu(z1)
z2 = tf.add(tf.matmul(a2, w1), b1)
h = tf.nn.relu(z2)
return h #return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
model = tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
sess.run(model,feed_dict={x: inputs, y: labels }) #train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
I changed your optimizer to AdamOptimizer (in many cases it performs better than GradientDescentOptimizer).
I also played a bit with the parameters. In particular, I took smaller std for your variable initialization, decreased learning rate (as your loss was unstable and "jumped around") and increased epochs (as I noticed that your loss continues to decrease).
I also reduced the size of the hidden layer. It is harder to train networks with large hidden layer when you don't have that much data.
Regarding your loss, it is better to apply tf.reduce_mean on it so that loss would be a number. In addition, following the answer of ml4294, I used softmax instead of sigmoid, so the loss looks like:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y))
The code below achieves accuracy of around 99.9% on the training data:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# generate data
np.random.seed(10)
inputs = np.random.normal(size=[1000,150]).astype('float32')*1.5
label = np.round(np.random.uniform(low=0,high=1,size=[1000,1])*0.8)
reverse_label = 1-label
labels = np.append(label,reverse_label,1)
# parameters
learn_rate = 0.002
epochs = 400
n_input = 150
n_hidden = 60
n_output = 2
# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden],stddev=0.2,seed=0))
b1 = tf.Variable(tf.truncated_normal([n_output],stddev=0.2,seed=0))
w0 = tf.Variable(tf.truncated_normal([n_input,n_hidden],stddev=0.2,seed=0))
w1 = tf.Variable(tf.truncated_normal([n_hidden,n_output],stddev=0.2,seed=0))
# step function
def returnPred(x,w0,w1,b0,b1):
z1 = tf.add(tf.matmul(x, w0), b0)
a2 = tf.nn.relu(z1)
z2 = tf.add(tf.matmul(a2, w1), b1)
h = tf.nn.relu(z2)
return h #return the first response vector from the
y_ = returnPred(x,w0,w1,b0,b1) # predict operation
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_,labels=y)) # calculate loss between prediction and actual
model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(loss) # apply gradient descent based on loss
init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init) #initialize graph
for step in range(0,epochs):
sess.run([model,loss],feed_dict={x: inputs, y: labels }) #train model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
Just a suggestion in addition to the answer provided by Miriam Farber:
You use a multi-dimensional output label ([0., 1.]) for the classification. I suggest to use the softmax cross entropy tf.nn.softmax_cross_entropy_with_logits() instead of the sigmoid cross entropy, since you assume the outputs to be disjoint softmax on Wikipedia. I achieved much faster convergence with this small modification.
This should also improve your performance once you decide to increase your output dimensionality from 2 to a higher number.
I guess you have some problem here:
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y) # calculate loss between prediction and actual
It should look smth like that:
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_,labels=y))
Did't look at you code much, so if this would't work out you can check udacity deep learning course or forum they have good samples of that are you trying to do.
GL
Related
Problem
I'm trying to classify some 64x64 images as a black box exercise. The NN I have written doesn't change my weights. First time writing something like this, the same code, but on MNIST letters input works just fine, but on this code it does not train like it should:
import tensorflow as tf
import numpy as np
path = ""
# x is a holder for the 64x64 image
x = tf.placeholder(tf.float32, shape=[None, 4096])
# y_ is a 1 element vector, containing the predicted probability of the label
y_ = tf.placeholder(tf.float32, [None, 1])
# define weights and balances
W = tf.Variable(tf.zeros([4096, 1]))
b = tf.Variable(tf.zeros([1]))
# define our model
y = tf.nn.softmax(tf.matmul(x, W) + b)
# loss is cross entropy
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
# each training step in gradient decent we want to minimize cross entropy
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
train_labels = np.reshape(np.genfromtxt(path + "train_labels.csv", delimiter=',', skip_header=1), (14999, 1))
train_data = np.genfromtxt(path + "train_samples.csv", delimiter=',', skip_header=1)
# perform 150 training steps with each taking 100 train data
for i in range(0, 15000, 100):
sess.run(train_step, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]})
if i % 500 == 0:
print(sess.run(cross_entropy, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]}))
print(sess.run(b), sess.run(W))
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.close()
How do I solve this problem?
The key to the problem is that the class number of you output y_ and y is 1.You should adopt one-hot mode when you use tf.nn.softmax_cross_entropy_with_logits on classification problems in tensorflow. tf.nn.softmax_cross_entropy_with_logits will first compute tf.nn.softmax. When your class number is 1, your results are all the same. For example:
import tensorflow as tf
y = tf.constant([[1],[0],[1]],dtype=tf.float32)
y_ = tf.constant([[1],[2],[3]],dtype=tf.float32)
softmax_var = tf.nn.softmax(logits=y_)
cross_entropy = tf.multiply(y, tf.log(softmax_var))
errors = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
with tf.Session() as sess:
print(sess.run(softmax_var))
print(sess.run(cross_entropy))
print(sess.run(errors))
[[1.]
[1.]
[1.]]
[[0.]
[0.]
[0.]]
[0. 0. 0.]
This means that no matter what your output y_, your loss will be zero. So your weights and bias haven't been updated.
The solution is to modify the class number of y_ and y.
I suppose your class number is n.
First approch:You can change data to one-hot before feed data.Then use the following code.
y_ = tf.placeholder(tf.float32, [None, n])
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
Second approch:change data to one-hot after feed data.
y_ = tf.placeholder(tf.int32, [None, 1])
y_ = tf.one_hot(y_,n) # your dtype of y_ need to be tf.int32
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
All your initial weights are zeros. When you have that way, the NN doesn't learn well. You need to initialize all the initial weights with random values.
See Why should weights of Neural Networks be initialized to random numbers?
"Why Not Set Weights to Zero?
We can use the same set of weights each time we train the network; for example, you could use the values of 0.0 for all weights.
In this case, the equations of the learning algorithm would fail to make any changes to the network weights, and the model will be stuck. It is important to note that the bias weight in each neuron is set to zero by default, not a small random value.
"
See
https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/
I am new to tensorflow and I am trying to implement a simple feed-forward network for regression, just for learning purposes. The complete executable code is as follows.
The regression mean squared error is around 6, which is quite large. It is a little unexpected because the function to regress is linear and simple 2*x+y, and I expect a better performance.
I am asking for help to check if I did anything wrong in the code. I carefully checked the matrix dimensions so that should be good, but it is possible that I misunderstand something so the network or the session is not properly configured (like, should I run the training session multiple times, instead of just one time (the code below enclosed by #TRAINING#)? I see in some examples they input data piece by piece, and run the training progressively. I run the training just one time and input all data).
If the code is good, maybe this is a modeling issue, but I really don't expect to use a complicated network for such a simple regression.
import tensorflow as tf
import numpy as np
from sklearn.metrics import mean_squared_error
# inputs are points from a 100x100 grid in domain [-2,2]x[-2,2], total 10000 points
lsp = np.linspace(-2,2,100)
gridx,gridy = np.meshgrid(lsp,lsp)
inputs = np.dstack((gridx,gridy))
inputs = inputs.reshape(-1,inputs.shape[-1]) # reshpaes the grid into a 10000x2 matrix
feature_size = inputs.shape[1] # feature_size is 2, features are the 2D coordinates of each point
input_size = inputs.shape[0] # input_size is 10000
# a simple function f(x)=2*x[0]+x[1] to regress
f = lambda x: 2 * x[0] + x[1]
label_size = 1
labels = f(inputs.transpose()).reshape(-1,1) # reshapes labels as a column vector
ph_inputs = tf.placeholder(tf.float32, shape=(None, feature_size), name='inputs')
ph_labels = tf.placeholder(tf.float32, shape=(None, label_size), name='labels')
# just one hidden layer with 16 units
hid1_size = 16
w1 = tf.Variable(tf.random_normal([hid1_size, feature_size], stddev=0.01), name='w1')
b1 = tf.Variable(tf.random_normal([hid1_size, label_size]), name='b1')
y1 = tf.nn.relu(tf.add(tf.matmul(w1, tf.transpose(ph_inputs)), b1))
# the output layer
wo = tf.Variable(tf.random_normal([label_size, hid1_size], stddev=0.01), name='wo')
bo = tf.Variable(tf.random_normal([label_size, label_size]), name='bo')
yo = tf.transpose(tf.add(tf.matmul(wo, y1), bo))
# defines optimizer and predictor
lr = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss = tf.losses.mean_squared_error(ph_labels,yo)
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(loss)
predictor = tf.identity(yo)
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
_, c = sess.run([optimizer, loss], feed_dict={lr:0.05, ph_inputs: inputs, ph_labels: labels})
# TRAINING
# gets the regression results
predictions = np.zeros((input_size,1))
for i in range(input_size):
predictions[i] = sess.run(predictor, feed_dict={ph_inputs: inputs[i, None]}).squeeze()
# prints regression MSE
print(mean_squared_error(predictions, labels))
You're right, you understood the problem by yourself.
The problem is, in fact, that you're running the optimization step only one time. Hence you're doing one single update step of your network parameter and therefore the cost won't decrease.
I just changed the training session of your code in order to make it work as expected (100 training steps):
# TRAINING
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
_, c = sess.run(
[optimizer, loss],
feed_dict={
lr: 0.05,
ph_inputs: inputs,
ph_labels: labels
})
print("Train step {} loss value {}".format(i, c))
# TRAINING
and at the end of the training step I go:
Train step 99 loss value 0.04462708160281181
0.044106700712455045
My input data is as follows:
AT V AP RH PE
14.96 41.76 1024.07 73.17 463.26
25.18 62.96 1020.04 59.08 444.37
5.11 39.4 1012.16 92.14 488.56
20.86 57.32 1010.24 76.64 446.48
10.82 37.5 1009.23 96.62 473.9
26.27 59.44 1012.23 58.77 443.67
15.89 43.96 1014.02 75.24 467.35
9.48 44.71 1019.12 66.43 478.42
14.64 45 1021.78 41.25 475.98
....................................
I am basically working on Python using Tensorflow Library.
As of now,I have a linear model,which is working fine for 4 inputs and 1 output.This is basically a regression problem.
For e.g: After training my neural network with sufficient data(say if the size of data is some 10000), then while training my neural network,if I am passing the values 45,30,25,32,as inputs , it is returning the value 46 as Output.
I basically have two queries:
As of now, in my code, I am using the parameters
training_epochs , learning_rate etc. I am as of now giving the
value of training_epochs as 10000.So, when I am testing my neural
network by passing four input values, I am getting the output as
some 471.25, while I expect it to be 460.But if I am giving the
value of training_epochs as 20000, instead of 10000, I am getting
my output value as 120.5, which is not at all close when compared to
the actual value "460".
Can you please explain, how can one chose the values of training_epochs and learning_rate(or any other parameter values) in my code, so that I can get good accuracy.
Now, the second issue is, my neural network as of now is working
only for linear data as well as only for 1 output. If I want to have
3 inputs and 2 outputs and also a non-linear model, what are the
possible changes I can make in my code?
I am posting my code below:
import tensorflow as tf
import numpy as np
import pandas as pd
#import matplotlib.pyplot as plt
rng = np.random
# In[180]:
# Parameters
learning_rate = 0.01
training_epochs = 10000
display_step = 1000
# In[171]:
# Read data from CSV
df = pd.read_csv("H:\MiniThessis\Sample.csv")
# In[173]:
# Seperating out dependent & independent variable
train_x = df[['AT','V','AP','RH']]
train_y = df[['PE']]
trainx = train_x.as_matrix().astype(np.float32)
trainy = train_y.as_matrix().astype(np.float32)
# In[174]:
n_input = 4
n_classes = 1
n_hidden_1 = 5
n_samples = 9569
# tf Graph Input
#Inserts a placeholder for a tensor that will be always fed.
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Set model weights
W_h1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
W_out = tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
b_h1 = tf.Variable(tf.random_normal([n_hidden_1]))
b_out = tf.Variable(tf.random_normal([n_classes]))
# In[175]:
# Construct a linear model
layer_1 = tf.matmul(x, W_h1) + b_h1
layer_1 = tf.nn.relu(layer_1)
out_layer = tf.matmul(layer_1, W_out) + b_out
# In[176]:
# Mean squared error
cost = tf.reduce_sum(tf.pow(out_layer-y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
# In[177]:
# Initializing the variables
init = tf.global_variables_initializer()
# In[181]:
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Fit all training data
for epoch in range(training_epochs):
_, c = sess.run([optimizer, cost], feed_dict={x: trainx,y: trainy})
# Display logs per epoch step
if (epoch+1) % display_step == 0:
print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict={x: trainx,y: trainy})
print(training_cost)
correct_prediction = tf.equal(tf.argmax(out_layer, 1), tf.argmax(y, 1))
best = sess.run([out_layer], feed_dict=
{x:np.array([[14.96,41.76,1024.07,73.17]])})
print(correct_prediction)
print(best)
1.you can adjust these following lines;
# In general baises are either initialized as zeros or not zero constant, but not Gaussian
b_h1 = tf.Variable(tf.zeros([n_hidden_1]))
b_out = tf.Variable(tf.zeros([n_classes]))
# MSE error
cost = tf.reduce_mean(tf.pow(out_layer-y, 2))/(2*n_samples)
Also, Feed the data as mini batches; as the optimizer you are using is tuned for minibatch optimization; feeding the data as a whole doesn't result in optimal performance.
2.
for multiple ouputs you need to change only the n_classes and the cost fucntion (tf.nn.softmax_cross_entropy_with_logits). Also the model you defined here isn't linear; as you are using the non linear activation function tf.nn.relu.
Above picture is the distributions of a binary output from a Neural Network that I wrote. (Y-Axis is percentage of Yes output, X-Axis is iteration)
I trained a NN on a binary classification dataset that has labels 0-No, 1-Yes
However I try to change the parameters of the NN (tried different layer count, layer size, activation function) the NN output oscillates from outputting all Zeros or all Ones.
I am using tensorflow to build the NN with this as the cost function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))
I am 90% sure that the NN is sound in term of architecture as it is the same one I used for the MNIST dataset. Also have tried many different datasets and it seems to be able to classify datasets well.
The training dataset contains 58% of No labels and 42% of Yes labels.
I have a suspicion that the feature inputted into the NN is uninformative to the NN to make a decision hence why defaulting to one answer is better for it.
I would like to know what kind of Cost function would better penalize the current behavior of the network?
Code:
def neural_network_model(data):
w1 = tf.Variable(tf.random_normal([n_input, n_nodes_hl1]), name="w1", collections=["weights", tf.GraphKeys.GLOBAL_VARIABLES])
b1 = tf.Variable(tf.random_normal([n_nodes_hl1]), name="b1", collections=["bias", tf.GraphKeys.GLOBAL_VARIABLES])
w2 = tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2]), name="w2", collections=["weights", tf.GraphKeys.GLOBAL_VARIABLES])
b2 = tf.Variable(tf.random_normal([n_nodes_hl2]), name="b2", collections=["bias", tf.GraphKeys.GLOBAL_VARIABLES])
wout = tf.Variable(tf.random_normal([n_nodes_hl2, n_output]), name="wout", collections=["weights", tf.GraphKeys.GLOBAL_VARIABLES])
bout = tf.Variable(tf.random_normal([n_output]), name="bout", collections=["bias", tf.GraphKeys.GLOBAL_VARIABLES])
l1 = tf.add(tf.matmul(data, w1), b1)
l1 = tf.nn.relu(l1)
l2 = tf.add(tf.matmul(l1, w2), b2)
l2 = tf.nn.relu(l2)
output = tf.matmul(l2, wout) + bout
return output
n_input = len(features[1, :])
n_nodes_hl1 = 2000
n_nodes_hl2 = 2000
n_output = 2
batch_size = 100
iterations = 10000000
x = tf.placeholder("float", name="x")
y = tf.placeholder("float", name="y")
prediction = neural_network_model(x)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))
optimizer = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for iteration in range(iterations):
iter_x, iter_y = nextbatch(features, label, batch_size, iteration)
_, c = sess.run([optimizer, cost], feed_dict={x: iter_x, y: iter_y})
if iteration % 500 == 0:
print("Iter:", iteration, "Cost:", c)
print("Count:", correct.eval({x: val_features, y:val_label})/len(val_label))
EDIT:
After shuffling the time-series input as suggested by #FlorentinHennecker, the oscillation seems to be reduced.
I hope you can help me. I'm implementing a small multilayer perceptron using TensorFlow and a few tutorials I found on the internet. The problem is that the net is able to learn something, and by this I mean that I am able to somehow optimize the value of the training error and get a decent accuracy, and that's what I was aiming for. However, I am recording with Tensorboard some strange NaN values for the loss function. Quite a lot actually. Here you can see my latest Tensorboard recording of the loss function output. Please all those triangles followed by discontinuities - those are the NaN values, note also that the general trend of the function is what you would expect it to be.
Tensorboard report
I thought that a high learning rate could be the problem, or maybe a net that's too deep, causing the gradients to explode, so I lowered the learning rate and used a single hidden layer (this is the configuration of the image above, and the code below). Nothing changed, I just caused the learning process to be slower.
Tensorflow Code
import tensorflow as tf
import numpy as np
import scipy.io, sys, time
from numpy import genfromtxt
from random import shuffle
#shuffles two related lists #TODO check that the two lists have same size
def shuffle_examples(examples, labels):
examples_shuffled = []
labels_shuffled = []
indexes = list(range(len(examples)))
shuffle(indexes)
for i in indexes:
examples_shuffled.append(examples[i])
labels_shuffled.append(labels[i])
examples_shuffled = np.asarray(examples_shuffled)
labels_shuffled = np.asarray(labels_shuffled)
return examples_shuffled, labels_shuffled
# Import and transform dataset
dataset = scipy.io.mmread(sys.argv[1])
dataset = dataset.astype(np.float32)
all_labels = genfromtxt('oh_labels.csv', delimiter=',')
num_examples = all_labels.shape[0]
dataset, all_labels = shuffle_examples(dataset, all_labels)
# Split dataset into training (66%) and test (33%) set
training_set_size = 2000
training_set = dataset[0:training_set_size]
training_labels = all_labels[0:training_set_size]
test_set = dataset[training_set_size:num_examples]
test_labels = all_labels[training_set_size:num_examples]
test_set, test_labels = shuffle_examples(test_set, test_labels)
# Parameters
learning_rate = 0.0001
training_epochs = 150
mini_batch_size = 100
total_batch = int(num_examples/mini_batch_size)
# Network Parameters
n_hidden_1 = 50 # 1st hidden layer of neurons
#n_hidden_2 = 16 # 2nd hidden layer of neurons
n_input = int(sys.argv[2]) # number of features after LSA
n_classes = 2;
# Tensorflow Graph input
with tf.name_scope("input"):
x = tf.placeholder(np.float32, shape=[None, n_input], name="x-data")
y = tf.placeholder(np.float32, shape=[None, n_classes], name="y-labels")
print("Creating model.")
# Create model
def multilayer_perceptron(x, weights, biases):
with tf.name_scope("h_layer_1"):
# First hidden layer with SIGMOID activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.sigmoid(layer_1)
#with tf.name_scope("h_layer_2"):
# Second hidden layer with SIGMOID activation
#layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
#layer_2 = tf.nn.sigmoid(layer_2)
with tf.name_scope("out_layer"):
# Output layer with SIGMOID activation
out_layer = tf.add(tf.matmul(layer_1, weights['out']), biases['bout'])
out_layer = tf.nn.sigmoid(out_layer)
return out_layer
# Layer weights
with tf.name_scope("weights"):
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.01, dtype=np.float32)),
#'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.05, dtype=np.float32)),
'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes], stddev=0.01, dtype=np.float32))
}
# Layer biases
with tf.name_scope("biases"):
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1], dtype=np.float32)),
#'b2': tf.Variable(tf.random_normal([n_hidden_2], dtype=np.float32)),
'bout': tf.Variable(tf.random_normal([n_classes], dtype=np.float32))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
with tf.name_scope("loss"):
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
with tf.name_scope("adam"):
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.initialize_all_variables()
# Define summaries
tf.scalar_summary("loss", cost)
summary_op = tf.merge_all_summaries()
print("Model ready.")
# Launch the graph
with tf.Session() as sess:
sess.run(init)
board_path = sys.argv[3]+time.strftime("%Y%m%d%H%M%S")+"/"
writer = tf.train.SummaryWriter(board_path, graph=tf.get_default_graph())
print("Starting Training.")
for epoch in range(training_epochs):
training_set, training_labels = shuffle_examples(training_set, training_labels)
for i in range(total_batch):
# example loading
minibatch_x = training_set[i*mini_batch_size:(i+1)*mini_batch_size]
minibatch_y = training_labels[i*mini_batch_size:(i+1)*mini_batch_size]
# Run optimization op (backprop) and cost op
_, summary = sess.run([optimizer, summary_op], feed_dict={x: minibatch_x, y: minibatch_y})
# Write log
writer.add_summary(summary, epoch*total_batch+i)
print("Optimization Finished!")
# Test model
test_error = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
accuracy = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(accuracy, np.float32))
test_error, accuracy = sess.run([test_error, accuracy], feed_dict={x: test_set, y: test_labels})
print("Test Error: " + test_error.__str__() + "; Accuracy: " + accuracy.__str__())
print("Tensorboard path: " + board_path)
I'll post the solution here just in case someone gets stuck in a similar way. If you see that plot very carefully, all of the NaN values (the triangles) come on a regular basis, like if at the end of every loop something causes the output of the loss function to just go NaN.
The problem is that, at every loop, I was giving a mini batch of "empty" examples. The problem lies in how I declared my inner training loop:
for i in range(total_batch):
Now what we'd like here is to have Tensorflow go through the entire training set, one minibatch at a time. So let's look at how total_batch was declared:
total_batch = int(num_examples / mini_batch_size)
That is not quite what we'd want to do - as we want to consider the training set only. So changing this line to:
total_batch = int(training_set_size / mini_batch_size)
Fixed the problem.
It is to be noted that Tensorflow seemed to ignore those "empty" batches, computing NaN for the loss but not updating the gradients - that's why the trend of the loss was one of a net that's learning something.