TensorFlow variable initialization - Python

rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
state = rnn_cell.zero_state(batch_size, tf.float32)
init = tf.global_variables_initializer()
sess = tf.Session()
for i in range(len(x_data)):
    x = process_x(x_data[i])[:std_size]
    y = word[i][:std_size]
    x_split = tf.split(0, time_step_size, x)
    outputs, state = tf.nn.rnn(rnn_cell, x_split, state)
    prediction = tf.reshape(tf.concat(1, outputs), [-1, rnn_size])
    real = tf.reshape(y, [-1])
    ratio = tf.ones([time_step_size * batch_size])
    loss = tf.nn.seq2seq.sequence_loss_by_example([prediction], [real], [ratio])
    cost = tf.reduce_mean(loss)/batch_size
    train = tf.train.AdamOptimizer(0.01).minimize(cost)
    tf.global_variables_initializer().run(session=sess)
    step = 0
    print state
    while step < 1000:
        sess.run(train)
        step += 1
    result = sess.run(tf.arg_max(prediction, 1))
    print result, [t for t in result] == y
    tf.get_variable_scope().reuse_variables()
If the source code is like the above, are rnn_cell and state initialized at every step of the for loop?
If I want to use state in another training case, then I have to reuse it. So rnn_cell and state should be initialized only at the beginning, not after that.
I can't imagine how this code works.

I think the problem is that you have to separate your computational graph construction from the session-running part. What you are doing now is not how TensorFlow usually works. Maybe try this:
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
state = rnn_cell.zero_state(batch_size, tf.float32)
x_split = tf.split(0, time_step_size, x)
outputs, state = tf.nn.rnn(rnn_cell, x_split, state)
prediction = tf.reshape(tf.concat(1, outputs), [-1, rnn_size])
real = tf.reshape(y, [-1])
ratio = tf.ones([time_step_size * batch_size])
loss = tf.nn.seq2seq.sequence_loss_by_example([prediction], [real], [ratio])
cost = tf.reduce_mean(loss)/batch_size
train = tf.train.AdamOptimizer(0.01).minimize(cost)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(len(x_data)):
    x = process_x(x_data[i])[:std_size]
    y = word[i][:std_size]
    step = 0
    while step < 1000:
        sess.run(train, feed_dict={x_split: x, real: y})
        step += 1
    result = sess.run(tf.arg_max(prediction, 1))
    print result, [t for t in result] == y
Your code may have some design problems, but the point is to separate your graph construction from your training.
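To make that separation fully explicit, here is a minimal sketch of the same idea (a hedged illustration, not the exact fix: it keeps the old TF 1.x RNN API from the question, and the placeholder names x_ph/y_ph and their shapes, including input_dim, are assumptions, since the question does not show them). The graph, including placeholders, is built once; the loops only feed data:

import tensorflow as tf

# ----- graph construction: done exactly once -----
x_ph = tf.placeholder(tf.float32, [time_step_size * batch_size, input_dim])  # assumed shape
y_ph = tf.placeholder(tf.int32, [time_step_size * batch_size])               # assumed shape

rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
state = rnn_cell.zero_state(batch_size, tf.float32)
x_split = tf.split(0, time_step_size, x_ph)
outputs, state = tf.nn.rnn(rnn_cell, x_split, state)

prediction = tf.reshape(tf.concat(1, outputs), [-1, rnn_size])
ratio = tf.ones([time_step_size * batch_size])
loss = tf.nn.seq2seq.sequence_loss_by_example([prediction], [y_ph], [ratio])
cost = tf.reduce_mean(loss) / batch_size
train = tf.train.AdamOptimizer(0.01).minimize(cost)

# ----- session: only runs the fixed graph, feeding new data each time -----
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(len(x_data)):
    x_val = process_x(x_data[i])[:std_size]
    y_val = word[i][:std_size]
    for step in range(1000):
        sess.run(train, feed_dict={x_ph: x_val, y_ph: y_val})

Because nothing new is added to the graph inside the loops, rnn_cell and state are created only once, and no reuse_variables() call is needed.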

Related

How to initialize the session of tensorflow in python and guarantee the reproducibility of the NN model? (Same number of layers but different results)

I compared the R2 score for different numbers of hidden layers and hidden units using a for loop, and selected the number of layers and units with a high score and an acceptable convergence time.
However, recalculating with the selected layers and units yields different R2 scores.
Even with the number of layers and units fixed, just rerunning the loop gives different R2 scores, as shown below.
[same number of layers and units, but different results][1]
I have been thinking of two possible reasons: first, that the session is not initialized properly inside the for loop, and second, that reproducibility of the NN is not guaranteed.
I searched other articles for solutions to both, but I'm asking because I still couldn't find the answer. Thank you in advance for your help.
The main code is below. To eliminate randomness in the data split, scikit-learn was used with a fixed random_state.
    n_layer = i+1
    x_con[i] = n_layer
    for j in range(m_neuron):
        n_neuron = 2**(j+1)
        y_con[j] = n_neuron
        print('n_layer: ', n_layer, 'n_neuron:', n_neuron)

        # Launch the graph in a session.
        sess = tf.Session()
        tf.set_random_seed(777)  # for reproducibility

        # Create model and solver
        m1 = FCNN(str(i)+str(j), n_feature, n_output, n_layer, n_neuron, learning_rate, use_batchnorm=True)
        m1_solver = Solver(sess, m1)

        # Initializes global variables in the graph
        init = tf.global_variables_initializer()
        sess.run(init)

        cost_val_old = np.full((n_output), 0.)
        for step in range(n_epoch):
            cost_val, y_train_predict, _ = m1_solver.train(x_train_scaled, y_train_scaled)
            diff_tmp = m1_solver.convergence_criterion(cost_val, cost_val_old)
            cost_val_old = cost_val
            if (step % n_print == 0 and step > 0) or diff_tmp <= tol:
                print("{0} Cost: {1} Diff: {2:.10f}".format(step, cost_val, diff_tmp))
            if diff_tmp <= tol:
                cost_train[j,i,:] = cost_val[0:3]
                iter_train[j,i] = step
                break

        y_valid_predict = np.squeeze(np.array(m1_solver.predict(x_valid_scaled)), axis=0)
        y_test_predict = np.squeeze(np.array(m1_solver.predict(x_test_scaled)), axis=0)

        # Evaluate r2 score
        for k in range(n_output):
            r2_train_tmp = m1_solver.evaluate_r2(y_train_scaled[:,i], y_train_predict[:,i])
            r2_valid_tmp = m1_solver.evaluate_r2(y_valid_scaled[:,i], y_valid_predict[:,i])
            r2_test_tmp = m1_solver.evaluate_r2(y_test_scaled[:,i], y_test_predict[:,i])
            r2_train[j,i,k] = r2_train_tmp[0]
            r2_valid[j,i,k] = r2_valid_tmp[0]
            r2_test [j,i,k] = r2_test_tmp[0]

        # Close session
        sess.close()
The class of the model is also below. This class is mainly based on https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-10-6-mnist_nn_batchnorm.ipynb.
def __init__(self, name, n_feature, n_output, n_layer, n_neuron, lr, use_batchnorm=True):
    with tf.variable_scope(name):
        self.x = tf.placeholder(tf.float32, shape=[None, n_feature], name='x')
        self.y = tf.placeholder(tf.float32, shape=[None, n_output], name='y')
        self.mode = tf.placeholder(tf.bool, name='train_mode')
        self.y_target = tf.placeholder(tf.float32, shape=[None])
        self.y_prediction = tf.placeholder(tf.float32, shape=[None])
        self.cost_new = tf.placeholder(tf.float32, shape=[n_output])
        self.cost_old = tf.placeholder(tf.float32, shape=[n_output])

        # Loop over hidden layers
        net = self.x
        hidden_dims = np.full((n_layer), n_neuron)
        for i, h_dim in enumerate(hidden_dims):
            with tf.variable_scope('layer{}'.format(i)):
                net = tf.layers.dense(net, h_dim)
                if use_batchnorm:
                    net = tf.layers.batch_normalization(net, training=self.mode)
                net = tf.nn.relu(net)

        # Attach fully connected layers
        net = tf.contrib.layers.flatten(net)
        self.hypothesis = tf.layers.dense(net, n_output)
        self.cost = tf.reduce_mean(tf.square(self.hypothesis - self.y), axis=0, name='cost')

        # When using the batchnormalization layers,
        # it is necessary to manually add the update operations
        # because the moving averages are not included in the graph
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope=name)
        with tf.control_dependencies(update_ops):
            optimizer = tf.train.AdamOptimizer(learning_rate=lr)
            self.train_op = optimizer.minimize(self.cost)

        # convergence criterion
        self.diff = tf.sqrt(tf.reduce_sum(tf.square(self.cost_new - self.cost_old)))

        # R2 score
        total_error = tf.reduce_sum(tf.square(self.y_target - tf.reduce_mean(self.y_target)))
        unexplained_error = tf.reduce_sum(tf.square(self.y_target - self.y_prediction))
        self.acc_R2 = 1. - unexplained_error/total_error
[1]: https://i.stack.imgur.com/Fo42X.png
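One pattern that is often suggested for this kind of loop (a hedged sketch, reusing the names FCNN, Solver, m_neuron, n_feature, etc. from the question): reset the default graph and set the graph-level seed inside every iteration, before any variables are created. Otherwise each iteration keeps adding ops to the same default graph, so the op-level seeds derived from the graph seed differ from run to run even with the same tf.set_random_seed call:

import numpy as np
import tensorflow as tf

np.random.seed(777)                      # NumPy-side randomness

for j in range(m_neuron):
    tf.reset_default_graph()             # start every iteration from an empty graph
    tf.set_random_seed(777)              # graph-level seed, set BEFORE building the model
    with tf.Session() as sess:
        m1 = FCNN(str(i) + str(j), n_feature, n_output, n_layer, n_neuron,
                  learning_rate, use_batchnorm=True)
        m1_solver = Solver(sess, m1)
        sess.run(tf.global_variables_initializer())
        # ... training and evaluation loop as in the question ...

With the graph rebuilt from scratch and seeded identically each time, repeated runs with the same layer/unit configuration should produce the same initial weights.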

TensorFlow TypeError when trying to iterate over Tensors in a loop

I have the following scenario:
y = tf.placeholder(tf.float32, [None, 1],name="output")
layers = [tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.leaky_relu, name="layer"+str(layer))
          for layer in range(2)]
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, 100])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, 1)
outputs = tf.reshape(stacked_outputs, [-1, 2, 1])
outputs = tf.identity(outputs[:,1,:], name="prediction")
loss = Custom_loss(y,outputs)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss,name="training_op")
The custom loss function I tried is:
def Custom_loss(y, outputs):
    hold_loss = []
    for exp, pred in zip(y, outputs):
        if exp >= pred:
            result = tf.pow(pred * 0.5, 2) - exp
            hold_loss.append(result)
        else:
            hold_loss.append(tf.subtract(pred, exp))
    return tf.reduce_mean(hold_loss)
Now when I am trying to implement this I am getting the following error:
TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.
I have tried implementing tf.map_fn(), but I encounter the same error. I have used the following question:
How to explain the result of tf.map_fn?
Kindly help me get through this issue. How can I iterate over the tensor? What is the best way to implement the custom loss function?
def Custom_loss(y, outputs):
    mask = tf.greater_equal(y, outputs)
    a = tf.pow(tf.boolean_mask(outputs, mask)*0.5, 2) - tf.boolean_mask(y, mask)
    inv_mask = tf.logical_not(mask)
    b = tf.boolean_mask(outputs, inv_mask) - tf.boolean_mask(y, inv_mask)
    return tf.reduce_mean(tf.concat([a, b], axis=-1))
Test case
def Custom_loss_np(y, outputs):
    hold_loss = []
    for exp, pred in zip(y, outputs):
        if exp >= pred:
            result = pow(pred * 0.5, 2) - exp
            hold_loss.append(result)
        else:
            hold_loss.append(pred - exp)
    return np.mean(hold_loss)

np_x = np.random.randn(100)
np_y = np.random.randn(100)

x = tf.constant(np_x)
y = tf.constant(np_y)

with tf.Session() as sess:
    assert sess.run(Custom_loss(x, y)) == Custom_loss_np(np_x, np_y)
Use tf.math if you are on the latest version of TensorFlow.
Example using the custom loss to train a simple linear regression model
X = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.ones([1, 1]))
b = tf.Variable(tf.ones([1, 1]))
y_ = tf.matmul(X, w) + b

loss = Custom_loss(y, y_)  # tf.reduce_mean(tf.square(y_ - y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
training_op = optimizer.minimize(loss, name="training_op")

# dummy data for linear regression
x_data = np.random.randn(100, 1)
y_labels = 1.5*x_data + 2.5 + np.random.randn(100, 1)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(5000):
    _, loss_ = sess.run([training_op, loss], feed_dict={X: x_data, y: y_labels})
    if (i+1) % 1000 == 0:
        print (loss_)
print (sess.run([w, b]))
The logic for calculating the loss is something the OP came up with.
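For completeness, an equivalent vectorized form can also be written with tf.where, which keeps the element order instead of concatenating the two masked pieces (a hedged sketch, not from the original answer; it assumes y and outputs have the same shape):

def Custom_loss_where(y, outputs):
    # elementwise: (0.5*pred)**2 - y where y >= pred, otherwise pred - y
    ge = tf.greater_equal(y, outputs)
    loss_if_ge = tf.pow(outputs * 0.5, 2) - y
    loss_if_lt = outputs - y
    return tf.reduce_mean(tf.where(ge, loss_if_ge, loss_if_lt))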

Variables not updated after training in TensorFlow even when initiated with uniform random for a simple logistic regression

I am learning TensorFlow by implementing a simple logistic regression classifier that outputs whether a digit is a 7 or not when fed an MNIST image. I am using stochastic gradient descent. The crux of the TensorFlow code is:
# Maximum number of epochs
MaxEpochs = 1
# Learning rate
eta = 1e-2

ops.reset_default_graph()

n_x = 784
n_y = 1

x_tf = tf.placeholder(tf.float32, shape = [n_x, 1], name = 'x_tf')
y_tf = tf.placeholder(tf.float32, shape = [n_y, 1], name = 'y_tf')
w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1], initializer = tf.initializers.random_uniform());
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1], initializer = tf.initializers.random_uniform());

z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')
Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = yPred_tf, labels = y_tf, name = 'Loss_tf')

with tf.name_scope('Training'):
    optimizer_tf = tf.train.GradientDescentOptimizer(learning_rate = eta)
    train_step = optimizer_tf.minimize(Loss_tf)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for Epoch in range(MaxEpochs):
        for Sample in range(len(XTrain)):
            x = XTrain[Sample]
            y = YTrain[Sample].reshape([-1,1])
            Train_sample = {x_tf: x, y_tf: y}
            sess.run(train_step, feed_dict = Train_sample)

toc = time.time()
print('\nElapsed time is: ', toc-tic,'s');
It builds the following graph (TensorBoard-related code has been removed for convenience):
The problem is that even though the weights and biases are initialised randomly (non-zero), the neuron isn't being trained. The weight histogram is as follows.
I didn't want to post something so trivial, but I am at my wit's end. Sorry for the long post. Thank you very much in advance for any guidance. As a little side note, it takes 93.35 s to run, while it only took 10 or so seconds when I did this with NumPy (same stochastic implementation); why would this be so?
EDIT:
The bias plot over the course of the training is as follows.
EDIT: The entire code, in case the issue stems from something outside what I previously thought.
import tensorflow as tf
import numpy as np
import h5py
from tensorflow.python.framework import ops
import time

mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()

def Flatten(Im):
    FlatImArray = Im.reshape([Im.shape[0],-1,1])
    return FlatImArray

DigitTested = 7

# Separating the images with 7s from the rest
TrainIdxs = [];
for i in range(len(y_train)):
    if(y_train[i] == DigitTested):
        TrainIdxs.append(i)

TestIdxs = [];
for i in range(len(y_test)):
    if(y_test[i] == DigitTested):
        TestIdxs.append(i)

# Preparing the Datasets for training and testing
XTrain = Flatten(x_train);
YTrain = np.zeros([len(x_train),1]);
YTrain[TrainIdxs] = 1;

XTest = Flatten(x_test);
YTest = np.zeros([len(x_test),1]);
YTest[TestIdxs] = 1;

tic = time.time()

# Maximum number of epochs
MaxEpochs = 1
# Learning rate
eta = 1e-2
# Number of Epochs after which the neuron is validated
ValidationInterval = 1

ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables

n_x = 784
n_y = 1

x_tf = tf.placeholder(tf.float32, shape = [n_x, 1], name = 'x_tf')
y_tf = tf.placeholder(tf.float32, shape = [n_y, 1], name = 'y_tf')
w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1], initializer = tf.initializers.random_uniform());
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1], initializer = tf.initializers.random_uniform());

z_tf = tf.add(tf.matmul(w_tf, x_tf, transpose_a = True), b_tf, name = 'z_tf')
yPred_tf = tf.sigmoid(z_tf, name = 'yPred_tf')
Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = yPred_tf, labels = y_tf, name = 'Loss_tf')

with tf.name_scope('Training'):
    optimizer_tf = tf.train.GradientDescentOptimizer(learning_rate = eta)
    train_step = optimizer_tf.minimize(Loss_tf)

writer = tf.summary.FileWriter(r"C:\Users\braja\Documents\TBSummaries\MNIST1NTF\2")
tf.summary.histogram('Weights', w_tf)
tf.summary.scalar('Loss', tf.reshape(Loss_tf, []))
tf.summary.scalar('Bias', tf.reshape(b_tf, []))
merged_summary = tf.summary.merge_all()

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for Epoch in range(MaxEpochs):
        for Sample in range(len(XTrain)):
            x = XTrain[Sample]
            y = YTrain[Sample].reshape([-1,1])
            Train_sample = {x_tf: x, y_tf: y}
            MergedSumm, _ = sess.run([merged_summary, train_step], feed_dict = Train_sample)
            writer.add_summary(summary = MergedSumm, global_step = Sample)
        if((Epoch+1) % ValidationInterval == 0):
            ValidationError = 0
            for Sample in range(len(XTest)):
                x = XTest[Sample]
                y = YTest[Sample].reshape([-1,1])
                Test_sample = {x_tf: x, y_tf: y}
                yPred = sess.run(yPred_tf, feed_dict = Test_sample)
                ValidationError += abs(yPred - YTest[Sample])
            print('Validation Error at', Epoch+1,'Epoch:', ValidationError);

writer.add_graph(tf.Session().graph)
writer.close()
toc = time.time()
print('\nElapsed time is: ', toc-tic,'s');
Looking at the bias value, it looks like you are seeing saturation of the sigmoid function.
This happens when you push your sigmoid input (z_tf) to the extreme ends of the sigmoid function. When that happens, the gradient returned is so low that training stagnates. The probable cause is that you have doubled up on sigmoid functions: sigmoid_cross_entropy_with_logits applies a sigmoid to its input, but you have already applied one yourself. Try removing one of these.
In addition, by default tf.initializers.random_uniform() produces random values between 0 and 1. You probably want to initialise your weights and biases symmetrically about 0 and with really small values to start with. This can be done by passing the minval and maxval arguments to tf.initializers.random_uniform().
They should then grow during training, and again this prevents sigmoid saturation.
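A minimal sketch of those two changes (hedged; the variable names are taken from the question, and the 0.01 bounds are an arbitrary choice for illustration):

# Pass the raw logits (z_tf) to the loss; the op applies the sigmoid internally.
# yPred_tf = tf.sigmoid(z_tf) can still be kept for making predictions.
Loss_tf = tf.nn.sigmoid_cross_entropy_with_logits(logits = z_tf, labels = y_tf, name = 'Loss_tf')

# Initialise weights and biases symmetrically about 0 with small values.
w_tf = tf.get_variable(name = "w_tf", shape = [n_x, 1],
                       initializer = tf.initializers.random_uniform(minval=-0.01, maxval=0.01))
b_tf = tf.get_variable(name = "b_tf", shape = [n_y, 1],
                       initializer = tf.initializers.random_uniform(minval=-0.01, maxval=0.01))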

TensorFlow - cannot recreate neural network

I am trying to recreate/rerun a previous neural network. The idea is that I save the weights and biases with which the network is initialised in run (1), and reuse those exact weights and biases for initialisation in run (2), so that the outcome of run (2) will be exactly that of run (1). The relevant code is below.
However, I can't manage to do this. If I rerun the network, setting use_stored_weights to True (which prompts neural_network_model to restore the previously saved weights and biases, and, I hope, initialise the network with them), I do indeed always get the same results - but these are not the results of the run I am trying to emulate (in fact, they are much worse). It is also strange that I ALWAYS get the same results - they thus seem to be independent of the stored weights and biases.
The first thing I checked is whether I am correctly restoring my weights and biases (in neural_network_model), and this is the case.
I'm not really sure what is going on here. I have two suspicions, both of which I don't know how to check:
Although the correct weights and biases are restored in neural_network_model, the network is in fact not initialised with them. Is it the command sess.run(init) in train_neural_network that somehow does this?
I am reinitialising the network with the stored weights and biases not once, but every time - this would explain the poor results.
... but it might as well be something completely different. Any help much appreciated!
UPDATE: I figured out what is happening: when use_stored_weights is set to True, the weights are always initialised as all zeros. As far as I can see, this is due to the sess.run(init) in train_neural_network. However, if I leave out that line, I get an error:
FailedPreconditionError: Attempting to use uninitialized value b2
So I guess my question is now: how can I make the restored weights and biases accessible in train_neural_network?
Imports:
import tensorflow as tf
import numpy as np
from numpy import genfromtxt
This function builds the neural network:
def neural_network_model(data, layer_sizes, use_stored_weights):
    num_layers = len(layer_sizes) - 1 # hidden and output layers
    weights = {}
    biases = {}

    # initialise the weights
    # (a) create new weights and biases
    if not use_stored_weights:
        for i in range(num_layers):
            w_name = 'W' + str(i+1)
            b_name = 'b' + str(i+1)
            weights[w_name] = tf.get_variable(w_name, [layer_sizes[i], layer_sizes[i+1]],
                                              initializer = tf.contrib.layers.xavier_initializer(), dtype=tf.float32)
            biases[b_name] = tf.get_variable(b_name, [layer_sizes[i+1]],
                                             initializer = tf.zeros_initializer(), dtype=tf.float32)
        # save weights and biases
        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            save_path = saver.save(sess, fold_path + 'weights/' + 'weights.ckpt')
    # (b) restore weights and biases
    else:
        for i in range(num_layers):
            # prepare variable
            w_name = 'W' + str(i+1)
            b_name = 'b' + str(i+1)
            weights[w_name] = tf.get_variable(w_name, [layer_sizes[i], layer_sizes[i+1]],
                                              initializer = tf.zeros_initializer(), dtype=tf.float32)
            biases[b_name] = tf.get_variable(b_name, [layer_sizes[i+1]],
                                             initializer = tf.zeros_initializer(), dtype=tf.float32)
        saver = tf.train.Saver()
        with tf.Session() as sess:
            saver.restore(sess, fold_path + 'weights/' + 'weights.ckpt')

    # calculate linear and relu outputs for hidden layers
    a_prev = data
    for i in range(len(weights)-1):
        z = tf.add(tf.matmul(a_prev, weights['W' + str(i+1)]), biases['b' + str(i+1)])
        a = tf.nn.relu(z)
        a_r = tf.nn.dropout(a, keep_prob)
        a_prev = a_r

    # calculate linear output for output layer
    z_o = tf.add(tf.matmul(a_prev, weights['W' + str(len(weights))]), biases['b' + str(len(weights))])
    return z_o
This function trains and evaluates the network:
def train_neural_network(x, layer_sizes, use_stored_weights):
    prediction = neural_network_model(x, layer_sizes, use_stored_weights)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=lrn_rate).minimize(cost)
    softm = tf.nn.softmax(prediction)
    pred_class = tf.argmax(softm)
    costs = []

    correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(epochs):
            epoch_loss = 0
            for _ in range(int(len(x_train)/batch_size)):
                epoch_x, epoch_y = x_train, y_train
                _, c = sess.run([optimizer, cost], feed_dict = {x: epoch_x, y: epoch_y, keep_prob: kp})
                epoch_loss += c

        softmaxes_tst = sess.run([softm, pred_class], feed_dict={x: x_test, keep_prob: 1.0})[0]
        incorr = 0
        for i in range(len(softmaxes_tst)):
            curr_sm = softmaxes_tst[i]
            curr_lbl = y_test[i]
            if np.argmax(curr_sm) != np.argmax(curr_lbl):
                incorr += 1
        print('incorr: ', incorr)
        num_ex = len(x_test)
        print('acc = ' + str(num_ex-incorr) + '/' + str(num_ex) + ' = ' + str((num_ex-incorr)/num_ex))
        print('accuracy train:', accuracy.eval({x: x_train, y: y_train, keep_prob: kp}))
        print('accuracy test :', accuracy.eval({x: x_test, y: y_test, keep_prob: 1.0}))
And this, finally, is the code that calls the above functions:
lrn_rate = 0.01
kp = 0.85
epochs = 400

path = '/my/path/to/data/'
fold_path = ''
num_folds = 19

for i in range(num_folds):
    tf.reset_default_graph()
    fold_path = path + 'fold_' + str(i+1) + '/'

    x_train = genfromtxt(fold_path + 'fv_train.csv', delimiter=',')
    y_train = genfromtxt(fold_path + 'lbl_train.csv', delimiter=',')
    x_test = genfromtxt(fold_path + 'fv_test.csv', delimiter=',')
    y_test = genfromtxt(fold_path + 'lbl_test.csv', delimiter=',')

    num_classes = len(y_train[0])
    num_features = len(x_train[0])
    batch_size = len(x_train)

    num_nodes_hl1 = num_features
    num_nodes_hl2 = num_features
    layer_sizes = [num_features, num_nodes_hl1, num_nodes_hl2, num_classes]

    x = tf.placeholder('float', [None, num_features])
    y = tf.placeholder('float')
    keep_prob = tf.placeholder('float')

    use_stored_weights = True
    train_neural_network(x, layer_sizes, use_stored_weights)
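One way to make the restored values usable in train_neural_network (a hedged sketch, not a verified fix): let neural_network_model only create the variables, with no sessions inside it, and do the initialise-or-restore step in the single session that is used for training. Creating the Saver before the optimizer means it covers only the model variables (W1, b1, ...), so restoring does not look for Adam's bookkeeping variables in the checkpoint, and restoring after sess.run(init) overwrites the zero initial values again in the session that actually trains:

def train_neural_network(x, layer_sizes, use_stored_weights):
    prediction = neural_network_model(x, layer_sizes, use_stored_weights)
    saver = tf.train.Saver()   # covers only the W*/b* variables created so far
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=lrn_rate).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())   # initialises everything, incl. Adam slots
        if use_stored_weights:
            # overwrite the freshly initialised model variables with the stored ones,
            # in the same session that is used for training
            saver.restore(sess, fold_path + 'weights/' + 'weights.ckpt')
        else:
            saver.save(sess, fold_path + 'weights/' + 'weights.ckpt')
        # ... training and evaluation loop as before ...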

LSTM won't overfit training data

I have been trying to use an LSTM for regression in TensorFlow, but it doesn't fit the data. I have successfully fit the same data in Keras (with the same size network). My code for trying to overfit a sine wave is below:
import tensorflow as tf
import numpy as np

yt = np.cos(np.linspace(0, 2*np.pi, 256))
xt = np.array([yt[i-50:i] for i in range(50, len(yt))])[..., None]
yt = yt[-xt.shape[0]:]

g = tf.Graph()
with g.as_default():
    x = tf.constant(xt, dtype=tf.float32)
    y = tf.constant(yt, dtype=tf.float32)

    lstm = tf.nn.rnn_cell.BasicLSTMCell(32)
    outputs, state = tf.nn.dynamic_rnn(lstm, x, dtype=tf.float32)
    pred = tf.layers.dense(outputs[:, -1], 1)
    loss = tf.reduce_mean(tf.square(pred-y))
    train_op = tf.train.AdamOptimizer().minimize(loss)
    init = tf.global_variables_initializer()

sess = tf.InteractiveSession(graph=g)
sess.run(init)

for i in range(200):
    _, l = sess.run([train_op, loss])
    print(l)
This results in a MSE of 0.436067 (while Keras got to 0.0022 after 50 epochs), and the predictions range from -0.1860 to -0.1798. What am I doing wrong here?
Edit:
When I change my loss function to the following, the model fits properly:
def pinball(y_true, y_pred):
    tau = np.arange(1, 100).reshape(1, -1)/100
    pin = tf.reduce_mean(tf.maximum(y_true[:, None] - y_pred, 0) * tau +
                         tf.maximum(y_pred - y_true[:, None], 0) * (1 - tau))
    return pin
I also change the assignments of pred and loss to
pred = tf.layers.dense(outputs[:,-1], 99)
loss = pinball(y, pred)
This results in a decrease of loss from 0.3 to 0.003 as it trains, and seems to properly fit the data.
Looks like a shape/broadcasting issue. Here's a working version:
import tensorflow as tf
import numpy as np

yt = np.cos(np.linspace(0, 2*np.pi, 256))
xt = np.array([yt[i-50:i] for i in range(50, len(yt))])
yt = yt[-xt.shape[0]:]

g = tf.Graph()
with g.as_default():
    x = tf.constant(xt, dtype=tf.float32)
    y = tf.constant(yt, dtype=tf.float32)

    lstm = tf.nn.rnn_cell.BasicLSTMCell(32)
    outputs, state = tf.nn.dynamic_rnn(lstm, x[None, ...], dtype=tf.float32)
    pred = tf.squeeze(tf.layers.dense(outputs, 1), axis=[0, 2])
    loss = tf.reduce_mean(tf.square(pred-y))
    train_op = tf.train.AdamOptimizer().minimize(loss)
    init = tf.global_variables_initializer()

sess = tf.InteractiveSession(graph=g)
sess.run(init)

for i in range(200):
    _, l = sess.run([train_op, loss])
    print(l)
x gets a batch dimension of 1 before going into dynamic_rnn, since with time_major=False the first dimension is expected to be a batch dimension. It's important that the last dimension of the output of tf.layers.dense gets squeezed off so that it doesn't broadcast with y (TensorShape([256, 1]) and TensorShape([256]) broadcast to TensorShape([256, 256])). With those fixes it converges:
5.78507e-05
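To see the broadcasting pitfall in isolation, here is a tiny illustrative example (hypothetical shapes chosen to match the question):

import numpy as np
a = np.zeros((256, 1))    # like the un-squeezed dense output
b = np.zeros((256,))      # like y
print((a - b).shape)      # (256, 256): every prediction is differenced against every target

Squaring and averaging that 256x256 matrix gives a loss that compares each prediction with every target rather than with its own, which is why the unsqueezed version trains so poorly.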
You are not passing on the state from one call of dynamic_rnn to the next. That's the problem for sure.
Also, why take only the last item of the output through the dense layer and onward?
