Tensorflow: how does it train the model? - python

When working with Tensorflow, the first step is to build a dataflow graph and then use a session to run it. In my practice so far, such as the MNIST tutorial, the loss function and the optimizer are defined first, with the following code (the MLP model is defined before that):
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])) #define cross entropy error function
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean') #define loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate) #define optimizer
global_step = tf.Variable(0, name='global_step', trainable=False) #global step counter (not trained)
train_op = optimizer.minimize(loss, global_step=global_step) #train operation in the graph
The training process:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
That's how Tensorflow does the training in this case. But my question is: how does Tensorflow know which weights it needs to train and update? In the training code we only pass the output y to cross_entropy, but we never pass any information about the model structure directly to the optimizer or the loss. In addition, we use a dictionary to feed batch data to train_step, but train_step doesn't use that data directly. How does Tensorflow know where to use this data as input?
My guess is that all those variables and constants are stored as tensors, and operations such as tf.matmul() are "subclasses" of Tensorflow's operation class (I haven't checked the code yet). There might be some mechanism by which Tensorflow recognises the relations among tensors (tf.Variable(), tf.constant()) and operations (tf.mul(), tf.div(), ...), perhaps by checking the superclass of each tf.xxx() call to find out whether it is a tensor or an operation. This assumption raises my second question: should I use Tensorflow's tf.xxx functions as much as possible to make sure Tensorflow can build a correct dataflow graph, even when they are sometimes more complicated than plain Python methods, or when some functions are better supported in Numpy than in Tensorflow?
My last question is: what is the link between Tensorflow and C++? I have heard that Tensorflow is faster than plain Python because it uses C or C++ as its backend. Is there some mechanism that translates Tensorflow Python code into C/C++?
I'd also be grateful if someone could share some debugging habits for coding with Tensorflow, since currently I just set up some terminals (Ubuntu) to test each part/function of my code.

You do pass information about your structure to Tensorflow when you define your loss with:
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
Notice that with Tensorflow you build a graph of operations, and every operation you use in your code is a node in the graph.
When you define your loss you are passing the operation stored in cross_entropy, which depends on y_ and y. y_ is a placeholder for your labels, whereas y is the result of y = tf.nn.softmax(tf.matmul(x, W) + b). See where I am going? The operation loss contains all the information it needs to build the model and process the input, because it depends on the operation cross_entropy, which depends on y_ and y, which in turn depend on the input x and the model weights W.
So when you call
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Tensorflow knows perfectly well which operations should be computed when you run train_step, and it knows exactly where in the operations graph to put the data you are passing through feed_dict.
As for how Tensorflow knows which variables should be trained, the answer is simple: it trains every tf.Variable() in the operations graph that is trainable. Notice how, when you define global_step, you set trainable=False because you don't want to compute gradients w.r.t. that variable.
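For illustration, here is a minimal sketch (TF 1.x API; the variable names are mine, not from the question) showing which variables the optimizer will pick up: minimize() computes gradients w.r.t. every variable in the trainable-variables collection unless you pass var_list explicitly.
import tensorflow as tf

W = tf.Variable(tf.zeros([784, 10]), name='W')                      # trainable by default
b = tf.Variable(tf.zeros([10]), name='b')                           # trainable by default
global_step = tf.Variable(0, name='global_step', trainable=False)   # excluded from training

print([v.name for v in tf.trainable_variables()])  # ['W:0', 'b:0']
print([v.name for v in tf.global_variables()])     # ['W:0', 'b:0', 'global_step:0']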

Related

Additional optimizer affects regularization loss

I'm working with an existing tensorflow model.
For one part of the network, I want to set a different learning rate than in the rest of the network. Let's say all_variables is made up of variables_1 and variables_2; I want to change the learning rate for the variables in variables_2.
The existing code for setting up the optimizer and computing and applying gradients looks basically like this:
optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
grads_and_vars = optimizer.compute_gradients(loss, all_variables)
grads_updates = optimizer.apply_gradients(grads_and_vars, global_step)
I already tried to create a second optimizer following this scheme. However, for debugging, I set both learning rates equal, and the decrease of the regularization loss was very different.
Isn't it possible to create a second optimizer, optimizer_new, and simply call apply_gradients on the respective grads_and_vars of variables_1 and variables_2? I.e. instead of having this line
grads_updates = optimizer.apply_gradients(grads_and_vars, global_step)
one could use
grads_updates = optimizer.apply_gradients(grads_and_vars['variables_1'], global_step)
grads_updates_new = optimizer_new.apply_gradients(grads_and_vars['variables_2'], global_step)
and finally, train_op = tf.group(grads_updates, grads_updates_new).
However, the regularization loss behaviour described above is still present.
I came across the cause through a comment in this post. In my case, it doesn't make sense to supply global_step twice for the global_step argument of apply_gradients. Since the learning_rate, and therefore the optimizer, depends on global_step, the training process, and especially the regularization loss behaviour, differs. Thanks to y.selivonchyk for pointing this out.
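For reference, a hedged sketch of the two-optimizer scheme with that fix applied (variables_1, variables_2, loss and global_step are the objects from the question; the two learning-rate names are mine). The key point is that global_step is passed to only one apply_gradients call, so it is incremented once per step:
optimizer_1 = tf.train.MomentumOptimizer(learning_rate_1, 0.9)
optimizer_2 = tf.train.MomentumOptimizer(learning_rate_2, 0.9)

grads_and_vars_1 = optimizer_1.compute_gradients(loss, variables_1)
grads_and_vars_2 = optimizer_2.compute_gradients(loss, variables_2)

update_1 = optimizer_1.apply_gradients(grads_and_vars_1, global_step=global_step)
update_2 = optimizer_2.apply_gradients(grads_and_vars_2)  # no global_step here

train_op = tf.group(update_1, update_2)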

When does Tensorflow update weights and biases?

When does tensorflow update weights and biases in the for loop?
Below is the code from TF's GitHub (mnist_softmax.py):
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
When does tensorflow update weights and biases?
Does it update them when running sess.run()? If so, does that mean that in this program TF updates the weights and biases 1000 times?
Or does it update them after finishing the whole for loop?
If 2. is correct, my next question is: does TF update the model using different training data every time (since it uses next_batch(100))? There are 1000*100 training data points in total, but each data point is considered only once. Am I correct or did I misunderstand something?
If 3. is correct, isn't it weird that the model has been trained after just one update step?
I think I must be misunderstanding something. It would be really great if anyone could give me a hint or point me to some material.
It updates weights every time you run the train_step.
Yes, it is updating the weights 1000 times in this program.
See above
Yes, you are correct, it loads a mini-batch containing 100 points at once and uses it to compute gradients.
It's not weird at all. You don't necessarily need to see the same data again and again; all that is required is that you have enough data for the network to converge. You can iterate multiple times over the same data if you want, but since this model doesn't have many parameters, it converges in a single epoch.
Tensorflow works by creating a graph of the computations that are required for computing the output of a network. Each basic operation, like matrix multiplication, addition, anything you can think of, is a node in this computation graph. In the tensorflow mnist example that you are following, lines 40-46 define the network architecture:
x: placeholder
y_: placeholder
W: Variable - This is learnt during training
b: Variable - This is also learnt during training
The network represents a simple softmax regression model where the logits are computed as y = W*x + b (see line 43).
Next, you configure the training procedure for your network. This code uses cross-entropy as the loss function to minimize (see line 57). The minimization is done using the gradient descent algorithm (see line 59).
At this point, your network is fully constructed. Now you need to run these nodes so that actual computation is performed (no computation has been performed up to this point).
In the loop where sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) is executed, TF computes the value of train_step, which causes the GradientDescentOptimizer to try to minimize the cross_entropy, and this is how training progresses.
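For context, a condensed sketch (TF 1.x) of the graph this answer walks through, following the mnist_softmax.py example it references:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # input images
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels
W = tf.Variable(tf.zeros([784, 10]))          # learnt during training
b = tf.Variable(tf.zeros([10]))               # learnt during training
y = tf.matmul(x, W) + b                       # logits

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
# Each sess.run(train_step, ...) applies one gradient-descent update to W and b.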

Modify learning rate in imported Tensorflow graph

I have created a graph with an AdamOptimizer, which I have then saved with tf.train.Saver().save(session, "model_name")
After training it for a while I am able to import the whole graph and the variables in a different session and resume training with
saver = tf.train.import_meta_graph("model_name")
saver.restore(session, "model_name")
What I would like to do is, after importing the graph+variables and before resuming the optimization, to change the learning_rate of the AdamOptimizer. Is that possible?
EDIT: One way of doing this would be to define the learning rate as a placeholder and feed a different value every time. But let's assume the graph has already been saved without doing this for the sake of argument.
I think you can replace learning_rate with a placeholder, i.e.:
learning_rate = tf.placeholder(tf.float32, shape=(), name="learning_rate")
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(your_loss_tensor, name="train_op")
When you have restored your graph, get all the ops and tensors related to training, like train_op and learning_rate, using
train_op = graph.get_operation_by_name("train_op")
learning_rate = graph.get_tensor_by_name("learning_rate:0")
and run training:
sess.run(train_op, feed_dict={learning_rate: whatever_you_want})
UPDATE:
see this if you want to change some input of your saved graph
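Putting it together, a minimal sketch of resuming training with a new learning rate (this assumes the graph was saved with the names used above; the placeholder names "x:0" and "y_:0" and the batch variables are hypothetical and depend on your model):
import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph("model_name.meta")  # path to the saved .meta file
    saver.restore(sess, "model_name")
    graph = tf.get_default_graph()

    train_op = graph.get_operation_by_name("train_op")
    learning_rate = graph.get_tensor_by_name("learning_rate:0")
    x = graph.get_tensor_by_name("x:0")    # hypothetical input placeholder name
    y_ = graph.get_tensor_by_name("y_:0")  # hypothetical label placeholder name

    # batch_xs, batch_ys: your next training batch
    sess.run(train_op, feed_dict={x: batch_xs, y_: batch_ys, learning_rate: 1e-4})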

How can I access the weights of a recurrent cell in Tensorflow?

One way to improve stability in deep Q-learning tasks is to maintain a set of target weights for the network that update slowly and are used for calculating Q-value targets. As a result, at different times in the learning procedure, two different sets of weights are used in the forward pass. For normal DQN this is not difficult to implement, as the weights are tensorflow variables whose values can be overridden in a feed_dict, i.e.:
sess = tf.Session()
input = tf.placeholder(tf.float32, shape=[None, 5])
weights = tf.Variable(tf.random_normal(shape=[5, 4], stddev=0.1))
bias = tf.Variable(tf.constant(0.1, shape=[4]))
output = tf.matmul(input, weights) + bias
target = tf.placeholder(tf.float32, [None, 4])
loss = ...
...
#Here we explicitly set weights to be the slowly updated target weights
sess.run(output, feed_dict={input: states, weights: target_weights, bias: target_bias})
# Targets for the learning procedure are computed using this output.
....
#Now we run the learning procedure, using the most up to date weights,
#as well as the previously computed targets
sess.run(loss, feed_dict={input: states, target: targets})
I'd like to use this target network technique in a recurrent version of DQN, but I don't know how to access and set the weights used inside a recurrent cell. Specifically I'm using a tf.nn.rnn_cell.BasicLSTMCell, but I'd like to know how to do this for any type of recurrent cell.
The BasicLSTMCell does not expose its variables as part of its public API. I recommend that you look up what names these variables have in your graph and feed those names (they are unlikely to change, since they are in the checkpoints and changing them would break checkpoint compatibility).
Alternatively, you can make a copy of BasicLSTMCell which does expose the variables. This is the cleanest approach, I think.
You can use the line below to get the trainable variables in the graph:
variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
Then you can inspect these variables to see how they are changing
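For example, a hedged sketch (TF 1.x) of the target-network update for a recurrent cell: list the trainable variables created under each variable scope and copy the online weights into the target weights with tf.assign. The scope names "online/rnn" and "target/rnn" are illustrative and depend on how you built your graph.
online_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="online/rnn")
target_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="target/rnn")

# One op that overwrites the target LSTM weights with the online LSTM weights.
update_target = tf.group(*[tf.assign(t, o) for t, o in zip(target_vars, online_vars)])

# sess.run(update_target)  # run this every N training steps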

Keras Custom Objective requires Tensor Evaluation

I want to create a custom objective function for training a Keras deep net. I'm researching classification of imbalanced data, and I use the F1 score a lot in scikit-learn. I therefore had the idea of inverting the F1 metric (1 - F1 score) to use it as a loss function/objective for Keras to minimise while training:
(from sklearn.metrics import f1_score)
def F1Loss(y_true, y_pred):
    return 1. - f1_score(y_true, y_pred)
However, this f1_score method from scikit-learn requires numpy arrays or lists to calculate the F1 score. I found that Tensors need to be evaluated to their numpy array counterparts using .eval(), which requires a TensorFlow session to perform this task.
I do not know the session object that Keras uses. I have tried using the code below, assuming the Keras backend has its own session object defined somewhere, but this also did not work.
from keras import backend as K
K.eval(y_true)
Admittedly, this was a shot in the dark since I don't really understand the deeper workings of Keras or Tensorflow at the moment.
My question is: how do I evaluate the y_true and y_pred tensors to their numpy array counterparts?
Your problem is a classic problem of implementing a discontinuous objective in Theano. It's impossible for two reasons:
The F1-score is discontinuous: here you can read what should be expected from an objective function in neural network training. The F1-score doesn't satisfy these conditions, so it cannot be used directly to train a neural network.
There is no equivalence between a Tensor and a Numpy array: it's a fundamental issue. A Theano tensor is like x in school equations: you cannot expect an algebraic variable to be equivalent to any object that may later be assigned to it. On the other hand, as part of a computational graph, the objective has to be expressed in terms of tensor operations. If it isn't, you cannot differentiate it w.r.t. the parameters, which makes the usual way of training a neural network impossible.
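If you still want an F1-like training signal, a common workaround (not from the answer above, just a hedged sketch) is a differentiable "soft" F1 surrogate built from Keras backend ops, so it stays inside the computation graph:
from keras import backend as K

def soft_f1_loss(y_true, y_pred):
    # Treat predicted probabilities as soft counts instead of hard 0/1 labels.
    tp = K.sum(y_true * y_pred, axis=0)
    fp = K.sum((1. - y_true) * y_pred, axis=0)
    fn = K.sum(y_true * (1. - y_pred), axis=0)
    soft_f1 = 2. * tp / (2. * tp + fp + fn + K.epsilon())  # per-class soft F1
    return 1. - K.mean(soft_f1)                            # minimise 1 - mean soft F1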
If you have predicted and actual tensors in numpy array format then I guess that you can use this code snippet:
correct_prediction = tf.equal(tf.argmax(actual_tensor,1), tf.argmax(predicted_tensor,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
And in keras, I think that you can use this:
model.fit_generator(train_generator, validation_data=val_generator,
                    nb_val_samples=X_val.shape[0],
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=nb_epoch, verbose=1,
                    callbacks=[model_checkpoint, reduce_lr, tb], max_q_size=1000)
where train_generator and val_generator generate the training and validation data, and this also prints the loss and accuracy while training.
Hope this helps...
