How to enable gradient flow in scatter_update? - python

I am trying to compute the local variance map of an image inside a training loop by taking data from every possible fixed-size window (e.g. 5x5). To vectorize this, I am thinking about expanding the original image with an operation similar to this, using scatter_update/scatter_nd_update inside the training loop. Essentially, this operation maps each element of the original tensor to potentially many locations in the new tensor, and those locations are computed inside the training loop.
However, scatter_update does not allow gradient propagation, and my attempt at creating a simple custom gradient for the scatter_update did not work.
#tf.RegisterGradient("CustomGrad")
def _clip_grad(unused_op, grad):
    return tf.constant(5., dtype=tf.float32, shape=(1)) # tf.clip_by_value(grad, -0.1, 0.1)
x = tf.Variable([3.0], dtype=tf.float32)
y = tf.get_variable('y', shape=(1), dtype=tf.float32)
g = tf.get_default_graph()
with g.gradient_override_map({"ScatterNdUpdate1": "CustomGrad"}):
    output = tf.scatter_nd_update(y, [[0]], x, name="ScatterNdUpdate1")
grad_custom = tf.gradients(output, y)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(grad_custom)
Running the code above shows that grad_custom contains None. Does anyone have an idea how to properly implement a local variance map that can be used in the training loop? Solving the gradient problem would also help me with another problem I am having.
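One way to sidestep the scatter operation entirely is the identity Var[x] = E[x^2] - (E[x])^2 applied per window, so that only window means are needed. The sketch below is framework-free Python on nested lists; `box_mean` and `local_variance` are illustrative names, not TensorFlow functions, and only 'valid' window positions are covered. In a TensorFlow graph the two window means could plausibly be computed with a stride-1 average pooling op such as tf.nn.avg_pool, which is differentiable, but whether that fits the setup in the question is an assumption.

```python
def box_mean(img, k):
    """Mean over every k x k window ('valid' positions only)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            total = sum(img[i + di][j + dj] for di in range(k) for dj in range(k))
            row.append(total / (k * k))
        out.append(row)
    return out


def local_variance(img, k):
    """Per-window (population) variance via Var[x] = E[x^2] - (E[x])^2."""
    mean = box_mean(img, k)
    squared = [[v * v for v in row] for row in img]
    mean_sq = box_mean(squared, k)
    return [[ms - m * m for m, ms in zip(mean_row, mean_sq_row)]
            for mean_row, mean_sq_row in zip(mean, mean_sq)]


# Toy 4x4 image: flat in the top half, checkerboard in the bottom half.
image = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 0.0, 2.0, 0.0],
]
var_map = local_variance(image, 2)  # 3x3 map, one value per 2x2 window
for row in var_map:
    print(row)
```

Flat windows give zero variance and the checkerboard windows give the largest values, which is the behaviour a local variance map should have.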

Related

Why prediction on activation values (Softmax) gives incorrect results?

I've implemented a basic neural network from scratch using Tensorflow and trained it on the Fashion-MNIST dataset. It trains correctly and reaches a testing accuracy of around 88-90% over 10 classes.
Now I've written a predict() function which predicts the class of a given image using the trained weights. Here is the code:
def predict(images, trained_parameters):
    Ws, bs = [], []
    parameters = {}
    for param in trained_parameters.keys():
        parameters[param] = tf.convert_to_tensor(trained_parameters[param])
    X = tf.placeholder(tf.float32, [images.shape[0], None], name = 'X')
    Z_L = forward_propagation(X, trained_parameters)
    p = tf.argmax(Z_L) # Working fine
    # p = tf.argmax(tf.nn.softmax(Z_L)) # not working if softmax is applied
    with tf.Session() as session:
        prediction = session.run(p, feed_dict={X: images})
    return prediction
This uses the forward_propagation() function, which returns the weighted sum of the last layer (Z) and not the activations (A), because TensorFlow's tf.nn.softmax_cross_entropy_with_logits() requires Z instead of A, as it calculates A itself by applying softmax. Refer to this link for details.
Now in the predict() function, when I make predictions using Z instead of A (activations), it works correctly. But if I calculate softmax on Z (which gives the activations A of the last layer), it gives incorrect predictions.
Why is it giving correct predictions on the weighted sums Z? Aren't we supposed to first apply the softmax activation (and calculate A) and then make predictions?
Here is the link to my colab notebook if anyone wants to look at my entire code: Link to Notebook Gist
So what am I missing here?
Most TF functions, such as tf.nn.softmax, assume by default that the batch dimension is the first one - that is a common practice. Now, I noticed in your code that your batch dimension is the second, i.e. your output shape is (output_dim=10, batch_size=?), and as a result, tf.nn.softmax is computing the softmax activation along the batch dimension.
There is nothing wrong in not following the conventions - one just needs to be aware of them. Computing the argmax of the softmax along the first axis should yield the desired results (it is equivalent to taking the argmax of the logits):
p = tf.argmax(tf.nn.softmax(Z_L, axis=0))
I would also recommend computing the argmax along the first axis in case more than one image is fed into the network.
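To make the axis issue concrete, here is a small framework-free sketch. The logits are made-up numbers laid out as (num_classes, batch_size), matching the layout described above, and the `softmax` and `argmax` helpers are plain Python stand-ins rather than the TensorFlow ops. It compares normalizing over the class axis with normalizing over the batch axis.

```python
import math


def softmax(values):
    """Numerically stable softmax of a 1-D list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits with shape (num_classes=3, batch_size=2):
# one row per class, one column per image.
logits = [
    [2.0, 5.0],
    [1.0, 0.0],
    [0.5, 0.2],
]
num_classes, batch_size = len(logits), len(logits[0])
columns = [[logits[c][b] for c in range(num_classes)] for b in range(batch_size)]

# Reference: predicted class per image taken straight from the logits.
pred_logits = [argmax(col) for col in columns]

# Softmax over the class axis (axis 0 in this layout) keeps the ranking.
pred_class_axis = [argmax(softmax(col)) for col in columns]

# Softmax over the batch axis (each row normalized across images),
# then the same per-image argmax: the ranking can change.
row_softmax = [softmax(row) for row in logits]
pred_batch_axis = [
    argmax([row_softmax[c][b] for c in range(num_classes)])
    for b in range(batch_size)
]

print(pred_logits, pred_class_axis, pred_batch_axis)
```

Softmax is monotonic within the axis it normalizes, so the argmax is only preserved when both operations run along the class axis.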

Using Tensorflow to optimize a function in python

I'm new to tensorflow and am trying to understand how to use it outside of a machine learning context. I would like to optimize a python function with the ADAM implementation of tensorflow.
Let's assume I have the following function:
def fun_test(x):
    """
    :param x: List of parameters, e.g. [1,2,3]
    :return: real value
    """
    res=do_something(x)
    return res
When using scipy, I would call 'scipy.optimize.minimize(fun_test,x0,method="Nelder-Mead")'. How could I do this with tensorflow?
Best,
Michael
You need to rewrite the function do_something to take tensors as inputs and return a scalar tensor (i.e. to create a computation graph). Then the following code is a sketch of how to perform optimization on the function. (BTW, in your code fun_test and do_something have no real difference, so I picked the latter.)
x = tf.get_variable("x", dtype=..., initializer=...)
target = do_something(x)
opt = tf.train.AdamOptimizer(...).minimize(target) # Defines one optimization step
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer()) # Initialize x
    NUM_STEPS = 1000
    for _ in range(NUM_STEPS):
        sess.run(opt) # Run optimization for NUM_STEPS steps
    print(sess.run(x)) # Show values of x
    print(sess.run(target)) # Show target value
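For intuition about what each optimization step does, here is a rough framework-free sketch of the Adam update rule applied to a toy quadratic. It is not TensorFlow's implementation: the gradient is written by hand instead of being derived from a graph, and `adam_minimize`, `objective`, and `objective_grad` are illustrative names. The beta and epsilon values are the commonly cited defaults, and the learning rate is simply chosen to suit this toy problem.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a function, given its gradient, with the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # running mean of the gradients
    v = [0.0] * len(x)  # running mean of the squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction for early steps
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def objective(x):
    """Toy function with its minimum at x = [3, -1]."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


def objective_grad(x):
    """Hand-written gradient of the toy objective."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


x_min = adam_minimize(objective_grad, [0.0, 0.0])
print(x_min, objective(x_min))
```

Unlike Nelder-Mead, this kind of optimizer needs gradients, which is why the function has to be expressed in terms of tensors when TensorFlow is asked to do the differentiation.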

Gradient of a function evaluated over a batch

I want to use Tensorflow to calculate the gradients of a function. However, if I use the tf.gradients function, it returns a single list of gradients. How can I get a list for each point of the batch?
# in a tensorflow graph I have the following code
tf_x = tf.placeholder(dtype=tf.float32, shape=(None,N_in), name='x')
tf_net #... conveniently defined neural network
tf_y = tf.placeholder(dtype=tf.float32, shape=(None,1), name='y')
tf_cost = (tf_net(tf_x) - tf_y)**2 # this should have length N_samples because I did not apply a tf.reduce_mean
tf_cost_gradients = tf.gradients(tf_cost,tf_net.trainable_weights)
If we run it in a tensorflow session,
# suppose myx = np.random.randn(N_samples,N_in) and myy conveniently chosen
feed = {tf_x:myx, tf_y:myy}
sess.run(tf_cost_gradients,feed)
I get only one list, and not a list for each sample as I would like. I can use
for i in range(len(myx)):
    feed = {tf_x:myx[i:i+1], tf_y:myy[i:i+1]} # slices keep the batch dimension
    sess.run(tf_cost_gradients,feed)
but this is extremely slow! What can I do? Thank you
Although there is an 'aggregation_method' parameter in tf.gradients, it is not easy to get the individual gradients.
aggregation_method: Specifies the method used to combine gradient terms.
Please see these threads:
https://github.com/tensorflow/tensorflow/issues/15760
https://github.com/tensorflow/tensorflow/issues/4897
In one of the threads (#4897), Ian Goodfellow makes the following suggestion to speed up individual gradient computation:
This is only pseudocode, but basic idea is:
examples = tf.split(batch)
weight_copies = [tf.identity(weights) for x in examples]
output = tf.stack(f(x, w) in zip(examples, weight_copies))
cost = cost_function(output)
per_example_gradients = tf.gradients(cost, weight_copies)
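The reason a single list comes back is that tf.gradients sums the contributions of every element of the cost. The following plain-Python sketch illustrates the relationship for a hypothetical linear model with squared error, using hand-derived gradients rather than TensorFlow. The names `per_example_grads` and `summed_grad` are made up for the illustration.

```python
def per_example_grads(w, xs, ys):
    """Gradient of (w . x - y)**2 with respect to w, one list per example."""
    grads = []
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        grads.append([2.0 * residual * xi for xi in x])
    return grads


def summed_grad(w, xs, ys):
    """Gradient of the cost summed over the batch, computed in one pass
    (the kind of aggregated result a single gradient call reports)."""
    residuals = [sum(wi * xi for wi, xi in zip(w, x)) - y for x, y in zip(xs, ys)]
    return [
        sum(2.0 * r * x[j] for r, x in zip(residuals, xs))
        for j in range(len(w))
    ]


w = [1.0, -2.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [0.0, 0.0, 1.0]

individual = per_example_grads(w, xs, ys)  # one gradient per sample
aggregated = summed_grad(w, xs, ys)        # one gradient for the whole batch
print(individual)
print(aggregated)
```

Summing the per-example gradients column by column reproduces the aggregated gradient, which is the information lost when only the aggregate is returned.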

Running a training operation inside another training operation

I want to run a small training operation inside another training operation as follows:
def get_alphas(weights, filters):
    alphas = tf.Variable(...)
    # Define some loss and training_op here
    with tf.Session() as sess:
        for some_epochs:
            sess.run(training_op)
        return tf.convert_to_tensor(sess.run(alphas))
def get_updated_weights(default_weights):
    weights = tf.Variable(default_weights)
    # Some operation on weights to get filters
    # Now, the following will produce errors since weights is not initialized
    alphas = get_alphas(weights, filters)
    # Other option is to initialize it here as follows
    with tf.Session() as sess:
        sess.run(tf.variables_initializer([weights]))
        calculated_filters = sess.run(filters)
        alphas = get_alphas(default_weights, calculated_filters)
    return Some operation on alphas and filters
So, what I want to do is to create a Variable named weights. alphas and filters depend dynamically (through some training) on weights. Now, as weights are trained, filters will change, since they are created through some operations on weights, but alphas also need to change, and they can only be found through another training operation.
I will provide the exact functions if the intention is not clear from the above.
The trick you describe won't work, because tf.Session.close releases all associated resources, such as variables, queues, and readers. So the result of get_alphas won't be a valid tensor.
The best course of action is to define several losses and training ops (affecting different parts of the graph) and run them within a single session, when you need to.
alphas = tf.Variable(...)
# Define some loss and training_op here
def get_alphas(sess, weights, filters):
    for some_epochs:
        sess.run(training_op)
# The rest of the training...

stopping condition on gradient value tensorflow

I would like to implement a stopping condition based on the value of the gradient of the loss function w.r.t. the weights.
For example, let's say I have something like this:
optimizer = tf.train.AdamOptimizer()
grads_and_vars = optimizer.compute_gradients(a_loss_function)
train_op = optimizer.apply_gradients(grads_and_vars)
then I would like to run the graph with something like this:
for step in range(TotSteps):
    output = sess.run([input], feed_dict=some_dict)
    if(grad_taken_in_some_way < some_treshold):
        print("Training finished.")
        break
I am not sure what I should pass to sess.run() in order to get as output also the gradient (besides all other stuff I need). I am not even sure whether this is the correct approach or I should do it differently. I made some tries but I failed every time. Hope someone has some hints.
Thank you in advance!
EDIT: English correction
EDIT2: Answer by Iballes is exactly what I wanted to do. Still, I am not sure how to norm and sum all the gradients. Since I have different layers in my CNN and different weights with different shapes, if I just do what you suggested, I get an error on the add_n() operation (since I am trying to add together matrices with different shapes). So probably I should do something like:
grad_norms = [tf.nn.l2_normalize(g[0], 0) for g in grads_and_vars]
grad_norm = [tf.reduce_sum(grads) for grads in grad_norms]
final_grad = tf.reduce_sum(grad_norm)
Can anyone confirm this?
Your line output = sess.run([input], feed_dict=some_dict) makes me think that you have a slight misunderstanding of the sess.run command. What you call [input] is supposed to be a list of tensors that are to be fetched by the sess.run command. Hence, it is an output rather than an input. To tackle your question, let's assume that you are doing something like output = sess.run(loss, feed_dict=some_dict) instead (in order to monitor the training loss).
Also, I suppose you want to formulate your stopping criterion using the norm of the gradient (the gradient itself is a multi-dimensional quantity). Hence, what you want to do is to fetch the norm of the gradient each time you execute the graph. To that end, you have to do two things. 1) Add the gradient norm to the computation graph. 2) Fetch it in each call to sess.run in your training loop.
Ad 1) You have added the gradients to the graph via
optimizer = tf.train.AdamOptimizer()
grads_and_vars = optimizer.compute_gradients(a_loss_function)
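and now have the tensors holding the gradients in grads_and_vars (one for each trained variable in the graph). Let's reduce each gradient to a scalar and then sum them up:
grad_norms = [tf.nn.l2_loss(g) for g, v in grads_and_vars]
grad_norm = tf.add_n(grad_norms)
There you have your gradient norm. Strictly speaking, tf.nn.l2_loss(g) returns sum(g ** 2) / 2, so grad_norm is half the squared L2 norm over all gradients; because every term is already a scalar, add_n works regardless of the variables' shapes. Use tf.sqrt(2 * grad_norm) if you need the norm itself.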
Ad 2) Inside your loop, fetch the gradient norm alongside the loss by telling the sess.run command to do so.
for step in range(TotSteps):
    l, gn = sess.run([loss, grad_norm], feed_dict=some_dict)
    if(gn < some_treshold):
        print("Training finished.")
        break
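The same stopping logic can be tried out without a graph. The sketch below is plain Python with made-up parameter groups of different sizes standing in for layers, hand-written gradients of a toy quadratic loss, and an illustrative helper named `global_grad_norm`. Each group is reduced to a scalar before the scalars are added, which is why differing shapes are not a problem. In TensorFlow 1.x, tf.global_norm computes the same quantity over a list of tensors.

```python
import math


def global_grad_norm(grads):
    """L2 norm over all gradient entries, whatever the size of each group."""
    # Reduce every group to a scalar first, then add the scalars.
    squared_sums = [sum(g * g for g in group) for group in grads]
    return math.sqrt(sum(squared_sums))


# Hypothetical parameter groups of different sizes (stand-ins for layers).
params = [[4.0, -2.0, 1.0], [0.5]]
targets = [[1.0, 1.0, 1.0], [0.0]]

learning_rate = 0.1
threshold = 1e-3
max_steps = 1000
steps_taken = max_steps

for step in range(max_steps):
    # Gradient of sum((p - t)**2) for each group.
    grads = [[2.0 * (p - t) for p, t in zip(pg, tg)]
             for pg, tg in zip(params, targets)]
    norm = global_grad_norm(grads)
    if norm < threshold:
        steps_taken = step
        print("Training finished.")
        break
    # Plain gradient descent step (the toy stand-in for train_op).
    params = [[p - learning_rate * g for p, g in zip(pg, gg)]
              for pg, gg in zip(params, grads)]

print(steps_taken, norm)
```

With mini-batches the gradient norm is noisy, so in practice the threshold test may need smoothing over several steps.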
