I tried TensorFlow's eager execution, but my implementation wasn't successful. I used tf.GradientTape, and while the program runs, there is no visible update in any of the weights. I've seen sample algorithms and tutorials using optimizer.apply_gradients() to update all variables, but I'm assuming I'm not using it properly.
import tensorflow as tf
import tensorflow.contrib.eager as tfe
# enabling eager execution
tf.enable_eager_execution()
# establishing hyperparameters
LEARNING_RATE = 20
TRAINING_ITERATIONS = 3
# establishing all LABELS
LABELS = tf.constant(tf.random_normal([3, 1]))
# print(LABELS)
# stub statement for input
init = tf.Variable(tf.random_normal([3, 1]))
# declare and initialize all weights
weight1 = tfe.Variable(tf.random_normal([2, 3]))
bias1 = tfe.Variable(tf.random_normal([2, 1]))
weight2 = tfe.Variable(tf.random_normal([3, 2]))
bias2 = tfe.Variable(tf.random_normal([3, 1]))
weight3 = tfe.Variable(tf.random_normal([2, 3]))
bias3 = tfe.Variable(tf.random_normal([2, 1]))
weight4 = tfe.Variable(tf.random_normal([3, 2]))
bias4 = tfe.Variable(tf.random_normal([3, 1]))
weight5 = tfe.Variable(tf.random_normal([3, 3]))
bias5 = tfe.Variable(tf.random_normal([3, 1]))
VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]
def thanouseEyes(input):  # nn model aka: Thanouse's Eyes
    layerResult = tf.nn.relu(tf.matmul(weight1, input) + bias1)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight2, input) + bias2)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight3, input) + bias3)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight4, input) + bias4)
    input = layerResult
    layerResult = tf.nn.softmax(tf.matmul(weight5, input) + bias5)
    return layerResult
# Begin training and update variables
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
with tf.GradientTape(persistent=True) as tape:  # gradient calculation
    for i in range(TRAINING_ITERATIONS):
        COST = tf.reduce_sum(LABELS - thanouseEyes(init))
        GRADIENTS = tape.gradient(COST, VARIABLES)
        optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))
        print(weight1)
The usage of the optimizer seems fine; however, the computation defined by thanouseEyes() will always return [1., 1., 1.] irrespective of the variables, so the gradients are always 0 and the variables are never updated (print(thanouseEyes(init)) and print(GRADIENTS) should demonstrate that).
Digging in a bit more: tf.nn.softmax is applied to x = tf.matmul(weight5, input) + bias5, which has a shape of [3, 1]. So tf.nn.softmax(x) is effectively computing [softmax(x[0]), softmax(x[1]), softmax(x[2])], since tf.nn.softmax applies (by default) over the last axis of its input. x[0], x[1], and x[2] are vectors with one element each, so softmax(x[i]) will always be 1.0.
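A quick check makes this visible (a minimal sketch, assuming eager execution is enabled as in the question):

import tensorflow as tf
tf.enable_eager_execution()

x = tf.random_normal([3, 1])
print(tf.nn.softmax(x))          # [[1.], [1.], [1.]]: each row is a softmax over a single element
print(tf.nn.softmax(x, axis=0))  # a proper distribution over the 3 rows

Passing axis=0 (or arranging the activations so the classes lie on the last axis) yields an actual probability distribution.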
Hope that helps.
Some additional points unrelated to your question that you may be interested in:
As of TensorFlow 1.11, you don't need the tf.contrib.eager module in your program. Replace all occurrences of tfe with tf (i.e., tf.Variable instead of tfe.Variable) and you'll get the same result.
Computation performed inside the context of a GradientTape is "recorded", i.e., it holds on to intermediate tensors so that gradients can be computed later on. Long story short, you'd want to move the GradientTape inside the loop body:
for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape() as tape:
        COST = tf.reduce_sum(LABELS - thanouseEyes(init))
    GRADIENTS = tape.gradient(COST, VARIABLES)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))
Related
Consider the following code for Linear Regression implemented using PyTorch:
X is the input, Y is the output for the training set, w is the parameter that needs to be optimised
import torch
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)
def forward(x):
    return w * x

def loss(y, y_pred):
    return ((y_pred - y)**2).mean()
print(f'Prediction before training: f(5) = {forward(5).item():.3f}')
learning_rate = 0.01
n_iters = 100
for epoch in range(n_iters):
    # predict = forward pass
    y_pred = forward(X)
    # loss
    l = loss(Y, y_pred)
    # calculate gradients = backward pass
    l.backward()
    # update weights
    # w.data = w.data - learning_rate * w.grad
    with torch.no_grad():
        w -= learning_rate * w.grad
    # zero the gradients after updating
    w.grad.zero_()
    if epoch % 10 == 0:
        print(f'epoch {epoch+1}: w = {w.item():.3f}, loss = {l.item():.8f}')
What does the 'with' block do? The requires_grad argument for w is already set to True. Why is it then being put under a with torch.no_grad() block?
There is no reason to track gradients when updating the weights; that is why you will find a decorator (@torch.no_grad()) on the step method in any implementation of an optimizer.
"With torch.no_grad" block means doing these lines without keeping track of the gradients.
The requires_grad argument tells PyTorch that we want to be able to calculate the gradients for those values. However, with torch.no_grad() tells PyTorch not to calculate the gradients, and the program uses it explicitly here (as with most neural networks) so that the weight update itself is not tracked, since that would affect the back-propagation.
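A minimal sketch of the difference (hypothetical tensors, not the question's code):

import torch

w = torch.tensor(1.0, requires_grad=True)
y = w * 3                 # tracked: y.requires_grad is True and y has a grad_fn
with torch.no_grad():
    z = w * 3             # not tracked: z.requires_grad is False
print(y.requires_grad, z.requires_grad)  # True False

Without the no_grad() block, the in-place update w -= learning_rate * w.grad on a leaf tensor that requires grad would raise a RuntimeError, because autograd would try to record the update itself.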
I am trying to write a custom loss function with custom gradients. While I haven't implemented the gradients yet, Tensorflow is having difficulty processing the output of my loss function (because of the shape?). Here is the error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [1,3], In[1]: [64,2] [Op:MatMul]
Here is the incomplete "training" loop:
def main():
    inputs = tf.keras.Input(shape=(2,))
    x1 = tf.keras.layers.Dense(64, activation="relu")(inputs)
    x2 = tf.keras.layers.Dense(64, activation="relu")(x1)
    outputs = tf.keras.layers.Dense(2)(x2)
    model = tf.keras.Model(inputs=inputs, outputs=outputs, name="pulse_model")

    # Input: lists of 2 floats
    # Output: lists of 2 complex numbers
    data = gen_hadamard_data(10)
    # Arbitrary batch size
    data = data.batch(batch_size=3)

    epochs = 2
    for epoch in range(epochs):
        print(f"\nStart of Epoch {epoch}")
        for step, (x_batch_train, y_batch_train) in enumerate(data):
            print(f"{x_batch_train=}")
            print(f"{y_batch_train=}")
            with tf.GradientTape() as tape:
                logits = model(tf.constant(x_batch_train.numpy().tolist()), training=True)
                loss_fn = make_fidelity_cost(x_batch_train)
                loss_value = loss_fn(logits, y_batch_train)
            grads = tape.gradient(loss_value, model.trainable_weights)
            print(f"Prediction: {logits}")
            print(f"Loss value: {loss_value}")
            print(f"Gradients: {grads}")
Here is the loss function:
def make_fidelity_cost(initial_states, backend=FakeArmonk()):
    @tf.custom_gradient
    def fidelity_cost(y_pred, y_actual):
        fidelity_list = []
        for in_state, pred, actual in zip(initial_states.numpy(),
                                          y_pred.numpy(),
                                          y_actual.numpy()):
            init_state = [np.cos(in_state[0] / 2),
                          np.exp(in_state[1] * 1.j) * np.sin(in_state[0] / 2)]
            job = run_gaussian(duration=16,
                               amp=pred[0],
                               sigma=pred[1],
                               init_state=init_state,
                               backend=backend)
            result = job.result()
            sv = result.get_statevector()
            actual_sv = Statevector(actual.tolist() + [0])
            # This is the actual calculation that gets returned as the loss
            # state_fidelity returns a scalar
            fidelity_list.append(state_fidelity(sv, actual_sv))

        def grad(upstream):
            # Don't know what I need to do here quite yet
            print(f"{upstream=}")
            return upstream, upstream

        return tf.Variable([fidelity_list]), grad
    return fidelity_cost
Some notes:
I've posted about this before, but realized it was practically unreadable, so I've reduced it down to the basics of what's happening
the main loss output is coming from state_fidelity, where the output is a scalar, which is then appended to a list. The list is then wrapped in a tensor (tf.Variable in the code above) as the return value
While the code within the for loop may be from not-very-common libraries, the only line that matters is where the state fidelity is appended to the fidelity list
I'm not sure about what size matrices the functions are expecting, so it would be appreciated if someone were able to teach me as well
I tried to follow your code and found one way to work with the concept: the real and imaginary parts can be processed individually and vectorized.

for in_state, pred, actual in zip(initial_states, y_pred, y_actual):
    real = np.cos(in_state[0] / 2)
    # [0.+0.j] ... [0.+0.j]
    imaginary = np.exp(np.asarray(in_state[1], dtype=np.csingle) *
                       np.asarray(complex('1.j'), dtype=np.csingle)) * np.sin(in_state[0] / 2)
    init_state = [real, imaginary]
    # The calculation does not support imaginary numbers directly, so I process
    # the real and imaginary parts separately and build a vector from their results:
    # filter_result = gaussian_filter(np.asarray(real, dtype=np.float64), sigma=5, truncate=0)
    # job = run_gaussian(duration=16,
    #                    amp=pred[0],
    #                    sigma=pred[1],
    #                    init_state=init_state,
    #                    backend=backend)
    # append the state fidelity
    fidelity_list.append(state_fidelity(current_vector, actual_vector))

loss_fn = make_fidelity_cost(image)
# Convert to numpy array
loss_value = loss_fn(model, label)
I'm trying to build a workflow that uses tf.data.Dataset batches and an iterator. For performance reasons, I am really trying to avoid using the placeholder -> feed_dict loop workflow.
The process I'm trying to implement involves grad-cam (which requires the gradient of the loss with respect to the final convolutional layer of a CNN) as an intermediate step, and ideally I'd like to be able to try it out on several Keras pre-trained models, including non-sequential ones like ResNet.
Most implementations of grad-cam that I've found rely on hand-crafting the CNN of interest in tensorflow. I found one implementation, https://github.com/jacobgil/keras-grad-cam, that is made for keras models, and following that example, I get
def safe_norm(x):
    return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name == layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
The problem is that gradient_function([batch]) returns a numpy array whose value is determined by the first batch, so that heatmap doesn't change with subsequent evaluations.
I've tried replacing K.function with a Model in various ways, but nothing seems to work. I usually end up either with an error suggesting that grads evaluates to None or that one model or another is expecting a feed_dict and not receiving one.
Is this code salvageable? Is there a better way to do this besides looping through the data several times (once to get all the grad-cams and then again once I have them) or using placeholders and feed_dicts?
Edit:
The same code as above, with the following appended:
# other operations on heatmap and batch ...
# ...
output_function = K.function(model.input, [node1, ..., nodeN])
for batch in range(n_batches):
    outputs1, ... , outputsN = output_function(batch)
Gives me the desired outputs for each batch.
Yes, K.function returns numpy arrays because it evaluates the symbolic computation in your graph. What I think you should do is to keep everything symbolic up to K.function, and after getting the gradients, perform all computations of the Grad-CAM weights and final saliency map using numpy.
Then you can iterate on your dataset, evaluate gradient_function on a new batch of data, and compute the saliency map.
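A rough sketch of that per-batch loop (hedged: n_batches, K, batch, and gradient_function are taken from the question's code, and the numpy arithmetic only roughly mirrors the question's TF version):

import numpy as np

sess = K.get_session()
for _ in range(n_batches):
    batch_val = sess.run(batch)                         # pull one concrete batch from the iterator
    output, grads_val = gradient_function([batch_val])  # numpy arrays from here on
    weights = grads_val.mean(axis=(1, 2))               # global-average-pool the gradients
    cam = (output * weights[:, None, None, :]).sum(axis=-1)
    cam = np.maximum(cam, 0)                            # ReLU
    heatmap = cam / cam.max(axis=(1, 2), keepdims=True)

Everything after gradient_function is plain numpy, so it re-evaluates correctly for each new batch.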
If you want to keep everything symbolic, then you should not use K.function to produce the gradient function. Instead, use the symbolic gradients (the output of K.gradients, without the Lambda) and the convolutional feature maps (conv_output), perform the saliency-map computation on top of those, and then build a function (using K.function) that takes the model input and outputs the saliency map.
Hope the explanation is enough.
Is it possible to minimise a loss function by changing only some elements of a variable? In other words, if I have a variable X of length 2, how can I minimise my loss function by changing X[0] and keeping X[1] constant?
Hopefully this code I have attempted will describe my problem:
import tensorflow as tf
import tensorflow.contrib.opt as opt
X = tf.Variable([1.0, 2.0])
X0 = tf.Variable([3.0])
Y = tf.constant([2.0, -3.0])
scatter = tf.scatter_update(X, [0], X0)
with tf.control_dependencies([scatter]):
    loss = tf.reduce_sum(tf.squared_difference(X, Y))
opt = opt.ScipyOptimizerInterface(loss, [X0])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    opt.minimize(sess)
    print("X: {}".format(X.eval()))
    print("X0: {}".format(X0.eval()))
which outputs:
INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
Objective function value: 26.000000
Number of iterations: 0
Number of functions evaluations: 1
X: [3. 2.]
X0: [3.]
where I would like to find the optimal value X0 = 2 and thus X = [2, 2].
Edit:
Motivation for doing this: I would like to import a trained graph/model and then tweak various elements of some of the variables depending on some new data I have.
You can use this trick to restrict the gradient calculation to one index:
import tensorflow as tf
import tensorflow.contrib.opt as opt
X = tf.Variable([1.0, 2.0])
part_X = tf.scatter_nd([[0]], [X[0]], [2])
X_2 = part_X + tf.stop_gradient(-part_X + X)
Y = tf.constant([2.0, -3.0])
loss = tf.reduce_sum(tf.squared_difference(X_2, Y))
opt = opt.ScipyOptimizerInterface(loss, [X])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    opt.minimize(sess)
    print("X: {}".format(X.eval()))
part_X becomes the value you want to change in a one-hot vector of the same shape as X. part_X + tf.stop_gradient(-part_X + X) is the same as X in the forward pass, since part_X - part_X is 0. However in the backward pass the tf.stop_gradient prevents all unnecessary gradient calculations.
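A quick way to check the trick (a hedged addition to the snippet above): the forward value of X_2 equals X, while the gradient with respect to X vanishes at the frozen index:

grad_X = tf.gradients(loss, X)[0]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(X_2))     # same values as X in the forward pass
    print(sess.run(grad_X))  # [-2. 0.]: index 1 receives no gradient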
I'm not sure whether it is possible with the SciPy optimizer interface, but using one of the regular tf.train.Optimizer subclasses you can do something like that by calling compute_gradients first, then masking the gradients, and then calling apply_gradients, instead of calling minimize (which, as the docs say, basically calls the previous two).
import tensorflow as tf
X = tf.Variable([3.0, 2.0])
# Select updatable parameters
X_mask = tf.constant([True, False], dtype=tf.bool)
Y = tf.constant([2.0, -3.0])
loss = tf.reduce_sum(tf.squared_difference(X, Y))
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# Get gradients and mask them
((X_grad, _),) = opt.compute_gradients(loss, var_list=[X])
X_grad_masked = X_grad * tf.cast(X_mask, dtype=X_grad.dtype)
# Apply masked gradients
train_step = opt.apply_gradients([(X_grad_masked, X)])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(10):
        _, X_val = sess.run([train_step, X])
        print("Step {}: X = {}".format(i, X_val))
    print("Final X = {}".format(X.eval()))
Output:
Step 0: X = [ 2.79999995 2. ]
Step 1: X = [ 2.63999987 2. ]
Step 2: X = [ 2.51199985 2. ]
Step 3: X = [ 2.40959978 2. ]
Step 4: X = [ 2.32767987 2. ]
Step 5: X = [ 2.26214385 2. ]
Step 6: X = [ 2.20971513 2. ]
Step 7: X = [ 2.16777205 2. ]
Step 8: X = [ 2.13421774 2. ]
Step 9: X = [ 2.10737419 2. ]
Final X = [ 2.10737419 2. ]
This should be pretty easy to do by using the var_list parameter of the minimize function.
trainable_var = X[0]
train_op = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss, var_list=[trainable_var])
You should note that by convention all trainable variables are added to the tensorflow default collection GraphKeys.TRAINABLE_VARIABLES, so you can get a list of all trainable variables using:
all_trainable_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
This is just a list of variables which you can manipulate as you see fit and use as the var_list parameter.
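For example (a sketch; the 'dense' name filter is hypothetical, and loss is assumed to be the objective defined earlier), you could restrict the optimizer to variables whose names match a pattern:

my_vars = [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
           if 'dense' in v.name]
train_op = tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss, var_list=my_vars)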
As a tangent to your question, if you ever want to take customizing the optimization process a step further, you can also compute the gradients manually using grads = tf.gradients(loss, var_list), manipulate the gradients as you see fit, and then call tf.train.GradientDescentOptimizer(...).apply_gradients(grads_and_vars_as_list_of_tuples). Under the hood, minimize is just doing these two steps for you.
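Spelled out, those two steps look roughly like this (a sketch, assuming loss and var_list are already defined):

opt = tf.train.GradientDescentOptimizer(learning_rate=1e-3)
grads = tf.gradients(loss, var_list)                        # step 1: compute the gradients
# ... manipulate grads here (clip, mask, rescale, ...) ...
train_op = opt.apply_gradients(list(zip(grads, var_list)))  # step 2: apply them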
Also note that you are perfectly free to create different optimizers for different collections of variables. You could create an SGD optimizer with learning rate 1e-4 for some variables, and another Adam optimizer with learning rate 1e-2 for another set of variables. Not that there's any specific use case for this; I'm just pointing out the flexibility you now have.
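For instance (a sketch; sgd_vars and adam_vars are hypothetical splits of the trainable variables):

sgd_op = tf.train.GradientDescentOptimizer(learning_rate=1e-4).minimize(loss, var_list=sgd_vars)
adam_op = tf.train.AdamOptimizer(learning_rate=1e-2).minimize(loss, var_list=adam_vars)
train_op = tf.group(sgd_op, adam_op)  # run both updates in a single step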
The answer by Oren in the second link below calls a function (defined in the first link) that takes a Boolean hot matrix of the parameters to optimize and the tensor of parameters. It uses stop_gradient and works like a charm for a neural network I developed.
Update only part of the word embedding matrix in Tensorflow
https://github.com/tensorflow/tensorflow/issues/9162
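The core of that approach looks roughly like this (a hedged sketch; the function in the links above is more general, and X and Y are the variables from the question):

mask = tf.constant([True, False])  # Boolean hot matrix: optimize X[0] only
mask_f = tf.cast(mask, X.dtype)
X_stopped = X * mask_f + tf.stop_gradient(X * (1 - mask_f))
loss = tf.reduce_sum(tf.squared_difference(X_stopped, Y))

In the forward pass X_stopped equals X, but gradients only flow to the entries where the mask is True.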
I have the following code extract for calculating the cost function iteratively. Before this point, feature scaling, reshaping, the LSTM, and training have been set up, and the same set of data and variables is used to compute the cost function.
# learning parameter
learning_rate = 0.01
# iterative parameters
EPOCHS = 1000 # number of iterations
PRINT_STEP = 100 # the interval of printing validation result
# read data and data preprocessings
read_data_pd = pd.read_csv('./price.csv')
input_pd = read_data_pd.drop(['year','month','day'], axis=1)
temp_pd = feature_scaling(input_pd[feature_to_scale],sacling_meathod) # call the function feature scaling
input_pd[feature_to_scale] = temp_pd
x_ = tf.placeholder(tf.float32, [None, batch_size, n_features])
y_ = tf.placeholder(tf.float32, [None, 1])
# call the lstm-rnn function
lstm_output = lstm(x_, n_features, batch_size, n_lstm_layers, lstm_scope_name)
# linear regressor
# w is the weight vector
W = tf.Variable(tf.random_normal([n_features, n_pred_class]))
# b is the bias
b = tf.Variable(tf.random_normal([n_pred_class]))
# Y = WX + b
y = tf.matmul(lstm_output, W) + b
#define the cost function
cost_func = tf.reduce_mean(tf.square(y - y_))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(cost_func)
# initialize all variables
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    for ii in range(EPOCHS):
        sess.run(train_op, feed_dict={x_: train_input_nparr, y_: train_target_nparr})
        if ii % PRINT_STEP == 0:
            cost = sess.run(cost_func, feed_dict={x_: train_input_nparr, y_: train_target_nparr})
            print('iteration =', ii, 'training cost:', cost)
When I run the program, the cost values print as expected. Yet each time the program runs, the results are different. For example, the result at iteration 100 sometimes prints as 0.868856 and sometimes as 0.905526, which suggests the code has some problem.
One thing I noticed is the line that initializes all variables, tf.initialize_all_variables(), since the message said:
initialize_all_variables is deprecated and will be removed after 2017-03-02.
Instructions for updating: Use `tf.global_variables_initializer` instead.
I followed the instruction, but it did not fix the inconsistent results.
Therefore, I would like to know what is wrong with the code and how to correct it.