Tensorflow get_variable into Pytorch

I am trying to convert this TensorFlow code into PyTorch. For example, I converted the below TF code
tf.gather(tf.get_variable("char_embeddings", [len(data.char_dict), data.char_embedding_size]), char_index) # [num_sentences, max_sentence_length, max_word_length, emb]
into
class CharEmbeddings(nn.Module):
    def __init__(self, config, data):
        ....
        self.embeddings = nn.init.xavier_uniform_(torch.empty(len(data.char_dict), data.char_embedding_size))
    def forward(self, char_index):
        # [num_sentences, max_sentence_length, max_word_length, emb]
        char_emb = self.embeddings[char_index]
I don't fully understand what TF is doing there. Is it supposed to first initialize char_embeddings, then gather (which I understand), and then backpropagate gradients to update char_embeddings, so that the updated values are used in the next iteration?
If so, I tried to convert that into PyTorch. From what I read, if no initializer is passed to get_variable here, glorot_uniform_initializer is used, which I think is equivalent to PyTorch's xavier_uniform_.
Three questions here:
Is my interpretation of the TF code correct?
Is that conversion valid?
Should I expect the original embeddings self.embeddings to receive gradients and update its values? Is that the expected behavior of the TensorFlow version as well, and how do I achieve that in PyTorch? I added requires_grad to the embeddings tensor, but that doesn't update the values.
These might be newbie questions, but I am new to this. Thanks!
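
For reference, one common way to get a trainable lookup table in PyTorch is to wrap the tensor in nn.Parameter, so the module registers it and an optimizer built from model.parameters() can update it. The sketch below is only an illustration under assumptions: the constructor arguments num_chars and emb_size, the SGD optimizer, and the toy loss are made up for the example and are not part of the original code.

```python
import torch
import torch.nn as nn


class CharEmbeddings(nn.Module):
    def __init__(self, num_chars, emb_size):  # hypothetical arguments
        super().__init__()
        weight = torch.empty(num_chars, emb_size)
        nn.init.xavier_uniform_(weight)  # Glorot/Xavier uniform init
        # nn.Parameter registers the tensor with the module, so it shows up
        # in model.parameters() and the optimizer can update it.
        self.embeddings = nn.Parameter(weight)

    def forward(self, char_index):
        # Indexing with an integer tensor gathers rows: [..., emb_size]
        return self.embeddings[char_index]


torch.manual_seed(0)
model = CharEmbeddings(num_chars=5, emb_size=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

char_index = torch.tensor([[0, 2], [2, 4]])
before = model.embeddings.detach().clone()

loss = model(char_index).sum()  # toy loss, just to produce gradients
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = model.embeddings.detach()
# Only the rows that were looked up (0, 2 and 4) receive a non-zero gradient.
changed_rows = (before != after).any(dim=1)
```

A plain tensor with requires_grad=True does accumulate gradients, but it is not returned by model.parameters(), so an optimizer constructed from that list never touches it. nn.Embedding is the built-in layer for this kind of lookup, although its default initialization differs from Xavier uniform.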

Related

Why use Variable() in inference?

I am learning PyTorch for an image classification task, and I ran into code where someone used a PyTorch Variable() in their function for prediction:
def predict_image(image):
    image_tensor = test_transforms(image).float()
    image_tensor = image_tensor.unsqueeze_(0)
    input = Variable(image_tensor)
    input = input.to(device)
    output = model(input)
    index = output.data.cpu().numpy().argmax()
    return index
Why do they use Variable() here? (even though it works fine without it.)

You can safely omit it. Variables are a legacy component of PyTorch, now deprecated, that used to be required for autograd:
Variable (deprecated)
WARNING
The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with requires_grad set to True. Below please find a quick guide on what has changed:
Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
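
As a rough illustration of the point above, here is a minimal sketch of inference without Variable. The nn.Linear stand-in model and the random input are assumptions made for the example; they replace the trained classifier and the transformed image from the question.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

# Stand-in model (assumption); in the question this is the trained classifier.
model = nn.Linear(4, 3)
model.eval()


def predict(batch):
    # A plain tensor goes straight into the model; no Variable wrapper needed.
    with torch.no_grad():  # skip autograd bookkeeping during inference
        output = model(batch)
    return output.argmax(dim=1)


batch = torch.randn(2, 4)
pred = predict(batch)

# Wrapping still works, but what comes back is an ordinary tensor.
wrapped = Variable(batch)
```

Using torch.no_grad() is a separate choice from dropping Variable: it is there because gradients are not needed for prediction.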

How to properly use TensorFlow functions within the model

I'm using the functional API of TensorFlow 2 and tensorflow.keras.layers to build the model.
I have an input tensor (in_1) with shape [batch_size, length, dim] and I would like to compute the mean along the length dimension and obtain an output tensor (out_1) with shape [batch_size, dim].
Which of these should I use? (All of these options work in terms of output shape and training.)
out_1 = Lambda(lambda x: tf.math.reduce_mean(x, axis=1))(in_1)
out_1 = Lambda(lambda x: tf.keras.backend.mean(x, axis=1))(in_1)
out_1 = tf.math.reduce_mean(in_1, axis=1)
This last one automatically creates a TensorFlowOpLayer, is this something that should be avoided?
Are there other ways to do this?
What's the difference between tf.math.reduce_mean and tf.keras.backend.mean, which should I use?
I know that custom functions should be called inside the Lambda layer, but is it true also for TensorFlow functions such as tf.math.reduce_mean which can process the tensor in "one fell swoop"? How should I call them if I need to specify a parameter (e.g. axis)?

First, for the difference between tf.keras.backend.mean and tf.math.reduce_mean: There is none. You can check the source code for the keras backend version, which simply uses reduce_mean (from math_ops, but internally that's the same one that's exposed in tf.math). IMHO this is a bit of a failure in the TF re-design where they incorporated Keras: Keras is now contained in TF, but Keras also uses TF in the "backend", so you basically have every operation twice: Once the TF version, and once the Keras version which, after all, also just uses the TF version.
Anyway, for the difference between using Lambda or not: It also doesn't (really) matter. Here is a minimal example:
inp = tf.keras.Input((10,))
layer = tf.reduce_mean(inp, axis=-1)
model = tf.keras.Model(inp, layer)
print(model.layers)
gives the output
[<tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f1a651500b8>,
<tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer at 0x7f1a9912d8d0>]
We can see that the reduce_mean operation was automatically converted to a TensorFlowOpLayer. Now, this may be technically different from a Lambda layer, but I doubt that this makes any practical difference. I suppose this would not work for a Sequential model, where you need to supply a list of layers, so there Lambda would likely be needed.
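
To make the comparison concrete, here is a small sketch that applies both variants to a concrete tensor in eager mode and reduces over the length axis. The input values are made up for the example, and it assumes TensorFlow 2.x with eager execution; it does not build a functional model.

```python
import tensorflow as tf

# Toy input with shape [batch_size=2, length=4, dim=2] (made-up values).
x = tf.reshape(tf.range(16, dtype=tf.float32), (2, 4, 2))

# Direct call to the TensorFlow op.
direct = tf.math.reduce_mean(x, axis=1)

# Same op wrapped in a Lambda layer; the axis is fixed inside the lambda.
mean_layer = tf.keras.layers.Lambda(lambda t: tf.math.reduce_mean(t, axis=1))
wrapped = mean_layer(x)

direct_values = direct.numpy().tolist()
wrapped_values = wrapped.numpy().tolist()
```

Both results have shape [batch_size, dim] and hold the same values, which matches the observation that the choice is mostly a matter of style.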

How to update parameter at each epoch within an intermediate Layer between training runs ? (tensorflow eager execution)

I have a Sequential Keras model that contains a custom layer named 'CounterLayer', similar to the following example. I am using TensorFlow 2.0 (eager execution).
class CounterLayer(tf.keras.layers.Layer):
    def __init__(self, stateful=False, **kwargs):
        self.stateful = stateful
        super(CounterLayer, self).__init__(**kwargs)
    def build(self, input_shape):
        self.count = tf.keras.backend.variable(0, name="count")
        super(CounterLayer, self).build(input_shape)
    def call(self, input):
        updates = []
        updates.append((self.count, self.count+1))
        self.add_update(updates)
        tf.print('-------------')
        tf.print(self.count)
        return input
When I run this, for example with epochs=5, the value of self.count does not get updated with each run. It always remains the same. I got this example from https://stackoverflow.com/a/41710515/10645817. I need something very similar to this, but I am wondering whether it works with TensorFlow's eager execution, or what I would have to do to get the expected output.
I have been trying to implement this for quite a while but could not figure it out. Can somebody help me, please? Thank you.

Yes, my issue got resolved. I came across some of the built-in methods for updating this sort of variable (one that maintains persistent state between epochs, as in my case above).
Basically what i needed to do is for example:
def build(self, input_shape):
    self.count = tf.Variable(0, dtype=tf.float32, trainable=False)
    super(CounterLayer, self).build(input_shape)
def call(self, input):
    ............
    self.count.assign_add(1)
    ............
    return input
One can also calculate the updated value in the call function and assign it by calling self.count.assign(some_updated_value). The details of these operations are available at https://www.tensorflow.org/api_docs/python/tf/Variable. Thanks.
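
Put together, the approach might look like the sketch below. It assumes TensorFlow 2.x with eager execution; the layer is called directly on a dummy tensor instead of being trained inside a model, and the stateful argument and the tf.print calls from the question are left out to keep it short.

```python
import tensorflow as tf


class CounterLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        # Non-trainable state that persists across calls.
        self.count = tf.Variable(0.0, dtype=tf.float32, trainable=False)
        super().build(input_shape)

    def call(self, inputs):
        self.count.assign_add(1.0)  # in-place increment of the variable
        return inputs  # the layer passes its input through unchanged


layer = CounterLayer()
x = tf.zeros((2, 3))  # dummy input, only used to trigger build and call
for _ in range(3):
    layer(x)

final_count = float(layer.count.numpy())
```

Each direct call increments the counter once, so the variable keeps its state between calls instead of being reset.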

Using tf.contrib.opt.ScipyOptimizerInterface with tf.keras.layers, loss not changing

I want to use the external optimizer interface within TensorFlow in order to use Newton optimizers, as tf.train only has first-order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as it is much easier than using tf.Variables when building large, complex networks. I will show my issue with the following simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
sess.run(tf.global_variables_initializer())
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense(), it does work with the ScipyOptimizerInterface. So the real question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref, while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that the tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with Keras layers.
Thanks

After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to address this are somewhat simple.
Either avoid using components that use ResourceVariables. This can be kind of difficult.
Or patch ScipyOptimizerInterface to always do assignments for variables. This is relatively easy since all the required code is in one file.
There was some effort to make the interface work with eager execution (which by default uses ResourceVariables). Check out this link

I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format, output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feeding them to the optimizer. Try writing it in two lines, e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format then you might have to manually collect all trainables and feed them to the optimizer via the var_list option
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
Hope this helps.

Tensorflow placeholder in Keras custom objective function

I need to implement a custom objective function for Keras in which I need an additional TensorFlow placeholder for the computation. In TensorFlow, I have it as follows:
pre_cost1 = tf.multiply((self.input_R - self.Decoder) , self.input_mask_R)
cost1 = tf.square(self.l2_norm(pre_cost1))
where input_mask_R is the tensorflow placeholder. input_R and Decoder are the placeholders corresponding to y_true and y_pred for Keras loss function respectively. I have the Keras loss function implemented as,
def custom_objective(y_true, y_pred):
    pre_cost1 = tf.multiply((y_true - y_pred))
    cost1 = tf.square(l2_norm(pre_cost1))
    return cost1
I need to add the additional information for the input mask to the Keras loss function. (It needs to be a TensorFlow placeholder, since it's a mask for the input that is different for each row of the input data.)

Use the keras backend:
import keras.backend as K
Most functions for tensors are there, such as:
input_mask_R = K.placeholder(shape=(yourshape))
But maybe, since you want a predefined mask, what you need is:
input_mask_R = K.constant(arrayWithValues, shape=(yourshape))
And you can actually multiply and square also with K.multiply and K.square. That way, if you ever think of changing the backend, everything will be ok. (Also I'm not sure if Keras will handle direct calls to tensorflow functions.....)
See documentation: https://keras.io/backend/
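
As a rough sketch of the masked cost itself, the function below takes the mask as an explicit third argument and evaluates it on toy tensors in eager mode. Several things here are assumptions: the name masked_squared_l2 is made up, l2_norm from the question is taken to be the Euclidean norm (so its square is a sum of squares), and TensorFlow 2.x ops are called directly rather than through the Keras backend. How the per-row mask gets delivered to a Keras loss is not covered.

```python
import tensorflow as tf


def masked_squared_l2(y_true, y_pred, mask):
    # Zero out the masked-off entries of the error, then take the squared
    # L2 norm (sum of squares) of what remains.
    masked_error = tf.multiply(y_true - y_pred, mask)
    return tf.reduce_sum(tf.square(masked_error))


# Toy values (made up) to check the arithmetic.
y_true = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y_pred = tf.zeros((2, 2))
mask = tf.constant([[1.0, 0.0], [0.0, 1.0]])

cost = float(masked_squared_l2(y_true, y_pred, mask))
```

With this mask only the entries 1.0 and 4.0 contribute, so the cost is 1 + 16 = 17.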
