Gradient computation broken by Sigmoid function in PyTorch

Hey, I have been struggling with a weird problem. Here is my code for the neural net:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv_3d_ = nn.Sequential(
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU(),
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU(),
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU()
        )
        self.linear_layers_ = nn.Sequential(
            nn.Linear(batch_size * 32 * 32 * 32, batch_size * 32 * 32 * 3),
            nn.LeakyReLU(),
            nn.Linear(batch_size * 32 * 32 * 3, batch_size * 32 * 32 * 3),
            nn.Sigmoid()
        )

    def forward(self, x, y, z):
        conv_layer = x + y + z
        conv_layer = self.conv_3d_(conv_layer)
        conv_layer = torch.flatten(conv_layer)
        conv_layer = self.linear_layers_(conv_layer)
        conv_layer = conv_layer.view((batch_size, 3, input_sizes, input_sizes))
        return conv_layer
The weird problem I am facing is that running this network gives me the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3072]], which is output 0 of SigmoidBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
The stack trace shows that the issue is in line
conv_layer = self.linear_layers_(conv_layer)
However, if I replace the last activation function of my fully connected block from nn.Sigmoid() to nn.LeakyReLU(), the network executes properly.
Can anyone tell me why the Sigmoid activation function causes my backward computation to break?

I found the problem with my code. I delved deeper into what in-place actually meant. So, if you check the line
conv_layer = self.linear_layers_(conv_layer)
the linear_layers_ call in this assignment is changing the values of conv_layer in place; as a result the values get overwritten, and because of this the gradient computation fails. An easy solution for this problem is to use the clone() function,
i.e.
conv_layer = self.linear_layers_(conv_layer).clone()
This creates a copy of the right-hand-side result, so autograd can keep a valid reference to the tensors it saved in the computation graph.
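For anyone who wants to see the failure mode in isolation, here is a minimal sketch (my own toy example, not the network above) showing how an in-place update of a Sigmoid output breaks backward, and how clone() avoids it:

import torch

x = torch.randn(4, requires_grad=True)
y = torch.sigmoid(x)      # SigmoidBackward saves its output for the backward pass
y += 1                    # in-place update bumps y's version counter
# y.sum().backward()      # would raise: "... modified by an inplace operation"

x = torch.randn(4, requires_grad=True)
y = torch.sigmoid(x).clone()  # autograd keeps the untouched sigmoid output
y += 1                        # the clone is modified instead
y.sum().backward()            # works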

Related

Calculating gradient from network output in PyTorch gives error

I am trying to manually calculate a gradient using the output of my network, which I will then use in a loss function. I have managed to get an example working in Keras, but converting it to PyTorch has proven more difficult.
I have a model like:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 50)
        self.fc2 = nn.Linear(50, 10)
        self.fc3 = nn.Linear(10, 1)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x
and some data:
from torch.autograd import Variable  # deprecated in recent PyTorch; plain tensors work too

x = torch.unsqueeze(torch.linspace(-1, 1, 101), dim=1)
x = Variable(x, requires_grad=True)  # x must require grad to differentiate w.r.t. it
I can then try to find a gradient like:
output = net(x)
grad = torch.autograd.grad(outputs=output, inputs=x, retain_graph=True)[0]
I want to be able to find the gradient of each point, then do something like:
err_sqr = (grad - x)**2
loss = torch.mean(err_sqr)**2
However, at the moment if I try to do this I get the error:
grad can be implicitly created only for scalar outputs
I have tried changing the shape of my network output to fix this, but if I change it too much it says it's not part of the graph. I can get rid of that error by allowing unused inputs, but then it says my gradient is None. I've managed to get this working in Keras, so I'm confident it's possible here too; I just need a hand!
My question is: is there a way to "fix" what I have so that I can calculate the gradient?
PyTorch expects an upstream gradient in the grad call. For usual (scalar) loss functions, the upstream gradient is implicitly assumed to be 1.
You can do a similar thing by passing ones as the upstream gradient:
grad = torch.autograd.grad(outputs=output, inputs=x, grad_outputs=torch.ones_like(output), retain_graph=True)[0]
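Putting it together, a minimal end-to-end sketch (assuming the Net defined above; note that if grad itself feeds into the loss, you also need create_graph=True so the gradient computation is itself differentiable):

net = Net()  # the model defined in the question
x = torch.unsqueeze(torch.linspace(-1, 1, 101), dim=1)
x.requires_grad_(True)  # needed to differentiate with respect to x

output = net(x)
grad = torch.autograd.grad(outputs=output, inputs=x,
                           grad_outputs=torch.ones_like(output),
                           create_graph=True)[0]
err_sqr = (grad - x)**2
loss = torch.mean(err_sqr)
loss.backward()  # backpropagates through the gradient computation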

Starting training takes a very long time in TensorFlow 2

TensorFlow 2 takes about 15 minutes to build its static graph (or whatever it is doing before the first pass). The training time after that is normal, but it is obviously hard to experiment with 15 minutes of waiting before any feedback.
The generator encoder and discriminator are RNNs (not unrolled) with GRU cells in a Keras model.
The generator decoder is defined and called like this:
class GeneratorDecoder(tf.keras.layers.Layer):
    def __init__(self, feature_dim):
        super(GeneratorDecoder, self).__init__()
        self.cell = tf.keras.layers.GRUCell(
            GRUI_DIM, activation='tanh', recurrent_activation='sigmoid',
            dropout=DROPOUT, recurrent_dropout=DROPOUT)
        self.batch_normalization = tf.keras.layers.BatchNormalization()
        self.dense = tf.keras.layers.Dense(
            feature_dim, activation='tanh')

    @tf.function
    def __call__(self, z, timesteps, training):
        # z has shape (batch_size, features)
        outputs = []
        output, state = z, z
        for i in range(timesteps):
            output, state = self.cell(inputs=output, states=state,
                                      training=training)
            dense_output = self.dense(
                self.batch_normalization(output))
            outputs.append(dense_output)
        return outputs
Here is my training loop (the mask_gt and missing_data variables are cast using tf.cast, so they should already be tensors):
@tf.function
def train_step():
    with tf.GradientTape(persistent=True) as tape:
        generator_output = generator(missing_data, training=True)
        imputed_data = get_imputed_data(missing_data, generator_output)
        mask_pred = discriminator(imputed_data)
        D_loss = discriminator.loss(mask_pred, mask_gt)
        G_loss = generator.loss(missing_data, mask_gt,
                                generator_output, mask_pred)
    gen_enc_grad = tape.gradient(
        G_loss, generator.encoder.trainable_variables)
    gen_dec_grad = tape.gradient(
        G_loss, generator.decoder.trainable_variables)
    disc_grad = tape.gradient(
        D_loss, discriminator.model.trainable_variables)
    del tape
    generator.optimizer.apply_gradients(
        zip(gen_enc_grad, generator.encoder.trainable_variables))
    generator.optimizer.apply_gradients(
        zip(gen_dec_grad, generator.decoder.trainable_variables))
    discriminator.optimizer.apply_gradients(
        zip(disc_grad, discriminator.model.trainable_variables))

for it in tqdm(range(NO_ITERATIONS)):
    print(it)
    train_step()
Note that "0" is printed within a few seconds, so the slow part is definitely not earlier.
And this is the get_imputed_data function that is called:
def get_imputed_data(incomplete_series, generator_output):
    return tf.where(tf.math.is_nan(incomplete_series), generator_output, incomplete_series)
Thanks for any answers! Hope I provided just enough code to give a sense of where the problem lies. This is my first time posting here after reading for at least five years :)
I am using Python 3.6 and TensorFlow 2.1.
The problem was solved by removing the tf.function decorator from the functions that call the generator and discriminator. I was using a single global Python scalar (the iteration number) in two of the tf.function-decorated functions. This caused a new graph to be created every time (see the caution in the tf.function docs).
The solution is to drop the Python variables or convert them to TensorFlow tensors or variables.
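As a standalone illustration of the retracing behaviour described in the docs (my own toy example, not the training code above): each new Python value passed to a tf.function triggers a fresh trace, while a tensor argument reuses a single graph.

import tensorflow as tf

@tf.function
def step(iteration):
    tf.print("step", iteration)

# A fresh Python int is a new input signature, so the function is
# re-traced (a new graph is built) on every call.
for it in range(3):
    step(it)

# A tensor argument reuses the single traced graph.
for it in range(3):
    step(tf.constant(it))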

'None' gradient is being returned for variables

I am currently using TensorFlow version 1.14.
In the code below, I am trying to create a dummy model that takes in 2 inputs and produces 2 outputs, with all weights set to ones and biases to zeros (a single-layer perceptron). I am defining a custom loss function that computes the Jacobian of the output layer with respect to the input layer.
# Prior function
def f_i(x):
    x1 = np.arctanh(x)
    return np.exp(-x1**2)

# x is assumed to be defined earlier as a 1-D sample grid
B = np.random.choice(x, (10000, 2), p=f_i(x)/np.sum(f_i(x)))

def my_loss(y_pred, y_true):
    jacobian_tf = jacobian_tensorflow3(sim.output, sim.input)
    loss = tf.abs(tf.linalg.det(jacobian_tf))
    return K.mean(loss)

def jacobian_tensorflow3(x, y, verbose=False):
    jacobian_matrix = []
    it = tqdm(range(ndim)) if verbose else range(ndim)  # ndim is defined elsewhere
    for o in it:
        grad_func = tf.gradients(x[:, o], y)
        jacobian_matrix.append(grad_func[0])
    jacobian_matrix = tf.stack(jacobian_matrix)
    jacobian_matrix1 = tf.transpose(jacobian_matrix, perm=[1, 0, 2])
    return jacobian_matrix1

sim = Sequential()
sim.add(Dense(2, kernel_initializer='ones', bias_initializer='zeros', activation='linear', input_dim=2))
sim.compile(optimizer='adam', loss=my_loss)
sim.fit(B, np.random.random(B.shape), batch_size=100, epochs=2)
This model computes the Jacobian matrix correctly and compiles without issues, but when I run sim.fit I get the following error:
ValueError: Variable <tf.Variable 'dense_14/bias:0' shape=(2,) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I have been stuck at this step for a long time and am not able to proceed. Any help/suggestions would be appreciated.

Differentiating user-defined Variables when using Keras layers

I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf

x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x * s
model = tf.keras.models.Model(input_shape, x)

X = tf.random.normal((50, 10))  # random sample

with tf.GradientTape() as tape:
    tape.watch(s)
    y = model(X)
    loss = y**2

print(tape.gradient(loss, s))  # why None ??
The print prints None... why?
Notice that I am using eager execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))

    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works.
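For reference, a sketch of that custom-layer variant (ScaleLayer is my own hypothetical name): creating the variable with add_weight inside the layer makes it a tracked trainable weight of the model.

import tensorflow as tf

class ScaleLayer(tf.keras.layers.Layer):
    """Hypothetical layer that owns the trainable scale variable."""
    def build(self, input_shape):
        self.s = self.add_weight(name="s", shape=(input_shape[-1],),
                                 initializer="ones", trainable=True)

    def call(self, inputs):
        return inputs * self.s

inputs = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(inputs)
x = ScaleLayer()(x)
model = tf.keras.models.Model(inputs, x)

X = tf.random.normal((50, 10))
with tf.GradientTape() as tape:
    loss = model(X) ** 2
print(tape.gradient(loss, model.layers[-1].s))  # no longer None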
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.
This might be the explanation. Based on the documentation, I suspect the issue is that differentiating with respect to "s" (or any other tensor that is not a model weight, say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model's weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is my speculation at this point; I hope this helps.

"ValueError: Trying to share variable $var, but specified dtype float32 and found dtype float64_ref" when trying to use get_variable

I am trying to build a custom variational autoencoder network, in which I initialize the decoder weights using the transpose of the weights from the encoder layer. I couldn't find anything native to tf.contrib.layers.fully_connected, so I used tf.assign instead. Here's my code for the layers:
def inference_network(inputs, hidden_units, n_outputs):
    """Layer definition for the encoder layer."""
    net = inputs
    with tf.variable_scope('inference_network', reuse=tf.AUTO_REUSE):
        for layer_idx, hidden_dim in enumerate(hidden_units):
            net = layers.fully_connected(
                net,
                num_outputs=hidden_dim,
                weights_regularizer=layers.l2_regularizer(training_params.weight_decay),
                scope='inf_layer_{}'.format(layer_idx))
            add_layer_summary(net)
        z_mean = layers.fully_connected(net, num_outputs=n_outputs, activation_fn=None)
        z_log_sigma = layers.fully_connected(
            net, num_outputs=n_outputs, activation_fn=None)
    return z_mean, z_log_sigma

def generation_network(inputs, decoder_units, n_x):
    """Define the decoder network."""
    net = inputs  # inputs here is the latent representation.
    with tf.variable_scope("generation_network", reuse=tf.AUTO_REUSE):
        assert len(decoder_units) >= 2
        # First layer does not have a regularizer
        net = layers.fully_connected(
            net,
            decoder_units[0],
            scope="gen_layer_0",
        )
        for idx, decoder_unit in enumerate([decoder_units[1], n_x], 1):
            net = layers.fully_connected(
                net,
                decoder_unit,
                scope="gen_layer_{}".format(idx),
                weights_regularizer=layers.l2_regularizer(training_params.weight_decay)
            )
    # Assign the transpose of weights to the respective layers
    tf.assign(tf.get_variable("generation_network/gen_layer_1/weights"),
              tf.transpose(tf.get_variable("inference_network/inf_layer_1/weights")))
    tf.assign(tf.get_variable("generation_network/gen_layer_1/bias"),
              tf.get_variable("generation_network/inf_layer_0/bias"))
    tf.assign(tf.get_variable("generation_network/gen_layer_2/weights"),
              tf.transpose(tf.get_variable("inference_network/inf_layer_0/weights")))
    return net  # x_recon
It is wrapped using this tf.slim arg_scope:
def _autoencoder_arg_scope(activation_fn):
    """Create an argument scope for the network based on its parameters."""
    with slim.arg_scope([layers.fully_connected],
                        weights_initializer=layers.xavier_initializer(),
                        biases_initializer=tf.initializers.constant(0.0),
                        activation_fn=activation_fn) as arg_sc:
        return arg_sc
However, I'm getting the error: ValueError: Trying to share variable VarAutoEnc/generation_network/gen_layer_1/weights, but specified dtype float32 and found dtype float64_ref.
I have narrowed this down to the get_variable call, but I don't know why it's failing.
If there is a way to initialize a tf.contrib.layers.fully_connected layer from another fully connected layer without a tf.assign operation, that solution is fine with me.
I can't reproduce your error. Here is a minimalistic runnable example that does the same as your code:
import tensorflow as tf

with tf.contrib.slim.arg_scope([tf.contrib.layers.fully_connected],
                               weights_initializer=tf.contrib.layers.xavier_initializer(),
                               biases_initializer=tf.initializers.constant(0.0)):
    i = tf.placeholder(tf.float32, [1, 30])
    with tf.variable_scope("inference_network", reuse=tf.AUTO_REUSE):
        tf.contrib.layers.fully_connected(i, 30, scope="gen_layer_0")
    with tf.variable_scope("generation_network", reuse=tf.AUTO_REUSE):
        tf.contrib.layers.fully_connected(i, 30, scope="gen_layer_0",
                                          weights_regularizer=tf.contrib.layers.l2_regularizer(0.01))

with tf.variable_scope("", reuse=tf.AUTO_REUSE):
    tf.assign(tf.get_variable("generation_network/gen_layer_0/weights"),
              tf.transpose(tf.get_variable("inference_network/gen_layer_0/weights")))
The code runs without a ValueError. If you get a ValueError running this, then it is probably a bug that has been fixed in a later TensorFlow version (I tested on 1.9). Otherwise the error comes from a part of your code that you haven't shown in the question.
By the way, assign returns an op that performs the assignment only once that op is run in a session. So you will want to return the output of all the assign calls from the generation_network function. You can bundle all the assign ops into one using tf.group, as in the sketch below.
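A minimal runnable sketch of that pattern (TF 1.x; the variable names here are my own dummies, not the ones from the question):

import tensorflow as tf

w_enc = tf.get_variable("enc/weights", shape=(3, 5))
w_dec = tf.get_variable("dec/weights", shape=(5, 3))

# assign only builds an op; nothing happens until it is run in a session.
tie_weights = tf.group(tf.assign(w_dec, tf.transpose(w_enc)))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tie_weights)  # the assignment actually executes here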
