Say I have some custom operation binarizer used in a neural network. The operation takes a Tensor and constructs a new Tensor. I would like to modify that operation such that it is only used in the forward pass. In the backward pass, when gradients are calculated, it should just pass through the gradients reaching it.
More concretely, say binarizer is:
def binarizer(input):
    prob = tf.truediv(tf.add(1.0, input), 2.0)
    bernoulli = tf.contrib.distributions.Bernoulli(p=prob, dtype=tf.float32)
    return 2 * bernoulli.sample() - 1
and I set up my network:
# ...
h1_before_b = tf.nn.tanh(tf.matmul(x, W) + bias_h1)
h1 = binarizer(h1_before_b)
# ...
loss = tf.reduce_mean(tf.square(y - y_true))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
How do I tell TensorFlow to skip gradient calculation in the backward pass?
I tried defining a custom operation as described in this answer. However, py_func cannot return Tensors (that is not what it is made for), and I get:
UnimplementedError (see above for traceback): Unsupported object type Tensor
You're looking for tf.stop_gradient(input, name=None):
Stops gradient computation.
When executed in a graph, this op outputs its input tensor as-is.
h1 = binarizer(h1_before_b)
h1 = tf.stop_gradient(h1)
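Note that tf.stop_gradient applied to the output blocks all gradients flowing back past that point. If instead you want the backward pass to treat binarizer as the identity, so that gradients pass through unchanged (a straight-through estimator), a common pattern, sketched here with the variable names from the question, is to add and subtract the input around the stopped gradient:

h1_before_b = tf.nn.tanh(tf.matmul(x, W) + bias_h1)
# Forward: h1 equals binarizer(h1_before_b); backward: gradient of the identity.
h1 = h1_before_b + tf.stop_gradient(binarizer(h1_before_b) - h1_before_b)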
I'm trying to combine a few "networks" into one final loss function. I'm wondering if what I'm doing is "legal"; as of now I can't seem to make this work. I'm using TensorFlow Probability.
The main problem is here:
# Get gradients of the loss wrt the weights.
gradients = tape.gradient(loss, [m_phis.trainable_weights, m_mus.trainable_weights, m_sigmas.trainable_weights])
# Update the weights of our linear layer.
optimizer.apply_gradients(zip(gradients, [m_phis.trainable_weights, m_mus.trainable_weights, m_sigmas.trainable_weights]))
This gives me None gradients and throws on apply_gradients:
AttributeError: 'list' object has no attribute 'device'
Full code:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras

# phis_true, mus_true, sigmas_true, n_samples, random_seed are defined elsewhere
univariate_gmm = tfp.distributions.MixtureSameFamily(
    mixture_distribution=tfp.distributions.Categorical(probs=phis_true),
    components_distribution=tfp.distributions.Normal(loc=mus_true, scale=sigmas_true)
)

x = univariate_gmm.sample(n_samples, seed=random_seed).numpy()
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.shuffle(buffer_size=1024).batch(64)
m_phis = keras.layers.Dense(2, activation=tf.nn.softmax)
m_mus = keras.layers.Dense(2)
m_sigmas = keras.layers.Dense(2, activation=tf.nn.softplus)
def neg_log_likelihood(y, phis, mus, sigmas):
    a = tfp.distributions.Normal(loc=mus[0], scale=sigmas[0]).prob(y)
    b = tfp.distributions.Normal(loc=mus[1], scale=sigmas[1]).prob(y)
    c = np.log(phis[0]*a + phis[1]*b)
    return tf.reduce_sum(-c, axis=-1)
# Use the negative log-likelihood as the loss function.
loss_fn = neg_log_likelihood
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
# Iterate over the batches of the dataset.
for step, y in enumerate(dataset):
    yy = np.expand_dims(y, axis=1)
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        phis = m_phis(yy)
        mus = m_mus(yy)
        sigmas = m_sigmas(yy)
        # Loss value for this batch.
        loss = loss_fn(yy, phis, mus, sigmas)
    # Get gradients of the loss wrt the weights.
    gradients = tape.gradient(loss, [m_phis.trainable_weights, m_mus.trainable_weights, m_sigmas.trainable_weights])
    # Update the weights of our linear layer.
    optimizer.apply_gradients(zip(gradients, [m_phis.trainable_weights, m_mus.trainable_weights, m_sigmas.trainable_weights]))
    # Logging.
    if step % 100 == 0:
        print("Step:", step, "Loss:", float(loss))
There are two separate problems to take into account.
1. Gradients are None:
Typically this happens if non-TensorFlow operations are executed in code watched by the GradientTape. Concretely, this concerns the np.log call in your neg_log_likelihood function. If you replace np.log with tf.math.log, the gradients should compute. It is a good habit to avoid numpy in your "internal" TensorFlow components, since this prevents errors like this; for most numpy operations there is a good TensorFlow substitute.
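Applied to the function from the question, the fix is a one-line change:

def neg_log_likelihood(y, phis, mus, sigmas):
    a = tfp.distributions.Normal(loc=mus[0], scale=sigmas[0]).prob(y)
    b = tfp.distributions.Normal(loc=mus[1], scale=sigmas[1]).prob(y)
    c = tf.math.log(phis[0]*a + phis[1]*b)  # tf.math.log instead of np.log
    return tf.reduce_sum(-c, axis=-1)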
2. apply_gradients for multiple trainables:
This mainly has to do with the input format that apply_gradients expects. You have two options:
First option: Call apply_gradients three times, each time with different trainables
optimizer.apply_gradients(zip(m_phis_gradients, m_phis.trainable_weights))
optimizer.apply_gradients(zip(m_mus_gradients, m_mus.trainable_weights))
optimizer.apply_gradients(zip(m_sigmas_gradients, m_sigmas.trainable_weights))
The alternative would be to create a single list of tuples, as indicated in the TensorFlow documentation (quote: "grads_and_vars: List of (gradient, variable) pairs."). Note that apply_gradients expects one flat list of (gradient, variable) pairs, so the three groups have to be spliced together. This would mean calling something like
optimizer.apply_gradients(
    [
        *zip(m_phis_gradients, m_phis.trainable_weights),
        *zip(m_mus_gradients, m_mus.trainable_weights),
        *zip(m_sigmas_gradients, m_sigmas.trainable_weights),
    ]
)
Both options require you to split the gradients. You can either do that by computing the gradients once and indexing the result (gradients[0], ...), or you can compute the gradients separately. Note that the latter requires persistent=True in your GradientTape.
# [...]
# Open a GradientTape.
with tf.GradientTape(persistent=True) as tape:
    # Forward pass.
    phis = m_phis(yy)
    mus = m_mus(yy)
    sigmas = m_sigmas(yy)
    # Loss value for this batch.
    loss = loss_fn(yy, phis, mus, sigmas)
# Get gradients of the loss wrt the weights.
m_phis_gradients = tape.gradient(loss, m_phis.trainable_weights)
m_mus_gradients = tape.gradient(loss, m_mus.trainable_weights)
m_sigmas_gradients = tape.gradient(loss, m_sigmas.trainable_weights)
# Update the weights of our linear layer.
optimizer.apply_gradients(
    [
        *zip(m_phis_gradients, m_phis.trainable_weights),
        *zip(m_mus_gradients, m_mus.trainable_weights),
        *zip(m_sigmas_gradients, m_sigmas.trainable_weights),
    ]
)
# [...]
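A small practical note: a persistent tape holds on to its intermediate resources, so it can be worth dropping the reference (del tape) once all gradients have been computed.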
I am trying to write a custom loss function for triplet loss (using Keras), which takes three arguments: anchor, positive, and negative. The triplets are generated using a GRU layer, and the arguments for model.fit are provided through data generators.
The problem I am facing while training is:
TypeError: Cannot convert a symbolic Keras input/output to a numpy array.
This error may indicate that you're trying to pass a symbolic value to a NumPy
call, which is not supported. Or, you may be trying to pass Keras symbolic
inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically
converting the API call to a lambda layer in the Functional Model.
Implementation of the loss function:
def batch_hard_triplet_loss(self, anchor_embeddings, pos_embeddings, neg_embeddings, margin):
    def loss(y_true, y_pred):
        '''print(anchor_embeddings)
        print(pos_embeddings)
        print(neg_embeddings)'''
        # distance between the anchor and the positive
        pos_dist = K.sum(K.square(anchor_embeddings - pos_embeddings), axis=-1)
        max_pos_dist = K.max(pos_dist)
        # distance between the anchor and the negative
        neg_dist = K.sum(K.square(anchor_embeddings - neg_embeddings), axis=-1)
        max_neg_dist = K.min(neg_dist)
        # compute loss
        basic_loss = max_pos_dist - max_neg_dist + margin
        tr_loss = K.maximum(basic_loss, 0.0)
        return tr_loss
    #return triplet_loss
    return loss
Could this be because Keras expects an array as the returned loss while I am providing a scalar value?
I am creating a neural network and having a problem with recomputing gradients. The problem is that I take the scalar (dot) product of two tensors, u @ v, and normalize one of them. It is important that gradients are not calculated for h, so I use detach(). In addition, the normalization should not be taken into account when gradients are computed (I do not know how to do this).
import torch
from torch import nn

class Nn(nn.Module):
    def __init__(self):
        super(Nn, self).__init__()
        self.ln = nn.Linear(5, 5)

    def forward(self, x):
        v = self.ln(x)
        u = v.clone()
        h = v.clone()
        u /= u.norm()
        h = h.detach()
        h /= h.norm()
        res = torch.stack([torch.stack([u @ h, u @ h])])
        return res

def patches_generator():
    while True:
        decoder = torch.rand((5, ))
        target = torch.randint(2, (1,))
        yield decoder, target

net = Nn()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())

net.train()
torch.autograd.set_detect_anomaly(True)
for decoder, targets in patches_generator():
    optimizer.zero_grad()
    outputs = net(decoder)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
As a result, I get the following error:
RuntimeError: one of the variables needed for gradient computation has
been modified by an inplace operation: [torch.FloatTensor [9, 512, 1,
1]], which is output 0 of ReluBackward1, is at version 3; expected
version 2 instead. Hint: the backtrace further above shows the
operation that failed to compute its gradient. The variable in
question was changed in there or anywhere later. Good luck!
The problem is the in-place division operator applied to u in this line:
u /= u.norm()
changing it to
u = u / u.norm()
makes the code run. The reason is that the in-place operator overwrites the intermediate result from this line
u = v.clone()
which makes it impossible for PyTorch to compute the gradient.
(The error message in the question contains a reference to a ReluBackward1 layer which is not in the reduced code example. PyTorch ReLU layers have an optional inplace argument which makes the operation in-place while supporting backprop. This often works, because in a sequential network there is no need to distinguish between the output of the ReLU activation and the output of the weights to compute the gradient, but in more complex architectures it might be necessary to retain the output of the weights.)
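Putting the fix into the model from the question, the forward method becomes (a sketch; only the division on u strictly needs to be out-of-place, since h is detached):

def forward(self, x):
    v = self.ln(x)
    u = v.clone()
    h = v.clone().detach()
    u = u / u.norm()  # out-of-place keeps the intermediate needed for backward
    h = h / h.norm()  # h is detached, so no gradient flows through it anyway
    res = torch.stack([torch.stack([u @ h, u @ h])])
    return res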
I am creating a custom loss function to use with Keras in a CNN architecture for segmentation. The loss should be a binary cross-entropy loss with each pixel weighted by its distance to the boundary of the ground-truth (GT) foreground.
This distance is easily calculated with the scipy function scipy.ndimage.morphology.distance_transform_edt, but this function requires a numpy array as input. For the loss function I only have y_true and y_pred, which are both tensors.
I have tried converting y_true to a numpy array using np_y_true = y_true.eval(), but I get the following error ('conv3d_19_target' is the name of the placeholder for y_true; its shape is unknown to the program at this stage, though it is always (1,64,64,64,2)):
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'conv3d_19_target' with dtype float and shape [?,?,?,?,?]
I have also tried np_y_true = y_true.numpy(), with the following result:
AttributeError: 'Tensor' object has no attribute 'numpy'
I believe there are two issues:
y_true is just a placeholder, and is therefore unknown when the loss function is first read.
Keras/TensorFlow believes that the gradient should pass through all parts that depend on y_true. This is, however, not necessary here, as the distance field is just a weight to be calculated at each pass.
A first attempt at my loss function:
def DFweighted_entropy():
    def weighted_loss(y_true, y_pred):
        np_ytrue = y_true.numpy() #OR
        #np_y_true = K.eval(y_true)

        #Calculate distance-field:
        df_inside = distance_transform_edt(np_ytrue[:,:,:,1]) #Background
        df_outside = distance_transform_edt(np_ytrue[:,:,:,0]) #Foreground
        np_df = np_ytrue[:,:,:,1]*df_inside + np_ytrue[:,:,:,0]*df_outside #Combined

        #Loss:
        df_loss = (K.max(y_pred,0) - y_pred * y_true + K.log(1 + K.exp((-1)*K.abs(y_pred))))*np_df
        return df_loss
    return weighted_loss
The loss function is used when the model is compiled:
model.compile(optimizer=keras.optimizers.Adam(lr=1e-4,beta_1=0.9, beta_2=0.999, epsilon=1e-08,decay=0.0),loss = DFweighted_entropy(), metrics=['acc',dice_coefficient])
Any ideas for solutions are appreciated!
I am trying to implement the "feed-forward convolutional/deconvolutional residual encoder" as described in this paper: https://arxiv.org/abs/1511.06085
In the network architecture they use a binarization layer, where they first use a standard fully-connected layer with tanh-activation to produce a vector with components in the continuous interval [-1,1]. Then they probabilistically map each component to either -1 or 1. The problem now is that backpropagation is not trivially applied for this second step. After some reasoning they say that they pass the gradients through this stage unchanged.
Now my question is, how can I implement this in TensorFlow? Is there a simple way to define custom gradients for an operation? A simple example would be much appreciated.
EDIT:
Would the following code do what I want?
import tensorflow as tf
from tensorflow.python.framework import ops

def binarization(x):
    g = tf.get_default_graph()
    with ops.name_scope("Binarization") as name:
        with g.gradient_override_map({"Ceil": "Identity",
                                      "Sub": "CustomGrad",
                                      "Div": "CustomGrad",
                                      "Add": "CustomGrad",
                                      "Mul": "CustomGrad"}):
            scaled_x = (x + 1) / 2
            binary_x = tf.ceil(scaled_x - tf.random_uniform(tf.shape(x)), name=name)
            return (binary_x * 2) - 1
@ops.RegisterGradient("CustomGrad")
def customGrad(op, grad):
    return [grad, tf.zeros(tf.shape(op.inputs[1]))]
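As an aside, newer TensorFlow versions provide tf.custom_gradient, which expresses the same idea more directly. A minimal sketch of a binarization op with a pass-through (identity) gradient, under the same probabilistic mapping as above:

import tensorflow as tf

@tf.custom_gradient
def binarization(x):
    # Forward: map each component to -1 or 1 with probability (x + 1) / 2.
    prob = (x + 1.0) / 2.0
    sample = tf.cast(tf.random.uniform(tf.shape(x)) < prob, x.dtype)
    binary_x = sample * 2.0 - 1.0
    def grad(dy):
        return dy  # backward: pass the incoming gradient through unchanged
    return binary_x, grad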