Is it possible to use a Keras layer (pre-trained or fixed layer with no trainable parameters) inside a custom loss function?
I would like to do something like:
def custom_loss(y_true, y_pred):
    y_true_trans = SomeKerasLayer()(y_true)
    y_pred_trans = SomeKerasLayer()(y_pred)
    return K.mean(K.abs(y_pred_trans - y_true_trans), axis=-1)
With the TensorFlow backend, I get this error:
File "/home/drb/venvs/keras/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 364, in make_tensor_proto
raise ValueError("None values not supported.")
ValueError: None values not supported.
Of course I could transform y_pred with the Keras layer outside the loss function (by providing an extra output), but I can't do the same with the reference value y_true.
Another way to rephrase the same question in more general terms would be: Is it possible to encapsulate a Keras layer as a Keras backend function?
Is there any solution or workaround?
The question is somewhat vague, so the answer is both yes and no.
Depending on your implementation, you may try something like
output = keras.layers.Add()([x, y])
where x and y are the preceding relevant tensors (Add is instantiated with no arguments and then called on a list of tensors).
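More directly to the loss-function case: a workaround that often works is to build the fixed transformation once, outside the loss, and call it on both tensors inside the loss. Below is a minimal sketch, assuming Keras 2 with the TensorFlow 1.x backend; Dense is a hypothetical stand-in for SomeKerasLayer and all shapes are illustrative:
import numpy as np
import keras.backend as K
from keras.models import Model, Sequential
from keras.layers import Dense, Input

# Fixed transformation wrapped as a small frozen model, created once
# so that its weights exist before compile and are shared by both calls.
t_in = Input(shape=(4,))
t_out = Dense(4, trainable=False)(t_in)
transform = Model(t_in, t_out)

def custom_loss(y_true, y_pred):
    # Call the frozen model on both the reference and the prediction.
    y_true_trans = transform(y_true)
    y_pred_trans = transform(y_pred)
    return K.mean(K.abs(y_pred_trans - y_true_trans), axis=-1)

model = Sequential([Dense(4, input_shape=(8,))])
model.compile(optimizer='adam', loss=custom_loss)
model.fit(np.random.rand(32, 8), np.random.rand(32, 4), epochs=1, verbose=0)
This mirrors the common "perceptual loss" pattern, where a frozen sub-model is applied to both y_true and y_pred inside the loss.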
Related
Is there any best practice or efficient way to have a random classifier in PyTorch? My random classifier basically looks like this:
def forward(self, inputs):
    # get a random tensor
    logits = torch.rand(batch_size, num_targets, num_classes)
    return logits
This should be fine in principle, but the optimizer raises a ValueError because the classifier, in contrast to all the other classifiers/models in the system, obviously has no parameters that can be optimized. Is there a built-in torch solution to this, or must I change the system's architecture (to not perform optimization)?
Edit: If I add some arbitrary parameters to the model as shown below, the loss raises a RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
def __init__(self, transformer_models: Dict, opt: Namespace):
    super(RandomMulti, self).__init__()
    self.num_classes = opt.polarities_dim
    # add some parameters so that the optimizer doesn't raise an exception
    self.some_params = nn.Linear(2, 2)
My assumption really would be that there is a simpler solution, since having a random baseline classifier is a rather common thing in machine learning.
Indeed, having a "random" baseline is common practice, but usually you do not need to explicitly generate one, let alone "train" it. In most cases you can derive quite accurate expected values for the "random" baseline analytically. For instance, ImageNet classification has 1000 categories of roughly equal size, so predicting a category uniformly at random should give an expected accuracy of 1/1000. You do not need to instantiate a random classifier to produce that number.
If you insist on explicitly instantiating a random classifier, what would it mean to "train" it? That is where your errors come from: PyTorch simply cannot understand what you are asking it to do. You can have a random classifier and you can evaluate its performance, but there is no meaning to training it.
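If you do want a concrete random classifier for evaluation, one way to sidestep the optimizer entirely is to keep the module out of the training loop and only ever run it for evaluation. A minimal sketch, with all names and shapes illustrative:
import torch
import torch.nn as nn

class RandomClassifier(nn.Module):
    """Parameter-free baseline: uniform random logits, evaluation only."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.num_classes = num_classes

    @torch.no_grad()
    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        return torch.rand(inputs.shape[0], self.num_classes)

model = RandomClassifier(num_classes=3)
logits = model(torch.zeros(8, 16))  # no optimizer, no backward pass
preds = logits.argmax(dim=-1)       # compare against labels to get accuracy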
I am trying to convert this TensorFlow code into PyTorch. For example, I converted the below TF code
tf.get_variable("char_embeddings", [len(data.char_dict), data.char_embedding_size]), char_index) # [num_sentences, max_sentence_length, max_word_length, emb]
into
class CharEmbeddings(nn.Module):
    def __init__(self, config, data):
        ....
        self.embeddings = nn.init.xavier_uniform_(torch.empty(len(data.char_dict), data.char_embedding_size))

    def forward(self, char_index):
        # [num_sentences, max_sentence_length, max_word_length, emb]
        char_emb = self.embeddings[char_index]
        return char_emb
I don't understand 100% what TF is doing there. Is it supposed to first initialize char_embeddings, gather (which I understand), and then backpropagate gradients to update the char_embeddings values, so that in the next iteration char_embeddings will be updated?
If so, I tried to convert that into PyTorch. From what I read, if no initializer is passed to get_variable here, the glorot_uniform_initializer will be used, which I think is equivalent to PyTorch's xavier_uniform_.
A few questions here:
Is my interpretation of TF code correct?
Is that conversion valid?
Should I expect the original embeddings self.embeddings to receive gradients and update their values? Is that the expected behavior of the TensorFlow version as well, and how do I achieve it in PyTorch? I added requires_grad to the embeddings tensor, but the values still don't update.
These might be newbie questions, but I am new to this. Thanks!
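For reference, the usual way to make such a lookup table trainable in PyTorch is to register it as an nn.Parameter; a plain tensor with requires_grad=True is not returned by module.parameters() and so never reaches the optimizer. A minimal sketch, with the constructor arguments simplified from the code above:
import torch
import torch.nn as nn

class CharEmbeddings(nn.Module):
    def __init__(self, vocab_size: int, emb_size: int):
        super().__init__()
        # nn.Parameter registers the table with the module, so it shows up
        # in module.parameters() and the optimizer updates it after backward().
        self.embeddings = nn.Parameter(
            nn.init.xavier_uniform_(torch.empty(vocab_size, emb_size)))

    def forward(self, char_index: torch.Tensor) -> torch.Tensor:
        # Plain indexing is the PyTorch analogue of tf.gather.
        return self.embeddings[char_index]
(nn.Embedding packages the same table-plus-lookup idea, if you prefer a built-in.)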
I'm reading the Object Detection API source code and I wonder how TF-Slim is used to train the model.
More specifically, when we train a model with plain TensorFlow, we use something like this:
parameters = model(X_train, Y_train, X_test, Y_test)
# Returns: parameters -- parameters learnt by the model.
# They can then be used to predict.
And to predict the result, we use something like:
y_image_prediction = predict(my_image, parameters)
But in the file trainer.py, there is nothing like that; we only get:
slim.learning.train(
    train_tensor,
    logdir=train_dir,
    master=master,
    is_chief=is_chief,
    session_config=session_config,
    startup_delay_steps=train_config.startup_delay_steps,
    init_fn=init_fn,
    summary_op=summary_op,
    number_of_steps=(
        train_config.num_steps if train_config.num_steps else None),
    save_summaries_secs=120,
    sync_optimizer=sync_optimizer,
    saver=saver)
And this slim.learning.train function returns nothing. So I wonder what the slim.learning.train function is for, and how we get the parameters that can be used to predict the result.
HERE is the source code of trainer.py.
The train function does not return a value because it modifies the actual parameters of the model in place. It does that by running train_tensor, which is "a Tensor that, when executed, will apply the gradients and return the loss value," as written in the function documentation.
The tensor the documentation talks about is what you get when you tell an optimizer to minimize some cost function. It is opt_op in the following example:
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op = opt.minimize(cost)
Find more in the optimizer documentation.
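To predict afterwards, you typically rebuild the inference graph and restore the variables that slim.learning.train checkpointed into its logdir. A rough sketch of that second step, using the TF 1.x API (the checkpoint path and prediction_op are illustrative):
import tensorflow as tf

# Assumes the model graph (including some prediction op) has been rebuilt
# in this process; 'train_dir' is the logdir passed to slim.learning.train.
saver = tf.train.Saver()
with tf.Session() as sess:
    # Load the trained variable values from the latest checkpoint.
    saver.restore(sess, tf.train.latest_checkpoint('train_dir'))
    # predictions = sess.run(prediction_op, feed_dict={...})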
I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for the functional API, but what is the difference between the three? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen code that does the following. Why must there be the ._keras_shape line after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seem to be two variants of everything, for example Add() and add(), and so on.
First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the right choice unless you're going deep into customization. (And it was a bad choice in your example code -- see details at the end.)
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K
class Concatenate(_Merge):
#blablabla
def _merge_function(self, inputs):
return K.concatenate(inputs, axis=self.axis)
#blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As with any other Keras layer, you instantiate it and call it on tensors.
Pretty straightforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
    #blablabla
    return Concatenate(axis=axis, **kwargs)(inputs)
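In use it reads like a plain function call; a quick sketch (shapes illustrative):
from keras.layers import Input, concatenate

t1 = Input(shape=(4,))
t2 = Input(shape=(6,))

# Builds a Concatenate layer internally and returns its output tensor.
merged = concatenate([t1, t2], axis=-1)  # shape (None, 10)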
Older functions
In Keras 1, there were functions meant to receive "layers" as input and return an output "layer"; their names were related to the word merge.
But Keras 2 no longer mentions or documents these, so I'd avoid using them, and if I found old code relying on them, I'd update it to proper Keras 2 code.
Why the _keras_shape line?
This backend function was not meant to be used in high-level code. The coder should have used a Concatenate layer:
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property to infer the shapes of the entire model.
If you use any backend function "outside" a layer or a loss/metric, your output tensor will lack this property, and an error will complain that _keras_shape doesn't exist.
The coder created a bad workaround by adding the property manually, when it should have been added by a proper Keras layer. (This may work now, but if Keras is updated, this code will break while proper code keeps working.)
Keras historically supports two different interfaces for its layers: the newer functional one and the old one that requires model.add() calls; hence the two different functions, as contrasted in the sketch below.
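A quick contrast of the two spellings, using Add as the example (shapes illustrative):
from keras.layers import Input, Add, add

a = Input(shape=(4,))
b = Input(shape=(4,))

summed_via_layer = Add()([a, b])  # instantiate the layer class, then call it
summed_via_func = add([a, b])     # wrapper doing both steps in one call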
As for the TF backend function: concatenate() alone does not do everything Keras needs, hence the additional line that sets the ._keras_shape variable manually so as not to upset Keras, which expects that variable to hold a particular value.
I need to implement a custom objective function for Keras, and I need an additional TensorFlow placeholder for the computation. In TensorFlow, I have it as follows:
pre_cost1 = tf.multiply((self.input_R - self.Decoder) , self.input_mask_R)
cost1 = tf.square(self.l2_norm(pre_cost1))
where input_mask_R is the TensorFlow placeholder, and input_R and Decoder are the placeholders corresponding to y_true and y_pred of the Keras loss function, respectively. I have the Keras loss function implemented as:
def custom_objective(y_true, y_pred):
    pre_cost1 = tf.multiply((y_true - y_pred))  # missing the mask: tf.multiply needs a second argument
    cost1 = tf.square(l2_norm(pre_cost1))
    return cost1
I need to pass this additional input-mask information into the Keras loss function. (It needs to be a TensorFlow placeholder, since it is a mask over the input that differs for each row of the input data.)
Use the keras backend:
import keras.backend as K
Most functions for tensors are there, such as:
input_mask_R = K.placeholder(shape=(yourshape))
But maybe, since you want a predefined mask, what you need is:
input_mask_R = K.constant(arrayWithValues, shape=(yourshape))
And you can do the multiplication with the plain * operator (Keras tensors support elementwise operators) and the squaring with K.square. That way, if you ever think of changing the backend, everything will still work. (Also, I'm not sure Keras will handle direct calls to tensorflow functions gracefully...)
See documentation: https://keras.io/backend/
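Alternatively, if the mask really does change per input row, a common workaround is to feed it as an extra model input and close over that tensor in the loss. A sketch with illustrative shapes and a Dense stand-in for the decoder:
import keras.backend as K
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(10,))
input_mask_R = Input(shape=(10,))  # one mask row per input row
outputs = Dense(10)(inputs)

def make_masked_objective(mask):
    # Close over the mask tensor so the loss function can see it.
    def custom_objective(y_true, y_pred):
        pre_cost1 = (y_true - y_pred) * mask  # elementwise masking
        return K.sum(K.square(pre_cost1))     # squared l2 norm
    return custom_objective

model = Model([inputs, input_mask_R], outputs)
model.compile(optimizer='adam', loss=make_masked_objective(input_mask_R))
# model.fit([x_data, mask_data], y_data, ...)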