I have a convolutional neural network with some layers in keras. The last layer in this network is a custom layer that is responsible for sorting some numbers those this layer gets from previous layer, then, the output of custom layer is sent for calculate loss function.
for this purpose (sorting) I use some operator in this layer such as K.argmax and K.gather.
In the back-propagation phase I get error from keras that says:
An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval
that is reasonable cause the involvement of this layer in the derivation process.
Given that my custom layer do not need to corporate in differential chain rule, how can I control differential chain in keras? can I disable this process in custom layer?
Reorder layer that I used in my code is simply following:
def Reorder(args):
z = args[0]
l = args[1]
index = K.tf.argmax(l, axis=1)
return K.tf.gather(z, index)
Reorder_Layer = Lambda(Reorder, name='out_x')
pred_x = Reorder_Layer([z, op])
A few things:
It's impossible to train without a derivative, so, there is no solution if you want to train this model
It's not necessary to "compile" if you are only going to predict, so you don't need custom derivation rules
If the problem is really in that layer, I suppose that l is computed by the model using trainable layers before it.
If you really want to try this, which doesn't seem a good idea, you can try a l = keras.backend.stop_gradient(args[1]). But this means that absolutely nothing will be trained from l until the beginning of the model. If this doesn't work, then you have to make all layers that produce l have trainable=False before compiling the model.
Related
I would need to do a kind of custom backpropagation so that, in an arbitrary layer of the network I can decide if actually modify the weights going outside that layer, or make them unchanged.
For example: I would like to study what happens if, during training, I force to not update some weight connecting input layer to 1st layer.
Is there a simple way to just "correct" the normal backpropagation intersecting between the layers ?
Thanks
I have a binary classification problem. I use the following keras model to do my classification.
input1 = Input(shape=(25,6))
x1 = LSTM(200)(input1)
input2 = Input(shape=(24,6))
x2 = LSTM(200)(input2)
input3 = Input(shape=(21,6))
x3 = LSTM(200)(input3)
input4 = Input(shape=(20,6))
x4 = LSTM(200)(input4)
x = concatenate([x1,x2,x3,x4])
x = Dropout(0.2)(x)
x = Dense(200)(x)
x = Dropout(0.2)(x)
output = Dense(1, activation='sigmoid')(x)
However, the results I get is extremely bad. I thought the reason is that I have too many features, thus, needs have more improved layers after the concatenate.
I was also thinking if it would be helpful to used a flatten() layer after the concatenate.
anyway, since I am new to deep learning, I am not so sure how to make this a better model.
I am happy to provide more details if needed.
Here is what I can suggest
Remove every things that prevent overfitting, such as Dropout and regularizer. What can happen is that your model may not be able to capture the complexity of your data using given layer, so you need to make sure that your model is able to overfit first before adding regularizer.
Now try increase number of Dense layer and number of neuron in each layer until you can see some improvement. There is also a possibility that your data is too noisy or you have only few data to train the model so you can't even produce a useful predictions.
Now if you are LUCKY and you can see overfitting, you can add Dropout and regularizer.
Because every neural network is a gradient base algorithm, you may end up at local minimum. You may also need to run the algorithm multiple times with different initial weight before you can get a good result or You can change your loss function so that you have a convex problem where local minimum is global minimum.
If you can't achieve better result
You may need to try different topology because LSTM is just trying to model a system that assume to have Markov property. you can look at nested-LSTM or something like that, which model the system in the way that next time step is not just depend on current time step.
The Dropout right before the output layer could be problematic. I would suggest removing both Dropout layers and evaluating performance, then re-introduce regularization once the model is performing well on the the training set.
i have a keras sequential model with some custom layers in it. Now in one of the layers, based on the input of that specific layer, i want to calculate a penalty and i want the penalty to be added to the loss function which the optimizer tries to minimize overall.
I have gone through the concept of tf.keras.layers.ActivityRegularization but struggling to figure out how to solve my issue.
If you want to "add", you just need to calculate the layer output/loss, and use model.add_loss(loss_tensor)
....
loss_tensor = MyCustomLayer(...)(layer_inputs)
....
model = Model(model_inputs, model_outputs)
model.add_loss(loss_tensor)
model.compile(loss=any_normal_loss)
I'm using this implementation of SegNet in Pytorch, and I want to finetune it.
I've read online and I've found this method (basically freezing all layers except the last one in your net). My problem is that SegNet has more than 100 layers and I'm looking for a simpler way to do it, rather than writing 100 lines of code.
Do you think this could work? Or is this utter nonsense?
import torch.optim as optim
model = SegNet()
for name, param in model.named_modules():
if name != 'conv11d': # the last layer should remain active
param.requires_grad = False
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
def train():
...
How can I check if this is working as intended?
This process is called finetuning and setting requires_grad to False is a good way to do this. From the pytorch docs:
Every Tensor has a flag: requires_grad that allows for fine grained exclusion of subgraphs from gradient computation and can increase efficiency.
...
If there’s a single input to an operation that requires gradient, its output will also require gradient. Conversely, only if all inputs don’t require gradient, the output also won’t require it. Backward computation is never performed in the subgraphs, where all Tensors didn’t require gradients.
See this pytorch tutorial for a relevant example.
One simple way of checking to see this is working is looking at the initial error rates. Assuming the task is similar to the task the net was originally trained on, they should be much lower than for a randomly initialized net.
I am new to Keras. How can I print the outputs of a layer, both intermediate or final, during the training phase?
I am trying to debug my neural network and wanted to know how the layers behave during training. To do so I am trying to exact input and output of a layer during training, for every step.
The FAQ (https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer) has a method to extract output of intermediate layer for building another model but that is not what I want. I don't need to use the intermediate layer output as input to other layer, I just need to print their values out and perhaps graph/chart/visualize it.
I am using Keras 2.1.4
I think I have found an answer myself, although not strictly accomplished by Keras.
Basically, to access layer output during training, one needs to modify the computation graph by adding a print node.
A more detailed description can be found in this StackOverflow question:
How can I print the intermediate variables in the loss function in TensorFlow and Keras?
I will quote an example here, say you would like to have your loss get printed per step, you need to set your custom loss function as:
for Theano backend:
diff = y_pred - y_true
diff = theano.printing.Print('shape of diff', attrs=['shape'])(diff)
return K.square(diff)
for Tensorflow backend:
diff = y_pred - y_true
diff = tf.Print(diff, [tf.shape(diff)])
return K.square(diff)
Outputs of other layers can be accessed similarly.
There is also a nice vice tutorial about using tf.Print() from Google
Using tf.Print() in TensorFlow
If you want to know more info on each neuron, you need to use the following to get their bias and weights.
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]
0 index defines weights and 1 defines the bias.
You can also get per layer too,
for layer in model.layers:
weights = layer.get_weights() # list of numpy arrays
After each training, if you can access each layer with its dimension and obtain the weights and bias to a numpy array, you should be able to visualize how the neuron after each training.
Hope it helps.