I have trained a network model using Keras that includes multiple dropout layers. I have also implemented a stochastic predictor function (using the Keras backend) which lets me get predictions with dropout turned "on":
import keras.backend as K
F = K.function([model.layers[0].input, K.learning_phase()], [model.layers[-1].output])
I call the function using
output = F([x_test[0:1], 1])
where x_test is a sample input.
Currently, the dropout rate used in this predictor function is the same as the dropout rate used for training. I would like to set a different dropout rate without retraining the network.
I wrote a script to change all dropout layer rates:
for layer in [l for l in model.layers if "dropout" in l.name.lower()]:
    layer.rate = 0.5
However, when I call my custom function, it does not change the output. For example, if my trained network uses a rate of 0 (or K.epsilon()), then repeated function calls will yield the same result. Changing the dropout to 0.5 should yield unique results on each function call. Yet, this is not the case. Changing the dropout has no effect.
What does work is extracting a single layer, changing the rate, and calling that single layer:
L = model.layers[0]
L.rate = 0.5
L_out1 = K.eval(L.call(x_test[0], training=True))
L.rate = K.epsilon()
L_out2 = K.eval(L.call(x_test[0], training=True))
Here, L_out1 and L_out2 are distinct, as expected. I don't know how to implement this functionality across the whole network.
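To be concrete, something like the following is the kind of thing I have in mind (a rough sketch only; it rebuilds the ops on every call and assumes the layers can simply be chained this way):
x = K.constant(x_test[0:1])
for layer in model.layers:
    if "dropout" in layer.name.lower():
        layer.rate = 0.5                  # set the desired rate before calling the layer
        x = layer.call(x, training=True)  # keep dropout active
    else:
        x = layer.call(x)
out = K.eval(x)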
What is it about the backend function that prevents my model changes from being effective?
Is there a way to pass a feature to a Keras model as an input that is only accessed by a custom loss function, without it affecting the model as an input feature? I only need the feature to calculate the loss, not to feed forward through the hidden layers of the network. (Basically, what I want is to feed the feature in as an input and extract it unchanged as an output, alongside y_pred, so it can be accessed in the loss function.)
A worked example would be much appreciated.
If you are writing your own custom loss, you could pass the feature as an input and then, using a Lambda layer, make it bypass the network and be concatenated directly at the end. Something like the following:
from tensorflow.keras import layers, Model, utils
inp = layers.Input((11,))
x = layers.Lambda(lambda x: x[:,:-1])(inp)   # first 10 features go through the network
o2 = layers.Lambda(lambda x: x[:,-1:])(inp)  # last feature bypasses it
x = layers.Dense(20)(x)
x = layers.Dense(20)(x)
o1 = layers.Dense(1)(x)
out = layers.concatenate([o1, o2])
model = Model(inp, out)
def custom_loss(y_true, y_pred):  # Keras passes (y_true, y_pred), in that order
    ...
utils.plot_model(model, show_shapes=True, show_layer_names=False)
Here the first 10 features are the ones you want to pass via the network, and the last feature is the one you just want as is, for the custom loss. The final output is going to just be a concatenation of your expected output for the first 10 features via the network + the untouched feature.
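For illustration only, the loss could then slice the concatenated output apart. In the sketch below the extra feature is used as a per-sample weight, which is just an assumption about what you want to do with it, and the targets are assumed to have shape (batch, 1):

import tensorflow as tf

def custom_loss(y_true, y_pred):
    pred = y_pred[:, :1]    # the network's prediction
    extra = y_pred[:, 1:]   # the untouched feature that bypassed the network
    # Example use: weight the squared error by the extra feature.
    return tf.reduce_mean(extra * tf.square(y_true - pred))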
If you want to know how to write a custom loss, please check this excellent SO post that explains it.
I have a CNN that I built using TensorFlow 2.0. I need to access the outputs of the intermediate layers. I was going over other Stack Overflow questions that were similar, but they all had solutions involving the Keras Sequential model.
I have tried using model.layers[index].output but I get
Layer conv2d has no inbound nodes.
I can post my code here (which is super long), but I am sure that even without it someone can point me to how this can be done using just TensorFlow 2.0 in eager mode.
I stumbled onto this question while looking for an answer and it took me some time to figure out as I use the model subclassing API in TF 2.0 by default (as in here https://www.tensorflow.org/tutorials/quickstart/advanced).
If somebody is in a similar situation, all you need to do is assign the intermediate output you want as an attribute of the class. Then keep test_step without the @tf.function decorator and create a decorated copy of it, say val_step, for efficient internal computation of validation performance during training. As a short example, I have modified a few functions of the tutorial from the link accordingly. I'm assuming we need to access the output after flattening.
def call(self, x):
    x = self.conv1(x)
    x = self.flatten(x)
    self.intermediate = x  # assign it as an object attribute for accessing later
    x = self.d1(x)
    return self.d2(x)
# Remove the @tf.function decorator from test_step for prediction
def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
    return

# Create a decorated val_step for the object's internal use during training
@tf.function
def val_step(images, labels):
    return test_step(images, labels)
Now when you run model.predict() after training, using the un-decorated test step, you can access the intermediate output via model.intermediate, which will be an EagerTensor whose value is obtained simply with model.intermediate.numpy(). However, if you don't remove the @tf.function decorator from test_step, this will return a Tensor whose value is not so straightforward to obtain.
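For completeness, a minimal usage sketch in eager mode (not from the original answer; x_batch is just a placeholder for whatever batch you want to inspect):

# Call the model eagerly, then read the attribute that call() stored.
logits = model(x_batch, training=False)
flat_features = model.intermediate.numpy()  # EagerTensor -> NumPy array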
Thanks for answering my earlier question. I wrote this simple example to illustrate how what you're trying to do might be done in TensorFlow 2.x, using the MNIST dataset as the example problem.
The gist of the approach:
1. Build an auxiliary model (aux_model in the example below), which is a so-called "functional model" with multiple outputs. The first output is the output of the original model and will be used for loss calculation and backprop, while the remaining output(s) are the intermediate-layer outputs that you want to access.
2. Use tf.GradientTape() to write a custom training loop and expose the detailed gradient values on each individual variable of the model. Then you can pick out the gradients that are of interest to you. This requires that you know the ordering of the model's variables, but that should be relatively easy for a sequential model.
import tensorflow as tf
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
# Scale to float32 and add a channel dimension to match the model's input shape.
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
# This is the original model.
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=[28, 28, 1]),
tf.keras.layers.Dense(100, activation="relu"),
tf.keras.layers.Dense(10, activation="softmax")])
# Make an auxiliary model that exposes the output from the intermediate layer
# of interest, which is the first Dense layer in this case.
aux_model = tf.keras.Model(inputs=model.inputs,
outputs=model.outputs + [model.layers[1].output])
# Define a custom training loop using `tf.GradientTape()`, to make it easier
# to access gradients on specific variables (the kernel and bias of the first
# Dense layer in this case).
cce = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.optimizers.Adam()
with tf.GradientTape() as tape:
    # Do a forward pass on the model, retrieving the intermediate layer's output.
    y_pred, intermediate_output = aux_model(x_train)
    print(intermediate_output)  # Now you can access the intermediate layer's output.
    # Compute loss, to enable backprop.
    loss = cce(tf.one_hot(y_train, 10), y_pred)
# Do backprop. `gradients` here are for all variables of the model.
# But we know we want the gradients on the kernel and bias of the first
# Dense layer, which happens to be the first two variables of the model.
gradients = tape.gradient(loss, aux_model.variables)
# This is the gradient on the first Dense layer's kernel.
intermediate_layer_kernel_gradients = gradients[0]
print(intermediate_layer_kernel_gradients)
# This is the gradient on the first Dense layer's bias.
intermediate_layer_bias_gradients = gradients[1]
print(intermediate_layer_bias_gradients)
# Update the variables of the model.
optimizer.apply_gradients(zip(gradients, aux_model.variables))
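Not part of the original example, but in practice you would likely iterate over mini-batches rather than the whole training set at once; a sketch of the same loop using tf.data (the batch size here is arbitrary):

# Same idea as above, but over mini-batches.
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64)
for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
        y_pred, intermediate_output = aux_model(x_batch)
        loss = cce(tf.one_hot(y_batch, 10), y_pred)
    gradients = tape.gradient(loss, aux_model.variables)
    optimizer.apply_gradients(zip(gradients, aux_model.variables))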
The most straightforward solution would go like this:
from keras.models import Model

mid_layer = Model(inputs=model.input,
                  outputs=model.get_layer("layer_name").output)
you can now treat "mid_layer" as a model in its own right, and for instance:
mid_layer.predict(X)
Oh, also, to get the name of a hidden layer, you can use this:
model.summary()
This will also give you some insight into each layer's input and output shapes.
tf.keras.applications contains many famous neural networks like VGG, DenseNet, MobileNet, and so on. Take tf.keras.applications.MobileNet as an example: what I am interested in is not only the final output but also the outputs of intermediate layers. How can I get all these outputs when retraining the network?
Maybe model.get_output_at(index) helps. However, every time I call this function I get a DeferredTensor, because I cannot forward the data at the same time. Does a convenient way exist?
Thanks in advance~
I suggest you read the Keras documentation:
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
[model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Similarly, you could build a Theano and TensorFlow function directly.
Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you will need to pass the learning phase flag to your function:
get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[3].output])
# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]
# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]
Here is another similar answer written by fchollet himself:
How can I get hidden layer representation of the given data?
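Applied to the MobileNet case from the question, a hedged sketch might look like the following (the layer name "conv_pw_6_relu" is only an example, and "images" stands for a preprocessed batch of shape (N, 224, 224, 3); check model.summary() for the real layer names):

import tensorflow as tf

base = tf.keras.applications.MobileNet(weights="imagenet")
# One model, two outputs: the final prediction plus an intermediate feature map.
feature_model = tf.keras.Model(
    inputs=base.input,
    outputs=[base.output, base.get_layer("conv_pw_6_relu").output])

preds, features = feature_model.predict(images)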
I'm studying the paper "An Introduction to Deep Learning for the Physical Layer". While implementing the proposed network with Python/Keras, I need to normalize the output of one layer.
One way is simple L2 normalization (||X||^2 = 1), where X is the output tensor of the previous layer. I can implement simple L2 normalization with the following code:
from keras import backend as K
Lambda(lambda x: K.l2_normalize(x,axis=1))
The other way, what I want to know, is ||X||^2 ≤ 1.
Is there any way that constrains the value of layer outputs?
You can apply a constraint to the layer weights (kernels) for some Keras layers. For example, on a Dense() layer:
from keras.constraints import max_norm
from keras.layers import Dense
model.add(Dense(units, kernel_constraint=max_norm(1.)))
However, Keras layers do not accept an activity_constraint argument. They do accept an activity_regularizer, though, and you could use that to implement the first kind of regularization more easily.
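As a rough illustration of that idea (not a drop-in solution; the 0.01 weight is arbitrary), an activity_regularizer can be any callable that maps the layer output to a scalar penalty:

from keras import backend as K
from keras.layers import Dense

# Soft penalty pushing each sample's squared L2 norm of activations towards 1.
def unit_norm_penalty(activations):
    sq_norm = K.sum(K.square(activations), axis=1)
    return 0.01 * K.mean(K.square(sq_norm - 1.0))

model.add(Dense(units, activity_regularizer=unit_norm_penalty))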
You can also clip the output values of any layer to have a maximum norm of 1.0 (although I'm not sure if this is what you're looking for). For example, if you're using the TensorFlow backend, you can define a custom activation function that clips the layer's output by norm, like:
import tensorflow as tf
def norm_clip(x):
    return tf.clip_by_norm(x, 1, axes=[1])
And use it in your model like:
model.add(Dense(units, activation=norm_clip))
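Equivalently, you could apply the clipping as its own layer instead of folding it into the activation, for example:

from keras.layers import Dense, Lambda

model.add(Dense(units))
model.add(Lambda(norm_clip))  # clips each sample's output to L2 norm <= 1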
I'm trying to get the activation values for each layer in this baseline autoencoder built using Keras, since I want to add a sparsity penalty to the loss function based on the Kullback-Leibler (KL) divergence, as shown here, page 14.
In this scenario, I'm going to calculate the KL divergence for each layer and then sum all of them with the main loss function, e.g. mse.
I therefore made a script in Jupyter to do that, but every time I try to compile I get ZeroDivisionError: integer division or modulo by zero.
This is the code
import numpy as np
from keras.layers import Conv2D, Activation
from keras.models import Sequential
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32')
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
model = Sequential()
model.add(Conv2D(filters=16,kernel_size=(4,4),padding='same',
name='encoder',input_shape=(128,128,1)))
model.add(Activation('relu'))
# get the average activation
A = K.mean(x=model.output)
# calculate the value for the KL divergence
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, A)],axis=0)
# decoder
model.add(Conv2D(filters=1,kernel_size=(4,4),padding='same', name='encoder'))
model.add(Activation('relu'))
B = K.mean(x=model.output)
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, B)],axis=0)
This seems to be the cause:
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in _normalize_axis(axis, ndim)
989 else:
990 if axis is not None and axis < 0:
991 axis %= ndim <----------
992 return axis
993
so there might be something wrong in the mean calculation. If I print the value I get
Tensor("Mean_10:0", shape=(), dtype=float32)
which is quite strange, because the weights and biases are initialised to non-zero values. So there might instead be something wrong in the way I'm getting the activation values.
I really don't know how to fix it; I'm not much of a skilled programmer.
Could anyone help me in understanding where I'm wrong?
First, you shouldn't be doing calculations outside layers. The model must keep track of all calculations.
If you need a specific calculation to be done in the middle of the model, you should use a Lambda layer.
If you need that a specific output be used in the loss function, you should split your model for that output and do calculations inside a custom loss function.
Here, I used a Lambda layer to calculate the mean, and a custom loss to calculate the Kullback-Leibler divergence.
import numpy as np
from keras.layers import *
from keras.models import Model
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32') #you'll probably not need this anymore, since losses will be treated individually in each output.
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
inp = Input((128,128,1))
lay = Convolution2D(filters=16,kernel_size=(4,4),padding='same', name='encoder',activation='relu')(inp)
#apply the mean using a lambda layer:
intermediateOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(lay)
# decoder
finalOut = Convolution2D(filters=1,kernel_size=(4,4),padding='same', name='decoder',activation='relu')(lay)  # layer names must be unique
#but from that, let's also calculate a mean output for loss:
meanFinalOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(finalOut)
#Now, you have to create a model taking one input and those three outputs:
splitModel = Model(inp,[intermediateOut,meanFinalOut,finalOut])
And finally, compile your model with your custom loss function (we will define that later). But since I don't know if you're actually using the final output (not mean) for training, I'll suggest creating one model for training and another for predicting:
trainingModel = Model(inp,[intermediateOut,meanFinalOut])
trainingModel.compile(...,loss=customLoss)
predictingModel = Model(inp,finalOut)
#you don't need to compile the predicting model since you're only training the trainingModel
#both will share the same weights, you train one, and predict in the other
Our custom loss function should then deal with the Kullback-Leibler term.
def customLoss(p,mean):
    return #your own kullback expression (I don't know how it works, but maybe keras' one can be used with single values?)
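Not from the original answer, but for reference, the usual sparse-autoencoder KL term for a target sparsity p and a mean activation p_hat looks roughly like this (clipping to avoid log(0)):

def customLoss(p, p_hat):
    # KL divergence between two Bernoulli distributions with means p and p_hat.
    p_hat = K.clip(p_hat, K.epsilon(), 1 - K.epsilon())
    return p * K.log(p / p_hat) + (1 - p) * K.log((1 - p) / (1 - p_hat))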
Alternatively, if you want a single loss function to be called instead of two:
summedMeans = Add()([intermediateOut, meanFinalOut])
trainingModel = Model(inp, summedMeans)