Connecting Keras models / replacing input but keeping layers - python

This question is similar to Keras replacing input layer.
I have a classifier network and an autoencoder network and I want to use the output of the autoencoder (i.e. encoding + decoding, as a preprocessing step) as the input to the classifier - but after the classifier was already trained on the regular data.
The classification network was built with the functional API like this (based on this example):
clf_input = Input(shape=(28,28,1))
clf_layer = Conv2D(...)(clf_input)
clf_layer = MaxPooling2D(...)(clf_layer)
...
clf_output = Dense(num_classes, activation='softmax')(clf_layer)
model = Model(clf_input, clf_output)
model.compile(...)
model.fit(...)
And the autoencoder like this (based on this example):
ae_input = Input(shape=(28,28,1))
x = Conv2D(...)(ae_input)
x = MaxPooling2D(...)(x)
...
encoded = MaxPooling2D(...)(x)
x = Conv2D(...)(encoded)
x = UpSampling2D(...)(x)
...
decoded = Conv2D(...)(x)
autoencoder = Model(ae_input, decoded)
autoencoder.compile(...)
autoencoder.fit(...)
I can concatenate the two models like this (I still need the original models, hence the copying):
model_copy = keras.models.clone_model(model)
model_copy.set_weights(model.get_weights())
# remove original input layer
model_copy.layers.pop(0)
# set the new input
new_clf_output = model_copy(decoded)
# get the stacked model
stacked_model = Model(ae_input, new_clf_output)
stacked_model.compile(...)
And this works great when all I want to do is apply the model to new test data, but it gives an error on something like this:
for layer in stacked_model.layers:
    print(layer.get_config())
where it gets to the end of the autoencoder but then fails with a KeyError at the point where the classifier model gets its input. Also, when plotting the model with keras.utils.plot_model, the plot shows the autoencoder layers individually, but at the end, instead of the individual layers from the classifier model, there is only the complete classifier model in one block.
Is there a way to connect two models such that the new stacked model is actually made up of all the individual layers?

OK, what I could come up with is to manually go through each layer of the model and reconnect them one by one again, like this:
l = model.layers[1](decoded) # layer 0 is the input layer, which we're replacing
for i in range(2, len(model.layers)):
    l = model.layers[i](l)
stacked_model = Model(ae_input, l)
stacked_model.compile(...)
While this works, produces the correct plot, and raises no errors, it does not seem like the most elegant solution...
(btw, the copying of the model actually seems to be unnecessary as I'm not retraining anything.)
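For reference, the reconnection loop can also be wrapped in a small helper. This is only a sketch and assumes the classifier is a plain chain of layers with no branches (the helper name connect is made up):
def connect(new_input_tensor, trained_model):
    # Re-apply every layer of the trained model, skipping its Input layer
    x = new_input_tensor
    for layer in trained_model.layers[1:]:
        x = layer(x)
    return x

stacked_model = Model(ae_input, connect(decoded, model))
stacked_model.compile(...)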

Related

Feed input to a keras model only to be accessed by a custom loss function

Is there a way to pass a feature to a keras model as an input only to be accessed by a custom loss function without affecting the model as an input feature? I only need the feature to calculate the loss, not to feed-forward through the hidden layers in the network. (Basically what I want is to feed the feature in as an input and extract it as it is as an output along with y_pred to be accessed in the loss function).
A worked example would be much appreciated.
If you are writing your own custom loss, you could pass the feature as an input and then, using a Lambda layer, make it bypass the network and directly concatenate it at the end. Something like the following:
from tensorflow.keras import layers, Model, utils
inp = layers.Input((11,))
x = layers.Lambda(lambda x: x[:,:-1])(inp)
o2 = layers.Lambda(lambda x: x[:,-1:])(inp)
x = layers.Dense(20)(x)
x = layers.Dense(20)(x)
o1 = layers.Dense(1)(x)
out = layers.concatenate([o1, o2])
model = Model(inp, out)
def custom_loss(outputs, actuals):
    ...
utils.plot_model(model, show_shapes=True, show_layer_names=False)
Here the first 10 features are the ones you want to pass via the network, and the last feature is the one you just want as is, for the custom loss. The final output is going to just be a concatenation of your expected output for the first 10 features via the network + the untouched feature.
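For illustration, a hedged sketch of what such a loss could look like with this layout (Keras calls the loss as loss(y_true, y_pred), and the weighting scheme below is just an example, not part of the original answer):
import tensorflow as tf
def custom_loss(y_true, y_pred):
    prediction = y_pred[:, :1]       # o1: the network's actual output
    bypass_feature = y_pred[:, 1:]   # o2: the untouched input feature
    # Example only: scale the squared error by the bypass feature
    return tf.reduce_mean(bypass_feature * tf.square(y_true - prediction))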
If you want to know how to write a custom loss, please check this excellent SO post that explains it.

Implementation of Gradient-Reversal-layer into a functioning keras model for multiclassification

My question is about the practical implementation of "Domain Adaptation" into a functional model in keras with tensorflow backend.
Description of the problem:
I have a collection of particle collision samples which consist of n variables. One half of them is simulated data with certain class labels (e.g. "W-Boson"). The other half is real collision data which is not labeled. The key idea is to set up a Keras model with two outputs: one for classifying the class of a sample and one for classifying the domain, i.e. whether it is simulated or real data. The twist is that the model shall be trained so that the domain classifier performs very poorly. This is achieved by flipping the sign of the incoming gradient from the domain end of the network during training. This technique is called "domain adaptation". The model is expected to learn domain-invariant features, or in other words, to perform the same on simulated and real collision data.
The framework I am working with has an existing functional Keras model, which I wanted to expand with said domain classifier. This is a prototype I came up with:
# common layers
inputs = keras.Input(shape=(n_variables, ))
X = layers.Dense(units=50, activation="relu")(inputs)
# domain end
flip_layer = flipGradientTF.GradientReversal(hp_lambda=0.3)(X)
X_domain = layers.Dense(units=50, activation="relu")(flip_layer)
domain_out = layers.Dense(units=2, activation="softmax", name="domain_out")(X_domain)
# class end
X_class = layers.Dense(units=50, activation="relu")(X)
class_out = layers.Dense(units=n_classes, activation="softmax", name="class_out")(X_class)
The code for flipGradientTF is taken from https://github.com/michetonu/gradient_reversal_keras_tf
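(As an aside: that repository targets TF1-style Keras. A rough TF2 equivalent, sketched here with tf.custom_gradient rather than taken from the repository, could look like this:)
import tensorflow as tf
from tensorflow.keras import layers
class GradientReversal(layers.Layer):
    # Identity in the forward pass; multiplies the incoming gradient
    # by -hp_lambda in the backward pass.
    def __init__(self, hp_lambda=1.0, **kwargs):
        super().__init__(**kwargs)
        self.hp_lambda = hp_lambda
    def call(self, x):
        @tf.custom_gradient
        def _flip(x):
            def grad(dy):
                return -self.hp_lambda * dy
            return tf.identity(x), grad
        return _flip(x)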
And further on for compiling and training the model:
model = keras.Model(inputs=inputs, outputs=[class_out, domain_out])
model.compile(optimizer="adam", loss=loss_function, metrics=["accuracy"])
# train model
model.fit(
    x=train_data,
    y=[train_class_labels, train_domain_labels],
    batch_size=200,
    epochs=200,
    sample_weight={"class_out": class_weights, "domain_out": None}
)
For train_data I am passing the dataframe which consists of the data from both domains. As I have tried both "categorical_crossentropy" and "sparse_categorical_crossentropy" as the loss_function, train_class_labels and train_domain_labels were either in one-hot representation or in integer representation. My biggest issue is figuring out what to use for the class labels of the unlabeled data, and this led to a gut feeling that I am on the wrong track here.
So in a nutshell:
Is this implementation strategy legit and assuming it is, what should I do about the class labels for the unlabeled data? And if it is not legit, what would be a better way of attacking this problem?
Any help would be much appreciated :)

Keras sequence models - how to generate data during test/generation?

Is there a way to use the already trained RNN (SimpleRNN or LSTM) model to generate new sequences in Keras?
I'm trying to modify an exercise from the Coursera Deep Learning Specialization - Sequence Models course, where you train an RNN to generate dinosaur names. In the exercise you build the RNN using only numpy, but I want to use Keras.
One of the problems is different lengths of the sequences (dino names), so I used padding and set sequence length to the max size appearing in the dataset (I padded with 0, which is also the code for '\n').
My question is how to generate the actual sequence once training is done? In the numpy version of the exercise you take the softmax output of the previous cell and use it as a distribution to sample a new input for the next cell. But is there a way to connect the output of the previous cell as the input of the next cell in Keras, during testing/generation time?
Also - some additional side-question:
Since I'm using padding, I suspect the accuracy is way too optimistic. Is there a way to tell Keras not to include the padding values in its accuracy calculations?
Am I even doing this right? Is there a better way to use Keras with sequences of different lengths?
You can check my (WIP) code here.
Inferring from a model that has been trained on a sequence
So it's a pretty common thing to do in RNN models and in Keras the best way (at least from what I know) is to create two different models.
One model for training (which uses sequences instead of individual items)
Another model for predicting (which uses a single element instead of a sequence)
So let's see an example. Suppose you have the following model.
from tensorflow.keras import models, layers
n_chars = 26
timesteps = 10
inp = layers.Input(shape=(timesteps, n_chars))
lstm = layers.LSTM(100, return_sequences=True)
out1 = lstm(inp)
dense = layers.Dense(n_chars, activation='softmax')
out2 = layers.TimeDistributed(dense)(out1)
model = models.Model(inp, out2)
model.summary()
Now to infer from this model, you create another model which looks like the one below.
inp_infer = layers.Input(shape=(1, n_chars))
# Inputs to feed LSTM states back in
h_inp_infer = layers.Input(shape=(100,))
c_inp_infer = layers.Input(shape=(100,))
# We need return_state=True so we are creating a new layer
lstm_infer = layers.LSTM(100, return_state=True, return_sequences=True)
out1_infer, h, c = lstm_infer(inp_infer, initial_state=[h_inp_infer, c_inp_infer])
out2_infer = layers.TimeDistributed(dense)(out1_infer)
# Our model takes the previous states as inputs and spits out new states as outputs
model_infer = models.Model([inp_infer, h_inp_infer, c_inp_infer], [out2_infer, h, c])
# We are setting the weights from the trained model
lstm_infer.set_weights(lstm.get_weights())
model_infer.summary()
So what's different? You see that we have defined a new input layer which accepts an input with only one timestep (or in other words, just a single item). Then the model emits an output which has a single timestep (technically we don't need the TimeDistributed layer, but I've kept it for consistency). Other than that, we take the previous LSTM state output as an input and produce the new state as the output. More specifically, we have the following inference model.
Input:  [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors
Output: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors
Note that I'm updating the weights of the new layers from the trained model or using the existing layers from the training model. It will be a pretty useless model if you don't reuse the trained layers and weights.
Now we can write inference logic.
import numpy as np
x = np.random.randint(0,2,size=(1, 1, n_chars))
h = np.zeros(shape=(1, 100))
c = np.zeros(shape=(1, 100))
seq_len = 10
for _ in range(seq_len):
    print(x)
    y_pred, h, c = model_infer.predict([x, h, c])
    y_pred = y_pred[:, 0, :]  # take the single-timestep prediction
    y_onehot = np.zeros(shape=(x.shape[0], n_chars))
    y_onehot[np.arange(x.shape[0]), np.argmax(y_pred, axis=1)] = 1.0
    x = np.expand_dims(y_onehot, axis=1)
This part starts with an initial x, h, c, gets the prediction y_pred, h, c, converts that to an input in the following lines, and assigns it back to x, h, c. So you keep going for as many iterations as you like.
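If you would rather mirror the numpy exercise and sample from the softmax distribution instead of taking the argmax, a sketch of the update step (assuming a batch size of 1) could be:
# Sample the next character from the predicted distribution
idx = np.random.choice(n_chars, p=y_pred[0] / y_pred[0].sum())
y_onehot = np.zeros(shape=(1, n_chars))
y_onehot[0, idx] = 1.0
x = np.expand_dims(y_onehot, axis=1)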
About masking zeros
Keras does offer a Masking layer which can be used for this purpose. And the second answer in this question seems to be what you're looking for.
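As a sketch of how the Masking layer could slot into the training model above (this assumes the padded timesteps are all-zero vectors; if your padding one-hot encodes a real character such as '\n', masking on zeros won't catch it):
from tensorflow.keras import models, layers
inp = layers.Input(shape=(timesteps, n_chars))
# Timesteps whose features all equal mask_value are skipped by the LSTM
# and excluded from the loss and metrics.
x = layers.Masking(mask_value=0.0)(inp)
x = layers.LSTM(100, return_sequences=True)(x)
out = layers.TimeDistributed(layers.Dense(n_chars, activation='softmax'))(x)
masked_model = models.Model(inp, out)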

How to get intermediate output when using tf.keras.application

tf.keras.applications contains many famous neural networks like VGG, DenseNet, MobileNet and so on. Take tf.keras.applications.MobileNet as an example: what I am interested in is not only the final output, but also the outputs of the intermediate layers. How could I get all these outputs when retraining the network?
Maybe model.get_output_at(index) helps. However, every time I call this function, I get a DeferredTensor because I cannot forward the data at the same time. Does a convenient way exist?
Thanks in advance~
I suggest you read the Keras documentation:
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Similarly, you could build a Theano and TensorFlow function directly.
Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you will need to pass the learning phase flag to your function:
get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])
# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]
# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]
Here is another similar answer written by fchollet himself:
How can I get hidden layer representation of the given data?
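Applied to the tf.keras.applications.MobileNet case from the question, a minimal sketch might look like this (the layer name is illustrative; run base.summary() to find the one you actually want):
import numpy as np
import tensorflow as tf
base = tf.keras.applications.MobileNet(weights=None)
feat = base.get_layer('conv_pw_13_relu').output  # illustrative layer name
multi_out = tf.keras.Model(inputs=base.input, outputs=[base.output, feat])
# The final prediction and the intermediate activation come back together
preds, feats = multi_out.predict(np.random.rand(1, 224, 224, 3))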

Constructing a keras model

I don't understand what's happening in this code:
def construct_model(use_imagenet=True):
    # line 1: how do we keep all layers of this model?
    model = keras.applications.InceptionV3(include_top=False,
                                           input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                           weights='imagenet' if use_imagenet else None)
    new_output = keras.layers.GlobalAveragePooling2D()(model.output)
    new_output = keras.layers.Dense(N_CLASSES, activation='softmax')(new_output)
    model = keras.engine.training.Model(model.inputs, new_output)
    return model
Specifically, my confusion is, when we call the last constructor
model = keras.engine.training.Model(model.inputs, new_output)
we specify input layer and output layer, but how does it know we want all the other layers to stay?
In other words, we append the new layers to the pre-trained model we load in line 1 to get new_output, and then in the final constructor (final line) we just create and return a model with specified input and output layers; but how does it know what other layers we want in between?
Side question 1): What is the difference between keras.engine.training.Model and keras.models.Model?
Side question 2): What exactly happens when we do new_layer = keras.layers.Dense(...)(prev_layer)? Does the () operation return new layer, what does it do exactly?
This model was created using the functional API (Model).
Basically it works like this (perhaps if you go to the "side question 2" below before reading this it may get clearer):
You have an input tensor (you can see it as "input data" too)
You create (or reuse) a layer
You pass the input tensor to a layer (you "call" a layer with an input)
You get an output tensor
You keep working with these tensors until you have created the entire graph.
But this hasn't created a "model" yet (one you can train and use for other things).
All you have is a graph telling which tensors go where.
To create a model, you define its start and end points.
In the example:
They take an existing model: model = keras.applications.InceptionV3(...)
They want to expand this model, so they get its output tensor: model.output
They pass this tensor as the input of a GlobalAveragePooling2D layer
They get this layer's output tensor as new_output
They pass this as input to yet another layer: Dense(N_CLASSES, ....)
And get its output as new_output (this var was replaced as they are not interested in keeping its old value...)
But, as it works with the functional API, we don't have a model yet, only a graph. In order to create a model, we use Model defining the input tensor and the output tensor:
new_model = Model(old_model.inputs, new_output)
Now you have your model.
If you use it in another var, as I did (new_model), the old model will still exist in model. And these models are sharing the same layers, in a way that whenever you train one of them, the other gets updated as well.
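A minimal sketch of that sharing (all names made up):
from tensorflow.keras import layers, models
inp = layers.Input((4,))
shared = layers.Dense(2)
base_out = shared(inp)
old_model = models.Model(inp, base_out)
new_model = models.Model(inp, layers.Dense(1)(base_out))
# Both models hold a reference to the very same Dense layer object,
# so training either one updates the weights the other sees.
assert shared in old_model.layers and shared in new_model.layers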
Question: how does it know what other layers we want in between?
When you do:
outputTensor = SomeLayer(...)(inputTensor)
you have a connection between the input and output. (Keras will use the inner TensorFlow mechanism and add these tensors and nodes to the graph.) The output tensor cannot exist without the input. The entire InceptionV3 model is connected from start to end. Its input tensor goes through all the layers to yield an output tensor. There is only one possible way for the data to flow, and the graph is the way.
When you get the output of this model and use it to get further outputs, all your new outputs are connected to this, and thus to the first input of the model.
Probably the attribute _keras_history that is added to the tensors is closely related to how it tracks the graph.
So, doing Model(old_model.inputs, new_output) will naturally follow the only way possible: the graph.
If you try doing this with tensors that are not connected, you will get an error.
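A tiny sketch of that disconnected case (made-up tensors):
from tensorflow.keras import layers, Model
a = layers.Input((4,))
b = layers.Input((4,))
out = layers.Dense(2)(b)
# `out` was computed from `b`, so it is unreachable from `a`
# and Keras raises a "Graph disconnected" ValueError.
try:
    Model(a, out)
except ValueError as err:
    print(err)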
Side question 1
Prefer to import from "keras.models". Basically, this module will import from the other module:
https://github.com/keras-team/keras/blob/master/keras/models.py
Notice that the file keras/models.py imports Model from keras.engine.training. So, it's the same thing.
Side question 2
It's not new_layer = keras.layers.Dense(...)(prev_layer).
It is output_tensor = keras.layers.Dense(...)(input_tensor).
You're doing two things in the same line:
Creating a layer - with keras.layers.Dense(...)
Calling the layer with an input tensor to get an output tensor
If you wanted to use the same layer with different inputs:
denseLayer = keras.layers.Dense(...) #creating a layer
output1 = denseLayer(input1) #calling a layer with an input and getting an output
output2 = denseLayer(input2) #calling the same layer on another input
output3 = denseLayer(input3) #again
Bonus - Creating a functional model that is equal to a sequential model
If you create this sequential model:
model = Sequential()
model.add(Layer1(...., input_shape=some_shape))
model.add(Layer2(...))
model.add(Layer3(...))
You're doing exactly the same as:
inputTensor = Input(some_shape)
outputTensor = Layer1(...)(inputTensor)
outputTensor = Layer2(...)(outputTensor)
outputTensor = Layer3(...)(outputTensor)
model = Model(inputTensor,outputTensor)
What is the difference?
Well, functional API models are totally free to be built any way you want. You can create branches:
out1 = Layer1(..)(inputTensor)
out2 = Layer2(..)(inputTensor)
You can join tensors:
joinedOut = Concatenate()([out1,out2])
With this, you can create anything you want with all kinds of fancy stuff, branches, gates, concatenations, additions, etc., which you can't do with a sequential model.
In fact, a Sequential model is also a Model, but created for a quick use in models without branches.
There's also this way of building a model from a pretrained one that you can build upon.
See https://keras.io/applications/#fine-tune-inceptionv3-on-a-new-set-of-classes:
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(200, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
Each time a layer is added by an op like x = Dense(...), information about the computational graph is updated. You can type this interactively to see what it contains:
x.graph.__dict__
You can see there's all kinds of attributes, including about previous and next layers. These are internal implementation details and possibly change over time.
