tf.keras.applications contains many famous neural networks like VGG, DenseNet, MobileNet and so on. Take tf.keras.applications.MobileNet as an example: I am interested not only in the final output but also in the outputs of the intermediate layers. How can I get all these outputs when retraining the network?
Maybe model.get_output_at(index) helps. However, every time I call this function I get a DeferredTensor, because I cannot forward the data at the same time. Does a convenient way exist?
Thanks in advance~
I suggest you read the Keras documentation:
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ...  # create the original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Similarly, you could build Theano or TensorFlow functions directly.
Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you will need to pass the learning phase flag to your function:
get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])
# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]
# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]
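Since the question asks for the intermediate outputs of MobileNet specifically, here is a minimal sketch of the first approach applied to it, assuming TF 2.x with tf.keras: one Model whose outputs are a list of layer outputs, so a single predict call returns every selected feature map. weights=None and the 128x128 input are assumptions made only so the sketch runs without downloading weights, and selecting the ReLU layers by name is an arbitrary, hypothetical choice.

```python
import numpy as np
from tensorflow import keras

# weights=None and the 128x128 input are assumptions so the sketch runs
# offline; use weights="imagenet" for real fine-tuning.
base = keras.applications.MobileNet(
    input_shape=(128, 128, 3), include_top=False, weights=None
)

# Hypothetical selection: every ReLU activation layer in the network.
wanted = [layer for layer in base.layers if "relu" in layer.name]

# One model with many outputs, sharing the layers and weights of `base`.
feature_model = keras.Model(
    inputs=base.input, outputs=[layer.output for layer in wanted]
)

images = np.random.rand(2, 128, 128, 3).astype("float32")
outputs = feature_model.predict(images, verbose=0)  # list, one array per layer
for layer, out in zip(wanted[:3], outputs[:3]):
    print(layer.name, out.shape)
```

Because feature_model shares its layers with base, retraining base (or a model built on top of it) also changes what feature_model returns.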
Here is another similar answer written by fchollet himself:
How can I get hidden layer representation of the given data?
Is there a way to pass a feature to a Keras model as an input that is only accessed by a custom loss function, without it acting as an input feature of the model? I only need the feature to calculate the loss, not to feed it forward through the hidden layers of the network. (Basically, what I want is to feed the feature in as an input and get it back unchanged as an output alongside y_pred, so that the loss function can access it.)
A worked example would be much appreciated.
If you are writing your own custom loss, you could pass the feature as an input and then, using a Lambda layer, make it bypass the network and concatenate it directly at the end. Something like the following:
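import tensorflow as tf
from tensorflow.keras import layers, Model, utils
inp = layers.Input((11,))
x = layers.Lambda(lambda x: x[:, :-1])(inp)   # first 10 features go through the network
o2 = layers.Lambda(lambda x: x[:, -1:])(inp)  # last feature bypasses the network
x = layers.Dense(20)(x)
x = layers.Dense(20)(x)
o1 = layers.Dense(1)(x)
out = layers.concatenate([o1, o2])  # shape (batch, 2): [prediction, untouched feature]
model = Model(inp, out)
# Keras calls a loss as loss(y_true, y_pred); split y_pred back into its two parts.
def custom_loss(y_true, y_pred):
    prediction, feature = y_pred[:, :1], y_pred[:, 1:]
    # hypothetical example: squared error weighted by the bypassed feature
    return tf.reduce_mean(feature * tf.square(y_true - prediction))
model.compile(optimizer='adam', loss=custom_loss)
utils.plot_model(model, show_shapes=True, show_layer_names=False)  # needs pydot and graphviz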
Here the first 10 features are the ones you want to pass through the network, and the last feature is the one you want as is, for the custom loss. The final output is the concatenation of the network's prediction (computed from the first 10 features) and the untouched last feature.
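To check that the last column really comes out untouched after training, here is a small self-contained sketch, assuming TF 2.x with tf.keras. The random data and the weighted squared-error loss are hypothetical placeholders for your real data and loss.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input((11,))
net_in = layers.Lambda(lambda t: t[:, :-1])(inp)   # 10 features -> network
feature = layers.Lambda(lambda t: t[:, -1:])(inp)  # 1 feature -> bypass
hidden = layers.Dense(20, activation="relu")(net_in)
prediction = layers.Dense(1)(hidden)
model = Model(inp, layers.concatenate([prediction, feature]))

def custom_loss(y_true, y_pred):
    # Column 0 is the network output, column 1 the bypassed feature.
    pred, weight = y_pred[:, 0], y_pred[:, 1]
    # Hypothetical loss: squared error weighted by the bypassed feature.
    return tf.reduce_mean(weight * tf.square(y_true[:, 0] - pred))

model.compile(optimizer="adam", loss=custom_loss)

data = np.random.rand(64, 11).astype("float32")
targets = np.random.rand(64, 1).astype("float32")
model.fit(data, targets, epochs=1, verbose=0)

out = model.predict(data, verbose=0)  # shape (64, 2)
```

The Lambda slice and the concatenation have no weights, so training only updates the Dense layers and out[:, 1] stays equal to the last input column.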
If you want to know how to write a custom loss, please check this excellent SO post that explains it.
Is the activation function for each layer stored in the .h5 file produced when saving the model, or is it already "baked in" to the weights?
I am writing an AWS Lambda function to generate time-series predictions from multiple regression models every five minutes. Unfortunately, TensorFlow is too large a library to be loaded into an AWS Lambda function, so I am writing my own Python code to load the saved .h5 model file and generate predictions from the weights and input data. Here's where I'm at so far:
import h5py
import numpy as np
from sklearn import preprocessing
def generate_predictions(model_path, df):
    model_info = h5py.File(model_path, 'r')
    model_weights = model_info['model_weights']
    # Initialize predictions matrix with preprocessed inputs
    # (`inputs` is the list of input column names, defined elsewhere)
    predictions = preprocessing.scale(df[inputs])
    layer_list = list(model_weights.keys())
    for layer in layer_list:
        weights = model_weights[layer][layer]['kernel:0'][:]
        bias = model_weights[layer][layer]['bias:0'][:]
        predictions = np.dot(predictions, weights)
        predictions += bias
        # How to retrieve activation function for layer?
        # predictions = activation_function(predictions)
    return predictions
I understand I'll probably want some kind of case/switch statement to handle the various activation functions.
The model configuration is accessible through an attribute called "model_config" on the top-level group. It seems to contain the full model configuration JSON, the same as produced by model.to_json().
import json
import h5py
model_info = h5py.File('model.h5', 'r')
model_config_json = json.loads(model_info.attrs['model_config'])
If you save the full model (architecture and weights), you can access each layer and its activation function.
from tensorflow.keras.models import load_model
model = load_model('model.h5')
for l in model.layers:
    try:
        print(l.activation)
    except AttributeError:  # some layers don't have any activation
        pass
<function tanh at 0x7fa513b4a8c8>
<function softmax at 0x7fa513b4a510>
Here, for example, softmax is used in the last layer.
If you don't want to import tensorflow, you can also read from h5py.
import h5py
import json
model_info = h5py.File('model.h5', 'r')
model_config = json.loads(model_info.attrs.get('model_config').decode('utf-8'))
for k in model_config['config']['layers']:
if 'activation' in k['config']:
print(f"{k['class_name']}: {k['config']['activation']}")
LSTM: tanh
Dense: softmax
Here, the last layer is a Dense layer with softmax activation.
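Putting the two pieces together for the Lambda use case, here is a sketch of a TensorFlow-free forward pass, assuming a plain stack of Dense layers and the same .h5 layout the question's code already relies on (model_weights/&lt;name&gt;/&lt;name&gt;/kernel:0, as written by Keras 2 / TF 2.x; other versions may differ). The ACTIVATIONS dict is a hypothetical stand-in for the "case/switch statement" mentioned in the question; it only covers a few common activations.

```python
import json

import h5py
import numpy as np


def _softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical lookup table from Keras activation names to NumPy functions;
# add entries for any other activation your model uses.
ACTIVATIONS = {
    "linear": lambda z: z,
    "relu": lambda z: np.maximum(z, 0.0),
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
    "softmax": _softmax,
}


def load_dense_layers(model_path):
    """Return [(kernel, bias, activation_name), ...] for the Dense layers.

    Assumes the Keras 2 / TF 2.x .h5 layout used in the question.
    """
    with h5py.File(model_path, "r") as f:
        raw = f.attrs["model_config"]
        if isinstance(raw, bytes):  # some files store the JSON as bytes
            raw = raw.decode("utf-8")
        layer_configs = json.loads(raw)["config"]["layers"]
        dense_layers = []
        for layer in layer_configs:  # config order follows the layer order
            if layer["class_name"] != "Dense":
                continue  # only Dense is handled in this sketch
            name = layer["config"]["name"]
            group = f["model_weights"][name][name]
            dense_layers.append((group["kernel:0"][:], group["bias:0"][:],
                                 layer["config"]["activation"]))
    return dense_layers


def dense_forward(x, dense_layers):
    """Apply activation(x @ kernel + bias) for each layer in turn."""
    for kernel, bias, activation in dense_layers:
        x = ACTIVATIONS[activation](x @ kernel + bias)
    return x
```

Usage would be dense_forward(preprocessed_inputs, load_dense_layers('model.h5')). Skipping non-Dense layers is only safe for layers that do nothing at inference time (InputLayer, Dropout); anything like BatchNormalization would need its own branch.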
In my TensorFlow model I have some data that I feed into a stack of CNNs before it goes into a few fully connected layers. I have implemented that with Keras' Sequential model. However, I now have some data that should not go into the CNN but instead be fed directly into the first fully connected layer: it contains values and labels that are part of the input data, but it is not image data and should not undergo convolutions.
Is such a thing possible with tensorflow.keras, or should I do that with tensorflow.nn instead? As far as I understand Keras' Sequential models, the input goes in one end and comes out the other, with no special wiring in the middle.
Am I correct that to do this I have to use tensorflow.concat on the data from the last CNN layer and the data that bypasses the CNNs before feeding it into the first fully connected layer?
Here is a simple example in which the activations from different subnets are summed:
import keras
import numpy as np
import tensorflow as tf
from keras.layers import Input, Dense, Activation
# this represents your cnn model
def nn_model(input_x):
feature_maker = Dense(10, activation='relu')(input_x)
feature_maker = Dense(20, activation='relu')(feature_maker)
feature_maker = Dense(1, activation='linear')(feature_maker)
return feature_maker
# a list of input layers, of course the input shapes can be different
input_layers = [Input(shape=(3, )) for _ in range(2)]
coupled_feature = [nn_model(input_x) for input_x in input_layers]
# assume you take the sum of the outputs
coupled_feature = keras.layers.Add()(coupled_feature)
prediction = Dense(1, activation='relu')(coupled_feature)
model = keras.models.Model(inputs=input_layers, outputs=prediction)
model.compile(loss='mse', optimizer='adam')
# example training set
x_1 = np.linspace(1, 90, 270).reshape(90, 3)
x_2 = np.linspace(1, 90, 270).reshape(90, 3)
y = np.random.rand(90)
inputs_x = [x_1, x_2]
model.fit(inputs_x, y, batch_size=32, epochs=10)
You can also plot the model to gain more intuition:
from keras.utils import plot_model
plot_model(model, show_shapes=True)
[Figure: the model defined by the code above, as drawn by plot_model]
With a little remodeling and the functional API you can:
#create the CNN - it can also be a sequential
cnn_input = Input(image_shape)
cnn_output = Conv2D(...)(cnn_input)
cnn_output = Conv2D(...)(cnn_output)
cnn_output = MaxPooling2D()(cnn_output)
cnn_model = Model(cnn_input, cnn_output)
#create the FC model - can also be a sequential
fc_input = Input(fc_input_shape)
fc_output = Dense(...)(fc_input)
fc_output = Dense(...)(fc_output)
fc_model = Model(fc_input, fc_output)
#create the full model
full_input = Input(image_shape)
full_output = cnn_model(full_input)
full_output = fc_model(full_output)
full_model = Model(full_input, full_output)
There is a lot of room for creativity; this is just one way to do it.
You can use any of the three models in any way you want. They share the layers and the weights, so internally they are the same.
Saving and loading the full model might be quirky. I'd probably save the other two separately and when loading create the full model again.
Notice also that if you save two models that share the same layers, after loading they will probably not share these layers anymore. (Another reason to save/load only fc_model and cnn_model and recreate full_model from code.)
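Coming back to the original question, here is a minimal sketch of the bypass itself, assuming TF 2.x with tf.keras: the image goes through the convolutional branch, the non-image data is fed in as a second Input, and layers.Concatenate (the Keras layer counterpart of tf.concat) joins the two before the fully connected layers. The shapes, layer sizes and random data are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical shapes: 32x32 RGB images plus 5 non-image values per sample.
image_in = keras.Input(shape=(32, 32, 3), name="image")
extra_in = keras.Input(shape=(5,), name="extra")

# Convolutional branch: only the image goes through the CNN.
x = layers.Conv2D(16, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Flatten()(x)

# Join the CNN features with the untouched extra data, then the FC layers.
x = layers.Concatenate()([x, extra_in])
x = layers.Dense(64, activation="relu")(x)
output = layers.Dense(1)(x)

model = keras.Model(inputs=[image_in, extra_in], outputs=output)
model.compile(optimizer="adam", loss="mse")

images = np.random.rand(8, 32, 32, 3).astype("float32")
extra = np.random.rand(8, 5).astype("float32")
y = np.random.rand(8, 1).astype("float32")
model.fit([images, extra], y, epochs=1, verbose=0)
preds = model.predict([images, extra], verbose=0)
```

The Flatten step matters: Concatenate joins along the last axis, so the CNN output has to be 2D (batch, features) before it can be joined with the (batch, 5) extra data.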
I don't understand what's happening in this code:
def construct_model(use_imagenet=True):
    # line 1: how do we keep all layers of this model ?
    model = keras.applications.InceptionV3(include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                           weights='imagenet' if use_imagenet else None)
    new_output = keras.layers.GlobalAveragePooling2D()(model.output)
    new_output = keras.layers.Dense(N_CLASSES, activation='softmax')(new_output)
    model =, new_output)
    return model
Specifically, my confusion is, when we call the last constructor
model =, new_output)
we specify input layer and output layer, but how does it know we want all the other layers to stay?
In other words, we append the new_output layer to the pre-trained model loaded in line 1, and then in the final constructor (final line) we just create and return a model with specified input and output layers. How does it know which other layers we want in between?
Side question 1): What is the difference between and keras.models.Model?
Side question 2): What exactly happens when we do new_layer = keras.layers.Dense(...)(prev_layer)? Does the () operation return a new layer? What does it do exactly?
This model was created using the functional API.
Basically it works like this (it may be clearer if you read "Side question 2" below first):
You have an input tensor (you can see it as "input data" too)
You create (or reuse) a layer
You pass the input tensor to a layer (you "call" a layer with an input)
You get an output tensor
You keep working with these tensors until you have created the entire graph.
But this hasn't created a "model" yet (one you can train and use for other things).
All you have is a graph telling which tensors go where.
To create a model, you define its start and end points.
In the example:
They take an existing model: model = keras.applications.InceptionV3(...)
They want to expand this model, so they get its output tensor: model.output
They pass this tensor as the input of a GlobalAveragePooling2D layer
They get this layer's output tensor as new_output
They pass this as input to yet another layer: Dense(N_CLASSES, ....)
And get its output as new_output (this var was replaced as they are not interested in keeping its old value...)
But, as it works with the functional API, we don't have a model yet, only a graph. In order to create a model, we use Model defining the input tensor and the output tensor:
new_model = Model(old_model.inputs, new_output)
Now you have your model.
If you assign it to another variable, as I did (new_model), the old model still exists in model. Both models share the same layers, so whenever you train one of them, the other is updated as well.
Question: how does it know what other layers we want in between?
When you do:
outputTensor = SomeLayer(...)(inputTensor)
you have a connection between the input and output. (Keras will use the inner TensorFlow mechanism and add these tensors and nodes to the graph.) The output tensor cannot exist without the input. The entire InceptionV3 model is connected from start to end. Its input tensor goes through all the layers to yield an output tensor. There is only one possible way for the data to follow, and the graph is the way.
When you get the output of this model and use it to get further outputs, all your new outputs are connected to this, and thus to the first input of the model.
Probably the attribute _keras_history that is added to the tensors is closely related to how it tracks the graph.
So, doing Model(old_model.inputs, new_output) will naturally follow the only way possible: the graph.
If you try doing this with tensors that are not connected, you will get an error.
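A quick way to see this graph walk, sketched with tf.keras (TF 2.x assumed, sizes arbitrary): only inp and out are passed to Model, yet the hidden Dense layer ends up in model.layers and its weights are trainable, because Keras follows out back to inp through the recorded graph.

```python
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(4,))
hidden_layer = layers.Dense(8, activation="relu")
hidden = hidden_layer(inp)
out = layers.Dense(2)(hidden)

# Only the end points are given; the layer in between is found by walking
# back from `out` to `inp`.
model = keras.Model(inp, out)
layer_types = [type(layer).__name__ for layer in model.layers]
print(layer_types)
```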
Side question 1
Prefer to import from "keras.models". Basically, this module will import from the other module:
Notice that the file keras/ imports Model from So, it's the same thing.
Side question 2
It's not new_layer = keras.layers.Dense(...)(prev_layer).
It is output_tensor = keras.layers.Dense(...)(input_tensor).
You're doing two things in the same line:
Creating a layer - with keras.layers.Dense(...)
Calling the layer with an input tensor to get an output tensor
If you wanted to use the same layer with different inputs:
denseLayer = keras.layers.Dense(...) #creating a layer
output1 = denseLayer(input1) #calling a layer with an input and getting an output
output2 = denseLayer(input2) #calling the same layer on another input
output3 = denseLayer(input3) #again
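A small runnable version of the shared-layer idea, assuming TF 2.x with tf.keras (input and layer sizes are arbitrary): because both calls use the same Dense object, the model holds only one kernel and one bias, and both outputs are identical for identical inputs.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

shared_dense = layers.Dense(4)  # one layer object, one set of weights

input1 = keras.Input(shape=(3,))
input2 = keras.Input(shape=(3,))
output1 = shared_dense(input1)  # first call creates the kernel and bias
output2 = shared_dense(input2)  # second call reuses the same kernel and bias

model = keras.Model([input1, input2], [output1, output2])

data = np.random.rand(5, 3).astype("float32")
out1, out2 = model.predict([data, data], verbose=0)
```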
Bonus - Creating a functional model that is equal to a sequential model
If you create this sequential model:
model = Sequential()
model.add(Layer1(..., input_shape=some_shape))
model.add(Layer2(...))
model.add(Layer3(...))
You're doing exactly the same as:
inputTensor = Input(some_shape)
outputTensor = Layer1(...)(inputTensor)
outputTensor = Layer2(...)(outputTensor)
outputTensor = Layer3(...)(outputTensor)
model = Model(inputTensor,outputTensor)
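To check the equivalence concretely, here is a minimal sketch, assuming TF 2.x with tf.keras, with arbitrary Dense layers standing in for Layer1 to Layer3: both versions have the same parameter count, and after copying the weights they produce the same predictions. The Sequential model gets its input from keras.Input rather than input_shape=, as the Keras Sequential guide also does.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Sequential version; the Dense layers stand in for Layer1..Layer3.
seq_model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),
])

# The same stack written with the functional API.
input_tensor = keras.Input(shape=(8,))
output_tensor = layers.Dense(16, activation="relu")(input_tensor)
output_tensor = layers.Dense(16, activation="relu")(output_tensor)
output_tensor = layers.Dense(1)(output_tensor)
func_model = keras.Model(input_tensor, output_tensor)

# Identical architecture, so the weights can be copied across one-to-one.
func_model.set_weights(seq_model.get_weights())

sample = np.random.rand(4, 8).astype("float32")
seq_pred = seq_model.predict(sample, verbose=0)
func_pred = func_model.predict(sample, verbose=0)
```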
What is the difference?
Well, functional API models are totally free to be built any way you want. You can create branches:
out1 = Layer1(..)(inputTensor)
out2 = Layer2(..)(inputTensor)
You can join tensors:
joinedOut = Concatenate()([out1,out2])
With this, you can create anything you want with all kinds of fancy stuff, branches, gates, concatenations, additions, etc., which you can't do with a sequential model.
In fact, a Sequential model is also a Model, but created for a quick use in models without branches.
Here is a common way of building a model on top of a pretrained one:
from keras.applications import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(200, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# freeze the pretrained layers before compiling
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
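To confirm what the freezing loop does, here is a sketch with a tiny hypothetical convolutional base standing in for InceptionV3 (so nothing has to be downloaded), assuming TF 2.x with tf.keras. Because model is built from base_model's own tensors, the two share layer objects, and after setting trainable = False only the new Dense layers contribute trainable weights.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for a pretrained base such as InceptionV3.
base_in = keras.Input(shape=(32, 32, 3))
b = layers.Conv2D(8, 3, activation="relu")(base_in)
b = layers.Conv2D(8, 3, activation="relu")(b)
base_model = keras.Model(base_in, b)

x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(16, activation="relu")(x)
predictions = layers.Dense(3, activation="softmax")(x)
model = keras.Model(inputs=base_model.input, outputs=predictions)

# Freeze the base before compiling, as in the snippet above.
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

n_trainable = len(model.trainable_weights)   # kernels and biases of the 2 Dense layers
n_frozen = len(model.non_trainable_weights)  # kernels and biases of the 2 Conv2D layers
```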
Each time a layer is added by an op like "x = Dense(...)", information about the computational graph is updated. You can type this interactively to see what it contains:
You can see there are all kinds of attributes, including information about previous and next layers. These are internal implementation details and may change over time.
I have trained a network model using keras which includes multiple dropout layers. I have also implemented a stochastic predictor function (using the keras backend) which allows me to get predictions with dropout turned "on."
import keras.backend as K
F = K.function([m.layers[0].input, K.learning_phase()], [m.layers[-1].output])
I call the function using
output = F([x_test[0:1], 1])
where x_test is a sample input.
Currently, the dropout rate used in this predictor function is the same as the dropout rate used for training. I would like to set a different dropout rate without retraining the network.
I wrote a script to change all dropout layer rates:
for layer in [l for l in m.layers if "dropout" in l.name.lower()]:
    layer.rate = 0.5
However, when I call my custom function, it does not change the output. For example, if my trained network uses a rate of 0 (or K.epsilon()), then repeated function calls will yield the same result. Changing the dropout to 0.5 should yield unique results on each function call. Yet, this is not the case. Changing the dropout has no effect.
What does work is extracting a single layer, changing the rate, and calling that single layer:
L = my_net.model.layers[0]
L.rate = 0.5
L_out1 = K.eval([0], training=True))
L.rate = K.epsilon()
L_out2 = K.eval([0], training=True))
Here, L_out1 and L_out2 are unique. I don't know how to implement this functionality across the whole network.
What is it about the backend function that prevents my model changes from being effective?
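A likely explanation, not taken from the original thread: K.function builds its graph once, from the tensors created when the Dropout layers were first called, and the dropout rate is captured there as a constant, so assigning layer.rate afterwards does not touch that graph. Calling the layer again, as in the single-layer experiment above, builds new ops with the new rate, which is why that works. One workaround sketch, assuming TF 2.x with tf.keras: rebuild the model with keras.models.clone_model, swapping in the new rate through clone_function, then copy the trained weights over. The small model and the helper with_dropout_rate are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical small model trained with dropout rate 0.
inp = keras.Input(shape=(10,))
x = layers.Dense(64, activation="relu")(inp)
x = layers.Dropout(0.0)(x)
out = layers.Dense(1)(x)
model = keras.Model(inp, out)


def with_dropout_rate(model, rate):
    """Rebuild `model` with every Dropout layer set to `rate`, same weights."""
    def clone_layer(layer):
        config = layer.get_config()
        if isinstance(layer, layers.Dropout):
            config["rate"] = rate
        return layer.__class__.from_config(config)

    new_model = keras.models.clone_model(model, clone_function=clone_layer)
    new_model.set_weights(model.get_weights())  # Dropout has no weights
    return new_model


mc_model = with_dropout_rate(model, 0.5)

sample = np.random.rand(1, 10).astype("float32")
# training=True keeps dropout active at prediction time (MC dropout).
draw1 = np.asarray(mc_model(sample, training=True))
draw2 = np.asarray(mc_model(sample, training=True))
# With dropout off, the rebuilt model matches the original exactly.
same_inference = np.allclose(np.asarray(mc_model(sample, training=False)),
                             np.asarray(model(sample, training=False)))
```

Since the new model is built from scratch, the rate is read when its Dropout layers are first called, and draw1 and draw2 will differ from call to call.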