I currently have a modified ResNet50 architecture that takes two inputs. Building and training the model work fine, but when I try to extract layer outputs using the backend function, I encounter errors.
I would prefer to extract layers using the backend function, rather than creating a new truncated model with just my layer of interest as the output.
The following snippet is self-contained and should run and reproduce the error I've been seeing.
I've tried reformatting the function in a few ways, such as
K.function([mymodel.input[0], mymodel.input[1]], [mymodel.layers[-1].layers[-6].output])
or
K.function([mymodel.layers[0].input, mymodel.layers[1].input], [mymodel.layers[-1].layers[-6].output])
but nothing seems to fix the issue
##imports
from keras.applications.resnet50 import ResNet50
from keras.layers import Input
from keras.layers import Lambda
from keras.models import Model
from keras.optimizers import Adam
import keras
import keras.backend as K
import numpy as np
#pop off the input
res = ResNet50(weights=None,include_top=True,classes=2)
res.layers.pop(0)
#add two inputs
auxinput= Input(batch_shape=(None,224,224,1), name='aux_input')
main_input = Input(batch_shape=(None,224,224,3), name='main_input')
#use a lambda function to return just our main input (avoids errors from our auxiliary input not being used in the resnet50 component)
l_output = Lambda(lambda x: x[0])([main_input, auxinput])
#feed our main layer to resnet50
data_passed_thru = res(l_output)
#assemble the model with our two inputs, and output
mymodel = Model(inputs=[main_input, auxinput], outputs=[data_passed_thru])
mymodel.compile(optimizer=Adam(lr=0.001), loss= keras.losses.poisson, metrics=[ 'accuracy'])
print("my model summary:")
mymodel.summary()
##generate some fake data for testing
fake_aux= np.zeros((224,224))
fake_aux=fake_aux[None,...]
fake_aux=fake_aux[...,None]
print('fake aux input shape:', fake_aux.shape)
fake_main= np.zeros((224,224,3))
fake_main=fake_main[None,...]
print('fake main input shape:', fake_main.shape)
#check our model inputs and target layer
print("inputs:", mymodel.input)
print("layer outout I'm trying to extract:", mymodel.layers[-1].layers[-6])
#create function to feed inputs, get our desired layer outputs
get_output_func = K.function( mymodel.input , [mymodel.layers[-1].layers[-6].output])
##this is the line that fails
X= [fake_main,fake_aux]
preds=get_output_func(X)
The error message I get is
InvalidArgumentError: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,224,224,3]
[[{{node input_1}}]]
I managed to fix it by accessing the ResNet50 inputs directly in the function, rather than the whole model's initial inputs. The K.function that works is
get_output_func = K.function( [mymodel.layers[-1].get_input_at(0)] , [mymodel.layers[-1].layers[-6].output])
X= [fake_main]
preds=get_output_func(X)
It only works because my architecture depends on just the one input passing through, so I'm not sure what the solution would be for other situations, but it works for my case.
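For situations where the target layer genuinely depends on more than one input, a truncated sub-model (the thing I wanted to avoid) re-applied to the connected tensor might be the general route. A minimal, untested sketch continuing the snippet above, assuming Keras 2.x functional semantics:
#hedged sketch for the general multi-input case: wrap the sub-graph up to the
#target layer in a truncated Model, then re-apply it to l_output, which is
#wired to both outer inputs, so the result is reachable from mymodel.inputs
inner = mymodel.layers[-1]  #the ResNet50 sub-model
sub = Model(inputs=inner.get_input_at(0), outputs=inner.layers[-6].output)
connected = sub(l_output)
get_output_func = K.function(mymodel.inputs, [connected])
preds = get_output_func([fake_main, fake_aux])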
from keras_multi_head import MultiHeadAttention
import keras
from keras.layers import Dense,Input,Multiply
from keras import backend as K
from keras.layers.core import Dropout, Layer
from keras.models import Sequential,Model
import numpy as np
import tensorflow as tf
from self_attention_layer import Encoder
## multi source attention
class Multi_source_attention(keras.Model):
    def __init__(self, read_n, embed_dim, num_heads, ff_dim, num_layers):
        super().__init__()
        self.read_n = read_n
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.num_layers = num_layers
        self.get_weights = Dense(49, activation='relu', name='get_weights')

    def compute_output_shape(self, input_shape):
        #([batch,7,7,256],[1,256])
        return input_shape

    def call(self, inputs):
        ## weights matrix
        #(1,49)
        weights_res = self.get_weights(inputs[1])
        #(1,7,7)
        weights = tf.reshape(weights_res, (1, 7, 7))
        #(256,7,7)
        weights = tf.tile(weights, [256, 1, 1])
        ## img from mobilenet
        img = tf.reshape(inputs[0], [-1, 7, 7])
        inter_res = tf.multiply(img, weights)
        inter_res = tf.reshape(inter_res, (-1, 256, 49))
        print(inter_res.shape)
        att = Encoder(self.embed_dim, self.num_heads, self.ff_dim, self.num_layers)(inter_res)
        return att
I'm trying to construct a network that implements the part circled in the image. The output from the LSTM is (1,256) and the output from the preceding MobileNet is (batch,7,7,256). The LSTM output is then transformed into a weights matrix of shape (7,7).
The problem is that the MobileNet output has a batch dimension. I have no idea how to deal with "batch", or how to set up a parameter to constrain the batch size. Could someone give me a tip?
Also, if I remove the compute_output_shape() function, an UnimplementedError occurs, yet the official Keras docs say I don't need to override that function. Could someone explain that to me?
compute_output_shape is crucial for custom layers. When summary() is called, the corresponding graph is generated, showing the input and output shapes of every layer; compute_output_shape is what reports the output shape.
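As for the batch dimension: instead of tiling the weights to a fixed 256 and collapsing the batch into the leading axis, you can broadcast the (1,7,7) weights against the (batch,7,7,256) feature map, which leaves the batch axis untouched. A hedged, untested sketch; the attribute is renamed to weight_dense because self.get_weights shadows keras.Model.get_weights, and Encoder is the asker's own module:
import tensorflow as tf
import keras
from keras.layers import Dense
from self_attention_layer import Encoder  # the asker's own module

class MultiSourceAttention(keras.Model):
    def __init__(self, embed_dim, num_heads, ff_dim, num_layers):
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.num_layers = num_layers
        # renamed: an attribute called get_weights would shadow keras.Model.get_weights
        self.weight_dense = Dense(49, activation='relu', name='get_weights')

    def compute_output_shape(self, input_shape):
        # ([batch,7,7,256],[1,256]) -> (batch,256,49), assuming Encoder keeps its input shape
        return (input_shape[0][0], 256, 49)

    def call(self, inputs):
        img, lstm_out = inputs                           # (batch,7,7,256), (1,256)
        weights = self.weight_dense(lstm_out)            # (1,49)
        weights = tf.reshape(weights, (1, 7, 7, 1))      # broadcastable over batch and channels
        weighted = img * weights                         # (batch,7,7,256); batch axis preserved
        weighted = tf.transpose(weighted, [0, 3, 1, 2])  # (batch,256,7,7)
        inter_res = tf.reshape(weighted, (-1, 256, 49))  # (batch,256,49)
        return Encoder(self.embed_dim, self.num_heads,
                       self.ff_dim, self.num_layers)(inter_res)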
Is the activation function for each layer stored in the .h5 file produced by model.save()? Or is it already "baked in" to the weights?
I am writing an AWS Lambda function to generate time-series predictions from multiple regression models every five minutes. Unfortunately, TensorFlow is too large of a library to be loaded into an AWS Lambda function, so I am writing my own Python code to load the saved .h5 model file and generate predictions based on the weights and input data. Here's where I'm at so far:
def generate_predictions(model_path, df):
    model_info = h5py.File(model_path, 'r')
    model_weights = model_info['model_weights']
    # Initialize predictions matrix with preprocessed inputs
    # (`inputs` is a predefined list of the input column names)
    predictions = preprocessing.scale(df[inputs])
    layer_list = list(model_weights.keys())
    for layer in layer_list:
        weights = model_weights[layer][layer]['kernel:0'][:]
        bias = model_weights[layer][layer]['bias:0'][:]
        predictions = predictions.dot(weights)
        predictions += bias
        # How to retrieve activation function for layer?
        # predictions = activation_function(predictions)
    return predictions
I understand I'll probably want some kind of case/switch statement to handle the various activation functions.
The model configuration is accessible through an attribute called "model_config" on the top-level group; it contains the full model-configuration JSON produced by model.to_json().
import json
import h5py
model_info = h5py.File('model.h5', 'r')
model_config_json = json.loads(model_info.attrs['model_config'])
If you save the full model with model.save, you can access each layer and its activation function.
from tensorflow.keras.models import load_model
model = load_model('model.h5')
for l in model.layers:
    try:
        print(l.activation)
    except AttributeError:
        pass  # some layers don't have an activation
<function tanh at 0x7fa513b4a8c8>
<function softmax at 0x7fa513b4a510>
Here, for example, softmax is used in the last layer.
If you don't want to import tensorflow, you can also read from h5py.
import h5py
import json
model_info = h5py.File('model.h5', 'r')
model_config = json.loads(model_info.attrs.get('model_config').decode('utf-8'))
for k in model_config['config']['layers']:
if 'activation' in k['config']:
print(f"{k['class_name']}: {k['config']['activation']}")
LSTM: tanh
Dense: softmax
Here, the last layer is a Dense layer with softmax activation.
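Combining the two ideas for the original Lambda use case: once the activation names have been read from model_config, a small dispatch table of plain-NumPy implementations can fill in the missing step of the questioner's loop. A sketch covering only the common activations (apply_activation is a hypothetical helper, not a Keras API):
import numpy as np

def apply_activation(name, x):
    # plain-numpy stand-ins for the common Keras activations
    if name == 'linear':
        return x
    if name == 'relu':
        return np.maximum(x, 0)
    if name == 'tanh':
        return np.tanh(x)
    if name == 'sigmoid':
        return 1.0 / (1.0 + np.exp(-x))
    if name == 'softmax':
        e = np.exp(x - x.max(axis=-1, keepdims=True))  # shift for numerical stability
        return e / e.sum(axis=-1, keepdims=True)
    raise ValueError('Unsupported activation: %s' % name)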
In my TensorFlow model I have some data that I feed into a stack of CNNs before it goes into a few fully connected layers. I have implemented that with Keras' Sequential model. However, I now have some data that should not go into the CNN; instead it should be fed directly into the first fully connected layer, because it contains values and labels that are part of the input data but should not undergo convolutions, as it is not image data.
Is such a thing possible with tensorflow.keras, or should I do it with tensorflow.nn instead? As far as I understand Keras' Sequential models, the input goes in one end and comes out the other, with no special wiring in the middle.
Am I correct that to do this I have to use tensorflow.concat on the output of the last CNN layer and the data that bypasses the CNNs, before feeding the result into the first fully connected layer?
Here is a simple example in which the operation is to sum the activations from different subnets:
import keras
import numpy as np
import tensorflow as tf
from keras.layers import Input, Dense, Activation
tf.reset_default_graph()
# this represents your cnn model
def nn_model(input_x):
    feature_maker = Dense(10, activation='relu')(input_x)
    feature_maker = Dense(20, activation='relu')(feature_maker)
    feature_maker = Dense(1, activation='linear')(feature_maker)
    return feature_maker
# a list of input layers, of course the input shapes can be different
input_layers = [Input(shape=(3, )) for _ in range(2)]
coupled_feature = [nn_model(input_x) for input_x in input_layers]
# assume you take the sum of the outputs
coupled_feature = keras.layers.Add()(coupled_feature)
prediction = Dense(1, activation='relu')(coupled_feature)
model = keras.models.Model(inputs=input_layers, outputs=prediction)
model.compile(loss='mse', optimizer='adam')
# example training set
x_1 = np.linspace(1, 90, 270).reshape(90, 3)
x_2 = np.linspace(1, 90, 270).reshape(90, 3)
y = np.random.rand(90)
inputs_x = [x_1, x_2]
model.fit(inputs_x, y, batch_size=32, epochs=10)
You can actually plot the model to gain more intuition
from keras.utils.vis_utils import plot_model
plot_model(model, show_shapes=True)
(The resulting plot shows the two input branches passing through their subnets and merging in the Add layer before the final Dense layer.)
With a little remodeling and the functional API you can:
#create the CNN - it can also be a sequential
cnn_input = Input(image_shape)
cnn_output = Conv2D(...)(cnn_input)
cnn_output = Conv2D(...)(cnn_output)
cnn_output = MaxPooling2D()(cnn_output)
....
cnn_model = Model(cnn_input, cnn_output)
#create the FC model - can also be a sequential
fc_input = Input(fc_input_shape)
fc_output = Dense(...)(fc_input)
fc_output = Dense(...)(fc_output)
fc_model = Model(fc_input, fc_output)
There is a lot of space for creativity, this is just one of the ways.
#create the full model
full_input = Input(image_shape)
full_output = cnn_model(full_input)
full_output = fc_model(full_output)
full_model = Model(full_input, full_output)
You can use any of the three models in any way you want. They share the layers and the weights, so internally they are the same.
Saving and loading the full model might be quirky. I'd probably save the other two separately and when loading create the full model again.
Notice also that if you save two models that share the same layers, after loading they will probably no longer share those layers. (Another reason to save/load only fc_model and cnn_model, and to recreate full_model from code.)
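For the exact wiring the question describes (extra features bypassing the CNN and joining right before the first fully connected layer), a Concatenate layer does the job of tensorflow.concat. A minimal sketch with made-up shapes:
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Concatenate
from keras.models import Model

image_in = Input(shape=(64, 64, 3))       # image data -> CNN stack
extra_in = Input(shape=(10,))             # values/labels that skip the CNN
x = Conv2D(16, 3, activation='relu')(image_in)
x = MaxPooling2D()(x)
x = Flatten()(x)
x = Concatenate()([x, extra_in])          # bypass joins here
x = Dense(64, activation='relu')(x)
out = Dense(1)(x)
bypass_model = Model([image_in, extra_in], out)
bypass_model.compile(loss='mse', optimizer='adam')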
tf.keras.applications contains many famous neural networks like VGG, DenseNet, MobileNet, and so on. Take tf.keras.applications.MobileNet as an example: what I am interested in is not only the final output but also the outputs of intermediate layers. How can I get all these outputs when retraining the network?
Maybe model.get_output_at(index) helps. However, every time I call this function I get a DeferredTensor, because I cannot forward the data at the same time. Does a convenient way exist?
Thanks in advance~
I suggest you read the Keras documentation:
One simple way is to create a new Model that will output the layers you are interested in:
from keras.models import Model
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Similarly, you could build a Theano or TensorFlow function directly.
Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you will need to pass the learning phase flag to your function:
get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])
# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]
# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]
Here is another similar answer written by fchollet himself:
How can I get hidden layer representation of the given data?
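For the MobileNet case specifically, the first approach generalizes to several layers at once: build a Model whose outputs list both the intermediate tensors and the final prediction, so every forward pass returns all of them. A sketch; the tapped layer names are assumptions, check model.summary() for the real ones:
import numpy as np
import tensorflow as tf

base = tf.keras.applications.MobileNet(weights=None)
tap_names = ['conv_pw_3_relu', 'conv_pw_7_relu']   # assumed names; verify via summary()
taps = [base.get_layer(n).output for n in tap_names]
multi_out = tf.keras.Model(inputs=base.input, outputs=taps + [base.output])
outs = multi_out.predict(np.zeros((1, 224, 224, 3)))  # list: the two taps plus final output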
I am trying to change the activation function of the last layer of a Keras model without replacing the whole layer; in this case, only the softmax function.
import keras.backend as K
from keras.models import load_model
from keras.preprocessing.image import load_img, img_to_array
import numpy as np
model = load_model(model_path) # Load any model
img = load_img(img_path, target_size=(224, 224))
img = img_to_array(img)
img = np.expand_dims(img, axis=0)  # predict expects a batch dimension
print(model.predict(img))
My output:
array([[1.53172877e-07, 7.13159451e-08, 6.18941920e-09, 8.52070968e-07,
1.25813088e-07, 9.98970985e-01, 1.48254022e-08, 6.09538893e-06,
1.16236095e-07, 3.91888688e-10, 6.29304608e-08, 1.79565995e-09,
1.75571788e-08, 1.02110009e-03, 2.14380114e-09, 9.54465733e-08,
1.05938483e-07, 2.20544337e-07]], dtype=float32)
Then I do this to change the activation:
model.layers[-1].activation = custom_softmax
print(model.predict(test_img))
and the output I get is exactly the same. Any ideas how to fix this? Thanks!
You could try the custom_softmax below; it deliberately returns zeros for any tensor of rank 2 or more, so it is immediately obvious whether the swapped-in activation is actually being used:
def custom_softmax(x, axis=-1):
    """Test stand-in for a softmax activation function.

    # Arguments
        x: Tensor.
        axis: Integer, axis along which the softmax normalization is applied.

    # Returns
        Tensor, output of softmax transformation.

    # Raises
        ValueError: In case `dim(x) == 1`.
    """
    ndim = K.ndim(x)
    if ndim >= 2:
        # all zeros: if the model's output turns to zeros, the new activation took effect
        return K.zeros_like(x)
    else:
        raise ValueError('Cannot apply softmax to a tensor that is 1D')
At the current state of things there's no official, clean way to do that. As pointed out by @layser in the comments, the TensorFlow graph isn't being updated, which results in the lack of change in your output. One option is to use keras-vis' utils. My recommendation is to isolate that in your own utils.py, like so:
from vis.utils.utils import apply_modifications

def update_layer_activation(model, activation, index=-1):
    model.layers[index].activation = activation
    return apply_modifications(model)
Which would lead to a similar use:
model = update_layer_activation(model, custom_softmax)
If you follow the given link, you'll see that what they do is quite simple: they save the model to a temporary path, then load it back and return it, finally deleting the temp file.
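If you'd rather not depend on keras-vis, the same trick is easy to inline. A sketch; note that a truly custom activation would also need to be passed via custom_objects when loading:
import os
import tempfile
from keras.models import load_model

def rebuild_model(model, custom_objects=None):
    # save to a temp file and load back so the change is compiled into the graph
    path = os.path.join(tempfile.gettempdir(), 'tmp_rebuild.h5')
    try:
        model.save(path)
        return load_model(path, custom_objects=custom_objects)
    finally:
        os.remove(path)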