How to implement a custom layer with multiple outputs in Keras? - python

As stated in the title, I was wondering how to have a custom layer return multiple tensors: out1, out2, ..., outn.
I tried
keras.backend.concatenate([out1, out2], axis = 1)
But this only works for tensors whose shapes match on every axis except the concatenation axis, and there has to be a better solution than concatenating the tensors two by two every time, right?

In the call method of your layer, where you perform the layer calculations, you can return a list of tensors:
def call(self, inputTensor):
    # calculations with inputTensor and the weights you defined in "build"
    # inputTensor may be a single tensor or a list of tensors
    # output can also be a single tensor or a list of tensors
    return [output1, output2, output3]
Take care of the output shapes:
def compute_output_shape(self, inputShape):
    # calculate shapes from input shape
    return [shape1, shape2, shape3]
The result of using the layer is a list of tensors.
Naturally, some kinds of Keras layers accept lists as inputs and others don't.
You have to manage the outputs properly using a functional API Model. You're probably going to have problems using a Sequential model while having multiple outputs.
I tested this code on my machine (Keras 2.0.8) and it works perfectly:
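from keras.layers import *
from keras.models import *
import numpy as np

class Lay(Layer):
    def __init__(self):
        super(Lay, self).__init__()

    def build(self, inputShape):
        super(Lay, self).build(inputShape)

    def call(self, x):
        # two outputs: the first and the last column of x
        return [x[:, :1], x[:, -1:]]

    def compute_output_shape(self, inputShape):
        return [(None, 1), (None, 1)]

inp = Input((2,))
out = Lay()(inp)            # out is a list of two tensors
out = Concatenate()(out)
model = Model(inp, out)

data = np.array([[1, 2], [3, 4], [5, 6]])
print(model.predict(data))

import keras
print(keras.__version__)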
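For readers on a recent TensorFlow, a minimal sketch of the same idea with tf.keras, assuming TF 2.x; the layer name SplitLayer is hypothetical. Instead of concatenating, it keeps the two outputs separate and builds a functional Model with two outputs, which is the kind of output handling the answer recommends. It omits compute_output_shape on the assumption that your tf.keras version can infer output shapes by tracing call (add the method back if it cannot).

```python
import numpy as np
import tensorflow as tf


class SplitLayer(tf.keras.layers.Layer):
    """Returns the first and the last column of a (batch, n) input separately."""

    def call(self, x):
        # Returning a list gives the layer two output tensors.
        return [x[:, :1], x[:, -1:]]


inp = tf.keras.Input(shape=(2,))
first, last = SplitLayer()(inp)              # unpack the list of outputs
model = tf.keras.Model(inp, [first, last])   # functional Model with two outputs

data = tf.constant(np.array([[1, 2], [3, 4], [5, 6]], dtype="float32"))
out_first, out_last = model(data)
print(out_first.numpy().ravel())  # [1. 3. 5.]
print(out_last.numpy().ravel())   # [2. 4. 6.]
```

Here model(data) returns one tensor per output, which is also what model.predict returns for a multi-output model.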


A problem with how to deal with the batch dimension while creating a Model

from keras_multi_head import MultiHeadAttention
import keras
from keras.layers import Dense, Input, Multiply
from keras import backend as K
from keras.layers.core import Dropout, Layer
from keras.models import Sequential, Model
import numpy as np
import tensorflow as tf
from self_attention_layer import Encoder

## multi source attention
class Multi_source_attention(keras.Model):
    def __init__(self, read_n, embed_dim, num_heads, ff_dim, num_layers):
        super(Multi_source_attention, self).__init__()
        self.read_n = read_n
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.num_layers = num_layers
        # not stored as self.get_weights, which would shadow keras.Model.get_weights()
        self.weights_dense = Dense(49, activation='relu', name="get_weights")

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, inputs):
        ## weights matrix
        weights_res = self.weights_dense(inputs[1])
        weights = tf.reshape(weights_res, (1, 7, 7))
        weights = tf.tile(weights, [256, 1, 1])
        ## img from mobilenet
        img = inputs[0]  # assumed: inputs[0] is the MobileNet feature map
        inter_res = tf.multiply(img, weights)
        inter_res = tf.reshape(inter_res, (-1, 256, 49))
        att = Encoder(self.embed_dim, self.num_heads, self.ff_dim, self.num_layers)(inter_res)
        return att
I am trying to build a network that implements the part circled in the image. The LSTM output has shape (1,256) and the output of the preceding MobileNet has shape (batch,7,7,256). The LSTM output is then transformed into a (7,7) weights matrix.
The problem is that the MobileNet output has a batch dimension. I have no idea how to deal with "batch", or how to set up a parameter to constrain the batch size.
Could someone give me a tip?
Also, if I remove the compute_output_shape() method, a NotImplementedError occurs, even though the official Keras docs say I don't need to override that method.
Could someone explain this?
compute_output_shape is crucial for a custom layer. When summary() is called, the corresponding graph is generated and the input and output shapes of every layer are shown; compute_output_shape is what provides each layer's output shape.
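A minimal sketch of one way around the hard-coded batch, assuming TF 2.x tf.keras; WeightedFeatureMap and to_weights are hypothetical names, and the channel count 8 and vector size 16 are illustrative. Instead of reshaping to (1, 7, 7) and tiling 256 times, it reshapes the Dense output with -1 so the batch axis is kept, plus a trailing axis of size 1 so the multiplication broadcasts over the channels (a vector batch of 1 also broadcasts against a larger feature-map batch). Because a plain reshape from (batch, 7, 7, C) to (batch, C, 49) would interleave pixels and channels, it reshapes to (batch, 49, C) and then transposes to the (batch, C, 49) layout the question's (-1, 256, 49) reshape aims for. compute_output_shape is implemented explicitly.

```python
import numpy as np
import tensorflow as tf


class WeightedFeatureMap(tf.keras.layers.Layer):
    """Hypothetical layer: weights a (batch, 7, 7, C) feature map with a 7x7 map
    computed from a (batch, D) vector, without hard-coding the batch size."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # 49 = 7 * 7 spatial positions, as in the question
        self.to_weights = tf.keras.layers.Dense(49, activation="relu")

    def call(self, inputs):
        feature_map, vector = inputs              # (batch, 7, 7, C), (batch, D)
        w = self.to_weights(vector)               # (batch, 49)
        w = tf.reshape(w, (-1, 7, 7, 1))          # -1 keeps the batch axis
        weighted = feature_map * w                # broadcasts over the C channels
        channels = feature_map.shape[-1]
        weighted = tf.reshape(weighted, (-1, 49, channels))  # (batch, 49, C)
        return tf.transpose(weighted, [0, 2, 1])             # (batch, C, 49)

    def compute_output_shape(self, input_shape):
        feature_shape, _ = input_shape
        return (feature_shape[0], feature_shape[-1], 49)


layer = WeightedFeatureMap()
for batch in (2, 5):
    feature_map = tf.ones((batch, 7, 7, 8))
    vector = tf.constant(np.random.rand(batch, 16).astype("float32"))
    print(layer([feature_map, vector]).shape)  # (2, 8, 49), then (5, 8, 49)
```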

Keras permute_dimension causes weird tensor shapes

I'm running into an issue with a model I'm trying to build. I've been trying to debug it and ran into an oddity that I think may be the cause, but I'm not sure what I'm doing wrong. I've reduced what I think the problem is into a small snippet you can run on colab.
Here's a colab where you can try running this:
import keras
from keras.layers import Layer, Dense, Input, Reshape
import keras.backend as K

class SimplePermute(Layer):
    def __init__(self, **kwargs):
        super(SimplePermute, self).__init__(**kwargs)

    def call(self, inputs, **kwargs):
        return K.permute_dimensions(inputs, [0, 2, 1])

test_i = Input(shape=(10, 256))
print(K.int_shape(test_i), test_i.get_shape())
test = SimplePermute()(test_i)
print(K.int_shape(test), test.get_shape())
test = Dense(units=100, activation="softmax", name="sft2")(test)
I'd expect the second series of prints to print the permuted tensor shape - that is [?, 256, 10]. However, the K.int_shape() returns [?, 10, 256], while TF's get_shape() returns the properly permuted shape.
I believe this internal mismatch is causing the errors I'm seeing downstream in the model.
Your custom layer doesn't have the compute_output_shape method implemented. This is what Keras uses to determine the _keras_shape property of the tensors, which is returned by K.int_shape.
You can use the standard Permute((2,1)) layer.
Or you can use a Lambda(lambda x: K.permute_dimensions(x, [0,2,1])) layer.
Or you can implement the compute_output_shape method:
def compute_output_shape(self, input_shape):
    return (input_shape[0], input_shape[2], input_shape[1])
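A minimal sketch of the compute_output_shape fix next to the built-in Permute, assuming TF 2.x tf.keras. It uses tf.transpose in place of K.permute_dimensions (both perform the same axis permutation; the backend function may not be available in every Keras version) and checks both layers on a concrete (2, 10, 256) tensor.

```python
import tensorflow as tf


class SimplePermute(tf.keras.layers.Layer):
    """Swaps the last two axes: (batch, a, b) -> (batch, b, a)."""

    def call(self, inputs):
        return tf.transpose(inputs, perm=[0, 2, 1])

    def compute_output_shape(self, input_shape):
        # Static output shape reported to Keras
        return (input_shape[0], input_shape[2], input_shape[1])


x = tf.zeros((2, 10, 256))
custom = SimplePermute()(x)
builtin = tf.keras.layers.Permute((2, 1))(x)  # the built-in alternative
print(custom.shape, builtin.shape)            # both (2, 256, 10)
```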

complete Keras Lambda confusion

I was trying to define a Lambda layer in Keras, as follows:
First, a function which computes the wavelet transform of an image and then gloms it together:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.layers import BatchNormalization
from keras.layers import Lambda
from keras import regularizers
from keras import backend as K
import pywt
import numpy as np
from keras.engine.topology import Layer

def mkwtarray(image):
    channels = K.image_data_format()
    if channels == 'channels_first':
        axbase = 1
    else:
        axbase = 0
    print(axbase)
    print(image.shape)
    (a, (b, c, d)) = pywt.dwt2(image, 'db1', axes=(axbase, axbase + 1))
    ab = np.concatenate((a, b), axis=axbase)
    cd = np.concatenate((c, d), axis=axbase)
    abcd = np.concatenate((ab, cd), axis=axbase + 1)
    return abcd

def wtoutshape(input_shape):
    return input_shape

train_data_dir = 'train'
validation_data_dir = 'validation'
nb_train_samples = 21558
nb_validation_samples = 3446
epochs = 30
batch_size = 32
img_width, img_height = 150, 150

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

model = Sequential()
model.add(Lambda(mkwtarray, input_shape=input_shape, output_shape=wtoutshape))
<more random layers>
Much to my amazement, while I was defining the model (that is, evaluating the lines above), it errored out, claiming:
ValueError: Input array has fewer dimensions than the specified axes
Also, the print statements fired, printing the expected values 0 and (?, 150, 150, 3), which means that the function was actually evaluated at definition time, not when the model was run. I am obviously missing something about Keras' Lambda functionality; any enlightenment would be appreciated.
UPDATE: The exact same problem occurs if you define a layer in the "general" way (via a class, with the lambda's body now in the layer's call function), so this is not Lambda-specific.
This looks like a disastrous mix of NumPy and Keras. Let's look at the 2 main confusion points:
Once you are inside a Keras model, for example in a Lambda layer, you are dealing with tensors, not NumPy arrays. Convenient as it would be, you can't use NumPy operations or external libraries inside models. That said, tensor operations are very similar to array operations, for good reason. Since this is your first layer, you can preprocess the data in NumPy and then pass the result into your model; that would work.
Why do the prints work? There are two main steps in Keras/TensorFlow: 1) build the computation graph, 2) actually run it. While the graph is being built, your operations do get called, but they create symbolic tensors that have no values yet. So you can print the shape, which can be determined while building the graph, but not, for example, the values the tensor holds.
Take-away message: don't mix NumPy with TensorFlow inside computation graphs (models). By all means print shapes while building the graph to get an idea of what it looks like, but you won't get anything more out of symbolic tensors at build time.
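To illustrate the take-away, a minimal sketch of a Lambda-friendly replacement for mkwtarray written only with TensorFlow ops, assuming TF 2.x tf.keras and channels_last input with even height and width. haar_glue is a hypothetical name, and the arithmetic is a hand-written one-level Haar-style split of each 2x2 pixel block (an average term a and three difference terms b, c, d); its scaling, signs and sub-band order are assumptions and are not guaranteed to match pywt.dwt2. The concatenation mirrors the original np.concatenate calls, shifted by one axis because inside the model axis 0 is the batch.

```python
import numpy as np
import tensorflow as tf


def haar_glue(x):
    """Hypothetical tensor-only stand-in for mkwtarray (channels_last, even H and W).
    One Haar-style split of every 2x2 block; scaling, signs and sub-band order
    are assumptions and may differ from pywt.dwt2."""
    x00 = x[:, 0::2, 0::2, :]   # top-left pixel of each 2x2 block
    x01 = x[:, 0::2, 1::2, :]   # top-right
    x10 = x[:, 1::2, 0::2, :]   # bottom-left
    x11 = x[:, 1::2, 1::2, :]   # bottom-right
    a = (x00 + x01 + x10 + x11) / 2.0   # average term
    b = (x00 + x01 - x10 - x11) / 2.0   # top minus bottom
    c = (x00 - x01 + x10 - x11) / 2.0   # left minus right
    d = (x00 - x01 - x10 + x11) / 2.0   # diagonal
    # Same gluing as the original, but axis 0 is now the batch:
    ab = tf.concat([a, b], axis=1)       # stack along height
    cd = tf.concat([c, d], axis=1)
    return tf.concat([ab, cd], axis=2)   # then along width


model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Lambda(haar_glue),
])
images = np.random.rand(2, 150, 150, 3).astype("float32")
print(model(images).shape)  # (2, 150, 150, 3)
```

Because the function only uses tensor ops, Keras can call it on symbolic tensors while the model is being defined; only shapes are known at that point, and actual values appear when the model is run.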
Maybe it's a little late, but this week I've been having a similar problem and managed to solve it.
To fix it, I stopped using Lambda layers and created my own layer instead.
You can see how it works in my GitHub or Hugging Face repository.
I hope it at least solves the problem for some future person.
/ Fernando

Changing activation function of a keras layer w/o replacing whole layer

I am trying to change the activation function of the last layer of a Keras model without replacing the whole layer; in this case, only the softmax function.
import keras.backend as K
from keras.models import load_model
from keras.preprocessing.image import load_img, img_to_array
import numpy as np

model = load_model(model_path)  # Load any model
img = load_img(img_path, target_size=(224, 224))
img = img_to_array(img)
img = np.expand_dims(img, axis=0)  # add the batch dimension
model.predict(img)
My output:
array([[1.53172877e-07, 7.13159451e-08, 6.18941920e-09, 8.52070968e-07,
1.25813088e-07, 9.98970985e-01, 1.48254022e-08, 6.09538893e-06,
1.16236095e-07, 3.91888688e-10, 6.29304608e-08, 1.79565995e-09,
1.75571788e-08, 1.02110009e-03, 2.14380114e-09, 9.54465733e-08,
1.05938483e-07, 2.20544337e-07]], dtype=float32)
Then I do this to change the activation:
model.layers[-1].activation = custom_softmax
and the output I got is exactly the same. Any ideas how to fix? Thanks!
You could try to use the custom_softmax below:
def custom_softmax(x, axis=-1):
    """Softmax activation function.
    # Arguments
        x : Tensor.
        axis: Integer, axis along which the softmax normalization is applied.
    # Returns
        Tensor, output of softmax transformation.
    # Raises
        ValueError: In case `dim(x) == 1`.
    """
    ndim = K.ndim(x)
    if ndim >= 2:
        # returns all zeros, which makes it easy to see whether this activation is used
        return K.zeros_like(x)
    raise ValueError('Cannot apply softmax to a tensor that is 1D')
As things currently stand, there's no official, clean way to do that. As pointed out by @layser in the comments, the TensorFlow graph isn't being updated, which is why your output doesn't change. One option is to use keras-vis' utils. My recommendation is to isolate that in your own function, like so:
from vis.utils.utils import apply_modifications
def update_layer_activation(model, activation, index=-1):
    model.layers[index].activation = activation
    return apply_modifications(model)
Which would lead to a similar use:
model = update_layer_activation(model, custom_softmax)
If you follow the given link, you'll see what they do is quite simple: they save the model to a temporary path, then load it back and return, finally deleting the temp file.
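A minimal sketch of a similar idea without keras-vis, assuming TF 2.x tf.keras and a functional or Sequential model. Rather than saving and reloading, it rebuilds the architecture with tf.keras.models.clone_model, swaps the activation in the chosen layer's config, and copies the original weights over. with_new_activation and the layer name "predictions" are hypothetical; passing a custom function such as custom_softmax instead of a string should also work with from_config, but that is an assumption worth checking on your version.

```python
import numpy as np
import tensorflow as tf


def with_new_activation(model, layer_name, activation):
    """Hypothetical helper: rebuild `model` with `activation` on `layer_name`
    by cloning the architecture, then copying the original weights."""
    def clone_fn(layer):
        config = layer.get_config()
        if layer.name == layer_name:
            config["activation"] = activation
        return layer.__class__.from_config(config)

    new_model = tf.keras.models.clone_model(model, clone_function=clone_fn)
    new_model.set_weights(model.get_weights())  # clone_model re-initializes weights
    return new_model


# Tiny stand-in for the loaded model in the question
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(3, activation="softmax", name="predictions")(hidden)
model = tf.keras.Model(inputs, outputs)

x = np.random.rand(5, 4).astype("float32")
probs = model(x).numpy()
logits = with_new_activation(model, "predictions", "linear")(x).numpy()
```

Here the rebuilt model returns logits, and applying softmax to them reproduces the original probabilities.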

Keras retrieve value of node before activation function

Imagine a fully-connected neural network with its last two layers of the following structure:
Dense: units = 612, activation = softplus
Dense: units = 1, activation = sigmoid
The output value of the net is 1, but I'd like to know what the input x to the sigmoidal function was (must be some high number, since sigm(x) is 1 here).
Following indraforyou's answer, I managed to retrieve the outputs and weights of Keras layers:
outputs = [layer.output for layer in model.layers[-2:]]
functors = [K.function([model.input] + [K.learning_phase()], [out]) for out in outputs]
test_input = np.array(...)
layer_outs = [func([test_input, 0.]) for func in functors]
print(layer_outs[-1][0])  # -> array([[ 1.]])
dense_0_out = layer_outs[-2][0]  # shape (612, 1)
dense_1_weights = model.layers[-1].weights[0].get_value()  # shape (1, 612)
dense_1_bias = model.layers[-1].weights[1].get_value()
x = np.dot(dense_0_out, dense_1_weights) + dense_1_bias
print(x)  # -> -11.7
How can x be a negative number? In that case the last layers output should be a number closer to 0.0 than 1.0. Are dense_0_out or dense_1_weights the wrong outputs or weights?
Since you're using get_value(), I'll assume that you're using Theano backend. To get the value of the node before the sigmoid activation, you can traverse the computation graph.
The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the owner field.
In your case, what you want is the input x of the sigmoid activation op. The output of the sigmoid op is model.output. Putting these together, the variable x is model.output.owner.inputs[0].
If you print out this value, you'll see Elemwise{add,no_inplace}.0, which is an element-wise addition op. This can be verified from the source code of the Dense layer's call():
def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias)
    if self.activation is not None:
        output = self.activation(output)
    return output
The input to the activation function is the output of K.bias_add().
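The arithmetic behind this can be checked directly: the value fed to the sigmoid is inputs · kernel + bias. A minimal NumPy sketch, assuming TF 2.x tf.keras and a small stand-in model (the input width of 10 is arbitrary). Note the shapes: the softplus output is (batch, 612) and the kernel is (612, 1), so the hidden activations go on the left of the product.

```python
import numpy as np
import tensorflow as tf

# Small stand-in for the question's last two layers
inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(612, activation="softplus")(inputs)
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
model = tf.keras.Model(inputs, output)

x = np.random.rand(3, 10).astype("float32")
h = tf.keras.Model(inputs, hidden)(x).numpy()    # softplus output, shape (3, 612)

kernel, bias = model.layers[-1].get_weights()    # kernel (612, 1), bias (1,)
pre_activation = h @ kernel + bias               # what Dense.call passes to the sigmoid
reconstructed = 1.0 / (1.0 + np.exp(-pre_activation))
print(np.allclose(reconstructed, model(x).numpy(), atol=1e-5))  # True
```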
With a small modification of your code, you can get the value of the node before activation:
x = model.output.owner.inputs[0]
func = K.function([model.input] + [K.learning_phase()], [x])
print(func([test_input, 0.]))
For anyone using TensorFlow backend: use x = model.output.op.inputs[0] instead.
I can see a simple way that only slightly changes the model structure. (See the end for how to use the existing model and change only its ending.)
The advantages of this method are:
You don't have to guess if you're doing the right calculations
You don't need to care about the dropout layers and how to implement a dropout calculation
This is a pure Keras solution (applies to any backend, either Theano or Tensorflow).
There are two possible solutions below:
Option 1 - Create a new model from start with the proposed structure
Option 2 - Reuse an existing model changing only its ending
Model structure
You could just split the last Dense into two layers at the end:
Dense: units = 612, activation = softplus
Dense: units = 1 (no activation)
Activation: sigmoid
Then you simply get the output of the last dense layer.
I'd say you should create two models, one for training, the other for checking this value.
Option 1 - Building the models from the beginning:
from keras.models import Model
from keras.layers import Dense, Activation

# build the initial part of the model the same way you would

# add the Dense layer without an activation:
# if using the functional Model API
denseOut = Dense(1)(outputFromThePreviousLayer)
sigmoidOut = Activation('sigmoid')(denseOut)

# if using the sequential model - will need the functional API
model.add(Dense(1))
sigmoidOut = Activation('sigmoid')(model.output)
Create two models from that, one for training, one for checking the output of dense:
#if using the functional API
checkingModel = Model(yourInputs, denseOut)
#if using the sequential model:
checkingModel = model
trainingModel = Model(checkingModel.inputs, sigmoidOut)
Use trainingModel for training normally. The two models share weights, so training one trains the other.
Use checkingModel just to see the outputs of the Dense layer, using checkingModel.predict(X)
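A minimal sketch of Option 1 with tf.keras, assuming TF 2.x; the input width of 10 and the variable names are illustrative. It checks that applying a sigmoid to checking_model's output reproduces training_model's output, and that both models hold the very same Dense layer object.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(612, activation="softplus")(inputs)
dense_out = tf.keras.layers.Dense(1)(hidden)                     # no activation
sigmoid_out = tf.keras.layers.Activation("sigmoid")(dense_out)

checking_model = tf.keras.Model(inputs, dense_out)    # values before the sigmoid
training_model = tf.keras.Model(inputs, sigmoid_out)  # compile and fit this one

x = np.random.rand(4, 10).astype("float32")
pre = checking_model.predict(x, verbose=0)
post = training_model.predict(x, verbose=0)
print(np.allclose(1.0 / (1.0 + np.exp(-pre)), post, atol=1e-5))  # True
```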
Option 2 - Building this from an existing model:
from keras.models import Model
from keras.layers import Dense

# find the softplus dense layer and get its output:
softplusOut = oldModel.layers[indexForSoftplusLayer].output
# or should this be the output from the dropout? Whichever comes immediately before the last Dense(1)

# recreate the dense layer
outDense = Dense(1, name='newDense', ...)(softplusOut)

# create the new model
checkingModel = Model(oldModel.inputs, outDense)
It's important, since you created a new Dense layer, to copy the weights from the old one:
wgts = oldModel.layers[indexForDense].get_weights()
checkingModel.get_layer('newDense').set_weights(wgts)
In this case, training the old model will not update the last Dense layer in the new model, so let's create a trainingModel:
outSigmoid = Activation('sigmoid')(checkingModel.output)
trainingModel = Model(checkingModel.inputs,outSigmoid)
Use checkingModel for checking the values you want with checkingModel.predict(X). And train the trainingModel.
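A minimal sketch of Option 2, assuming TF 2.x tf.keras; the small old_model below is a hypothetical stand-in for the existing model, and the index -2 points at its softplus Dense layer. The weights of the original Dense(1) are copied into the recreated layer, so the new training_model reproduces the old model's output.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the existing model: Dense(612, softplus) -> Dense(1, sigmoid)
inputs = tf.keras.Input(shape=(10,))
softplus = tf.keras.layers.Dense(612, activation="softplus")(inputs)
old_model = tf.keras.Model(inputs, tf.keras.layers.Dense(1, activation="sigmoid")(softplus))

# Recreate the last Dense without an activation on top of the softplus output
softplus_out = old_model.layers[-2].output
new_dense = tf.keras.layers.Dense(1, name="newDense")
checking_model = tf.keras.Model(old_model.input, new_dense(softplus_out))
new_dense.set_weights(old_model.layers[-1].get_weights())  # copy kernel and bias

out_sigmoid = tf.keras.layers.Activation("sigmoid")(checking_model.output)
training_model = tf.keras.Model(checking_model.input, out_sigmoid)
```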
This is for fellow googlers: the Keras API has changed significantly since the accepted answer was posted. The working code for extracting a layer's output before activation (TensorFlow backend) is:
model = Your_Keras_Model()
the_tensor_you_need = model.output.op.inputs[0] #<- this is indexable, if there are multiple inputs to this node then you can find it with indexing.
In my case, the final layer was a dense layer with activation softmax, so the tensor output I needed was <tf.Tensor 'predictions/BiasAdd:0' shape=(?, 1000) dtype=float32>.
(TF backend)
Solution for Conv layers.
I had the same question, and to rewrite a model's configuration was not an option.
The simple hack would be to perform the call function manually. It gives control over the activation.
Copy-paste from the Keras source, with self changed to layer. You can do the same with any other layer.
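from keras import backend as K

def conv_no_activation(layer, inputs, activation=False):
    if layer.rank == 1:
        outputs = K.conv1d(inputs, layer.kernel, strides=layer.strides[0], padding=layer.padding,
                           data_format=layer.data_format, dilation_rate=layer.dilation_rate[0])
    if layer.rank == 2:
        outputs = K.conv2d(inputs, layer.kernel, strides=layer.strides, padding=layer.padding,
                           data_format=layer.data_format, dilation_rate=layer.dilation_rate)
    if layer.rank == 3:
        outputs = K.conv3d(inputs, layer.kernel, strides=layer.strides, padding=layer.padding,
                           data_format=layer.data_format, dilation_rate=layer.dilation_rate)
    if layer.use_bias:
        outputs = K.bias_add(outputs, layer.bias, data_format=layer.data_format)
    if activation and layer.activation is not None:
        outputs = layer.activation(outputs)
    return outputs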
Now we need to modify the main function a little. First, identify the layer by its name. Then retrieve activations from the previous layer. And at last, compute the output from the target layer.
def get_output_activation_control(model, images, layername, activation=False):
    """Get activations for the input from specified layer"""
    inp = model.input
    layer_id, layer = [(n, l) for n, l in enumerate(model.layers) if l.name == layername][0]
    prev_layer = model.layers[layer_id - 1]
    conv_out = conv_no_activation(layer, prev_layer.output, activation=activation)
    functor = K.function([inp] + [K.learning_phase()], [conv_out])
    return functor([images, 0])  # 0 = test phase
Here is a tiny test, using the VGG16 model.
a_relu = get_output_activation_control(vgg_model, img, 'block4_conv1', activation=True)[0]
a_no_relu = get_output_activation_control(vgg_model, img, 'block4_conv1', activation=False)[0]
print(np.sum(a_no_relu < 0))
> 245293
Set all negative values to zero to compare with the result of the ReLU built into VGG16:
a_no_relu[a_no_relu < 0] = 0
print(np.allclose(a_relu, a_no_relu))
> True
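A shorter alternative to copy-pasting the layer's call, sketched under the assumption of TF 2.x tf.keras in eager mode: temporarily replace the built layer's activation with the identity, call the layer directly, and restore the activation afterwards. This relies on the layer's call reading self.activation at call time; call_without_activation is a hypothetical helper, and because it mutates the layer (which the model shares), the finally block restores the original activation even if the call fails.

```python
import numpy as np
import tensorflow as tf


def call_without_activation(layer, inputs):
    """Hypothetical helper: run a layer with its activation temporarily
    replaced by the identity, then restore the original activation."""
    original = layer.activation
    layer.activation = tf.keras.activations.linear
    try:
        return layer(inputs)
    finally:
        layer.activation = original


conv = tf.keras.layers.Conv2D(4, 3, activation="relu")
x = tf.random.normal((1, 8, 8, 3))
with_relu = conv(x).numpy()
no_relu = call_without_activation(conv, x).numpy()
```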
An easy way to define a new layer with a new activation function:
from tensorflow import keras

def change_layer_activation(layer):
    if isinstance(layer, keras.layers.Conv2D):
        config = layer.get_config()
        config["activation"] = "linear"
        new = keras.layers.Conv2D.from_config(config)
    elif isinstance(layer, keras.layers.Dense):
        config = layer.get_config()
        config["activation"] = "linear"
        new = keras.layers.Dense.from_config(config)
    else:
        raise ValueError("Only Conv2D and Dense layers are handled")
    weights = [x.numpy() for x in layer.weights]
    return new, weights
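change_layer_activation returns an unbuilt layer plus a weight list, so the new layer still has to be built and given those weights before use. A minimal sketch of those steps, assuming TF 2.x tf.keras; linear_copy is a hypothetical, class-agnostic variant, and the Dense(3) layer and input width of 5 are illustrative.

```python
import numpy as np
import tensorflow as tf


def linear_copy(layer, input_shape):
    """Hypothetical variant: clone `layer` from its config with a linear
    activation, build it for `input_shape`, and copy the trained weights."""
    config = layer.get_config()
    config["activation"] = "linear"
    new = layer.__class__.from_config(config)
    new.build(input_shape)                  # create kernel and bias
    new.set_weights(layer.get_weights())    # reuse the trained values
    return new


dense = tf.keras.layers.Dense(3, activation="sigmoid")
x = np.random.rand(2, 5).astype("float32")
probs = dense(x).numpy()
logits = linear_copy(dense, (None, 5))(x).numpy()
```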
I had the same problem, but none of the other answers worked for me. I'm using a newer version of Keras with TensorFlow, so some answers don't work anymore. Also, the structure of the model is given, so I can't change it easily. The general idea is to create a copy of the original model that works exactly like the original but splits the activation off the output layers. Once this is done, we can easily access the output values before the activation is applied.
First, we create a copy of the original model, but with no activation on the output layers. This is done with Keras' clone_model function (see its documentation).
from tensorflow.keras.models import clone_model
from tensorflow.keras.layers import Activation

original_model = get_model()

def f(layer):
    config = layer.get_config()
    if not isinstance(layer, Activation) and layer.name in original_model.output_names:
        config.pop('activation', None)
    layer_copy = layer.__class__.from_config(config)
    return layer_copy

copy_model = clone_model(original_model, clone_function=f)
This alone only creates a clone with freshly initialized weights, so we must copy the original_model weights into the new one:
copy_model.set_weights(original_model.get_weights())
Now we will add the activations layers:
from tensorflow.keras.models import Model
old_outputs = [ original_model.get_layer(name=name) for name in copy_model.output_names ]
new_outputs = [ Activation(old_output.activation)(output) if old_output.activation else output
for output, old_output in zip(copy_model.outputs, old_outputs) ]
copy_model = Model(copy_model.inputs, new_outputs)
Finally we could create a new model whose evaluation will be the outputs with no activation applied:
no_activation_outputs = [ copy_model.get_layer(name=name).output for name in original_model.output_names ]
no_activation_model = Model(copy_model.inputs, no_activation_outputs)
Now we could use copy_model like the original_model and no_activation_model to access pre-activation outputs. Actually you could even modify the code to split a custom set of layers instead of the outputs.
