Getting gradient of model output w.r.t weights using Keras - python

I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.

To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables with respect to which the gradient is taken.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
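To actually evaluate the gradients for a given input, we need to use a bit of TensorFlow.
import numpy as np
import tensorflow as tf
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())  # initialize the variables in the new session
evaluated_gradients = sess.run(gradients, feed_dict={model.input:trainingExample})
And that's it!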
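As a sanity check on what "gradient of the output with respect to the weights" means, here is a minimal NumPy sketch (not Keras code) for a single sigmoid unit, where that gradient has a closed form. The names dense_sigmoid and output_gradient are made up for this illustration, and the analytic result is compared against finite differences.

```python
import numpy as np

def dense_sigmoid(x, w, b):
    """Output of a single sigmoid unit: sigmoid(x . w + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def output_gradient(x, w, b):
    """Analytic gradient of the output (not a loss) w.r.t. w and b."""
    y = dense_sigmoid(x, w, b)
    dy_dz = y * (1.0 - y)          # derivative of the sigmoid at the pre-activation
    return dy_dz * x, dy_dz        # d(output)/dw, d(output)/db

rng = np.random.default_rng(0)
x = rng.random(8)                  # one example with 8 features
w = rng.normal(size=8)
b = 0.1

grad_w, grad_b = output_gradient(x, w, b)

# Central finite differences as an independent check of grad_w
eps = 1e-6
basis = np.eye(8)
numeric = np.array([
    (dense_sigmoid(x, w + eps * basis[i], b)
     - dense_sigmoid(x, w - eps * basis[i], b)) / (2 * eps)
    for i in range(8)
])
print(np.allclose(grad_w, numeric, atol=1e-6))
```

The same kind of finite-difference comparison can be used to spot-check gradients evaluated from a real model, one weight at a time.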

The answer below uses the binary cross-entropy loss; feel free to swap in your own loss function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(labels, outputTensor)  # Keras losses take (y_true, y_pred); labels holds the targets
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
evaluated_gradients = sess.run(gradients, feed_dict={model.input:training_data1})


How to modify cost functions by weight gradient variance in keras?

I am writing a neural network in Keras. I want to modify the loss function so that an array of per-parameter values (with the same shape as the gradient array) can be used as an additional term in the cost function.
To be precise, I'd like to use the variance of the gradients from past training. Parameters with a high gradient variance (let's call it h) are assumed to be the parameters that hold the learned features.
When training on new features, I would like the cost function to favor changing parameters whose h value is as small as possible. For this I have to modify the cost function for each parameter like this:
Loss (parameter) = Standard_loss (y, y_pred) + h * (parameter - old parameter) ** 2
I would very much like to ask for an answer.
Here is an excerpt from my code:
from keras import models, layers
from keras.datasets import mnist
import tensorflow as tf
import matplotlib.pyplot as plt
from keras import backend as K
# Import and load the CIFAR-10 dataset
from tensorflow.keras.datasets import cifar10
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
from keras.utils.np_utils import to_categorical
train_y = to_categorical(train_y, num_classes=10, dtype='float32')
test_y = to_categorical(test_y, num_classes=10, dtype='float32')
train_X = K.cast(train_X, dtype='float32')
test_X = K.cast(test_X, dtype='float32')
def get_model():
    model = models.Sequential()
    model.add(layers.Conv2D(1, 5, (1,1), input_shape=(32,32,3,), padding='same'))
    model.add(layers.Conv2D(4, 5, (2,2), padding='same'))
    model.add(layers.Dense(128, activation='sigmoid'))
    model.add(layers.Dense(10, activation='linear'))
    return model
model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_X, train_y, epochs=50, validation_split=0.2)
weights = model.get_weights()
Unfortunately, I don't know how to take the gradient from the weights :/
I want to get a table of gradients, one per parameter, for a single training example. I do not mean the total gradient of the cost function, as mentioned elsewhere on the internet.
From what I can see, the cost function is modifiable, but it only takes y_pred and y_true. How could I input something that corresponds to the weights (but it is not a weight)?
Thanks in advance!
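The penalized loss described in the question can be sketched in plain NumPy, outside of Keras. This is only an illustration under stated assumptions: h is taken to be the variance of each parameter's gradient across past examples, the penalty is summed over all parameters, and the names gradient_variance and penalized_loss as well as the toy numbers are invented here.

```python
import numpy as np

def gradient_variance(per_example_grads):
    """h: variance of each parameter's gradient across past examples.

    per_example_grads has shape (n_examples, n_params).
    """
    return np.var(per_example_grads, axis=0)

def penalized_loss(standard_loss, params, old_params, h):
    """Standard loss plus h * (parameter - old parameter) ** 2, summed over parameters."""
    return standard_loss + np.sum(h * (params - old_params) ** 2)

# Toy numbers, invented for illustration
past_grads = np.array([[0.0, 1.0],
                       [0.0, 3.0]])   # parameter 0 had constant gradient, parameter 1 varied
h = gradient_variance(past_grads)
old_params = np.array([1.0, 1.0])
new_params = np.array([2.0, 2.0])

total = penalized_loss(0.5, new_params, old_params, h)
print(h, total)
```

Because a Keras loss function only receives y_true and y_pred, a term like this would probably have to be added outside the loss function, for example in a custom training loop built on tf.GradientTape that has access to the model's weights.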

Prediction Interval for Neural Net in Python

I'm currently using keras to create a neural net in python. I have a basic model and the code looks like this:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation="relu"))
model.compile(loss='mean_squared_error', optimizer='adam')
It works well and gives me good predictions for my use case. However, I would like to be able to use a Variational Gaussian Process layer to give me an estimate for the prediction interval as well. I'm new to this type of layer and am struggling a bit to implement it. The tensorflow documentation on it can be found here:
However, I'm not seeing that same layer in the keras library. For further reference, I'm trying to do something similar to what was done in this article:
There seems to be a bit more complexity when you have 23 inputs vs one that I'm not understanding. I'm also open to other methods to achieving the target objective. Any examples on how to do this or insights on other approaches would be greatly appreciated!
tensorflow_probability is a separate library but suitable to use with Keras and TensorFlow. You can add those custom layers in your code and change it to a probabilistic model. If your goal is just to get a prediction interval it would be simpler to use the DistributionLambda layer. So your code would be as follows:
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import make_regression
import tensorflow_probability as tfp
import tensorflow as tf
tfd = tfp.distributions
# Sample data
X, y = make_regression(n_samples=100, n_features=23, noise=4.0, bias=15)
# Loss function: negative log-likelihood
negloglik = lambda y, p_y: -p_y.log_prob(y)
# Model
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
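model.add(Dense(2))  # assumed: two units, one for the mean (loc) and one for the scale
model.add(tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))))
model.compile(loss=negloglik, optimizer='adam')
model.fit(X, y, epochs=250, verbose=None)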
After training your model, you can get your prediction distribution with the following lines:
yhat = model(X) # make predictions
means = yhat.mean() # prediction means
stds = yhat.stddev() # prediction standard deviation
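Once the means and standard deviations are available, an approximate prediction interval follows from the normal assumption. The NumPy sketch below assumes a Gaussian predictive distribution and a 95% level (z = 1.96); prediction_interval and gaussian_nll are illustrative names, and the arrays stand in for yhat.mean() and yhat.stddev().

```python
import numpy as np

def gaussian_nll(y, mean, std):
    """Negative log-likelihood of y under Normal(mean, std), per element."""
    return 0.5 * np.log(2.0 * np.pi * std ** 2) + (y - mean) ** 2 / (2.0 * std ** 2)

def prediction_interval(mean, std, z=1.96):
    """Approximate 95% interval for a normal predictive distribution (z = 1.96)."""
    return mean - z * std, mean + z * std

means = np.array([10.0, 20.0])   # stand-ins for yhat.mean()
stds = np.array([1.0, 2.0])      # stand-ins for yhat.stddev()

lower, upper = prediction_interval(means, stds)
print(lower, upper)
print(gaussian_nll(np.array([10.0, 20.0]), means, stds))
```

gaussian_nll is the quantity that negloglik minimizes for a Normal distribution, which is why the model learns a scale as well as a mean.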

Keras custom loss function using outputs of each neuron

I am trying to write a custom loss function using the outputs of each neuron in the last layer, and the function may be nonlinear. Here is what I am working on:
## some previous layers##
## my last dense layer##
dense1 = Dense(4, activation="relu", name="dense_layer1")(previous layer)
dense11 = Dense(1, activation = "sigmoid", name = "dense11")(dense1)
dense12 = Dense(1, activation = "sigmoid", name = "dense12")(dense1)
dense13 = Dense(1, activation = "sigmoid", name = "dense13")(dense1)
dense14 = Dense(1, activation = "sigmoid", name = "dense14")(dense1)
## custom loss function ##
def custom_layer(tensor):
    return tensor[1]*2+tensor[2]+tensor[3]/(tensor[4]*2) #some nonlinear function like this
lambda_layer = Lambda(custom_layer, name="lambda_layer")([dense11,dense12,dense13,dense14])
model = Model(inputs=Input, outputs=lambda_layer) # "Input" are in previous layers, not shown here
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, Y_train, epochs=2, batch_size=512, verbose=1)  # X_train (assumed name) holds the training inputs
My Y_train is n*1 (n is the sample size).
So I am basically applying a nonlinear transformation to the outputs of those final four neurons, which is equivalent to constructing a new loss function. After the transformation, the y hat should also be an n*1 vector.
But the code keeps failing. I think it is due to the lambda_layer or the custom_layer function. I also tried to define a new loss function (then there would be no "lambda_layer"), but it didn't work either. I have no idea what's wrong with it. (headache!)
Any ideas or suggestions are appreciated!! Thanks a lot! (I'm using Python3.7 with Tensorflow version 2.0.0)
Solved, thanks!
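The thread does not say what the fix was. One possible cause, assuming the Lambda layer receives a list of exactly four tensors, is that Python lists are 0-indexed, so tensor[4] does not exist and the valid indices are 0 to 3. The NumPy stand-in below only illustrates that indexing; combine is a hypothetical name and the arrays are invented in place of the four (n, 1) layer outputs.

```python
import numpy as np

def combine(outputs):
    """Nonlinear combination of four (n, 1) arrays, using 0-based positions."""
    a, b, c, d = outputs           # positions 0, 1, 2, 3
    return a * 2 + b + c / (d * 2)

n = 3
outputs = [np.full((n, 1), v) for v in (0.5, 0.25, 0.5, 0.5)]
y_hat = combine(outputs)
print(y_hat.shape)
```

The result keeps the (n, 1) shape, which matches an n*1 target such as Y_train.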

How to apply sigmoid function for each outputs in Keras?

This is part of my code.
model = Sequential()
model.add(Dense(3, input_shape=(4,), activation='softmax'))
With this code, softmax is applied across all the outputs at once, so the outputs form a probability distribution over all classes. However, I am working on a non-exclusive classifier, which means I want each output to have an independent probability.
Sorry my English is bad...
What I want to do is apply a sigmoid function to each output so that they have independent probabilities.
There is no need to create 3 separate outputs like suggested by the accepted answer.
The same result can be achieved with just one line:
model.add(Dense(3, input_shape=(4,), activation='sigmoid'))
You can just use 'sigmoid' activation for the last layer:
from tensorflow.keras.layers import GRU
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
import numpy as np
from tensorflow.keras.optimizers import Adam
model = Sequential()
model.add(Dense(3, input_shape=(4,), activation='sigmoid'))
pred = model.predict(np.random.rand(5, 4))
Output of independent probabilities:
[[0.58463055 0.53531045 0.51800555]
[0.56402034 0.51676977 0.506389 ]
[0.665879 0.58982867 0.5555959 ]
[0.66690147 0.57951677 0.5439698 ]
[0.56204814 0.54893976 0.5488999 ]]
As you can see, the class probabilities are independent of each other. The sigmoid is applied to every class separately.
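The difference between the two activations can be checked numerically without Keras. The NumPy sketch below uses made-up logits: softmax rows always sum to 1, while each sigmoid output depends only on its own logit.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([[2.0, 2.0, 2.0],
                   [0.0, 1.0, -1.0]])

print(softmax(logits).sum(axis=1))   # rows of a softmax sum to 1
print(sigmoid(logits).sum(axis=1))   # sigmoid rows are not constrained to sum to 1

# Changing one logit leaves the other sigmoid outputs untouched
changed = logits.copy()
changed[0, 0] = -5.0
print(np.allclose(sigmoid(changed)[0, 1:], sigmoid(logits)[0, 1:]))
```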
You can try using Functional API to create a model with n outputs where each output is activated with sigmoid.
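You can do it like this:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
inputs = Input(shape=(4, ))  # `in` is a reserved keyword in Python, so use another name
dense_1 = Dense(units=4, activation='relu')(inputs)
out_1 = Dense(units=1, activation='sigmoid')(dense_1)
out_2 = Dense(units=1, activation='sigmoid')(dense_1)
out_3 = Dense(units=1, activation='sigmoid')(dense_1)
model = Model(inputs=[inputs], outputs=[out_1, out_2, out_3])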

Grid Search the number of hidden layers with keras

I am trying to optimize the hyperparameters of my NN using Keras and sklearn.
I am wrapping the model with KerasClassifier (it's a classification problem).
I am trying to optimize the number of hidden layers.
I can't figure out how to do it with Keras (actually I am wondering how to set up the function create_model in order to optimize the number of hidden layers).
Could anyone please help me?
My code (just the important part):
## Import `Sequential` from `keras.models`
from keras.models import Sequential
# Import `Dense` from `keras.layers`
from keras.layers import Dense
def create_model(optimizer='adam', activation = 'sigmoid'):
    # Initialize the constructor
    model = Sequential()
    # Add an input layer
    model.add(Dense(5, activation=activation, input_shape=(5,)))
    # Add one hidden layer
    model.add(Dense(8, activation=activation))
    # Add an output layer
    model.add(Dense(1, activation=activation))
    #compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=
    return model
my_classifier = KerasClassifier(build_fn=create_model, verbose=0)
# Create hyperparameter space
epochs = [5, 10]
batches = [5, 10, 100]
optimizers = ['rmsprop', 'adam']
activation1 = ['relu','sigmoid']
# Create hyperparameter options (must be defined before the search that uses them)
hyperparameters = dict(optimizer=optimizers, epochs=epochs,
                       batch_size=batches, activation=activation1)
# Create grid search
grid = RandomizedSearchCV(estimator=my_classifier,
                          param_distributions=hyperparameters) # insert param_distributions
# Fit grid search
grid_result = grid.fit(X_train, y_train)  # X_train (assumed name) holds the training inputs
# View hyperparameters of best neural network
If you want to make the number of hidden layers a hyperparameter, you have to add it as a parameter to your KerasClassifier build_fn, like this:
def create_model(optimizer='adam', activation = 'sigmoid', hidden_layers=1):
    # Initialize the constructor
    model = Sequential()
    # Add an input layer
    model.add(Dense(5, activation=activation, input_shape=(5,)))
    for i in range(hidden_layers):
        # Add one hidden layer
        model.add(Dense(8, activation=activation))
    # Add an output layer
    model.add(Dense(1, activation=activation))
    #compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=
    return model
Then you will be able to optimize the number of hidden layers by adding it to the dictionary, which is passed to RandomizedSearchCV's param_distributions.
One more thing, you probably should separate the activation you use for the output layer from the other layers.
Different classes of activation functions are suitable for hidden layers and for output layers used in binary classification.
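To make both suggestions concrete, here is a small pure-Python sketch that does not need Keras. layer_plan is a hypothetical helper that lists the (units, activation) pairs create_model would add for a given depth, with the output activation kept separate from the hidden one. The candidate depths [1, 2, 3] and the fixed 'sigmoid' output are assumptions made for illustration.

```python
from itertools import product

def layer_plan(hidden_layers=1, hidden_activation='relu', output_activation='sigmoid'):
    """Return (units, activation) pairs in the order the layers would be added."""
    plan = [(5, hidden_activation)]                    # input layer
    plan += [(8, hidden_activation)] * hidden_layers   # repeated hidden layers
    plan.append((1, output_activation))                # output layer for binary classification
    return plan

# Search space with the depth added as one more key (candidate values are assumed)
hyperparameters = dict(optimizer=['rmsprop', 'adam'],
                       epochs=[5, 10],
                       batch_size=[5, 10, 100],
                       activation=['relu', 'sigmoid'],
                       hidden_layers=[1, 2, 3])

n_combinations = len(list(product(*hyperparameters.values())))
print(layer_plan(hidden_layers=2))
print(n_combinations)
```

RandomizedSearchCV samples only n_iter of these combinations (10 by default), so adding a key to the dictionary enlarges the space being sampled without necessarily lengthening the search.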
