I am writing a neural network in Keras. I want to modify the loss function so that I can use an array of per-parameter values (shaped like the gradient array) as an additional tool to modify the cost function.
To be precise, I'd like to use the variance of the gradients from past training. Parameters with a high gradient variance (let's call it h) are assumed to be the parameters that hold the learned features.
When training new features, I would like the cost function to prefer changing parameters whose h value is as small as possible. For this, I have to modify the cost function for each parameter like this:
Loss (parameter) = Standard_loss (y, y_pred) + h * (parameter - old parameter) ** 2
I would really appreciate an answer.
Here is an excerpt from my code:
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras import models, layers
from tensorflow.keras.utils import to_categorical
# I import the CIFAR-10 dataset
from tensorflow.keras.datasets import cifar10
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
train_y = to_categorical(train_y, num_classes=10, dtype='float32')
test_y = to_categorical(test_y, num_classes=10, dtype='float32')
train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
def get_model():
    model = models.Sequential()
    model.add(layers.Conv2D(1, 5, (1,1), input_shape=(32,32,3,), padding='same'))
    model.add(layers.MaxPooling2D())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(4, 5, (2,2), padding='same'))
    model.add(layers.MaxPooling2D())
    model.add(layers.ReLU())
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='sigmoid'))
    model.add(layers.Dense(10, activation='linear'))
    model.add(layers.Softmax())
    model.summary()  # summary() prints on its own and returns None
    return model
model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_X, train_y, epochs=50, validation_split=0.2)
weights = model.get_weights()
Unfortunately, I don't know how to get the gradients with respect to the weights :/
I want to get a gradient array for each parameter for a single training example. I do not mean the total gradient of the cost function, as described elsewhere on the internet.
From what I can see, the loss function can be customized, but it only receives y_true and y_pred. How can I pass in something that corresponds to the weights (but is not a weight itself)?
Thanks in advance!
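One possible approach, sketched under assumptions: because a Keras loss only receives (y_true, y_pred), the extra term can instead be added in a custom training step written with tf.GradientTape. The sketch below (1) computes per-example gradients by looping over single examples, (2) takes h as their variance across one batch (the question accumulates it over past training, which is your choice), and (3) adds the sum of h * (parameter - old parameter) ** 2 to the standard loss. The tiny model, the random data, and the names per_example_gradients, penalty and train_step are hypothetical stand-ins for your CIFAR-10 setup.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
# Hypothetical stand-in model and data; substitute your CIFAR-10 model and arrays.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(1e-3)
x = np.random.rand(16, 4).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, 16), num_classes=3)

# 1) Per-example gradients: one list of arrays (one per weight tensor) per example.
def per_example_gradients(model, x, y):
    result = []
    for i in range(len(x)):
        with tf.GradientTape() as tape:
            # Slicing with i:i + 1 keeps the batch dimension.
            loss = loss_fn(y[i:i + 1], model(x[i:i + 1], training=False))
        grads = tape.gradient(loss, model.trainable_variables)
        result.append([g.numpy() for g in grads])
    return result

grads = per_example_gradients(model, x, y)
# h[j] has the same shape as model.trainable_variables[j]: the variance of
# each parameter's gradient across the examples.
h = [np.var(np.stack([g[j] for g in grads]), axis=0).astype("float32")
     for j in range(len(model.trainable_variables))]

# 2) Snapshot of the "old" parameters, taken before training on new data.
old_weights = [tf.constant(v.numpy()) for v in model.trainable_variables]

def penalty(model, h, old_weights):
    # Sum over all parameters of h * (parameter - old parameter) ** 2.
    return tf.add_n([
        tf.reduce_sum(hj * tf.square(w - w_old))
        for w, hj, w_old in zip(model.trainable_variables, h, old_weights)
    ])

# 3) Custom training step: the penalty is added outside the (y_true, y_pred) loss.
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = loss_fn(y, y_pred) + penalty(model, h, old_weights)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

loss = train_step(x, y)
```

The Python loop over examples is simple but slow for a dataset the size of CIFAR-10; tf.vectorized_map could be used to batch the per-example computation. The penalty starts at zero because the weights equal the snapshot, and grows as training moves high-h parameters away from their old values.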
Related
I'm currently using keras to create a neural net in python. I have a basic model and the code looks like this:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation="relu"))
model.compile(loss='mean_squared_error', optimizer='adam')
It works well and gives me good predictions for my use case. However, I would like to be able to use a Variational Gaussian Process layer to give me an estimate for the prediction interval as well. I'm new to this type of layer and am struggling a bit to implement it. The tensorflow documentation on it can be found here:
https://www.tensorflow.org/probability/api_docs/python/tfp/layers/VariationalGaussianProcess
However, I'm not seeing that same layer in the keras library. For further reference, I'm trying to do something similar to what was done in this article:
https://blog.tensorflow.org/2019/03/regression-with-probabilistic-layers-in.html
There seems to be some additional complexity with 23 inputs instead of one that I'm not understanding. I'm also open to other methods of achieving the objective. Any examples of how to do this, or insights on other approaches, would be greatly appreciated!
tensorflow_probability is a separate library, but it is designed to be used with Keras and TensorFlow. You can add its custom layers to your code to turn it into a probabilistic model. If your goal is just to get a prediction interval, it is simpler to use the DistributionLambda layer. Your code would then be as follows:
import tensorflow as tf
import tensorflow_probability as tfp
# TFP layers are tf.keras layers, so build the model with tensorflow.keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from sklearn.datasets import make_regression
tfd = tfp.distributions
# Sample data; y is reshaped to (n, 1) to match the predicted distribution's shape
X, y = make_regression(n_samples=100, n_features=23, noise=4.0, bias=15)
X = X.astype('float32')
y = y.reshape(-1, 1).astype('float32')
# Loss function: negative log-likelihood of y under the predicted distribution
negloglik = lambda y, p_y: -p_y.log_prob(y)
# Model
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
# Two outputs per sample: one for the mean, one for the (unconstrained) scale
model.add(Dense(2))
model.add(tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))))
model.compile(loss=negloglik, optimizer='adam')
model.fit(X, y, epochs=250, verbose=0)
After training your model, you can get your prediction distribution with the following lines:
yhat = model(X) # make predictions
means = yhat.mean() # prediction means
stds = yhat.stddev() # prediction standard deviation
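Since the DistributionLambda layer above predicts a Normal distribution, a central prediction interval can be read off the predicted mean and standard deviation. A minimal sketch assuming roughly 95% coverage (z = 1.96); the helper name prediction_interval and the input arrays are hypothetical stand-ins for means.numpy() and stds.numpy().

```python
import numpy as np

def prediction_interval(means, stds, z=1.96):
    """Central interval of Normal(mean, std); z=1.96 gives about 95% coverage."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    return means - z * stds, means + z * stds

# Hypothetical values standing in for means.numpy() and stds.numpy().
lower, upper = prediction_interval([[10.0], [20.0]], [[1.0], [2.0]])
```

This interval only reflects the noise captured by the predicted Normal distribution, not uncertainty in the network weights themselves; the linked blog post handles that with other probabilistic layers.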
I have created a simple machine learning model to predict the product of two given numbers. I followed a YouTube tutorial to learn the basics and tried to work on this simple idea.
My model has three dense layers: input, hidden, and output. The input and hidden layers both used the 'relu' activation, which gave me a NaN loss during model.fit, so I changed one of them to sigmoid, which started giving me a loss of something like 0.00000+e...
I don't know what is wrong. Can anyone point out what I am doing or assuming wrong?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
x = np.array(df['X'])
y = np.array(df['Y'])
s = np.array(df['S'])
def build_model():
    model = keras.Sequential()
    inputLayer = layers.Dense(64, activation='sigmoid', input_shape=[2])
    hiddenLayer = layers.Dense(64, activation='relu')
    outputLayer = layers.Dense(1)
    model.add(inputLayer)
    model.add(hiddenLayer)
    model.add(outputLayer)
    model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['accuracy'])
    return model
model = build_model()
print(model.summary())
EPOCHS = 1000
# I didn't know how to provide multiple inputs to my model for
# training so I checked stackoverflow here
# https://stackoverflow.com/questions/55233377/keras-sequential-model-with-multiple-inputs?noredirect=1&lq=1
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=EPOCHS, validation_split = 0.2, verbose=2)
print(history)
print(model.predict([[2,3],]))
Disclaimer: I am a beginner, and this is the first time in my life I've used Keras and Python.
It does work for smaller numbers with ReLU activation.
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
s = x*y
def build_model():
    model = keras.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=[2]))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss='mean_squared_error')
    return model
model = build_model()
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=250,
validation_split=0.2)
test_input = [2, 3]
print('\n{} x {} ='.format(*test_input),
np.round(model.predict([test_input])[0][0]).astype(int))
Output:
2 x 3 = 6
SGD also works, but it requires standardizing/normalizing the inputs, which kind of defeats the purpose of your task, so I switched to Adam above. Here is the SGD version, which also works:
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
x = np.random.randint(0, 10, 1000)
y = np.random.randint(0, 10, 1000)
s = x*y
x = x/10
y = y/10
def build_model():
    model = keras.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=[2]))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer=keras.optimizers.SGD(0.001), loss='mean_squared_error')
    return model
model = build_model()
merged_array = np.stack([x, y], axis=1)
history = model.fit(merged_array, s, epochs=250,
validation_split=0.2, batch_size=16)
test_input = [2/10, 3/10]
print('\n{} x {} ='.format(*map(lambda l: int(l*10), test_input)),
np.round(model.predict([test_input])[0][0]).astype(int))
I noticed a couple of issues with your model:
Your input layer is not really an input layer. You do not need a designated input layer in this case: the argument input_shape=[2] is sufficient for Keras to add a proper input layer before this layer.
You do not set a batch size in the fit function, so Keras falls back to its default of 32. Batches are small subsets of your training set (commonly powers of two like 4, 8, 16, 32, ...). During training, the weights are not adjusted (aka "learning") after every single sample; instead, the gradients are averaged over a batch and backpropagated once per batch, which makes training faster. Since each of your inputs is just two floating-point numbers (I assume), you can choose a really large batch size such as 1024 or higher. The batch size is one of the so-called hyperparameters, which affect your overall training success.
history = model.fit(merged_array, s, batch_size=1024, epochs=EPOCHS, validation_split=0.2, verbose=2)
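To make the effect of the batch size concrete, here is the arithmetic for the number of weight updates per epoch. It is a sketch assuming about 800 training samples (for example, 1000 samples with validation_split=0.2); updates_per_epoch is a hypothetical helper name.

```python
import math

def updates_per_epoch(n_train, batch_size):
    """Number of weight updates (gradient steps) model.fit makes per epoch."""
    return math.ceil(n_train / batch_size)

n_train = 800  # e.g. 1000 samples with validation_split=0.2
for bs in (16, 32, 1024):
    print(bs, updates_per_epoch(n_train, bs))
```

With batch_size=1024 and only 800 training samples, each epoch is a single update over the whole training set, so you may need more epochs (or a higher learning rate) than with small batches.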
During training you track the "accuracy" metric. Since you are working on a regression problem, it does not help you estimate your model's performance (accuracy is meant for classification problems), so you can leave it out.
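A sketch of the question's build_model with the input-layer and metric points applied, keeping the question's layer sizes and 'sgd' optimizer. An explicit keras.Input(shape=(2,)) plays the same role as input_shape=[2] on the first Dense layer, and "mae" (mean absolute error) is used here as an example of a regression metric in place of accuracy.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(2,)),              # explicit input: two numbers
        layers.Dense(64, activation="relu"),  # first hidden layer
        layers.Dense(64, activation="relu"),  # second hidden layer
        layers.Dense(1),                      # linear output for regression
    ])
    # Mean absolute error is a regression metric; accuracy is for classification.
    model.compile(optimizer="sgd", loss="mean_squared_error", metrics=["mae"])
    return model

model = build_model()
```

The Input object is not counted in model.layers, so the model has three Dense layers with (2*64 + 64) + (64*64 + 64) + (64 + 1) = 4417 parameters.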
I cannot give you more specific advice without knowing more about the data you are using: how many data points you have and what kind of numbers you want to multiply (bounded between 0 and 10, floats or integers, ...).
Hope this helps so far (;
I am trying to get the derivative of the output of a Keras model with respect to the model's input x (not the weights). It seems the easiest way is to use gradients from keras.backend, which returns a tensor of gradients (https://keras.io/backend/). I am new to TensorFlow and not comfortable with it yet. I have the gradient tensor and am trying to get its numerical values for different input values x. But the gradient value seems to be independent of x (which I don't expect), or I am doing something wrong. Any help or comment will be appreciated.
import keras
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Dense, Dropout, Activation
from keras.models import Sequential
import keras.backend as K
import tensorflow as tf
%matplotlib inline
n = 100 # sample size
x = np.linspace(0,1,n) #input
y = 4*(x-0.5)**2 #output
dy = 8*(x-0.5) #derivative of output wrt the input
model = Sequential()
model.add(Dense(32, input_dim=1, activation='relu')) # 1d input
model.add(Dense(32, activation='relu'))
model.add(Dense(1)) # 1d output
# Minimize mse
model.compile(loss='mse', optimizer='adam', metrics=["accuracy"])
model.fit(x, y, batch_size=10, epochs=1000, verbose=0)
gradients = K.gradients(model.output, model.input) #Gradient of output wrt the input of the model (Tensor)
print(gradients)
#value of gradient for the first x_test
x_test_1 = np.array([[0.2]])
sess = tf.Session()
sess.run(tf.global_variables_initializer())
evaluated_gradients_1 = sess.run(gradients[0], feed_dict={model.input:
x_test_1})
print(evaluated_gradients_1)
#value of gradient for the second x_test
x_test_2 = np.array([[0.6]])
evaluated_gradients_2 = sess.run(gradients[0], feed_dict={model.input: x_test_2})
print(evaluated_gradients_2)
Output of my code:
[<tf.Tensor 'gradients_1/dense_7/MatMul_grad/MatMul:0' shape=(?, 1) dtype=float32>]
[[-0.21614937]]
[[-0.21614937]]
evaluated_gradients_1 and evaluated_gradients_2 differ between runs, but within a run they are always equal! I expected them to differ within the same run, because they are evaluated at different input values x.
The output of the network seems to be correct. Here's a plot of the network output vs. the true value:
[Figure: Output of the network vs. true value]
Here's the answer:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
should be replaced by:
sess = K.get_session()
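The former creates a new TensorFlow session and re-initializes all variables, so the gradients are computed with freshly initialized random weights instead of the trained ones; that's why the values change from run to run. They are also identical for x = 0.2 and x = 0.6 because Keras initializes biases to zero by default, and a ReLU network with zero biases is exactly linear for x > 0, so its gradient is the same constant for every positive input. The latter retrieves the session Keras used internally, which holds the trained weight values.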
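For TensorFlow 2, where eager execution is the default and sessions are gone, the same input derivative can be taken with tf.GradientTape. This is a sketch that rebuilds the question's architecture; the helper name input_gradient is hypothetical and the epoch count is reduced for speed. Because tape.gradient differentiates the sum of the batch outputs and each output depends only on its own input row, row i of the result is dy_i/dx_i.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
# Same architecture as in the question (Keras initializes biases to zero by default).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

def input_gradient(model, x):
    """d(model output)/dx, evaluated with the model's current weights."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a plain tensor, not a variable, so watch it explicitly
        y = model(x)
    return tape.gradient(y, x).numpy()

x_test = np.array([[0.2], [0.6]], dtype="float32")
untrained = input_gradient(model, x_test)  # both rows equal: untrained net is linear for x > 0

# Train on the question's data, then evaluate again with the trained weights.
x = np.linspace(0, 1, 100, dtype="float32").reshape(-1, 1)
model.compile(loss="mse", optimizer="adam")
model.fit(x, 4 * (x - 0.5) ** 2, batch_size=10, epochs=100, verbose=0)
trained = input_gradient(model, x_test)
```

The untrained values reproduce the symptom from the question; after training, the two rows should move toward the true derivatives 8 * (x - 0.5), although how close they get depends on the training run.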
I need help with calculating derivatives for model output wrt inputs in Keras.
I want to add a regularization functional to the loss function. The regularizer contains the derivative of the classifier function, so I tried to take the derivative of the model output. The model is an MLP with one hidden layer, and the dataset is MNIST. When I compile the model and take the derivative, I get [None] as the result instead of the derivative.
I have seen a similar post, but it didn't get an answer either:
Taking derivative of Keras model wrt to inputs is returning all zeros
Here is my code. Please help me to solve the problem.
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K
num_hiddenNodes = 1024
num_classes = 10
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28 * 28)
X_train = X_train.astype('float32')
X_train /= 255
y_train = keras.utils.to_categorical(y_train, num_classes)
model = Sequential()
model.add(Dense(num_hiddenNodes, activation='softplus', input_shape=(784,)))
model.add(Dense(num_classes, activation='softmax'))
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
logits = model.output
# logits = model.layers[-1].output
print(logits)
X = K.identity(X_train)
# X = tf.placeholder(dtype=tf.float32, shape=(None, 784))
print(X)
print(K.gradients(logits, X))
Here is the output for the code. The two parameters are Tensors. The gradients function returns None.
Tensor("dense_2/Softmax:0", shape=(?, 10), dtype=float32)
Tensor("Identity:0", shape=(60000, 784), dtype=float32)
[None]
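You are computing the gradients with respect to X_train, which is not part of the model's computation graph: K.identity(X_train) creates a new tensor from the data, and the model output does not depend on it, so K.gradients returns None. Instead, you need the symbolic input tensor of the model, so try something like: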
grads = K.gradients(model.output, model.input)
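In TensorFlow 2, K.gradients on symbolic Keras tensors generally no longer works because eager execution is the default, but nested tf.GradientTape blocks can put an input derivative into the loss. As an assumption, the sketch below penalizes the squared norm of the gradient of each example's cross-entropy with respect to its input (a form of "double backpropagation"); replace reg with your own functional. The hidden layer is shrunk to 64 units for speed, and lam and train_step are hypothetical names.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="softplus"),  # smaller than 1024 for speed
    tf.keras.layers.Dense(10, activation="softmax"),
])
optimizer = tf.keras.optimizers.Adam(1e-3)
lam = 0.01  # hypothetical regularization weight

def train_step(x, y):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(x)  # x is data, not a variable, so it must be watched
            probs = model(x, training=True)
            per_example = tf.keras.losses.categorical_crossentropy(y, probs)
        # Row i is d(loss_i)/d(x_i), since loss_i depends only on row i of x.
        input_grads = inner.gradient(per_example, x)
        reg = tf.reduce_mean(tf.reduce_sum(tf.square(input_grads), axis=1))
        loss = tf.reduce_mean(per_example) + lam * reg
    # The outer tape also recorded the inner gradient, so reg is differentiated too.
    grads = outer.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, reg

# Random stand-in batch shaped like flattened MNIST.
x = np.random.rand(32, 784).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 10, 32), num_classes=10)
loss, reg = train_step(x, y)
```

One pitfall: differentiating the raw softmax output summed over all classes always gives zero, because the class probabilities sum to 1. Differentiate a per-class or per-example quantity (as above) instead.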
I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients, we first need the output tensor. For the output of the model (what my initial question asked about), we simply use model.output. We can also get the outputs of other layers with model.layers[index].output.
outputTensor = model.output #Or model.layers[index].output
Then we choose the variables with respect to which the gradients are computed:
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually evaluate the gradients for a given input, we need a bit of TensorFlow:
trainingExample = np.random.random((1,8))
# Reuse the session Keras already uses, so the model's current weights are kept
sess = k.get_session()
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And that's it!
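For TensorFlow 2, where k.gradients and sessions are not available in eager mode, a sketch of the same computation with tf.GradientTape follows. It rebuilds the model above with default initializers; training_example is a random stand-in input. For a single example with a single output, tape.gradient returns exactly the gradient of the output (not a loss) with respect to every weight tensor.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(12, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

training_example = np.random.random((1, 8)).astype("float32")

with tf.GradientTape() as tape:
    output = model(training_example)  # shape (1, 1): the model output, not a loss
# One gradient array per weight tensor (kernel and bias of each Dense layer).
gradients = tape.gradient(output, model.trainable_variables)
evaluated_gradients = [g.numpy() for g in gradients]
```

For a batch of several examples, tape.gradient would return the gradient of the summed outputs; tape.jacobian can be used instead when per-example gradients are needed.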
The answer below uses the binary cross-entropy loss; feel free to replace it with your own loss function.
import keras  # for keras.losses.BinaryCrossentropy
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
# Keras losses take (y_true, y_pred); labels holds the targets for training_data1
loss = bce(labels, outputTensor)
gradients = k.gradients(loss, listOfVariableTensors)
# Reuse the Keras session instead of re-initializing the weights in a new one
sess = k.get_session()
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)