Prediction Interval for Neural Net in Python

I'm currently using Keras to create a neural net in Python. I have a basic model and the code looks like this:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation="relu"))
model.compile(loss='mean_squared_error', optimizer='adam')
It works well and gives me good predictions for my use case. However, I would like to be able to use a Variational Gaussian Process layer to give me an estimate for the prediction interval as well. I'm new to this type of layer and am struggling a bit to implement it. The TensorFlow Probability documentation on it can be found here:
https://www.tensorflow.org/probability/api_docs/python/tfp/layers/VariationalGaussianProcess
However, I'm not seeing that same layer in the Keras library. For further reference, I'm trying to do something similar to what was done in this article:
https://blog.tensorflow.org/2019/03/regression-with-probabilistic-layers-in.html
There seems to be a bit more complexity with 23 inputs versus one that I'm not understanding. I'm also open to other methods of achieving the target objective. Any examples on how to do this or insights on other approaches would be greatly appreciated!

tensorflow_probability is a separate library, but it works seamlessly with Keras and TensorFlow. You can add its probabilistic layers to your code and turn the model into a probabilistic one. If your goal is just to get a prediction interval, it is simpler to use the DistributionLambda layer. Your code would then be as follows:
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import make_regression
import tensorflow_probability as tfp
import tensorflow as tf

tfd = tfp.distributions

# Sample data
X, y = make_regression(n_samples=100, n_features=23, noise=4.0, bias=15)

# Loss function: negative log-likelihood
negloglik = lambda y, p_y: -p_y.log_prob(y)

# Model: the final Dense(2) outputs the parameters (loc and scale) of a Normal
model = Sequential()
model.add(Dense(23, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(500, kernel_initializer='normal', activation='relu'))
model.add(Dense(2))
model.add(tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))))
model.compile(loss=negloglik, optimizer='adam')
model.fit(X, y, epochs=250, verbose=0)
After training your model, you can get your prediction distribution with the following lines:
yhat = model(X)       # calling the model returns a tfd.Normal distribution
means = yhat.mean()   # prediction means
stds = yhat.stddev()  # prediction standard deviations
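Since yhat is a Normal distribution object, a symmetric prediction interval follows directly from the mean and standard deviation. A minimal sketch (the 1.96 multiplier assumes you want a standard two-sided 95% interval under the fitted Normal):

lower = means - 1.96 * stds   # lower bound of the 95% prediction interval
upper = means + 1.96 * stds   # upper bound of the 95% prediction interval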

Related

LOSS not changing in very simple Keras binary classifier

I'm trying to get a very (over)simplified Keras binary classifier neural network running without success. The loss just stays constant. I've played around with optimizers (SGD, Adam, RMSprop), learning rates, weight initializations, batch size and input data normalization so far.
Nothing changes at all. Am I doing something fundamentally wrong? Here is the code:
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import numpy as np  # needed for np.array below

data = np.array(
    [
        [100, 35, 35, 12, 0],
        [101, 46, 35, 21, 0],
        [130, 56, 46, 3412, 1],
        [131, 58, 48, 3542, 1]
    ]
)
x = data[:, 1:-1]
y_target = data[:, -1]
x = x / np.linalg.norm(x)

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='softmax', kernel_initializer='lecun_normal',
                bias_initializer='lecun_normal'))
model.add(Dense(1, activation='softmax', kernel_initializer='lecun_normal',
                bias_initializer='lecun_normal'))
model.compile(optimizer=SGD(learning_rate=0.1),
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x, y_target, batch_size=2, epochs=10,
          verbose=1)
The softmax definition is:
exp(a) / sum(exp(a))
so when you use it with a single neuron you get:
exp(a) / exp(a) = 1
That is why your classifier doesn't work with a single neuron.
You can use sigmoid instead in this special case:
exp(a) / (exp(a) + 1)
Furthermore, the sigmoid function is for two-class classifiers. Softmax is an extension of sigmoid for multiclass classifiers.
For the first layer you should use relu or sigmoid instead of softmax.
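A quick numeric check (a sketch, assuming only NumPy) shows why softmax over a single neuron is useless:

import numpy as np

a = np.array([2.7])                  # any single logit
print(np.exp(a) / np.exp(a).sum())   # always prints [1.], regardless of a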
This is the working solution based on the feedback I got:
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.utils import to_categorical
import numpy as np  # needed for np.array below

data = np.array(
    [
        [100, 35, 35, 12, 0],
        [101, 46, 35, 21, 0],
        [130, 56, 46, 3412, 1],
        [131, 58, 48, 3542, 1]
    ]
)
x = data[:, 1:-1]
y_target = data[:, -1]
x = x / np.linalg.norm(x)

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=SGD(learning_rate=0.1),
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x, y_target, epochs=1000,
          verbose=1)

Strange behavior of keras v1.2.2 vs. keras v2+ (HUGE differences in accuracy)

Today I ran into some very strange behavior of Keras. When I try to do a classification run on the iris dataset with a simple model, Keras version 1.2.2 gives me ~95% accuracy, whereas Keras version 2.0+ predicts the same class for every training example (leading to an accuracy of ~35%, as there are three types of iris). The only thing that makes my model predict with ~95% accuracy is downgrading Keras to a version below 2.0.
I think it is a problem with Keras itself, as I have tried the following things, none of which make a difference:
Switching the activation function in the last layer (from sigmoid to softmax).
Switching the backend (Theano and TensorFlow both give roughly the same performance).
Using a random seed.
Varying the number of neurons in the hidden layer (I only have 1 hidden layer in this simple model).
Switching loss functions.
As the model is very simple and runs on its own (you just need the easy-to-obtain iris.csv dataset), I decided to include the entire code:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder

# Load data
data_frame = pd.read_csv("iris.csv", header=None)
data_set = data_frame.values
X = data_set[:, 0:4].astype(float)
Y = data_set[:, 4]

# Encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# Convert integers to dummy variables (i.e. one-hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

def baseline_model():
    # Create & compile model
    model = Sequential()
    model.add(Dense(8, input_dim=4, init='normal', activation='relu'))
    model.add(Dense(3, init='normal', activation='sigmoid'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Create wrapper for the neural network model for use in scikit-learn
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
# Create k-fold cross-validation
kfold = KFold(n_splits=10, shuffle=True)
# Evaluate the model (estimator) on the dataset (X and dummy_y) using 10-fold cross-validation (kfold)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: {:.2f}% ({:.2f}%)".format(results.mean()*100, results.std()*100))
If anyone wants to replicate the error, here are the dependencies I used to observe the problem:
numpy=1.16.4
pandas=0.25.0
scikit-learn=0.21.2
theano=1.0.4
tensorflow=1.14.0
In Keras 2.0 many parameters changed names; there is a compatibility layer to keep things working, but somehow it does not apply when using KerasClassifier.
In this part of the code:
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
You are using the old name nb_epoch instead of the modern name epochs. The default is epochs=1, meaning that your model was only being trained for one epoch, producing very low-quality predictions.
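The corrected call is simply:

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)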
Also note that here:
model.add(Dense(3, init='normal', activation='sigmoid'))
You should be using a softmax activation instead of sigmoid, as you are using the categorical cross-entropy loss:
model.add(Dense(3, init='normal', activation='softmax'))
I've managed to isolate the issue: if you change nb_epoch to epochs (all else being exactly equal), the model predicts very well again, in Keras 2 as well. I don't know if this is intended behavior or a bug.

Unable to train neural network properly

I am trying to train a neural network (NN) implemented through Keras to learn the following function:
y(n) = y(n-1)*0.9 + x(n)*0.1
So the idea is to have a signal as the train_x data and pass it through the above function to get the train_y data, giving us (train_x, train_y) training pairs.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

train_x = np.concatenate((np.ones(100)*120, np.ones(150)*150, np.ones(150)*90, np.ones(100)*110), axis=None)
train_y = np.ones(train_x.size)*train_x[0]
alpha = 0.9
for i in range(train_x.size):
    train_y[i] = train_y[i-1]*alpha + train_x[i]*(1 - alpha)
[Plot: train_x data vs. train_y data]
The function in question, y(n), is a low-pass filter that keeps the x(n) value from changing abruptly, as shown in the plot.
Then I make an NN, fit it with (train_x, train_y), and plot the training loss:
model = Sequential()
model.add(Dense(128, kernel_initializer='normal', input_dim=1, activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))
model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(train_x, train_y, epochs=200, verbose=0)
print(history.history['loss'][-1])
plt.plot(history.history['loss'])
plt.show()
[Plot: loss over 200 epochs]
And the final loss value is approximately 2.9, which I thought was pretty good. But then the accuracy plot was like this
[Plot: accuracy over 200 epochs]
So when I check the prediction of the neural network over the data it was trained on:
plt.plot(model.predict(train_x))
plt.plot(train_x)
plt.show()
[Plot: train_x vs. predicted values]
The values are just offset by a little, and that's all. I tried changing the activation functions, the number of neurons and the layers, but the result is still the same. What am I doing wrong?
---- Edit ----
Made the NN accept 2-dimensional input, and it works as intended:
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

train_x = np.concatenate((np.ones(100)*120, np.ones(150)*150, np.ones(150)*90, np.ones(100)*110), axis=None)
train_y = np.ones(train_x.size)*train_x[0]
alpha = 0.9
for i in range(train_x.size):
    train_y[i] = train_y[i-1]*alpha + train_x[i]*(1 - alpha)

train = np.empty((500, 2))
for i in range(500):
    train[i][0] = train_x[i]
    train[i][1] = train_y[i]

model = Sequential()
model.add(Dense(128, kernel_initializer='normal', input_dim=2, activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(256, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='linear'))
model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(train, train_y, epochs=100, verbose=0)
print(history.history['loss'][-1])
plt.plot(history.history['loss'])
plt.show()
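Plotting the predictions against the targets, as in the first attempt, now shows the network tracking the filtered signal (a sketch, reusing the same plotting pattern as before):

plt.plot(model.predict(train))  # predictions on the 2-D training input
plt.plot(train_y)               # targets for comparison
plt.show()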
If I execute your code, I get the following plot for the X-Y values:
If I didn't miss something important here and you really feed that to your neural net, you probably can't expect better results. The reason is that a neural net is just a function that can only calculate one output vector for one input. In your case the output vector consists of only one element (your y value), but as you can see in the diagram above, for x=90 there is not just one single output. So what you feed to your neural net cannot really be calculated as a function, and most likely the network tries to calculate the straight line between roughly (90, 145) and (150, 150), i.e. the "upper line" in the diagram.
The neural network you're building is a simple multi-layer perceptron with one input node and one output node. This means that it is essentially a function that accepts one real number and returns one real number -- context is not passed in and can therefore not be considered. The expression
model.predict(train_x)
does not evaluate a vector-to-vector function for the vector train_x, but evaluates a number-to-number function for every number in train_x and then returns the list of results. This is why you get flat segments in the train_x vs. predicted plot: the same input numbers produce the same output numbers every time.
Given this constraint, the approximation is actually quite good. For example, for x values of 150 the network has seen many y values of 150 and a few lower ones, but never anything above 150. So, given an x of 150, it predicts a y value slightly lower than 150.
The function you wanted, on the other hand, refers to the previous function value and will need information about this in its input. If what you're trying to build is a function that accepts a sequence of real numbers and returns a sequence of real numbers, you could do that with a many-to-many recurrent network (and you're going to need a lot more training data than one example sequence), but since you can calculate the function directly, why bother with neural networks at all? There's no need to whip out the chainsaw where a butter knife will do.
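For completeness, here is a minimal many-to-many sketch of the recurrent approach mentioned above. The specific choices (a SimpleRNN layer, 16 units, reshaping the question's single sequence to (batch, timesteps, features)) are assumptions for illustration; a real model would need many more training sequences:

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense, TimeDistributed

# Reshape the single example sequence to (batch=1, timesteps=500, features=1)
seq_x = train_x.reshape(1, -1, 1)
seq_y = train_y.reshape(1, -1, 1)

rnn = Sequential()
rnn.add(SimpleRNN(16, return_sequences=True, input_shape=(None, 1)))  # one output per timestep
rnn.add(TimeDistributed(Dense(1)))
rnn.compile(loss='mean_absolute_error', optimizer='adam')
rnn.fit(seq_x, seq_y, epochs=100, verbose=0)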

How to apply sigmoid function for each outputs in Keras?

This is part of my code:
model = Sequential()
model.add(Dense(3, input_shape=(4,), activation='softmax'))
model.compile(Adam(lr=0.1),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
With this code, softmax is applied to all the outputs at once, so the outputs form a probability distribution over the classes. However, I am working on a non-exclusive classifier, which means I want the outputs to have independent probabilities.
Sorry my English is bad...
But what I want to do is apply a sigmoid function to each output so that they will have independent probabilities.
There is no need to create 3 separate outputs as suggested by the accepted answer.
The same result can be achieved with just one line:
model.add(Dense(3, input_shape=(4,), activation='sigmoid'))
You can just use 'sigmoid' activation for the last layer:
from tensorflow.keras.layers import GRU
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
import numpy as np
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Dense(3, input_shape=(4,), activation='sigmoid'))
model.compile(Adam(lr=0.1),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

pred = model.predict(np.random.rand(5, 4))
print(pred)
Output of independent probabilities:
[[0.58463055 0.53531045 0.51800555]
[0.56402034 0.51676977 0.506389 ]
[0.665879 0.58982867 0.5555959 ]
[0.66690147 0.57951677 0.5439698 ]
[0.56204814 0.54893976 0.5488999 ]]
As you can see, the class probabilities are independent of each other; the sigmoid is applied to every class separately. (Note that for actually training a non-exclusive classifier, binary_crossentropy is the appropriate loss, since each output is an independent binary prediction.)
You can try using the Functional API to create a model with n outputs, where each output is activated with sigmoid. You can do it like this (note that in is a reserved keyword in Python, so the input tensor is renamed inp):

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(4,))
dense_1 = Dense(units=4, activation='relu')(inp)
out_1 = Dense(units=1, activation='sigmoid')(dense_1)
out_2 = Dense(units=1, activation='sigmoid')(dense_1)
out_3 = Dense(units=1, activation='sigmoid')(dense_1)
model = Model(inputs=[inp], outputs=[out_1, out_2, out_3])
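If you train this multi-output model, each head can get its own binary cross-entropy loss (a sketch; it assumes the targets are supplied as three separate binary arrays, one per output):

model.compile(optimizer='adam', loss=['binary_crossentropy'] * 3)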

Getting gradient of model output w.r.t weights using Keras

I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not the error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras)):
get_output = theano.function([model.layers[0].input], model.layers[-1].output, allow_input_downcast=True)
fx = theano.function([model.layers[0].input], T.jacobian(model.layers[-1].output.flatten(), model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients, we first need to find the output tensor. For the output of the model (what my initial question asked about) we simply call model.output. We can also find the gradients of the outputs of other layers by calling model.layers[index].output:
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables with respect to which the gradient is taken:
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of TensorFlow:
import numpy as np
import tensorflow as tf

trainingExample = np.random.random((1, 8))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients, feed_dict={model.input: trainingExample})
And that's it!
The answer below uses the cross-entropy loss; feel free to change it to your own loss function.
import keras
from keras import backend as k
import tensorflow as tf

outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(labels, outputTensor)  # BinaryCrossentropy expects (y_true, y_pred)
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients, feed_dict={model.input: training_data1})
print(evaluated_gradients)
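For reference, in TensorFlow 2 the same gradients can be computed eagerly with tf.GradientTape (a sketch; it assumes eager execution and the same model, labels and training_data1 as above):

import tensorflow as tf

with tf.GradientTape() as tape:
    preds = model(training_data1)                               # forward pass
    loss = tf.keras.losses.BinaryCrossentropy()(labels, preds)  # scalar loss
grads = tape.gradient(loss, model.trainable_weights)            # d(loss)/d(weights)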
