I am trying to use a custom loss function in my Keras sequential model (TensorFlow 2.6.0). This custom loss should (ideally) calculate the data loss plus the residual of a physical equation (say, the diffusion equation, Navier-Stokes, etc.). This residual is based on the derivative of the model output with respect to its inputs, which I want to compute with GradientTape.
In this MWE, I removed the data loss term and other equation losses, and just used the derivative of the output wrt its input. The dataset can be found here.
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf #tf.__version__ = '2.6.0'
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8] #X.shape = (768, 8)
y = dataset[:,8]
X = tf.convert_to_tensor(X, dtype=tf.float32)
y = tf.convert_to_tensor(y, dtype=tf.float32)
def customLoss(y_true, y_pred):
    x_tensor = tf.convert_to_tensor(model.input, dtype=tf.float32)
    # x_tensor = tf.cast(x_tensor, tf.float32)
    with tf.GradientTape() as t:
        t.watch(x_tensor)
        output = model(x_tensor)
    DyDX = t.gradient(output, x_tensor)
    dy_t = DyDX[:, 5:6]
    R_pred = dy_t
    # loss_data = tf.reduce_mean(tf.square(yTrue - yPred), axis=-1)
    loss_PDE = tf.reduce_mean(tf.square(R_pred))
    return loss_PDE
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=customLoss, optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=15)
After execution, I get this ValueError:
ValueError: Passed in object of type <class 'keras.engine.keras_tensor.KerasTensor'>, not tf.Tensor
When I change loss=customLoss to loss='mse', the model starts training, but using that customLoss is the whole point. Any ideas?
The problem seems to come from model.input in the loss function. If I understand your code correctly, you can use the following loss instead:
def custom_loss_pass(model, x_tensor):
    def custom_loss(y_true, y_pred):
        with tf.GradientTape() as t:
            t.watch(x_tensor)
            output = model(x_tensor)
        DyDX = t.gradient(output, x_tensor)
        dy_t = DyDX[:, 5:6]
        R_pred = dy_t
        # loss_data = tf.reduce_mean(tf.square(yTrue - yPred), axis=-1)
        loss_PDE = tf.reduce_mean(tf.square(R_pred))
        return loss_PDE
    return custom_loss
And then:
model.compile(loss=custom_loss_pass(model, X), optimizer='adam', metrics=['accuracy'])
I am not sure it does what you want but at least it works!
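If you also want the data-fit term from the original post, one way to extend this closure is sketched below (untested; the 0.5 weighting of the PDE residual is an arbitrary assumption):

def custom_loss_pass(model, x_tensor):
    def custom_loss(y_true, y_pred):
        with tf.GradientTape() as t:
            t.watch(x_tensor)
            output = model(x_tensor)
        DyDX = t.gradient(output, x_tensor)
        R_pred = DyDX[:, 5:6]
        # data-fit term on the current batch plus PDE-residual term over x_tensor
        loss_data = tf.reduce_mean(tf.square(y_true - y_pred))
        loss_PDE = tf.reduce_mean(tf.square(R_pred))
        return loss_data + 0.5 * loss_PDE
    return custom_loss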
I am writing a neural network in Keras. I want to modify the loss function so that it can use an array of per-parameter values (with the same shape as a gradient array) as an additional input.
To be precise, I'd like to use the variance of the gradients from past training. Parameters with a high gradient variance, let's call it h, are assumed to be the parameters that hold the learned features.
When training new features, I would like the cost function to favor updating parameters whose h value is as small as possible; for this I have to modify the cost function for each parameter like this:
Loss(parameter) = standard_loss(y_true, y_pred) + h * (parameter - old_parameter)**2
Any guidance would be much appreciated.
Here is an excerpt from my code:
from keras import models
from keras import layers
from keras.datasets import mnist
import tensorflow as tf
import matplotlib.pyplot as plt
from keras import backend as K
# I import the CIFAR-10 dataset
from tensorflow.keras.datasets import cifar10
from keras.utils.np_utils import to_categorical
# load the data
(train_X, train_y), (test_X, test_y) = cifar10.load_data()
train_y = to_categorical(train_y, num_classes=10, dtype='float32')
test_y = to_categorical(test_y, num_classes=10, dtype='float32')
train_X = K.cast(train_X, dtype='float32')
test_X = K.cast(test_X, dtype='float32')
def get_model():
    model = models.Sequential()
    model.add(layers.Conv2D(1, 5, (1,1), input_shape=(32,32,3,), padding='same'))
    model.add(layers.MaxPooling2D())
    model.add(layers.ReLU())
    model.add(layers.Conv2D(4, 5, (2,2), padding='same'))
    model.add(layers.MaxPooling2D())
    model.add(layers.ReLU())
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='sigmoid'))
    model.add(layers.Dense(10, activation='linear'))
    model.add(layers.Softmax())
    print(model.summary())
    return model
model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_X, train_y, epochs=50, validation_split=0.2)
weights = model.get_weights()
Unfortunately, I don't know how to get the gradients with respect to the weights.
I want to get an array of gradients for each parameter for a single training example. I do not mean the total gradient of the cost function over a batch, as discussed elsewhere on the internet.
From what I can see, the cost function is modifiable, but it only takes y_pred and y_true. How could I pass in something that corresponds to the weights (but is not itself a weight)?
Thanks in advance!
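One way to sketch this in TensorFlow 2.x is a custom training step built on tf.GradientTape: it returns one gradient tensor per trainable weight for a single example, and the quadratic h-penalty can be added to the loss. This is only a rough sketch; h_values and old_weights are assumed to be lists of tensors with the same shapes as model.trainable_weights, computed from your past-training statistics.

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(model, x_single, y_single, h_values, old_weights):
    with tf.GradientTape() as tape:
        y_pred = model(x_single[None, ...], training=True)  # add a batch dimension
        base_loss = loss_fn(y_single[None, ...], y_pred)
        # quadratic penalty: h * (parameter - old_parameter)**2, summed over all weights
        penalty = tf.add_n([
            tf.reduce_sum(h * tf.square(w - w_old))
            for w, h, w_old in zip(model.trainable_weights, h_values, old_weights)
        ])
        loss = base_loss + penalty
    # grads is a list with one gradient tensor per trainable weight, for this single example
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return grads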
I am completely new to PyTorch. I would like to move my TF code to PyTorch, and I think I am missing something.
I have X as input and Y as output. X is time series data, on which I would like to do 1D convolution. Y is just a plain number.
X has a shape of (1050589, 81, 21). I have 1050589 experiments, each experiment has 81 timestamps, and each timestamp has 21 points of data. This is the required format for TF, but as far as I could find out, in PyTorch the time dimension should be the last one.
I have my data in a numpy array, so first I transformed the data to fit PyTorch and also turned it into a list.
import torch
from torch import nn
from torch.utils.data import DataLoader

a = []
for n, i in enumerate(X):
    a.append([X[n].T, Y[n]])
train_data = DataLoader(a, batch_size=128)
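As an aside, the same preparation can be written without the intermediate Python list by using TensorDataset (a sketch, assuming X and Y are numpy arrays):

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# move the time dimension last: (batch, channels, time), as nn.Conv1d expects
X_t = torch.tensor(np.transpose(X, (0, 2, 1)), dtype=torch.float32)
Y_t = torch.tensor(Y, dtype=torch.float32)
train_data = DataLoader(TensorDataset(X_t, Y_t), batch_size=128)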
My model looks like this:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Conv1d(EMBED_SIZE, 32, 7, padding='same'),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(81*32, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits.double()
The architecture is simple, as I want to keep it the same as I have in Tensorflow. One convolution with a kernel of 7 and 32 channels, followed by a dense layer and a single output layer.
Same network in Tensorflow:
def conv_1d_model():
    model = Sequential(name="model_conv1D")
    model.add(Conv1D(filters=32, kernel_size=7, activation='relu', input_shape=(81, 21), padding="same"))
    model.add(Flatten())
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model
Now when I try to optimize this network in PyTorch my losses are all over the place, not decreasing at all, while in TensorFlow it runs perfectly well.
I am sure I am missing something, can anyone point me in the right direction?
My optimization function in PyTorch:
model = NeuralNetwork()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = torch.squeeze(model(X))  # I was getting a warning about the pred being in different shape than y, so I squeezed it
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 10 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
Optimization in Tensorflow
model = conv_1d_model()
opt = Adam(learning_rate=learning_rate)
model.compile(loss='mse', optimizer=opt, metrics=['mae'])
model_history = model.fit(X, Y, validation_split=0.2, epochs=epochs, batch_size=batch_size, verbose=1)
I am running LSTM, GRU and BiLSTM models using the following code:
# Create BiLSTM model
def create_model_bilstm(units):
    model = Sequential()
    model.add(Bidirectional(LSTM(units=units,
                                 return_sequences=True),
                            input_shape=(X_train.shape[1], X_train.shape[2])))
    #model.add(Bidirectional(LSTM(units=units)))
    model.add(Dense(1))
    # Compile model
    model.compile(loss='mse', optimizer='adam')
    return model

# Create LSTM or GRU model
def create_model(units, m):
    model = Sequential()
    model.add(m(units=units, return_sequences=True,
                input_shape=[X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.1))
    #model.add(m(units=units))
    #model.add(Dropout(0.2))
    model.add(Dense(units=1))
    # Compile model
    model.compile(loss='mse', optimizer='adam')
    return model

# BiLSTM
model_bilstm = create_model_bilstm(20)

# GRU and LSTM
model_gru = create_model(50, GRU)
model_lstm = create_model(20, LSTM)

# Fit BiLSTM, LSTM and GRU
def fit_model(model):
    early_stop = EarlyStopping(monitor='val_loss',
                               patience=100)
    history = model.fit(X_train, y_train, epochs=700,
                        validation_split=0.2, batch_size=32,
                        shuffle=False, callbacks=[early_stop])
    return history

history_bilstm = fit_model(model_bilstm)
history_lstm = fit_model(model_lstm)
history_gru = fit_model(model_gru)
This all runs smoothly and prints out my loss graphs, but when it comes to predictions I run the following code:
# Make prediction
def prediction(model):
    prediction = model.predict(X_test)
    prediction = scaler_y.inverse_transform(prediction)
    return prediction

prediction_bilstm = prediction(model_bilstm)
prediction_lstm = prediction(model_lstm)
prediction_gru = prediction(model_gru)
and I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-387-9d45f01ae2a2> in <module>
5 return prediction
6
----> 7 prediction_bilstm = prediction(model_bilstm)
8 prediction_lstm = prediction(model_lstm)
9 prediction_gru = prediction(model_gru)
<ipython-input-387-9d45f01ae2a2> in prediction(model)
2 def prediction(model):
3 prediction = model.predict(X_test)
----> 4 prediction = scaler_y.inverse_transform(prediction)
5 return prediction
...
ValueError: Found array with dim 3. Estimator expected <= 2.
I am assuming this has something to do with my X_test shape, based on other posts I have read, so I tried to reshape it to 2D but got another error telling me "expected bidirectional_3_input to have 3 dimensions, but got array with shape (62, 36)" on line 7 again.
What am I doing wrong and how can I fix it?
Data Explanation:
So I am trying to predict discharge rates (target variable) using groundwater levels (34 features), precipitation and temperature as input, which gives me a total of 36 features. My data is in monthly resolution. I am using 63 observations for my test set (a 5-year prediction) and the rest for training.
What are you doing wrong? Let's assume your input data has shape X_train.shape = [d0, d1, d2]; then, after setting up your BiLSTM model like
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

model = tf.keras.Sequential()
model.add(
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(
            units=10,
            return_sequences=True),
        input_shape=(d1, d2)
    )
)
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
we can check the input- and output-shapes your model expects by
>>> model.input.shape
TensorShape([None, d1, d2])
>>> model.output.shape
TensorShape([None, d1, 1])
So your model expects input of shape (n_batch,d1,d2), where n_batch is the batch size of the data, and returns a shape (n_batch,d1,1), thus a 3d-tensor.
Now if you provide a 3D tensor to your model, the model.predict method will successfully return a 3D tensor; however, sklearn.preprocessing.StandardScaler.inverse_transform only works for 2D data, which is why it says
ValueError: Found array with dim 3. Estimator expected <= 2.
On the other hand, if you first reshape your data to be 2D, then model.predict complains, because it is set up to expect a 3D tensor.
How can you fix it? For further help on how to fix your code, you will need to provide more detailed information on what you expect your model to do, especially what output shape you want your BiLSTM model to have. I assume you actually want your BiLSTM model to return a scalar for each sample, so an additional Flatten layer might do the trick:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Flatten

model = tf.keras.Sequential()
model.add(
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(
            units=10,
            return_sequences=True),
        input_shape=(d1, d2)
    )
)
model.add(Flatten())  # <-- additional Flatten layer
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
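With the Flatten layer in place, the model output becomes two-dimensional, which is what inverse_transform expects (a quick check against the model sketched above):

>>> model.output.shape
TensorShape([None, 1])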
I understand that the features extracted from an auto-encoder can be fed into an MLP for classification or regression purposes. This is something that I did earlier.
But what if I have 2 auto-encoders? Can I extract the features from the bottleneck layers of 2 auto-encoders and feed them into an MLP which performs classification based on these features? If yes, then how? I am not sure how to concatenate these two feature sets. I tried numpy.hstack(), which gives me an 'unhashable slice' error, whereas using tf.concat() gives me the error 'Input tensors to a Model must be Keras tensors.' The bottleneck layers of the two auto-encoders have dimension (None, 100) each, so if I stack them horizontally, I should get a (None, 200). The hidden layer of the MLP may contain some (num_hidden=100) neurons. Could anyone please help?
x1 = autoencoder1.get_layer('encoder2').output
x2 = autoencoder2.get_layer('encoder2').output
#inp = np.hstack((x1, x2))
inp = tf.concat([x1, x2], 1)
x = tf.concat([x1, x2], 1)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inp, outputs=y)
# Compile model
mymlp.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train model
mymlp.fit(x_train, y_train, epochs=20, batch_size=8)
Updated as per @twolffpiggott's suggestion:
from keras.layers import Input, Dense, Dropout
from keras import layers
from keras.models import Model
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import numpy as np
x1 = Data1
x2 = Data2
y = Data3
num_neurons1 = x1.shape[1]
num_neurons2 = x2.shape[1]
# Train-test split
x1_train, x1_test, x2_train, x2_test, y_train, y_test = train_test_split(x1, x2, y, test_size=0.2)
# scale data within [0-1] range
scalar = MinMaxScaler()
x1_train = scalar.fit_transform(x1_train)
x1_test = scalar.transform(x1_test)
x2_train = scalar.fit_transform(x2_train)
x2_test = scalar.transform(x2_test)
x_train = np.concatenate([x1_train, x2_train], axis =-1)
x_test = np.concatenate([x1_test, x2_test], axis =-1)
# Auto-encoder1
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons1,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded1 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded1)
decoded = Dense(num_neurons1, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder1 = Model(inputs=input_data, outputs=decoded)
autoencoder1.compile(optimizer='sgd', loss='mse')
# training
autoencoder1.fit(x1_train, x1_train,
                 epochs=100,
                 batch_size=8,
                 shuffle=True,
                 validation_data=(x1_test, x1_test))
# Auto-encoder2
encoding_dim1 = 500
encoding_dim2 = 100
input_data = Input(shape=(num_neurons2,))
encoded = Dense(encoding_dim1, activation='relu', name='encoder1')(input_data)
encoded2 = Dense(encoding_dim2, activation='relu', name='encoder2')(encoded)
decoded = Dense(encoding_dim2, activation='relu', name='decoder1')(encoded2)
decoded = Dense(num_neurons2, activation='sigmoid', name='decoder2')(decoded)
# this model maps an input to its reconstruction
autoencoder2 = Model(inputs=input_data, outputs=decoded)
autoencoder2.compile(optimizer='sgd', loss='mse')
# training
autoencoder2.fit(x2_train, x2_train,
                 epochs=100,
                 batch_size=8,
                 shuffle=True,
                 validation_data=(x2_test, x2_test))
# MLP
num_hidden = 100
encoded1.trainable = False
encoded2.trainable = False
encoded1 = autoencoder1(autoencoder1.inputs)
encoded2 = autoencoder2(autoencoder2.inputs)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
x = Dropout(0.2)(concatenated)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
# Compile model
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit(x_train, y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict(x_test)
giving me an error: unhashable type: 'list' from the line:
myMLP = Model(inputs=[autoencoder1.inputs, autoencoder2.inputs], outputs=y)
The problem is that you're mixing numpy arrays with keras tensors. This can't work.
There are two approaches.
1. Predict numpy arrays from each autoencoder, concatenate the arrays, and send them to the third model.
2. Connect all models, probably make the autoencoders untrainable, and fit with one input for each autoencoder.
Personally, I'd go for the first. (Assuming the autoencoders are already trained and don't need change).
First approach
numpyOutputFromAuto1 = autoencoder1.predict(numpyInputs1)
numpyOutputFromAuto2 = autoencoder2.predict(numpyInputs2)
inputDataForThird = np.concatenate([numpyOutputFromAuto1,numpyOutputFromAuto2],axis=-1)
inputTensorForMlp = Input(inputDataForThird.shape[1:])
h = Dense(num_hidden, activation='relu', name='hidden')(inputTensorForMlp)
y = Dense(1, activation='sigmoid', name='prediction')(h)
mymlp = Model(inputs=inputTensorForMlp, outputs=y)
....
mymlp.fit(inputDataForThird ,someY)
Second Approach
This is a little more complicated, and at first I don't see much reason to do this. (But of course there may be cases where it's a good choice)
Now we're totally forgetting numpy and working with keras tensors.
Creating the mlp on its own (good if you will use it later without the autoencoders):
inputTensorForMlp = Input(input_shape_compatible_with_concatenated_encoder_outputs)
x = Dropout(0.2)(inputTensorForMlp)
h = Dense(num_hidden, activation='relu', name='hidden')(x)
h = Dropout(0.5)(h)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model(inputs=inputTensorForMlp, outputs=y)
We probably want the bottleneck features of the autoencoders, right? If you happened to create the autoencoders properly, with an encoder model and a decoder model joined together, it's easier to use just the encoder model. Otherwise:
encodedOutput1 = autoencoder1.layers[bottleneckLayer].output #or encoder1.output
encodedOutput2 = autoencoder2.layers[bottleneckLayer].output #or encoder2.output
Creating a joined model. The concatenation must use a keras layer (we're working with keras tensors):
concatenated = Concatenate()([encodedOutput1,encodedOutput2])
output = myMLP(concatenated)
joinedModel = Model([autoencoder1.input,autoencoder2.input],output)
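The joined model can then be compiled and trained like any other two-input model (a sketch reusing the variable names from the question's updated code):

joinedModel.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
joinedModel.fit([x1_train, x2_train], y_train, epochs=200, batch_size=8)
joinedModel.predict([x1_test, x2_test])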
I'd also go with Daniel's first approach (for simplicity and efficiency), but if you're interested in the second, for instance to run the network end-to-end, you'd approach it like this:
# make autoencoders not trainable
autoencoder1.trainable = False
autoencoder2.trainable = False
# input_data1 and input_data2 are the Input layers defined separately for each autoencoder
encoded1 = autoencoder1(input_data1)
encoded2 = autoencoder2(input_data2)
concatenated = layers.concatenate([encoded1, encoded2], axis=-1)
h = Dense(num_hidden, activation='relu', name='hidden')(concatenated)
y = Dense(1, activation='sigmoid', name='prediction')(h)
myMLP = Model([input_data1, input_data2], y)
myMLP.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Training
myMLP.fit([x1_train, x2_train], y_train, epochs=200, batch_size=8)
# Testing
myMLP.predict([x1_test, x2_test])
Key edits
The weights of both autoencoders should be frozen end-to-end (otherwise early-stage gradient updates from the randomly initialized MLP will likely result in the loss of much of their learning).
The autoencoder input layers should be assigned to separate variables input_data1 and input_data2 per autoencoder (instead of both to input_data). Even though autoencoder1.inputs returns a tf tensor, this is the source of the unhashable type: list exception, and replacing with [input_data1, input_data2] solves the issue.
When fitting the MLP for the end-to-end model, the input should be a list of x1_train and x2_train rather than the concatenated inputs. Same when predicting.
I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
import numpy as np
import tensorflow as tf
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables with respect to which we take the gradients.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually evaluate the gradients for a given input, we need to use a bit of TensorFlow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And that's it!
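In TensorFlow 2.x, where sessions are gone and K.gradients does not work eagerly, the same computation can be sketched with tf.GradientTape (assuming the model is built with the current tf.keras API):

import numpy as np
import tensorflow as tf

trainingExample = np.random.random((1, 8)).astype('float32')
with tf.GradientTape() as tape:
    outputTensor = model(trainingExample)
# one gradient tensor per trainable weight; a non-scalar output is summed before differentiation
evaluated_gradients = tape.gradient(outputTensor, model.trainable_weights)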
The answer below uses the cross-entropy loss; feel free to change it to your own function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(labels, outputTensor)
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)