How to implement gradient ascent in a Keras DQN - python

I have built a reinforcement learning DQN with variable-length sequences as inputs, and positive and negative rewards calculated for actions. Some problem with my DQN model in Keras means that although the model runs, average rewards decrease over time, across single and multiple cycles of epsilon. This does not change even after a significant period of training.
My thinking is that this is due to using MeanSquaredError in Keras as the loss function (minimising error). So I am trying to implement gradient ascent (to maximise reward). How can I do this in Keras? My current model is:
model = Sequential()
inp = (env.NUM_TIMEPERIODS, env.NUM_FEATURES)
model.add(Input(shape=inp))  # a shape tuple (integers), not including batch size
model.add(Masking(mask_value=0., input_shape=inp))
model.add(LSTM(env.NUM_FEATURES, input_shape=inp, return_sequences=True))
model.add(LSTM(env.NUM_FEATURES))
model.add(Dense(env.NUM_FEATURES))
model.add(Dense(4))
model.compile(loss='mse',
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])
In trying to implement gradient ascent, by 'flipping' the gradient (as negative or inverse loss?), I have tried various loss definitions:
loss=-'mse'
loss=-tf.keras.losses.MeanSquaredError()
loss=1/tf.keras.losses.MeanSquaredError()
but these all generate "bad operand type for unary" errors.
How can I adapt the current Keras model to maximise rewards?
Or is this gradient ascent not even the problem? Could it be some issue with the action policy?

Writing a custom loss function
Here is the loss function you want
@tf.function
def positive_mse(y_true, y_pred):
    return -1 * tf.keras.losses.MSE(y_true, y_pred)
And then your compile line becomes
model.compile(loss=positive_mse,
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])
Please note: use loss=positive_mse and not loss=positive_mse(). That's not a typo. You need to pass the function itself, not the result of calling it.
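If you prefer to stay closer to the training loop, the same sign flip can also be written with GradientTape. This is only a sketch: model is assumed to be the network from the question, and states / q_targets are illustrative batched arrays, not names from the original code.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

with tf.GradientTape() as tape:
    q_pred = model(states, training=True)
    # Negate the MSE so that applying the gradients performs ascent on it.
    objective = -tf.reduce_mean(tf.keras.losses.MSE(q_targets, q_pred))

grads = tape.gradient(objective, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))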

Related

Tensorflow Custom Regularization Term comparing the Prediction to the True value

Hello, I need a custom regularization term to add to my (binary cross-entropy) loss function. Can somebody help me with the TensorFlow syntax to implement this?
I simplified everything as much as possible so it could be easier to help me.
The model takes a dataset of 10000 18x18 binary configurations as input and produces a 16x16 configuration as output. The neural network consists of only 2 convolutional layers.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
EPOCHS = 10
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[18,18,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[17,17,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000,18,18,1),target.reshape(10000,16,16,1),batch_size = 1000, epochs=EPOCHS, verbose=1)
output = model(initial).numpy().reshape(10000,16,16)
Now I wrote a function which I'd like to use as an additional regularization term. It takes the true values and the prediction. Basically, it multiplies every point of both with its 'right' neighbor, and then the difference is taken. I assumed that the true and prediction terms are 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
    order = list(range(1, 4))
    order.append(0)
    deviation = (true * true[:, order]) - (prediction * prediction[:, order])
    deviation = abs(deviation)**2
    return 0.2 * deviation
I would really appreciate some help with adding something like this function as a regularization term to my loss for helping the neural network to train better to this 'right neighbor' interaction. I'm really struggling with using the customizable Tensorflow functionalities a lot.
Thank you, much appreciated.
It is quite simple. You need to define a custom loss in which you add your regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
    order = list(range(1, 4))
    order.append(0)
    deviation = (true * true[:, order]) - (prediction * prediction[:, order])
    deviation = abs(deviation)**2
    return 0.2 * deviation

def my_custom_loss(y_true, y_pred):
    return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)

model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by keras:
Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.
So be sure to return an array of losses (EDIT: as I see now, it is also possible to return a simple scalar; it doesn't matter if you use, for example, a reduce function). Basically, y_true and y_pred have the batch size as their first dimension.
Details here: https://keras.io/api/losses/
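If you want the loss to stay strictly per-sample, as the quoted docs describe, one possible variant is sketched below. It is not the answer's exact code: it reduces each term over the spatial axes and uses tf.roll for the cyclic 'right neighbour' shift instead of an index list, assuming float (batch, 16, 16, 1) tensors as in the question.
import tensorflow as tf

def my_custom_loss_per_sample(y_true, y_pred):
    # Per-sample binary cross-entropy, shape (batch,)
    bce = tf.reduce_mean(
        tf.keras.losses.binary_crossentropy(y_true, y_pred), axis=[1, 2])
    # Pair every point with its cyclic neighbour along the first spatial axis,
    # mirroring the idea of regularization_term above.
    shifted_true = tf.roll(y_true, shift=-1, axis=1)
    shifted_pred = tf.roll(y_pred, shift=-1, axis=1)
    deviation = tf.abs(y_true * shifted_true - y_pred * shifted_pred) ** 2
    reg = 0.2 * tf.reduce_mean(deviation, axis=[1, 2, 3])
    return bce + reg  # shape (batch,), one loss value per sample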

How does PyTorch compute the backward pass when optimizing triplet loss?

I am implementing a triplet network in Pytorch where the 3 instances (sub-networks) share the same weights. Since the weights are shared, I implemented it as a single instance network that is called three times to produce the anchor, positive, and negative embeddings. The embeddings are learned by optimizing the triplet loss. Here is a small snippet for illustration:
from dependencies import *

model = SingleSubNet()  # represents each instance in the triplet net

for epoch in epochs:
    for anch, pos, neg in train_loader:
        optimizer.zero_grad()
        fa, fp, fn = model(anch), model(pos), model(neg)
        loss = triplet_loss(fa, fp, fn)
        loss.backward()
        optimizer.step()
        # Do more stuff ...
My complete code works as expected. However, I do not understand what gradient(s) loss.backward() computes in this case. I am confused because there are 3 gradients of the loss in each learning step (the gradient formulas are here). I assume the gradients are summed before performing optimizer.step(). But then it looks from the equations that if the gradients are summed, they will cancel each other out and yield a zero update term. Of course, this is not true, as the network learns meaningful embeddings in the end.
Thanks in advance
Late answer, but hope this helps someone.
The gradients that you linked are the gradients of the loss with respect to the embeddings (the anchor, positive embedding and negative embedding). To update the model parameters, you use the gradient of the loss with respect to the model parameters. This does not sum to zero.
The reason for this is that when calculating the gradient of the loss with respect to the model parameters, the formula makes use of the activations from the forward pass, and the 3 different inputs (anchor image, positive example and negative example) have different activations in the forward pass.
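A tiny self-contained illustration of that point (not the original network): one shared linear encoder, three forward passes, one backward pass, and a non-zero accumulated gradient on the shared weights.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                      # stands in for SingleSubNet
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anch, pos, neg = torch.randn(8, 4), torch.randn(8, 4), torch.randn(8, 4)
fa, fp, fneg = model(anch), model(pos), model(neg)

loss = triplet_loss(fa, fp, fneg)
loss.backward()

# The gradients from the three forward passes accumulate in the shared weights;
# they do not cancel, because each input produced different activations.
print(model.weight.grad)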

Use Hamming Distance Loss Function with Tensorflow GradientTape: no gradients. Is it not differentiable?

I'm using Tensorflow 2.1 and Python 3, creating my custom training model following the tutorial "Tensorflow - Custom training: walkthrough".
I'm trying to use Hamming Distance on my loss function:
import tensorflow as tf
import tensorflow_addons as tfa
def my_loss_hamming(model, x, y):
    global output
    output = model(x)
    return tfa.metrics.hamming.hamming_loss_fn(y, output, threshold=0.5, mode='multilabel')

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        tape.watch(model.trainable_variables)
        loss_value = my_loss_hamming(model, inputs, targets)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)
When I call it:
loss_value, grads = grad(model, feature, label)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
The grads variable is a list of 38 None values.
And I get the error:
No gradients provided for any variable: ['conv1_1/kernel:0', ...]
Is there any way to use Hamming distance without "interrupting the gradient chain registered by the gradient tape"?
Apologies if I'm saying something obvious, but backpropagation, as the fitting algorithm for neural networks, works through gradients: for each batch of training data, you compute how much the loss function will improve or degrade if you move a particular trainable weight by a very small amount delta.
Hamming loss is by definition not differentiable, so for small movements of trainable weights you will never experience any changes in the loss. I imagine it is only added to be used for final measurements of trained models' performance rather than for training.
If you want to train a neural net through backpropagation, you need to use some differentiable loss - one that can help the model move the weights in the right direction. Sometimes people use various techniques to smooth such losses as Hamming loss and create approximations - e.g. something that penalizes predictions closer to the target answer less, rather than just giving out 1 for everything above the threshold and 0 for everything else.
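As a sketch of that idea (an approximation proposed here, not something from tensorflow_addons): a "soft" Hamming loss that works on the raw probabilities instead of thresholding them, so it stays differentiable.
import tensorflow as tf

def soft_hamming_loss(y_true, y_pred):
    # Differentiable stand-in for multilabel Hamming loss: instead of counting
    # labels on the wrong side of the 0.5 threshold, penalize the probability
    # mass assigned to the wrong side, so small weight changes move the loss.
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(y_true * (1.0 - y_pred) + (1.0 - y_true) * y_pred, axis=-1)
Swapping something like this into my_loss_hamming should give non-None gradients, while the original hamming_loss_fn can still be reported as a metric.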

Custom loss function for predicting integer outputs?

I'm currently working on a dataset where I have to predict an integer output. It goes from 1 to N. I've built a network with MSE as the loss function, but I feel like MSE may not be an ideal loss function to minimize in the case of integer output.
I also round my prediction to get an integer output. Is there a way to make/optimize the model better for integer outputs?
Can anyone provide some help on how to deal with integer outputs/targets? This is the loss function I'm using right now:
model.compile(optimizer=SGD(0.001), loss='mse')
You are using the wrong loss: mean squared error is a loss for regression, and you have a classification problem (discrete outputs, not continuous).
So for this your model should have a softmax output layer:
model.add(Dense(N, activation="softmax"))
And you should be using a classification loss:
model.compile(optimizer=SGD(0.001), loss='sparse_categorical_crossentropy')
Assuming you shift your labels into the [0, N-1] range (they are off by one, since your targets start at 1), this should work. To make a prediction, you should do:
output = np.argmax(model.predict(some_data), axis=1) + 1
The +1 is because the integer class labels go from 0 to N-1 inside the network, while your original targets go from 1 to N.
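As a small sketch of that shift (X_train, y_train, X_test are illustrative names, and y_train is assumed to hold integers 1..N):
import numpy as np

y_train_shifted = y_train - 1                 # map 1..N targets to 0..N-1 class indices
model.fit(X_train, y_train_shifted, epochs=10)

# Predictions come back as 0..N-1, so map them back to 1..N.
predictions = np.argmax(model.predict(X_test), axis=1) + 1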
Ordinal regression could be an appropriate approach, in case predicting a wrong value that is close to the true one (e.g. the wrong month, but close to the true month) is considered a smaller mistake than predicting a value much further away (e.g. one year earlier or later). Only you can know that, based on the specific problem you want to solve.
I found an implementation of the appropriate loss function on github (no affiliation). For completeness, below I copy-paste the code from that repo:
from keras import backend as K
from keras import losses

def loss(y_true, y_pred):
    weights = K.cast(
        K.abs(K.argmax(y_true, axis=1) - K.argmax(y_pred, axis=1)) / (K.int_shape(y_pred)[1] - 1),
        dtype='float32'
    )
    return (1.0 + weights) * losses.categorical_crossentropy(y_true, y_pred)
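Note that this loss takes the argmax of y_true, so it expects one-hot targets. A hedged usage sketch (X, y_int, and N are illustrative names, with y_int holding integers 1..N):
from keras.utils import to_categorical

y_onehot = to_categorical(y_int - 1, num_classes=N)  # one-hot encode on 0..N-1
model.compile(optimizer='adam', loss=loss)
model.fit(X, y_onehot, epochs=10)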

How to update weights in keras for reinforcement learning?

I am working on a reinforcement learning program and I am using this article as a reference. I am using Python with Keras (Theano) for creating the neural network, and the pseudocode I am using for this program is:
1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
2. Do a feedforward pass for the next state s' and calculate the maximum over all network outputs, max_a' Q(s', a').
3. Set the Q-value target for the action to r + γ max_a' Q(s', a') (use the max calculated in step 2). For all other actions, set the Q-value target to the same value as originally returned from step 1, making the error 0 for those outputs.
4. Update the weights using backpropagation.
The loss function equation here is L = 1/2 * [r + γ max_a' Q(s', a') - Q(s, a)]^2,
where my reward is +1, max Q(s', a') = 0.8375 and Q(s, a) = 0.6892.
My L would be 1/2 * (1 + 0.8375 - 0.6892)^2 = 0.659296445.
Now, how should I update my neural network weights using the above loss value, if my model structure is this:
model = Sequential()
model.add(Dense(150, input_dim=150))
model.add(Dense(10))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
Assuming the NN is modeling the Q value function, you would just pass the target to the network. e.g.
model.train_on_batch(state_action_vector, target)
Where state_action_vector is some preprocessed vector representing the state-action input to your network. Since your network is using an MSE loss function, it will compute the prediction term using the state-action on the forward pass and then update the weights according to your target.
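A minimal sketch of steps 1-4 with a single-Q-output model like the one above; GAMMA, reward, state_action and next_state_actions are illustrative names, with the latter two being preprocessed state-action vectors rather than anything from the original code.
import numpy as np

GAMMA = 0.9  # discount factor, assumed

# Steps 1-2: evaluate Q for every candidate action in the next state and take the maximum.
max_next_q = np.max(model.predict(next_state_actions))   # max_a' Q(s', a')

# Step 3: build the target for the action that was taken.
target = reward + GAMMA * max_next_q

# Step 4: one gradient step on the MSE between the prediction and the target.
model.train_on_batch(state_action, np.array([[target]]))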
