I cannot find anywhere how exactly backpropagation is done in Keras. Let me explain:
Let's say I have this network:
import keras
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model

inputs = Input(shape=(X, X, Y))
x = Conv2D(32, (3, 3), padding="same")(inputs)
x = Conv2D(64, (3, 3), padding="same")(x)
x = Conv2D(128, (3, 3), padding="same")(x)
x = Conv2D(64, (3, 3), padding="same")(x)
x = Flatten()(x)  # Flatten takes no units; Dense layers produce the output
x = Dense(1024)(x)
Output = Dense(6)(x)
model = Model(inputs, Output)
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
model.fit(trainingData,trainingLabels)
The output of the last layer is compared to trainingLabels, the mean squared error is computed, and backpropagation happens based on the value of the mean squared error.
However, what if I wanted to do something more? For example, I want to try every permutation of the output vector, and the one that results in the minimal mean squared error should be treated as the output, so that backpropagation happens based on the permutation with the least error.
Is something like this possible in Keras? If so, how can I achieve it?
The loss argument of the model.compile method accepts a Python function. You could compute the minimum over the set of permutations in a custom loss function:
def custom_loss(y_true, y_predicted):
    ''' code here '''
and then pass it to model.compile:
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
See here and here for reference.
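For illustration, here is a minimal sketch of such a loss, assuming the 6-element output of the model above (the permutation set grows factorially, so this only stays tractable for short vectors; permutation_min_mse is a name chosen for this sketch):

import itertools
import tensorflow as tf

def permutation_min_mse(y_true, y_pred):
    n = 6  # length of the output vector; adjust to your model
    perms = list(itertools.permutations(range(n)))
    # Reorder y_true by every permutation: shape (n_perms, batch, n)
    permuted = tf.stack([tf.gather(y_true, list(p), axis=-1) for p in perms])
    # Per-permutation, per-sample MSE: shape (n_perms, batch)
    mse = tf.reduce_mean(tf.square(permuted - y_pred[tf.newaxis]), axis=-1)
    # Keep the smallest error over all permutations, one value per sample
    return tf.reduce_min(mse, axis=0)

The gradient then flows only through the permutation that achieved the minimum, which is exactly the "backpropagate on the best permutation" behavior described above.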
Hello, I need a custom regularization term to add to my (binary cross-entropy) loss function. Can somebody help me with the TensorFlow syntax to implement this?
I simplified everything as much as possible to make it easier to help me.
The model takes a dataset of 10000 binary 18x18 configurations as input and outputs a set of 16x16 configurations. The neural network consists of only two convolutional layers.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

EPOCHS = 10

model = models.Sequential()
model.add(layers.Conv2D(1, 2, activation='relu', input_shape=[18, 18, 1]))
model.add(layers.Conv2D(1, 2, activation='sigmoid'))  # input_shape is only needed on the first layer
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000, 18, 18, 1), target.reshape(10000, 16, 16, 1),
          batch_size=1000, epochs=EPOCHS, verbose=1)

output = model(initial.reshape(10000, 18, 18, 1)).numpy().reshape(10000, 16, 16)
Now I have written a function that I would like to use as an additional regularization term. It takes the true values and the predictions. Basically, it multiplies every point of both with its 'right' neighbor, and then the difference is taken. I assumed that the true and prediction tensors are 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
    order = list(range(1, 16))  # each column's right neighbor, wrapping around (width 16)
    order.append(0)
    deviation = (true * true[:, order]) - (prediction * prediction[:, order])
    deviation = abs(deviation) ** 2
    return 0.2 * deviation
I would really appreciate some help with adding a function like this as a regularization term to my loss, to help the neural network train better on this 'right neighbor' interaction. I am really struggling with TensorFlow's customization functionality.
Thank you, much appreciated.
It is quite simple. You need to specify a custom loss in which you add your regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
    # Indices of each column's right neighbor, wrapping around (width 16)
    order = list(range(1, 16))
    order.append(0)
    # y_true / y_pred have shape (batch, 16, 16, 1); the columns are axis 2
    true_shifted = tf.gather(true, order, axis=2)
    pred_shifted = tf.gather(prediction, order, axis=2)
    deviation = true * true_shifted - prediction * pred_shifted
    deviation = tf.square(deviation)
    # Reduce to one value per sample so it can be added to the loss
    return 0.2 * tf.reduce_mean(deviation, axis=[1, 2, 3])

def my_custom_loss(y_true, y_pred):
    return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)

model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by the Keras docs:
Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.
So be sure to return an array of losses (EDIT: as I can see now, it is also possible to return a simple scalar, e.g. if you apply a reduce function). Basically, y_true and y_pred have the batch size as their first dimension.
See https://keras.io/api/losses/ for details.
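As a quick sanity check of those shapes (the tensors below are made up, matching the model's output shape), a per-sample loss looks like this:

import tensorflow as tf

# Made-up batch of 8 targets and predictions
y_true = tf.zeros([8, 16, 16, 1])
y_pred = 0.5 * tf.ones([8, 16, 16, 1])

# binary_crossentropy averages over the last axis -> shape (8, 16, 16);
# averaging over the spatial axes leaves one loss value per sample
per_sample = tf.reduce_mean(
    tf.keras.losses.binary_crossentropy(y_true, y_pred), axis=[1, 2])
print(per_sample.shape)  # (8,)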
I have implemented a basic neural network from scratch using TensorFlow and trained it on the Fashion-MNIST dataset. It trains correctly and reaches a test accuracy of around 88-90% over 10 classes.
Now I have written a predict() function, which predicts the class of a given image using the trained weights. Here is the code:
def predict(images, trained_parameters):
    parameters = {}
    for param in trained_parameters.keys():
        parameters[param] = tf.convert_to_tensor(trained_parameters[param])

    X = tf.placeholder(tf.float32, [images.shape[0], None], name='X')

    # Use the converted tensors, not the raw trained_parameters
    Z_L = forward_propagation(X, parameters)
    p = tf.argmax(Z_L) # Working fine
    # p = tf.argmax(tf.nn.softmax(Z_L)) # not working if softmax is applied

    with tf.Session() as session:
        prediction = session.run(p, feed_dict={X: images})

    return prediction
This uses the forward_propagation() function, which returns the weighted sums of the last layer (Z) and not the activations (A), because TensorFlow's tf.nn.softmax_cross_entropy_with_logits() requires Z instead of A; it calculates A itself by applying softmax. Refer to this link for details.
Now, in the predict() function, when I make predictions using Z instead of A (the activations), it works correctly. But if I calculate the softmax on Z (which gives the activations A of the last layer), it gives incorrect predictions.
Why does it give correct predictions on the weighted sums Z? Aren't we supposed to first apply the softmax activation (and calculate A) and then make predictions?
Here is the link to my Colab notebook if anyone wants to look at my entire code: Link to Notebook Gist
So what am I missing here?
Most TF functions, such as tf.nn.softmax, assume by default that the batch dimension is the first one - that is a common practice. Now, I noticed in your code that your batch dimension is the second, i.e. your output shape is (output_dim=10, batch_size=?), and as a result, tf.nn.softmax is computing the softmax activation along the batch dimension.
There is nothing wrong with not following the conventions - one just needs to be aware of them. Computing the softmax along the first axis and then taking the argmax should yield the desired results (it is equivalent to taking the argmax of the logits):
p = tf.argmax(tf.nn.softmax(Z_L, axis=0))
I would also recommend computing the argmax explicitly along the first axis (axis=0) in case more than one image is fed into the network.
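Here is a small numeric illustration of why the axis matters, with made-up logits of shape (num_classes=2, batch_size=2) matching your batch-second layout:

import tensorflow as tf

# Made-up logits: rows are classes, columns are samples
Z = tf.constant([[5.0, 4.9],
                 [0.0, 4.5]])

right = tf.argmax(tf.nn.softmax(Z, axis=0), axis=0)  # softmax over classes
wrong = tf.argmax(tf.nn.softmax(Z, axis=1), axis=0)  # softmax over the batch

with tf.Session() as session:
    print(session.run(right))  # [0 0] - matches the argmax of the raw logits
    print(session.run(wrong))  # [0 1] - normalizing across samples flips one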
Currently, I have this output from my model:
egen = keras.models.Model(egen_input, [classes,x])
where x has [None, 32, 32, 3] and classes has [None, 2] as their dimensions. How can I reference only part of the output in a custom loss function?
for example,
def customLoss():
    def loss(y_true, y_pred):
        return keras.losses.binary_crossentropy(y_true, y_pred[0])
    return loss
Currently, the above loss function gives me a mismatched-dimension error, yet if I just use y_pred it raises no error. I am very confused here.
Thanks!
If you want to use only classes, which is the first output, to calculate the loss, then you can set the loss_weights option (https://keras.io/models/model/) when compiling.
model.compile(...., loss_weights=[1.0, 0.0])
Note also that the loss is computed for each output separately, and then averaged (with equal weights by default) across outputs to obtain a single loss value. So y_pred[0] does not mean classes, but the first element (along the batch dimension) of classes and of x, respectively.
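For example, compiling the two-output model so that only classes drives the training might look like this (the per-output losses here are placeholders; use whichever fit your outputs):

egen.compile(
    optimizer='adam',
    loss=['binary_crossentropy', 'mse'],  # one loss per output: [classes, x]
    loss_weights=[1.0, 0.0],              # a weight of 0.0 effectively ignores x
)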
EDIT:
If it is the first element of classes and x, what would be the shape of y_pred[0]? A bit confused here.
Both! Keras computes the loss for classes and x separately, then takes the (weighted) average. So, if the loss function is defined as return keras.losses.binary_crossentropy(y_true, y_pred[0]) as in the question, Keras tries to calculate the loss with classes_true vs classes_pred[0], and with x_true vs x_pred[0], which raises a shape mismatch error.
This question is about the tf.losses.huber_loss() function and how it can be applied to scalars rather than vectors. Thank you for your time!
My model is similar to a classification problem like MNIST. I based my code on the TensorFlow layers tutorial and made changes where I saw fit. I do not think the exact code is needed for my question.
I have labels that take integer values in {0,...,8}, which are converted into one-hot labels like this:
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=n_classes)
The last layer in the model is
logits = tf.layers.dense(inputs=dense4, units=n_classes)
which is converted into predictions like this:
predictions = {"classes": tf.argmax(input=logits, axis=1), "probabilities": tf.nn.softmax(logits, name="softmax_tensor")}
From the tutorial, I started with the tf.losses.softmax_cross_entropy() loss function. But in my model, I am predicting which discretized bin a value will fall into. So I started looking for a loss function that expresses that a prediction one bin off is less of a problem than one two bins off, something like absolute_difference or the Huber function.
The code
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=n_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
in combination with the optimizer:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=ps.learning_rate)
works without any errors. When changing to the Huber function:
loss = tf.losses.huber_loss(labels=onehot_labels, predictions=logits)
there are still no errors. But at this point I am unsure about what exactly happens. Based on the reduction definition, I expect that the Huber function is applied pairwise to the elements of the vectors and the results are then summed up or averaged.
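A quick check with made-up vectors seems to confirm this; the element-wise Huber values are reduced to a single scalar:

import tensorflow as tf

# Made-up label and prediction vectors, just to inspect the reduction
labels      = tf.constant([[0.0, 1.0], [1.0, 0.0]])
predictions = tf.constant([[0.2, 0.9], [0.4, 0.1]])

loss = tf.losses.huber_loss(labels=labels, predictions=predictions)

with tf.Session() as session:
    print(session.run(loss))  # one scalar: element-wise Huber, then averaged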
I would like to apply the Huber function only to the label integer (in {0,...,8}) and the predicted value:
preds = tf.argmax(input=logits, axis=1)
So this is what I tried:
loss = tf.losses.huber_loss(labels=indices, predictions=preds)
This is raising the error
ValueError: No gradients provided for any variable
I have found two common causes, but I do not think either is happening in my situation:
The first is where there is no path between the tf.Variable objects and the loss function. But since my prediction code is often used and the labels were provided as integers, I do not think this applies here.
The second is that the function is not differentiable. But the Huber function does work when vectors are used as input, so I do not think this is the case.
My question is: what code lets me use the Huber loss function on my two integer tensors (labels and predictions)?
I have a neural network built with Keras that I am attempting to train. The output layer has 4 nodes. For the problem I am trying to solve, I only want to compute the gradient on a single one of the output nodes, based upon the true value. Basically, y_true will look like this: [0, 0, 2, 0], where the zeros represent nodes that should be ignored. y_pred, however, will be of the form [1.2, 3.2, 4.5, 6]. I would like to make it so that only the third index is taken into account in the MSE. This requires zeroing out indices 0, 1, and 3 in y_pred. I have not found a proper way to do this.
Below is the code I have tried, but it returns NaN from the loss function.
def custom_mse(y_true, y_pred):
    # Dividing by y_true produces NaN wherever y_true is 0
    return K.mean(K.square(tf.truediv(y_pred * y_true, y_true) - y_pred), axis=-1)
Is there a way to do this simple operation on these Tensor objects?
You can do it like this:
[1.2, 3.2, 4.5, 6] * [0, 0, 2, 0] = [0, 0, 9, 0]
[0, 0, 9, 0] / 2 = [0, 0, 4.5, 0]
and then continue normally.
This is the code to do that:
def custom_mse(y_true, y_pred):
    # Mask y_pred with y_true and rescale by the max of y_true (here 2); this
    # assumes the single non-zero entry of y_true equals its maximum
    return K.mean(K.square(tf.divide(tf.multiply(y_pred, y_true), tf.reduce_max(y_true)) - y_true), axis=-1)
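As a variation (my own sketch, not part of the answer above), the same effect can be achieved with an explicit mask, which avoids relying on the non-zero label value:

from keras import backend as K

def masked_mse(y_true, y_pred):
    # 1.0 where y_true is non-zero, 0.0 elsewhere
    mask = K.cast(K.not_equal(y_true, 0), K.floatx())
    sq_err = K.square((y_pred - y_true) * mask)
    # Average only over the unmasked entries
    return K.sum(sq_err, axis=-1) / K.maximum(K.sum(mask, axis=-1), 1.0)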