Make a Custom loss function in Keras in detail - python

I am trying to write a custom loss function in Keras. I want to implement a particular loss; the output dimension is 80 and the batch size is 5000. So I built the loss function below, but it doesn't work:
def normalize_activation(y_true, y_pred):
    nb_divide = K.reshape(K.sqrt(K.sum(K.square(y_pred), axis=1)), (5000, 1))
    nb_divide = numpy.tile(nb_divide, 80)
    predicted = numpy.divide(y_pred, nb_divide)
    return K.sum(K.square(y_true - predicted))
ValueError: setting an array element with a sequence.
This error occurs. I think the shapes of y_true and y_pred are both (5000, 80). Where should I fix it?

A loss function should avoid any operation that does not come from the Keras backend. The values are tensors and you must keep them as tensors.
You also don't need to reshape things unless you actually want them to behave in a specific way.
If you have shapes (5000, 80) and (5000, 1), you can operate on them directly thanks to broadcasting, without needing K.repeat_elements() (the equivalent of numpy.tile).
So, supposing that 5000 is the batch size (number of samples) and 80 is the only actual dimension belonging to a sample:
def normalize_loss(yTrue, yPred):
    nb_divide = K.sqrt(K.sum(K.square(yPred), axis=1, keepdims=True))
    # keepdims=True keeps the shape as (5000, 1)
    # this is not summing over the entire batch, but only within a single sample -- is that correct?
    predicted = yPred / nb_divide
    return K.sum(K.square(yTrue - predicted))
Some observations:
(I'm not a loss function expert here.) You're dividing only the predicted part, but not the true part. Wouldn't that create big differences between the two values and result in a misleading loss? (Again, I'm not the expert here.)
Usually people use K.mean() at the end of a loss function, whereas you used K.sum(). This is not a problem and doesn't prevent training from working, but with a mean you can visualize the same loss on data of different sizes and compare the values size-independently.
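If both observations apply to your case, here is a minimal sketch combining them: it normalizes both tensors (with K.epsilon() to avoid division by zero) and averages instead of summing. The name normalize_loss_v2 is just illustrative.
from keras import backend as K

def normalize_loss_v2(yTrue, yPred):
    # L2-normalize each sample of both tensors; K.epsilon() guards against division by zero
    true_norm = yTrue / (K.sqrt(K.sum(K.square(yTrue), axis=1, keepdims=True)) + K.epsilon())
    pred_norm = yPred / (K.sqrt(K.sum(K.square(yPred), axis=1, keepdims=True)) + K.epsilon())
    # mean instead of sum, so the value is comparable across batch sizes
    return K.mean(K.square(true_norm - pred_norm))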

Related

Loss Function in PyTorch where not all my training examples are equally weighted

I want to train a Neural Network in PyTorch. I have my training dataset, however I care more about some examples than about others. I want to include this information in the loss function - to let the NN know that it is very important to get some examples right and to not punish errors on other examples very much.
I want to do this by weighting the loss for training examples, let's say:
loss = weight_for_example*(y_true - y_pred)^2
Is there an easy way to do this in PyTorch?
It mainly depends on your task: for instance, BCEWithLogitsLoss has a weight parameter that applies a manual rescaling weight to each element of the batch. Many other built-in losses also provide this option.
Aside from solutions already available in the framework such as this, a simple approach could be the following:
build a custom dataset, returning your data and a scalar weight for that sample in your __getitem__
proceed with the forward pass
compute your loss, which you can now multiply by the factors you provided.
There's only one caveat (the same as with BCELoss's weight): you probably iterate over batches of size > 1, so your dataloader will give you a batch of data together with a batch of weights. You need to make sure you don't reduce the loss beforehand (e.g. pass reduction='none'), so that you can still multiply it by the per-sample weights, and then perform the reduction manually (e.g. loss = loss.mean()).
See some examples here.
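A minimal sketch of that recipe, assuming an MSE-style loss and a dataloader whose __getitem__ returns a weight per sample (weighted_mse, model and dataloader are placeholder names):
import torch
import torch.nn.functional as F

def weighted_mse(y_pred, y_true, weights):
    # per-element squared errors, deliberately not reduced yet
    loss = F.mse_loss(y_pred, y_true, reduction='none')
    # one value per sample: average over all non-batch dimensions
    loss = loss.reshape(loss.size(0), -1).mean(dim=1)
    # scale each sample's loss by its weight, then reduce manually
    return (loss * weights).mean()

# usage sketch:
# for x, y, w in dataloader:
#     loss = weighted_mse(model(x), y, w)
#     loss.backward()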

Neural segmentation network gives different output based on test batch size

I have implemented and trained a neural segmentation model on (224, 224) images.
However, during testing, the model returns slightly different results based on the shape of the test batch.
The following images are results obtained during testing on my pre-trained model.
The first image is the prediction I get when I predict a single example (let's call it img0) (so the input is [img0] and has shape (1,224,224))
The second image is the prediction I get for the same image but when it's among a batch with 7 other images (so the input is [img0, img1, ..., img7] and has shape (8,224,224)).
The first output is closer to what I expected than the second one.
However, I don't understand why the outputs are different to begin with... Is this supposed to be normal behaviour?
Thanks in advance.
Yes, the batch size is a hyperparameter, meaning you should do trial and error to find the best value for it (hyperparameter tuning). But you should also be aware of its effect on the training process: for each batch, the loss is calculated by feedforwarding the samples in that batch, and backpropagation is then performed using that loss value. So if you choose a small batch size, it is quite possible that you won't find the global optimum and will just fluctuate around it, or even get stuck in a local optimum (from an optimization point of view). A very small batch size (especially 1) is not recommended.
Also, you need a validation set (with more than one sample) to be sure whether your model is accurate or not.
This behavior was coming from the batch normalization layers in my model: I was calling the model with training=True.
As a result, batch normalization normalized each batch with that batch's own statistics (mean and variance), and those statistics change with the batch size.
So this is expected behavior!
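For reference, a minimal sketch of inference without that effect, assuming a tf.keras model named model and test inputs shaped as in the question (img0 of shape (224, 224), batch8 of shape (8, 224, 224)):
pred_single = model(img0[None, ...], training=False)  # batch of size 1
pred_batch = model(batch8, training=False)            # batch of size 8
# with training=False, BatchNormalization uses its moving statistics instead of
# the current batch statistics, so pred_batch[0] should match pred_single
# up to numerical noise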

Keras built-in MSE loss on 2D data returns 2D matrix, not scalar loss

I'm trying to evaluate the MSE loss for single 2D test samples with an autoencoder (AE) in Keras once the model is trained and I'm surprised that when I call Keras MSE built-in function to get individual samples' loss it returns 2D tensors. That means the loss function computes one loss per pixel for each sample, and not one loss per sample as it should (?). To be perfectly clear, I expected MSE to associate to each 2D sample the mean of the squared errors computed over all pixels (as I've read on this SO post).
Since I didn't manage to get an array of scalar MSE errors with one scalar per test sample after training my AE using .predict() and .evaluate() (perhaps I missed something there as well), I went on trying to directly use keras.losses.mean_squared_error(), sample by sample. This returned me 2D tensors as a loss for each sample (input tensors are of size (N,M,1)). When one looks at Keras' original implementation of MSE loss, one finds:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
The axis=-1 explains why multiple dimensions aren't immediately reduced to a scalar when computing the loss.
I therefore wonder:
What exactly has my model been using during training? Was it the mean of the squared errors over all pixels for each sample, as I expected? That isn't what the built-in code suggests.
Do I absolutely need to re-define the MSE loss to get an individual MSE value for each test sample? To obtain a scalar, I would then have to flatten each sample and its associated prediction, and re-apply the built-in MSE (sample by sample).
Manually flattening before computing MSE seems what needs to be done according to this SO answer on Keras' MSE loss. Using MSE for an AE model with 2D data seemed fine to me as I read this keras.io Mnist denoising tutorial.
My code:
import keras

AE_testOutputs = autoencoder.predict(samplesList)
samplesMSE = []
for testSampleIndex in range(samplesList.shape[0]):
    AE_output = AE_testOutputs[testSampleIndex, :, :, :]
    samplesMSE.append(keras.losses.mean_squared_error(
        samplesList[testSampleIndex, :, :, :], AE_output))
Which returns a list samplesMSE of Tensor("Mean:0", shape=(15, 800), dtype=float64) objects.
I'm sorry if I missed a similar question, I did actively research before posting, and I'm still afraid there is a very simple explanation/I must have missed a built-in function somewhere.
Although it is not absolutely required, Keras loss functions are conventionally defined "per-sample", where "sample" is basically each element in the output tensor of the model. The loss function is then passed through a wrapping function, weighted_masked_objective, which adds support for masking and sample weighting. By default, the total loss is the average of the per-sample losses.
If you want the mean of some value across every dimension but the first one, you can simply apply K.mean to the value that you get.
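For example (a minimal sketch reusing the names from the question's snippet; i is a sample index, and samples are assumed to have shape (15, 800, 1) as reported):
from keras import backend as K
import keras

i = 0  # index of the test sample to evaluate
pixel_mse = keras.losses.mean_squared_error(
    K.constant(samplesList[i]), K.constant(AE_testOutputs[i]))
# pixel_mse has shape (15, 800); averaging the remaining axes gives one scalar per sample
sample_mse = K.eval(K.mean(pixel_mse))
# equivalently, in plain NumPy: np.mean(np.square(samplesList[i] - AE_testOutputs[i]))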

Tensor Flow passing a tensor to optimizer minimize function trains better

I am encountering something a bit strange (to me) in tensorflow and was hoping someone could shed some light on the situation.
I have a simple neural network that processes images. The cost function I am minimizing is the simple MSE.
At first I implemented the following:
cost = tf.square(DECONV - Y)
which I then passed to my optimizer as follows:
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
I was able to obtain great results with this implementation. However, as I tried to implement a regularizer, I realized that I wasn't passing a scalar value to the optimizer.minimize() but in fact passing a tensor of shape [batch, dim_x, dim_y].
I changed my implementation to the following:
cost = tf.losses.mean_squared_error(Y, DECONV)
as well as many variations of this like:
cost = tf.reduce_mean(tf.square(tf.subtract(DECONV, Y)))
etc.
My issue is that with these new implementations of the MSE I am not able to even come close to the results I obtained using the original "wrong" implementation.
Is the original way a valid way to train? If so, how can I implement regularizers? If not, what am I doing wrong with the new implementations? Why can't I replicate the results?
Can you clarify what you mean by
I was able to achieve greater result [..]
I assume that you have another metric besides cost (this time an actual scalar) that lets you compare the models trained by each method.
Also, have you tried adjusting the learning rate for the second method? I ask because my intuition is that when you ask TensorFlow to minimize a tensor (which has no mathematical meaning, as far as I know), it minimizes the scalar obtained by summing over all the axes of the tensor. This is how tf.gradients works, and the reason why I think this is happening. So maybe in the second method, if you multiply the learning rate by batch*dim_x*dim_y, you would get the same behavior as in the first method.
Even if this works, I don't think passing a tensor to the minimize function is a good idea: minimizing a d-dimensional value has no meaning, since there is no order relation in such spaces.
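If the goal was mainly to make room for a regularizer, the usual pattern is to keep the data term scalar and add the penalty to it. A minimal sketch using the question's names (DECONV, Y, learning_rate) plus a hypothetical reg_weight:
# scalar reconstruction error
mse = tf.reduce_mean(tf.square(DECONV - Y))
# example L2 penalty over all trainable variables (reg_weight is a placeholder scalar)
l2_penalty = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
cost = mse + reg_weight * l2_penalty
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)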

How to check if `compute_gradients` operation has been executed in tensorflow graph?

Here's my use-case
I am trying to implement the Model Agnostic Meta Learning algorithm. At some phase of the training process I need to calculate the gradients of some variables without actually updating the variables and at a later step I would like to do certain things ONLY if the compute gradient operations are complete.
A simple way to do this is to use tf.control_dependencies()
# In this step I would like to COMPUTE gradients
optimizer = tf.train.AdamOptimizer()
# let's assume that I already have loss and var_list
gradients = optimizer.compute_gradients(loss, var_list)

# In this step I would like to do some things ONLY if the gradients are computed
with tf.control_dependencies([gradients]):
    # do some stuff
Problem
Unfortunately the above snippet throws an error since tf.control_dependencies expects gradients to be a tf.Operation or tf.Tensor but compute_gradients returns a list.
Error message:
TypeError: Can not convert a list into a Tensor or Operation.
What I would like
I would like one of two things:
A way to get either a tf.Operation or a tf.Tensor out of optimizer.compute_gradients that I can use in tf.control_dependencies.
Or any other reliable way to check that optimizer.compute_gradients has actually been executed.
Since gradients is the list of (gradient, variable) pairs you'd like to make sure have been calculated, you can convert it to a flat list of tensors/variables and use that as the control_inputs:
with tf.control_dependencies([t for tup in gradients for t in tup]):
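Spelled out in the question's setting, that could look like the following sketch (after_gradients is a placeholder op; entries whose gradient is None are skipped):
optimizer = tf.train.AdamOptimizer()
grads_and_vars = optimizer.compute_gradients(loss, var_list)

# flatten the (gradient, variable) pairs into a plain list of control inputs
control_inputs = [t for pair in grads_and_vars for t in pair if t is not None]

with tf.control_dependencies(control_inputs):
    # anything created here only runs after every gradient tensor has been computed
    after_gradients = tf.no_op(name="after_gradients")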
