Problem
I want to train a Keras 2 neural network (Theano backend) with data of variable relevance. That means some of the samples are less important than others: they should affect the training less. However, I can't simply omit them completely (I have a time series that goes into Conv1D layers).
Question
How can I tell keras to weigh certain training data samples less than others during the training?
Idea
I'm thinking about defining my own loss function that takes y_weight as a third argument in addition to y_true and y_pred. Something like:
def mean_squared_error_weighted(y_true, y_pred, y_weight):
    return y_weight * K.mean(K.square(y_pred - y_true), axis=-1)
But how would I let keras know about that third argument?
The fit function of a Keras model accepts an optional argument sample_weight that does exactly what you're looking for. More specifically, from the Keras documentation:
sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
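For example, a minimal sketch (model, x_train, y_train and less_important_indices are placeholder names for your compiled model, your training arrays and the indices of the samples you want to down-weight):

import numpy as np

# One weight per training sample; smaller values make a sample
# contribute less to the loss without dropping it from the series.
sample_weights = np.ones(len(x_train))
sample_weights[less_important_indices] = 0.3  # hypothetical down-weighting

model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train,
          sample_weight=sample_weights,
          epochs=10, batch_size=32)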
Related
I want to train a Neural Network in PyTorch. I have my training dataset, however I care more about some examples than about others. I want to include this information in the loss function - to let the NN know that it is very important to get some examples right and to not punish errors on other examples very much.
I want to do this by weighting the loss for training examples, let's say:
loss = weight_for_example*(y_true - y_pred)^2
Is there an easy way to do this in PyTorch?
It mainly depends on your task: for instance, BCEWithLogitsLoss has a weight parameter that rescales the loss of each element in the batch. Many other built-in losses provide a similar option.
Aside from solutions already available in the framework such as this, a simple approach could be the following:
build a custom dataset, returning your data and a scalar weight for that sample in your __getitem__
proceed with the forward pass
compute your loss, which you can now multiply by the factors you provided.
There's only one caveat (which is the same as with BCELoss): you probably iterate over batches of size > 1, so your dataloader will provide a batch of data together with a batch of weights. You need to make sure you don't reduce your loss beforehand (e.g. by using reduction='none'), so that you can still multiply it element-wise by the batch of weights, and then perform a manual reduction (e.g. loss = loss.mean()), as in the sketch below.
See some examples here.
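A minimal sketch of that recipe (the dataset class, the toy tensors and the linear model below are made up for illustration; the point is reduction='none' followed by a manual weighted mean):

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class WeightedDataset(Dataset):
    # Hypothetical dataset returning (input, target, per-sample weight).
    def __init__(self, x, y, w):
        self.x, self.y, self.w = x, y, w
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx], self.w[idx]

x = torch.randn(100, 10)     # toy inputs
y = torch.randn(100, 1)      # toy targets
w = torch.rand(100)          # one weight per sample

loader = DataLoader(WeightedDataset(x, y, w), batch_size=16, shuffle=True)
model = nn.Linear(10, 1)
criterion = nn.MSELoss(reduction='none')            # keep per-element losses
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for xb, yb, wb in loader:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb).squeeze(1)      # shape: (batch,)
    loss = (loss * wb).mean()                       # weight, then reduce manually
    loss.backward()
    optimizer.step()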
I am trying to use Keras to fit a CNN model to classify images. The data set has many more images from certain classes, so it's unbalanced.
I have read different things on how to weight the loss to account for this in Keras, e.g.:
https://datascience.stackexchange.com/questions/13490/how-to-set-class-weights-for-imbalanced-classes-in-keras, which is nicely explained. But it is always explained for the fit() function, not for fit_generator().
Indeed, the fit_generator() function doesn't seem to have a 'class_weights' parameter; instead it has 'weighted_metrics', whose description I don't understand: "weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing."
How can I go from 'class_weights' to 'weighted_metrics'? Would anyone have a simple example?
We do have class_weight in fit_generator (Keras v2.2.2). According to the docs:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
Assume you have two classes, negative and positive, mapped to class indices 0 and 1; you can then pass class_weight to fit_generator as a dictionary:
model.fit_generator(gen, class_weight={0: 0.7, 1: 1.3})
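If you would rather derive the weights from the class frequencies than pick them by hand, one common approach (sketched here with scikit-learn and a toy label array; y_train is a placeholder for your actual integer labels) is:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])   # toy, heavily imbalanced labels

classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}   # e.g. {0: 0.71, 1: 1.67}

# model.fit_generator(gen, class_weight=class_weight, ...)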
I'm trying to evaluate the MSE loss for single 2D test samples with an autoencoder (AE) in Keras once the model is trained, and I'm surprised that when I call Keras' built-in MSE function to get the individual samples' losses, it returns 2D tensors. That means the loss function computes one loss per pixel for each sample, and not one loss per sample as (I thought) it should. To be perfectly clear, I expected MSE to associate with each 2D sample the mean of the squared errors computed over all of its pixels (as I've read in this SO post).
Since I didn't manage to get an array of scalar MSE errors, with one scalar per test sample, after training my AE using .predict() and .evaluate() (perhaps I missed something there as well), I went on to use keras.losses.mean_squared_error() directly, sample by sample. This returned 2D tensors as the loss for each sample (the input tensors are of size (N,M,1)). Looking at Keras' original implementation of the MSE loss, one finds:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
The axis=-1 explains why multiple dimensions aren't immediately reduced to a scalar when computing the loss.
I therefore wonder:
1. What exactly has my model been using during training? Was it the mean of the squared error over all pixels for each sample, as I expected? That isn't what the built-in code suggests.
2. Do I absolutely need to re-define the MSE loss to get the individual MSE losses for each test sample? To obtain a scalar I would then have to flatten the samples and the associated predictions, and then re-apply the built-in MSE (sample by sample).
Manually flattening before computing MSE seems to be what needs to be done, according to this SO answer on Keras' MSE loss. Using MSE for an AE model with 2D data seemed fine to me, as I read in this keras.io MNIST denoising tutorial.
My code:
import keras

AE_testOutputs = autoencoder.predict(samplesList)
samplesMSE = []
for testSampleIndex in range(samplesList.shape[0]):
    AE_output = AE_testOutputs[testSampleIndex,:,:,:]
    samplesMSE.append(keras.losses.mean_squared_error(samplesList[testSampleIndex,:,:,:], AE_output))
This returns a list samplesMSE of Tensor("Mean:0", shape=(15, 800), dtype=float64) objects.
I'm sorry if I missed a similar question; I did research actively before posting, and I'm still afraid there is a very simple explanation or a built-in function I must have missed somewhere.
Although it is not absolutely required, Keras loss functions are conventionally defined "per-sample", where "sample" is basically each element in the output tensor of the model. The loss function is then passed through a wrapping function, weighted_masked_objective, that adds support for masking and sample weighting. By default, the total loss is the average of the per-sample losses.
If you want the mean of some value across every dimension but the first one, you can simply apply K.mean (or np.mean, once you have NumPy arrays) over the remaining axes.
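For example, working directly on the NumPy arrays returned by predict() (reusing the names from the question above):

import numpy as np

# AE_testOutputs = autoencoder.predict(samplesList)   # shape: (num_samples, H, W, 1)
# Average the squared error over every axis except the first (the sample axis).
per_sample_mse = np.mean(np.square(samplesList - AE_testOutputs),
                         axis=tuple(range(1, samplesList.ndim)))
# per_sample_mse has shape (num_samples,): one scalar MSE per test sample.

The analogue inside a custom Keras loss would be K.mean(K.square(y_pred - y_true), axis=[1, 2, 3]), i.e. averaging over every axis except the batch axis.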
I am trying to train a multi-task multi-label classifier using Keras. The output layer is a fork of two outputs. The task of each output layer is to predict the categories of its task. The y vectors are OneHot encoded.
I am using a custom generator for my data that yields the y arrays in a list to the fit_generator function
I am using a categorical_crossentropy loss function at each output layer:
fork1.compile(loss={'O1': 'categorical_crossentropy', 'O2': 'categorical_crossentropy'},
              optimizer=optimizers.Adam(lr=0.001),
              metrics=['accuracy'])
The problem: the loss doesn't decrease with this setup. However, if I train each task separately, I get low loss and high accuracy. So what could be the problem?
To perform multilabel categorical classification (where each sample can have several classes), end your stack of layers with a Dense layer with a number of units equal to the number of classes and a sigmoid activation, and use binary_crossentropy as the loss. Your targets should be k-hot encoded.
Regarding the multi-output model: training such a model requires the ability to specify a different loss function for each head of the network, which in turn calls for a different training procedure.
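For completeness, a minimal functional-API sketch of such a two-headed model (the layer sizes, input shape and class counts are made up; only the output names match the compile call from the question):

from keras import layers, models, optimizers

num_classes_task1, num_classes_task2 = 10, 5          # hypothetical class counts

inputs = layers.Input(shape=(128,))                   # hypothetical input shape
shared = layers.Dense(64, activation='relu')(inputs)

# One softmax head per task (one-hot targets -> categorical_crossentropy);
# for true multi-label heads, swap softmax for sigmoid and use binary_crossentropy.
out1 = layers.Dense(num_classes_task1, activation='softmax', name='O1')(shared)
out2 = layers.Dense(num_classes_task2, activation='softmax', name='O2')(shared)

fork1 = models.Model(inputs, [out1, out2])
fork1.compile(loss={'O1': 'categorical_crossentropy', 'O2': 'categorical_crossentropy'},
              loss_weights={'O1': 1.0, 'O2': 1.0},    # optionally re-balance the two tasks
              optimizer=optimizers.Adam(lr=0.001),
              metrics=['accuracy'])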
You should provide more info in order to give a clear indication of what you want to achieve.
I want to create a custom objective function for training a Keras deep net. I'm researching classification of imbalanced data, and I use the F1 score a lot in scikit-learn. I therefore had the idea of inverting the F1 metric (1 - F1 score) to use it as a loss function/objective for Keras to minimise while training:
(from sklearn.metrics import f1_score)
def F1Loss(y_true, y_pred):
    return 1. - f1_score(y_true, y_pred)
However, this f1_score method from scikit-learn requires numpy arrays or lists to calculate the F1 score. I found that Tensors need to be evaluated to their numpy array counterparts using .eval(), which requires a TensorFlow session to perform this task.
I do not know the session object that Keras uses. I have tried using the code below, assuming the Keras backend has its own session object defined somewhere, but this also did not work.
from keras import backend as K
K.eval(y_true)
Admittedly, this was a shot in the dark since I don't really understand the deeper workings of Keras or TensorFlow at the moment.
My question is: how do I evaluate the y_true and y_pred tensors to their numpy array counterparts?
Your problem is a classic case of trying to implement a discontinuous objective in Theano. It's impossible for two reasons:
The F1 score is discontinuous: here you can read what is expected of an objective function in neural network training. The F1 score doesn't satisfy these conditions, so it cannot be used to train a neural network.
There is no equivalence between a tensor and a NumPy array: this is a fundamental issue. A Theano tensor is like the x in school equations; you cannot expect an algebraic variable to be equivalent to whatever object may later be assigned to it. On the other hand, as part of a computational graph, the objective has to be expressed in terms of tensor operations; if it isn't, you cannot differentiate it with respect to the parameters, which makes the usual way of training a neural network impossible.
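In practice, the usual workaround is to train with a differentiable loss (e.g. cross-entropy) and compute F1 on NumPy arrays outside the graph, for example once per epoch via a callback. A minimal sketch, assuming a binary sigmoid classifier and hypothetical held-out arrays x_val, y_val:

import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import f1_score

class F1Callback(Callback):
    # Computes F1 on a held-out set at the end of each epoch,
    # using NumPy predictions rather than symbolic tensors.
    def __init__(self, x_val, y_val):
        super(F1Callback, self).__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        y_prob = self.model.predict(self.x_val)
        y_pred = (y_prob > 0.5).astype(int)            # threshold the sigmoid output
        print('epoch %d - val F1: %.4f' % (epoch, f1_score(self.y_val, y_pred)))

# model.fit(x_train, y_train, callbacks=[F1Callback(x_val, y_val)], ...)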
If you have the predicted and actual tensors in NumPy array format, then I guess you can use this code snippet:
import tensorflow as tf

correct_prediction = tf.equal(tf.argmax(actual_tensor, 1), tf.argmax(predicted_tensor, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
And in Keras, I think you can use this:
model.fit_generator(train_generator, validation_data=val_generator, nb_val_samples=X_val.shape[0],
                    samples_per_epoch=X_train.shape[0], nb_epoch=nb_epoch, verbose=1,
                    callbacks=[model_checkpoint, reduce_lr, tb], max_q_size=1000)
Here train_generator and val_generator generate the training and validation data during training, and this also prints the loss and accuracy while training.
Hope this helps...