How to solve the problem of imbalanced data using fit_generator [duplicate] - python

I am trying to use Keras to fit a CNN model to classify images. The dataset has many more images from certain classes than from others, so it is unbalanced.
I have read different explanations of how to weight the loss to account for this in Keras, e.g.:
https://datascience.stackexchange.com/questions/13490/how-to-set-class-weights-for-imbalanced-classes-in-keras, which is nicely explained. But it always covers the fit() function, not fit_generator().
Indeed, fit_generator() does not seem to have the 'class_weights' parameter; instead it has 'weighted_metrics', whose description I don't understand: "weighted_metrics: List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing."
How can I get from 'class_weights' to 'weighted_metrics'? Would anyone have a simple example?

class_weight is available in fit_generator (Keras v2.2.2). According to the docs:
Class_weight: Optional dictionary mapping class indices (integers) to
a weight (float) value, used for weighting the loss function (during
training only). This can be useful to tell the model to "pay more
attention" to samples from an under-represented class.
Assuming you have two classes, negative (index 0) and positive (index 1), you can pass class_weight to fit_generator as a dictionary:
model.fit_generator(gen, class_weight={0: 0.7, 1: 1.3})
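To make this concrete, here is a minimal sketch (model and gen are assumed to already exist; the class counts and the inverse-frequency weighting shown here are just one common heuristic for choosing the weights):

import numpy as np

# Hypothetical class counts; replace with your dataset's real counts.
n_negative, n_positive = 900, 100
total = n_negative + n_positive

# Inverse-frequency weighting: the rarer class gets the larger weight.
class_weight = {
    0: total / (2.0 * n_negative),  # negative class
    1: total / (2.0 * n_positive),  # positive class
}

model.fit_generator(gen, steps_per_epoch=100, epochs=10,
                    class_weight=class_weight)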

Related

Loss Function in PyTorch where not all my training examples are equally weighted

I want to train a neural network in PyTorch. I have my training dataset; however, I care more about some examples than about others. I want to include this information in the loss function, to let the NN know that it is very important to get some examples right and to not punish errors on other examples very much.
I want to do this by weighting the loss for training examples, let's say:
loss = weight_for_example*(y_true - y_pred)^2
Is there an easy way to do this in PyTorch?
It mainly depends on your task: for instance, BCEWithLogitsLoss has a weight parameter that allows a custom weight for each element of the batch. Many other built-in losses also provide this option.
Aside from solutions already available in the framework such as this, a simple approach could be the following:
build a custom dataset, returning your data and a scalar weight for that sample in your __getitem__
proceed with the forward pass
compute your loss, which you can now multiply by the factors you provided.
There's only one caveat (the same as for BCELoss): you probably iterate over batches of size > 1, so your dataloader will provide a batch of data together with a batch of weights. You need to make sure you don't reduce your loss beforehand (e.g. by passing reduction='none'), so that you can still multiply it by the batch weights; then you can proceed with a manual reduction (e.g. loss = loss.mean()).
See some examples here.
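A minimal sketch of that recipe (the dataset, model and weight values below are illustrative; the key detail is reduction='none', which keeps the per-element losses so they can be scaled before the manual mean):

import torch
from torch.utils.data import Dataset, DataLoader

class WeightedDataset(Dataset):
    # Illustrative dataset returning (input, target, weight) triples.
    def __init__(self, xs, ys, weights):
        self.xs, self.ys, self.weights = xs, ys, weights

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        return self.xs[idx], self.ys[idx], self.weights[idx]

xs = torch.randn(100, 10)
ys = torch.randn(100, 1)
weights = torch.rand(100)  # per-sample importance, made up here

loader = DataLoader(WeightedDataset(xs, ys, weights), batch_size=16)
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss(reduction='none')  # no reduction yet
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for x, y, w in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)          # shape: (batch, 1)
    loss = (w.unsqueeze(1) * loss).mean()  # weight, then reduce manually
    loss.backward()
    optimizer.step()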

With scikit learn, how to use predict_proba in fit_predict?

I am implementing custom estimators with the scikit-learn library and its Pipeline, BaseEstimator, TransformerMixin and other base classes (you can check the API here).
Given a pipeline, you can call pipeline.fit(X) then pipeline.predict(X), or you can use pipeline.fit_predict(X), which is a bit faster because it applies the necessary transformations once instead of twice (once for the fit and once for the predict). So it is an optimization for when you want to predict on the same dataset you used to fit.
But some models, like classifiers or clusterers, have a method called predict_proba that returns the probability of each class or cluster label.
From the scikit glossary (link):
fit_predict
Used especially for unsupervised, transductive estimators, this fits
the model and returns the predictions (similar to predict) on the
training data. In clusterers, these predictions are also stored in the
labels_ attribute, and the output of .fit_predict(X) is usually
equivalent to .fit(X).predict(X). The parameters to fit_predict
are the same as those to fit.
predict_proba
A method in classifiers and clusterers that are able to return
probability estimates for each class/cluster. Its input is usually
only some observed data, X.
If the estimator was not already fitted, calling this method should
raise an exceptions.NotFittedError.
Output conventions are like those for decision_function except in the
binary classification case, where one column is output for each class
(while decision_function outputs a 1d array). For binary and
multiclass predictions, each row should add to 1.
Like other methods, predict_proba should only be present when the
estimator can make probabilistic predictions (see duck typing). This
means that the presence of the method may depend on estimator
parameters (e.g. in linear_model.SGDClassifier) or training data
(e.g. in model_selection.GridSearchCV) and may only appear after
fitting.
I am looking for a way to get a fit_predict_proba method which has the same advantages as fit_predict but returns probabilities.
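There is no such method in scikit-learn, but a sketch of a helper along those lines could look like this (the name fit_predict_proba is made up; it relies on the documented Pipeline.steps attribute to run each transformer's fit_transform only once):

def fit_predict_proba(pipeline, X, y=None):
    # Hypothetical helper: fit a Pipeline and return predict_proba on
    # the same data, applying each transformation only once.
    Xt = X
    for name, transformer in pipeline.steps[:-1]:
        Xt = transformer.fit_transform(Xt, y)
    final_estimator = pipeline.steps[-1][1]
    final_estimator.fit(Xt, y)
    return final_estimator.predict_proba(Xt)

Note that calling pipeline.predict_proba(X) after pipeline.fit(X, y) would re-apply the transformations to X, which is exactly the duplicated work this helper avoids.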

Keras: class_weight for each training batch in fit_generator

I am using Keras for a segmentation problem and fit_generator for training my model. I am unsure how to use class_weight with fit_generator.
I actually want to calculate class weights for each batch rather than for the entire dataset, and I am unable to find a way to do this with fit_generator without duplicating effort.
I tried using a generator to return class_weight for each batch but that gives me
TypeError: object of type 'generator' has no len()
I also thought about using a customised callback to handle class weights. To my knowledge, Keras callbacks need to be initialised with the training batches and labels. Is there a way I can pass my training batches to a callback from within my fit_generator call?
Thanks.
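One possible workaround, sketched below: a Keras generator may yield (inputs, targets, sample_weights) tuples, so per-batch class weights can be converted into per-sample weights inside the generator itself (images, labels, and the inverse-frequency rule are placeholders; for per-pixel weighting in segmentation you may additionally need sample_weight_mode='temporal' in compile):

import numpy as np

def weighted_batch_generator(images, labels, batch_size=32):
    # labels assumed to be integer class ids of shape (n_samples,)
    n = len(images)
    while True:
        idx = np.random.randint(0, n, batch_size)
        x, y = images[idx], labels[idx]
        # Per-batch class weights: inverse frequency within this batch.
        counts = np.bincount(y, minlength=int(y.max()) + 1).astype(float)
        counts[counts == 0] = 1.0              # guard against division by zero
        sample_weights = (len(y) / counts)[y]  # one weight per sample
        yield x, y, sample_weights

model.fit_generator(weighted_batch_generator(images, labels),
                    steps_per_epoch=100, epochs=10)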

Weighing Training Data for Keras

Problem
I want to train a keras2 neural network (Theano backend) with data of variable relevance. That means some of the samples are less important than others; they should affect the training less. However, I'm not able to simply omit them completely (I have a time series that goes into Conv1D layers).
Question
How can I tell keras to weigh certain training data samples less than others during the training?
Idea
I'm thinking about defining my own loss function that takes y_true, y_pred and a third argument, y_weight. Something like:
def mean_squared_error_weighted(y_true, y_pred, y_weight):
    return y_weight * K.mean(K.square(y_pred - y_true), axis=-1)
But how would I let Keras know about that third argument?
The fit function of a Keras model accepts an optional argument sample_weight that does exactly what you're looking for. More specifically, from the Keras documentation:
sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
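For instance, a minimal sketch (model, x_train and y_train are assumed to already exist; the weight value and less_relevant_idx are made up):

import numpy as np

# One weight per training sample; smaller values make a sample
# contribute less to the loss. The numbers here are arbitrary.
weights = np.ones(len(x_train))
weights[less_relevant_idx] = 0.3  # hypothetical indices of low-relevance samples

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, sample_weight=weights, epochs=10)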

neural nets - How can I associate a confidence to my loss function?

I am trying to do OCC (one-class classification) using an autoencoder-based neural network.
To make a long story short, I train my neural network with 200 matrices, each containing 128 data elements. Those are then compressed (see autoencoder).
Once the training is done I pass a new matrix to my neural net (test data) and based on the loss function I know whether the data I passed to it belongs to the target class or not.
I would like to know how I can compute a classification confidence in % based on the loss function I obtain when passing test data.
Thanks
In case it helps, I am using TensorFlow.
Well, normally you try to minimize your cost function (or, in the case of one training observation, your loss function). The probability of the class you want to predict is normally not obtained from the loss function but from something like a sigmoid output layer: you need a function that ranges from 0 to 1 and behaves like a probability. Where did you get the idea of using the loss function to evaluate your probability? I am not an expert in one-class classification (or outlier detection), but I guess you actually want the probability of your observation not belonging to your class, right?
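If you still want a score in percent derived from the reconstruction loss, one common heuristic (not mentioned in the answer above) is to calibrate the test loss against the distribution of losses seen on the training data; a sketch, with placeholder numbers:

import numpy as np

def error_to_confidence(test_error, train_errors):
    # Fraction of training samples whose reconstruction error is at
    # least as large as the test sample's: values near 1 suggest the
    # sample fits the target class, values near 0 suggest an outlier.
    # This is an empirical score, not a calibrated probability.
    return float(np.mean(np.asarray(train_errors) >= test_error))

train_errors = np.random.rand(200) * 0.1  # placeholder training losses
print(error_to_confidence(0.05, train_errors))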
