In a classification problem involving the identification of fraudulent transactions, I reduced the dimensionality of the data (28 columns) [a complete quasi-separation was detected by Logit in statsmodels] using a stacked autoencoder (28 -> 15 -> 5) and fed the compressed data (5 columns) into a neural network with two hidden layers, each having 10 nodes and 'relu' activation. I trained the model for 100 epochs (the AUC metric never went beyond 0.500, and the training loss became constant after a few epochs). The model predicted every record of the test set as non-fraudulent (class 0) and yielded a confusion matrix like this:
Confusion matrix:
[[70999     0]
 [  115     0]]
Accuracy Score : 0.9983828781955733
Can someone please explain the problem behind this result and suggest a feasible solution?
Since your accuracy is over 99% with an all-zero prediction, fraud cases must make up less than 1% of your training set.
Typically, when the fraud cases are that rare, the model will not place enough importance on them to predict well.
To fix this, you can add class weights or a cost-sensitive loss that penalizes misclassifying the minority class, or use resampling methods such as SMOTE.
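For instance, Keras accepts a class_weight argument in model.fit. A minimal sketch, assuming a small Keras model on the 5-column compressed features (the dummy data and weighting scheme here are illustrative, not taken from the question):

import numpy as np
from tensorflow import keras

# Hypothetical stand-in for the compressed features (5 columns) and
# highly imbalanced labels (~0.2% positives).
X_train = np.random.rand(10000, 5).astype("float32")
y_train = (np.random.rand(10000) < 0.002).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(10, activation="relu", input_shape=(5,)),
    keras.layers.Dense(10, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])

# Weight each class inversely to its frequency so the rare fraud class
# contributes comparably to the loss.
n_pos = y_train.sum()
n_neg = len(y_train) - n_pos
class_weight = {0: len(y_train) / (2.0 * n_neg), 1: len(y_train) / (2.0 * n_pos)}

model.fit(X_train, y_train, epochs=100, batch_size=256,
          class_weight=class_weight)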
Related
I am training a network on images for binary classification. The input images are normalized to pixel values in the range [0, 1], and the weight matrices are initialized from a normal distribution. However, the output of my last Dense layer with sigmoid activation differs only minutely between the two classes. For example:
output for class1 - 0.377525
output for class2 - 0.377539
The difference between the classes only shows up after the fourth decimal place. Is there any workaround to make sure that the output for class 1 falls between 0 and 0.5, and for class 2 between 0.5 and 1?
Edit:
I have tried both of the following cases:
Case 1 - Dense(1, 'sigmoid') with binary crossentropy
Case 2 - Dense(2, 'softmax') with binary crossentropy
For case 1, the output values differ by a very small amount, as described above. As a workaround, I take the mean of the predicted values as the classification threshold. This works to some extent, but it is not a permanent solution.
For case 2, the predictions collapse to one class only.
A sample code:
inputs = Input(shape=(128, 156, 1))
x = Conv2D(.....)(inputs)
x = BatchNormalization()(x)
x = MaxPooling2D()(x)
...
flat = Flatten()(x)
out = Dense(1, 'sigmoid')(flat)  # the head should consume the flattened features
model = Model(inputs, out)
model.compile(optimizer='adamax', loss='binary_crossentropy', metrics=['binary_accuracy'])
It seems you are confusing a binary classification architecture with a two-label multi-class classification setup.
Since you mention probabilities for the two classes, class1 and class2, you have set up a single-label multi-class problem: you are predicting the probabilities of 2 classes, where a sample can carry only one of the labels at a time.
In this setup it is proper to use softmax instead of sigmoid; your loss function can remain binary_crossentropy.
Right now, with the multi-label setup and sigmoid activation, you are independently predicting the probability of a sample being class1 and class2 simultaneously (i.e., multi-label multi-class classification).
Once you change to softmax, you should see a more significant difference between the probabilities, IF the sample actually definitively belongs to one of the 2 classes and if your model is well trained and confident about its predictions (compare validation vs. training results).
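A minimal sketch of that softmax head, with placeholder Conv2D settings (categorical_crossentropy with one-hot labels is the more common pairing for a 2-unit softmax, though Keras also accepts binary_crossentropy here):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(128, 156, 1))
x = layers.Conv2D(16, 3, activation="relu")(inputs)  # placeholder conv block
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
# A 2-unit softmax head: the two outputs are mutually exclusive
# probabilities that sum to 1, so a confident model separates them clearly.
out = layers.Dense(2, activation="softmax")(x)

model = keras.Model(inputs, out)
model.compile(optimizer="adamax", loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])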
First, the information you provided is insufficient to debug your problem exactly, because you did not include any code for your model or optimizer. I suspect there might be an error in the labels. I would also suggest using a softmax activation function instead of sigmoid in the final layer, although your current approach can still work, since a binary classification problem is conventionally handled with a single output node and a binary cross-entropy loss.
If you want an accurate solution, please provide more information.
https://www.tensorflow.org/tutorials/estimator/linear
I am following the TensorFlow documentation to implement a linear classifier, but I would like to use my own data instead of the tutorial's dataset. I just have a few general questions.
My dataset is as follows. It's not a time series.
row[0] - float (changed to binary, 0 = negative, 1 = positive) VALUE TO ESTIMATE
row[1] - string (categorical, changed to vocabulary, ints 1,2,3,4,5,6,7,8,9)
row[2-19] - float (positive and negative)
row[20-60] - ints (percentile ranks, ints 10,20,30,40,50,60,70,80,90)
row[61-95] - ints (binary 1, 0)
I started with 50k rows of data (45k for training), num_epochs=100, and batch_size=256.
{'accuracy': 0.8912, 'accuracy_baseline': 0.8932, 'auc': 0.7101819, 'auc_precision_recall': 0.2830853, 'average_loss': 0.30982444, 'label/mean': 0.1068, 'loss': 0.31013006, 'precision': 0.4537037, 'prediction/mean': 0.11840516, 'recall': 0.0917603, 'global_step': 17600}
Does the column I want to estimate need to be binary for this model?
Is it a bad idea to mix data types like this? Would it be necessary to normalize the data using something like preprocessing.Normalization?
Should I alter the epochs/batch size if I want to use more data?
The accuracy seems high, but the loss also seems quite high; why is that?
Any other suggestions?
Thanks for any help or advice.
Here are the answers to your questions.
By default, tf.estimator.LinearClassifier performs binary classification with n_classes=2, but you can configure more than 2 classes as well.
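A minimal sketch of setting it up, loosely mirroring the linked tutorial (the column names and dummy data are made-up assumptions, and tf.estimator has since been deprecated in newer TensorFlow releases):

import numpy as np
import tensorflow as tf

# Dummy data mirroring the question's layout (names are hypothetical).
n = 1000
features = {"category": np.random.randint(1, 10, n)}
for i in range(2, 20):
    features[f"f{i}"] = np.random.randn(n).astype(np.float32)
labels = np.random.randint(0, 2, n)  # the binary target from row[0]

categorical = tf.feature_column.categorical_column_with_vocabulary_list(
    "category", vocabulary_list=list(range(1, 10)))
numeric = [tf.feature_column.numeric_column(f"f{i}") for i in range(2, 20)]
feature_columns = [tf.feature_column.indicator_column(categorical)] + numeric

estimator = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    n_classes=2)  # the default; raise it for a multi-class target

def input_fn(num_epochs=100):
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    return ds.shuffle(n).batch(256).repeat(num_epochs)

estimator.train(input_fn)
print(estimator.evaluate(lambda: input_fn(num_epochs=1)))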
For a linear classifier, normalizing the data won't change accuracy much; the accuracy gain from normalization is usually larger for a non-linear classifier on the same data.
Watch how accuracy and loss change during training; if they stop improving for about 5-10 epochs, you can cap the number of epochs there. You can then repeat the same experiment while varying the batch size.
Accuracy and loss are not directly dependent on each other. Consider your binary case: a model that always predicts a probability of 0.51 for the true class has the same accuracy as one that predicts 0.99, but a far higher loss. The best model has high accuracy and low loss; good accuracy combined with high loss means the model made huge errors on a few examples.
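A quick numeric check of that claim, using plain log loss:

import math

# Both predictions are "correct" at a 0.5 threshold, so accuracy is
# identical, yet the log loss differs by nearly two orders of magnitude.
for p in (0.51, 0.99):
    print(f"p={p}  loss={-math.log(p):.4f}")
# p=0.51  loss=0.6733
# p=0.99  loss=0.0101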
Tune your model's hyperparameters based on several such observations; feeding in quality, preprocessed data is always the best way to reach high accuracy and low loss, and additional data is always good to have.
I trained a model for multiclass classification with three classes. In the first approach, I converted the classes into one-hot vectors and trained with the categorical crossentropy loss function, achieving a loss of 0.07 in 1000 epochs. But when I used the same approach without converting the classes into one-hot vectors and used sparse_categorical_crossentropy as the loss function, I achieved a loss of 0.05 in 1000 epochs. Does this mean that sparse_categorical_crossentropy is better than categorical_crossentropy?
Thank You!
You can't compare two loss functions in terms of their raw loss values, since the definition of the loss itself changed; compare performance on the same test dataset instead.
In general, use sparse_categorical_crossentropy when your classes are mutually exclusive and encoded as integer indices (each sample belongs to exactly one class), and categorical_crossentropy when your labels are one-hot vectors or soft probabilities (like [0.5, 0.3, 0.2]).
You got different losses because the representation of the labels changed; in Keras, sparse_categorical_crossentropy is in fact defined as categorical crossentropy with integer targets.
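That equivalence is easy to verify directly; a minimal sketch with the Keras loss functions:

import numpy as np
import tensorflow as tf

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]], dtype="float32")

one_hot = np.array([[1, 0, 0],
                    [0, 1, 0]], dtype="float32")  # one-hot targets
ints = np.array([0, 1])                           # same targets as indices

cce = tf.keras.losses.categorical_crossentropy(one_hot, probs)
scce = tf.keras.losses.sparse_categorical_crossentropy(ints, probs)
print(cce.numpy(), scce.numpy())  # identical: [0.3567 0.2231] both times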
I'm training a classification model and getting some weird metric values. The top-1 accuracy is fairly low and behaves relatively normally, but top_k_categorical_accuracy (with the default k=5) is often exactly 1.0000. This seems highly implausible given how low the top-1 accuracy is. What could be going on here?
This is due to the model predicting 0.00000000 for most categories of a given example. Keras uses in_top_k to compute the top_k_categorical_accuracy metric. From the documentation:
Note that the behavior of InTopK differs from the TopK op in its handling of ties; if multiple classes have the same prediction value and straddle the top-k boundary, all of those classes are considered to be in the top k.
So all predictions of 0 are tied, and they all count as part of the top "5" as long as 4 or fewer predictions are nonzero. This means that all 200 classes count as the top 5, and the metric reports 1.0000.
Relevant GitHub issue: #10767
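A minimal sketch of that tie behavior, using tf.math.in_top_k (the six-class example is made up):

import tensorflow as tf

# One example, six classes: two nonzero scores, the other four tie at 0.0.
preds = tf.constant([[0.9, 0.1, 0.0, 0.0, 0.0, 0.0]])
target = tf.constant([5])  # the true class scored 0.0

# With k=5 the zero-scored classes straddle the top-5 boundary, so all of
# them count as "in the top k" and the true class registers as a hit.
print(tf.math.in_top_k(target, preds, k=5))  # tf.Tensor([ True], ...)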
I've developed a neural network for classification, and I'm getting an accuracy of 0.93. The problem is that I'm predicting all zeros because of the distribution of the data.
How can I fix it? Should I switch from a neural network to another algorithm?
Thanks in advance
Edit: I've just checked, and my model predicts the same probability for every row.
The model is a NN with 5 layers and tf.nn.relu6 as the activation function. The cost function is tf.nn.sigmoid_cross_entropy_with_logits.
To predict the values I use:
predicted = tf.nn.sigmoid(Z5)                     # logits -> probabilities
correct_pred = tf.equal(tf.round(predicted), Y)   # threshold at 0.5
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
EDIT 2
I have 'fixed' the class-imbalance problem (undersampling the 0s and upsampling the 1s), but the net still predicts the same value for every row.
I have tried changing the activation function to tanh or sigmoid, but then the net outputs NaNs.
There are multiple solutions for imbalanced data. But first, accuracy is not a good metric for imbalanced data: if you had only 5 positives and 95 negatives, predicting all negatives would already give you 95% accuracy. You should check sensitivity and specificity, or other metrics that work well with imbalanced data, such as the LIFT score.
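For example, scikit-learn's classification_report shows per-class recall (sensitivity is the recall of the positive class), which exposes the all-zeros behavior that accuracy hides; the toy labels below are illustrative:

from sklearn.metrics import classification_report

y_true = [0] * 95 + [1] * 5   # 5 positives, 95 negatives
y_pred = [0] * 100            # the "all zeros" model

# Accuracy is 0.95, but recall for class 1 is 0.00.
print(classification_report(y_true, y_pred, zero_division=0))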
To train the model on imbalanced data, there are multiple solutions. One of them is to up-sample the minority class.
Up-sampling is the process of randomly duplicating observations from the minority class in order to reinforce its signal.
You can upsample the data with code like this:
import pandas as pd
from sklearn.utils import resample

# df is your DataFrame, with the label in the 'balance' column.
# Separate majority and minority classes
df_majority = df[df.balance == 0]
df_minority = df[df.balance == 1]

# Upsample minority class
df_minority_upsampled = resample(df_minority,
                                 replace=True,     # sample with replacement
                                 n_samples=576,    # to match majority class
                                 random_state=123) # reproducible results

# Combine majority class with upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display new class counts
df_upsampled.balance.value_counts()
# 1    576
# 0    576
# Name: balance, dtype: int64
You can find more information and other solutions that are well explained here.