can the accuracy measure val_acc be trusted?

can the accuracy measure val_acc be trusted? - python

Building a sequence
simple_seq= [x for x in list(range(1000)) if x % 3 == 0]
after reshape and split
x_train, x_test shape = (159, 5, 1)
y_train, y_test shape = (159, 2)
Model
model = Sequential(name='acc_test')
model.add(Conv1D(
kernel_size = 2,
filters= 128,
strides= 1,
use_bias= True,
activation= 'relu',
padding='same',
input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(AveragePooling1D(pool_size =(2), strides= [1]))
model.add(Flatten())
model.add(Dense(2))
optimizer = Adam(lr=0.001)
model.compile( optimizer= optimizer, loss= 'mse', metrics=['accuracy'])
Train
hist = model.fit(
x=x_train,
y = y_train,
epochs=100,
validation_split=0.2)
The Result:
Epoch 100/100
127/127 [==============================] - 0s 133us/sample - loss: 0.0096 - acc: 1.0000 - val_loss: 0.6305 - val_acc: 1.0000
But if using this model to predict:
x_test[-1:] = array([[[9981],
[9984],
[9987],
[9990],
[9993]]])
model.predict(x_test[-1:])
result is: array([[10141.571, 10277.236]], dtype=float32)
How can the vall_acc be 1 if the result is so far from the truth, result was
step 1 2
true [9996, 9999 ]
pred [10141.571, 10277.236]

Accuracy metric is only valid for classification tasks. Therefore, if you use accuracy as the metric in a regression tasks, the reported metric values may not be valid at all. From your code, I feel that you are having a regression task, so this shouldn't be used.
Below is a list of the metrics that you can use in Keras on regression problems.
Mean Squared Error: mean_squared_error, MSE or mse
Mean Absolute Error: mean_absolute_error, MAE, mae
Mean Absolute Percentage Error: mean_absolute_percentage_error, MAPE, mape
Cosine Proximity: cosine_proximity, cosine
You can read about some theory at link and see some keras exampler code at link.
Sorry, little short of time, but I am sure these links will really help you. :)

By the range of your true/predicted values and used loss - it seems like you're trying to solve regression problem, not classification.
So if I understood you correctly - you're trying to predict two numeric values based on input - instead of predicting, which of two classes is valid for these input.
If so - you shouldn't use accuracy metric. Because it'll just compare indices of maximal input for each input sample/prediction (a bit simplified). E.g. 9996 < 9999 and 10141.571 < 10277.236.

Related

Why is training-set accuracy during fit() different to accuracy calculated right after using predict on same data?

Have written a basic deep learning model in Tensorflow - Keras.
Why is the training-set accuracy as reported at the end of training (0.4097) differs to that reported directly afterwards with a direct calculation on the same training data using the predict function (or using evaluate, which gives the same number) = 0.6463?
MWE below; output directly after.
from extra_keras_datasets import kmnist
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
# Model configuration
no_classes = 10
# Load KMNIST dataset
(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='kmnist')
# Shape of the input sets
input_train_shape = input_train.shape
input_test_shape = input_test.shape
# Keras layer input shape
input_shape = (input_train_shape[1], input_train_shape[2], 1)
# Reshape the training data to include channels
input_train = input_train.reshape(input_train_shape[0], input_train_shape[1], input_train_shape[2], 1)
input_test = input_test.reshape(input_test_shape[0], input_test_shape[1], input_test_shape[2], 1)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
# Normalize input data
input_train = input_train / 255
input_test = input_test / 255
# Create the model
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(no_classes, activation='softmax'))
# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
optimizer=tensorflow.keras.optimizers.Adam(),
metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, target_train,
batch_size=2000,
epochs=1,
verbose=1)
prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))
model.evaluate(input_train, target_train, verbose=2)
Last couple of lines of output:
30/30 [==============================] - 0s 3ms/step - loss: 1.8336 - accuracy: 0.4097
Prediction accuracy = 0.6463166666666667
1875/1875 - 1s - loss: 1.3406 - accuracy: 0.6463
Edit.
The initial answers below have solved my first problem by pointing out that the batch size matters when you only run 1 epoch. When running small batch sizes (or batch size = 1), or more epochs, you can push the post-fitting prediction accuracy pretty close to the final accuracy thrown out in fitting itself. Which is good!
I originally asked this question because I was having trouble with a more complex model.
I'm still have trouble with understanding what's happening in this case (and yes, it involves batch normalisation). To get my MWE, replace everything below 'create the model' above with the code below to implement a few fully connected layers with batch normalisation.
When you run two epochs of this - you'll see really stable accuracies from all 30 mini-batches (30 because 60,000 in training set divided by 2000 in each batch). I see very consistently 83% accuracy across the whole second epoch of training.
But the prediction after fitting is an abysmal 10% or thereabouts after doing this. Can anyone explain this?
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = input_shape))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(no_classes, activation='softmax'))
# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
optimizer=tensorflow.keras.optimizers.Adam(),
metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, target_train,
batch_size=2000,
epochs=2,
verbose=1)
prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))
model.evaluate(input_train, target_train, verbose=2, batch_size = batch_size)
30/30 [==============================] - 46s 2s/step - loss: 0.5567 - accuracy: 0.8345
Prediction accuracy = 0.10098333333333333

One reason this can happen, is the last accuracy reported takes into account the entire epoch, with its parameters non constant, and still being optimized.
When evaluating the model, the parameters stop changing, and they remain in their final (hopefully, most optimized) state. Unlike during the last epoch, for which the parameters were in all kinds of (hopefully, less optimized) states, more so at the start of the epoch.
Deleted because I now see you didn't use batch norm in this case.
I am assuming this is due to BatchNormalization.
See for example here
During training, a moving average is used.
During inference, we already have the normalization parameters
This is likely to be the cause of the difference.
Please try without it, and see if still such drastic differences exist.

Just adding to #Gulzar answer: this effect can be very clear because OP used only one epoch (a lot of parameters are changing in the very beginning of trainning), batch size not equal in evaluate method (default to 32) and fit method, batch size lot less than whole data (meaning a lot of updating during each epoch).
Just adding more epochs to same experiment would attenuate this effect.
# Fit data to model
history = model.fit(input_train, target_train,
batch_size=2000,
epochs=40,
verbose=1)
Result
Epoch 40/40
30/30 [==============================] - 0s 11ms/step - loss: 0.5663 - accuracy: 0.8339
Prediction accuracy = 0.8348
1875/1875 - 2s - loss: 0.5643 - accuracy: 0.8348 - 2s/epoch - 1ms/step
[0.5643048882484436, 0.8348000049591064]

Classification for a case with multiple labels per data point [duplicate]

This question already has answers here:
How does Keras handle multilabel classification?
(2 answers)
Closed 2 years ago.
In classification problems in machine learning, typically we use a single label for a single data point. How can we go ahead with multiple labels for a single data point?
As an example, suppose a character recognition problem. As the labels for a single image of a letter, we have the encoded values for both the letter and the font family. Then there are two labels per data point.
How can we make a keras deep learning model for this? Which hyperparameters should be changed compared with a single labelled problem?

In short, you let the model output two predictions.
...
previous-to-last layer
/ \
label_1 label_2
Then you could do total_loss = loss_1(label_1) + loss_2(label_2).
With loss_1 and loss_2 of your choosing.
You'd then backpropagate the total_loss through the network to finetune the weights.
More in-depth example: https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff.

In comparison with a standard multi-class task, you just need to change your activation function to 'sigmoid':
import tensorflow as tf
from tensorflow.keras.layers import Dense
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
y = tf.one_hot(y, depth=3).numpy()
y[:, 0] = 1.
ds = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(25).batch(8)
model = tf.keras.Sequential([
Dense(16, activation='relu'),
Dense(32, activation='relu'),
Dense(3, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(ds, epochs=25)
Epoch 25/25
1/19 [>.............................] - ETA: 0s - loss: 0.0418 - acc: 1.0000
19/19 [==============================] - 0s 2ms/step - loss: 1.3129 - acc: 1.0000

Discrepancy in the results of model.evaluate and model.predict in Keras

I'm performing multi-class classification with three class labels in Keras. During training, both the training and validation losses were decreasing and accuracies were increasing. After training, I tested out the model on the training set as a sanity check and there seems to be a huge discrepancy between model.evaluate and model.predict. I did find some solutions that seemed to indicate this was an issue with BatchNorm and Dropout layers, but that shouldn't result in such a huge difference. The relevant code is as shown below.
model=Sequential()
model.add(Conv2D(32, (3, 3), padding="same",input_shape=input_shape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
.
.
model.add(Dense(n_classes))
model.add(Activation("softmax"))
optimizer=Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['categorical_accuracy'])
datagen = ImageDataGenerator(horizontal_flip=True, fill_mode='nearest')
train_datagen = datagen.flow(X_train, y_train, batch_size=batch_size)
val_datagen = ImageDataGenerator().flow(X_val, y_val, batch_size=batch_size)
history=model.fit(train_datagen, steps_per_epoch=math.ceil(nb_train_samples/batch_size), verbose=2, epochs=50, validation_data=val_datagen, validation_steps=math.ceil(nb_validation_samples/batch_size), class_weight=d_class_weights)
print('model.evaluate accuracy: ', model.evaluate(X_train, y_train, batch_size=batch_size)[1])
test_pred = model.predict(ImageDataGenerator().flow(X_train, y=None, batch_size=batch_size), steps=math.ceil(nb_train_samples/batch_size))
test_result=np.array(test_pred)
test_result = np.zeros(test_result.shape)
test_result[np.arange(len(test_pred)), test_pred.argmax(1)] = 1
total=0
count=0
for i in range(test_result.shape[0]):
total+=1
count+=(test_result[i]==y_train[i]).all()
print('model.predict accuracy: ', count/total)
The output I get is as follows:-
66/66 [==============================] - 12s 177ms/step - loss: 0.0010 - categorical_accuracy: 1.0000
model.evaluate accuracy: 1.0
model.predict accuracy: 0.42138063279002874
I've been trying to solve this for a while now and have failed to find anything. I'm already using categorical_crossentropy, categorical_accuracy, and softmax activation in the last layer, so I have no idea what's wrong. Any help would be greatly appreciated!

I finally found the solution, turns out that I'm only passing X_train into the predict function, and the shuffle parameter is True by default, because of which the predictions didn't correspond to the ground truth. Setting shuffle=False solved the problem.
test_pred = model.predict(ImageDataGenerator().flow(X_train, y=None, batch_size=batch_size, shuffle=False), steps=math.ceil(nb_train_samples/batch_size))

Big difference in accuracy after training the model and after loading that model

I made Keras NN model for fake news detection. My features are avg length of the words, avg length of the sentence, number of punctuation signs, number of capital words, number of questions etc. I have 34 features. I have one output, 0 and 1 (0 for fake and 1 for real news).
I have used 50000 samples for training, 10000 for testing and 2000 for validation. Values of my data are going from -1 to 10, so there is not big difference between values. I have used Standard Scaler like this:
x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.20, random_state=0)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
validation_features = scaler.transform(validation_features)
My NN:
model = Sequential()
model.add(Dense(34, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=0, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=64, validation_data=(validation_features, validation_results), verbose=2, callbacks=[es])
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))
Results:
Train on 50407 samples, validate on 2000 samples
Epoch 1/15
- 3s - loss: 0.3293 - acc: 0.8587 - val_loss: 0.2826 - val_acc: 0.8725
Epoch 2/15
- 1s - loss: 0.2647 - acc: 0.8807 - val_loss: 0.2629 - val_acc: 0.8745
Epoch 3/15
- 1s - loss: 0.2459 - acc: 0.8885 - val_loss: 0.2602 - val_acc: 0.8825
Epoch 4/15
- 1s - loss: 0.2375 - acc: 0.8930 - val_loss: 0.2524 - val_acc: 0.8870
Epoch 5/15
- 1s - loss: 0.2291 - acc: 0.8960 - val_loss: 0.2423 - val_acc: 0.8905
Epoch 6/15
- 1s - loss: 0.2229 - acc: 0.8976 - val_loss: 0.2495 - val_acc: 0.8870
12602/12602 [==============================] - 0s 21us/step
loss 23.95 acc 88.81
Accuracy check:
prediction = model.predict(validation_features , batch_size=64)
res = []
for p in prediction:
res.append(p[0].round(0))
# Accuracy with sklearn
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", acc_score) # 0.887
Saving the model:
model.save("new keras fake news acc 88.7.h5")
scaler_filename = "keras nn scaler.save"
joblib.dump(scaler, scaler_filename)
I have saved that model and that scaler.
When I load that model and that scaler, and when I want to make prediction, I get accuracy of 52%, and thats very low because I had accuracy of 88.7% when I was training that model.
I applied .transform on my new data for testing.
validation_df = pd.read_csv("validation.csv")
validation_features = validation_df.iloc[:,:-1]
validation_results = validation_df.iloc[:,-1].tolist()
scaler = joblib.load("keras nn scaler.save")
validation_features = scaler.transform(validation_features)
my_model_1 = load_model("new keras fake news acc 88.7.h5")
prediction = my_model_1.predict(validation_features , batch_size=64)
res = []
for p in prediction:
res.append(p[0].round(0))
# Accuracy with sklearn - much lower
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", round(acc_score,2)) # 0.52
Can you tell me what am I doing wrong, I have read a lot about this on github and stackoverflow but I couldnt find the answer?

It is difficult to answer that without your actual data. But there is a smoking gun, raising suspicions that your validation data might be (very) different from your training & test ones; and it comes from your previous question on this:
If i use fit_transform on my [validation set] features, I do not get an error, but I get accuracy of 52%, and that's terrible (because I had 89.1 %).
Although using fit_transform on the validation data is indeed wrong methodology (the correct one being what you do here), in practice, it should not lead to such a high discrepancy in the accuracy.
In other words, I have actually seen many cases where people erroneously apply such fit_transform approaches on their validation/deployment data, without never realizing any mistake in it, simply because they don't get any performance discrepancy - hence they are not alerted. And such a situation is expected, if indeed all these data are qualitatively similar.
But discrepancies such as yours here lead to strong suspicions that your validation data are actually (very) different from your training & test ones. If this is the case, such performance discrepancies are to be expected: the whole ML practice is founded upon the (often implicit) assumption that our data (training, validation, test, real-world deployment ones etc) do not change qualitatively, and they all come from the same statistical distribution.
So, the next step here is to perform an exploratory analysis to both your training & validation data to investigate this (actually, this is always assumed to be the step #0 in any predictive task). I guess that even elementary measures (mean & max/min values etc) will show if there are strong differences between them, as I suspect.
In particular, scikit-learn's StandardScaler uses
z = (x - u) / s
for the transformation, where u is the mean value and s the standard deviation of the data. If these values are significantly different between your training and validation sets, the performance discrepancy is not to be unexpected.

ResNet50 Model is not learning with transfer learning in keras

I am trying to perform transfer learning on ResNet50 model pretrained on Imagenet weights for PASCAL VOC 2012 dataset. As it is a multi label dataset, I am using sigmoid activation function in the final layer and binary_crossentropy loss. The metrics are precision,recall and accuracy. Below is the code I used to build the model for 20 classes (PASCAL VOC has 20 classes).
img_height,img_width = 128,128
num_classes = 20
#If imagenet weights are being loaded,
#input must have a static square shape (one of (128, 128), (160, 160), (192, 192), or (224, 224))
base_model = applications.resnet50.ResNet50(weights= 'imagenet', include_top=False, input_shape= (img_height,img_width,3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dropout(0.7)(x)
predictions = Dense(num_classes, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)
for layer in model.layers[-2:]:
layer.trainable=True
for layer in model.layers[:-3]:
layer.trainable=False
adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
#print(model.summary())
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y, random_state=42, test_size=0.2)
savingcheckpoint = ModelCheckpoint('ResnetTL.h5',monitor='val_loss',verbose=1,save_best_only=True,mode='min')
earlystopcheckpoint = EarlyStopping(monitor='val_loss',patience=10,verbose=1,mode='min',restore_best_weights=True)
model.fit(X_train, Y_train, epochs=epochs, validation_data=(X_test,Y_test), batch_size=batch_size,callbacks=[savingcheckpoint,earlystopcheckpoint],shuffle=True)
model.save_weights('ResnetTLweights.h5')
It ran for 35 epochs until earlystopping and the metrics are as follows (without Dropout layer):
loss: 0.1195 - accuracy: 0.9551 - precision_m: 0.8200 - recall_m: 0.5420 - val_loss: 0.3535 - val_accuracy: 0.8358 - val_precision_m: 0.0583 - val_recall_m: 0.0757
Even with Dropout layer, I don't see much difference.
loss: 0.1584 - accuracy: 0.9428 - precision_m: 0.7212 - recall_m: 0.4333 - val_loss: 0.3508 - val_accuracy: 0.8783 - val_precision_m: 0.0595 - val_recall_m: 0.0403
With dropout, for a few epochs, the model is reaching to a validation precision and accuracy of 0.2 but not above that.
I see that precision and recall of validation set is pretty low compared to training set with and without dropout layer. How should I interpret this? Does this mean the model is overfitting. If so, what should I do? As of now the model predictions are quite random (totally incorrect). The dataset size is 11000 images.

Please can you modify code as below and try to execute
From:
predictions = Dense(num_classes, activation= 'sigmoid')(x)
To:
predictions = Dense(num_classes, activation= 'softmax')(x)
From:
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
To:
model.compile(optimizer= adam, loss='categorical_crossentropy', metrics=['accuracy',precision_m,recall_m])

This question is pretty old, but I'll answer it in case it is helpful to someone else:
In this example, you froze all layers except by the last two (Global Average Pooling and the last Dense one). There is a cleaner way to achieve the same state:
rn50 = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
input_shape=(img_height, img_width, 3))
x = rn50.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)
rn50.trainable = False # <- this
model.compile(...)
In this case, features are being extracted from the ResNet50 network and fed to a linear softmax classifier, but the ResNet50's weights are not being trained. This is called feature extraction, not fine-tuning.
The only weights being trained are from your classifier, which was instantiated with weights drawn from a random distribution, and thus should be entirely trained. You should be using Adam with its default learning rate:
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001))
So you can train it for a few epochs, and, once it's done, then you unfreeze the backbone and "fine-tune" it:
backbone.trainable = False
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001))
model.fit(epochs=50)
backbone.trainable = True
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.00001))
model.fit(epochs=60, initial_epoch=50)
There is a nice article about this on Keras website: https://keras.io/guides/transfer_learning/

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

can the accuracy measure val_acc be trusted? - python

Related

Why is training-set accuracy during fit() different to accuracy calculated right after using predict on same data?

Classification for a case with multiple labels per data point [duplicate]

Discrepancy in the results of model.evaluate and model.predict in Keras

Big difference in accuracy after training the model and after loading that model

ResNet50 Model is not learning with transfer learning in keras

Categories

Resources