Classification for a case with multiple labels per data point [duplicate]

This question already has answers here:
How does Keras handle multilabel classification?
(2 answers)
In machine-learning classification problems we typically use a single label per data point. How can we handle multiple labels for a single data point?
As an example, consider a character recognition problem. For a single image of a letter, the labels are the encoded values of both the letter and the font family, so there are two labels per data point.
How can we build a Keras deep learning model for this? Which hyperparameters should change compared with a single-label problem?

In short, you let the model output two predictions.
...
    previous-to-last layer
        /           \
   label_1        label_2
Then you could compute total_loss = loss_1(label_1) + loss_2(label_2), with loss_1 and loss_2 of your choosing.
You'd then backpropagate the total_loss through the network to fine-tune the weights.
More in-depth example: https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff.
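A minimal sketch of such a two-headed model with the Keras functional API (the layer sizes and the 26-letter / 10-font class counts are illustrative assumptions, not taken from the question):
import tensorflow as tf
from tensorflow.keras import layers, Model

# Shared trunk (sizes are arbitrary, for illustration only)
inputs = layers.Input(shape=(64,))
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)

# Two heads: e.g. 26 letter classes and 10 font families (assumed numbers)
letter_out = layers.Dense(26, activation='softmax', name='letter')(x)
font_out = layers.Dense(10, activation='softmax', name='font')(x)

model = Model(inputs=inputs, outputs=[letter_out, font_out])

# Keras sums the per-output losses into the total loss (optionally weighted)
model.compile(optimizer='adam',
              loss={'letter': 'sparse_categorical_crossentropy',
                    'font': 'sparse_categorical_crossentropy'},
              loss_weights={'letter': 1.0, 'font': 1.0})

# model.fit(X, {'letter': y_letter, 'font': y_font}, epochs=10)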

In comparison with a standard multi-class task, you just need to change the final activation to 'sigmoid' and the loss to 'binary_crossentropy':
import tensorflow as tf
from tensorflow.keras.layers import Dense
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
y = tf.one_hot(y, depth=3).numpy()
y[:, 0] = 1.  # make class 0 always active, so every sample carries two labels
ds = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(25).batch(8)

model = tf.keras.Sequential([
    Dense(16, activation='relu'),
    Dense(32, activation='relu'),
    Dense(3, activation='sigmoid')])  # sigmoid: independent per-label probabilities

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(ds, epochs=25)
Epoch 25/25
1/19 [>.............................] - ETA: 0s - loss: 0.0418 - acc: 1.0000
19/19 [==============================] - 0s 2ms/step - loss: 1.3129 - acc: 1.0000
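To turn the sigmoid outputs into label sets at prediction time, a common approach is to threshold each output independently (a sketch; the 0.5 threshold is an assumption and can be tuned per label):
import numpy as np

probs = model.predict(X)                   # shape (n_samples, 3): one probability per label
pred_labels = (probs >= 0.5).astype(int)   # each sample may have several active labels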

Related

Why is training-set accuracy during fit() different to accuracy calculated right after using predict on same data?

I have written a basic deep learning model in TensorFlow/Keras.
Why does the training-set accuracy reported at the end of training (0.4097) differ from the accuracy calculated directly afterwards on the same training data using the predict function (or using evaluate, which gives the same number), namely 0.6463?
MWE below; output directly after.
from extra_keras_datasets import kmnist
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
# Model configuration
no_classes = 10
# Load KMNIST dataset
(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='kmnist')
# Shape of the input sets
input_train_shape = input_train.shape
input_test_shape = input_test.shape
# Keras layer input shape
input_shape = (input_train_shape[1], input_train_shape[2], 1)
# Reshape the training data to include channels
input_train = input_train.reshape(input_train_shape[0], input_train_shape[1], input_train_shape[2], 1)
input_test = input_test.reshape(input_test_shape[0], input_test_shape[1], input_test_shape[2], 1)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
# Normalize input data
input_train = input_train / 255
input_test = input_test / 255
# Create the model
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(no_classes, activation='softmax'))
# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, target_train,
                    batch_size=2000,
                    epochs=1,
                    verbose=1)
prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))
model.evaluate(input_train, target_train, verbose=2)
Last couple of lines of output:
30/30 [==============================] - 0s 3ms/step - loss: 1.8336 - accuracy: 0.4097
Prediction accuracy = 0.6463166666666667
1875/1875 - 1s - loss: 1.3406 - accuracy: 0.6463
Edit.
The initial answers below have solved my first problem by pointing out that the batch size matters when you only run 1 epoch. With small batch sizes (or batch size = 1), or more epochs, you can push the post-fitting prediction accuracy pretty close to the final accuracy reported during fitting itself. Which is good!
I originally asked this question because I was having trouble with a more complex model.
I'm still having trouble understanding what's happening in this case (and yes, it involves batch normalisation). To get my MWE, replace everything below 'Create the model' above with the code below, which implements a few fully connected layers with batch normalisation.
When you run two epochs of this, you'll see really stable accuracies across all 30 mini-batches (30 because the 60,000 training samples are split into batches of 2,000). I consistently see about 83% accuracy across the whole second epoch of training.
But the prediction after fitting is an abysmal 10% or thereabouts. Can anyone explain this?
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = input_shape))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(no_classes, activation='softmax'))
# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, target_train,
                    batch_size=2000,
                    epochs=2,
                    verbose=1)
prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))
model.evaluate(input_train, target_train, verbose=2, batch_size=2000)
30/30 [==============================] - 46s 2s/step - loss: 0.5567 - accuracy: 0.8345
Prediction accuracy = 0.10098333333333333
One reason this can happen is that the accuracy reported for the last epoch is an average over the entire epoch, during which the parameters were not constant and were still being optimized.
When evaluating the model afterwards, the parameters have stopped changing and remain in their final (hopefully most optimized) state, unlike during the last epoch, where the parameters passed through all kinds of (hopefully less optimized) states, especially at the start of the epoch.
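One way to see this directly, as a sketch reusing the names from the MWE above: re-evaluate on the full training set with frozen weights at the end of each epoch and compare with the epoch-averaged metric that fit() logs.
import tensorflow

class EndOfEpochEval(tensorflow.keras.callbacks.Callback):
    """Re-evaluate on the full training set with frozen weights after each epoch."""
    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        loss, acc = self.model.evaluate(self.x, self.y, verbose=0)
        print(f"  end-of-epoch eval: loss={loss:.4f}, acc={acc:.4f} "
              f"(epoch-averaged acc logged by fit: {logs['accuracy']:.4f})")

# history = model.fit(input_train, target_train, batch_size=2000, epochs=1,
#                     callbacks=[EndOfEpochEval(input_train, target_train)])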
I am assuming this is due to BatchNormalization.
See for example here.
During training, the layer normalizes with the statistics of the current batch and accumulates moving averages; during inference, it uses those accumulated moving statistics instead.
This is likely to be the cause of the difference.
Please try without it, and see if such drastic differences still exist.
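A quick way to test this hypothesis, as a sketch assuming the batch-norm model above is still in scope: run the same batch through the model in training mode and in inference mode and compare; a large gap points at the normalization statistics.
import numpy as np

batch = input_train[:2000]
train_mode_out = model(batch, training=True).numpy()    # normalizes with batch statistics
infer_mode_out = model(batch, training=False).numpy()   # normalizes with moving averages

print("train-mode acc:", np.mean(np.argmax(train_mode_out, axis=1) == target_train[:2000]))
print("infer-mode acc:", np.mean(np.argmax(infer_mode_out, axis=1) == target_train[:2000]))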
Just adding to @Gulzar's answer: this effect can be very pronounced because the OP used only one epoch (a lot of parameters change at the very beginning of training), the batch size in the evaluate method (default 32) differs from that in the fit method, and the batch size is much smaller than the whole dataset (meaning a lot of updates during each epoch).
Just adding more epochs to the same experiment attenuates this effect.
# Fit data to model
history = model.fit(input_train, target_train,
                    batch_size=2000,
                    epochs=40,
                    verbose=1)
Result
Epoch 40/40
30/30 [==============================] - 0s 11ms/step - loss: 0.5663 - accuracy: 0.8339
Prediction accuracy = 0.8348
1875/1875 - 2s - loss: 0.5643 - accuracy: 0.8348 - 2s/epoch - 1ms/step
[0.5643048882484436, 0.8348000049591064]

Tensorflow model not improving

I'm just starting to learn ML/TensorFlow/etc., so I'm pretty much a novice and still don't really know what the troubleshooting process is like. I'm currently having an issue with my model: it doesn't seem to ever improve. For instance, the output looks like
Epoch 1/10
4/4 [==============================] - 41s 10s/step - loss: 0.8833 - accuracy: 0.4300
Epoch 2/10
4/4 [==============================] - 12s 3s/step - loss: 0.8833 - accuracy: 0.4300
Epoch 3/10
4/4 [==============================] - 10s 3s/step - loss: 0.8833 - accuracy: 0.4300
Epoch 7/1000
4/4 [==============================] - 10s 3s/step - loss: 0.8833 - accuracy: 0.4300
The main aspect that worries me is that it doesn't change at all, which makes me think I'm doing something completely wrong. To give some more context and code: I am trying to do some time series classification. Basically, the input is the normalized time series of a song, and the net should classify whether it's classical music (an output of 1 means it is, 0 means it isn't).
This is the current model I am trying.
model = keras.Sequential([
    keras.layers.Conv1D(filters=100, kernel_size=10000, strides=5000, input_shape=(1323000, 1), activation='relu'),
    keras.layers.Conv1D(filters=100, kernel_size=10, strides=3, input_shape=(263, 100), activation='relu'),
    keras.layers.LSTM(1000),
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dense(250, activation='relu'),
    keras.layers.Dense(1, activation='softmax')
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
This is how I get the training data (x and y are dictionaries with time series from different songs).
minute = 1323000
x_train = np.zeros((100, minute, 1))
y_train = np.zeros((100,))
for kk in range(0, 100):
    num = randint(0, 41)
    ts = x[num]
    start = randint(0, len(ts) - minute)
    x_train[kk, :] = np.array([ts[start:(start + minute)]]).T
    y_train[kk] = 1 - y[num]
and then training:
for kk in range(1, 1000):
    x_train, y_train = create_training_set(x, y)
    model.fit(x_train, y_train, epochs=1000)
I looked at some similar questions, but I was already doing what was suggested or the advice was too specific to the asker. I also tried some relatively different models/activations, so I don't think it's because the model is too complicated, and the data is already normalized, so that shouldn't be an issue. But, as I said, I'm new to all this and could be wrong.
The usage of a softmax activation in a single-node last layer is not correct. Additionally, the argument from_logits=True in your loss definition means that the model is expected to produce logits, not probabilities (which are generally produced by sigmoid and softmax final activations).
So, you should change your last layer to
keras.layers.Dense(1) # linear activation by default
Alternatively, you could change both your last layer and your loss function respectively to
keras.layers.Dense(1, activation='sigmoid')
loss=tf.keras.losses.BinaryCrossentropy(from_logits=False)
According to the docs, usage of from_logits=True may be more numerically stable, and probably this is the reason it is preferred in the standard Tensorflow classification tutorials (see here and here).
So I can't promise this is going to work, because I don't know your data and this is a very weird architecture, but here are a few things that seem wrong:
The last dense layer should have a sigmoid activation function
from_logits should be False
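For completeness, a minimal sketch of the from_logits=True route; the small LSTM head below is an illustrative stand-in for the question's architecture, and the threshold choices are assumptions:
import tensorflow as tf
from tensorflow import keras

# Simplified head for illustration; the question's Conv1D/LSTM front end is omitted
model = keras.Sequential([
    keras.layers.Input(shape=(263, 100)),
    keras.layers.LSTM(100),
    keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(1)  # no activation: the model outputs logits
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              # threshold=0.0 because a logit of 0 corresponds to probability 0.5
              metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)])

# At prediction time, map logits to probabilities explicitly:
# probs = tf.sigmoid(model.predict(x_batch)).numpy()
# is_classical = (probs >= 0.5).astype(int)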

Big difference in accuracy after training the model and after loading that model

I made a Keras NN model for fake news detection. My features are the avg length of the words, avg length of the sentences, number of punctuation signs, number of capitalized words, number of questions, etc. I have 34 features. I have one output, 0 or 1 (0 for fake and 1 for real news).
I used 50,000 samples for training, 10,000 for testing and 2,000 for validation. The values of my data go from -1 to 10, so there is not a big difference between values. I used StandardScaler like this:
x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.20, random_state=0)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
validation_features = scaler.transform(validation_features)
My NN:
model = Sequential()
model.add(Dense(34, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=0, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=64, validation_data=(validation_features, validation_results), verbose=2, callbacks=[es])
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))
Results:
Train on 50407 samples, validate on 2000 samples
Epoch 1/15
- 3s - loss: 0.3293 - acc: 0.8587 - val_loss: 0.2826 - val_acc: 0.8725
Epoch 2/15
- 1s - loss: 0.2647 - acc: 0.8807 - val_loss: 0.2629 - val_acc: 0.8745
Epoch 3/15
- 1s - loss: 0.2459 - acc: 0.8885 - val_loss: 0.2602 - val_acc: 0.8825
Epoch 4/15
- 1s - loss: 0.2375 - acc: 0.8930 - val_loss: 0.2524 - val_acc: 0.8870
Epoch 5/15
- 1s - loss: 0.2291 - acc: 0.8960 - val_loss: 0.2423 - val_acc: 0.8905
Epoch 6/15
- 1s - loss: 0.2229 - acc: 0.8976 - val_loss: 0.2495 - val_acc: 0.8870
12602/12602 [==============================] - 0s 21us/step
loss 23.95 acc 88.81
Accuracy check:
prediction = model.predict(validation_features, batch_size=64)
res = []
for p in prediction:
    res.append(p[0].round(0))
# Accuracy with sklearn
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", acc_score) # 0.887
Saving the model:
model.save("new keras fake news acc 88.7.h5")
scaler_filename = "keras nn scaler.save"
joblib.dump(scaler, scaler_filename)
I have saved that model and that scaler.
When I load that model and that scaler and want to make a prediction, I get an accuracy of 52%, which is very low, because I had an accuracy of 88.7% when I was training the model.
I applied .transform on my new data for testing.
validation_df = pd.read_csv("validation.csv")
validation_features = validation_df.iloc[:,:-1]
validation_results = validation_df.iloc[:,-1].tolist()
scaler = joblib.load("keras nn scaler.save")
validation_features = scaler.transform(validation_features)
my_model_1 = load_model("new keras fake news acc 88.7.h5")
prediction = my_model_1.predict(validation_features, batch_size=64)
res = []
for p in prediction:
    res.append(p[0].round(0))
# Accuracy with sklearn - much lower
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", round(acc_score,2)) # 0.52
Can you tell me what I am doing wrong? I have read a lot about this on GitHub and Stack Overflow, but I couldn't find the answer.
It is difficult to answer that without your actual data. But there is a smoking gun, raising suspicions that your validation data might be (very) different from your training & test ones; and it comes from your previous question on this:
If i use fit_transform on my [validation set] features, I do not get an error, but I get accuracy of 52%, and that's terrible (because I had 89.1 %).
Although using fit_transform on the validation data is indeed wrong methodology (the correct one being what you do here), in practice, it should not lead to such a high discrepancy in the accuracy.
In other words, I have actually seen many cases where people erroneously apply such fit_transform approaches to their validation/deployment data without ever realizing the mistake, simply because they don't get any performance discrepancy - hence they are not alerted. And such a situation is indeed expected if all these data are qualitatively similar.
But discrepancies such as yours here lead to strong suspicions that your validation data are actually (very) different from your training & test ones. If this is the case, such performance discrepancies are to be expected: the whole ML practice is founded upon the (often implicit) assumption that our data (training, validation, test, real-world deployment ones etc) do not change qualitatively, and they all come from the same statistical distribution.
So, the next step here is to perform an exploratory analysis of both your training & validation data to investigate this (actually, this is always assumed to be step #0 in any predictive task). I guess that even elementary measures (mean & max/min values etc.) will show if there are strong differences between them, as I suspect.
In particular, scikit-learn's StandardScaler uses
z = (x - u) / s
for the transformation, where u is the mean value and s the standard deviation of the data. If these values are significantly different between your training and validation sets, the performance discrepancy is not unexpected.
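A quick way to run that check, as a sketch assuming scaler is the fitted StandardScaler and validation_df is the CSV loaded above:
import pandas as pd

# Per-feature statistics the (already fitted) StandardScaler learned from the training data
train_stats = pd.DataFrame({'train_mean': scaler.mean_, 'train_std': scaler.scale_})

# The same statistics computed on the raw (unscaled) validation features
raw_val = validation_df.iloc[:, :-1]   # validation features as loaded, before scaler.transform
val_stats = pd.DataFrame({'val_mean': raw_val.mean().values,
                          'val_std': raw_val.std().values})

# Large per-feature differences here would explain the accuracy drop
print(pd.concat([train_stats, val_stats], axis=1))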

can't train Tensorflow on a simple dataset

I'm trying to learn a bit about Tensorflow/Machine Learning. As a starting point, I'm trying to create a model that is trained on a simple 1-D function (y=x^2) and see how it behaves for other inputs outside of the training range.
The problem I'm having is that the training function doesn't really ever improve. I'm sure it's due to a lack of understanding and/or misconfiguration on my part, but there really doesn't seem to be any sort of "baby's first machine learning" out there that deals with a dataset of a known form.
My code is pretty simple, and is borrowed from TensorFlow's introduction notebook here
import tensorflow as tf
import numpy as np
# Load the dataset
x_train = np.linspace(0,10,1000)
y_train = np.power(x_train,2.0)
x_test = np.linspace(8,12,100)
y_test = np.power(x_test,2.0)
# (x_train, y_train), (x_test, y_test) = mnist.load_data()
# x_train, x_test = x_train / 255.0, x_test / 255.0
"""Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:"""
from tensorflow.keras import layers
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])
"""Train and evaluate the model:"""
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
and I get output like this:
Train on 1000 samples
Epoch 1/5
1000/1000 [==============================] - 0s 489us/sample - loss: 1996.3631 - mae: 33.2543
Epoch 2/5
1000/1000 [==============================] - 0s 36us/sample - loss: 1996.3540 - mae: 33.2543
Epoch 3/5
1000/1000 [==============================] - 0s 36us/sample - loss: 1996.3495 - mae: 33.2543
Epoch 4/5
1000/1000 [==============================] - 0s 33us/sample - loss: 1996.3474 - mae: 33.2543
Epoch 5/5
1000/1000 [==============================] - 0s 38us/sample - loss: 1996.3450 - mae: 33.2543
100/1 - 0s - loss: 15546.3655 - mae: 101.2603
Like I said, I'm positive that this is a misconfiguration/lack of understanding on my part. I really learn best when I can take something this simple and incrementally make it more complex rather than starting on something whose patterns I can't readily identify, but I can't find any tutorials, etc that take this approach. Can anyone recommend either a good tutorial source, or just educate me on what I am doing wrong here?
I think you have a mix of problems here. I will try to explain them to you one by one:
First of all, the problem you want to solve is to learn the function f = x^2, so this is a regression task. For a regression task (and any other task ^_^) you should pay attention to the activation function and also to what you are really trying to predict.
You have chosen softmax as the activation function, which does not make sense at all. I suggest replacing it with a linear activation function (if you remove it completely, it will be treated as linear automatically by TF/Keras).
On the other hand, why do you have a Dense layer of size 10 as the last layer? For each entry you want to predict a single value (for an input of 5 you want to predict 25, right?),
so a single Dense unit should be enough to generate your value.
Also, since your network is not big, I would start with SGD as the optimizer, but Adam might be good as well. Additionally, for the problem you are trying to solve, I do not believe you really need 128 units in the first hidden layer. You can start with a smaller number and see how it goes; I would start with 3-4 units.
Long story short, let's replace your model with these lines, and hopefully it gets working:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)
])
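To make this runnable end to end, a hedged sketch of the compile/fit step (the loss and optimizer follow the advice above; the reshape and epoch count are my own assumptions):
# Reshape inputs to (n_samples, 1) so the Dense layers see an explicit feature dimension
x_train_2d = x_train.reshape(-1, 1)
x_test_2d = x_test.reshape(-1, 1)

model.compile(optimizer='adam',   # or tf.keras.optimizers.SGD(), as suggested above
              loss='mse',
              metrics=['mae'])

model.fit(x_train_2d, y_train, epochs=100, verbose=0)  # epoch count is arbitrary here
model.evaluate(x_test_2d, y_test, verbose=2)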

Training accuracy while fitting the model not reflected in confusion matrix

I am training an image classification model to classify certain images containing mountain, waterfall and people classes. Currently, I am using VGG-Net (transfer learning) to train on this data. While training, I get almost 93% accuracy on the training data and 90% accuracy on the validation data. However, when I check the classes being wrongly classified during training by computing a confusion matrix on the training data, the accuracy seems to be much lower: only about 30% of the data is classified correctly.
I have tried checking the confusion matrix on other, similar image classification problems, and there it seems to show the same accuracy seen while training.
Code to create ImageDataGenerator objects for training and validation
from keras.applications.inception_v3 import preprocess_input, decode_predictions
#Train DataSet Generator with Augmentation
print("\nTraining Data Set")
train_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
train_flow = train_generator.flow(
    train_images, train_labels,
    batch_size=BATCH_SIZE
)
#Validation DataSet Generator with Augmentation
print("\nValidation Data Set")
val_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
val_flow = val_generator.flow(
    validation_images, validation_labels,
    batch_size=BATCH_SIZE
)
Code to build the model and compile it
# Initialize VGG16 with transfer learning
base_model = applications.vgg16.VGG16(weights='imagenet',
                                      include_top=False,
                                      input_shape=(WIDTH, HEIGHT, 3))
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# and a dense layer
x = Dense(1024, activation='relu')(x)
predictions = Dense(len(categories), activation='softmax')(x)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional VGG16 layers
for layer in base_model.layers:
    layer.trainable = False
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer=optimizers.RMSprop(lr=1e-4), metrics=['accuracy'], loss='categorical_crossentropy')
model.summary()
Code to fit the model to the training dataset and validate using validation dataset
import math
top_layers_file_path=r"/content/drive/My Drive/intel-image-classification/intel-image-classification/top_layers.iv3.hdf5"
checkpoint = ModelCheckpoint(top_layers_file_path, monitor='loss', verbose=1, save_best_only=True, mode='min')
tb = TensorBoard(log_dir=r'/content/drive/My Drive/intel-image-classification/intel-image-classification/logs', batch_size=BATCH_SIZE, write_graph=True, update_freq='batch')
early = EarlyStopping(monitor="loss", mode="min", patience=5)
csv_logger = CSVLogger(r'/content/drive/My Drive/intel-image-classification/intel-image-classification/logs/iv3-log.csv', append=True)
history = model.fit_generator(train_flow,
                              epochs=30,
                              verbose=1,
                              validation_data=val_flow,
                              validation_steps=math.ceil(validation_images.shape[0]/BATCH_SIZE),
                              steps_per_epoch=math.ceil(train_images.shape[0]/BATCH_SIZE),
                              callbacks=[checkpoint, early, tb, csv_logger])
training steps show the following accuracy:
Epoch 1/30
91/91 [==============================] - 44s 488ms/step - loss: 0.6757 - acc: 0.7709 - val_loss: 0.4982 - val_acc: 0.8513
Epoch 2/30
91/91 [==============================] - 32s 349ms/step - loss: 0.4454 - acc: 0.8395 - val_loss: 0.3980 - val_acc: 0.8557
.
.
Epoch 20/30
91/91 [==============================] - 32s 349ms/step - loss: 0.2026 - acc: 0.9238 - val_loss: 0.2684 - val_acc: 0.8940
.
.
Epoch 30/30
91/91 [==============================] - 32s 349ms/step - loss: 0.1739 - acc: 0.9364 - val_loss: 0.2616 - val_acc: 0.8984
Ran predictions on the training dataset itself:
import math
import numpy as np
predictions = model.predict_generator(
    train_flow,
    verbose=1,
    steps=math.ceil(train_images.shape[0]/BATCH_SIZE))
predicted_classes = [x[0] for x in enc.inverse_transform(predictions)]
true_classes = [x[0] for x in enc.inverse_transform(train_labels)]
enc - OneHotEncoder
However, the confusion matrix looks like the following:
import sklearn
sklearn.metrics.confusion_matrix(
    true_classes,
    predicted_classes,
    labels=['mountain', 'people', 'waterfall'])
Confusion matrix:
[[315, 314, 283],
 [334, 309, 273],
 [337, 280, 263]]
The complete code has been uploaded on https://nbviewer.jupyter.org/github/paridhichoudhary/scene-image-classification/blob/master/Classification_v7.ipynb
I believe it's because train_generator.flow has shuffle=True by default. This results in predicted_classes not matching train_labels.
Setting shuffle=False on train_generator.flow should help (a sketch of that appears below), or alternatively something like the following might be easier to understand:
predicted_classes = []
true_classes = []
for i in range(len(train_flow)):  # len(train_flow) replaces math.ceil(train_images.shape[0]/BATCH_SIZE)
    x, y = train_flow[i]          # indexing the generator keeps x and y aligned
    z = model.predict(x)
    for j in range(x.shape[0]):
        predicted_classes.append(np.argmax(z[j]))  # argmax of the softmax output -> predicted class index
        true_classes.append(np.argmax(y[j]))       # labels are one-hot, so argmax gives the true class index
I didn't try this yet but it should work though.
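A hedged sketch of the first suggestion (shuffle=False), reusing the question's enc, BATCH_SIZE and generator; because the order is preserved, the question's original confusion-matrix call applies unchanged:
import math
import numpy as np
from sklearn.metrics import confusion_matrix

# Recreate the flow without shuffling so predictions line up with train_labels
ordered_flow = train_generator.flow(train_images, train_labels,
                                    batch_size=BATCH_SIZE, shuffle=False)

predictions = model.predict_generator(ordered_flow,
                                      steps=math.ceil(train_images.shape[0] / BATCH_SIZE))

predicted_classes = [x[0] for x in enc.inverse_transform(predictions)]
true_classes = [x[0] for x in enc.inverse_transform(train_labels)]

print(confusion_matrix(true_classes, predicted_classes,
                       labels=['mountain', 'people', 'waterfall']))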
