Linear Classifier TensorFlow 2 not training (1 neuron model) - python

I'm currently working on the CIFAR-10 Dataset which is an image classification problem with 10 classes.
I have started to develop a linear classifier with TensorFlow 2, without using the LinearClassifier object.
X shape corresponds to 10,000 images of 32*32 RGB pixels = (10000, 3072)
Y_one_hot is a one hot vector = (10000, 10)
model creation code:
model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax", input_dim=1))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])
training code:
model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)
predict code:
img = X[0].reshape(1, 3072) # Select image 0
res = np.argmax((model.predict(img))) # select the max in output
Problem:
res value is always the same. It seems my model is not learning.
Model.summary
Summary displays:
dense (Dense) (None, 1) 3073
dense_1 (Dense) (None, 10) 20
Total params: 3,093
Trainable params: 3,093
Non-trainable params: 0
Accuracy & loss:
Epoch 1/100
10000/10000 [==============================] - 2s 184us/sample - loss: 0.0949 - accuracy: 0.1005
Epoch 50/100
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0901 - accuracy: 0.1000
Epoch 100/100
10000/10000 [==============================] - 0s 8us/sample - loss: 0.0901 - accuracy: 0.1027
Do you have any idea why my model is always predicting the same value?
Thanks,

One remark:
The loss you used, loss="mean_squared_error", is not meant for classification; it is meant for regression. These are two very different problems. Try a cross-entropy loss instead. For example:
model.compile(optimizer=AdamOpt,
              loss='categorical_crossentropy', metrics=['accuracy'])
You can find an example here: https://github.com/michelucci/oreilly-london-ai/blob/master/day1/Beginner%20friendly%20networks/First_Example_of_a_CNN_(CIFAR10).ipynb. It is a notebook I used for a training session I gave. The network is a CNN, but you can replace it with yours.
Try that...
Best of luck, Umberto
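A minimal sketch of that change, keeping the question's two-layer architecture and only swapping the loss (the explicit optimizer instance and the normalization hint in the last comment are assumptions, not part of the original post):

import tensorflow as tf
from tensorflow.keras.layers import Dense

model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32 * 32 * 3))
model.add(Dense(10, activation="softmax"))

# Cross-entropy loss for one-hot labels, instead of mean squared error.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(X, Y_one_hot, batch_size=10000, epochs=100)  # scaling X to [0, 1] first usually helps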

Related

Error while using VGG16 pretrained model for grayscale images

I am working on sign language detection using the VGG16 pre-trained model with grayscale images. When I try to run the model.fit command, I get the following error.
CLARIFICATION
I already have the images in RGB form, but I want to use them as grayscale to check whether they would work. The reason is that with color images I am not getting the accuracy I expect: the test accuracy is at most 40% and the model overfits the dataset.
Also, this is my model command
vgg = VGG16(input_shape= [128, 128] + [3], weights='imagenet', include_top=False)
This is my model.fit command
history = model.fit(
    train_x,
    train_y,
    epochs=15,
    validation_data=(test_x, test_y),
    callbacks=[early_stop, checkpoint],
    batch_size=32, shuffle=True)
I am new to working with pre-trained models. When I run the code with color images (3 channels), my model overfits and val_accuracy doesn't rise above 40%, so I want to try grayscale images. I have already added many data augmentation techniques, but accuracy is not improving. Any leads are welcome, as I have been stuck on this for a long time now.
The simplest (and likely fastest) solution I can think of is to just convert your images to RGB. You can do this as part of your model.
model = Sequential([
    tf.keras.layers.Lambda(tf.image.grayscale_to_rgb),
    vgg
])
This will fix your issue with VGG. I also see that you're missing the last dimensionality for your images. Images in grayscale are expected to be of shape [height, width, 1], but you simply have [height, width]. You can fix this using tf.expand_dims:
model = Sequential([
    tf.keras.layers.Lambda(
        lambda x: tf.image.grayscale_to_rgb(tf.expand_dims(x, -1))
    ),
    vgg,
])
Note that this solution solves the problem in the graph, so it runs online. Meaning, at runtime, you can feed data exactly the same way you have it now (in the shape [128, 128], without a channels dimension) and it will still functionally work. If this is your expected dimensionality during runtime, this will be faster than manipulating your data before throwing it into the model.
By the way, none of this is ideal, given that VGG was trained specifically to work best with color images. Just thought I should add that.
Why are you getting overfitting?
Maybe for different reasons:
Your images and labels are not distributed evenly across the train, validation, and test sets (maybe you have images in train that do not appear in test), or your train/validation/test split is not stratified correctly, so you train your model on a specific region of your data and features.
Your dataset is very small and you need more data.
Maybe you have noise in your dataset; first make sure to remove noise from the dataset (if you have noise, the model fits your noise).
How can you input grayscale images to VGG16?
To use VGG16, you need to input 3-channel images. For this reason, you need to concatenate your grayscale image with itself, as below, to get a three-channel image:
image = tf.concat([image, image, image], -1)
Example of training VGG16 on grayscale images from the fashion_mnist dataset:
from tensorflow.keras.applications.vgg16 import VGG16
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
train, val, test = tfds.load(
    'fashion_mnist',
    shuffle_files=True,
    as_supervised=True,
    split=['train[:85%]', 'train[85%:]', 'test']
)

def resize_preprocess(image, label):
    image = tf.image.resize(image, (32, 32))
    image = tf.concat([image, image, image], -1)
    image = tf.keras.applications.densenet.preprocess_input(image)
    return image, label
train = train.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
test = test.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
val = val.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
train = train.repeat(15).batch(64).prefetch(tf.data.AUTOTUNE)
test = test.batch(64).prefetch(tf.data.AUTOTUNE)
val = val.batch(64).prefetch(tf.data.AUTOTUNE)
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32,32,3))
base_model.trainable = False ## Not trainable weights
model = tf.keras.Sequential()
model.add(base_model)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])
model.summary()
fit_callbacks = [tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy', patience=4, restore_best_weights=True)]
history = model.fit(train, steps_per_epoch=150, epochs=5, batch_size=64, validation_data=val, callbacks=fit_callbacks)
model.evaluate(test)
Output:
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 1, 1, 512) 14714688
flatten_3 (Flatten) (None, 512) 0
dense_9 (Dense) (None, 1024) 525312
dropout_6 (Dropout) (None, 1024) 0
dense_10 (Dense) (None, 256) 262400
dropout_7 (Dropout) (None, 256) 0
dense_11 (Dense) (None, 10) 2570
=================================================================
Total params: 15,504,970
Trainable params: 790,282
Non-trainable params: 14,714,688
_________________________________________________________________
Epoch 1/5
150/150 [==============================] - 6s 35ms/step - loss: 0.8056 - accuracy: 0.7217 - val_loss: 0.5433 - val_accuracy: 0.7967
Epoch 2/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5560 - accuracy: 0.7965 - val_loss: 0.4772 - val_accuracy: 0.8224
Epoch 3/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5287 - accuracy: 0.8080 - val_loss: 0.4698 - val_accuracy: 0.8234
Epoch 4/5
150/150 [==============================] - 5s 32ms/step - loss: 0.5012 - accuracy: 0.8149 - val_loss: 0.4334 - val_accuracy: 0.8329
Epoch 5/5
150/150 [==============================] - 4s 25ms/step - loss: 0.4791 - accuracy: 0.8315 - val_loss: 0.4312 - val_accuracy: 0.8398
157/157 [==============================] - 2s 15ms/step - loss: 0.4457 - accuracy: 0.8325
[0.44566288590431213, 0.8324999809265137]

How to increase the accuracy of this CNN Model?

I have tried many combinations in the values for this model.
Can 2D Convolutions be used instead of 1D for the following case?
How can accuracy be improved for the training dataset?
shape of original dataset : (343889, 80)
shape of - training dataset : (257916, 80)
shape of - training Labels : (257916,)
shape of - testing dataset : (85973, 80)
shape of - testing Labels : (85973,)
The model is
inputShape = (80,1,)
model = Sequential()
model.add(Input(shape=inputShape))
model.add(Conv1D(filters=80, kernel_size=30, activation='relu'))
model.add(MaxPooling1D(40))
model.add(Dense(60))
model.add(Dense(9))
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
Model's summary
Model: "sequential_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_11 (Conv1D) (None, 51, 80) 2480
max_pooling1d_9 (MaxPooling1D) (None, 1, 80) 0
dense_8 (Dense) (None, 1, 60) 4860
dense_9 (Dense) (None, 1, 9) 549
=================================================================
Total params: 7,889
Trainable params: 7,889
Non-trainable params: 0
_________________________________________________________________
The training is given below.
Epoch 1/5
8060/8060 [==============================] - 56s 7ms/step - loss: -25.7724 - accuracy: 0.0015
Epoch 2/5
8060/8060 [==============================] - 44s 5ms/step - loss: -26.7578 - accuracy: 0.0011
Epoch 3/5
8060/8060 [==============================] - 43s 5ms/step - loss: -26.7578 - accuracy: 0.0011
You can try a couple of things to adjust your model performance.
Firstly, try using Conv2D layers.
Modify the kernel size to (3, 3).
Change the optimizer to SGD and the loss to sparse categorical crossentropy.
Try these changes, run the model for more epochs, and let's see how that goes.
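A rough sketch of those suggestions, assuming the 80 features can be reshaped onto a 2-D grid such as 8x10 with one channel (whether that grid is meaningful depends on the data; the filter count and layer sizes are placeholders):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

# Assumption: each 80-feature row is reshaped into an 8x10 single-channel grid,
# e.g. X_train_2d = X_train.reshape(-1, 8, 10, 1)
model = Sequential([
    Input(shape=(8, 10, 1)),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(60, activation='relu'),
    Dense(9, activation='softmax'),  # 9 classes, one probability each
])

# SGD optimizer and sparse categorical crossentropy for integer labels.
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])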
You want to classify something, but your model is not set up for classification (at least not directly).
The problems I can see at first sight are:
You use no activation functions (especially in the last layer)
You use 9 output neurons, but binary crossentropy loss.
First of all, in your shoes, I would review how classification problems are set up with neural networks.
About your model, a starting point could be this edit:
inputShape = (80,1,)
model = Sequential()
model.add(Conv1D(filters=80, kernel_size=30, activation='relu', input_shape = inputShape))
model.add(MaxPooling1D(40))
model.add(Dense(60, activation='relu'))  # note the activation function
model.add(Dense(9, activation='softmax'))  # note the activation function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])  # note the loss function
I am not saying this is going to solve your problem (without knowing the data that is impossible to say), but it is a start; then you have to work on fighting overfitting, hyperparameter tuning, etc.

Keras LSTM batched time-series input giving constant accuracy

I have a time-series data set with 1 feature that represents multiple games. The goal is to classify each game as a win or loss - binary classification. Each game has 61 rows, and the feature has been scaled to be between 0 and 1:
x_train = array([[0.55340617],
[0.54956823],
[0.54588505],
...,
[0.87483364],
[0.8956947 ],
[0.90343248]])
y_train = array([0, 0, 0, ..., 0, 0, 0])
The problem should be quite easy, and I was expecting around 70% accuracy based on other models.
I'm trying to train an LSTM with the data. I think I should be resetting the state on every game, and so the batch shape is defined by 61 timesteps, and 1 feature:
timesteps = 61
n_features = 1
# Reshape data for LSTM
x_train = x_train.reshape(len(x_train)//timesteps, timesteps, n_features)
# Get class of each game
y_train = x_test[0: len(y_train): timesteps]
model = Sequential()
# Hidden layer
n_neurons = 8
model.add(LSTM(n_neurons,
               input_shape=(timesteps, n_features),
               stateful=False))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy',
optimizer='adam', metrics=['accuracy'])
model.fit(x_train,
          y_train,
          epochs=3,
          batch_size=1)
But when I train the model, the accuracy remains constant:
Epoch 1/3
301/301 [==============================] - 7s 23ms/step - loss: 8.2524 - accuracy: 0.4618
Epoch 2/3
301/301 [==============================] - 6s 21ms/step - loss: 8.2524 - accuracy: 0.4618
Epoch 3/3
301/301 [==============================] - 6s 21ms/step - loss: 8.2524 - accuracy: 0.4618
I have tried switching the optimizer to 'RMSprop', but I get the exact same result. Could the problem lie with the batch shape?
Any help would be greatly appreciated!
EDIT: Fixed some typos in the code. Sorry!

Why is my neural network validation accuracy higher than my training accuracy and they both become constant?

I have built a model and when I train it, my validation loss is smaller than my training one and the validation accuracy is higher than the training one. Is the model being overfitted? Am I doing something wrong? Can someone please look at my model and see if there is anything wrong with it? Thank you.
input_text = Input(shape=(200,), dtype='int32', name='input_text')
meta_input = Input(shape=(2,), name='meta_input')
embedding = Embedding(input_dim=len(tokenizer.word_index) + 1,
                      output_dim=300,
                      input_length=200)(input_text)
lstm = Bidirectional(LSTM(units=128,
                          dropout=0.5,
                          recurrent_dropout=0.5,
                          return_sequences=True),
                     merge_mode='concat')(embedding)
pool = GlobalMaxPooling1D()(lstm)
dropout = Dropout(0.5)(pool)
text_output = Dense(n_codes, activation='sigmoid', name='aux_output')(dropout)
output = concatenate([text_output, meta_input])
output = Dense(n_codes, activation='relu')(output)
main_output = Dense(n_codes, activation='softmax', name='main_output')(output)
model = Model(inputs=[input_text,meta_input], outputs=[output])
optimer = Adam(lr=.001)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
model.fit([X1_train, X2_train], [y_train],
          validation_data=([X1_valid, X2_valid], [y_valid]),
          batch_size=64, epochs=20, verbose=1)
Here is the output:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_text (InputLayer) [(None, 200)] 0
__________________________________________________________________________________________________
embedding (Embedding) (None, 200, 300) 889500 input_text[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional) (None, 200, 256) 439296 embedding[0][0]
__________________________________________________________________________________________________
global_max_pooling1d (GlobalMax (None, 256) 0 bidirectional[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 256) 0 global_max_pooling1d[0][0]
__________________________________________________________________________________________________
aux_output (Dense) (None, 545) 140065 dropout[0][0]
__________________________________________________________________________________________________
meta_input (InputLayer) [(None, 2)] 0
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 547) 0 aux_output[0][0]
meta_input[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 545) 298660 concatenate[0][0]
==================================================================================================
Total params: 1,767,521
Trainable params: 1,767,521
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 11416 samples, validate on 2035 samples
Epoch 1/20
11416/11416 [==============================] - 158s 14ms/sample - loss: 0.0955 - accuracy: 0.9929 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 2/20
11416/11416 [==============================] - 152s 13ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 3/20
11416/11416 [==============================] - 209s 18ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 4/20
11416/11416 [==============================] - 178s 16ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 5/20
11416/11416 [==============================] - 211s 18ms/sample - loss: 0.0562 - accuracy: 0.9963 -
val_loss: 0.0559 - val_accuracy: 0.9964
Epoch 6/20
Overfitting would be when acc is higher than val_acc and loss lower than val_loss.
However, it looks to me like your validation dataset is not representative of the overall distribution in the dataset. For whatever reason, the results on your validation dataset are constant, and even constantly higher.
You are doing a binary classification. Be aware of class imbalance!
E.g. if 99% of your sample is class 0 and 1% is class 1,
then, even if your model doesn't learn anything, it will have 99% accuracy if it always predicts 0 without ever once predicting a 1.
Imagine that your (mostly random) split of the data created a validation set in which 99.5% of the samples are class 0 and 0.5% are class 1.
Imagine in the worst case that your model doesn't learn anything and always spits out ("predicts") a 0. Then training accuracy will be a constant 0.99 with a certain loss, and val_acc will be a constant 0.995.
What is puzzling to me is that your performance measures are constant. That is ALWAYS bad, because if the model learns something, even if it overfits, there will always be some stochastic noise.
No book tells you the following (no beginner book, anyway); I learned this from experience: you have to put shuffle=True in your model.fit().
It seems to me that you are training in a way that presents the model first with samples of only one class and then with samples of the other class. Mixing up samples of the two classes perturbs the model enough to keep it from getting stuck in some local minimum.
Sometimes I have gotten such constant results even when shuffling.
In that case, I just choose another random split, which then works better. (So: try other splits!)
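A minimal sketch of that change, reusing the fit call from the question (note that recent Keras versions already default to shuffle=True for array inputs, so making it explicit mainly documents the intent):

model.fit([X1_train, X2_train], [y_train],
          validation_data=([X1_valid, X2_valid], [y_valid]),
          batch_size=64, epochs=20, verbose=1,
          shuffle=True)  # shuffle the training samples each epoch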
The difference is marginal, so I would not worry.
In general, what might be happening is that, by chance, the random split between training and validation sets put examples in the validation set that are "easier" to guess than the ones in the training set.
You could overcome this by developing a cross-validation strategy as follows:
Take 10% of the dataset out (holdout) and consider it your test set.
With the remaining dataset, make an 80%-20% split for the training and validation sets.
Repeat the 80-20 training/validation split 5 times.
Train 5 models on your 5 different train-valid datasets and see what the results are.
You can even compare all 5 models on the test set just to see what the "real", or "closer to reality", accuracy would be. That might help to see which model generalizes better.
In the end you might even consider stacking them together:
https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/
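A minimal sketch of that splitting scheme, assuming scikit-learn is available and that X and y hold the full dataset (build_model is a placeholder for rebuilding the question's architecture):

from sklearn.model_selection import train_test_split

# Hold out 10% of the data as a final test set.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=0)

test_scores = []
for seed in range(5):
    # Repeat the 80/20 train/validation split with a different seed each time.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_rest, y_rest, test_size=0.20, random_state=seed)

    model = build_model()  # placeholder: same architecture, freshly initialized
    model.fit(X_tr, y_tr, validation_data=(X_val, y_val),
              batch_size=64, epochs=20, verbose=0)

    # Evaluate every model on the same held-out test set for comparison.
    test_scores.append(model.evaluate(X_test, y_test, verbose=0))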
The fact that both the training and validation accuracy look similar and do not change during training indicates that the model might be stuck in a local minimum.
It is worth training for more epochs (at least 20) to see if the model can "jump" out of the local minimum with the current learning rate.
If this does not solve the problem, I would change the learning rate from .001 to .0001 or .00001. This should hopefully help the model converge to a global minimum.
If this does not solve the problem either, there are many other parameters/hyperparameters that might be worth checking: the number of nodes in the layers, the number of layers, the optimizer strategy, the size and distribution (generality and variance) of the training set...
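A minimal sketch of that learning-rate change, applied to the compile step from the question:

from tensorflow.keras.optimizers import Adam

# Lower the learning rate, e.g. 1e-4 (or 1e-5) instead of the default 1e-3.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])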
No, there is nothing wrong; this effect (validation metrics being better than training ones) is common when Dropout is used, as your network does.
Dropout adds noise during training, and this noise is not present during validation/testing, so it's natural that training metrics get a bit worse, but validation metrics do not have this noise, and are a bit better due to the improved generalization produced by Dropout.

LSTM accuracy doesn't change no matter what I do

I'm implementing my first neural network, an LSTM for binary sentiment-analysis classification. I've pre-processed the data by lowercasing, tokenizing, and removing most punctuation (keeping only .,').
I'm also using GloVe's 100d pre-trained embeddings for this.
The problem is: whatever I do, the accuracy is terrible and doesn't change over epochs (it also doesn't change when I change the LSTM architecture).
I've tried changing the optimizer and its learning rate, adding more neurons to the LSTM, changing number of epochs and batch size.
Nothing seems to work
def setLSTM(data, stopRem, stemm, lemma, negHand):
    # pre-processing the data
    data = pre_processing(data, stopRem, stemm, lemma, negHand)
    print(data[1])

    # splitting the data
    X_train, X_test, y_train, y_test = datasplit(data)

    # Setting the words as unique indexes (max 5k unique indexes)
    tokenizer = Tokenizer(num_words=5000)
    tokenizer.fit_on_texts(X_train)
    X_train = tokenizer.texts_to_sequences(X_train)
    X_test = tokenizer.texts_to_sequences(X_test)

    # getting the vocabulary
    vocab = tokenizer.word_index.items()
    print(vocab)
    vocab_size = len(tokenizer.word_index) + 1

    # maxlen corresponds to the maximum tweet length (so that we can pad shorter ones)
    maxlen = len(max((X_train + X_test)))
    print("Maxlen is: ", maxlen)

    # Padding the sequences to guarantee that all tweets have the same length
    X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
    X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)

    # Create an embedding matrix of zeros (because some of the vocabulary might not exist in the embeddings)
    # and add the embeddings we have
    embedding_matrix = zeros((vocab_size, 100))
    for idx, word in vocab:
        embedding_vector = embeddings.get(word)
        if embedding_vector is not None:
            embedding_matrix[idx] = embedding_vector

    # creating the model with its layers (embedding layer, LSTM layer, dense layer)
    model = Sequential()
    # The embedding layer has "trainable=False" because we're using pre-trained embeddings
    embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=maxlen, trainable=False)
    model.add(embedding_layer)
    model.add(Dropout(0.2))
    # Adding an LSTM layer
    model.add(LSTM(units=100))
    model.add(Dropout(0.2))
    # Adding a dense layer with sigmoid activation
    model.add(Dense(1, activation='sigmoid'))
    # opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    # Compiling the model (loss='binary_crossentropy' because this is a binary classification problem)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    print(model.summary())

    history = model.fit(X_train, y_train, batch_size=64, epochs=5, verbose=1, validation_split=0.2)

    score = model.evaluate(X_test, y_test, verbose=1)
    print("Test Score:", score[0])
    print("Test Accuracy:", score[1])

setLSTM(tweets, False, False, False, False)
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_9 (Embedding) (None, 13, 100) 1916600
_________________________________________________________________
dropout_1 (Dropout) (None, 13, 100) 0
_________________________________________________________________
lstm_9 (LSTM) (None, 100) 80400
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_9 (Dense) (None, 1) 101
=================================================================
Total params: 1,997,101
Trainable params: 80,501
Non-trainable params: 1,916,600
_________________________________________________________________
None
Train on 10852 samples, validate on 2713 samples
Epoch 1/5
10852/10852 [==============================] - 5s 448us/step - loss: 0.6920 - acc: 0.5275 - val_loss: 0.6916 - val_acc: 0.5404
Epoch 2/5
10852/10852 [==============================] - 4s 360us/step - loss: 0.6917 - acc: 0.5286 - val_loss: 0.6908 - val_acc: 0.5404
Epoch 3/5
10852/10852 [==============================] - 4s 365us/step - loss: 0.6920 - acc: 0.5286 - val_loss: 0.6907 - val_acc: 0.5404
Epoch 4/5
10852/10852 [==============================] - 4s 382us/step - loss: 0.6916 - acc: 0.5286 - val_loss: 0.6903 - val_acc: 0.5404
Epoch 5/5
10852/10852 [==============================] - 4s 383us/step - loss: 0.6916 - acc: 0.5264 - val_loss: 0.6906 - val_acc: 0.5404
4522/4522 [==============================] - 1s 150us/step
Test Score: 0.6925433831950933
Test Accuracy: 0.5176913142204285
