I'm implementing my first neural network, an LSTM for binary sentiment classification. I've pre-processed the data by lowercasing, tokenizing and removing most punctuation (keeping only .,').
I'm also using GloVe's 100d pre-trained embeddings for this.
The problem is: whatever I do, the accuracy is terrible and doesn't change across epochs (it also doesn't change when I change the LSTM architecture).
I've tried changing the optimizer and its learning rate, adding more units to the LSTM, and changing the number of epochs and the batch size.
Nothing seems to work.
def setLSTM(data, stopRem, stemm, lemma, negHand):
    #pre-processing data
    data = pre_processing(data, stopRem, stemm, lemma, negHand)
    print(data[1])
    #splitting data
    X_train, X_test, y_train, y_test = datasplit(data)
    #Setting the words as unique indexes (max 5k unique indexes)
    tokenizer = Tokenizer(num_words=5000)
    tokenizer.fit_on_texts(X_train)
    X_train = tokenizer.texts_to_sequences(X_train)
    X_test = tokenizer.texts_to_sequences(X_test)
    #getting vocabulary
    vocab = tokenizer.word_index.items()
    print(vocab)
    vocab_size = len(tokenizer.word_index) + 1
    #maxlen corresponds to the maximum tweet length (so that we can add padding to shorter ones)
    maxlen = len(max((X_train + X_test)))
    print("Maxlen is: ",maxlen)
    #Padding the sequences to guarantee that all tweets have the same length
    X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
    X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
    #Create embedding matrix with zeros (because some of the vocabulary might not exist in the embeddings)
    #and adding the embeddings we have
    embedding_matrix = zeros((vocab_size, 100))
    for idx,word in vocab:
        embedding_vector = embeddings.get(word)
        if embedding_vector is not None:
            embedding_matrix[idx] = embedding_vector
    #creating the model with its layers (embedding layer, lstm layer, dense layer)
    model = Sequential()
    #The embedding layer has "trainable=False" because we're using pre-trained embeddings
    embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=maxlen, trainable=False)
    model.add(embedding_layer)
    model.add(Dropout(0.2))
    #Adding an LSTM layer with 100 units
    model.add(LSTM(units=100))
    model.add(Dropout(0.2))
    #Adding dense layer with sigmoid activation
    model.add(Dense(1, activation='sigmoid'))
    #opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    #Compiling model ("loss='binary_crossentropy'" because we're dealing with a binary classification problem)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    print(model.summary())
    history = model.fit(X_train, y_train, batch_size=64, epochs=5, verbose=1, validation_split=0.2)
    score = model.evaluate(X_test, y_test, verbose=1)
    print("Test Score:", score[0])
    print("Test Accuracy:", score[1])
setLSTM(tweets,False,False,False,False)
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_9 (Embedding) (None, 13, 100) 1916600
_________________________________________________________________
dropout_1 (Dropout) (None, 13, 100) 0
_________________________________________________________________
lstm_9 (LSTM) (None, 100) 80400
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_9 (Dense) (None, 1) 101
=================================================================
Total params: 1,997,101
Trainable params: 80,501
Non-trainable params: 1,916,600
_________________________________________________________________
None
Train on 10852 samples, validate on 2713 samples
Epoch 1/5
10852/10852 [==============================] - 5s 448us/step - loss: 0.6920 - acc: 0.5275 - val_loss: 0.6916 - val_acc: 0.5404
Epoch 2/5
10852/10852 [==============================] - 4s 360us/step - loss: 0.6917 - acc: 0.5286 - val_loss: 0.6908 - val_acc: 0.5404
Epoch 3/5
10852/10852 [==============================] - 4s 365us/step - loss: 0.6920 - acc: 0.5286 - val_loss: 0.6907 - val_acc: 0.5404
Epoch 4/5
10852/10852 [==============================] - 4s 382us/step - loss: 0.6916 - acc: 0.5286 - val_loss: 0.6903 - val_acc: 0.5404
Epoch 5/5
10852/10852 [==============================] - 4s 383us/step - loss: 0.6916 - acc: 0.5264 - val_loss: 0.6906 - val_acc: 0.5404
4522/4522 [==============================] - 1s 150us/step
Test Score: 0.6925433831950933
Test Accuracy: 0.5176913142204285
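One thing that may be worth checking in the code above (a hedged guess, not a confirmed diagnosis): `tokenizer.word_index.items()` yields `(word, index)` pairs, so `for idx,word in vocab` binds the word to `idx` and the integer to `word`. If `embeddings` is a dict keyed by word strings, `embeddings.get(word)` would then return None every time and the embedding matrix would stay all zeros. Likewise, `max()` on a list of lists compares the lists element by element rather than by length. The sketch below uses a made-up `word_index` and `embeddings` (both hypothetical stand-ins for the real objects) to show the pattern:

```python
import numpy as np

# Hypothetical stand-ins for tokenizer.word_index and the loaded GloVe dict.
word_index = {"good": 1, "bad": 2, "unseenword": 3}
embeddings = {"good": np.ones(4), "bad": -np.ones(4)}

vocab_size = len(word_index) + 1  # index 0 is reserved for padding
embedding_matrix = np.zeros((vocab_size, 4))

# dict.items() yields (key, value) pairs, i.e. (word, index) here.
for word, idx in word_index.items():
    embedding_vector = embeddings.get(word)
    if embedding_vector is not None:
        embedding_matrix[idx] = embedding_vector

# Fraction of vocabulary rows that actually received a vector.
covered = np.count_nonzero(embedding_matrix.any(axis=1)) / len(word_index)
print("embedding coverage:", covered)

# Longest sequence by length, not by lexicographic comparison of the lists.
sequences = [[9, 1], [2, 3, 4, 5]]
maxlen = max(len(seq) for seq in sequences)
print("maxlen:", maxlen)
```

Printing the coverage on the real vocabulary is a cheap way to confirm whether the pre-trained vectors are reaching the Embedding layer at all.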
My code is
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 5)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(2)])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(X_train, train_labels, epochs=10)
And my output is
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 3920) 0
dense (Dense) (None, 128) 501888
dense_1 (Dense) (None, 2) 258
=================================================================
Total params: 502,146
Trainable params: 502,146
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
219/219 [==============================] - 2s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 2/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 3/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 4/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 5/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 6/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 7/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 8/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 9/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
Epoch 10/10
219/219 [==============================] - 1s 3ms/step - loss: nan - accuracy: 0.0000e+00
<keras.callbacks.History at 0x7f8750280790>
Why does the training accuracy converge to 0 in every epoch? My dataset is
print(X_train.shape)
print(X_test.shape)
(7000, 28, 28, 5)
(3000, 28, 28, 5)
print(train_labels.shape)
(7000, 1)
I also tried other models, including a Conv2D model and a logistic regression model, but the accuracy is always 0. That's really weird. Does the issue come from my dataset? My train_labels contains only 1s and -1s.
Try adjusting the learning rate, or check whether the labels are suitable, because the loss function is returning NaN.
First, make sure the labels are in an int or float format.
Look at the label distribution in the sample dataset and change the label values to the ones your network requires.
If from_logits is enabled, you need to compare the shape of the network output (the logits) with the labels. For example, an output layer with 2 units returns logits of shape (1, 2) for a single sample.
Using the Flatten layer as the first layer works in old versions. You should use an Input layer that matches your dataset, or use 'flatten_input' as your input layer name.
The rest depends on whether the data and the network suit the task, and on the contrast between input and target. Try adding more layers or aligning the images to increase the contrast in the data; if the data is not an image but a screen capture or shared information at resized scales, the network should match the data.
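On the label point: SparseCategoricalCrossentropy expects integer class indices in the range 0 to num_classes - 1, so labels of -1 and 1 fall outside what a 2-unit output layer can represent. A minimal sketch of remapping them with NumPy, assuming the labels have shape (N, 1) as in the question (the array below is made up):

```python
import numpy as np

# Made-up labels in the question's format: shape (N, 1), values in {-1, 1}.
train_labels = np.array([[1], [-1], [-1], [1], [1]])

# Map -1 -> 0 and 1 -> 1 so the labels become valid class indices.
remapped = ((train_labels + 1) // 2).astype("int64")

print(np.unique(train_labels))  # values before remapping
print(np.unique(remapped))      # values after remapping
```

The remapped array can then be passed to model.fit in place of the original labels.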
Sample: when working with the Flatten layer, you need to map the input name to it.
dataset = {
"flatten_input" :[],
"label" : []
}
dataset["flatten_input"].append(tf.constant(image, shape=(1, 28, 28, 1)))
dataset["label"].append(tf.constant(label, shape=(1, 1, 1, 64)))
Sample: Simply working on the MNIST dataset
import tensorflow as tf
import tensorflow_datasets as tfds
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: DataSets
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
ds = tfds.load('mnist', split='train', shuffle_files=True)
ds = ds.shuffle(1024).batch(64).prefetch(tf.data.experimental.AUTOTUNE)
assert isinstance(ds, tf.data.Dataset)
for example in ds.take(1):
    image, label = example["image"], example["label"]
    ls_image = []
    ls_label = []
    for i in range(label.shape[0]):
        ls_image.append(tf.constant(image[i], shape=(1, 28, 28, 1)).numpy())
        ### should reflect the real label as a number; a constant 0 is used here ###
        ls_label.append(tf.constant(0, shape=(1, 1, 1, 1)).numpy())
image = tf.constant( ls_image, shape=(64, 1, 784, 1) )
label = tf.constant( ls_label, shape=(64, 1, 1, 1) )
dataset = tf.data.Dataset.from_tensor_slices(( image, label ))
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.Sequential([
tf.keras.layers.InputLayer(input_shape=(784, 1)),
tf.keras.layers.Dense(256),
tf.keras.layers.Dense(256),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(2)
])
model.summary()
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Optimizer
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
optimizer = tf.keras.optimizers.Nadam(
learning_rate=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-07,
name='Nadam'
)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Loss Fn
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
lossfn = tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True,
reduction=tf.keras.losses.Reduction.AUTO,
name='sparse_categorical_crossentropy'
)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Summary
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model.compile(optimizer=optimizer, loss=lossfn, metrics=['accuracy'])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
history = model.fit( dataset, epochs=50 )
Output: Numbers happen
Epoch 1/50
51/64 [======================>.......] - ETA: 0s - loss: 7.0123e-09 - accuracy: 1.0000
I am working on sign language detection using VGG16 pre-trained model with grayscale images. When I am trying to run the model.fit command, I am getting the following error.
CLARIFICATION
I already have the images in RGB form, but I want to use them as grayscale to check whether that works better. The reason is that with color images I am not getting the accuracy I expect: the test accuracy reaches 40% at most and the model overfits the dataset.
Also, this is my model command
vgg = VGG16(input_shape= [128, 128] + [3], weights='imagenet', include_top=False)
This is my model.fit command
history = model.fit(
train_x,
train_y,
epochs=15,
validation_data=(test_x, test_y),
callbacks=[early_stop, checkpoint],
batch_size=32,shuffle=True)
I am new to working with pre-trained models. When I run the code with 3-channel color images, the model overfits and val_accuracy doesn't rise above 40%, so I want to try grayscale images; I have already added many data augmentation techniques, but the accuracy is not improving. Any leads are welcome, as I have been stuck on this for a long time now.
The simplest (and likely fastest) solution I can think of is to just convert your image to rgb. You can do this as part of your model.
model = Sequential([
tf.keras.layers.Lambda(tf.image.grayscale_to_rgb),
vgg
])
This will fix your issue with VGG. I also see that you're missing the last dimensionality for your images. Images in grayscale are expected to be of shape [height, width, 1], but you simply have [height, width]. You can fix this using tf.expand_dims:
model = Sequential([
tf.keras.layers.Lambda(
lambda x: tf.image.grayscale_to_rgb(tf.expand_dims(x, -1))
),
vgg,
])
Note that this solution solves the problem in the graph, so it runs online. Meaning, at runtime, you can feed data exactly the same way you have it now (in the shape [128, 128], without a channels dimension) and it will still functionally work. If this is your expected dimensionality during runtime, this will be faster than manipulating your data before throwing it into the model.
By the way, none of this is ideal, given that VGG was trained specifically to work best with color images. Just thought I should add that.
Why are you getting overfitting?
Maybe for different reasons:
Your images and labels are not distributed equally across the train, val and test sets (maybe you have classes in train that don't appear in test). Or your train, val and test splits are not stratified correctly, so the model is trained on only a specific region of your data and features.
Your dataset is very small and you need more data.
Maybe you have noise in your dataset; first make sure to remove the noise. (If the data is noisy, the model fits the noise.)
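To illustrate the stratification point, here is a minimal NumPy sketch of a split that keeps the class proportions the same in both parts. The function name `stratified_split` and the toy labels are made up for this example; scikit-learn offers the same idea through `train_test_split(..., stratify=y)`.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(cls_idx)
        n_test = int(round(len(cls_idx) * test_fraction))
        test_idx.extend(cls_idx[:n_test])
        train_idx.extend(cls_idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels: 80 samples of class 0 and 20 of class 1.
labels = np.array([0] * 80 + [1] * 20)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(np.bincount(labels[train_idx]), np.bincount(labels[test_idx]))
```

Comparing the per-class counts of each split like this is also a quick way to check an existing train/val/test partition.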
How can you input grayscale images to VGG16?
To use VGG16, you need to input 3-channel images. For this reason, you need to concatenate your grayscale image with itself, as below, to get a three-channel image:
image = tf.concat([image, image, image], -1)
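The same channel replication, sketched in NumPy on a made-up array so the shapes are easy to inspect (the random image is only a placeholder for a real grayscale image):

```python
import numpy as np

# Placeholder grayscale image without a channel axis: (height, width).
gray = np.random.rand(128, 128).astype("float32")

# Add the trailing channel axis: (128, 128) -> (128, 128, 1).
gray = np.expand_dims(gray, axis=-1)

# Copy the single channel three times along the last axis -> (128, 128, 3).
rgb_like = np.concatenate([gray, gray, gray], axis=-1)

print(gray.shape, rgb_like.shape)
```

All three channels hold identical values, so no color information is created; the network simply receives the input shape it was built for.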
Example of training VGG16 on grayscale images from fashion_mnist dataset:
from tensorflow.keras.applications.vgg16 import VGG16
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
train, val, test = tfds.load(
'fashion_mnist',
shuffle_files=True,
as_supervised=True,
split = ['train[:85%]', 'train[85%:]', 'test']
)
def resize_preprocess(image, label):
    image = tf.image.resize(image, (32, 32))
    image = tf.concat([image, image, image], -1)
    image = tf.keras.applications.densenet.preprocess_input(image)
    return image, label
train = train.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
test = test.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
val = val.map(resize_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
train = train.repeat(15).batch(64).prefetch(tf.data.AUTOTUNE)
test = test.batch(64).prefetch(tf.data.AUTOTUNE)
val = val.batch(64).prefetch(tf.data.AUTOTUNE)
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32,32,3))
base_model.trainable = False ## Not trainable weights
model = tf.keras.Sequential()
model.add(base_model)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
optimizer='Adam',
metrics=['accuracy'])
model.summary()
fit_callbacks = [tf.keras.callbacks.EarlyStopping(
monitor='val_accuracy', patience = 4, restore_best_weights = True)]
history = model.fit(train, steps_per_epoch=150, epochs=5, batch_size=64, validation_data=val, callbacks=fit_callbacks)
model.evaluate(test)
Output:
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 1, 1, 512) 14714688
flatten_3 (Flatten) (None, 512) 0
dense_9 (Dense) (None, 1024) 525312
dropout_6 (Dropout) (None, 1024) 0
dense_10 (Dense) (None, 256) 262400
dropout_7 (Dropout) (None, 256) 0
dense_11 (Dense) (None, 10) 2570
=================================================================
Total params: 15,504,970
Trainable params: 790,282
Non-trainable params: 14,714,688
_________________________________________________________________
Epoch 1/5
150/150 [==============================] - 6s 35ms/step - loss: 0.8056 - accuracy: 0.7217 - val_loss: 0.5433 - val_accuracy: 0.7967
Epoch 2/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5560 - accuracy: 0.7965 - val_loss: 0.4772 - val_accuracy: 0.8224
Epoch 3/5
150/150 [==============================] - 4s 26ms/step - loss: 0.5287 - accuracy: 0.8080 - val_loss: 0.4698 - val_accuracy: 0.8234
Epoch 4/5
150/150 [==============================] - 5s 32ms/step - loss: 0.5012 - accuracy: 0.8149 - val_loss: 0.4334 - val_accuracy: 0.8329
Epoch 5/5
150/150 [==============================] - 4s 25ms/step - loss: 0.4791 - accuracy: 0.8315 - val_loss: 0.4312 - val_accuracy: 0.8398
157/157 [==============================] - 2s 15ms/step - loss: 0.4457 - accuracy: 0.8325
[0.44566288590431213, 0.8324999809265137]
I'm trying to achieve binary classification using MobileNetV2 in TensorFlow. I have two folders A and B and I'm using image_dataset_from_directory function to make them into two classes for training.
BATCH_SIZE = 32
IMG_SIZE = (224, 224)
train_directory = "Train_set/"
test_directory = "Test_set/"
train_dataset = image_dataset_from_directory(train_directory, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE)
validation_dataset = image_dataset_from_directory(test_directory, shuffle=True, batch_size=BATCH_SIZE, image_size=IMG_SIZE)
I'm preprocessing the input before passing it to the net.
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
Then I'm creating the model using the code:
def alpaca_model(image_shape=IMG_SIZE):
    input_shape = image_shape + (3,)
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape,
                                                   include_top=False, # <== Important!!!!
                                                   weights='imagenet') # From imageNet
    # Freeze the base model by making it non trainable
    base_model.trainable = False
    # create the input layer (Same as the imageNetv2 input size)
    inputs = tf.keras.Input(shape=input_shape)
    # data preprocessing using the same weights the model was trained on
    x = preprocess_input(inputs)
    # set training to False to avoid keeping track of statistics in the batch norm layer
    x = base_model(x, training=False)
    # Add the new Binary classification layers
    # use global avg pooling to summarize the info in each channel
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    #include dropout with probability of 0.2 to avoid overfitting
    x = tf.keras.layers.Dropout(0.2)(x)
    # create a prediction layer with one neuron (as a classifier only needs one)
    prediction_layer = tf.keras.layers.Dense(1, activation="sigmoid")
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model
The model summary looks something like this
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 224, 224, 3)] 0
tf.math.truediv_1 (TFOpLamb (None, 224, 224, 3) 0
da)
tf.math.subtract_1 (TFOpLam (None, 224, 224, 3) 0
bda)
mobilenetv2_1.00_224 (Funct (None, 7, 7, 1280) 2257984
ional)
global_average_pooling2d_1 (None, 1280) 0
(GlobalAveragePooling2D)
dropout_1 (Dropout) (None, 1280) 0
dense_1 (Dense) (None, 1) 1281
=================================================================
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_________________________________________________________________
Then the model is compiled using the following:
loss_function=tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
metrics=['accuracy', tf.metrics.Recall(), tf.metrics.Precision()]
These are the stats of model.fit and model.evaluate
total_epochs = 5
history_fine = model2.fit(train_dataset, epochs=total_epochs, validation_data=validation_dataset)
Epoch 1/5
54/54 [==============================] - 213s 3s/step - loss: 0.2236 - accuracy: 0.9013 - recall: 0.9149 - precision: 0.8852 - val_loss: 0.0856 - val_accuracy: 0.9887 - val_recall: 0.9950 - val_precision: 0.9803
Epoch 2/5
54/54 [==============================] - 217s 4s/step - loss: 0.0614 - accuracy: 0.9855 - recall: 0.9928 - precision: 0.9776 - val_loss: 0.0439 - val_accuracy: 0.9977 - val_recall: 1.0000 - val_precision: 0.9950
Epoch 3/5
54/54 [==============================] - 216s 4s/step - loss: 0.0316 - accuracy: 0.9948 - recall: 0.9988 - precision: 0.9905 - val_loss: 0.0297 - val_accuracy: 0.9977 - val_recall: 1.0000 - val_precision: 0.9950
Epoch 4/5
54/54 [==============================] - 217s 4s/step - loss: 0.0258 - accuracy: 0.9954 - recall: 1.0000 - precision: 0.9905 - val_loss: 0.0373 - val_accuracy: 0.9910 - val_recall: 0.9850 - val_precision: 0.9949
Epoch 5/5
54/54 [==============================] - 220s 4s/step - loss: 0.0242 - accuracy: 0.9942 - recall: 0.9988 - precision: 0.9893 - val_loss: 0.0225 - val_accuracy: 0.9977 - val_recall: 1.0000 - val_precision: 0.9950
model2.evaluate(validation_dataset)
14/14 [==============================] - 15s 354ms/step - loss: 0.0225 - accuracy: 0.9977 - recall: 1.0000 - precision: 0.9950
The stats are really good. But when I use the same validation set, check the predictions for individual pictures from both folders A and B, and plot them, the points don't seem to be linearly separable.
A = []
for i in os.listdir("Test_set\A"):
    location = f"Test_set\A\{i}"
    my_image = tf.keras.preprocessing.image.load_img(location, target_size=(224, 224))
    preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
    #preprocess the image
    my_image = tf.keras.preprocessing.image.img_to_array(my_image)
    my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1],
                                 my_image.shape[2]))
    my_image = preprocess_input(my_image)
    #make the prediction
    prediction = model2.predict(my_image)
    # print(prediction)
    A.append(float(prediction))
B = []
for i in os.listdir("Test_set\B"):
    location = f"Test_set\B\{i}"
    my_image = tf.keras.preprocessing.image.load_img(location, target_size=(224, 224))
    preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
    #preprocess the image
    my_image = tf.keras.preprocessing.image.img_to_array(my_image)
    my_image = my_image.reshape((1, my_image.shape[0], my_image.shape[1],
                                 my_image.shape[2]))
    my_image = preprocess_input(my_image)
    #make the prediction
    prediction = model2.predict(my_image)
    # print(prediction)
    B.append(float(prediction))
Since you have two classes you should replace
prediction_layer = tf.keras.layers.Dense(1, activation="sigmoid")
with
prediction_layer = tf.keras.layers.Dense(2, activation="softmax")
The number of units in the classifier's final layer is equal to the number of classes.
After this, you must re-train the model.
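Separately from the output layer, it may be worth checking whether the images are being preprocessed twice. The model summary in the question shows `tf.math.truediv` and `tf.math.subtract` layers, which suggests `preprocess_input` is already applied inside the model, while the prediction loops call `preprocess_input` again before `model2.predict`. A NumPy sketch of what that would do, assuming the MobileNetV2 preprocessing is the usual `x / 127.5 - 1` scaling (the helper `scale` is a hypothetical stand-in, not the Keras function):

```python
import numpy as np

def scale(x):
    # Stand-in for MobileNetV2-style scaling of pixels from [0, 255] to [-1, 1].
    return x / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0])

once = scale(pixels)          # range the network was trained on
twice = scale(scale(pixels))  # what the network sees if the input was already scaled

print("once: ", once)
print("twice:", twice)
```

After the second pass every pixel lands very close to -1, so the network would receive an almost constant image. If this is what is happening, passing the raw image array to `model2.predict` (without the extra `preprocess_input` call) would be the thing to try.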
I'm currently working on the CIFAR-10 Dataset which is an image classification problem with 10 classes.
I have started to develop a linear classifier with TensorFlow 2, without using the LinearClassifier object.
X shape corresponds to 10 000 images of 32*32 pixels RGB = (10000, 3072)
Y_one_hot is a one hot vector = (10000, 10)
model creation code:
model = tf.keras.Sequential()
model.add(Dense(1, activation="linear", input_dim=32*32*3))
model.add(Dense(10, activation="softmax", input_dim=1))
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])
training code:
model.fit(X, Y_one_hot, batch_size=10000, verbose=1, epochs=100)
predict code:
img = X[0].reshape(1, 3072) # Select image 0
res = np.argmax((model.predict(img))) # select the max in output
Problem:
res value is always the same. It seems my model is not learning.
Model.summary
Summary displays :
dense (Dense) (None, 1) 3073
dense_1 (Dense) (None, 10) 20
Total params: 3,093
Trainable params: 3,093
Non-trainable params: 0
Accuracy & loss:
Epoch 1/100
10000/10000 [==============================] - 2s 184us/sample - loss: 0.0949 - accuracy: 0.1005
Epoch 50/100
10000/10000 [==============================] - 0s 10us/sample - loss: 0.0901 - accuracy: 0.1000
Epoch 100/100
10000/10000 [==============================] - 0s 8us/sample - loss: 0.0901 - accuracy: 0.1027
Do you have any idea why my model is always predicting the same value?
Thanks,
One remark:
The loss you used, loss="mean_squared_error", is not meant for classification; it is meant for regression. These are two very different problems. Try a cross-entropy loss. For example
`model.compile(optimizer='adam',
loss='categorical_crossentropy', metrics=['accuracy'])`
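As a quick sanity check on the numbers in the question, the following NumPy sketch computes both losses for a model that outputs the same uniform probability for all 10 classes. It is only an illustration with a hand-written one-hot target, not the Keras implementation:

```python
import numpy as np

num_classes = 10
y_true = np.zeros(num_classes)
y_true[3] = 1.0                                   # arbitrary one-hot target
y_pred = np.full(num_classes, 1.0 / num_classes)  # uniform softmax output

mse = np.mean((y_true - y_pred) ** 2)
cross_entropy = -np.sum(y_true * np.log(y_pred))

print("MSE:", mse)
print("cross-entropy:", cross_entropy)
```

The MSE of a uniform guess is about 0.09, close to the 0.0901 the training log settles at, which is consistent with the network outputting roughly the same distribution for every image.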
You can find an example here: https://github.com/michelucci/oreilly-london-ai/blob/master/day1/Beginner%20friendly%20networks/First_Example_of_a_CNN_(CIFAR10).ipynb. It is a notebook I used for a training I gave. The network is a CNN, but you can replace it with yours.
Try that...
Best of luck, Umberto
I'm trying to train an autoencoder in Keras for signal processing but I'm somehow failing.
My inputs are segments of 128 frames length for 6 measures (acceleration_x/y/z, gyro_x/y/z), so the overall shape of my dataset is (22836, 128, 6) where 22836 is the sample size.
This is the sample code I'm using for the autoencoder:
X_train, X_test, Y_train, Y_test = load_dataset()
# reshape the input, whose size is (22836, 128, 6)
X_train = X_train.reshape(X_train.shape[0], np.prod(X_train.shape[1:]))
X_test = X_test.reshape(X_test.shape[0], np.prod(X_test.shape[1:]))
# now the shape will be (22836, 768)
### MODEL ###
input_shape = [X_train.shape[1]]
X_input = Input(input_shape)
x = Dense(1000, activation='sigmoid', name='enc0')(X_input)
encoded = Dense(350, activation='sigmoid', name='enc1')(x)
x = Dense(1000, activation='sigmoid', name='dec0')(encoded)
decoded = Dense(input_shape[0], activation='sigmoid', name='dec1')(x)
model = Model(inputs=X_input, outputs=decoded, name='autoencoder')
model.compile(optimizer='rmsprop', loss='mean_squared_error')
print(model.summary())
The output of model.summary() is
Model summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_55 (InputLayer) (None, 768) 0
_________________________________________________________________
enc0 (Dense) (None, 1000) 769000
_________________________________________________________________
enc1 (Dense) (None, 350) 350350
_________________________________________________________________
dec1 (Dense) (None, 1000) 351000
_________________________________________________________________
dec0 (Dense) (None, 768) 768768
=================================================================
Total params: 2,239,118
Trainable params: 2,239,118
Non-trainable params: 0
The training is done via
# train the model
history = model.fit(x = X_train, y = X_train,
epochs=5,
batch_size=32,
validation_data=(X_test, X_test))
where I'm simply trying to learn the identity function which yields:
Train on 22836 samples, validate on 5709 samples
Epoch 1/5
22836/22836 [==============================] - 27s 1ms/step - loss: 0.9481 - val_loss: 0.8862
Epoch 2/5
22836/22836 [==============================] - 24s 1ms/step - loss: 0.8669 - val_loss: 0.8358
Epoch 3/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8337 - val_loss: 0.8146
Epoch 4/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8164 - val_loss: 0.7960
Epoch 5/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8004 - val_loss: 0.7819
At this point, to try to understand how well it performed, I check the plot of some true inputs vs the predicted ones:
prediction = model.predict(X_test)
for i in np.random.randint(0, 100, 7):
    pred = prediction[i, :].reshape(128,6)
    # getting only values for acceleration_x
    pred = pred[:, 0]
    true = X_test[i, :].reshape(128,6)
    # getting only values for acceleration_x
    true = true[:, 0]
    # plot original and reconstructed
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(20, 6))
    ax1.plot(true, color='green')
    ax2.plot(pred, color='red')
and these are some of the plots which appear to be completely wrong:
Do you have any suggestion on what's wrong, aside from the small number of epochs (which actually do not seem to make any difference)?
Your data is not in the range [0,1] so why do you use sigmoid as the activation function in the last layer? Remove the activation function from the last layer (and it might be better to use relu in the previous layers).
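To see why a sigmoid output layer cannot reconstruct such data, note that the sigmoid squashes every value into the range 0 to 1. A small NumPy sketch with a made-up signal (the target values are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up sensor-like targets, most of which fall outside [0, 1].
target = np.array([-2.5, -0.3, 0.4, 1.8, 9.81])

# Even very large pre-activations cannot push the output beyond the 0..1 range.
best_possible = sigmoid(np.array([-50.0, -50.0, 0.0, 50.0, 50.0]))
print(best_possible.min(), best_possible.max())

# Lower bound on the absolute error for each target with a 0..1 output.
unavoidable_error = np.abs(target - np.clip(target, 0.0, 1.0))
print(unavoidable_error)
```

With a linear output layer this bound disappears, since the output can take any real value.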
Also normalize the training data. You can use feature-wise normalization:
X_mean = X_train.mean(axis=0)
X_train -= X_mean
X_std = X_train.std(axis=0)
X_train /= X_std + 1e-8
And don't forget to use the computed statistics (X_mean and X_std) in inference time (i.e. testing) to normalize test data.
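A minimal sketch of that last point, using made-up arrays in place of the real `X_train` and `X_test`: the mean and standard deviation are computed on the training data only and then reused for the test data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data standing in for the flattened (samples, 768) arrays.
X_train = rng.normal(loc=5.0, scale=3.0, size=(1000, 768))
X_test = rng.normal(loc=5.0, scale=3.0, size=(200, 768))

# Statistics come from the training set only.
X_mean = X_train.mean(axis=0)
X_std = X_train.std(axis=0)

X_train_norm = (X_train - X_mean) / (X_std + 1e-8)
# Reuse the training statistics at test time.
X_test_norm = (X_test - X_mean) / (X_std + 1e-8)

print(X_train_norm.mean(), X_train_norm.std())
print(X_test_norm.mean(), X_test_norm.std())
```

The test statistics will be close to, but not exactly, zero mean and unit variance; that is expected, because they are scaled with the training set's values.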