Keras - unable to reduce loss between epochs - python

I am training a VGG-like convnet (like in the example http://keras.io/examples/) with a set of images. I convert images to arrays and resize them using scipy:
mapper = [] # list of photo ids
data = np.empty((NB_FILES, 3, 100, 100)).astype('float32')
i = 0
for f in onlyfiles[:NB_FILES]:
    img = load_img(mypath + f)
    a = img_to_array(img)
    a_resize = np.empty((3, 100, 100))
    a_resize[0,:,:] = sp.misc.imresize(a[0,:,:], (100,100)) / 255.0 # - 0.5
    a_resize[1,:,:] = sp.misc.imresize(a[1,:,:], (100,100)) / 255.0 # - 0.5
    a_resize[2,:,:] = sp.misc.imresize(a[2,:,:], (100,100)) / 255.0 # - 0.5
    photo_id = int(f.split('.')[0])
    mapper.append(photo_id)
    data[i, :, :, :] = a_resize
    i += 1
In the last dense layer I have 2 neurons and I activate with softmax. Here are the last lines:
model.add(Dense(2))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(data, target_matrix, batch_size=32, nb_epoch=2, verbose=1, show_accuracy=True, validation_split=0.2)
I am not able to reduce the loss: every epoch has the same loss and the same accuracy as the one before. The loss actually goes up between the 1st and 2nd epoch:
Train on 1600 samples, validate on 400 samples
Epoch 1/5
1600/1600 [==============================] - 23s - loss: 3.4371 - acc: 0.7744 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 2/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 3/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 4/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 5/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
What am I doing wrong?

In my experience, this often happens when the learning rate is too high.
The optimizer is then unable to settle into a minimum and just keeps "turning around" it.
The ideal rate depends on your data and on the architecture of your network.
(As a reference, I'm currently running a convnet with 8 layers on a sample size similar to yours, and I saw the same lack of convergence until I reduced the learning rate to 0.001.)
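To make the "turning around" concrete, here is a minimal sketch on a toy problem rather than the network from the question: plain gradient descent on f(x) = x**2. The function name `gradient_descent` and the two rates are illustrative assumptions. With a step that is too large, the iterate jumps from one side of the minimum to the other and the loss never changes, which resembles the constant loss in the log above; with a smaller step the loss decreases.

```python
def gradient_descent(lr, x0=1.0, steps=5):
    """Run plain gradient descent on f(x) = x**2 and return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2.0 * x          # derivative of x**2
        x = x - lr * grad       # standard gradient step
        losses.append(x * x)    # loss after the update
    return losses

# lr = 1.0 is too high for this toy problem: x flips between -1 and +1,
# so the loss stays at 1.0 on every step.
too_high = gradient_descent(lr=1.0)

# lr = 0.1 shrinks x by a factor of 0.8 per step, so the loss decreases.
smaller = gradient_descent(lr=0.1)

print(too_high)
print(smaller)
```

In the question's setup this would correspond to trying a smaller `lr` in `SGD(...)`, for example 0.01 or 0.001, and checking whether the loss starts to move.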

My suggestions would be to reduce the learning rate and to try data augmentation.
Data augmentation code:
print('Using real-time data augmentation.')
# this will do preprocessing and realtime data augmentation
datagen = ImageDataGenerator(
    featurewise_center=False, # set input mean to 0 over the dataset
    samplewise_center=False, # set each sample mean to 0
    featurewise_std_normalization=False, # divide inputs by std of the dataset
    samplewise_std_normalization=False, # divide each input by its std
    zca_whitening=True, # apply ZCA whitening
    rotation_range=90, # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
    horizontal_flip=True, # randomly flip images horizontally
    vertical_flip=False) # randomly flip images vertically
# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)
# fit the model on the batches generated by datagen.flow()
model.fit_generator(datagen.flow(X_train, Y_train,
                                 batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=nb_epoch)

Related

Why isn't my CNN model for binary classification learning?

I have been given 10000 images of shape (100,100), representing detection of particles, I have then created 10000 empty images of shape (100,100) and mixed them together. I have given each respective type labels of 0 and 1, seen in the code here:
Labels = np.append(np.ones(10000),np.zeros(empty_sheets.shape[0]))
images_scale1 = np.zeros(s)
# scaling each image so that it has a maximum value of 1
l = s[0]
for i in range(l):
    images_scale1[i] = images[i]/np.amax(images[i])
empty_sheets_noise1 = add_noise(empty_sheets,0)
scale1noise1 = np.concatenate((images_scale1,empty_sheets_noise1),axis=0)
y11 = Labels
scale1noise1s, y11s = shuffle(scale1noise1, y11)
scale1noise1s_train, scale1noise1s_test, y11s_train, y11s_test = train_test_split(
scale1noise1s, y11, test_size=0.25)
#reshaping image arrays so that they can be passed through CNN
scale1noise1s_train = scale1noise1s_train.reshape(scale1noise1s_train.shape[0],100,100,1)
scale1noise1s_test = scale1noise1s_test.reshape(scale1noise1s_test.shape[0],100,100,1)
y11s_train = y11s_train.reshape(y11s_train.shape[0],1)
y11s_test = y11s_test.reshape(y11s_test.shape[0],1)
Then to set up my model I create a new function:
def create_model():
    # initiates new model
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(64, (3,3),activation='relu',input_shape=(100,100,1)))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(32))
    model.add(keras.layers.Dense(64))
    model.add(keras.layers.Dense(1,activation='sigmoid'))
    return model
estimators1m1 = create_model()
estimators1m1.compile(optimizer='adam', metrics=['accuracy', tf.keras.metrics.Precision(),
tf.keras.metrics.Recall()], loss='binary_crossentropy')
history = estimators1m1.fit(scale1noise1s_train, y11s_train, epochs=3,
validation_data=(scale1noise1s_test, y11s_test))
which produces the following:
Epoch 1/3
469/469 [==============================] - 62s 131ms/step - loss: 0.6939 - accuracy: 0.4917 - precision_2: 0.4905 - recall_2: 0.4456 - val_loss: 0.6933 - val_accuracy: 0.5012 - val_precision_2: 0.5012 - val_recall_2: 1.0000
Epoch 2/3
469/469 [==============================] - 63s 134ms/step - loss: 0.6889 - accuracy: 0.5227 - precision_2: 0.5209 - recall_2: 0.5564 - val_loss: 0.6976 - val_accuracy: 0.4994 - val_precision_2: 0.5014 - val_recall_2: 0.2191
Epoch 3/3
469/469 [==============================] - 59s 127ms/step - loss: 0.6527 - accuracy: 0.5783 - precision_2: 0.5764 - recall_2: 0.5887 - val_loss: 0.7298 - val_accuracy: 0.5000 - val_precision_2: 0.5028 - val_recall_2: 0.2131
I have tried more epochs and I still only manage to get 50% validation accuracy, which is useless as it's just predicting the same thing constantly.
There can be many reasons why your model is not working. The most likely one seems to be under-fitting: accuracy is low on both the training set and the validation set, which suggests the network is unable to capture the pattern in the data. Consider building a slightly more complex model by adding more layers, while guarding against over-fitting with techniques like dropout. Hyperparameter tuning can also help you find better settings.
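One further thing that may be worth checking, independent of model capacity: in the question's code the data is shuffled into `scale1noise1s, y11s`, but `train_test_split` is then called with the unshuffled `y11`. If that is what actually ran, images and labels would no longer line up, which by itself could keep validation accuracy near 50%. The sketch below uses only the standard library and toy data (all variable names are hypothetical) to show the effect.

```python
import random

random.seed(0)

# Toy stand-in for the dataset: each "image" is simply its own label,
# so any image/label mismatch is directly visible.
labels = [1] * 1000 + [0] * 1000
images = list(labels)

# Shuffle images and labels together so that pairs stay aligned.
pairs = list(zip(images, labels))
random.shuffle(pairs)
images_shuffled = [img for img, _ in pairs]
labels_shuffled = [lab for _, lab in pairs]

def agreement(xs, ys):
    """Fraction of positions where the image still matches its label."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

# Shuffled images with shuffled labels: still perfectly aligned.
aligned = agreement(images_shuffled, labels_shuffled)

# Shuffled images with the ORIGINAL label order: alignment is lost and
# only about half of the labels are right by chance.
misaligned = agreement(images_shuffled, labels)

print(aligned, misaligned)
```

If this applies, passing `y11s` instead of `y11` to `train_test_split` would be the first thing to try.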

tensorflow sample sizes for epochs

I have a dataset of 50000 items: reviews & sentiment (positive or negative)
I distributed 90% to the training set and the rest to the testing set.
My question is, if I run 5 epochs on the training set that I have, shouldn't each epoch load 9000 instead of 1407?
# to divide train & test sets
test_sample_size = int(0.1*len(preprocessed_reviews)) # 10% of data as the validation set
# for sentiment
sentiment = [1 if x=='positive' else 0 for x in sentiment]
# separate data to train & test sets
X_test, X_train = (np.array(preprocessed_reviews[:test_sample_size]),
np.array(preprocessed_reviews[test_sample_size:])
)
y_test, y_train = (np.array(sentiment[:test_sample_size]),
np.array(sentiment[test_sample_size:])
)
tokenizer = Tokenizer(oov_token='<OOV>') # for the unknown words
tokenizer.fit_on_texts(X_train)
vocab_count = len(tokenizer.word_index) + 1 # +1 is for padding
training_sequences = tokenizer.texts_to_sequences(X_train) # tokenizer.word_index to see indexes
training_padded = pad_sequences(training_sequences, padding='post') # pad sequences with 0s
training_normal = preprocessing.normalize(training_padded) # normalize data
testing_sequences = tokenizer.texts_to_sequences(X_test)
testing_padded = pad_sequences(testing_sequences, padding='post')
testing_normal = preprocessing.normalize(testing_padded)
input_length = len(training_normal[0]) # length of all sequences
# build a model
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=vocab_count, output_dim=2,input_length=input_length))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(63, activation='relu')) # hidden layer
model.add(keras.layers.Dense(16, activation='relu')) # hidden layer
model.add(keras.layers.Dense(1, activation='sigmoid')) # output layer
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(training_normal, y_train, epochs=5)
Output:
Epoch 1/5
1407/1407 [==============================] - 9s 7ms/step - loss: 0.6932 - accuracy: 0.4992
Epoch 2/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.5030
Epoch 3/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.4987
Epoch 4/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.5024
Epoch 5/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.5020
Sorry I'm quite new to tensorflow, I hope someone could help out!
If you have around 50,000 data points split 90/10 (train/test), then ~45,000 are training data and the remaining ~5,000 are for testing.
When you call fit, Keras uses a default batch_size of 32 (you can always change that to 64, 128, ...).
So the number 1407 counts batches, not samples: the model performs 1407 forward and backpropagation steps before one full epoch is complete (because 1407 * 32 ≈ 45,000).
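The arithmetic can be checked directly. This is a small sketch assuming exactly 45,000 training samples and the default batch size of 32; the variable names are illustrative.

```python
import math

train_samples = 45_000   # 90% of 50,000 reviews
batch_size = 32          # Keras default for model.fit

# The last batch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # 1407
```

Passing a different `batch_size` to `fit` changes the step count accordingly, while every epoch still sees all training samples.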

Why is my model overfitting on the second epoch?

I'm a beginner in deep learning and I'm trying to train a model to classify different ASL hand signs using Mobilenet_v2 and Inception.
Here is my code, which uses an ImageDataGenerator to create the training and validation sets.
# Reformat Images and Create Batches
IMAGE_RES = 224
BATCH_SIZE = 32
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
validation_split = 0.4
)
train_generator = datagen.flow_from_directory(
base_dir,
target_size = (IMAGE_RES,IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'training'
)
val_generator = datagen.flow_from_directory(
base_dir,
target_size= (IMAGE_RES, IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'validation'
)
Here is the code to train the model:
# Do transfer learning with Tensorflow Hub
URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
input_shape=(IMAGE_RES, IMAGE_RES, 3))
# Freeze pre-trained model
feature_extractor.trainable = False
# Attach a classification head
model = tf.keras.Sequential([
feature_extractor,
layers.Dense(5, activation='softmax')
])
model.summary()
# Train the model
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
EPOCHS = 5
history = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=EPOCHS,
validation_data = val_generator,
validation_steps=len(val_generator)
)
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation but the model still overfits so I'm wondering if I've done something wrong in my code.
Your dataset is very small. Try splitting it with different random seeds and check whether the problem persists.
If it does, add regularization and reduce the complexity of the network.
Also experiment with different optimizers and a smaller learning rate (for example with a learning-rate scheduler).
It seems like your dataset is very small, with some distinct true outputs separated only by a small distance in input space. That is why the model fits those points so easily.
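As one way to try the smaller learning rate suggested above, here is a minimal sketch of a step-decay schedule. The `(epoch, lr)` signature matches what `tf.keras.callbacks.LearningRateScheduler` passes to its schedule function; the name `step_decay` and the values of `drop` and `every` are assumptions chosen for illustration, not tuned settings.

```python
def step_decay(epoch, lr, drop=0.5, every=2):
    """Multiply the learning rate by `drop` every `every` epochs.

    `drop` and `every` are illustrative values, not recommendations.
    """
    if epoch > 0 and epoch % every == 0:
        return lr * drop
    return lr

# Simulate five epochs starting from Adam's default learning rate of 1e-3.
lr = 1e-3
history = []
for epoch in range(5):
    lr = step_decay(epoch, lr)
    history.append(lr)

print(history)

# With TensorFlow installed, the schedule could be attached like this:
# callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
# model.fit(train_generator, epochs=5, callbacks=[callback])
```

Whether this narrows the gap between training and validation accuracy depends on the data; it is only one of the knobs mentioned above.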

Keras neural network to predict change in angle of a particle is not predicting correctly

I have put together a Keras regression model to predict the change in angle of a single particle when supplied with data about that particle. To acquire the data, I created a program that models Brownian motion between n particles. In addition to random angular noise, the particles induce a change in each other's angle depending on how close together they are.
It is not too important how my code works, but essentially it outputs an array containing the x,y coordinates of all particles relative to the single particle, the value of theta of all particles, and the distance between all particles and the single particle. All of these parameters are found at each time step. Each 'image' I use to train the network is all these parameters at some point in time. So overall, the input variables are x, y, angle, and distance, and the output variable is the change in theta of the target particle.
For my neural network I first normalised all my data to be between -1 and 1, and then reshaped it to be fed into the NN:
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
## NORMALIZE IMAGES ##########################################################
# all images and labels imported, so obviously won't run without data. This is
# designed for running data with m iterations, n particles, 4 parameters
# (size of test data array is [m,n,4]).
L = 5
# length of 'box' that houses particles
n = 10
# number of particles
train_images[:,:,0:2] = train_images[:,:,0:2]/L
# normalise [x,y] from -L:L to -1:1.
train_images[:,:,2:3] = train_images[:,:,2:3]/(2*np.pi)
# normalise theta value from -2pi:2pi to -1:1
train_images[:,:,3:4] = (train_images[:,:,3:4]/(L*np.sqrt(2))*2)-1
# normalise distance value from 0:sqrt(2)L to -1:1
test_images[:,:,0:2] = test_images[:,:,0:2]/L
test_images[:,:,2:3] = test_images[:,:,2:3]/(2*np.pi)
test_images[:,:,3:4] = (test_images[:,:,3:4]/(L*np.sqrt(2))*2)-1
## FLATTEN IMAGES ############################################################
train_images = train_images.reshape((-1, 4*(n-1)))
# reshape so each input is a single dimension
# 4*(n-1) due to 4 parameters, and n-1 particles (since one is redundant info)
test_images = test_images.reshape((-1, 4*(n-1)))
## BUILDING THE MODEL ########################################################
model = Sequential([
Dense(64, activation='tanh', input_shape=(4*(n-1),)),
Dense(16, activation='tanh'),
Dropout(0.25),
Dense(1, activation='tanh'),
])
## COMPILING THE MODEL #######################################################
model.compile(
optimizer='adam',
loss='mean_squared_error',
#metrics=['mean_squared_error'],
)
## TRAINING THE MODEL ########################################################
history = model.fit(
train_images, # training data
train_labels, # training targets
epochs=10,
batch_size=32,
#validation_data=(test_images, test_labels),
shuffle=True,
validation_split=0.2,
)
I have used a variety of activation types for the different layers (relu, sigmoid, tanh...), but none seem to give me the correct results. The true values of my data (the change in angle of the particle) are values ranging from about 0.02 to -0.02, but the values I am getting are much smaller, and tend to be predominantly one sign (pos/neg).
I am currently using the loss function 'mean absolute error', as I am looking to minimise the difference between the real and predicted value. I notice when doing this, that after only one epoch the loss is already incredibly tiny:
Epoch 1/10
12495/12495 [==============================] - 13s 1ms/step - loss: 0.0010 - val_loss: 3.3794e-05
Epoch 2/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4491e-05 - val_loss: 3.3769e-05
Epoch 3/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4391e-05 - val_loss: 3.3883e-05
Epoch 4/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4251e-05 - val_loss: 3.4755e-05
Epoch 5/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4183e-05 - val_loss: 3.4273e-05
Epoch 6/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4175e-05 - val_loss: 3.3770e-05
Epoch 7/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4160e-05 - val_loss: 3.3646e-05
Epoch 8/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4131e-05 - val_loss: 3.3629e-05
Epoch 9/10
12495/12495 [==============================] - 14s 1ms/step - loss: 3.4145e-05 - val_loss: 3.3581e-05
Epoch 10/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4148e-05 - val_loss: 3.4647e-05
Here is an example of the results I get from this:
Prediction: 4.8542774e-05
Actual: 0.006994473448353978
Is there anything obviously wrong I have done to get these results? Sorry if I have not provided enough information.
This is a regression problem, so the last layer should not have an activation (i.e. it should be linear). Also decrease the number of units from 32 to 16 in the 1st layer, as this will help prevent overfitting.
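A related point that may explain the tiny loss values: when the targets themselves are tiny, even a model that always predicts 0 has a very small mean squared error. The sketch below uses made-up targets in the ±0.02 range mentioned in the question (not the asker's data) and a hypothetical helper `mse` to illustrate this, and shows how rescaling the targets changes the picture.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets in the +/-0.02 range mentioned in the question.
targets = [0.02, -0.01, 0.007, -0.02, 0.005]
zeros = [0.0] * len(targets)

# A model that always predicts 0 already has a very small MSE,
# so a tiny loss does not by itself mean the fit is good.
baseline = mse(targets, zeros)

# Rescaling the targets to roughly [-1, 1] (dividing by 0.02) makes the
# same baseline error 2500 times larger and easier to interpret.
scaled = [t / 0.02 for t in targets]
baseline_scaled = mse(scaled, zeros)

print(baseline, baseline_scaled)
```

If the targets are rescaled for training, predictions need to be multiplied by the same factor (0.02 here) to get back to the original units.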

Validation Accuracy stuck at .5073

I am trying to create a regression model but my validation accuracy stays at .5073. I am trying to train on images and have the network find the position of an object and the rough area it covers. I increased the unfrozen layers and the plateau for accuracy dropped to .4927. I would appreciate any help finding out what I am doing wrong.
base = MobileNet(weights='imagenet', include_top=False, input_shape=(200,200,3), dropout=.3)
location = base.output
location = GlobalAveragePooling2D()(location)
location = Dense(16, activation='relu', name="locdense1")(location)
location = Dense(32, activation='relu', name="locdense2")(location)
location = Dense(64, activation='relu', name="locdense3")(location)
finallocation = Dense(3, activation='sigmoid', name="finalLocation")(location)
model = Model(inputs=base.input, outputs=finallocation)  # [types, finallocation])
for layer in model.layers[:91]:  # freeze up to 87
    # ('loc' or 'Loc') evaluates to just 'loc', so test both spellings explicitly
    if 'loc' in layer.name or 'Loc' in layer.name:
        layer.trainable = True
    else:
        layer.trainable = False
optimizer = Adam(learning_rate=.001)
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])
history = model.fit(get_batches(type='Train'), validation_data=get_batches(type='Validation'), validation_steps=500, steps_per_epoch=1000, epochs=10)
Data is generated from a tfrecord file which has image data and some labels. This is the last bit of that generator.
IMG_SIZE = 200
def format_position(image, positionx, positiony, width):
    image = tf.cast(image, tf.float32)
    image = (image/127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    labels = tf.stack([positionx, positiony, width])
    return image, labels
Get batches:
The dataset is loaded from two directories of tfrecord files, one for training and the other for validation.
def get_batches(type):
    dataset = load_dataset(type=type)
    if type == 'Train':
        # keep the repeated dataset so that batch() below is applied to it
        dataset = dataset.repeat()
    databatch = dataset.batch(32)
    databatch = databatch.prefetch(2)
    return databatch
positionx, positiony, and width are all normalized to the range 0-1 (relative position with respect to the image).
Here is an example output:
Epoch 1/10
1000/1000 [==============================] - 233s 233ms/step - loss: 0.0267 - accuracy: 0.5833 - val_loss: 0.0330 - val_accuracy: 0.5073
Epoch 2/10
1000/1000 [==============================] - 283s 283ms/step - loss: 0.0248 - accuracy: 0.6168 - val_loss: 0.0337 - val_accuracy: 0.5073
Epoch 3/10
1000/1000 [==============================] - 221s 221ms/step - loss: 0.0238 - accuracy: 0.6309 - val_loss: 0.0312 - val_accuracy: 0.5073
The final activation function in your model should not be sigmoid, since it outputs numbers between 0 and 1 and I am assuming your labels (i.e., positionx, positiony, and width) are not in this range. You could replace it with either 'linear' or 'relu'.
You're doing regression, and your loss function is 'mean_squared_error', so accuracy is not a meaningful metric. Use 'mae' (mean absolute error) or 'mse' instead to measure the difference between your predictions and the actual target values.
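To show what such a metric measures, here is a minimal pure-Python sketch of mean absolute error over three-value outputs. The labels and predictions are hypothetical numbers in [0, 1], and `mean_absolute_error` is a helper written for this example, not a Keras function.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference over all samples and all output values."""
    total, count = 0.0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += abs(t - p)
            count += 1
    return total / count

# Hypothetical (positionx, positiony, width) labels and predictions in [0, 1].
y_true = [[0.50, 0.40, 0.20], [0.10, 0.90, 0.30]]
y_pred = [[0.45, 0.50, 0.20], [0.20, 0.80, 0.35]]

mae = mean_absolute_error(y_true, y_pred)
print(mae)
```

In Keras this corresponds to passing `metrics=['mae']` to `model.compile` instead of `metrics=['accuracy']`; the result reads directly as the average error in normalized image units.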
