Keras LSTM Multi input and output model not update after each epoch - python

I'm very new to keras and I was following this doc to produce a multi-input and multi-output model. However, after each epoch, the results remain the same. Could someone point me out where I got stuck in?
My code is something like
main_input = Input(shape = (maxlen, ), name="main_input")
x = Embedding(94, 64)(main_input) # dic length = 94
lstm_out0 = LSTM(256, activation="relu", dropout=0.1,
recurrent_dropout=0.2, return_sequences=True)(x)
lstm_out = LSTM(256, activation="relu", dropout=0.1, recurrent_dropout=0.2)(lstm_out0)
auxiliary_input = Input(shape=(maxlen,), dtype="int32", name='aux_input')
aux_embed = Embedding(94, 64)(auxiliary_input)
aux_lstm_out = LSTM(256, activation="relu", dropout=0.2, recurrent_dropout=0.2)(aux_embed)
auxiliary_output = Dense(10, activation="softmax", name="aux_output")(lstm_out)
x = keras.layers.concatenate([aux_lstm_out, lstm_out])
x = Dense(64, activation='relu')(x)
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='rmsprop', loss={'main_output': 'binary_crossentropy', 'aux_output': 'categorical_crossentropy'},metrics=['accuracy'])
model.fit([X_train, X_aux_train], [train_label, aux_train_label],
validation_data=[[X_dev, X_aux_dev], [dev_label,aux_dev_label]],
epochs=10, batch_size=batch_size)
The main input is a sequence of chars while the main output is a binary value. The aux input is also a sequence of chars while the aux output is a categorical label.
The output is something like
Train on 200000 samples, validate on 20000 samples
Epoch 1/10
200000/200000 [==============================] - 892s - loss: 7.3824 - main_output_loss: 5.8560 - aux_output_loss: 1.5264 - main_output_acc: 0.5186 - aux_output_acc: 0.5371 - val_loss: 9.5776 - val_main_output_loss: 8.0590 - val_aux_output_loss: 1.5186 - val_main_output_acc: 0.5000 - val_aux_output_acc: 0.5362
Epoch 2/10
200000/200000 [==============================] - 894s - loss: 9.5818 - main_output_loss: 8.0586 - aux_output_loss: 1.5233 - main_output_acc: 0.5000 - aux_output_acc: 0.5372 - val_loss: 9.5771 - val_main_output_loss: 8.0590 - val_aux_output_loss: 1.5181 - val_main_output_acc: 0.5000 - val_aux_output_acc: 0.5362
I ran > 5 epochs and the results are almost all the same. The input data is prepared through features: sequence.pad_sequences label: to_categorical(for multiclass)

Related

Multi-task learning with ANN?

I am trying to implement a simple multi-task learning with the following network:
y_train_target1 = Y_train.iloc[:, 0]
y_test_target1 = Y_test.iloc[:, 0]
y_train_target2 = Y_train.iloc[:, 1]
y_test_target2 = Y_test.iloc[:, 1]
input_dim_train=X_train.shape[1]
#shape of X_train is: (30000,126)
inputs = Input(shape=X_train.shape[1], name='main_input')
main_model = Sequential()
main_model.add(Dense(200, input_dim=input_dim_train, activation='relu'))
main_model.add(Dense(50, input_dim=input_dim_train, activation='relu'))
main_model.add(BatchNormalization())
main_model.add(Dropout(0.4))
main_model.add(Dense(1, activation='softmax'))
model_target1 = Sequential()
model_target2 = Sequential()
model_target1.add(main_model)
model_target2.add(main_model)
model_target1.add(Dense(1, activation='softmax', name='target1_output'))
model_target2.add(Dense(1, activation='softmax', name='target2_output'))
model_share = Model(inputs = inputs,outputs = [model_target1, model_target2])
model_share.summary()
But I face the following error, when I run Model(inputs = inputs,outputs = [model_target1, model_target2]) line:
ValueError: Output tensors of a Functional model must be the output of a TensorFlow `Layer` (thus holding past layer metadata). Found: <keras.engine.sequential.Sequential object at 0x00000214980FE5B0>
Any idea to handle this issue?
If the objective is to share the layers between the two outputs then you can write your code as shown below, see also this answer.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization
from tensorflow.keras.models import Model
np.random.seed(0)
tf.random.set_seed(0)
# generate the features
x = np.random.normal(0, 1, (100, 10))
# generate the targets
y1 = np.mean(x, axis=1)
y2 = np.median(x, axis=1)
# define the shared layers
d1 = Dense(200, activation='relu')
d2 = Dense(50, activation='relu')
d3 = BatchNormalization()
d4 = Dropout(0.4)
# create a function for applying the shared layers
def nn(x, name):
y = d1(x)
y = d2(y)
y = d3(y)
y = d4(y)
y = Dense(1, name=name)(y)
return y
# create the model
inputs = Input(shape=x.shape[1], name='common_input')
output1 = nn(inputs, name='target_1')
output2 = nn(inputs, name='target_2')
model = Model(inputs=inputs, outputs=[output1, output2])
model.compile(optimizer='adam', loss='mse')
# train the model
model.fit(x, [y1, y2], epochs=5)
# Epoch 1/5
# 4/4 [==============================] - 1s 1ms/step - loss: 5.3587 - target_1_loss: 2.7805 - target_2_loss: 2.5782
# Epoch 2/5
# 4/4 [==============================] - 0s 1ms/step - loss: 3.8924 - target_1_loss: 1.8996 - target_2_loss: 1.9927
# Epoch 3/5
# 4/4 [==============================] - 0s 970us/step - loss: 2.8755 - target_1_loss: 1.4582 - target_2_loss: 1.4173
# Epoch 4/5
# 4/4 [==============================] - 0s 943us/step - loss: 2.6111 - target_1_loss: 1.2023 - target_2_loss: 1.4088
# Epoch 5/5
# 4/4 [==============================] - 0s 910us/step - loss: 2.6412 - target_1_loss: 1.1902 - target_2_loss: 1.4510
# generate the model predictions
y1_pred, y2_pred = model.predict(x)
print(y1_pred)
# [[0.3716803 ]
# [0.22038066]
# [0.2840684 ]
# [0.09253158]
# [0.21785215]
# ...
print(y2_pred)
# [[ 0.17823327]
# [ 0.10360342]
# [ 0.12475234]
# [-0.04125798]
# [-0.25730723]
# ...

Using CNN to get value from dartboard where dart landed

I want to use CNN in python to get values from dartboard (or the value of the field where dart landed) using pictures.
I took 208 photos of dartboard, in each dart is in specific location. I want to predict if the dart in next image is in specific field (208 pictures represent 4 classes/52 each) (single, double and triple from same field represent same number or in our case, same class.
sample dart in a field
Then i use similar picture to test model.
When I try to fit model I get something like this
208/208 [==============================] - 3s 15ms/sample - loss: 0.0010 - accuracy: 1.0000 - val_loss: 8.1726 - val_accuracy: 0.2500
Epoch 29/100
208/208 [==============================] - 3s 15ms/sample - loss: 9.8222e-04 - accuracy: 1.0000 - val_loss: 8.6713 - val_accuracy: 0.2500
Epoch 30/100
208/208 [==============================] - 3s 15ms/sample - loss: 8.5902e-04 - accuracy: 1.0000 - val_loss: 9.2214 - val_accuracy: 0.2500
Epoch 31/100
208/208 [==============================] - 3s 15ms/sample - loss: 7.9463e-04 - accuracy: 1.0000 - val_loss: 9.6584 - val_accuracy: 0.2500
As the accuracy hits 1 the val_accuracy stays the same, some previous model got me a little better result, but it was little better than this.
As I am new in the field I need some advice to get my model or whole program better.
Here is my current model_
model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=(640, 480, 3)))
model.add(MaxPooling2D(2, 2))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(128, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(256, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
history = model.fit(X, y, batch_size=16, epochs=100, validation_data=(Xtest,ytest))
AND MY SAMPLE PROGRAM
training_data = []
DATADIR = 'C:/PikadaNew'
dir = sorted(os.listdir(DATADIR), key=len)
def create_training_data():
for category in dir: # do dogs and cats
path = os.path.join(DATADIR,category)
class_num = dir.index(category)
for img in tqdm(os.listdir(path)):
try:
img_array = cv2.imread(os.path.join(path,img))
training_data.append([img_array, class_num])
except Exception as e:
pass
create_training_data()
DATATESTDIR = 'C:/PikadaNewTest'
dir1 = sorted(os.listdir(DATATESTDIR), key=len)
test_data = []
def create_test_data():
for category in dir1:
path = os.path.join(DATATESTDIR,category)
class_num = dir1.index(category)
for img in tqdm(os.listdir(path)):
try:
img_array = cv2.imread(os.path.join(path,img)) # convert to array
test_data.append([img_array, class_num])
except Exception as e:
pass
create_test_data()
#print(len(training_data))
#print(len(test_data))
X = []
y = []
Xtest = []
ytest = []
for features,label in training_data:
X.append(features)
y.append(label)
for features,label in test_data:
Xtest.append(features)
ytest.append(label)
X = np.array(X).reshape(-1, 640, 480, 3)
Xtest= np.array(Xtest).reshape(-1, 640, 480, 3)
y = np.array(y)
ytest = np.array(ytest)
y = to_categorical(y)
ytest = to_categorical(ytest)
X = X/255.0
Xtest = Xtest/255.0
X,y = shuffle(X,y)
Xtest,ytest = shuffle(Xtest,ytest)
Thanks and sorry for mistakes, i hope its understandable what i wanna to achieve
Every advice is much appreciated
Samo
You are facing an overfitting problem because your data are so small and the model in more complex than needed. you can try the following:
Add more data if you can.
Try to simplify the model by removing some layers.
Add dropout to the model and use regularizes.
Use smaller number of epochs.

CNN Acc and Loss constant with custom image data

I am trying to train a simple CNN model for a binary classification task in Keras with a dataset of images I mined. The problem is that I am getting constant accuracy, val_accuracy and loss after a couple of epochs. Am I processing the data the wrong way? Or is it something in the model settings?
At the beginning I was using softmax as the final activation function and categorical cossentropy, I was also using the to_categorical function on the labels.
After reading up on what usually causes this to happen I decided to use sigmoid and binary_crossentropy instead and not use to_categorical. Still the problem persists and I am starting to wonder whether it's my data the problem (the two classes are too similar) or the way I am feeding the image arrays.
conkeras1 = []
pics = os.listdir("/Matrices/")
# I do this for the images of both classes, just keeping it short.
for x in range(len(pics)):
img = image.load_img("Matrices/"+pics[x])
conkeras1.append(img)
conkeras = conkeras1+conkeras2
conkeras = np.array([image.img_to_array(x) for x in conkeras]).astype("float32")
conkeras = conkeras / 255 # I also tried normalizing with a z-score with no success
yecs1 = [1]*len(conkeras1)
yecs2 = [0]*len(conkeras2)
y_train = yecs1+yecs2
y_train = np.array(y_train).astype("float32")
model = Sequential([
Conv2D(64, (3, 3), input_shape=conkeras.shape[1:], padding="same", activation="relu"),
Conv2D(32, (3, 3), activation="relu", padding="same"),
Flatten(),
Dense(500, activation="relu"),
#Dense(4096, activation="relu"),
Dense(1, activation="sigmoid")
])
model.compile(loss=keras.losses.binary_crossentropy,
optimizer=keras.optimizers.Adam(lr=0.001),
metrics=['accuracy'])
history = model.fit(conkeras, y_train,
batch_size=32,
epochs=32, shuffle=True,
verbose=1,
callbacks=[tensorboard])
The output I get is this:
975/975 [==============================] - 107s 110ms/step - loss: 8.0022 - acc: 0.4800
Epoch 2/32
975/975 [==============================] - 99s 101ms/step - loss: 8.1756 - acc: 0.4872
Epoch 3/32
975/975 [==============================] - 97s 100ms/step - loss: 8.1756 - acc: 0.4872
Epoch 4/32
975/975 [==============================] - 97s 99ms/step - loss: 8.1756 - acc: 0.4872
and these are the shapes of the traning set and labels
>>> conkeras.shape
(975, 100, 100, 3)
>>> y_train.shape
(975,)

Deep learning: Training set tends to be good and Validation set is bad

I am facing to a problem for which I have difficulties to understand why I have such behaviour.
I am trying to use a pre-trained resnet 50 (keras) model for a binary image classification, I also built a simple cnn. I have about 8k balanced RGB images of size 200x200 and I divided this set into three sub-sets (train 70%, validation 15%, test 15%).
I built a generator to feed data to my models based on keras.utils.Sequence.
The problem that I have is my models tends to learn on the training set but on validation set I have poor results on pre-trained resnet50 and on simple cnn.
I tried several things to solve this problem but Not improvement at all.
With and without Data augmentation on training set (rotation)
Images are normalised between [0,1]
With and without Regularizers
Variation of the learning rate
This is an example of results obtained:
Epoch 1/200
716/716 [==============================] - 320s 447ms/step - loss: 8.6096 - acc: 0.4728 - val_loss: 8.6140 - val_acc: 0.5335
Epoch 00001: val_loss improved from inf to 8.61396, saving model to ../models_saved/resnet_adam_best.h5
Epoch 2/200
716/716 [==============================] - 287s 401ms/step - loss: 8.1217 - acc: 0.5906 - val_loss: 10.9314 - val_acc: 0.4632
Epoch 00002: val_loss did not improve from 8.61396
Epoch 3/200
716/716 [==============================] - 249s 348ms/step - loss: 7.5357 - acc: 0.6695 - val_loss: 11.1432 - val_acc: 0.4657
Epoch 00003: val_loss did not improve from 8.61396
Epoch 4/200
716/716 [==============================] - 284s 397ms/step - loss: 7.5092 - acc: 0.6828 - val_loss: 10.0665 - val_acc: 0.5351
Epoch 00004: val_loss did not improve from 8.61396
Epoch 5/200
716/716 [==============================] - 261s 365ms/step - loss: 7.0679 - acc: 0.7102 - val_loss: 4.2205 - val_acc: 0.5351
Epoch 00005: val_loss improved from 8.61396 to 4.22050, saving model to ../models_saved/resnet_adam_best.h5
Epoch 6/200
716/716 [==============================] - 285s 398ms/step - loss: 6.9945 - acc: 0.7161 - val_loss: 10.2276 - val_acc: 0.5335
....
This is classes used to load data into my models.
class DataGenerator(keras.utils.Sequence):
def __init__(self, inputs,
labels, img_size,
input_shape,
batch_size, num_classes,
validation=False):
self.inputs = inputs
self.labels = labels
self.img_size = img_size
self.input_shape = input_shape
self.batch_size = batch_size
self.num_classes = num_classes
self.validation = validation
self.indexes = np.arange(len(self.inputs))
self.inc = 0
def __getitem__(self, index):
"""Generate one batch of data
Parameters
----------
index :the index from which batch will be taken
Returns
-------
out : a tuple that contains (inputs and labels associated)
"""
batch_inputs = np.zeros((self.batch_size, *self.input_shape))
batch_labels = np.zeros((self.batch_size, self.num_classes))
# Generate data
for i in range(self.batch_size):
# choose random index in features
if self.validation:
index = self.indexes[self.inc]
self.inc += 1
if self.inc == len(self.inputs):
self.inc = 0
else:
index = random.randint(0, len(self.inputs) - 1)
batch_inputs[i] = self.rgb_processing(self.inputs[index])
batch_labels[i] = to_categorical(self.labels[index], num_classes=self.num_classes)
return batch_inputs, batch_labels
def __len__(self):
"""Denotes the number of batches per epoch
Returns
-------
out : number of batches per epochs
"""
return int(np.floor(len(self.inputs) / self.batch_size))
def rgb_processing(self, path):
img = load_img(path)
rgb = img.get_rgb_array()
if not self.validation:
if random.choice([True, False]):
rgb = random_rotation(rgb)
return rgb/np.max(rgb)
class Models:
def __init__(self, input_shape, classes):
self.input_shape = input_shape
self.classes = classes
pass
def simpleCNN(self, optimizer):
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=self.input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(self.classes), activation='softmax'))
model.compile(loss=keras.losses.binary_crossentropy,
optimizer=optimizer,
metrics=['accuracy'])
return model
def resnet50(self, optimizer):
model = keras.applications.resnet50.ResNet50(include_top=False,
input_shape=self.input_shape,
weights='imagenet')
model.summary()
model.layers.pop()
model.summary()
for layer in model.layers:
layer.trainable = False
output = Flatten()(model.output)
#I also tried to add dropout layers here with batch normalization but it does not change results
output = Dense(len(self.classes), activation='softmax')(output)
finetuned_model = Model(inputs=model.input,
outputs=output)
finetuned_model.compile(optimizer=optimizer,
loss=keras.losses.binary_crossentropy,
metrics=['accuracy'])
return finetuned_model
This is how these functions are called:
train_batches = DataGenerator(inputs=train.X.values,
labels=train.y.values,
img_size=img_size,
input_shape=input_shape,
batch_size=batch_size,
num_classes=len(CLASSES))
validate_batches = DataGenerator(inputs=validate.X.values,
labels=validate.y.values,
img_size=img_size,
input_shape=input_shape,
batch_size=batch_size,
num_classes=len(CLASSES),
validation=True)
if model_name == "cnn":
model = models.simpleCNN(optimizer=Adam(lr=0.0001))
elif model_name == "resnet":
model = models.resnet50(optimizer=Adam(lr=0.0001))
early_stopping = EarlyStopping(patience=15)
checkpointer = ModelCheckpoint(output_name + '_best.h5', verbose=1, save_best_only=True)
history = model.fit_generator(train_batches, steps_per_epoch=num_train_steps, epochs=epochs,
callbacks=[early_stopping, checkpointer], validation_data=validate_batches,
validation_steps=num_valid_steps)
I finally found the principal element that causes this over-fitting. Since I use a pre-trained model. I was set layers as non-trainable. Thus I tried to put them as trainable and It seems that it solves the problem.
for layer in model.layers:
layer.trainable = False
My hypothesis is that my images are too far away from data used to train the model.
I also added some dropouts and batch normalization at the end of the resnet model.

Negative Loss function when using cross_val_score in sklearn/keras. Works when not using k fold

I am trying to implement a model which takes in an array of 167 categorical variables (0 or 1), and outputs an estimated value between 0 and 1. Over 300 datapoints are available.
The model works when using a basic model, below:
classifier = Sequential()
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu', input_dim = 167))
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 200)
y_pred = classifier.predict(X_test)
Output is similar to:
Epoch 105/200
253/253 [==============================] - 0s - loss: 0.5582 - acc: 0.0079
Epoch 106/200
253/253 [==============================] - 0s - loss: 0.5583 - acc: 0.0079
Unfortunately, when I try to use cross validation, model stops working, and loss function becomes large and negative. Code is below:
def build_classifier():
classifier = Sequential()
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu', input_dim = 167))
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
return classifier
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 100)
accuracies = cross_val_score(estimator = classifier, X=X_train, y=y_train, cv=3,n_jobs=1)
Output looks like:
Epoch 59/100
168/168 [==============================] - 0s - loss: -1106.9519 - acc: 0.0060
Epoch 60/100
168/168 [==============================] - 0s - loss: -1106.9519 - acc: 0.0060
I have toyed with different parameters, but I cannot seem to find what is causing the issue. Still learning, so any help is very appreciated.
It can happen if the data is sparse. A lot NaNs and Infs can cause this problem. If you are doing a 3 fold validation, it may be possible that in one of the folds, the data selected doesn't contain enough info. Possible solutions can be:
Change the random seed.
Increase the data set.

Categories