keras BLSTM for sequence labeling - python

I'm relatively new to neural nets, so please excuse my ignorance. I'm trying to adapt the Keras BLSTM example here. The example reads in texts and classifies them as 0 or 1. I want a BLSTM that does something very much like POS tagging, though extras like lemmatizing or other advanced features are not necessary; I just want a basic model. My data is a list of sentences, and each word is given a category 1-8. I want to train a BLSTM that can use this data to predict the category for each word in an unseen sentence.
e.g. input = ['The', 'dog', 'is', 'red'] gives output = [2, 4, 3, 7]
If the keras example is not the best route, I'm open to other suggestions.
I currently have this:
'''Train a Bidirectional LSTM.'''
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers import Dense, Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan
np.random.seed(1337) # for reproducibility
max_features = 20000
batch_size = 16
maxlen = 18
print('Loading data...')
(X_train, y_train), (X_test, y_test) = prep_scan(nb_words=max_features,
                                                 test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
print("Pad sequences (samples x time)")
# type issues here? float/int?
X_train = sequence.pad_sequences(X_train, value=0.)
X_test = sequence.pad_sequences(X_test, value=0.) # pad with zeros
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
# need to pad y too, because there is more than 1 output value; not classification?
y_train = sequence.pad_sequences(np.array(y_train), value=0.)
y_test = sequence.pad_sequences(np.array(y_test), value=0.)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')
# this embedding layer will transform the sequences of integers
# into vectors of size 128
embedded = Embedding(max_features, 128, input_length=maxlen)(sequence)
# apply forwards LSTM
forwards = LSTM(64)(embedded)
# apply backwards LSTM
backwards = LSTM(64, go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.5)(merged)
# the number after Dense has to correspond to the output matrix?
output = Dense(17, activation='sigmoid')(after_dp)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=4,
          validation_data=[X_test, y_test])
X_test_new = np.array([[0,0,0,0,0,0,0,0,0,12,3,55,4,34,5,45,3,9],
                       [0,0,0,0,0,0,0,1,7,65,34,67,34,23,24,67,54,43]])
classes = model.predict(X_test_new, batch_size=16)
print(classes)
My output has the right dimensions, but it's giving me floats between 0 and 1. I think this is because it's still set up for binary classification. Does anyone know how to fix this?
SOLVED
Just make sure the labels are each binary (one-hot) arrays:
(X_train, y_train), (X_test, y_test), maxlen, word_ids, tags_ids = prep_model(
    nb_words=nb_words, test_len=75)
W = (y_train > 0).astype('float')
print(len(X_train), 'train sequences')
print(int(len(X_train)*val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')
# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')
# this embedding layer will transform the sequences of integers
# into vectors of size 256
embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen, mask_zero=True)(sequence)
# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
                 go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.15)(merged)
# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
    Dense(output_dim=nb_classes,
          activation='softmax'))(after_dp)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              sample_weight_mode='temporal')
print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split,
          sample_weight=W)

Solved. The main issue was reshaping the classification categories as binary (one-hot) arrays. I also used TimeDistributed and set return_sequences to True.
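For anyone hitting the same shape problem, here is a minimal sketch (my own, not the prep_model code) of turning per-word integer tags into those binary arrays:
import numpy as np
from keras.preprocessing import sequence

tag_seqs = [[2, 4, 3, 7], [1, 8, 5]]  # one integer tag per word
maxlen = 18
nb_classes = 9  # tags 1-8 plus the padding index 0

y = sequence.pad_sequences(tag_seqs, maxlen=maxlen, value=0)
# one-hot encode every timestep: shape becomes (samples, maxlen, nb_classes)
y = np.eye(nb_classes, dtype='float32')[y]
print(y.shape)  # (2, 18, 9)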

I know this thread is very old, but I hope I can help.
I modified the model for binary classification:
sequence = Input(shape=(X_train.shape[1],), dtype='int32')
embedded = Embedding(max_features, embed_dim, input_length=X_train.shape[1], mask_zero=True)(sequence)
# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = concatenate([forwards, backwards])
after_dp = Dropout(0.15)(merged)
# now add an LSTM layer without return_sequences to collapse the sequence
lstm_normal = LSTM(hidden)(after_dp)  # feed the dropout output (it was unused before)
# TimeDistributed for sequence
# change activation to sigmoid?
#output = TimeDistributed(Dense(output_dim=2,activation='sigmoid'))(after_dp)
# I changed the output layer from TimeDistributed to a plain Dense because of the dimensionality problem, and set output_dim=1 (binary output)
output = Dense(output_dim=1,activation='sigmoid')(lstm_normal)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
# I changed the compile to binary_crossentropy and removed the sample_weight_mode parameter
model.compile(loss='binary_crossentropy',
              metrics=['accuracy'], optimizer='adam')
print(model.summary())
###################################
# this is the training call
model.fit(X_train, Y_train,
          batch_size=128,
          epochs=10,
          shuffle=True,
          validation_split=0.2,
          #sample_weight=W
          )
# At this point it works fine:
Train on 536000 samples, validate on 134000 samples
Epoch 1/10
536000/536000 [==============================] - 1814s 3ms/step - loss: 0.4794 - acc: 0.7679 - val_loss: 0.4624 - val_acc: 0.7784
Epoch 2/10
536000/536000 [==============================] - 1829s 3ms/step - loss: 0.4502 - acc: 0.7857 - val_loss: 0.4551 - val_acc: 0.7837
Epoch 3/10
99584/536000 [====>.........................] - ETA: 23:10 - loss: 0.4291 - acc: 0.7980

Related

How do I prevent loss and accuracy from restarting every time I start a new training session with Keras?

I am learning how to build an image classifier by creating a CNN in Keras. How do I prevent my program from starting over every time I start a new training session? Code is below, including some commented-out code of things that I have tried. Thanks!
I have tried to create checkpoints, but the accuracy and loss still seem to reset. I have also tried model.save() and model.save_weights() and then loaded them into a new model, but my accuracy and loss still seem to start from the beginning.
IMPORTS
# import CIFAR10 data
from keras.datasets import cifar10
# import keras utils
import keras.utils as utils
# import Sequential model
from keras.models import Sequential
# import layers
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
# normalizes values in kernel
from keras.constraints import maxnorm
# import compiler optimizers
from keras.optimizers import SGD
# import keras checkpoint
from keras.callbacks import ModelCheckpoint
# import h5py
import h5py
END IMPORTS
# load cifar10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# format trains and tests to float32 and divide by 255.0
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# change y_train and y_test to utils categorical
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
# create labels array
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
SEQUENTIAL MODEL 1
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=(32, 32, 3), activation='relu', padding='same',
                 kernel_constraint=maxnorm(3)))
#### add second convolution layer - MaxPooling2d ####
# decreases image size from 32x32 to 16x16
# pool_size: finds max value in each 2x2 section of input
model.add(MaxPooling2D(pool_size=(2, 2)))
#### flatten features ####
# converts matrix to a 1 dimensional array
model.add(Flatten())
#### add third convolution layer - first Dense and feed into it ####
# creates prediction network
# units: 512 neurons for first layer
# activation: relu for accuracy
# kernel_constraint: maxnorm
model.add(Dense(units=512, activation='relu', kernel_constraint=maxnorm(3)))
#### add fourth layer - Dropout - kills some neurons - prevents overfitting - TRAINING ONLY ####
# improves reliability
# rate: 0.5 means kill half the neurons
# only to be used while training
model.add(Dropout(rate=0.5))
#### add fifth convolution layer - Second Dense layer - Creates 10 outputs because we have 10 categories ####
# produces output for each of the 10 categories
# units: 10 categories = 10 output units
# activation = 'softmax' because we are calculating the probabilities of each of the 10 categories (floats)
model.add(Dense(units=10, activation='softmax'))
############################## END SEQUENTIAL MODEL ##########################
############################## COMPILER ######################################
model.compile(optimizer=SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
################################ END COMPILER ################################
################################ SAVE DATA ###################################
# saves the training data
model.save(filepath='model.h5')
# create model checkpoint based on best accuracy
#filepath = 'model.h5'
#checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', save_best_only='True',
#                             save_weights_only='False', mode='max', period=1)
#callbacks_list = [checkpoint]
# save weights
model.save_weights('model_weights.h5')
############################### END SAVE DATA ################################
model.fit(x=x_train, y=y_train, validation_split=0.1, epochs=20, batch_size=32, shuffle=True)
The issue in your case is that you are saving the model before training it. You must first fit the model, which does the training, and then save the model. Also attaching the code with the change:
from keras.datasets import cifar10
# import keras utils
import keras.utils as utils
# import Sequential model
from keras.models import Sequential
# import layers
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
# normalizes values in kernel
from keras.constraints import maxnorm
# import compiler optimizers
from keras.optimizers import SGD
# import keras checkpoint
from keras.callbacks import ModelCheckpoint
# import h5py
import h5py
# load cifar10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# format trains and tests to float32 and divide by 255.0
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# change y_train and y_test to utils categorical
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
# create labels array
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=(32, 32, 3), activation='relu', padding='same',
                 kernel_constraint=maxnorm(3)))
#### add second convolution layer - MaxPooling2d ####
# decreases image size from 32x32 to 16x16
# pool_size: finds max value in each 2x2 section of input
model.add(MaxPooling2D(pool_size=(2, 2)))
#### flatten features ####
# converts matrix to a 1 dimensional array
model.add(Flatten())
#### add third convolution layer - first Dense and feed into it ####
# creates prediction network
# units: 512 neurons for first layer
# activation: relu for accuracy
# kernel_constraint: maxnorm
model.add(Dense(units=512, activation='relu', kernel_constraint=maxnorm(3)))
#### add fourth layer - Dropout - kills some neurons - prevents overfitting - TRAINING ONLY ####
# improves reliability
# rate: 0.5 means kill half the neurons
# only to be used while training
model.add(Dropout(rate=0.5))
#### add fifth convolution layer - Second Dense layer - Creates 10 outputs because we have 10 categories ####
# produces output for each of the 10 categories
# units: 10 categories = 10 output units
# activation = 'softmax' because we are calculating the probabilities of each of the 10 categories (floats)
model.add(Dense(units=10, activation='softmax'))
############################## END SEQUENTIAL MODEL ##########################
############################## COMPILER ######################################
model.compile(optimizer=SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=x_train, y=y_train, validation_split=0.1, epochs=20, batch_size=32, shuffle=True)
################################ END COMPILER ################################
################################ SAVE DATA ###################################
# saves the training data
model.save(filepath='model.h5')
# create model checkpoint based on best accuracy
#filepath = 'model.h5'
#checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', save_best_only='True',
#                             save_weights_only='False', mode='max', period=1)
#callbacks_list = [checkpoint]
# save weights
model.save_weights('model_weights.h5')
############################### END SAVE DATA ################################
Let me know if this works
You should save the model after training it and load the model using keras.models.load_model.
See the following snippet.
# import CIFAR10 data
from keras.datasets import cifar10
# import keras utils
import keras.utils as utils
# import Sequential model
from keras.models import Sequential
# import layers
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
# normalizes values in kernel
from keras.constraints import maxnorm
# import compiler optimizers
from keras.optimizers import SGD
# import keras checkpoint
from keras.callbacks import ModelCheckpoint
# import h5py
import h5py
from keras.models import load_model
# load cifar10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# format trains and tests to float32 and divide by 255.0
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# change y_train and y_test to utils categorical
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
# create labels array
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=(32, 32, 3), activation='relu', padding='same',
                 kernel_constraint=maxnorm(3)))
#### add second convolution layer - MaxPooling2d ####
# decreases image size from 32x32 to 16x16
# pool_size: finds max value in each 2x2 section of input
model.add(MaxPooling2D(pool_size=(2, 2)))
#### flatten features ####
# converts matrix to a 1 dimensional array
model.add(Flatten())
#### add third convolution layer - first Dense and feed into it ####
# creates prediction network
# units: 512 neurons for first layer
# activation: relu for accuracy
# kernel_constraint: maxnorm
model.add(Dense(units=512, activation='relu', kernel_constraint=maxnorm(3)))
#### add fourth layer - Dropout - kills some neurons - prevents overfitting - TRAINING ONLY ####
# improves reliability
# rate: 0.5 means kill half the neurons
# only to be used while training
model.add(Dropout(rate=0.5))
#### add fifth convolution layer - Second Dense layer - Creates 10 outputs because we have 10 categories ####
# produces output for each of the 10 categories
# units: 10 categories = 10 output units
# activation = 'softmax' because we are calculating the probabilities of each of the 10 categories (floats)
model.add(Dense(units=10, activation='softmax'))
############################## END SEQUENTIAL MODEL ##########################
############################## COMPILER ######################################
model.compile(optimizer=SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
################################ END COMPILER ################################
################################ SAVE DATA ###################################
model = load_model('model.h5')  # load the previously saved model (replaces the freshly built one)
# create model checkpoint based on best accuracy
#filepath = 'model.h5'
#checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', save_best_only='True',
#                             save_weights_only='False', mode='max', period=1)
#callbacks_list = [checkpoint]
# # save weights
# model.save_weights('model_weights.h5')
############################### END SAVE DATA ################################
model.fit(x=x_train, y=y_train, validation_split=0.1, epochs=1, batch_size=32, shuffle=True)
# saves the training data
model.save(filepath='model.h5')
Once you load the saved model and retrain, the loss and accuracy start from the previous stopped values.
45000/45000 [==============================] - 82s 2ms/step - loss: 1.9395 - acc: 0.3030 - val_loss: 1.7316 - val_acc: 0.3852
In the next run,
Epoch 1/1
   32/45000 [..............................] - ETA: 3:13 - loss: 1.7473
   64/45000 [..............................] - ETA: 2:15 - loss: 1.7321
   96/45000 [..............................] - ETA: 1:58 - loss: 1.6830
  128/45000 [..............................] - ETA: 1:48 - loss: 1.6729
  160/45000 [..............................] - ETA: 1:41 - loss: 1.6876
However, note that compiling the model is not necessary when you load the model from the file.
As far as I understand, your problem is how to continue training after you have closed your training session.
Here are some useful sources for your problems.
From towardsdatascience
From SO
From pyimagesearch
Generally speaking, you can search for sources on checkpoints and resuming training with Keras.
Hope this resolves it.
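Putting those pieces together, a minimal resume pattern (a sketch; build_model is a hypothetical helper that creates and compiles your CNN, and x_train/y_train are the arrays from the question):
import os
from keras.models import load_model
from keras.callbacks import ModelCheckpoint

if os.path.exists('model.h5'):
    model = load_model('model.h5')  # restores architecture, weights, and optimizer state
else:
    model = build_model()  # hypothetical helper: builds and compiles the network

checkpoint = ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True)
model.fit(x_train, y_train, validation_split=0.1, epochs=20, batch_size=32,
          callbacks=[checkpoint])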

ResNet50 Model is not learning with transfer learning in keras

I am trying to perform transfer learning on a ResNet50 model pretrained on ImageNet weights for the PASCAL VOC 2012 dataset. As it is a multi-label dataset, I am using a sigmoid activation function in the final layer and binary_crossentropy loss. The metrics are precision, recall, and accuracy. Below is the code I used to build the model for 20 classes (PASCAL VOC has 20 classes).
img_height,img_width = 128,128
num_classes = 20
#If imagenet weights are being loaded,
#input must have a static square shape (one of (128, 128), (160, 160), (192, 192), or (224, 224))
base_model = applications.resnet50.ResNet50(weights= 'imagenet', include_top=False, input_shape= (img_height,img_width,3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dropout(0.7)(x)
predictions = Dense(num_classes, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)
for layer in model.layers[-2:]:
    layer.trainable = True
for layer in model.layers[:-3]:
    layer.trainable = False
adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
#print(model.summary())
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y, random_state=42, test_size=0.2)
savingcheckpoint = ModelCheckpoint('ResnetTL.h5',monitor='val_loss',verbose=1,save_best_only=True,mode='min')
earlystopcheckpoint = EarlyStopping(monitor='val_loss',patience=10,verbose=1,mode='min',restore_best_weights=True)
model.fit(X_train, Y_train, epochs=epochs, validation_data=(X_test,Y_test), batch_size=batch_size,callbacks=[savingcheckpoint,earlystopcheckpoint],shuffle=True)
model.save_weights('ResnetTLweights.h5')
It ran for 35 epochs until early stopping, and the metrics are as follows (without the Dropout layer):
loss: 0.1195 - accuracy: 0.9551 - precision_m: 0.8200 - recall_m: 0.5420 - val_loss: 0.3535 - val_accuracy: 0.8358 - val_precision_m: 0.0583 - val_recall_m: 0.0757
Even with the Dropout layer, I don't see much difference.
loss: 0.1584 - accuracy: 0.9428 - precision_m: 0.7212 - recall_m: 0.4333 - val_loss: 0.3508 - val_accuracy: 0.8783 - val_precision_m: 0.0595 - val_recall_m: 0.0403
With dropout, for a few epochs the model reaches a validation precision and accuracy of 0.2, but not above that.
I see that the precision and recall of the validation set are pretty low compared to the training set, with and without the dropout layer. How should I interpret this? Does this mean the model is overfitting? If so, what should I do? As of now the model's predictions are quite random (totally incorrect). The dataset size is 11000 images.
Please modify the code as below and try executing it:
From:
predictions = Dense(num_classes, activation= 'sigmoid')(x)
To:
predictions = Dense(num_classes, activation= 'softmax')(x)
From:
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
To:
model.compile(optimizer= adam, loss='categorical_crossentropy', metrics=['accuracy',precision_m,recall_m])
This question is pretty old, but I'll answer it in case it is helpful to someone else:
In this example, you froze all layers except the last two (Global Average Pooling and the last Dense one). There is a cleaner way to achieve the same state:
rn50 = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(img_height, img_width, 3))
x = rn50.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation= 'sigmoid')(x)
model = Model(inputs=rn50.input, outputs=predictions)
rn50.trainable = False # <- this
model.compile(...)
In this case, features are being extracted from the ResNet50 network and fed to a linear classifier, but the ResNet50's weights are not being trained. This is called feature extraction, not fine-tuning.
The only weights being trained are from your classifier, which was instantiated with weights drawn from a random distribution, and thus should be entirely trained. You should be using Adam with its default learning rate:
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001))
So you can train it for a few epochs and, once that's done, unfreeze the backbone and "fine-tune" it:
rn50.trainable = False
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001))
model.fit(epochs=50)

rn50.trainable = True
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.00001))
model.fit(epochs=60, initial_epoch=50)
There is a nice article about this on Keras website: https://keras.io/guides/transfer_learning/
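For completeness, a condensed, self-contained sketch of the two-phase recipe above (my own assembly; it assumes X_train, Y_train, X_test, Y_test are prepared as in the question):
from tensorflow.keras import applications, layers, optimizers
from tensorflow.keras.models import Model

img_height, img_width, num_classes = 128, 128, 20

rn50 = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(img_height, img_width, 3))
x = layers.GlobalAveragePooling2D()(rn50.output)
predictions = layers.Dense(num_classes, activation='sigmoid')(x)  # sigmoid: multi-label
model = Model(inputs=rn50.input, outputs=predictions)

# Phase 1: feature extraction - only the new head is trained
rn50.trainable = False
model.compile(optimizer=optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy')
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=50)

# Phase 2: fine-tuning - unfreeze the backbone, use a much smaller learning rate
rn50.trainable = True
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5), loss='binary_crossentropy')
model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
          epochs=60, initial_epoch=50)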

Text classification with LSTM Network and Keras 0.0% accuracy

I have a CSV file with two columns:
category, description
There are 1030 categories in the file and only about 12,600 rows.
I need a model for text classification trained on this data. I'm using Keras with an LSTM model.
I found an article describing how to do binary classification and slightly modified it to use several categories.
My code:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from numpy import array
from keras.preprocessing.text import one_hot
from sklearn.preprocessing import LabelEncoder
from keras.preprocessing import sequence
import keras
df = pd.read_csv('/tmp/input_data.csv')
#one hot encode your documents
# integer encode the documents
vocab_size = 2000
encoded_docs = [one_hot(d, vocab_size) for d in df['description']]
def load_data_from_arrays(strings, labels, train_test_split=0.9):
    data_size = len(strings)
    test_size = int(data_size - round(data_size * train_test_split))
    print("Test size: {}".format(test_size))
    print("\nTraining set:")
    x_train = strings[test_size:]
    print("\t - x_train: {}".format(len(x_train)))
    y_train = labels[test_size:]
    print("\t - y_train: {}".format(len(y_train)))
    print("\nTesting set:")
    x_test = strings[:test_size]
    print("\t - x_test: {}".format(len(x_test)))
    y_test = labels[:test_size]
    print("\t - y_test: {}".format(len(y_test)))
    return x_train, y_train, x_test, y_test
encoder = LabelEncoder()
categories = encoder.fit_transform(df['category'])
num_classes = np.max(categories) + 1
print('Categories count: {}'.format(num_classes))
#Categories count: 1030
X_train, y_train, x_test, y_test = load_data_from_arrays(encoded_docs, categories, train_test_split=0.8)
# Truncate and pad the review sequences
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
x_test = sequence.pad_sequences(x_test, maxlen=max_review_length)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
# Build the model
embedding_vector_length = 32
top_words = 10000
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam', metrics=['accuracy'])
print(model.summary())
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_8 (Embedding)      (None, 500, 32)           320000
_________________________________________________________________
lstm_8 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_8 (Dense)              (None, 1030)              104030
=================================================================
Total params: 477,230
Trainable params: 477,230
Non-trainable params: 0
_________________________________________________________________
None
#Train the model
model.fit(X_train, y_train, validation_data=(x_test, y_test), epochs=5, batch_size=64)
Train on 10118 samples, validate on 2530 samples
Epoch 1/5
10118/10118 [==============================] - 60s 6ms/step - loss: 6.5086 - acc: 0.0019 - val_loss: 10.0911 - val_acc: 0.0000e+00
Epoch 2/5
10118/10118 [==============================] - 63s 6ms/step - loss: 6.3281 - acc: 0.0028 - val_loss: 10.8270 - val_acc: 0.0000e+00
Epoch 3/5
10118/10118 [==============================] - 63s 6ms/step - loss: 6.3120 - acc: 0.0024 - val_loss: 11.0078 - val_acc: 0.0000e+00
Epoch 4/5
10118/10118 [==============================] - 64s 6ms/step - loss: 6.2891 - acc: 0.0030 - val_loss: 11.8264 - val_acc: 0.0000e+00
Epoch 5/5
10118/10118 [==============================] - 69s 7ms/step - loss: 6.2559 - acc: 0.0032 - val_loss: 12.1625 - val_acc: 0.0000e+00
#Evaluate the model
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
Accuracy: 0.00%
What mistake did I make when preparing the data?
Why is the accuracy always 0?
I have put together end-to-end code with some inputs from my end and tested that it works on this data. You can use it with your data with no or minimal changes, as I have removed specifics and made it generic. At the end, I have highlighted the points I worked on on top of the code you provided above.
Code
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense
from nltk.tokenize import word_tokenize
def load_data_from_arrays(strings, labels, train_test_split=0.9):
    data_size = len(strings)
    test_size = int(data_size - round(data_size * train_test_split))
    print("Test size: {}".format(test_size))
    print("\nTraining set:")
    x_train = strings[test_size:]
    print("\t - x_train: {}".format(len(x_train)))
    y_train = labels[test_size:]
    print("\t - y_train: {}".format(len(y_train)))
    print("\nTesting set:")
    x_test = strings[:test_size]
    print("\t - x_test: {}".format(len(x_test)))
    y_test = labels[:test_size]
    print("\t - y_test: {}".format(len(y_test)))
    return x_train, y_train, x_test, y_test

# estimating the vocab length with the help of nltk
def get_vocab_length(strings):
    vocab = []
    for sent in strings:
        words = word_tokenize(sent)
        vocab.extend(words)
    vocab = list(set(vocab))
    vocab_length = len(vocab)
    return vocab_length

def clean_text(sent):
    # <your cleaning code here>
    # clean func 1
    # clean func 2
    # ...
    # clean func n
    return sent
# load input data
df = pd.read_csv('/tmp/input_data.csv')
strings = df['description'].values
labels = df['category'].values
clean_strings = [clean_text(sent) for sent in strings]
vocab_length = get_vocab_length(clean_strings)
# create onehot encodings of strings
encoded_docs = [one_hot(sent, vocab_length) for sent in strings]
# create onehot encodings of labels
ohe = OneHotEncoder()
categories = ohe.fit_transform(labels.reshape(-1,1)).toarray()
# split data
X_train, y_train, X_test, y_test = load_data_from_arrays(encoded_docs, categories, train_test_split=0.8)
# assuming max input to be not more than 512 words
max_input_len = 512
# padding data
X_train = pad_sequences(X_train, maxlen=max_input_len, padding= 'post')
X_test = pad_sequences(X_test, maxlen=max_input_len, padding= 'post')
# setting embedding vector length
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(vocab_length, embedding_vector_length, input_length=max_input_len, name= 'embedding') )
model.add(Flatten())
model.add(Dense(categories.shape[1], activation= 'softmax'))  # one output unit per class (1030 in your case)
model.compile('adam', loss= 'categorical_crossentropy', metrics= ['accuracy'])
model.summary()
# training the model
model.fit(X_train, y_train, epochs= 10, batch_size= 128, validation_split= 0.2, verbose= 1)
# evaluating the model
score = model.evaluate(X_test, y_test, verbose=0)
print("Test Loss:", score[0])
print("Test Acc:", score[1])
Additional areas I have worked on
1. Text Cleaning
Created a function to clean the text. This is extremely important, as it removes unnecessary noise from the data; note that this step depends entirely on the type of data you have. To keep things simple, I have created a clean_text function in the above code where you can place your cleaning code. It should take in raw text and return clean text. Some of the libraries you may like to look into are re, string, and emoji. A minimal example follows.
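For instance, a minimal clean_text sketch using re and string, assuming plain English text (adapt it to your data):
import re
import string

def clean_text(sent):
    sent = sent.lower()  # normalize case
    sent = re.sub(r'https?://\S+', ' ', sent)  # drop URLs
    sent = sent.translate(str.maketrans('', '', string.punctuation))  # drop punctuation
    sent = re.sub(r'\s+', ' ', sent).strip()  # collapse whitespace
    return sent

print(clean_text("Check https://example.com -- NOW!!"))  # -> "check now"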
2. Estimating Vocab Size
If you have enough data, it is good to estimate the vocab size rather than passing an arbitrary number directly to the Keras one_hot function. I have created a basic get_vocab_length function using nltk word_tokenize. You can use it as-is or enhance it further as per your data.
What Else?
You can work further on hyperparameter tuning and a few different neural network designs.
Final Words
It still may not work, as it totally depends on the quality and amount of data you have. There is a good chance you may not get results after trying everything if you have poor-quality data or very little data.
I would then suggest you try transfer learning on some pre-trained models like BERT, RoBERTa, etc. HuggingFace provides good support for state-of-the-art pre-trained models; you can get started at the following links, and there is a small illustration after them:
https://huggingface.co/docs/transformers/index#supported-models
https://towardsdatascience.com/text-classification-with-hugging-face-transformers-in-tensorflow-2-without-tears-ee50e4f3e7ed
https://towardsdatascience.com/an-introduction-to-transformers-and-hugging-face-13052ec9d72d
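As an illustration (my own, not from the links above), inference with the transformers pipeline API looks like this; the model name here is the default English sentiment model, so for your 1030 categories you would fine-tune your own checkpoint instead:
from transformers import pipeline

# downloads a pretrained English sentiment model on first use
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("This product arrived quickly and works great."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]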
I guess that your vocab_size is way too low. If you are dealing with usual text, try 10,000-100,000 as a starting point.
What one_hot does is use the hashing trick: all of your words are hashed and projected into a 2000-dimensional index space. Not only does it mean your dict is 2000 words long; it means every word will be projected into this space, which effectively causes a lot of collisions, where different words get the same index and are considered equal by the LSTM.
Furthermore, you should take a look at the transformed text, just to get an understanding of what happens here. To do so, build a reverse lookup and transform all the indices back.
As a further improvement, it is feasible to preprocess the text with common techniques like stemming and normalizing, and to use a vocabulary, or to discard bag-of-words and use word embeddings.
from keras.preprocessing.text import one_hot, Tokenizer, hashing_trick
text1 = 'I love you'
text2 = 'you love I'
print('one_hot: ')
print(one_hot(text1, n=20))
print(one_hot(text2, n=20))
print('--------------------------------------')
print('Tokenizer: ')
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text1, text2])
print(tokenizer.word_index)
print(tokenizer.index_word)
print('--------------------------------------')
print('hashing_trick: ')
print(hashing_trick(text1, n=20))
print(hashing_trick(text2, n=20))
print('--------------------------------------')
out:
one_hot:
[14, 7, 14]
[14, 7, 14]
--------------------------------------
Tokenizer:
{'i': 1, 'love': 2, 'you': 3}
{1: 'i', 2: 'love', 3: 'you'}
--------------------------------------
hashing_trick:
[14, 7, 14]
[14, 7, 14]
--------------------------------------
Run it a few more times and you will find that the results of one_hot and hashing_trick are not unique.
You should use Tokenizer to convert text.
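A sketch of that replacement, assuming df and vocab_size from the question; Tokenizer gives each word a unique, stable index instead of a hash bucket:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(df['description'])
encoded_docs = tokenizer.texts_to_sequences(df['description'])
X = pad_sequences(encoded_docs, maxlen=500)

# reverse lookup to sanity-check the transformed text
index_to_word = {idx: w for w, idx in tokenizer.word_index.items()}
print(' '.join(index_to_word[i] for i in encoded_docs[0]))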

How to use keras embedding layer with 3D tensor input?

I am facing difficulty using the Keras embedding layer with a one-hot encoding of my input data.
Following is the toy code.
Import packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.embeddings import Embedding
from keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np
import openpyxl
import pandas as pd
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau
The input data is text based as follows.
Train and Test data
X_train_orignal = np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
                            'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
                            'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])
X_test_orignal = np.array(['OC(=O)C1=CC=C(Cl)C=C1Cl', 'CCOC(N)=O',
                           'OC1=C(Cl)C(=C(Cl)C=C1Cl)Cl'])
Y_train = np.array([[2.33],
                    [2.59],
                    [2.59],
                    [2.54],
                    [4.06]])
Y_test = np.array([[2.20],
                   [2.81],
                   [2.00]])
Creating dictionaries
Now I create two dictionaries, character-to-index and index-to-character. The number of unique characters is stored in len(charset), and the maximum string length plus 5 additional characters is stored in embed. The start of each string will be padded with ! and the end with E.
charset = set("".join(list(X_train_orignal))+"!E")
char_to_int = dict((c,i) for i,c in enumerate(charset))
int_to_char = dict((i,c) for i,c in enumerate(charset))
embed = max([len(smile) for smile in X_train_orignal]) + 5
print (str(charset))
print(len(charset), embed)
One hot encoding
I convert all the train data into one hot encoding as follows.
def vectorize(smiles):
    one_hot = np.zeros((smiles.shape[0], embed, len(charset)), dtype=np.int8)
    for i, smile in enumerate(smiles):
        # encode the startchar
        one_hot[i, 0, char_to_int["!"]] = 1
        # encode the rest of the chars
        for j, c in enumerate(smile):
            one_hot[i, j+1, char_to_int[c]] = 1
        # encode the endchar
        one_hot[i, len(smile)+1:, char_to_int["E"]] = 1
    return one_hot[:, 0:-1, :]
X_train = vectorize(X_train_orignal)
print(X_train.shape)
X_test = vectorize(X_test_orignal)
print(X_test.shape)
When the input train data is converted into one-hot encoding, the shape of the one-hot encoded data becomes (5, 44, 14) for train and (3, 44, 14) for test. For train, there are 5 examples, 44 is the maximum length, and 14 is the number of unique characters. Examples with fewer characters are padded with E up to the maximum length.
Verifying the correct padding
Following is the code to verify that we have done the padding correctly.
mol_str_train = []
mol_str_test = []
for x in range(5):
    mol_str_train.append("".join([int_to_char[idx] for idx in np.argmax(X_train[x,:,:], axis=1)]))
for x in range(3):
    mol_str_test.append("".join([int_to_char[idx] for idx in np.argmax(X_test[x,:,:], axis=1)]))
Let's see how the train set looks:
mol_str_train
['!OC(=O)C1=C(Cl)C=CC=C1ClEEEEEEEEEEEEEEEEEEEE',
'!OC(=O)C1=C(Cl)C=C(Cl)C=C1ClEEEEEEEEEEEEEEEE',
'!OC(=O)C1=CC=CC(=C1Cl)ClEEEEEEEEEEEEEEEEEEEE',
'!OC(=O)C1=CC(=CC=C1Cl)ClEEEEEEEEEEEEEEEEEEEE',
'!OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=OEEE']
Now it is time to build the model.
Model
model = Sequential()
model.add(Embedding(len(charset), 10, input_length=embed))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return (1 - SS_res/(SS_tot + K.epsilon()))

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr
optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])
callbacks_list = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto', cooldown=0),
    ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]
history = model.fit(x=X_train, y=Y_train,
                    batch_size=1,
                    epochs=10,
                    validation_data=(X_test, Y_test),
                    callbacks=callbacks_list)
Error
ValueError: Error when checking input: expected embedding_3_input to have 2 dimensions, but got array with shape (5, 44, 14)
The embedding layer expects a two-dimensional array. How can I deal with this issue so that it can accept the one-hot encoded data?
All the above code can be run.
The Keras embedding layer works with indices, not directly with one-hot encodings.
So you don't need to have (5,44,14), just (5,44) works fine.
E.g. get indices with argmax:
X_test = np.argmax(X_test, axis=2)
X_train = np.argmax(X_train, axis=2)
Although it's probably better to not one-hot encode it first =)
Besides that, your 'embed' variable says size 45, while your data is size 44.
If you change those, your model runs fine:
model = Sequential()
model.add(Embedding(len(charset), 10, input_length=44))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return (1 - SS_res/(SS_tot + K.epsilon()))

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr
optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])
callbacks_list = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto', cooldown=0),
    ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]
history = model.fit(x=np.argmax(X_train, axis=2), y=Y_train,
                    batch_size=1,
                    epochs=10,
                    validation_data=(np.argmax(X_test, axis=2), Y_test),
                    callbacks=callbacks_list)
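Following the "better to not one-hot encode it first" remark, here is a sketch (mine, using char_to_int, embed, and X_train_orignal from the question) that builds the integer matrix directly, skipping the one-hot step entirely:
import numpy as np

def vectorize_indices(smiles):
    # pad everything with the endchar index, then overwrite with real characters
    x = np.full((smiles.shape[0], embed - 1), char_to_int["E"], dtype='int32')
    for i, smile in enumerate(smiles):
        x[i, 0] = char_to_int["!"]  # startchar
        for j, c in enumerate(smile):
            x[i, j + 1] = char_to_int[c]  # the characters of the string
    return x

X_train_idx = vectorize_indices(X_train_orignal)
print(X_train_idx.shape)  # (5, 44) - ready for the Embedding layer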
Your input shape was not defined properly in the embedding layer. The following code works for me: instead of reducing your data dimensions to 2D, you can directly pass the 3D input to your embedding layer.
# THE MISSING STUFF
#_________________________________________
Y_train = Y_train.reshape(5)  # the Dense layer has a single unit, so it needs a one-dimensional target array
max_len = len(charset)
max_features = embed - 1
inputshape = (max_features, max_len)  # the input shape wasn't defined; the Embedding layer can accept 3D input via input_shape
#__________________________________________
model = Sequential()
#model.add(Embedding(len(charset), 10, input_length=14))
model.add(Embedding(max_features, 10, input_shape=inputshape))#input_length=max_len))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
print(model.summary())
optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])
callbacks_list = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto', cooldown=0),
    ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]
history = model.fit(x=X_train, y=Y_train,
                    batch_size=10,
                    epochs=10,
                    validation_data=(X_test, Y_test),
                    callbacks=callbacks_list)

Extremely poor prediction: LSTM time-series

I tried to implement an LSTM model for time-series prediction. Below is my trial code. It runs without error, and you can also try it without any extra dependencies.
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import linregress
from sklearn.utils import shuffle
fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print (raw.shape)
scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)
time_steps = 7
def create_ds(data, t_steps):
    data = pd.DataFrame(data)
    data_s = data.copy()
    for i in range(t_steps):  # was: time_steps (the global); use the parameter
        data = pd.concat([data, data_s.shift(-(i+1))], axis=1)
    data.dropna(axis=0, inplace=True)
    return data.values
ds = create_ds(raw, time_steps)
print (ds.shape)
n_feats = raw.shape[1]
n_obs = time_steps * n_feats
n_rows = ds.shape[0]
train_size = int(n_rows * 0.8)
train_data = ds[:train_size, :]
train_data = shuffle(train_data)
test_data = ds[train_size:, :]
x_train = train_data[:, :n_obs]
y_train = train_data[:, n_obs:]
x_test = test_data[:, :n_obs]
y_test = test_data[:, n_obs:]
x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1])
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])
print (x_train.shape)
print (y_train.shape)
print (x_test.shape)
print (y_test.shape)
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2]), stateful=True, batch_size=1))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(n_feats, return_sequences=True, stateful=True))
model.compile(loss='mse', optimizer='rmsprop')
model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=2)
y_predict = model.predict(x_test)
y_predict = y_predict.reshape(y_predict.shape[1], y_predict.shape[2])
y_predict = scaler.inverse_transform(y_predict)
y_test = scaler.inverse_transform(y_test)
y_test = y_test[:,0]
y_predict = y_predict[:,0]
print (y_test.shape)
print (y_predict.shape)
plt.plot(y_test, label='True')
plt.plot(y_predict, label='Predict')
plt.legend()
plt.show()
However, the prediction is extremely poor. How can I improve it? Do you have any ideas for improving the prediction by redesigning the architecture and/or the layers?
If you want to use the model in my code (the link you passed), you need to have the data correctly shaped: (1 sequence, total_time_steps, 5 features)
Important: I don't know if this is the best way or the best model to do this, but this model is predicting 7 time steps ahead of the input (time_shift=7)
Data and initial vars
fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print("raw shape:")
print (raw.shape)
#(1789,5) - 1789 time steps / 5 features
scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)
time_shift = 7 #shift is the number of steps we are predicting ahead
n_rows = raw.shape[0] #n_rows is the number of time steps of our sequence
n_feats = raw.shape[1]
train_size = int(n_rows * 0.8)
#I couldn't understand how "ds" worked, so I simply removed it because in the code below it's not necessary
#getting the train part of the sequence
train_data = raw[:train_size, :] #first train_size steps, all 5 features
test_data = raw[train_size:, :] #I'll use the beginning of the data as state adjuster
#train_data = shuffle(train_data) !!!!!! we cannot shuffle time steps!!! we lose the sequence doing this
x_train = train_data[:-time_shift, :] #the entire train data, except the last shift steps
x_test = test_data[:-time_shift,:] #the entire test data, except the last shift steps
x_predict = raw[:-time_shift,:] #the entire raw data, except the last shift steps
y_train = train_data[time_shift:, :]
y_test = test_data[time_shift:,:]
y_predict_true = raw[time_shift:,:]
x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1]) #ok shape (1,steps,5) - 1 sequence, many steps, 5 features
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])
y_test = y_test.reshape(1, y_test.shape[0], y_test.shape[1])
x_predict = x_predict.reshape(1, x_predict.shape[0], x_predict.shape[1])
y_predict_true = y_predict_true.reshape(1, y_predict_true.shape[0], y_predict_true.shape[1])
print("\nx_train:")
print (x_train.shape)
print("y_train")
print (y_train.shape)
print("x_test")
print (x_test.shape)
print("y_test")
print (y_test.shape)
Model
Your model wasn't very powerful for this task, so I tried a bigger one (this on the other hand is too powerful)
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(256, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True))
model.compile(loss='mse', optimizer='adam')
Fitting
Notice that I had to train 2000+ epochs for the model to have good results.
I added the validation data so we can compare the loss for train and test.
#notice that I'm predicting from the ENTIRE sequence, including x_train
#is important for the model to adjust its states before predicting the end
model.fit(x_train, y_train, epochs=1000, batch_size=1, verbose=2, validation_data=(x_test,y_test))
Predicting
Important: as for predicting the end of a sequence based on the beginning, it's important that the model sees the beginning to adjust the internal states, so I'm predicting the entire data (x_predict), not only the test data.
y_predict_model = model.predict(x_predict)
print("\ny_predict_true:")
print (y_predict_true.shape)
print("y_predict_model: ")
print (y_predict_model.shape)
def plot(true, predicted, divider):
    predict_plot = scaler.inverse_transform(predicted[0])
    true_plot = scaler.inverse_transform(true[0])
    predict_plot = predict_plot[:,0]
    true_plot = true_plot[:,0]
    plt.figure(figsize=(16,6))
    plt.plot(true_plot, label='True', linewidth=5)
    plt.plot(predict_plot, label='Predict', color='y')
    if divider > 0:
        maxVal = max(true_plot.max(), predict_plot.max())
        minVal = min(true_plot.min(), predict_plot.min())
        plt.plot([divider, divider], [minVal, maxVal], label='train/test limit', color='k')
    plt.legend()
    plt.show()
test_size = n_rows - train_size
print("test length: " + str(test_size))
plot(y_predict_true,y_predict_model,train_size)
plot(y_predict_true[:,-2*test_size:],y_predict_model[:,-2*test_size:],test_size)
[Plot: showing the entire data]
[Plot: showing the end portion of it for more detail]
Please notice that this model is overfitting: it can learn the training data yet get bad results on the test data.
To solve this you must experimentally try smaller models, use dropout layers, and apply other techniques to prevent overfitting.
Notice also that this data very probably contains A LOT of random factors, meaning the models will not be able to learn anything useful from it. As you make smaller models to avoid overfitting, you may also find that the model presents worse predictions for the training data.
Finding the perfect model is not an easy task; it's an open question and you must experiment. Maybe LSTM models simply aren't the solution. Maybe your data is simply not predictable, etc. There isn't a definitive answer for this.
How to know the model is good
With the validation data in training, you can compare loss for train and test data.
Train on 1 samples, validate on 1 samples
Epoch 1/1000
9s - loss: 0.4040 - val_loss: 0.3348
Epoch 2/1000
4s - loss: 0.3332 - val_loss: 0.2651
Epoch 3/1000
4s - loss: 0.2656 - val_loss: 0.2035
Epoch 4/1000
4s - loss: 0.2061 - val_loss: 0.1696
Epoch 5/1000
4s - loss: 0.1761 - val_loss: 0.1601
Epoch 6/1000
4s - loss: 0.1697 - val_loss: 0.1476
Epoch 7/1000
4s - loss: 0.1536 - val_loss: 0.1287
Epoch 8/1000
.....
Both should go down together. When the test loss stops going down but the train loss continues to improve, your model is starting to overfit.
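One way to stop automatically at that point is Keras's EarlyStopping callback; a minimal sketch (note that restore_best_weights only exists in newer Keras versions):
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=20,
                           restore_best_weights=True)
model.fit(x_train, y_train, epochs=2000, batch_size=1, verbose=2,
          validation_data=(x_test, y_test), callbacks=[early_stop])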
Trying another model
The best I could do (but I didn't really try much) was using this model:
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True))
model.compile(loss='mse', optimizer='adam')
When the losses were about:
loss: 0.0389 - val_loss: 0.0437
After this point, the validation loss started going up (so training beyond this point is totally useless)
Result:
This shows that all this model could learn was the very broad behaviour, such as zones with higher values.
But the high-frequency detail was either too random, or the model wasn't good enough for it...
You may consider changing your model:
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import linregress
from sklearn.utils import shuffle
fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print (raw.shape)
scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)
time_steps = 7
def create_ds(data, t_steps):
    data = pd.DataFrame(data)
    data_s = data.copy()
    for i in range(t_steps):  # was: time_steps (the global); use the parameter
        data = pd.concat([data, data_s.shift(-(i+1))], axis=1)
    data.dropna(axis=0, inplace=True)
    return data.values
ds = create_ds(raw, time_steps)
print (ds.shape)
n_feats = raw.shape[1]
n_obs = time_steps * n_feats
n_rows = ds.shape[0]
train_size = int(n_rows * 0.8)
train_data = ds[:train_size, :]
train_data = shuffle(train_data)
test_data = ds[train_size:, :]
x_train = train_data[:, :n_obs]
y_train = train_data[:, n_obs:]
x_test = test_data[:, :n_obs]
y_test = test_data[:, n_obs:]
print (x_train.shape)
print (x_test.shape)
print (y_train.shape)
print (y_test.shape)
x_train = x_train.reshape(x_train.shape[0], time_steps, n_feats)
x_test = x_test.reshape(x_test.shape[0], time_steps, n_feats)
print (x_train.shape)
print (x_test.shape)
print (y_train.shape)
print (y_test.shape)
model = Sequential()
model.add(LSTM(64, input_shape=(time_steps, n_feats), return_sequences=True))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(n_feats))
model.compile(loss='mse', optimizer='rmsprop')
model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=1, shuffle=False)
y_predict = model.predict(x_test)
print (y_predict.shape)
y_predict = scaler.inverse_transform(y_predict)
y_test = scaler.inverse_transform(y_test)
y_test = y_test[:,0]
y_predict = y_predict[:,0]
print (y_test.shape)
print (y_predict.shape)
plt.plot(y_test, label='True')
plt.plot(y_predict, label='Predict')
plt.legend()
plt.show()
But I really do not know the merits of your implementation:
* both x and y are 3D (1, steps, features) rather than x in 3D (samples, time steps, features) and y in 2D (samples, features)
* input_shape=(None, x_train.shape[2])
* last layer - model.add(LSTM(n_feats, return_sequences=True, stateful=True))
Someone may provide a better answer.
Reading the original code, it seems the author first scales the dataset and then splits it up into Training and Testing subsets. This means that information about the Testing subset (e.g., volatility etc.) has "leaked" into the Training subset.
The recommended approach is to first perform the Training/Testing split up, calculate the scaling parameters using only the Training subset, and using these parameters perform the scaling of the Training and the Testing subsets separately.
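A sketch of that order, assuming raw is the unscaled (n_rows, n_feats) array before any transform:
from sklearn.preprocessing import MinMaxScaler

train_size = int(raw.shape[0] * 0.8)
train_raw, test_raw = raw[:train_size], raw[train_size:]

scaler = MinMaxScaler(feature_range=(-1, 1))
train_scaled = scaler.fit_transform(train_raw)  # scaling parameters come from train only
test_scaled = scaler.transform(test_raw)  # applied, not re-fitted, to the test subset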
I'm not exactly sure what you could do; that data looks as if it has no discernible pattern. If I can't see one, I doubt an LSTM could. Your prediction does look like a good regression line, though.
I am at a similar point myself, creating a model that predicts data like this. I created a SMOTE-RNN solution to add past data, and I have found that using TimeseriesGenerator with a higher batch_size and higher stride performs much better.
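For reference, a minimal TimeseriesGenerator sketch (keras.preprocessing.sequence; the parameter values here are illustrative assumptions), with raw as the (n_rows, n_feats) array:
from keras.preprocessing.sequence import TimeseriesGenerator

gen = TimeseriesGenerator(raw, raw, length=7, stride=1, batch_size=32)
x_batch, y_batch = gen[0]
print(x_batch.shape)  # (32, 7, n_feats): windows of 7 past steps
print(y_batch.shape)  # (32, n_feats): the step right after each window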
