Keras LSTM multiclass classification - python

I have this code that works for binary classification; I have tested it on the Keras IMDB dataset.
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
I need the above code converted for multi-class classification with 7 categories in total. From what I understand after reading a few articles, to convert the above code I have to change
model.add(Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Obviously, changing just the above two lines doesn't work. What else do I have to change to make the code work for multi-class classification? I also think I have to convert the classes to one-hot encoding, but I don't know how to do that in Keras.

Yes, you need one-hot encoded targets. You can use to_categorical to encode your targets, or take a shortcut and use the sparse loss instead:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Here is the full code:
from keras.models import Sequential
from keras.layers import *
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(7, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Summary
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_1 (Dense)              (None, 7)                 707
=================================================================
Total params: 213,907
Trainable params: 213,907
Non-trainable params: 0
_________________________________________________________________
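If you prefer to keep loss='categorical_crossentropy' instead, here is a short sketch of the one-hot route with to_categorical (assuming integer labels 0-6 and the X_train/y_train arrays from the question):
from keras.utils import to_categorical
# y_train / y_test hold integer class labels in {0, ..., 6}
y_train = to_categorical(y_train, num_classes=7)  # shape becomes (n_samples, 7)
y_test = to_categorical(y_test, num_classes=7)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64)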

Related

How to feed an LSTM model in Keras (Python)?

I have read about LSTMs, and I know the algorithm takes the values of the previous words into account in the parameters for the next word.
Now I am trying to apply my first LSTM model.
I have this code:
model = Sequential()
model.add(LSTM(units=6, input_shape = (X_train_count.shape[0], X_train_count.shape[1]), return_sequences = True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=ytrain.shape[1], return_sequences=True, name='output'))
model.compile(loss='cosine_proximity', optimizer='sgd', metrics = ['accuracy'])
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.summary()
cp = ModelCheckpoint('model_cnn.hdf5', monitor='val_acc', verbose=1, save_best_only=True)
history = model.fit(X_train_count, ytrain,
                    epochs=20,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10,
                    callbacks=[cp])
1- I cannot see how the LSTM would know the word sequence when my dataset is built from TF-IDF values?
2- I am getting this error:
ValueError: Input 0 of layer sequential_8 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 18644]
The issue seems to be in the shape of X_train_count you are passing in; the LSTM input shape is always tricky.
If your X_train_count is not 3D, then reshape it using the line below.
X_train_count = X_train_count.reshape(X_train_count.shape[0], X_train_count.shape[1], 1)
In the LSTM layer, the input_shape should be (timesteps, data_dim).
Below is an example to illustrate this.
from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
X = ["first example","one more","good morning"]
Y = ["first example","one more","good morning"]
vectorizer = TfidfVectorizer().fit(X)
tfidf_vector_X = vectorizer.transform(X).toarray()
tfidf_vector_Y = vectorizer.transform(Y).toarray()
tfidf_vector_X = tfidf_vector_X[:, :, None]
tfidf_vector_Y = tfidf_vector_Y[:, :, None]
X_train, X_test, y_train, y_test = train_test_split(tfidf_vector_X, tfidf_vector_Y, test_size = 0.2, random_state = 1)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM
model = Sequential()
model.add(LSTM(units=6, input_shape = X_train.shape[1:], return_sequences = True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=1, return_sequences=True, name='output'))
model.compile(loss='cosine_proximity', optimizer='sgd', metrics = ['accuracy'])
Model Summary:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_9 (LSTM) (None, 6, 6) 192
_________________________________________________________________
lstm_10 (LSTM) (None, 6, 6) 312
_________________________________________________________________
lstm_11 (LSTM) (None, 6, 6) 312
_________________________________________________________________
output (LSTM) (None, 6, 1) 32
=================================================================
Total params: 848
Trainable params: 848
Non-trainable params: 0
_________________________________________________________________
None
Here, X_train is of shape (2, 6, 1).
To add to the solution, I would suggest using a dense vector representation instead of the sparse vectors produced by the TF-IDF approach, for example by loading pre-trained embeddings such as the Google News vectors or GloVe as the weights of an Embedding layer; this is usually better both performance-wise and result-wise.
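As a rough sketch of that idea (the glove.6B.100d.txt file name, the Tokenizer-based vocabulary, and the toy corpus are my assumptions, not part of the original answer):
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
texts = ["first example", "one more", "good morning"]
# Build an integer vocabulary instead of TF-IDF vectors
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
vocab_size = len(tokenizer.word_index) + 1
embedding_dim = 100  # must match the GloVe file used below
# Load pre-trained GloVe vectors (assumes glove.6B.100d.txt is available locally)
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")
# Fill an embedding matrix; words missing from GloVe stay as zero vectors
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer.word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector
model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], trainable=False))  # freeze the pre-trained weights
model.add(LSTM(6))
model.add(Dense(1, activation='sigmoid'))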

How would I improve my model such that it will work on more characters not in the dataset?

In my last post linked here, it was said that I have to modify my model for it to be better. To quote the only answerer's comment to my questions (again, thank you, Sir):
The accuracy of prediction is a metric of how good your neural network architecture is and it also depends on your train/validation data. You will have to tune your neural network in such a way that you generalize well by adjusting the hyper parameters such as number of layers, type of layers, learning rate, optimizer etc. ...
I would like to know how I would do these mentioned. Or at the least, be pointed in the right direction. I am honestly both lost in theory and practice.
The only thing I have been able to do is to raise the number of epochs above 100. I have also cleaned up the images to be identified as much as I can.
Currently, here is how I create my model. It is based only on TensorFlow 2.0's tutorial.
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Load and prepare the MNIST dataset. Convert the samples from integers to floating-point numbers:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
def createModel():
    # Build the tf.keras.Sequential model by stacking layers.
    # Choose an optimizer and loss function used for training:
    model = tf.keras.models.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model = createModel()
model.fit(x_train, y_train, epochs=102, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
It gave a validation accuracy of around 0.9800 for me, but its performance on images of handwritten characters I've extracted from documents is dismal. I would also like to extend it so it can read other selected characters, but I guess that can be another question for another day.
Thanks!
You could add multiple convolution/max-pooling layers at the beginning to perform feature extraction by scanning the image. After that, you use a fully connected network like you did before, followed by a softmax.
You could create a model with a CNN this way:
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential
# Create the model
model = Sequential()
# Add the 1st Convolution/ max pool
model.add(Conv2D(40, kernel_size=5, padding="same",input_shape=(28, 28, 1), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# 2nd convolution / max pool
model.add(Conv2D(200, kernel_size=3, padding="same", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# 3rd convolution/ max pool
model.add(Conv2D(512, kernel_size=3, padding="valid", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# Reduce dimensions from 2d to 1d
model.add(Flatten())
model.add(Dense(units=100, activation='relu'))
# Add dropout to prevent overfitting
model.add(Dropout(0.5))
# Final fullyconnected layer
model.add(Dense(10, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Which returns the following model:
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 40)        1040
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 40)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 200)       72200
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 200)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 10, 10, 512)       922112
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 512)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 32768)             0
_________________________________________________________________
dense_1 (Dense)              (None, 100)               3276900
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010
=================================================================
Total params: 4,273,262
Trainable params: 4,273,262
Non-trainable params: 0
_________________________________________________________________
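Note that this CNN expects a channel axis and one-hot labels, which the MNIST arrays loaded in the question don't have yet. A minimal sketch of the preprocessing (my addition, not part of the original answer):
from keras.datasets import mnist
from keras.utils import to_categorical
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add the channel axis expected by Conv2D: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# categorical_crossentropy expects one-hot targets, so encode the integer labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))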

How to define input shapes in a Sequential Keras model

Please help me define appropriate Dense input shapes in Keras models. Maybe I have to reshape my data first. My data set has the dimensions shown below:
X_train: (2858, 2037), y_train: (2858, 1), X_test: (715, 2037), y_test: (715, 1)
The number of features (the input shape) is 2037.
I want to define a Sequential Keras model like this:
batch_size = 128
num_classes = 2
epochs = 20
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(X_input_shape,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.summary()
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(),
              from_logits=True,
              metrics=['accuracy'])
Model summary:
Layer (type)                 Output Shape              Param #
=================================================================
dense_20 (Dense)             (None, 512)               1043456
_________________________________________________________________
dropout_12 (Dropout)         (None, 512)               0
_________________________________________________________________
dense_21 (Dense)             (None, 512)               262656
=================================================================
Total params: 1,306,112
Trainable params: 1,306,112
Non-trainable params: 0
And when I try to fit it...
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(X_test, y_test))
I got an error:
ValueError: Error when checking target: expected dense_21 to have shape (512,) but got array with shape (1,)
Modify
model.add(Dense(512, activation='relu'))
to
model.add(Dense(1, activation='sigmoid'))
so that the output shape is of size 1, the same as y_train.shape[1]. (Since the loss is binary_crossentropy, the final unit should use a sigmoid activation rather than relu, so that it outputs a probability.)
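Putting it together, a rough sketch of the corrected model (the random arrays only stand in for the shapes from the question, and from_logits is dropped because it is not a compile() argument):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
# Stand-in data with the shapes from the question
X_train = np.random.random((2858, 2037))
y_train = np.random.randint(0, 2, (2858, 1))
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(2037,)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))  # one output unit, matching y_train.shape[1]
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=128, epochs=2, verbose=1)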

How to fit list of numpy array into LSTM Neural Network?

I am new to Keras, so I really appreciate any help here. For my project, I am trying to train the neural network on multiple time series. I got it to work by running a for loop that fits each time series to the model. The code looks like this:
for i in range(len(train)):
    history = model.fit(train_X[i], train_Y[i], epochs=20, batch_size=6, verbose=0, shuffle=True)
If I am not wrong, I am doing online training here. Now I'm trying to do batch training to see if the result is better. I tried to fit a list consisting of all the time series (each converted into a numpy array), but I get this error:
Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 56 arrays:
Here is the info about the dataset and the model:
model = Sequential()
model.add(LSTM(1, input_shape=(1,16),return_sequences=True))
model.add(Flatten())
model.add(Dense(1, activation='tanh'))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
model.summary()
Layer (type)                 Output Shape              Param #
=================================================================
lstm_2 (LSTM)                (None, 1, 1)              72
_________________________________________________________________
flatten_2 (Flatten)          (None, 1)                 0
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 2
=================================================================
Total params: 74
Trainable params: 74
Non-trainable params: 0
print(len(train_X), train_X[0].shape, len(train_Y), train_Y[0].shape)
56 (1, 23, 16) 56 (1, 23, 1)
Here is the block of code that gives me the error:
pyplot.figure(figsize=(16, 25))
history = model.fit(train_X, train_Y, epochs=1, verbose=1, shuffle=False, batch_size = len(train_X))
The input shape of an LSTM should be (batch_size, timesteps, features). But you need not mention batch_size in input_shape; if you want to, you can use batch_input_shape instead.
model = Sequential()
model.add(LSTM(1, input_shape=(23,16),return_sequences=True))
# model.add(Flatten())
model.add(Dense(1, activation='tanh'))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
model.summary()
X = np.random.random((56,1, 23, 16))
y = np.random.random((56,1, 23, 1))
X=np.squeeze(X,axis =1) #as input shape should be (`batch_size`, `timesteps`, `features`)
y = np.squeeze(y,axis =1)
model.fit(X,y,epochs=1, verbose=1, shuffle=False, batch_size = len(X))
I am not sure if it serves your purpose.
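If your data really is a Python list of 56 arrays shaped (1, 23, 16), an alternative sketch (using the variable names from the question together with the modified model above) is to stack the list into one 3D array before fitting:
import numpy as np
# train_X: list of 56 arrays of shape (1, 23, 16); train_Y: list of 56 arrays of shape (1, 23, 1)
train_X = np.concatenate(train_X, axis=0)  # -> (56, 23, 16)
train_Y = np.concatenate(train_Y, axis=0)  # -> (56, 23, 1)
history = model.fit(train_X, train_Y, epochs=1, verbose=1, shuffle=False, batch_size=len(train_X))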

Error when running ANN with recurrent layer

I have created the ANN below with two fully connected layers and one recurrent layer. However, when running it I get the error: Exception: Input 0 is incompatible with layer lstm_11: expected ndim=3, found ndim=2. Why is this happening?
from keras.models import Sequential
from keras.layers import Dense
from sklearn.cross_validation import train_test_split
import numpy
from sklearn.preprocessing import StandardScaler
from keras.layers import LSTM
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("sorted_output.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:15]
scaler = StandardScaler(copy=True, with_mean=True, with_std=True ) #data normalization
X = scaler.fit_transform(X) #data normalization
Y = dataset[:,15]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Dense(12, input_dim=15, init='uniform', activation='relu'))
model.add(LSTM(10, return_sequences=True))
model.add(Dense(15, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10)
Based on the above, all of your layers except the LSTM are Dense layers, yet in the LSTM you are returning the full sequence with return_sequences=True. Since only Dense layers follow, you should set return_sequences=False, and it should work.
The reason for this error is that LSTM expects the input to have a shape of 3 dimensions (for batch, sequence length and input dimension). But the Dense layer before it outputs a shape of 2 dimensions (for batch and output dimension).
You can see the output shape of the Dense layer by executing the lines of code below:
>>> model = Sequential()
>>> model.add(Dense(12, input_dim=15, init='uniform', activation='relu'))
>>> model.summary()
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
dense_4 (Dense)                  (None, 12)            192         dense_input_2[0][0]
====================================================================================================
Total params: 192
Trainable params: 192
Non-trainable params: 0
____________________________________________________________________________________________________
However, you don't explain your intention with the model, so I cannot give you further guidance for this issue. What is your input data? Do you expect the input to be a sequence?
If your input is a sequence, then I suggest you remove the first Dense layer. But if your input is not a sequence, then I suggest you remove the LSTM layer. A sketch of both options follows.
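As a rough sketch of the two options (written in current Keras 2 syntax, so the Keras 1 init=/nb_epoch= arguments from the question are dropped):
from keras.models import Sequential
from keras.layers import Dense, LSTM
# Option 1: the input is NOT a sequence -> drop the LSTM and keep a plain MLP
mlp = Sequential()
mlp.add(Dense(12, input_dim=15, activation='relu'))
mlp.add(Dense(15, activation='relu'))
mlp.add(Dense(1, activation='sigmoid'))
mlp.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Option 2: the input IS a sequence -> drop the first Dense layer and feed
# 3D data shaped (samples, timesteps, 15) straight into the LSTM
rnn = Sequential()
rnn.add(LSTM(10, input_shape=(None, 15)))  # return_sequences=False by default
rnn.add(Dense(15, activation='relu'))
rnn.add(Dense(1, activation='sigmoid'))
rnn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])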
