Keras LSTM multiclass classification - python

I have this code that works for binary classification; I have tested it on the Keras IMDB dataset.
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
I need the above code converted for multi-class classification with 7 categories in total. From what I understand after reading a few articles, to convert the above code I have to change
model.add(Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Obviously, changing just the above two lines doesn't work. What else do I have to change to make the code work for multi-class classification? I also think I have to convert the classes to one-hot encoding, but I don't know how to do that in Keras.

Yes, you need one-hot encoded targets. You can use to_categorical to encode your targets, or take a shortcut and use the sparse loss instead:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Here is the full code:
from keras.models import Sequential
from keras.layers import *
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(7, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Summary
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_1 (Dense)              (None, 7)                 707
=================================================================
Total params: 213,907
Trainable params: 213,907
Non-trainable params: 0
_________________________________________________________________
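If you prefer to keep loss='categorical_crossentropy' instead, here is a short sketch of the one-hot route with to_categorical (assuming integer labels 0-6 and the X_train/y_train arrays from the question):
from keras.utils import to_categorical
# y_train / y_test hold integer class labels in {0, ..., 6}
y_train = to_categorical(y_train, num_classes=7)  # shape becomes (n_samples, 7)
y_test = to_categorical(y_test, num_classes=7)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64)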

Related

How to feed an LSTM model in Keras (Python)?

I have read about LSTMs, and I know the algorithm takes the values of the previous words into account in the parameters for the next word.
Now I am trying to apply my first LSTM model.
I have this code:
model = Sequential()
model.add(LSTM(units=6, input_shape = (X_train_count.shape[0], X_train_count.shape[1]), return_sequences = True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=ytrain.shape[1], return_sequences=True, name='output'))
model.compile(loss='cosine_proximity', optimizer='sgd', metrics = ['accuracy'])
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.summary()
cp = ModelCheckpoint('model_cnn.hdf5', monitor='val_acc', verbose=1, save_best_only=True)
history = model.fit(X_train_count, ytrain,
                    epochs=20,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10,
                    callbacks=[cp])
1- I cannot see how the LSTM would know the word sequence when my dataset is built from TF-IDF values?
2- I am getting this error:
ValueError: Input 0 of layer sequential_8 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 18644]
The issue seems to be in the shape of X_train_count you are passing in; the LSTM input shape is always tricky.
If your X_train_count is not 3D, then reshape it using the line below.
X_train_count = X_train_count.reshape(X_train_count.shape[0], X_train_count.shape[1], 1)
In the LSTM layer, the input_shape should be (timesteps, data_dim).
Below is an example to illustrate this.
from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
X = ["first example","one more","good morning"]
Y = ["first example","one more","good morning"]
vectorizer = TfidfVectorizer().fit(X)
tfidf_vector_X = vectorizer.transform(X).toarray()
tfidf_vector_Y = vectorizer.transform(Y).toarray()
tfidf_vector_X = tfidf_vector_X[:, :, None]
tfidf_vector_Y = tfidf_vector_Y[:, :, None]
X_train, X_test, y_train, y_test = train_test_split(tfidf_vector_X, tfidf_vector_Y, test_size = 0.2, random_state = 1)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM
model = Sequential()
model.add(LSTM(units=6, input_shape = X_train.shape[1:], return_sequences = True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=1, return_sequences=True, name='output'))
model.compile(loss='cosine_proximity', optimizer='sgd', metrics = ['accuracy'])
Model Summary:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_9 (LSTM) (None, 6, 6) 192
_________________________________________________________________
lstm_10 (LSTM) (None, 6, 6) 312
_________________________________________________________________
lstm_11 (LSTM) (None, 6, 6) 312
_________________________________________________________________
output (LSTM) (None, 6, 1) 32
=================================================================
Total params: 848
Trainable params: 848
Non-trainable params: 0
_________________________________________________________________
None
Here, X_train is of shape (2, 6, 1).
To add to the solution, I would suggest using a dense vector representation instead of the sparse vectors produced by the TF-IDF approach, for example by loading pre-trained embeddings such as the Google News vectors or GloVe as the weights of an Embedding layer; this is usually better both performance-wise and result-wise.
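As a rough sketch of that idea (the glove.6B.100d.txt file name, the Tokenizer-based vocabulary, and the toy corpus are my assumptions, not part of the original answer):
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
texts = ["first example", "one more", "good morning"]
# Build an integer vocabulary instead of TF-IDF vectors
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
vocab_size = len(tokenizer.word_index) + 1
embedding_dim = 100  # must match the GloVe file used below
# Load pre-trained GloVe vectors (assumes glove.6B.100d.txt is available locally)
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")
# Fill an embedding matrix; words missing from GloVe stay as zero vectors
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer.word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector
model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], trainable=False))  # freeze the pre-trained weights
model.add(LSTM(6))
model.add(Dense(1, activation='sigmoid'))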

How would I improve my model such that it will work on more characters not in the dataset?

In my last post linked here, it was said that I have to modify my model for it to be better. To quote the only answerer's comment to my questions (again, thank you, Sir):
The accuracy of prediction is a metric of how good your neural network architecture is and it also depends on your train/validation data. You will have to tune your neural network in such a way that you generalize well by adjusting the hyper parameters such as number of layers, type of layers, learning rate, optimizer etc. ...
I would like to know how I would do these mentioned. Or at the least, be pointed in the right direction. I am honestly both lost in theory and practice.
The only thing I have been able to do is to raise the number of epochs above 100. I have also cleaned up the images to be identified as much as I can.
Currently, here is how I create my model. It is based only on TensorFlow 2.0's tutorial.
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Load and prepare the MNIST dataset. Convert the samples from integers to floating-point numbers:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
def createModel():
    # Build the tf.keras.Sequential model by stacking layers.
    # Choose an optimizer and loss function used for training:
    model = tf.keras.models.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model = createModel()
model.fit(x_train, y_train, epochs=102, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
It gave a validation accuracy of around 0.9800 for me, but its performance on images of handwritten characters I've extracted from documents is dismal. I would also like to extend it so it can read other selected characters, but I guess that can be another question for another day.
Thanks!
You could add multiple convolution/max-pooling layers at the beginning to perform feature extraction by scanning the image. After that, you use a fully connected network like you did before, followed by a softmax.
You could create a model with a CNN this way:
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential
# Create the model
model = Sequential()
# Add the 1st Convolution/ max pool
model.add(Conv2D(40, kernel_size=5, padding="same",input_shape=(28, 28, 1), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# 2nd convolution / max pool
model.add(Conv2D(200, kernel_size=3, padding="same", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# 3rd convolution/ max pool
model.add(Conv2D(512, kernel_size=3, padding="valid", activation = 'relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1)))
# Reduce dimensions from 2d to 1d
model.add(Flatten())
model.add(Dense(units=100, activation='relu'))
# Add dropout to prevent overfitting
model.add(Dropout(0.5))
# Final fullyconnected layer
model.add(Dense(10, activation="softmax"))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Which returns the following model:
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 40)        1040
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 40)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 200)       72200
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 200)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 10, 10, 512)       922112
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 512)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 32768)             0
_________________________________________________________________
dense_1 (Dense)              (None, 100)               3276900
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010
=================================================================
Total params: 4,273,262
Trainable params: 4,273,262
Non-trainable params: 0
_________________________________________________________________
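Note that this CNN expects a channel axis and one-hot labels, which the MNIST arrays loaded in the question don't have yet. A minimal sketch of the preprocessing (my addition, not part of the original answer):
from keras.datasets import mnist
from keras.utils import to_categorical
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add the channel axis expected by Conv2D: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
# categorical_crossentropy expects one-hot targets, so encode the integer labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))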

How to define input shapes in a Sequential Keras model

Please help me define appropriate Dense input shapes in Keras models. Maybe I have to reshape my data first. My data set has the dimensions shown below:
X_train: (2858, 2037), y_train: (2858, 1), X_test: (715, 2037), y_test: (715, 1)
The number of features (the input shape) is 2037.
I want to define a Sequential Keras model like this:
batch_size = 128
num_classes = 2
epochs = 20
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(X_input_shape,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.summary()
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(),
              from_logits=True,
              metrics=['accuracy'])
Model summary:
Layer (type)                 Output Shape              Param #
=================================================================
dense_20 (Dense)             (None, 512)               1043456
_________________________________________________________________
dropout_12 (Dropout)         (None, 512)               0
_________________________________________________________________
dense_21 (Dense)             (None, 512)               262656
=================================================================
Total params: 1,306,112
Trainable params: 1,306,112
Non-trainable params: 0
And when I try to fit it...
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(X_test, y_test))
I got an error:
ValueError: Error when checking target: expected dense_21 to have shape (512,) but got array with shape (1,)
Modify
model.add(Dense(512, activation='relu'))
to
model.add(Dense(1, activation='sigmoid'))
so that the output shape is of size 1, the same as y_train.shape[1]. (Since the loss is binary_crossentropy, the final unit should use a sigmoid activation rather than relu, so that it outputs a probability.)
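Putting it together, a rough sketch of the corrected model (the random arrays only stand in for the shapes from the question, and from_logits is dropped because it is not a compile() argument):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
# Stand-in data with the shapes from the question
X_train = np.random.random((2858, 2037))
y_train = np.random.randint(0, 2, (2858, 1))
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(2037,)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))  # one output unit, matching y_train.shape[1]
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=128, epochs=2, verbose=1)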

How to fit list of numpy array into LSTM Neural Network?

I am new to Keras, so I really appreciate any help here. For my project, I am trying to train the neural network on multiple time series. I got it to work by running a for loop that fits each time series to the model. The code looks like this:
for i in range(len(train)):
    history = model.fit(train_X[i], train_Y[i], epochs=20, batch_size=6, verbose=0, shuffle=True)
If I am not wrong, I am doing online training here. Now I'm trying to do batch training to see if the result is better. I tried to fit a list consisting of all the time series (each converted into a numpy array), but I get this error:
Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 56 arrays:
Here is the info about the dataset and the model:
model = Sequential()
model.add(LSTM(1, input_shape=(1,16),return_sequences=True))
model.add(Flatten())
model.add(Dense(1, activation='tanh'))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
model.summary()
Layer (type)                 Output Shape              Param #
=================================================================
lstm_2 (LSTM)                (None, 1, 1)              72
_________________________________________________________________
flatten_2 (Flatten)          (None, 1)                 0
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 2
=================================================================
Total params: 74
Trainable params: 74
Non-trainable params: 0
print(len(train_X), train_X[0].shape, len(train_Y), train_Y[0].shape)
56 (1, 23, 16) 56 (1, 23, 1)
Here is the block of code that gives me the error:
pyplot.figure(figsize=(16, 25))
history = model.fit(train_X, train_Y, epochs=1, verbose=1, shuffle=False, batch_size = len(train_X))
The input shape of an LSTM should be (batch_size, timesteps, features). But you need not mention batch_size in input_shape; if you want to, you can use batch_input_shape instead.
model = Sequential()
model.add(LSTM(1, input_shape=(23,16),return_sequences=True))
# model.add(Flatten())
model.add(Dense(1, activation='tanh'))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
model.summary()
X = np.random.random((56,1, 23, 16))
y = np.random.random((56,1, 23, 1))
X=np.squeeze(X,axis =1) #as input shape should be (`batch_size`, `timesteps`, `features`)
y = np.squeeze(y,axis =1)
model.fit(X,y,epochs=1, verbose=1, shuffle=False, batch_size = len(X))
I am not sure if it serves your purpose.
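If your data really is a Python list of 56 arrays shaped (1, 23, 16), an alternative sketch (using the variable names from the question together with the modified model above) is to stack the list into one 3D array before fitting:
import numpy as np
# train_X: list of 56 arrays of shape (1, 23, 16); train_Y: list of 56 arrays of shape (1, 23, 1)
train_X = np.concatenate(train_X, axis=0)  # -> (56, 23, 16)
train_Y = np.concatenate(train_Y, axis=0)  # -> (56, 23, 1)
history = model.fit(train_X, train_Y, epochs=1, verbose=1, shuffle=False, batch_size=len(train_X))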

Error when running ANN with recurrent layer

I have created the ANN below with two fully connected layers and one recurrent layer. However, when running it I get the error: Exception: Input 0 is incompatible with layer lstm_11: expected ndim=3, found ndim=2. Why is this happening?
from keras.models import Sequential
from keras.layers import Dense
from sklearn.cross_validation import train_test_split
import numpy
from sklearn.preprocessing import StandardScaler
from keras.layers import LSTM
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("sorted_output.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:15]
scaler = StandardScaler(copy=True, with_mean=True, with_std=True ) #data normalization
X = scaler.fit_transform(X) #data normalization
Y = dataset[:,15]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Dense(12, input_dim=15, init='uniform', activation='relu'))
model.add(LSTM(10, return_sequences=True))
model.add(Dense(15, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10)
Based on the above, all of your layers except the LSTM are Dense layers, yet in the LSTM you are returning the full sequence with return_sequences=True. Since only Dense layers follow, you should set return_sequences=False, and it should work.
The reason for this error is that LSTM expects the input to have a shape of 3 dimensions (for batch, sequence length and input dimension). But the Dense layer before it outputs a shape of 2 dimensions (for batch and output dimension).
You can see the output shape of the Dense layer by executing the lines of code below:
>>> model = Sequential()
>>> model.add(Dense(12, input_dim=15, init='uniform', activation='relu'))
>>> model.summary()
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
dense_4 (Dense)                  (None, 12)            192         dense_input_2[0][0]
====================================================================================================
Total params: 192
Trainable params: 192
Non-trainable params: 0
____________________________________________________________________________________________________
However, you don't explain your intention with the model, so I cannot give you further guidance for this issue. What is your input data? Do you expect the input to be a sequence?
If your input is a sequence, then I suggest you remove the first Dense layer. But if your input is not a sequence, then I suggest you remove the LSTM layer. A sketch of both options follows.
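As a rough sketch of the two options (written in current Keras 2 syntax, so the Keras 1 init=/nb_epoch= arguments from the question are dropped):
from keras.models import Sequential
from keras.layers import Dense, LSTM
# Option 1: the input is NOT a sequence -> drop the LSTM and keep a plain MLP
mlp = Sequential()
mlp.add(Dense(12, input_dim=15, activation='relu'))
mlp.add(Dense(15, activation='relu'))
mlp.add(Dense(1, activation='sigmoid'))
mlp.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Option 2: the input IS a sequence -> drop the first Dense layer and feed
# 3D data shaped (samples, timesteps, 15) straight into the LSTM
rnn = Sequential()
rnn.add(LSTM(10, input_shape=(None, 15)))  # return_sequences=False by default
rnn.add(Dense(15, activation='relu'))
rnn.add(Dense(1, activation='sigmoid'))
rnn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])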
