LSTM Keras - many-to-many classification ValueError: incompatible shapes - python

I have just started implementing an LSTM in Python with TensorFlow/Keras to test out an idea I had, but I am struggling to create a proper model. This post is mainly about a ValueError that I keep getting (see the code at the bottom), but any and all help with creating a proper LSTM model for the problem below is greatly appreciated.
For each day, I want to predict which of a group of events will occur. The idea is that some events are recurring / always occur after a certain amount of time has passed, whereas other events occur only rarely or without any structure. An LSTM should be able to pick up on these recurring events, in order to predict their occurrences for days in the future.
In order to represent the events, I use a list with values 0 and 1 (non-occurrence and occurrence). So, for example, if I have the events ["Going to school", "Going to the gym", "Buying a computer"], I have lists like [1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0] etc. The idea is then that the LSTM will recognize that I go to school every day, to the gym every other day, and that buying a computer is very rare. So, following the sequence of vectors, it should predict [1, 0, 0] for the next day.
So far I have done the following:
Create x_train: a numpy.array with shape (305, 60, 193). Each entry of x_train contains 60 consecutive days, where each day is represented by a vector of the 193 events that can take place, as described above.
Create y_train: a numpy.array with shape (305, 1, 193). Similar to x_train, but y_train only contains 1 day per entry.
x_train[0] consists of days 1, 2, ..., 60 and y_train[0] contains day 61. x_train[1] then contains days 2, ..., 61 and y_train[1] contains day 62, etc. The idea is that the LSTM should learn to use data from the past 60 days, and that it can then iteratively start predicting/generating new vectors of event occurrences for future days. A rough sketch of how I build these windows is included below.
I am really struggling with how to create a simple implementation of a LSTM that can handle this. So far I think I have figured out the following:
I need to start with the below block of code, where N_INPUTS = 60 and N_FEATURES = 193. I am not sure what N_BLOCKS should be, or whether the value it should take is strictly bound by some conditions. EDIT: According to https://zhuanlan.zhihu.com/p/58854907 it can be whatever I want.
model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
I should probably add a dense layer. If I want the output of my LSTM to be a vector with the 193 events, this should look as follows:
model.add(layers.Dense(193, activation='linear'))  # or some other activation function
I can also add a dropout layer to prevent overfitting, for example with model.add(layers.Dropout(0.2)), where 0.2 is the rate at which inputs are set to 0.
I need to add a model.compile(loss=..., optimizer=...). I am not sure whether the loss function (e.g. MSE or categorical_crossentropy) and the optimizer matter if I just want a working implementation.
I need to train my model, which I can achieve by using model.fit(x_train,y_train)
If all of the above works well, I can start to predict values for the next day using model.predict(the 60 days before the day I want to predict)
One of my attempts can be seen here:
print(x_train.shape)
print(y_train.shape)
model = keras.Sequential()
model.add(layers.LSTM(256, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(y_train.shape[2], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
model.fit(x_train,y_train) #<- This line causes the ValueError
Output:
(305, 60, 193)
(305, 1, 193)
Model: "sequential_29"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_27 (LSTM) (None, 256) 460800
dense_9 (Dense) (None, 1) 257
=================================================================
Total params: 461,057
Trainable params: 461,057
Non-trainable params: 0
_________________________________________________________________
ValueError: Shapes (None, 1, 193) and (None, 193) are incompatible
Alternatively, I have tried replacing the line model.add(layers.Dense(y_train.shape[2], activation='softmax')) with model.add(layers.Dense(y_train.shape[1], activation='softmax')). This produces ValueError: Shapes (None, 1, 193) and (None, 1) are incompatible.
Are my ideas somewhat okay? How can I resolve this ValueError? Any help would be greatly appreciated.
EDIT: As suggested in the comments, changing the size of y_train did the trick.
print(x_train.shape)
print(y_train.shape)
model = keras.Sequential()
model.add(layers.LSTM(193, input_shape=(x_train.shape[1], x_train.shape[2])))  # The 193 can be any number; see: https://zhuanlan.zhihu.com/p/58854907
model.add(layers.Dropout(0.2))
model.add(layers.Dense(y_train.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
model.fit(x_train,y_train)
(305, 60, 193)
(305, 193)
Model: "sequential_40"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_38 (LSTM) (None, 193) 298764
dropout_17 (Dropout) (None, 193) 0
dense_16 (Dense) (None, 193) 37442
=================================================================
Total params: 336,206
Trainable params: 336,206
Non-trainable params: 0
_________________________________________________________________
10/10 [==============================] - 3s 89ms/step - loss: 595.5011
Now I am stuck on the fact that model.predict(x) requires x to be of the same size as x_train, and will output an array with the same size as y_train. I was hoping only one set of 60 days would be required to output the 61st day. Does anyone know how to achieve this?

The solution may be to have y_train of shape (305, 193) instead of (305, 1, 193), since you predict one day per entry. This does not change the data, just its shape. You should then be able to train and predict.
With model.add(layers.Dense(y_train.shape[1], activation='softmax')) of course.

Related

How do I correctly shape my input data for a Keras model?

I'm currently working on a Keras neural network for fun. I'm just learning the basics, but can't get over this dimension problem:
So each sample of my input data (X) should be a 12x6 matrix, with 12 time steps and 6 different data values for every time step:
X = np.zeros([2867, 12, 6])
Y = np.zeros([2867, 3])
My output (Y) should be a one-hot encoded 3x1 vector.
Now I want to feed this data through the following LSTM model.
model = Sequential()
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(12, 6)))
model.add(Dense(3))
model.summary()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=X, y=Y, batch_size=100, epochs=1000, verbose=2, validation_split=0.2)
The Summary looks like this:
Model: "sequential"
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 12, 30) 4440
_________________________________________________________________
dense (Dense) (None, 12, 3) 93
=================================================================
Total params: 4,533
Trainable params: 4,533
Non-trainable params: 0
_________________________________________________________________
When I run this program, I get this error:
ValueError: Shapes (None, 3) and (None, 12, 3) are incompatible.
I already tried to reshape my data to a 72x1 vector, but this doesn't work either.
Maybe someone can help me figure out how to shape my input data correctly :).
You probably need to define your model as follows, since you used the categorical_crossentropy loss function.
model.add(LSTM(30, activation="softsign",
               return_sequences=False, input_shape=(12, 6)))
model.add(Dense(3, activation='softmax'))

How to improve LSTM model predictions and accuracy?

After creating a pre-trained embedding layer using gensim, my val_accuracy has gone down to 45% for 4600 records:
model = models.Sequential()
model.add(Embedding(input_dim=MAX_NB_WORDS, output_dim=EMBEDDING_DIM,
                    weights=[embedding_model], trainable=False,
                    input_length=seq_len, mask_zero=True))
#model.add(SpatialDropout1D(0.2))
#model.add(Embedding(vocabulary_size, 64))
model.add(GRU(units=150, return_sequences=True))
model.add(Dropout(0.4))
model.add(LSTM(units=200,dropout=0.4))
#model.add(Dropout(0.8))
#model.add(LSTM(100))
#model.add(Dropout(0.4))
#Bidirectional(tf.keras.layers.LSTM(embedding_dim))
#model.add(LSTM(400,input_shape=(1117, 100),return_sequences=True))
#model.add(Bidirectional(LSTM(128)))
model.add(Dense(100, activation='relu'))
#
#model.add(Dropout(0.4))
#model.add(Dense(200, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_4 (Embedding) (None, 50, 100) 2746300
_________________________________________________________________
gru_4 (GRU) (None, 50, 150) 112950
_________________________________________________________________
dropout_4 (Dropout) (None, 50, 150) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 200) 280800
_________________________________________________________________
dense_7 (Dense) (None, 100) 20100
_________________________________________________________________
dense_8 (Dense) (None, 4) 404
=================================================================
Total params: 3,160,554
Trainable params: 414,254
Non-trainable params: 2,746,300
_________________________________________________________________
Full code is at
https://colab.research.google.com/drive/13N94kBKkHIX2TR5B_lETyuH1QTC5VuRf?usp=sharing
It would be a great help for me. Since I am new to deep learning, I have tried almost everything I know, but now I am drawing a blank.
The problem is with your input. You've padded your input sequences with zeros but have not provided this information to your model. So your model doesn't ignore the zeros, which is the reason it's not learning at all. To resolve this, change your embedding layer as follows:
model.add(layers.Embedding(input_dim=vocab_size+1,
                           output_dim=embedding_dim,
                           mask_zero=True))
This will enable your model to ignore the zero padding and learn. Training with this, I got a training accuracy of 100% in just 6 epochs, though validation accuracy wasn't that good (around 54%), which is expected as your training data contains only 32 examples. More about the embedding layer: https://keras.io/api/layers/core_layers/embedding/
Since your dataset is small, the model tends to overfit on training data quite easily which gives lower validation accuracy. To mitigate this to some extent, you can try using pre-trained word embeddings like word2vec or GloVe instead of training your own embedding layer. Also, try some text data augmentation methods like creating artificial data using templates or replacing words in training data with their synonyms. You can also experiment with different types of layers (like replacing GRU with another LSTM) but in my opinion that may not help much here and should be considered after trying out pre-trained embeddings and data augmentation.

How do I know the correct format for my input data into my keras RNN?

I am trying to build an Elman simple RNN as described here.
I've built my model using Keras as follows:
model = keras.Sequential()
model.add(keras.layers.SimpleRNN(7, activation=None, use_bias=True,
                                 input_shape=[x_train.shape[0], x_train.shape[1]]))
model.add(keras.layers.Dense(7,activation = tf.nn.sigmoid))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_2 (SimpleRNN) (None, 7) 105
_________________________________________________________________
dense_2 (Dense) (None, 7) 56
=================================================================
Total params: 161
Trainable params: 161
Non-trainable params: 0
_________________________________________________________________
My training data is currently of shape (15000, 7, 7). That is, 15000 instances, each a sequence of 7 one-hot encodings of length 7, coding for one of seven letters, e.g. [0,0,0,1,0,0,0], [0,0,0,0,1,0,0] and so forth.
The data's labels are in the same format, since each letter predicts the next letter in the sequence, i.e. [0,1,0,0,0,0,0] has the label [0,0,1,0,0,0,0].
So, training data (x_train) and training labels (y_train) are both shaped (15000,7,7).
My validation data x_val and y_val are shaped (10000, 7, 7), i.e. the same shape, just with fewer instances.
So when I run my model:
history = model.fit(x_train,
y_train,
epochs = 40,
batch_size=512,
validation_data = (x_val,y_val))
I get the error:
ValueError: Error when checking input: expected simple_rnn_7_input to have shape (15000, 7) but got array with shape (7, 7)
Clearly my input data is formatted incorrectly for input into the Keras RNN, but I can't think how to feed it the correct input.
Could anyone advise me as to the solution?
The SimpleRNN layer expects inputs of dimensions (seq_length, input_dim), that is (7, 7) in your case.
Also, if you want output at each time step, you need to use return_sequences=True, which is False by default. This way you can compare the output at each time step.
So the model architecture will be something like this:
model.add(keras.layers.SimpleRNN(7, activation='tanh',
                                 return_sequences=True,
                                 input_shape=[7, 7]))
model.add(keras.layers.Dense(7))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_12 (SimpleRNN) (None, 7, 7) 105
_________________________________________________________________
dense_2 (Dense) (None, 7, 7) 56
=================================================================
Total params: 161
Trainable params: 161
Non-trainable params: 0
_________________________________________________________________
Now at training time, it expects data input and output of dimensions (num_samples, seq_length, input_dim), i.e. (15000, 7, 7) for both.
model.compile(loss='categorical_crossentropy', optimizer='adam')  # define any loss you want
model.fit(x_train, y_train, epochs=2)

How to configure data labels in a numpy array for training a Keras model?

I'm trying to implement Keras for the first time (so sorry for the dumb question) as part of a wider project to make an AI that learns to play Connect 4. As part of this, I pass a NN a 6x7 grid and it outputs an array of 7 values giving the probability of picking each column in the game. Here is the output of the Model.summary() method for a bit more detail:
______________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 42) 0
_________________________________________________________________
dense (Dense) (None, 20) 860
_________________________________________________________________
dense_1 (Dense) (None, 20) 420
_________________________________________________________________
dense_2 (Dense) (None, 7) 147
=================================================================
Total params: 1,427
Trainable params: 1,427
Non-trainable params: 0
_________________________________________________________________
The model will give (at the moment random) predictions when I pass it numpy arrays of shape (1, 6, 7). However, when I try to train the model with an array of shape (221, 6, 7) for the data and an array of shape (221, 7) for the labels, I get this error:
ValueError: Error when checking target: expected dense_2 to have shape (1,) but got array with shape (7,)
This is the code I use to train the model (which outputs (221, 6, 7) and (221, 7)):
board_tensor = np.array(full_board_list)
print(board_tensor.shape)
label_tensor = np.array(full_label_list)
print(label_tensor.shape)
self.model.fit(board_tensor, label_tensor)
this is the code I use to define the model:
self.model = keras.Sequential([
    keras.layers.Flatten(input_shape=(6, 7)),
    keras.layers.Dense(20, activation=tf.nn.relu),
    keras.layers.Dense(20, activation=tf.nn.relu),
    keras.layers.Dense(7, activation=tf.nn.softmax)])
self.model.compile(optimizer=tf.train.AdamOptimizer(),
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
(the model is part of an AI object so that it could be compared to other types of AI objects)
This is the code which successfully predicts on a batch of size 1, generated from a two-dimensional list representing the board (it outputs (1, 6, 7) and (1, 7)):
input_tensor = np.array(board.board)
input_tensor = np.expand_dims(input_tensor, 0)
print(input_tensor.shape)
probability_distribution = self.model.predict(input_tensor)
print(probability_distribution.shape)
I realise that the error is probably due to a lack of understanding on my part as to what the methods in Keras expect to be given. So, as a little side note, does anyone have any good, thorough learning resources that really get you to understand what each method is doing (i.e. not just telling you which code to type in to make an image recogniser) and that would be understandable to people new to Keras and TensorFlow like me?
Thanks a lot in advance!
You are using the sparse_categorical_crossentropy loss, which takes integer labels (not one-hot encoded ones), while your labels are one-hot encoded. This is why you get an error.
The easiest way to fix it is to change loss to categorical_crossentropy.

Neural Network regression when the output is imbalanced

I am trying to perform regression using a neural network to predict a single output from 146 input features.
I applied Standard Scaling on all inputs and output.
I monitor the Mean Absolute Error after training and it is unreasonably high on the train, validation and test sets (I am not even overfitting).
I suspect this is due to the fact that the output variable is very imbalanced (see histogram).
From the histogram it is possible to see that most of the samples are grouped around 0 but there is also another small group of samples around -5.
Histogram of the imbalanced output
This is model creation code:
input = Input(batch_shape=(None, X.shape[1]))
layer1 = Dense(20, activation='relu')(input)
layer1 = Dropout(0.3)(layer1)
layer1 = BatchNormalization()(layer1)
layer2 = Dense(5, activation='relu', kernel_regularizer='l2')(layer1)
layer2 = Dropout(0.3)(layer2)
layer2 = BatchNormalization()(layer2)
out_layer = Dense(1, activation='linear')(layer2)
model = Model(inputs=input, outputs=out_layer)
model.compile(loss='mean_squared_error', optimizer=optimizers.adam(),
              metrics=['mae'])
This is the model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 146) 0
_________________________________________________________________
dense_1 (Dense) (None, 20) 2940
_________________________________________________________________
dropout_1 (Dropout) (None, 20) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 20) 80
_________________________________________________________________
dense_2 (Dense) (None, 5) 105
_________________________________________________________________
dropout_2 (Dropout) (None, 5) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 5) 20
_________________________________________________________________
dense_3 (Dense) (None, 1) 6
=================================================================
Total params: 3,151
Trainable params: 3,101
Non-trainable params: 50
_________________________________________________________________
Looking at the actual model predictions, the large error mainly happens for samples with a true output value around -5 (the small group of samples).
I tried many hyperparameter configurations, but the error is still very high.
I see many suggestions on performing neural network classification on imbalanced data but what could be done with regression?
It seems odd to me that a regression neural network is not learning this correctly. What am I doing wrong?
From your histogram, it looks as though it's rare for there to be a non-zero output. This is similar to a classification problem where we're trying to predict a rare class, in that a strong strategy in terms of the loss function is simply to guess the most common class - in this case your modal value of zero.
You should do some research around what people do to predict rare events or to classify inputs when some classes are rare. E.g. this discussion might be helpful: https://www.reddit.com/r/MachineLearning/comments/412wpp/predicting_rare_events_how_to_prevent_machine/
Some strategies you might try include
Removing most of the zero-output training examples so that your training data is more balanced
Creating or acquiring more non-zero training examples
Using a different machine learning algorithm (someone at the link I provided recommends boosting. I wonder if you'd get good results from using a residual neural network structure, which is in some ways similar to boosting)
Re-structuring or rescaling your data to add more weight to the rare values (one concrete option is sketched below)
It appears to me that you have a normal distribution with a very small standard deviation, in which case this should train just as well as any other probability distribution.
