I am trying to use an LSTM model to predict the weather (mainly to learn about LSTMs and Python).
I have a dataset of 500,000 rows, each of which represents a date, and 8 columns which are my features.
Below is my model.
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.fit(
    X,
    y,
    batch_size=512,
    epochs=100,
    validation_split=0.05)
As I understand the input parameters, the first one is the time step, so here I am saying that the last 30 observations should be used to predict the next value. The 8, as I understand it, is the number of features: air pressure, temperature, etc.
So I convert my X matrix into a 3D array with the line below, and X is now a (500000, 8, 1) matrix.
X = np.reshape(X, (X.shape[0], X.shape[1], 1))
When I run the model, though, I get the error below.
ValueError: Error when checking input: expected lstm_3_input to have shape (30, 8) but got array with shape (8, 1)
What am I doing wrong?
Your issue is with data preparation.
Find details on data preparation for LSTMs here.
LSTMs map a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple samples. Consider a given univariate sequence:
[10, 20, 30, 40, 50, 60, 70, 80, 90]
We can divide the sequence into multiple input/output patterns called samples, where n_steps time steps (three here) are used as input and one time step is used as the label for the one-step prediction that is being learned.
X           y
10, 20, 30  40
20, 30, 40  50
30, 40, 50  60
...
So what you want to do is implemented in the split_sequence() function below:
# split a univariate sequence into samples
from numpy import array

def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
Getting back to our initial example, the following happens:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90
Takeaway: now your shapes should be what your LSTM model expects, and you should be able to adjust your data shape to your needs. The same idea works for multiple input features per time step; a multivariate sketch follows below.
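For the multivariate case in the question, a minimal sketch under the same windowing idea (names are illustrative: features is the (500000, 8) matrix, target is whichever column you want to predict one step ahead):
import numpy as np

# Split a multivariate series into (samples, n_steps, n_features) windows,
# each labeled with the target value that follows the window.
def split_multivariate(features, target, n_steps):
    X, y = [], []
    for i in range(len(features) - n_steps):
        X.append(features[i:i + n_steps])  # n_steps consecutive rows, all features
        y.append(target[i + n_steps])      # the value right after the window
    return np.array(X), np.array(y)

# With features.shape == (500000, 8) and n_steps == 30,
# X.shape == (499970, 30, 8) -- exactly what input_shape=(30, 8) expects.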
I think your input shape is off. The NN does not understand that you want it to take slices of 30 points to predict the 31st. What you need to do is slice your dataset into chunks of length 30 (which means each point is going to be copied 29 times) and train on that, which will have a shape of (499970, 30, 8), assuming the point after each chunk goes only into y. Also, do not add a dummy dimension at the end; that is needed in conv layers for RGB channels.
I think you might need just a simple explanation of how layers work. In particular, note that all Keras layers behave something like this:
NAME(output_dim, input_shape = (...,input_dim))
For example, suppose I have 15000 3-long vectors and I would like to change them to 5-long vectors. Then something like this would do that:
import numpy as np, tensorflow as tf
X = np.random.random((15000,3))
Y = np.random.random((15000,5))
M = tf.keras.models.Sequential()
M.add(tf.keras.layers.Dense(5,input_shape=(3,)))
M.compile('sgd','mse')
M.fit(X,Y) # Take note that I provided complete working code here. Good practice.
# I even include the imports and random data to check that it works.
Likewise, if my input looks something like (1000,10,5) and I run it through an LSTM like LSTM(7), then I should know (automatically) that I will get something like (...,7) as my output. Those 5-long vectors will get changed to 7-long vectors. The rule to understand: the last dimension is always the vector you are changing, and the first parameter of the layer is always the dimension to change it to.
Now the second thing to learn about LSTMs. They use a time axis (which is not the last axis, because as we just went over, that is always the "changing dimension axis") which is removed if return_sequences=False and kept if return_sequences=True. Some examples:
LSTM(7) # (10000,100,5) -> (10000,7)
# Here the LSTM will loop through the 100, 5 long vectors (like a time series with memory),
# producing 7 long vectors. Only the last 7 long vector is kept.
LSTM(7,return_sequences=True) # (10000,100,5) -> (10000,100,7)
# Same thing as the layer above, except we keep all the intermediate steps.
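If you want to verify those two shape rules yourself, a quick sketch (random data, tf.keras assumed, runs eagerly):
import numpy as np, tensorflow as tf

x = np.random.random((10000, 100, 5)).astype('float32')
print(tf.keras.layers.LSTM(7)(x).shape)                         # (10000, 7)
print(tf.keras.layers.LSTM(7, return_sequences=True)(x).shape)  # (10000, 100, 7)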
You provide a layer that looks like this:
LSTM(50,input_shape=(30,8),return_sequences=True) # (10000,30,8) -> (10000,30,50)
Notice the 30 is the TIME dimension used in your LSTM model. The 8 and the 50 are the INPUT_DIM and OUTPUT_DIM, and have nothing to do with the time axis. Another common misunderstanding: notice that the LSTM expects you to provide each SAMPLE with its own COMPLETE PAST and TIME AXIS. That is, an LSTM does not use previous sample points for the next sample point; each sample is independent and comes with its own complete past data.
So let's take a look at your model. Step one: what is your model doing, and what kind of data is it expecting?
from tensorflow.keras.layers import LSTM, Dropout, Activation
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')
print(model.input_shape)
model.summary() # Let's see what your model is doing.
So, now I clearly see your model does:
(10000,30,8) -> (10000,30,50) -> (10000,30,100) -> (10000,50) -> (10000,1)
Did you expect that? Did you see that those would be the dimensions of the intermediate steps? Now that I know what input and output your model is expecting, I can easily verify that your model trains and works on that kind of data.
from tensorflow.keras.layers import LSTM, Dropout, Activation
from tensorflow.keras.models import Sequential
import numpy as np
X = np.random.random((10000,30,8))
Y = np.random.random((10000,1))
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')
model.fit(X,Y)
Did you notice that your model was expecting inputs like (...,30,8)? Did you know your model was expecting output data that looked like (...,1)? Knowing what your model wants also means you can now change your model to fit the data you're interested in. If you want your data to run over your 8 parameters like a time axis, then your input dimension needs to reflect that: change the 30 to an 8 and change the 8 to a 1. If you do this, notice also that your first layer is expanding each 1-long vector (a single number) into a 50-long vector. Does that sound like what you wanted the model to do? Maybe your LSTM should be an LSTM(2) or LSTM(5) instead of 50, etc. You could spend the next 1000 hours trying to find the right parameters that work with the data you are using.
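For concreteness, a minimal sketch of that feature-as-time variant (random data; LSTM(5) is just one of the smaller sizes guessed at above, not a recommendation):
import numpy as np
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential

X = np.random.random((10000, 8, 1))  # the 8 features treated as 8 time steps of 1 number
Y = np.random.random((10000, 1))
model = Sequential()
model.add(LSTM(5, input_shape=(8, 1)))  # small output dim; 50 would expand each number to 50
model.add(Dense(1))
model.compile('sgd', 'mse')
model.fit(X, Y)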
Maybe you don't want to go over your FEATURE space as a TIME SPACE; maybe try repeating your data into batches of size 10, where each sample has its own history, with dimensions say (10000,10,8). Then an LSTM(50) would use your 8-long feature space and change it into a 50-long feature space while going over the TIME AXIS of 10. Maybe you just want to keep the last one with return_sequences=False.
Let me copy a function I used for preparing my data for an LSTM:
import numpy as np
from itertools import islice

def slice_data_for_lstm(data, lookback):
    # build `lookback` staggered iterators over the rows and zip them into windows
    return np.array(list(zip(*[islice(np.array(data), i, None, 1)
                               for i in range(lookback)])))

X_sliced = slice_data_for_lstm(X, 30)
lookback should be 30 in your case and will create 30 stacks of your (8, 1) features. The resulting data has shape (N, 30, 8, 1).
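Note that the model in the question declares input_shape=(30, 8), so, assuming X is the (500000, 8, 1) array from the question, you would squeeze out the trailing singleton axis before fitting:
import numpy as np

X_sliced = slice_data_for_lstm(X, 30)    # shape (N, 30, 8, 1)
X_model = np.squeeze(X_sliced, axis=-1)  # shape (N, 30, 8), matching input_shape=(30, 8)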
I am a beginner with CNNs and Keras, and I am trying to make a program to predict whether someone could develop diabetes using data in a CSV file. I think I am getting confused about how to reshape the array, as I am receiving the error:
ValueError: Data cardinality is ambiguous:
x sizes: 8
y sizes: 768
Make sure all arrays contain the same number of samples
Here is the code:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten
# read in the csv file using pandas
data = pd.read_csv("diabetes.csv")
# extract the input and output columns from the dataframe
X = data.drop(columns=['Outcome'])
y = data['Outcome']
# reshape the input data into the shape expected by a CNN
X = X.values.reshape(8, 768, 1)
# create a Sequential model in Keras
model = Sequential()
# add a 2D convolutional layer with 32 filters and a kernel size of 3x3
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(8, 768, 1)))
# add a flatten layer to flatten the output from the convolutional layer
model.add(Flatten())
# add a fully-connected layer with 64 units and a ReLU activation
model.add(Dense(64, activation="relu"))
# add a fully-connected layer with 10 units and a softmax activation
model.add(Dense(10, activation="softmax"))
# compile the model using categorical crossentropy loss and an Adam optimizer
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# fit the model using the input and output data
model.fit(X, y)
# print prediction
print(model.predict(10, 139, 80, 0, 0, 27.1, 1.441, 57))
TL;DR: you probably don't want a CNN in this case.
First off, I'm assuming your data looks something like the following; if that's not the case, the rest of the post may be way off target:
(image: the CSV shown as a table, one row per patient, with 8 feature columns and an Outcome column)
So there are 768 rows or patients, 8 inputs for each row, and 1 output (known as the label).
Convolutional layers are used when there is an input signal that you wish to analyze. In 2d, this would be something like a grid of pixels, or in 1d it might be time series data. Your data is neither – each row of the data represents a single 8-dimensional data point (i.e. a single patient) at a single point in time, so you very likely don’t want to use a convolutional layer at all.
For more information, you can read up on the differences between convnets and fully connected neural networks here: https://ai.stackexchange.com/questions/5546/what-is-the-difference-between-a-convolutional-neural-network-and-a-regular-neur?rq=1
“CNN, in specific, has one or more layers of convolution units. A convolution unit receives its input from multiple units from the previous layer which together create a proximity. Therefore, the input units (that form a small neighborhood) share their weights.
The convolution units (as well as pooling units) are especially beneficial as:
• They reduce the number of units in the network (since they are many-to-one mappings). This means, there are fewer parameters to learn which reduces the chance of overfitting as the model would be less complex than a fully connected network.
• They consider the context/shared information in the small neighborhoods. This feature is very important in many applications such as image, video, text, and speech processing/mining, as the neighboring inputs (e.g. pixels, frames, words, etc.) usually carry related information."
A very naïve, very basic NN for a problem like this would just use Dense, i.e. fully connected layers.
In Keras, you can do the following:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(8,)))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
Note that the last layer is a single neuron, since you have only one output. If you were classifying images as one of, say, 10 categories (dog, cat, bird, etc.), then you would use 10 output nodes in the last layer, softmax them, and use categorical cross-entropy. Here, with a single condition, you only need a single output node, and note that the loss function should probably be binary cross-entropy – i.e. you're trying to detect the presence or absence of the condition.
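A hedged usage sketch with the data from the question (X and y as defined there; epochs and batch size are illustrative, not tuned):
# X is the (768, 8) feature dataframe, y the (768,) series of 0/1 labels
model.fit(X.values, y.values, epochs=50, batch_size=32, validation_split=0.1)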
Hope this helps.
I have a data set with the shape (3340, 6). I want to use a CNN-LSTM to read a sequence of 30 rows and predict the next row's 6 elements. From what I have read, this is considered a multi-parallel time series. I have been primarily following this Machine Learning Mastery tutorial and am having trouble implementing the CNN-LSTM architecture for a multi-parallel time series.
I have used this function to split the data into 30-day time-step frames:
# split a multivariate sequence into samples
from numpy import array

def split_sequences(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
Here is a sample of the data frames produced by the function above.
# 30 Time Step Input Frame X[0], X.shape = (3310, 30, 6)
[4.951e-02, 8.585e-02, 5.941e-02, 8.584e-02, 8.584e-02, 5.000e+00],
[8.584e-02, 9.307e-02, 7.723e-02, 8.080e-02, 8.080e-02, 4.900e+01],
[8.080e-02, 8.181e-02, 7.426e-02, 7.474e-02, 7.474e-02, 2.000e+01],
[7.474e-02, 7.921e-02, 6.634e-02, 7.921e-02, 7.921e-02, 4.200e+01],
...
# 1 Time Step Output Array y[0], y.shape = (3310, 6)
[6.550e-02, 7.690e-02, 6.243e-02, 7.000e-02, 7.000e-02, 9.150e+02]
Here is the model that I am using:
model = Sequential()
model.add(TimeDistributed(Conv1D(64, 1, activation='relu'), input_shape=(None, 30, 6)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(Dense(6))
model.compile(optimizer='adam', loss='mse')
When I run model.fit, I receive the following error:
ValueError: Error when checking input: expected time_distributed_59_input to have
4 dimensions, but got array with shape (3310, 30, 6)
I am at a loss as to how to properly shape my input layer so that I can get this model learning. I have done several Conv2D nets in the past, but this is my first time-series model, so I apologize if there's an obvious answer here that I am missing.
Remove TimeDistributed from Conv1D and MaxPooling1D; 3D inputs are supported.
Remove Flatten(), as it destroys the timesteps-channels relationships.
Add TimeDistributed to the last Dense layer, as Dense does not support 3D inputs (returned by LSTM(return_sequences=True)); alternatively, use return_sequences=False, as in the sketch below.
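Putting those three changes together, a minimal sketch of the corrected model (taking the return_sequences=False route, since the goal is one 6-element prediction per 30-step window; layer sizes are copied from the question):
from tensorflow.keras.layers import Conv1D, Dense, LSTM, MaxPooling1D
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Conv1D(64, 1, activation='relu', input_shape=(30, 6)))  # 3D input, no TimeDistributed
model.add(MaxPooling1D(pool_size=2))                              # still 3D: (batch, 15, 64)
model.add(LSTM(50, activation='relu'))                            # return_sequences=False by default
model.add(Dense(6))                                               # one 6-element prediction per window
model.compile(optimizer='adam', loss='mse')
# model.fit(X, y) with X.shape == (3310, 30, 6) and y.shape == (3310, 6)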
I do not understand how LSTM layers are fed with data.
LSTM layers require three dimensions (x, y, z).
I have a dataset of time series: 2900 rows in total, which should conceptually be divided into groups of 23 consecutive rows, where each row is described by 178 features.
Conceptually, every 23 rows I have a new 23-row sequence for a new patient.
Are the following statements right?
x samples = # of bunches of sequences 23 rows long - namely len(dataframe)/23
y time steps = length of each sequence - by domain assumption, 23 here.
z feature size = # of columns for each row - 178 in this case.
Therefore x*y = "# of rows in the dataset"
Assuming this is correct, what's the batch size while training a model in this case?
Might it be the number of samples considered in an epoch while training?
Therefore, by having x (# of samples) equal to 200, it makes no sense to set a batch_size greater than 200, because that's my upper limit; I don't have more data to train on.
I interpret your description as saying that your total dataset is 2900 data samples, where each data sample has 23 time slots, each with a vector of 178 dimensions.
If that is the case, the input_shape for your model should be defined as (23, 178). The batch size is simply the number of samples (out of the 2900) that will be used for a training / test / prediction run.
Try the following:
from keras.models import Sequential
from keras.layers import Dense, LSTM
model = Sequential()
model.add(LSTM(64, input_shape=(23,178)))
model.compile(loss='mse', optimizer='sgd')
model.summary()
print(model.input)
This is just a simplistic model that outputs a single 64-wide vector for each sample. You will see that the expected model.input is:
Tensor("lstm_3_input:0", shape=(?, 23, 178), dtype=float32)
The batch_size is unset in the input shape which means that the model can be used to train / predict batches of different sizes.
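Batch size then appears only when you call fit; a sketch with random data shaped as described above (values are illustrative):
import numpy as np

X = np.random.random((2900, 23, 178))  # 2900 samples, 23 time slots, 178 features
y = np.random.random((2900, 64))       # matches the 64-wide LSTM output above
model.fit(X, y, batch_size=32)         # 32 of the 2900 samples per gradient update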
I read this blog to understand the theoretical background, but after reading it I am a bit confused about what 1) timesteps, 2) unrolling, 3) number of hidden units, and 4) batch size are. Maybe someone could explain this on a code basis as well, because when I look into the model config, the code below does not unroll; but what is the timestep doing in this case? Let's say I have data of 2,000 points, split into 40 time steps and one feature, and hidden units are 100. If batch size is not defined, what is happening in the model?
model = Sequential()
model.add(LSTM(100, input_shape=(n_timesteps_in, n_features)))
model.add(RepeatVector(n_timesteps_in))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(n_features, activation='tanh')))
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
history=model.fit(train, train, epochs=epochs, verbose=2, shuffle=False)
Is the code below still an encoder-decoder model without a RepeatVector?
model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape=(n_timesteps_in, n_features)))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(n_features, activation='tanh')))
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
history=model.fit(train, train, epochs=epochs, verbose=2, shuffle=False)
"Unroll" is just a mechanism to process the LSTMs in a way that makes them faster by occupying more memory. (The details are unknown for me... but it certainly has no influence in steps, shapes, etc.)
When you say "2000 points split into 40 time steps", I have absolutely no idea of what is going on.
The data must be meaningfully structured, and saying "2000 data points" really lacks a lot of information.
Data structured for LSTMs is:
I have a certain number of individual sequences (data evolving with time)
Each sequence has a number of time steps (measures in time)
In each step we measured a number of different vars with different meanings (features)
Example:
2000 users in a website
They used the site for 40 days
In each day I measured the number of times they clicked a button
I can plot how this data evolves with time daily (each day is a step)
So, if you have 2000 sequences (also called "samples" in Keras), each with a length of 40 steps and one single feature per step, this will happen:
Dimensions
Batch size is defined as 32 by default in the fit method. The model will process batches containing 32 sequences/users until it reaches 2000 sequences/users.
input_shape will be required to be (40, 1) (the batch size is free to choose in fit)
Steps
Your LSTMs will try to understand how clicks vary in time, step by step. That's why they're recurrent, they calculate things for a step and feed these things into the next step, until all 40 steps are processed. (You won't see this processing, though, it's internal)
With return_sequences=True, you will get the output for all steps.
Without it, you will get only the output for the last step.
The model
The model will process 32 parallel (and independent) sequences/users together in each batch.
The first LSTM layer will process the entire sequence in recurrent steps and return a final result. (The sequence is killed, there are no steps left because you didn't use return_sequences=True)
Output shape = (batch, 100)
You create a new sequence with RepeatVector, but this sequence is constant in time.
Output shape = (batch, 40, 100)
The next LSTM layer processes this constant sequence and produces an output sequence, with all 40 steps
Output shape = (batch, 40, 100)
The TimeDistributed(Dense) will process each of these steps, but independently (in parallel), not recursively as the LSTMs would do.
Output shape = (batch, 40, n_features)
The output will be the total group of 2000 sequences (that were processed in groups of 32), each with 40 steps and n_features output features.
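If the RepeatVector step above seems odd, a quick sketch shows it simply tiles one vector along a new time axis:
import numpy as np
import tensorflow as tf

v = tf.constant(np.arange(3, dtype='float32').reshape(1, 3))  # one 3-long vector
print(tf.keras.layers.RepeatVector(4)(v))  # shape (1, 4, 3): the same vector, 4 times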
Cells, features, units
Everything is independent.
Input features is one thing, output features is another. There is no requirement for Dense to use the same number of features used in input_shape, unless that's what you want.
When you use 100 units in the LSTM layer, it will produce an output sequence of 100 features, shape (batch, 40, 100). If you use 200 units, it will produce an output sequence with 200 features, shape (batch, 40, 200). This is computing power. More neurons = more intelligence in the model.
Something strange in the model:
You should replace:
model.add(LSTM(100, input_shape=(n_timesteps_in, n_features)))
model.add(RepeatVector(n_timesteps_in))
With only:
model.add(LSTM(100, return_sequences=True,input_shape=(n_timesteps_in, n_features)))
Not returning sequences in the first layer and then creating a constant sequence with RepeatVector is sort of destroying the work of your first LSTM.
I want to classify full sequences into two categories. I searched a lot on the web but found no result for this. My preferred way is to use an LSTM model from Keras to classify a "full" sequence of varying rows into two categories. The problem with this approach is the different shapes of X and y. This is a sample code I wrote to explain my problem.
import numpy as np
from keras.layers import Dense,LSTM
from keras.models import Sequential
# (no of samples, no of rows, steps, features)
feat = np.random.randint(0, 100, (150, 250, 25, 10))
# (no of samples, binary output)
target = np.random.randint(0, 2, (150, 1))
# model
model = Sequential()
model.add(LSTM(10, input_shape=(25, 10), return_sequences=True))
model.add(LSTM(10,return_sequences=False))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
print(model.summary())

for i in range(target.shape[0]):
    X = feat[i]
    y = target[i]
    model.fit(X, y)
Here I have 150 sample sequences which I want to classify as 0 or 1. The problem is this:
ValueError: Input arrays should have the same number of samples as target arrays. Found 250 input samples and 1 target samples.
If there is no way to carry this out with deep learning methods, can you suggest any other machine learning algorithms?
EDIT
Many have asked about this:
# (no of samples, no of rows, steps, features)
feat = np.random.randint(0, 100, (150, 250, 25, 10))
150 is the number of samples (consider this as 150 time series).
250 and 10 describe a time series with 250 rows and 10 columns.
(250, 25, 10) is the addition of 25 time steps so that it can be passed to the Keras LSTM input.
The problem is that when you do
X=feat[i]
y=target[i]
This removes the first axis, which causes X.shape == (250, 25, 10) and y.shape == (1,). When you call model.fit(X, y), Keras then assumes that X has 250 samples and y has only one sample, which is why you get that error.
You can solve this by extracting slices of feat and target, for example by calling
X=feat[i:i+batch_size]
y=target[i:i+batch_size]
Where batch_size is how many samples you want to use per iteration. If you set batch_size = 1, you should get the behavior you intended in your code.