I want to classify full sequences into two categories. I searched the web a lot but found no result for this. My preferred way is to use an LSTM model from Keras to classify a "full" sequence of varying rows into two categories. The problem with this approach is the different shapes of X and y. This is a sample code I wrote to explain my problem.
import numpy as np
from keras.layers import Dense,LSTM
from keras.models import Sequential
#(no of samples, no of rows,step, feature)
feat= np.random.randint(0, 100, (150, 250, 25,10))
#(no of samples, binary_output)
target= np.random.randint(0, 2, (150, 1))
#model
model = Sequential()
model.add(LSTM(10, input_shape=(25, 10), return_sequences=True))
model.add(LSTM(10,return_sequences=False))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
print(model.summary())
for i in range(target.shape[0]):
    X = feat[i]
    y = target[i]
    model.fit(X, y)
Here I have 150 sample sequences which I want to classify as 0 or 1. The problem is this error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 250 input samples and 1 target samples.
If there is no way to do this with deep learning methods, can you suggest any other machine learning algorithms?
EDIT
Many have asked for clarification regarding this:
#(no of samples, no of rows,step, feature)
feat= np.random.randint(0, 100, (150, 250, 25,10))
150 is the number of samples (consider this as 150 time series).
250 and 10 mean each sample is a time series with 250 rows and 10 columns.
(250, 25, 10) is the addition of 25 time steps so that it can be passed to the Keras LSTM input.
The problem is that when you do
X=feat[i]
y=target[i]
This removes the first axis, which makes X.shape == (250, 25, 10) and y.shape == (1,). When you call model.fit(X, y), Keras then assumes that X has 250 samples while y has only one sample, which is why you get that error.
You can solve this by extracting slices of feat and target, for example by calling
X=feat[i:i+batch_size]
y=target[i:i+batch_size]
Where batch_size is how many samples you want to use per iteration. If you set batch_size = 1, you should get the behavior you intended in your code.
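For example, the whole fit loop could then look like the sketch below. This is only a minimal illustration: to keep it runnable I assume each sample is already a single (25, 10) sequence (i.e. feat has shape (150, 25, 10)), and I use a sigmoid output, since a softmax over a single unit always outputs 1.
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential
# toy stand-ins for the real data
feat = np.random.randint(0, 100, (150, 25, 10)).astype('float32')
target = np.random.randint(0, 2, (150, 1))
model = Sequential()
model.add(LSTM(10, input_shape=(25, 10)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
batch_size = 10
for i in range(0, target.shape[0], batch_size):
    X = feat[i:i + batch_size]    # (batch_size, 25, 10) -- the sample axis is kept
    y = target[i:i + batch_size]  # (batch_size, 1)
    model.fit(X, y, verbose=0)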
Related
I have a time series problem with 15 minutes as a timestep. The complete data runs from 2016-09-01 00:00:15 to 2016-12-31 23:45:00.
I have six variables (v1, v2, v3, v4, v5, v6) in the data frame and I want to predict the sixth variable (v6) for the next timestep.
I prepare the data set with the information as 5 time lags: if the time of a row is t, I create the values from (t-1) to (t-5) as lags for v1 to v6.
So in total, I have 30 features (5 lags for 6 variables).
I also normalize the values by PowerTransformer.
from sklearn.preprocessing import PowerTransformer
scaler_x = PowerTransformer()
scaler_y = PowerTransformer()
train_X = scaler_x.fit_transform(train_X)
train_y = scaler_y.fit_transform(train_y.reshape(-1,1))
My input shapes of train_X and train_y are initially:
(11253, 30) , (11253, 1)
11253 rows with 30 variables as input and a single target variable. Then I reshape this to fit my ConvLSTM2D as below:
# define the number of subsequences and the length of subsequences
n_steps, n_length = 5, 6  # I take into account the past 5 steps for the 6 variables
n_features=1
#reshape for ConvLSTM
# reshape into subsequences [samples, time steps, rows, cols, channels]
train_X = train_X.reshape(train_X.shape[0], n_steps, 1, n_length, n_features)
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
The ConvLSTM2D architecture looks like below:
from keras.models import Sequential
from keras.layers import ConvLSTM2D, Flatten, RepeatVector, LSTM, TimeDistributed, Dense
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, verbose=0)
But this model gives a very bad result (it is overfitting a lot). I suspect that my inputs are not given correctly to the ConvLSTM2D.
Is my reshaping correct? Any help is appreciated.
EDIT:
I have realized my input is being given correctly to the network, but the issue is that it is overfitting a lot.
My hyperparameters are below :
#hyper-parameter
epochs=100
batch_size=64
adam_opt = keras.optimizers.Adam(lr=0.001)
I even tried 50 and 10 epochs; it's the same issue.
In my personal experience, there are a few things I've picked up about using ConvLSTM2D.
I would first check whether the model is training at all. Based on your post I am unsure how the loss changes as your model trains, if at all. If there is some variation, you should perform a grid search (playing around with the number of layers and filters).
I also found my models needed to train for a long time to perform well; see the Keras example on ConvLSTM2D, where 300 epochs are needed to train a model to perform an arguably simple task: https://keras.io/examples/conv_lstm/. A case I worked on needed a similar number of epochs to train.
Check different loss functions and optimizers (even though I think mse and adam are good for this type of problem)
Normalize your data differently; you may want to normalize your data statistically, as shown in this Keras example: https://www.tensorflow.org/tutorials/keras/regression
From personal experience, you might want more layers for this specific problem. See the Keras ConvLSTM2D example above for this.
I see how you want to format your data, and though it may work, a more straightforward solution may work better. You might want to try giving (v1, v2, v3, v4, v5) and predicting v6. You may have to use large batch sizes for this.
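For example, a bare-bones sketch of that simpler formulation could look like the following (the layer sizes, epochs and the random stand-in data are only placeholders):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
n_steps, n_features = 5, 5                          # past 5 steps of (v1..v5)
X = np.random.random((11253, n_steps, n_features))  # toy stand-in for the real series
y = np.random.random((11253, 1))                    # next-step v6
model = Sequential()
model.add(LSTM(32, input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=10, batch_size=256, validation_split=0.1)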
I am trying to use an LSTM model to predict the weather (mainly to learn about LSTMs and using Python).
I have a dataset of 500,000 rows, each of which represents a date, and there are 8 columns which are my features.
Below is my model.
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.fit(
X,
y,
batch_size=512,
epochs=100,
validation_split=0.05)
For the input parameters, as I understand it, the first parameter is the time step, so here I am saying that the last 30 observations should be used to predict the next value. The 8, as I understand it, is the number of features: air pressure, temperature, etc.
I convert my X matrix into a 3D matrix with the line below, so X is now a (500000, 8, 1) matrix.
X = np.reshape(X, (X.shape[0], X.shape[1], 1))
When I run the model though I get the error below.
ValueError: Error when checking input: expected lstm_3_input to have shape (30, 8) but got array with shape (8, 1)
What am I doing wrong?
Your issue is with data preparation.
Find details on data preparation for LSTMs here.
LSTMs map a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple samples. Consider a given univariate sequence:
[10, 20, 30, 40, 50, 60, 70, 80, 90]
We can divide the sequence into multiple input/output patterns called samples, where n_steps = 3 time steps are used as input and one time step is used as the label for the one-step prediction that is being learned.
X, y
10, 20, 30 40
20, 30, 40 50
30, 40, 50 60
# ...
So what you want to do is implemented in the split_sequence() function below:
# split a univariate sequence into samples
from numpy import array
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
Getting back to our initial example the following happens:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90
Take away: now your shapes should be what your LSTM model expects them to be, and you should be able to adjust your data shape to your needs. The same approach works with multiple input features per row.
I think your input shape is off. The NN does not understand that you want it to take slices of 30 points to predict the 31st. What you need to do is slice your dataset into chunks of length 30 (which means each point is going to be copied 29 times) and train on that; the result will have a shape of (499970, 30, 8), assuming the last point goes only into y. Also, do not add a dummy dimension at the end; it is needed in conv layers for RGB channels.
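For illustration, the chunking could be done roughly like this (toy sizes, and which column is the target is an assumption):
import numpy as np
def make_windows(data, window=30, target_col=0):
    # stack overlapping windows of `window` rows; the row after each window supplies the target
    X = np.stack([data[i:i + window] for i in range(len(data) - window)])
    y = data[window:, target_col]
    return X, y
data = np.random.random((5000, 8))   # small stand-in for the 500,000 x 8 dataset
X, y = make_windows(data)
print(X.shape, y.shape)              # (4970, 30, 8) (4970,)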
I think you might need just a simple explanation of how layers work. In particular, note that all Keras layers behave something like this:
NAME(output_dim, input_shape = (...,input_dim))
For example, suppose I have 15000 vectors of length 3 and I would like to change them to vectors of length 5. Then something like this would do that:
import numpy as np, tensorflow as tf
X = np.random.random((15000,3))
Y = np.random.random((15000,5))
M = tf.keras.models.Sequential()
M.add(tf.keras.layers.Dense(5,input_shape=(3,)))
M.compile('sgd','mse')
M.fit(X,Y) # Take note that I provided complete working code here. Good practice.
# I even include the imports and random data to check that it works.
Likewise, if my input looks something like (1000, 10, 5) and I run it through an LSTM like LSTM(7), then I should know (automatically) that I will get something like (..., 7) as my output. Those length-5 vectors will get changed to length-7 vectors. Rule to understand: the last dimension is always the vector you are changing, and the first parameter of the layer is always the dimension to change it to.
Now the second thing to learn about LSTMs. They use a time axis (which is not the last axis, because as we just went over, that is always the "changing dimension axis") which is removed if return_sequences=False and kept if return_sequences=True. Some examples:
LSTM(7) # (10000,100,5) -> (10000,7)
# Here the LSTM will loop through the 100, 5 long vectors (like a time series with memory),
# producing 7 long vectors. Only the last 7 long vector is kept.
LSTM(7,return_sequences=True) # (10000,100,5) -> (10000,100,7)
# Same thing as the layer above, except we keep all the intermediate steps.
You provide a layer that looks like this:
LSTM(50,input_shape=(30,8),return_sequences=True) # (10000,30,8) -> (10000,30,50)
Notice the 30 is the TIME dimension used in your LSTM model. The 8 and the 50 are the INPUT_DIM and OUTPUT_DIM and have nothing to do with the time axis. Another common misunderstanding: the LSTM expects you to provide each SAMPLE with its own COMPLETE PAST and TIME AXIS. That is, an LSTM does not use previous sample points for the next sample point; each sample is independent and comes with its own complete past data.
So let's take a look at your model. Step one. What is your model doing and what kind of data is it expecting?
from tensorflow.keras.layers import LSTM, Dropout, Activation, Dense
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')
print(model.input_shape)
model.summary() # Lets see what your model is doing.
So, now I clearly see your model does:
(10000,30,8) -> (10000,30,50) -> (10000,30,100) -> (10000,50) -> (10000,1)
Did you expect that? Did you see that those would be the dimensions of the intermediate steps? Now that I know what input and output your model is expecting, I can easily verify that your model trains and works on that kind of data.
from tensorflow.keras.layers import LSTM, Dropout, Activation, Dense
from tensorflow.keras.models import Sequential
import numpy as np
X = np.random.random((10000,30,8))
Y = np.random.random((10000,1))
model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')
model.fit(X,Y)
Did you notice that your model was expecting inputs like (..., 30, 8)? Did you know your model was expecting output data that looked like (..., 1)? Knowing what your model wants also means you can change your model to fit the data you're interested in. If you want your data to run over your 8 parameters like a time axis, then your input dimension needs to reflect that: change the 30 to an 8 and change the 8 to a 1. If you do this, notice also that your first layer is expanding each length-1 vector (a single number) into a length-50 vector. Does that sound like what you wanted the model to do? Maybe your LSTM should be an LSTM(2) or LSTM(5) instead of 50, etc. You could spend the next 1000 hours trying to find the right parameters that work with the data you are using.
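For instance, that "feature axis as time axis" variant would look something like this (the LSTM size of 5 and the random data are just placeholders):
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential
import numpy as np
X = np.random.random((10000, 8, 1))     # 8 "time" steps, each a length-1 vector
Y = np.random.random((10000, 1))
model = Sequential()
model.add(LSTM(5, input_shape=(8, 1)))  # (10000,8,1) -> (10000,5)
model.add(Dense(1))                     # (10000,5)   -> (10000,1)
model.compile('sgd', 'mse')
model.fit(X, Y)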
Maybe you don't want to go over your FEATURE space as a TIME SPACE; maybe try windowing your data into slices of size 10, where each sample has its own history, with dimensions say (10000, 10, 8). Then an LSTM(50) would use your length-8 feature space and change it into a length-50 feature space while going over the TIME AXIS of 10. Maybe you just want to keep the last one with return_sequences=False.
Let me copy a function I used for preparing my data for LSTM:
import numpy as np
from itertools import islice
def slice_data_for_lstm(data, lookback):
    return np.array(list(zip(*[islice(np.array(data), i, None, 1) for i in range(lookback)])))
X_sliced = slice_data_for_lstm(X, 30)
lookback should be 30 in your case and will create stacks of 30 of your (8, 1) feature rows. The resulting data has shape (N, 30, 8, 1).
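As a quick sanity check of those shapes on toy data (the sizes are placeholders):
import numpy as np
from itertools import islice
def slice_data_for_lstm(data, lookback):
    return np.array(list(zip(*[islice(np.array(data), i, None, 1) for i in range(lookback)])))
X = np.random.random((1000, 8, 1))  # rows of (8, 1) features, as in the question
X_sliced = slice_data_for_lstm(X, 30)
print(X_sliced.shape)               # (971, 30, 8, 1), i.e. (len(X) - lookback + 1, 30, 8, 1)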
I am not understanding how LSTM layers are fed with data.
LSTM layers require three dimensions (x, y, z).
I have a dataset of time series: 2900 rows in total, which should conceptually be divided into groups of 23 consecutive rows, where each row is described by 178 features.
Conceptually, every 23 rows I have a new 23-row sequence regarding a new patient.
Are the following statements right?
x samples = # of bunches of sequences 23 rows long - namely len(dataframe)/23
y time steps = length of each sequence - by domain assumption 23 here.
z feature size = # of columns for each row - 178 in this case.
Therefore x*y = "# of rows in the dataset"
Assuming this is correct, what is the batch size while training a model in this case?
Might it be the number of samples considered in an epoch while training?
Therefore, by having x (# of samples) equal to 200, it makes no sense to set a batch_size greater than 200, because that's my upper limit; I don't have more data to train on.
I interpret your description as saying that your total dataset consists of 2900 data samples, where each data sample has 23 time slots, each with a vector of 178 dimensions.
If that is the case, the input_shape for your model should be defined as (23, 178). The batch size is simply the number of samples (out of the 2900) that will be used for a training / test / prediction run.
Try the following:
from keras.models import Sequential
from keras.layers import Dense, LSTM
model = Sequential()
model.add(LSTM(64, input_shape=(23,178)))
model.compile(loss='mse', optimizer='sgd')
model.summary()
print(model.input)
This is just a simplistic model that outputs a single 64-wide vector for each sample. You will see that the expected model.input is:
Tensor("lstm_3_input:0", shape=(?, 23, 178), dtype=float32)
The batch_size is unset in the input shape which means that the model can be used to train / predict batches of different sizes.
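For example, with random stand-in data of the shape discussed here (layer sizes, epochs and batch size are placeholders):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
X = np.random.random((2900, 23, 178))
y = np.random.random((2900, 1))
model = Sequential()
model.add(LSTM(64, input_shape=(23, 178)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='sgd')
# batch_size only controls how many of the 2900 samples go into each update;
# any value up to the number of samples is valid
model.fit(X, y, epochs=1, batch_size=32)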
TL;DR - I have a couple of thousand speed-profiles (time-series where the speed of a car has been sampled) and I am unsure how to configure my models such that I can perform arbitrary forecasting (i.e. predict t+n samples given a sample t).
I have read numerous explanations (1, 2, 3, 4, 5) about how Keras implements statefulness in its recurrent layers, how one should reset/not reset between iterations, etc.
However, I am unable to acquire the model shape that I want (I think).
As for now, I am only working with a subset of my profiles (denoted as routes in the code below).
Number of training routes: 90
Number of testing routes: 10
The routes vary in length; hence, the first thing I do is iterate through all routes and pad them with 0 so they are all the same length. (I have assumed this is required; if I am wrong please let me know.) After the padding I convert the routes into a format better suited for the supervised learning task, as described HERE. In this case I have opted to forecast the succeeding 5 steps of the current sample.
The result is a tensor, as:
Shape of training_data: (90, 3186, 6) == (nb_routes, nb_samples/route, nb_timesteps)
which is split into X and y for training as:
Shape of X: (90, 3186, 1)
Shape of y: (90, 3186, 5)
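(For illustration, the padding and supervised framing described above might look roughly like the sketch below; the route lengths are random placeholders and, unlike the shapes above, the last 5 steps are trimmed here for simplicity.)
import numpy as np
from keras.preprocessing.sequence import pad_sequences
nb_future_obs = 5
routes = [np.random.random(np.random.randint(2000, 3187)) for _ in range(90)]  # toy speed profiles
padded = pad_sequences(routes, maxlen=3186, dtype='float32', padding='post', value=0.0)  # (90, 3186)
# speed at step t as input, the next 5 speeds as targets
X = padded[:, :-nb_future_obs, None]                                    # (90, 3181, 1)
y = np.stack([padded[:, i + 1:padded.shape[1] - nb_future_obs + i + 1]
              for i in range(nb_future_obs)], axis=-1)                  # (90, 3181, 5)
print(X.shape, y.shape)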
My goal is to have the model take one route at a time and train on it. I have created a model like this:
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
# Create model
model = Sequential()
# Add recurrent layer
model.add(SimpleRNN(nb_cells, batch_input_shape=(1, X.shape[1], X.shape[2]), stateful=True))
# Add dense layer at the end to acquire correct kind of forecast
model.add(Dense(y.shape[2]))
# Compile model
model.compile(loss="mean_squared_error", optimizer="adam", metrics = ["accuracy"])
# Fit model
for _ in range(nb_epochs):
    model.fit(X, y,
              validation_split=0.1,
              epochs=1,
              batch_size=1,
              verbose=1,
              shuffle=False)
    model.reset_states()
This would imply that I have a model with a recurrent layer of nb_cells units, that the input of the model is (number_of_samples, number_of_timesteps), i.e. (3186, 1), and that the output of the model is (number_of_timesteps_lagged), i.e. (5).
However, when running the above I get the following error:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (90, 3186, 5)
I have tried different ways to solve the above, but I have been unsuccessful.
I have also tried other ways of structuring my data and my model. For instance, merging my routes such that instead of (90, 3186, 6) I had (286740, 6): I simply took the data for each route and put it one after the other. After fiddling with my model I got this to run, and I get a result that is quite good, but I really want to understand how this works, and I think the solution I am attempting above is better (if I can get it to work).
Update
Note: I am still looking for feedback.
I have reached a "solution" which I think does the trick.
I have abandoned the padding and instead opted for a one-sample-at-a-time approach. The reason is that I am trying to acquire a network that allows me to predict by providing the network with one sample at a time. I want to give the network sample t and have it predict t+1, t+2, ..., t+n, so it is my understanding that I must train the network on one sample at a time. I also assume that using:
stateful will allow me to keep the hidden state of the cells unspoiled between batches (meaning that I can determine the batch size to be len(route))
return_sequences will allow me to get the output vector that I desire
The changed code is given below. Unlike the original question, the shape of the input data is now (90,) (i.e. 90 routes of various lengths), but each training route still has only one feature per sample, and each label route has five samples per feature (the lagged time).
# Create model
model = Sequential()
# Add nn_type cells
model.add(SimpleRNN(nb_cells, return_sequences=True, stateful=True, batch_input_shape=(1, 1, nb_past_obs)))
# Add dense layer at the end to acquire correct kind of forecast
model.add(Dense(nb_future_obs))
# Compile model
model.compile(loss="mean_squared_error", optimizer="adam", metrics = ["accuracy"])
# Fit model
for e in range(nb_epochs):
    for r in range(len(training_data)):
        route = training_data[r]
        for s in range(len(route)):
            X = route[s, :nb_past_obs].reshape(1, 1, nb_past_obs)
            y = route[s, nb_past_obs:].reshape(1, 1, nb_future_obs)
            model.fit(X, y,
                      epochs=1,
                      batch_size=1,
                      verbose=0,
                      shuffle=False)
        model.reset_states()
return model  # (this snippet is excerpted from inside a training function)
My input data consists of 10 samples, each of which has 200 time steps, while each time step is described by a vector of 30 dimensions.
In addition, each time step is labeled with a 3-dimensional vector (one-hot encoding) which describes the action that has been taken at that particular time step. With that being said, I am trying to build a model which gets fed all previous actions and then predicts which action would be best to take next.
I tried to get this working with tflearn and tensorflow but with limited success so far.
Simple sample code:
import numpy as np
import operator
import tflearn
from tflearn import regression
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.embedding_ops import embedding
from tflearn.layers.recurrent import bidirectional_rnn, BasicLSTMCell
from tflearn.data_utils import to_categorical, pad_sequences
SAMPLES = 10
TIME_STEPS = 200
DATA_DIMENSIONS = 30
LABEL_CLASSES = 3
x = []
y = []
# Generate fake data.
for i in range(SAMPLES):
    sequences = []
    outputs = []
    for i in range(TIME_STEPS):
        d = []
        for i in range(DATA_DIMENSIONS):
            d.append(1)
        sequences.append(d)
        outputs.append([0,0,1])
    x.append(sequences)
    y.append(outputs)
print("X1:", len(x), ", X2:", len(x[0]), ", X3:", len(x[0][0]))
print("Y1:", len(y), ", Y2:", len(y[0]), ", Y3:", len(y[0][0]))
# Define model
net = tflearn.input_data([None, TIME_STEPS, DATA_DIMENSIONS], name='input')
net = tflearn.lstm(net, 128, dropout=0.8, return_seq=True)
net = tflearn.fully_connected(net, LABEL_CLASSES, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(net)
# Fit model.
model.fit({'input': x}, {'targets': y},
          n_epoch=1,
          snapshot_step=1000,
          show_metric=True, run_id='test', batch_size=32)
Error
ValueError: Cannot feed value of shape (10, 200, 3) for Tensor
'targets/Y:0', which has shape '(?, 3)'
As far as I understand, the input_data should be correct. However, the output data is apparently wrong; at least, TensorFlow throws an error. That is probably because my model expects one label per sample rather than one label per time step.
Can I even achieve my goal with an LSTM, and if so, how do I have to set up my model?
Thanks,
Robert
As the error suggests, there is a shape mismatch between the expected size of your targets tensor, and the one of the data you actually provide for it. Let us break it down.
From what I understand, you have a labeled action for every timestep of your sequences. This means that the labels you provide should have a shape of (10, 200, 3). This seems to be the case from the error message. Good.
So we now know the error comes from what the network generates.
=================
Input data -> (10, 200, 30)
LSTM -> (10, 128) (because return_seq=False)
FullyConnected -> (10, 3).
=================
So that explains the second part of the error message, your network indeed produces an output with shape (10, 3) which mismatches the one of your data.
I think you missed the return_seq argument of the LSTM. As is usually the case with RNN implementations, you have a parameter telling whether you want the layer to return outputs for the whole sequence or only for the last timestep. Here, by default, it is the second option, which is why you don't get an output with the expected shape. Use return_seq=True.
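If it helps, the same idea written in plain Keras would look roughly like the sketch below (the layer sizes mirror your code and the toy data mirrors your generated example):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense
SAMPLES, TIME_STEPS, DATA_DIMENSIONS, LABEL_CLASSES = 10, 200, 30, 3
x = np.ones((SAMPLES, TIME_STEPS, DATA_DIMENSIONS))
y = np.tile([0, 0, 1], (SAMPLES, TIME_STEPS, 1))  # one one-hot label per time step
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(TIME_STEPS, DATA_DIMENSIONS)))
model.add(TimeDistributed(Dense(LABEL_CLASSES, activation='softmax')))  # per-timestep prediction
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(x, y, epochs=1, batch_size=32)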