Adding exogenous variables to my univariate LSTM model - python

My data frame has an hourly index (Date is the index of my df) and I want to predict y.
> df.head()
Date y
2019-10-03 00:00:00 343
2019-10-03 01:00:00 101
2019-10-03 02:00:00 70
2019-10-03 03:00:00 67
2019-10-03 04:00:00 122
I will now import the libraries and train the model:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
prediction_hours = 24
df_train = df[:len(df)-prediction_hours]
df_test = df[len(df)-prediction_hours:]
print(df_train.head())
print('/////////////////////////////////////////')
print(df_test.head())
training_set = df_train.values
training_set = min_max_scaler.fit_transform(training_set)
# predict each hour's value from the previous hour's value
x_train = training_set[0:len(training_set)-1]
y_train = training_set[1:len(training_set)]
x_train = np.reshape(x_train, (len(x_train), 1, 1))
num_units = 2
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 10
num_epochs = 100
regressor = Sequential()
regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1)))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = optimizer, loss = loss_function)
regressor.fit(x_train, y_train, batch_size = batch_size, epochs = num_epochs)
And after training, I can actually use it on my test data:
test_set = df_test.values
inputs = np.reshape(test_set, (len(test_set), 1))
inputs = min_max_scaler.transform(inputs)
inputs = np.reshape(inputs, (len(inputs), 1, 1))
predicted_y = regressor.predict(inputs)
predicted_y = min_max_scaler.inverse_transform(predicted_y)
This is the prediction I got (plot not included here).
The forecast is actually pretty good: is it too good to be true? Am I doing anything wrong? I followed a GitHub implementation step by step.
I want to add some exogenous variables, namely v1, v2, v3. If my dataset now looks like this with new variables,
df.head()
Date y v1 v2 v3
2019-10-03 00:00:00 343 4 6 10
2019-10-03 01:00:00 101 3 2 24
2019-10-03 02:00:00 70 0 0 50
2019-10-03 03:00:00 67 0 4 54
2019-10-03 04:00:00 122 3 3 23
How can I include the variables v1, v2 and v3 in my LSTM model? The implementation of a multivariate LSTM is very confusing to me.
Edit, in response to Yoan's suggestion:
For a dataframe with the date as index and with the columns y, v1, v2 and v3, I've done the following as suggested:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
prediction_hours = 24
df_train = df[:len(df)-prediction_hours]
df_test = df[len(df)-prediction_hours:]
print(df_train.head())
print('/////////////////////////////////////////')
print(df_test.head())
training_set = df_train.values
training_set = min_max_scaler.fit_transform(training_set)
x_train = np.reshape(x_train, (len(x_train), 1, 4))
y_train = training_set[0:len(training_set), 1]  # I've tried both 0:len(...) and 1:len(...)
num_units = 2
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 10
num_epochs = 100
regressor = Sequential()
regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1, 4)))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = optimizer, loss = loss_function)
regressor.fit(x_train, y_train, batch_size = batch_size, epochs = num_epochs)
But I get the following error:
only integer scalar arrays can be converted to a scalar index

Combining auxiliary features with sequences
There are multiple ways of handling auxiliary features with LSTMs, and the right choice depends on what your data contains and how you want to model these features. I'll discuss 4 different scenarios and strategies below, each with some dummy code.
Scenario 1: If you have simple continuous features, simply pass them into an LSTM!
Scenario 2: If you have multiple label-encoded sequences, embed and encode them separately in LSTMs, then concatenate the results for your downstream predictions
If you have a label-encoded sequence and some auxiliary features, you can -
Scenario 3: Append the auxiliary features after embedding the sequence and then pass everything into the LSTMs
Scenario 4: Append them to the output of the LSTM and optionally pass the result to another set of LSTMs
Scenario 1:
Let's say you have 4 sequential features, all of them continuous (not label encoded, as text or categorical features would be). In this case LSTMs are well equipped to handle these features directly. An LSTM layer expects an input of shape (batch, sequence, features), so this scenario fits without any modifications.
Features --> LSTM --> Process --> Predict
Code
import numpy as np
from tensorflow.keras import layers, Model, utils
#Four continuous features
X = np.random.random((100,10,4))
Y = np.random.random((100,))
###Define model###
inp = layers.Input((10,4))
#LSTMs
x = layers.LSTM(8, return_sequences=True)(inp)
x = layers.LSTM(8)(x)
out = layers.Dense(1)(x)
model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)
Scenario 2:
Next, let's assume another simple case. You have 2 label-encoded sequences (say text). As one would expect, all you have to do is create sequential features separately by building LSTMs for each of them, then concatenate the outputs before your downstream prediction task.
Sequence --> Embed --> LSTM -->|
* --> Append --> Process --> Predict
Sequence --> Embed --> LSTM -->|
Code
import numpy as np
from tensorflow.keras import layers, Model, utils
#Two sequential, label encoded features (integer indices for the embeddings)
X = np.random.randint(0, 1000, (100,10,2))
Y = np.random.random((100,))
###Define model###
inp = layers.Input((10,2))
feature1 = layers.Lambda(lambda x: x[...,0])(inp)
feature2 = layers.Lambda(lambda x: x[...,1])(inp)
#Append embeddings features
x1 = layers.Embedding(1000, 5)(feature1)
x2 = layers.Embedding(1200, 7)(feature2)
#LSTMs
x1 = layers.LSTM(8, return_sequences=True)(x1)
x1 = layers.LSTM(8)(x1)
x2 = layers.LSTM(8, return_sequences=True)(x2)
x2 = layers.LSTM(8)(x2)
#Combine LSTM final states
x = layers.concatenate([x1,x2])
out = layers.Dense(1)(x)
model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)
Scenario 3:
Next scenario: let's assume you are working with one feature that is a label-encoded sequence (say text). Before you pass this feature to the LSTMs you have to encode it into an n-dimensional vector using an embedding layer. This results in a (batch, sequence, embedding_dim) shaped input for the LSTMs, which is no problem at all. Let's say, however, you also have 3 auxiliary features which are continuous (and properly normalized). One simple thing you could do is append these to the output of the Embedding layer to get a (batch, sequence, embedding_dims + auxiliary) input, which the LSTM can handle as well!
Sequence --> Embed ----->|
*--> Append --> LSTM -> Process --> Predict
Auxiliary --> Process -->|
Code
import numpy as np
from tensorflow.keras import layers, Model, utils
#One sequential, label encoded feature & 3 auxiliary features for each timestep
X = np.random.random((100,10,4))
X[...,0] = np.random.randint(0, 1000, (100,10))  # channel 0 holds integer indices for the embedding
Y = np.random.random((100,))
###Define model###
inp = layers.Input((10,4))
feature1 = layers.Lambda(lambda x: x[...,0])(inp)
feature2 = layers.Lambda(lambda x: x[...,1:4])(inp)
#Append embeddings features
x = layers.Embedding(1000, 5)(feature1)
x = layers.concatenate([x, feature2])
#LSTMs
x = layers.LSTM(8, return_sequences=True)(x)
x = layers.LSTM(8)(x)
out = layers.Dense(1)(x)
model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)
In the above example, after the label-encoded input is embedded into a 5-dimensional vector, the 3 auxiliary inputs are appended, and the resulting (10, 8)-dimensional sequence is passed to the LSTMs to do their magic.
Scenario 4:
Let's say you have the same scenario as above, but you want the sequential features to be richer representations before you append the auxiliary inputs. Here you can pass the sequential feature to an LSTM and append the auxiliary inputs to the OUTPUT of that LSTM, then decide to pass the result into another LSTM if needed. This requires return_sequences=True so that you get a sequence of the same length, which can be appended to the auxiliary features for those time steps.
Sequence --> Embed --> LSTM(seq) -->|
*--> Append --> Process --> Predict
Auxiliary --> Process ------------->|
Code
import numpy as np
from tensorflow.keras import layers, Model, utils
#One sequential, label encoded feature & 3 auxiliary continuous features
X = np.random.random((100,10,4))
X[...,0] = np.random.randint(0, 1000, (100,10))  # integer indices for the embedding
Y = np.random.random((100,))
###Define model###
inp = layers.Input((10,4))
feature1 = layers.Lambda(lambda x: x[...,0])(inp)
feature2 = layers.Lambda(lambda x: x[...,1:4])(inp)
#feature2 = layers.Reshape((-1,1))(feature2)
#Append embeddings features
x = layers.Embedding(1000, 5)(feature1)
#LSTMs
x = layers.LSTM(8, return_sequences=True)(x)
x = layers.concatenate([x, feature2])
x = layers.LSTM(8)(x)
#Combine LSTM final states
out = layers.Dense(1)(x)
model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)
There are architectures that add a single feature to the output of an LSTM, encode the result in another LSTM, then add the next feature, and so on, instead of adding all of them together. That is a design choice and has to be tested on your specific data.
Hope this clarifies your question.

Keras' default LSTM implementation expects an input of shape (batch, sequence, features).
So when reshaping x_train, instead of doing:
x_train = np.reshape(x_train, (len(x_train), 1, 1))
you simply have:
x_train = np.reshape(x_train, (len(x_train), 1, num_features))
It's not clear from your post whether you also want to predict these new features (multivariate prediction) or whether you still want to predict y only.
In the first case you'll need to modify your Dense layer to account for the new dimension of the target:
regressor.add(Dense(units = num_features))
In the second case you'll need to reshape y_train to take only y:
y_train = training_set[1:len(training_set), 0]  # y is column 0 since Date is the index (use 1 if Date were a regular column)
Finally, your LSTM input shape must be updated to accept the reshaped x_train (note that input_shape is specified without the batch dimension):
regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(1, num_features)))
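Putting these pieces together, a minimal end-to-end sketch of the multivariate variant (my assumptions: Date is the index, y is the first column, num_features = 4, and the question's hyper-parameters):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler

num_features = 4  # y, v1, v2, v3
min_max_scaler = MinMaxScaler()
training_set = min_max_scaler.fit_transform(df_train.values)  # shape (n, 4)

# inputs: all 4 features at hour t; target: y (column 0) at hour t+1
x_train = training_set[:-1]                                    # (n-1, 4)
y_train = training_set[1:, 0]                                  # (n-1,)
x_train = np.reshape(x_train, (len(x_train), 1, num_features))

regressor = Sequential()
regressor.add(LSTM(units=2, activation='sigmoid', input_shape=(1, num_features)))
regressor.add(Dense(units=1))
regressor.compile(optimizer='adam', loss='mean_squared_error')
regressor.fit(x_train, y_train, batch_size=10, epochs=100)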

Related

Concatenate a layer to itself

I am learning to work with categorical data by following the entity-embedding code examples of Abhishek Thakur. He uses Visual Studio as his Python editor, but I am using a Google Colab notebook. The code is the same, but I am getting an error on a concatenation.
According to the TensorFlow documentation:
layers.Concatenate(axis = along which to concatenate)
The sample code uses:
x=layers.Concatenate()(outputs)
without mentioning the axis along which the concatenation is supposed to happen. Using the code as-is, I get the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-8d7152b44077> in <module>()
----> 1 run(0)
5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/merge.py in build(self, input_shape)
491 # Used purely for shape validation.
492 if not isinstance(input_shape[0], tuple) or len(input_shape) < 2:
--> 493 raise ValueError('A `Concatenate` layer should be called '
494 'on a list of at least 2 inputs')
495 if all(shape is None for shape in input_shape):
ValueError: A `Concatenate` layer should be called on a list of at least 2 inputs
I get the same error even when I add the argument axis = 1.
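For reference, a minimal sketch of what the message means: Concatenate must be called on a list of at least two tensors, regardless of the axis argument. So if catcols ever ends up containing a single column, outputs has length 1 and the call fails.
import tensorflow as tf
from tensorflow.keras import layers

a = tf.ones((2, 3))
b = tf.ones((2, 5))

print(layers.Concatenate(axis=1)([a, b]).shape)  # (2, 8): a list of two tensors works
# layers.Concatenate(axis=1)([a])                # ValueError: needs at least 2 inputs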
Here is my sample code:
import os
import gc
import joblib
import pandas as pd
import numpy as np
from sklearn import metrics, preprocessing
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras import Model
from tensorflow.keras.models import load_model
from tensorflow.keras import callbacks
from tensorflow.keras import backend as K
from tensorflow.keras import utils
def create_model(data, catcols):
    # init list of inputs for the embeddings
    inputs = []
    # init list of outputs for the embeddings
    outputs = []
    # loop over all categorical columns
    for c in catcols:
        # find the number of unique values in the column
        num_unique_values = int(data[c].nunique())
        # simple dimension-of-embedding calculator:
        # min size is half the number of unique values, max size is 50
        # (max size depends on the number of unique categories too)
        embed_dim = int(min(np.ceil((num_unique_values)/2), 50))
        # simple keras input layer with size 1
        inp = layers.Input(shape=(1, ))
        # add embedding layer to raw input
        # embedding size is always 1 more than unique values in input
        out = layers.Embedding(num_unique_values + 1, embed_dim, name=c)(inp)
        # 1-d spatial dropout is the standard for embedding layers
        # you can use it in NLP tasks too
        out = layers.SpatialDropout1D(0.3)(out)
        # reshape the input to the dimension of the embedding
        # this becomes our output layer for the current feature
        out = layers.Reshape(target_shape=(embed_dim, ))(out)
        # add input to input list
        inputs.append(inp)
        # add out to output list
        outputs.append(out)
    # concatenate all output layers
    x = layers.Concatenate(axis=1)(outputs)
    # add a batchnorm layer
    x = layers.BatchNormalization()(x)
    # add dense layers with dropout
    x = layers.Dense(300, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(300, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.BatchNormalization()(x)
    y = layers.Dense(2, activation="softmax")(x)
    # create the final model
    model = Model(inputs=inputs, outputs=y)
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
from sklearn import metrics, preprocessing

def run(fold):
    df = pd.read_csv('/content/drive/My Drive/train_folds.csv')
    features = [f for f in df.columns if f not in ("id", "target", "kfold")]
    for col in features:
        # fill the nan values with "NONE"; fillna is applied one column at a time
        df.loc[:, col] = df[col].astype(str).fillna("NONE")
    for feat in features:
        lbl_enc = preprocessing.LabelEncoder()  # create the label encoder
        df.loc[:, feat] = lbl_enc.fit_transform(df[feat].values)  # label encode all columns
    # split the data: the fold given as argument is reserved for validation,
    # all other folds are used for training
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    # create the model
    model = create_model(df, features)
    # our features are lists of lists
    xtrain = [df_train[features].values[:, k] for k in range(len(features))]
    xvalid = [df_valid[features].values[:, k] for k in range(len(features))]
    # retrieve target values
    ytrain = df_train.target.values
    yvalid = df_valid.target.values
    # convert target columns to categories (binarization)
    ytrain_cat = utils.to_categorical(ytrain)
    yvalid_cat = utils.to_categorical(yvalid)
    # fit the model
    model.fit(xtrain, ytrain_cat, validation_data=(xvalid, yvalid_cat),
              verbose=1, batch_size=1024, epochs=3)
    # generate validation predictions
    valid_preds = model.predict(xvalid)[:, 1]
    # metrics
    print(metrics.roc_auc_score(yvalid, valid_preds))
    # save memory by clearing the session
    K.clear_session()
I ran the code:
run(0)

Averaging over the batch dimension in Keras

I've got a problem where I want to predict one time series from many time series. My input is (batch_size, time_steps, features) and my output should be (1, time_steps, features).
I can't figure out how to average over the batch dimension N.
Here's a dummy example. First, dummy data where the output is a linear function of 2000 time series:
import numpy as np
time = 100
N = 2000
dat = np.zeros((N, time))
for i in range(N):
    dat[i,:] = np.sin(list(range(time)))*np.random.normal(size=1) + np.random.normal(size=1)
y = dat.T @ np.random.normal(size=N)
Now I'll define a time series model (using 1-D conv nets):
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

n_filters = 2
filter_width = 3
dilation_rates = [2**i for i in range(5)]

inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation='relu',
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)
model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Out[43]: (2000, 100, 1)
The output is the wrong shape! Next, I tried using an averaging layer, but I get this weird error:
def av_over_batches(x):
    x = K.mean(x, axis=0)
    return x

x = Lambda(av_over_batches)(x)
model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape
Traceback (most recent call last):
File "<ipython-input-3-d43ccd8afa69>", line 4, in <module>
model.predict(dat.reshape(N, time, 1)).shape
File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
steps=steps)
File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 302, in predict_loop
outs[i][batch_start:batch_end] = batch_out
ValueError: could not broadcast input array from shape (100,1) into shape (32,1)
Where does 32 come from? (Incidentally, I got the same number in my real data, not just in the MWE).
But the main question is: how can I build a network that averages over the input batch dimension?
(As an aside, the 32 in the traceback is simply the default batch_size of model.predict, which is why you saw the same number on your real data: your Lambda averages away the batch dimension, so Keras cannot write the (100, 1) result back into a 32-row slice of the output array.) I would approach the problem in a different way.
Problem: you want to predict a time series from a set of time series. Say you have 3 time series TS1, TS2, TS3, each of 100 time steps, and you want to predict a time series y1, y2, ..., y100.
My approach for this problem will be as below
i.e. group the time series time step by time step and feed them to an LSTM. If some time series are shorter than others, you can pad them. Similarly, if some sets have fewer time series, pad those as well.
Example:
import numpy as np
np.random.seed(33)
time = 100
N = 5000
k = 5
magic = np.random.normal(size=k)

x = list()
y = list()
for i in range(N):
    dat = np.zeros((k, time))
    for j in range(k):
        dat[j,:] = np.sin(list(range(time)))*np.random.normal(size=1) + np.random.normal(size=1)
    x.append(dat)
    y.append(dat.T @ magic)
So we want to predict a time series of 100 steps from a set of k = 5 time series, and we want the model to learn the magic weights.
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda, LSTM
from keras.optimizers import Adam
from keras import backend as K
import matplotlib.pyplot as plt
input = Input(shape=(time, k))
lstm = LSTM(32, return_sequences=True)(input)
output = Dense(1,activation='sigmoid')(lstm)
model = Model(inputs = input, outputs = output)
model.compile(optimizer = Adam(), loss='mean_squared_error')
data_x = np.zeros((N,100,5))
data_y = np.zeros((N,100,1))
for i in range(N):
    data_x[i] = x[i].T.reshape(100,5)
    data_y[i] = y[i].reshape(100,1)
from sklearn.preprocessing import StandardScaler
ss_x = StandardScaler()
ss_y = StandardScaler()
data_x = ss_x.fit_transform(data_x.reshape(N,-1)).reshape(N,100,5)
data_y = ss_y.fit_transform(data_y.reshape(N,-1)).reshape(N,100,1)
# Let's leave out the last sample for testing and split the rest into train and validation
model.fit(data_x[:-1], data_y[:-1], batch_size=64, epochs=100, validation_split=.25)
The validation loss was still going down, but I stopped training. Let's see how good our prediction is:
y_hat = model.predict(data_x[-1].reshape(-1,100,5))
plt.plot(data_y[-1], label='y')
plt.plot(y_hat.reshape(100), label='y_hat')
plt.legend(loc='upper left')
The results are promising. Running it for more epochs, plus some hyper-parameter tuning, should bring us even closer to the magic. One can also try stacked LSTMs and bi-directional LSTMs.
I feel RNNs are better suited to time series data than CNNs.
Data Format:
Let's say time steps = 3
Time series 1 = [1,2,3]
Time series 2 = [4,5,6]
Time series 3 = [7,8,9]
Time series 4 = [10,11,12]
Y = [100,200,300]
For a batch size of 1:
[[1,4,7,10],[2,5,8,11],[3,6,9,12]] -> LSTM -> [100,200,300]
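As a concrete sketch of that layout, here is how the four example series above could be stacked into the (batch, timesteps, features) array an LSTM expects (numpy only; variable names are illustrative):
import numpy as np

# four series, three time steps each
ts = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9],
               [10, 11, 12]])
y = np.array([100, 200, 300])

x_batch = ts.T.reshape(1, 3, 4)   # (batch=1, timesteps=3, features=4)
y_batch = y.reshape(1, 3, 1)      # one target value per time step
print(x_batch[0])                 # [[1 4 7 10], [2 5 8 11], [3 6 9 12]]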

Get Cell, Input Gate, Output Gate and Forget Gate activation values for LSTM network using Keras

I want to get the activation values for a given input of a trained LSTM network, specifically the values for the cell, the input gate, the output gate and the forget gate. According to this Keras issue and this Stackoverflow question I'm able to get some activation values with the following code:
(basically I'm trying to classify 1-dimensional timeseries using one label per timeseries, but that doesn't really matter for this general question)
import random
from pprint import pprint
import keras.backend as K
import numpy as np
from keras.layers import Dense
from keras.layers.recurrent import LSTM
from keras.models import Sequential
from keras.utils import to_categorical
def getOutputLayer(layerNumber, model, X):
    return K.function([model.layers[0].input],
                      [model.layers[layerNumber].output])([X])
model = Sequential()
model.add(LSTM(10, batch_input_shape=(1, 1, 1), stateful=True))
model.add(Dense(2, activation='softmax'))
model.compile(
    loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# generate some test data
for i in range(10):
    # generate a random timeseries of 10 numbers
    X = np.random.rand(10)
    X = X.reshape(10, 1, 1)
    # generate a random label for the whole timeseries, between 0 and 1
    y = to_categorical([random.randint(0, 1)] * 10, num_classes=2)
    # train the lstm on this one timeseries
    model.fit(X, y, epochs=1, batch_size=1, verbose=0)
    model.reset_states()
# to keep the output simple use only 5 steps for the input of the timeseries
X_test = np.random.rand(5)
X_test = X_test.reshape(5, 1, 1)
# get the activations for the output lstm layer
pprint(getOutputLayer(0, model, X_test))
Using that I get the following activation values for the LSTM layer:
[array([[-0.04106992, -0.00327154, -0.01524276, 0.0055838 , 0.00969929,
-0.01438944, 0.00211149, -0.04286387, -0.01102304, 0.0113989 ],
[-0.05771339, -0.00425535, -0.02032563, 0.00751972, 0.01377549,
-0.02027745, 0.00268653, -0.06011265, -0.01602218, 0.01571197],
[-0.03069103, -0.00267129, -0.01183739, 0.00434298, 0.00710012,
-0.01082268, 0.00175544, -0.0318702 , -0.00820942, 0.00871707],
[-0.02062054, -0.00209525, -0.00834482, 0.00310852, 0.0045242 ,
-0.00741894, 0.00141046, -0.02104726, -0.0056723 , 0.00611038],
[-0.05246543, -0.0039417 , -0.01877101, 0.00691551, 0.01250046,
-0.01839472, 0.00250443, -0.05472757, -0.01437504, 0.01434854]],
dtype=float32)]
So for each input value I get 10 values, because I specified 10 units in the LSTM layer of my Keras model. But which one is the cell, which is the input gate, which the output gate, and which the forget gate?
Well, these are the output (hidden state) values. To get and inspect the value of each gate, look into this issue; I paste the essential part here:
for i in range(epochs):
    print('Epoch', i, '/', epochs)
    model.fit(cos,
              expected_output,
              batch_size=batch_size,
              verbose=1,
              epochs=1,
              shuffle=False)
    for layer in model.layers:
        if 'LSTM' in str(layer):
            print('states[0] = {}'.format(K.get_value(layer.states[0])))
            print('states[1] = {}'.format(K.get_value(layer.states[1])))
            print('Input')
            print('b_i = {}'.format(K.get_value(layer.b_i)))
            print('W_i = {}'.format(K.get_value(layer.W_i)))
            print('U_i = {}'.format(K.get_value(layer.U_i)))
            print('Forget')
            print('b_f = {}'.format(K.get_value(layer.b_f)))
            print('W_f = {}'.format(K.get_value(layer.W_f)))
            print('U_f = {}'.format(K.get_value(layer.U_f)))
            print('Cell')
            print('b_c = {}'.format(K.get_value(layer.b_c)))
            print('W_c = {}'.format(K.get_value(layer.W_c)))
            print('U_c = {}'.format(K.get_value(layer.U_c)))
            print('Output')
            print('b_o = {}'.format(K.get_value(layer.b_o)))
            print('W_o = {}'.format(K.get_value(layer.W_o)))
            print('U_o = {}'.format(K.get_value(layer.U_o)))
    # output for the first element of the batch after the first fit()
    # (get_LSTM_output is defined earlier in the linked issue)
    first_batch_element = np.expand_dims(cos[0], axis=1)  # (1, 1) to (1, 1, 1)
    print('output = {}'.format(get_LSTM_output([first_batch_element])[0].flatten()))
    model.reset_states()

print('Predicting')
predicted_output = model.predict(cos, batch_size=batch_size)

print('Plotting Results')
plt.subplot(2, 1, 1)
plt.plot(expected_output)
plt.title('Expected')
plt.subplot(2, 1, 2)
plt.plot(predicted_output)
plt.title('Predicted')
plt.show()
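Note that the per-gate attributes used above (W_i, U_i, b_i, ...) only exist in Keras 1.x. In Keras 2 the four gates are packed into kernel, recurrent_kernel and bias in the order input, forget, cell, output, so a sketch to recover the per-gate weights from the question's model would be:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

units = 10
model = Sequential()
model.add(LSTM(units, batch_input_shape=(1, 1, 1), stateful=True))
model.add(Dense(2, activation='softmax'))

kernel, recurrent_kernel, bias = model.layers[0].get_weights()

# columns are packed per gate in the order: input, forget, cell, output
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)            # input weights
U_i, U_f, U_c, U_o = np.split(recurrent_kernel, 4, axis=1)  # recurrent weights
b_i, b_f, b_c, b_o = np.split(bias, 4)                      # biases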

Sequence-to-sequence classification with variable sequence lengths in Keras

I would like to train an LSTM or GRU network in TensorFlow/Keras to continuously recognize whether a user is walking or not based on input from motion sensors (accelerometer and gyroscope). I have 50 input sequences with lengths varying from 581 to 5629 time steps and 6 features and 50 corresponding output sequences of boolean values. My problem is that I don't know how to feed the training data to the fit() method.
I know approximately what I need to do: I'd like to train with 5 batches of 10 sequences each, and for each batch I have to pad all but the longest sequence so all 10 sequences have the same lengths and apply masking. I just don't know how to build the data structures. I know that I can make one big 3D tensor of size (50,5629,6) and that works, but it's painfully slow, so I'd really like to make the sequence length of each batch as small as possible.
Here's the problem in code:
import tensorflow as tf
import numpy as np
# Load data from file
x_list, y_list = loadSequences("train.csv")
# x_list is now a list of arrays (n,6) of float64, where n is the timesteps
# and 6 is the number of features, sorted by increasing sequence lengths.
# y_list is a list of arrays (n,1) of Boolean.
x_train = # WHAT DO I WRITE HERE?
y_train = # AND HERE?
model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=10, epochs=100)
You can do something like this: use a generator function. Take a look at the fit_generator documentation (look for the fit_generator method).
import json
import numpy as np

def data_generater(batch_size):
    print("reading data")
    training_file = open('data_location', 'r')
    # assuming data is in json format, so feel free to change accordingly
    training_set = json.loads(training_file.read())
    training_file.close()
    batch_i = 0   # counter inside the current batch vector
    batch_x = []  # the current batch's x data
    batch_y = []  # the current batch's y data
    while True:
        for obj in training_set:
            batch_x.append(obj['seq'])  # 'seq' is a placeholder key: append your input sequences one by one
            if obj['val'] == True:
                batch_y.append([1])
            elif obj['val'] == False:
                batch_y.append([0])
            batch_i += 1
            if batch_i == batch_size:
                # ready to yield the batch:
                # pad the inputs to the max length in the batch
                batch_x = pad_txt_data(batch_x)
                yield batch_x, np.array(batch_y)
                batch_x = []
                batch_y = []
                batch_i = 0

def pad_txt_data(arr):
    # expecting arr to be a list of (m, 6) sequences; pad each with zeros
    # up to the longest sequence in the batch
    preferred_len = len(max(arr, key=len))
    padded_arr = [np.pad(a, ((0, preferred_len - len(a)), (0, 0))) for a in arr]
    return np.array(padded_arr)
and in the model
model = keras.Sequential()
model.add(keras.layers.Masking(mask_value=0., input_shape=(None, 6)))
model.add(keras.layers.LSTM(32))
model.add(keras.layers.Dense(1, activation="sigmoid"))  # sigmoid, not softmax: a softmax over a single unit always outputs 1
model.compile(optimizer="Adam", loss='binary_crossentropy', metrics=['accuracy'])
model.fit_generator(data_generater(10), steps_per_epoch=5, epochs=10)
batch_size, steps_per_epoch and epochs can all differ. Generally,
steps_per_epoch = (number of sequences / batch_size)
Note: from reading your description, your task appears to be a binary classification problem, not a sequence-to-sequence problem. A good example of sequence-to-sequence is language translation. Just google around and you will find what I mean.
And if you really want to see the difference in training times, I suggest using a GPU, if available, and CuDNNLSTM.
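For reference, a sketch of that swap under the old standalone Keras (my assumption: a CUDA GPU is available; note that CuDNNLSTM does not support masking, so padded zeros are processed like any other time step):
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dense

# same architecture as above, with the cuDNN-accelerated LSTM kernel
model = Sequential()
model.add(CuDNNLSTM(32, input_shape=(None, 6)))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])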
In case it helps someone, here's how I ended up implementing a solution:
import tensorflow as tf
import numpy as np
# Load data from file
x_list, y_list = loadSequences("train.csv")
# x_list is now a list of arrays (m,n) of float64, where m is the timesteps
# and n is the number of features.
# y_list is a list of arrays (m,1) of Boolean.
assert len(x_list) == len(y_list)
num_sequences = len(x_list)
num_features = len(x_list[0][0])
batch_size = 10
batches_per_epoch = 5
assert batch_size * batches_per_epoch == num_sequences
def train_generator():
    # Sort by length so the number of timesteps in each batch is minimized
    x_list.sort(key=len)
    y_list.sort(key=len)
    # Generate batches
    while True:
        for b in range(batches_per_epoch):
            longest_index = (b + 1) * batch_size - 1
            timesteps = len(x_list[longest_index])
            x_train = np.zeros((batch_size, timesteps, num_features))
            y_train = np.zeros((batch_size, timesteps, 1))
            for i in range(batch_size):
                li = b * batch_size + i
                x_train[i, 0:len(x_list[li]), :] = x_list[li]
                y_train[i, 0:len(y_list[li]), 0] = y_list[li]
            yield x_train, y_train

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=0., input_shape=(None, num_features)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=batches_per_epoch, epochs=100)

seq2seq prediction for time series

I want to make a Seq2Seq model for reconstruction purposes. I want a model trained to reconstruct the normal time series, and it is assumed that such a model will do badly at reconstructing anomalous time series, having not seen them during training.
I have some gaps in my code and also in my understanding. I took this as an orientation and have done the following so far:
train data: input_data.shape = (1000,60,1) and target_data.shape = (1000,60,1), the target data being the same training data, only in reversed order, as suggested in the paper here.
for inference: I want to predict another time series with the trained model, having the shape (3000,60,1). Two points are still open: how do I specify the input data for my training model, and how do I build the inference part with the stop condition?
Please correct any mistakes.
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed

num_encoder_tokens = 1  # number of features
num_decoder_tokens = 1  # number of features
encoder_seq_length = None
decoder_seq_length = None
batch_size = 50
epochs = 40

# same data for training
input_seqs = ()   # shape (1000,60,1) with sliding windows
target_seqs = ()  # shape (1000,60,1) with sliding windows but reversed
x = # what does x have to be?

# data for inference
# how do I specify the input data for my other time series?

# Define training model
encoder_inputs = Input(shape=(encoder_seq_length, num_encoder_tokens))
encoder = LSTM(128, return_state=True, return_sequences=True)
encoder_outputs = encoder(encoder_inputs)
_, encoder_states = encoder_outputs[0], encoder_outputs[1:]

decoder_inputs = Input(shape=(decoder_seq_length, num_decoder_tokens))
decoder = LSTM(128, return_sequences=True)
decoder_outputs = decoder(decoder_inputs, initial_state=encoder_states)
decoder_outputs = TimeDistributed(
    Dense(num_decoder_tokens, activation='tanh'))(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Training
model.compile(optimizer='adam', loss='mse')
model.fit([input_seqs, x], target_seqs,
          batch_size=batch_size, epochs=epochs)
# Define sampling models for inference
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(128,))  # must match the 128 units of the decoder LSTM
decoder_state_input_c = Input(shape=(128,))
decoder_states = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs = decoder(decoder_inputs, initial_state=decoder_states)
decoder_model = Model([decoder_inputs] + decoder_states, decoder_outputs)

# Sampling loop for a batch of sequences
states_values = encoder_model.predict(input_seqs)
stop_condition = False
while not stop_condition:
    output_tokens = decoder_model.predict([target_seqs] + states_values)
    # what else do I need to include here?
    break
from numpy import array

def predict_sequence(infenc, infdec, source, n_steps, cardinality):
    # encode the source sequence
    state = infenc.predict(source)
    # start-of-sequence input
    target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
    # collect predictions
    output = list()
    for t in range(n_steps):
        # predict the next step
        yhat, h, c = infdec.predict([target_seq] + state)
        # store prediction
        output.append(yhat[0, 0, :])
        # update state
        state = [h, c]
        # update target sequence
        target_seq = yhat
    return array(output)
You can see that the output from every time step is fed back to the LSTM cell externally. You can refer to this blog to see how it is done during inference:
https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/
During training, we feed the data in a one-shot manner; I think you understand that part. But during inference we can't do that: we have to feed the data one time step at a time, carry the returned cell and hidden states forward, and continue the loop until the last element is generated.
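For a fixed-length reconstruction like yours there is no end-of-sequence token, so the stop condition can simply be a fixed number of decoder steps (a sketch, assuming the inference decoder is redefined to also return its states, as in the linked blog):
# hypothetical usage: reconstruct one 60-step window (cardinality = 1 feature)
test_seq = input_seqs[:1]  # shape (1, 60, 1)
reconstruction = predict_sequence(encoder_model, decoder_model, test_seq, n_steps=60, cardinality=1)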
