How exactly does Keras take dimension argumentsfor LSTM / time series problems?

How exactly does Keras take dimension argumentsfor LSTM / time series problems? - python

I can't seem to find a concrete answer to the question of how to feed data into Keras. Most examples seem to work off image / text data and have clearly defined data points.
I'm trying to feed music into an LSTM neural network. I want the network to take ~3 seconds of music and nominate the next 2 seconds. I have my music prepared into .wav files and partitioned into 5 second intervals that I've decomposed into my X (first 3 seconds) and Y (last two seconds). I've sampled my music at 44,100 hz so my X is 132,300 observations 'long' and my Y is '88,200' observations long.
But I can't figure out exactly how to bridge Keras to my data structure. I'm using a Tensorflow backend.
In the interest of generalizing the problem and answer, I'll use A,B,C to denote dimensions. The only difference between this example data and my real data is that these are random values distributed from 0 to 1, and my data is an array of integers.
import numpy as np
#using variables to make it easy to generalize the answer
#a = the number of observations I have
a = 411
#b = the duration of the sample, 44.1k observations per second of music
b_train = 132300
b_test = 88200
#c = the number of channels in the music, this is 2 channel stereo
c = 2
#now create sample data with the dimensionality given above:
X = np.random.rand(a,b_train,c)
y = np.random.rand(a,b_test ,c)
#split the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.20, random_state=42)
However, I don't really know how to configure a model to understand that the 'first' (A) dimension contains observations and that I want to more or less break out the music (B) by channel (C).
I know that it'd probably be easier to convert this to mono (and a 2d problem) but I'm very curious to see whether or not this has a 'simple' solution - whether that mostly takes the shape of what I have below or whether I should think of the model in another way.
The primary question is this: how would I construct a model that would allow me to transform my X data into my Y data?
Ideally, an answer would show how to modify the model below to fit the data structure above.
import keras
import math, time
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
from keras.models import load_model
def build_model(layers):
d = 0.3
model = Sequential()
model.add(LSTM(256, input_shape=(layers), return_sequences=True))
model.add(Dropout(d))
model.add(LSTM(256, input_shape=(layers), return_sequences=False))
model.add(Dropout(d))
model.add(Dense(32,kernel_initializer="uniform",activation='relu'))
model.add(Dense(1,kernel_initializer="uniform",activation='linear'))
start = time.time()
model.compile(loss='mse',optimizer='adam', metrics=['accuracy'])
print("Compilation Time : ", time.time() - start)
return model
#build model...
model = build_model([328,132300,2])
model.fit(X_train,y_train,batch_size=512,epochs=30,validation_split=0.1,verbose=1)
However, this yields an error (at the model = ... step):
ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=4
I can't figure out where Keras gets the expectation to see ndim=4 data. Also, I don't know to how to ensure that I feed data into the model such that the model 'understands' observations are distributed across the A-axis and the data itself is distributed on the B- and C-axis.
If anything is unclear, please leave a comment. I'll watch this diligently until Sept '17 or so and I'll be sure to update this question to reflect advice / comments left.
Thanks!

Keras convention is that the batch dimension is typically omitted in the input_shape arguments. From the guide:
Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.
So changing model = build_model([132300,2]) should solve the problem.

Related

Binary classification of multivariate time series in the form of panel data using LSTM

Problem definition
Dear community, I need your help in implementing an LSTM neural network for a classification problem of panel data using Keras. The panel data I am manipulating consists of ids (let's call it id), a timestep for each id (t), n time varying covariates and a binary outcome y. Each id contains a number of timesteps and for each timestep I have my covariates and a unique outcome (0 or 1). I have reason to believe that each covariate for each id can have a certain degree of autocorrelation and henceforth can be considered a small timeseries of t steps. For simplicity, I consider that each id has a fixed number of t observations) with t not a big number (about 10 or so).
Data
Below is a toy example of what the data might look like in my case. In this example, the parameters are 2 individuals, 4 timesteps each, 4 covariates and each observation has a unique binary outcome. Covariates may be considered as (short) timeseries since they might be autocorrelated.
print(df)
[out]:
A B C D y
id t
id1 1 1.054127 0.346052 1.091299 -0.058137 0.0
2 0.621390 -0.204682 -1.056786 0.864572 0.0
3 1.275124 2.473959 0.264029 -1.047810 0.0
4 -0.328441 -0.135891 0.148498 0.470876 1.0
id2 1 0.362969 0.777082 0.197423 -2.225296 0.0
2 0.227134 0.086731 0.550267 -0.361482 0.0
3 0.223526 0.556242 -0.160042 0.675871 1.0
4 0.070125 0.156659 -2.922709 -1.143887 1.0
I have reason to assume that, for id1, the target at timestep 4 is conditional on the three previous timesteps for that same individual (id1). In addition, The target variable y may contain more than one value of 1 for each individual (as outlined in the case of id2 above). I do not have reason to believe that the data from an individual would affect the result of another (as with many behavior analysis scenarios since every individual is unique).
Prediction problem
What I would like to do is to predict a single outcome for a new individual for whom I have those 4 rows of observation. In other words, based on the historical data of an individual, I would like to know if said individual is likely to have an outcome 1 or 0. If I understand correctly, this can be achieved using an LSTM (alternatively, an RNN) with some data manipulation.
Things I have tried so far
To start simple, I have tried aggregating every set of id rows into a single row with a single outcome and applied a typical statistical learning approach such as boosted trees and got a model as good as random.
I looked into shaping it as a survival analysis problem, in vain. I would not be interested in any estimation of a survival function unlike tutorials on how to handle panel data in the medical field (nor would I have access to such data).
I have tried reshaping my data such that the input is a 3D array in the form of [observations, timesteps, features] where observations are unique ids for an LSTM like so in python :
# separate into features and target
df_feat = df.drop("y", axis = 1)
df_target = df[["y"]]
# get reshaped values for 3D tensor
n_samples = len(df_feat.index.get_level_values('id').unique().tolist())
n_timesteps = 4
n_features = df_feat.shape[1]
# reshape input array to be 3D
X_3D = df_feat.to_numpy().reshape(n_samples, n_timesteps, n_features)
print(X_3D.shape)
[out]:
(2, 4, 4)
However, at this point I get confused as to what my learning instances for the LSTM are and what the outcome y should be shaped like. I have tried having a shape like one outcome per training instance by taking only the last observation for each id (so y=[1,1] and y.shape = (2,) in the toy example above) which technically makes an LSTM script run... but does not capture prior information. Below is the code for such LSTM:
def train_lstm(X_train, y_train, X_valid, y_valid, save_name='best_lstm.h5'):
# starts a sequential model
model = Sequential()
# add first lstm hidden layer with 64 units and default keras params
model.add(LSTM(64, input_shape = (X_train.shape[1], X_train.shape[2]), return_sequences=True))
# add a second hidden lstm layer with 128 units and default keras params
model.add(LSTM(128, return_sequences = True))
# add one last hidden layer
model.add(LSTM(64))
# add one dense layer with 2 units and a sigmoid activation function
model.add(Dense(2, activation = 'sigmoid'))
# define adam optimiser with learning rate
opt = tf.keras.optimizers.Adam(learning_rate = 0.01)
# compile model with binary cross entropy as loss function and accuracy as metrics
model.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy'])
# define early stopping and best model checkpoint parameters
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 0, patience = 20)
mc = ModelCheckpoint(save_name, monitor = 'val_accuracy', mode = 'max', verbose = 0, save_best_only = True)
# train the model using fit method (target vector is one-hot encoded as required by keras)
history = model.fit(X_train, tf.one_hot(y_train, depth = 2),
validation_data = (X_valid, tf.one_hot(y_valid, depth = 2)),
epochs = 100, callbacks = [es, mc])
return history
It runs and it makes predictions the way I want them to (for one id of previous history, we can predict one outcome) but results in poor performance since it fails to capture outcomes prior to the last.
I have carefully read and followed this nicely written medium article by Alexander Laskorunsky which remotely resembles what I am trying to do, and slides the window of K-length frames to capture the prior outcomes (and not just the last as I have done which makes more sense). However, in Alexander's case, he does not consider panel data but rather a multivariate timeseries classification that uses n_timesteps to predict the target using all predictors and all rows even if it overlaps (so not using panel data).
Questions
Am I right to believe that I need a many to one LSTM architecture?
How may I divide and reshape training and testing samples such that a new, previously unseen individual which would not be related in any way to other ids can be classified?
Should each id be considered as one sample / training instance? Should each id be split into training and testing sets and concatenate all training and testing sets to feed to an LSTM architecture?
Would you be so kind as to provide code snippets on how to correctly split and reshape my data as well as a simple LSTM architecture using keras (or maybe modify my own function above in case I coded it wrong)? No need for basic preprocessing and encoding variables.
Any help or advice / tutorials / articles regarding what architecture is most suitable for that kind of problem is greatly appreciated and thank you in advance for your help!

How do I specify what column/feature I want to predict in a RNN?

I'm trying to use a time-series data set with 30 different features and I want to predict the future values for 3 of those features. Is there any way I can specify what features I want to be used for output and how many outputs using TensorFlow and Sckit-learn? Or is that just done when I am creating the x_train, y_train, etc. sets? I want to predict the heat index, temperature, and humidity based on various meteorological factors (air pressure, HDD, CDD, pollution, etc.) The 3 factors I wish to predict are part of the 30 total features.
I am using TensorFlows RNN tutorial: https://www.tensorflow.org/tutorials/structured_data/time_series
univariate_past_history = 30
univariate_future_target = 0
x_train_uni, y_train_uni = univariate_data(uni_data, 0, 1930,
univariate_past_history,
univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, 1930, None,
univariate_past_history,
univariate_future_target)
My data is given daily so I wanted to predict the next day using the last 30 days for example here.
and this is my implementation of the training of the model:
BATCH_SIZE = 256
BUFFER_SIZE = 10000
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate =
train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()
simple_lstm_model = tf.keras.models.Sequential([
tf.keras.layers.LSTM(8, input_shape=x_train_uni.shape[-2:]),
tf.keras.layers.Dense(1)
])
simple_lstm_model.compile(optimizer='adam', loss='mae')
for x, y in val_univariate.take(1):
print(simple_lstm_model.predict(x).shape)
EVALUATION_INTERVAL = 200
EPOCHS = 30
simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
steps_per_epoch=EVALUATION_INTERVAL,
validation_data=val_univariate, validation_steps=50)
EDIT: I understand that to increase the number of outputs I have to increase the Dense(1) value, want to understand how to specify which features to output/predict

You need to give the model.fit call the variables you want to learn from in a shape compatible with an LSTM layer
So for example, without any code a model like yours might take as input:
[batchsize, n_timestamps, n_features]
and output:
[batchsize, n_timestamps, m_features]
where n is input and m output.
So then you need to give the model the truth data of the same shape as the model output in order for the model to calculate a loss.
So the model.fit call should be:
model.fit(x_train, y_train, ....) where y_train are the truth vectors of the same shape as the model output.
You have to design a model architecture that fits your needs and matches the outputs you expect. I made a toy example, but I have never really worked with this type of NN so no idea if it makes sense for the problem.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, InputLayer, Reshape
ni_feats = 10
no_feats = 3
ndays = 30
model = tf.keras.Sequential([
InputLayer((ndays, ni_feats)),
LSTM(10),
Dense(int(no_feats * ndays)),
Reshape((ndays, no_feats))
])

How to reshape input for ConvLSTM2D to not overfit?

I have a time series problem with 15 minutes as a timestep.The complete data will be from 2016-09-01 00:00:15 to 2016-12-31 23:45:00.
I have 5 variables(v1,v2,v3,v4,v5,v6) in the data frame and I want to predict the sixth variable (v6) for the next timestep.
I prepare the data set and prepare the information as 5-time lags. like if the time is t in the row I create the values for (t-1) to (t-5) as lags for v1 to v6.
So in total, I have 30 features (5 lags for 6 variables).
I also normalize the values by PowerTransformer.
scaler_x = PowerTransformer()
scaler_y = PowerTransformer()
train_X = scaler_x.fit_transform(train_X)
train_y = scaler_y.fit_transform(train_y.reshape(-1,1))
My data input shape of traix_X and train_y is like below at initial:
(11253, 30) , (11253, 1)
11253 rows having 30 variables as input and a single variable as target variable .Then i reshape this to fit my ConvLSTM2D like below:
# define the number of subsequences and the length of subsequences
n_steps, n_length = 5, 6 #I take into account of past 5 steps for the 6 variables
n_features=1
#reshape for ConvLSTM
# reshape into subsequences [samples, time steps, rows, cols, channels]
train_X = train_X.reshape(train_X.shape[0], n_steps, 1, n_length, n_features)
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
The ConvLSTM2D architecture looks like below :
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, verbose=0)
But this model gives a very bad result (It is overfitting a lot). I suspect that my inputs are not given correctly to the ConvLSTM2D.
Is my reshaping correct? Any help is appreciated.
EDIT:
I have realized my input is being given correctly to the Network but the issue is it is overfitting a lot.
My hyperparameters are below :
#hyper-parameter
epochs=100
batch_size=64
adam_opt = keras.optimizers.Adam(lr=0.001)
I even tried 50 and 10 epochs its same issue.

In my personal experience there are a few things I've picked up about using ConvLSTM2D.
I would first check to see if the model is training at all. Based on your answer I am unsure how loss is changing as your model trains - if at all. If there is some variation, you need to perform a grid search (playing around with amount of layers and filters)
I also found my models needed to train for a long time to perform well, see the Keras example on ConvLSTM2d where 300 epochs are needed to train a model to perform an arguably simple task : https://keras.io/examples/conv_lstm/. A case I worked on needed a similar amount of epochs to train.
Check different loss functions and optimizers (even though I think mse and adam are good for this type of problem)
Normalize your data differently, you may want to normalize your data statistically as
shown in this keras example : https://www.tensorflow.org/tutorials/keras/regression
From personal experience, you might want more layers for this specific problem. See keras ConvLSTM2d example above for this
* I see how you want to format your data, and though it may work, a more straightforward solution may work better. You might want to try giving (v1,v2,v3,v4,v5) and predicting for v6. You may have the use large batch sizes for this. *

Proper and definitive explanation about how to build a CNN 1D in Keras

Ciao,
this is the second part of a problem I'm facing with CNN 1d. The first part is this
How does it works the input_shape variable in Conv1d in Keras?
I'using this code:
from keras.models import Sequential
from keras.layers import Dense, Conv1D
import numpy as np
N_FEATURES=5
N_TIMESTEPS=10
X = np.random.rand(100, N_FEATURES)
Y = np.random.randint(0,2, size=100)
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=N_TIMESTEPS, activation='relu', input_shape=(N_TIMESTEPS, N_FEATURES)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Now, what I want to do?
I want to train a CNN 1d over a timeseries with 5 features. Actually I want to work with time windows og length N_TIMESTEPS rather than timeserie it self. This means that I want to use a sort of "magnifier" of dimension N_TIMESTEPS x N_FEATURES on the time series to work locally. That's why I've decided to use CNN
Here come the first question. It is not clear at all if I have to transform the time series into a tensor or it is something that Keras will do for me since I've specified the kernel_size variable.
In case I must provide a tensor I would do something like this:
X_tensor = []
for i in range(len(X)-N_TIMESTEPS):
X_tensor+=[X_tensor[range(i, N_TIMESTEPS+i), :]]
X_tensor = np.asarray(X_tensor)
In this case of course I should also provide a Y_tensor vector computed from Y according to some criteria. Suppose I have already this Y_tensor boolean vector of the same length of X_tensor, which is len(X)-N_TIMESTEPS-1.
Y_tensor = np.random.randint(0,2,len(X)-N_TIMESTEPS-1)
Now if I try to feed the model I get of the most common error for CNN 1d which is:
ValueError: Error when checking input: expected conv1d_4_input to have 3 dimensions, but got array with shape (100, 5)
By looking to a dozen of posts about it I cannot understand what I did wrong. This is what I've tried:
model.fit(X,Y)
model.fit(np.expand_dims(X, axis=0),Y)
model.fit(np.expand_dims(X, axis=2),Y)
model.fit(X_tensor,Y_tensor)
For all of these cases I get always the same error (with different dimensional values in the final tuple).
Questions:
What Keras expects from my data? Can I feed the model with the whole time series or I have to slice it into a tensor?
How I have to feed the model in term of data structure?I.e. I have to specify in some strange way the dimension of the data?
Can you help me? I find out that this is one the most confusing point of CNN implementation in Keras that there are different posts with different solutions that do not fit with structure of my data (even if they have a very common structure according to me).
Note: There are some post suggesting to pass in the input_shape variable the length of the data. This is meaningless to me since I should not provide the dimension of the data (which is a variable) to the model. The only thing I should give to it, according to the theory, is the filter dimension and number of features (namely the dimension of the matrix that will roll over the time series).
Thanks,
am

Simply, Conv1D requires 3 dimensions:
Number of series (1)
Number of steps (100 - your entire data)
Number of features (5)
So, model.fit(np.expand_dims(X, axis=0),Y) is correct for X.
Now, if X is (1, 100, 5), naturally your input_shape=(100,5).
If your Y has 100 steps, then you need to make sure your Conv1D will output 100 steps. You need padding='same', otherwise it will become 91. (I suggest you actually work with 91, since you want a result for each 10 steps and probably don't want border effects spoiling your results)
Y must also follow the same rules for shape:
Number of series (1)
Number of steps (100 if padding='same'; 91 if padding='valid')
Number of features (1 = Dense output)
So, Y = Y.reshape((1,-1,1)).
Since you have only one class (true/false), it's pointless to use 'categorical_crossentropy'. You should go with 'binary_crossentropy'.
In general, your overall idea of using this convolution with kernel_size=10 to simulate sliding windows of 10 steps will work as expected (whether it will be efficient or not is another question, answered only by trying).
If you want better networks for sequences, you should probably try LSTM layers. The dimensions work exactly the same way. You will need return_sequences=False.
The main difference is that you will need to separate the data as you did in that loop. Then:
X.shape == (91, 10, 5)
Y.shape == (91, 1)

I think you don't have a clear idea on how 1d convolutional neural networks work:
if you want to predict the y values from the timeseries x and you just have 1 timeseries, your approach won't work. the network needs lots of samples to train, and having just 1 will allow it to easily memorize the input and not learn. For example, if the timeseries is the humidity of a given day, and y is the chance of rain at a specific timestep, what you have now is the data for just one day (timesteps being for example hours of the day). In order for the network to learn you need to gather data for many days, ending up with datasets of shape x=(n_days, timesteps, features) y=(n_days, timesteps, 1).
If you describe your actual problem there's better chance to get more helpful answers
[Edit] by sticking to your code, and using just one timeseries, you are better off with other methods that don't involve deep learning. You could split your timeseries at regular interval, obtaining n samples that would allow your network to train, but unless you have a very long timeseries that may not be a valid alternative.

Tensorflow LSTM: Predict next action based on a series of previous ones

My input data consists of 10 samples, each of which has 200 time steps, while each time step is described by a vector of 30 dimensions.
In addition, each time step consists of a 3 dimensional vector (one hot encoding) which describes the action which has been taken at that particular time step. With that being said, I am trying to build a model which get fed in all previous actions and then predicts which action would be the best to take next.
I tried to get this working with tflearn and tensorflow but with limited success so far.
Simple sample code:
import numpy as np
import operator
import tflearn
from tflearn import regression
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.embedding_ops import embedding
from tflearn.layers.recurrent import bidirectional_rnn, BasicLSTMCell
from tflearn.data_utils import to_categorical, pad_sequences
SAMPLES = 10
TIME_STEPS = 200
DATA_DIMENSIONS = 30
LABEL_CLASSES = 3
x = []
y = []
# Generate fake data.
for i in range(SAMPLES):
sequences = []
outputs = []
for i in range(TIME_STEPS):
d = []
for i in range(DATA_DIMENSIONS):
d.append(1)
sequences.append(d)
outputs.append([0,0,1])
x.append(sequences)
y.append(outputs)
print("X1:", len(x), ", X2:", len(x[0]), ", X3:", len(x[0][0]))
print("Y1:", len(y), ", Y2:", len(y[0]), ", Y3:", len(y[0][0]))
# Define model
net = tflearn.input_data([None, TIME_STEPS, DATA_DIMENSIONS], name='input')
net = tflearn.lstm(net, 128, dropout=0.8, return_seq=True)
net = tflearn.fully_connected(net, LABEL_CLASSES, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy', name='targets')
model = tflearn.DNN(net)
# Fit model.
model.fit({'input': x}, {'targets': y},
n_epoch=1,
snapshot_step=1000,
show_metric=True, run_id='test', batch_size=32)
Error
ValueError: Cannot feed value of shape (10, 200, 3) for Tensor
'targets/Y:0', which has shape '(?, 3)'
As far as I understand, the input_data should be correct. However, the output data is apparently wrong, at least, Tensorflow throws an error. That is probably because my model expects one label per sample rather than one label per time step.
Can I even achieve my goal with an LSTM, and if so, how do I have to set up my model?
Thanks,
Robert

As the error suggests, there is a shape mismatch between the expected size of your targets tensor, and the one of the data you actually provide for it. Let us break it down.
From what I understand, you have labeled action for every timestep of your sequences. This means that the labels that you provide should have a shape (10, 200, 3). This seems to be the case from the error message. Good.
So we now know the error comes from what the network generates.
=================
Input data -> (10, 200, 30)
LSTM -> (10, 128) (because return_seq=False)
FullyConnected -> (10, 3).
=================
So that explains the second part of the error message, your network indeed produces an output with shape (10, 3) which mismatches the one of your data.
I think you missed the return_seq argument of the LSTM. As is usually the case with RNN implementations, you have a parameter telling if you want the layer to return outputs for the whole sequence, or only for the last timestep. Here by default it is the second option, that is why you don't get an output with the expected shape. Use return_seq=True.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How exactly does Keras take dimension argumentsfor LSTM / time series problems? - python

Related

Binary classification of multivariate time series in the form of panel data using LSTM

How do I specify what column/feature I want to predict in a RNN?

How to reshape input for ConvLSTM2D to not overfit?

Proper and definitive explanation about how to build a CNN 1D in Keras

Tensorflow LSTM: Predict next action based on a series of previous ones

Categories

Resources