I'm trying to construct a network that will predict a Boolean target.
The data provided to the network contains both categorical and numerical entries but has already been properly processed. The data I am working with is time series data with 84 fields and 310033 rows. All the data has been scaled to lie between 0 and 1. Every row represents one second of data.
I created a dataset, data, with shape (310033, 60, 500), and the target vector is of shape (1000, 1). The time-step dimension was set to 60 because that is the maximum number of full 60-minute hours available in the data.
Then I split the data into (X_train, X_test, y_train, y_test).
Is it okay to feed a matrix like this to the LSTM model and expect a good prediction (if the relationships are there)? I ask because I am getting very poor performance. From what I have seen, people provide only 1D or 2D data and then reshape it into the 3D input the LSTM layer requires, which is what I have done here.
Below is the transformation code from 2D to 3D:
import math
import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(scaled, target, train_size=.7, shuffle=False)
# Generate Lag time Steps 3D framework for LSTM - Currently in 2D Framework
# As required for LSTM networks, we must reshape the input data into N_samples x TimeSteps x Variables
hours = len(X_train)/3600
hours = math.floor(hours) #Find Most full 60-min-hours available in subset of data
temp =[]
# Pull hours into the three dimensional field
for hr in range(hours, len(X_train) + hours):
    temp.append(scaled[hr - hours:hr, 0:scaled.shape[1]])
X_train = np.array(temp) #Export Train Features in (70% x Hours x Variables)
hours = len(X_test)/3600
hours = math.floor(hours) #Find Most full 60-min-hours available in subset of data
temp =[]
# Pull hours into the three dimensional field
for hr in range(hours, len(X_test) + hours):
    temp.append(scaled[hr - hours:hr, 0:scaled.shape[1]])
X_test = np.array(temp) #Export Test Features in (30% x Hours x Variables)
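For comparison, a common way to build the (samples, timesteps, variables) windows with each target aligned to the last row of its window is a plain sliding loop over the already split arrays. This is only a sketch of that pattern, not code from the post; the window length of 60 is taken from the description above and the helper name is my own:
import numpy as np

def make_windows(features, targets, window=60):
    """Stack overlapping windows so that sample i covers rows i..i+window-1
    and is labelled with the target of the last row in the window."""
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window:end])
        y.append(targets[end - 1])
    return np.array(X), np.array(y)

# Hypothetical usage on the already split 2D arrays:
# X_train_3d, y_train_3d = make_windows(X_train, y_train, window=60)
# X_test_3d,  y_test_3d  = make_windows(X_test,  y_test,  window=60)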
Below is the framework of the model:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()
#Layer 1 - returns a sequence of vectors
model.add(LSTM(128, return_sequences=True,
               input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.15)) #15% drop out layer
#model.add(BatchNormalization())
#Layer 2
model.add(LSTM(256, return_sequences=False))
model.add(Dropout(0.15)) #15% drop out layer
#model.add(BatchNormalization())
#Layer 3 - return a single vector
model.add(Dense(32))
#Output of 2 because we have 2 classes
model.add(Dense(2, activation='sigmoid'))
# Define optimiser
opt = tf.keras.optimizers.Adam(learning_rate=1e-5, decay=1e-6)
# Compile model
model.compile(loss='sparse_categorical_crossentropy', # Mean Square Error = 'mse'; Mean Absolute Error = 'mae'
              optimizer=opt,
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=epoch, batch_size=batch, validation_data=(X_test, y_test), verbose=3, shuffle=False)
I have experimented with many different architectures for the LSTM: single layer, multilayer, a double LSTM layer with two narrowing Dense layers (LSTM -> LSTM -> Dense(32) -> Dense(2)), batch normalization, etc.
Is there a recommended architecture for this type of time series data that would improve performance? I was getting better results when the data had only a single time step (TimeStep = 1).
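For reference, since the target is Boolean, a commonly used alternative to the two-unit sigmoid with sparse_categorical_crossentropy above is a single sigmoid unit trained with binary cross-entropy. This is only a sketch: the layer sizes are copied from the post, while the activation on Dense(32) and the rest are assumptions of mine:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential([
    LSTM(128, return_sequences=True,
         input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.15),
    LSTM(256),
    Dropout(0.15),
    Dense(32, activation='relu'),    # activation added here as an assumption
    Dense(1, activation='sigmoid')   # single probability for the Boolean target
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              metrics=['accuracy'])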
Say I have a function F that takes in a parameter vector P (say, a 5-element vector) and produces a (numerical) time series Y[t] of length T (e.g. T=100, so t=1,...,100). The function could be complicated (e.g. enzyme reaction models).
I want to make a neural network that predicts the output (Y[t]) that would result from feeding a new parameter set (P') into the function. How can this be done?
A simple feed-forward network can work, but it requires a very large number of output nodes and doesn't take into account the temporal correlation / relationships between points. Is it possible/better to use an RNN or Transformer instead?
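For concreteness, the feed-forward baseline described above might look roughly like this, with one output node per time point (a sketch of my own, not code from the post):
import tensorflow as tf

param_length = 5   # size of the parameter vector P
time_length = 100  # length of the series Y[t]

# Plain feed-forward baseline: the whole series comes out of one wide Dense
# layer, so there is one output node per time point and no explicit temporal
# structure between neighbouring points.
ff_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=[param_length]),
    tf.keras.layers.Dense(time_length)
])
ff_model.compile(loss='mse', optimizer='adam')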
Using an RNN might work for you. Here is some example code in Keras to get you started:
import tensorflow as tf

param_length = 5
time_length = 100
hidden_size = 20
model = tf.keras.Sequential([
    # Encode input parameters.
    tf.keras.layers.Dense(hidden_size, input_shape=[param_length]),
    # Generate a sequence.
    tf.keras.layers.RepeatVector(time_length),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss="mse", optimizer="nadam")
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=10)
The first Dense layer converts the input parameters to a hidden state. The LSTM units then generate the time sequence. You will need to experiment with hyperparameters such as the number of Dense and LSTM layers, the size of the hidden layers, etc.
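If it helps, the shapes this model expects would be roughly as follows; the data below is a synthetic placeholder (x_demo / y_demo are made-up names, not the original data):
import numpy as np

x_demo = np.random.rand(8, param_length)    # (samples, 5) parameter vectors P
y_demo = np.random.rand(8, time_length, 1)  # (samples, 100, 1) target series Y[t]

print(model.output_shape)  # (None, 100, 1) -- one predicted value per time step
model.fit(x_demo, y_demo, epochs=2, verbose=0)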
One more thing you can try is a different loss function, for example the Huber loss, together with early stopping:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)
model.compile(loss=tf.keras.losses.Huber(), optimizer="nadam", metrics=["mae"])
history = model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500,
                    callbacks=[early_stopping_cb])
I am trying to predict GPU prices. For this I have prepared a custom dataset of shape (135, 39). I am using a simple multi-layer neural network.
I give my network a feature vector of size (39, 1), which includes today's crypto prices and GPU prices. My y vector consists only of the GPU prices one month from now.
When I train the network on the shuffled data, the predictions I get turn out to be really good. When I start predicting prices 12 months from now, I still get great results. This does not make any sense at all.
Here are some of the results and what the dataset looks like (screenshots of the input matrix X and the output y are omitted here).
And now here's the code
import tensorflow as tf
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split

y_data = df.iloc[:, 23:39]
y_data_mean = y_data.mean()
y_data_std = y_data.std()
norm_df = (df - df.mean()) / df.std()
future_months = 1
X = norm_df[:len(df)-(5*future_months)]
y = norm_df.iloc[(5*future_months):,23:39]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(39,)))
model.add(Dense(25, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(tf.keras.layers.Dense(16))
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=opt,
              loss=[tf.keras.losses.MeanSquaredError()])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=32, epochs=2000)
Here's the train/test MSE loss (plot omitted). As per this graph, I am not overfitting on the data. I should have used a separate validation set, but due to the small dataset I only kept train and test sets.
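One way to sanity-check results like these, in case the shuffled split lets highly correlated neighbouring rows appear in both train and test, is a purely chronological split. This is only a sketch of my own using the X and y from the code above, not part of the original post:
# Chronological split: train on the earliest 70% of rows, test on the latest 30%,
# so no row from the "future" ends up in the training set.
split = int(len(X) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]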
I have a time series problem with 15 minutes as a timestep. The complete data runs from 2016-09-01 00:00:15 to 2016-12-31 23:45:00.
I have six variables (v1, v2, v3, v4, v5, v6) in the data frame and I want to predict the sixth variable (v6) for the next timestep.
I prepare the dataset with 5 time lags: if a row is at time t, I create the values for (t-1) to (t-5) as lags for v1 to v6.
So in total, I have 30 features (5 lags for 6 variables).
I also normalize the values with a PowerTransformer:
from sklearn.preprocessing import PowerTransformer

scaler_x = PowerTransformer()
scaler_y = PowerTransformer()
train_X = scaler_x.fit_transform(train_X)
train_y = scaler_y.fit_transform(train_y.reshape(-1,1))
The initial shapes of train_X and train_y are as below:
(11253, 30) , (11253, 1)
That is, 11253 rows with 30 variables as input and a single target variable. Then I reshape this to fit my ConvLSTM2D, like below:
# define the number of subsequences and the length of subsequences
n_steps, n_length = 5, 6 # I take into account the past 5 steps for the 6 variables
n_features=1
#reshape for ConvLSTM
# reshape into subsequences [samples, time steps, rows, cols, channels]
train_X = train_X.reshape(train_X.shape[0], n_steps, 1, n_length, n_features)
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
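As a sanity check, the reshape can be exercised on synthetic arrays with the shapes quoted above. This is only an illustration, not the original data, and it assumes the 30 columns are ordered lag-major (all 6 variables for one lag before the next lag):
import numpy as np

n_steps, n_length, n_features = 5, 6, 1
train_X = np.random.rand(11253, 30)
train_y = np.random.rand(11253, 1)

# [samples, time steps, rows, cols, channels]
train_X = train_X.reshape(train_X.shape[0], n_steps, 1, n_length, n_features)
train_y = train_y.reshape(train_y.shape[0], train_y.shape[1], 1)

print(train_X.shape)  # (11253, 5, 1, 6, 1)
print(train_y.shape)  # (11253, 1, 1)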
The ConvLSTM2D architecture looks like below:
from keras.models import Sequential
from keras.layers import ConvLSTM2D, Flatten, RepeatVector, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, verbose=0)
But this model gives a very bad result (it is overfitting a lot). I suspect that my inputs are not given correctly to the ConvLSTM2D.
Is my reshaping correct? Any help is appreciated.
EDIT:
I have realized my input is being given correctly to the Network but the issue is it is overfitting a lot.
My hyperparameters are below:
#hyper-parameter
epochs=100
batch_size=64
adam_opt = keras.optimizers.Adam(lr=0.001)
I even tried 50 and 10 epochs; it is the same issue.
In my personal experience there are a few things I've picked up about using ConvLSTM2D.
I would first check to see if the model is training at all. Based on your question I am unsure how the loss is changing as your model trains, if at all. If there is some variation, you need to perform a grid search (playing around with the number of layers and filters).
I also found my models needed to train for a long time to perform well; see the Keras example on ConvLSTM2D, where 300 epochs are needed to train a model to perform an arguably simple task: https://keras.io/examples/conv_lstm/. A case I worked on needed a similar number of epochs to train.
Check different loss functions and optimizers (even though I think mse and adam are good for this type of problem)
Normalize your data differently; you may want to normalize your data statistically, as shown in this Keras example: https://www.tensorflow.org/tutorials/keras/regression
From personal experience, you might want more layers for this specific problem. See the Keras ConvLSTM2D example above for this.
I see how you want to format your data, and though it may work, a more straightforward solution may work better: you might want to try giving (v1, v2, v3, v4, v5) and predicting v6, as in the sketch below. You may have to use large batch sizes for this.
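A minimal sketch of that simpler framing, with hypothetical shapes and synthetic data (this is my illustration, not the asker's model or data):
import numpy as np
import tensorflow as tf

# Hypothetical example: each sample is the last 5 time steps of the five
# predictor variables, and the target is v6 at the next step.
n_samples, n_lags, n_vars = 11253, 5, 5
X = np.random.rand(n_samples, n_lags, n_vars)  # (samples, time steps, v1..v5)
y = np.random.rand(n_samples, 1)               # v6 at the next step

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, input_shape=(n_lags, n_vars)),
    tf.keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=10, batch_size=256, verbose=0)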
I am experimenting with TensorFlow via the Keras library, and before diving into predicting uncertainty, I thought it might be a good idea to predict something certain. Therefore, I tried to predict weekly returns using daily price level data. My input shape looks like this: (1000, 5, 2), i.e. 1000 matrices of the form:
Stock A Stock B
110 100
95 101
90 100
89 99
100 110
For Stock A the price at day t=0 is 110, 95 at t-1, and 100 at t-4. Thus, the weekly return for Stock A would be 110/100-1=10% and -10% for Stock B. Because I focus only on predicting Stock A's return for now, my y for this input matrix would just be the scalar 0.10. Furthermore, I want to make it a classification problem and thus make a one-hot encoded vector via to_categorical, with class 1 if the return is above 5%, class 2 if it is below -5%, and class 0 if it is in between. Hence my classification output for the aforementioned matrix would be:
0 1 0
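For illustration, here is my reading of that labelling step in code (the row ordering, column assignment, and to_categorical usage are my assumptions, not code from the post):
import numpy as np
from tensorflow.keras.utils import to_categorical

def label_from_window(window):
    """window: (5, 2) price matrix, newest row first, Stock A in column 0."""
    weekly_return = window[0, 0] / window[-1, 0] - 1  # first value / last value for Stock A
    if weekly_return > 0.05:
        cls = 1        # above +5%
    elif weekly_return < -0.05:
        cls = 2        # below -5%
    else:
        cls = 0        # in between
    return to_categorical(cls, num_classes=3)

example = np.array([[110, 100], [95, 101], [90, 100], [89, 99], [100, 110]], dtype=float)
print(label_from_window(example))  # [0. 1. 0.] for the matrix above (return = +10%)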
To simplify: I want my model to learn to calculate returns, i.e. divide the first value in the input matrix by the last value of the input matrix for Stock A and ignore the input for Stock B. This would give the y. It is just a practice task for me before I get to more difficult tasks, and the model should achieve a loss of zero because there is no uncertainty. What model do you propose for that? I tried the following and it does not converge at all. Training and validation weights are calculated via compute_sample_weight('balanced', ).
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.layers import Input, LocallyConnected2D, LeakyReLU, Dense, Flatten
from keras.models import Model
from keras import optimizers

Earlystop = EarlyStopping(monitor='val_loss', patience=150, mode='min', verbose=1, min_delta=0.0002, restore_best_weights=True)
checkpoint = ModelCheckpoint('nn', monitor='val_loss', verbose=1, save_best_only=True, mode='min', save_weights_only=False)
Plateau = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=30, verbose=1)
optimizer = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=True)
input_ = Input(batch_shape=(batch_size, 1, 5, 2))
model = LocallyConnected2D(16, kernel_size=(5, 1), padding='valid', data_format="channels_first")(input_)
model = LeakyReLU(alpha=0.01)(model)
model = Dense(128)(model)
model = LeakyReLU(alpha=0.01)(model)
model = Flatten()(model)
x1 = Dense(3, activation='softmax', name='0')(model)
final_model = Model(inputs=input_, outputs=[x1])
final_model.compile(loss='categorical_crossentropy' , optimizer=optimizer, metrics=['accuracy'])
history = final_model.fit(X_train, y_train, epochs=1000, batch_size=batch_size, verbose=2, shuffle=False, validation_data=[X_valid, y_valid, valid_weight], sample_weight=train_weight, callbacks=[Earlystop, checkpoint, Plateau])
I thought convolution might be good for this, and because every return is calculated individually I decided to go for a LocallyConnected layer. Do I need to add more layers for such a simple task?
EDIT: I transformed my input matrix to returns and the model converges successfully. So the input must be correct, but the model fails to find the division function. Are there any layers that would be suited to do that?
I have some training data train_x and some corresponding labels called train_y. Here is how train_x and train_y are constructed:
import numpy as np

train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = (np.random.randint(1,150,10000))
train_x has 10000 rows and 1000 columns for each row.
train_y has a label between 1 and 150 for each sample in train_x and represents a code for each train_x sample.
I also have a sample called sample, which is 1 row with 1000 columns, which I want to use for prediction on this LSTM model. This variable is defined as
sample = np.random.rand(1,1000)[0]
I am trying to train and predict an LSTM on this data using Keras. I want to take in this feature vector and use this LSTM to predict one of the codes in range 1 to 150. I know these are random arrays, but I cannot post the data I have. I have tried the following approach which I believe should work, but am facing some issues
model = Sequential()
model.add(LSTM(output_dim = 32, input_length = 10000, input_dim = 1000,return_sequences=True))
model.add(Dense(150, activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y,
                    batch_size=128, nb_epoch=1,
                    verbose=1)
model.predict(sample)
Any help or adjustments to this pipeline would be great. I am not sure if the output_dim is correct. I want to train the LSTM on each sample of the 1000-dimension data and then produce a specific code that is in the range 1 to 150. Thank you.
I see at least three things you need to change:
Change this line:
model.add(Dense(150, activation='relu'))
to:
model.add(Dense(150, activation='softmax'))
as leaving 'relu' as activation makes your output unbounded whereas it needs to have a probabilistic interpretation (as you use categorical_crossentropy).
Change loss or target:
As you are using categorical_crossentropy you need to change your target to be a one-hot encoded vector of length 150. Another way is to leave your target but to change loss to sparse_categorical_crossentropy.
Change your target range:
Keras uses 0-based array indexing (as in Python, C and C++), so your values should be in the range [0, 150) instead of [1, 150].
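Putting the three changes together, a rough sketch could look like the following. The reshape of the input to 3D and the use of a single (non-sequence-returning) LSTM with current Keras argument names are my assumptions on top of the answer, not part of it:
import numpy as np
import tensorflow as tf

train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = np.random.randint(1, 150, 10000) - 1        # 3. shift labels into [0, 150)

# Assumption: treat each row as a sequence of 1000 one-dimensional time steps.
train_x = train_x.reshape(-1, 1000, 1)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(1000, 1)),
    tf.keras.layers.Dense(150, activation='softmax')  # 1. bounded, probabilistic output
])
model.compile(loss='sparse_categorical_crossentropy', # 2. integer targets, no one-hot needed
              optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, batch_size=128, epochs=1, verbose=1)

sample = np.random.rand(1, 1000).reshape(1, 1000, 1)
predicted_code = model.predict(sample).argmax(axis=-1) + 1  # map back to the 1..150 codes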