I have a time series problem with 15 minutes as a timestep. The complete data runs from 2016-09-01 00:00:15 to 2016-12-31 23:45:00.
I have 6 variables (v1, v2, v3, v4, v5, v6) in the data frame and I want to predict the sixth variable (v6) for the next timestep.
I prepare the data set with 5 time lags: if a row's time is t, I create the values at (t-1) to (t-5) as lag features for v1 to v6.
So in total, I have 30 features (5 lags for 6 variables).
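For reference, here is a minimal sketch of one way to build such lag features, assuming the raw data sits in a pandas DataFrame named df with columns v1..v6 (the DataFrame name and helper function are illustrative, not from the question):
import pandas as pd

def make_lags(df, columns, n_lags=5):
    # Append columns like v1_lag1 .. v1_lag5 holding the values at t-1 .. t-5
    out = df.copy()
    for col in columns:
        for lag in range(1, n_lags + 1):
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    # The first n_lags rows have no complete history, so drop them
    return out.dropna()

# lagged = make_lags(df, ["v1", "v2", "v3", "v4", "v5", "v6"], n_lags=5)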
I also normalize the values with a PowerTransformer:
from sklearn.preprocessing import PowerTransformer

scaler_x = PowerTransformer()
scaler_y = PowerTransformer()
train_X = scaler_x.fit_transform(train_X)
train_y = scaler_y.fit_transform(train_y.reshape(-1, 1))
The initial shapes of train_X and train_y are as follows:
(11253, 30), (11253, 1)
That is, 11253 rows with 30 input variables and a single target variable. I then reshape this to fit my ConvLSTM2D as below:
# define the number of subsequences and the length of subsequences
n_steps, n_length = 5, 6  # 5 past timesteps, each covering the 6 variables
n_features=1
#reshape for ConvLSTM
# reshape into subsequences [samples, time steps, rows, cols, channels]
train_X = train_X.reshape(train_X.shape[0], n_steps, 1, n_length, n_features)
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
The ConvLSTM2D architecture looks like this:
from keras.models import Sequential
from keras.layers import ConvLSTM2D, Flatten, RepeatVector, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(LSTM(50, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
# fit network
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, verbose=0)
But this model gives a very bad result (it is overfitting a lot). I suspect that my inputs are not being fed correctly to the ConvLSTM2D.
Is my reshaping correct? Any help is appreciated.
EDIT:
I have realized my input is being given correctly to the network, but the issue is that it is overfitting a lot.
My hyperparameters are below :
#hyper-parameter
epochs=100
batch_size=64
adam_opt = keras.optimizers.Adam(lr=0.001)
I even tried 50 and 10 epochs; it's the same issue.
In my personal experience there are a few things I've picked up about using ConvLSTM2D.
I would first check to see if the model is training at all. Based on your question I am unsure how the loss changes as your model trains, if at all. If there is some variation, you should perform a grid search (playing around with the number of layers and filters).
I also found my models needed to train for a long time to perform well; see the Keras ConvLSTM2D example, where 300 epochs are needed to train a model on an arguably simple task: https://keras.io/examples/conv_lstm/. A case I worked on needed a similar number of epochs to train.
Check different loss functions and optimizers (even though I think mse and adam are good for this type of problem).
Normalize your data differently; you may want to standardize it statistically, as shown in this TensorFlow regression tutorial: https://www.tensorflow.org/tutorials/keras/regression
From personal experience, you might also want more layers for this specific problem; see the Keras ConvLSTM2D example linked above.
I see how you want to format your data, and though it may work, a more straightforward formulation may work better: try feeding (v1, v2, v3, v4, v5) per timestep and predicting v6 directly, as in the sketch below. You may have to use large batch sizes for this.
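A rough sketch of that simpler formulation (layer sizes and shapes here are illustrative assumptions, not values from the question):
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense

n_steps, n_vars = 5, 5  # last 5 timesteps of the raw variables (v1..v5)
simple_model = keras.Sequential([
    LSTM(64, input_shape=(n_steps, n_vars)),
    Dense(1)  # next-step value of v6
])
simple_model.compile(loss='mse', optimizer='adam')
# simple_model.fit(X, y, epochs=100, batch_size=256)  # larger batches, as noted above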
I'm trying to use a time-series data set with 30 different features, and I want to predict the future values for 3 of those features. Is there any way I can specify which features should be used for output, and how many outputs, using TensorFlow and Scikit-learn? Or is that just done when I am creating the x_train, y_train, etc. sets? I want to predict the heat index, temperature, and humidity based on various meteorological factors (air pressure, HDD, CDD, pollution, etc.). The 3 factors I wish to predict are part of the 30 total features.
I am using TensorFlow's RNN tutorial: https://www.tensorflow.org/tutorials/structured_data/time_series
univariate_past_history = 30
univariate_future_target = 0
x_train_uni, y_train_uni = univariate_data(uni_data, 0, 1930,
                                           univariate_past_history,
                                           univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, 1930, None,
                                       univariate_past_history,
                                       univariate_future_target)
My data is given daily, so I want to predict the next day using the last 30 days, for example.
And this is my implementation of training the model:
BATCH_SIZE = 256
BUFFER_SIZE = 10000
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()

simple_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(8, input_shape=x_train_uni.shape[-2:]),
    tf.keras.layers.Dense(1)
])
simple_lstm_model.compile(optimizer='adam', loss='mae')

for x, y in val_univariate.take(1):
    print(simple_lstm_model.predict(x).shape)

EVALUATION_INTERVAL = 200
EPOCHS = 30
simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
                      steps_per_epoch=EVALUATION_INTERVAL,
                      validation_data=val_univariate, validation_steps=50)
EDIT: I understand that to increase the number of outputs I have to increase the Dense(1) value; I want to understand how to specify which features to output/predict.
You need to give the model.fit call the variables you want to learn from, in a shape compatible with an LSTM layer.
So for example, without any code a model like yours might take as input:
[batchsize, n_timestamps, n_features]
and output:
[batchsize, n_timestamps, m_features]
where n is the number of input features and m the number of output features.
So then you need to give the model the truth data of the same shape as the model output in order for the model to calculate a loss.
So the model.fit call should be:
model.fit(x_train, y_train, ....) where y_train are the truth vectors of the same shape as the model output.
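For example, here is a minimal sketch of how the 3 target columns could be picked out while building the training windows; the column indices and helper function below are hypothetical, not from the question:
import numpy as np

TARGET_COLS = [3, 7, 12]  # hypothetical positions of heat index, temperature, humidity
past_history = 30

def make_windows(data, target_cols, history):
    x, y = [], []
    for i in range(history, len(data)):
        x.append(data[i - history:i, :])  # all 30 features as input
        y.append(data[i, target_cols])    # only the 3 target features as output
    return np.array(x), np.array(y)

# x_train, y_train = make_windows(train_data, TARGET_COLS, past_history)
# x_train.shape -> (n_samples, 30, 30), y_train.shape -> (n_samples, 3)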
You have to design a model architecture that fits your needs and matches the outputs you expect. I made a toy example, but I have never really worked with this type of NN, so I have no idea if it makes sense for the problem.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, InputLayer, Reshape

ni_feats = 10
no_feats = 3
ndays = 30

model = tf.keras.Sequential([
    InputLayer((ndays, ni_feats)),
    LSTM(10),
    Dense(int(no_feats * ndays)),
    Reshape((ndays, no_feats))
])
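A quick usage sketch with random stand-in data, just to show that the shapes line up (nothing here comes from real data):
import numpy as np

model.compile(optimizer='adam', loss='mae')
x_dummy = np.random.rand(64, ndays, ni_feats).astype('float32')  # (batch, 30 days, 10 input features)
y_dummy = np.random.rand(64, ndays, no_feats).astype('float32')  # (batch, 30 days, 3 target features)
model.fit(x_dummy, y_dummy, epochs=1, verbose=0)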
I am trying to apply deep learning to a multi-class classification problem with high class imbalance between target classes (10K, 500K, 90K, 30K). I want to write a custom loss function.
This is my current model:
model = Sequential()
model.add(LSTM(
    units=10,  # number of units returned by LSTM
    return_sequences=True,
    input_shape=(timestamps, nb_features),
    dropout=0.2,
    recurrent_dropout=0.2
))
model.add(TimeDistributed(Dense(1)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(units=nb_classes, activation='softmax'))
model.compile(loss="categorical_crossentropy",
              metrics=['accuracy'],
              optimizer='adadelta')
Unfortunately, all predictions belong to class 1!!! The model always predicts 1 for any input...
Appreciate any pointers on how I can solve this task.
Update:
Dimensions of input data:
94981 train sequences
29494 test sequences
X_train shape: (94981, 20, 18)
X_test shape: (29494, 20, 18)
y_train shape: (94981, 4)
y_test shape: (29494, 4)
Basically in the train data I have 94981 samples. Each sample contains a sequence of 20 timestamps. There are 18 features.
The imbalance between target classes (10K, 500K, 90K, 30K) is just an example. I have similar proportions in my real dataset.
First of all, you have ~100k samples. Start with something smaller, like 100 samples and multiple epochs, and see whether your model overfits this smaller training set (if it can't, you either have an error in your code or the model is not capable of modeling the dependencies [I would go with the second case]). Seriously, start with this one. And remember to represent all of your classes in this small dataset.
Secondly, the hidden size of the LSTM may be too small: you have 18 features per timestep and sequences of length 20, while your hidden size is only 10. And you apply dropout on top of that, regularizing the network even further.
Furthermore, you may want to add some dense output units instead of merely returning a linear layer of size 10 x 1 for each timestep.
Last but not least, you may want to upsample the underrepresented data: class 0 would have to be repeated, say, 50 times (or maybe 25), class 2 around 4 times and class 3 around 10-15 times, so the network is actually trained on them. A rough sketch of this follows.
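As an illustration, one way to do that upsampling, assuming y_train is the one-hot matrix of shape (94981, 4) from your question; the repeat factors are the ballpark ones above, not tuned values:
import numpy as np

labels = y_train.argmax(axis=1)  # integer class per sample, recovered from one-hot
repeat_per_class = {0: 50, 1: 1, 2: 4, 3: 12}

idx = np.concatenate([
    np.repeat(np.where(labels == c)[0], k)  # repeat each sample index of class c, k times
    for c, k in repeat_per_class.items()
])
np.random.shuffle(idx)
X_train_bal, y_train_bal = X_train[idx], y_train[idx]
# model.fit(X_train_bal, y_train_bal, ...)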
Oh, and use cross-validation for your hyperparameters like the hidden size, number of dense units etc.
Plus, I don't know for how many epochs you've been training this network, or what your test dataset looks like (it is entirely possible it consists only of the first class if you haven't done stratification).
I think this will get you started, hit me up with any doubts in the comments.
EDIT: When it comes to metrics, you may want to check something other than mere accuracy; maybe the F1 score, together with loss monitoring + accuracy, to see how the model performs. There are other choices available; for inspiration you can check sklearn's documentation, as they provide quite a few options.
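For instance, a quick check along those lines (this snippet assumes the trained model and the test arrays from your question; it is only an illustration):
from sklearn.metrics import f1_score, classification_report

y_pred = model.predict(X_test).argmax(axis=1)  # predicted class per sample
y_true = y_test.argmax(axis=1)                 # true class from one-hot targets
print(f1_score(y_true, y_pred, average='macro'))
print(classification_report(y_true, y_pred))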
I am learning how to use the LSTM model in Keras. I have looked at this answer and this answer, and would like to train a model in the many-to-many manner but at testing time make predictions using the one-to-many with stateful=True manner. I am unsure if I am on the right track.
I have a data set comprising 10,000 individuals, each with a sequence of 20 timesteps and 10 features. I want to train an LSTM model to predict 5 of the features at the next timestep. Using a 90-10 train/test split, my train_x is shaped (9,000, 20, 10) and my train_y is shaped (9,000, 20, 5), with the values in y being the values of the selected features at the next timestep. My test_x is shaped (1,000, 20, 10).
At test time, I would like to use the trained model to make predictions using only the 10 features at the very start of the sequence (timestep 0): first to predict the values of the selected 5 features at the next timestep. The values of the other 5 features at the next timestep are known, so I would like to combine them with the predicted 5 features and again use that as input to predict the 5 features at the following timestep, and so on for 20 steps.
Is it possible to do this using the Keras library?
My code for training looks like
t_model = Sequential()
t_model.add(LSTM(100, return_sequences=True,
                 input_shape=(train_x.shape[1],
                              train_x.shape[2])))
t_model.add(TimeDistributed(Dense(5)))
t_model.compile(loss='mean_squared_error',
                optimizer='adam')
checkpointer = ModelCheckpoint(filepath='weights.hdf5',
                               verbose=1,
                               save_best_only=True)
history = t_model.fit(train_x, train_y, epochs=50,
                      validation_split=0.1, callbacks=[checkpointer],
                      verbose=2, shuffle=False)
This seems to train ok. Please let me know if there is any misunderstanding in the way I am structuring my model.
My code for testing looks like
p_model = Sequential()
p_model.add(LSTM(100, stateful=True,
                 return_sequences=True,
                 batch_input_shape=(1, 1,
                                    test_x.shape[2])))
p_model.add(TimeDistributed(Dense(5)))
p_model.load_weights('weights.hdf5')

complete_yhat = np.empty([0, 5])
for i in range(len(test_x)):
    ind = test_x[i]
    x = ind[0]
    x = x.reshape(1, 1, x.shape[0])
    for j in range(20):
        yhat = p_model.predict(x)
        complete_yhat = np.append(complete_yhat, yhat[0], axis=0)
        if j < 19:
            x = ind[j+1]
            x = np.append([x[:-5]], yhat[0], axis=1)
            x = x.reshape(1, x.shape[0], x.shape[1])
    p_model.reset_states()
This runs ok, but I am struggling to get good forecast accuracy. Can someone let me know whether I am using Keras LSTM correctly?
Thank you for your help
I am not sure you can really train a model with a many-to-many architecture and then test it one-to-many. You might be able to hack something together and have a piece of code that runs, but from a technical point of view this does not make much sense. Can you explain why you want to do one-to-many at test time?
Generally, the rule of thumb for any supervised machine learning model development is that your training phase should "resemble" your testing phase. For example, if you want to test a one-to-many architecture, then you should also train it as one-to-many.
Edit:
Reading the comments, it seems that you want to train with features from one timestep and see how the model performs for future timesteps. (I think this is at odds with the nature of time-series data, where every sample contributes to the future state; if one sample could predict the future very well, it would mean the following samples are useless... but anyway.) Here is how you can do this; there are other ways, of course...
Split your data for training and testing similar to what you are doing at test time, so your input should be of shape (None, 10) and your output of shape (None, 20, 5). Then use Keras' RepeatVector on your input (like this: output = RepeatVector(20)(input)), and you should get something of shape (None, 20, 10), which you can then pass through the rest of your model.
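A rough sketch of that RepeatVector idea using the functional API; the layer sizes are placeholders, not values taken from your code:
from tensorflow import keras
from tensorflow.keras.layers import Input, RepeatVector, LSTM, TimeDistributed, Dense

inp = Input(shape=(10,))                 # the 10 features at the first timestep only
x = RepeatVector(20)(inp)                # -> (None, 20, 10)
x = LSTM(100, return_sequences=True)(x)  # -> (None, 20, 100)
out = TimeDistributed(Dense(5))(x)       # -> (None, 20, 5)

model = keras.Model(inp, out)
model.compile(loss='mse', optimizer='adam')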
I'm getting acquainted with LSTMs and I need clarity on something. I'm modeling a time series using t-300:t-1 to predict t:t+60. My first approach was to set up an LSTM like this:
# fake dataset to put words into code:
X = [[1,2...299,300],[2,3,...300,301],...]
y = [[301,302...359,360],[302,303...360,361],...]
# LSTM requires (num_samples, timesteps, num_features)
X = X.reshape(X.shape[0],1,X.shape[1])
model = Sequential()
model.add(LSTM(n_neurons[0], batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1]))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1, batch_size=1, verbose=1, shuffle=False)
With my real dataset, the results have been suboptimal: on CPU it was able to train one epoch of around 400,000 samples in 20 minutes, the network converged after a single epoch, and for any set of input points it produced the same output.
My latest change has been to reshape X in the following way:
X = X.reshape(X.shape[0],X.shape[1],1)
Training is noticeably slower (I have not tried the full dataset yet): it takes about 5 minutes to train a single epoch of 2,800 samples. I toyed around with a smaller subset of my real data and a smaller number of epochs, and it seems promising: I am no longer getting the same output for different inputs.
Can anyone help me understand what is happening here?
In Keras, the timesteps dimension in (num_samples, timesteps, num_features) determines how many steps BPTT (backpropagation through time) will propagate the error back.
This, in turn, takes more time to do hence the slow down that you are observing.
X.reshape(X.shape[0], X.shape[1], 1) is the right thing to do in your case, since what you have is a single feature, with 300 timesteps.
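To make the difference concrete, a small illustration of the two reshapes (shapes only, with made-up data):
import numpy as np

X = np.random.rand(100, 300)  # 100 windows of 300 past values

X_one_step = X.reshape(X.shape[0], 1, X.shape[1])  # (100, 1, 300): 1 timestep with 300 "features"
X_bptt = X.reshape(X.shape[0], X.shape[1], 1)      # (100, 300, 1): 300 timesteps with 1 feature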
I have some training data x_train and some corresponding labels for this x_train called y_train. Here is how x_train and y_train are constructed:
train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = (np.random.randint(1,150,10000))
train_x has 10000 rows and 1000 columns for each row.
train_y has a label between 1 and 150 for each sample in train_x and represents a code for each train_x sample.
I also have a sample called sample, which is 1 row with 1000 columns, which I want to use for prediction on this LSTM model. This variable is defined as
sample = np.random.rand(1,1000)[0]
I am trying to train and predict an LSTM on this data using Keras. I want to take in this feature vector and use this LSTM to predict one of the codes in range 1 to 150. I know these are random arrays, but I cannot post the data I have. I have tried the following approach which I believe should work, but am facing some issues
model = Sequential()
model.add(LSTM(output_dim=32, input_length=10000, input_dim=1000, return_sequences=True))
model.add(Dense(150, activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y,
                    batch_size=128, nb_epoch=1,
                    verbose=1)
model.predict(sample)
Any help or adjustments to this pipeline would be great. I am not sure if output_dim is correct. I want to train the LSTM on each sample of the 1000-dimensional data and then reproduce a specific code in the range 1 to 150. Thank you.
I see at least three things you need to change:
Change this line:
model.add(Dense(150, activation='relu'))
to:
model.add(Dense(150, activation='softmax'))
as leaving 'relu' as the activation makes your output unbounded, whereas it needs to have a probabilistic interpretation (since you use categorical_crossentropy).
Change loss or target:
As you are using categorical_crossentropy you need to change your target to be a one-hot encoded vector of length 150. Another way is to leave your target but to change loss to sparse_categorical_crossentropy.
Change your target range:
Keras uses 0-based indexing (as in Python, C and C++), so your values should be in the range [0, 150) instead of [1, 150].
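Putting the three fixes together, a minimal sketch with stand-in data; note that reshaping each 1000-dimensional sample into a single timestep is my assumption, since the original input_length/input_dim arguments do not match the data shape:
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense

train_x = np.random.rand(10000, 1, 1000)        # (samples, timesteps, features) - assumed layout
train_y = np.random.randint(1, 150, 10000) - 1  # shift labels from 1..149 into 0..148

model = keras.Sequential([
    LSTM(32, input_shape=(1, 1000)),
    Dense(150, activation='softmax')  # probabilistic output over the 150 codes
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, batch_size=128, epochs=1, verbose=1)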