Multi-step LSTM Time series prediction - python

I am trying to build an LSTM model for multi-step prediction. My data is a time series of parking occupancy rates sampled every five minutes (I have 25 weeks of samples). I started writing the code below:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

training_data_len = int(np.ceil(len(data) * .90))
train_data = data.iloc[0:int(training_data_len), :]
print(len(train_data))
# Create the testing data set
test_data = data.iloc[training_data_len:, :]  # - timestep
print(len(test_data))
data_train = np.array(train_data)

def split_sequence(sequence, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequence):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

X_train, y_train = split_sequence(data_train, 6, 6)

reg = Sequential()
reg.add(LSTM(units=200, return_sequences=True, input_shape=(1, 1)))  # , activation='relu'
reg.add(Dropout(0.2))
reg.add(LSTM(units=200, return_sequences=True))  # , activation='relu'
reg.add(Dropout(0.2))
reg.add(LSTM(units=200, return_sequences=True))  # , activation='relu'
reg.add(Dropout(0.2))
reg.add(Dense(6))
# here we have considered loss as mean squared error and optimizer as adam
reg.compile(loss='mse', optimizer='adam')
# training the model
# , validation_split=0.1,
# shuffle=False
reg.fit(X_train, y_train, epochs=10, verbose=1)

data_test = np.array(test_data)
# split the test data the same way (6 steps in / 6 steps out)
X_test, y_test = split_sequence(data_test, 6, 6)
y_pred = reg.predict(X_test)
My goal is to predict the next 30 minutes (6 samples) from the previous 30 minutes (6 samples).
I'm new to this kind of model and I want to know whether I'm on the right track, whether I'm missing something, or whether there are improvements I could make.
Thank you

Question: Is there an issue with my approach?
Usually you will want to try out multiple models and multiple hyperparameters. Even if it's a toy project, you should at least try out a few models. Make sure you understand how each model works before setting its parameters.
You may also want to have more data going in than coming out, e.g. feed 1 h of history and predict 10 min ahead (a minimal sketch of this is shown after this answer).
It helps to do some data analysis before running any code, to get some insight into what might work. Make it visual: create plots, e.g. a PCA projection (though that may not work well with time series).
Talking about models: you could replace your LSTM with a Transformer. It can retain information over longer contexts and often handles long-range dependencies better than an LSTM.
If you have questions about data science or machine learning, you should try datascience.stackexchange.com instead of Stack Overflow. Here we are supposed to help with quick, snappy answers about code. ;)
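As a minimal sketch of the "more in than out" idea (assuming data_train is the single-column array from your question, one sample every 5 minutes, and reusing your split_sequence function): feed 12 past samples (1 h) and predict 2 future samples (10 min). Note that the Keras LSTM layer expects input_shape=(timesteps, features), and the final Dense layer needs one unit per predicted step.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

n_steps_in, n_steps_out = 12, 2   # 1 h of history in, 10 min ahead out (5-min samples)
n_features = 1                    # occupancy rate only

# reuse the split_sequence function from the question
X_train, y_train = split_sequence(data_train, n_steps_in, n_steps_out)
X_train = X_train.reshape((X_train.shape[0], n_steps_in, n_features))
y_train = y_train.reshape((y_train.shape[0], n_steps_out))

model = Sequential()
model.add(LSTM(200, input_shape=(n_steps_in, n_features)))
model.add(Dropout(0.2))
model.add(Dense(n_steps_out))     # one output unit per future step
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, y_train, epochs=10, verbose=1)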

Related

Problems with inverse_transform scaled predictions and y_test in multi-step, multi-variate LSTM

I have built a multi-step, multi-variate LSTM model to predict the target variable 5 days into the future with 5 days of look-back. The model runs smoothly (even though it still needs improvement), but I cannot correctly invert the transformation applied once I get my predictions.
I have seen on the web that there are many ways to pre-process and transform data. I decided to follow these steps:
Data fetching and cleaning
import yfinance

df = yfinance.download(['^GSPC', '^GDAXI', 'CL=F', 'AAPL'], period='5y', interval='1d')['Adj Close']
df.dropna(axis=0, inplace=True)
df.describe()
(data set summary table not shown)
Split the data set into train and test
size = int(len(df) * 0.80)
df_train = df.iloc[:size]
df_test = df.iloc[size:]
Scaled train and test sets separately with MinMaxScaler()
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
df_train_sc = scaler.fit_transform(df_train)
df_test_sc = scaler.transform(df_test)
Creation of 3D X and y time-series compatible with the LSTM model
I borrowed the following function from this article
import numpy as np

def create_X_Y(ts: np.array, lag=1, n_ahead=1, target_index=0) -> tuple:
    """
    A method to create the X and Y matrices from a time series array for the
    training of deep learning models.
    """
    # Extract the number of features in the array
    n_features = ts.shape[1]
    # Placeholder lists
    X, Y = [], []
    if len(ts) - lag <= 0:
        X.append(ts)
    else:
        for i in range(len(ts) - lag - n_ahead):
            Y.append(ts[(i + lag):(i + lag + n_ahead), target_index])
            X.append(ts[i:(i + lag)])
    X, Y = np.array(X), np.array(Y)
    # Reshape the X array to an RNN input shape
    X = np.reshape(X, (X.shape[0], lag, n_features))
    return X, Y

# In this example, assume the first column (AAPL) is the target variable.
trainX, trainY = create_X_Y(df_train_sc, lag=5, n_ahead=5, target_index=0)
testX, testY = create_X_Y(df_test_sc, lag=5, n_ahead=5, target_index=0)
Model creation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
# KerasRegressor wrapper from older TF/Keras versions (removed in newer TF; see scikeras for a replacement)
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

def build_model(optimizer):
    grid_model = Sequential()
    grid_model.add(LSTM(64, activation='tanh', return_sequences=True,
                        input_shape=(trainX.shape[1], trainX.shape[2])))
    grid_model.add(LSTM(64, activation='tanh', return_sequences=True))
    grid_model.add(LSTM(64, activation='tanh'))
    grid_model.add(Dropout(0.2))
    grid_model.add(Dense(trainY.shape[1]))
    grid_model.compile(loss='mse', optimizer=optimizer)
    return grid_model

grid_model = KerasRegressor(build_fn=build_model, verbose=1, validation_data=(testX, testY))

parameters = {'batch_size': [12, 24],
              'epochs': [8, 30],
              'optimizer': ['adam', 'Adadelta']}

grid_search = GridSearchCV(estimator=grid_model,
                           param_grid=parameters,
                           cv=3)
grid_search = grid_search.fit(trainX, trainY)
grid_search.best_params_
my_model = grid_search.best_estimator_.model
Get predictions
yhat = my_model.predict(testX)
Invert transformation of predictions and actual values
Here my problems begin, because I am not sure which way to go. I have read many tutorials, but it seems that their authors prefer to apply MinMaxScaler() to the entire dataset before splitting the data into train and test. I do not agree with this, because otherwise the training data will be scaled using information we should not have (i.e. the test set). So I followed my own approach, but I am stuck here.
I found this possible solution on another post, but it's not working for me:
# invert scaling for forecast
pred_scaler = MinMaxScaler(feature_range=(0, 1)).fit(df_test.values[:,0].reshape(-1, 1))
inv_yhat = pred_scaler.inverse_transform(yhat)
# invert scaling for actual
inv_y = pred_scaler.inverse_transform(testY)
In fact, when I double-check the last values of the target in my original data set, they don't match the inverse-scaled version of testY.
Can someone please help me on this? Many thanks in advance for your support!
A few things should be mentioned here. First, you cannot inverse-transform something the scaler never saw. This happens because you use two different scalers: the network predicts values in the range of scaler 1 (fitted on the training data), and there is no guarantee that these lie within the range of scaler 2 (fitted on the test data). Second, the best practice is to fit your scaler on the training set and use that same scaler (transform only) on the test data as well; then you can inverse-transform your test results. Third, if the scaling goes off because the test set has completely different values (which happens, e.g., with live streaming data), it is up to you to deal with it; for example, a min-max scaler will then produce values > 1.0.
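A minimal sketch of that best practice, reusing the variable names from the question (the extra target_scaler is my own addition for inverting the single target column, not something from the original post):

from sklearn.preprocessing import MinMaxScaler

# scaler for all input features, fitted on the training split only
feature_scaler = MinMaxScaler(feature_range=(0, 1))
df_train_sc = feature_scaler.fit_transform(df_train)
df_test_sc = feature_scaler.transform(df_test)      # transform only, never re-fit on test

# separate scaler for the target column (column 0, AAPL), also fitted on train only
target_scaler = MinMaxScaler(feature_range=(0, 1))
target_scaler.fit(df_train.values[:, [0]])

# ... build trainX/trainY/testX/testY and fit the model as before, then:
yhat = my_model.predict(testX)                      # shape (n_samples, n_ahead)

# invert predictions and actuals with the train-fitted target scaler
inv_yhat = target_scaler.inverse_transform(yhat.reshape(-1, 1)).reshape(yhat.shape)
inv_y = target_scaler.inverse_transform(testY.reshape(-1, 1)).reshape(testY.shape)

Because target_scaler is fitted on the same training column that feature_scaler used for column 0, the inversion is consistent with how trainY/testY were scaled, and no information from the test set leaks into the scaling.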

Regression with LSTM - python and Keras

I am trying to use an LSTM network in Keras to make predictions on time series data one step into the future. The data has 5 dimensions, and I am trying to use the previous 3 periods of readings to predict a future value in the next period. I have normalised the data and removed all NaNs, etc., and this is the code I am trying to use to train the network:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def Network_ii(IN, OUT, TIME_PERIOD, EPOCHS, BATCH_SIZE, LTSM_SHAPE):
    length = len(OUT)
    train_x = IN[:int(0.9 * length)]
    validation_x = IN[int(0.9 * length):]
    train_y = OUT[:int(0.9 * length)]
    validation_y = OUT[int(0.9 * length):]

    # Define network & callback:
    train_x = train_x.reshape(train_x.shape[0], 3, 5)
    validation_x = validation_x.reshape(validation_x.shape[0], 3, 5)

    model = Sequential()
    model.add(LSTM(units=128, return_sequences=True, input_shape=(train_x.shape[1], 3)))
    model.add(LSTM(units=128))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error')

    train_y = np.asarray(train_y)
    validation_y = np.asarray(validation_y)
    history = model.fit(train_x, train_y, batch_size=BATCH_SIZE, epochs=EPOCHS,
                        validation_data=(validation_x, validation_y))

    # Score model
    score = model.evaluate(validation_x, validation_y, verbose=0)
    print('Test loss:', score)

    # Save model
    model.save(f"models/new_model")
I am attempting to roughly follow the steps outlined here: https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
However, no matter what adjustments I make to the number of dimensions used to train the network or to the length of the time period, I cannot get the model to output predictions that are anything other than 1 or 0. This is even though the target data, in the array 'OUT', is continuous on [0, 1].
I think there may be something wrong with how I am setting up the Sequential() model, but I cannot see what to adjust. I am relatively new to this, so any help would be greatly appreciated.
You are probably using a prediction function that is not the standard one. Maybe you are using predict_classes?
The standard, well-documented method is model.predict.
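For illustration, a small sketch of the difference, using the validation_x array from your function (in older Keras versions, predict_classes thresholded binary outputs at 0.5, which is why you would only ever see 0 or 1):

# continuous regression outputs from the final Dense(1) layer
preds = model.predict(validation_x)                     # shape (n_samples, 1), values in [0, 1]

# roughly what predict_classes did for a single-output setup: threshold the outputs
hard_labels = (model.predict(validation_x) > 0.5).astype("int32")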

Keras LSTM appears to be fitting the end of time-series input instead of the prediction target

To preface this, I have plenty of experience with python and moderate experience building and using machine learning networks. That being said, this is the first LSTM I have made aside from some of the cookie-cutter examples available, so any help is appreciated. I feel like this is a problem with a simple solution and that I have just been looking at this code for far too long to see it.
This model is made in a python3.5 venv using Keras with a tensorflow backend.
In short, I am trying to make predictions of some temporal data using the data itself as well as a few mathematical permutations of this data, creating four input features. I am building a time-series input from the prior 60 data points and specifying the prediction target to be 60 data points in the future.
Shape of complete training data (input)(target): (2476224, 60, 4) (2476224)
Shape of single data "point" (input)(target): (1, 60, 4) (1)
What appears to be happening is that the trained model has fit the trailing value of my input time-series (the current value) instead of the target I have provided it (60 cycles in the future).
What is interesting is that the loss function seems to be computed against the correct prediction target, yet the model is not converging to the proper solution.
I have no idea why the model is doing this. My first thought was that I was preprocessing my data incorrectly and feeding it the wrong target. I have tested my input formatting of the data extensively and am pretty confident that I am providing the model with the correct target and input information.
In one instance, I increased the learning rate a tad such that the model converged to a local minimum. The testing loss of this convergence was very similar to the loss at my preferred learning rate (still quite high), but the predictions were still of the "current value". Why is this so?
Here is how I created my model:
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dropout, Dense
from keras import optimizers

def create_model():
    lstm_model = Sequential()
    lstm_model.add(CuDNNLSTM(100, batch_input_shape=(batch_size, time_step, train_input.shape[2]),
                             stateful=True, return_sequences=True,
                             kernel_initializer='random_uniform'))
    lstm_model.add(Dropout(0.4))
    lstm_model.add(CuDNNLSTM(60))
    lstm_model.add(Dropout(0.4))
    lstm_model.add(Dense(20, activation='relu'))
    lstm_model.add(Dense(1, activation='linear'))
    optimizer = optimizers.Adagrad(lr=params["lr"])
    lstm_model.compile(loss='mean_squared_error', optimizer=optimizer)
    return lstm_model
This is how I am pre-processing the data. The first function, build_timeseries, constructs my input-output pairs. I believe this is working correctly (but please correct me if I am wrong). The second function trims the pairs to fit the batch size. I do the exact same for the test input/target.
train_input, train_target = build_timeseries(train_input, time_step, pred_horiz, 0)
train_input = trim_dataset(train_input, batch_size)
train_target = trim_dataset(train_target, batch_size)

def build_timeseries(mat, TIME_STEPS, PRED_HORIZON, y_col_index):
    # y_col_index is the index of the column that acts as the output column
    dim_0 = mat.shape[0]  # num datasets
    dim_1 = mat.shape[1]  # num features
    dim_2 = mat.shape[2]  # num datapoints

    # Reformatted matrix
    mat = mat.swapaxes(1, 2)

    x = np.zeros((dim_0 * (dim_2 - PRED_HORIZON), TIME_STEPS, dim_1))
    y = np.zeros((dim_0 * (dim_2 - PRED_HORIZON),))

    k = 0
    for i in range(dim_0):  # Iterate through datasets
        for j in range(TIME_STEPS, dim_2 - PRED_HORIZON):
            x[k] = mat[i, j - TIME_STEPS:j]
            y[k] = mat[i, j + PRED_HORIZON, y_col_index]
            k += 1
    print("length of time-series i/o", x.shape, y.shape)
    return x, y

def trim_dataset(mat, batch_size):
    no_of_rows_drop = mat.shape[0] % batch_size
    if no_of_rows_drop > 0:
        return mat[no_of_rows_drop:]
    else:
        return mat
Lastly, this is how I call the actual model.
history = model.fit(train_input, train_target, epochs=params["epochs"], verbose=2,
                    batch_size=batch_size, shuffle=True,
                    validation_data=(test_input, test_target), callbacks=[es, mcp])
As the model converges, I expect it to predict values close to the specified targets I fed it. Instead, its predictions align much more closely with the trailing value of the time-series data (i.e. the current value), even though the model appears to be evaluating the loss against the specified target. Why is it working this way, and how can I fix it? Any help is appreciated.

Character LSTM keeps generating same character sequence

I'm training a 2-layer character LSTM with Keras to generate sequences of characters similar to the corpus I am training on. When I train the LSTM, however, the output it generates is the same sequence over and over again.
I've seen suggestions for similar problems to increase the LSTM input sequence length, increase the batch size, add dropout layers, and increase the dropout amount. I've tried all of these and none of them seem to have fixed the issue. The one thing that has yielded some success is adding a random noise vector to each vector output by the LSTM during generation. This makes sense, since the LSTM uses the previous step's output to generate the next output. However, generally if I add enough noise to break the LSTM out of its repetitive generation, the quality of the output degrades a great deal.
My LSTM training code is as follows:
import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# [load data from file]
raw_text = collected_statements.lower()

# create mapping of unique chars to integers
chars = sorted(list(set(raw_text + '\b')))
char_to_int = dict((c, i) for i, c in enumerate(chars))

seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fix random seed for reproducibility
seed = 8
numpy.random.seed(seed)

# split into 80% for train and 20% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=seed)

# train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=18,
          batch_size=256, callbacks=callbacks_list)
My generation code is as follows:
filename = "weights-improvement-18-1.5283.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = unpadded_patterns[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
x = numpy.reshape(pattern, (1, len(pattern), 1))
x = (x / float(n_vocab)) + (numpy.random.rand(1, len(pattern), 1) * 0.01)
prediction = model.predict(x, verbose=0)
index = numpy.argmax(prediction)
#print(index)
result = int_to_char[index]
seq_in = [int_to_char[value] for value in pattern]
sys.stdout.write(result)
pattern.append(index)
pattern = pattern[1:len(pattern)]
print("\nDone.")
When I run the generation code, I get the same sequence over and over again:
we have the best economy in the history of our country." "we have the best
economy in the history of our country." "we have the best economy in the
history of our country." "we have the best economy in the history of our
country." "we have the best economy in the history of our country." "we
have the best economy in the history of our country." "we have the best
economy in the history of our country." "we have the best economy in the
history of our country." "we have the best economy in the history of our
country."
Is there anything else I could try that could help to generate something besides the same sequence over and over?
In your character generation, I would suggest sampling from the probabilities your model outputs instead of taking the argmax directly. This is what the Keras char-rnn example does to get diversity.
This is the code they use for sampling in their example:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
In your code you've got index = numpy.argmax(prediction).
I'd suggest replacing that with index = sample(prediction[0]) (model.predict returns an array of shape (1, n_vocab), so pass the first row) and experimenting with temperatures of your choice. Keep in mind that higher temperatures make your output more random and lower temperatures make it less random.
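For example, the generation loop could look like the following sketch (the 0.5 temperature is just an example value, and the noise hack should no longer be necessary once sampling is stochastic):

for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
    prediction = model.predict(x, verbose=0)        # shape (1, n_vocab)
    index = sample(prediction[0], temperature=0.5)  # sample instead of argmax
    result = int_to_char[index]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]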
What the model generates as its output is the probability of the next character given the previous characters, and in the text generation process you just take the character with the maximum probability. Instead, it might help to inject some stochasticity (i.e. randomness) into this process by sampling the next character based on the probability distribution generated by the model. One easy way to do this is to use the np.random.choice function:
# get the probability distribution generated by the model
prediction = model.predict(x, verbose=0)
# sample the next character based on the predicted probabilites
idx = np.random.choice(y.shape[1], 1, p=prediction[0])[0]
# the rest is the same...
This way, the next selected character is not always the most probable one. Rather, all the characters have a chance of being selected, guided by the probability distribution generated by your model. This stochasticity not only breaks the repetitive loop but may also result in some more interesting generated text.
Additionally, you can inject further stochasticity by introducing a softmax temperature in the sampling process, as shown in @Primusa's answer, which is based on the Keras char-rnn example. Basically, the idea is to re-weight the probability distribution so that you can control how surprising (i.e. higher temperature/entropy) or predictable (i.e. lower temperature/entropy) the next selected character will be.
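Putting the two ideas together, here is a minimal sketch of temperature-scaled sampling with np.random.choice (the function name and the 0.8 temperature are my own illustrative choices):

import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    # re-weight the distribution: temperature < 1 sharpens it, > 1 flattens it
    probs = np.asarray(probs).astype('float64')
    probs = np.log(probs + 1e-8) / temperature
    probs = np.exp(probs) / np.sum(np.exp(probs))
    return np.random.choice(len(probs), p=probs)

prediction = model.predict(x, verbose=0)
idx = sample_with_temperature(prediction[0], temperature=0.8)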

How does this case blend predictions, and why divide all the train data into two parts?

I am new to this field and trying to read some code to do my homework:
https://github.com/auduno/Kaggle-Acquire-Valued-Shoppers-Challenge/blob/master/generate_submission.py
In this case, the programmer divides the whole training data into two parts and only uses one part to train while predicting on the other part. I am confused about this, and about the test part as well.
test_data = pd.io.parsers.read_csv("./features/test/all_features.csv", sep=" ")
train_data = pd.io.parsers.read_csv("./features/train/all_features.csv", sep=" ")

split = ShuffleSplit(train_data.shape[0], n_iter=1, test_size=0.10)
for tr, te in split:
    train1, train2 = tr, te
Also, in the blend part, I can't understand the process that puts all the prediction data together. pred_train and pred_test are both predicted data, but he uses them to train again? Could you explain that? Thanks a lot.
############### BLEND
dtrain2 = xgb.DMatrix( pred_train, label=train2_label.values)
dtest = xgb.DMatrix( pred_test )
print "training blend : xgb trees booster logistic regression, max depth 2"
param = {'bst:max_depth':2, 'bst:eta':0.1, 'silent':1, 'objective':'binary:logistic', 'nthread' : 8, 'eval_metric':'auc' }
num_round = 50
bst = xgb.train( param, dtrain2, num_round)
pred_label_test = bst.predict( dtest )
print "training blend : xgb linear booster logistic regression"
param = {'booster_type':1, 'bst:lambda':0, 'bst:alpha':0, 'bst:lambda_bias':0, 'silent':1, 'objective':'binary:logistic', 'nthread' : 8, 'eval_metric':'auc' }
num_round = 25
bst = xgb.train( param, dtrain2, num_round)
pred_label = bst.predict( dtest )
mean_pred = (pred_label + pred_label_test)/2.
predictions[r] = mean_pred
In machine learning, a common problem is overfitting. If you check classification quality on the same set that you used for training, then algorithms tend to overfit to that particular set: they learn features that are specific only to that set. So the common practice is to divide the data into a set used for training and a set used for checking that the classification is correct.
The BLEND part of your code uses two different machine learning algorithms and then blends their results by taking the mean value. That way the author tries to reduce the prediction error, in the hope that different algorithms make different mistakes.
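As a generic, minimal sketch of those two points (hold-out evaluation plus blending by averaging); the models and data here are illustrative placeholders, not the repository's actual xgboost setup:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# placeholder data standing in for the engineered features
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# keep a hold-out set so quality is not measured on the training data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

# two different algorithms, in the hope that they make different mistakes
model_a = GradientBoostingClassifier().fit(X_tr, y_tr)
model_b = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# blend by averaging the predicted probabilities
pred_a = model_a.predict_proba(X_val)[:, 1]
pred_b = model_b.predict_proba(X_val)[:, 1]
mean_pred = (pred_a + pred_b) / 2.0
print("blended AUC:", roc_auc_score(y_val, mean_pred))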
