Regression with LSTM in Python and Keras

I am trying to use an LSTM network in Keras to make predictions of time-series data one step into the future. The data I have is 5-dimensional, and I am trying to use the previous 3 periods of readings to predict a future value in the next period. I have normalised the data and removed all NaNs etc., and this is the code I am trying to use to train the network:
def Network_ii(IN, OUT, TIME_PERIOD, EPOCHS, BATCH_SIZE, LTSM_SHAPE):
    length = len(OUT)
    train_x = IN[:int(0.9 * length)]
    validation_x = IN[int(0.9 * length):]
    train_y = OUT[:int(0.9 * length)]
    validation_y = OUT[int(0.9 * length):]

    # Define network & callback:
    train_x = train_x.reshape(train_x.shape[0], 3, 5)
    validation_x = validation_x.reshape(validation_x.shape[0], 3, 5)

    model = Sequential()
    model.add(LSTM(units=128, return_sequences=True, input_shape=(train_x.shape[1], 3)))
    model.add(LSTM(units=128))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error')

    train_y = np.asarray(train_y)
    validation_y = np.asarray(validation_y)

    history = model.fit(train_x, train_y, batch_size=BATCH_SIZE, epochs=EPOCHS,
                        validation_data=(validation_x, validation_y))

    # Score model
    score = model.evaluate(validation_x, validation_y, verbose=0)
    print('Test loss:', score)

    # Save model
    model.save(f"models/new_model")
I am attempting to roughly follow the steps outlined here: https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
However, no matter what adjustments I make, whether to the number of dimensions used to train the network or to the length of the time period, I cannot get the model to output predictions other than 1 or 0, even though the target data in the array 'OUT' is continuous on [0, 1].
I think there may be something wrong with how I am setting up the Sequential() model, but I cannot see what to adjust. I am relatively new to this, so any help would be greatly appreciated.

You are probably using a prediction function that is not the standard one. Maybe you are using predict_classes?
The one that is well documented and standard is model.predict.
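For a regression model like the one in the question, a minimal sketch of getting continuous outputs (reusing the trained model and the reshaped validation_x from the question's function) would be:

# Continuous predictions come from model.predict; predict_classes is meant for
# classifiers and collapses the output to discrete class labels.
predictions = model.predict(validation_x)   # shape (num_samples, 1)
print(predictions[:10])                     # should vary continuously, not just 0 and 1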

Related

Getting unreasonably good results when using a simple neural network for Price prediction

I am trying to predict GPU prices. For this I have prepared a custom dataset of shape (135, 39). I am using a simple multi-layer neural network.
I gave my network a feature vector of size (39, 1), which includes crypto prices and GPU prices of today. My y vector consists only of GPU prices one month from now.
When I train the network on the shuffled data, the predictions I get turn out to be really good. When I start predicting prices 12 months from now, I still get great results. This does not make any sense at all.
Here are some of the results and what the dataset looks like (screenshots of the input matrix X and the output y).
And now here's the code
y_data = df.iloc[:, 23:39]
y_data_mean = y_data.mean()
y_data_std = y_data.std()

norm_df = (df - df.mean()) / df.std()

future_months = 1
X = norm_df[:len(df) - (5 * future_months)]
y = norm_df.iloc[(5 * future_months):, 23:39]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(39,)))
model.add(Dense(25, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(tf.keras.layers.Dense(16))

opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=opt,
              loss=[tf.keras.losses.MeanSquaredError()])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=32, epochs=2000)
Here's the train/test MSE loss. As per this graph, I am not overfitting on the data. I should have used a separate validation set, but due to the small dataset I only kept train and test sets.

Bert prediction shape not equal to num_samples

I have a text classification task that I am trying to do using BERT. The model training code (below) works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf

# recommended learning rates for Adam: 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26

# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we do not overfit the model
number_of_epochs = 1

# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=nlabels,
                                                         output_attentions=False,
                                                         output_hidden_states=False)

# Adam optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)

# we do not have one-hot vectors, so we can use sparse categorical cross-entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I am getting the output using the following
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue I am facing is that the length of pred_labels_idx does not match the cardinality of ds_te_encoded:
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.
Since ds_te_encoded is of type tf.data.Dataset and you call cardinality(...), the cardinality in your case is simply the number of batches, not the number of samples. So I am assuming you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
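One way to confirm this (just a sketch, assuming ds_te_encoded is a batched tf.data.Dataset as in the question) is to count the individual samples after unbatching:

import tensorflow as tf

# cardinality() on a batched dataset reports the number of batches
num_batches = tf.data.experimental.cardinality(ds_te_encoded).numpy()

# counting elements after unbatch() gives the number of samples
num_samples = sum(1 for _ in ds_te_encoded.unbatch())

print(num_batches, num_samples)  # num_samples should match len(pred_labels_idx)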

Keras LSTM appears to be fitting the end of time-series input instead of the prediction target

To preface this, I have plenty of experience with python and moderate experience building and using machine learning networks. That being said, this is the first LSTM I have made aside from some of the cookie-cutter examples available, so any help is appreciated. I feel like this is a problem with a simple solution and that I have just been looking at this code for far too long to see it.
This model is made in a Python 3.5 venv using Keras with a TensorFlow backend.
In short, I am trying to make predictions of some temporal data using the data itself as well as a few mathematical permutations of this data, creating four input features. I am building a time-series input from the prior 60 data points and specifying the prediction target to be 60 data points in the future.
Shape of complete training data (input)(target): (2476224, 60, 4) (2476224)
Shape of single data "point" (input)(target): (1, 60, 4) (1)
What appears to be happening is that the trained model has fit the trailing value of my input time-series (the current value) instead of the target I have provided it (60 cycles in the future).
What is interesting is that the loss function seems to be calculating according to the correct prediction target, yet the model is not converging to the proper solution.
I have no idea why the model should be doing this. My first thought was that I was preprocessing my data incorrectly and feeding it the wrong target. I have tested my input formatting of the data extensively and am pretty confident that I am providing the model with the correct target and input information.
In one instance, I increased the learning rate a tad such that the model converged to a local minimum. The testing loss at this convergence was very similar to the loss at my preferred learning rate (still quite high), but the predictions were still of the "current value". Why is this so?
Here is how I created my model:
def create_model():
    lstm_model = Sequential()
    lstm_model.add(CuDNNLSTM(100, batch_input_shape=(batch_size, time_step, train_input.shape[2]),
                             stateful=True, return_sequences=True,
                             kernel_initializer='random_uniform'))
    lstm_model.add(Dropout(0.4))
    lstm_model.add(CuDNNLSTM(60))
    lstm_model.add(Dropout(0.4))
    lstm_model.add(Dense(20, activation='relu'))
    lstm_model.add(Dense(1, activation='linear'))

    optimizer = optimizers.Adagrad(lr=params["lr"])
    lstm_model.compile(loss='mean_squared_error', optimizer=optimizer)
    return lstm_model
This is how I am pre-processing the data. The first function, build_timeseries, constructs my input/output pairs. I believe this is working correctly (but please correct me if I am wrong). The second function trims the pairs to fit the batch size. I do exactly the same for the test input/target.
train_input, train_target = build_timeseries(train_input, time_step, pred_horiz, 0)
train_input = trim_dataset(train_input, batch_size)
train_target = trim_dataset(train_target, batch_size)

def build_timeseries(mat, TIME_STEPS, PRED_HORIZON, y_col_index):
    # y_col_index is the index of column that would act as output column
    dim_0 = mat.shape[0]  # num datasets
    dim_1 = mat.shape[1]  # num features
    dim_2 = mat.shape[2]  # num datapoints

    # Reformatted matrix
    mat = mat.swapaxes(1, 2)

    x = np.zeros((dim_0 * (dim_2 - PRED_HORIZON), TIME_STEPS, dim_1))
    y = np.zeros((dim_0 * (dim_2 - PRED_HORIZON),))

    k = 0
    for i in range(dim_0):  # Iterate through datasets
        for j in range(TIME_STEPS, dim_2 - PRED_HORIZON):
            x[k] = mat[i, j - TIME_STEPS:j]
            y[k] = mat[i, j + PRED_HORIZON, y_col_index]
            k += 1

    print("length of time-series i/o", x.shape, y.shape)
    return x, y

def trim_dataset(mat, batch_size):
    no_of_rows_drop = mat.shape[0] % batch_size
    if no_of_rows_drop > 0:
        return mat[no_of_rows_drop:]
    else:
        return mat
Lastly, this is how I call the actual model.
history = model.fit(train_input, train_target, epochs=params["epochs"], verbose=2, batch_size=batch_size,
                    shuffle=True, validation_data=(test_input, test_target), callbacks=[es, mcp])
As the model converges, I expect it to predict values close to the specified targets I have fed it. Instead, its predictions align much more closely with the trailing value of the time-series data (i.e. the current value). Yet, on the other hand, the model appears to be evaluating the loss according to the specified target. Why is it working this way, and how can I fix it? Any help is appreciated.

Keras network producing inverse predictions

I have a time-series dataset and I am trying to train a network so that it overfits (obviously, that's just the first step; I will then battle the overfitting).
The network has two layers:
LSTM (32 neurons) and Dense (1 neuron, no activation)
Training/model has these parameters:
epochs: 20, steps_per_epoch: 100, loss: "mse", optimizer: "rmsprop".
TimeseriesGenerator produces the input series with: length: 1, sampling_rate: 1, batch_size: 1.
I would expect the network to just memorize such a small dataset (I have even tried a much more complicated network, to no avail) and the loss on the training dataset to be pretty much zero. It is not, and when I visualize the results on the training set like this:
y_pred = model.predict_generator(gen)
plot_points = 40
epochs = range(1, plot_points + 1)
pred_points = numpy.resize(y_pred[:plot_points], (plot_points,))
target_points = gen.targets[:plot_points]
plt.plot(epochs, pred_points, 'b', label='Predictions')
plt.plot(epochs, target_points, 'r', label='Targets')
plt.legend()
plt.show()
I get the following plot: the predictions have a somewhat smaller amplitude but are precisely inverse to the targets. Btw, this is not memorization; they are inverted even for the test dataset, which the algorithm hasn't trained on at all. It appears that instead of memorizing the dataset, my network just learned to negate the input value and slightly scale it down. Any idea why this is happening? It doesn't seem like the solution the optimizer should have converged to (the loss is pretty big).
EDIT (some relevant parts of my code):
train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
    x,
    y,
    length=1,
    sampling_rate=1,
    batch_size=1,
    shuffle=False
)

model = Sequential()
model.add(LSTM(32, input_shape=(1, 1), return_sequences=False))
model.add(Dense(1, input_shape=(1, 1)))
model.compile(
    loss="mse",
    optimizer="rmsprop",
    metrics=[keras.metrics.mean_squared_error]
)

history = model.fit_generator(
    train_gen,
    epochs=20,
    steps_per_epoch=100
)
EDIT (different, randomly generated dataset):
I had to increase the number of LSTM neurons to 256; with the previous setting (32 neurons), the blue line was pretty much flat. However, with the increase the same pattern arises: inverse predictions with a somewhat smaller amplitude.
EDIT (targets shifted by +1):
Shifting the targets by one relative to the predictions doesn't produce a much better fit. Notice the highlighted parts where the graph isn't just alternating; it's more apparent there.
EDIT (increased length to 2 ... TimeseriesGenerator(length=2, ...)):
With length=2 the predictions stop tracking the targets so closely but the overall pattern of inversion still stands.
You say that your network "just learned to negate the input value and slightly scale it down". I don't think so. It is very likely that all you are seeing is the network performing poorly and just predicting the previous value (but scaled, as you say). This is an issue I've seen again and again; here is another example, and another, of the same thing. Also, remember that it is very easy to fool yourself by shifting the data by one: it is very likely you are simply shifting the poor prediction back in time and getting an overlap.
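One quick way to check this (a sketch, reusing y_pred from model.predict_generator and the generator gen from the question) is to compare the error against the stated targets with the error against the previous target value, i.e. a naive "repeat the last value" baseline:

import numpy as np

# Assumes y_pred and gen from the question above
preds = np.asarray(y_pred).reshape(-1)
targets = np.asarray(gen.targets).reshape(-1)
n = min(len(preds), len(targets))
preds, targets = preds[:n], targets[:n]

# Error against the supposed targets vs. error against the previous target value
mse_vs_targets = np.mean((preds - targets) ** 2)
mse_vs_previous = np.mean((preds[1:] - targets[:-1]) ** 2)
print(mse_vs_targets, mse_vs_previous)
# If the second number is clearly smaller, the network is mostly reproducing
# the last observed value rather than forecasting the future one.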
EDIT: After the author's comments I do not believe this is the correct answer, but I will keep it posted for posterity.
Great question, and the answer is due to how the TimeseriesGenerator works! Apparently, instead of grabbing x, y pairs with the same index (e.g. input x[0] to output target y[0]), it grabs the target with an offset of 1 (so x[0] is paired with y[1]).
Thus plotting y with an offset of 1 will produce the desired fit.
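A quick way to see this offset (just a sketch, using the train_gen from the question, or the one built below) is to inspect the first batch directly:

first_x, first_y = train_gen[0]
print(first_x.shape, first_y.shape)
print(first_x, first_y)  # with length=1 this pairs x[0] with y[1], not with y[0]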
Code to simulate:
import numpy as np
import keras
import matplotlib.pyplot as plt

x = np.random.uniform(0, 10, size=41).reshape(-1, 1)
x[::2] *= -1
y = x[1:]
x = x[:-1]

train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
    x,
    y,
    length=1,
    sampling_rate=1,
    batch_size=1,
    shuffle=False
)

model = keras.models.Sequential()
model.add(keras.layers.LSTM(100, input_shape=(1, 1), return_sequences=False))
model.add(keras.layers.Dense(1))
model.compile(
    loss="mse",
    optimizer="rmsprop",
    metrics=[keras.metrics.mean_squared_error]
)
model.optimizer.lr /= .1  # divide lr by 0.1, i.e. raise the learning rate tenfold

history = model.fit_generator(
    train_gen,
    epochs=20,
    steps_per_epoch=100
)
Proper plotting:
y_pred = model.predict_generator(train_gen)
plot_points = 39
epochs = range(1, plot_points + 1)
pred_points = np.resize(y_pred[:plot_points], (plot_points,))
target_points = train_gen.targets[1:plot_points+1] #NOTICE DIFFERENT INDEXING HERE
plt.plot(epochs, pred_points, 'b', label='Predictions')
plt.plot(epochs, target_points, 'r', label='Targets')
plt.legend()
plt.show()
Output: notice how the fit is no longer inverted and is mostly very accurate (plot omitted). For comparison, this is how it looks when the offset is incorrect (plot omitted).

LSTM with keras

I have some training data train_x and some corresponding labels for it called train_y. Here is how train_x and train_y are constructed:
train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = (np.random.randint(1,150,10000))
train_x has 10000 rows with 1000 columns each.
train_y has a label between 1 and 150 for each sample in train_x, representing a code for that sample.
I also have a sample called sample, which is 1 row with 1000 columns, that I want to use for prediction with this LSTM model. This variable is defined as:
sample = np.random.rand(1,1000)[0]
I am trying to train an LSTM on this data using Keras and make predictions with it. I want it to take in a feature vector and predict one of the codes in the range 1 to 150. I know these are random arrays, but I cannot post the data I have. I have tried the following approach, which I believe should work, but I am facing some issues:
model = Sequential()
model.add(LSTM(output_dim=32, input_length=10000, input_dim=1000, return_sequences=True))
model.add(Dense(150, activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(train_x, train_y,
                    batch_size=128, nb_epoch=1,
                    verbose=1)

model.predict(sample)
Any help or adjustments to this pipeline would be great. I am not sure if the output_dim is correct. I want to train the LSTM on each sample of the 1000-dimensional data and then produce a specific code in the range 1 to 150. Thank you.
I see at least three things you need to change:
Change this line:
model.add(Dense(150, activation='relu'))
to:
model.add(Dense(150, activation='softmax'))
as leaving 'relu' as the activation makes your output unbounded, whereas it needs to have a probabilistic interpretation (since you use categorical_crossentropy).
Change loss or target:
As you are using categorical_crossentropy, you need to change your target to be a one-hot encoded vector of length 150. Another way is to keep your target as it is but change the loss to sparse_categorical_crossentropy.
Change your target range:
Keras uses 0-based array indexing (as in Python, C and C++), so your values should be in the range [0, 150) instead of [1, 150].
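Putting the three changes together, a minimal sketch of the adjusted pipeline (using the sparse_categorical_crossentropy route so the integer codes can be kept; note that reshaping each 1000-dimensional row into a one-step sequence of 1000 features is my assumption for illustration, not something stated in the question) might look like:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Random stand-in data with the shapes from the question
train_x = np.array([np.random.rand(1, 1000)[0] for i in range(10000)])
train_y = np.random.randint(1, 150, 10000)

# Shift the codes from [1, 150] to [0, 149] for 0-based class indices
train_y = train_y - 1

# Reshape each sample into a (timesteps, features) sequence; here one step of
# 1000 features (an assumption -- adjust if your 1000 values are time steps)
train_x = train_x.reshape(-1, 1, 1000)

model = Sequential()
model.add(LSTM(32, input_shape=(1, 1000)))               # keep only the last LSTM state
model.add(Dense(150, activation='softmax'))              # probabilities over the 150 codes
model.compile(loss='sparse_categorical_crossentropy',    # integer targets, no one-hot needed
              optimizer='adam', metrics=['accuracy'])

history = model.fit(train_x, train_y, batch_size=128, epochs=1, verbose=1)

# Predict the code for one sample (restore batch/timestep axes, map back to [1, 150])
sample = np.random.rand(1, 1000)[0]
probs = model.predict(sample.reshape(1, 1, 1000))
predicted_code = int(probs.argmax(axis=-1)[0]) + 1

If you prefer the one-hot route instead, encode the shifted targets with tf.keras.utils.to_categorical(train_y, num_classes=150) and keep categorical_crossentropy.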
