I am trying to reconstruct time series data with an LSTM Autoencoder (Keras).
For now I want to train the autoencoder on a small number of samples (5 samples; each sample is 500 time steps long and has 1 dimension). I want to make sure the model can reconstruct those 5 samples, and after that I will use all the data (6000 samples).
window_size = 500
features = 1
data = data.reshape(5, window_size, features)
model = Sequential()
model.add(LSTM(256, input_shape=(window_size, features), return_sequences=True))
model.add(LSTM(128, input_shape=(window_size, features), return_sequences=False))
model.add(RepeatVector(window_size))
model.add(LSTM(128, input_shape=(window_size, features), return_sequences=True))
model.add(LSTM(256, input_shape=(window_size, features), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
model.fit(data, data, epochs=100, verbose=1)
Model summary (image omitted).
Training:
Epoch 1/100
5/5 [==============================] - 2s 384ms/step - loss: 0.1603
...
Epoch 100/100
5/5 [==============================] - 2s 388ms/step - loss: 0.0018
After training, I tried to reconstruct one of the 5 samples:
yhat = model.predict(np.expand_dims(data[1,:,:], axis=0), verbose=0)
[Plot omitted: reconstruction in blue, input in orange]
Why is the reconstruction so bad when the loss is so small? How can I make the model better? Thanks.
Update:
The answer below is based on the old Keras version; according to the current LSTM docs, the input should be shaped as [batch, timesteps, feature]!
See this: https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8ef/keras/layers/rnn/lstm.py#L481
Old Answer:
It seems to me that a time series should be given to the LSTMs in this format:
(samples, features, window_size)
So, if you change the format (for example, I exchanged the variables) and look at the results:
[![enter image description here][1]][1]
Code for reproducing the result (I didn't change the names of the variables, so please don't be confused :)):
import numpy as np
import keras
from keras import Sequential
from keras.layers import Dense, RepeatVector, TimeDistributed
from keras.layers import LSTM
from matplotlib.pyplot import plot, show

N = 10000
data = np.random.uniform(-0.1, 0.1, size=(N, 500))
data = data.cumsum(axis=1)  # random walks of length 500
print(data.shape)

window_size = 1
features = 500
data = data.reshape(N, window_size, features)

model = Sequential()
model.add(LSTM(32, input_shape=(window_size, features), return_sequences=True))
model.add(LSTM(16, input_shape=(window_size, features), return_sequences=False))
model.add(RepeatVector(window_size))
model.add(LSTM(16, input_shape=(window_size, features), return_sequences=True))
model.add(LSTM(32, input_shape=(window_size, features), return_sequences=True))
model.add(TimeDistributed(Dense(500)))
model.compile(optimizer='adam', loss='mse')
model.fit(data, data, epochs=100, verbose=1)

# Reconstruct one training sample and compare it with the input
yhat = model.predict(np.expand_dims(data[1, :, :], axis=0), verbose=0)
plot(np.arange(500), yhat[0, 0, :])
plot(np.arange(500), data[1, 0, :])
show()
Credit to sobe86: I used the data they proposed.
[1]: https://i.stack.imgur.com/5JUDN.png
I tried running your code on the following data
data = np.random.uniform(-0.1, 0.1, size=(5, 500))
data = data.cumsum(axis=1)
so the data is just the cumulative sum of some random uniform noise. I ran for 1000 epochs, and my results are not as bad as yours; the LSTM seems to make some effort to follow the line, though it seems to just be hovering around the running mean (as one might expect).
Note that this is running the model on the TRAINING data (which you seem to imply you were doing in your question) - if we try to look at performance on data that the model was not trained on, we can get bad results.
This is not surprising in the least: with such a small training set, we should fully expect the model to overfit and not generalise to new data.
One thing I have learned from my experience trying to fit autoencoders is that they are not easy to fit. I would check these elements:
LSTMs don't do well with non-stationary data. Instead of learning the variability in the data, they tend to learn the trend. So de-trending your data beforehand would be a good step. One easy way to do that is to take the difference of the data with its previous timestep, so that at each timestep you have x[i]-x[i-1] instead of x[i]. You can experiment with different lags of differencing based on your data and its trend/seasonality. For example, if you expect weekly seasonality and each timestep is a day, another lag to check would be 7, giving x[i]-x[i-7]. (A small differencing sketch is shown after this list.)
Experiment with the architecture of the autoencoder. Depending on the sequence length, 32 hidden units might not be enough to encode the data properly and retain enough information.
Use Bidirectional layers. Sometimes I use Conv1D as well.
The architecture doesn't need to be symmetrical, so be creative.
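A minimal differencing sketch (assuming the data is a 2-D array of shape (samples, timesteps); the lag values and helper names are illustrative):

import numpy as np

def difference(series, lag=1):
    # Subtract the value `lag` steps earlier along the time axis:
    # lag=1 gives first differences x[i] - x[i-1], lag=7 would target weekly seasonality.
    return series[:, lag:] - series[:, :-lag]

def undifference(first_values, diffs):
    # Invert lag-1 differencing, given the first value of each sample.
    return np.concatenate([first_values[:, None], diffs], axis=1).cumsum(axis=1)

data = np.random.uniform(-0.1, 0.1, size=(5, 500)).cumsum(axis=1)  # toy random walks
detrended = difference(data, lag=1)                                # shape (5, 499)
restored = undifference(data[:, 0], detrended)                     # recovers the original
assert np.allclose(restored, data)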
Related
I'm basically new to RNNs, but I'm trying to predict signals based on recordings.
I have two sets of data, A and B: A is the raw recording, and B is the binary label data marking '1' for every active event in A. Both have shape (1895700, 1).
Could you help me figure out what should be used as x_train and y_train?
I have been reading about this and understood that I should loop through A and extract x and y from it. I did this and got input shapes of x_train: (189555, 150, 1) and y_train: (189555, 150, 1), but I am getting an accuracy of 0.0000e+00 and a negative loss.
My other approach was using A as x_train and B as y_train, with input shapes of (12638, 150, 1), but from the first step of epoch 1 the accuracy was about 96% and the loss around 0.10, and they didn't vary much throughout training.
So I'm not really sure what data should be my input.
model:
model = Sequential()
model.add(LSTM(128, dropout=0.5, input_shape=(ts,features), recurrent_dropout=0.4, return_sequences=True))
model.add(LSTM(128, dropout=0.5, input_shape=(ts,features), recurrent_dropout=0.3, return_sequences=True))
model.add(LSTM(64, dropout=0.5, input_shape=(ts,features), recurrent_dropout=0.3, return_sequences=True))
model.add(Dense(features, input_shape=(ts, features), activation="sigmoid"))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
Thanks in advance!
Your X_train is the data that represents your features, while Y_train is the data that represents the output for the X_train features.
You can split your data by simply providing the validation_split parameter to the fit function:
model.fit(X_data, Y_data, batch_size=4, epochs=5, verbose=1, validation_split=0.2)
In this case it will hold out 20% of the data for validation.
I'm having a problem with a model I want to train.
It's a typical seq-to-seq problem with an attention layer, where the input is a string, and the output is a substring from the submitted string.
e.g.
Input          Ground Truth
------------   ------------
helloimchuck   chuck
johnismyname   john
(This is just dummy data, not a real part of the dataset ^^)
And the model looks like this:
model = Sequential()
model.add(Bidirectional(GRU(hidden_size, return_sequences=True), merge_mode='concat',
input_shape=(None, input_size))) # Encoder
model.add(Attention())
model.add(RepeatVector(max_out_seq_len))
model.add(GRU(hidden_size * 2, return_sequences=True)) # Decoder
model.add(TimeDistributed(Dense(units=output_size, activation="softmax")))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
The problem is this:
As you can see, there is overfitting.
I'm using early stop criteria on the validation loss with patience=8.
self.Early_stop_criteria = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=8, verbose=0,
mode='auto')
And I'm using one-hot vectors to fit the model.
BATCH_SIZE = 64
HIDDEN_DIM = 128
The thing is, I've tried with other batch sizes, other hidden dimensions, a dataset of 10K rows, 15K rows, 25K rows and now 50K rows. However, there is always overfitting, and I don't know why.
The test_size = 0.2 and the validation_split=0.2. Those are the only parameters I haven't changed.
I also made sure that the dataset is properly built.
The only idea that I have is trying with another validation split, maybe 0.33 instead of 0.2.
I don't know if cross-validation would help.
Maybe someone has a better idea of what I could try. Thanks in advance.
As kvish proposed, dropout was a good solution.
I first tried with a dropout of 0.2.
model = Sequential()
model.add(Bidirectional(GRU(hidden_size, return_sequences=True, dropout=0.2), merge_mode='concat',
input_shape=(None, input_size))) # Encoder
model.add(Attention())
model.add(RepeatVector(max_out_seq_len))
model.add(GRU(hidden_size * 2, return_sequences=True)) # Decoder
model.add(TimeDistributed(Dense(units=output_size, activation="softmax")))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
And with 50K rows it worked, but there was still some overfitting.
So, I tried with a dropout of 0.33, and it worked perfectly.
I am training a model in Keras with the IMDB dataset. For this model with LSTM layers, the accuracy is about 50%:
model = Sequential()
model.add(Embedding(max_features, 32))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
Accuracy:
loss: 0.6933 - acc: 0.5007 - val_loss: 0.6932 - val_acc: 0.4947
I have also tried a single LSTM layer, but it gives similar accuracy.
However, if I don't use an LSTM layer, the accuracy reaches around 82%:
model = models.Sequential()
model.add(layers.Dense(16, kernel_regularizer=regularizers.l1(0.001), activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l1(0.001), activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
Accuracy:
loss: 0.6738 - acc: 0.8214 - val_loss: 0.6250 - val_acc: 0.8320
This is how I compile and fit the model:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.fit(partial_x_train, partial_y_train, epochs=Numepochs, batch_size=Batchsize, validation_data=(x_val, y_val))
How can this be explained? I thought LSTMs work great for sequential text data.
Don't forget that LSTMs are used for processing sequences such as time series or text data. In a sequence the order of elements is very important, and if you reorder the elements the whole meaning of the sequence might completely change.
Now, the problem in your case is that the preprocessing step you have used is not the proper one for an LSTM model. You are encoding each sentence as a vector where each element represents the presence or absence of a particular word. Therefore, you are completely ignoring the order in which words appear in a sentence, which is exactly what an LSTM layer is good at modeling. There is also another issue in your LSTM model, given the preprocessing scheme you have used: the Embedding layer accepts word indices as input, not a vector of zeros and ones (i.e. the output of your preprocessing stage).
Since the IMDB data is already stored as sequences of word indices, to resolve this issue you just need to pad/truncate the sequences to a specified length so that you can use batch processing. For example:
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

vocab_size = 10000  # only consider the 10000 most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=500)  # truncate or pad sequences so they all have length 500
Now x_train has a shape of (25000, 500): 25000 sequences of length 500, encoded as integer word indices, which you can pass directly to the fit method for training. I would guess you can reach at least 80% training accuracy with an Embedding layer and a single LSTM layer. Don't forget to use a validation scheme to monitor overfitting (one simple option is to set the validation_split argument when calling the fit method). A sketch of such a model follows.
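A minimal sketch of that setup (the embedding size, number of LSTM units, and training settings are illustrative, not tuned):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(vocab_size, 32, input_length=500))  # word indices -> 32-d vectors
model.add(LSTM(32))                                     # single recurrent layer
model.add(Dense(1, activation='sigmoid'))               # binary sentiment output

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.2)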
I am using TensorFlow with Keras to perform regression on some historical data. The data looks like this:
id,timestamp,ratio
"santalucia","2018-07-04T16:55:59.020000",21.8
"santalucia","2018-07-04T16:50:58.043000",22.2
"santalucia","2018-07-04T16:45:56.912000",21.9
"santalucia","2018-07-04T16:40:56.572000",22.5
"santalucia","2018-07-04T16:35:56.133000",22.5
"santalucia","2018-07-04T16:30:55.767000",22.5
I am reformulating it as a time series problem (25 time steps) so that I can predict (regress) the next values of the series (the variance should not be high). I am also using sklearn.preprocessing's MinMaxScaler to scale the data to the range (-1, 1) or (0, 1), depending on whether I use LSTM or Dense layers (respectively). A sketch of this windowing and scaling step is shown below.
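For reference, a minimal sketch of how the scaled 25-step windows could be built (df, the file name, and the helper function are assumptions for illustration; only the ratio column and window length come from the question):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def make_windows(values, n_steps=25):
    # Turn a 1-D series into (samples, n_steps, 1) inputs and next-value targets.
    X, y = [], []
    for i in range(len(values) - n_steps):
        X.append(values[i:i + n_steps])
        y.append(values[i + n_steps])
    return np.array(X)[..., None], np.array(y)

df = pd.read_csv('data.csv')                    # assumed file with id,timestamp,ratio columns
scaler = MinMaxScaler(feature_range=(-1, 1))    # (-1, 1) for the LSTM variant
scaled = scaler.fit_transform(df[['ratio']].values).ravel()

X, y = make_windows(scaled, n_steps=25)         # X: (samples, 25, 1), y: (samples,)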
I am training with two different architectures:
Dense is as follows:
def get_model(self, layers, activation='relu'):
    model = Sequential()
    # Input arrays of shape (*, layers[1])
    # Output = arrays of shape (*, layers[1] * 16)
    model.add(Dense(units=int(64), input_shape=(layers[1],), activation=activation))
    model.add(Dense(units=int(64), activation=activation))
    # model.add(Dropout(0.2))
    model.add(Dense(units=layers[3], activation='linear'))
    # activation=activation))
    # opt = optimizers.Adagrad(lr=self.learning_rate, epsilon=None, decay=self.decay_lr)
    opt = optimizers.rmsprop(lr=0.001)
    model.compile(optimizer=opt, loss=self.loss_fn, metrics=['mae'])
    model.summary()
    return model
This more or less provides good results (same architecture as in TensorFlow's tutorial for predicting house prices).
However, the LSTM is not giving good results; it usually ends up stuck around a value (for example 40: 40.0123123, 40.123123, 41.09090...) and I do not see why or how to improve it. The architecture is:
def get_model(self, layers, activation='tanh'):
    model = Sequential()
    # Shape = (Samples, Timesteps, Features)
    model.add(LSTM(units=128, input_shape=(layers[1], layers[2]),
                   return_sequences=True, activation=activation))
    model.add(LSTM(64, return_sequences=True, activation=activation))
    model.add(LSTM(layers[2], return_sequences=False, activation=activation))
    model.add(Dense(units=layers[3], activation='linear'))
    # activation=activation))
    opt = optimizers.Adagrad(lr=0.001, decay=self.decay_lr)
    model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
    model.summary()
    return model
I currently train with a batch size of 200 that increases by a factor of 1.5 on every fit. Each fit runs for 50 epochs, and I use a Keras EarlyStopping callback with a patience of at least 20 epochs (a rough sketch is below).
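For reference, a minimal sketch of that early-stopping setup (the monitored quantity and the X_train / y_train names are illustrative, not from my actual code):

from keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 20 epochs (illustrative values)
early_stop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
model.fit(X_train, y_train, epochs=50, batch_size=200,
          validation_split=0.2, callbacks=[early_stop])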
I have tried adding more layers and more units, reducing layers and units, and increasing and decreasing the learning rate, but every time it gets stuck around a value. Any reason for this?
Also, do you know any good practices that can be applied to this problem?
Cheers
Have you tried holding back a validation set and seeing how well the model's performance on the training set tracks the validation set? This is often how I catch myself overfitting.
A simple plotting function (adapted from here) can help you do that:
import matplotlib.pyplot as plt

def gen_graph(history, title):
    # Plot training vs. validation accuracy across epochs
    plt.plot(history.history['categorical_accuracy'])
    plt.plot(history.history['val_categorical_accuracy'])
    plt.title(title)
    plt.show()

hist = model.fit_generator(...)
gen_graph(hist, "Accuracy, training vs. validation scores")
Also, do you have enough samples? If you're really, really sure that you have done as much as you can in terms of preprocessing, and in terms of hyperparameter tuning... generating some synthetic data or doing some data augmentation has occasionally helped me.
I'm trying to make an autoencoder using Keras with a TensorFlow backend. In particular, I have data consisting of a vector of n_components (e.g. 200) sampled n_times (e.g. 20000). It is key that when I train on time t, I compare it only to time t. It appears that the sampling times are being shuffled. I removed the bottleneck and found that the network does a pretty bad job of predicting the n_components, instead producing something more like the mean of the input scaled by each component.
Here is my network with the bottleneck commented out:
model = keras.models.Sequential()
# Make a 7-layer autoencoder network
model.add(keras.layers.Dense(n_components, activation='relu', input_shape=(n_components,)))
model.add(keras.layers.Dense(n_components, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
# model.add(keras.layers.Dense(3, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
# act is a numpy matrix of size (n_components, n_times)
model.fit(act.T, act.T, epochs=15, batch_size=100, shuffle=False)
newact = model.predict(act.T).T
I have tested shuffling act along its second axis, n_times, and passing it as model.fit(act.T, act_shuffled.T), and I see no difference from model.fit(act.T, act.T). Am I doing something wrong? How can I force it to learn from the specific time?
Many thanks,
Arthur
I believe that I have solved the problem, but more knowledgeable users of Keras might be able to correct me. I had tried many different values for the argument batch_size of fit, but I didn't try a value of 1. When I changed it to 1, it did a good job of reproducing the input data.
I believe that the batch size, even if shuffle is set to False, allows the autoencoder to train one input time against an unrelated input time.
So, I have amended my code to:
model.fit(act.T, act.T, epochs=15, batch_size=1, shuffle=False)