I am using tensorflow with keras to perform regression on some historical data. Data type as follows:
id,timestamp,ratio
"santalucia","2018-07-04T16:55:59.020000",21.8
"santalucia","2018-07-04T16:50:58.043000",22.2
"santalucia","2018-07-04T16:45:56.912000",21.9
"santalucia","2018-07-04T16:40:56.572000",22.5
"santalucia","2018-07-04T16:35:56.133000",22.5
"santalucia","2018-07-04T16:30:55.767000",22.5
And I am reformulating it as a time series problem (25 time steps) so that I can predict (make a regression) for the next values of the series (variance should not be high). I am using also sklearn.preprocessing MinMaxScaler to scale the data to range (-1,1) or (0,1) depending if I use LSTM or Dense (respectively).
I am training with two different architectures:
Dense is as follows:
def get_model(self, layers, activation='relu'):
model = Sequential()
# Input arrays of shape (*, layers[1])
# Output = arrays of shape (*, layers[1] * 16)
model.add(Dense(units=int(64), input_shape=(layers[1],), activation=activation))
model.add(Dense(units=int(64), activation=activation))
# model.add(Dropout(0.2))
model.add(Dense(units=layers[3], activation='linear'))
# activation=activation))
# opt = optimizers.Adagrad(lr=self.learning_rate, epsilon=None, decay=self.decay_lr)
opt = optimizers.rmsprop(lr=0.001)
model.compile(optimizer=opt, loss=self.loss_fn, metrics=['mae'])
model.summary()
return model
Which more or less provides with good results (same architecture as in tensorflows' tutorial for predicting house prices).
However, LSTM is not giving good results, it usually ends up stuck around a value (for example, 40 (40.0123123, 40.123123,41.09090...) and I do not see why or how to improve it. Architecture is:
def get_model(self, layers, activation='tanh'):
model = Sequential()
# Shape = (Samples, Timesteps, Features)
model.add(LSTM(units=128, input_shape=(layers[1], layers[2]),
return_sequences=True, activation=activation))
model.add(LSTM(64, return_sequences=True, activation=activation))
model.add(LSTM(layers[2], return_sequences=False, activation=activation))
model.add(Dense(units=layers[3], activation='linear'))
# activation=activation))
opt = optimizers.Adagrad(lr=0.001, decay=self.decay_lr)
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
model.summary()
return model
I currently train with a batch size of 200 that increases by a rate of 1.5 every fit. Each fit is made of 50 epochs, and I use a keras earlystopping callback with at least 20 epoch.
I have tried adding more layers, more units, reducing layers, units, increasing and decreasing learning rate, etc, but every time it gets stuck around a value. Any reason for this?
Also, do you know any good practices that can be applied to this problem?
Cheers
Have you tried holding back a validation set seeing how well the model performance on the training set tracks with the validation set? This is often how I catch myself overfitting.
A simple function for doing this (adapted from here) can help you do that:
hist = model.fit_generator(...)
def gen_graph(history, title):
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])
plt.title(title)
gen_graph(hist, "Accuracy, training vs. validation scores")
Also, do you have enough samples? If you're really, really sure that you have done as much as you can in terms of preprocessing, and in terms of hyperparameter tuning... generating some synthetic data or doing some data augmentation has occasionally helped me.
Related
I am trying to build a regression model but the mse and mae are very high. I filter and normalize the data (both the input and output, and also the test and train set). I think the problem comes because I have very high values in one column: the minimum is 1 and the maximum is 9100000 (without normalizing), but I actually need to predict these high values.
The model looks like this: I have 6 input columns and 800000 rows. And I have tried with more neurons and layers, or changing the sigmoid function, but the loss and the error keep being around 0.8 for mse and 0.3 for mae. The predictions are also way lower than they should be, never achieving the high values.
model = Sequential()
model.add(Dense(7, input_dim=num_input, activation='relu'))
model.add(Dense(7, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='rmsprop', metrics=['mse', 'mae'])
history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_val, y_val))
A few remarks/advices:
RMSProp is generally not used with fully connected layers, I recommend switching to Adam or SGD.
If you have a skewed distribution with many large values, you might consider using the log of these values instead.
First try with a shallow model with few neurons. Then gradually increase the number of neurons in order to overfit the dataset. You should be able to reach perfect score on the train set. At that point you can start decrease the number of neurons and add layers with dropout to improve generalisation.
As already mentioned in the comments, the output activation for regression should be "linear". Sigmoid is for binary classification.
I'm trying to do transfer learning in Keras. I set up a ResNet50 network set to not trainable with some extra layers:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create input data: x_batch using the ResNet50 preprocess_input function, along with the one hot encoded labels y_batch and do the fitting as so:
model.fit(x_batch,
y_batch,
epochs=nb_epochs,
batch_size=64,
shuffle=True,
validation_split=0.2,
callbacks=[lrate])
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling I found that the problem is to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers are adjusting to the training data, however when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
A work around is to pre-compute the ResNet output. In fact, this decreases training time considerably, as we are not repeating that part of the calculation.
you can try :
Res = keras.applications.resnet.ResNet50(include_top=False,
weights='imagenet', input_shape=(IMG_SIZE , IMG_SIZE , 3 ) )
# Freeze the layers except the last 4 layers
for layer in vgg_conv.layers :
layer.trainable = False
# Check the trainable status of the individual layers
for layer in vgg_conv.layers:
print(layer, layer.trainable)
# Vector input
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05 ))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics =(['accuracy'])
model2.summary()
I am trying to reconstruct time series data with LSTM Autoencoder (Keras).
Now I want train autoencoder on small amount of samples (5 samples, every sample is 500 time-steps long and have 1 dimension). I want to make sure that model can reconstruct that 5 samples and after that I will use all data (6000 samples).
window_size = 500
features = 1
data = data.reshape(5, window_size, features)
model = Sequential()
model.add(LSTM(256, input_shape=(window_size, features),
return_sequences=True))
model.add(LSTM(128, input_shape=(window_size, features),
return_sequences=False))
model.add(RepeatVector(window_size))
model.add(LSTM(128, input_shape=(window_size, features),
return_sequences=True))
model.add(LSTM(256, input_shape=(window_size, features),
return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
model.fit(data, data, epochs=100, verbose=1)
Model
Training:
Epoch 1/100
5/5 [==============================] - 2s 384ms/step - loss: 0.1603
...
Epoch 100/100
5/5 [==============================] - 2s 388ms/step - loss: 0.0018
After training, I tried reconstruct one of 5 samples:
yhat = model.predict(np.expand_dims(data[1,:,:], axis=0), verbose=0)
Reconstitution: Blue
Input: Orange
Why is reconstruction so bad when loss is small? How can I make model better? Thanks.
Update:
The answer below is based on the old version and based on the current LSTM doc, the input should be shaped as [batch, timesteps, feature]!
See this: https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8ef/keras/layers/rnn/lstm.py#L481
Old Answer:
It seems to me, a time series should be given to the LSTMs in this format:
(samples, features , window_size)
So, if you change the format, for example I exchanged the variables, and look at the results:
[![enter image description here][1]][1]
Code for reproducing the result(I didn't change the name of the variables, so please don't be confused :)):
import numpy as np
import keras
from keras import Sequential
from keras.layers import Dense, RepeatVector, TimeDistributed
from keras.layers import LSTM
N = 10000
data = np.random.uniform(-0.1, 0.1, size=(N, 500))
data = data.cumsum(axis=1)
print(data.shape)
window_size = 1
features = 500
data = data.reshape(N, window_size, features)
model = Sequential()
model.add(LSTM(32, input_shape=
(window_size,features),
return_sequences=True))
model.add(LSTM(16, input_shape=(window_size,
features),
return_sequences=False))
model.add(RepeatVector(window_size))
model.add(LSTM(16, input_shape=(window_size,
features),
return_sequences=True))
model.add(LSTM(32, input_shape=(window_size,
features),
return_sequences=True))
model.add(TimeDistributed(Dense(500)))
model.compile(optimizer='adam', loss='mse')
model.fit(data, data, epochs=100, verbose=1)
yhat = model.predict(np.expand_dims(data[1,:,:], axis=0), verbose=0)
plot(np.arange(500), yhat[0,0,:])
plot(np.arange(500), data[1,0,:])
Credit to sobe86: I used the proposed data by him/her.
[1]: https://i.stack.imgur.com/5JUDN.png
I tried running your code on the following data
data = np.random.uniform(-0.1, 0.1, size=(5, 500))
data = data.cumsum(axis=1)
so the data is just the cumalative sum of some random uniform noise. I ran for 1000 epochs, and my results are not as bad as yours, the LSTM seems to make some effort to follow the line, though it seems to just be hovering around the running mean (as one might expect).
Note that this is running the model on the TRAINING data (which you seem to imply you were doing in your question) - if we try to look at performance on data that the model was not trained on, we can get bad results.
This is not surprising in the least, with such a small training set, we should fully expect the model to overfit, and not generalise to new data.
One thing I understood from my experience trying to fit auto encoders, is that they are not easy to fit. But I would check these elements:
LSTM doesn't do good with non-stationary data. Instead of learning the variability in the data it would try to learn the trend. So de-trending would be a good step to add to your data before hand. Now, to do that, one easy way is to calculate the difference of data with its previous timestamp. Then at each timestep you would have x[i]-x[i-1] instead of x[i]. You can experiment with different orders of de-trending based on your data and its trend/seasonality. For example, if you expect the data has weekly seasonality, another order to check would be 7 days (if each timestep is a day) and your data would be x[i]-x[i-7].
Experiment with the architecture of the auto-encoder. depending on the sequence length 32 hidden units might not be enough to encode the data properly and keep enough information.
Use Bidirectional layers. Sometimes I use Conv1D as well.
Don't need to be Symmetrical. So be creative.
I have a number of usage patterns for different kinds of customers. Large scale factories and small scale customers or even single devices such as AC units.
I am trying to create a model with keras that can predict any of these inputs, given proper prior scaling of the input.
When I run a model on each customer individually, the predictions are sort of good.
Once I train that model on a number of inputs however, the results degrade greatly.
Is it possible to train a network that can handle a range of inputs and recognize them as different classes / types. How would such a model have to look like?
This is my current one:
model_unif = Sequential()
# input layer
input_shape = (cfg.DEMAND_ONE_WEEK,)
model_unif.add(Dense(168, input_shape=input_shape, activation='linear'))
model_unif.add(BatchNormalization()) #applying batch normalization now
model_unif.add(Dense(100, activation='linear'))
model_unif.add(Dense(100, activation='linear'))
model_unif.add(Dense(100, activation='linear'))
model_unif.add(Dense(50, activation='linear'))
model_unif.add(Dense(50, activation='linear'))
model_unif.add(Dense(24))
model_unif.add(Activation('linear'))
model_unif.compile(loss='mse', optimizer='rmsprop')
I'm trying to make an autoencoder using Keras with a tensorflow backend. In particular, I have data of a vector of n_components (i.e. 200) sampled n_times (i.e. 20000). It is key that when I train time t, that I compare it only to time t. It appears that it is shuffling the sampling times. I removed the bottleneck and find that the network is doing a pretty bad job of predicting the n_components, instead representing something more like the mean of the input scaled by each component.
Here is my network with the bottleneck commented out:
model = keras.models.Sequential()
# Make a 7-layer autoencoder network
model.add(keras.layers.Dense(n_components, activation='relu', input_shape=(n_components,)))
model.add(keras.layers.Dense(n_components, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
# model.add(keras.layers.Dense(3, activation='relu'))
# model.add(keras.layers.Dense(50, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.add(keras.layers.Dense(n_components, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
# act is a numpy matrix of size (n_components, n_times)
model.fit(act.T, act.T, epochs=15, batch_size=100, shuffle=False)
newact = model.predict(act.T).T
I have tested shuffling the second component of act, n_times, and passing it as model.fit(act.T, act_shuffled.T) and see no difference from model.fit(act.T, act.T). Am I doing something wrong? How can I force it to learn from the specific time?
Many thanks,
Arthur
I believe that I have solved the problem, but more knowledgeable users of Keras might be able to correct me. I had tried many different values for the argument batch_size of fit, but I didn't try a value of 1. When I changed it to 1, it did a good job of reproducing the input data.
I believe that the batch size, even if shuffle is set to False, allows the autoencoder to train one input time against an unrelated input time.
So, I have ammended my code to:
model.fit(act.T, act.T, epochs=15, batch_size=1, shuffle=False)