I have time series training data of about 5000 numbers. For each 100 numbers, I am trying to predict the 101st. At the end of the series, I feed the predicted numbers back into the model to predict beyond the end of the series.
The attached graph shows the training data, the test data and the prediction output. Currently, the model seems to be under-fitting. I would like to know what hyperparameters should be changed, or if I need to re-structure my input and output data.
I am using the following LSTM network.
model = Sequential()
model.add(LSTM(128, input_shape=([bl,1]), activation='relu', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(20,activation='relu'))
model.add(Dense(1))
model.compile(optimizer=adam(lr=0.0001), loss='mean_squared_error', metrics=['accuracy'])
model.fit(y_ba_tr_in, y_ba_tr_out,
          epochs=20,
          batch_size=5, shuffle=False, verbose=2)
y_ba_tr_in.shape = (4961, 100, 1)
y_ba_tr_out.shape = (4961, 1)
Something you could try is taking return_sequences=True out of your last LSTM layer. I believe this is generally the approach when you intend to predict for the next timestep.
After that modification, you also shouldn't need the subsequent Flatten() and Dense() layers.
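For illustration, a minimal sketch of what that restructured model might look like, keeping the layer sizes from the question (bl is the 100-step window length; the final Dense(1) still produces the single next-step prediction). This is a sketch of the suggested change, not the asker's actual code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(LSTM(128, input_shape=(bl, 1), activation='relu', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(128))  # return_sequences removed: only the last timestep's output is passed on
model.add(Dropout(0.1))
model.add(Dense(1))   # single next-step regression output, so no Flatten needed
model.compile(optimizer=Adam(learning_rate=0.0001), loss='mean_squared_error')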
I have a dataset of weather forecasts and am trying to make a model that predicts which forecast will be most accurate the next day.
To do so, my y output is of the form y=[1,0,1,0] because I have forecasts from 4 different organizations. A 1 means that forecast was the best one for the current record, and multiple ones mean that several forecasts tied for the best prediction.
My problem is that I want the model to learn that predicting just one of the tied-best forecasts already counts as a fully correct answer, since I only need one of the equally good forecasts as a result. I believe the way I am doing it now 'shaves' accuracy off my evaluation. Is there a way to implement this in Keras? The architecture of the neural network is entirely experimental and there is no specific reason why I chose it. This is the code I wrote. My training dataset consists of 6463 rows × 505 columns.
model = Sequential()
model.add(LSTM(150, activation='relu',activity_regularizer=regularizers.l2(l=0.0001)))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(4, activation='softmax'))
#LSTM
# reshape input to be 3D [samples, timesteps, features]
X_train_sc = X_train_sc.reshape((X_train_sc.shape[0], 1, X_train_sc.shape[1]))
X_test_sc = X_test_sc.reshape((X_test_sc.shape[0], 1, X_test_sc.shape[1]))
#validation set
x_val=X_train.iloc[-2000:-1300,0:505]
y_val=y_train[-2000:-1300]
x_val_sc=scaler.transform(x_val)
# reshape input to be 3D for LSTM[samples, timesteps, features]
x_val_sc =x_val_sc.reshape((x_val_sc.shape[0], 1, x_val_sc.shape[1]))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
history = model.fit(x=X_train_sc, y=y_train, validation_data=(x_val_sc, y_val), epochs=300, batch_size=24)
print(model.evaluate(X_test_sc,y_test))
yhat= model.predict(X_test_sc)
My accuracy is ~44%
If you want to make predictions of the form [1,0,1,0], i.e. the model should predict the probability of belonging to each of the 4 classes, then it is called multi-label classification. What you have coded is multi-class classification.
Multi-label classification
Your last layer should be a Dense layer of size 4 (one unit per class) with sigmoid activation, and you should use a binary_crossentropy loss.
import numpy as np
import keras

# dummy data: 100 samples of 10 timesteps with 1 feature, and 4 binary labels per sample
x = np.random.randn(100, 10, 1)
y = np.random.randint(0, 2, (100, 4))

model = keras.models.Sequential()
model.add(keras.layers.LSTM(16, activation='relu', input_shape=(10, 1), return_sequences=False))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(4, activation='sigmoid'))  # one independent probability per class
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x, y)
Check
print(model.predict(x))
Output
array([[0.5196002 , 0.52978194, 0.5009601 , 0.5036485 ],
[0.508756 , 0.5189857 , 0.5022978 , 0.50169533],
[0.5213044 , 0.5254892 , 0.51159555, 0.49724004],
[0.5144601 , 0.5264933 , 0.505496 , 0.5008205 ],
[0.50524575, 0.5147699 , 0.50287664, 0.5021702 ],
[0.521035 , 0.53326863, 0.49642274, 0.50102305],
.........
As you can see, the probabilities within each prediction do not sum to one; rather, each value is the probability of the sample belonging to the corresponding class. So if a probability is > 0.5 you can say the sample belongs to that class.
On the other hand, if you use softmax, the probabilities sum to 1, i.e. the prediction is the single class with the highest probability.
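As a small follow-up sketch, one way to turn those per-class sigmoid outputs into 0/1 labels, reusing the model and x from the example above:
import numpy as np

probs = model.predict(x)             # shape (100, 4), independent per-class probabilities
labels = (probs > 0.5).astype(int)   # e.g. rows like [1, 0, 1, 0]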
I am using an LSTM model to predict data. But when the model runs, its predictions don't reach the extreme values at the edges of the series.
Graphed result (image link omitted)
Here is the LSTM model:
model = Sequential()
model.add(Bidirectional(LSTM(100, activation='relu', input_shape=(n_steps_in,1))))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss="mae", metrics = [test_acc])
# fit model
model.fit(X_train, y_train, epochs=7)
Can someone explain why the model doesn't predict the values all the way down to the bottom, or at least get close to it?
P.S.: I have tried changing the epochs to 100 and other combinations as well.
I hope someone can point out where I am going wrong with my RNN. The long and short of my problem is that, no matter the structure of my network, the predictions always look like this:
I have tried 1, 2, 3, and 4 layers of LSTMs, each with varying neuron counts and either relu or tanh activation functions. For the image above, the network was set up as:
model = Sequential()
model.add(LSTM(128, activation='relu', return_sequences=True, input_shape=(length, scaled_train_data.shape[1])))
model.add(LSTM(256, activation='relu', return_sequences=True))
model.add(LSTM(256, activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dense(scaled_train_data.shape[1]))
model.compile(optimizer='adam', loss="mse")
The actual training of the model runs fine, without incident.
My data is financial data. There are around 70k rows and I have approx. 70/30 train/test split.
Where am I going wrong? Thanks!
So, from asking around and reading up, it seems RNNs might not be the best solution for financial / random-walk data, at least with the setup I am using. I wonder if using averages might produce better results?
Anyway, moving on to Reinforcement Learning.
I'm trying to do transfer learning in Keras. I set up a ResNet50 network set to not trainable with some extra layers:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create the input data x_batch using the ResNet50 preprocess_input function, along with the one-hot encoded labels y_batch, and do the fitting as follows:
model.fit(x_batch,
y_batch,
epochs=nb_epochs,
batch_size=64,
shuffle=True,
validation_split=0.2,
callbacks=[lrate])
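(For reference, a rough sketch of how x_batch and y_batch might have been prepared; raw_images and raw_labels are hypothetical names for the loaded images and integer class ids, not from the original post:)
from keras.applications.resnet50 import preprocess_input
from keras.utils import to_categorical

# raw_images: array of shape (n, 224, 224, 3); raw_labels: integer class ids in [0, 6]
x_batch = preprocess_input(raw_images.astype('float32'))
y_batch = to_categorical(raw_labels, num_classes=7)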
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling, I found that the problem has to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet, which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers adjust to the training data; however, when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
A workaround is to pre-compute the ResNet output. In fact, this decreases training time considerably, as we are not repeating that part of the computation.
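Putting that together, the workaround might look roughly like this (reusing the vector-input model2 and the nb_epochs/lrate settings from above; this is a sketch, not the poster's exact code):
from keras.applications.resnet50 import ResNet50

resnet = ResNet50(include_top=False, pooling='avg')   # used only as a fixed feature extractor
features = resnet.predict(x_batch)                    # shape (n_samples, 2048)

# train only the small vector-input network on the pre-computed features
model2.fit(features, y_batch,
           epochs=nb_epochs,
           batch_size=64,
           shuffle=True,
           validation_split=0.2,
           callbacks=[lrate])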
You can try:
Res = keras.applications.resnet.ResNet50(include_top=False,
                                         weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))
# Freeze all the layers of the pre-trained base
for layer in Res.layers:
    layer.trainable = False
# Check the trainable status of the individual layers
for layer in Res.layers:
    print(layer, layer.trainable)
# Image input
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
I'm new to Keras. I want to predict the next 30 values of a time series using the previous 10 timesteps of 5 different features. Here is my code, but I am almost sure it is not the right way to do it. Any ideas?
# features is a (11277,10,5) array--> (samples, time steps, features)
# predictions is a (11277,30) array--> (samples, outputs)
model = Sequential()
model.add(LSTM(100, input_shape=(features.shape[1], features.shape[2]), return_sequences=True))
model.add(LSTM(300, return_sequences=False))
model.add(Dense(predictions.shape[1], kernel_initializer='uniform', activation='linear'))
model.compile(loss="mean_squared_error", optimizer="adam")
Thank you,