I'm new to Keras. I want to predict the next 30 values of a time series using the previous 10 values of 5 different features. Here is my code, but I am almost sure it is not the right way to do this. Any ideas?
# features is a (11277, 10, 5) array -> (samples, time steps, features)
# predictions is a (11277, 30) array -> (samples, outputs)
model = Sequential()
model.add(LSTM(100, input_shape=(features.shape[1], features.shape[2]), return_sequences=True))
model.add(LSTM(300, return_sequences=False))
model.add(Dense(predictions.shape[1], kernel_initializer='uniform', activation='linear'))
model.compile(loss="mean_squared_error", optimizer="adam")
Thank you,
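For comparison, a common alternative for multi-step forecasting is an encoder-decoder layout (the same pattern used in the next question below). This is only a sketch with arbitrarily chosen layer sizes, not a drop-in replacement:

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_steps_in, n_features, n_steps_out = 10, 5, 30

model = Sequential()
model.add(LSTM(100, input_shape=(n_steps_in, n_features)))  # encoder: summarizes the 10 input steps
model.add(RepeatVector(n_steps_out))                        # repeat the summary for each of the 30 output steps
model.add(LSTM(100, return_sequences=True))                 # decoder
model.add(TimeDistributed(Dense(1)))                        # one value per output step
model.compile(loss="mean_squared_error", optimizer="adam")
# note: with this layout the targets need shape (samples, 30, 1) instead of (samples, 30)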
I am using an LSTM model to predict data, but when the model runs, its predictions do not follow the data down to the values at the edges.
Graphed result (linked image)
Here is the LSTM model:
model = Sequential()
model.add(Bidirectional(LSTM(100, activation='relu'), input_shape=(n_steps_in, 1)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss="mae", metrics=[test_acc])  # test_acc is a custom metric
# fit model
model.fit(X_train, y_train, epochs=7)
Can someone explain why the model doesn't predict the values down to the bottom, or at least get close to it?
P.S.: I have also tried training for 100 epochs and other combinations.
I've written code to train a CNN that determines whether an image belongs to one of two classes ("pdr" or "nonPdr").
This is my keras model:
model = Sequential()
model.add(Conv2D(input_shape=(605,700,3), filters=64, kernel_size=(3,3), padding="same",activation="tanh"))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='tanh', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
data, labels = ReadImages(TRAIN_DIR)
model.fit(np.array(data), np.array(labels), epochs=10, batch_size=16)
model.save('model.h5')
And this is my test_predict file:
model = load_model('model.h5')
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
test_image = cv2.imread("test_data/im0203.ppm")
test_image = test_image.reshape(1,605,700,3).astype('float')
test_image = np.array(test_image)
test_image2 = cv2.imread("test_data/im0001.ppm")
test_image2 = test_image2.reshape(1,605,700,3).astype('float')
#predict the result
print(model.predict(test_image))
print(model.predict(test_image2))
After running this code I get the same value for my two images (which are different; one is pdr and the other is nonPdr):
[[0.033681]]
[[0.033681]]
How can I fix this and improve my CNN? I appreciate your help.
Update
I tried removing one dense layer and changing the last one to Dense(2, activation='sigmoid'), but that does not work either. I really have no idea what to do.
This could happen for multiple reasons. The most plausible ones are:
The input fed to the network during training and the image fed during prediction may not have gone through the same preprocessing steps.
The model is overfitted, and when it sees a slightly different image at prediction time, it biases its decision toward what it saw during training.
The model architecture itself is not good.
To avoid such complications, possible future solutions could be:
1. Feed normalized or scaled images, and apply exactly the same preprocessing during training and testing.
2. Do not use too many Dense layers (especially for image classification). Dense layers discard spatial information: to classify an image correctly, the model needs to see it as a 2D matrix, whereas Dense layers flatten it into a 1D vector.
3. If you do use a Dense layer, use it only as the last layer, i.e. a final Dense(number_of_classes) layer.
A better architecture would be
[ Conv followed by few transition blocks and bottleneck layers ] x 2
flatten
Dense(number_of_classes)
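In case it helps, here is a rough Keras sketch of that layout; the filter counts, number of classes and other settings are assumptions, not taken from the question:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

number_of_classes = 2  # assumed: pdr vs. nonPdr

model = Sequential()
# block 1: conv -> bottleneck (1x1 conv) -> transition (pooling)
model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(605, 700, 3)))
model.add(Conv2D(16, (1, 1), activation='relu'))   # bottleneck layer
model.add(MaxPooling2D((2, 2)))                    # transition block
# block 2
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(32, (1, 1), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# single Dense layer at the very end
model.add(Flatten())
model.add(Dense(number_of_classes, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# remember to scale the pixel values (e.g. images / 255.0) the same way at train and test time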
I have time series training data of about 5000 numbers. For each 100 numbers, I am trying to predict the 101st. At the end of the series, I would feed the predicted numbers back into the model to forecast beyond the end of the time series.
The attached graph shows the training data, the test data and the prediction output. Currently, the model seems to be under-fitting. I would like to know what hyperparameters should be changed, or if I need to re-structure my input and output data.
I am using the following LSTM network.
model = Sequential()
model.add(LSTM(128, input_shape=(bl, 1), activation='relu', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(20,activation='relu'))
model.add(Dense(1))
model.compile(optimizer=adam(lr=0.0001), loss='mean_squared_error', metrics=['accuracy'])
model.fit(y_ba_tr_in, y_ba_tr_out,
epochs=20,
batch_size=5,shuffle=False,verbose=2)
y_ba_tr_in.shape = (4961, 100, 1)
y_ba_tr_out.shape = (4961, 1)
Something you could try is taking return_sequences=True out of your last LSTM layer. I believe this is generally the approach when you intend to predict for the next timestep.
After that modification, you also shouldn't need the subsequent Flatten() and Dense() layers.
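For illustration, a minimal sketch of that modification (layer sizes kept from the question; bl is the lookback length used there):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(bl, 1), activation='relu', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(128))          # return_sequences=False: only the last timestep is passed on
model.add(Dropout(0.1))
model.add(Dense(1))           # single next-step prediction
model.compile(optimizer='adam', loss='mean_squared_error')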
I'm trying to do transfer learning in Keras. I set up a ResNet50 network, set to not trainable, with some extra layers:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create the input data x_batch using the ResNet50 preprocess_input function, along with the one-hot encoded labels y_batch, and do the fitting like so:
model.fit(x_batch,
y_batch,
epochs=nb_epochs,
batch_size=64,
shuffle=True,
validation_split=0.2,
callbacks=[lrate])
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling I found that the problem is to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers are adjusting to the training data, however when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
A work around is to pre-compute the ResNet output. In fact, this decreases training time considerably, as we are not repeating that part of the calculation.
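As an illustration, the pre-computation workaround might look like the sketch below; the variable names follow the question, and the fit settings are assumptions:

from keras.applications.resnet50 import ResNet50, preprocess_input

# run the frozen ResNet once over the (already preprocessed) images
resnet = ResNet50(include_top=False, pooling='avg')
features = resnet.predict(x_batch)        # shape (n_samples, 2048), computed only once

# train only the small head (model2 from above) on the cached features
model2.fit(features, y_batch,
           epochs=nb_epochs,
           batch_size=64,
           shuffle=True,
           validation_split=0.2)

# at inference time the same two-step pipeline applies:
# preds = model2.predict(resnet.predict(preprocess_input(new_images)))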
You can try:

import keras
from keras.models import Sequential
from keras.layers import Flatten, Dropout, Dense

Res = keras.applications.ResNet50(include_top=False,
                                  weights='imagenet',
                                  input_shape=(IMG_SIZE, IMG_SIZE, 3))
# Freeze all layers except the last 4
for layer in Res.layers[:-4]:
    layer.trainable = False
# Check the trainable status of the individual layers
for layer in Res.layers:
    print(layer, layer.trainable)
# Vector input
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
I am using TensorFlow with Keras to perform regression on some historical data. The data looks as follows:
id,timestamp,ratio
"santalucia","2018-07-04T16:55:59.020000",21.8
"santalucia","2018-07-04T16:50:58.043000",22.2
"santalucia","2018-07-04T16:45:56.912000",21.9
"santalucia","2018-07-04T16:40:56.572000",22.5
"santalucia","2018-07-04T16:35:56.133000",22.5
"santalucia","2018-07-04T16:30:55.767000",22.5
And I am reformulating it as a time series problem (25 time steps) so that I can predict (regress) the next values of the series (the variance should not be high). I also use sklearn.preprocessing MinMaxScaler to scale the data to the range (-1, 1) or (0, 1), depending on whether I use LSTM or Dense layers (respectively).
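For context, the scaling step might look something like this sketch; the toy values stand in for the ratio column and are not taken from the original code:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy values standing in for the 'ratio' column
ratio = np.array([21.8, 22.2, 21.9, 22.5, 22.5, 22.5]).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))   # use (-1, 1) for tanh-activated LSTMs
ratio_scaled = scaler.fit_transform(ratio)

# predictions can later be mapped back to the original scale:
# original = scaler.inverse_transform(predictions_scaled)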
I am training with two different architectures:
Dense is as follows:
def get_model(self, layers, activation='relu'):
model = Sequential()
# Input arrays of shape (*, layers[1])
# Output = arrays of shape (*, layers[1] * 16)
model.add(Dense(units=int(64), input_shape=(layers[1],), activation=activation))
model.add(Dense(units=int(64), activation=activation))
# model.add(Dropout(0.2))
model.add(Dense(units=layers[3], activation='linear'))
# activation=activation))
# opt = optimizers.Adagrad(lr=self.learning_rate, epsilon=None, decay=self.decay_lr)
opt = optimizers.rmsprop(lr=0.001)
model.compile(optimizer=opt, loss=self.loss_fn, metrics=['mae'])
model.summary()
return model
This gives more or less good results (same architecture as in TensorFlow's tutorial for predicting house prices).
However, the LSTM is not giving good results; it usually ends up stuck around a single value (for example, around 40: 40.0123123, 40.123123, 41.09090, ...), and I do not see why or how to improve it. The architecture is:
def get_model(self, layers, activation='tanh'):
model = Sequential()
# Shape = (Samples, Timesteps, Features)
model.add(LSTM(units=128, input_shape=(layers[1], layers[2]),
return_sequences=True, activation=activation))
model.add(LSTM(64, return_sequences=True, activation=activation))
model.add(LSTM(layers[2], return_sequences=False, activation=activation))
model.add(Dense(units=layers[3], activation='linear'))
# activation=activation))
opt = optimizers.Adagrad(lr=0.001, decay=self.decay_lr)
model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
model.summary()
return model
I currently train with a batch size of 200 that increases by a factor of 1.5 on every fit. Each fit consists of 50 epochs, and I use a Keras EarlyStopping callback with a patience of at least 20 epochs.
I have tried adding more layers and more units, reducing layers and units, increasing and decreasing the learning rate, etc., but every time it gets stuck around a value. Any reason for this?
Also, do you know any good practices that can be applied to this problem?
Cheers
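For reference, a minimal sketch of the training loop described above; the variable names (model, X, y), the number of fit rounds and the exact callback settings are assumptions, not the original code:

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)

batch_size = 200
for fit_round in range(5):                   # each "fit" is 50 epochs
    model.fit(X, y,
              epochs=50,
              batch_size=int(batch_size),
              validation_split=0.2,          # needed so val_loss exists for EarlyStopping
              shuffle=False,
              callbacks=[early_stop],
              verbose=2)
    batch_size *= 1.5                        # grow the batch size by 1.5x per fit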
Have you tried holding back a validation set and checking how well the model's performance on the training set tracks its performance on the validation set? This is often how I catch myself overfitting.
A simple function (adapted from here) can help you do that:
import matplotlib.pyplot as plt

hist = model.fit_generator(...)  # training must include validation data for the val_ keys to exist

def gen_graph(history, title):
    # the history keys depend on the metrics you compiled with
    plt.plot(history.history['categorical_accuracy'])
    plt.plot(history.history['val_categorical_accuracy'])
    plt.title(title)
    plt.legend(['train', 'validation'])
    plt.show()

gen_graph(hist, "Accuracy, training vs. validation scores")
Also, do you have enough samples? If you're really, really sure that you have done as much as you can in terms of preprocessing and hyperparameter tuning... generating some synthetic data or doing some data augmentation has occasionally helped me.