Python: Issues training and predicting regression on Keras - python

I'm working on a simple time series regression problem using Keras. I want to predict the next closing price from the last 20 closing prices, and I have the following code, based on some examples I found:
I write my sequential model in a separate function, as required by the "build_fn" parameter:
def modelcreator():
    model = Sequential()
    model.add(Dense(500, input_shape=(20,), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(250, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile(optimizer=optimizers.Adam(),
                  loss=losses.mean_squared_error)
    return model
I create the KerasRegressor Object passing the model creator function and the desired fit parameters:
estimator = KerasRegressor(build_fn=modelcreator,nb_epoch=100, batch_size=32)
I train the model through the KerasRegressor object with 592 samples:
self.estimator.fit(X_train, Y_train)
This is where the issues start to show up. Although I set nb_epoch=100, my model only trains for 10 epochs:
Epoch 1/10
592/592 [==============================] - 0s - loss: 6.9555e-05
Epoch 2/10
592/592 [==============================] - 0s - loss: 1.2777e-05
Epoch 3/10
592/592 [==============================] - 0s - loss: 1.0596e-05
Epoch 4/10
592/592 [==============================] - 0s - loss: 8.8115e-06
Epoch 5/10
592/592 [==============================] - 0s - loss: 7.4438e-06
Epoch 6/10
592/592 [==============================] - 0s - loss: 8.4615e-06
Epoch 7/10
592/592 [==============================] - 0s - loss: 6.4859e-06
Epoch 8/10
592/592 [==============================] - 0s - loss: 6.9010e-06
Epoch 9/10
592/592 [==============================] - 0s - loss: 5.8951e-06
Epoch 10/10
592/592 [==============================] - 0s - loss: 7.2253e-06
When I try to get a prediction using a data sample:
prediction = self.estimator.predict(test)
The prediction value should be close to the 0.02-0.04 range but when I print it I get 0.000980315962806344
Q1: How can I set the training epochs to the desired value?
Q2: How can I generate predictions with my NN?

The first thing is that you are most likely using Keras 2.0, and in that version the parameter nb_epoch was renamed to epochs, so the call becomes KerasRegressor(build_fn=modelcreator, epochs=100, batch_size=32).
The second thing is that you have to normalize your inputs and outputs to the [0, 1] range. It won't work without normalization. Also to match the normalized output and the network range, it would be best to use a sigmoid activation at the output layer.
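A minimal sketch of the normalization step suggested above, written in plain NumPy rather than with any particular scaler class. The price values and the helper names `fit_min_max`, `scale`, and `unscale` are hypothetical. The scaling bounds are meant to come from the training data only, and the same bounds are reused to map a prediction back to price units.

```python
import numpy as np

def fit_min_max(values):
    """Return the (min, max) pair used for scaling; compute it on training data only."""
    return float(np.min(values)), float(np.max(values))

def scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale(): map values from [0, 1] back to the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Hypothetical closing prices standing in for the real series.
prices = np.array([101.0, 103.5, 102.0, 105.0, 107.5, 106.0])
lo, hi = fit_min_max(prices)
scaled_prices = scale(prices, lo, hi)      # what the network would be trained on
restored = unscale(scaled_prices, lo, hi)  # how a prediction is mapped back
```

With this approach the value returned by predict is in the scaled range, so it has to go through `unscale` before being compared with real prices.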

Your network is not converging. Try changing the parameters. The loss should reduce consistently. Also initialize the parameters properly.

Related

how to get accuracy from callback function to keras sequence

How can I get accuracy calculated each epoch in keras sequence?
Accuracy after each epoch is being printed on console like:
Epoch 14/500
90/90 [==============================] - 17s 184ms/step - loss: 0.6935 - sparse_categorical_accuracy: 0.5174 - val_loss: 0.6927 - val_sparse_categorical_accuracy: 0.5146 - lr: 0.0010
I am using tf.keras.utils.Sequence to change dataset every epoch with on_epoch_end.
I want to use this accuracy in the Sequence to decide whether to change the dataset or not.
How can I call(or get) this accuracy from callback to Sequence?

Why the results on the same val set are different in Model.fit() and Model.evaluate()?

I want to use a ResNet50 for a regression task, trained with a custom loss. I want to use checkpoints to save the best model, i.e. the one with the minimum loss on the testing data. The training code is as follows:
input_shape = (32, 32, 1)
inputs = keras.Input(shape=input_shape)
outputs = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_tensor=None,
    input_shape=input_shape, pooling='max'
)(inputs)
outputs = keras.layers.Dense(1, activation=None)(outputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss=EWC_loss(model, fisher_1, prior_weights_1, Lambda=1),
              metrics='mse')
checkpoint_filepath_3 = 'F:/NTU_PyCode/CL_regression_mnist/saved_resnet/resnet50_task2_epoch=5(1).h5'
model_checkpoint_callback_2 = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath_3,
    save_weights_only=True,
    monitor='val_loss',
    mode='min',
    save_best_only=True)
model.fit(x_train_2, y_train_2, batch_size=32, shuffle=True,
          validation_data=(x_test_2, y_test_2), epochs=5,
          callbacks=[model_checkpoint_callback_2])
Here are the training results. As I understand it, the model's weights after the 3rd epoch should be saved to checkpoint_filepath_3, because that epoch has the minimum val_loss (val_mse is not at its minimum there because the custom loss involves other terms).
2/1875 [..............................] - ETA: 1:07 - loss: 8.4497 - mse: 8.4489WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0239s vs `on_train_batch_end` time: 0.0449s). Check your callbacks.
1875/1875 [==============================] - 136s 73ms/step - loss: 2.6100 - mse: 2.5062 - val_loss: 5.5797 - val_mse: 5.4108
Epoch 2/5
1875/1875 [==============================] - 129s 69ms/step - loss: 1.2896 - mse: 1.1265 - val_loss: 1.6604 - val_mse: 1.4745
Epoch 3/5
1875/1875 [==============================] - 128s 68ms/step - loss: 0.9861 - mse: 0.7998 - val_loss: 1.4171 - val_mse: 1.2161
Epoch 4/5
1875/1875 [==============================] - 128s 68ms/step - loss: 1.1695 - mse: 0.8958 - val_loss: 1.4705 - val_mse: 1.2034
Epoch 5/5
1875/1875 [==============================] - 129s 69ms/step - loss: 1.0095 - mse: 0.7305 - val_loss: 11.7203 - val_mse: 11.4236
But when I load the weights and use the evaluate function on the same testing data, the problem appears. The loss is no longer the custom loss here, but the metric is still mse. So I assumed the mse from evaluate should be the same as the result from fit (the val_mse of the 3rd epoch). But the MSEs are very different!
model.compile(optimizer='adam',
              loss=tf.keras.losses.mse,
              metrics='mse')
print("EWC model on Task 2")
model.load_weights(checkpoint_filepath_3)
model.evaluate(x_test_2, y_test_2)
EWC model on Task 2
313/313 [==============================] - 4s 13ms/step - loss: 9.1384 - mse: 9.1384
What causes this? Are the weights not being saved into the checkpoint, or is there some other issue? Thank you in advance.
After more experiments, I found something puzzling. If I run the training and evaluation code together, the results are correct! The results for 2 epochs of training followed by evaluation are shown below, and we can see that the MSEs are the same.
2/1875 [..............................] - ETA: 59s - loss: 15.2813 - mse: 15.2805WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0190s vs `on_train_batch_end` time: 0.0439s). Check your callbacks.
1875/1875 [==============================] - 137s 73ms/step - loss: 2.0093 - mse: 1.9253 - val_loss: 1.8885 - val_mse: 1.7217
Epoch 2/2
1875/1875 [==============================] - 129s 69ms/step - loss: 1.1946 - mse: 1.0230 - val_loss: 1.1102 - val_mse: 0.9254
EWC model on Task 2
313/313 [==============================] - 4s 13ms/step - loss: 0.9254 - mse: 0.9254
But if I train and evaluate separately (run the training code first, then just load the saved weights into the model and evaluate), the results are different.
EWC model on Task 2
313/313 [==============================] - 4s 14ms/step - loss: 9.0702 - mse: 9.0702
Why is that? That's really confusing. Is there any difference between train and evaluate in one run and separately?
I don't understand the details, but when you use ModelCheckpoint and save only the weights (or even the whole model), the restoration performed by model.load_weights is a complex process. Recompiling the model before loading the weights apparently interferes with that restoration. I did find a note saying that changing model.compile can cause the restoration to fail.

Keras neural network to predict change in angle of a particle is not predicting correctly

I have put together a Keras regression model to predict the change in angle of a single particle when supplied with data about that particle. To acquire the data, I created a program that models Brownian motion between n particles. In addition to random angular noise, the particles induce a change in each other's angle depending on how close together they are.
It is not too important how my code works, but essentially it outputs an array containing the x,y coordinates of all particles relative to the single particle, the value of theta of all particles, and the distance between all particles and the single particle. All of these parameters are found at each time step. Each 'image' I use to train the network is all these parameters at some point in time. So overall, the input variable is x,y,angle,distance, and the output variable is the change in theta of the target particle.
For my neural network I first normalised all my data to be between -1 and 1, and then reshaped it to be fed into the NN:
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
## NORMALIZE IMAGES ##########################################################
# all images and labels imported, so obviously won't run without data. This is
# designed for running data with m iterations, n particles, 4 parameters
# (size of test data array is [m,n,4]).
L = 5
# length of 'box' that houses particles
n = 10
# number of particles
train_images[:,:,0:2] = train_images[:,:,0:2]/L
# normalise [x,y] from -L:L to -1:1.
train_images[:,:,2:3] = train_images[:,:,2:3]/(2*np.pi)
# normalise theta value from -2pi:2pi to -1:1
train_images[:,:,3:4] = (train_images[:,:,3:4]/(L*np.sqrt(2))*2)-1
# normalise distance value from 0:sqrt(2)L to -1:1
test_images[:,:,0:2] = test_images[:,:,0:2]/L
test_images[:,:,2:3] = test_images[:,:,2:3]/(2*np.pi)
test_images[:,:,3:4] = (test_images[:,:,3:4]/(L*np.sqrt(2))*2)-1
## FLATTEN IMAGES ############################################################
train_images = train_images.reshape((-1, 4*(n-1)))
# reshape so each input is a single dimension
# 4*(n-1) due to 4 parameters, and n-1 particles (since one is redundant info)
test_images = test_images.reshape((-1, 4*(n-1)))
## BUILDING THE MODEL ########################################################
model = Sequential([
    Dense(64, activation='tanh', input_shape=(4*(n-1),)),
    Dense(16, activation='tanh'),
    Dropout(0.25),
    Dense(1, activation='tanh'),
])
## COMPILING THE MODEL #######################################################
model.compile(
    optimizer='adam',
    loss='mean_squared_error',
    #metrics=['mean_squared_error'],
)
## TRAINING THE MODEL ########################################################
history = model.fit(
    train_images,  # training data
    train_labels,  # training targets
    epochs=10,
    batch_size=32,
    #validation_data=(test_images, test_labels),
    shuffle=True,
    validation_split=0.2,
)
I have used a variety of activation types for the different layers (relu, sigmoid, tanh...), but none seem to give me the correct results. The true values of my data (the change in angle of the particle) are values ranging from about 0.02 to -0.02, but the values I am getting are much smaller, and tend to be predominantly one sign (pos/neg).
I am currently using the loss function 'mean absolute error', as I am looking to minimise the difference between the real and predicted value. I notice when doing this, that after only one epoch the loss is already incredibly tiny:
Epoch 1/10
12495/12495 [==============================] - 13s 1ms/step - loss: 0.0010 - val_loss: 3.3794e-05
Epoch 2/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4491e-05 - val_loss: 3.3769e-05
Epoch 3/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4391e-05 - val_loss: 3.3883e-05
Epoch 4/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4251e-05 - val_loss: 3.4755e-05
Epoch 5/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4183e-05 - val_loss: 3.4273e-05
Epoch 6/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4175e-05 - val_loss: 3.3770e-05
Epoch 7/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4160e-05 - val_loss: 3.3646e-05
Epoch 8/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4131e-05 - val_loss: 3.3629e-05
Epoch 9/10
12495/12495 [==============================] - 14s 1ms/step - loss: 3.4145e-05 - val_loss: 3.3581e-05
Epoch 10/10
12495/12495 [==============================] - 13s 1ms/step - loss: 3.4148e-05 - val_loss: 3.4647e-05
Here is an example of the results I get from this:
Prediction: 4.8542774e-05
Actual: 0.006994473448353978
Is there anything obviously wrong I have done to get these results? Sorry if I have not provided enough information.
It is a regression problem, so the last layer should not have an activation. Decrease the number of units from 32 to 16 in the 1st layer, as this will help prevent overfitting.
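One way to read the tiny loss values in the log: when the targets are of order 0.02, even a model that always predicts zero has a very small mean squared error, so a small loss alone does not show that the network has learned anything. The sketch below illustrates this with synthetic, uniformly distributed targets (an assumption, since the real distribution is not given). Standardizing the targets, as shown, is a suggestion added here and not part of the original answer. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical targets spread uniformly over the range quoted in the question.
targets = rng.uniform(-0.02, 0.02, size=100_000)

# MSE of a model that ignores its input and always predicts zero.
baseline_mse = float(np.mean((targets - 0.0) ** 2))

# Standardizing the targets puts the equivalent baseline at about 1.0,
# which makes real progress easier to see in the training log.
target_mean, target_std = targets.mean(), targets.std()
standardized = (targets - target_mean) / target_std
standardized_baseline_mse = float(np.mean(standardized ** 2))

def to_original_units(pred):
    """Map a prediction on standardized targets back to a change in angle."""
    return pred * target_std + target_mean
```

If the network is trained on standardized targets, its output layer needs to be able to produce values outside [-1, 1], which is another reason to drop the tanh activation on the last layer.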

Sudden 50% accuracy drop while training convolutional NN

Training convolutional neural network from scratch on my own dataset with Keras and Tensorflow.
learning rate = 0.0001,
5 classes to sort,
no Dropout used,
dataset checked twice, no wrong labels found
Model:
model = models.Sequential()
model.add(layers.Conv2D(16,(2,2),activation='relu',input_shape=(75,75,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(16,(2,2),activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32,(2,2),activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(128,activation='relu'))
model.add(layers.Dense(5,activation='sigmoid'))
model.compile(optimizer=optimizers.adam(lr=0.0001),
              loss='categorical_crossentropy',
              metrics=['acc'])
history = model.fit_generator(train_generator,
                              steps_per_epoch=100,
                              epochs=50,
                              validation_data=val_generator,
                              validation_steps=25)
Every time the model reaches 25-35 epochs (80-90% accuracy), this happens:
Epoch 31/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3524 - acc: 0.8558 - val_loss: 0.4151 - val_acc: 0.7992
Epoch 32/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3393 - acc: 0.8700 - val_loss: 0.4384 - val_acc: 0.7951
Epoch 33/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3321 - acc: 0.8702 - val_loss: 0.4993 - val_acc: 0.7620
Epoch 34/50
100/100 [==============================] - 3s 33ms/step - loss: 1.5444 - acc: 0.3302 - val_loss: 1.6062 - val_acc: 0.1704
Epoch 35/50
100/100 [==============================] - 3s 34ms/step - loss: 1.6094 - acc: 0.2935 - val_loss: 1.6062 - val_acc: 0.1724
There are some similar questions with answers, but they mostly recommend lowering the learning rate, which doesn't help at all.
UPD: almost all weights and biases in the network became NaN. The network somehow died inside.
Solution in this case:
I changed the sigmoid function in the last layer to a softmax function, and the drops are gone.
Why did this work?
The sigmoid activation function is used for binary (two-class) classification.
In multiclass problems we should use the softmax function, a generalization of the sigmoid function to more than two classes.
More information: Sigmoid vs Softmax
Special thanks to @desertnaut and @Shubham Panchal for pointing out the error.
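To make the difference concrete, here is a small NumPy sketch comparing the two activations on a made-up vector of 5 class scores (the values are hypothetical). Sigmoid treats each score independently, so the outputs do not form a probability distribution over the classes. Softmax normalizes them so that they sum to 1, which is what categorical_crossentropy expects.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; each output is independent of the others."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Normalized exponentials; outputs are positive and sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])  # hypothetical scores for 5 classes
sigmoid_out = sigmoid(logits)
softmax_out = softmax(logits)
```

Both activations rank the classes the same way. Only the softmax output can be read as class probabilities for a single-label problem.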

In Neural Networks: accuracy improvement after each epoch is GREATER than accuracy improvement after each batch. Why?

I am training a neural network in batches with Keras 2.0 package for Python.
Below is some information about the data and the training parameters:
#samples in train: 414934
#features: 590093
#classes: 2 (binary classification problem)
batch size: 1024
#batches = 406 (414934 / 1024 = 405.2)
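The batch count above is the ceiling of the division, which can be checked with a few lines of arithmetic. The variable names are illustrative, and the size of the final batch assumes the generator yields a smaller last batch, which the question does not confirm.

```python
import math

num_samples = 414934
batch_size = 1024

# 414934 / 1024 is about 405.2, so 406 batches are needed to cover every sample.
num_of_batches = math.ceil(num_samples / batch_size)

# Samples left over for the final batch, if the generator yields a partial batch.
last_batch_size = num_samples - (num_of_batches - 1) * batch_size
```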
Below are some logs from the following code:
for i in range(epochs):
    print("train_model:: starting epoch {0}/{1}".format(i + 1, epochs))
    model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                        steps_per_epoch=num_of_batches,
                        epochs=1,
                        verbose=1)
(partial) Logs:
train_model:: starting epoch 1/3
Epoch 1/1
1/406 [..............................] - ETA: 11726s - loss: 0.7993 - acc: 0.5996
2/406 [..............................] - ETA: 11237s - loss: 0.7260 - acc: 0.6587
3/406 [..............................] - ETA: 14136s - loss: 0.6619 - acc: 0.7279
404/406 [============================>.] - ETA: 53s - loss: 0.3542 - acc: 0.8917
405/406 [============================>.] - ETA: 26s - loss: 0.3541 - acc: 0.8917
406/406 [==============================] - 10798s - loss: 0.3539 - acc: 0.8918
train_model:: starting epoch 2/3
Epoch 1/1
1/406 [..............................] - ETA: 15158s - loss: 0.2152 - acc: 0.9424
2/406 [..............................] - ETA: 14774s - loss: 0.2109 - acc: 0.9419
3/406 [..............................] - ETA: 16132s - loss: 0.2097 - acc: 0.9408
404/406 [============================>.] - ETA: 64s - loss: 0.2225 - acc: 0.9329
405/406 [============================>.] - ETA: 32s - loss: 0.2225 - acc: 0.9329
406/406 [==============================] - 13127s - loss: 0.2225 - acc: 0.9329
train_model:: starting epoch 3/3
Epoch 1/1
1/406 [..............................] - ETA: 22631s - loss: 0.1145 - acc: 0.9756
2/406 [..............................] - ETA: 24469s - loss: 0.1220 - acc: 0.9688
3/406 [..............................] - ETA: 23475s - loss: 0.1202 - acc: 0.9691
404/406 [============================>.] - ETA: 60s - loss: 0.1006 - acc: 0.9745
405/406 [============================>.] - ETA: 31s - loss: 0.1006 - acc: 0.9745
406/406 [==============================] - 11147s - loss: 0.1006 - acc: 0.9745
My question is: what happens after each epoch that improves the accuracy like that? For example, the accuracy at the end of the first epoch is 0.8918, but at the beginning of the second epoch accuracy of 0.9424 is observed. Similarly, the accuracy at the end of the second epoch is 0.9329, but the third epoch starts with accuracy of 0.9756.
I would expect to find an accuracy of ~0.8918 at the beginning of the second epoch, and ~0.9329 at the beginning of the third epoch.
I know that in each batch there is one forward pass and one backward pass of training samples in the batch. Thus, in each epoch there is one forward pass and one backward pass of all training samples.
Also, from Keras documentation:
Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.
Why is the accuracy improvement within each epoch smaller than the accuracy improvement between the end of epoch X and the beginning of epoch X+1?
This has nothing to do with your model or your dataset; the reason for this "jump" lies in how metrics are calculated and displayed in Keras.
As Keras processes batch after batch, it saves accuracies at each one of them, and what it displays to you is not the accuracy on the latest processed batch, but the average over all batches in the current epoch. And, as the model is being trained, accuracies over successive batches tend to improve.
Now consider: in the first epoch, let's say, there are 50 batches, and the network went from 0% to 90% during these 50 batches. Then at the end of the epoch Keras will show an accuracy of, e.g., (0 + 0.1 + 0.5 + ... + 90) / 50 percent, which is obviously much less than 90%! But because your actual accuracy is 90%, the first batch of the second epoch will show 90%, giving the impression of a sudden "jump" in quality. The same goes for loss or any other metric.
Now, if you want more realistic and trustworthy calculation of accuracy, loss, or any other metric you may find yourself using, I would suggest using validation_data parameter in model.fit[_generator] to provide validation data, which will not be used for training, but will be used only to evaluate the network at the end of each epoch, without averaging over various points in time.
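The averaging effect described in this answer can be reproduced with a few lines of NumPy. The per-batch accuracies below are invented (a linear rise from 0.60 to 0.90 over 50 batches), and the variable names are illustrative. The point is only that the running mean shown at the end of the epoch lags behind the accuracy of the most recent batch.

```python
import numpy as np

num_batches = 50
# Hypothetical per-batch accuracy rising linearly from 0.60 to 0.90.
batch_acc = np.linspace(0.60, 0.90, num_batches)

# Value shown in the progress bar after each batch: mean of all batches so far.
displayed = np.cumsum(batch_acc) / np.arange(1, num_batches + 1)

end_of_epoch_display = float(displayed[-1])  # what the log reports for the epoch
last_batch_accuracy = float(batch_acc[-1])   # where the model actually is
```

Here the epoch is reported at 0.75 even though the model finished it at 0.90. Since the running average is reset at the start of each epoch, the first batch of the next epoch is displayed at about 0.90, which looks like a jump.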
The accuracy at the end of an epoch is the accuracy over the full dataset. The accuracy after each batch is the accuracy over all batches that are used for training at that moment. It could be the case that your first batch is predicted very well and the following batches have a lower accuracy. In that case the accuracy over your full dataset will be low compared to the accuracy of your first batch.
