Can't train a keras model to approximate a simple function - python

I just got started with machine learning and tried to write a simple program where a neural network learns the simple function y = f(x) = 2x.
Here's the code:
# x is a 1D array of the integers 1 to 999
x = np.arange(1, 1000, 1)
y = x*2
xtrain = x[:750]
ytrain = y[:750]
xtest = x[750:]
ytest = y[750:]
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D
model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['accuracy'])
model.summary()
history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=20,
                    verbose=1,
                    validation_split=0.2)
I get the following output, no matter how I change the architecture or the hyperparameters:
79999/79999 [==============================] - 1s 13us/step - loss: 8533120007.8465 - acc: 0.0000e+00 - val_loss: 32532613324.8000 - val_acc: 0.0000e+00
The accuracy is 0 all the time. What am I doing wrong?

This is actually what you would expect if you blindly run gradient descent and expect it to learn any function. The behaviour you observe stems from two causes:
The gradient that SGD uses to update the weights depends on the input. Take a very simple case, y = f(wx + b): by the chain rule, the derivative of y with respect to w is f'(wx + b) * x. So when an update is computed for an input that is extremely large / unnormalised, the gradient blows up. The update is basically w' = w - alpha * gradient, so within an update or two the weight suddenly becomes very small, in fact negative.
Once the pre-activation of the output goes negative because SGD overshot, the relu in the final layer just outputs 0 and training stalls: the derivative of relu is 0 wherever its input is negative, so no further gradient flows (the "dying relu" problem).
You can shrink the data to np.arange(1, 10) and reduce the number of hidden neurons to, say, 12 (more neurons make the output even more negative after a single update, since all of their weights become negative as well), and then you will be able to train the network.
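To see the mechanism concretely, here is a toy single-weight version of the failure (a hypothetical sketch, not the asker's network): a relu output, squared loss, plain SGD, and one large unnormalised input.
w, lr = 0.5, 0.01
x, t = 900.0, 1800.0  # one training input and its target (t = 2x)
for step in range(3):
    pred = max(w * x, 0.0)                             # relu output
    grad = 2 * (pred - t) * (x if w * x > 0 else 0.0)  # chain rule pulls in x
    w -= lr * grad                                     # SGD update
    print(step, w, max(w * x, 0.0))
# w first swings to a huge positive value, then overshoots to a huge negative one;
# from then on the relu outputs 0, its gradient is 0, and w never moves again.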

I think it works; check this out. I used randn instead of arange, so the inputs are roughly zero-mean with unit variance. Everything else is pretty much the same.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

x = np.random.randn(1000)
y = x*2
xtrain = x[0:750]
ytrain = y[0:750]
model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()
sgd = optimizers.SGD(lr=0.01, decay=1e-6)
model.compile(loss='mean_squared_error',
              optimizer=sgd,
              metrics=['mae'])
history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=20,
                    verbose=1,
                    validation_split=0.2)
If you want to use the earlier dataset (i.e. arange), here is the accompanying code for that:
x = np.arange(1,1000, 1)
y = x*2
xtrain = x[0:750]
ytrain = y[0:750]
model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()
adam = optimizers.Adam(lr=0.0001)
model.compile(loss='mean_squared_error',
              optimizer=adam,
              metrics=['mae'])
history = model.fit(xtrain, ytrain,
                    batch_size=100,
                    epochs=200,
                    verbose=1,
                    validation_split=0.2)
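Another option, which the code above does not show, is to normalise the arange data itself instead of shrinking the learning rate. A minimal sketch of that idea (the scale factor 1000 is just the data's maximum):
x = np.arange(1, 1000, 1).astype('float32')
y = x * 2
x_scaled = x / 1000.0   # bring inputs into roughly [0, 1]
y_scaled = y / 1000.0   # scale targets the same way
# fit the same model on x_scaled / y_scaled, then undo the scaling:
# predictions = model.predict(x_scaled) * 1000.0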

Related

Why is my multiclass neural model not training (accuracy and loss staying same)?

I am learning neural networks. I get 98% accuracy with classical ML methods, so I think I made a coding error. The neural network model is not learning.
Things I tried:
Changing X and y to float64 or float32
Normalizing data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as optimizer, instead of "adam".
Changing the y label with another label
There are 9 features in X_train and 8 different classes in y_train.
(Previews of X_train and y_train omitted.)
Code:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(layers.Flatten())
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Fitting:
I tried the following variants, changing the target label each time. None of them helps the model train. Some give "nan" loss, some wobble slightly up and down, but all of them stay below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)

Keras producing model with no accuracy

I have the following code for training a model based on some numbers:
from numpy import loadtxt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from time import sleep
dataset = loadtxt("data.csv", delimiter=",")
X = dataset[:,0:2]
y = dataset[:,2]
model = Sequential()
model.add(Dense(196, input_dim=2, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=600, batch_size=10)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
For reference, here is some of the data it is being presented with:
433,866,1299,1732
421,842,1263,1684
443,886,1329,1772
142,284,426,568
437,874,1311,1748
455,910,1365,1820
172,344,516,688
219,438,657,876
101,202,303,404
289,578,867,1156
110,220,330,440
421,842,1263,1684
472,944,1416,1888
121,242,363,484
215,430,645,860
134,268,402,536
488,976,1464,1952
467,934,1401,1868
418,836,1254,1672
134,268,402,536
241,482,723,964
116,232,348,464
395,790,1185,1580
438,876,1314,1752
396,792,1188,1584
57,114,171,228
218,436,654,872
372,744,1116,1488
305,610,915,1220
462,924,1386,1848
455,910,1365,1820
42,84,126,168
347,694,1041,1388
394,788,1182,1576
184,368,552,736
302,604,906,1208
326,652,978,1304
333,666,999,1332
335,670,1005,1340
176,352,528,704
168,336,504,672
62,124,186,248
26,52,78,104
335,670,1005,1340
(The first three numbers should be inputs, and the last one an output)
The Keras program keeps training but only ever reports an accuracy of 0. What am I doing wrong?
As discussed in the comments, this is a regression problem (not classification), so we can use, for example, mse (mean squared error) as the loss function and change the activation of the last layer to linear:
X = dataset[:,0:3]
y = dataset[:,3]
model = Sequential()
model.add(Dense(196, input_dim=3, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=600, batch_size=10)
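One follow-up worth noting: accuracy is not defined for regression, so the evaluation at the end of the original script should report an error metric instead. A minimal sketch, reusing the model defined above:
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
model.fit(X, y, epochs=600, batch_size=10, verbose=0)
loss, mae = model.evaluate(X, y, verbose=0)  # returns [mse, mae]
print('MAE: %.2f' % mae)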

Overfitting in LSTM even after using regularizers

I have a time series prediction problem and am building an LSTM like the one below:
def create_model():
    model = Sequential()
    model.add(LSTM(50, kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01),
                   bias_regularizer=l2(0.01), input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dropout(0.591))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
When I train the model on 5 splits like below:
tss = TimeSeriesSplit(n_splits=5)
X = data.drop(labels=['target_prediction'], axis=1)
y = data['target_prediction']
for train_index, test_index in tss.split(X):
    train_X, test_X = X.iloc[train_index, :].values, X.iloc[test_index, :].values
    train_y, test_y = y.iloc[train_index].values, y.iloc[test_index].values
    model = create_model()
    history = model.fit(train_X, train_y, epochs=10, batch_size=64,
                        validation_data=(test_X, test_y), verbose=0, shuffle=False)
I get an overfitting problem. The graph of the loss is attached.
I am not sure why there is overfitting when I use regularizers in my Keras model. Any help is appreciated.
EDIT: I also tried the following architectures:
def create_model():
    model = Sequential()
    model.add(LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
def create_model(x, y):
    # define a bidirectional LSTM
    model = Sequential()
    model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(x, y)))
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
but it still overfits.
First of all, remove all your regularizers and dropout. You are literally spamming with all the tricks out there, and a dropout of ~0.6 is too high.
Reduce the number of units in your LSTM. Start from there. Reach a point where your model stops overfitting.
Then, add dropout if required.
After that, the next step is to add the tf.keras.layers.Bidirectional wrapper. If you are still not satisfied, increase the number of layers. Remember to keep return_sequences=True for every LSTM layer except the last one, as in the sketch below.
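A minimal sketch of that stacking rule, assuming train_X has the same shape as in the question (the unit counts are illustrative, not tuned):
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(16, return_sequences=True),
                        input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Bidirectional(LSTM(16)))  # last LSTM: return_sequences stays False
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')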
I seldom come across networks using weight regularization, despite its availability, because dropout and weight regularization have a similar effect, and people usually just go with dropout (at most, I have seen 0.3 being used).

ResNet50 Model is not learning with transfer learning in keras

I am trying to perform transfer learning on a ResNet50 model pretrained on ImageNet for the PASCAL VOC 2012 dataset. As it is a multi-label dataset, I am using a sigmoid activation function in the final layer and binary_crossentropy loss. The metrics are precision, recall and accuracy. Below is the code I used to build the model for 20 classes (PASCAL VOC has 20 classes).
img_height,img_width = 128,128
num_classes = 20
#If imagenet weights are being loaded,
#input must have a static square shape (one of (128, 128), (160, 160), (192, 192), or (224, 224))
base_model = applications.resnet50.ResNet50(weights= 'imagenet', include_top=False, input_shape= (img_height,img_width,3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
#x = Dropout(0.7)(x)
predictions = Dense(num_classes, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)
for layer in model.layers[-2:]:
    layer.trainable = True
for layer in model.layers[:-3]:
    layer.trainable = False
adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
#print(model.summary())
X_train, X_test, Y_train, Y_test = train_test_split(x_train, y, random_state=42, test_size=0.2)
savingcheckpoint = ModelCheckpoint('ResnetTL.h5',monitor='val_loss',verbose=1,save_best_only=True,mode='min')
earlystopcheckpoint = EarlyStopping(monitor='val_loss',patience=10,verbose=1,mode='min',restore_best_weights=True)
model.fit(X_train, Y_train, epochs=epochs, validation_data=(X_test,Y_test), batch_size=batch_size,callbacks=[savingcheckpoint,earlystopcheckpoint],shuffle=True)
model.save_weights('ResnetTLweights.h5')
It ran for 35 epochs until early stopping kicked in, and the metrics are as follows (without the Dropout layer):
loss: 0.1195 - accuracy: 0.9551 - precision_m: 0.8200 - recall_m: 0.5420 - val_loss: 0.3535 - val_accuracy: 0.8358 - val_precision_m: 0.0583 - val_recall_m: 0.0757
Even with the Dropout layer, I don't see much difference:
loss: 0.1584 - accuracy: 0.9428 - precision_m: 0.7212 - recall_m: 0.4333 - val_loss: 0.3508 - val_accuracy: 0.8783 - val_precision_m: 0.0595 - val_recall_m: 0.0403
With dropout, for a few epochs, the model reaches a validation precision and accuracy of 0.2, but not above that.
I see that the precision and recall of the validation set are pretty low compared to the training set, with and without the dropout layer. How should I interpret this? Does this mean the model is overfitting? If so, what should I do? As of now the model's predictions are quite random (totally incorrect). The dataset size is 11000 images.
Can you please modify the code as below and try to execute it?
From:
predictions = Dense(num_classes, activation= 'sigmoid')(x)
To:
predictions = Dense(num_classes, activation= 'softmax')(x)
From:
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy',precision_m,recall_m])
To:
model.compile(optimizer= adam, loss='categorical_crossentropy', metrics=['accuracy',precision_m,recall_m])
This question is pretty old, but I'll answer it in case it is helpful to someone else:
In this example, you froze all layers except the last two (the Global Average Pooling and the final Dense one). There is a cleaner way to achieve the same state:
rn50 = applications.resnet50.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(img_height, img_width, 3))
x = rn50.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation='sigmoid')(x)
model = Model(inputs=rn50.input, outputs=predictions)
rn50.trainable = False  # <- this freezes every backbone layer at once
model.compile(...)
In this case, features are extracted from the ResNet50 network and fed to a single dense classification layer (sigmoid here, since the problem is multi-label), but ResNet50's weights are not being trained. This is called feature extraction, not fine-tuning.
The only weights being trained are from your classifier, which was instantiated with weights drawn from a random distribution, and thus should be entirely trained. You should be using Adam with its default learning rate:
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy')
So you can train it for a few epochs and, once that is done, unfreeze the backbone and "fine-tune" it:
rn50.trainable = False   # phase 1: feature extraction, train only the new head
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy')
model.fit(X_train, Y_train, epochs=50, validation_data=(X_test, Y_test))
rn50.trainable = True    # phase 2: unfreeze the backbone and fine-tune slowly
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.00001), loss='binary_crossentropy')
model.fit(X_train, Y_train, epochs=60, initial_epoch=50, validation_data=(X_test, Y_test))
There is a nice article about this on Keras website: https://keras.io/guides/transfer_learning/

Gradient Descent loss and accuracy doesn't change through iteration

I'm using Keras to implement basic CNN emotion detection. Here is my model architecture:
def HappyModel(input_shape):
    X_Input = Input(input_shape)
    X = ZeroPadding2D((3, 3))(X_Input)
    X = Conv2D(32, (7, 7), strides=(1, 1), name='conv0')(X)
    X = BatchNormalization(axis=3, name='bn0')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((2, 2), name='mp0')(X)
    X = Flatten()(X)
    X = Dense(1, activation='sigmoid', name='fc0')(X)
    model = Model(inputs=X_Input, outputs=X, name='hmodel')
    return model
happyModel = HappyModel(X_train.shape[1:])
happyModel.compile(Adam(lr=0.1), loss='binary_crossentropy', metrics=['accuracy'])
happyModel.fit(X_train, Y_train, epochs = 50, batch_size=16, validation_data=(X_test, Y_test))
It appears that the model's loss and accuracy don't change at all from one epoch to the next. It feels like gradient descent is stuck in a local minimum, as follows:
https://i.imgur.com/9As8v0c.png
I have tried both the Adam and SGD optimizers, with learning rates of 0.1 and 0.5; still no luck.
It turns out that if I change the parameters of the compile call, the model converges nicely during training:
happyModel.compile(optimizer = 'adam' ,loss= 'binary_crossentropy', metrics=['accuracy'])
The Keras documentation says that writing the parameter this way uses the default settings for Adam (https://keras.io/optimizers/):
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
But then, if I compile the model with those same default parameters written out explicitly,
happyModel.compile(Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=0.0), loss='binary_crossentropy', metrics=['accuracy'])
the accuracy and loss are stuck again.
What's the difference between these two ways of specifying the Adam optimizer in Keras?
You can check a closed issue on the official keras-team page:
https://github.com/keras-team/keras/issues/5564
You probably have a syntax issue, since the two ways are not completely equivalent.
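One quick way to check whether the two specifications really differ at runtime is to compare the optimizer configurations directly; a sketch using a recent tf.keras (not taken from the linked issue):
from tensorflow.keras.optimizers import Adam

# optimizer='adam' resolves to Adam() with all defaults
print(Adam().get_config())
# the explicitly parameterised version from the question
print(Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999).get_config())
# if the two dicts match, the remaining difference must be elsewhere in the code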
