Training with Adam gets slower each epoch - python

I am training a simple neural network in Keras with the Theano backend, consisting of 4 dense layers connected to a Merge layer and then to a softmax classifier layer. Training with Adam, the first few epochs take about 60 s each (on the CPU), but after that the training time per epoch keeps increasing, reaching more than 400 s by epoch 70, which makes it unusable.
Is there anything wrong with my code, or is this supposed to happen?
This only happens with Adam, not with SGD, Adadelta, RMSprop or Adagrad. I would happily use one of the other optimizers, but Adam produces far better results.
The code:
# Keras 1.x API (Theano backend)
from keras.models import Sequential
from keras.layers import Dense, Merge

# Four parallel branches, each taking a 40-dimensional input
modela = Sequential()
modela.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modelb = Sequential()
modelb.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modelc = Sequential()
modelc.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modeld = Sequential()
modeld.add(Dense(700, input_dim=40, init='uniform', activation='relu'))

# Concatenate the branches and classify into 258 classes
model = Sequential()
model.add(Merge([modela, modelb, modelc, modeld], mode='concat', concat_axis=1))
model.add(Dense(258, init='uniform', activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
hist = model.fit([Xa, Xb, Xc, Xd], Ycat, validation_split=.25, nb_epoch=80, batch_size=100, verbose=2)
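One way to see where the time is going is to record the wall-clock duration of each epoch with a small Keras callback. This is only a diagnostic sketch; the EpochTimer class and its placement in the fit call are mine, not part of the original code:

import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    # Records how long every epoch takes, so the slowdown can be logged or plotted
    def on_train_begin(self, logs=None):
        self.durations = []
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        self.durations.append(time.time() - self._start)
        print('epoch %d took %.1f s' % (epoch, self.durations[-1]))

timer = EpochTimer()
hist = model.fit([Xa, Xb, Xc, Xd], Ycat, validation_split=.25,
                 nb_epoch=80, batch_size=100, verbose=2, callbacks=[timer])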

Related

Fluctuating Accuracy/loss curves - LSTM keras

I am creating an LSTM model for human activity recognition, and I keep getting training and loss curves that fluctuate even though they improve overall.
The following architecture produced these curves [image: train and loss curves]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(units=128, return_sequences=True, input_shape=(3500, 11)))
model.add(Dropout(0.5))
model.add(Dense(units=64, activation='relu'))
model.add(LSTM(units=128, return_sequences=False, input_shape=(3500, 11)))
model.add(Dropout(0.5))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(4, activation='softmax'))

adam = tf.keras.optimizers.Adam(learning_rate=0.0020, beta_1=0.9, beta_2=0.999,
                                epsilon=None, decay=0.0, amsgrad=False, clipnorm=1.)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(Gen, validation_data=val_Gen, epochs=30,
                    callbacks=[tensorboard_callback], verbose=1).history
I tried changing the model architecture and different hyperparameters, but nothing improved.
I am using TimeseriesGenerator from Keras to generate the batches.
Does anyone have a suggestion?
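Not part of the original post, but one tweak often suggested for fluctuating validation curves is to drop the learning rate automatically when the validation loss stops improving. A minimal sketch with tf.keras's ReduceLROnPlateau callback; the factor, patience and min_lr values are placeholders:

import tensorflow as tf

# Halve the learning rate whenever val_loss has not improved for 3 epochs
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                                 patience=3, min_lr=1e-5, verbose=1)
history = model.fit(Gen, validation_data=val_Gen, epochs=30,
                    callbacks=[tensorboard_callback, reduce_lr], verbose=1).history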

Problem with reducing the loss of a Neural Network

I have a fairly large dataset and a binary classification problem that I want to train a neural network on. I have tried more than 10 different network structures, varying from 3 to 20 layers, and I also tried to over-fit the model on a smaller sample for debugging purposes, but the loss does not decrease at all. It gets stuck around 0.4 after a number of epochs, every time, with every combination and every sample size. The strange thing is that the accuracy does not drop either; it stays around 0.8, which is not too bad.
As I'm new to working with neural networks, does anyone have a suggestion for solving this?
I use the Keras Sequential model API.
Here is the network I designed:
#Normalizing the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
print(X)
from keras import utils
y = utils.to_categorical(y)
print(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.9)
#Dependencies
import keras
from keras.models import Sequential
from keras.layers import Dense
# Neural network
model = Sequential()
model.add(Dense(24, input_dim=25, activation='relu'))
model.add(Dense(22, activation='relu'))
model.add(Dense(20, activation='sigmoid'))
model.add(Dense(18, activation='selu'))
model.add(Dense(18, activation='sigmoid'))
model.add(Dense(16, activation='relu'))
model.add(Dense(16, activation='sigmoid'))
model.add(Dense(14, activation='tanh'))
model.add(Dense(12, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(8, activation='sigmoid'))
model.add(Dense(8, activation='selu'))
model.add(Dense(7, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(4, activation='sigmoid'))
model.add(Dense(2, activation='sigmoid'))
model.add(Dense(2, activation='sigmoid'))
opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5000, batch_size=1000)
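Not from the original post, but since the goal was to over-fit a small sample for debugging, a much simpler baseline with a single hidden activation type and a softmax output matching the one-hot labels is a common sanity check; if even this cannot drive the training loss towards zero on a few hundred rows, the problem is more likely in the data or labels than in the architecture. A minimal sketch; the layer sizes and the 200-row subset are arbitrary choices:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# A tiny subset: a healthy setup should be able to memorise this almost perfectly
X_small, y_small = X_train[:200], y_train[:200]

baseline = Sequential()
baseline.add(Dense(64, input_dim=25, activation='relu'))
baseline.add(Dense(32, activation='relu'))
baseline.add(Dense(2, activation='softmax'))  # matches the to_categorical(y) targets

baseline.compile(loss='categorical_crossentropy',
                 optimizer=Adam(learning_rate=1e-3), metrics=['accuracy'])
baseline.fit(X_small, y_small, epochs=200, batch_size=32, verbose=0)
print(baseline.evaluate(X_small, y_small, verbose=0))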

Model predicts negative values as zeros

I am training a Keras autoencoder model with the following structure:
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(MAX_CONTEXTS, 3)))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='relu'))
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
My data has shape (number_of_samples, 430, 3) and contains values in the range [-1.9236537371711413, 1.9242677998256246]. The data is already normalized. I then train the model:
history = model.fit(X, X, epochs=15, batch_size=2, verbose=1, shuffle=True, validation_split=0.2)
and get an accuracy of 95.03% (suspiciously high, but that is a separate issue). When I now predict a sample of my data, the positive values come out reasonably close to the input, but the negative values are all rounded to 0. Is this the fault of the loss function I chose? If so, which loss function should I use instead? Or do I have to scale my data differently?
This is because you apply a relu activation at the output layer. relu maps every negative value to 0, so the model can never reconstruct the negative part of your data; use a linear activation on the final Dense layer instead (your targets span roughly [-1.9, 1.9]).
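A minimal sketch of that change, assuming the rest of the model stays as posted ('linear' is also the default Dense activation):

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(MAX_CONTEXTS, 3)))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='linear'))  # was 'relu'; linear lets the reconstruction go negative
model.compile(optimizer='adam', loss='mse')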

How to make sure that Keras model weights are initialised randomly every time the model is fit

I am training a Keras model on my data. I have to split the data into 3 parts, and I call the same Keras model on each split, fitting and predicting consecutively.
I suspect that every time I fit the model, the weights are kept from the previous training run, so the next fit starts minimising the error from that previous state. I want each training run to start from a fresh random weight initialisation, because all 3 splits are subsets of the same dataset and I don't want any leakage from the model having already seen related data in a previous run.
How can I tell whether the weights are reinitialised every time the model is fit, and if they are not, how can I make that happen?
Here is what my code looks like:
model = Sequential()
model.add(Dense(512, input_dim=77, kernel_initializer='RandomNormal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(256, kernel_initializer='RandomNormal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, kernel_initializer='RandomNormal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(256, kernel_initializer='RandomNormal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, kernel_initializer='RandomNormal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(256, kernel_initializer='RandomNormal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
# Compile model
model.compile(loss='mean_absolute_error', optimizer='adam')
# evaluate model
history = model.fit(scaler.transform(X_train_high), y_train_high,
                    batch_size=128, epochs=5)
results = model.evaluate(scaler.transform(X_train_high), y_train_high, batch_size=128)
print('High test loss, test acc:', results)
# evaluate model
history = model.fit(scaler.transform(X_train_medium), y_train_medium,
                    batch_size=128, epochs=5)
results = model.evaluate(scaler.transform(X_train_medium), y_train_medium, batch_size=128)
print('Medium test loss, test acc:', results)
# evaluate model
history = model.fit(scaler.transform(X_train_low), y_train_low,
                    batch_size=128, epochs=5)
results = model.evaluate(scaler.transform(X_train_low), y_train_low, batch_size=128)
print('Low test loss, test acc:', results)
The model keeps its weights between fit calls until you recreate it. Wrap the model construction in a function and call it (and compile) again before each split, so every run starts from a fresh random initialisation:
def define_model():
    model = Sequential()
    model.add(Dense(512, input_dim=77, kernel_initializer='RandomNormal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(256, kernel_initializer='RandomNormal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(512, kernel_initializer='RandomNormal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(256, kernel_initializer='RandomNormal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(512, kernel_initializer='RandomNormal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(256, kernel_initializer='RandomNormal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1))
    return model  # return the freshly built (uncompiled) model

model = define_model()
# Compile model
model.compile(loss='mean_absolute_error', optimizer='adam')
# evaluate model
history = model.fit(scaler.transform(X_train_high), y_train_high,
                    batch_size=128, epochs=5)
results = model.evaluate(scaler.transform(X_train_high), y_train_high, batch_size=128)
print('High test loss, test acc:', results)
model = define_model()
model.compile(loss='mean_absolute_error', optimizer='adam')
# evaluate model
history = model.fit(scaler.transform(X_train_medium), y_train_medium,
                    batch_size=128, epochs=5)
results = model.evaluate(scaler.transform(X_train_medium), y_train_medium, batch_size=128)
print('Medium test loss, test acc:', results)
You can verify this by calling model.get_weights() before and after each fit and comparing the arrays.
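Not from the original answer, but the same idea extends naturally to a loop over the three splits, rebuilding and recompiling the model each time so every run starts from fresh random weights (the splits list below is just a hypothetical way of grouping the already-defined arrays):

splits = [('high', X_train_high, y_train_high),
          ('medium', X_train_medium, y_train_medium),
          ('low', X_train_low, y_train_low)]

for name, X_split, y_split in splits:
    model = define_model()  # fresh random initialisation on every iteration
    model.compile(loss='mean_absolute_error', optimizer='adam')
    model.fit(scaler.transform(X_split), y_split, batch_size=128, epochs=5, verbose=0)
    results = model.evaluate(scaler.transform(X_split), y_split, batch_size=128, verbose=0)
    print(name, 'split loss:', results)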

NN Accuracy Saturates After the Very First Epoch with Keras

I'm trying to fit a simple neural network to predict a binary target using keras-1.0.6. The output saturates after the very first epoch. I tried playing around with the learning rate (from 0.1 to 1e-6), the decay and momentum of the SGD optimizer, and with the layers (10-512 hidden neurons, 1-2 hidden layers) and their activation functions, but nothing worked; the prediction accuracy stayed the same.
My training set has shape (13602, 115) and my validation set has shape (3400, 115). The target variables y_train and y_test are encoded as 1 and 0 (60% are 1's and 40% are 0's). At first the data was not normalized; after normalizing it I got the same results.
Inspecting the output, I see that the model predicts only one class: sometimes only 1's, other times only 0's (depending on how I tweak the model).
I also tried encoding the target variable with shape (n_samples, 2), but the output was the same.
I followed some questions here and some Google results that suggest tuning the learning rate and not using a 'softmax' activation, but couldn't improve the results.
Some of the models I tried are below:
The simplest model:
model = Sequential()
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
Model 2:
model = Sequential()
model.add(Dense(512, input_dim=X_train.shape[1]))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
Model 3:
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], init='uniform', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
Model 4:
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], init='uniform', activation='sigmoid'))
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
and to compile and fit the model:
sgd = SGD(lr=0.01, decay=0.1, momentum=0.0, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train2, nb_epoch=5, batch_size=50, validation_split=0.2)
model.predict(X_test)
The output is either [0,0,0,0,0,0,0,...] or [1,1,1,1,1,1,1,1,...].
Does anybody have a clue about what's going on here?
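Not part of the original question, but a quick check in this situation is to look at the raw sigmoid outputs rather than the rounded classes: if they all cluster near one value, the network has collapsed to a constant prediction rather than confidently separating the classes. A small sketch, assuming the compiled model and X_test from above:

import numpy as np

probs = model.predict(X_test).ravel()  # raw sigmoid outputs in [0, 1]
print('min %.3f  mean %.3f  max %.3f' % (probs.min(), probs.mean(), probs.max()))
print('fraction predicted positive:', np.mean(probs > 0.5))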
