NN Accuracy Saturates After the Very First Epoch with Keras

I'm trying to fit a simple neural network to predict a binary target using keras-1.0.6. The output saturates after the very first epoch. I tried playing around with the learning rate (from 0.1 to 1e-6), with the decay and momentum of the SGD optimizer, and with the network's layers (1-2 hidden layers with 10-512 hidden neurons) and their activation functions, but nothing worked: the prediction accuracy stayed the same.
My training set has shape (13602, 115) and my validation set has shape (3400, 115). The target variables y_train and y_test are encoded as 1 and 0 (60% are 1's and 40% are 0's). At first the data was not normalized, though when I normalized it I got the same results.
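For reference, the normalization looked roughly like this (a sketch assuming scikit-learn's StandardScaler, fit on the training set only):
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)  # fit on the training data only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)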
Inspecting the output, I see that the model predicts only one class: sometimes only 1's, other times only 0's (depending on how I tweak the model).
I also tried encoding the target variable with shape (n_sample, 2), but the output was the same.
I followed several questions here and suggestions from googling, such as tuning the learning rate and not using a 'softmax' activation, but couldn't improve the results.
Some of the models I tried are below:
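(All snippets assume the standard Keras 1.x imports:)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD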
The simplest model:
model = Sequential()
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
Model 2:
model = Sequential()
model.add(Dense(512, input_dim=X_train.shape[1]))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
Model 3:
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], init='uniform', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
Model 4:
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], init='uniform', activation='sigmoid'))
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
and to compile and fit the model:
sgd = SGD(lr=0.01, decay=0.1, momentum=0.0, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train2, nb_epoch=5, batch_size=50, validation_split=0.2)
model.predict(X_test)
The output gives either [0,0,0,0,0,0,0,...] or [1,1,1,1,1,1,1,1,...]
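For reference, this is roughly how I verify the predictions (a sketch assuming NumPy, thresholding the sigmoid output at 0.5):
import numpy as np

preds = model.predict(X_test)                  # sigmoid probabilities in [0, 1]
classes = (preds > 0.5).astype(int)            # hard 0/1 labels
print(np.unique(classes, return_counts=True))  # only a single class appears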
Does anybody have a clue on what's going on here?

Related

Accuracy decreasing but loss decreasing Keras Sequence classification model

import keras
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(n_timesteps, n_features)))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(lr=0.0001),
              metrics=['accuracy'])
I have the above sequence classification model. There are around 14,000 classes.
The shape of a training example is (50, 59), i.e. 50 time steps and 59 features. The data is fed in batches; accuracy is good within one batch, but the next batch destroys it.

Model predicts negative values as zeros

I am training a Keras autoencoder model with the following structure:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(MAX_CONTEXTS, 3)))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='relu'))
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
My data has shape (number_of_samples, 430, 3) and contains values in the range [-1.9236537371711413, 1.9242677998256246]. The data is already normalized. I then train this model:
history = model.fit(X, X, epochs=15, batch_size=2, verbose=1, shuffle=True, validation_split=0.2)
and get an accuracy of 95.03% (suspiciously high, but that is a separate problem). When I then predict on a sample of my data, the positive values come out relatively close to the input, but the negative values are all rounded to 0. Is this a fault of the loss function I chose? If so, which loss function should I choose instead? Or do I have to scale my data differently?
This is because you apply a ReLU activation at the output layer: ReLU computes max(0, x), so every negative value is clamped to zero.
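A minimal sketch of the fix, replacing the final ReLU with a linear activation (tanh would not work here, since the data slightly exceeds [-1, 1]):
model.add(Dense(3, activation='linear'))  # identity output: negative values pass through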

Why is my Keras model only producing the same prediction?

I'm having some trouble understanding why my Keras model has problems generating proper results (it now always returns 0). I have found some others with this problem (ref 1, ref 2), but I haven't been able to understand the underlying cause.
Question: Why is my model only giving one, constant prediction?
Training Data Example
The last column is the prediction, 0 or 1.
32856500,1,1,200,6842314460,0
32800000,-1,0,0,0,0
32800000,-1,1,0,6845343222,0
32800000,-1,2,0,13692319489,0
32800000,-1,3,0,20539336035,0
32769900,-1,4,-30100,27389628085,0
32769900,-1,5,-30100,34239941481,0
32750000,-1,6,-50000,41091099905,0
32750000,-1,7,-50000,47945852379,1
Keras Code for Training
I'm using the sigmoid activation for the binary result, but I'm not sure whether the issue lies here or in, for example, the binary_crossentropy loss or the SGD optimizer.
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

def trainKerasModel(X, Y, path, dimensions):
    # Create model
    model = Sequential()
    model.add(Dense(120, input_dim=dimensions, activation='sigmoid'))
    model.add(Dense(100, activation='sigmoid'))
    model.add(Dense(80, activation='sigmoid'))
    model.add(Dense(60, activation='sigmoid'))
    model.add(Dense(40, activation='sigmoid'))
    model.add(Dense(20, activation='sigmoid'))
    model.add(Dense(12, activation='sigmoid'))
    model.add(Dense(10, activation='sigmoid'))
    model.add(Dense(8, activation='sigmoid'))
    model.add(Dense(6, activation='sigmoid'))
    model.add(Dense(4, activation='sigmoid'))
    model.add(Dense(2, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.01), metrics=['accuracy'])
    # Fit the model (EPOCHS and BATCHSIZE are defined elsewhere)
    model.fit(X, Y, epochs=EPOCHS, batch_size=BATCHSIZE)
    # Evaluate (Helpers is a logging utility defined elsewhere)
    scores = model.evaluate(X, Y)
    Helpers().Log(model.metrics_names[1], scores[1]*100)
    # Save model architecture as JSON
    with open(path+".json", "w") as json_file:
        json_file.write(model.to_json())
    # serialize weights to HDF5
    model.save_weights(path+".h5")
    Helpers().Log("Saved model to disk")

someFilePath = "file.csv"
dataset = numpy.loadtxt(someFilePath, delimiter=",")
dimensions = len(dataset[0]) - 1
trainKerasModel(dataset[:,0:dimensions], dataset[:,dimensions], someFilePath, dimensions)
Keras Code for Predictions
from keras.models import model_from_json

model = model_from_json(loaded_model_json)
model.load_weights(someWeightsFile)
Xnew = preprocess_input(numpy.array([[32856500,1,1,200,6842314460,0],
                                     [32800000,-1,3,0,20539336035,0],
                                     [32750000,-1,7,-50000,47945852379,1]]))
Ynew = model.predict_classes(Xnew)
print(Ynew)
Twelve stacked sigmoid fully-connected layers will never learn anything; with that many sigmoid layers the gradients vanish (read up on the theory).
Maybe you should try just 3 layers with tanh, and no activation on the input if you apply tanh to the input: -1 for false, 1 for true. Also apply tanh to the input data, since it is not normalized. And cross-entropy makes no sense if you have only one output.
On top of that, expanding 5 inputs to 120 features and then stacking 12 layers is a horrible overfit. You should have 3 layers here with roughly 20, 16 and 10 units, tanh activations, MSE loss, and a learning rate around 1e-3 to 1e-4.
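A minimal sketch of the smaller network suggested above (layer sizes, tanh activations, MSE loss and learning rate taken from the suggestion; the 5 input features and the -1/1 target encoding are assumptions based on the question's data):
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(20, input_dim=5, activation='tanh'))  # 5 input features, per the data above
model.add(Dense(16, activation='tanh'))
model.add(Dense(10, activation='tanh'))
model.add(Dense(1, activation='tanh'))  # output in (-1, 1): -1 for false, 1 for true
model.compile(loss='mse', optimizer=SGD(lr=1e-3))  # lr in the suggested 1e-3 to 1e-4 range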

Validation loss increase and constant training accuracy 1D cnn

I'm implementing a CNN for speech recognition. The input is MEL frequencies with shape (85314, 99, 1), and the labels are one-hot encoded with 35 output classes (shape: (85314, 35)). When I run the model, the training accuracy (image 2) starts high and stays the same over the epochs, while the validation loss (image 1) increases. It is probably overfitting, but I cannot find the origin of the issue. I have already decreased the learning rate and played with batch sizes, but the results stay the same. The amount of training data should also be sufficient. Is there another issue with my hyper-parameter settings somewhere?
My model and hyper-parameters are defined as follows:
#hyperparameters
input_dimension = 85314
learning_rate = 0.0000025
momentum = 0.85
hidden_initializer = random_uniform(seed=1)
dropout_rate = 0.2
# create model
model = Sequential()
model.add(Convolution1D(nb_filter=32, filter_length=3, input_shape=(99, 1), activation='relu'))
model.add(Convolution1D(nb_filter=16, filter_length=1, activation='relu'))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dense(35, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])
history = model.fit(frequencies_train, labels_hot, validation_split=0.2, epochs=10, batch_size=50)
You are using "binary_crossentropy" for a problem with multiple classes. Change it to "categorical_crossentropy".
The accuracy that Keras computes with binary_crossentropy on a model with more than 2 labels is simply wrong.
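A sketch of the corrected compile call (the labels are already one-hot encoded, which is exactly what categorical_crossentropy expects):
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc'])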

Training with Adam gets slower each epoch

I am training a simple neural network in Keras with the Theano backend, consisting of 4 dense layers connected to a Merge layer and then to a softmax classifier layer. Using Adam for training, the first few epochs take about 60s each (on the CPU), but after that the training time per epoch starts increasing, taking more than 400s by epoch 70, which makes it unusable.
Is there anything wrong with my code, or is this supposed to happen?
This only happens when using Adam, not with sgd, adadelta, rmsprop or adagrad. I'd use any of the other methods, but Adam produces far better results.
The code:
from keras.models import Sequential
from keras.layers import Dense, Merge

modela = Sequential()
modela.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modelb = Sequential()
modelb.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modelc = Sequential()
modelc.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
modeld = Sequential()
modeld.add(Dense(700, input_dim=40, init='uniform', activation='relu'))
model = Sequential()
model.add(Merge([modela, modelb, modelc, modeld], mode='concat', concat_axis=1))
model.add(Dense(258, init='uniform', activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
hist = model.fit([Xa, Xb, Xc, Xd], Ycat, validation_split=.25, nb_epoch=80, batch_size=100, verbose=2)
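For quantifying the slowdown, a minimal sketch of a timing callback (assuming the standard Keras Callback hooks) that logs the wall-clock time of each epoch:
import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    # Logs wall-clock seconds per epoch so the slowdown can be measured.
    def on_epoch_begin(self, epoch, logs={}):
        self.t0 = time.time()
    def on_epoch_end(self, epoch, logs={}):
        print('Epoch %d took %.1fs' % (epoch, time.time() - self.t0))

hist = model.fit([Xa, Xb, Xc, Xd], Ycat, validation_split=.25,
                 nb_epoch=80, batch_size=100, verbose=2,
                 callbacks=[EpochTimer()])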
