I am learning neural networks. I get 98% accuracy with classical ML methods, so I think I made a coding error: the neural network model is not learning at all.
Things I tried:
Changing X and y to float64 or float32
Normalizing data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as the optimizer instead of "adam"
Replacing the target column with a different label
There are 9 features in X_train and 8 different classes in y_train.
Code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(layers.Flatten())
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Fitting:
I tried the following variants, changing the target label each time. None of them helps the model train: some give "nan" loss, some fluctuate slightly up and down, but all of them stay below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)
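For comparison, here is a minimal sketch of a setup that matches the shapes described above. It assumes X_train has 9 numeric feature columns and y_train holds integer class ids 0 through 7; the hidden layer size is arbitrary. The point is only that the output layer has one unit per class and the loss matches integer labels:

import tensorflow as tf
from tensorflow.keras import layers

# Sketch only: 9 input features, 8 classes encoded as integers 0..7 in y_train.
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(8, activation='softmax'))          # one output unit per class
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',     # expects integer class labels
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)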
I was training a word-level sentence-generation model, and as training approached the end of the total iterations, the accuracy started to go down and up repeatedly, forming a wavy pattern in the history plot. I am unable to understand why this is happening. Is it because of overfitting? Do I need to add some dropout layers to my model?
My model:
def rnn_model():
    model = Sequential()
    model.add(Embedding(uniq_vals, 50, input_length=s_len))
    model.add(SimpleRNN(25, return_sequences=True))
    model.add(SimpleRNN(25))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(uniq_vals, activation='softmax'))
    return model

model = rnn_model()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Also, I have set batch_size = 128 and epochs = 150.
Accuracy and loss plot: (figure not included)
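If dropout turns out to be worth trying, here is a minimal sketch of one common placement in the model above (this is an assumption, not a known fix for the wavy accuracy; uniq_vals and s_len are the same variables as in the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dropout, Dense

# Sketch only: the same architecture with dropout between the recurrent layers.
def rnn_model_with_dropout():
    model = Sequential()
    model.add(Embedding(uniq_vals, 50, input_length=s_len))
    model.add(SimpleRNN(25, return_sequences=True))
    model.add(Dropout(0.2))        # drop 20% of activations during training
    model.add(SimpleRNN(25))
    model.add(Dropout(0.2))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(uniq_vals, activation='softmax'))
    return model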
model = Sequential()
model.add(LSTM(32, input_shape=(n_timesteps, n_features)))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.Adam(lr=0.0001),
              metrics=['accuracy'])
I have the above sequence classification model. There are around 14,000 classes.
The shape of a training example is (50, 59), i.e. 50 rows and 59 features. The data is divided into batches; the accuracy is good within one batch, but the next batch destroys it.
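One thing that can help separate per-batch swings from real performance is to shuffle the training samples and track accuracy on a held-out split. This is only a sketch under the assumption that X and y are arrays shaped (N, 50, 59) and (N,) with integer class ids; none of it comes from the question itself:

# Sketch only: fit with explicit shuffling and a validation split.
history = model.fit(
    X, y,
    epochs=30,
    batch_size=64,
    shuffle=True,            # reshuffle training samples every epoch (the Keras default)
    validation_split=0.1,    # hold out 10% of the data to track validation accuracy
)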
I'm having some trouble understanding why my Keras model has problems generating proper results (it now always returns 0). I have been able to find some others with this problem (ref 1, ref 2), but I haven't been able to understand the underlying cause.
Question: Why is my model only giving one, constant prediction?
Training Data Example
The last column is the target to predict, 0 or 1.
32856500,1,1,200,6842314460,0
32800000,-1,0,0,0,0
32800000,-1,1,0,6845343222,0
32800000,-1,2,0,13692319489,0
32800000,-1,3,0,20539336035,0
32769900,-1,4,-30100,27389628085,0
32769900,-1,5,-30100,34239941481,0
32750000,-1,6,-50000,41091099905,0
32750000,-1,7,-50000,47945852379,1
Keras Code for Training
I'm using the sigmoid activation for the binary output, but I'm not sure whether the issue lies there or, for example, in the binary_crossentropy loss or the SGD optimizer.
def trainKerasModel(X, Y, path, dimensions):
    # Create model
    model = Sequential()
    model.add(Dense(120, input_dim=dimensions, activation='sigmoid'))
    model.add(Dense(100, activation='sigmoid'))
    model.add(Dense(80, activation='sigmoid'))
    model.add(Dense(60, activation='sigmoid'))
    model.add(Dense(40, activation='sigmoid'))
    model.add(Dense(20, activation='sigmoid'))
    model.add(Dense(12, activation='sigmoid'))
    model.add(Dense(10, activation='sigmoid'))
    model.add(Dense(8, activation='sigmoid'))
    model.add(Dense(6, activation='sigmoid'))
    model.add(Dense(4, activation='sigmoid'))
    model.add(Dense(2, activation='sigmoid'))
    model.add(Dense(1, activation='sigmoid'))

    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.01), metrics=['accuracy'])

    # Fit the model
    model.fit(X, Y, epochs=EPOCHS, batch_size=BATCHSIZE)

    # Evaluate
    scores = model.evaluate(X, Y)
    Helpers().Log(model.metrics_names[1], scores[1]*100)

    # Save model
    with open(path + ".json", "w") as json_file:
        json_file.write(model.to_json())

    # serialize weights to HDF5
    model.save_weights(path + ".h5")
    Helpers().Log("Saved model to disk")
someFilePath = "file.csv"
dataset = numpy.loadtxt(someFilePath, delimiter=",")
dimensions = len(dataset[0]) - 1
trainKerasModel(dataset[:,0:dimensions], dataset[:,dimensions], someFilePath, dimensions)
Keras Code for Predictions
model = model_from_json(loaded_model_json)
model.load_weights(someWeightsFile)
Xnew = preprocess_input(numpy.array([[32856500,1,1,200,6842314460,0], [32800000,-1,3,0,20539336035,0], [32750000,-1,7,-50000,47945852379,1]]))
Ynew = model.predict_classes(Xnew)
print(Ynew)
Twelve sigmoid fully-connected layers will never learn anything; read up on the theory.
Maybe you should try just 3 layers with tanh, and no activation function on the input layer if you already apply tanh to the inputs; use -1 for false and 1 for true. Also apply tanh to the input data, since it is not normalized. And cross-entropy makes no sense if you have only one output.
On top of that, expanding 5 inputs to 120 features and then stacking 12 layers is a horrible overfit. You should have about 3 layers here, with roughly 20, 16, and 10 units, tanh activations, MSE loss, and a learning rate of around 1e-3 to 1e-4.
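A minimal sketch of the kind of network this answer describes, assuming 5 input features and targets recoded from 0/1 to -1/+1 (the layer sizes and learning rate are just the rough values given above, not tuned):

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Sketch only: three small tanh layers, MSE loss, tanh output in (-1, 1).
model = Sequential()
model.add(Dense(20, input_dim=5, activation='tanh'))
model.add(Dense(16, activation='tanh'))
model.add(Dense(10, activation='tanh'))
model.add(Dense(1, activation='tanh'))            # predicts -1 (false) to +1 (true)
model.compile(optimizer=SGD(lr=1e-3), loss='mean_squared_error')

# Y_pm1 = 2 * Y - 1                               # recode 0/1 targets to -1/+1 before fitting
# model.fit(X_scaled, Y_pm1, epochs=EPOCHS, batch_size=BATCHSIZE)   # X_scaled: inputs squashed with tanh, as suggested above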
I'm trying to fit a simple neural network to predict a binary target using keras-1.0.6. The output saturates after the very first epoch. I tried playing around with the learning rate (from 0.1 to 1e-6), the decay and momentum of the SGD optimizer, and with the layers (10-512 hidden neurons, 1-2 hidden layers) and their activation functions, but nothing worked: the prediction accuracy stayed the same.
My training set has shape (13602, 115) and my validation set has shape (3400, 115). The target variables y_train and y_test have values encoded as 1 and 0 (60% are 1's and 40% are 0's). At first the data was not normalized; when I normalized it, I got the same results.
Checking the output, I see that the model is predicting only one class. Sometimes it predicts only 1's and other times only 0's (when I tweak the model).
I also tried encoding the target variable with shape (n_samples, 2), but the output was the same.
I followed some questions here and some Google results that suggest tuning the learning rate and not using the 'softmax' activation, but couldn't improve the results.
Some of the models I tried are below:
The simplest model:
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
Model 2:
model = Sequential()
model.add(Dense(512, input_dim=X_train.shape[1]))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
Model 3:
model.add(Dense(64, input_dim=X_train.shape[1], init='uniform', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
Model 4:
model.add(Dense(64, input_dim=X_train.shape[1], init='uniform', activation='sigmoid'))
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
and to compile and fit the model:
sgd = SGD(lr=0.01, decay=0.1, momentum=0.0, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train2, nb_epoch=5, batch_size=50, validation_split=0.2)
model.predict(X_test)
The output gives either [0,0,0,0,0,0,0,...] or [1,1,1,1,1,1,1,1,...]
Does anybody have a clue on what's going on here?
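Not an answer from the thread, just a diagnostic sketch under the keras 1.x API used above: look at the raw sigmoid probabilities instead of thresholded classes, and try SGD without the aggressive decay=0.1, which shrinks the effective learning rate after only a few hundred updates (X_train, y_train2, and X_test are the variables from the snippets above):

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Sketch only: small network, gentle decay, inspect raw probabilities.
model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(1, activation='sigmoid'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train2, nb_epoch=5, batch_size=50, validation_split=0.2)

probs = model.predict(X_test)          # raw probabilities in (0, 1)
print(probs.min(), probs.max())        # all near 0 or all near 1 means the output has saturated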