I'm having trouble making an MLP in MxNet learn. It tends to output fairly constant values, only occasionally outputting anything different. I'm using the Pima Indians dataset to do binary classification, but no matter what I do (normalisation, scaling, changing activations, objective functions, number of neurons, batch size, epochs) it wouldn't produce anything useful.
The same MLP in Keras works fine.
Here's the MxNet code:
batch_size=10
train_iter=mx.io.NDArrayIter(mx.nd.array(df_train), mx.nd.array(y_train),
batch_size, shuffle=True)
val_iter=mx.io.NDArrayIter(mx.nd.array(df_test), mx.nd.array(y_test), batch_size)
data=mx.sym.var('data')
fc1 = mx.sym.FullyConnected(data=data, num_hidden=12)
act1 = mx.sym.Activation(data=fc1, act_type='relu')
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=8)
act2 = mx.sym.Activation(data=fc2, act_type='relu')
fcfinal = mx.sym.FullyConnected(data=act2, num_hidden=2)
mlp = mx.sym.SoftmaxOutput(data=fcfinal, name='softmax')
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
mlp_model.fit(train_iter,
eval_data=val_iter,
optimizer='sgd',
eval_metric='ce',
num_epoch=150)
And the same MLP in Keras:
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(df_train_res, y_train_res)
I would recommend that you initialize your parameters before you start training. Having all parameters start at zero is not ideal.
You could add the following as a parameter to your model.fit()
initializer=mx.init.Xavier(rnd_type='gaussian')
See here for more discussion
https://mxnet.incubator.apache.org/api/python/optimization.html
Related
I am learning neural networks. I get 98% accuracy with classical ML methods, so I think I made a coding error. The neural networks model is not learning.
Things I tried:
Changing X and y to float64 or float32
Normalizing data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as optimizer, instead of "adam".
Changing the y label with another label
There are 9 labels in X_train and 8 different classes in y_train.
X_train:
y_train:
Code:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(layers.Flatten())
model.compile(optimizer= 'adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
Fitting:
I tried these lines by changing the target label. None of them help training the model. Some give "nan" loss, some go slightly up and down, but all of them are below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)
I am trying to use neural network for my regression problem in python but the output of the neural network is a straight horizontal line which is zero. I have one input and obviously one output.
Here is my code:
def baseline_model():
# create model
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
model.add(Dense(4, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error',metrics=['mse'], optimizer='adam')
model.summary()
return model
# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=64,validation_split = 0.2, verbose=1)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)
Here are the plots of NN prediction vs. target for both training and test data.
Training Data
Test Data
I have also tried different weight initializers (Xavier and He) with no luck!
I really appreciate your help
First of all correct your syntax while adding dense layers in model remove the double equal == with single equal = with kernal_initilizer like below
model.add(Dense(1, input_dim=1, kernel_initializer ='normal', activation='relu'))
Then to make the performance better do the followong
Increase the number of hidden neurons in the hidden layers
Increase the number of hidden layers.
If still you have same problem then try to change the optimizer and activation function. Tuning the hyperparameters may help you in converging to the solution
EDIT 1
You also have to fit the estimator after cross validation like below
estimator.fit(X_train, y_train)
and then you can test on the test data as follow
prediction = estimator.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(Y_test, prediction)
I'm having some trouble understanding why my Keras model has problems generating proper results (it now always returns 0). I have been able to find some others with this problem (ref 1, ref 2), but I haven't been able to understand the underlying cause.
Question: Why is my model only giving one, constant prediction?
Training Data Example
The last column is the prediction, 0 or 1.
32856500,1,1,200,6842314460,0
32800000,-1,0,0,0,0
32800000,-1,1,0,6845343222,0
32800000,-1,2,0,13692319489,0
32800000,-1,3,0,20539336035,0
32769900,-1,4,-30100,27389628085,0
32769900,-1,5,-30100,34239941481,0
32750000,-1,6,-50000,41091099905,0
32750000,-1,7,-50000,47945852379,1
Keras Code for Training
I'm using the sigmoid activation for the binary results. But I'm not sure if the issue lies here or in -for example- the binary_crossentropy or SGD optimizer.
def trainKerasModel(X, Y, path, dimensions):
# Create model
model = Sequential()
model.add(Dense(120, input_dim=dimensions, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(80, activation='sigmoid'))
model.add(Dense(60, activation='sigmoid'))
model.add(Dense(40, activation='sigmoid'))
model.add(Dense(20, activation='sigmoid'))
model.add(Dense(12, activation='sigmoid'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(8, activation='sigmoid'))
model.add(Dense(6, activation='sigmoid'))
model.add(Dense(4, activation='sigmoid'))
model.add(Dense(2, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.01), metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=EPOCHS, batch_size=BATCHSIZE)
# Evaluate
scores = model.evaluate(X, Y)
Helpers().Log(model.metrics_names[1], scores[1]*100)
# Save model
with open(path+".json", "w") as json_file:
json_file.write(model.to_json())
# serialize weights to HDF5
model.save_weights(path+".h5")
Helpers().Log("Saved model to disk")
someFilePath = "file.csv"
dataset = numpy.loadtxt(someFilePath, delimiter=",")
dimensions = len(dataset[0]) - 1
trainKerasModel(dataset[:,0:dimensions], dataset[:,dimensions], someFilePath, dimensions)
Keras Code for Predictions
model = model_from_json(loaded_model_json)
model.load_weights(someWeightsFile)
Xnew = preprocess_input(numpy.array([[32856500,1,1,200,6842314460,0], [32800000,-1,3,0,20539336035,0], [32750000,-1,7,-50000,47945852379,1]]))
Ynew = model.predict_classes(Xnew)
print(Ynew)
12 sigmoid fc layers will never learn anything.
Read theory.
maybe you sould try just 3 layers with tanh , and no af if tanh on input. -1 for false, 1 for true.
Also apply tanh to input datasincethey are not normalized. Also cross entropy has no sence if you have only one output.
plus extending 5 input to 120 features then 12 layers is horrible overfit. You should have here 3 layers like with ~20, 16,10 items, tanh, mse loss, ca 1e-3 1e-4 learning rate
I'm trying to do transfer learning in Keras. I set up a ResNet50 network set to not trainable with some extra layers:
# Image input
model = Sequential()
model.add(ResNet50(include_top=False, pooling='avg')) # output is 2048
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(512, activation='relu'))
model.add(Dense(7, activation='softmax'))
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Then I create input data: x_batch using the ResNet50 preprocess_input function, along with the one hot encoded labels y_batch and do the fitting as so:
model.fit(x_batch,
y_batch,
epochs=nb_epochs,
batch_size=64,
shuffle=True,
validation_split=0.2,
callbacks=[lrate])
Training accuracy gets close to 100% after ten or so epochs, but validation accuracy actually decreases from around 50% to 30% with validation loss steadily increasing.
However if I instead create a network with just the last layers:
# Vector input
model2 = Sequential()
model2.add(Dropout(0.05, input_shape=(2048,)))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model2.summary()
and feed in the output of the ResNet50 prediction:
resnet = ResNet50(include_top=False, pooling='avg')
x_batch = resnet.predict(x_batch)
Then validation accuracy gets up to around 85%... What is going on? Why won't the image input method work?
Update:
This problem is really bizarre. If I change ResNet50 to VGG19 it seems to work ok.
After a lot of googling I found that the problem is to do with the Batch Normalisation layers in ResNet. There are no batch normalisation layers in VGGNet which is why it works for that topology.
There is a pull request to fix this in Keras here, which explains in more detail:
Assume we use one of the pre-trained CNNs of Keras and we want to fine-tune it. Unfortunately, we get no guarantees that the mean and variance of our new dataset inside the BN layers will be similar to the ones of the original dataset. As a result, if we fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference the top layers will receive data which are scaled using the mean/variance of the original dataset. This discrepancy can lead to reduced accuracy.
This means that the BN layers are adjusting to the training data, however when validation is performed, the original parameters of the BN layers are used. From what I can tell, the fix is to allow the frozen BN layers to use the updated mean and variance from training.
A work around is to pre-compute the ResNet output. In fact, this decreases training time considerably, as we are not repeating that part of the calculation.
you can try :
Res = keras.applications.resnet.ResNet50(include_top=False,
weights='imagenet', input_shape=(IMG_SIZE , IMG_SIZE , 3 ) )
# Freeze the layers except the last 4 layers
for layer in vgg_conv.layers :
layer.trainable = False
# Check the trainable status of the individual layers
for layer in vgg_conv.layers:
print(layer, layer.trainable)
# Vector input
model2 = Sequential()
model2.add(Res)
model2.add(Flatten())
model2.add(Dropout(0.05 ))
model2.add(Dense(512, activation='relu'))
model2.add(Dropout(0.15))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(7, activation='softmax'))
model2.compile(optimizer='adam', loss='categorical_crossentropy', metrics =(['accuracy'])
model2.summary()
I'm implementing a CNN for speech recognition. The input is MEL frequencies with shape (85314, 99, 1) and the labels are one-hot encoded with 35 output classes (shape: (85314, 35)). When I run the model the training accuracy (image 2) starts high and stays the same over the number of epochs, while the loss on validation (image 1) increases. Hence, it is probably overfitting but I cannot find the origin of the issue. I already decreased the learning rate and played with batch sizes but the results stays the same. Also the amount of training data should be sufficient. Is there another issue with my hyper-parameter settings somewhere?
My model and hyper-parameters are defined as follows:
#hyperparameters
input_dimension = 85314
learning_rate = 0.0000025
momentum = 0.85
hidden_initializer = random_uniform(seed=1)
dropout_rate = 0.2
# create model
model = Sequential()
model.add(Convolution1D(nb_filter=32, filter_length=3, input_shape=(99, 1), activation='relu'))
model.add(Convolution1D(nb_filter=16, filter_length=1, activation='relu'))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dense(35, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])
history = model.fit(frequencies_train, labels_hot, validation_split=0.2, epochs=10, batch_size=50)
You are using "binary_crossentropy" for a problem of multiple classes. Change it to "categorical_crossentrop".
The accuracy computed with Keras using the binary_crossentropy with a model of more than 2 labels is just wrong.