Keras Dense Neural Net Predicting in Narrow Range - python

I've been playing with Numer.ai data, mostly as a way to improve my understanding of neural nets but I'm running into a problem that I can't seem to get past. No matter the configuration of my dense neural net, the output comes out in a tight range.
The input is 300 scaled feature columns (0 to 1) and the target is between 0 and 1 (values of 0, 0.25, 0.5, 0.75, and 1)
Here is my fully reproducible code:
import pandas as pd
# load data
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]
# select those columns out of the training dataset
X_train = training_data[feature_cols].to_numpy()
# select target variables
y_train = training_data.loc[:,'target'].to_numpy()
#same thing on validation data
val_data = tournament_data[tournament_data.data_type=='validation']
X_val = val_data[feature_cols]
y_val= val_data.loc[:,'target']
I've tried a number of different configurations in my neural network (different optimizers: adam and sgd, different learning rates 0.01 down to 0.0001, different neuron sizes, adding dropout: although, I didn't expect this to work because it seems to be a problem with bias, not variance, using linear, softmax, and sigmoid final layer activation functions: softmax produces negative values so that was an immediate non-starter, different batch sizes: as small as 16 and as large as 256, adding or removing batch normalization, shuffling the input data, and training for different numbers of epochs). Ultimately, the results are one of two things:
Predicted values are all the same number, usually somewhere in the 0.45 to 0.55 area
Predicted values are in a very narrow range, usually not more than 0.05 different. So the values are 0.45 to 0.55
I can't figure out what configuration changes I need to make to get this neural network to output predictions across a broader area of the 0 to 1 range.
from tensorflow.keras import models, layers
dropout_rate = 0.15
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation = 'relu', kernel_regularizer='l2'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',metrics=['mae', 'mse'])
history = model.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=64,
epochs=200,
verbose=1)
# Prediction output
predictions_df = model.predict(X_val)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
# example output: 0.51895267 0.47968164 0.039271027

EDIT:
There is an impact on them when the following changes are made (tests run on batches size of 512, number of epochs 5, below results are only on training data) -
Loss set to mse instead of binary_crossentropy
Batch size 512 (for quick prototyping)
Epochs set to 5 (loss flattens after that)
Remove l2 regularization, and increase dropout
Set output activation -
With sigmoid -> Max:0.60, ​Min: 0.36
Without activation -> Max: 0.69, Min: 0.29
With relu -> Max: 0.73, Min: 0.10
Here is the code for testing purposes -
from tensorflow.keras import models, layers
dropout_rate = 0.50
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1024, activation = 'relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='relu'))
model.compile(optimizer='adam',
loss='mse',metrics=['mae'])
history = model.fit(X_train, y_train,
#validation_data=(X_val, y_val),
batch_size=512,
epochs=5,
verbose=1)
# Prediction output
predictions_df = model.predict(X_train)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
0.73566914 0.1063129 0.62935627
Proposed solutions
You are trying to solve a regression problem of predicting an arbitrary value between 0 to 1 (values of 0, 0.25, 0.5, 0.75, and 1), but trying to solve it as a binary classification problem using a sigmoid activation and a binary_crossentropy loss.
What you may want to try is using mse and/or removing any output activation (or better, use relu as suggested by #desertnaut). You could simply be underfitting as suggested by #xdurch0. Try with and without the regularization as well.
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation = 'relu')
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1))
model.compile(optimizer='adam', loss='mse')
Check this table to help you with how to use losses and activations for different types of problem settings.
On a side note, the discrete nature of the values in your dependent variable, y, you can also consider reframing the problem as a multi-class single-label classification problem, if the downstream task allows it.

Related

Why am I getting horizontal line (almost zero) from neural network instead of the desired curve?

I am trying to use neural network for my regression problem in python but the output of the neural network is a straight horizontal line which is zero. I have one input and obviously one output.
Here is my code:
def baseline_model():
# create model
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
model.add(Dense(4, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error',metrics=['mse'], optimizer='adam')
model.summary()
return model
# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=64,validation_split = 0.2, verbose=1)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)
Here are the plots of NN prediction vs. target for both training and test data.
Training Data
Test Data
I have also tried different weight initializers (Xavier and He) with no luck!
I really appreciate your help
First of all correct your syntax while adding dense layers in model remove the double equal == with single equal = with kernal_initilizer like below
model.add(Dense(1, input_dim=1, kernel_initializer ='normal', activation='relu'))
Then to make the performance better do the followong
Increase the number of hidden neurons in the hidden layers
Increase the number of hidden layers.
If still you have same problem then try to change the optimizer and activation function. Tuning the hyperparameters may help you in converging to the solution
EDIT 1
You also have to fit the estimator after cross validation like below
estimator.fit(X_train, y_train)
and then you can test on the test data as follow
prediction = estimator.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(Y_test, prediction)

Training and Testing accuracy not increasing for a CNN followed by a RNN for signature verification

I'm currently working on online signature verification. The dataset has a variable shape of (x, 7) where x is the number of points a person used to sign their signature. I have the following model:
model = Sequential()
#CNN
model.add(Conv1D(filters=64, kernel_size=3, activation='sigmoid', input_shape=(None, 7)))
model.add(MaxPooling1D(pool_size=3))
model.add(Conv1D(filters=64, kernel_size=2, activation='sigmoid'))
#RNN
model.add(Masking(mask_value=0.0))
model.add(LSTM(8))
model.add(Dense(2, activation='softmax'))
opt = Adam(lr=0.0001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
print(model.fit(x_train, y_train, epochs=100, verbose=2, batch_size=50))
score, accuracy = model.evaluate(x_test,y_test, verbose=2)
print(score, accuracy)
I know it may not be the best model but this is the first time I'm building a neural network. I have to use a CNN and RNN as it is required for my honours project. At the moment, I achieve 0.5142 as the highest training accuracy and 0.54 testing accuracy. I have tried increasing the number of epochs, changing the activation function, add more layers, moving the layers around, changing the learning rate and changing the optimizer.
Please share some advice on changing my model or dataset. Any help is much appreciated.
For CNN-RNN, some promising things to try:
Conv1D layers: activation='relu', kernel_initializer='he_normal'
LSTM layer: activation='tanh', and recurrent_dropout=.1, .2, .3
Optimizer: Nadam, lr=2e-4 (Nadam may significantly outperform all other optimizers for RNNs)
batch_size: lower it. Unless you have 200+ batches in total, set batch_size=32; lower batch size better exploits the Stochastic mechanism of the optimizer and can improve generalization
Dropout: right after second Conv1D, with a rate .1, .2 - or, after first Conv1D, with a rate .25, .3, but only if you use SqueezeExcite (see below), else MaxPooling won't work as well
SqueezeExcite: shown to enhance all CNN performance across a large variety of tasks; Keras implementation you can use below
BatchNormalization: while your model isn't large, it's still deep, and may benefit from one BN layer right after second Conv1D
L2 weight decay: on first Conv1D, to prevent it from memorizing the input; try 1e-5, 1e-4, e.g. kernel_regularizer=l2(1e-4) # from keras.regularizers import l2
Preprocessing: make sure all data is normalized (or standardized if time-series), and batches are shuffled each epoch
def SqueezeExcite(_input):
filters = _input._keras_shape[-1]
se = GlobalAveragePooling1D()(_input)
se = Reshape((1, filters))(se)
se = Dense(filters//16,activation='relu',
kernel_initializer='he_normal', use_bias=False)(se)
se = Dense(filters, activation='sigmoid',
kernel_initializer='he_normal', use_bias=False)(se)
return multiply([_input, se])
# Example usage
x = Conv1D(filters=64, kernel_size=4, activation='relu', kernel_initializer='he_normal')(x)
x = SqueezeExcite(x) # place after EACH Conv1D

Keras MLP classifier not learning

I have a data like this
there are 29 column ,out of which I have to predict winPlacePerc(extreme end of dataframe) which is between 1(high perc) to 0(low perc)
Out of 29 column 25 are numerical data 3 are ID(object) 1 is categorical
I dropped all the Id column(since they're all unique) and also encoded the categorical(matchType) data into one hot encoding
After doing all this I am left with 41 column(after one hot)
This is how i am creating data
X = df.drop(columns=['winPlacePerc'])
#creating a dataframe with only the target column
y = df[['winPlacePerc']]
Now my X have 40 column and this is my label data looks like
> y.head()
winPlacePerc
0 0.4444
1 0.6400
2 0.7755
3 0.1667
4 0.1875
I also happen to have very large amount of data like 400k data ,so for testing purpose I am training on fraction of that,doing that using sckit
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.997, random_state=32)
which gives almost 13k data for training
For model I'm using Keras sequential model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras import optimizers
n_cols = X_train.shape[1]
model = Sequential()
model.add(Dense(40, activation='relu', input_shape=(n_cols,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['accuracy'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
Since my y-label data is between 0 & 1 ,I'm using sigmoid layer as my output layer
this is training & validation loss & accuracy plot
I also tried to convert label into binary using step function and binary cross entropy loss function
after that y-label data looks like
> y.head()
winPlacePerc
0 0
1 1
2 1
3 0
4 0
and changing loss function
model.compile(loss='binary_crossentropy',
optimizer='Adam',
metrics=['accuracy'])
this method was more worse than previous
as you can see its not learning after certain epoch,and this also happens even if I am taking all data rather than fraction of it
after this did not work I also used dropout and tried adding more layer,but nothing works here
Now my question ,what I am doing wrong here is it wrong layer or in data how can I improve upon this?
To clear things out - this is a Regression problem so using accuracy doesn't really makes sense, because you will never be able to predict the exact value of 0.23124.
First of all you certainly want to normalise your values (not the one hot encoded) before passing it to the network. Try using a StandardScaler as a start.
Second, I would recommend changing the activation function in the output layer - try with linear and as a loss mean_squared_error should be fine.
In order to validate you model "accuracy" plot the predicted together with the actual - this should give you a chance of validating the results visually. However, that being said your loss already looks quite decent.
Check this post, should give you a good grasp of what (activation & loss functions) and when to use.
from sklearn.preprocessing import StandardScaler
n_cols = X_train.shape[1]
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(n_cols,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['mean_squared_error'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
Normalize data
Add more depth to your network
Make the last layer linear
Accuracy is not a good metric for regression. Let's see an example
predictions: [0.9999999, 2.0000001, 3.000001]
ground Truth: [1, 2, 3]
Accuracy = No:of Correct / Total => 0 /3 = 0
Accuracy is 0, but the predictions are pretty close to the ground truth. On the other hand, MSE will be very low pointing that the deviation of the predictions from the ground truth is very less.

How to set input and output shape for binary variables when using SimpleRNN in Keras?

I'm using SimpleRNN to train a model with binary input of length 100 and binary output of length 100, meaning that my X.shape=(nsamples, 100, 1) and y.shape=(nsamples, 100)
When using the following code the model doesn't return error but learns very slowly and the loss stop decreasing after a number of epochs.
I also tried changing learning-rate which didn't help. I guess the critical problem may be with input and output shape.
model = Sequential()
model.add(
SimpleRNN(
output_dim=100,
input_shape=(100, 1),
return_sequences=False,
activation='sigmoid'
)
)
sgd = optimizers.SGD(lr=0.01, decay=0.0, nesterov=False)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X, y, epochs=500)

Add constraint to mean square error cost function in multi output regression

I am using mean square error to compute the loss function of a multi output regressor. I used a recurrent neural network model with the one to many architecture. My output vector is of size 6 (1*6) and the values are monotonic (non decreasing).
example:
y_i = [1,3,6,13,30,57,201]
I would like to force the model to learn this dependency. Hence adding a constraint to the cost function. I am getting an error equal to 300 on the validation set. I believe after editing the mean square error loss function i will be able to get a better performance.
I am using keras for the implementation. Here is the core model.
batchSize = 256
epochs = 20
samplesData = trainX
samplesLabels = trainY
print("Compiling neural network model...")
Model = Sequential()
Model.add(LSTM(input_shape = (98,),input_dim=98, output_dim=128, return_sequences=True))
Model.add(Dropout(0.2))
#Model.add(LSTM(128, return_sequences=True))
#Model.add(Dropout(0.2))
Model.add(TimeDistributedDense(7))
#rmsprop = rmsprop(lr=0.0, decay=0.0)
Model.compile(loss='mean_squared_error', optimizer='rmsprop')
Model.summary()
print("Training model...")
# learning schedule callback
#lrate = LearningRateScheduler(step_decay)
#callbacks_list = [lrate]
history = Model.fit(samplesData, samplesLabels, batch_size=batchSize, nb_epoch= epochs, verbose=1,
validation_split=0.2, show_accuracy=True)
print("model training has been completed.")
Any other tips concerning learning rate, decay, etc.. are appreciated.
Keep the mean squared error as just a metric. Use Smooth L1 loss instead. Here's my implementation.
#Define Smooth L1 Loss
def l1_smooth_loss(y_true, y_pred):
abs_loss = tf.abs(y_true - y_pred)
sq_loss = 0.5 * (y_true - y_pred)**2
l1_loss = tf.where(tf.less(abs_loss, 1.0), sq_loss, abs_loss - 0.5)
return tf.reduce_sum(l1_loss, -1)
#And
Model.compile(loss='l1_smooth_loss', optimizer='rmsprop')

Categories