I have a dataset like this.
There are 29 columns, and I have to predict winPlacePerc (the last column of the dataframe), which ranges from 1 (high perc) to 0 (low perc).
Of the 29 columns, 25 are numerical, 3 are IDs (object) and 1 is categorical.
I dropped all the ID columns (since they are all unique) and one-hot encoded the categorical column (matchType).
After all this I am left with 41 columns (after one-hot encoding).
This is how I am creating the data:
X = df.drop(columns=['winPlacePerc'])
#creating a dataframe with only the target column
y = df[['winPlacePerc']]
Now X has 40 columns, and this is what my label data looks like:
> y.head()
winPlacePerc
0 0.4444
1 0.6400
2 0.7755
3 0.1667
4 0.1875
I also have a very large amount of data (around 400k rows), so for testing purposes I am training on a fraction of it, which I split off using scikit-learn:
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.997, random_state=32)
which gives almost 13k rows for training.
For the model I'm using a Keras Sequential model:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras import optimizers
n_cols = X_train.shape[1]
model = Sequential()
model.add(Dense(40, activation='relu', input_shape=(n_cols,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['accuracy'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
Since my y-label data is between 0 and 1, I'm using a sigmoid layer as my output layer.
This is the training and validation loss and accuracy plot.
I also tried converting the labels to binary using a step function, together with the binary cross-entropy loss function.
After that, the y-label data looks like:
> y.head()
winPlacePerc
0 0
1 1
2 1
3 0
4 0
and I changed the loss function:
model.compile(loss='binary_crossentropy',
optimizer='Adam',
metrics=['accuracy'])
This method was even worse than the previous one.
As you can see, it stops learning after a certain epoch, and this also happens even if I use all the data rather than a fraction of it.
After this did not work I also used dropout and tried adding more layers, but nothing works.
Now my question: what am I doing wrong here? Is it the wrong layers, or is it the data? How can I improve on this?
To clear things up: this is a regression problem, so using accuracy doesn't really make sense, because you will never be able to predict an exact value such as 0.23124.
First of all, you certainly want to normalise your values (not the one-hot encoded ones) before passing them to the network. Try using a StandardScaler as a start.
Second, I would recommend changing the activation function in the output layer: try linear, and as a loss mean_squared_error should be fine.
To validate your model's "accuracy", plot the predicted values together with the actual ones; this should give you a chance to validate the results visually. That being said, your loss already looks quite decent.
Check this post; it should give you a good grasp of which activation and loss functions to use and when.
from sklearn.preprocessing import StandardScaler
n_cols = X_train.shape[1]
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(n_cols,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['mean_squared_error'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
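The advice to scale the numerical columns but not the one-hot ones can be sketched with plain NumPy. This is only an illustration on made-up data: the column layout, the `num_idx` list and the 80/20 split are assumptions, not taken from the question. The arithmetic mirrors what StandardScaler's fit_transform / transform pair computes (mean 0, unit variance, statistics taken from the training rows only).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 numeric columns followed by 2 one-hot columns.
numeric = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
one_hot = np.eye(2)[rng.integers(0, 2, size=100)]
X = np.hstack([numeric, one_hot])

X_train, X_test = X[:80], X[80:]
num_idx = [0, 1, 2]  # positions of the numeric columns (assumed known)

# Compute the statistics on the training rows only ...
mean = X_train[:, num_idx].mean(axis=0)
std = X_train[:, num_idx].std(axis=0)

# ... and apply them to both splits, leaving the one-hot columns untouched.
X_train_scaled = X_train.copy()
X_test_scaled = X_test.copy()
X_train_scaled[:, num_idx] = (X_train[:, num_idx] - mean) / std
X_test_scaled[:, num_idx] = (X_test[:, num_idx] - mean) / std
```

Reusing the training mean and standard deviation for the test split keeps both splits on the same scale, which matters as soon as the model is evaluated on X_test.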
Normalize data
Add more depth to your network
Make the last layer linear
Accuracy is not a good metric for regression. Let's see an example:
Predictions: [0.9999999, 2.0000001, 3.000001]
Ground truth: [1, 2, 3]
Accuracy = number correct / total = 0 / 3 = 0
Accuracy is 0, but the predictions are pretty close to the ground truth. MSE, on the other hand, will be very low, indicating that the predictions deviate very little from the ground truth.
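The same comparison can be checked numerically. A minimal sketch using the three values from the example; computing accuracy as the fraction of exact matches is the assumption here:

```python
import numpy as np

predictions = np.array([0.9999999, 2.0000001, 3.000001])
ground_truth = np.array([1.0, 2.0, 3.0])

# Accuracy counts only exact matches, so near misses score zero.
accuracy = np.mean(predictions == ground_truth)

# MSE measures how far off the predictions are on average.
mse = np.mean((predictions - ground_truth) ** 2)

print(accuracy)  # 0.0
print(mse)       # a tiny positive number
```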
I've never used Keras or TensorFlow before, and was going through this example in the Visual Studio Code documentation, but it seems to have a bug. The documentation shows that their trained model has a 61% accuracy against the test data, which matches what I get when I run it. However, no matter how you modify the neural network parameters, you always get the exact same accuracy. You can even skip the compile and fit commands and still get 61% accuracy.
It turns out that the prediction results they got were all zeroes (which happened to be right 61% of the time against the test data), and no matter how I modify the network it only outputs all zeroes, so it seems like there's some mistake in their code. But since I don't know Keras or TF, I haven't been able to figure out how to make it work.
Here's what I think all the relevant code is, but you can check the link above for everything:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data[['sex','pclass','age','relatives','fare']], data.survived, test_size=0.2, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(x_train)
X_test = sc.transform(x_test)
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(5, kernel_initializer = 'uniform', activation = 'relu', input_dim = 5))
model.add(Dense(5, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid'))
model.compile(optimizer="adam", loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=50)
y_pred = np.argmax(model.predict(X_test), axis=-1)
print(metrics.accuracy_score(y_test, y_pred))
(As mentioned by @Frightera.)
np.argmax() is generally used to get the index of the largest value when there are more than two class probabilities. Here you have a binary classification model with a sigmoid activation in the last layer, which always returns a single value between 0 and 1, so the argmax over that single column is always 0.
This means:
For small values (< 0.5), the output should be classified as 0,
and
for large values (> 0.5), the result should be classified as 1.
Hence, you need to replace the final few lines of your code as below:
preds = model.predict(X_test)
y_pred = np.where(preds > 0.5, 1, 0)
#y_pred = np.argmax(model.predict(X_test), axis=-1)
print(metrics.accuracy_score(y_test, y_pred))
Output:
1.0
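To see why argmax returned all zeroes, here is a small sketch with made-up sigmoid outputs (the values in `preds` are hypothetical; only the shape, one column per sample, matches what a single-unit output layer produces):

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (n_samples, 1) like the result of
# model.predict for a single-unit output layer.
preds = np.array([[0.1], [0.7], [0.4], [0.9]])

# argmax over the last axis looks at a single column, so it is always 0.
argmax_labels = np.argmax(preds, axis=-1)

# Thresholding at 0.5 recovers the intended 0/1 labels.
threshold_labels = np.where(preds > 0.5, 1, 0).ravel()

print(argmax_labels)     # [0 0 0 0]
print(threshold_labels)  # [0 1 0 1]
```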
I've been playing with Numer.ai data, mostly as a way to improve my understanding of neural nets but I'm running into a problem that I can't seem to get past. No matter the configuration of my dense neural net, the output comes out in a tight range.
The input is 300 scaled feature columns (0 to 1) and the target is between 0 and 1 (values of 0, 0.25, 0.5, 0.75, and 1)
Here is my fully reproducible code:
import pandas as pd
# load data
training_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_training_data.csv.xz")
tournament_data = pd.read_csv("https://numerai-public-datasets.s3-us-west-2.amazonaws.com/latest_numerai_tournament_data.csv.xz")
feature_cols = training_data.columns[training_data.columns.str.startswith('feature')]
# select those columns out of the training dataset
X_train = training_data[feature_cols].to_numpy()
# select target variables
y_train = training_data.loc[:,'target'].to_numpy()
#same thing on validation data
val_data = tournament_data[tournament_data.data_type=='validation']
X_val = val_data[feature_cols]
y_val= val_data.loc[:,'target']
I've tried a number of different configurations in my neural network:
Different optimizers: adam and sgd
Different learning rates: 0.01 down to 0.0001
Different neuron sizes
Adding dropout (although I didn't expect this to work, because it seems to be a problem with bias, not variance)
Linear, softmax, and sigmoid final layer activation functions (softmax produces negative values, so that was an immediate non-starter)
Different batch sizes: as small as 16 and as large as 256
Adding or removing batch normalization
Shuffling the input data
Training for different numbers of epochs
Ultimately, the results are one of two things:
Predicted values are all the same number, usually somewhere in the 0.45 to 0.55 area
Predicted values are in a very narrow range, usually not more than 0.05 different. So the values are 0.45 to 0.55
I can't figure out what configuration changes I need to make to get this neural network to output predictions across a broader area of the 0 to 1 range.
from tensorflow.keras import models, layers
dropout_rate = 0.15
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation = 'relu', kernel_regularizer='l2'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',metrics=['mae', 'mse'])
history = model.fit(X_train, y_train,
validation_data=(X_val, y_val),
batch_size=64,
epochs=200,
verbose=1)
# Prediction output
predictions_df = model.predict(X_val)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
# example output: 0.51895267 0.47968164 0.039271027
EDIT:
The predictions do change when the following changes are made (tests run with a batch size of 512 and 5 epochs; the results below are on training data only):
Loss set to mse instead of binary_crossentropy
Batch size 512 (for quick prototyping)
Epochs set to 5 (loss flattens after that)
Remove l2 regularization, and increase dropout
Set the output activation:
With sigmoid -> Max: 0.60, Min: 0.36
Without activation -> Max: 0.69, Min: 0.29
With relu -> Max: 0.73, Min: 0.10
Here is the code for testing purposes:
from tensorflow.keras import models, layers
dropout_rate = 0.50
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1024, activation = 'relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1, activation='relu'))
model.compile(optimizer='adam',
loss='mse',metrics=['mae'])
history = model.fit(X_train, y_train,
#validation_data=(X_val, y_val),
batch_size=512,
epochs=5,
verbose=1)
# Prediction output
predictions_df = model.predict(X_train)
predictions_df = predictions_df.reshape(len(predictions_df))
pred_max = predictions_df.max()
pred_min = predictions_df.min()
pred_range = pred_max - pred_min
print(pred_max, pred_min, pred_range)
0.73566914 0.1063129 0.62935627
Proposed solutions
You are trying to solve a regression problem of predicting a value between 0 and 1 (values of 0, 0.25, 0.5, 0.75, and 1), but you are treating it as a binary classification problem by using a sigmoid activation and a binary_crossentropy loss.
What you may want to try is using mse and/or removing any output activation (or better, use relu as suggested by @desertnaut). You could simply be underfitting, as suggested by @xdurch0. Try with and without the regularization as well.
model = models.Sequential()
model.add(layers.Dense(512, input_shape=(X_train.shape[1],)))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1028, activation = 'relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(dropout_rate))
model.add(layers.Dense(1))
model.compile(optimizer='adam', loss='mse')
Check this table to help you with how to use losses and activations for different types of problem settings.
On a side note, given the discrete nature of the values in your dependent variable, y, you can also consider reframing the problem as a multi-class single-label classification problem, if the downstream task allows it.
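The multi-class reframing could look roughly like this on the label side. This is a sketch under assumptions: the sample `y` is invented, and pairing the one-hot labels with a 5-unit softmax output and a categorical cross-entropy loss is the usual convention rather than something stated in the question.

```python
import numpy as np

# Hypothetical sample of targets taking the five values from the question.
y = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 0.5, 0.25])

# Map each target to a class index 0..4 (multiply by 4 and round).
class_ids = np.rint(y * 4).astype(int)

# One-hot encode for use with a 5-unit softmax output layer.
one_hot = np.eye(5)[class_ids]

# Map class indices back to the original 0..1 scale.
y_restored = class_ids / 4.0
```

Predicted class indices can be converted back to the 0 to 1 scale the same way as `y_restored`, by dividing by 4.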
I am trying to use a neural network for my regression problem in Python, but the output of the neural network is a straight horizontal line at zero. I have one input and obviously one output.
Here is my code:
def baseline_model():
# create model
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='normal', activation='relu'))
model.add(Dense(4, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error',metrics=['mse'], optimizer='adam')
model.summary()
return model
# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=64,validation_split = 0.2, verbose=1)
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)
Here are the plots of NN prediction vs. target for both training and test data.
Training Data
Test Data
I have also tried different weight initializers (Xavier and He) with no luck!
I really appreciate your help
First of all, correct the syntax where you add the dense layers to the model: replace the double equals == with a single equals = for kernel_initializer, like below:
model.add(Dense(1, input_dim=1, kernel_initializer ='normal', activation='relu'))
Then, to improve the performance, do the following:
Increase the number of hidden neurons in the hidden layers.
Increase the number of hidden layers.
If you still have the same problem, try changing the optimizer and the activation function. Tuning the hyperparameters may help you converge to a solution.
EDIT 1
You also have to fit the estimator after cross-validation, like below:
estimator.fit(X_train, y_train)
and then you can evaluate on the test data as follows (with a regression metric, since the target is continuous):
prediction = estimator.predict(X_test)
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, prediction)
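Since the target in this question is continuous, a few regression metrics give a fuller picture of test performance. A minimal NumPy sketch; the `y_true` and `y_pred` arrays are invented for illustration and stand in for the test targets and estimator.predict(X_test):

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

# R^2: 1 minus (residual sum of squares / total sum of squares).
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

A prediction that is a flat horizontal line at the target mean has an R^2 of 0 (and a flat line at any other level scores below 0), so R^2 is a quick way to tell whether the network has learned anything beyond a constant.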
I want to use eight features to predict a target feature, and while using Keras I get an accuracy of zero all the time. I am new to machine learning, and I am quite confused.
I have tried different activations. I thought this could be a regression problem, so I used 'linear' as the last activation function, but the accuracy is still zero.
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
import pandas as pd
# Step 2 - Load our data
zeolite_13X_error = pd.read_csv("zeolite_13X_error.csv", delimiter=",")
dataset = zeolite_13X_error.values
X = dataset[:, 0:8]
Y = dataset[:, 10] # Purity
min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)
# Building and training first NN
model = Sequential([
Dense(32, activation='relu', input_shape=(8,)),
Dense(32, activation='relu'),
Dense(1, activation='linear'),
])
model.compile(optimizer='sgd',
loss='binary_crossentropy',
metrics=['accuracy'])
hist = model.fit(X_train, Y_train,
batch_size=32, epochs=10,
validation_data=(X_val, Y_val))
If you decide to treat this as a regression problem, then
Your loss should be mean_squared_error, or some other loss appropriate for regression, but not binary_crossentropy, which is appropriate for binary classification only, and
Accuracy is meaningless here: it is meaningful only in classification settings. In regression settings, we normally use the loss itself for performance evaluation; see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)? for more.
If you decide to tackle this as a classification problem, you should change the activation of your last layer to sigmoid.
In any case, the combination you show here - loss='binary_crossentropy' and activation='linear' for the single-node last layer - is meaningless.
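One way to see why that combination is meaningless: binary cross-entropy takes logarithms of p and 1 - p, which only make sense when p is a probability. The sketch below uses a hand-written NumPy version of the loss (not the Keras implementation) with made-up values:

```python
import numpy as np

def binary_crossentropy(y_true, p):
    """Mean binary cross-entropy for predicted probabilities p."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])

# Probabilities from a sigmoid output: the loss is well defined.
p_sigmoid = np.array([0.9, 0.2, 0.7])
loss_ok = binary_crossentropy(y_true, p_sigmoid)

# Unbounded values from a linear output: log of a negative number is nan.
p_linear = np.array([1.3, -0.2, 0.7])
with np.errstate(invalid="ignore"):
    loss_bad = binary_crossentropy(y_true, p_linear)

print(loss_ok)   # a finite positive number
print(loss_bad)  # nan
```

Real frameworks typically clip the predictions into (0, 1) before taking the logarithm, so they may not literally return nan, but that hides the mismatch rather than fixing it.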
Check the output of your model to inspect the values. The model is predicting probabilities rather than binary 0/1 decisions, which I believe is what you need, since you are using accuracy as a metric. If the model is predicting probabilities, convert them into 0 or 1 by thresholding (at a value of your choice, e.g. if prediction > 0.5 then 1, else 0).
Also increase the number of epochs, and use a sigmoid activation in the output layer.
For a school project, I'm trying to predict data using the keras framework, but it's returning 'nan' loss and values when I try to get predicted data.
Source code :
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)
# create model
model = Sequential()
model.add(Dense(950, input_shape=(425,), activation='relu'))
model.add(Dense(425, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer='sgd')
# Fit the model
model.fit(X_train, y_train, epochs=20, batch_size=1, verbose=1)
#evaluate the model
y_pred = model.predict(X_test)
score = model.evaluate(X_test, y_test,verbose=1)
print(score)
# calculate predictions
predictions = model.predict(X_pred)
Data:
X_train and X_test are pandas DataFrames of 5000 rows (number of samples) by 425 columns (number of dimensions).
y_train and y_test look like:
array([ 1.17899644, 1.46080518, 0.9662137 , ..., 2.40157461,
0.53870386, 1.3192718 ])
Can you help me with that?
Thank you for your help!
Usually, this means that something diverges to infinity. As @desertnaut pointed out in the comments, reducing the learning rate might help.
But the root of the issue is your input data. What do these 425 data points mean? Are they from different sources, different features, different parameters? Finding outliers or normalizing the data could help.
Your code looks fine otherwise.
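The point about the learning rate can be illustrated on a toy problem. This is a deliberately simplified sketch (plain gradient descent on f(w) = w**2, not the network from the question), showing how a step size that is too large makes the parameter grow without bound, which is the kind of blow-up that eventually shows up as nan:

```python
def gradient_descent(lr, steps=100, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad
    return w

w_small_lr = gradient_descent(lr=0.1)   # shrinks towards the minimum at 0
w_large_lr = gradient_descent(lr=1.1)   # overshoots further on every step

print(abs(w_small_lr))  # close to 0
print(abs(w_large_lr))  # very large
```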
Make sure your target output is in the range (0, 1), as you have a sigmoid in the last layer.
Sigmoid has an output between zero and one, so if the target output is not in this range then (a) change the activation function or (b) normalize the outputs into the required range.
Make sure the purpose of this model really is regression.
After considering the points above, play around with the learning rate (decrease it) and the optimiser (try a different one).
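Option (b), normalizing the targets into the range of the sigmoid, might be sketched as follows. Min-max scaling is one possible choice, not the only one; the helper `to_original_scale` is a hypothetical name introduced here, and the array holds only the target values visible in the question:

```python
import numpy as np

# The target values visible in the question; several exceed 1.
y_train = np.array([1.17899644, 1.46080518, 0.9662137,
                    2.40157461, 0.53870386, 1.3192718])

# Min-max scale the targets into [0, 1] so a sigmoid output can reach them.
y_min, y_max = y_train.min(), y_train.max()
y_scaled = (y_train - y_min) / (y_max - y_min)

def to_original_scale(pred_scaled):
    """Invert the scaling for predictions made on the [0, 1] scale."""
    return pred_scaled * (y_max - y_min) + y_min

y_restored = to_original_scale(y_scaled)
```

The model would then be trained on `y_scaled`, and its predictions passed through `to_original_scale` before being compared with the original targets. Option (a), a linear output layer, avoids the extra step.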
Try changing your optimizer to 'Adam' instead of SGD
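You initialized your SGD optimizer in the variable sgd, but you're not using it in compile: the string 'sgd' creates an SGD optimizer with default settings, so pass optimizer=sgd instead.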