I want to use eight features to predict a target feature. While using Keras, I got an accuracy of zero all the time. I am new to machine learning, and I am quite confused.
I have tried different activations; I thought this could be a regression problem, so I used 'linear' as the last activation function, and it turns out that the accuracy is still zero.
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
import pandas as pd
# Step 2 - Load our data
zeolite_13X_error = pd.read_csv("zeolite_13X_error.csv", delimiter=",")
dataset = zeolite_13X_error.values
X = dataset[:, 0:8]
Y = dataset[:, 10] # Purity
min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)
# Building and training first NN
model = Sequential([
    Dense(32, activation='relu', input_shape=(8,)),
    Dense(32, activation='relu'),
    Dense(1, activation='linear'),
])
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])
hist = model.fit(X_train, Y_train,
                 batch_size=32, epochs=10,
                 validation_data=(X_val, Y_val))
If you decide to treat this as a regression problem, then:
- your loss should be mean_squared_error, or some other loss appropriate for regression, but not binary_crossentropy, which is appropriate for binary classification only, and
- accuracy is meaningless - it is meaningful only for classification settings; in regression settings we normally use the loss itself for performance evaluation - see the answer to "What function defines accuracy in Keras when the loss is mean squared error (MSE)?" for more.
If you decide to tackle this as a classification problem, you should change the activation of your last layer to sigmoid.
In any case, the combination you show here - loss='binary_crossentropy' with activation='linear' for the single-node last layer - is meaningless.
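For illustration, here is a minimal sketch of the two valid pairings, assuming the same 8-feature input as above:
# Regression: keep the linear output, use a regression loss, drop the accuracy metric
model = Sequential([
    Dense(32, activation='relu', input_shape=(8,)),
    Dense(32, activation='relu'),
    Dense(1, activation='linear'),
])
model.compile(optimizer='sgd', loss='mean_squared_error')
# Binary classification: sigmoid output with binary cross-entropy
model = Sequential([
    Dense(32, activation='relu', input_shape=(8,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])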
Check the output of your model to inspect the values. The model is predicting probabilities rather than binary 0/1 decisions, which I believe is your case since you are using accuracy as a metric. If the model is predicting probabilities, convert them to 0 or 1 by rounding them against a threshold of your choice (i.e. if prediction > 0.5 then 1, else 0), as sketched below.
Also increase the number of epochs, and use a sigmoid activation in the output layer.
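A minimal sketch of that thresholding step, assuming a single-unit sigmoid output:
import numpy as np
probs = model.predict(X_val)          # shape (n, 1), values in [0, 1]
preds = (probs > 0.5).astype(int)     # hard 0/1 labels at threshold 0.5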
Related
I've never used Keras or TensorFlow before, and was going through this example in the Visual Studio Code documentation, but it seems to have a bug. The documentation shows that their trained model has a 61% accuracy against the test data, which matches what I get when I run it. However, no matter how you modify the neural network parameters, you always get the exact same accuracy. You can even skip the compile and fit commands and still get 61% accuracy.
It turns out that the prediction results they got were all zeroes (which happened to be right 61% of the time against the test data), and no matter how I modify the network it only outputs all zeroes, so it seems like there's some mistake in their code. But since I don't know Keras or TF, I haven't been able to figure out how to make it work.
Here's what I think all the relevant code is, but you can check the link above for everything:
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data[['sex','pclass','age','relatives','fare']], data.survived, test_size=0.2, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(x_train)
X_test = sc.transform(x_test)
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(5, kernel_initializer = 'uniform', activation = 'relu', input_dim = 5))
model.add(Dense(5, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation = 'sigmoid'))
model.compile(optimizer="adam", loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=50)
y_pred = np.argmax(model.predict(X_test), axis=-1)
print(metrics.accuracy_score(y_test, y_pred))
(as mentioned by @Frightera)
np.argmax() is generally used to get the index of the maximum value when there are more than 2 class probabilities. This is a binary classification model, however, and you used a sigmoid activation function in the last layer, which always returns an output value between 0 and 1.
Which means:
for small values (< 0.5), the output is classified as zero (0), and
for large values (> 0.5), the output is classified as one (1).
(With a single-column output, np.argmax(..., axis=-1) returns index 0 for every row, which is why all your predictions came out as zero.)
Hence, you need to replace the final few lines of your code as below:
preds = model.predict(X_test)
y_pred = np.where(preds > 0.5, 1, 0)
#y_pred = np.argmax(model.predict(X_test), axis=-1)
print(metrics.accuracy_score(y_test, y_pred))
Output:
1.0
I am using a dataset from Kaggle focused on heart disease prediction, with 8 inputs and a binary output. I have tested the data using logistic regression, random forests, SVM, and a simple neural network. Surprisingly, the best model is random forests. However, I'm having trouble with the neural network. The result, from the data, is either a 1 or a 0, so binary. The neural network works, but the results are odd, at least to me.
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from keras.models import Sequential
from keras.layers import Dense
NN = keras.Sequential([
keras.layers.Flatten(input_shape=(8,)),
keras.layers.Dense(16, activation=tf.nn.relu),
keras.layers.Dense(16, activation=tf.nn.relu),
keras.layers.Dense(1, activation=tf.nn.sigmoid),
])
NN.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
NN.fit(X_train, y_train, epochs=50, batch_size=1)
test_loss, test_acc = NN.evaluate(X_test, y_test)
y_pred = NN.predict(X_test)
I expect my results to be values between 0 and 1, so anything over 0.5 is 1 and anything less is 0. But my predictions on the test set give me values from 0 to 9. Any idea what I did wrong? Could I be overfitting the data?
I tested the network on a known datapoint:
X_new = [[70,1,0,145,174,0,1,125]]
X_new = sc.fit_transform(X_new)
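# note: fit_transform here re-fits the scaler on this single sample, discarding
# the statistics learned from the training data; sc.transform(X_new) is almost
# certainly what is intended, and may well explain the wrong prediction below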
NN.predict(X_new)
and I get the following result, which is wrong: 0.99, when it should be 0.
I have the below code, which works perfectly for a neural network. I know I need the confusion matrix library to find the false positive and false negative rates, but I'm not sure how to do it as I'm no expert in programming. Can someone help, please?
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
# read the csv file and convert into arrays for the machine to process
df = pd.read_csv('dataset_ori.csv')
dataset = df.values
# split the dataset into input features and the feature to predict
X = dataset[:,0:7]
Y = dataset[:,7]
# scale the dataset with MinMaxScaler so that all the input features lie between 0 and 1
min_max_scaler = preprocessing.MinMaxScaler()
# store the dataset into an array
X_scale = min_max_scaler.fit_transform(X)
# split the dataset into 30% testing and the rest to train
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)
# split the val_and_test size equally to the validation set and the test set.
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)
# specify the sequential model and describe the layers that will form architecture of the neural network
model = Sequential([
    Dense(7, activation='relu', input_shape=(7,)),
    Dense(32, activation='relu'),
    Dense(5, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# training the data
hist = model.fit(X_train, Y_train, batch_size=32, epochs=100, validation_data=(X_val, Y_val))
# evaluate the accuracy of the classifier on the test set
scores = model.evaluate(X_test, Y_test)
print("Accuracy: %.2f%%" % (scores[1]*100))
This is the code provided in the answer below; response and model are both highlighted in red as unresolved references.
Your input to confusion_matrix must be an array of ints, not one-hot encodings or probabilities.
# Predicting the Test set results
y_pred = (model.predict(X_test) > 0.5).astype(int)  # threshold the probabilities into 0/1 labels
matrix = metrics.confusion_matrix(y_test, y_pred)   # argmax is only needed if y_test is one-hot encoded
The raw output comes out as probabilities like the ones below, so applying a probability threshold of 0.5 transforms them to binary labels.
output (y_pred):
[0.87812372 0.77490434 0.30319547 0.84999743]
The sklearn.metrics.accuracy_score(y_true, y_pred) method defines y_pred as:
y_pred : 1d array-like, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.
Which means y_pred has to be an array of 1's and 0's (predicted labels). They should not be probabilities.
The root cause of your error is a theoretical and not a computational issue: you are trying to use a classification metric (accuracy) in a regression (i.e. numeric prediction) model, which is meaningless.
Just like the majority of performance metrics, accuracy compares apples to apples (i.e. true labels of 0/1 with predictions again of 0/1); so, when you ask the function to compare binary true labels (apples) with continuous predictions (oranges), you get an expected error, whose message tells you exactly what the problem is from a computational point of view:
Classification metrics can't handle a mix of binary and continuous target
Although the message doesn't tell you directly that you are trying to compute a metric that is invalid for your problem (and we shouldn't actually expect it to go that far), it is certainly a good thing that scikit-learn at least gives you a direct and explicit warning that you are attempting something wrong; this is not necessarily the case with other frameworks - see, for example, the behavior of Keras in a very similar situation, where you get no warning at all, and one just ends up complaining about low "accuracy" in a regression setting...
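For illustration, a minimal sketch of the Keras behaviour referred to above: the compile call below is accepted without complaint, and training will happily report an "accuracy" value, even though it is meaningless with a regression loss.
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])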
from keras import models
from keras.layers import Dense, Dropout
from keras.utils import to_categorical
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from keras.models import Sequential
from keras.layers import Dense, Activation
from sklearn import metrics
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
# read the csv file and convert into arrays for the machine to process
df = pd.read_csv('dataset_ori.csv')
dataset = df.values
# split the dataset into input features and the feature to predict
X = dataset[:,0:7]
Y = dataset[:,7]
# Splitting into Train and Test Set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(dataset,
response,
test_size = 0.2,
random_state = 0)
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu', input_dim =7 ))
model.add(Dropout(0.5))
# Adding the second hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.5))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 20)
# Train model
scaler = StandardScaler()
classifier.fit(scaler.fit_transform(X_train.values), y_train)
# Summary of neural network
classifier.summary()
# Predicting the Test set results & Giving a threshold probability
y_prediction = classifier.predict_classes(scaler.transform(X_test.values))
print ("\n\naccuracy" , np.sum(y_prediction == y_test) / float(len(y_test)))
y_prediction = (y_prediction > 0.5)
## EXTRA: Confusion Matrix Visualize
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_prediction) # rows = truth, cols = prediction
df_cm = pd.DataFrame(cm, index = (0, 1), columns = (0, 1))
plt.figure(figsize = (10,7))
sn.set(font_scale=1.4)
sn.heatmap(df_cm, annot=True, fmt='g')
print("Test Data Accuracy: %0.4f" % accuracy_score(y_test, y_pred))
#Let's see how our model performed
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
As you have already loaded confusion_matrix from scikit-learn, you can use this one:
cutoff = 0.5
y_pred = model.predict(x_test)            # continuous probabilities
y_pred_classes = np.zeros_like(y_pred)    # initialise a matrix full of zeros
y_pred_classes[y_pred > cutoff] = 1
y_test_classes = np.zeros_like(y_pred)
y_test_classes[y_test > cutoff] = 1
print(confusion_matrix(y_test_classes, y_pred_classes))
For scikit-learn, the confusion matrix is always ordered like this (rows are true classes, columns are predicted classes):
True Negatives    False Positives
False Negatives   True Positives
For the individual counts you can run this:
tn, fp, fn, tp = confusion_matrix(y_test_classes, y_pred_classes).ravel()
(tn, fp, fn, tp)
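From these counts you can then derive the rates asked about in the question (a sketch, using the standard definitions):
fpr = fp / (fp + tn)   # false positive rate
fnr = fn / (fn + tp)   # false negative rate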
I have data like this:
There are 29 columns, out of which I have to predict winPlacePerc (at the extreme end of the dataframe), which is between 1 (high perc) and 0 (low perc).
Of the 29 columns, 25 are numerical data, 3 are IDs (object), and 1 is categorical.
I dropped all the ID columns (since they're all unique) and encoded the categorical (matchType) data into one-hot encoding.
After doing all this I am left with 41 columns (after one-hot).
This is how I am creating the data:
X = df.drop(columns=['winPlacePerc'])
#creating a dataframe with only the target column
y = df[['winPlacePerc']]
Now my X has 40 columns, and this is what my label data looks like:
> y.head()
winPlacePerc
0 0.4444
1 0.6400
2 0.7755
3 0.1667
4 0.1875
I also happen to have a very large amount of data, about 400k rows, so for testing purposes I am training on a fraction of it, using scikit-learn:
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.997, random_state=32)
which gives almost 13k rows for training.
For the model I'm using a Keras sequential model:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers.normalization import BatchNormalization
from keras import optimizers
n_cols = X_train.shape[1]
model = Sequential()
model.add(Dense(40, activation='relu', input_shape=(n_cols,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['accuracy'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
Since my y-label data is between 0 and 1, I'm using a sigmoid layer as my output layer.
This is the training and validation loss and accuracy plot:
I also tried to convert the labels into binary using a step function and the binary cross-entropy loss function.
After that, the y-label data looks like:
> y.head()
winPlacePerc
0 0
1 1
2 1
3 0
4 0
and changed the loss function:
model.compile(loss='binary_crossentropy',
optimizer='Adam',
metrics=['accuracy'])
This method was even worse than the previous one.
As you can see, it's not learning after a certain epoch, and this also happens even if I take all the data rather than a fraction of it.
After this did not work, I also used dropout and tried adding more layers, but nothing works here.
Now my question: what am I doing wrong here? Is it the wrong layers or the data, and how can I improve on this?
To clear things up - this is a regression problem, so using accuracy doesn't really make sense, because you will never be able to predict the exact value of 0.23124.
First of all, you certainly want to normalise your values (not the one-hot encoded ones) before passing them to the network. Try a StandardScaler as a start.
Second, I would recommend changing the activation function in the output layer - try linear - and mean_squared_error should be fine as a loss.
In order to validate your model's "accuracy", plot the predicted values together with the actual ones - this should give you a chance to validate the results visually (see the sketch after the code below). That being said, your loss already looks quite decent.
Check this post; it should give you a good grasp of which activation and loss functions to use, and when.
from sklearn.preprocessing import StandardScaler
n_cols = X_train.shape[1]
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(n_cols,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error',
optimizer='Adam',
metrics=['mean_squared_error'])
model.fit(X_train, y_train,
epochs=50,
validation_split=0.2,
batch_size=20)
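A quick sketch of the visual check suggested above, assuming matplotlib is available and the model above has been fitted (note the test set is scaled with the same fitted scaler):
import matplotlib.pyplot as plt
preds = model.predict(ss.transform(X_test)).ravel()
plt.scatter(y_test, preds, alpha=0.3)
plt.plot([0, 1], [0, 1], 'r--')   # perfect-prediction reference line
plt.xlabel('actual winPlacePerc')
plt.ylabel('predicted winPlacePerc')
plt.show()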
Normalize data
Add more depth to your network
Make the last layer linear
Accuracy is not a good metric for regression. Let's see an example
predictions: [0.9999999, 2.0000001, 3.000001]
ground truth: [1, 2, 3]
Accuracy = no. of correct / total = 0/3 = 0
Accuracy is 0, but the predictions are pretty close to the ground truth. On the other hand, MSE will be very low, indicating that the deviation of the predictions from the ground truth is very small.
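To make the contrast concrete, a quick numeric sketch with these values:
import numpy as np
preds = np.array([0.9999999, 2.0000001, 3.000001])
truth = np.array([1.0, 2.0, 3.0])
accuracy = np.mean(preds == truth)    # 0.0: no exact matches
mse = np.mean((preds - truth) ** 2)   # ~3e-13: essentially perfect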
For a school project, I'm trying to predict data using the Keras framework, but it returns 'nan' loss and values when I try to get the predicted data.
Source code:
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)
# create model
model = Sequential()
model.add(Dense(950, input_shape=(425,), activation='relu'))
model.add(Dense(425, activation='relu'))
model.add(Dense(200, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer='sgd')
# Fit the model
model.fit(X_train, y_train, epochs=20, batch_size=1, verbose=1)
#evaluate the model
y_pred = model.predict(X_test)
score = model.evaluate(X_test, y_test,verbose=1)
print(score)
# calculate predictions
predictions = model.predict(X_pred)
Data:
X_train and X_test are (pandas) DataFrames of 5000 rows (number of samples) by 425 columns (number of dimensions).
y_train and y_test look like:
array([ 1.17899644, 1.46080518, 0.9662137 , ..., 2.40157461,
0.53870386, 1.3192718 ])
Can you help me with that?
Thank you for your help!
Usually, this means that something converges to infinity. As @desertnaut pointed out in the comments, reducing the learning rate might help.
But the root of the issue is your input data. What do these 425 data points mean? Are they from different sources, different features, different parameters? Finding outliers or normalizing the data could help.
Your code looks fine otherwise.
Make sure your target output is in the range (0, 1), as you have a sigmoid in the last layer.
A sigmoid has an output between zero and one, so if the target output is not in this range then either (a) change the activation function or (b) normalize the outputs into the required range, as in the sketch after this list.
Make sure the purpose of this model really is regression.
After considering the above three points, play around with the learning rate (decrease it) and the optimizer (replace it with any other).
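One way to do option (b) is with scikit-learn's MinMaxScaler; a sketch, assuming 1-d NumPy target arrays (remember to invert the scaling when interpreting predictions):
from sklearn.preprocessing import MinMaxScaler
y_scaler = MinMaxScaler()
y_train_s = y_scaler.fit_transform(y_train.reshape(-1, 1))  # fit on the training targets only
y_test_s = y_scaler.transform(y_test.reshape(-1, 1))
# after prediction: y_scaler.inverse_transform(model.predict(X_test))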
Try changing your optimizer to 'Adam' instead of SGD
You initialized your SGD optimizer in the variable sgd, but you're not using it in compile; passing the string 'sgd' creates a fresh optimizer with default settings.
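A minimal sketch of that fix, passing the configured optimizer object instead of the string:
sgd = optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)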