Make predictions on a new dataset - Python

I built a Keras logistic regression model. I am trying to find a way to give my model a new dataset and get predictions for it. The new dataset will have the same shape as the data the model was trained on.
My second question: is there a way to improve the accuracy of my model? My accuracy is 69%, and when I print the classification report I get poor precision for one class.
split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 27, kernel_initializer = 'uniform', activation = 'relu', input_dim = 6))
# Adding the second hidden layer
classifier.add(Dense(units = 27, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 20)
Epoch 1/20
16704/16704 [==============================] - 1s 76us/step - loss: 0.6159 - acc: 0.6959
Epoch 2/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6114 - acc: 0.6967
Epoch 3/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6110 - acc: 0.6964
Epoch 4/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6101 - acc: 0.6965
Epoch 5/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6091 - acc: 0.6961
Epoch 6/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6094 - acc: 0.6963
Epoch 7/20
16704/16704 [==============================] - 1s 68us/step - loss: 0.6086 - acc: 0.6967
Epoch 8/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6083 - acc: 0.6965
Epoch 9/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6081 - acc: 0.6964
Epoch 10/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6082 - acc: 0.6971
Epoch 11/20
16704/16704 [==============================] - 1s 67us/step - loss: 0.6077 - acc: 0.6968
Epoch 12/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6073 - acc: 0.6971
Epoch 13/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6067 - acc: 0.6971
Epoch 14/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6070 - acc: 0.6965
Epoch 15/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6066 - acc: 0.6967
Epoch 16/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6060 - acc: 0.6967
Epoch 17/20
16704/16704 [==============================] - 1s 67us/step - loss: 0.6061 - acc: 0.6968
Epoch 18/20
16704/16704 [==============================] - 1s 67us/step - loss: 0.6062 - acc: 0.6971
Epoch 19/20
16704/16704 [==============================] - 1s 69us/step - loss: 0.6057 - acc: 0.6968
Epoch 20/20
16704/16704 [==============================] - 1s 74us/step - loss: 0.6055 - acc: 0.6973
y_pred = classifier.predict(X_test)
y_pred = [ 1 if y>=0.5 else 0 for y in y_pred ]
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support
           0       0.71      1.00      0.83      2968
           1       0.33      0.00      0.01      1208
   micro avg       0.71      0.71      0.71      4176
   macro avg       0.52      0.50      0.42      4176
weighted avg       0.60      0.71      0.59      4176
I expect to improve my model.
I expect to find a way to make predictions on a new dataset.

To make predictions on the new dataset:
Load the data the same way you load your test set.
Apply all the pre-processing steps that were applied to your training set.
Use the model's prediction function (classifier.predict in the question's code) to make predictions, then carry on with your post-processing.
It's almost the same as predicting on the test set.
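As a rough sketch of those steps: the new rows should be transformed with the statistics learned from the training set (not re-fitted on the new data), passed to the model, and thresholded the same way as the test set. The snippet below uses NumPy only. The names standardize_fit, standardize_apply and fake_predict are hypothetical, fake_predict is just a stand-in for the trained classifier, and all numbers are made up.

```python
import numpy as np

def standardize_fit(X_train):
    """Learn per-column mean and std from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def standardize_apply(X, mean, std):
    """Apply previously learned statistics; nothing is re-fitted here."""
    return (X - mean) / std

def fake_predict(X):
    """Hypothetical stand-in for the trained model: returns probabilities of shape (n, 1)."""
    weights = np.ones(X.shape[1]) / X.shape[1]
    return (1.0 / (1.0 + np.exp(-(X @ weights)))).reshape(-1, 1)

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 6))  # made-up training data, 6 features
X_new = rng.normal(loc=5.0, scale=2.0, size=(4, 6))      # made-up new data, same 6 columns

mean, std = standardize_fit(X_train)
X_new_scaled = standardize_apply(X_new, mean, std)  # transform only, never fit on new data

probabilities = fake_predict(X_new_scaled)
labels = (probabilities.ravel() >= 0.5).astype(int)  # same 0.5 threshold as for the test set
print(labels)
```

With the objects from the question, this would correspond to calling sc.transform on the new data and then classifier.predict on the result, followed by the same thresholding.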


Losses and Accuracy could not improve

I'm trying to train a product detection model with approximately 100,000 training images and 10,000 test images. However, no matter which optimizer I use (I have tried Adam and SGD with multiple learning rates), my loss and accuracy do not improve. Below is my code.
First I read the train images:
train_images = []
idx = 0
for x in train_data.category.tolist():
    # Category folders are zero-padded to two digits (e.g. "07")
    if x < 10:
        x = "0" + str(x)
    else:
        x = str(x)
    path = os.path.join(train_DATADIR, x)
    # Read each image as grayscale and resize it to 100x100
    img_array = cv2.imread(os.path.join(path, str(train_data.filename[idx])), cv2.IMREAD_GRAYSCALE)
    new_array = cv2.resize(img_array, (100, 100))
    train_images.append(new_array)
    idx += 1
    print(f'{idx}/105392 - {(idx/105392)*100:.2f}%')
narray = np.array(train_images)
then i save the train_images data into a binary file + 'train_images_bitmap.npy', narray)
Then I divide the train_images by 255.0:
train_images = train_images / 255.0
and declare my model with 100x100 input nodes, since the images are resized to 100x100:
model = keras.Sequential([
keras.layers.Flatten(input_shape=(100, 100)),
keras.layers.Dense(128, activation='relu'),
Then I compile the model. I tried Adam and SGD (lr from 0.01 up to 0.2, and as low as 0.001).
Next i fit the model with a callback of the epoch, train_labels,epochs=2000)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,monitor='val_acc',
mode='max',save_best_only=True, save_weights_only=True, verbose=1)
But the output across epochs wasn't improving. How can I improve the loss and accuracy? Below is the output for some of the epochs:
Epoch 6/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0249
Epoch 7/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0248
Epoch 8/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0255
Epoch 9/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0251
Epoch 10/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0254
Epoch 11/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0254
Epoch 12/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0243
Epoch 13/2000
3294/3294 [==============================] - 12s 3ms/step - loss: 3.7210 - accuracy: 0.0238
Epoch 14/2000
3294/3294 [==============================] - 11s 3ms/step - loss: 3.7210 - accuracy: 0.0251
Epoch 15/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7209 - accuracy: 0.0253
Epoch 16/2000
3294/3294 [==============================] - 11s 3ms/step - loss: 3.7210 - accuracy: 0.0243
Epoch 17/2000
3294/3294 [==============================] - 12s 4ms/step - loss: 3.7210 - accuracy: 0.0247
Epoch 18/2000
3294/3294 [==============================] - 12s 3ms/step - loss: 3.7210 - accuracy: 0.0247
I don't think the choice of optimizer is the main problem. With only a little experience on the matter, I can only suggest a few things:
For images, I would try using a 2D convolution layer before the dense layer.
Try adding a dropout layer to reduce the possibility of overfitting.
The first layer is 100*100, and a reduction to 128 is perhaps too aggressive (I don't know, but that's at least my intuition). Try increasing from 128 to a larger number, or even add an intermediate layer :)
Perhaps something like:
model = Sequential()

The accuracy of basic Tensorflow model not increasing

I am really new to Data Science/ML and have been working on Tensorflow to implement Linear Regression on California Housing Prices from Kaggle.
I tried to train a model in two different ways:
Using a Sequential model
Custom implementation
In both cases, the loss of the model was really high and I have not been able to understand how to improve it.
Dataset prep
df = pd.read_csv('')
df = df[['total_rooms', 'total_bedrooms', 'median_house_value', 'housing_median_age', 'median_income']]
print('Shape of dataset before removing NAs and duplicates {}'.format(df.shape))
input_train, input_test, target_train, target_test = train_test_split(df['total_rooms'].values, df['median_house_value'].values, test_size=0.2)
scaler = MinMaxScaler()
input_train = input_train.reshape(-1,1)
input_test = input_test.reshape(-1,1)
input_train = scaler.fit_transform(input_train)
input_test = scaler.fit_transform(input_test)
target_train = target_train.reshape(-1,1)
target_train = scaler.fit_transform(target_train)
target_test = target_test.reshape(-1,1)
target_test = scaler.fit_transform(target_test)
print('Number of training input elements {}'.format(input_train.shape))
print('Number of training target elements {}'.format(target_train.shape))
Using Sequential API:
BUFFER = 5000
dataset = tf.data.Dataset.from_tensor_slices((input_train, target_train))
dataset = dataset.shuffle(BUFFER).batch(BATCH_SIZE)
model = tf.keras.Sequential([
tf.keras.layers.Dense(DENSE_UNITS, activation='relu'),
tf.keras.layers.Dense(DENSE_UNITS, activation='relu'),
EPOCH = 5000
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001), loss='mean_squared_error', metrics=['accuracy', 'mse'])
history = model.fit(dataset, epochs=EPOCH, callbacks=[early_stopping])
Final training metrics -
Epoch 1/5000
1635/1635 [==============================] - 13s 8ms/step - loss: 0.0564 - accuracy: 0.0013 - mse: 0.0564
Epoch 2/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0552 - accuracy: 0.0016 - mse: 0.0552
Epoch 3/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0551 - accuracy: 0.0012 - mse: 0.0551
Epoch 4/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0551 - accuracy: 9.1766e-04 - mse: 0.0551
Epoch 5/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0551 - accuracy: 0.0013 - mse: 0.0551
Epoch 6/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0551 - accuracy: 0.0013 - mse: 0.0551
Epoch 7/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0013 - mse: 0.0549
Epoch 8/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0550 - accuracy: 0.0012 - mse: 0.0550
Epoch 9/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0011 - mse: 0.0549
Epoch 10/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0550 - accuracy: 0.0012 - mse: 0.0550
Epoch 11/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0010 - mse: 0.0549
Epoch 12/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0011 - mse: 0.0549
Epoch 13/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0013 - mse: 0.0549
Epoch 14/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0016 - mse: 0.0549
Epoch 15/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0011 - mse: 0.0549
Epoch 16/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0017 - mse: 0.0549
Epoch 17/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0013 - mse: 0.0549
Epoch 18/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 6.1177e-04 - mse: 0.0549
Epoch 19/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 6.1177e-04 - mse: 0.0549
Epoch 20/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 6.1177e-04 - mse: 0.0549
Epoch 21/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0012 - mse: 0.0550
Epoch 22/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0548 - accuracy: 9.7883e-04 - mse: 0.0549
Epoch 23/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0550 - accuracy: 7.3412e-04 - mse: 0.0549
Epoch 24/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 7.9530e-04 - mse: 0.0549
Epoch 25/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0548 - accuracy: 0.0013 - mse: 0.0548
Epoch 26/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 7.9530e-04 - mse: 0.0549
Epoch 27/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 6.7295e-04 - mse: 0.0549
Epoch 28/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0548 - accuracy: 0.0012 - mse: 0.0548
Epoch 29/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0549 - accuracy: 0.0013 - mse: 0.0549
Epoch 30/5000
1635/1635 [==============================] - 7s 4ms/step - loss: 0.0548 - accuracy: 9.7883e-04 - mse: 0.0549
Using custom training
class Linear(object):
    def __init__(self):
        # Y = mX + C
        # Initializing the intercept and the slope
        self.m = tf.Variable(tf.random.normal(shape=()))
        self.C = tf.Variable(tf.random.normal(shape=()))

    def __call__(self, x):
        return self.m * x + self.C

# Defining a MSE loss function
def loss(predicted_y, target_y):
    return tf.reduce_mean(tf.square(predicted_y - target_y))

def train(model, input, output, learning_rate):
    # Record the forward pass so gradients can be taken w.r.t. m and C
    with tf.GradientTape() as tape:
        predicted_y = model(input)
        current_loss = loss(predicted_y, output)
    df_m, df_C = tape.gradient(current_loss, [model.m, model.C])
    # Plain gradient descent update
    model.m.assign_sub(learning_rate * df_m)
    model.C.assign_sub(learning_rate * df_C)

epochs = 5000
model = Linear()
ms, Cs, losses = [], [], []
target_train = target_train.astype('float32')
for epoch in range(epochs):
    # Keep a history of the parameters and loss for printing
    ms.append(model.m.numpy())
    Cs.append(model.C.numpy())
    predicted_y = model(input_train)
    current_loss = loss(predicted_y, target_train)
    losses.append(current_loss)
    train(model, input_train, target_train, 0.1)
    if epoch % 500 == 0:
        print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
              (epoch, ms[-1], Cs[-1], current_loss))
predicted_test = model(input_test[:10])
predicted_loss = loss(predicted_test, target_test[:10])
Final training metrics
Epoch 0: W=-1.86 b=-0.09, loss=0.44381
Epoch 500: W=-1.19 b=0.47, loss=0.06470
Epoch 1000: W=-0.73 b=0.44, loss=0.06034
Epoch 1500: W=-0.39 b=0.42, loss=0.05799
Epoch 2000: W=-0.13 b=0.40, loss=0.05671
Epoch 2500: W=0.05 b=0.39, loss=0.05602
Epoch 3000: W=0.19 b=0.38, loss=0.05565
Epoch 3500: W=0.29 b=0.38, loss=0.05545
Epoch 4000: W=0.36 b=0.37, loss=0.05534
Epoch 4500: W=0.41 b=0.37, loss=0.05528
In your first example, you shouldn't reshape your input into 1D. You transformed your matrix into a long 1D array. So, remove these lines:
input_train = input_train.reshape(-1,1)
input_test = input_test.reshape(-1,1)
Then you will keep the 8 features of your input data. Then, change the first lines of your model like this:
model = tf.keras.Sequential([
Your loss will decrease. After a few epochs I get this:
1644/1652 [============================>.] - ETA: 0s - loss: 0.0144 - accuracy: 0.0434 - mse: 0.0144
I couldn't use your .zip file so I did it differently. Here is all my reproducible code:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
x, y = fetch_california_housing(return_X_y=True)
input_train, _, target_train, _ = train_test_split(x, y)
scaler = MinMaxScaler()
input_train = scaler.fit_transform(input_train)
target_train = target_train.reshape(-1,1)
target_train = scaler.fit_transform(target_train)
dataset = tf.data.Dataset.from_tensor_slices((input_train, target_train))
dataset = dataset.shuffle(5000).batch(32)
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(64, activation='relu'),
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
loss='mean_squared_error', metrics=['accuracy', 'mse'])
history = model.fit(dataset, epochs=50, callbacks=[early_stopping])
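One likely reason the accuracy column stays near zero in the logs above is that accuracy is a classification metric: it counts predictions that match the target exactly (Keras may threshold them first), which almost never happens with continuous house values. The NumPy sketch below uses made-up numbers to compare an exact-match accuracy with MSE and R². The function names are illustrative and are not Keras APIs.

```python
import numpy as np

def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions exactly equal to the target."""
    return float(np.mean(y_true == y_pred))

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up scaled targets and predictions that are close but never identical
y_true = np.array([0.10, 0.35, 0.50, 0.80])
y_pred = np.array([0.12, 0.30, 0.55, 0.78])

acc = exact_match_accuracy(y_true, y_pred)
err = mse(y_true, y_pred)
score = r2(y_true, y_pred)
print(acc, err, score)
```

Here the predictions are good by regression standards (small MSE, R² close to 1), yet the exact-match accuracy is zero, so for a regression model the loss or MSE is the number to watch.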

Keras model does not generalise

I am trying to build a deep learning model on Keras for a test and I am not very good at this. I have a scaled dataset with 128 features and these correspond to 6 different classes.
I have already tried adding/deleting layers and using regularisation such as dropout/L1/L2. My model learns and the training accuracy gets very high, but accuracy on the test set is around 15%.
from tensorflow.keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(128, activation='tanh', input_shape=(128,)))
model.add(Dense(60, activation='tanh'))
model.add(Dense(20, activation='tanh'))
model.add(Dense(6, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy']), train_y, epochs=20, batch_size=32, verbose=1)
6955/6955 [==============================] - 1s 109us/sample - loss: 1.5805 - acc: 0.3865
Epoch 2/20
6955/6955 [==============================] - 0s 71us/sample - loss: 1.1512 - acc: 0.6505
Epoch 3/20
6955/6955 [==============================] - 0s 71us/sample - loss: 0.9191 - acc: 0.7307
Epoch 4/20
6955/6955 [==============================] - 0s 67us/sample - loss: 0.7819 - acc: 0.7639
Epoch 5/20
6955/6955 [==============================] - 0s 66us/sample - loss: 0.6939 - acc: 0.7882
Epoch 6/20
6955/6955 [==============================] - 0s 69us/sample - loss: 0.6284 - acc: 0.8099
Epoch 7/20
6955/6955 [==============================] - 0s 70us/sample - loss: 0.5822 - acc: 0.8240
Epoch 8/20
6955/6955 [==============================] - 1s 73us/sample - loss: 0.5305 - acc: 0.8367
Epoch 9/20
6955/6955 [==============================] - 1s 75us/sample - loss: 0.5130 - acc: 0.8441
Epoch 10/20
6955/6955 [==============================] - 1s 75us/sample - loss: 0.4703 - acc: 0.8591
Epoch 11/20
6955/6955 [==============================] - 1s 73us/sample - loss: 0.4679 - acc: 0.8650
Epoch 12/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.4399 - acc: 0.8705
Epoch 13/20
6955/6955 [==============================] - 1s 80us/sample - loss: 0.4055 - acc: 0.8904
Epoch 14/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.3965 - acc: 0.8874
Epoch 15/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.3964 - acc: 0.8877
Epoch 16/20
6955/6955 [==============================] - 1s 77us/sample - loss: 0.3564 - acc: 0.9048
Epoch 17/20
6955/6955 [==============================] - 1s 80us/sample - loss: 0.3517 - acc: 0.9087
Epoch 18/20
6955/6955 [==============================] - 1s 78us/sample - loss: 0.3254 - acc: 0.9133
Epoch 19/20
6955/6955 [==============================] - 1s 78us/sample - loss: 0.3367 - acc: 0.9116
Epoch 20/20
6955/6955 [==============================] - 1s 76us/sample - loss: 0.3165 - acc: 0.9192
The result I am receiving is 39%. With other models like GBM or XGB I can reach up to 85%.
What am I doing wrong? Any suggestions?

Neural Network with California Housing Data

I tried to code a neural network which is trained on the California housing dataset, which I got from Aurélien Géron's GitHub.
But when I run the code, the network does not train and the loss is nan.
Can someone explain what I did wrong?
Best regards, Robin
Link for the csv file:
My Code:
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
# load dataset
df = pd.read_csv("housing.csv", delimiter=",", header=0)
# split into input (X) and output (Y) variables
Y = df["median_house_value"].values
X = df.drop("median_house_value", axis=1)
# Inland / Not Inland -> True / False = 1 / 0
X["ocean_proximity"] = X["ocean_proximity"]== "INLAND"
X= X.astype(float)
Y= Y.astype(float)
model = Sequential()
model.add(Dense(100, activation="relu", input_dim=9))
model.add(Dense(1, activation="linear"))
# Compile model
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(X, Y, epochs=50, batch_size=1000, verbose=1)
I found the error: there was a missing value in the "total_bedrooms" column.
You need to drop NaN values from your data.
After a quick look at the data, you also need to normalize it (as always with neural nets, to help convergence).
To do this you can use a standard scaler, a min-max scaler, etc.
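Both steps can be sketched with NumPy alone: drop the rows that contain NaN, then standardize each column to zero mean and unit variance. The small array below only stands in for a few housing columns, and its values are illustrative.

```python
import numpy as np

# Illustrative feature rows; the second row has a missing value
X = np.array([
    [880.0, 129.0, 8.3252],
    [7099.0, np.nan, 8.3014],
    [1467.0, 190.0, 7.2574],
    [1274.0, 235.0, 5.6431],
])
Y = np.array([452600.0, 358500.0, 352100.0, 341300.0])

# A single NaN propagates through sums and means, hence loss = nan
has_nan = bool(np.isnan(X.sum()))

# Keep only rows without any NaN, in both inputs and targets
keep = ~np.isnan(X).any(axis=1)
X_clean, Y_clean = X[keep], Y[keep]

# Standardize: zero mean and unit variance per column
X_scaled = (X_clean - X_clean.mean(axis=0)) / X_clean.std(axis=0)
Y_scaled = (Y_clean - Y_clean.mean()) / Y_clean.std()

print(has_nan, X_clean.shape)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

On a pandas DataFrame, df.dropna() followed by scikit-learn's StandardScaler would do the same job.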
NaN values in your DataFrame are causing this behavior. Drop the rows with NaN values and normalize your data:
df = df[~df.isnull().any(axis=1)]
And you will get:
Epoch 1/50
20433/20433 [==============================] - 0s 11us/step - loss: 0.1001
Epoch 2/50
20433/20433 [==============================] - 0s 3us/step - loss: 0.0430
Epoch 3/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0338
Epoch 4/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0288
Epoch 5/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0259
Epoch 6/50
20433/20433 [==============================] - 0s 3us/step - loss: 0.0238
Epoch 7/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0225
Epoch 8/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0218
Epoch 9/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0214
Epoch 10/50
20433/20433 [==============================] - 0s 2us/step - loss: 0.0211

Neural net fails on toy dataset

I have created the following toy dataset:
I am trying to predict the class with a neural net in keras:
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape= (nr_feats,)))
model.add(Dense(units=nr_classes, activation='softmax'))
With nr_feats and nr_classes set to 2.
The neural net can only predict with 50 percent accuracy returning either all 1's or all 2's. Using Logistic Regression results in 100 percent accuracy.
I can not find what is going wrong here.
I have uploaded a notebook to github if you quickly want to try something.
I drastically increased the number of epochs and accuracy finally starts to improve from 0.5 at epoch 72 and converges to 1.0 at epoch 98.
This still seems extremely slow for such a simple dataset.
I am aware it is better to use a single output neuron with sigmoid activation but it's more that I want to understand why it does not work with two output neurons and softmax activation.
I pre-process my dataframe as follows:
from sklearn.preprocessing import LabelEncoder
x_train = df_train.iloc[:,0:-1].values
y_train = df_train.iloc[:, -1]
nr_feats = x_train.shape[1]
nr_classes = y_train.nunique()
label_enc = LabelEncoder()
label_enc.fit(y_train)
y_train = keras.utils.to_categorical(label_enc.transform(y_train), nr_classes)
Training and evaluation:
model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)
accuracy_score(model.predict_classes(x_train), df_train.iloc[:, -1].values)
After changing the output layer to a single neuron with sigmoid activation and using binary_crossentropy loss as modesitt suggested, accuracy still remains at 0.5 for 200 epochs and converges to 1.0 100 epochs later.
Note: Read the "Update" section at the end of my answer if you want the true reason. In this scenario, the other two reasons I have mentioned are only valid when the learning rate is set to a low value (less than 1e-3).
I put together some code. It is very similar to yours but I just cleaned it a little bit and made it simpler for myself. As you can see, I use a dense layer with one unit with a sigmoid activation function for the last layer and just change the optimizer from adam to rmsprop (it is not important that much, you can use adam if you like):
import numpy as np
import random
# generate random data with two features
n_samples = 200
n_feats = 2
cls0 = np.random.uniform(low=0.2, high=0.4, size=(n_samples,n_feats))
cls1 = np.random.uniform(low=0.5, high=0.7, size=(n_samples,n_feats))
x_train = np.concatenate((cls0, cls1))
y_train = np.concatenate((np.zeros((n_samples,)), np.ones((n_samples,))))
# shuffle data because all negatives (i.e. class "0") are first
# and then all positives (i.e. class "1")
indices = np.arange(x_train.shape[0])
np.random.shuffle(indices)
x_train = x_train[indices]
y_train = y_train[indices]
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2, activation='sigmoid', input_shape=(n_feats,)))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=True)
Here is the output:
Layer (type)                 Output Shape              Param #
dense_25 (Dense)             (None, 2)                 6
dense_26 (Dense)             (None, 1)                 3
Total params: 9
Trainable params: 9
Non-trainable params: 0
Epoch 1/5
400/400 [==============================] - 0s 966us/step - loss: 0.7013 - acc: 0.5000
Epoch 2/5
400/400 [==============================] - 0s 143us/step - loss: 0.6998 - acc: 0.5000
Epoch 3/5
400/400 [==============================] - 0s 137us/step - loss: 0.6986 - acc: 0.5000
Epoch 4/5
400/400 [==============================] - 0s 149us/step - loss: 0.6975 - acc: 0.5000
Epoch 5/5
400/400 [==============================] - 0s 132us/step - loss: 0.6966 - acc: 0.5000
As you can see the accuracy never increases from 50%. What if you increase the number of epochs to say 50:
Layer (type)                 Output Shape              Param #
dense_35 (Dense)             (None, 2)                 6
dense_36 (Dense)             (None, 1)                 3
Total params: 9
Trainable params: 9
Non-trainable params: 0
Epoch 1/50
400/400 [==============================] - 0s 1ms/step - loss: 0.6925 - acc: 0.5000
Epoch 2/50
400/400 [==============================] - 0s 136us/step - loss: 0.6902 - acc: 0.5000
Epoch 3/50
400/400 [==============================] - 0s 133us/step - loss: 0.6884 - acc: 0.5000
Epoch 4/50
400/400 [==============================] - 0s 160us/step - loss: 0.6866 - acc: 0.5000
Epoch 5/50
400/400 [==============================] - 0s 140us/step - loss: 0.6848 - acc: 0.5000
Epoch 6/50
400/400 [==============================] - 0s 168us/step - loss: 0.6832 - acc: 0.5000
Epoch 7/50
400/400 [==============================] - 0s 154us/step - loss: 0.6817 - acc: 0.5000
Epoch 8/50
400/400 [==============================] - 0s 146us/step - loss: 0.6802 - acc: 0.5000
Epoch 9/50
400/400 [==============================] - 0s 161us/step - loss: 0.6789 - acc: 0.5000
Epoch 10/50
400/400 [==============================] - 0s 140us/step - loss: 0.6778 - acc: 0.5000
Epoch 11/50
400/400 [==============================] - 0s 177us/step - loss: 0.6766 - acc: 0.5000
Epoch 12/50
400/400 [==============================] - 0s 180us/step - loss: 0.6755 - acc: 0.5000
Epoch 13/50
400/400 [==============================] - 0s 165us/step - loss: 0.6746 - acc: 0.5000
Epoch 14/50
400/400 [==============================] - 0s 128us/step - loss: 0.6736 - acc: 0.5000
Epoch 15/50
400/400 [==============================] - 0s 125us/step - loss: 0.6728 - acc: 0.5000
Epoch 16/50
400/400 [==============================] - 0s 165us/step - loss: 0.6718 - acc: 0.5000
Epoch 17/50
400/400 [==============================] - 0s 161us/step - loss: 0.6710 - acc: 0.5000
Epoch 18/50
400/400 [==============================] - 0s 170us/step - loss: 0.6702 - acc: 0.5000
Epoch 19/50
400/400 [==============================] - 0s 122us/step - loss: 0.6694 - acc: 0.5000
Epoch 20/50
400/400 [==============================] - 0s 110us/step - loss: 0.6686 - acc: 0.5000
Epoch 21/50
400/400 [==============================] - 0s 142us/step - loss: 0.6676 - acc: 0.5000
Epoch 22/50
400/400 [==============================] - 0s 142us/step - loss: 0.6667 - acc: 0.5000
Epoch 23/50
400/400 [==============================] - 0s 149us/step - loss: 0.6659 - acc: 0.5000
Epoch 24/50
400/400 [==============================] - 0s 125us/step - loss: 0.6651 - acc: 0.5000
Epoch 25/50
400/400 [==============================] - 0s 134us/step - loss: 0.6643 - acc: 0.5000
Epoch 26/50
400/400 [==============================] - 0s 143us/step - loss: 0.6634 - acc: 0.5000
Epoch 27/50
400/400 [==============================] - 0s 137us/step - loss: 0.6625 - acc: 0.5000
Epoch 28/50
400/400 [==============================] - 0s 131us/step - loss: 0.6616 - acc: 0.5025
Epoch 29/50
400/400 [==============================] - 0s 119us/step - loss: 0.6608 - acc: 0.5100
Epoch 30/50
400/400 [==============================] - 0s 143us/step - loss: 0.6601 - acc: 0.5025
Epoch 31/50
400/400 [==============================] - 0s 148us/step - loss: 0.6593 - acc: 0.5350
Epoch 32/50
400/400 [==============================] - 0s 161us/step - loss: 0.6584 - acc: 0.5325
Epoch 33/50
400/400 [==============================] - 0s 152us/step - loss: 0.6576 - acc: 0.5700
Epoch 34/50
400/400 [==============================] - 0s 128us/step - loss: 0.6568 - acc: 0.5850
Epoch 35/50
400/400 [==============================] - 0s 155us/step - loss: 0.6560 - acc: 0.5975
Epoch 36/50
400/400 [==============================] - 0s 136us/step - loss: 0.6552 - acc: 0.6425
Epoch 37/50
400/400 [==============================] - 0s 140us/step - loss: 0.6544 - acc: 0.6150
Epoch 38/50
400/400 [==============================] - 0s 120us/step - loss: 0.6538 - acc: 0.6375
Epoch 39/50
400/400 [==============================] - 0s 140us/step - loss: 0.6531 - acc: 0.6725
Epoch 40/50
400/400 [==============================] - 0s 135us/step - loss: 0.6523 - acc: 0.6750
Epoch 41/50
400/400 [==============================] - 0s 136us/step - loss: 0.6515 - acc: 0.7300
Epoch 42/50
400/400 [==============================] - 0s 126us/step - loss: 0.6505 - acc: 0.7450
Epoch 43/50
400/400 [==============================] - 0s 141us/step - loss: 0.6496 - acc: 0.7425
Epoch 44/50
400/400 [==============================] - 0s 162us/step - loss: 0.6489 - acc: 0.7675
Epoch 45/50
400/400 [==============================] - 0s 161us/step - loss: 0.6480 - acc: 0.7775
Epoch 46/50
400/400 [==============================] - 0s 126us/step - loss: 0.6473 - acc: 0.7575
Epoch 47/50
400/400 [==============================] - 0s 124us/step - loss: 0.6464 - acc: 0.7625
Epoch 48/50
400/400 [==============================] - 0s 130us/step - loss: 0.6455 - acc: 0.7950
Epoch 49/50
400/400 [==============================] - 0s 191us/step - loss: 0.6445 - acc: 0.8100
Epoch 50/50
400/400 [==============================] - 0s 163us/step - loss: 0.6435 - acc: 0.8625
The accuracy starts to increase (Note that if you train this model multiple times, each time it may take different number of epochs to reach an acceptable accuracy, anything from 10 to 100 epochs).
Also, in my experiments I noticed that increasing the number of units in the first dense layer, for example to 5 or 10 units, causes the model to be trained faster (i.e. quickly converge).
Why so many epochs needed?
I think it is because of these two reasons (combined):
1) Despite the fact that the two classes are easily separable, your data is made up of random samples, and
2) The number of data points compared to the size of neural net (i.e. number of trainable parameters, which is 9 in example code above) is relatively large.
Therefore, it takes more epochs for the model to learn the weights. It is as though the model is very restricted and needs more and more experience to find the appropriate weights. As evidence, just try increasing the number of units in the first dense layer: you are almost guaranteed to reach an accuracy of over 90% in fewer than 10 epochs each time you train this model. Here you increase the capacity, and therefore the model converges (i.e. trains) much faster. Note that it starts to overfit if the capacity is too high or you train the model for too many epochs, so you should have a validation scheme to monitor this.
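The figure of 9 trainable parameters can be checked by hand: a dense layer has one weight per input/unit pair plus one bias per unit. A small sketch follows (the helper name dense_params is made up for illustration, and the 10-unit variant is just one example of a wider first layer):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(2, 2)   # 2 features -> 2 units
output = dense_params(2, 1)   # 2 hidden units -> 1 output unit
total = hidden + output       # matches "Total params" in the model summary

# Same network with 10 units in the first dense layer
wider = dense_params(2, 10) + dense_params(10, 1)

print(hidden, output, total, wider)  # 6 3 9 41
```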
Side note:
Don't set the high argument to a number less than the low argument in numpy.random.uniform since, according to the documentation, the results will be "officially undefined" in this case.
One more important thing here (maybe the most important thing in this scenario) is the learning rate of the optimizer. If the learning rate is too low, the model converges slowly. Try increasing the learning rate, and you will see that you reach an accuracy of 100% in fewer than 5 epochs:
from keras import optimizers
# or you may use adam
The issue is that your labels are 1 and 2 instead of 0 and 1. Keras will not raise an error when it sees 2, but it is not capable of predicting 2.
Subtract 1 from all your y values. As a side note, it is common in deep learning to use 1 neuron with sigmoid for binary classification (0 or 1) vs 2 classes with softmax. Finally, use binary_crossentropy for the loss for binary classification problems.
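A minimal sketch of that relabelling, using NumPy only: np.unique with return_inverse=True maps arbitrary label values to consecutive integers starting at 0, which is what a one-hot encoding with two columns expects. The sample labels are made up.

```python
import numpy as np

y_raw = np.array([1, 2, 2, 1, 2])  # made-up labels using the values 1 and 2

# classes holds the sorted distinct labels; y_zero_based indexes into it
classes, y_zero_based = np.unique(y_raw, return_inverse=True)

nr_classes = len(classes)
y_one_hot = np.eye(nr_classes)[y_zero_based]  # one row per sample, one column per class

print(y_zero_based)  # [0 1 1 0 1]
print(y_one_hot.shape)  # (5, 2)
```

For a single sigmoid output with binary_crossentropy, y_zero_based can be used directly as the target. The one-hot form is for the two-unit softmax variant.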
