Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Improve this question
I'm trying to learn some machine learning and after looking up some tutorials I managed to train a linear regression and second degree equation with acceptable precision. I then decided to step it up a notch and try with: y = x^3 + 9x^2 .
Since now everything worked fine, but with this new set my loss remains above 100k all the time and predictions are off by about +-100.
Here is a list of the things i tried:
Increase number or layers
Increase number of neurons
Increase number of layers and neurons
Vary batch size
Increase and decrease learning rate
Divided the number of epochs by 3 and trained him 3 times while feeding him a random data set each time
Remove the kernel_regularizer (still have to understand what this does)
None of this solutions worked, each time loss was above 100k. Moreover I noticed that it's not a steady decrease, the resulting loss looks pretty random going from 100k to 800k and down again to 400k and then up to 1 million and down again....you can only notice that the average loss is going down but it's still hard to tell in that randomness
Some examples:
Epoch 832/10000
32/32 [==============================] - 0s 3ms/step - loss: 757260.0625 - val_loss: 624795.0000
Epoch 833/10000
32/32 [==============================] - 0s 3ms/step - loss: 784539.6250 - val_loss: 257286.3906
Epoch 834/10000
32/32 [==============================] - 0s 3ms/step - loss: 481110.4688 - val_loss: 246353.5469
Epoch 835/10000
32/32 [==============================] - 0s 3ms/step - loss: 383954.2812 - val_loss: 508324.5312
Epoch 836/10000
32/32 [==============================] - 0s 3ms/step - loss: 516217.7188 - val_loss: 543258.3750
Epoch 837/10000
32/32 [==============================] - 0s 3ms/step - loss: 1042559.3125 - val_loss: 1702137.1250
Epoch 838/10000
32/32 [==============================] - 0s 3ms/step - loss: 3192045.2500 - val_loss: 1154483.5000
Epoch 839/10000
32/32 [==============================] - 0s 3ms/step - loss: 1195508.7500 - val_loss: 4658847.0000
Epoch 840/10000
32/32 [==============================] - 0s 3ms/step - loss: 1251505.8750 - val_loss: 275300.7188
Epoch 841/10000
32/32 [==============================] - 0s 3ms/step - loss: 294105.2188 - val_loss: 330317.0000
Epoch 842/10000
32/32 [==============================] - 0s 3ms/step - loss: 528083.4375 - val_loss: 4624526.0000
Epoch 843/10000
32/32 [==============================] - 0s 4ms/step - loss: 3371695.2500 - val_loss: 2008547.0000
Epoch 844/10000
32/32 [==============================] - 0s 3ms/step - loss: 723132.8125 - val_loss: 884099.5625
Epoch 845/10000
32/32 [==============================] - 0s 3ms/step - loss: 635335.8750 - val_loss: 372132.1562
Epoch 846/10000
32/32 [==============================] - 0s 3ms/step - loss: 424794.2812 - val_loss: 349575.8438
Epoch 847/10000
32/32 [==============================] - 0s 3ms/step - loss: 266175.3125 - val_loss: 247624.6719
Epoch 848/10000
32/32 [==============================] - 0s 3ms/step - loss: 387106.7500 - val_loss: 1091736.7500
This was my original (and cleaner) code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from time import sleep
model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1], kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=1)])
lr = 1e-1
decay = lr/10000
optimizer = keras.optimizers.Adam(lr=lr, decay=decay)
model.compile(optimizer=optimizer, loss='mean_squared_error')
xs = np.random.random((10000, 1)) * 100 - 50;
ys = xs**3 + 9*xs**2
model.fit(xs, ys, epochs=10000, batch_size=256, validation_split=0.2)
print(model.predict([10.0]))
resp = input('Want to save model? y/n: ')
if resp == 'y':
model.save('zig-zag')
I also found this question where the reported solution would be to use relu, but I already had that implemented and copying the code didn't work either.
Am I missing something? What and why?
For numerical reasons neural networks often dont play nice with somewhat unbounded very large numbers. So just reducing the range of values for x from -50..50 to -5..5 will let your model train.
For your case you also want to remove the l2-regularizer since you cant really overfit here and definitely not have a decay of 1e-5. I gave it a go with lr=1e-2 and decay=lr/2
Epoch 1000/1000
32/32 [==============================] - 0s 2ms/step - loss: 0.1471 - val_loss: 0.1370
Full code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from time import sleep
model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1]),
keras.layers.Dense(units=8, activation='relu'),
keras.layers.Dense(units=8, activation='relu'),
keras.layers.Dense(units=8, activation='relu'),
keras.layers.Dense(units=1)])
lr = 1e-2
decay = lr/2
optimizer = keras.optimizers.Adam(lr=lr, decay=decay)
model.compile(optimizer=optimizer, loss='mean_squared_error')
xs = np.random.random((10000, 1)) * 10 - 5
ys = xs**3 + 9*xs**2
print(np.shape(xs))
print(np.shape(ys))
model.fit(xs, ys, epochs=1000, batch_size=256, validation_split=0.2)
print(model.predict([4.0]))
Related
Why does the error of my NN not divergate to zero when my input reveals the result? I always set input[2] to the right result, so the NN should set all weights to 0, except this one.
from random import random
import numpy
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import keras
datax = []
datay = []
for i in range(100000):
input = []
for j in range(1000):
input.append(random())
yval=random()
# should be found out by the nn that input[2] is always the correct output
input[2] = yval
datax.append(input)
datay.append(yval)
datax = numpy.array(datax)
datay = numpy.array(datay)
model = Sequential()
model.add(Dense(10))
model.add(Dense(10))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer=keras.optimizers.Adam())
model.fit(datax, datay, epochs=100, batch_size=32, verbose=1)
it oscillates around e-05 but never gets really better than that
Epoch 33/100
3125/3125 [==============================] - 4s 1ms/step - loss: 1.2802e-04
Epoch 34/100
3125/3125 [==============================] - 4s 1ms/step - loss: 3.7720e-05
Epoch 35/100
3125/3125 [==============================] - 4s 1ms/step - loss: 4.0858e-05
Epoch 36/100
3125/3125 [==============================] - 4s 1ms/step - loss: 8.5453e-05
Epoch 37/100
3125/3125 [==============================] - 5s 1ms/step - loss: 5.5722e-05
Epoch 38/100
3125/3125 [==============================] - 5s 1ms/step - loss: 3.6459e-05
Epoch 39/100
3125/3125 [==============================] - 5s 1ms/step - loss: 1.3339e-05
Epoch 40/100
3125/3125 [==============================] - 5s 1ms/step - loss: 5.8943e-05
...
Epoch 100/100
3125/3125 [==============================] - 4s 1ms/step - loss: 1.5929e-05
The step of the gradient descent method is calculated as gradient multiplied by learning rate. So theoretically - you can not reach minimum of loss function.
Try decaying learning rate though (decaying to zero). If you are lucky - I think it could be possible because of descrete nature of float types.
I tried to train a CNN to classify 9 class of image. Each class has 1000 image for training. I tried training on VGG16 and VGG19, both can achieve validation accuracy of 90%. But when I tried to train on InceptionResNetV2 model, the model seems to stuck around 20% and 30%. Below is my code for InceptionResNetV2 and the training. What can I do to improve the training?
base_model = tf.keras.applications.InceptionResNetV2(input_shape=(IMG_HEIGHT, IMG_WIDTH ,3),weights = 'imagenet',include_top=False)
base_model.trainable = False
model = tf.keras.Sequential([
base_model,
Flatten(),
Dense(1024, activation = 'relu', kernel_regularizer=regularizers.l2(0.001)),
LeakyReLU(alpha=0.4),
Dropout(0.5),
BatchNormalization(),
Dense(1024, activation = 'relu', kernel_regularizer=regularizers.l2(0.001)),
LeakyReLU(alpha=0.4),
Dense(9, activation = 'softmax')])
optimizer_model = tf.keras.optimizers.Adam(learning_rate=0.0001, name='Adam', decay=0.00001)
loss_model = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer_model, loss="categorical_crossentropy", metrics=['accuracy'])
Epoch 1/10
899/899 [==============================] - 255s 283ms/step - loss: 4.3396 - acc: 0.3548 - val_loss: 4.2744 - val_acc: 0.3874
Epoch 2/10
899/899 [==============================] - 231s 257ms/step - loss: 3.5856 - acc: 0.4695 - val_loss: 3.9151 - val_acc: 0.3816
Epoch 3/10
899/899 [==============================] - 225s 250ms/step - loss: 3.1451 - acc: 0.4959 - val_loss: 4.8801 - val_acc: 0.2425
Epoch 4/10
899/899 [==============================] - 227s 252ms/step - loss: 2.7771 - acc: 0.5124 - val_loss: 3.7167 - val_acc: 0.3023
Epoch 5/10
899/899 [==============================] - 231s 257ms/step - loss: 2.4993 - acc: 0.5260 - val_loss: 3.7276 - val_acc: 0.3770
Epoch 6/10
899/899 [==============================] - 227s 252ms/step - loss: 2.3148 - acc: 0.5251 - val_loss: 3.7677 - val_acc: 0.3115
Epoch 7/10
899/899 [==============================] - 234s 260ms/step - loss: 2.1381 - acc: 0.5379 - val_loss: 3.4867 - val_acc: 0.2862
Epoch 8/10
899/899 [==============================] - 230s 256ms/step - loss: 2.0091 - acc: 0.5367 - val_loss: 4.1032 - val_acc: 0.3080
Epoch 9/10
899/899 [==============================] - 225s 251ms/step - loss: 1.9155 - acc: 0.5399 - val_loss: 4.1270 - val_acc: 0.2954
Epoch 10/10
899/899 [==============================] - 232s 258ms/step - loss: 1.8349 - acc: 0.5508 - val_loss: 4.3918 - val_acc: 0.2276
VGG-16/19 has a depth of 23/26 layers, whereas, InceptionResNetV2 has a depth of 572 layers. Now, there is minimal domain similarity between medical images and imagenet dataset. In VGG, due to low depth the features you're getting are not that complex and network is able to classify it on the basis of Dense layer features. However, in IRV2 network, as it's too much deep, the output of the fc layer is more complex (visualize it something object like but for imagenet dataset), and, then the features obtained from these layers are unable to connect to the Dense layer features, and, hence overfitting. I think you were able to get my point.
Check out my answer to very similar question of yours on this link: Link. It will help improve your accuracy.
I am having 0.3 million image in my Train set - Male/Female and around ~50K image in the test set - Male/Female . I am using below to work , also tried to add few more layers and more units . Also, I am doing data augmentation and others provided from keras docs.
targetSize =64
classifier.add(Conv2D(filters = 32,kernel_size =(3,3),input_shape=(targetSize,targetSize,3),activation ='relu'))
classifier.add(MaxPooling2D(pool_size = (2,2)))
classifier.add(Conv2D(filters = 32,kernel_size =(3,3),activation ='relu'))
classifier.add(MaxPooling2D(pool_size = (2,2)))
classifier.add(Conv2D(filters = 32,kernel_size =(3,3),activation ='relu'))
classifier.add(MaxPooling2D(pool_size = (2,2)))
classifier.add(Conv2D(filters = 32,kernel_size =(3,3),activation ='relu'))
classifier.add(MaxPooling2D(pool_size = (2,2)))
classifier.add(Flatten())
classifier.add(Dropout(rate = 0.6))
classifier.add(Dense(units = 64, activation='relu'))
classifier.add(Dropout(rate = 0.5))
classifier.add(Dense(units = 64, activation='relu'))
classifier.add(Dropout(rate = 0.2))
classifier.add(Dense(units = 1,activation='sigmoid')
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
height_shift_range = 0.2,
width_shift_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('<train_folder_loc>',
target_size = (img_size, img_size),
batch_size = batch_size_train,
class_mode = 'binary')
test_set = test_datagen.flow_from_directory('<test_folder_loc>',
target_size = (img_size, img_size),
batch_size = batch_size_test,
class_mode = 'binary')
classifier.fit_generator(training_set,
steps_per_epoch = <train_image_count>/batch_size_train,
epochs = n_epoch,
validation_data = test_set,
validation_steps = <test_image_count>/batch_size_test,
use_multiprocessing = True,
workers=<mycpu>)
But with many combinations tried I am getting result like below , train acc and val acc is not moving ahead . I tried till 100 epoch and its almost like same.
11112/11111 [==============================] - 156s 14ms/step - loss: 0.5628 - acc: 0.7403 - val_loss: 0.6001 - val_acc: 0.6967
Epoch 2/25
11112/11111 [==============================] - 156s 14ms/step - loss: 0.5516 - acc: 0.7403 - val_loss: 0.6096 - val_acc: 0.6968
Epoch 3/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5472 - acc: 0.7404 - val_loss: 0.5837 - val_acc: 0.6967
Epoch 4/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5437 - acc: 0.7408 - val_loss: 0.5850 - val_acc: 0.6978
Epoch 5/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5409 - acc: 0.7417 - val_loss: 0.5844 - val_acc: 0.6991
Epoch 6/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5386 - acc: 0.7420 - val_loss: 0.5828 - val_acc: 0.7011
Epoch 7/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5372 - acc: 0.7427 - val_loss: 0.5856 - val_acc: 0.6984
Epoch 8/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5347 - acc: 0.7437 - val_loss: 0.5847 - val_acc: 0.7017
Epoch 9/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5331 - acc: 0.7444 - val_loss: 0.5770 - val_acc: 0.7017
Epoch 10/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5323 - acc: 0.7443 - val_loss: 0.5803 - val_acc: 0.7037
Epoch 11/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5309 - acc: 0.7453 - val_loss: 0.5877 - val_acc: 0.7018
Epoch 12/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5294 - acc: 0.7454 - val_loss: 0.5774 - val_acc: 0.7037
Epoch 13/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5282 - acc: 0.7464 - val_loss: 0.5807 - val_acc: 0.7024
Epoch 14/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5276 - acc: 0.7467 - val_loss: 0.5815 - val_acc: 0.7033
Epoch 15/25
11112/11111 [==============================] - 156s 14ms/step - loss: 0.5269 - acc: 0.7474 - val_loss: 0.5753 - val_acc: 0.7038
Epoch 16/25
11112/11111 [==============================] - 154s 14ms/step - loss: 0.5263 - acc: 0.7477 - val_loss: 0.5825 - val_acc: 0.7039
Epoch 17/25
11112/11111 [==============================] - 155s 14ms/step - loss: 0.5249 - acc: 0.7485 - val_loss: 0.5821 - val_acc: 0.7037
I need your suggestion on this or any snippet to try .
Make sure you are overfitting on a small sample before trying to extend the network.
I would remove some/all of the Dropout layers and see if it improves performance. I think 3 Dropout layers is quite high.
Try reducing the learning rate.
Try and understand some of the basic principles of CNNs and how they are constructed; implement a simple one which works before arbitrarily putting in your own parameters.
For example, typically the number of filters in successive convolutions increases in powers of two (e.g. 32, 64, 128 etc). Your use of dropout also is questionable, 0.6 is very high, not to mention stacking the three dropouts like you have doesn't make any sense.
Hmm if you look at it closely, its not that its not moviing. it is moving a bit. There are times when models only get better at a certain point no matter how long you train it, or even how much more layers you add. When that happens, it all boils down to the data. I think it would be best to determine what is hindering your model to improve. Also, my friend, training a good model doesn't happen overnight specially with real world data, much more with complex data such as images of humans.
I guess, if you are just following a tutorial which has achieved a better score than yours, you could check the version of packages their using, the data that you have, the steps they took and much more importantly the re run the model. There are instances where models could get different scores on different instances of training.
I suggest you should try playing with the layers more, or even use a different type of Neural Network. If not, you should try playing with your data more. 300k images are a lot but when it comes to image classification, it could be really hard.
Finally, I guess you could look into transfer learning by tensorflow. You can read about it there. It works by retraining pre-made image recognition models. Keras has a tutorial on Transfer learning too.
I am training a Keras model to predict availability of bike-sharing stations. I am giving in the training set a whole row with day of the year, time, weekday, station and free bikes. Each sample contains the availability for the previous day (144 samples) and I am trying to predict the availability for the next day (144 samples). The shapes for the sets used are
Train X (2362, 144, 5)
Train Y (2362, 144)
Test X (39, 144, 5)
Test Y (39, 144)
Validation X (1535, 144, 5)
Validation Y (1535, 144)
The model I am using is this one
model.add(LSTM(20, input_shape=(self.train_x.shape[1], self.train_x.shape[2]), return_sequences = True))
model.add(Dropout(0.2))
model.add(LSTM(20))
model.add(Dense(144))
model.compile(loss='mse', optimizer='adam', metrics = ['acc', 'mape', 'mse'])
history = self.model.fit(self.train_x, self.train_y, batch_size=50, epochs=20, validation_data=(self.validation_x, self.validation_y), verbose=1, shuffle = True)
The predictions made after training have nothing to do with the expected output, they have like a sawtooth shape with values that exceed the original size.
The accuracy rarely goes up but loss has a normal shape.
As an example the history after each epoch looks like this
Epoch 17/20
2362/2362 [==============================] - 12s 5ms/step - loss: 9.1214 - acc: 0.0000e+00 - mean_absolute_percentage_error: 21925846.0813 - mean_squared_error: 9.1214 - val_loss: 9.0642 - val_acc: 0.0000e+00 - val_mean_absolute_percentage_error: 24162847.3779 - val_mean_squared_error: 9.0642
Epoch 18/20
2362/2362 [==============================] - 12s 5ms/step - loss: 8.2241 - acc: 0.0013 - mean_absolute_percentage_error: 21906919.9136 - mean_squared_error: 8.2241 - val_loss: 8.1923 - val_acc: 0.0000e+00 - val_mean_absolute_percentage_error: 22754663.8013 - val_mean_squared_error: 8.1923
Epoch 19/20
2362/2362 [==============================] - 12s 5ms/step - loss: 7.4190 - acc: 0.0000e+00 - mean_absolute_percentage_error: 21910003.1744 - mean_squared_error: 7.4190 - val_loss: 7.3926 - val_acc: 0.0000e+00 - val_mean_absolute_percentage_error: 24673277.8420 - val_mean_squared_error: 7.3926
Epoch 20/20
2362/2362 [==============================] - 12s 5ms/step - loss: 6.7067 - acc: 0.0013 - mean_absolute_percentage_error: 22076339.2168 - mean_squared_error: 6.7067 - val_loss: 6.6758 - val_acc: 6.5147e-04 - val_mean_absolute_percentage_error: 22987089.8436 - val_mean_squared_error: 6.6758
I really don't know where the problem might be, more layers?, less layers?, different approach?
UPDATE: Plots of training/test data. Left part of the plot shows the previous day of availability that is fed to the model and the right part shows what the result should be and the prediction made by the model.
I have created the following toy dataset:
I am trying to predict the class with a neural net in keras:
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape= (nr_feats,)))
model.add(Dense(units=nr_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
With nr_feats and nr_classes set to 2.
The neural net can only predict with 50 percent accuracy returning either all 1's or all 2's. Using Logistic Regression results in 100 percent accuracy.
I can not find what is going wrong here.
I have uploaded a notebook to github if you quickly want to try something.
EDIT 1
I drastically increased the number of epochs and accuracy finally starts to improve from 0.5 at epoch 72 and converges to 1.0 at epoch 98.
This still seems extremely slow for such a simple dataset.
I am aware it is better to use a single output neuron with sigmoid activation but it's more that I want to understand why it does not work with two output neurons and softmax activation.
I pre-process my dataframe as follows:
from sklearn.preprocessing import LabelEncoder
x_train = df_train.iloc[:,0:-1].values
y_train = df_train.iloc[:, -1]
nr_feats = x_train.shape[1]
nr_classes = y_train.nunique()
label_enc = LabelEncoder()
label_enc.fit(y_train)
y_train = keras.utils.to_categorical(label_enc.transform(y_train), nr_classes)
Training and evaluation:
model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)
accuracy_score(model.predict_classes(x_train), df_train.iloc[:, -1].values)
EDIT 2
After changing the output layer to a single neuron with sigmoid activation and using binary_crossentropy loss as modesitt suggested, accuracy still remains at 0.5 for 200 epochs and converges to 1.0 100 epochs later.
Note: Read the "Update" section at the end of my answer if you want the true reason. In this scenario, the other two reasons I have mentioned are only valid when the learning rate is set to a low value (less than 1e-3).
I put together some code. It is very similar to yours but I just cleaned it a little bit and made it simpler for myself. As you can see, I use a dense layer with one unit with a sigmoid activation function for the last layer and just change the optimizer from adam to rmsprop (it is not important that much, you can use adam if you like):
import numpy as np
import random
# generate random data with two features
n_samples = 200
n_feats = 2
cls0 = np.random.uniform(low=0.2, high=0.4, size=(n_samples,n_feats))
cls1 = np.random.uniform(low=0.5, high=0.7, size=(n_samples,n_feats))
x_train = np.concatenate((cls0, cls1))
y_train = np.concatenate((np.zeros((n_samples,)), np.ones((n_samples,))))
# shuffle data because all negatives (i.e. class "0") are first
# and then all positives (i.e. class "1")
indices = np.arange(x_train.shape[0])
np.random.shuffle(indices)
x_train = x_train[indices]
y_train = y_train[indices]
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2, activation='sigmoid', input_shape=(n_feats,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=True)
Here is the output:
Layer (type) Output Shape Param #
=================================================================
dense_25 (Dense) (None, 2) 6
_________________________________________________________________
dense_26 (Dense) (None, 1) 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
400/400 [==============================] - 0s 966us/step - loss: 0.7013 - acc: 0.5000
Epoch 2/5
400/400 [==============================] - 0s 143us/step - loss: 0.6998 - acc: 0.5000
Epoch 3/5
400/400 [==============================] - 0s 137us/step - loss: 0.6986 - acc: 0.5000
Epoch 4/5
400/400 [==============================] - 0s 149us/step - loss: 0.6975 - acc: 0.5000
Epoch 5/5
400/400 [==============================] - 0s 132us/step - loss: 0.6966 - acc: 0.5000
As you can see the accuracy never increases from 50%. What if you increase the number of epochs to say 50:
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 2) 6
_________________________________________________________________
dense_36 (Dense) (None, 1) 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
400/400 [==============================] - 0s 1ms/step - loss: 0.6925 - acc: 0.5000
Epoch 2/50
400/400 [==============================] - 0s 136us/step - loss: 0.6902 - acc: 0.5000
Epoch 3/50
400/400 [==============================] - 0s 133us/step - loss: 0.6884 - acc: 0.5000
Epoch 4/50
400/400 [==============================] - 0s 160us/step - loss: 0.6866 - acc: 0.5000
Epoch 5/50
400/400 [==============================] - 0s 140us/step - loss: 0.6848 - acc: 0.5000
Epoch 6/50
400/400 [==============================] - 0s 168us/step - loss: 0.6832 - acc: 0.5000
Epoch 7/50
400/400 [==============================] - 0s 154us/step - loss: 0.6817 - acc: 0.5000
Epoch 8/50
400/400 [==============================] - 0s 146us/step - loss: 0.6802 - acc: 0.5000
Epoch 9/50
400/400 [==============================] - 0s 161us/step - loss: 0.6789 - acc: 0.5000
Epoch 10/50
400/400 [==============================] - 0s 140us/step - loss: 0.6778 - acc: 0.5000
Epoch 11/50
400/400 [==============================] - 0s 177us/step - loss: 0.6766 - acc: 0.5000
Epoch 12/50
400/400 [==============================] - 0s 180us/step - loss: 0.6755 - acc: 0.5000
Epoch 13/50
400/400 [==============================] - 0s 165us/step - loss: 0.6746 - acc: 0.5000
Epoch 14/50
400/400 [==============================] - 0s 128us/step - loss: 0.6736 - acc: 0.5000
Epoch 15/50
400/400 [==============================] - 0s 125us/step - loss: 0.6728 - acc: 0.5000
Epoch 16/50
400/400 [==============================] - 0s 165us/step - loss: 0.6718 - acc: 0.5000
Epoch 17/50
400/400 [==============================] - 0s 161us/step - loss: 0.6710 - acc: 0.5000
Epoch 18/50
400/400 [==============================] - 0s 170us/step - loss: 0.6702 - acc: 0.5000
Epoch 19/50
400/400 [==============================] - 0s 122us/step - loss: 0.6694 - acc: 0.5000
Epoch 20/50
400/400 [==============================] - 0s 110us/step - loss: 0.6686 - acc: 0.5000
Epoch 21/50
400/400 [==============================] - 0s 142us/step - loss: 0.6676 - acc: 0.5000
Epoch 22/50
400/400 [==============================] - 0s 142us/step - loss: 0.6667 - acc: 0.5000
Epoch 23/50
400/400 [==============================] - 0s 149us/step - loss: 0.6659 - acc: 0.5000
Epoch 24/50
400/400 [==============================] - 0s 125us/step - loss: 0.6651 - acc: 0.5000
Epoch 25/50
400/400 [==============================] - 0s 134us/step - loss: 0.6643 - acc: 0.5000
Epoch 26/50
400/400 [==============================] - 0s 143us/step - loss: 0.6634 - acc: 0.5000
Epoch 27/50
400/400 [==============================] - 0s 137us/step - loss: 0.6625 - acc: 0.5000
Epoch 28/50
400/400 [==============================] - 0s 131us/step - loss: 0.6616 - acc: 0.5025
Epoch 29/50
400/400 [==============================] - 0s 119us/step - loss: 0.6608 - acc: 0.5100
Epoch 30/50
400/400 [==============================] - 0s 143us/step - loss: 0.6601 - acc: 0.5025
Epoch 31/50
400/400 [==============================] - 0s 148us/step - loss: 0.6593 - acc: 0.5350
Epoch 32/50
400/400 [==============================] - 0s 161us/step - loss: 0.6584 - acc: 0.5325
Epoch 33/50
400/400 [==============================] - 0s 152us/step - loss: 0.6576 - acc: 0.5700
Epoch 34/50
400/400 [==============================] - 0s 128us/step - loss: 0.6568 - acc: 0.5850
Epoch 35/50
400/400 [==============================] - 0s 155us/step - loss: 0.6560 - acc: 0.5975
Epoch 36/50
400/400 [==============================] - 0s 136us/step - loss: 0.6552 - acc: 0.6425
Epoch 37/50
400/400 [==============================] - 0s 140us/step - loss: 0.6544 - acc: 0.6150
Epoch 38/50
400/400 [==============================] - 0s 120us/step - loss: 0.6538 - acc: 0.6375
Epoch 39/50
400/400 [==============================] - 0s 140us/step - loss: 0.6531 - acc: 0.6725
Epoch 40/50
400/400 [==============================] - 0s 135us/step - loss: 0.6523 - acc: 0.6750
Epoch 41/50
400/400 [==============================] - 0s 136us/step - loss: 0.6515 - acc: 0.7300
Epoch 42/50
400/400 [==============================] - 0s 126us/step - loss: 0.6505 - acc: 0.7450
Epoch 43/50
400/400 [==============================] - 0s 141us/step - loss: 0.6496 - acc: 0.7425
Epoch 44/50
400/400 [==============================] - 0s 162us/step - loss: 0.6489 - acc: 0.7675
Epoch 45/50
400/400 [==============================] - 0s 161us/step - loss: 0.6480 - acc: 0.7775
Epoch 46/50
400/400 [==============================] - 0s 126us/step - loss: 0.6473 - acc: 0.7575
Epoch 47/50
400/400 [==============================] - 0s 124us/step - loss: 0.6464 - acc: 0.7625
Epoch 48/50
400/400 [==============================] - 0s 130us/step - loss: 0.6455 - acc: 0.7950
Epoch 49/50
400/400 [==============================] - 0s 191us/step - loss: 0.6445 - acc: 0.8100
Epoch 50/50
400/400 [==============================] - 0s 163us/step - loss: 0.6435 - acc: 0.8625
The accuracy starts to increase (Note that if you train this model multiple times, each time it may take different number of epochs to reach an acceptable accuracy, anything from 10 to 100 epochs).
Also, in my experiments I noticed that increasing the number of units in the first dense layer, for example to 5 or 10 units, causes the model to be trained faster (i.e. quickly converge).
Why so many epochs needed?
I think it is because of these two reasons (combined):
1) Despite the fact that the two classes are easily separable, your data is made up of random samples, and
2) The number of data points compared to the size of neural net (i.e. number of trainable parameters, which is 9 in example code above) is relatively large.
Therefore, it takes more epochs for the model to learn the weights. It is as though the model is very restricted and needs more and more experience to correctly find the appropriate weights. As an evidence, just try to increase the number of units in the first dense layer. You are almost guaranteed to reach an accuracy of +90% with less than 10 epochs each time you attempt to train this model. Here you increase the capacity and therefore the model converges (i.e. trains) much faster (it should be noted that it starts to overfit if the capacity is too high or you train the model for too many epochs. You should have a validation scheme to monitor this issue).
Side note:
Don't set the high argument to a number less than the low argument in numpy.random.uniform since, according to the documentation, the results will be "officially undefined" in this case.
Update:
One more important thing here (maybe the most important thing in this scenario) is the learning rate of the optimizer. If the learning rate is too low, the model converges slowly. Try increasing the learning rate, and you can see you reach an accuracy of 100% with less than 5 epochs:
from keras import optimizers
model.compile(loss='binary_crossentropy',
optimizer=optimizers.RMSprop(lr=1e-1),
metrics=['accuracy'])
# or you may use adam
model.compile(loss='binary_crossentropy',
optimizer=optimizers.Adam(lr=1e-1),
metrics=['accuracy'])
The issue is that your labels are 1 and 2 instead of 0 and 1. Keras will not raise an error when it sees 2, but it is not capable of predicting 2.
Subtract 1 from all your y values. As a side note, it is common in deep learning to use 1 neuron with sigmoid for binary classification (0 or 1) vs 2 classes with softmax. Finally, use binary_crossentropy for the loss for binary classification problems.