I am new to ML, and I would like to use keras to categorize every number in a sequence as a 1 or 0 depending on whether it is greater than the previous number. That is, if I had:
sequence a = [1, 2, 6, 4, 5],
The solution should be:
sequence b = [0, 1, 1, 0, 1].
So far, I have written:
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1,1])])
model.add(tf.keras.layers.Dense(17))
model.add(tf.keras.layers.Dense(17))
model.compile(optimizer='sgd', loss='BinaryCrossentropy', metrics=['binary_accuracy'])
b = [1,6,8,3,5,8,90,5,432,3,5,6,8,8,4,234,0]
a = [0,1,1,0,1,1,1,0,1,0,1,1,1,0,0,1,0]
b = np.array(b, dtype=float)
a = np.array(a, dtype=float)
model.fit(b, a, epochs=500, batch_size=1)
# # Generate predictions for samples
predictions = model.predict(b)
print(predictions)
When I do this, I end up with:
Epoch 500/500
17/17 [==============================] - 0s 499us/step - loss: 7.9229 - binary_accuracy: 0.4844
[[[-1.37064695e+01 4.70858345e+01 -4.67341652e+01 -1.94298875e+00
5.75960045e+01 6.70146179e+01 6.34545479e+01 -4.86319550e+02
2.26250134e+01 -8.60109329e+00 -4.03220863e+01 -1.67574768e+01
3.36148148e+01 -4.55171967e+00 -1.39924898e+01 6.31023712e+01
-9.14120102e+00]]
[[-6.92644653e+01 2.40270264e+02 -2.37715302e+02 -9.42625141e+00
2.93314209e+02 3.41092743e+02 3.23760315e+02 -2.49306396e+03
1.15242020e+02 -4.38339310e+01 -2.05973328e+02 -8.48139114e+01
1.70274872e+02 -2.48692398e+01 -7.15372696e+01 3.22131958e+02
-4.57872620e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-3.59296684e+01 1.24359612e+02 -1.23126640e+02 -4.93629456e+00
1.51883270e+02 1.76645889e+02 1.67576874e+02 -1.28901733e+03
5.96718216e+01 -2.26942272e+01 -1.06582588e+02 -4.39800491e+01
8.82788391e+01 -1.26787395e+01 -3.70104065e+01 1.66714172e+02
-2.37996235e+01]]
[[-5.81528549e+01 2.01633392e+02 -1.99519104e+02 -7.92959309e+00
2.46170563e+02 2.86277161e+02 2.71699158e+02 -2.09171509e+03
9.67186279e+01 -3.67873497e+01 -1.72843094e+02 -7.12026062e+01
1.42942856e+02 -2.08057709e+01 -6.00283318e+01 2.70326050e+02
-3.84580460e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-1.00263879e+03 3.48576855e+03 -3.44619800e+03 -1.35145050e+02
4.25337939e+03 4.94560596e+03 4.69689697e+03 -3.62063594e+04
1.67120789e+03 -6.35745117e+02 -2.98891406e+03 -1.22816174e+03
2.46616406e+03 -3.66204163e+02 -1.03828992e+03 4.67382764e+03
-6.61441223e+02]]
[[-5.81528549e+01 2.01633392e+02 -1.99519104e+02 -7.92959309e+00
2.46170563e+02 2.86277161e+02 2.71699158e+02 -2.09171509e+03
9.67186279e+01 -3.67873497e+01 -1.72843094e+02 -7.12026062e+01
1.42942856e+02 -2.08057709e+01 -6.00283318e+01 2.70326050e+02
-3.84580460e+01]]
[[-4.80280518e+03 1.66995840e+04 -1.65093086e+04 -6.47000305e+02
2.03765059e+04 2.36925508e+04 2.25018145e+04 -1.73467625e+05
8.00621289e+03 -3.04566919e+03 -1.43194590e+04 -5.88322070e+03
1.18137129e+04 -1.75592432e+03 -4.97435352e+03 2.23914492e+04
-3.16803076e+03]]
[[-3.59296684e+01 1.24359612e+02 -1.23126640e+02 -4.93629456e+00
1.51883270e+02 1.76645889e+02 1.67576874e+02 -1.28901733e+03
5.96718216e+01 -2.26942272e+01 -1.06582588e+02 -4.39800491e+01
8.82788391e+01 -1.26787395e+01 -3.70104065e+01 1.66714172e+02
-2.37996235e+01]]
[[-5.81528549e+01 2.01633392e+02 -1.99519104e+02 -7.92959309e+00
2.46170563e+02 2.86277161e+02 2.71699158e+02 -2.09171509e+03
9.67186279e+01 -3.67873497e+01 -1.72843094e+02 -7.12026062e+01
1.42942856e+02 -2.08057709e+01 -6.00283318e+01 2.70326050e+02
-3.84580460e+01]]
[[-6.92644653e+01 2.40270264e+02 -2.37715302e+02 -9.42625141e+00
2.93314209e+02 3.41092743e+02 3.23760315e+02 -2.49306396e+03
1.15242020e+02 -4.38339310e+01 -2.05973328e+02 -8.48139114e+01
1.70274872e+02 -2.48692398e+01 -7.15372696e+01 3.22131958e+02
-4.57872620e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-4.70412598e+01 1.62996490e+02 -1.61322891e+02 -6.43295908e+00
1.99026932e+02 2.31461517e+02 2.19638016e+02 -1.69036609e+03
7.81952209e+01 -2.97407875e+01 -1.39712814e+02 -5.75913391e+01
1.15610855e+02 -1.67422562e+01 -4.85193672e+01 2.18520096e+02
-3.11288433e+01]]
[[-2.60270850e+03 9.04948047e+03 -8.94645508e+03 -3.50663330e+02
1.10420654e+04 1.28390557e+04 1.21937041e+04 -9.40005859e+04
4.33857861e+03 -1.65045227e+03 -7.75966846e+03 -3.18818774e+03
6.40197412e+03 -9.51349304e+02 -2.69557886e+03 1.21338779e+04
-1.71684766e+03]]
[[-2.59487200e+00 8.44894505e+00 -8.53793907e+00 -4.46333081e-01
1.04523640e+01 1.21989994e+01 1.13933916e+01 -8.49708328e+01
4.10160637e+00 -1.55452514e+00 -7.19183874e+00 -3.14619255e+00
6.28279734e+00 -4.88203079e-01 -2.48353434e+00 1.12964716e+01
-1.81198704e+00]]]
There are a few issues with how you are approaching this:
Your setup of the deep-learning problem is flawed. You want to use the information from the previous element to infer the label for the current element, but for inference (and training) you only pass the current element. Imagine deploying this model tomorrow: the only information you would give it is, say, "15", and then you would ask whether it is bigger than the previous element, which doesn't exist. How would the model respond?
Secondly, why is your output layer predicting a 17-dimensional vector? Shouldn't the goal be to predict a 0 or 1 (a probability)? In that case your output should be a single unit with a sigmoid activation.
Third, you are not using any activation functions, which are the core reason for using neural networks in the first place (nonlinearity). Without activation functions you are just building a standard regression model. Here is a basic proof -
# 2-layer neural network without activations
h = W1.X + B1
o = W2.h + B2
o = W2.(W1.X + B1) + B2
  = W2.W1.X + (W2.B1 + B2)
  = W3.X + B3    # same as linear regression!
# 2-layer neural network with activations
h = activation(W1.X + B1)
o = activation(W2.h + B2)
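If it helps to see that numerically, here is a minimal NumPy sketch (shapes and values are arbitrary, and it uses the row-vector convention X.W rather than W.X) confirming that two stacked linear layers collapse into a single linear map:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                  # 4 samples, 3 features
W1, B1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, B2 = rng.normal(size=(5, 1)), rng.normal(size=1)

# Two linear layers applied one after the other
h = X @ W1 + B1
o = h @ W2 + B2

# The equivalent single linear layer
W3 = W1 @ W2
B3 = B1 @ W2 + B2
print(np.allclose(o, X @ W3 + B3))           # True: no extra expressive power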
I would advise starting from the basics of neural networks to build up good practices first, and then moving on to framing your own problem statements. The Keras author François Chollet (fchollet) has some excellent starter notebooks that you can explore.
For your case, try these modifications -
import tensorflow as tf
from tensorflow import keras
import numpy as np
#Modify input shape and output shape + add activations
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=(2,))]) #<------
model.add(tf.keras.layers.Dense(17, activation='relu')) #<------
model.add(tf.keras.layers.Dense(1, activation='sigmoid')) #<------
model.compile(optimizer='sgd', loss='BinaryCrossentropy', metrics=['binary_accuracy'])
# create 2 features: the 1st is the previous element, the 2nd is the current element
b = [1,6,8,3,5,8,90,5,432,3,5,6,8,8,4,234,0]
b = np.array([i for i in zip(b,b[1:])]) #<---- (16,2)
# Drop the first label, since the first element has no previous element to compare with
a = np.array([0,1,1,0,1,1,1,0,1,0,1,1,1,0,0,1,0])[1:] #<---- (16,)
model.fit(b, a, epochs=20, batch_size=1)
# # Generate predictions for samples
predictions = model.predict(b)
print(np.round(predictions))
Epoch 1/20
16/16 [==============================] - 0s 1ms/step - loss: 3.0769 - binary_accuracy: 0.7086
Epoch 2/20
16/16 [==============================] - 0s 823us/step - loss: 252.6490 - binary_accuracy: 0.6153
Epoch 3/20
16/16 [==============================] - 0s 1ms/step - loss: 3.8109 - binary_accuracy: 0.9212
Epoch 4/20
16/16 [==============================] - 0s 787us/step - loss: 0.0131 - binary_accuracy: 0.9845
Epoch 5/20
16/16 [==============================] - 0s 2ms/step - loss: 0.0767 - binary_accuracy: 1.0000
Epoch 6/20
16/16 [==============================] - 0s 1ms/step - loss: 0.0143 - binary_accuracy: 0.9800
Epoch 7/20
16/16 [==============================] - 0s 2ms/step - loss: 0.0111 - binary_accuracy: 1.0000
Epoch 8/20
16/16 [==============================] - 0s 2ms/step - loss: 4.0658e-04 - binary_accuracy: 1.0000
Epoch 9/20
16/16 [==============================] - 0s 941us/step - loss: 6.3996e-04 - binary_accuracy: 1.0000
Epoch 10/20
16/16 [==============================] - 0s 1ms/step - loss: 1.1477e-04 - binary_accuracy: 1.0000
Epoch 11/20
16/16 [==============================] - 0s 837us/step - loss: 6.8807e-04 - binary_accuracy: 1.0000
Epoch 12/20
16/16 [==============================] - 0s 2ms/step - loss: 5.0521e-04 - binary_accuracy: 1.0000
Epoch 13/20
16/16 [==============================] - 0s 851us/step - loss: 0.0015 - binary_accuracy: 1.0000
Epoch 14/20
16/16 [==============================] - 0s 1ms/step - loss: 0.0012 - binary_accuracy: 1.0000
Epoch 15/20
16/16 [==============================] - 0s 765us/step - loss: 0.0014 - binary_accuracy: 1.0000
Epoch 16/20
16/16 [==============================] - 0s 906us/step - loss: 3.9230e-04 - binary_accuracy: 1.0000
Epoch 17/20
16/16 [==============================] - 0s 1ms/step - loss: 0.0022 - binary_accuracy: 1.0000
Epoch 18/20
16/16 [==============================] - 0s 1ms/step - loss: 2.2149e-04 - binary_accuracy: 1.0000
Epoch 19/20
16/16 [==============================] - 0s 2ms/step - loss: 1.7345e-04 - binary_accuracy: 1.0000
Epoch 20/20
16/16 [==============================] - 0s 1ms/step - loss: 7.7950e-05 - binary_accuracy: 1.0000
[[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]]
The above model is easy to train since the problem is not complex; you can see that the accuracy reaches 100% very quickly. Let's try making predictions on unseen data with this new model -
np.round(model.predict([[5,1], #<- Is 5 < 1
[5,500], #<- Is 5 < 500
[5,6]])) #<- Is 5 < 6
array([[0.], #<- No
[1.], #<- Yes
[1.]], dtype=float32) #<- Yes
The problem is that your output layer has 17 neurons, which does not make sense here. For a binary choice like this you want 1 (or at most 2) output neurons.
Change the last layer to:
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
You will then get one output prediction per input. Since you get probabilities rather than 0/1 values, you will have to round them, e.g. with np.round.
The sigmoid activation function is used to get probabilities between 0 and 1. One output neuron is used because your output is a binary choice, so a single value is enough to describe it.
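For example, with some made-up probabilities standing in for the output of model.predict:
import numpy as np

probs = np.array([[0.93], [0.12], [0.51]])   # hypothetical sigmoid outputs
labels = np.round(probs)                     # 0.5 threshold via rounding
# equivalently: labels = (probs > 0.5).astype(int)
print(labels.ravel())                        # [1. 0. 1.]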
However, this only fixes the issues in your code. I would argue that a dense neural network is NOT the right choice for your problem and will probably have a hard time learning anything useful.
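As an aside (not part of the original answers): for this particular task, a plain NumPy comparison solves the problem exactly, without any learning:
import numpy as np

b = np.array([1, 2, 6, 4, 5])
# Label the first element 0 (it has no predecessor), matching the question's example
labels = np.concatenate(([0], (np.diff(b) > 0).astype(int)))
print(labels)                                # [0 1 1 0 1]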
I'm trying to learn some machine learning, and after working through some tutorials I managed to fit a linear regression and a second-degree polynomial with acceptable precision. I then decided to step it up a notch and try y = x^3 + 9x^2.
Until now everything worked fine, but with this new target my loss stays above 100k the whole time and predictions are off by about ±100.
Here is a list of the things i tried:
Increase number of layers
Increase number of neurons
Increase number of layers and neurons
Vary batch size
Increase and decrease learning rate
Divided the number of epochs by 3 and trained the model 3 times while feeding it a random data set each time
Remove the kernel_regularizer (still have to understand what this does)
None of these solutions worked; each time the loss stayed above 100k. Moreover, I noticed that it's not a steady decrease: the loss looks pretty random, going from 100k to 800k, down again to 400k, then up to 1 million and down again. You can only tell that the average loss is going down, but it's still hard to see through that randomness.
Some examples:
Epoch 832/10000
32/32 [==============================] - 0s 3ms/step - loss: 757260.0625 - val_loss: 624795.0000
Epoch 833/10000
32/32 [==============================] - 0s 3ms/step - loss: 784539.6250 - val_loss: 257286.3906
Epoch 834/10000
32/32 [==============================] - 0s 3ms/step - loss: 481110.4688 - val_loss: 246353.5469
Epoch 835/10000
32/32 [==============================] - 0s 3ms/step - loss: 383954.2812 - val_loss: 508324.5312
Epoch 836/10000
32/32 [==============================] - 0s 3ms/step - loss: 516217.7188 - val_loss: 543258.3750
Epoch 837/10000
32/32 [==============================] - 0s 3ms/step - loss: 1042559.3125 - val_loss: 1702137.1250
Epoch 838/10000
32/32 [==============================] - 0s 3ms/step - loss: 3192045.2500 - val_loss: 1154483.5000
Epoch 839/10000
32/32 [==============================] - 0s 3ms/step - loss: 1195508.7500 - val_loss: 4658847.0000
Epoch 840/10000
32/32 [==============================] - 0s 3ms/step - loss: 1251505.8750 - val_loss: 275300.7188
Epoch 841/10000
32/32 [==============================] - 0s 3ms/step - loss: 294105.2188 - val_loss: 330317.0000
Epoch 842/10000
32/32 [==============================] - 0s 3ms/step - loss: 528083.4375 - val_loss: 4624526.0000
Epoch 843/10000
32/32 [==============================] - 0s 4ms/step - loss: 3371695.2500 - val_loss: 2008547.0000
Epoch 844/10000
32/32 [==============================] - 0s 3ms/step - loss: 723132.8125 - val_loss: 884099.5625
Epoch 845/10000
32/32 [==============================] - 0s 3ms/step - loss: 635335.8750 - val_loss: 372132.1562
Epoch 846/10000
32/32 [==============================] - 0s 3ms/step - loss: 424794.2812 - val_loss: 349575.8438
Epoch 847/10000
32/32 [==============================] - 0s 3ms/step - loss: 266175.3125 - val_loss: 247624.6719
Epoch 848/10000
32/32 [==============================] - 0s 3ms/step - loss: 387106.7500 - val_loss: 1091736.7500
This was my original (and cleaner) code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from time import sleep
model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1], kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=8, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
keras.layers.Dense(units=1)])
lr = 1e-1
decay = lr/10000
optimizer = keras.optimizers.Adam(lr=lr, decay=decay)
model.compile(optimizer=optimizer, loss='mean_squared_error')
xs = np.random.random((10000, 1)) * 100 - 50;
ys = xs**3 + 9*xs**2
model.fit(xs, ys, epochs=10000, batch_size=256, validation_split=0.2)
print(model.predict([10.0]))
resp = input('Want to save model? y/n: ')
if resp == 'y':
    model.save('zig-zag')
I also found this question where the reported solution would be to use relu, but I already had that implemented and copying the code didn't work either.
Am I missing something? What and why?
For numerical reasons, neural networks often don't play nicely with very large, effectively unbounded values. Simply reducing the range of values for x from -50..50 to -5..5 will let your model train.
For your case you also want to remove the l2 regularizer, since you can't really overfit here, and you definitely shouldn't use a decay of 1e-5. I gave it a go with lr=1e-2 and decay=lr/2:
Epoch 1000/1000
32/32 [==============================] - 0s 2ms/step - loss: 0.1471 - val_loss: 0.1370
Full code:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from time import sleep
model = tf.keras.Sequential([keras.layers.Dense(units=8, activation='relu', input_shape=[1]),
keras.layers.Dense(units=8, activation='relu'),
keras.layers.Dense(units=8, activation='relu'),
keras.layers.Dense(units=8, activation='relu'),
keras.layers.Dense(units=1)])
lr = 1e-2
decay = lr/2
optimizer = keras.optimizers.Adam(lr=lr, decay=decay)
model.compile(optimizer=optimizer, loss='mean_squared_error')
xs = np.random.random((10000, 1)) * 10 - 5
ys = xs**3 + 9*xs**2
print(np.shape(xs))
print(np.shape(ys))
model.fit(xs, ys, epochs=1000, batch_size=256, validation_split=0.2)
print(model.predict([4.0]))
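If you do want to keep the original -50..50 input range, an alternative (a rough sketch of my own, not from the answer above; the layer widths and epoch count are untuned guesses) is to standardize the inputs and targets before training and undo the scaling when predicting:
import numpy as np
import tensorflow as tf
from tensorflow import keras

xs = np.random.random((10000, 1)) * 100 - 50
ys = xs**3 + 9 * xs**2

# Standardize inputs and targets so the network sees well-scaled values
x_mean, x_std = xs.mean(), xs.std()
y_mean, y_std = ys.mean(), ys.std()
xs_n = (xs - x_mean) / x_std
ys_n = (ys - y_mean) / y_std

model = tf.keras.Sequential([keras.layers.Dense(64, activation='relu', input_shape=[1]),
                             keras.layers.Dense(64, activation='relu'),
                             keras.layers.Dense(1)])
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='mean_squared_error')
model.fit(xs_n, ys_n, epochs=200, batch_size=256, validation_split=0.2, verbose=0)

# Undo the scaling when predicting; 10**3 + 9*10**2 = 1900
print(model.predict((np.array([[10.0]]) - x_mean) / x_std) * y_std + y_mean)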
Why does the error of my NN not converge to zero when my input reveals the result? I always set input[2] to the right result, so the NN should set all weights to 0 except the one for that input.
from random import random
import numpy
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import keras
datax = []
datay = []
for i in range(100000):
    input = []
    for j in range(1000):
        input.append(random())
    yval = random()
    # should be found out by the nn that input[2] is always the correct output
    input[2] = yval
    datax.append(input)
    datay.append(yval)
datax = numpy.array(datax)
datay = numpy.array(datay)
model = Sequential()
model.add(Dense(10))
model.add(Dense(10))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer=keras.optimizers.Adam())
model.fit(datax, datay, epochs=100, batch_size=32, verbose=1)
It oscillates around 1e-05 but never gets much better than that:
Epoch 33/100
3125/3125 [==============================] - 4s 1ms/step - loss: 1.2802e-04
Epoch 34/100
3125/3125 [==============================] - 4s 1ms/step - loss: 3.7720e-05
Epoch 35/100
3125/3125 [==============================] - 4s 1ms/step - loss: 4.0858e-05
Epoch 36/100
3125/3125 [==============================] - 4s 1ms/step - loss: 8.5453e-05
Epoch 37/100
3125/3125 [==============================] - 5s 1ms/step - loss: 5.5722e-05
Epoch 38/100
3125/3125 [==============================] - 5s 1ms/step - loss: 3.6459e-05
Epoch 39/100
3125/3125 [==============================] - 5s 1ms/step - loss: 1.3339e-05
Epoch 40/100
3125/3125 [==============================] - 5s 1ms/step - loss: 5.8943e-05
...
Epoch 100/100
3125/3125 [==============================] - 4s 1ms/step - loss: 1.5929e-05
The step of the gradient descent method is calculated as the gradient multiplied by the learning rate, so with a fixed learning rate you theoretically cannot reach the exact minimum of the loss function.
Try decaying the learning rate towards zero, though. If you are lucky, I think it could get there because of the discrete nature of floating-point types.
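For instance, reusing the model, datax and datay from the question, one way to schedule a decaying learning rate in Keras would be (a sketch; the schedule values are arbitrary):
from tensorflow import keras

# Shrink the learning rate by 10% roughly once per epoch
# (100000 samples / batch_size 32 = 3125 steps per epoch)
schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=3125,
    decay_rate=0.9)
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adam(learning_rate=schedule))
model.fit(datax, datay, epochs=100, batch_size=32, verbose=1)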
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense, Dropout, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.optimizers import Adam

# We have 2 inputs, 1 for each picture
left_input = Input(img_size)
right_input = Input(img_size)
# We will use 2 instances of 1 network for this task
convnet = MobileNetV2(weights='imagenet', include_top=False, input_shape=img_size,input_tensor=None)
convnet.trainable=True
x=convnet.output
x=tf.keras.layers.GlobalAveragePooling2D()(x)
x=Dense(320,activation='relu')(x)
x=Dropout(0.2)(x)
preds = Dense(101, activation='sigmoid')(x) # Apply sigmoid
convnet = Model(inputs=convnet.input, outputs=preds)
# Connect each 'leg' of the network to each input
# Remember, they have the same weights
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)
# Getting the L1 Distance between the 2 encodings
L1_layer = Lambda(lambda tensor:K.abs(tensor[0] - tensor[1]))
# Add the distance function to the network
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1,activation='sigmoid')(L1_distance)
siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)
optimizer = Adam(lr, decay=2.5e-4)
#//TODO: get layerwise learning rates and momentum annealing scheme described in paperworking
siamese_net.compile(loss=keras.losses.binary_crossentropy,optimizer=optimizer,metrics=['accuracy'])
siamese_net.summary()
and the result of training is as follows
Epoch 1/10
126/126 [==============================] - 169s 1s/step - loss: 0.5683 - accuracy: 0.6840 - val_loss: 0.4644 - val_accuracy: 0.8044
Epoch 2/10
126/126 [==============================] - 163s 1s/step - loss: 0.2032 - accuracy: 0.9795 - val_loss: 0.2117 - val_accuracy: 0.9681
Epoch 3/10
126/126 [==============================] - 163s 1s/step - loss: 0.1110 - accuracy: 0.9925 - val_loss: 0.1448 - val_accuracy: 0.9840
Epoch 4/10
126/126 [==============================] - 164s 1s/step - loss: 0.0844 - accuracy: 0.9950 - val_loss: 0.1384 - val_accuracy: 0.9820
Epoch 5/10
126/126 [==============================] - 163s 1s/step - loss: 0.0634 - accuracy: 0.9990 - val_loss: 0.0829 - val_accuracy: 1.0000
Epoch 6/10
126/126 [==============================] - 165s 1s/step - loss: 0.0526 - accuracy: 0.9995 - val_loss: 0.0729 - val_accuracy: 1.0000
Epoch 7/10
126/126 [==============================] - 164s 1s/step - loss: 0.0465 - accuracy: 0.9995 - val_loss: 0.0641 - val_accuracy: 1.0000
Epoch 8/10
126/126 [==============================] - 163s 1s/step - loss: 0.0463 - accuracy: 0.9985 - val_loss: 0.0595 - val_accuracy: 1.0000
The model predicts with good accuracy when I compare two dissimilar images, and it also does really well on images of the same class.
But when I compare image1 with image1 itself, it predicts that they are similar only with a probability of 0.5.
In the other case, if I compare image1 with image2, it predicts correctly with a probability of 0.8 (here image1 and image2 belong to the same class).
So when I compare distinct images it predicts correctly; I have tried different alternatives, but they did not work out.
May I know what might be the error?
The L1 distance between two equal vectors is always zero.
When you pass the same image, the encodings generated are equal (encoded_l is equal to encoded_r). Hence, the input to your final sigmoid layer is a zero vector.
And, sigmoid(0) = 0.5.
This is the reason providing identical inputs to your model gives 0.5 as the output.
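A tiny NumPy illustration (the vector is a random stand-in for the 101-dimensional encoding; strictly speaking the final Dense layer also has a bias, so the output is sigmoid(bias), which sits near 0.5 while the bias is small):
import numpy as np

enc = np.random.rand(101)          # stand-in for the encoding of one image
l1 = np.abs(enc - enc)             # identical inputs -> identical encodings -> all zeros
w = np.random.rand(101)            # stand-in for the final Dense(1) weights
z = w @ l1                         # zero vector in, zero pre-activation out (ignoring bias)
print(1.0 / (1.0 + np.exp(-z)))    # sigmoid(0) = 0.5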
I have created the following toy dataset:
I am trying to predict the class with a neural net in keras:
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape= (nr_feats,)))
model.add(Dense(units=nr_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
With nr_feats and nr_classes set to 2.
The neural net can only predict with 50 percent accuracy returning either all 1's or all 2's. Using Logistic Regression results in 100 percent accuracy.
I can not find what is going wrong here.
I have uploaded a notebook to github if you quickly want to try something.
EDIT 1
I drastically increased the number of epochs and accuracy finally starts to improve from 0.5 at epoch 72 and converges to 1.0 at epoch 98.
This still seems extremely slow for such a simple dataset.
I am aware it is better to use a single output neuron with sigmoid activation but it's more that I want to understand why it does not work with two output neurons and softmax activation.
I pre-process my dataframe as follows:
from sklearn.preprocessing import LabelEncoder
x_train = df_train.iloc[:,0:-1].values
y_train = df_train.iloc[:, -1]
nr_feats = x_train.shape[1]
nr_classes = y_train.nunique()
label_enc = LabelEncoder()
label_enc.fit(y_train)
y_train = keras.utils.to_categorical(label_enc.transform(y_train), nr_classes)
Training and evaluation:
model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)
accuracy_score(model.predict_classes(x_train), df_train.iloc[:, -1].values)
EDIT 2
After changing the output layer to a single neuron with sigmoid activation and using binary_crossentropy loss as modesitt suggested, accuracy still remains at 0.5 for 200 epochs and converges to 1.0 100 epochs later.
Note: Read the "Update" section at the end of my answer if you want the true reason. In this scenario, the other two reasons I have mentioned are only valid when the learning rate is set to a low value (less than 1e-3).
I put together some code. It is very similar to yours, but I cleaned it up a little and simplified it for myself. As you can see, I use a dense layer with one unit and a sigmoid activation function for the last layer, and I just changed the optimizer from adam to rmsprop (that doesn't matter much; you can use adam if you like):
import numpy as np
import random
# generate random data with two features
n_samples = 200
n_feats = 2
cls0 = np.random.uniform(low=0.2, high=0.4, size=(n_samples,n_feats))
cls1 = np.random.uniform(low=0.5, high=0.7, size=(n_samples,n_feats))
x_train = np.concatenate((cls0, cls1))
y_train = np.concatenate((np.zeros((n_samples,)), np.ones((n_samples,))))
# shuffle data because all negatives (i.e. class "0") are first
# and then all positives (i.e. class "1")
indices = np.arange(x_train.shape[0])
np.random.shuffle(indices)
x_train = x_train[indices]
y_train = y_train[indices]
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2, activation='sigmoid', input_shape=(n_feats,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=True)
Here is the output:
Layer (type) Output Shape Param #
=================================================================
dense_25 (Dense) (None, 2) 6
_________________________________________________________________
dense_26 (Dense) (None, 1) 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
400/400 [==============================] - 0s 966us/step - loss: 0.7013 - acc: 0.5000
Epoch 2/5
400/400 [==============================] - 0s 143us/step - loss: 0.6998 - acc: 0.5000
Epoch 3/5
400/400 [==============================] - 0s 137us/step - loss: 0.6986 - acc: 0.5000
Epoch 4/5
400/400 [==============================] - 0s 149us/step - loss: 0.6975 - acc: 0.5000
Epoch 5/5
400/400 [==============================] - 0s 132us/step - loss: 0.6966 - acc: 0.5000
As you can see the accuracy never increases from 50%. What if you increase the number of epochs to say 50:
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 2) 6
_________________________________________________________________
dense_36 (Dense) (None, 1) 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
400/400 [==============================] - 0s 1ms/step - loss: 0.6925 - acc: 0.5000
Epoch 2/50
400/400 [==============================] - 0s 136us/step - loss: 0.6902 - acc: 0.5000
Epoch 3/50
400/400 [==============================] - 0s 133us/step - loss: 0.6884 - acc: 0.5000
Epoch 4/50
400/400 [==============================] - 0s 160us/step - loss: 0.6866 - acc: 0.5000
Epoch 5/50
400/400 [==============================] - 0s 140us/step - loss: 0.6848 - acc: 0.5000
Epoch 6/50
400/400 [==============================] - 0s 168us/step - loss: 0.6832 - acc: 0.5000
Epoch 7/50
400/400 [==============================] - 0s 154us/step - loss: 0.6817 - acc: 0.5000
Epoch 8/50
400/400 [==============================] - 0s 146us/step - loss: 0.6802 - acc: 0.5000
Epoch 9/50
400/400 [==============================] - 0s 161us/step - loss: 0.6789 - acc: 0.5000
Epoch 10/50
400/400 [==============================] - 0s 140us/step - loss: 0.6778 - acc: 0.5000
Epoch 11/50
400/400 [==============================] - 0s 177us/step - loss: 0.6766 - acc: 0.5000
Epoch 12/50
400/400 [==============================] - 0s 180us/step - loss: 0.6755 - acc: 0.5000
Epoch 13/50
400/400 [==============================] - 0s 165us/step - loss: 0.6746 - acc: 0.5000
Epoch 14/50
400/400 [==============================] - 0s 128us/step - loss: 0.6736 - acc: 0.5000
Epoch 15/50
400/400 [==============================] - 0s 125us/step - loss: 0.6728 - acc: 0.5000
Epoch 16/50
400/400 [==============================] - 0s 165us/step - loss: 0.6718 - acc: 0.5000
Epoch 17/50
400/400 [==============================] - 0s 161us/step - loss: 0.6710 - acc: 0.5000
Epoch 18/50
400/400 [==============================] - 0s 170us/step - loss: 0.6702 - acc: 0.5000
Epoch 19/50
400/400 [==============================] - 0s 122us/step - loss: 0.6694 - acc: 0.5000
Epoch 20/50
400/400 [==============================] - 0s 110us/step - loss: 0.6686 - acc: 0.5000
Epoch 21/50
400/400 [==============================] - 0s 142us/step - loss: 0.6676 - acc: 0.5000
Epoch 22/50
400/400 [==============================] - 0s 142us/step - loss: 0.6667 - acc: 0.5000
Epoch 23/50
400/400 [==============================] - 0s 149us/step - loss: 0.6659 - acc: 0.5000
Epoch 24/50
400/400 [==============================] - 0s 125us/step - loss: 0.6651 - acc: 0.5000
Epoch 25/50
400/400 [==============================] - 0s 134us/step - loss: 0.6643 - acc: 0.5000
Epoch 26/50
400/400 [==============================] - 0s 143us/step - loss: 0.6634 - acc: 0.5000
Epoch 27/50
400/400 [==============================] - 0s 137us/step - loss: 0.6625 - acc: 0.5000
Epoch 28/50
400/400 [==============================] - 0s 131us/step - loss: 0.6616 - acc: 0.5025
Epoch 29/50
400/400 [==============================] - 0s 119us/step - loss: 0.6608 - acc: 0.5100
Epoch 30/50
400/400 [==============================] - 0s 143us/step - loss: 0.6601 - acc: 0.5025
Epoch 31/50
400/400 [==============================] - 0s 148us/step - loss: 0.6593 - acc: 0.5350
Epoch 32/50
400/400 [==============================] - 0s 161us/step - loss: 0.6584 - acc: 0.5325
Epoch 33/50
400/400 [==============================] - 0s 152us/step - loss: 0.6576 - acc: 0.5700
Epoch 34/50
400/400 [==============================] - 0s 128us/step - loss: 0.6568 - acc: 0.5850
Epoch 35/50
400/400 [==============================] - 0s 155us/step - loss: 0.6560 - acc: 0.5975
Epoch 36/50
400/400 [==============================] - 0s 136us/step - loss: 0.6552 - acc: 0.6425
Epoch 37/50
400/400 [==============================] - 0s 140us/step - loss: 0.6544 - acc: 0.6150
Epoch 38/50
400/400 [==============================] - 0s 120us/step - loss: 0.6538 - acc: 0.6375
Epoch 39/50
400/400 [==============================] - 0s 140us/step - loss: 0.6531 - acc: 0.6725
Epoch 40/50
400/400 [==============================] - 0s 135us/step - loss: 0.6523 - acc: 0.6750
Epoch 41/50
400/400 [==============================] - 0s 136us/step - loss: 0.6515 - acc: 0.7300
Epoch 42/50
400/400 [==============================] - 0s 126us/step - loss: 0.6505 - acc: 0.7450
Epoch 43/50
400/400 [==============================] - 0s 141us/step - loss: 0.6496 - acc: 0.7425
Epoch 44/50
400/400 [==============================] - 0s 162us/step - loss: 0.6489 - acc: 0.7675
Epoch 45/50
400/400 [==============================] - 0s 161us/step - loss: 0.6480 - acc: 0.7775
Epoch 46/50
400/400 [==============================] - 0s 126us/step - loss: 0.6473 - acc: 0.7575
Epoch 47/50
400/400 [==============================] - 0s 124us/step - loss: 0.6464 - acc: 0.7625
Epoch 48/50
400/400 [==============================] - 0s 130us/step - loss: 0.6455 - acc: 0.7950
Epoch 49/50
400/400 [==============================] - 0s 191us/step - loss: 0.6445 - acc: 0.8100
Epoch 50/50
400/400 [==============================] - 0s 163us/step - loss: 0.6435 - acc: 0.8625
The accuracy starts to increase. (Note that if you train this model multiple times, it may take a different number of epochs each time to reach an acceptable accuracy, anywhere from 10 to 100.)
Also, in my experiments I noticed that increasing the number of units in the first dense layer, for example to 5 or 10 units, causes the model to train faster (i.e. converge more quickly).
Why so many epochs needed?
I think it is because of these two reasons (combined):
1) Despite the fact that the two classes are easily separable, your data is made up of random samples, and
2) The number of data points is relatively large compared to the size of the neural net (i.e. the number of trainable parameters, which is 9 in the example code above).
Therefore, it takes more epochs for the model to learn the weights. It is as though the model is very restricted and needs more and more experience to find the appropriate weights. As evidence, just try increasing the number of units in the first dense layer: you are almost guaranteed to reach an accuracy above 90% in fewer than 10 epochs each time you train this model. Here you increase the capacity, so the model converges (i.e. trains) much faster. (Note that it starts to overfit if the capacity is too high or you train for too many epochs; you should have a validation scheme to monitor this.)
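For instance, keeping everything else from the snippet above the same, widening the first layer is just (a sketch; 10 units is an arbitrary choice):
model = Sequential()
model.add(Dense(10, activation='sigmoid', input_shape=(n_feats,)))  # 10 units instead of 2
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=True)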
Side note:
Don't set the high argument to a number less than the low argument in numpy.random.uniform since, according to the documentation, the results will be "officially undefined" in this case.
Update:
One more important thing here (maybe the most important thing in this scenario) is the learning rate of the optimizer. If the learning rate is too low, the model converges slowly. Try increasing the learning rate, and you can see you reach an accuracy of 100% with less than 5 epochs:
from keras import optimizers
model.compile(loss='binary_crossentropy',
optimizer=optimizers.RMSprop(lr=1e-1),
metrics=['accuracy'])
# or you may use adam
model.compile(loss='binary_crossentropy',
optimizer=optimizers.Adam(lr=1e-1),
metrics=['accuracy'])
The issue is that your labels are 1 and 2 instead of 0 and 1. Keras will not raise an error when it sees 2, but it is not capable of predicting 2.
Subtract 1 from all your y values. As a side note, in deep learning it is common to use a single neuron with sigmoid activation for binary classification (0 or 1) rather than 2 classes with softmax. Finally, use binary_crossentropy as the loss for binary classification problems.
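A minimal sketch of that change (assuming the raw 1/2 labels are available as an integer array y_raw, and keeping the hidden layer from the question):
from keras.models import Sequential
from keras.layers import Dense

y_binary = y_raw - 1                 # map labels {1, 2} -> {0, 1}
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape=(nr_feats,)))
model.add(Dense(units=1, activation='sigmoid'))   # single sigmoid output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_binary, epochs=500, batch_size=32, verbose=True)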
I have a question about my NN model. I am using Keras from Python. My training set consists of 1000 samples, each with 4320 features. There are 10 categories, and my Y contains numpy arrays of 10 elements, with 0 in every position except one.
However, my NN doesn't learn from the first epoch onward, and I probably have my model wrong. It's my first attempt at building an NN model and I must have gotten a couple of things wrong.
Epoch 1/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7110 - acc: 0.5796
Epoch 2/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 3/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 4/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 5/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 6/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 7/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7063 - acc: 0.5800
Epoch 8/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 9/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7063 - acc: 0.5800
And this is part of my NN code:
model = Sequential()
model.add(Dense(4320, input_dim=4320, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)
So, my X is a numpy array of length 1000 that contains other numpy arrays of 4320 elements. My Y is a numpy array of length 1000 that contains other numpy arrays of 10 elements (categories).
Am I doing something wrong, or can it just not learn from this training set? (With 1-NN and Manhattan distance I'm getting ~80% accuracy on this training set.)
L.E.: After I normalized the data, this is the output of my first 10 epochs:
Epoch 1/150
1000/1000 [==============================] - 41s 41ms/step - loss: 7.9834 - acc: 0.4360
Epoch 2/150
1000/1000 [==============================] - 41s 41ms/step - loss: 7.2943 - acc: 0.5080
Epoch 3/150
1000/1000 [==============================] - 39s 39ms/step - loss: 9.0326 - acc: 0.4070
Epoch 4/150
1000/1000 [==============================] - 39s 39ms/step - loss: 8.7106 - acc: 0.4320
Epoch 5/150
1000/1000 [==============================] - 40s 40ms/step - loss: 7.7547 - acc: 0.4900
Epoch 6/150
1000/1000 [==============================] - 44s 44ms/step - loss: 7.2591 - acc: 0.5270
Epoch 7/150
1000/1000 [==============================] - 42s 42ms/step - loss: 8.5002 - acc: 0.4560
Epoch 8/150
1000/1000 [==============================] - 41s 41ms/step - loss: 9.9525 - acc: 0.3720
Epoch 9/150
1000/1000 [==============================] - 40s 40ms/step - loss: 9.7160 - acc: 0.3920
Epoch 10/150
1000/1000 [==============================] - 39s 39ms/step - loss: 9.3523 - acc: 0.4140
It looks like it starts fluctuating, so that seems to be good.
It seems like your categories/classes are mutually exclusive, since your target arrays are one-hot encoded (i.e. you never have to predict 2 classes at the same time). In that case, you should use softmax on your last layer to produce a distribution and train with categorical_crossentropy. In fact, you can just set your targets as category indices, e.g. Y = [2, 4, 0, 1], and train with sparse_categorical_crossentropy, which saves you the trouble of creating a 2D array of shape (samples, 10).
It also seems like you have a lot of features; most likely the performance of your network will depend on how you pre-process your data. For continuous inputs it's wise to normalise them, and for discrete inputs one-hot encoding helps the learning.
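A sketch combining both suggestions (X and Y are the arrays from the question, the per-feature standardisation is just one of several possible normalisations, and the Keras imports are assumed to match the question's):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Standardise the continuous inputs feature-wise
X_norm = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
# Convert one-hot targets to integer class indices
y_idx = np.argmax(Y, axis=1)

model = Sequential()
model.add(Dense(4320, input_dim=4320, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))        # distribution over the 10 classes
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(X_norm, y_idx, epochs=150, batch_size=10)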