I tried to code a neural network that is trained on the California housing dataset, which I got from Aurélien Géron's GitHub.
But when I run the code, the net does not train and the loss is nan.
Can someone explain what I did wrong?
Best regards, Robin
Link to the CSV file: https://github.com/ageron/handson-ml/tree/master/datasets/housing
My Code:
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
# load dataset
df = pd.read_csv("housing.csv", delimiter=",", header=0)
# split into input (X) and output (Y) variables
Y = df["median_house_value"].values
X = df.drop("median_house_value", axis=1)
# Inland / Not Inland -> True / False = 1 / 0
X["ocean_proximity"] = X["ocean_proximity"]== "INLAND"
X=X.values
X= X.astype(float)
Y= Y.astype(float)
model = Sequential()
model.add(Dense(100, activation="relu", input_dim=9))
model.add(Dense(1, activation="linear"))
# Compile model
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(X, Y, epochs=50, batch_size=1000, verbose=1)
I found the error: there was a missing value in the "total_bedrooms" column.
You need to drop the NaN values from your data.
After a quick look at the data, you also need to normalize it (as always with neural nets, to help convergence).
To do this you can use StandardScaler, MinMaxScaler, etc.
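For example, a minimal sketch of that preprocessing with scikit-learn (assuming the same housing.csv and the same INLAND encoding used in the question):
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
df = pd.read_csv("housing.csv")
df = df.dropna()  # drop rows with missing values (e.g. in "total_bedrooms")
Y = df["median_house_value"].values.astype(float)
X = df.drop("median_house_value", axis=1)
X["ocean_proximity"] = (X["ocean_proximity"] == "INLAND").astype(float)
X = MinMaxScaler().fit_transform(X.values.astype(float))  # scale every feature to [0, 1]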
NaN values in your DataFrame are causing this behavior. Drop the rows with NaN values and normalize your data:
df = df[~df.isnull().any(axis=1)]
df.iloc[:,:-1]=((df.iloc[:,:-1]-df.iloc[:,:-1].min())/(df.iloc[:,:-1].max()-df.iloc[:,:-1].min()))
And you will get:
Epoch 1/50
1000/20433 [>.............................] - ETA: 3s - loss: 0.1732
20433/20433 [==============================] - 0s 11us/step - loss: 0.1001
Epoch 2/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0527
20433/20433 [==============================] - 0s 3us/step - loss: 0.0430
Epoch 3/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0388
20433/20433 [==============================] - 0s 2us/step - loss: 0.0338
Epoch 4/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0301
20433/20433 [==============================] - 0s 2us/step - loss: 0.0288
Epoch 5/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0300
20433/20433 [==============================] - 0s 2us/step - loss: 0.0259
Epoch 6/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0235
20433/20433 [==============================] - 0s 3us/step - loss: 0.0238
Epoch 7/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0242
20433/20433 [==============================] - 0s 2us/step - loss: 0.0225
Epoch 8/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0213
20433/20433 [==============================] - 0s 2us/step - loss: 0.0218
Epoch 9/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0228
20433/20433 [==============================] - 0s 2us/step - loss: 0.0214
Epoch 10/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0206
20433/20433 [==============================] - 0s 2us/step - loss: 0.0211
Related
So I am trying to create an LSTM that can predict the next time step of a double pendulum. The data I am trying to train with is a (2001, 4) numpy array (i.e. the first 5 rows look like:
array([[ 1.04719755, 0. , 1.04719755, 0. ],
[ 1.03659984, -0.42301933, 1.04717544, -0.00178865],
[ 1.00508218, -0.83475539, 1.04682248, -0.01551541],
[ 0.95354768, -1.22094052, 1.04514269, -0.05838011],
[ 0.88372305, -1.56345555, 1.04009056, -0.15443162]])
where each row is a unique representation of the state of the double pendulum.)
So I wanted to create an LSTM that could learn to predict the next state given the current one.
Here is my code so far (full_sol is the (2001, 4) matrix):
import numpy as np
from tensorflow import keras
import tensorflow as tf
# full_sol = np.random.rand(2001, 4)
full_sol = full_sol.reshape((full_sol.shape[0], 1, full_sol.shape[1]))
model = keras.Sequential()
model.add(keras.layers.LSTM(100, input_shape=(None, 4), return_sequences=True, dropout=0.2))
model.add(keras.layers.TimeDistributed(
    keras.layers.Dense(4, activation=tf.keras.layers.LeakyReLU(alpha=0.3))))
model.compile(loss="mean_squared_error", optimizer="adam", metrics="accuracy")
history = model.fit(full_sol[:-1,:,:], full_sol[1:,:,:], epochs=20)
Then when I train, I get the following results:
Epoch 1/20
63/63 [==============================] - 3s 4ms/step - loss: 1.7181 - accuracy: 0.4200
Epoch 2/20
63/63 [==============================] - 0s 4ms/step - loss: 1.0481 - accuracy: 0.5155
Epoch 3/20
63/63 [==============================] - 0s 5ms/step - loss: 0.7584 - accuracy: 0.5715
Epoch 4/20
63/63 [==============================] - 0s 5ms/step - loss: 0.5134 - accuracy: 0.6420
Epoch 5/20
63/63 [==============================] - 0s 5ms/step - loss: 0.3944 - accuracy: 0.7260
Epoch 6/20
63/63 [==============================] - 0s 5ms/step - loss: 0.3378 - accuracy: 0.7605
Epoch 7/20
63/63 [==============================] - 0s 5ms/step - loss: 0.3549 - accuracy: 0.7825
Epoch 8/20
63/63 [==============================] - 0s 4ms/step - loss: 0.3528 - accuracy: 0.7995
Epoch 9/20
63/63 [==============================] - 0s 5ms/step - loss: 0.3285 - accuracy: 0.8020
Epoch 10/20
63/63 [==============================] - 0s 5ms/step - loss: 0.2874 - accuracy: 0.8030
Epoch 11/20
63/63 [==============================] - 0s 4ms/step - loss: 0.3072 - accuracy: 0.8135
Epoch 12/20
63/63 [==============================] - 0s 4ms/step - loss: 0.3075 - accuracy: 0.8035
Epoch 13/20
63/63 [==============================] - 0s 4ms/step - loss: 0.2942 - accuracy: 0.8030
Epoch 14/20
63/63 [==============================] - 0s 4ms/step - loss: 0.2637 - accuracy: 0.8170
Epoch 15/20
63/63 [==============================] - 0s 4ms/step - loss: 0.2675 - accuracy: 0.8150
Epoch 16/20
63/63 [==============================] - 0s 4ms/step - loss: 0.2644 - accuracy: 0.8085
Epoch 17/20
63/63 [==============================] - 0s 5ms/step - loss: 0.2479 - accuracy: 0.8200
Epoch 18/20
63/63 [==============================] - 0s 4ms/step - loss: 0.2475 - accuracy: 0.8215
Epoch 19/20
63/63 [==============================] - 0s 4ms/step - loss: 0.2243 - accuracy: 0.8340
Epoch 20/20
63/63 [==============================] - 0s 5ms/step - loss: 0.2430 - accuracy: 0.8240
So, quite high accuracy. But when I test it on the training set, the predictions aren't very good.
E.g. when I predict the first value:
model.predict(tf.expand_dims(full_sol[0], axis = 0))
I get array([[[ 1.0172144 , -0.3535697 , 1.1287913 , -0.23707283]]], dtype=float32)
instead of array([[ 1.03659984, -0.42301933, 1.04717544, -0.00178865]]).
Where have I gone wrong?
I don't think you are doing anything wrong. What you are getting is still fairly close to the actual value. You can either change your choice of metric so it accurately reflects the degree of error in your predictions (accuracy is not meaningful for a continuous regression target), or you can try to reduce the error further.
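For example, a minimal sketch of the first option, swapping accuracy for mean absolute error (a metric that does reflect the size of the error for a continuous target), with the same model and data as in the question:
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mae"])
history = model.fit(full_sol[:-1, :, :], full_sol[1:, :, :], epochs=20)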
I am using the following code to train a model in Keras:
model_A.fit(train_X, train_Y, epochs=20)
The code works fine and the outputs are like below:
Epoch 1/20
1800/1800 [==============================] - 0s 34us/step - loss: 0.2764 - acc: 0.9033
Epoch 2/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2704 - acc: 0.9083
Epoch 3/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2687 - acc: 0.9094
Epoch 4/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2748 - acc: 0.9089
Epoch 5/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2902 - acc: 0.8922
Epoch 6/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2357 - acc: 0.9183
Epoch 7/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2499 - acc: 0.9183
Epoch 8/20
1800/1800 [==============================] - 0s 33us/step - loss: 0.2286 - acc: 0.9228
Epoch 9/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2325 - acc: 0.9194
Epoch 10/20
1800/1800 [==============================] - 0s 33us/step - loss: 0.2053 - acc: 0.9261
Epoch 11/20
1800/1800 [==============================] - 0s 33us/step - loss: 0.2256 - acc: 0.9161
Epoch 12/20
1800/1800 [==============================] - 0s 33us/step - loss: 0.2120 - acc: 0.9261
Epoch 13/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.2085 - acc: 0.9328
Epoch 14/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.1881 - acc: 0.9328
Epoch 15/20
1800/1800 [==============================] - 0s 31us/step - loss: 0.1835 - acc: 0.9344
Epoch 16/20
1800/1800 [==============================] - 0s 34us/step - loss: 0.1812 - acc: 0.9356
Epoch 17/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.1704 - acc: 0.9361
Epoch 18/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.1929 - acc: 0.9272
Epoch 19/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.1822 - acc: 0.9317
Epoch 20/20
1800/1800 [==============================] - 0s 32us/step - loss: 0.1713 - acc: 0.9417
I am wondering if there is a way to save the loss/accuracy values in an array, so I could plot them over epochs later.
The fit method returns a History object which contains information about the training process. For example:
# train the model
h = model.fit(...)
# loss values at the end of each epoch
h.history['loss']
# validation loss values per epoch (if you have used validation data)
h.history['val_loss']
# accuracy values at the end of each epoch (if you have used `acc` metric)
h.history['acc']
# validation accuracy values per epoch (if you have used `acc` metric and validation data)
h.history['val_acc']
# list of epochs number
h.epoch
Further, it's not necessary to store the History object in a variable (like h = model.fit(...)), because it can also be accessed via model.history.history (however, note that this history attribute is not persisted when the model is saved using model.save(...)).
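For example, a minimal sketch of plotting the recorded values with matplotlib (assuming the model was compiled with the acc metric, as in the output above):
import matplotlib.pyplot as plt
h = model_A.fit(train_X, train_Y, epochs=20)
plt.plot(h.epoch, h.history['loss'], label='loss')
plt.plot(h.epoch, h.history['acc'], label='accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()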
I have created the following toy dataset:
I am trying to predict the class with a neural net in keras:
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape= (nr_feats,)))
model.add(Dense(units=nr_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
With nr_feats and nr_classes set to 2.
The neural net only predicts with 50 percent accuracy, returning either all 1's or all 2's. Using logistic regression results in 100 percent accuracy.
I cannot find what is going wrong here.
I have uploaded a notebook to GitHub if you want to quickly try something.
EDIT 1
I drastically increased the number of epochs; accuracy finally starts to improve from 0.5 at epoch 72 and converges to 1.0 at epoch 98.
This still seems extremely slow for such a simple dataset.
I am aware it is better to use a single output neuron with sigmoid activation, but I want to understand why it does not work with two output neurons and softmax activation.
I pre-process my dataframe as follows:
from sklearn.preprocessing import LabelEncoder
x_train = df_train.iloc[:,0:-1].values
y_train = df_train.iloc[:, -1]
nr_feats = x_train.shape[1]
nr_classes = y_train.nunique()
label_enc = LabelEncoder()
label_enc.fit(y_train)
y_train = keras.utils.to_categorical(label_enc.transform(y_train), nr_classes)
Training and evaluation:
model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)
accuracy_score(model.predict_classes(x_train), df_train.iloc[:, -1].values)
EDIT 2
After changing the output layer to a single neuron with sigmoid activation and using binary_crossentropy loss as modesitt suggested, accuracy still remains at 0.5 for 200 epochs and only converges to 1.0 about 100 epochs later.
Note: Read the "Update" section at the end of my answer if you want the true reason. In this scenario, the other two reasons I have mentioned are only valid when the learning rate is set to a low value (less than 1e-3).
I put together some code. It is very similar to yours, but I cleaned it up a little and simplified it for myself. As you can see, I use a dense layer with one unit and a sigmoid activation function for the last layer, and I just changed the optimizer from adam to rmsprop (that choice doesn't matter much; you can use adam if you like):
import numpy as np
import random
# generate random data with two features
n_samples = 200
n_feats = 2
cls0 = np.random.uniform(low=0.2, high=0.4, size=(n_samples,n_feats))
cls1 = np.random.uniform(low=0.5, high=0.7, size=(n_samples,n_feats))
x_train = np.concatenate((cls0, cls1))
y_train = np.concatenate((np.zeros((n_samples,)), np.ones((n_samples,))))
# shuffle data because all negatives (i.e. class "0") are first
# and then all positives (i.e. class "1")
indices = np.arange(x_train.shape[0])
np.random.shuffle(indices)
x_train = x_train[indices]
y_train = y_train[indices]
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2, activation='sigmoid', input_shape=(n_feats,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=True)
Here is the output:
Layer (type)                 Output Shape              Param #
=================================================================
dense_25 (Dense)             (None, 2)                 6
_________________________________________________________________
dense_26 (Dense)             (None, 1)                 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
400/400 [==============================] - 0s 966us/step - loss: 0.7013 - acc: 0.5000
Epoch 2/5
400/400 [==============================] - 0s 143us/step - loss: 0.6998 - acc: 0.5000
Epoch 3/5
400/400 [==============================] - 0s 137us/step - loss: 0.6986 - acc: 0.5000
Epoch 4/5
400/400 [==============================] - 0s 149us/step - loss: 0.6975 - acc: 0.5000
Epoch 5/5
400/400 [==============================] - 0s 132us/step - loss: 0.6966 - acc: 0.5000
As you can see, the accuracy never increases beyond 50%. What if you increase the number of epochs to, say, 50?
Layer (type)                 Output Shape              Param #
=================================================================
dense_35 (Dense)             (None, 2)                 6
_________________________________________________________________
dense_36 (Dense)             (None, 1)                 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
400/400 [==============================] - 0s 1ms/step - loss: 0.6925 - acc: 0.5000
Epoch 2/50
400/400 [==============================] - 0s 136us/step - loss: 0.6902 - acc: 0.5000
Epoch 3/50
400/400 [==============================] - 0s 133us/step - loss: 0.6884 - acc: 0.5000
Epoch 4/50
400/400 [==============================] - 0s 160us/step - loss: 0.6866 - acc: 0.5000
Epoch 5/50
400/400 [==============================] - 0s 140us/step - loss: 0.6848 - acc: 0.5000
Epoch 6/50
400/400 [==============================] - 0s 168us/step - loss: 0.6832 - acc: 0.5000
Epoch 7/50
400/400 [==============================] - 0s 154us/step - loss: 0.6817 - acc: 0.5000
Epoch 8/50
400/400 [==============================] - 0s 146us/step - loss: 0.6802 - acc: 0.5000
Epoch 9/50
400/400 [==============================] - 0s 161us/step - loss: 0.6789 - acc: 0.5000
Epoch 10/50
400/400 [==============================] - 0s 140us/step - loss: 0.6778 - acc: 0.5000
Epoch 11/50
400/400 [==============================] - 0s 177us/step - loss: 0.6766 - acc: 0.5000
Epoch 12/50
400/400 [==============================] - 0s 180us/step - loss: 0.6755 - acc: 0.5000
Epoch 13/50
400/400 [==============================] - 0s 165us/step - loss: 0.6746 - acc: 0.5000
Epoch 14/50
400/400 [==============================] - 0s 128us/step - loss: 0.6736 - acc: 0.5000
Epoch 15/50
400/400 [==============================] - 0s 125us/step - loss: 0.6728 - acc: 0.5000
Epoch 16/50
400/400 [==============================] - 0s 165us/step - loss: 0.6718 - acc: 0.5000
Epoch 17/50
400/400 [==============================] - 0s 161us/step - loss: 0.6710 - acc: 0.5000
Epoch 18/50
400/400 [==============================] - 0s 170us/step - loss: 0.6702 - acc: 0.5000
Epoch 19/50
400/400 [==============================] - 0s 122us/step - loss: 0.6694 - acc: 0.5000
Epoch 20/50
400/400 [==============================] - 0s 110us/step - loss: 0.6686 - acc: 0.5000
Epoch 21/50
400/400 [==============================] - 0s 142us/step - loss: 0.6676 - acc: 0.5000
Epoch 22/50
400/400 [==============================] - 0s 142us/step - loss: 0.6667 - acc: 0.5000
Epoch 23/50
400/400 [==============================] - 0s 149us/step - loss: 0.6659 - acc: 0.5000
Epoch 24/50
400/400 [==============================] - 0s 125us/step - loss: 0.6651 - acc: 0.5000
Epoch 25/50
400/400 [==============================] - 0s 134us/step - loss: 0.6643 - acc: 0.5000
Epoch 26/50
400/400 [==============================] - 0s 143us/step - loss: 0.6634 - acc: 0.5000
Epoch 27/50
400/400 [==============================] - 0s 137us/step - loss: 0.6625 - acc: 0.5000
Epoch 28/50
400/400 [==============================] - 0s 131us/step - loss: 0.6616 - acc: 0.5025
Epoch 29/50
400/400 [==============================] - 0s 119us/step - loss: 0.6608 - acc: 0.5100
Epoch 30/50
400/400 [==============================] - 0s 143us/step - loss: 0.6601 - acc: 0.5025
Epoch 31/50
400/400 [==============================] - 0s 148us/step - loss: 0.6593 - acc: 0.5350
Epoch 32/50
400/400 [==============================] - 0s 161us/step - loss: 0.6584 - acc: 0.5325
Epoch 33/50
400/400 [==============================] - 0s 152us/step - loss: 0.6576 - acc: 0.5700
Epoch 34/50
400/400 [==============================] - 0s 128us/step - loss: 0.6568 - acc: 0.5850
Epoch 35/50
400/400 [==============================] - 0s 155us/step - loss: 0.6560 - acc: 0.5975
Epoch 36/50
400/400 [==============================] - 0s 136us/step - loss: 0.6552 - acc: 0.6425
Epoch 37/50
400/400 [==============================] - 0s 140us/step - loss: 0.6544 - acc: 0.6150
Epoch 38/50
400/400 [==============================] - 0s 120us/step - loss: 0.6538 - acc: 0.6375
Epoch 39/50
400/400 [==============================] - 0s 140us/step - loss: 0.6531 - acc: 0.6725
Epoch 40/50
400/400 [==============================] - 0s 135us/step - loss: 0.6523 - acc: 0.6750
Epoch 41/50
400/400 [==============================] - 0s 136us/step - loss: 0.6515 - acc: 0.7300
Epoch 42/50
400/400 [==============================] - 0s 126us/step - loss: 0.6505 - acc: 0.7450
Epoch 43/50
400/400 [==============================] - 0s 141us/step - loss: 0.6496 - acc: 0.7425
Epoch 44/50
400/400 [==============================] - 0s 162us/step - loss: 0.6489 - acc: 0.7675
Epoch 45/50
400/400 [==============================] - 0s 161us/step - loss: 0.6480 - acc: 0.7775
Epoch 46/50
400/400 [==============================] - 0s 126us/step - loss: 0.6473 - acc: 0.7575
Epoch 47/50
400/400 [==============================] - 0s 124us/step - loss: 0.6464 - acc: 0.7625
Epoch 48/50
400/400 [==============================] - 0s 130us/step - loss: 0.6455 - acc: 0.7950
Epoch 49/50
400/400 [==============================] - 0s 191us/step - loss: 0.6445 - acc: 0.8100
Epoch 50/50
400/400 [==============================] - 0s 163us/step - loss: 0.6435 - acc: 0.8625
The accuracy starts to increase. (Note that if you train this model multiple times, it may take a different number of epochs each time to reach an acceptable accuracy, anywhere from 10 to 100 epochs.)
Also, in my experiments I noticed that increasing the number of units in the first dense layer, for example to 5 or 10 units, makes the model train (i.e. converge) faster.
Why are so many epochs needed?
I think it is because of these two reasons (combined):
1) Although the two classes are easily separable, your data is made up of random samples, and
2) the number of data points is relatively large compared to the size of the neural net (i.e. the number of trainable parameters, which is 9 in the example code above).
Therefore, it takes more epochs for the model to learn the weights. It is as though the model is very restricted and needs more and more experience to find the appropriate weights. As evidence, just try increasing the number of units in the first dense layer: you are almost guaranteed to reach an accuracy above 90% in fewer than 10 epochs each time you train this model. By increasing the capacity, the model converges (i.e. trains) much faster (note that it starts to overfit if the capacity is too high or if you train the model for too many epochs; you should have a validation scheme to monitor this).
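As a minimal sketch of that experiment (same toy data and training setup as above; only the number of units in the first layer is increased):
model = Sequential()
model.add(Dense(10, activation='sigmoid', input_shape=(n_feats,)))  # more capacity than 2 units
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=True)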
Side note:
Don't set the high argument to a number less than the low argument in numpy.random.uniform since, according to the documentation, the results will be "officially undefined" in this case.
Update:
One more important thing here (maybe the most important thing in this scenario) is the learning rate of the optimizer. If the learning rate is too low, the model converges slowly. Try increasing the learning rate, and you can reach an accuracy of 100% in fewer than 5 epochs:
from keras import optimizers

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-1),
              metrics=['accuracy'])

# or you may use adam
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.Adam(lr=1e-1),
              metrics=['accuracy'])
The issue is that your labels are 1 and 2 instead of 0 and 1. Keras will not raise an error when it sees 2, but it is not capable of predicting 2.
Subtract 1 from all your y values. As a side note, in deep learning it is common to use a single neuron with sigmoid activation for binary classification (0 or 1) rather than two output classes with softmax. Finally, use binary_crossentropy as the loss for binary classification problems.
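A minimal sketch of that setup, reusing the variable names from the question (x_train, nr_feats, df_train) and shifting the labels down by 1:
from keras.models import Sequential
from keras.layers import Dense
y01 = df_train.iloc[:, -1].values - 1  # map labels {1, 2} -> {0, 1}
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape=(nr_feats,)))
model.add(Dense(units=1, activation='sigmoid'))  # single output neuron
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y01, epochs=500, batch_size=32, verbose=True)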
I have a question about my NN model. I am using Keras in Python. My training set consists of 1000 samples, each with 4320 features. There are 10 categories, and my Y contains numpy arrays of 10 elements, with 0 in every position except one.
However, my NN doesn't learn: the loss stays the same from the first epoch on. I probably have my model wrong; it's my first attempt at building an NN model and I must have gotten a couple of things wrong.
Epoch 1/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7110 - acc: 0.5796
Epoch 2/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 3/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 4/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 5/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 6/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 7/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7063 - acc: 0.5800
Epoch 8/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 9/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7063 - acc: 0.5800
And this is part of my NN code:
model = Sequential()
model.add(Dense(4320, input_dim=4320, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)
So, my X is a numpy array of length 1000 that contains other numpy arrays of 4320 elements. My Y is a numpy array of length 1000 that contains other numpy arrays of 10 elements (categories).
Am I doing something wrong, or is it just unable to learn from this training set? (With 1-NN and Manhattan distance I'm getting ~80% accuracy on this training set.)
Later edit: After normalizing the data, this is the output of my first 10 epochs:
Epoch 1/150
1000/1000 [==============================] - 41s 41ms/step - loss: 7.9834 - acc: 0.4360
Epoch 2/150
1000/1000 [==============================] - 41s 41ms/step - loss: 7.2943 - acc: 0.5080
Epoch 3/150
1000/1000 [==============================] - 39s 39ms/step - loss: 9.0326 - acc: 0.4070
Epoch 4/150
1000/1000 [==============================] - 39s 39ms/step - loss: 8.7106 - acc: 0.4320
Epoch 5/150
1000/1000 [==============================] - 40s 40ms/step - loss: 7.7547 - acc: 0.4900
Epoch 6/150
1000/1000 [==============================] - 44s 44ms/step - loss: 7.2591 - acc: 0.5270
Epoch 7/150
1000/1000 [==============================] - 42s 42ms/step - loss: 8.5002 - acc: 0.4560
Epoch 8/150
1000/1000 [==============================] - 41s 41ms/step - loss: 9.9525 - acc: 0.3720
Epoch 9/150
1000/1000 [==============================] - 40s 40ms/step - loss: 9.7160 - acc: 0.3920
Epoch 10/150
1000/1000 [==============================] - 39s 39ms/step - loss: 9.3523 - acc: 0.4140
It looks like the loss starts fluctuating now, so that seems to be a good sign.
It seems like your categories/classes are mutually exclusive, since your target arrays are one-hot encoded (i.e. you never have to predict 2 classes at the same time). In that case, you should use softmax on your last layer to produce a distribution and train using categorical_crossentropy. In fact, you can just set your targets as category indices, e.g. Y = [2, 4, 0, 1], and train with sparse_categorical_crossentropy, which saves you the time of creating a 2D array of shape (samples, 10).
It also seems like you have a lot of features; most likely the performance of your network will depend on how you pre-process your data. For continuous inputs it's wise to normalise them, and for discrete inputs to one-hot encode them, to help the learning.
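A minimal sketch of that change, keeping the architecture from the question and assuming the one-hot Y described there (if you switch Y to integer class indices, use sparse_categorical_crossentropy instead):
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(4320, input_dim=4320, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))  # distribution over mutually exclusive classes
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)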
I am trying to run predictions on time-series data using Keras with the Theano backend in Python, where I take factors from the past few days to predict the next day. I am able to generate predictions with other algorithms like XGBoost, but I wanted to try an ANN and got stuck with an index-out-of-bounds error during the prediction step.
Part of the code looks like this:
clfnn = Sequential()
clfnn.add(Dense(32, input_dim=9, init='uniform', activation='tanh'))
clfnn.add(Dense(9, init='uniform', activation='tanh'))
clfnn.add(Dense(1, activation='tanh'))
clfnn.compile(loss='mse', optimizer=sgd, metrics=['accuracy'])

imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit(testsample[factorsnew])
testsample[factorsnew] = imp.transform(testsample[factorsnew])
validationsample[factorsnew] = imp.transform(validationsample[factorsnew])

models = {'prediction': clfnn}
for key in models:
    try:
        models[key].fit(testsample[factorsnew].as_matrix(), testsample['returnranksnn'].as_matrix(), verbose=1)
        validationsample[key] = models[key].predict_proba(validationsample[factorsnew].as_matrix(), verbose=1)[:, 1]
    except:
        print sys.exc_info()[0]
        print sys.exc_info()[1]
        pass
The model seems to fit without any problems, but the prediction step gives an error. The output looks something like this:
Epoch 1/10
32240/32240 [==============================] - 0s - loss: 0.2506 - acc: 0.5980
Epoch 2/10
32240/32240 [==============================] - 0s - loss: 0.2504 - acc: 0.6054
Epoch 3/10
32240/32240 [==============================] - 0s - loss: 0.2504 - acc: 0.6069
Epoch 4/10
32240/32240 [==============================] - 0s - loss: 0.2505 - acc: 0.6028
Epoch 5/10
32240/32240 [==============================] - 0s - loss: 0.2504 - acc: 0.6015
Epoch 6/10
32240/32240 [==============================] - 0s - loss: 0.2503 - acc: 0.6067
Epoch 7/10
32240/32240 [==============================] - 0s - loss: 0.2504 - acc: 0.6020
Epoch 8/10
32240/32240 [==============================] - 0s - loss: 0.2505 - acc: 0.5999
Epoch 9/10
32240/32240 [==============================] - 0s - loss: 0.2504 - acc: 0.6040
Epoch 10/10
32240/32240 [==============================] - 0s - loss: 0.2505 - acc: 0.6024
32/40 [=======================>......] - ETA: 0s<type 'exceptions.IndexError'>
index 1 is out of bounds for axis 1 with size 1
Note: the data is normalized without any NaNs, the predictor variable is an int with only two outcomes (0 or 1), and the output should just be a probability.
I tried changing the optimizer settings and using different factors in the data, but in vain. The majority of samples get stuck at 32/40 or 32/*** as you see in the output. Any ideas on what I am missing here? Thanks.