Failure plotting the graph and accuracy - python

I'm new to Machine Learning and I'm trying to predict the Lira rate with Keras. I think the values are right, but I cannot properly plot them. It looks like this: [image]
Here is my code (the CSV file is in German, so here are the translations of the column names: Datum -> Date, Erster -> Open, Hoch -> High, Tief -> Low, Schlusskurs -> Close):
The problem is below:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
X_train = []
y_train = []
csv_file = "wkn_A0C32V_historic.csv" #csv file (path)
data = pd.read_csv(csv_file, sep=";") #reading the csv file
data["Erster vorher"] = data["Erster"].shift(-1) #moving the data in Erster(Open) one step backwards
data["Erster"] = data["Erster"].str.replace(",", ".") #replacing all commas with dots in order to calculate with float numbers
data["Erster vorher"] = data["Erster vorher"].str.replace(",", ".") #same here
data["Changes"] = (data["Erster"].astype(float) / data["Erster vorher"].astype(float)) - 1 #calculating the changes
data = data.dropna() #dropping the NaNs
changes = data["Changes"]
#X_train = (number_of_examples, sequence_length, input_dimension)
for i in range(len(changes) - 20):
    X_train.append(np.array(changes[i+1:i+21][::-1]))
    y_train.append(changes[i])
X_train = np.array(X_train).reshape(-1, 20, 1)
y_train = np.array(y_train)
print("X_train shape: " + str(X_train.shape))
print("y_train shape: " + str(y_train.shape))
#Training the data
model = Sequential()
model.add(LSTM(1, input_shape=(20, 1)))
model.compile(optimizer="rmsprop", loss="mse", metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=32, epochs=10)
preds = model.predict(X_train)
preds = preds.reshape(-1)
print("Shape of predictions: " + str(preds.shape))
preds = np.append(preds, np.zeros(20))
data["predictions"] = preds
data["Open_predicted"] = data["Erster vorher"].astype(float) * (1 + data["predictions"].astype(float)) #calculating the new Open with the predicted numbers
print(data)
import matplotlib.pyplot as plt
dates = np.array(data["Datum"]).astype(np.datetime64)
#HERE BEGINS THE PROBLEM...
plt.plot(dates, data["Erster"], label="Erster")
plt.plot(dates, data["Open_predicted"], label="Erster (predicted)")
plt.legend()
plt.show()
Output:
Epoch 9/10
32/3444 [..............................] - ETA: 0s - loss: 9.5072e-05 - accuracy: 0.1250
448/3444 [==>...........................] - ETA: 0s - loss: 1.8344e-04 - accuracy: 0.0513
960/3444 [=======>......................] - ETA: 0s - loss: 1.2734e-04 - accuracy: 0.0583
1472/3444 [===========>..................] - ETA: 0s - loss: 1.0480e-04 - accuracy: 0.0577
1984/3444 [================>.............] - ETA: 0s - loss: 9.7956e-05 - accuracy: 0.0600
2464/3444 [====================>.........] - ETA: 0s - loss: 9.0399e-05 - accuracy: 0.0621
2976/3444 [========================>.....] - ETA: 0s - loss: 8.5287e-05 - accuracy: 0.0649
3444/3444 [==============================] - 0s 122us/step - loss: 8.1555e-05 - accuracy: 0.0633
Epoch 10/10
32/3444 [..............................] - ETA: 0s - loss: 5.5561e-05 - accuracy: 0.0312
544/3444 [===>..........................] - ETA: 0s - loss: 6.1705e-05 - accuracy: 0.0662
1056/3444 [========>.....................] - ETA: 0s - loss: 1.2215e-04 - accuracy: 0.0644
1536/3444 [============>.................] - ETA: 0s - loss: 9.9676e-05 - accuracy: 0.0651
2048/3444 [================>.............] - ETA: 0s - loss: 9.2219e-05 - accuracy: 0.0625
2592/3444 [=====================>........] - ETA: 0s - loss: 8.8050e-05 - accuracy: 0.0625
3104/3444 [==========================>...] - ETA: 0s - loss: 8.1685e-05 - accuracy: 0.0651
3444/3444 [==============================] - 0s 118us/step - loss: 8.1349e-05 - accuracy: 0.0633
Shape of predictions: (3444,)
Datum Erster Hoch ... Changes predictions Open_predicted
0 2020-09-04 8.8116 8,8226 ... 0.011816 0.000549 8.713479
1 2020-09-03 8.7087 8,8263 ... -0.006457 0.001141 8.775301
2 2020-09-02 8.7653 8,7751 ... -0.005051 0.001849 8.826093
3 2020-09-01 8.8098 8,8377 ... 0.009465 0.001102 8.736818
4 2020-08-31 8.7272 8,7993 ... 0.000069 0.001149 8.736630
... ... ... ... ... ... ... ...
3459 2009-01-07 2.0449 2,1288 ... -0.021392 0.000000 2.089600
3460 2009-01-06 2.0896 2,0922 ... -0.020622 0.000000 2.133600
3461 2009-01-05 2.1336 2,1477 ... 0.002914 0.000000 2.127400
3462 2009-01-04 2.1274 2,1323 ... -0.005377 0.000000 2.138900
3463 2009-01-02 2.1389 2,1521 ... 0.000000 0.000000 2.138900
[3464 rows x 9 columns]

From the graph, two things stand out: (1) Erster and Erster (predicted) appear to be on different scales of magnitude, and (2) the large number of labels on the y-axis is reminiscent of what you get when you plot datetimes instead of numbers. I imagine there is a mix-up somewhere, but it is not obvious where.
My suggestions for troubleshooting are: (i) plot Erster against Erster (predicted) to check that the scales are similar, and (ii) print the output of data.info() to check that the data types are as expected.
Side note: I recommend sorting the data frame by date in ascending order.
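For concreteness, here is a minimal sketch of those checks; it assumes the data frame and matplotlib have already been set up as in the question:
import matplotlib.pyplot as plt
print(data.info())  # "Erster" should show as float64; "object" means it is still a string column
data["Erster"] = data["Erster"].astype(float)  # make sure the plotted values are numeric
data = data.sort_values("Datum")  # ascending dates, per the side note
plt.scatter(data["Erster"], data["Open_predicted"])  # the two scales should be comparable
plt.xlabel("Erster")
plt.ylabel("Erster (predicted)")
plt.show()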

Related

Deep learning AI for integers in a sequence

I am new to ML, and I would like to use keras to categorize every number in a sequence as a 1 or 0 depending on whether it is greater than the previous number. That is, if I had:
sequence a = [1, 2, 6, 4, 5],
The solution should be:
sequence b = [0, 1, 1, 0, 1].
So far, I have written:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1,1])])
model.add(tf.keras.layers.Dense(17))
model.add(tf.keras.layers.Dense(17))
model.compile(optimizer='sgd', loss='BinaryCrossentropy', metrics=['binary_accuracy'])
b = [1,6,8,3,5,8,90,5,432,3,5,6,8,8,4,234,0]
a = [0,1,1,0,1,1,1,0,1,0,1,1,1,0,0,1,0]
b = np.array(b, dtype=float)
a = np.array(a, dtype=float)
model.fit(b, a, epochs=500, batch_size=1)
# # Generate predictions for samples
predictions = model.predict(b)
print(predictions)
When I do this, I end up with:
Epoch 500/500
17/17 [==============================] - 0s 499us/step - loss: 7.9229 - binary_accuracy: 0.4844
[[[-1.37064695e+01 4.70858345e+01 -4.67341652e+01 -1.94298875e+00
5.75960045e+01 6.70146179e+01 6.34545479e+01 -4.86319550e+02
2.26250134e+01 -8.60109329e+00 -4.03220863e+01 -1.67574768e+01
3.36148148e+01 -4.55171967e+00 -1.39924898e+01 6.31023712e+01
-9.14120102e+00]]
[[-6.92644653e+01 2.40270264e+02 -2.37715302e+02 -9.42625141e+00
2.93314209e+02 3.41092743e+02 3.23760315e+02 -2.49306396e+03
1.15242020e+02 -4.38339310e+01 -2.05973328e+02 -8.48139114e+01
1.70274872e+02 -2.48692398e+01 -7.15372696e+01 3.22131958e+02
-4.57872620e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-3.59296684e+01 1.24359612e+02 -1.23126640e+02 -4.93629456e+00
1.51883270e+02 1.76645889e+02 1.67576874e+02 -1.28901733e+03
5.96718216e+01 -2.26942272e+01 -1.06582588e+02 -4.39800491e+01
8.82788391e+01 -1.26787395e+01 -3.70104065e+01 1.66714172e+02
-2.37996235e+01]]
[[-5.81528549e+01 2.01633392e+02 -1.99519104e+02 -7.92959309e+00
2.46170563e+02 2.86277161e+02 2.71699158e+02 -2.09171509e+03
9.67186279e+01 -3.67873497e+01 -1.72843094e+02 -7.12026062e+01
1.42942856e+02 -2.08057709e+01 -6.00283318e+01 2.70326050e+02
-3.84580460e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-1.00263879e+03 3.48576855e+03 -3.44619800e+03 -1.35145050e+02
4.25337939e+03 4.94560596e+03 4.69689697e+03 -3.62063594e+04
1.67120789e+03 -6.35745117e+02 -2.98891406e+03 -1.22816174e+03
2.46616406e+03 -3.66204163e+02 -1.03828992e+03 4.67382764e+03
-6.61441223e+02]]
[[-5.81528549e+01 2.01633392e+02 -1.99519104e+02 -7.92959309e+00
2.46170563e+02 2.86277161e+02 2.71699158e+02 -2.09171509e+03
9.67186279e+01 -3.67873497e+01 -1.72843094e+02 -7.12026062e+01
1.42942856e+02 -2.08057709e+01 -6.00283318e+01 2.70326050e+02
-3.84580460e+01]]
[[-4.80280518e+03 1.66995840e+04 -1.65093086e+04 -6.47000305e+02
2.03765059e+04 2.36925508e+04 2.25018145e+04 -1.73467625e+05
8.00621289e+03 -3.04566919e+03 -1.43194590e+04 -5.88322070e+03
1.18137129e+04 -1.75592432e+03 -4.97435352e+03 2.23914492e+04
-3.16803076e+03]]
[[-3.59296684e+01 1.24359612e+02 -1.23126640e+02 -4.93629456e+00
1.51883270e+02 1.76645889e+02 1.67576874e+02 -1.28901733e+03
5.96718216e+01 -2.26942272e+01 -1.06582588e+02 -4.39800491e+01
8.82788391e+01 -1.26787395e+01 -3.70104065e+01 1.66714172e+02
-2.37996235e+01]]
[[-5.81528549e+01 2.01633392e+02 -1.99519104e+02 -7.92959309e+00
2.46170563e+02 2.86277161e+02 2.71699158e+02 -2.09171509e+03
9.67186279e+01 -3.67873497e+01 -1.72843094e+02 -7.12026062e+01
1.42942856e+02 -2.08057709e+01 -6.00283318e+01 2.70326050e+02
-3.84580460e+01]]
[[-6.92644653e+01 2.40270264e+02 -2.37715302e+02 -9.42625141e+00
2.93314209e+02 3.41092743e+02 3.23760315e+02 -2.49306396e+03
1.15242020e+02 -4.38339310e+01 -2.05973328e+02 -8.48139114e+01
1.70274872e+02 -2.48692398e+01 -7.15372696e+01 3.22131958e+02
-4.57872620e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-9.14876480e+01 3.17544006e+02 -3.14107819e+02 -1.24195509e+01
3.87601562e+02 4.50723969e+02 4.27882660e+02 -3.29576172e+03
1.52288818e+02 -5.79270554e+01 -2.72233856e+02 -1.12036469e+02
2.24938889e+02 -3.29962883e+01 -9.45551834e+01 4.25743744e+02
-6.04456978e+01]]
[[-4.70412598e+01 1.62996490e+02 -1.61322891e+02 -6.43295908e+00
1.99026932e+02 2.31461517e+02 2.19638016e+02 -1.69036609e+03
7.81952209e+01 -2.97407875e+01 -1.39712814e+02 -5.75913391e+01
1.15610855e+02 -1.67422562e+01 -4.85193672e+01 2.18520096e+02
-3.11288433e+01]]
[[-2.60270850e+03 9.04948047e+03 -8.94645508e+03 -3.50663330e+02
1.10420654e+04 1.28390557e+04 1.21937041e+04 -9.40005859e+04
4.33857861e+03 -1.65045227e+03 -7.75966846e+03 -3.18818774e+03
6.40197412e+03 -9.51349304e+02 -2.69557886e+03 1.21338779e+04
-1.71684766e+03]]
[[-2.59487200e+00 8.44894505e+00 -8.53793907e+00 -4.46333081e-01
1.04523640e+01 1.21989994e+01 1.13933916e+01 -8.49708328e+01
4.10160637e+00 -1.55452514e+00 -7.19183874e+00 -3.14619255e+00
6.28279734e+00 -4.88203079e-01 -2.48353434e+00 1.12964716e+01
-1.81198704e+00]]]
There are a few issues with how you are approaching this -
Your setup for the deep learning problem is flawed. You want to use the information of the previous element to infer the label of the current element, but for training (and inference) you only pass the current element. Imagine deploying this model tomorrow: the only information you could give it is a single number, say "15", while asking whether it is bigger than the previous element, which the model never sees. How would it respond?
Secondly, why is your output layer predicting a 17-dimensional vector? Shouldn't the goal be to predict a 0 or 1 (a probability)? In that case your output should be a single element with a sigmoid activation. Refer to this diagram as a guide for your future neural network setups.
Third, you are not using any activation functions, which are the core reason for using neural networks in the first place (nonlinearity). Without activation functions you are just building a standard linear regression model. Here is a basic proof -
#2 layer neural network without activation
h = W1.X + B1
o = W2.h + B2
o = W2.(W1.X + B1) + B2
  = W2.W1.X + (W2.B1 + B2)
  = W3.X + B3  #Same as linear regression!
#2 layer neural network with activations
h = activation(W1.X + B1)
o = activation(W2.h + B2)
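For intuition, here is a quick NumPy sketch of the same point (the shapes and values are arbitrary); two stacked layers without activations collapse into a single linear layer:
import numpy as np
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W1, B1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, B2 = rng.normal(size=(5, 2)), rng.normal(size=2)
o_stacked = (X @ W1 + B1) @ W2 + B2   # two linear layers applied in sequence
W3, B3 = W1 @ W2, B1 @ W2 + B2        # the equivalent single linear layer
o_single = X @ W3 + B3
print(np.allclose(o_stacked, o_single))  # True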
I would advise starting from the basics of neural networks to first build up best practices, and then moving on to framing your own problem statements. The Keras author François Chollet (fchollet) has some excellent starter notebooks that you can explore.
For your case, try these modifications -
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
#Modify input shape and output shape + add activations
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=(2,))]) #<------
model.add(tf.keras.layers.Dense(17, activation='relu')) #<------
model.add(tf.keras.layers.Dense(1, activation='sigmoid')) #<------
model.compile(optimizer='sgd', loss='BinaryCrossentropy', metrics=['binary_accuracy'])
#create 2 features, 1st is previous element 2nd is current element
b = [1,6,8,3,5,8,90,5,432,3,5,6,8,8,4,234,0]
b = np.array([i for i in zip(b,b[1:])]) #<---- (16,2)
#Start from the first pair of elements
a = np.array([0,1,1,0,1,1,1,0,1,0,1,1,1,0,0,1,0])[1:] #<---- (16,)
model.fit(b, a, epochs=20, batch_size=1)
# # Generate predictions for samples
predictions = model.predict(b)
print(np.round(predictions))
Epoch 1/20
16/16 [==============================] - 0s 1ms/step - loss: 3.0769 - binary_accuracy: 0.7086
Epoch 2/20
16/16 [==============================] - 0s 823us/step - loss: 252.6490 - binary_accuracy: 0.6153
Epoch 3/20
16/16 [==============================] - 0s 1ms/step - loss: 3.8109 - binary_accuracy: 0.9212
Epoch 4/20
16/16 [==============================] - 0s 787us/step - loss: 0.0131 - binary_accuracy: 0.9845
Epoch 5/20
16/16 [==============================] - 0s 2ms/step - loss: 0.0767 - binary_accuracy: 1.0000
Epoch 6/20
16/16 [==============================] - 0s 1ms/step - loss: 0.0143 - binary_accuracy: 0.9800
Epoch 7/20
16/16 [==============================] - 0s 2ms/step - loss: 0.0111 - binary_accuracy: 1.0000
Epoch 8/20
16/16 [==============================] - 0s 2ms/step - loss: 4.0658e-04 - binary_accuracy: 1.0000
Epoch 9/20
16/16 [==============================] - 0s 941us/step - loss: 6.3996e-04 - binary_accuracy: 1.0000
Epoch 10/20
16/16 [==============================] - 0s 1ms/step - loss: 1.1477e-04 - binary_accuracy: 1.0000
Epoch 11/20
16/16 [==============================] - 0s 837us/step - loss: 6.8807e-04 - binary_accuracy: 1.0000
Epoch 12/20
16/16 [==============================] - 0s 2ms/step - loss: 5.0521e-04 - binary_accuracy: 1.0000
Epoch 13/20
16/16 [==============================] - 0s 851us/step - loss: 0.0015 - binary_accuracy: 1.0000
Epoch 14/20
16/16 [==============================] - 0s 1ms/step - loss: 0.0012 - binary_accuracy: 1.0000
Epoch 15/20
16/16 [==============================] - 0s 765us/step - loss: 0.0014 - binary_accuracy: 1.0000
Epoch 16/20
16/16 [==============================] - 0s 906us/step - loss: 3.9230e-04 - binary_accuracy: 1.0000
Epoch 17/20
16/16 [==============================] - 0s 1ms/step - loss: 0.0022 - binary_accuracy: 1.0000
Epoch 18/20
16/16 [==============================] - 0s 1ms/step - loss: 2.2149e-04 - binary_accuracy: 1.0000
Epoch 19/20
16/16 [==============================] - 0s 2ms/step - loss: 1.7345e-04 - binary_accuracy: 1.0000
Epoch 20/20
16/16 [==============================] - 0s 1ms/step - loss: 7.7950e-05 - binary_accuracy: 1.0000
[[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]]
The above model is easy to train since the problem is not complex. You can see that the accuracy reaches 100% very quickly. Let's try to make predictions on unseen data with this new model -
np.round(model.predict([[5,1], #<- Is 5 < 1
[5,500], #<- Is 5 < 500
[5,6]])) #<- Is 5 < 6
array([[0.], #<- No
[1.], #<- Yes
[1.]], dtype=float32) #<- Yes
The problem is that your output layer has 17 neurons. This does not make sense. You would want to have 1 or 2 neurons at the output for a binary choice like this.
Change the last layer to:
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
You will then get one output prediction per input. Since these are probabilities rather than 0/1 values, you will have to round them, e.g. with np.round.
The sigmoid activation function is used to get probabilities between 0 and 1. One output neuron is used because your output is a binary choice, so a single value is enough to represent it.
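For example, a small sketch assuming the model above (with the single sigmoid output neuron) has already been fitted on the b/a arrays from the question:
import numpy as np
probs = model.predict(np.array(b, dtype=float))  # sigmoid outputs in [0, 1]
labels = np.round(probs).astype(int)             # threshold at 0.5 to get 0/1 class labels
print(labels)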
However, this simply solves your issues in the code. I would argue that a Dense neural network is NOT the right choice for your problem and will probably have a hard time learning anything useful.

How to take as Input a list of arrays in Keras API

Well, I'm new to Machine Learning, and therefore to Keras as well. I'm trying to create a model that can take as input a list of arrays of arrays (two arrays, each containing 6400 arrays).
This is the problematic part of my code:
XFIT = np.array([x_train, XX_train])
YFIT = np.array([y_train, yy_train])
Inputs = keras.layers.Input(shape=(6400, 2))
hidden1 = keras.layers.Dense(units=100, activation="sigmoid")(Inputs)
hidden2 = keras.layers.Dense(units=100, activation='relu')(hidden1)
predictions = keras.layers.Dense(units=3, activation='softmax')(hidden2)
model = keras.Model(inputs=Inputs, outputs=predictions)
There's no error; however, the Input layer (Inputs) forces me to pass a (6400, 2) shape, as each array (x_train and XX_train) contains 6400 arrays. The result, after the epochs are done, is this:
Train on 2 samples
Epoch 1/5
2/2 [==============================] - 1s 353ms/sample - loss: 1.1966 - accuracy: 0.2488
Epoch 2/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.1303 - accuracy: 0.2544
Epoch 3/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.0982 - accuracy: 0.3745
Epoch 4/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.0854 - accuracy: 0.3745
Epoch 5/5
2/2 [==============================] - 0s 9ms/sample - loss: 1.0835 - accuracy: 0.3745
Process finished with exit code 0
I can't train on more than two samples per epoch because of the input shape. How can I change this input?
I have tried other shapes, but they gave me errors.
x_train and XX_train look like this:
[[[0.505834 0.795461]
[0.843175 0.975741]
[0.22349 0.035036]
...
[0.884796 0.867509]
[0.396942 0.659936]
[0.873194 0.05454 ]]
[[0.95968 0.281957]
[0.137547 0.390005]
[0.635382 0.901555]
...
[0.887062 0.486206]
[0.49827 0.949123]
[0.034411 0.983711]]]
Thank you, and forgive me if I've made any mistakes; it's my first time with Keras and my first time on Stack Overflow :D
You are almost there. The problem is with:
XFIT = np.array([x_train, XX_train])
YFIT = np.array([y_train, yy_train])
Let's see with an example:
import numpy as np
x_train = np.random.random((6400, 2))
y_train = np.random.randint(2, size=(6400,1))
xx_train = np.array([x_train, x_train])
yy_train = np.array([y_train, y_train])
print(xx_train.shape)
(2, 6400, 2)
print(yy_train.shape)
(2, 6400, 1)
In this array we have 2 samples of 6400 rows each, which means that when we call model.fit it only has 2 samples to train on. Instead, what we can do:
xx_train = np.vstack([x_train, x_train])
yy_train = np.vstack([y_train, y_train])
print(xx_train.shape)
(12800, 2)
print(yy_train.shape)
(12800, 1)
Now, we have correctly joined both samples and can train:
from keras.models import Model
from keras.layers import Input, Dense
Inputs = Input(shape=(2, ))
hidden1 = Dense(units=100, activation="sigmoid")(Inputs)
hidden2 = Dense(units=100, activation='relu')(hidden1)
predictions = Dense(units=1, activation='sigmoid')(hidden2)
model = Model([Inputs], outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(xx_train, yy_train, batch_size=10, epochs=5)
Train on 12800 samples
Epoch 1/5
12800/12800 [==============================] - 3s 216us/sample - loss: 0.6978 - acc: 0.5047
Epoch 2/5
12800/12800 [==============================] - 2s 186us/sample - loss: 0.6952 - acc: 0.5018
Epoch 3/5
12800/12800 [==============================] - 3s 196us/sample - loss: 0.6942 - acc: 0.4962
Epoch 4/5
12800/12800 [==============================] - 3s 217us/sample - loss: 0.6938 - acc: 0.4898
Epoch 5/5
12800/12800 [==============================] - 3s 217us/sample - loss: 0.6933 - acc: 0.5002

Getting constant Prediction values using LSTM Keras syntax

I am trying to predict the growth rate of a user using an LSTM and the Adam optimizer, but the predictions I am getting from the code are far from the actual values. I am new to ML and just trying to learn how things are measured in ML, for example what the units parameter actually does in an LSTM model. I am reading values from a CSV and trying to find the growth rate of a user based on the amount collected over 2 years, but my predictions seem to be inaccurate. Can anyone tell me how I can get correct predictions in order to compute the growth rate of a user?
Here is my code:
import pymysql
import pandas as pd
import numpy as np
import csv
from datetime import datetime
import time
import json
import matplotlib.pyplot as plt
import seaborn as sns
import pprint
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.utils.np_utils import to_categorical
from keras.layers import Input
import os
os.environ['KERAS_BACKEND']='tensorflow'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from keras.layers.recurrent import LSTM
from matplotlib import style
from keras.layers import Activation, Dense, Dropout
df = pd.read_csv("trakop.csv")
print("="*50)
print("First Five Rows ","\n")
print(df.head(2),"\n")
dataset = df
dataset["Month"] = pd.to_datetime(df["timestamp"]).dt.month
dataset["Year"] = pd.to_datetime(df["timestamp"]).dt.year
dataset["Date"] = pd.to_datetime(df["timestamp"]).dt.date
dataset["Time"] = pd.to_datetime(df["timestamp"]).dt.time
dataset["Week"] = pd.to_datetime(df["timestamp"]).dt.week
dataset["Day"] = pd.to_datetime(df["timestamp"]).dt.day_name()
dataset["Hour"] = pd.to_datetime(df["timestamp"]).dt.hour
dataset = df.set_index("timestamp")
dataset.index = pd.to_datetime(dataset.index)
dataset.head(1)
print(df.Year.unique(),"\n")
print("Total Number of Unique Year", df.Year.nunique(), "\n")
NewDataSet = dataset.resample('D').mean()
# print(NewDataSet)
print("Old Dataset ",dataset.shape )
print("New Dataset ",NewDataSet.shape )
excludedValue = 5
TestData = NewDataSet.tail(10)
Training_Set = NewDataSet.iloc[:,0:1]
Training_Set = Training_Set[:-excludedValue]
print("Training Set Shape ", Training_Set.shape)
print("Test Set Shape ", TestData.shape)
Training_Set = Training_Set.values
sc = MinMaxScaler(feature_range=(0, 1))
Train = sc.fit_transform(Training_Set)
X_Train = []
Y_Train = []
# Build sequences: the range runs from excludedValue to the end
for i in range(excludedValue, Train.shape[0]):
    # X_Train: the previous excludedValue values
    X_Train.append(Train[i - excludedValue:i])
    # Y_Train: the value that follows them
    Y_Train.append(Train[i])
# Convert into Numpy Array
X_Train = np.array(X_Train)
Y_Train = np.array(Y_Train)
print(X_Train.shape)
print(Y_Train.shape)
X_Train = np.reshape(X_Train, newshape=(X_Train.shape[0], X_Train.shape[1], 1))
X_Train.shape
regressor = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 1, return_sequences = True, input_shape = (X_Train.shape[1], 1)))
regressor.add(Dropout(0.4))
# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=1, return_sequences = True))
regressor.add(Dropout(0.4))
# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=1, return_sequences = True))
regressor.add(Dropout(0.4))
# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 1))
regressor.add(Dropout(0.4))
# Adding the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics=['acc'])
regressor.fit(X_Train, Y_Train, epochs = 30, batch_size = 12,verbose=2)
Df_Total = pd.concat((NewDataSet[["amount"]], TestData[["amount"]]), axis=0)
Df_Total.shape
inputs = Df_Total[len(Df_Total) - len(TestData) - excludedValue:].values
# We need to Reshape
inputs = inputs.reshape(-1,1)
# Normalize the Dataset
inputs = sc.transform(inputs)
X_test = []
for i in range(excludedValue, inputs.shape[0]):
    X_test.append(inputs[i - excludedValue:i])
# Convert into Numpy Array
X_test = np.array(X_test)
# Reshape before Passing to Network
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# Pass to Model
predicted_raise = regressor.predict(X_test)
# Do inverse Transformation to get Values
predicted_raise = sc.inverse_transform(predicted_raise)
Predicted_Amount = predicted_raise
dates = TestData.index.to_list()
True_Amount = TestData["amount"].to_list()
growth_rate= (True_Amount-Predicted_Amount)/True_Amount*100
Machine_Df = pd.DataFrame(data={
"Date":dates,
"TrueAmount": True_Amount,
"PredictedAmount":[x[0] for x in Predicted_Amount ],
"Growthrate": [x[0] for x in growth_rate]
})
print(Machine_Df)
fig = plt.figure()
ax1= fig.add_subplot(111)
x = dates
y = True_Amount
y1 = Predicted_Amount
plt.plot(x,y, color="green")
plt.plot(x,y1, color="red")
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.xlabel('Dates')
plt.ylabel("Amount")
plt.title("Machine Learned the Pattern Predicting Future Values ")
plt.legend()
Here is what I am getting in my output:
('First Five Rows ', '\n')
( timestamp amount
0 2019-09-08 06:30:23 38.0
1 2019-09-08 06:36:48 19.0, '\n')
(array([2019, 2020]), '\n')
('Total Number of Unique Year', 2, '\n')
('Old Dataset ', (12492, 8))
('New Dataset ', (129, 5))
('Training Set Shape ', (124, 1))
('Test Set Shape ', (10, 5))
(119, 5, 1)
(119, 1)
Epoch 1/30
- 15s - loss: 0.0177 - acc: 0.0084
Epoch 2/30
- 1s - loss: 0.0165 - acc: 0.0084
Epoch 3/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 4/30
- 1s - loss: 0.0167 - acc: 0.0084
Epoch 5/30
- 1s - loss: 0.0157 - acc: 0.0084
Epoch 6/30
- 1s - loss: 0.0158 - acc: 0.0084
Epoch 7/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 8/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 9/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 10/30
- 1s - loss: 0.0160 - acc: 0.0084
Epoch 11/30
- 1s - loss: 0.0158 - acc: 0.0084
Epoch 12/30
- 1s - loss: 0.0155 - acc: 0.0084
Epoch 13/30
- 1s - loss: 0.0157 - acc: 0.0084
Epoch 14/30
- 1s - loss: 0.0155 - acc: 0.0084
Epoch 15/30
- 1s - loss: 0.0152 - acc: 0.0084
Epoch 16/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 17/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 18/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 19/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 20/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 21/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 22/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 23/30
- 1s - loss: 0.0150 - acc: 0.0084
Epoch 24/30
- 1s - loss: 0.0153 - acc: 0.0084
Epoch 25/30
- 1s - loss: 0.0152 - acc: 0.0084
Epoch 26/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 27/30
- 1s - loss: 0.0152 - acc: 0.0084
Epoch 28/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 29/30
- 1s - loss: 0.0151 - acc: 0.0084
Epoch 30/30
- 1s - loss: 0.0151 - acc: 0.0084
Date Growthrate PredictedAmount TrueAmount
0 2020-01-05 1.695584 122.266731 124.375625
1 2020-01-06 1.691683 122.271584 98.166667
2 2020-01-07 1.682077 122.283531 120.892473
3 2020-01-08 1.690008 122.273666 84.863636
4 2020-01-09 1.694407 122.268196 94.673077
5 2020-01-10 1.706436 122.253235 99.140341
6 2020-01-11 1.700952 122.260056 124.580882
7 2020-01-12 1.701755 122.259056 56.390071
8 2020-01-13 1.696290 122.265854 78.746951
9 2020-01-14 1.698001 122.263725 49.423529
[100 rows x 3 columns]
Screenshot of the graph: (image not included)
The CSV I am using:
https://drive.google.com/file/d/1nKHNqh7fJJJVvb2Qy-DxAO7c7HwNpEI0/view?usp=sharing
Any help would be greatly appreciated!!!
I have worked on your code. First of all, please reduce the batch size, because the dataset is small, and change the optimizer from "adam" to "rmsprop"; adam uses a constant learning rate, which is why you are receiving constant values in the prediction. I have also increased the dropout to 0.4.
For calculating the growth rate, I have used the formula
growth rate = (True Amount - Predicted Amount) / True Amount * 100
This formula gives you the percentage difference between the true amount and the predicted amount.
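As a small numeric sketch of that formula (the values below are made-up placeholders, not results from the model):
import numpy as np
true_amount = np.array([124.37, 98.17, 120.89])
predicted_amount = np.array([122.27, 122.27, 122.28])
growth_rate = (true_amount - predicted_amount) / true_amount * 100
print(growth_rate)  # percentage difference per day, roughly [ 1.69 -24.55 -1.15]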
For full code, please follow the GitHub link
https://github.com/rohitnarain24/Optimizing-LSTM-model/blob/master/optimized%20lstm.txt
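A minimal sketch of the mechanical changes described above, assuming the regressor, X_Train and Y_Train from the question (the batch size of 4 is an assumption; the linked file has the full script):
regressor.compile(optimizer='rmsprop', loss='mean_squared_error')    # rmsprop instead of adam
regressor.fit(X_Train, Y_Train, epochs=30, batch_size=4, verbose=2)  # smaller batch size for the small dataset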

make prediction in new dataset

I built a Keras logistic regression model. I am trying to find a way to feed my model a new data set and get predictions on that new data set; it will have the same shape as the data the model was trained on.
My second question: is there a way to improve the accuracy of my model? My accuracy is 69%, and when I print the classification report I get poor precision for one class.
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers import Dense
X = new.drop('reassed', axis=1)
y = new['reassed'].astype(int)
# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 27, kernel_initializer = 'uniform', activation = 'relu', input_dim = 6))
# Adding the second hidden layer
classifier.add(Dense(units = 27, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 20)
Epoch 1/20
16704/16704 [==============================] - 1s 76us/step - loss: 0.6159 - acc: 0.6959
Epoch 2/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6114 - acc: 0.6967
Epoch 3/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6110 - acc: 0.6964
Epoch 4/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6101 - acc: 0.6965
Epoch 5/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6091 - acc: 0.6961
Epoch 6/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6094 - acc: 0.6963
Epoch 7/20
16704/16704 [==============================] - 1s 68us/step - loss: 0.6086 - acc: 0.6967
Epoch 8/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6083 - acc: 0.6965
Epoch 9/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6081 - acc: 0.6964: 0s - loss: 0.6085 - acc:
Epoch 10/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6082 - acc: 0.6971
Epoch 11/20
16704/16704 [==============================] - 1s 67us/step - loss: 0.6077 - acc: 0.6968
Epoch 12/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6073 - acc: 0.6971
Epoch 13/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6067 - acc: 0.6971
Epoch 14/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6070 - acc: 0.6965
Epoch 15/20
16704/16704 [==============================] - 1s 65us/step - loss: 0.6066 - acc: 0.6967: 0s - loss: 0.6053 - ac
Epoch 16/20
16704/16704 [==============================] - 1s 66us/step - loss: 0.6060 - acc: 0.6967
Epoch 17/20
16704/16704 [==============================] - 1s 67us/step - loss: 0.6061 - acc: 0.6968
Epoch 18/20
16704/16704 [==============================] - 1s 67us/step - loss: 0.6062 - acc: 0.6971
Epoch 19/20
16704/16704 [==============================] - 1s 69us/step - loss: 0.6057 - acc: 0.6968
Epoch 20/20
16704/16704 [==============================] - 1s 74us/step - loss: 0.6055 - acc: 0.6973
y_pred = classifier.predict(X_test)
y_pred = [ 1 if y>=0.5 else 0 for y in y_pred ]
print(classification_report(y_test, y_pred))
precision recall f1-score support
0 0.71 1.00 0.83 2968
1 0.33 0.00 0.01 1208
micro avg 0.71 0.71 0.71 4176
macro avg 0.52 0.50 0.42 4176
weighted avg 0.60 0.71 0.59 4176
I expect to improve my model.
I expect to find a way to make predictions on a new data set.
To make predictions on the new data set:
Load the data the same way you loaded your test set.
Apply all the preprocessing steps that were applied to your training set.
Use the
model.predict(X)
function to make predictions and carry on with your post-processing.
It's almost the same as predicting with the test set.
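For example, a minimal sketch under the assumption that the new data arrives as a CSV with the same feature columns used in training (the file name is a placeholder), and that sc and classifier are the fitted scaler and model from the question:
import pandas as pd
new_df = pd.read_csv("new_data.csv")            # placeholder file with the same feature columns
X_new = new_df.values.astype(float)
X_new = sc.transform(X_new)                     # reuse the scaler fitted on the training set; do not re-fit
probs = classifier.predict(X_new)               # sigmoid probabilities
y_new = [1 if p >= 0.5 else 0 for p in probs]   # same post-processing as for X_test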

Neural Network with California Housing Data

I tried to code a neural network trained on the California housing dataset, which I got from Aurélien Géron's GitHub.
But when I run the code, the net does not get trained and the loss is nan.
Can someone explain what I did wrong?
Best regards, Robin
Link for the csv file: https://github.com/ageron/handson-ml/tree/master/datasets/housing
My Code:
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
# load dataset
df = pd.read_csv("housing.csv", delimiter=",", header=0)
# split into input (X) and output (Y) variables
Y = df["median_house_value"].values
X = df.drop("median_house_value", axis=1)
# Inland / Not Inland -> True / False = 1 / 0
X["ocean_proximity"] = X["ocean_proximity"]== "INLAND"
X=X.values
X= X.astype(float)
Y= Y.astype(float)
model = Sequential()
model.add(Dense(100, activation="relu", input_dim=9))
model.add(Dense(1, activation="linear"))
# Compile model
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(X, Y, epochs=50, batch_size=1000, verbose=1)
I found the error: there was a missing value in the "total_bedrooms" column.
You need to drop the NaN values from your data.
After a quick look at the data, you also need to normalize it (as always with neural nets, to help convergence).
To do this you can use StandardScaler, MinMaxScaler, etc.
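For example, a sketch using scikit-learn's MinMaxScaler on the features built in the question (the target could be scaled the same way if desired):
from sklearn.preprocessing import MinMaxScaler
df = df.dropna()                                    # drop rows with missing values, e.g. in total_bedrooms
Y = df["median_house_value"].values.astype(float)
X = df.drop("median_house_value", axis=1)
X["ocean_proximity"] = X["ocean_proximity"] == "INLAND"
scaler = MinMaxScaler()
X = scaler.fit_transform(X.values.astype(float))    # scale every feature to [0, 1]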
NaN values in your DataFrame are causing this behavior. Drop the rows with NaN values and normalize your data:
df = df[~df.isnull().any(axis=1)]
df.iloc[:,:-1]=((df.iloc[:,:-1]-df.iloc[:,:-1].min())/(df.iloc[:,:-1].max()-df.iloc[:,:-1].min()))
And you will get:
Epoch 1/50
1000/20433 [>.............................] - ETA: 3s - loss: 0.1732
20433/20433 [==============================] - 0s 11us/step - loss: 0.1001
Epoch 2/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0527
20433/20433 [==============================] - 0s 3us/step - loss: 0.0430
Epoch 3/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0388
20433/20433 [==============================] - 0s 2us/step - loss: 0.0338
Epoch 4/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0301
20433/20433 [==============================] - 0s 2us/step - loss: 0.0288
Epoch 5/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0300
20433/20433 [==============================] - 0s 2us/step - loss: 0.0259
Epoch 6/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0235
20433/20433 [==============================] - 0s 3us/step - loss: 0.0238
Epoch 7/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0242
20433/20433 [==============================] - 0s 2us/step - loss: 0.0225
Epoch 8/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0213
20433/20433 [==============================] - 0s 2us/step - loss: 0.0218
Epoch 9/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0228
20433/20433 [==============================] - 0s 2us/step - loss: 0.0214
Epoch 10/50
1000/20433 [>.............................] - ETA: 0s - loss: 0.0206
20433/20433 [==============================] - 0s 2us/step - loss: 0.0211
