LSTM keras multiple features: what I did wrong? - python

The code below predicts Close value (stock prices) with 3 inputs: Close, Open and Volume. Dataset:
Close Open Volume
Date
2019-09-20 5489.0 5389.0 1578781
2019-09-23 5420.0 5460.0 622325
2019-09-24 5337.5 5424.0 688395
2019-09-25 5343.5 5326.5 628849
2019-09-26 5387.5 5345.0 619344
... ... ... ...
2020-03-30 4459.0 4355.0 1725236
2020-03-31 4715.0 4550.0 2433310
2020-04-01 4674.5 4596.0 1919728
2020-04-02 5050.0 4865.0 3860103
2020-04-03 5204.5 5050.0 3133078
[134 rows x 3 columns]
Info:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 134 entries, 2019-09-20 to 2020-04-03
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Close 134 non-null float64
1 Open 134 non-null float64
2 Volume 134 non-null int64
dtypes: float64(2), int64(1)
The question is how to correct script to get right prediction with 3 features last 10 days, because I get this:
Epoch 1/1
64/64 [==============================] - 6s 88ms/step - loss: 37135470.9219
[[32.588608]
[32.587284]
[32.586754]
[32.587196]
[32.58649 ]
[32.58663 ]
[32.586098]
[32.58682 ]
[32.586452]
[32.588108]]
rmse: 4625.457010985681
The problem remains even if I remove scaling (fit_transform) at all. In other topics I was told there is no need to scale y_train.
Full script code:
from math import sqrt
from numpy import concatenate
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding
from keras.layers import LSTM
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf
start = (datetime.now() - timedelta(days=200)).strftime("%Y-%m-%d")
end = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
df = yf.download(tickers="LKOH.ME", start=start, end=end, interval="1d")
dataset = df.loc[start:end].filter(['Close', 'Open', 'Volume']).values
scaler = MinMaxScaler(feature_range=(0,1))
training_data_len = len(dataset) - 10 # last 10 days to test
train_data = dataset[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
x_train.append(train_data[i-60:i, :]) # get all 3 features
y_train.append(train_data[i, 0]) # 0 means we predict Close
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1]*x_train.shape[2])) # convert to 2d for fit_transform()
x_train = scaler.fit_transform(x_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
model = Sequential()
# Do I need to change it to input_shape=(x_train.shape[1], 3), because of 3 features?
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50))
model.add(Dense(25))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, batch_size=1, epochs=1)
test_data = dataset[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, 0]
for i in range(60, len(test_data)):
x_test.append(test_data[i-60:i, :])
x_test = np.array(x_test)
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1]*x_test.shape[2]))
x_test = scaler.fit_transform(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predictions = model.predict(x_test)
print(predictions)
print('rmse:', np.sqrt(np.mean(((predictions - y_test) ** 2))))

Even though the posted answer is technically correct and provides useful references, I found it a bit annoying that the results of fitting do not make much sense (you can notice that predictions are constant, even though the y_test isn't). Yes, scaling fixes the loss - with the values in the order of 1000 the L2 measure makes any gradient-based algorithm very unstable and Rishab's answer addresses that. Here is my code snippet. With the following changes in addition to the scaling:
Use more data. I randomly chose 10000 days, but if there is more, you probably will get better results. 200 points are not sufficient to get any convergence better than the straight line.
With more points use a larger batch as otherwise, it'll take a while to fit
With the larger batch, use more epochs (although in this case, more than 3 do not produce any better convergence)
Lastly, do not just look at the RMSE, plot your data. Small RMSE doesn't necessarily mean there is any meaningful fit.
With the snippet below I've got a somewhat good fit on the train data. And foreseeing the questions: yes, I totally know that I'm overfitting the data, but that is what the convergence should be doing at the very least here as fitting a straight line is much less meaningful for this of problem. This, at least, pretends to predict something.
from math import sqrt
from numpy import concatenate
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding
from keras.layers import LSTM
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf
from matplotlib import pyplot as plt
start = (datetime.now() - timedelta(days=10000)).strftime("%Y-%m-%d")
end = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
df = yf.download(tickers="LKOH.ME", start=start, end=end, interval="1d")
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(df.loc[start:end].filter(['Close', 'Open', 'Volume']).values)
training_data_len = len(dataset) - 10 # last 10 days to test
train_data = dataset[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
x_train.append(train_data[i-60:i, :]) # get all 3 features
y_train.append(train_data[i, 0]) # 0 means we predict Close
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1]*x_train.shape[2])) # convert to 2d for fit_transform()
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
model = Sequential()
# Do I need to change it to input_shape=(x_train.shape[1], 3), because of 3 features?
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50))
model.add(Dense(25))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, batch_size=100, epochs=3)
test_data = dataset[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, 0]
for i in range(60, len(test_data)):
x_test.append(test_data[i-60:i, :])
x_test = np.array(x_test)
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1]*x_test.shape[2]))
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predictions = model.predict(x_test)
print(predictions)
print('rmse:', np.sqrt(np.mean(((predictions - y_test) ** 2))))
Here is the output:
>>> print(predictions)
[[0.64643383]
[0.63276255]
[0.6288108 ]
[0.6320714 ]
[0.6572328 ]
[0.6998471 ]
[0.7333 ]
[0.7492812 ]
[0.7503019 ]
[0.75124526]]
>>> print('rmse:', np.sqrt(np.mean(((predictions - y_test) ** 2))))
rmse: 0.0712241892828221
To plot use, for training data fit
plt.plot(model.predict(x_train))
plt.plot(y_train)
plt.show()
and for test predictions
plt.plot(model.predict(x_test))
plt.plot(y_test)
plt.show()

As #rvinas has already mentioned, we need to scale the values and then use
inverse_transform to get the desired predicted outcome. You can find the reference here.
After making some small changes in the code and I was able to come up with satisfactory results. We can play around with the data scaling methodologies and model architectures to improve the results.
After some enhancements
from math import sqrt
from numpy import concatenate
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import SGD
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf
start = (datetime.now() - timedelta(days=200)).strftime("%Y-%m-%d")
end = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
df = yf.download(tickers="LKOH.ME", start=start, end=end, interval="1d")
dataset = df.loc[start:end].filter(['Close', 'Open', 'Volume']).values
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(dataset)
training_data_len = len(dataset) - 10 # last 10 days to test
train_data = dataset[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
x_train.append(train_data[i-60:i, :]) # get all 3 features
y_train.append(train_data[i, 0]) # 0 means we predict Close
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1]*x_train.shape[2])) # convert to 2d for fit_transform()
x_train_scale = scaler.fit(x_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
model = Sequential()
# Do I need to change it to input_shape=(x_train.shape[1], 3), because of 3 features?
# yes, i did that.
model.add(LSTM(units=50,return_sequences=True, kernel_initializer='random_uniform', input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50,return_sequences=True, kernel_initializer='random_uniform'))
model.add(Dropout(0.2))
model.add(LSTM(units=50,return_sequences=True, kernel_initializer='random_uniform'))
model.add(Dropout(0.2))
model.add(LSTM(units=50, kernel_initializer='random_uniform'))
model.add(Dropout(0.2))
model.add(Dense(units=25, activation='relu'))
model.add(Dense(units=1))
# compile model
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
model.fit(x_train, y_train, batch_size=5, epochs=2)
test_data = dataset[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, 0]
for i in range(60, len(test_data)):
x_test.append(test_data[i-60:i, :])
x_test = np.array(x_test)
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1]*x_test.shape[2]))
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predictions = model.predict(x_test)
# predictions = y_train_scale.inverse_transform(predictions)
print(predictions)
print('rmse:', np.sqrt(np.mean(((predictions - y_test) ** 2))))
Predictions 1:
start = (datetime.now() - timedelta(days=200)).strftime("%Y-%m-%d")
opt = SGD(lr=0.01, momentum=0.9, clipnorm=1.0, clipvalue=0.5)
model.compile(loss='mean_squared_error', optimizer=opt)
[[0.6151125 ]
[0.6151124 ]
[0.6151121 ]
[0.6151119 ]
[0.61511195]
[0.61511236]
[0.61511326]
[0.615114 ]
[0.61511385]
[0.6151132 ]]
rmse: 0.24450220836260966
Prediction 2:
start = (datetime.now() - timedelta(days=1000)).strftime("%Y-%m-%d")
model.compile(optimizer='adam', loss='mean_squared_error')
[[0.647125 ]
[0.6458076 ]
[0.6405072 ]
[0.63450944]
[0.6315386 ]
[0.6384401 ]
[0.65666 ]
[0.68073314]
[0.703547 ]
[0.72095114]]
rmse: 0.1236932687978488
Stock market prices are highly unpredictable and volatile. This means that there are no consistent patterns in the data that allow you to model stock prices over time near-perfectly. So this needs a lot of R&D to come up with a good strategy.
Things that you can do:
Add more training data to your model, so that it is able to generalize it better.
Make the model deeper. Play around with the model Hyperparameters to squeeze out the performance of the model. You can find a good reference here about hyperparameter tuning.
You can find more information about various other data preprocessing techniques and model architecture from reference links below:
Stock Market Predictions with LSTM in Python
Machine Learning to Predict Stock Prices

Related

Getting a probability distribution curve from a TensorFlow model

I am trying to learn working with TensorFlow and so I was trying to make a probablistic ML model to get the probability distribution of the next day stock price based on the last n days price sequence, and when doing so I managed to predict the next day's price but not getting a probability distribution of the model. How do I get the curve which the model predictions are based on from the TensorFlow model?
This is the code i have got so far that predicts the actual price for the next day (using this video: https://www.youtube.com/watch?v=PuZY9q-aKLw):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from datetime import timedelta
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfpl = tfp.layers
tfk = tf.keras
tfkl = tf.keras.layers
# preparing data
min_max_scaler = MinMaxScaler(feature_range=(0,1)) # a scaler object which normalizes the number between 0 and 1 in relation to the rest of the dataset
close_prices = price_data.loc[:,'Close'].values # all the close prices of the price_data df
scaled_close_prices = min_max_scaler.fit_transform(close_prices.reshape(-1,1)) # all the close prices, normalized between 1 and 0
n = 60 # the number of days that are used to determine the next value
# x contains the last n days before the predicted day which is y
X_train = [] # 70% of the dataset
y_train = []
X_test = [] # 30% of the dataset
y_test = []
for x in range(n, int(len(scaled_close_prices)*0.7)):
X_train.append(scaled_close_prices[x-n:x,0])
y_train.append(scaled_close_prices[x,0])
for x in range(int(len(scaled_close_prices)*0.7), len(scaled_close_prices)):
X_test.append(scaled_close_prices[x-n:x,0])
y_test.append(scaled_close_prices[x,0])
X_train, y_train = np.array(X_train), np.array(y_train)
X_test, y_test = np.array(X_test), np.array(y_test)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# building the model
model = tfk.Sequential()
model.add(tfkl.LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
model.add(tfkl.Dropout(0.2))
model.add(tfkl.LSTM(units=50, return_sequences=True))
model.add(tfkl.Dropout(0.2))
model.add(tfkl.LSTM(units=50))
model.add(tfkl.Dropout(0.2))
model.add(tfkl.Dense(units=1))
# compiling model
model.compile(optimizer=tfk.optimizers.Adam(learning_rate=0.05),
loss='mean_squared_error',
metrics=[])
# fitting model
model.fit(X_train, y_train, epochs=25, batch_size=32)
# testing model
y_predicted = model.predict(X_test)
y_predicted = min_max_scaler.inverse_transform(y_predicted).reshape(-1)
Probabilistic modelling one of the things that tensorflow_probability (tfp) can do.
It can do probabilistic time series forecasting - e.g. see https://www.tensorflow.org/probability/examples/Structural_Time_Series_Modeling_Case_Studies_Atmospheric_CO2_and_Electricity_Demand
... but I don't believe it includes a variational LSTM in its toolbox, and they may be close to the current research frontier. e.g. see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7842932/
As written, this model is not especially probabilistic, per se, but it can be viewed as outputting an estimate of the mean of a gaussian distribution with unit variance; this is implicit in using a squared error loss. To make this more concrete, you could instantiate a tfd.Normal distribution with the output of your model as the loc parameter and scale=1., and call prob(...) to evaluate the pdf of this distribution. I'm not sure if this is what you're looking for, though.

LSTM/GRU TImeSeries multioutput strategy forecasts give dropped values

Currently, I'm playing with Stocks Predictions task which I try to solve using LSTM/GRU.
Problem: After training LSTM/GRU I get huge drop predicted values
Model training process
Train, test data is simply generated using pd.shift in series_to_supervised function.
df['Mid'] = df['Low'] + df['High'] / 2
n_lag = 1 # Lag columns back
n_seq = 1*50 # TimeSteps to predict
seq_col = 'Mid'
seq_col_t = f'{seq_col}(t)'
split_date = '2018-01-01'
def series_to_supervised(data: pd.DataFrame,
seq_col: str,
n_in: int = 1,
n_out: int = 1,
drop_seq_col: bool = True,
dropna: bool = True):
"""Convert time series into supervised learning problem
{input sequence, forecast sequence}
"""
# input sequence (t-n, ... t-1) -> pisitive shift
for i in range(n_in, 0, -1):
data[f'{seq_col}(t-{i})'] = data[seq_col].shift(i)
# no sequence -> no shift
data[f'{seq_col}(t)'] = data[seq_col]
for i in range(1, n_out+1):
# forecast sequence (t, t+1, ... t+n) -> negative shift
data[f'{seq_col}(t+{i})'] = data[seq_col].shift(-i)
if drop_seq_col:
data = data.drop(seq_col, axis=1)
if dropna:
data.dropna(inplace=True)
return data
df = series_to_supervised(df, seq_col=seq_col, n_in=n_lag, n_out=n_seq)
mask = df.index < split_date
train, test = df[mask], df[~mask]
X_cols = ['Mid(t-1)']
y_cols = train.filter(like='Mid(t+').columns
X_train, y_train, X_test, y_test = train[X_cols], train[y_cols], test[X_cols], test[y_cols]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
# also returns np.ndarray
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)
y_train = y_train.values
y_test = y_test.values
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, GRU
from keras.optimizers import Adam, RMSprop, Adamax
from keras.callbacks import ModelCheckpoint
def get_model(X, y, n_batch):
num_classes=y.shape[1]
# design network
model = Sequential()
# For Stock Predictions has to be used LSTM stateful=True
model.add(GRU(10, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dropout(0.3))
model.add(Dense(num_classes))
opt = Adam(learning_rate=0.01)
# opt = RMSprop(learning_rate=0.001)
model.compile(loss='mean_squared_error', optimizer=opt)
return model
def reshape_batch(X_train, y_train, X_test, y_test, n_batch):
# reshape training into [samples, timesteps, features]
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
# cut to equally divided n_batches (without reminder).
# needed for LSTM stateful=True
train_cut = X_train.shape[0] % n_batch
test_cut = X_test.shape[0] % n_batch
if train_cut > 0:
X_train = X_train[:-train_cut]
y_train = y_train[:-train_cut]
if test_cut > 0:
X_test = X_test[:-test_cut]
y_test = y_test[:-test_cut]
return X_train, y_train, X_test, y_test
# fit an LSTM network to training data
def fit_lstm(X_train: np.ndarray,
y_train: np.ndarray,
n_lag: int,
n_seq: int,
n_batch: int,
nb_epoch: int,
X_test: np.ndarray=None,
y_test: np.ndarray=None):
model = get_model(X_train, y_train, n_batch)
# fit network
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), callbacks=None,
epochs=nb_epoch, batch_size=n_batch, verbose=1, shuffle=False)
print('Predict:', model.predict(X_test, batch_size=n_batch))
model.reset_states()
return model, history
n_batch = 32
nb_epoch = 40
X_train, y_train, X_test, y_test = reshape_batch(X_train, y_train, X_test, y_test, n_batch)
model, history = fit_lstm(X_train, y_train, n_lag, n_seq, n_batch, nb_epoch, X_test=X_test, y_test=y_test)
What I Have tried
Different optimizers (kinda all available in keras)
DIfferent recurrent network structures (GRU/LSTM)
Different learning rates
Different epochs from 1 to 1500
Adding/Removing Drop layers with different params (0.1-0.7)
Different LSTM/GRU amount of neurons (1-100)
Number of LSTM/GRU layers, via return_sequences params with more Drop layers.
Different number of forecasts(t+1,t+2 ... t+n) features from 1-365
Different number of lag (t-1, t-2, t-n ...) features from 1-5
Different scale normalization borders (0,1) and (-1,1)
Different n_batch values: 1,8,16,32
What can affect LSTM/GRU give so strange behaviour? And What else should I try to make it work the normal way?

How to predict multiple features using keras with time series?

I have a problem I don't know how to fix transform to add new features in order to make more proper forecast. The code below predicts stock prices by Close value. Data:
Open High Low Close Adj Close Volume
Datetime
2020-03-10 09:30:00+03:00 5033.0 5033.0 4690.0 4840.0 4840.0 702508
2020-03-10 10:30:00+03:00 4840.0 4870.0 4700.0 4746.5 4746.5 1300648
2020-03-10 11:30:00+03:00 4746.5 4783.0 4706.0 4745.5 4745.5 1156482
2020-03-10 12:30:00+03:00 4745.5 4884.0 4730.0 4870.0 4870.0 1213268
2020-03-10 13:30:00+03:00 4874.0 4990.5 4867.5 4886.5 4886.5 1958028
... ... ... ... ... ... ...
2020-04-03 14:30:00+03:00 5177.0 5217.0 5164.0 5211.5 5211.5 385696
2020-04-03 15:30:00+03:00 5212.0 5364.0 5191.0 5269.5 5269.5 1091066
2020-04-03 16:30:00+03:00 5270.0 5297.0 5209.0 5220.5 5220.5 518686
2020-04-03 17:30:00+03:00 5222.0 5271.0 5184.0 5220.5 5220.5 665096
2020-04-03 18:30:00+03:00 5217.5 5223.5 5197.0 5204.5 5204.5 261400
I want to add Volume and Open features, but getting error:
predictions = scaler.inverse_transform(predictions)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 436, in inverse_transform
X -= self.min_
ValueError: non-broadcastable output operand with shape (40,1) doesn't match the broadcast shape (40,3)
Q1: How to change inverse_transform and what else do I need to change (input_shape argument maybe) to get correct results?
Q2: The result will be prediction of Close value. But how do I predict Volume value also? I guess I need to set model.add(Dense(2)), but can I do 2 predictions correctly in one code, or I need to execute script separately? How to do that? How do I get Volume than Open when model.add(Dense(2))?
Full code:
from math import sqrt
from numpy import concatenate
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding
from keras.layers import LSTM
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf
start = (datetime.now() - timedelta(days=30))
end = (datetime.now() - timedelta(days=0))
df = yf.download(tickers="LKOH.ME", start=start.strftime("%Y-%m-%d"), end=end.strftime("%Y-%m-%d"), interval="60m")
df = df.loc[start.strftime("%Y-%m-%d"):end.strftime("%Y-%m-%d")]
# I need here add another features
# df.filter(['Close', 'Open', 'Volume']) <-- this will make further an error with shapes
data = df.filter(['Close'])
dataset = data.values
#Get the number of rows to train the model on, 40 rows for test
training_data_len = len(dataset) - 40
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
x_train.append(train_data[i-60:i, 0])
y_train.append(train_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
model = Sequential()
# should i change to input_shape=(x_train.shape[1], 3) ?
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, batch_size=1, epochs=1)
test_data = scaled_data[training_data_len - 60: , :]
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
x_test.append(test_data[i-60:i, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions) # error here
The problem is that you are fitting MinMaxScaler on dataset, then splitting dataset into x_train and y_train and then later on trying to use the inverse_transform method on the predictions, which have the same shape as y_train. I suggest you create x_train and y_train and fit MinMaxScaler only to x_train. y_train doesn't need to be scaled for the model and that will save you needing to inverse_transform the predictions completely.
So instead of
#Get the number of rows to train the model on, 40 rows for test
training_data_len = len(dataset) - 40
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
x_train.append(train_data[i-60:i, 0])
y_train.append(train_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
Use
#Get the number of rows to train the model on, 40 rows for test
training_data_len = len(dataset) - 40
train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
x_train.append(train_data[i-60:i, 0])
y_train.append(train_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
scaler = MinMaxScaler(feature_range=(0,1))
x_train = scaler.fit_transform(x_train) # Only scaling x_train
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
and just delete the line predictions = scaler.inverse_transform(predictions).
Updates relating to additional questions in the comments
The definition of y_test is inconsistent with y_train. Specifically, y_test is defined as y_test = dataset[training_data_len:, :] which is using all of the columns of dataset. Instead, to be consistent with y_train, it should be dataset[training_data_len:, 0].
Handling splitting the data is often clearer and less error-prone if done in pandas:
# Starting with the dataframe 'data'
data = df.filter(['Close', 'Open', 'Volume'])
# Create x/y test/train directly from 'data'
training_data_len = len(data) - 40
x_train = data[['Open', 'Volume']][:training_data_len]
y_train = data.Close[:training_data_len]
x_test = data[['Open', 'Volume']][training_data_len:]
y_test = data.Close[training_data_len:]
# Then confirm you have the expected subsets by checking things like
# shape (and info(), describe(), etc.)
x_train.shape, x_test.shape
> ((160, 2), (40, 2))
y_train.shape, y_test.shape
> ((160,), (40,))

Spiral problem, why does my loss increase in this neural network using Keras?

I'm trying to solve the spiral problem using Keras with 3 spirals instead of 2 using a similar strategy that I used for 2. Problem is my loss is now growing exponentially instead of decreasing with the same parameters I used for 2 spirals (The neural network structure has 3 outputs instead of being binary). I'm not quite sure what could be happening with this issue if anyone could help? I have tried this with various epochs, learning rates, batch sizes.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.optimizers import RMSprop
from Question1.utils import create_neural_network, create_test_data
EPOCHS = 250
BATCH_SIZE = 20
def main():
df = three_spirals(1000)
# Set-up data
x_train = df[['x-coord', 'y-coord']].values
y_train = df['class'].values
# Don't need y_test, can inspect visually if it worked or not
x_test = create_test_data()
# Scale data
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
relu_model = create_neural_network(layers=3,
neurons=[40, 40, 40],
activation='relu',
optimizer=RMSprop(learning_rate=0.001),
loss='categorical_crossentropy',
outputs=3)
# Train networks
relu_model.fit(x=x_train, y=y_train, epochs=EPOCHS, verbose=1, batch_size=BATCH_SIZE)
# Predictions on test data
relu_predictions = relu_model.predict_classes(x_test)
models = [relu_model]
test_predictions = [relu_predictions]
# Plot
plot_data(models, test_predictions)
And here is the create_neural_network function:
def create_neural_network(layers, neurons, activation, optimizer, loss, outputs=1):
if layers != len(neurons):
raise ValueError("Number of layers doesn't much the amount of neuron layers.")
model = Sequential()
for i in range(layers):
model.add(Dense(neurons[i], activation=activation))
# Output
if outputs == 1:
model.add(Dense(outputs))
else:
model.add(Dense(outputs, activation='softmax'))
model.compile(optimizer=optimizer,
loss=loss)
return model
I have worked it out, for the output data it isn't like a binary classification where you only need one column. For multi classification you need a column for each class you want to classify...so where I had y could be 0, 1, 2 was incorrect. The correct way to do this was to have y0, y1, y2 which would be 1 if it fit that specific class and 0 if it didn't.

Gender Voice with Python coding

I am working on a sample data set from a link below.
https://www.kaggle.com/enirtium/gender-voice/data
I am trying to open .csv file(maybe I am opening it wrongly) and trying to create fully connected neural layers. Then, I am trying to train them but unfortunately, I am getting input shape not fitting problem.
"ValueError: Error when checking input: expected dense_1_input to have shape (None, 2800) but got array with shape (3168, 1)"
My codes like these:
import csv
import numpy
import string
from keras.models import Sequential
from sklearn.model_selection import train_test_split
import numpy as np
from keras import models
from keras import layers
path = r'/Users/username/Desktop/voice.csv'
meanfreq = []
sd = []
median = []
label = []
with open(path, 'r') as csv_file:
csv_reader = csv.reader(csv_file)
next(csv_reader)
for line in csv_reader:
#print(line['meanfreq'])
meanfreq.append(line[0])
sd.append(line[1])
median.append(line[2])
if line[20] == "female":
label.append(1)
else:
label.append(0)
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(2800,)))
network.add(layers.Dense(1, activation='sigmoid'))
network.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
network.fit(meanfreq, label, epochs=5, batch_size=128)
scores = network.evaluate(meanfreq, label)
print("\n%s: %.2f%%" % (network.metrics_names[1], scores[1]*100))
I suppose that maybe, I can't open .csv file (it is opening "list" primitive) or there are any other problems. I am unfortunately fresh man at neural networks and python. I will open this csv file and will use its %70 data to train, %30 data for testing.
Yes,
It works as these;
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
# get data ready
data = pd.read_csv('voice.csv')
data.shape
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
X, y_, train_size=0.80, random_state=42)
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
network.add(layers.Dense(2, activation='sigmoid'))
network.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
network.fit(X_train, y_train, epochs=100, batch_size=128)
network.evaluate(X_test, y_test)
Reading in the data seems to be fine.
I Imagine you have a data set that looks like:
mean_freq, label
.12 0
.45 1
And you want to train a classifier. Currently the model is expecting
a training example to have 2800 features. input shape=(2800,) but you only want 1 feature: the mean_freq
The mistake here is that you are trying to tell Keras how much training examples to use while declaring the model. You don't do that here, you'll do that later when you're fitting the model.
So the input_shape to keras's Dense Layer should be (1, ) for the single feature. If you're going to use mean and median freq then you would want two features (2, ) and so on.
# note change from 2800 to 1
network.add(layers.Dense(512, activation='relu', input_shape=(1,)))
And you can split your training and test sets in multiple ways. My suggestion is to do something like this:
train_size = 2800
X_train = mean_freq[:train_size]
y_train = label[:train_size]
X_test = mean_freq[train_size:]
y_test = label[:train_size]
Then fit the model with the training set and score with the test set.
network.fit(X_train, y_train, epochs=5, batch_size=128)
scores = network.evaluate(X_test, y_test)
Edit to reflect comments:
well if the case is that you training data has 20 features then
you tell keras that with:
# note change from 2800 to 1
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
You have do the work necessary to get the data in to the shape you need for training and testing but the template above is how you would fit and evaluate the model.
I would also note that there are better ways read in csv data if you're going to do modeling (as you are). Look at using a pandas dataframe.
Also better (more standard ways) of creating train and test split: look into sklearn's train_test_split
Edit 2: A quick model of the voice data
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras.model import Model
from keras.layers import Dense, Input
# get data ready
data = pd.read_csv('voice.csv')
data.shape
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
X, y_, train_size=0.80, random_state=42)
# model using keras functional style
inp = Input(shape =(20, ))
dense = Dense(128)(inp)
out = Dense(2, activation='sigmoid')(dense)
model = Model(inputs=[inp], outputs=[out])
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=128)
model.evaluate(X_test, y_test)

Categories