How to predict next X values in my model? - python

I have a program that predicts the outflow of a reservoir, but I cannot get it to predict the next day's worth of data. The data is in 30-minute increments, so I would expect 48 data points to represent the next 24 hours. Is there something I am doing incorrectly or have missed?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
# Prepare data
data = pd.read_csv('data.csv')
data["Date"] = pd.to_datetime(data["Date"])
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data["Outlet"].values.reshape(-1, 1))
prediction_days = 30
x_train, y_train = [], []
for x in range(prediction_days, len(scaled_data)):
    x_train.append(scaled_data[x-prediction_days:x, 0])
    y_train.append(scaled_data[x, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
# Create neural network
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, epochs=25)
# Test the model
test_start = dt.datetime(2022, 1, 1)
test_end = dt.datetime.now()
actual_outlet = data[(data['Date'] >= test_start) & (data['Date'] <= test_end)]['Outlet']
total_dataset = pd.concat((data['Outlet'], actual_outlet), axis=0)
model_inputs = total_dataset[len(total_dataset) - len(actual_outlet) - prediction_days:].values
model_inputs = model_inputs.reshape(-1, 1)
model_inputs = scaler.transform(model_inputs)
x_test = []
for x in range(prediction_days, len(model_inputs)):
    x_test.append(model_inputs[x-prediction_days:x, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
prediction_outlet = model.predict(x_test)
prediction_outlet = scaler.inverse_transform(prediction_outlet)
print(len(prediction_outlet))
The length of my prediction_outlet comes to 1056. Have I missed some steps?

Your test data covers 22 days, since you are taking the slice from test_start to today. With 48 data points per day, that adds up to 1056 data points to predict, so you are getting the right number of predictions for that slice. On the other hand, since the data is not ordered, some of the dates are mixed up. In one instance you have half of the data points from Feb-02 and the rest from Jan-13, which will be very confusing for the network. You might want to sort the rows by date and make sure your slices do not span missing dates.
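Getting 48 new points beyond the end of the data is a different task from scoring the test slice: each new point has to be fed back in as input for the next one. The following is a minimal sketch of that recursive loop, not the asker's code. `predict_one` is a hypothetical stand-in for a wrapper around `model.predict`, and the toy series and `mean_model` are made up so the example runs without TensorFlow. It also checks the window count from the answer above.

```python
import numpy as np

def forecast_recursive(predict_one, history, window, steps):
    """Roll a one-step predictor forward `steps` times.

    predict_one: callable taking a 1-D array of length `window` and
                 returning the next value (hypothetical stand-in for
                 a wrapper around model.predict).
    history:     1-D array of scaled observations, oldest first.
    """
    buf = list(np.asarray(history, dtype=float)[-window:])
    out = []
    for _ in range(steps):
        nxt = float(predict_one(np.array(buf[-window:])))
        out.append(nxt)
        buf.append(nxt)  # the prediction becomes input for the next step
    return np.array(out)

# Toy stand-in model: predicts the mean of the window.
def mean_model(w):
    return w.mean()

# Window arithmetic from the answer: 22 days * 48 points/day of test
# data plus 30 lead-in points gives one sliding window per test point.
window = 30
n_inputs = 22 * 48 + window
n_windows = len(range(window, n_inputs))

series = np.linspace(0.0, 1.0, 200)  # made-up scaled data
next_day = forecast_recursive(mean_model, series, window, steps=48)
print(n_windows, next_day.shape)
```

With the real model, `predict_one` would presumably reshape the window to (1, window, 1), call `model.predict`, and return the single value; the 48 outputs would then go through `scaler.inverse_transform`. Errors tend to compound with each fed-back step, so later points are less reliable than earlier ones.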

Related

Is there a machine learning model in Python for prediction of the next set of "random" numbers in a sequence?

I have a dataset with three columns, each has a number 0-9. There are ~6700 rows in the set. I am wondering if it is possible to create a machine learning model to predict the next set of three numbers (one in each column)? If this is possible, how difficult is it and what type of a model would you recommend?
I am new to machine learning, so any help would be great!
Thank you!
I have tried different kinds of LSTMs with TensorFlow, but the results appeared to be averages rather than predictions.
Here is one of the attempts I made:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Numbers1.xlsx', sheet_name='midday training')
dataset_train = df.iloc[:, 1:]
dataset_train = dataset_train.iloc[::-1]
time_series_data = np.array(df)
training_set = time_series_data[:, 1:4]
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)
x_train = []
y_train = []
for i in range(1, 6003):
    x_train.append(training_set_scaled[i-1:i, :3])
    y_train.append(training_set_scaled[i, :3])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (6002,1,3))
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
regressor = Sequential()
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (x_train.shape[1], 3)))
regressor.add(Dropout(.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(.2))
regressor.add(LSTM(units = 50))
regressor.add(Dropout(.2))
regressor.add(Dense(units = 3))
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(x_train, y_train, epochs = 20, batch_size = 32)
dataset_test = pd.read_excel('Numbers1.xlsx', sheet_name='midday')
dataset_test = dataset_test.iloc[:, 1:]
dataset_test = dataset_test.iloc[::-1]
real = dataset_test
dataset_total = pd.concat((dataset_train, dataset_test), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 1:].values
inputs = sc.transform(inputs)
x_test = []
for i in range(1, 6734):
    x_test.append(inputs[i-1:i, :3])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (6733,1,3))
predicted = regressor.predict(x_test)
predicted = sc.inverse_transform(predicted)
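The "averages" described in the question are what a mean-squared-error loss rewards when the targets cannot be predicted from the inputs. The small sketch below assumes the digits behave like independent uniform draws from 0-9, which is an assumption about the data and not something the question confirms. Under that assumption, the constant guess with the lowest MSE sits at the mean, about 4.5.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in for one column of the dataset: independent digits 0-9.
digits = rng.integers(0, 10, size=6700)

def mse(guess, targets):
    """Mean squared error of predicting `guess` for every target."""
    return float(np.mean((targets - guess) ** 2))

# Try every constant guess from 0 to 9 in steps of 0.5.
candidates = np.arange(0, 9.5, 0.5)
errors = [mse(c, digits) for c in candidates]
best = float(candidates[int(np.argmin(errors))])

print("sample mean:", digits.mean())
print("best constant guess:", best)
```

If the numbers really are random, a network trained with this loss has no better strategy than drifting toward that mean, which would match the behaviour described.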

Using Quandl with Keras and Pandas

I have finished following a tutorial on how to build an RNN LSTM model for stock prediction, and I want to repurpose it to use Quandl. I am really unfamiliar with Python and this is my first project with it; I went straight into the deep end. I tried a few methods to get it to work, but my inexperience with pandas is the main issue. I feel like there could be a better way to do this. My main proficiency is with Java. Most of this is just filler so I can post this question.
#Imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = 'https://raw.githubusercontent.com/mwitiderrick/stockprice/master/NSE-TATAGLOBAL.csv'
dataset_train = pd.read_csv(url)
training_set = dataset_train.iloc[:, 1:2].values
dataset_train.head()
#Data Normalization
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0,1))
training_set_scaled = sc.fit_transform(training_set)
#Incorporating Timesteps Into Data
X_train = []
y_train = []
for i in range(60, 2035):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
#Creating the LSTM Model
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Dense
model = Sequential()
model.add(LSTM(units=50,return_sequences=True,input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50,return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50,return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))
model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(X_train,y_train,epochs=100,batch_size=32)
#Making Predictions on the Test Set
url = 'https://raw.githubusercontent.com/mwitiderrick/stockprice/master/tatatest.csv'
dataset_test = pd.read_csv(url)
real_stock_price = dataset_test.iloc[:, 1:2].values
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
X_test = []
for i in range(60, 76):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = model.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
#Plotting the Results
plt.plot(real_stock_price, color = 'black', label = 'TATA Stock Price')
plt.plot(predicted_stock_price, color = 'green', label = 'Predicted TATA Stock Price')
plt.title('TATA Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('TATA Stock Price')
plt.legend()
plt.show()

Keras LSTM price prediction: how to use more attributes

I'm a beginner in AI and am working on a stock prediction project. I've built a multilayer LSTM model that uses the Close price to predict the next day's Close price. My question is how to use more attributes to predict the Close price, for example using High, Low, Open, Close, and Volume at the same time.
My code:
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
import datetime
plt.style.use('fivethirtyeight')
apple = web.DataReader('AAPL', 'yahoo', start='2012-01-01', end='2021-04-26')
apple
plt.figure(figsize=(16, 8))
plt.title('Close Price History')
plt.plot(apple.Close)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price', fontsize=18)
plt.show()
# Create a new dataframe with only Close
data = apple.filter(['Close'])
dataset = data.values
training_data_len = math.ceil(len(dataset) * 0.8)
training_data_len
scaler = MinMaxScaler(feature_range=(0, 1))
scaler_data = scaler.fit_transform(dataset)
train_data = scaler_data[0:training_data_len, :]
train_data
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i <= 60:
        print(x_train)
        print(y_train)
        print()
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
from tensorflow import keras
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(10))
model.add(Dense(1))
keras.utils.plot_model(model, show_shapes=True)
model.compile(optimizer='adam',
              loss='mean_squared_error')
model.fit(x_train, y_train, epochs=15, batch_size=1)
Given my limited experience, any help or explanation is welcome! Thank you.
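One common way to use several attributes is to keep all feature columns in each window and predict only Close. The sketch below assumes the five columns are ordered High, Low, Open, Close, Volume and uses random numbers in place of the scaled prices. `make_windows` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def make_windows(scaled, lookback, target_col):
    """Build LSTM inputs from a 2-D array of shape (rows, features).

    Returns X with shape (samples, lookback, features) and y with
    shape (samples,), where y is the target column one step after
    each window.
    """
    X, y = [], []
    for i in range(lookback, len(scaled)):
        X.append(scaled[i - lookback:i, :])  # all features in the window
        y.append(scaled[i, target_col])      # next value of the target only
    return np.array(X), np.array(y)

# Random stand-in for scaler.fit_transform(apple[cols].values).
cols = ["High", "Low", "Open", "Close", "Volume"]
scaled = np.random.default_rng(1).random((500, len(cols)))

X, y = make_windows(scaled, lookback=60, target_col=cols.index("Close"))
print(X.shape, y.shape)
```

The first LSTM layer would then take input_shape=(60, 5) instead of (60, 1). Because a scaler fitted on five columns expects five columns back, turning predictions into prices needs either a separate scaler fitted on Close alone or manual use of that column's minimum and maximum.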

I am trying to use CNN for stock price prediction but my code does not seem to work, what do I need to change or add?

import math
import numpy as np
import pandas as pd
import pandas_datareader as pdd
from sklearn.preprocessing import MinMaxScaler
from keras.layers import Dense, Dropout, Activation, LSTM, Convolution1D, MaxPooling1D, Flatten
from keras.models import Sequential
import matplotlib.pyplot as plt
df = pdd.DataReader('AAPL', data_source='yahoo', start='2012-01-01', end='2020-12-31')
data = df.filter(['Close'])
dataset = data.values
len(dataset)
# 2265
training_data_size = math.ceil(len(dataset)*0.7)
training_data_size
# 1586
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
# array([[0.04288701],
# [0.03870297],
# [0.03786614],
# ...,
# [0.96610873],
# [0.98608785],
# [1. ]])
train_data = scaled_data[0:training_data_size,:]
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i,0])
    if i<=60:
        print(x_train)
        print(y_train)
'''
[array([0.04288701, 0.03870297, 0.03786614, 0.0319038 , 0.0329498 ,
0.03577404, 0.03504182, 0.03608791, 0.03640171, 0.03493728,
0.03661088, 0.03566949, 0.03650625, 0.03368202, 0.03368202,
0.03598329, 0.04100416, 0.03953973, 0.04110879, 0.04320089,
0.04089962, 0.03985353, 0.04037657, 0.03566949, 0.03640171,
0.03619246, 0.03253139, 0.0294979 , 0.03033474, 0.02960253,
0.03002095, 0.03284518, 0.03357739, 0.03410044, 0.03368202,
0.03472803, 0.02803347, 0.02792885, 0.03556487, 0.03451886,
0.0319038 , 0.03127613, 0.03274063, 0.02688284, 0.02635988,
0.03211297, 0.03096233, 0.03472803, 0.03713392, 0.03451886,
0.03441423, 0.03493728, 0.03587866, 0.0332636 , 0.03117158,
0.02803347, 0.02897494, 0.03546024, 0.03786614, 0.0401674 ])]
[0.03933056376752886]
'''
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train.shape
# (1526, 60, 1)
model = Sequential()
model.add(Convolution1D(64, 3, input_shape= (100,4), padding='same'))
model.add(MaxPooling1D(pool_size=2))
model.add(Convolution1D(32, 3, padding='same'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('linear'))
model.summary()
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=50, epochs=50, validation_data = (X_test, y_test), verbose=2)
test_data = scaled_data[training_data_size-60: , :]
x_test = []
y_test = dataset[training_data_size: , :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
rsme = np.sqrt(np.mean((predictions - y_test)**2))
rsme
train = data[:training_data_size]
valid = data[training_data_size:]
valid['predictions'] = predictions
plt.figure(figsize=(16,8))
plt.title('PFE')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price in $', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'predictions']])
plt.legend(['Train', 'Val', 'predictions'], loc='lower right')
plt.show()
import numpy as np
y_test, predictions = np.array(y_test), np.array(predictions)
mape = (np.mean(np.abs((predictions - y_test) / y_test))) * 100
accuracy = 100 - mape
print(accuracy)
The above is my code. I tried to edit it, but it does not seem to work. I suspect that I did not format my dataset correctly, but I am new to this field, so I do not know what to change in my code to make it fit. I hope you can enlighten me on this. Thank you!
I encountered errors like "IndexError: index 2264 is out of bounds for axis 0 with size 2264" and
"ValueError: Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 800 but received input with shape [None, 480]".
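Your model's input shape does not match your data: x_train has shape (1526, 60, 1), but the first layer declares input_shape=(100, 4).
Change the first layer to: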
model.add(Convolution1D(64, 3, input_shape= (60,1), padding='same'))
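The 800 and 480 in the ValueError follow from the layer arithmetic. As a rough sketch, assume padding='same' keeps the sequence length and each MaxPooling1D(pool_size=2) halves it, rounding down. The Dense layer was then built for the declared length of 100, while the real windows have length 60. `flattened_size` is a hypothetical helper written only for this illustration.

```python
def flattened_size(seq_len, pool_sizes=(2, 2), last_filters=32):
    """Number of values Flatten() emits for the Conv1D/MaxPooling1D stack.

    Conv layers with padding='same' leave the length unchanged, so only
    the pooling layers shrink it; the channel count after the last conv
    layer is `last_filters`.
    """
    for pool in pool_sizes:
        seq_len //= pool
    return seq_len * last_filters

print(flattened_size(100))  # 800: length declared in input_shape=(100, 4)
print(flattened_size(60))   # 480: length of the real 60-step windows
```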

python RNN LSTM error

This is a recurrent neural network (LSTM) model meant to predict future values of forex market movement.
The dataset shape is (1713, 50); the first column is the datetime index and the others are numeric values.
The error starts right after the training and validation data shapes are printed.
This is the code I tried to run:
from sklearn.preprocessing import MinMaxScaler
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv(r"E:\Business\Stocks\StocksDF.csv", parse_dates=[0], index_col=[0], low_memory=False, dtype='object')
features = len(df.columns)
val_ratio = 0.2
epochs = 500
batch_size = df.__len__()
sequence_length = 822
data = df.as_matrix()
data_processed = []
for index in range(len(data) - sequence_length):
    data_processed.append(data[index: index + sequence_length])
data_processed = np.array(data_processed)
val_split = round((1 - val_ratio) * data_processed.shape[0])
train = data_processed[:, int(val_split), :]
val = data_processed[int(val_split):, :]
print('Training data: {}'.format(train.shape))
print('Validation data: {}'.format(val.shape))
train_samples, train_nx, train_ny = train.shape
val_samples, val_nx, val_ny = val.shape
train = train.reshape((train_samples, train_nx * train_ny))
val = val.reshape((val_samples, val_nx * val_ny))
preprocessor = MinMaxScaler().fit(train)
train = preprocessor.transform(train)
val = preprocessor.transform(val)
train = train.reshape((train_samples, train_nx, train_ny))
val = val.reshape((val_samples, val_nx, val_ny))
X_train = train[:, : -1]
y_train = train[:, -1][:, -1]
X_val = val[:, : -1]
y_val = val[:, -1][:, -1]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], features))
X_val = np.reshape(X_val, (X_val.shape[0], X_val.shape[1], features))
model = Sequential()
model.add(LSTM(input_shape=(X_train.shape[1:]), units=100, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(2, return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(units=1))
model.add(Activation("relu"))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae', 'mse', 'accuracy'])
history = model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=2)
preds_val = model.predict(X_val)
diff = []
for i in range(len(y_val)):
    pred = preds_val[i][0]
    diff.append(y_val[i] - pred)
real_min = preprocessor.data_min_[104]
real_max = preprocessor.data_max_[104]
print(preprocessor.data_min_[:1])
print(preprocessor.data_max_[:1])
preds_real = preds_val * (real_max - real_min) + real_min
y_val_real = y_val * (real_max - real_min) + real_min
plt.plot(preds_real, label='Predictions')
plt.plot(y_val_real, label='Actual values')
plt.xlabel('test')
plt.legend(loc=0)
plt.show()
print(model.summary())
I got this error:
Using TensorFlow backend.
Traceback (most recent call last):
Training data: (891, 50)
  File "E:/Tutorial/new.py", line 31, in <module>
Validation data: (178, 822, 50)
    train_samples, train_nx, train_ny = train.shape
ValueError: not enough values to unpack (expected 3, got 2)
There's an error in this line:
train = data_processed[:, int(val_split), :]
It should be:
train = data_processed[:int(val_split), :, :]
val = data_processed[int(val_split):, :, :]
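The shapes in the traceback show the difference between the two slices. The sketch below uses a zero-filled array with the shape the windowing loop would produce for 1713 rows, a sequence length of 822, and 50 columns. These numbers are derived from the question, and the array is only a stand-in for the real data.

```python
import numpy as np

rows, sequence_length, n_cols = 1713, 822, 50
n_windows = rows - sequence_length  # 891 windows
# Zero-filled stand-in for data_processed; uint8 keeps it small.
data_processed = np.zeros((n_windows, sequence_length, n_cols), dtype=np.uint8)

val_split = round((1 - 0.2) * n_windows)  # 713

wrong = data_processed[:, val_split, :]   # picks ONE timestep from every window
train = data_processed[:val_split, :, :]  # keeps the first 713 windows whole
val = data_processed[val_split:, :, :]    # keeps the remaining windows whole

print(wrong.shape)  # (891, 50)
print(train.shape)  # (713, 822, 50)
print(val.shape)    # (178, 822, 50)
```

The two-dimensional result of the original slice is what makes `train_samples, train_nx, train_ny = train.shape` fail with "expected 3, got 2".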
