Predicting 100 values of function f(x) using LSTM

Predicting 100 values of function f(x) using LSTM - python

I am trying to predict 100 values of the function f(x) in the future. I found a script online where it was predicting the price of a stock into the future and decided to modify it for the purpose of my research. Unfortunately, the prediction is quite a bit off. Here is the script:
#import packages
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
import tensorflow as tf
#for normalizing data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
#read the file
df = pd.read_csv('MyDataSet')
#print the head
df.head()
look_back = 60
num_periods = 20
NUM_NEURONS_FirstLayer = 128
NUM_NEURONS_SecondLayer = 64
EPOCHS = 1
num_prediction = 100
training_size = 8000 #initial
#creating dataframe
data = df.sort_index(ascending=True, axis=0)
new_data = pd.DataFrame(index=range(0,len(df)),columns=['XValue', 'YValue'])
for i in range(0,len(data)):
new_data['XValue'][i] = data['XValue'][i]
new_data['YValue'][i] = data['YValue'][i]
#setting index
new_data.index = new_data.XValue
new_data.drop('XValue', axis=1, inplace=True)
#creating train and test sets
dataset = new_data.values
train = dataset[0:training_size,:]
valid = dataset[training_size:,:]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
x_train, y_train = [], []
for i in range(look_back,len(train)):
x_train.append(scaled_data[i-look_back:i,0])
y_train.append(scaled_data[i,0])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0],x_train.shape[1],1))
#Custom Loss Function
def my_loss_fn(y_true, y_pred):
squared_difference = tf.square(y_true - y_pred)
return tf.reduce_mean(squared_difference, axis=-1)
#Training Phase
model = Sequential()
model.add(LSTM(NUM_NEURONS_FirstLayer,input_shape=(look_back,1),
return_sequences=True))
model.add(LSTM(NUM_NEURONS_SecondLayer,input_shape=(NUM_NEURONS_FirstLayer,1)))
model.add(Dense(1))
model.compile(loss=my_loss_fn, optimizer='adam')
model.fit(x_train,y_train,epochs=EPOCHS,batch_size=2, verbose=2)
inputs = dataset[(len(dataset) - len(valid) - look_back):]
inputs = inputs.reshape(-1,1)
inputs = scaler.transform(inputs)
X_test = []
for i in range(look_back,inputs.shape[0]):
X_test.append(inputs[i-look_back:i,0])
X_test = np.array(X_test)
#Validation Phase
X_test = np.reshape(X_test, (X_test.shape[0],X_test.shape[1],1))
PredictValue = model.predict(X_test)
PredictValue = scaler.inverse_transform(PredictValue)
#Measures the RMSD between the validation data and the predicted data
rms=np.sqrt(np.mean(np.power((valid-PredictValue ),2)))
train = new_data[:int(new_data.index[training_size])]
valid = new_data[int(new_data.index[training_size]):]
valid['Predictions'] = PredictValue
valid = np.asarray(valid).astype('float32')
valid = valid.reshape(-1,1)
#Prediction Phase
def predict(num_prediction, model):
prediction_list = valid[-look_back:]
for _ in range(num_prediction):
x = prediction_list[-look_back:]
x = x.reshape((1, look_back, 1))
out = model.predict(x)[0][0]
prediction_list = np.append(prediction_list, out)
prediction_list = prediction_list[look_back-1:]
return prediction_list
#Get the predicted x-value
def predict_x(num_prediction):
last_x = df['XValue'].values[-1]
XValue = range(int(last_x),int(last_x)+num_prediction+1)
prediction_x = list(XValue)
return prediction_x
#Predict the y-value and the x-value
forecast = predict(num_prediction, model)
forecast_x = predict_x(num_prediction)
The file I was using was just for the function f(x) = sin(2πx/10), where x is all natural numbers from 0 to 10,000. Following the above, I get the following plot:
Prediction-1
Granted I am using only epochs of 1 and I am busy running a new job with a value of 50, but I was hoping for something a lot better. Any advice to improve the prediction here?
I also decided to modify the script and only predict one value at a time, add it to the dataset and then re-run the script. 100 times. It takes long and it obviously has its own issues (such as using the predicted value of the previous x (step)), but it's the best solution I can think of now. This is the plot I got for it. I can attach this file if you want, but it's not the one I want to continue focusing on with this project:
Prediction-2
Any sort of help would be appreciated in getting a better prediction.
Much appreciated

Related

Predict data outside existing data

Iv'e made a basic prediction to learn this complicated area, the prediction works but only as long as i already have data.
The result i want is to predict data further than my existing data to predict what hasn't been yet.
So like i said i'm new and if someone can show me how or tell me what i dont understand good enough in the code to do this i would be grateful.
heres the code:
from cProfile import label
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20,10
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from sklearn.preprocessing import MinMaxScaler
df = pd.read_csv('./Data/market-price-3y.csv')
# Use only one column
df = df[['Date', 'Close']]
# Change type of date from object to datetime
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
# Set Date as index
df.index = df['Date']
print(df.head())
# Show the data as a graph
# plt.plot(df['Close'], label='Close Price History', color='red')
# plt.show()
df = df.sort_index(ascending=True, axis=0)
data = pd.DataFrame(index=range(0, len(df)), columns=['Date', 'Close'])
for i in range(0, len(data)):
data['Date'][i] = df['Date'][i]
data['Close'][i] = df['Close'][i]
scaler = MinMaxScaler(feature_range=(0,1))
data.index = data.Date
data.drop('Date', axis=1, inplace=True)
# Split data into training and testing datasets
final_data = data.values
train_data = final_data[0:900,:]
valid_data = final_data[900:,:]
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(final_data)
x_train_data, y_train_data = [], []
for i in range(60, len(train_data)):
x_train_data.append(scaled_data[i-60:i,0])
y_train_data.append(scaled_data[i,0])
x_train_data = np.asarray(x_train_data)
y_train_data = np.asarray(y_train_data)
x_train_data = np.reshape(x_train_data, (x_train_data.shape[0], x_train_data.shape[1],1))
# Create the LSTM Model
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(np.shape(x_train_data)[1], 1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
model_data = data[len(data) - len(valid_data)-60:].values
model_data = model_data.reshape(-1,1)
model_data = scaler.transform(model_data)
# Train and test the data
lstm_model.compile(loss='mean_squared_error', optimizer='adam')
lstm_model.fit(x_train_data, y_train_data, epochs=1, batch_size=1, verbose=2)
# Test Data
X_test = []
for i in range(60, model_data.shape[0]):
X_test.append(model_data[i-60:i,0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
# Result
predicted_price = lstm_model.predict(X_test)
predicted_price = scaler.inverse_transform(predicted_price)
train_data = data[:900]
valid_data = data[900:]
valid_data['Predictions'] = predicted_price
plt.plot(train_data['Close'])
plt.plot(valid_data[['Close', 'Predictions']])
plt.show()
[![Matplotlib output][1]][1] [1]: https://i.stack.imgur.com/7j0Yf.png
So i want the prediction to keep going for lets say 15more points instead of ending when the test data ends.

here you can increase the data length at this code phase:
# Test Data
X_test = []
for i in range(60, model_data.shape[0]):
X_test.append(model_data[i-60:i,0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
here instead of 60, you can give 60+15.
or you can create a new test dataset like similar to the above coding it depends on your data set. :)

Forecast future values with LSTM in Python

This code predicts the values of a specified stock up to the current date but not a date beyond the training dataset. This code is from an earlier question I had asked and so my understanding of it is rather low. I assume the solution would be a simple variable change to add the extra time but I am unaware as to which value needs to be manipulated.
import pandas as pd
import numpy as np
import yfinance as yf
import os
import matplotlib.pyplot as plt
from IPython.display import display
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
pd.options.mode.chained_assignment = None
# download the data
df = yf.download(tickers=['AAPL'], period='2y')
# split the data
train_data = df[['Close']].iloc[: - 200, :]
valid_data = df[['Close']].iloc[- 200:, :]
# scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train_data)
train_data = scaler.transform(train_data)
valid_data = scaler.transform(valid_data)
# extract the training sequences
x_train, y_train = [], []
for i in range(60, train_data.shape[0]):
x_train.append(train_data[i - 60: i, 0])
y_train.append(train_data[i, 0])
x_train = np.array(x_train)
y_train = np.array(y_train)
# extract the validation sequences
x_valid = []
for i in range(60, valid_data.shape[0]):
x_valid.append(valid_data[i - 60: i, 0])
x_valid = np.array(x_valid)
# reshape the sequences
x_train = x_train.reshape(x_train.shape[0],
x_train.shape[1], 1)
x_valid = x_valid.reshape(x_valid.shape[0],
x_valid.shape[1], 1)
# train the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True,
input_shape=x_train.shape[1:]))
model.add(LSTM(units=50))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=50, batch_size=128, verbose=1)
# generate the model predictions
y_pred = model.predict(x_valid)
y_pred = scaler.inverse_transform(y_pred)
y_pred = y_pred.flatten()
# plot the model predictions
df.rename(columns={'Close': 'Actual'}, inplace=True)
df['Predicted'] = np.nan
df['Predicted'].iloc[- y_pred.shape[0]:] = y_pred
df[['Actual', 'Predicted']].plot(title='AAPL')
display(df)
plt.show()

You could train your model to predict a future sequence (e.g. the next 30 days) instead of predicting the next value (the next day) as it is currently the case.
In order to do that, you need to define the outputs as y[t: t + H] (instead of y[t] as in the current code) where y is the time series and H is the length of the forecast period (i.e. the number of days ahead that you want to forecast). You also need to set the number of outputs of the last layer equal to H (instead of equal to 1 as in the current code).
You can still define the inputs as y[t - T: t] where T is the length of the lookback period (or number of timesteps), and therefore the model's input shape is still (T, 1). The lookback period T is usually longer than the forecast period H (i.e. T > H) and it's often set equal to a multiple of H (i.e. T = m * H where m > 1 is an integer.).
import numpy as np
import pandas as pd
import yfinance as yf
import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
pd.options.mode.chained_assignment = None
tf.random.set_seed(0)
# download the data
df = yf.download(tickers=['AAPL'], period='1y')
y = df['Close'].fillna(method='ffill')
y = y.values.reshape(-1, 1)
# scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaler = scaler.fit(y)
y = scaler.transform(y)
# generate the input and output sequences
n_lookback = 60 # length of input sequences (lookback period)
n_forecast = 30 # length of output sequences (forecast period)
X = []
Y = []
for i in range(n_lookback, len(y) - n_forecast + 1):
X.append(y[i - n_lookback: i])
Y.append(y[i: i + n_forecast])
X = np.array(X)
Y = np.array(Y)
# fit the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(n_lookback, 1)))
model.add(LSTM(units=50))
model.add(Dense(n_forecast))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, Y, epochs=100, batch_size=32, verbose=0)
# generate the forecasts
X_ = y[- n_lookback:] # last available input sequence
X_ = X_.reshape(1, n_lookback, 1)
Y_ = model.predict(X_).reshape(-1, 1)
Y_ = scaler.inverse_transform(Y_)
# organize the results in a data frame
df_past = df[['Close']].reset_index()
df_past.rename(columns={'index': 'Date', 'Close': 'Actual'}, inplace=True)
df_past['Date'] = pd.to_datetime(df_past['Date'])
df_past['Forecast'] = np.nan
df_past['Forecast'].iloc[-1] = df_past['Actual'].iloc[-1]
df_future = pd.DataFrame(columns=['Date', 'Actual', 'Forecast'])
df_future['Date'] = pd.date_range(start=df_past['Date'].iloc[-1] + pd.Timedelta(days=1), periods=n_forecast)
df_future['Forecast'] = Y_.flatten()
df_future['Actual'] = np.nan
results = df_past.append(df_future).set_index('Date')
# plot the results
results.plot(title='AAPL')

ANN problem: I am building an ANN model to predict the profit of a new startup based on certain features

The image of the dataset
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
Loading the data set using pandas as data frame format
import pandas as pd
df = pd.read_csv(r"E:\50_Startups.csv")
df.drop(['State'],axis = 1, inplace = True)
from sklearn.preprocessing import MinMaxScaler
mm = MinMaxScaler()
df.iloc[:,:] = mm.fit_transform(df.iloc[:,:])
info = df.describe()
x = df.iloc[:,:-1].values
y = df.iloc[:,-1].values
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split( x,y, test_size=0.2, random_state=42)
Initializing the model
model = Sequential()
model.add(Dense(40,input_dim =3,activation="relu",kernel_initializer='he_normal'))
model.add(Dense(30,activation="relu"))
model.add(Dense(1))
model.compile(loss="mean_squared_error",optimizer="adam",metrics=["accuracy"])
fitting model on train data
model.fit(x=x_train,y=y_train,epochs=150, batch_size=32,verbose=1)
Evaluating the model on test data
eval_score_test = model.evaluate(x_test,y_test,verbose = 1)
I am getting zero accuracy.

The problem is that accuracy is a metric for discrete values (classification).
you should use:
r2 score
mape
smape
instead.
e.g:
model.compile(loss="mean_squared_error",optimizer="adam",metrics=["mean_absolute_percentage_error"])

Adding to the answer of #GuintherKovalski accuracy is not for regression but if you still want to use it then you can use it along with some threshold using following steps:
Set a threshold such that if the absolute difference in the predicted value and the actual value is less than equal to the threshold then you consider that value as correct, otherwise false.
Ex -> predicted values = [0.3, 0.7, 0.8, 0.2], original values = [0.2, 0.8, 0.5, 0.4].
Now abs diff -> [0.1, 0.1, 0.3, 0.2] and let's take a threshold of 0.2. So with this threshold the correct -> [1, 1, 0, 1] and your accuracy will be correct.sum()/len(correct) and that is 3/4 -> 0.75.
This could be implemented in TensorFlow like this
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_regression
data = make_regression(10000)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(100,))])
def custom_metric(a, b):
threshold = 1 # Choose accordingly
abs_diff = tf.abs(b - a)
correct = abs_diff >= threshold
correct = tf.cast(correct, dtype=tf.float16)
res = tf.math.reduce_mean(correct)
return res
model.compile('adam', 'mae', metrics=[custom_metric])
model.fit(data[0], data[1], epochs=30, batch_size=32)

Just want to say Thank you to everyone who took their precious time to help me. I am posting this code as this worked for me. I hope it helps everyone who is stuck somewhere looking for answers. I got this code after consulting with my friend.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
import pandas as pd
from sklearn.model_selection import train_test_split
# Loading the data set using pandas as data frame format
startups = pd.read_csv(r"E:\0Assignments\DL_assign\50_Startups.csv")
startups = startups.drop("State", axis =1)
train, test = train_test_split(startups, test_size = 0.2)
x_train = train.iloc[:,0:3].values.astype("float32")
x_test = test.iloc[:,0:3].values.astype("float32")
y_train = train.Profit.values.astype("float32")
y_test = test.Profit.values.astype("float32")
def norm_func(i):
x = ((i-i.min())/(i.max()-i.min()))
return (x)
x_train = norm_func(x_train)
x_test = norm_func(x_test)
y_train = norm_func(y_train)
y_test = norm_func(y_test)
# one hot encoding outputs for both train and test data sets
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Storing the number of classes into the variable num_of_classes
num_of_classes = y_test.shape[1]
x_train.shape
y_train.shape
x_test.shape
y_test.shape
# Creating a user defined function to return the model for which we are
# giving the input to train the ANN mode
def design_mlp():
# Initializing the model
model = Sequential()
model.add(Dense(500,input_dim =3,activation="relu"))
model.add(Dense(200,activation="tanh"))
model.add(Dense(100,activation="tanh"))
model.add(Dense(50,activation="tanh"))
model.add(Dense(num_of_classes,activation="linear"))
model.compile(loss="mean_squared_error",optimizer="adam",metrics =
["accuracy"])
return model
# building a cnn model using train data set and validating on test data set
model = design_mlp()
# fitting model on train data
model.fit(x=x_train,y=y_train,batch_size=100,epochs=10)
# Evaluating the model on test data
eval_score_test = model.evaluate(x_test,y_test,verbose = 1)
print ("Accuracy: %.3f%%" %(eval_score_test[1]*100))
# accuracy score on train data
eval_score_train = model.evaluate(x_train,y_train,verbose=0)
print ("Accuracy: %.3f%%" %(eval_score_train[1]*100))

Training loop for XGBoost in different dataset

I have developed some different datasets and I want to write a for loop to do the training for each of which and at the end, I want to have RMSE for each dataset. I tried by passing through a for loop but it does not work since it gives back the same value for each dataset while I know that it should be different. The code that I have written is below:
for i in NEW_middle_index:
DF = df1.iloc[i-100:i+100,:]
# Append an empty sublist inside the list
FINAL_DF.append(DF)
y = DF.iloc[:,3]
X = DF.drop(columns='Target')
index_train = int(0.7 * len(X))
X_train = X[:index_train]
y_train = y[:index_train]
X_test = X[index_train:]
y_test = y[index_train:]
scaler_x = MinMaxScaler().fit(X_train)
X_train = scaler_x.transform(X_train)
X_test = scaler_x.transform(X_test)
xgb_r = xg.XGBRegressor(objective ='reg:linear',
n_estimators = 20, seed = 123)
for i in range(len(NEW_middle_index)):
# print(i)
# Fitting the model
xgb_r.fit(X_train,y_train)
# Predict the model
pred = xgb_r.predict(X_test)
# RMSE Computation
rmse = np.sqrt(mean_squared_error(y_test,pred))
# print(rmse)
RMSE.append(rmse)

Not sure if you indented it correctly. You are overwriting X_train and X_test and when you fit your model, its always on the same dataset, hence you get the same results.
One option is to fit the model once you create the train / test dataframes. Else if you want to keep the train / test set, maybe something like below, to store them in a list of dictionaries, without changing too much of your code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import xgboost as xg
df1 = pd.DataFrame(np.random.normal(0,1,(600,3)))
df1['Target'] = np.random.uniform(0,1,600)
NEW_middle_index = [100,300,500]
NEWDF = []
for i in NEW_middle_index:
y = df1.iloc[i-100:i+100:,3]
X = df1.iloc[i-100:i+100,:].drop(columns='Target')
index_train = int(0.7 * len(X))
scaler_x = MinMaxScaler().fit(X)
X_train = scaler_x.transform(X[:index_train])
y_train = y[:index_train]
X_test = scaler_x.transform(X[index_train:])
y_test = y[index_train:]
NEWDF.append({'X_train':X_train,'y_train':y_train,'X_test':X_test,'y_test':y_test})
Then we fit and calculate RMSE:
RMSE = []
xgb_r = xg.XGBRegressor(objective ='reg:linear',n_estimators = 20, seed = 123)
for i in range(len(NEW_middle_index)):
xgb_r.fit(NEWDF[i]['X_train'],NEWDF[i]['y_train'])
pred = xgb_r.predict(NEWDF[i]['X_test'])
rmse = np.sqrt(mean_squared_error(NEWDF[i]['y_test'],pred))
RMSE.append(rmse)
RMSE
[0.3524827559800294, 0.3098101362502435, 0.3843173269966071]

Prediction with LSTM using Keras

I am predicting Y based on X from past values. Our formatted CSV dataset has three columns (time_stamp, X and Y - where Y is the actual value) whose sample format is
time,X,Y
0.000561,0,10
0.000584,0,10
0.040411,5,10
0.040437,10,10
0.041638,12,10
0.041668,14,10
0.041895,15,10
0.041906,19,10
... ... ...
Before training the prediction model, here is how the plots of X and Y respectively look like the following.
Here is how I approached the problem with LSTM Recurrent Neural Networks in Python with Keras.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
np.random.seed(7)
# Load data
df = pd.read_csv('test32_C_data.csv')
n_features = 100
def create_sequences(data, window=15, step=1, prediction_distance=15):
x = []
y = []
for i in range(0, len(data) - window - prediction_distance, step):
x.append(data[i:i + window])
y.append(data[i + window + prediction_distance][1])
x, y = np.asarray(x), np.asarray(y)
return x, y
# Scaling prior to splitting
scaler = MinMaxScaler(feature_range=(0.01, 0.99))
scaled_data = scaler.fit_transform(df.loc[:, ["X", "Y"]].values)
# Build sequences
x_sequence, y_sequence = create_sequences(scaled_data)
# Create test/train split
test_len = int(len(x_sequence) * 0.90)
valid_len = int(len(x_sequence) * 0.90)
train_end = len(x_sequence) - (test_len + valid_len)
x_train, y_train = x_sequence[:train_end], y_sequence[:train_end]
x_valid, y_valid = x_sequence[train_end:train_end + valid_len], y_sequence[train_end:train_end + valid_len]
x_test, y_test = x_sequence[train_end + valid_len:], y_sequence[train_end + valid_len:]
# Initialising the RNN
model = Sequential()
# Adding the input layerand the LSTM layer
model.add(LSTM(15, input_shape=(15, 2)))
# Adding the output layer
model.add(Dense(1))
# Compiling the RNN
model.compile(loss='mse', optimizer='rmsprop')
# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=5)
# Getting the predicted values
y_pred = model.predict(x_test)
#y_pred = scaler.inverse_transform(y_pred)
plot_colors = ['#332288', '#3cb44b']
# Plot the results
pd.DataFrame({"Actual": y_test, "Predicted": np.squeeze(y_pred)}).plot(color=plot_colors)
plt.xlabel('Time [Index]')
plt.ylabel('Values')
Finally, when I run the code - the neural model seems to capture the pattern of the signal well as it is shown below.
However, one problem that I encountered in this output is the ranges of Y. As it is shown in the first two plots, the ranges should be 0-400 as shown above and to solve that I tried to use the scaler to inverse_transform as y_pred = scaler.inverse_transform(y_pred) but this throws an error: ValueError: non-broadcastable output operand with shape (7625,1) doesn't match the broadcast shape (7625,2). How can we solve this broadcast shape error?

Basically, the scaler has remembered that it was fed 2 features(/columns). So it is expecting 2 features to invert the transformation.
Two options here.
1) You make two different scalers: scaler_x and scaler_y like this :
# Scaling prior to splitting
scaler_x = MinMaxScaler(feature_range=(0.01, 0.99))
scaler_y = MinMaxScaler(feature_range=(0.01, 0.99))
scaled_x = scaler_x.fit_transform(df.loc[:, "X"].reshape([-1, 1]))
scaled_y = scaler_y.fit_transform(df.loc[:, "Y"].reshape([-1, 1]))
scaled_data = np.column_stack((scaled_x, scaled_y))
Then you will be able to do :
y_pred = scaler_y.inverse_transform(y_pred)
2) You fake the X column in your output like this :
y_pred_reshaped = np.zeros((len(y_pred), 2))
y_pred_reshaped[:,1] = y_pred
y_pred = scaler.inverse_transform(y_pred_reshaped)[:,1]
Does that help?
EDIT
here is the full code as required
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
np.random.seed(7)
# Load data
#df = pd.read_csv('test32_C_data.csv')
df = pd.DataFrame(np.random.randint(0,100, size=(100,3)), columns = ['time', 'X', 'Y'])
n_features = 100
def create_sequences(data, window=15, step=1, prediction_distance=15):
x = []
y = []
for i in range(0, len(data) - window - prediction_distance, step):
x.append(data[i:i + window])
y.append(data[i + window + prediction_distance][1])
x, y = np.asarray(x), np.asarray(y)
return x, y
# Scaling prior to splitting
scaler_x = MinMaxScaler(feature_range=(0.01, 0.99))
scaler_y = MinMaxScaler(feature_range=(0.01, 0.99))
scaled_x = scaler_x.fit_transform(df.loc[:, "X"].reshape([-1,1]))
scaled_y = scaler_y.fit_transform(df.loc[:, "Y"].reshape([-1,1]))
scaled_data = np.column_stack((scaled_x, scaled_y))
# Build sequences
x_sequence, y_sequence = create_sequences(scaled_data)
test_len = int(len(x_sequence) * 0.90)
valid_len = int(len(x_sequence) * 0.90)
train_end = len(x_sequence) - (test_len + valid_len)
x_train, y_train = x_sequence[:train_end], y_sequence[:train_end]
x_valid, y_valid = x_sequence[train_end:train_end + valid_len], y_sequence[train_end:train_end + valid_len]
x_test, y_test = x_sequence[train_end + valid_len:], y_sequence[train_end + valid_len:]
# Initialising the RNN
model = Sequential()
# Adding the input layerand the LSTM layer
model.add(LSTM(15, input_shape=(15, 2)))
# Adding the output layer
model.add(Dense(1))
# Compiling the RNN
model.compile(loss='mse', optimizer='rmsprop')
# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=5)
# Getting the predicted values
y_pred = model.predict(x_test)
y_pred = scaler_y.inverse_transform(y_pred)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Predicting 100 values of function f(x) using LSTM - python

Related

Predict data outside existing data

Forecast future values with LSTM in Python

ANN problem: I am building an ANN model to predict the profit of a new startup based on certain features

Training loop for XGBoost in different dataset

Prediction with LSTM using Keras

Categories

Resources