I am working on a path-prediction problem where I predict the position (latitude, longitude) one time step ahead. I have path data for nearly 1500 "events", which I use to train an LSTM model. Since the path is known a priori for training, I shift the time series by one step and use it as the target vector. For example:
Event 1:
Lat (t), Lon (t) --> Lat (t+1), Lon (t+1)
Lat (t+1), Lon (t+1) --> Lat (t+2), Lon (t+2)
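For reference, a minimal sketch of how such shifted target columns might be built with pandas; the column names (LatNor, LonNor, LatLag1Nor, LonLag1Nor) follow the snippets below, while the 'time' column and the grouping by EventID are assumptions based on the training loop further down:

import pandas as pd

def add_lagged_targets(df):
    # For each event, the target at time t is the normalized position at t+1.
    # shift(-1) pulls the next row's value up to the current row.
    df = df.sort_values(['EventID', 'time'])  # 'time' column is an assumption
    df['LatLag1Nor'] = df.groupby('EventID')['LatNor'].shift(-1)
    df['LonLag1Nor'] = df.groupby('EventID')['LonNor'].shift(-1)
    # The last row of each event has no future point, so drop it.
    return df.dropna(subset=['LatLag1Nor', 'LonLag1Nor'])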
However, for testing the path is not known a priori, so I take the trained LSTM model, predict one time step at a time, and feed each predicted value back in as the input for the next time step. Below are snippets from my code:
# Extract only the Lat, Lon values to arrays
train_full = train_df[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']].values
test_full = test_df[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']].values
print('train_full.shape = ', train_full.shape)
print('test_full.shape = ', test_full.shape)
# Separate the Inputs and Targets
x_train_full = train_full[:,0:2]
y_train_full = train_full[:,2:4]
x_test_full = test_full[:,0:2]
y_test_full = test_full[:,2:4]
# Defining the LSTM model
model = Sequential()
model.add(LSTM(40,input_shape=(None,2), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(20,input_shape=(None,2), return_sequences=True))
model.add(Dropout(0.1))
model.add(Dense(2))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.summary()
epochs = 50
for i in range(epochs):
    print ('Running Epoch No: ', i)
    for stormID, data in train_df.groupby('EventID'):
        train = data[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']]
        train = train.values
        x_train = np.expand_dims(train[:,0:2], axis=0)
        y_train = np.expand_dims(train[:,2:4], axis=0)
        #print (x_train.shape, y_train.shape)
        model.train_on_batch(x_train,y_train)
        model.reset_states()
print('Model training done.....')
#Use the optimized weights to estimate target values for training data
train_pred = model.predict_on_batch(np.expand_dims(train_df[['LatNor','LonNor']].values, axis=0))
train_pred_val = x_scaler.inverse_transform(train_pred[0])
The model trains well (training-loss plots omitted here).
When I use the trained model and call predict_on_batch on the test data, it works great. But in reality we would not know the time series ahead of time, so when I predict one instance at a time for the test set and feed it back as the input for the next time step, it does not work well. I suspect I am missing something and am changing the state/weights of the trained network whenever I make a predict call.
x_values = TestDF[['LatNor','LonNor']].values
x_values_scaled = x_values  # already normalized
start = x_values_scaled[0,:]
startX = start.reshape(1,1,2)
Results = x_scaler.inverse_transform(startX[0])
for i in range(x_values.shape[0]):
    nextLoc = model.predict(startX)
    nextLoc_rescaled = x_scaler.inverse_transform(nextLoc[0])
    Results = np.vstack((Results, nextLoc_rescaled))
    startX = nextLoc
Any thoughts or recommendations?
Related
I have an Encoder-Decoder LSTM model that learns to predict 12 months of data in advance while looking back 12 months. If it helps, my dataset has around 10 years in total (120 months). I keep 8 years for training/validation and 2 years for testing. My understanding is that my model does not have access to the testing data at training time.
The puzzling thing is that my model's predictions are simply a shift of the previous points. But how did my model know the actual previous points at prediction time? I did not give the monthly values in the testing set to the model! If you say that it simply copies the previous point given as input: I am feeding it 12 months with completely different values than the ones it predicts (so it does not copy the 12 months I give it), yet the forecasted values are shifts of actual values it has never seen.
My code source is from here:
Below is my code:
#train/test splitting
split_position = int(len(scaled_data)*0.8)  # 8 years for training
train = scaled_data[0:split_position]
test = scaled_data[split_position:]
#print(train)
print('length of train=', len(train))
#print(test)
print('length of test=', len(test))

# split train and test data into yearly train/test sets (3d) [observation, year, month]
def split_data_yearly(train, test):
    # restructure into windows of yearly data
    train = array(split(train, len(train)/12))
    test = array(split(test, len(test)/12))
    return train, test
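As a quick shape check only (an assumption consistent with the 8/2-year split described above: 120 monthly rows and a single feature), the yearly split produces arrays like the following; different variable names are used so this does not interfere with the call made further down:

yearly_train, yearly_test = split_data_yearly(train, test)
print(yearly_train.shape)  # (8, 12, 1): 8 training years x 12 months x 1 feature
print(yearly_test.shape)   # (2, 12, 1): 2 test years x 12 months x 1 feature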
# evaluate one or more yearly forecasts against expected values
def evaluate_forecasts(actual, predicted):
    scores = list()
    # calculate an RMSE score for each month
    for i in range(actual.shape[1]):
        # calculate mse
        mse = mean_squared_error(actual[:, i], predicted[:, i])
        # calculate rmse
        rmse = math.sqrt(mse)
        # store
        scores.append(rmse)
    # calculate overall RMSE
    s = 0
    for row in range(actual.shape[0]):
        for col in range(actual.shape[1]):
            s += (actual[row, col] - predicted[row, col])**2
    score = math.sqrt(s / (actual.shape[0] * actual.shape[1]))
    ################ plot prediction vs actual ###############################
    predicted = predicted.reshape(predicted.shape[0], predicted.shape[1])
    jump = 12
    inv_scores = list()
    for i in range(len(predicted)):
        sample_predicted = predicted[i, :]
        sample_actual = actual[i, :]
        # inverse normalization
        sample_predicted_inv = scaler.inverse_transform(sample_predicted.reshape(-1, 1))
        sample_actual_inv = scaler.inverse_transform(sample_actual.reshape(-1, 1))
        #print(sample_actual_inv)
        #print(data_sd[(split_position+(i*jump)-1):(split_position+(i*jump-1))+len(sample_actual_inv)])
        # inverse differencing
        s = numpy.array(smoothed).reshape(-1, 1)
        sample_actual_inv = sample_actual_inv + s[(split_position+(i*jump)):(split_position+(i*jump))+len(sample_actual_inv)]
        sample_predicted_inv = sample_predicted_inv + s[(split_position+(i*jump)):(split_position+(i*jump))+len(sample_actual_inv)]
        months = ['August-'+str(19+i),'September-'+str(19+i),'October-'+str(19+i),'November-'+str(19+i),'December-'+str(19+i),'January-'+str(20+i),'February-'+str(20+i),'March-'+str(20+i),'April-'+str(20+i),'May-'+str(20+i),'June-'+str(20+i),'July-'+str(20+i)]
        pyplot.plot(months, sample_actual_inv, 'b-', label='Actual')
        pyplot.plot(months, sample_predicted_inv, '--', color="orange", label='Predicted')
        pyplot.legend()
        pyplot.xticks(rotation=25)
        pyplot.title('Encoder Decoder LSTM Prediction', y=1.08)
        pyplot.show()
        ################### determine RMSE after inversion ################################
        mse = mean_squared_error(sample_actual_inv, sample_predicted_inv)
        rmse = math.sqrt(mse)
        inv_scores.append(rmse)
    return score, scores, inv_scores
# summarize scores
def summarize_scores(name, score, scores):
    s_scores = ', '.join(['%.1f' % s for s in scores])
    print('%s: [%.3f] %s' % (name, score, s_scores))
# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=12):
    # flatten data
    data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            X.append(data[in_start:in_end, :])
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)
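As an illustration (assuming train has already been reshaped into whole years as done further down, i.e. 8 years = 96 monthly rows with a single feature, and n_input = n_out = 12), to_supervised yields overlapping windows like:

train_x, train_y = to_supervised(train, n_input=12, n_out=12)
print(train_x.shape)  # (73, 12, 1): 96 - 12 - 12 + 1 sliding windows of 12 input months
print(train_y.shape)  # (73, 12): the 12 target months following each window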
# train the model
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # take portion for validation
    val_size = 12
    test_x, test_y = train_x[-val_size:], train_y[-val_size:]
    train_x, train_y = train_x[0:-val_size], train_y[0:-val_size]
    # define parameters
    verbose, epochs, batch_size = 1, 25, 8
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(64, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(64, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    #sgd = optimizers.SGD(lr=0.004, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mse', optimizer='adam')
    # fit network
    train_history = model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, validation_data=(test_x, test_y), verbose=verbose)
    loss = train_history.history['loss']
    val_loss = train_history.history['val_loss']
    pyplot.plot(loss)
    pyplot.plot(val_loss)
    pyplot.legend(['loss', 'val_loss'])
    pyplot.show()
    return model
# make a forecast
def forecast(model, history, n_input):
    # flatten data
    data = array(history)
    data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
    # retrieve last observations for input data
    input_x = data[-n_input:, :]
    # reshape into [1, n_input, n]
    input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
    # forecast the next year
    yhat = model.predict(input_x, verbose=0)
    # we only want the vector forecast
    yhat = yhat[0]
    return yhat
# evaluate a single model
def evaluate_model(train, test, n_input):
    # fit model
    model = build_model(train, n_input)
    # history is a list of yearly data
    history = [x for x in train]
    # walk-forward validation over each year
    predictions = list()
    for i in range(len(test)):
        # predict the year
        yhat_sequence = forecast(model, history, n_input)
        # store the predictions
        predictions.append(yhat_sequence)
        # get real observation and add to history for predicting the next year
        history.append(test[i, :])
    # evaluate the predictions for each year
    predictions = array(predictions)
    score, scores, inv_scores = evaluate_forecasts(test[:, :, 0], predictions)
    return score, scores, inv_scores
# split into train and test
train, test = split_data_yearly(train, test)
# evaluate model and get scores
n_input = 12
score, scores, inv_scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('lstm', score, scores)
print('RMSE score after inversion:',inv_scores)
# plot scores
months=['July','August','September','October','November','December','January','February','March','April','May','June']
#pyplot.plot(months, scores, marker='o', label='lstm')
#pyplot.show()
Differencing is the key here!
After further investigation, I found out that my model produces values that are almost zero before the differencing is inverted (i.e., it is not really learning). When I invert the differencing, I am adding roughly zero to the actual value at the previous timestep, which produces the shifted pattern described above.
Therefore, I need to tune my LSTM model so that it actually learns, or perhaps remove the long runs of zeros from the data itself, since I have many of them.
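A minimal numeric sketch of why near-zero predicted differences turn into a shifted forecast after inverse differencing (the numbers are made up for illustration):

import numpy as np

series = np.array([10.0, 12.0, 15.0, 14.0, 18.0])
diff = np.diff(series)            # what the model is trained to predict
pred_diff = np.zeros_like(diff)   # a model that has not learned outputs ~0

# inverse differencing: add the predicted change to the previous actual value
reconstructed = series[:-1] + pred_diff
print(reconstructed)  # [10. 12. 15. 14.] -> just the actual series shifted by one step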
Is it possible to apply k-fold on fitting the model and later on predicting?
We have a structure built like the following, where the model is trained on time-series data between 2020-02-01 and 2021-02-01 and then tested on time-series data between 2021-04-01 and 2021-04-08. Unfortunately the prediction results are very bad, and when we calculate the evaluation metrics it is obvious something is wrong. Since we only have one year of data to train on, we suspect we are underfitting, and therefore we believe k-fold would help us.
What we want exactly is to k-fold on the period 2020-02-01 to 2021-02-01 and then afterwards predict on the period 2021-04-01 to 2021-04-08. By doing this we would use the one year of data we have efficiently and ensure a model fit good enough for a proper prediction. Is this possible, and, perhaps an even better question, is this the correct way of predicting data?
Our data looks like the following, where datetime is in YYYY-MM-DD HH:MM:SS format:
datetime temperature solar humidity rain temp_60
2020-02-01 00:00:00, 4.32, 22.84, 82.12, 16.36, 3.12
2020-02-01 00:10:00, 4.38, 21.99, 82.11, 16.25, 3.11
...
2021-11-01 00:00:00, 9.94, 15.43, 82.29, 14.83, 3.11
Code:
TargetVariable = ['temp_60']
predictors = ['temperature', 'humidity', 'rain', 'solar']
columnAll = ['datetime', 'temperature', 'humidity', 'rain', 'solar', 'temp_60', 'Predicted_temp_60', 'APE']
feature_selection_list = []
if __name__ == '__main__':
    # Setting random seed for the initial starting weights
    rndseed = 7
    seed(rndseed)
    np.random.seed(rndseed)
    random.seed(rndseed)
    # Retrieve data from csv file
    dataframe = pd.read_csv("../data.csv")
    Algorithm(dataframe=dataframe)
def Algorithm(dataframe):
    # Split data
    dateToTrain = dataframe.loc[dataframe['datetime'].between('2020-02-01', '2021-02-01', inclusive=True)]
    dateToPredict = dataframe.loc[dataframe['datetime'].between('2021-04-01', '2021-04-08', inclusive=True)]
    # Initialize the pipeline
    estimator = pipelining()
    # Get values of the predictors and target variable for training data
    # For the datetime between 2020-02-01 and 2021-02-01
    X_train_year, y_train_year = dateToTrain[predictors].values, dateToTrain[TargetVariable].values
    # Get values of the predictors and target variable for testing data
    # For the datetime between 2021-04-01 and 2021-04-08
    X_test_all, y_test_all = dateToPredict[predictors].values, dateToPredict[TargetVariable].values
    # ...
    ann = ArtificialNeuralNetwork(predictors=predictors, pipeline=estimator, X_train=X_train_year, y_train=y_train_year)
    annPredictionDataframe = ann.modelFitAndPredict(X_test=X_test_all, y_test=y_test_all)
    # ...
    annPredictionDataframe['datetime'] = dateToPredict['datetime'].values
    kfold_score = ann.evaluateScoreKFold()
    annPredictionDataframe = annPredictionDataframe[columnAll]
    # Evaluation of the performance of the Artificial Neural Network (ANN)
    evaluation = performanceEvaluation(y_test_orig=annPredictionDataframe['temp_60'], y_test_pred=annPredictionDataframe['Predicted_temp_60'])
    evaluation["Generalization Error"] = kfold_score.mean()
def pipelining():
    # Standardizing the features
    estimators = [('standardize', StandardScaler())]
    estimators.append(('mlp', KerasRegressor(build_fn=make_regression_ann, batch_size=10, epochs=100)))
    # The pipeline can be used as any other estimator
    # and avoids leaking the test set into the train set
    pipeline = Pipeline(estimators)
    return pipeline
def make_regression_ann(initializer='uniform', activation='relu', optimizer='rmsprop', loss='mae', neurons=12):
    inputDim = len(predictors) if selection == 1 or selection == 3 or selection == 4 else len(tmp)
    # create ANN model
    model = Sequential()
    # Defining the Input layer and FIRST hidden layer, both are the same!
    # model.add(Dense(units=neurons, input_dim=len(feature_selection_list), kernel_initializer=initializer, activation=activation, activity_regularizer=l1(0.0001)))
    model.add(Dense(units=neurons, input_dim=inputDim, kernel_initializer=initializer, activation=activation, activity_regularizer=l1(0.0001)))
    # Defining the second layer of the model;
    # after the first layer we don't have to specify input_dim as Keras configures it automatically
    model.add(Dense(units=neurons, kernel_initializer=initializer, activation=activation))
    # The output neuron is a single fully connected node
    # since we are predicting a single number
    model.add(Dense(1, kernel_initializer=initializer))
    # Compiling the model
    model.compile(loss=loss, optimizer=optimizer)
    return model
def performanceEvaluation(y_test_orig, y_test_pred):
    evaluationColumns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)', 'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']
    # Computing the Mean Absolute Percent Error
    MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)
    # Computing R2 Score
    r2 = r2_score(y_test_orig, y_test_pred)
    # Computing Mean Square Error (MSE)
    MSE = mean_squared_error(y_test_orig, y_test_pred)
    # Computing Root Mean Square Error (RMSE)
    RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)
    # Computing Mean Absolute Error (MAE)
    MAE = mean_absolute_error(y_test_orig, y_test_pred)
    # Computing Mean Bias Error (MBE)
    MBE = np.mean(y_test_pred - y_test_orig)  # here we calculate MBE
    eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]
    dataframe = pd.DataFrame([eval_list], columns=evaluationColumns)
    return dataframe
In a separate file (this is where we do the fit and prediction):
class ArtificialNeuralNetwork:
    def __init__(self, predictors, pipeline, X_train, y_train):
        self.predictors = predictors
        self.pipeline = pipeline
        self.X_train = X_train
        self.y_train = y_train

    def evaluateScoreKFold(self):
        cv = KFold(n_splits=10)
        results = cross_val_score(self.pipeline, X=self.X_train, y=self.y_train, cv=cv, scoring="neg_mean_absolute_error")
        print(f'Cross Validation Results: {results}')
        print("Standardized: % .2f( % .2f) MAE" % (results.mean(), results.std()))
        return results

    def modelFitAndPredict(self, X_test, y_test):
        # THIS IS WHERE WE SHOULD PROBABLY DO K-FOLD WITH THE ONE YEAR OF DATA TO TRAIN THIS PIPELINE. IS THIS CORRECTLY ASSUMED???
        self.pipeline.fit(self.X_train, self.y_train)
        # Generating predictions on testing data
        Predictions = self.pipeline.predict(X_test)
        TestingData = pd.DataFrame(data=X_test, columns=self.predictors)
        TestingData['temp_60'] = y_test
        TestingData['Predicted_temp_60'] = Predictions
        TestingData.head()
        # Computing the absolute percent error
        APE = 100 * (abs(TestingData['temp_60'] - TestingData['Predicted_temp_60']) / TestingData['temp_60'])
        TestingData['APE'] = APE
        # Rounding all floats to 2 decimals
        TestingData = TestingData.round(2)
        TestingData.to_csv("TestingData.csv")
        return TestingData
The Evaluation Metrics:
Generalization Error    R2         RMSE    MSE      MAPE    MAE     MBE
0.921                   -327.534   6.945   48.229   0.811   5.212   1.835
The purpose of cross-validation (K-fold) is model checking, not model building.
Once you have checked with cross-validation that you obtain similar metrics for every split, you have to train your model with all your training data.
Maybe what you are looking for is ensemble methods, although I'm not sure whether they can be applied with neural networks. A clear example is a random forest: many decision trees are trained and evaluated.
Finally, just a comment: with neural networks, in general, the more data you have, the better knowledge you obtain, so splitting the data to obtain different models may not be the best option.
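As a sketch of that workflow using the question's own objects (estimator, X_train_year, y_train_year, X_test_all are the names used above); using TimeSeriesSplit instead of plain KFold is my assumption, chosen because the rows are ordered in time:

from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# 1) Model checking: cross-validate the pipeline on the training year only.
cv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(estimator, X_train_year, y_train_year, cv=cv,
                         scoring='neg_mean_absolute_error')
print('CV MAE: %.2f (+/- %.2f)' % (-scores.mean(), scores.std()))

# 2) Model building: refit on all available training data, then predict the test window.
estimator.fit(X_train_year, y_train_year)
predictions = estimator.predict(X_test_all)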
As you can see below, I have two functions: get_data() outputs a data frame with the selected asset's history and passes it to train_model(). Everything works fine, but as the model trains the accuracy does not seem to change; the loss does go down, but the accuracy stays the same after the second epoch. Even when training with 1000 epochs the accuracy does not change.
Things I tried changing in this code:
changing the unit count for each of the LSTM layers
trying different data frames from different sources (alpha-vantage)
changing the epoch count
Unfortunately nothing changed.
def train_model(df):
    if not os.path.exists("/py_stuff/"):
        os.makedirs("/py_stuff/")
    checkpoint_filepath = "/py_stuff/check_point"
    weights_checkpoint = "/py_stuff/"
    checkpoint_dir = os.path.dirname(checkpoint_filepath)
    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        save_weights_only=True,
        monitor='accuracy',
        mode='max',
        save_best_only=True,
        verbose=1)
    dataset_train = df
    training_set = dataset_train.iloc[:, 1:2].values
    sc = MinMaxScaler(feature_range=(0, 1))
    training_set_scaled = sc.fit_transform(training_set)
    X_train = []
    y_train = []
    for i in range(100, len(df)):
        X_train.append(training_set_scaled[i-100:i, 0])
        y_train.append(training_set_scaled[i, 0])
    X_train, y_train = np.array(X_train), np.array(y_train)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    model = Sequential()
    model.add(LSTM(units=100, return_sequences=True, input_shape=(X_train.shape[1], 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
    ## loading weights
    try:
        model.load_weights(checkpoint_filepath)
        print("Weights loaded successfully $$$$$$$ ")
    except:
        print("No Weights Found !!! ")
    model.fit(X_train, y_train, epochs=50, batch_size=100, callbacks=[model_checkpoint_callback])
    ## saving weights
    try:
        model.save(checkpoint_filepath)
        model.save_weights(filepath=checkpoint_filepath)
        print("Saving weights and model done ")
    except OSError as no_model:
        print("Error saving weights and model !!!!!!!!!!!! ")
def get_data(CHOICE):
    data = yf.download(  # or pdr.get_data_yahoo(...
        # tickers list or string as well
        tickers=CHOICE,
        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period="5y",
        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval="1d",
        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by='ticker',
        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust=True,
        # download pre/post regular market hours data
        # (optional, default is False)
        prepost=True,
        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads=True,
        # proxy URL scheme to use when downloading
        # (optional, default is None)
        proxy=None
    )
    dff = pd.DataFrame(data)
    return dff
df = get_data(CHOICE="BTC-USD")
train_model(df)
From your loss function, it looks like you have a regression network. Your loss is mean squared error, and the metric 'accuracy' does not have any meaning for regression networks; accuracy is only meaningful for classification models. So you can remove metrics=['accuracy'] from your compile call and use the loss value to evaluate your model: if the loss is decreasing, your optimizer is successfully training the network.
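A minimal sketch of the suggested change; tracking mean absolute error is my own choice of a regression-appropriate metric, not something the answer requires:

# drop 'accuracy'; optionally track a regression metric such as MAE instead
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mae'])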
You are dealing with a regression problem, where accuracy is not defined.
Accuracy is defined as the probability of belonging to a specific class, for example the probability that the output is the digit 9. The number of classes is finite (or countable).
In your case, your network outputs a real number, and the notion of accuracy does not make any sense in this context.
For example, the probability that your output is exactly 1.000 is 0 (and, perhaps surprisingly, a probability of zero does not mean that the event will never happen).
Ideally, Keras should return an error saying that accuracy is not defined.
I want to build a Seq2Seq model for reconstruction purposes. The model should be trained to reconstruct normal time series, and the assumption is that such a model would do badly at reconstructing anomalous time series it has not seen during training.
I have some gaps in my code and also in my understanding. I took this as an orientation and have done the following so far:
traindata: input_data.shape (1000, 60, 1) and target_data.shape (1000, 60, 1), with the target data being the same as the training data, only in reversed order, as suggested in the paper here.
for inference: I want to reconstruct another time series of shape (3000, 60, 1) with the trained model. Two points are still open: how do I specify the input data for my training model, and how do I build the inference part with the stop condition?
Please correct any mistakes.
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
num_encoder_tokens = 1#number of features
num_decoder_tokens = 1#number of features
encoder_seq_length = None
decoder_seq_length = None
batch_size = 50
epochs = 40
# same data for training
input_seqs=()#shape (1000,60,1) with sliding windows
target_seqs=()#shape(1000,60,1) with sliding windows but reversed
x= #what has x to be ?
#data for inference
# how do I specify the input data for my other time series ?
# Define training model
encoder_inputs = Input(shape=(encoder_seq_length, num_encoder_tokens))
encoder = LSTM(128, return_state=True, return_sequences=True)
encoder_outputs = encoder(encoder_inputs)
_, encoder_states = encoder_outputs[0], encoder_outputs[1:]

decoder_inputs = Input(shape=(decoder_seq_length, num_decoder_tokens))
decoder = LSTM(128, return_sequences=True)
decoder_outputs = decoder(decoder_inputs, initial_state=encoder_states)
decoder_outputs = TimeDistributed(
    Dense(num_decoder_tokens, activation='tanh'))(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Training
model.compile(optimizer='adam', loss='mse')
model.fit([input_seqs, x], target_seqs,
          batch_size=batch_size, epochs=epochs)

# Define sampling models for inference
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(100,))
decoder_state_input_c = Input(shape=(100,))
decoder_states = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs = decoder(decoder_inputs,
                          initial_state=decoder_states)
decoder_model = Model([decoder_inputs] + decoder_states,
                      decoder_outputs)

# Sampling loop for a batch of sequences
states_values = encoder_model.predict(input_seqs)

stop_condition = False
while not stop_condition:
    output_tokens = decoder_model.predict([target_seqs] + states_values)
    # what else do I need to include here ?
    break
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
    # encode
    state = infenc.predict(source)
    # start of sequence input
    target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
    # collect predictions
    output = list()
    for t in range(n_steps):
        # predict next step
        yhat, h, c = infdec.predict([target_seq] + state)
        # store prediction
        output.append(yhat[0, 0, :])
        # update state
        state = [h, c]
        # update target sequence
        target_seq = yhat
    return array(output)
You can see that the output from every timestep is fed back into the LSTM cell externally.
You can refer to this blog post to see how it is done during inference:
https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/
During training, we feed the data in one shot. I think you understand that part.
But at inference time we can't do that: we have to feed the data one time step at a time, return the cell and hidden states, and keep looping until the last element of the sequence is generated.
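A sketch (under my own assumptions, not working code from the question) of how the inference decoder could be set up so that it actually returns its states for that loop. Note that the state inputs must match the 128-unit decoder LSTM rather than the shape of 100 used in the question, and decoder_dense is a hypothetical name standing for the TimeDistributed Dense output layer:

# Decoder defined with return_state=True so each predict call exposes its states
decoder = LSTM(128, return_sequences=True, return_state=True)

decoder_state_input_h = Input(shape=(128,))   # must match the LSTM width
decoder_state_input_c = Input(shape=(128,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder(decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs = decoder_dense(decoder_outputs)  # decoder_dense: the TimeDistributed(Dense(...)) layer
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [decoder_outputs, state_h, state_c])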
I've been working on this neural network with the intent to predict TBA (time-based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine and gives me some predictions, but I'm not quite satisfied with the results: it fails to notice some very obvious correlations that I can clearly see myself. Here is my current code:
# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
maxi = 0.96
mini = 0.7
# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values
# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]
# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]
# Scale data
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]
# Number of stocks in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi
# Session
net = tf.InteractiveSession()
# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])
# Initialize1s
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg",
                                                     distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))
# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2),
                             bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3),
                             bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4),
                             bias_hidden_4))
# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
# Init
net.run(tf.global_variables_initializer())
# Fit neural net
batch_size = 10
mse_train = []
mse_test = []
# Run
epochs = 10
for e in range(epochs):
    # Shuffle training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]
    # Minibatch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})
        # Show progress
        if np.mod(i, 50) == 0:
            mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
            mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))
pred = net.run(out, feed_dict={X: X_test})
print(pred)
I have tried tweaking the number of hidden layers, the number of nodes per layer, and the number of epochs, and I have tried different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.
Thanks in advance to anyone who managed to read through all of that.
It would make it much easier if you shared a small dataset that illustrates the problem. That said, I will describe some common issues with non-standard datasets and how to overcome them.
Possible solutions
Regularization and validation-based optimization - methods that are always good to try when looking for some extra accuracy. See dropout methods here (original paper), and some overview here.
Unbalanced data - sometimes some of the time-series categories/events behave like anomalies, or the data is simply unbalanced. If you read a book, words like "the" or "it" appear far more often than "warehouse". This becomes a problem if your main task is to detect the word "warehouse" and you train your network (even LSTMs) in the traditional way. A way to overcome it is to balance the samples (create balanced datasets) or to give more weight to low-frequency categories.
Model structure - sometimes fully connected layers are not enough. See computer-vision problems, for instance, where we train using convolution layers. The convolution and pooling layers enforce structure on the model that is suitable for images; this is also a form of regularization, since those layers have fewer parameters. In time-series problems convolutions are possible as well and turn out to work just fine; see the example in Conditional Time Series Forecasting with Convolution Neural Networks, and the sketch after this list.
The above suggestions are presented in the order I would suggest trying them.
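As an illustration only (not the poster's network), a minimal Keras sketch that combines two of the suggestions above: dropout for regularization and Conv1D layers for time-series structure. The window length, feature count, and layer sizes are assumptions:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout

n_steps, n_features = 30, 5   # assumed input window length and number of attributes

model = Sequential([
    Conv1D(32, kernel_size=3, activation='relu', input_shape=(n_steps, n_features)),
    MaxPooling1D(2),
    Conv1D(16, kernel_size=3, activation='relu'),
    Flatten(),
    Dropout(0.2),              # regularization, as suggested above
    Dense(32, activation='relu'),
    Dense(1)                   # single regression output (e.g., TBA)
])
model.compile(optimizer='adam', loss='mse')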
Good luck!