As you can see below, I have two functions: get_data() outputs a data frame with the selected asset's history and passes it to train_model(). Everything runs fine, but as the model trains the accuracy does not seem to change; the loss goes down, yet the accuracy stays the same after the second epoch. Even when training with 1000 epochs the accuracy does not change.
Things I tried changing in this code:
changing the unit count for each of the LSTM layers
using different data frames from different sources (alpha-vantage)
changing the epoch count
Unfortunately nothing changed.
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def train_model(df):
    if not os.path.exists("/py_stuff/"):
        os.makedirs("/py_stuff/")
    checkpoint_filepath = "/py_stuff/check_point"
    weights_checkpoint = "/py_stuff/"
    checkpoint_dir = os.path.dirname(checkpoint_filepath)
    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        save_weights_only=True,
        monitor='accuracy',
        mode='max',
        save_best_only=True,
        verbose=1)
    dataset_train = df
    training_set = dataset_train.iloc[:, 1:2].values
    sc = MinMaxScaler(feature_range=(0, 1))
    training_set_scaled = sc.fit_transform(training_set)
    X_train = []
    y_train = []
    for i in range(100, len(df)):
        X_train.append(training_set_scaled[i-100:i, 0])
        y_train.append(training_set_scaled[i, 0])
    X_train, y_train = np.array(X_train), np.array(y_train)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    model = Sequential()
    model.add(LSTM(units=100, return_sequences=True, input_shape=(X_train.shape[1], 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
    ## loading weights
    try:
        model.load_weights(checkpoint_filepath)
        print("Weights loaded successfully $$$$$$$ ")
    except:
        print("No Weights Found !!! ")
    model.fit(X_train, y_train, epochs=50, batch_size=100, callbacks=[model_checkpoint_callback])
    ## saving weights
    try:
        model.save(checkpoint_filepath)
        model.save_weights(filepath=checkpoint_filepath)
        print("Saving weights and model done ")
    except OSError as no_model:
        print("Error saving weights and model !!!!!!!!!!!! ")
def get_data(CHOICE):
    data = yf.download(  # or pdr.get_data_yahoo(...
        # tickers list or string as well
        tickers=CHOICE,
        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period="5y",
        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval="1d",
        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by='ticker',
        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust=True,
        # download pre/post regular market hours data
        # (optional, default is False)
        prepost=True,
        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads=True,
        # proxy URL scheme to use when downloading?
        # (optional, default is None)
        proxy=None
    )
    dff = pd.DataFrame(data)
    return dff
df = get_data(CHOICE="BTC-USD")
train_model(df)
From your loss function, it looks like you have a regression network. Your loss is mean squared error, and the accuracy metric does not have any meaning for regression networks; accuracy is only meaningful for classification models. So you can remove metrics=['accuracy'] from your compile call and use the loss value to evaluate your model. If the loss is decreasing, your optimizer is successfully training the network.
You are dealing with a regression problem where the accuracy is not defined.
Accuracy is defined in terms of the probability of belonging to a specific class, for example the probability that the output is the digit 9. The number of classes is finite (or countable).
In your case, your network outputs a real number. The notion of accuracy does not make any sense in this context.
For example, the probability of your output being exactly 1.000 is 0. Although (surprisingly!) a probability of zero does not mean that the event will never happen!
Ideally, Keras should return an error saying that accuracy is not defined.
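For illustration only (not from the original answers), here is a minimal sketch of how the compile call and the checkpoint callback could be switched to regression-appropriate metrics, assuming the model, checkpoint_filepath, and tf names defined in the question:

# Sketch: report regression metrics (MAE, RMSE) instead of accuracy
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.MeanAbsoluteError(),
                       tf.keras.metrics.RootMeanSquaredError()])
# The ModelCheckpoint should then monitor a quantity that actually exists, e.g. the loss
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,
    monitor='loss',
    mode='min',
    save_best_only=True,
    verbose=1)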
I'm trying to train a basic Graph Neural Network using the StellarGraph library, in particular starting from the example provided in [0].
The example works fine, but now I would like to repeat the same exercise, removing the N-fold cross-validation and providing specific training, validation and test sets. I'm trying to do so with the following code:
# One hot encoding
graph_training_set_labels_encoded = pd.get_dummies(graphs_training_set_labels, drop_first=True)
graph_validation_set_labels_encoded = pd.get_dummies(graphs_validation_set_labels, drop_first=True)
graphs = graphs_training_set + graphs_validation_set
# Graph generator preparation
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, len(graphs_training_set))],
targets=graph_training_set_labels_encoded,
batch_size=batch_size)
valid_gen = generator.flow([x for x in range(len(graphs_training_set),
len(graphs_training_set) + len(graphs_validation_set))],
targets=graph_validation_set_labels_encoded,
batch_size=batch_size)
# Stopping criterium
es = EarlyStopping(monitor="val_loss",
min_delta=0,
patience=20,
restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
activations=["relu", "relu"],
generator=generator,
dropout=dropout_value)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Creating Keras model and preparing it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(adam_value), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=num_epochs, validation_data=valid_gen, verbose=0, callbacks=[es])
# Calculate performance on the validation data
test_metrics = model.evaluate(valid_gen, verbose=0)
valid_acc = test_metrics[model.metrics_names.index("acc")]
print(f"Test Accuracy model = {valid_acc}")
Where graphs_training_set and graphs_validation_set are lists of StellarDiGraphs.
I am able to run this piece of code, but it returns NaN as a result. What could be the problem?
This is the first time I am using StellarGraph, and in particular PaddedGraphGenerator, so I think my mistake lies in the usage of that generator, but providing the training set and validation set in a different manner didn't produce better results.
Thank you in advance.
UPDATE: Fixed a typo in the code, as pointed out here (thanks to george123).
[0] https://stellargraph.readthedocs.io/en/stable/demos/graph-classification/gcn-supervised-graph-classification.html
I found a solution by digging into the StellarGraph documentation for PaddedGraphGenerator and the GCN neural network class GCNSupervisedGraphClassification. Furthermore, I found a similar question on the StellarGraph issue tracker which also points to the solution.
# Graph generator preparation
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, num_graphs_for_training)],
targets=training_graphs_labels,
batch_size=35)
valid_gen = generator.flow([x for x in range(num_graphs_for_training, num_graphs_for_training + num_graphs_for_validation)],
targets=validation_graphs_labels,
batch_size=35)
# Stopping criterium
es = EarlyStopping(monitor="val_loss",
min_delta=0.001,
patience=10,
restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
activations=["relu", "relu"],
generator=generator,
dropout=dropout_value)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Let's create the Keras model and prepare it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(adam_value), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=num_epochs, validation_data=valid_gen, verbose=1, callbacks=[es])
# Evaluate performance on the validation data
valid_metrics = model.evaluate(valid_gen, verbose=0)
valid_acc = valid_metrics[model.metrics_names.index("acc")]
# Define test set indices temporary vars
index_begin_test_set = num_graphs_for_training + num_graphs_for_validation
index_end_test_set = index_begin_test_set + num_graphs_for_testing
test_set_indices = [x for x in range(index_begin_test_set, index_end_test_set)]
# Evaluate performance on test set
generator_for_test_set = PaddedGraphGenerator(graphs=graphs)
test_gen = generator_for_test_set.flow(test_set_indices)
result = model.predict(test_gen)
I have an encoder-decoder LSTM model that learns to predict 12 months of data in advance while looking back 12 months. If it helps at all, my dataset has around 10 years in total (120 months). I keep 8 years for training/validation and 2 years for testing. My understanding is that my model does not have access to the testing data at training time.
The puzzling thing is that my model's predictions are simply a shift of the previous points. But how did my model know the actual previous points at prediction time? I did not give the model the monthly values in the testing set! If we say that it simply copies the previous point it receives as input, then note that I am giving it 12 months with completely different values from the ones it predicts (so it does not copy the 12 months I feed in), yet the forecasted values are shifts of the actual ones (which it has never seen).
Below is an example:
My code source is from here:
Below is my code:
#train/test splitting
split_position=int(len(scaled_data)*0.8)# 8 years for training
train=scaled_data[0:split_position]
test=scaled_data[split_position:]
#print(train)
print('length of train=',len(train))
#print(test)
print('length of test=',len(test))
# split train and test data into yearly train/test sets (3d)[observation,year, month]
def split_data_yearly(train, test):
    # restructure into windows of yearly data
    train = array(split(train, len(train)/12))
    test = array(split(test, len(test)/12))
    return train, test
# evaluate one or more yearly forecasts against expected values
def evaluate_forecasts(actual, predicted):
    scores = list()
    # calculate an RMSE score for each day
    for i in range(actual.shape[1]):
        # calculate mse
        mse = mean_squared_error(actual[:, i], predicted[:, i])
        # calculate rmse
        rmse = math.sqrt(mse)
        # store
        scores.append(rmse)
    # calculate overall RMSE
    s = 0
    for row in range(actual.shape[0]):
        for col in range(actual.shape[1]):
            s += (actual[row, col] - predicted[row, col])**2
    score = math.sqrt(s / (actual.shape[0] * actual.shape[1]))
    ################ plot prediction vs actual ###############################
    predicted = predicted.reshape(predicted.shape[0], predicted.shape[1])
    jump = 12
    inv_scores = list()
    for i in range(len(predicted)):
        sample_predicted = predicted[i, :]
        sample_actual = actual[i, :]
        # inverse normalization
        sample_predicted_inv = scaler.inverse_transform(sample_predicted.reshape(-1, 1))
        sample_actual_inv = scaler.inverse_transform(sample_actual.reshape(-1, 1))
        #print(sample_actual_inv)
        #print(data_sd[(split_position+(i*jump)-1):(split_position+(i*jump-1))+len(sample_actual_inv)])
        # inverse differencing
        s = numpy.array(smoothed).reshape(-1, 1)
        sample_actual_inv = sample_actual_inv + s[(split_position+(i*jump)):(split_position+(i*jump))+len(sample_actual_inv)]
        sample_predicted_inv = sample_predicted_inv + s[(split_position+(i*jump)):(split_position+(i*jump))+len(sample_actual_inv)]
        months = ['August-'+str(19+i),'September-'+str(19+i),'October-'+str(19+i),'November-'+str(19+i),'December-'+str(19+i),'January-'+str(20+i),'February-'+str(20+i),'March-'+str(20+i),'April-'+str(20+i),'May-'+str(20+i),'June-'+str(20+i),'July-'+str(20+i)]
        pyplot.plot(months, sample_actual_inv, 'b-', label='Actual')
        pyplot.plot(months, sample_predicted_inv, '--', color="orange", label='Predicted')
        pyplot.legend()
        pyplot.xticks(rotation=25)
        pyplot.title('Encoder Decoder LSTM Prediction', y=1.08)
        pyplot.show()
        ################### determine RMSE after inversion ################################
        mse = mean_squared_error(sample_actual_inv, sample_predicted_inv)
        rmse = math.sqrt(mse)
        inv_scores.append(rmse)
    return score, scores, inv_scores
# summarize scores
def summarize_scores(name, score, scores):
    s_scores = ', '.join(['%.1f' % s for s in scores])
    print('%s: [%.3f] %s' % (name, score, s_scores))
# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=12):
    # flatten data
    data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            X.append(data[in_start:in_end, :])
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)
# train the model
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # take a portion for validation
    val_size = 12
    test_x, test_y = train_x[-val_size:], train_y[-val_size:]
    train_x, train_y = train_x[0:-val_size], train_y[0:-val_size]
    # define parameters
    verbose, epochs, batch_size = 1, 25, 8
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(64, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(64, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    #sgd = optimizers.SGD(lr=0.004, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mse', optimizer='adam')
    # fit network
    train_history = model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, validation_data=(test_x, test_y), verbose=verbose)
    loss = train_history.history['loss']
    val_loss = train_history.history['val_loss']
    pyplot.plot(loss)
    pyplot.plot(val_loss)
    pyplot.legend(['loss', 'val_loss'])
    pyplot.show()
    return model
# make a forecast
def forecast(model, history, n_input):
    # flatten data
    data = array(history)
    data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
    # retrieve last observations for input data
    input_x = data[-n_input:, :]
    # reshape into [1, n_input, n]
    input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
    # forecast the next year
    yhat = model.predict(input_x, verbose=0)
    # we only want the vector forecast
    yhat = yhat[0]
    return yhat
# evaluate a single model
def evaluate_model(train, test, n_input):
    # fit model
    model = build_model(train, n_input)
    # history is a list of yearly data
    history = [x for x in train]
    # walk-forward validation over each year
    predictions = list()
    for i in range(len(test)):
        # predict the year
        yhat_sequence = forecast(model, history, n_input)
        # store the predictions
        predictions.append(yhat_sequence)
        # get real observation and add to history for predicting the next year
        history.append(test[i, :])
    # evaluate predictions days for each year
    predictions = array(predictions)
    score, scores, inv_scores = evaluate_forecasts(test[:, :, 0], predictions)
    return score, scores, inv_scores
# split into train and test
train, test = split_data_yearly(train, test)
# evaluate model and get scores
n_input = 12
score, scores, inv_scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('lstm', score, scores)
print('RMSE score after inversion:',inv_scores)
# plot scores
months=['July','August','September','October','November','December','January','February','March','April','May','June']
#pyplot.plot(months, scores, marker='o', label='lstm')
#pyplot.show()
Differencing is the key here!
After further investigation, I found out that my model produces values that are almost zero before the differencing is inverted (i.e. it is not learning). When I invert the differencing, I am adding roughly zero to the actual value at the previous timestep, which results in the shifted pattern above.
Therefore, I need to tune my LSTM model to make it learn, or perhaps remove the long runs of zeros from the data itself, since I have many of those.
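To make the effect concrete, here is a small illustration (not from the original answer): if the network predicts values near zero in differenced space, inverting the differencing simply reproduces the previous observation, i.e. a one-step shift of the actual series.

# Sketch: a forecast of ~0 in differenced space becomes a shifted copy after inversion
import numpy as np

series = np.array([10., 12., 15., 14., 18.])   # toy undifferenced series
diff_pred = np.zeros(len(series) - 1)           # "model output" that is essentially zero
reconstructed = series[:-1] + diff_pred         # inverse differencing
print(reconstructed)                            # [10. 12. 15. 14.] -- the series shifted by one step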
I'm new to machine learning and deep learning, and I'm trying to classify texts from 5 categories using neural networks. For that, I made a dictionary to translate words to indexes, finally getting an array of index lists. I also converted the labels to integers and did the padding and so on. The problem is that when I fit the model, the accuracy stays quite low (~0.20) and does not change across epochs. I have tried changing a lot of parameters, like the vocabulary size, the number of neurons, the dropout probability, the optimizer parameters, etc. The key parts of the code are below.
# Arrays with indexes (that works fine)
X_train = tokens_to_indexes(tokenized_tr_mrp, vocab, return_vocab=False)
X_test, vocab_dict = tokens_to_indexes(tokenized_te_mrp, vocab)
# Labels to integers
labels_dict = {}
labels_dict['Alzheimer'] = 0
labels_dict['Bladder Cancer'] = 1
labels_dict['Breast Cancer'] = 2
labels_dict['Cervical Cancer'] = 3
labels_dict['Negative'] = 4
y_train = np.array([labels_dict[i] for i in y_tr])
y_test = np.array([labels_dict[i] for i in y_te])
# One-hot encoding of labels
from keras.utils import to_categorical
encoded_train = to_categorical(y_train)
encoded_test = to_categorical(y_test)
# Padding
max_review_length = 235
X_train_pad = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test_pad = sequence.pad_sequences(X_test, maxlen=max_review_length)
# Model
# Vocab size
top_words = len(list(vocab_dict.keys()))
# Neurone type
rnn = LSTM
# dropout
set_dropout = True
p = 0.2
# embedding size
embedding_vector_length = 64
# regularization strength
L = 0.0005
# Number of neurones
N = 50
# Model
model = Sequential()
# Embedding layer
model.add(Embedding(top_words,
                    embedding_vector_length,
                    embeddings_regularizer=regularizers.l1(l=L),
                    input_length=max_review_length
                    #,embeddings_constraint=UnitNorm(axis=1)
                    ))
# Dropout layer
if set_dropout:
    model.add(Dropout(p))
# Recurrent layer
model.add(rnn(N))
# Output layer
model.add(Dense(5, activation='softmax'))
# Compilation
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['Accuracy'])
# Split training set for validation
X_tr, X_va, y_tr_, y_va = train_test_split(X_train_pad, encoded_train,
test_size=0.3, random_state=2)
# Parameters
batch_size = 50
# N epochs
n_epocas = 20
best_val_acc = 0
best_val_loss = 1e20
best_i = 0
best_weights = []
acum_tr_acc = []
acum_tr_loss = []
acum_val_acc = []
acum_val_loss = []
# Training
for e in range(n_epocas):
    h = model.fit(X_tr, y_tr_,
                  batch_size=batch_size,
                  validation_data=(X_va, y_va),
                  epochs=1, verbose=1)
    acum_tr_acc = acum_tr_acc + h.history['accuracy']
    acum_tr_loss = acum_tr_loss + h.history['loss']
    val_acc = h.history['val_accuracy'][0]
    val_loss = h.history['val_loss'][0]
    acum_val_acc = acum_val_acc + [val_acc]
    acum_val_loss = acum_val_loss + [val_loss]
    # if val_acc > best_val_acc:
    if val_loss < best_val_loss:
        best_i = len(acum_val_acc) - 1
        best_val_acc = val_acc
        best_val_loss = val_loss
        best_weights = model.get_weights().copy()
    if len(acum_tr_acc) > 1 and (len(acum_tr_acc)+1) % 1 == 0:
        if e > 1:
            clear_output()
The code you posted is really bad practice.
You can either train for n_epocas in a single call with callbacks to keep the best weights (e.g. ModelCheckpoint), or use tf.GradientTape for a manual training loop; but calling model.fit() for one epoch at a time can lead to weird results, since your optimizer doesn't know which epoch it is at.
I suggest keeping your current code but training for n_epocas all in one go and reporting the results here (accuracy + loss).
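For illustration only (not part of the original answer), a minimal sketch of that suggestion, reusing the X_tr, y_tr_, X_va, y_va, batch_size, and n_epocas names from the question; the import path (tf.keras vs. standalone keras) and the best_weights.h5 file name are assumptions:

from tensorflow.keras.callbacks import ModelCheckpoint

# Save the weights of the epoch with the lowest validation loss (hypothetical file name)
checkpoint = ModelCheckpoint('best_weights.h5',
                             monitor='val_loss',
                             mode='min',
                             save_best_only=True,
                             save_weights_only=True)

history = model.fit(X_tr, y_tr_,
                    batch_size=batch_size,
                    validation_data=(X_va, y_va),
                    epochs=n_epocas,
                    callbacks=[checkpoint],
                    verbose=1)

# history.history then contains per-epoch 'loss', 'val_loss' and accuracy entries
model.load_weights('best_weights.h5')  # restore the best epoch's weights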
Someone gave me the solution. I just had to change this line:
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=0.001),
metrics=['Accuracy'])
To this:
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=0.001),
metrics=['acc'])
I also changed the lines in the final loop relating to accuracy, and the one-hot encoding was necessary as well. (The difference seems to be that the string 'Accuracy' resolves to Keras's generic Accuracy metric, which checks exact equality between the one-hot targets and the softmax outputs and therefore stays near zero, whereas 'acc' resolves to the appropriate categorical accuracy.)
My model weights (I output them to weights_before.txt and weights_after.txt) are precisely the same before and after the training, i.e. the training doesn't change anything, there's no fitting happening.
My data look like this (I basically want the model to predict the sign of feature, result is 0 if feature is negative, 1 if positive):
,feature,zerosColumn,result
0,-5,0,0
1,5,0,1
2,-3,0,0
3,5,0,1
4,3,0,1
5,3,0,1
6,-3,0,0
...
Brief summary of my approach:
Load the data.
Split it column-wise into x (feature) and y (result), then split these two row-wise into test and validation sets.
Transform these sets into TimeseriesGenerators (not necessary in this scenario, but I want to get this setup working and I don't see any reason why it shouldn't).
Create and compile a simple Sequential model with a few Dense layers and softmax activation on its output layer; use binary_crossentropy as the loss function.
Train the model... nothing happens!
Complete code follows:
import keras
import pandas as pd
import numpy as np
np.random.seed(570)
TIMESERIES_LENGTH = 1
TIMESERIES_SAMPLING_RATE = 1
TIMESERIES_BATCH_SIZE = 1024
TEST_SET_RATIO = 0.2 # the portion of total data to be used as test set
VALIDATION_SET_RATIO = 0.2 # the portion of total data to be used as validation set
RESULT_COLUMN_NAME = 'feature'
FEATURE_COLUMN_NAME = 'result'
def create_network(csv_path, save_model):
    before_file = open("weights_before.txt", "w")
    after_file = open("weights_after.txt", "w")
    data = pd.read_csv(csv_path)
    data[RESULT_COLUMN_NAME] = data[RESULT_COLUMN_NAME].shift(1)
    data = data.dropna()
    x = data.ix[:, 1:2]
    y = data.ix[:, 3]
    test_set_length = int(round(len(x) * TEST_SET_RATIO))
    validation_set_length = int(round(len(x) * VALIDATION_SET_RATIO))
    x_train_and_val = x[:-test_set_length]
    y_train_and_val = y[:-test_set_length]
    x_train = x_train_and_val[:-validation_set_length].values
    y_train = y_train_and_val[:-validation_set_length].values
    x_val = x_train_and_val[-validation_set_length:].values
    y_val = y_train_and_val[-validation_set_length:].values
    train_gen = keras.preprocessing.sequence.TimeseriesGenerator(
        x_train,
        y_train,
        length=TIMESERIES_LENGTH,
        sampling_rate=TIMESERIES_SAMPLING_RATE,
        batch_size=TIMESERIES_BATCH_SIZE
    )
    val_gen = keras.preprocessing.sequence.TimeseriesGenerator(
        x_val,
        y_val,
        length=TIMESERIES_LENGTH,
        sampling_rate=TIMESERIES_SAMPLING_RATE,
        batch_size=TIMESERIES_BATCH_SIZE
    )
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(10, activation='relu', input_shape=(TIMESERIES_LENGTH, 1)))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Dense(10, activation='relu'))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(1, activation='softmax'))
    for item in model.get_weights():
        before_file.write("%s\n" % item)
    model.compile(
        loss=keras.losses.binary_crossentropy,
        optimizer="adam",
        metrics=[keras.metrics.binary_accuracy]
    )
    history = model.fit_generator(
        train_gen,
        epochs=10,
        verbose=1,
        validation_data=val_gen
    )
    for item in model.get_weights():
        after_file.write("%s\n" % item)
    before_file.close()
    after_file.close()

create_network("data/sign_data.csv", False)
Do you have any ideas?
The problem is that you are using softmax as the activation function of the last layer. Essentially, softmax normalizes its input so that the sum of its elements is one. Therefore, if you use it on a layer with only one unit (i.e. Dense(1, ...)), it will always output 1. To fix this, change the activation function of the last layer to sigmoid, which outputs a value in the range (0, 1).
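As a quick illustration only (not part of the original answer), softmax over a single value always returns 1, while sigmoid gives a usable probability:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(softmax(np.array([2.7])))      # [1.] -- always 1 for a single unit
print(softmax(np.array([-5.0])))     # [1.] -- regardless of the input value
print(sigmoid(-5.0), sigmoid(2.7))   # ~0.0067 and ~0.937 -- a meaningful probability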
I am working on a path prediction problem where I am predicting the path (Latitude, Longitude) one time step ahead. I have the path data for nearly 1500 "events", which I use for training the LSTM model. For training, since I know the path a priori, I shift the time series by one step, and use it as a target vector. For example:
Event 1:
Lat (t), Lon (t) --> Lat (t+1), Lon (t+1)
Lat (t+1), Lon (t+1) --> Lat (t+2), Lon (t+2)
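For illustration only (not from the original question), this is roughly how such one-step-ahead target columns could be built with pandas; the toy values are made up, and the LatLag1Nor/LonLag1Nor names just mirror the convention used in the snippets below:

import pandas as pd

df = pd.DataFrame({'LatNor': [0.10, 0.12, 0.15, 0.11],
                   'LonNor': [0.50, 0.52, 0.49, 0.48]})
df['LatLag1Nor'] = df['LatNor'].shift(-1)   # Lat(t+1) as the target for time t
df['LonLag1Nor'] = df['LonNor'].shift(-1)   # Lon(t+1) as the target for time t
df = df.dropna()                            # the last row has no known target
print(df)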
However, for testing, since the path is not known a priori, I take the trained LSTM model, predict one time step at a time, and feed the predicted value as the input for the next time step. Below are the snippets from my code:
# Extract only the Lat, Lon values to arrays
train_full = train_df[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']].values
test_full = test_df[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']].values
print('train_full.shape = ', train_full.shape)
print('test_full.shape = ', test_full.shape)
# Separate the Inputs and Targets
x_train_full = train_full[:,0:2]
y_train_full = train_full[:,2:4]
x_test_full = test_full[:,0:2]
y_test_full = test_full[:,2:4]
# Defining the LSTM model
model = Sequential()
model.add(LSTM(40,input_shape=(None,2), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(20,input_shape=(None,2), return_sequences=True))
model.add(Dropout(0.1))
model.add(Dense(2))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.summary()
epochs = 50
for i in range(epochs):
    print('Running Epoch No: ', i)
    for stormID, data in train_df.groupby('EventID'):
        train = data[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']]
        train = train.values
        x_train = np.expand_dims(train[:, 0:2], axis=0)
        y_train = np.expand_dims(train[:, 2:4], axis=0)
        #print (x_train.shape, y_train.shape)
        model.train_on_batch(x_train, y_train)
        model.reset_states()
print('Model training done.....')

# Use the optimized weights to estimate target values for the training data
train_pred = new_model.predict_on_batch(np.expand_dims(train_df[['LatNor','LonNor']].values, axis=0))
train_pred_val = x_scaler.inverse_transform(train_pred[0])
The model trains well (training and validation loss plots omitted here).
When I use the trained model and do a predict_on_batch on the test data, it works great. But in reality we would not know the time series ahead of time. So when I predict one instance at a time for the test set and use it as the input for the next time step, it is not working well. I suspect I am missing something, or that I am changing the state/weights of the trained network whenever I call predict.
x_values = TestDF[['LatNor','LonNor']].values
x_values_scaled = x_values
start = x_values_scaled[0,:]
startX = start.reshape(1,1,2)
Results = np.empty(())
Results = x_scaler.inverse_transform(startX[0])
for i in range(x_values.shape[0]):
    nextLoc = model.predict(startX)
    nextLoc_rescaled = x_scaler.inverse_transform(nextLoc[0])
    Results = np.vstack((Results, nextLoc_rescaled))
    startX = nextLoc
Any thoughts or recommendations?