Currently I'm playing with a stock-prediction task, which I'm trying to solve using an LSTM/GRU.
Problem: after training the LSTM/GRU, I get a huge drop in the predicted values.
Model training process
Train and test data are generated simply with pd.shift in the series_to_supervised function below.
df['Mid'] = (df['Low'] + df['High']) / 2  # parentheses needed; otherwise only High is halved
n_lag = 1 # Lag columns back
n_seq = 1*50 # TimeSteps to predict
seq_col = 'Mid'
seq_col_t = f'{seq_col}(t)'
split_date = '2018-01-01'
def series_to_supervised(data: pd.DataFrame,
                         seq_col: str,
                         n_in: int = 1,
                         n_out: int = 1,
                         drop_seq_col: bool = True,
                         dropna: bool = True):
    """Convert a time series into a supervised learning problem
    {input sequence, forecast sequence}
    """
    # input sequence (t-n, ..., t-1) -> positive shift
    for i in range(n_in, 0, -1):
        data[f'{seq_col}(t-{i})'] = data[seq_col].shift(i)
    # current value (t) -> no shift
    data[f'{seq_col}(t)'] = data[seq_col]
    for i in range(1, n_out+1):
        # forecast sequence (t+1, ..., t+n) -> negative shift
        data[f'{seq_col}(t+{i})'] = data[seq_col].shift(-i)
    if drop_seq_col:
        data = data.drop(seq_col, axis=1)
    if dropna:
        data.dropna(inplace=True)
    return data
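For illustration (a toy series, not from the original data), here is what the function produces with n_in=1 and n_out=2: one lagged input column and two forward-shifted target columns, with the incomplete edge rows dropped.
toy = pd.DataFrame({'Mid': [1.0, 2.0, 3.0, 4.0, 5.0]})
print(series_to_supervised(toy, seq_col='Mid', n_in=1, n_out=2))
#    Mid(t-1)  Mid(t)  Mid(t+1)  Mid(t+2)
# 1       1.0     2.0       3.0       4.0
# 2       2.0     3.0       4.0       5.0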
df = series_to_supervised(df, seq_col=seq_col, n_in=n_lag, n_out=n_seq)
mask = df.index < split_date
train, test = df[mask], df[~mask]
X_cols = ['Mid(t-1)']
y_cols = train.filter(like='Mid(t+').columns
X_train, y_train, X_test, y_test = train[X_cols], train[y_cols], test[X_cols], test[y_cols]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
# also returns np.ndarray
X_train = scaler.fit_transform(X_train)
# note: use transform (not fit_transform) on the test set, so it is scaled
# with the parameters fitted on the training data
X_test = scaler.transform(X_test)
y_train = y_train.values
y_test = y_test.values
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, GRU
from keras.optimizers import Adam, RMSprop, Adamax
from keras.callbacks import ModelCheckpoint
def get_model(X, y, n_batch):
    num_classes = y.shape[1]
    # design network
    model = Sequential()
    # for stock predictions, a stateful RNN (stateful=True) is used, as with a stateful LSTM
    model.add(GRU(10, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dropout(0.3))
    model.add(Dense(num_classes))
    opt = Adam(learning_rate=0.01)
    # opt = RMSprop(learning_rate=0.001)
    model.compile(loss='mean_squared_error', optimizer=opt)
    return model
def reshape_batch(X_train, y_train, X_test, y_test, n_batch):
    # reshape training data into [samples, timesteps, features]
    X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
    X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
    # cut so the sample count divides evenly into n_batch (no remainder);
    # needed for LSTM stateful=True
    train_cut = X_train.shape[0] % n_batch
    test_cut = X_test.shape[0] % n_batch
    if train_cut > 0:
        X_train = X_train[:-train_cut]
        y_train = y_train[:-train_cut]
    if test_cut > 0:
        X_test = X_test[:-test_cut]
        y_test = y_test[:-test_cut]
    return X_train, y_train, X_test, y_test
# fit an LSTM network to training data
def fit_lstm(X_train: np.ndarray,
             y_train: np.ndarray,
             n_lag: int,
             n_seq: int,
             n_batch: int,
             nb_epoch: int,
             X_test: np.ndarray = None,
             y_test: np.ndarray = None):
    model = get_model(X_train, y_train, n_batch)
    # fit network
    history = model.fit(X_train, y_train, validation_data=(X_test, y_test), callbacks=None,
                        epochs=nb_epoch, batch_size=n_batch, verbose=1, shuffle=False)
    print('Predict:', model.predict(X_test, batch_size=n_batch))
    model.reset_states()
    return model, history
n_batch = 32
nb_epoch = 40
X_train, y_train, X_test, y_test = reshape_batch(X_train, y_train, X_test, y_test, n_batch)
model, history = fit_lstm(X_train, y_train, n_lag, n_seq, n_batch, nb_epoch, X_test=X_test, y_test=y_test)
What I have tried
Different optimizers (pretty much all available in Keras)
Different recurrent network structures (GRU/LSTM)
Different learning rates
Different numbers of epochs, from 1 to 1500
Adding/removing Dropout layers with different rates (0.1-0.7)
Different numbers of LSTM/GRU neurons (1-100)
Different numbers of LSTM/GRU layers, via the return_sequences parameter, with more Dropout layers
Different numbers of forecast features (t+1, t+2, ..., t+n), from 1 to 365
Different numbers of lag features (t-1, t-2, ..., t-n), from 1 to 5
Different normalization ranges: (0, 1) and (-1, 1)
Different n_batch values: 1, 8, 16, 32
What could cause the LSTM/GRU to behave this strangely? And what else should I try to get it to work normally?
I am a complete rookie at moving from TensorFlow to PyTorch. In TensorFlow, I can simply load features and labels from separate .npy files and train a CNN using them. It is as simple as below:
def finetune_resnet(file_train_classes, file_train_features, name_model_to_save):
    # Let's load features and classes first
    print("Loading, organizing and pre-processing features")
    num_classes = 12
    x_train = np.load(file_train_features)
    y_train = np.load(file_train_classes)
    # Defining train as 70% and validation as 30% of the data.
    # The partition is stratified with a fixed random state;
    # therefore, for all networks, the partition will be the same
    x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train, test_size=0.30, stratify=y_train, random_state=42)
    print("transforming to categorical")
    y_train = to_categorical(y_train, num_classes)
    y_validation = to_categorical(y_validation, num_classes)
    y_train = tf.constant(y_train, shape=[y_train.shape[0], num_classes])
    y_validation = tf.constant(y_validation, shape=[y_validation.shape[0], num_classes])
    print("preprocessing data")
    # Preprocessing data
    x_train = x_train.astype('float32')
    x_validation = x_validation.astype('float32')
    x_train /= 255.
    x_validation /= 255.
    print("Setting up the network")
    # Parameters for network training
    batch_size = 32
    epochs = 300
    sgd = SGD(lr=0.01)
    trainAug = ImageDataGenerator(rotation_range=30, zoom_range=0.15, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15, horizontal_flip=True, fill_mode="nearest")
    print("Compiling the network")
    # Load model and prepare it for fine tuning
    baseModel = ResNet50(weights="imagenet", include_top=False,
                         input_tensor=Input(shape=(224, 224, 3)))
    # construct the head of the model that will be placed on top of
    # the base model
    headModel = baseModel.output
    headModel = Flatten(name="flatten")(headModel)
    headModel = Dense(512, activation="relu")(headModel)
    headModel = Dropout(0.5)(headModel)
    headModel = Dense(num_classes, activation="softmax")(headModel)
    # place the head FC model on top of the base model (this will become
    # the actual model we will train)
    model = Model(inputs=baseModel.input, outputs=headModel)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    trainAug.fit(x_train)
    # Fit the model on the batches generated by datagen.flow().
    print("[INFO] training head...")
    H = model.fit(trainAug.flow(x_train, y_train, batch_size=batch_size), steps_per_epoch=x_train.shape[0] // batch_size, epochs=epochs, validation_data=(x_validation, y_validation), callbacks=callbacks)
However, I have no idea how to load, train, and evaluate when the training and testing data come from .npy files. I checked a tutorial that loads training data from folders, which is not what I want.
How can I train and test a ResNet-50 model, starting from ImageNet weights, loading train and test data from .npy files, in PyTorch?
P.S.: most PyTorch training loops require <class 'torch.utils.data.dataloader.DataLoader'> inputs for training. Is it possible to transform my training data, which is in NumPy arrays, into such a format?
P.S.: you can try with my data here
It seems like you need to create a custom Dataset.
class MyDataSet(torch.utils.data.Dataset):
    def __init__(self, x, y):
        super(MyDataSet, self).__init__()
        # store the raw arrays, e.g. x = np.load(file_train_features), y = np.load(file_train_classes)
        self._x = x
        self._y = y

    def __len__(self):
        # a Dataset must know its size
        return self._x.shape[0]

    def __getitem__(self, index):
        x = self._x[index, :]
        y = self._y[index, :]
        return x, y
You can further use Dataset methods to split MyDataSet into train and validation (e.g., using torch.utils.data.random_split).
You might also find TensorDataset useful.
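For instance, here is a minimal sketch of the TensorDataset route (the .npy file names are hypothetical; substitute your own): load the arrays, wrap them, split 70/30, and hand them to DataLoaders.
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# hypothetical file names -- replace with your actual .npy paths
x = torch.from_numpy(np.load('train_features.npy')).float()
y = torch.from_numpy(np.load('train_classes.npy')).long()

dataset = TensorDataset(x, y)
# 70/30 train/validation split
n_train = int(0.7 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
The loaders then plug directly into a standard PyTorch training loop (for xb, yb in train_loader: ...).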
I am still a beginner in AI and deep learning, but I wanted to test whether a neural network can learn to calculate the sum of two numbers. I generated a dataset of 5000 samples and set the test size to 0.3, so the training dataset should contain 3500 samples; but, strangely, I found the model training on only 110 inputs instead of 3500.
The code used:
import tensorflow as tf
from sklearn.model_selection import train_test_split
import numpy as np
from random import random
def generate_dataset(num_samples, test_size=0.33):
    """Generates train/test data for the sum operation
    :param num_samples (int): Number of total samples in dataset
    :param test_size (float): Ratio of num_samples used as test set
    :return x_train (ndarray): 2d array with input data for training
    :return x_test (ndarray): 2d array with input data for testing
    :return y_train (ndarray): 2d array with target data for training
    :return y_test (ndarray): 2d array with target data for testing
    """
    # build inputs/targets for sum operation: y[0][0] = x[0][0] + x[0][1]
    x = np.array([[random()/2 for _ in range(2)] for _ in range(num_samples)])
    y = np.array([[i[0] + i[1]] for i in x])
    # split dataset into test and training sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_size)
    return x_train, x_test, y_train, y_test
if __name__ == "__main__":
    # create a dataset with 5000 samples
    x_train, x_test, y_train, y_test = generate_dataset(5000, 0.3)
    # build model with 3 layers: 2 -> 5 -> 1
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(5, input_dim=2, activation="sigmoid"),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])
    # choose optimiser
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
    # compile model
    model.compile(optimizer=optimizer, loss='mse')
    # train model
    model.fit(x_train, y_train, epochs=100)
    # evaluate model on test set
    print("\nEvaluation on the test set:")
    model.evaluate(x_test, y_test, verbose=2)
    # get predictions
    data = np.array([[0.1, 0.2], [0.2, 0.2]])
    predictions = model.predict(data)
    # print predictions
    print("\nPredictions:")
    for d, p in zip(data, predictions):
        print("{} + {} = {}".format(d[0], d[1], p[0]))
The 110/110 you are seeing in your output is actually the batch count, not the sample count. So 110 batches * the default batch size of 32 gives you ~3500 training samples, which matches what you'd expect as 70% of 5000.
You can see by backing into it the other way that the last batch would be a partial batch, since it's not evenly divisible by 32:
>>> (.7 * 5000) / 110
31.818181818181817
In neural networks, an epoch is one full pass over the data. Training proceeds in small batches (also called steps), and the Keras progress bar counts those batches, not individual samples.
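To make this concrete, here is a quick sanity check (assuming Keras's default batch_size of 32) that 3500 samples yield 110 steps per epoch, the last batch being a partial one:
import math
n_train = int(0.7 * 5000)          # 3500 training samples
steps = math.ceil(n_train / 32)    # batches per epoch; the last batch has 3500 - 109*32 = 12 samples
print(steps)                       # 110
Passing batch_size explicitly to model.fit changes the logged step count; for example, batch_size=35 would log 100/100.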
I have a CNN model built in Keras. I then took its last layer's output as features and retrained an SVM on them.
Is it possible to now find the gradient of the SVM's output w.r.t. the CNN model's input?
I know of this method (Getting gradient of model output w.r.t weights using Keras) and am able to use it to get the gradient w.r.t. the input of the layer I am pulling the features out of. I can also get the numerical gradient of the SVM w.r.t. its input, albeit at the moment it's a bit of a mess. Would appreciate some input here as well, actually.
But now I need to somehow combine these two to get the gradient of the SVM's output with respect to the input of the entire CNN model.
"""
Main CNN script
"""
# Imports ##
# general
import matplotlib.pyplot as plt
import numpy as np
# ML libraries
from tensorflow.keras.datasets import mnist
# ML utilities
from tensorflow.keras.utils import to_categorical
# Python scripts used
import train_CNN
import load_CNN
import train_subSVMs
import load_subSVMs
import train_finalSVM
import load_finalSVM
import joblib
def save_array(array, name):
    joblib.dump(array, name + '.pkl', compress=3)
    return

def load_array(name):
    # joblib.load takes the file name; the original (array, name) call was a bug
    return joblib.load(name + '.pkl')

def show_data_example(i, dataset):
    # show some of the images in the dataset;
    # call multiple times for multiple images.
    # squeeze is necessary here to get rid of the extra dimension introduced in reshaping
    print('\nExample Image: %s from selected dataset' % i)
    plt.imshow(np.squeeze(dataset[i]), cmap=plt.get_cmap('gray'))
    plt.show()
    return

def load_and_encode(target_shape):
    # load dataset
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    X_train, y_train = X_train[:, :, :], y_train[:]
    X_test, y_test = X_test[:, :, :], y_test[:]
    print('Loaded MNIST dataset')
    print('Train: X=%s, y=%s' % (X_train.shape, y_train.shape))
    print('Test: X=%s, y=%s' % (X_test.shape, y_test.shape))
    # encode y data
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    # normalise X data (X/255 -> [0,1])
    X_train = X_train / 255.0
    X_test = X_test / 255.0
    # dimensions are currently (m x 28 x 28);
    # reshape to (m x 28 x 28 x 1) for convolutional networks
    X_train = X_train.reshape(X_train.shape[0], target_shape[0], target_shape[1], target_shape[2])
    X_test = X_test.reshape(X_test.shape[0], target_shape[0], target_shape[1], target_shape[2])
    # show an arbitrary example image from the training set
    show_data_example(12, X_train)
    return X_train, y_train, X_test, y_test
image_shape = (28,28,1)
# load and encode mnist data
X_train, y_train, X_test, y_test = load_and_encode(image_shape)
# hyper-parameters
learning_rate = 0.1
momentum = 0.9
dropout = 0.5
batch_size = 128
epochs = 50
decay = 1e-6
number_of_classes = 10
# store required data into a packet to send to various imports
packet = [learning_rate, momentum, dropout, batch_size, epochs, decay,
number_of_classes, image_shape,
X_train, y_train, X_test, y_test]
data = [X_train, y_train, X_test, y_test]
#CNN_model = train_CNN.train_model(packet, save_model = 'True')
CNN_model = load_CNN.load_model(packet) # keras sequential model
#subSVM1, subSVM2, subSVM3, features = train_subSVMs.train(CNN_model, data, c=0.1, save_model = 'True', get_accuracies= 'True')
subSVM1, subSVM2, subSVM3, features = load_subSVMs.load(CNN_model, data, c=0.1, get_accuracies='False')
subSVMs = [subSVM1, subSVM2, subSVM3]
feature1_train, feature1_test,\
feature2_train, feature2_test,\
feature3_train, feature3_test = features
final_SVM = joblib.load('saved_finalSVM.pkl') # sklearn svm trained from features
NUMBER = 48
plt.imshow(np.squeeze(X_train[NUMBER,:,:,:]), cmap=plt.get_cmap('binary'))
# gradients of features w.r.t. the input
import tensorflow.keras.backend as K
gradients = K.gradients(CNN_model.get_layer(name='feature1').output, CNN_model.input) # K.gradients(y,x) for dy/dx
f = K.function([CNN_model.input], gradients)
x = np.expand_dims(X_train[NUMBER,:,:,:],axis=0)
a=f([x])
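One way to combine the two gradients, sketched here under the assumption that final_SVM is a linear sklearn SVM (so its gradient w.r.t. its input features is just the weight vector final_SVM.coef_) and that 'feature1' outputs a flat vector of matching length: by the chain rule, d(SVM score)/d(image) equals the gradient of the coef_-weighted sum of the feature layer's outputs.
# sketch only: assumes a linear SVM; coef_ has one row per decision function,
# so [0] covers the binary case (pick the relevant row for multiclass)
w = K.constant(final_SVM.coef_[0])
feat = CNN_model.get_layer(name='feature1').output
svm_score = K.sum(w * feat, axis=-1)             # linear decision value (constant bias omitted)
grad = K.gradients(svm_score, CNN_model.input)   # d(score)/d(input) via the chain rule
f_svm = K.function([CNN_model.input], grad)
dscore_dx = f_svm([x])[0]                        # same shape as the input image x
Since the bias term is constant, dropping it does not change the gradient; for a kernel SVM this shortcut does not apply and the SVM gradient would have to be computed separately and chained as a vector-Jacobian product.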
I'm trying to do sentiment-analysis prediction using the text and scores of random IMDB reviews. I turned all the words into a bag of words and fed it all into a neural network. The prediction, however, does not seem to be correct: it always shows a 50% positive and a 50% negative prediction for anything I type as a review.
reviews = pd.read_csv('reviews.txt', header=None)
labels = pd.read_csv('labels.txt', header=None)
Y = (labels=='positive').astype(np.int_)
print(type(reviews))
print(reviews.head())
print(labels.head())
#Split into train/test
x_train, x_test, y_train, y_test = train_test_split(reviews,Y)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train)
#min_df=19 seems to be the first value that fills all 10,000 entries - thus the 10,000 most commonly used words
vect = CountVectorizer(min_df=19, max_features=10000)
fitter = vect.fit(x_train[0])
X_train = fitter.transform(x_train[0])
X_test = fitter.transform(x_test[0])
X_val = fitter.transform(x_val[0])
print("Vocabulary size: {}".format(len(vect.vocabulary_)))
feature_names = vect.get_feature_names()
print("Number of features: {}".format(len(feature_names)))
print("Vocabulary content:\n {}".format(fitter.vocabulary_))
X_train = pad_sequences(X_train.toarray(), maxlen=100, value=0.)
X_test = pad_sequences(X_test.toarray(), maxlen=100, value=0.)
X_val = pad_sequences(X_val.toarray(), maxlen=100, value=0.)
Y_train = to_categorical(y_train, 2)
Y_test = to_categorical(y_test, 2)
Y_val = to_categorical(y_val, 2)
tensorflow.reset_default_graph()
input_layer = tflearn.input_data(shape=[None, 100])
net = tflearn.embedding(input_layer, input_dim=10000, output_dim=128)
hid = tflearn.fully_connected(input_layer, 10, activation='tanh') # a hidden layer with 10 neurons
output_layer = tflearn.fully_connected(hid, 2, activation='softmax')
sgd = tflearn.SGD(learning_rate=0.04, lr_decay=0.96, decay_step=1000)
net = tflearn.regression(output_layer, optimizer=sgd, loss='categorical_crossentropy')
model = tflearn.DNN(net, tensorboard_verbose=3, tensorboard_dir='tfdir')
try:
    model.fit(X_train, Y_train, n_epoch=5, validation_set=(X_val, Y_val), batch_size=100, show_metric=True, run_id="Imdb")
except KeyboardInterrupt as e:
    print("Stopped by user")
The training, validation and test accuracy is always ~0.65 at maximum no matter how much I tune the hyperparameters.
my_review = "This movie sucks"
my_review_enc = fitter.transform([my_review])
my_review_enc_pad = pad_sequences(my_review_enc.toarray(), maxlen=100, value=0.)
prediction = model.predict(my_review_enc_pad)
prediction
As you can see, the positive and negative prediction is always at 50%
What am I doing wrong?
I'm having trouble with an LSTM and Keras.
I'm trying to predict normal vs. fake domain names.
My dataset is like this:
domain,fake
google, 0
bezqcuoqzcjloc,1
...
with 50% normal and 50% fake domains
Here's my LSTM model:
def build_model(max_features, maxlen):
    """Build LSTM model"""
    model = Sequential()
    model.add(Embedding(max_features, 128, input_length=maxlen))
    model.add(LSTM(64))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['acc'])
    return model
then I preprocess my text data to transform it into numbers:
"""Run train/test on logistic regression model"""
indata = data.get_data()
# Extract data and labels
X = [x[1] for x in indata]
labels = [x[0] for x in indata]
# Generate a dictionary of valid characters
valid_chars = {x:idx+1 for idx, x in enumerate(set(''.join(X)))}
max_features = len(valid_chars) + 1
maxlen = 100
# Convert characters to int and pad
X = [[valid_chars[y] for y in x] for x in X]
X = sequence.pad_sequences(X, maxlen=maxlen)
# Convert labels to 0-1
y = [0 if x == 'benign' else 1 for x in labels]
Then I split my data into training, testing and validation sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("Build model...")
model = build_model(max_features, maxlen)
print("Train...")
X_train, X_holdout, y_train, y_holdout = train_test_split(X_train, y_train, test_size=0.2)
And then I train my model on the training data with the holdout set for validation, and evaluate on the test data:
history = model.fit(X_train, y_train, epochs=max_epoch, validation_data=(X_holdout, y_holdout), shuffle=False)
scores = model.evaluate(X_test, y_test, batch_size=batch_size)
At the end of training/testing, I get these scores when evaluating on the test dataset:
loss = 0.060554939906234596
accuracy = 0.978109902033532
However, when I predict on a sample of the dataset like this:
LSTM_model = load_model('LSTMmodel_64_sgd.h5')
data = pickle.load(open('traindata.pkl', 'rb'))
#### LSTM ####
"""Run train/test on logistic regression model"""
# Extract data and labels
X = [x[1] for x in data]
labels = [x[0] for x in data]
X1, _, labels1, _ = train_test_split(X, labels, test_size=0.9)
# Generate a dictionary of valid characters
valid_chars = {x:idx+1 for idx, x in enumerate(set(''.join(X1)))}
max_features = len(valid_chars) + 1
maxlen = 100
# Convert characters to int and pad
X1 = [[valid_chars[y] for y in x] for x in X1]
X1 = sequence.pad_sequences(X1, maxlen=maxlen)
# Convert labels to 0-1
y = [0 if x == 'benign' else 1 for x in labels1]
y_pred = LSTM_model.predict(X1)
I have very poor performance:
accuracy = 0.5934741842730341
confusion matrix = [[25201 14929]
                    [17589 22271]]
F1-score = 0.5780171295094731
Can someone explain to me why?
I have tried 64 instead of 128 units for the LSTM, adam and rmsprop as optimizers, and increasing the batch_size; however, performance remains very low.
OK, so I have found the answer.
It is this line:
valid_chars = {x:idx+1 for idx, x in enumerate(set(''.join(X1)))}
In Python 3, iterating over a set produces a different order every time a new interpreter is started (because of hash randomization), so the character-to-index mapping built in the prediction script does not match the one used during training. So running the code in Python 2 resolved my issue!
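A more robust fix (a sketch, not part of the original answer) is to make the mapping deterministic and reuse the training-time mapping at prediction time:
import pickle

# deterministic mapping: sorting removes the dependence on set iteration order
valid_chars = {c: idx + 1 for idx, c in enumerate(sorted(set(''.join(X))))}

# persist the mapping built at training time...
with open('valid_chars.pkl', 'wb') as f:
    pickle.dump(valid_chars, f)

# ...and load the exact same mapping before predicting
with open('valid_chars.pkl', 'rb') as f:
    valid_chars = pickle.load(f)
This way the model's input encoding is identical across Python versions and interpreter sessions.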