I have a minor doubt: suppose my data has a continuous, non-stationary time series and other categorical variables (which are already encoded). In that situation, what is the best way to difference the data? Categorical data are not differenced, but they still have to be used while training the model.
I am trying to build an LSTM model, so any help is much appreciated.
The data I am currently using is at the daily level, and for the sake of illustration I have considered only a univariate scenario (ignoring the variable Var1 and applying differencing only to the variable "TS"):
TS Var1
15000 1
14000 1
16000 0
raw_values = daily_data_dummies_V2.values
interval = 1
diff = list()
# first-order differencing: value_t - value_(t - interval)
for i in range(interval, len(raw_values)):
    value = raw_values[i] - raw_values[i - interval]
    diff.append(value)
# rescale values to -1, 1
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_values = scaler.fit_transform(diff)
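For illustration, here is a minimal sketch of what I mean by differencing only the continuous column while keeping the categorical one aligned (assuming daily_data_dummies_V2 is a pandas DataFrame with the columns TS and Var1 shown above):
import pandas as pd

df = daily_data_dummies_V2                  # DataFrame with columns "TS" and "Var1"
df_diff = pd.DataFrame({
    "TS_diff": df["TS"].diff(),             # first-order difference of the continuous series
    "Var1": df["Var1"],                     # categorical column carried through unchanged
}).dropna()                                 # drop the first row, which has no difference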
X = entireData[:, :-1, :]  # inputs: every time step except the last
y = entireData[:, 1:, :]   # targets: the same sequences shifted one step ahead
from keras.models import Sequential
from keras.layers import merge, Input, Dense, TimeDistributed, Lambda, LSTM
from keras.callbacks import LambdaCallback
from keras.layers import Dropout
# design network
model = Sequential()
model.add(LSTM(150, stateful=True, batch_input_shape=(1, None, 1),
return_sequences=True))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
# fit network
model.fit(X, y, epochs=5000, batch_size=1, verbose=2)
previous_inputs=X
model.reset_states()
predictions = model.predict(previous_inputs)
Then I predicted over the entire data and would like to use the last prediction to predict further ahead. In doing so I will have to revert the scaling and revert the differencing. I am not sure how to adjust the loop below to do that:
# first, set the model's states (it's important for it to know the previous trends)
predictions = model.predict(previous_inputs)  # this creates the states

# future predictions
future = []
currentStep = predictions[:, -1:, :]  # last step from the previous prediction

for i in range(31):
    currentStep = model.predict(currentStep)
    one = currentStep[0]
    two = scaler.inverse_transform(one)
    three = raw_values[-1] + two
    future.append(three)
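This is roughly what I think the loop should look like if the differencing has to be undone cumulatively, keeping a running level instead of always adding the last raw value (assuming, as in the univariate illustration, that raw_values and the scaler contain only the TS column), but I am not sure it is correct:
level = raw_values[-1, 0]                     # last observed (undifferenced) value of TS
future = []
currentStep = predictions[:, -1:, :]          # last step from the previous prediction
for i in range(31):
    currentStep = model.predict(currentStep)              # next scaled difference
    diff_step = scaler.inverse_transform(currentStep[0])  # undo the (-1, 1) scaling
    level = level + diff_step[0, 0]                       # undo differencing by accumulating
    future.append(level)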
I'm trying to construct a network that will predict a Boolean target.
The data provided to the network contains both categorical and numerical entries but has already been properly processed. It is time-series data with 84 fields and 310,033 rows, all scaled to lie between 0 and 1. Every row represents one second.
I created an array, data, with shape (310033, 60, 500), and the target vector has shape (1000, 1). The time-step dimension was set to 60 because that is the most full 60-minute hours possible with the amount of data available.
Then I split the data into (X_train, X_test, y_train, y_test).
Is it okay to give a matrix like this to the LSTM model and expect good predictions (if the relationships are there)? I am currently getting very poor performance. From what I have seen, people start with 1D or 2D data and then reshape it into the 3D input the LSTM layer expects, which is what I have done here.
Below is the Transformation code from 2D to 3D:
import math
import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(scaled, target, train_size=.7, shuffle=False)

# Generate lagged time steps - 3D framework for LSTM (currently in a 2D framework).
# As required for LSTM networks, we must reshape the input data into N_samples x TimeSteps x Variables.
hours = len(X_train) / 3600
hours = math.floor(hours)  # most full 60-min hours available in this subset of the data
temp = []
# Pull hours into the three-dimensional field
for hr in range(hours, len(X_train) + hours):
    temp.append(scaled[hr - hours:hr, 0:scaled.shape[1]])
X_train = np.array(temp)  # train features in (70% x Hours x Variables)

hours = len(X_test) / 3600
hours = math.floor(hours)  # most full 60-min hours available in this subset of the data
temp = []
# Pull hours into the three-dimensional field
for hr in range(hours, len(X_test) + hours):
    temp.append(scaled[hr - hours:hr, 0:scaled.shape[1]])
X_test = np.array(temp)  # test features in (30% x Hours x Variables)
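For comparison, this is the more common sliding-window version of the 2D-to-3D reshape that I have seen (just a sketch with a hypothetical window of 60 steps; the helper name make_windows is not in my code above):
import numpy as np

def make_windows(data_2d, targets, window=60):
    # Stack overlapping windows of `window` consecutive rows into a 3D array of
    # shape (n_samples, window, n_features), paired with the target that follows each window.
    X, y = [], []
    for end in range(window, len(data_2d)):
        X.append(data_2d[end - window:end, :])  # rows end-window .. end-1
        y.append(targets[end])                  # target for the row right after the window
    return np.array(X), np.array(y)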
Below is the Framework of the Model:
model = Sequential()
#Layer 1 - returns a sequence of vectors
model.add(LSTM(128, return_sequences=True,
input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.15)) #15% drop out layer
#model.add(BatchNormalization())
#Layer 2
model.add(LSTM(256, return_sequences=False))
model.add(Dropout(0.15)) #15% drop out layer
#model.add(BatchNormalization())
#Layer 3 - return a single vector
model.add(Dense(32))
#Output of 2 because we have 2 classes
model.add(Dense(2, activation= 'sigmoid'))
# Define optimiser
opt = tf.keras.optimizers.Adam(learning_rate=1e-5, decay=1e-6)
# Compile model
model.compile(loss='sparse_categorical_crossentropy', # Mean Square Error Loss = 'mse'; Mean Absolute Error = 'mae'; sparse_categorical_crossentropy
optimizer=opt,
metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=epoch, batch_size=batch, validation_data=(X_test, y_test), verbose=3, shuffle=False)
I have experimented with many different frameworks for the LSTM. Single layer, multilayer, a double LSTM layer with 2 truncating Dense layers (LSTM -> LSTM -> Dense(32) -> Dense(2)), Batch normalization, etc...
Is there a suggested framework for this type of time series data to improve performance? I was getting better results when the data only had a single TimeStep = 1.
This is my code:
# Importing the essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Getting the dataset
data = pd.read_csv("sales_train.csv")
X = data.iloc[:, 1:-1].values
y = data.iloc[:, -1].values
# y = np.array(y).reshape(-1, 1)
# Getting the values for november 2013 and 2014 to predict 2015
list_of_november_values = []
list_of_november_values_y = []
for i in range(0, len(y)):
    if X[i, 0] == 10 or X[i, 0] == 22:
        list_of_november_values.append(X[i, 1:])
        list_of_november_values_y.append(y[i])
# Converting list to array
arr_of_november_values = np.array(list_of_november_values)
y_train = np.array(list_of_november_values_y).reshape(-1, 1)
# Scaling the independent values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(arr_of_november_values)
# Creating the neural network
from keras.models import Sequential
from keras.layers import Dense
nn = Sequential()
nn.add(Dense(units=120, activation='relu'))
nn.add(Dense(units=60, activation='relu'))
nn.add(Dense(units=30, activation='relu'))
nn.add(Dense(units=15, activation='relu'))
nn.add(Dense(units=1, activation='softmax'))
nn.compile(optimizer='adam', loss='mse')
nn.fit(X_train, y_train, batch_size=100, epochs=25)
# Saving the weights
nn.save_weights('weights.h5')
print("Weights Saved")
My loss has the same value for every epoch. Is there a concept I am missing that is causing my loss to be constant?
Here is the dataset for the code.
The predominant reason is your odd choice of final-layer activation, paired with the loss function used. Reconsider this: you are using a softmax activation on a single-unit fully-connected layer. Softmax takes a vector and scales it so that its values sum to one, preserving their relative proportions, according to the following function:
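softmax(x)_i = exp(x_i) / Σ_j exp(x_j)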
The result is that your network will only ever output 1, so the gradients are zero and there is no learning.
To resolve this, first change your final-layer activation to either ReLU or linear, depending on the structure of your dataset (I'm not going to work with the provided data myself, but I'm sure you understand it).
I expect there may be further issues regarding the structure of your network, but I'll leave that up to you. For now, the big issue is your final-layer activation.
Change this line:
nn.add(Dense(units=1, activation='softmax'))
To this line:
nn.add(Dense(units=1))
For a regression problem you don't need an activation function on the output layer; omitting it gives the default linear activation.
I am trying to use an LSTM network in Keras to make predictions of time-series data one step into the future. The data I have has 5 dimensions, and I am trying to use the previous 3 periods of readings to predict the value in the next period. I have normalised the data and removed all NaNs etc., and this is the code I am trying to use to train the network:
def Network_ii(IN, OUT, TIME_PERIOD, EPOCHS, BATCH_SIZE, LTSM_SHAPE):
    length = len(OUT)
    train_x = IN[:int(0.9 * length)]
    validation_x = IN[int(0.9 * length):]
    train_y = OUT[:int(0.9 * length)]
    validation_y = OUT[int(0.9 * length):]

    # Define network & callback:
    train_x = train_x.reshape(train_x.shape[0], 3, 5)
    validation_x = validation_x.reshape(validation_x.shape[0], 3, 5)

    model = Sequential()
    model.add(LSTM(units=128, return_sequences=True, input_shape=(train_x.shape[1], 3)))
    model.add(LSTM(units=128))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error')

    train_y = np.asarray(train_y)
    validation_y = np.asarray(validation_y)

    history = model.fit(train_x, train_y, batch_size=BATCH_SIZE, epochs=EPOCHS,
                        validation_data=(validation_x, validation_y))

    # Score model
    score = model.evaluate(validation_x, validation_y, verbose=0)
    print('Test loss:', score)

    # Save model
    model.save(f"models/new_model")
I am attempting to roughly follow the steps outlined here- https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
However, no matter what adjustments I make, whether to the number of dimensions used to train the network or to the length of the time period, I cannot get the model to output predictions that are anything other than 1 or 0, even though the target data in the array 'OUT' is continuous on [0, 1].
I think there may be something wrong with how I am setting up the Sequential() model, but I cannot see what to adjust. I am relatively new to this, so any help would be greatly appreciated.
You are probably using a prediction function other than the standard one. Are you perhaps using predict_classes?
The standard, well-documented call is model.predict.
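A quick sketch of the difference, reusing validation_x from the question (assuming an older Keras where Sequential models still have predict_classes):
# model.predict returns the raw, continuous outputs of the final Dense layer
continuous_preds = model.predict(validation_x)

# predict_classes thresholds a single-unit output at 0.5 (or takes an argmax for
# multi-unit outputs), which is why it would only ever return 0s and 1s here
# class_preds = model.predict_classes(validation_x)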
I am trying to use LSTM neural networks to make a song composer. Basically this is based on a text generator (it tries to predict the next character after looking at a sequence of characters), but instead of characters it tries to predict notes.
Structure of the midi file that serves as the input (Y-axis is the pitch or note value while X-axis is time):
And these are the predicted note values:
I train for 50 epochs, but the LSTM's loss does not seem to decrease; most of the time it does not improve at all.
I suspect this is because one particular note (in this case, note value 65) appears overwhelmingly often, which makes the LSTM lazy during the training phase and leads it to predict 65 every single time.
I feel like this is a common problem among LSTMs and time-series-based learning algorithms. How would I solve a problem like this? If what I mentioned is not the problem, then what is the problem and how do I solve it?
Here is the code I am using to train if you need it:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
seq_length = 100
read_path = '../matrices/input/world-is-mine/world-is-mine-y-0.npy'
raw_text = numpy.load(read_path)
# create mapping of unique chars to integers, and a reverse mapping
chars = sorted(list(set(raw_text)))
char_to_int = dict((c,i) for i,c in enumerate(chars))
n_chars = len(raw_text)
n_vocab = len(chars)
# prepare the dataset of input to output pairs encoded as integers
dataX = []
dataY = []
# dataX is the encoding version of the sequence
# dataY is an encoded version of the next prediction
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print "Total Patterns: ", n_patterns
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length,1))
# normalize
X = X/float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
print 'X: ', X.shape
print 'Y: ', y.shape
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
#model.add(Dropout(0.05))
model.add(LSTM(256))
#model.add(Dropout(0.05))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# There is no test dataset. We are modeling the entire training dataset to learn the probability of each character in a sequence.
# We are interested in a generalization of the dataset that minimizes the chosen loss function
# We are seeking a balance between generalization of the dataset and overfitting but short of memorization
# define the check point
filepath="../checkpoints/weights-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model.fit(X,y, nb_epoch=50, batch_size=64, callbacks=callbacks_list)
I have no experience working with music data. From my experience with text data, this looks like an under-fitted model. Enlarging the training dataset with different note values should overcome the underfitting; it seems the training examples are not enough for learning the note variation. For example, for a character-level language model, 1 MB of data is too small to train a reasonable LSTM. Also, try training with a smaller sequence length (say, 20) first: a smaller sequence length is easier to learn than a longer one when training data is limited.
Below is a simple example of a multi-class classification task with the IRIS data.
import seaborn as sns
import numpy as np
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the number of epochs, the accuracy will increase until it drops at a
# certain point: epoch 50 gives accuracy 0.99, which then drops to 0.977 by epoch 70.
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is: how do people usually decide the size of the layers?
For example, based on the code above we have:
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
where the first parameter of Dense is 16 and the second is 3.
Why do the two layers use different values for Dense?
How do we choose the best value for Dense?
Basically it is just trial and error. These are called hyperparameters and should be tuned on a validation set (split your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: for each parameter, decide a range and steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested, and takes a lot of time.
Random search: do the same, but just define a range for each parameter and try random sets of parameters drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can. This is just an informed random guess.
Unfortunately there is no other way to tune such parameters. As for layers having different numbers of neurons, that can come out of the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
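As a sketch of what random search can look like in practice (hypothetical ranges, reusing train_X and train_y_ohe from the question and holding out part of them for validation):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

best_acc, best_config = 0.0, None
for trial in range(10):                                # 10 random trials
    units = int(np.random.choice([8, 16, 32, 64]))     # sample a hidden-layer size
    dropout = float(np.random.uniform(0.2, 0.6))       # sample a dropout rate
    m = Sequential()
    m.add(Dense(units, input_shape=(4,), activation='tanh'))
    m.add(Dropout(dropout))
    m.add(Dense(3, activation='softmax'))
    m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    hist = m.fit(train_X, train_y_ohe, epochs=50, batch_size=8,   # nb_epoch in very old Keras
                 validation_split=0.2, verbose=0)
    # metric key is 'val_accuracy' in newer Keras, 'val_acc' in older versions
    val_acc = hist.history.get('val_accuracy', hist.history.get('val_acc'))[-1]
    if val_acc > best_acc:
        best_acc, best_config = val_acc, (units, dropout)

print("best validation accuracy:", best_acc, "with (units, dropout):", best_config)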
There is no known way to determine a good network structure just from the number of inputs and outputs. It depends on the number of training examples, the batch size, the number of epochs, basically on every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a lower number of units can give the model high bias and low accuracy. Once again, it depends on the amount of data used for training.
Sadly, it comes down to trying different values and seeing which give the best fit. You may choose the combination that gives you the lowest loss and validation loss, as well as the best accuracy for your dataset, as said in the previous post.
You could also scale the number of units in proportion to the number of classes, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above is an example of a classification system. num_classes is the number of different categories the system has to choose from. For instance, in the iris dataset used above, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could still lead to worse results than other values. We need to adjust the parameters to the training dataset by trying different settings and then analysing the results to find the best combination of parameters.
My suggestion is to use EarlyStopping(). Then check the number of epochs it actually ran for and the accuracy along with the test loss.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

lrd = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.8, min_lr=1e-6)
es = EarlyStopping(verbose=1, patience=2)

his = classifier.fit(X_train, y_train, epochs=500, batch_size=128,
                     validation_split=0.1, verbose=1, callbacks=[lrd, es])