Neural Network model validation accuracy and training accuracy never change - python

The Task:
A friend and I have been tasked with building a neural net for an old Kaggle competition. It is a regression problem, as you have to predict a shop's sales for up to x months ahead; the competition is Here.
The Dataset:
We've decided to use Keras and Pandas to preprocess the train dataset. The train dataset has the following fields
(the text inside parentheses is NOT in the dataset; it is only there to show you the range and data type):
Store(1-1115) DayOfWeek(1-7) Date(YYYY-MM-DD) Sales(int) Customers(int) Open(0 OR 1) Promo(0 OR 1) StateHoliday(0,a,b,c) SchoolHoliday(0 OR 1)
How We Process/Make The Datasets:
Train.csv
Y:
So we decided to delete 'Sales' from the training dataset and use it as Y, the target, which is where I come to the first question: should we normalise the Y/Sales target data?
X:
Now we process the rest of the training data as follows (a rough sketch of these steps is shown below).
We remove the date, as we thought it would not provide much use, especially if we put the data in order.
We changed StateHoliday to a binary value (0 or 1) to state whether it is or isn't a state holiday. In a different iteration of the dataset we instead changed it to 0.25/0.5/0.75/1.0 to represent the individual state holidays (the different-iteration part is important later).
We then normalised Customers to between 0 and 1, just like the sales.
We also removed SchoolHoliday, as we thought it didn't hold a strong enough relationship to the sales target.
We also removed Open, since if the shop was shut it got no sales, so it just didn't make sense to keep it.
We also got rid of DayOfWeek in one iteration of the dataset and kept it in another.
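For reference, a minimal pandas sketch of these steps (the file path and exact column handling here are illustrative assumptions, not our actual code):

import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical path; columns follow the train.csv schema above

# Target: Sales (whether to scale it is the open question above).
y = train.pop("Sales")

# Drop the columns we decided not to use.
train = train.drop(columns=["Date", "Open", "SchoolHoliday"])

# Binary StateHoliday: '0' stays 0, any of 'a'/'b'/'c' becomes 1.
train["StateHoliday"] = (train["StateHoliday"].astype(str) != "0").astype(int)

# Min-max normalise Customers to [0, 1].
cust = train["Customers"]
train["Customers"] = (cust - cust.min()) / (cust.max() - cust.min())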
Now my friend is much more of a mathematician than I am; I'm more of a computer scientist. So he used another dataset which was also in the competition, called Store.csv, and it looks like the following: Store.csv
Store(1-1115) StoreType(a-d) Assortment(a-c) CompetitionDistance(int Meters) CompetitionOpenSinceMonth(int M) CompetitionOpenSinceYear(int YYYY) Promo2(0 OR 1) Promo2SinceWeek(int) Promo2SinceYear(YYYY) PromoInterval(Str months separated by csv)
My friend then wrote a distance function to make N clusters for our training dataset (we settled on 5 clusters). He describes it as follows:
I put the most similar together based on customer distance, store type, assortment and promo
We then have 5 datasets which are split by our distance function (an illustrative sketch of the idea follows the final dataset description below).
We now have 5 datasets, and from each we remove the following:
Clusters
Store
Sales - save this as Y/Target
So now our final dataset looks like the following:
DayOfWeek(1-7) Customers(normalised) Promo(0 OR 1) StateHoliday(0 OR 1)
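Our actual distance function isn't shown here, so purely as an illustration of the clustering step, here is a sketch that uses scikit-learn's KMeans as a stand-in, grouping on the Store.csv features mentioned above (it continues the pandas sketch above, where Sales has already been split off as Y):

import pandas as pd
from sklearn.cluster import KMeans

store = pd.read_csv("store.csv")  # hypothetical path

# Numeric plus one-hot encoded features, roughly the ones our distance function used.
features = store[["CompetitionDistance", "Promo2"]].copy()
features["CompetitionDistance"] = features["CompetitionDistance"].fillna(
    features["CompetitionDistance"].median())
features = features.join(pd.get_dummies(store[["StoreType", "Assortment"]]))

store["Cluster"] = KMeans(n_clusters=5, random_state=0).fit_predict(features)

# Attach the cluster id to the training rows and split into 5 datasets,
# dropping Cluster and Store as described above.
train = train.merge(store[["Store", "Cluster"]], on="Store")
clusters = {c: g.drop(columns=["Cluster", "Store"]) for c, g in train.groupby("Cluster")}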
The Problem
So far, when we train the neural network, the accuracy and validation accuracy just converge to a number and do not budge, no matter what epochs, learning rates or momentum we set. We have tried the different clusters and they still converge, just to different numbers. We made one big dataset with all the clusters combined and the numbers still converge.
We made a big dataset with no cluster separation and the accuracy and validation accuracy just reach a fixed number after 2/3/4 epochs and never change.
The network we made looks like the following:
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))  # 4 input features
model.add(Dropout(0.1))
model.add(Dense(3, activation='relu'))
model.add(Dense(1, activation='relu'))
# model.add(Dense(1, activation='linear'))
model.compile(loss="MSE", optimizer="nadam", metrics=['accuracy'])
model.fit(x=xt.as_matrix(), y=yt.as_matrix(),
          validation_data=(xv.as_matrix(), yv.as_matrix()),
          epochs=5000)
print("stop")
We've even tried adding/removing a few features and the same problem still happens.
Questions
Why do the numbers converge no matter how we change the dataset?
Could this be a network shape/activation function/loss/optimizer issue?
Is the problem with the dataset?
Is the clustering idea a good idea to pre process the data?
How would you process this data to get good results?
What type of layers/neurons would be appropriate?
Should we normalize the data?

Related

Why is my Keras model predicting trend but not scale?

I'm making a model to predict the irradiance value on a solar field. The thing is that my model, despite being very simple (code added below), performs very well. The problem is that, for some reason, it predicts on a different scale, almost always giving lower values but following the same trend. I have attached the plots comparing the model output with the real data on the train and test sets, and also linked the dataset.
Some details: the dataset has a total of 24 columns which correspond to 24 pyranometers, which are the instruments that give information about the sun. The model has just been trained on the first one for simplicity, so with more data we could achieve better performance. Also, I'm processing my data to use 15 steps back in time and a prediction window of 20 steps forward.
from keras.layers import Input, LSTM, Dense
from keras.models import Model

input = Input((LAG, 1))  # LAG is the number of steps I take backward
hidden = LSTM(32, return_sequences=True)(input)
output = Dense(1, activation='linear')(hidden)
model = Model(input, output)
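For reference, a minimal sketch of how the 15-steps-back / 20-steps-forward windowing mentioned above could be built (the function and variable names here are mine, not from the original code):

import numpy as np

LAG, HORIZON = 15, 20  # 15 steps back in time, prediction window of 20 steps forward

def make_windows(series, lag=LAG, horizon=HORIZON):
    # Slice a 1-D series into (samples, lag, 1) inputs and (samples, horizon) targets.
    X, y = [], []
    for t in range(lag, len(series) - horizon + 1):
        X.append(series[t - lag:t])
        y.append(series[t:t + horizon])
    return np.array(X)[..., np.newaxis], np.array(y)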
Dataset
Model output vs real in train set
Model output vs real in test set

Keras LSTM model only predicts in one direction for a one step ahead prediction

I have an interesting problem with the predictions of my keras LSTM model. The goal of my model is to predict the changes in high and low prices for a selected FX rate pair based on the following variables: open, high, low, close and volume. My data frame looks as follows:
data set
Before training my model, I applied the following data preparation steps to my data frame:
Calculated the percentage changes from one point in time to the next one (current - previous)/previous for all columns shown in the data frame above.
Applied zero-mean normalization for each column on all columns.
Used the MinMaxScaler to transform the data in each column between [-1,1].
As I wanted to train an LSTM model, I have also applied a lookback window to the data frame (see code below).
df_all_lookback_x = {}
for i in range(0, window_size + 1):
    if i == 0:
        df_all_lookback_x = df_x
    else:
        df_all_lookback_x = pd.concat([df_x.shift(i), df_all_lookback_x], axis=1)
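For context, a minimal sketch of preparation steps 1-3 above (assuming df is the raw data frame with the open/high/low/close/volume columns; this is not the asker's code):

from sklearn.preprocessing import StandardScaler, MinMaxScaler

df_x = df.pct_change().dropna()                 # 1. (current - previous) / previous
df_x[:] = StandardScaler().fit_transform(df_x)  # 2. zero-mean normalization per column
df_x[:] = MinMaxScaler(feature_range=(-1, 1)).fit_transform(df_x)  # 3. scale to [-1, 1]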
The model itself was then trained as shown below:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(2, activation='tanh'))
model.compile(loss='mean_squared_error', optimizer='adam')
If I then use the model to predict on the train + test data, both the predicted percentage changes for the high and the low prices do not make sense to me. It seems like for the predicted high-price percentage changes nearly all values are too low (forming a line at the mean), and for the low-price percentage changes it is the opposite. Please find the pictures below.
Prediction vs Real
Prediction vs Real
Has anyone experienced a similar problem and would be able to advise me on this issue?
Things already tried out:
Used a different number of features as input for the model.
Changed model parameters: lookback window, epochs, dropout rate, LSTM model size
Changed input for model: removed normalization steps
Used different loss functions, optimizers
Tanh, linear, relu as activation functions
Tried to predict only one output variable instead of both, the high + low percentage price changes.

Tensorflow accuracy stuck at 25% for 4 labels, text classification

The accuracy starts off at around 40% and drops down during one epoch to 25%
My model:
import tensorflow as tf
from tensorflow import keras

self._model = keras.Sequential()
self._model.add(keras.layers.Dense(12, activation=tf.nn.sigmoid))  # hidden layer
self._model.add(keras.layers.Dense(len(VCDNN.conventions), activation=tf.nn.softmax))  # output layer
optimizer = tf.train.AdamOptimizer(0.01)
self._model.compile(optimizer, loss=tf.losses.sparse_softmax_cross_entropy, metrics=["accuracy"])
I have 4 labels, 60k rows of data, split evenly for each label so 15k each and 20k rows of data for evaluation
my data example:
name label
abcTest label1
mete_Test label2
ROMOBO label3
test label4
The input is turned into integers for each character and then one-hot encoded; the output is just turned into integers [0-3].
1 epoch evaluation (loss, acc):
[0.7436684370040894, 0.25]
UPDATE
More details about the data
The strings are of up to 20 characters
I first convert each character to an int based on an alphabet dictionary (a: 1, b: 2, c: 3), and if a word is shorter than 20 chars I fill the rest with 0's. Those values are then one-hot encoded and reshaped, so:
assume max 5 characters
1. ["abc","d"]
2. [[1,2,3,0,0],[4,0,0,0,0]]
3. [[[0,1,0,0,0],[0,0,1,0,0],[0,0,0,1,0],[1,0,0,0,0],[1,0,0,0,0]],[[0,0,0,0,1],[1,0,0,0,0],[1,0,0,0,0],[1,0,0,0,0],[1,0,0,0,0]]]
4. [[0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0],[0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0]]
The labels describe the way a word is spelled, i.e. the naming convention, e.g. all lowercase - unicase, testBest - camelCase, TestTest - PascalCase, test_test - snake_case.
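For reference, a minimal sketch of the encoding described above (the names here are placeholders; the real alphabet dictionary and max length live in VCDNN):

import numpy as np
import string

# Placeholder alphabet; the real one presumably also covers the characters that
# distinguish the conventions (upper case, underscore, ...).
ALPHABET = {c: i + 1 for i, c in enumerate(string.ascii_letters + "_")}
MAX_LEN = 20

def encode(word):
    # Map chars to ints, zero-pad to MAX_LEN, one-hot encode, then flatten (steps 2-4 above).
    ids = [ALPHABET.get(c, 0) for c in word[:MAX_LEN]]
    ids += [0] * (MAX_LEN - len(ids))
    one_hot = np.eye(len(ALPHABET) + 1)[ids]   # shape: (MAX_LEN, alphabet size + 1)
    return one_hot.reshape(-1)                 # flattened, length MAX_LEN * (alphabet size + 1)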
With 2 extra layers added and the LR reduced to 0.001:
Pic of training
Update 2
self._model = keras.Sequential()
self._model.add(
    keras.layers.Embedding(VCDNN.alphabetLen, 12, input_length=VCDNN.maxFeatureLen * VCDNN.alphabetLen))
self._model.add(keras.layers.LSTM(12))
self._model.add(keras.layers.Dense(len(VCDNN.conventions), activation=tf.nn.softmax))  # output layer
self._model.compile(tf.train.AdamOptimizer(self._LR), loss="sparse_categorical_crossentropy",
                    metrics=self._metrics)
It seems to start and then immediately dies with no error (exit code -1073740791).
The 0.25 accuracy means the model couldn't learn anything useful, as it is the same as a random guess. This means the network structure may not be good for the problem.
Currently, recurrent neural networks such as LSTMs are more commonly used for sequence modeling. For instance:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(char_size, embedding_size))
model.add(LSTM(hidden_size))
model.add(Dense(len(VCDNN.conventions), activation='softmax'))
This will work better if the label is related to the char sequence information about the input words.
This means your model isn't really learning anything useful. It might be stuck in a local minimum. This could be due to the following reasons:
a) you don't have enough train data to train a neural network. NNs usually require fairly large datasets to converge. Try using a RandomForest classifier at first to see what results you can get there
b) it's possible your target data might not have anything to do with your train data and so it's impossible to train such a model that would map efficiently without overfitting
c) your model could do with some improvements
If you want to give improving your model a go I would add a few extra dense layers with a few more units. So after line 2 of your model I'd add:
self._model.add(keras.layers.Dense(36, activation=tf.nn.sigmoid))
self._model.add(keras.layers.Dense(36, activation=tf.nn.sigmoid))
Another thing you can try is a different learning rate. I'd go with the default for AdamOptimizer, which is 0.001. So just change 0.01 to 0.001 in the AdamOptimizer() call.
You may also want to train for more than just one epoch.

Regressor Neural Network built with Keras only ever predicts one value

I'm trying to build a NN with Keras and Tensorflow to predict the final chart position of a song, given a set of 5 features.
After playing around with it for a few days I realised that although my MAE was getting lower, this was because the model had just learned to predict the mean value of my training set for all input, and this was the optimal solution. (This is illustrated in the scatter plot below)
This is a random sample of 50 data points from my testing set vs what the network thinks they should be
At first I realised this was probably because my network was too complicated. I had one input layer with shape (5,) and a single node in the output layer, but then 3 hidden layers with over 32 nodes each.
I then stripped back the excess layers and moved to just a single hidden layer with a couple nodes, as shown here:
self.model = keras.Sequential([
    keras.layers.Dense(4,
                       activation='relu',
                       input_dim=num_features,
                       kernel_initializer='random_uniform',
                       bias_initializer='random_uniform'),
    keras.layers.Dense(1)
])
Training this with a gradient descent optimiser still results in exactly the same prediction being made the whole time.
Then it occurred to me that perhaps the actual problem I'm trying to solve isn't hard enough for the network, that maybe it's linearly separable. Since this would respond better to not having a hidden layer at all, essentially just doing regular linear regression, I tried that. I changed my model to:
inp = keras.Input(shape=(num_features,))
out = keras.layers.Dense(1, activation='relu')(inp)
self.model = keras.Model(inp,out)
This also changed nothing. My MAE and the predicted values are all the same.
I've tried so many different things, different permutations of optimisation functions, learning rates, network configurations, and nothing can help. I'm pretty sure the data is good, but I've included a sample of it just in case.
chartposition,tagcount,dow,artistscore,timeinchart,finalpos
121,3925,5,35128,7,227
131,4453,3,85545,25,130
69,2583,4,17594,24,523
145,1165,3,292874,151,187
96,1679,5,102593,111,540
134,3494,5,1252058,37,370
6,34895,7,6824048,22,5
A sample of my dataset, finalpos is the value I'm trying to predict. Dataset contains ~40,000 records, split 80/20 - training/testing
def __init__(self, validation_split, num_features, should_log):
    self.should_log = should_log
    self.validation_split = validation_split
    inp = keras.Input(shape=(num_features,))
    out = keras.layers.Dense(1, activation='relu')(inp)
    self.model = keras.Model(inp, out)
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    self.model.compile(loss='mae',
                       optimizer=optimizer,
                       metrics=['mae'])

def train(self, data, labels, plot=False):
    early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)
    history = self.model.fit(data,
                             labels,
                             epochs=self.epochs,
                             validation_split=self.validation_split,
                             verbose=0,
                             callbacks=[PrintDot(), early_stop])
    if plot: self.plot_history(history)
All code relevant to constructing and training the network.
def normalise_dataset(df, mini, maxi):
    return (df - mini) / (maxi - mini)
Normalisation of the input data. Both my testing and training data are normalised to the max and min of the testing set
Graph of my loss vs validation curves with the one hidden layer network with an adamoptimiser, learning rate 0.01
Same graph but with linear regression and a gradient descent optimiser.
So I am pretty sure that your normalization is the issue: you are not normalizing by feature (as is the de-facto industry standard), but across all data.
That matters if you have two different features with very different orders of magnitude/ranges (in your case, compare timeinchart with artistscore): the larger-scale feature ends up dominating the normalized values.
Instead, you might want to normalize using something like scikit-learn's StandardScaler. Not only does this normalize per column (so you can pass all features at once), but it also scales to unit variance (which makes some assumption about your data, but can potentially help, too).
To transform your data, use something along these lines:
from sklearn.preprocessing import StandardScaler
import numpy as np
raw_data = np.array([[1,40], [2, 80]])
scaler = StandardScaler()
processed_data = scaler.fit_transform(raw_data)
# fit() calculates mean etc, transform() puts it to the new range.
print(processed_data) # returns [[-1, -1], [1,1]]
Note that you have two possibilities to normalize/standardize your training data:
either scale training and test data together and then split afterwards,
or fit the scaler only on the training data and then use the same scaler to transform your test data (see the sketch below).
Never fit_transform your test set separately from the training data!
Since it would potentially have different mean/min/max values, you could end up with totally wrong predictions! In a sense, the StandardScaler is your definition of your "data source distribution", which is inherently still the same for your test set, even though the test set might be a subset not exactly following the same properties (due to small sample size etc.).
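A minimal sketch of the second option, assuming train_data and test_data are already-split feature arrays:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_data)  # fit the mean/std on the training data only
test_scaled = scaler.transform(test_data)        # reuse the same mean/std for the test data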
Additionally, you might want to use a more advanced optimizer, like Adam, or specify some momentum property for your SGD (0.9 is a good choice in practice, as a rule of thumb).
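For example, something along these lines in Keras (exact optimizer arguments depend on the Keras version; the loss/metric just mirror the model above):

from keras.optimizers import SGD, Adam

optimizer = SGD(lr=0.01, momentum=0.9)   # SGD with momentum
# or: optimizer = Adam(lr=0.001)
self.model.compile(loss='mae', optimizer=optimizer, metrics=['mae'])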
Turns out the error was a really stupid and easy-to-miss bug.
When I was importing my dataset, I shuffled it; however, I was accidentally applying the shuffle only to the labels, not to the dataset as a whole.
As a result, each label was being assigned to a completely random feature set, so of course the model didn't know what to do with this.
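For anyone hitting the same thing, a sketch of shuffling features and labels together (assuming both are NumPy arrays; the names are placeholders):

import numpy as np

# One permutation applied to both arrays keeps each row paired with its label.
perm = np.random.permutation(len(data))
data, labels = data[perm], labels[perm]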
Thanks to #dennlinger for suggesting the place to look, which is where I eventually found this bug.

Delay gap between reality and prediction

Using machine learning (as libraries I've tried Tensorflow and Tflearn (which, I know, is just a wrapper around Tensorflow)) I'm trying to predict the congestion in an area for the next week (see my previous questions if you want more backstory on it). My training set is composed of 400K tagged entries (with the date and a congestion value for each minute).
My problem is that I now have a time gap between predictions and reality.
If I drew a chart with the reality and the prediction, you would see that my prediction, while having the same shape as the reality, is ahead of it: it increases/decreases before the reality does. That started to make me think that maybe my training had a problem, and that my prediction didn't start where my training ended.
Both of my datasets (training/testing) are in 2 different files. First I train on my training set (for convenience's sake let's say it ends at the 100th minute and my testing set starts at the 101st minute); once my model is saved I make my predictions, which should then normally start at 101, or am I wrong somewhere? Because it seems like it starts predicting way after my training stopped (to keep the example, it would start predicting value 107, for instance).
For now, a bad fix was to remove from the training set as many values as I had of delay (in this example, that would be 7), and it worked: no more delay. But I don't understand why I have this problem or how to fix it so it won't happen later.
Following some advice found on different websites, it seems like having gaps in my training dataset (missing timestamps in this case) could be a problem. Seeing that there were some (in total around 7 to 9% of the whole dataset was missing), I used Pandas to add the missing timestamps, giving them the congestion value of the last known timestamp (roughly as sketched below). While I do think that it may have helped a little (the gap is smaller), it hasn't fixed the problem.
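A minimal sketch of that filling step (not the exact code; the column names are assumed from the file layout described further down):

import pandas as pd

df = df.set_index(pd.to_datetime(df['timestamp'])).sort_index()
full_index = pd.date_range(df.index.min(), df.index.max(), freq='1min')
df = df.reindex(full_index).ffill()  # missing minutes get the last known congestion value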
I tried multistep forecasting, multivariate forecasting, LSTM, GRU, MLP, Tensorflow and Tflearn, but it changed nothing, making me think the issue could come from my training.
Here is my model training.
from keras.models import Sequential
from keras.layers import LSTM, Dense

def fit_lstm(train, batch_size, nb_epoch, neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    print(X.shape)
    print(y.shape)
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(None, X.shape[1], X.shape[2]), stateful=False))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
    return model
The 2 shapes are:
(80485, 1, 1)
(80485,)
(In this example I'm using only 80K rows of data for training, for speed purposes.)
As parameters I'm using 1 neuron, a batch_size of 64 and 5 epochs.
My dataset is made of 2 files. The first is the training file with 2 columns:
timestamp | values
The second has the same shape but is the testing set (kept separate to avoid any influence of it on my prediction); that file is only used once every prediction has been made, to compare reality and prediction. The testing set starts where the training set stops.
Do you have an idea of what could be the reason of this problem?
Edit:
In my code I have this function:
# invert differencing
yhat = inverse_difference(raw_values, yhat, len(test_scaled) + 1 - i)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
It's supposed to invert the difference (to go from a scaled value to the real one).
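As a tiny made-up illustration of what that inversion does (values invented for the example):

series = [10, 12, 15]   # raw congestion values
diffs = [2, 3]          # series[t] - series[t-1], what the model is trained on
# If the model predicts the next difference to be 4, the real-scale forecast is:
yhat = 4 + series[-1]   # inverse_difference -> 19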
When using it as in the pasted example (i.e. with the testing set) I get near-perfect results, accuracy above 95% and no gap.
Since in reality we wouldn't know these values, I had to change it.
I first tried to use the training set instead, but got the problem explained in this post:
Why is this happening? Is there an explanation for this problem?
Found it. It was a problem with the inverse_difference(history, yhat, interval=1) function. In fact, it made my results look like the last lines of my training data. That is why I had a gap: since there is a pattern in my data (the peak always occurs at more or less the same moment), I thought the model was making predictions when it was really just giving me back values from the training data.
