I've been working on reproducing a CNN-LSTM model for PV power forecasting from the literature (http://www.mdpi.com/2076-3417/8/8/1286) for the past four weeks for my Master's thesis in Energy Science. However, I've been stuck on a seemingly simple issue: any configuration of LSTM model that I've tried yields one of two things:
Ridiculous output that makes no sense whatsoever (flat line, complete stochasticity, negative values, you name it)
Exactly the same (very believable) PV power forecast for every input.
I've done my best to reproduce the issue with as little code as possible:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
from tensorflow.python.keras.layers import CuDNNLSTM
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from time import time
SUN_UP, SUN_DOWN = '03:00:00', '23:00:00'
df = pd.read_csv('../Model_Xander/CNN-LSTM-wang/pv_data/all_data_resample-15T_interpolate-4.csv',
index_col = 0,
parse_dates = True)
df = pd.DataFrame(df['151'])
df = df.between_time(SUN_UP, SUN_DOWN)
TIME_STEPS_PER_DAY = len(df.loc['1-1-2016'])
print('each day consists of ' + str(TIME_STEPS_PER_DAY) + ' time steps of 15 minutes')
df = df.values
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)
df = np.nan_to_num(df_scaled, nan = -1)
#df = np.float16(df)
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
    # Build sliding windows of `history_size` inputs and their targets.
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index, step):
        indices = range(i-history_size, i)
        data.append(dataset[indices])
        if single_step:
            labels.append(target[i+target_size])
        else:
            labels.append(target[i:i+target_size])
    return np.array(data), np.array(labels)
TRAIN_TEST_SPLIT = round(((2/3)*len(df)))
TARGET_COL = df[:,0]
x_train, y_train = multivariate_data(df, TARGET_COL, 0, TRAIN_TEST_SPLIT, HISTORY_SIZE, TARGET_SIZE, STEP)
x_test, y_test = multivariate_data(df, TARGET_COL, TRAIN_TEST_SPLIT, None, HISTORY_SIZE, TARGET_SIZE, STEP)
lstm = Sequential()
lstm.add(Input(shape = (x_train.shape[1], x_train.shape[2])))
lstm.add(Masking(mask_value = -1))
lstm.add(LSTM(units = 100,
kernel_initializer = keras.initializers.Orthogonal(),
bias_initializer = keras.initializers.Constant(value=0.1),
return_sequences = True))
lstm.add(LSTM(units = 100,
kernel_initializer = keras.initializers.Orthogonal(),
bias_initializer = keras.initializers.Constant(value=0.1),
return_sequences = False))
lstm.add(Dense(units = 100, activation = 'relu',
kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.add(Dense(units = y_test.shape[1], activation = 'relu',
kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.compile(loss = 'mse', optimizer = 'adam')
begin = time()
history = lstm.fit(x_train, y_train,
epochs = 5,
batch_size = 24,
validation_data = (x_test, y_test),
verbose = 1,
shuffle = False)
end = time()
print('it took ' + str(round(end-begin)) + ' seconds to train 5 epochs')
predict = lstm.predict(x_test)
for i in range(10, 20):
for i in range(0, x_test.shape[0]):
The problem is clearly seen in the last plot:
(Figure: plot of 350 predictions overlaid on top of one another)
As you can see, all forecasts are identical. I have run out of ideas on how to combat this issue.
As far as I could deduce, there are a number of possible causes. First, my dataset contains a large number of NaNs. I've done my best to combat that issue with three methods:
Resampling from very high resolution (10 seconds) to standard resolution (15 min)
Interpolating up to 4 consecutive NaNs with linear interpolation (any more seems stupid to me)
The masking layer that an observant reader might have noticed in the model definition in the code
Even after these steps, my dataset still contains a large number of NaNs. I'm not really sure what to do about it, or whether the Masking layer is even doing its intended job. I do know for sure that the masking layer cannot play nicely with CuDNNLSTM, and my normal LSTM model runs a LOT slower with the masking layer.
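For reference, the first two steps above can be sketched with pandas roughly as follows. This is a minimal illustration on synthetic data; the series and variable names are hypothetical and do not come from the actual preprocessing script or dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-second series covering one hour (not the real PV data).
idx = pd.date_range("2016-01-01 06:00", periods=360,
                    freq=pd.Timedelta(seconds=10))
raw = pd.Series(np.arange(360, dtype=float), index=idx)
raw.iloc[90:180] = np.nan  # sensor outage covering one full 15-minute block

# Step 1: resample to 15-minute means; a bin with no valid samples stays NaN.
resampled = raw.resample("15min").mean()

# Step 2: linear interpolation, filling at most 4 consecutive NaNs.
filled = resampled.interpolate(method="linear", limit=4)

print(resampled)
print(filled)
```

One caveat worth keeping in mind: with `limit=4`, pandas also fills the first four values of a longer gap and leaves the rest as NaN, rather than skipping long gaps entirely.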
The best I've been able to accomplish in terms of obtaining differently shaped predictions for differently shaped inputs is this:
(Figure: differently shaped output for differently shaped inputs)
However, as you can see, this is just the same shape with a slightly different amplitude.
Another thing I've noticed is that when I input data from 9 other sensors as features (each with a similar amount and location of NaNs), the amplitude changes per prediction (yay), but the shape remains the same across all predictions:
(Figure: yay, different amplitude! Aww, same shape :( )
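On the question of whether the Masking layer does its job with several sensors: Keras's `Masking` skips a timestep only when every feature at that timestep equals `mask_value`. The NumPy sketch below mimics that rule on a made-up batch. It illustrates the rule only and is not the Keras implementation itself; the array values are hypothetical.

```python
import numpy as np

MASK_VALUE = -1.0

# Hypothetical batch: 1 sample, 4 timesteps, 2 sensors (features).
x = np.array([[[0.3, 0.5],
               [-1.0, 0.7],     # only sensor 0 missing
               [-1.0, -1.0],    # both sensors missing
               [0.1, -1.0]]])   # only sensor 1 missing

# A timestep is kept if ANY feature differs from the mask value.
keep = np.any(x != MASK_VALUE, axis=-1)
print(keep)
```

Only the third timestep is masked here. A timestep where just some of the sensors are NaN is still passed to the LSTM, with -1 sitting in the missing columns.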
I will be uploading my model to my university's cluster (for the 200th time) to train for more than 5 epochs. Who knows, maybe today is my lucky day. If anyone knows how to combat these issues, I would be very glad and thankful to hear your thoughts.
In light of the lessons learned from the response below, I made the following changes:
Regularization and dropout to combat overfitting (which will lead to the average being forecast for every input if left unchecked).
Last LSTM layer with return_sequences = True
Added Flatten layer after last LSTM layer
Removed NaN values from my dataset, removing the need for the masking layer and enabling the use of the CuDNNLSTM layer (train on GPU, if I understand it correctly).
However, now that each day has a unique forecast, I noticed that increasing the number of units in the LSTM layer beyond some threshold between 20 and 50 (I tested 20 and 50) brings back the problem of each day having the exact same forecast. I am still stumped as to why this is. (See below for the model I used to produce unique forecasts for each day.)
lstm = Sequential()
lstm.add(Input(shape = (x_train.shape[1], x_train.shape[2])))
lstm.add(CuDNNLSTM(units = 50,
kernel_initializer = keras.initializers.Orthogonal(),
kernel_regularizer = keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
bias_initializer = keras.initializers.Constant(value=0.1),
return_sequences = True))
lstm.add(CuDNNLSTM(units = 50,
kernel_initializer = keras.initializers.Orthogonal(),
kernel_regularizer = keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
bias_initializer = keras.initializers.Constant(value=0.1),
return_sequences = True))
lstm.add(Dropout(rate = 0.2))
lstm.add(Dense(units = int(0.5*x_train.shape[1]), activation = 'relu',
kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.add(Dropout(rate = 0.2))
lstm.add(Dense(units = y_test.shape[1], activation = 'relu',
kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.compile(loss = 'mse', optimizer = 'adam')
I structured a Convolutional LSTM model to predict the forthcoming Bitcoin price data, using the analyzed past data of the Bitcoin close price and other features.
Let me jump straight to the code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
import tensorflow.keras as keras
import keras_tuner as kt
from keras_tuner import HyperParameters as hp
from keras.models import Sequential
from keras.layers import InputLayer, ConvLSTM1D, LSTM, Flatten, RepeatVector, Dense, TimeDistributed
from keras.callbacks import EarlyStopping
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.optimizers import Adam
import keras.backend as K
from keras.losses import Huber
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
DIR = '../input/btc-features-targets'
SEG_DIR = '../input/segmented'
segmentized_features = os.listdir(SEG_DIR)
btc_train_features = []
for seg in segmentized_features:
train_features = pd.read_csv(f'{SEG_DIR}/{seg}')
train_features.set_index('date', inplace=True)
btc_train_targets = pd.read_csv(f'{DIR}/btc_train_targets.csv')
btc_train_targets.set_index('date', inplace=True)
btc_test_features = pd.read_csv(f'{DIR}/btc_test_features.csv')
btc_tef1 = btc_test_features.iloc[:111]
btc_tef2 = btc_test_features.iloc[25:]
btc_tef1.set_index('date', inplace=True)
btc_tef2.set_index('date', inplace=True)
btc_test_targets = pd.read_csv(f'{DIR}/btc_test_targets.csv')
btc_test_targets.set_index('date', inplace=True)
btc_trt_log = np.log(btc_train_targets)
btc_tefs1 = scaler.fit_transform(btc_tef1.values)
btc_tefs2 = scaler.fit_transform(btc_tef2.values)
btc_tet_log = np.log(btc_test_targets)
scaled_train_features = []
for features in btc_train_features:
shape = features.shape
scaled_train_features.append(np.expand_dims(features, [0,3]))
shape_2 = btc_tefs1.shape
btc_tefs1 = np.expand_dims(btc_tefs1, [0,3])
shape_3 = btc_tefs2.shape
btc_tefs2 = np.expand_dims(btc_tefs2, [0,3])
btc_trt_log = btc_trt_log.values[0]
btc_tet_log = btc_tet_log.values[0]
def build(hp):
model = keras.Sequential()
# Input Layer
# ConvLSTM1D
convLSTM_hp_filters = hp.Int(name='convLSTM_filters', min_value=32, max_value=512, step=32)
convLSTM_hp_kernel_size = hp.Choice(name='convLSTM_kernel_size', values=[3,5,7])
convLSTM_activation = hp.Choice(name='convLSTM_activation', values=['selu', 'relu'])
# Flatten
# RepeatVector
LSTM_hp_units = hp.Int(name='LSTM_units', min_value=32, max_value=512, step=32)
LSTM_activation = hp.Choice(name='LSTM_activation', values=['selu', 'relu'])
model.add(LSTM(units=LSTM_hp_units, activation=LSTM_activation, return_sequences=True))
# TimeDistributed Dense
dense_units = hp.Int(name='dense_units', min_value=32, max_value=512, step=32)
dense_activation = hp.Choice(name='dense_activation', values=['selu', 'relu'])
model.add(TimeDistributed(Dense(units=dense_units, activation=dense_activation)))
# TimeDistributed Dense_Output
# Set Learning Rate
hp_learning_rate = hp.Choice(name='learning_rate', values=[1e-2, 1e-3, 1e-4])
# Compile Model
return model
tuner = kt.Hyperband(build,
objective=kt.Objective('root_mean_squared_error', direction='min'),
early_stop = EarlyStopping(monitor='root_mean_squared_error', patience=5)
opt_hps = []
for train_features in scaled_train_features:
tuner.search(train_features, btc_trt_log, epochs=50, callbacks=[early_stop])
models, epochs = ([] for _ in range(2))
for hps in opt_hps:
model = tuner.hypermodel.build(hps)
history = model.fit(train_features, btc_trt_log, epochs=70, verbose=0)
rmse = history.history['root_mean_squared_error']
best_epoch = rmse.index(min(rmse)) + 1
hypermodel = tuner.hypermodel.build(opt_hps[0])
for train_features, epoch in zip(scaled_train_features, epochs): hypermodel.fit(train_features, btc_trt_log, epochs=epoch)
tp1 = hypermodel.predict(btc_tefs1).flatten()
tp2 = hypermodel.predict(btc_tefs2).flatten()
test_predictions = np.concatenate((tp1, tp2[86:]), axis=None)
The hyperparameters of the model are configured using keras_tuner. Because the notebook raised ResourceExhaustedError when training on the full features dataset, sequentially segmented datasets are used instead (and, according to a study that used a similar model architecture, training can be done efficiently this way).
The input dimension of each segmented dataset is (111,32,1).
No issues are reported until the last code block; the models train fine. Yet when .predict() is executed, the notebook prints an error stating that the dimension of the input features used for prediction is incompatible with the dimension of the input features used during training. I do not understand why this occurs since, as far as I know, the input dimensions of a test dataset for a DNN model do not have to be identical to those of the training dataset.
Even though all the price data from 2018 to early 2021 are used as training data, predictions are only needed for the mid-2021 timeframe.
The dataset used for prediction has a dimension of (136,32,1).
I tried matching the dimension of this dataset to (111,32,1) through index slicing.
This then caused a problem with the output dimension: while predictions should be made for 136 data points, the result only returned 10.
Are there any issues with the model configuration? I cannot interpret the current situation.
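To make the shapes concrete, here is a small NumPy sketch of what `np.expand_dims(..., [0, 3])` does to arrays of the sizes mentioned above. The zero arrays are placeholders standing in for the real features, and the variable names are made up for the illustration.

```python
import numpy as np

train_segment = np.zeros((111, 32))   # one training segment
test_block = np.zeros((136, 32))      # full prediction dataset

train_in = np.expand_dims(train_segment, [0, 3])
test_in = np.expand_dims(test_block, [0, 3])
print(train_in.shape)  # (1, 111, 32, 1): one sample with 111 time steps
print(test_in.shape)   # (1, 136, 32, 1): one sample with 136 time steps

# The two overlapping slices used in the code: rows 0..110 and rows 25..135.
tef1 = test_block[:111]
tef2 = test_block[25:]
print(tef1.shape, tef2.shape)  # (111, 32) (111, 32)
```

Seen this way, 111 is the time axis of a single sample rather than a batch size, which may be why a 136-step input is rejected by a model that was built on 111-step inputs.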
I would like to find which is the optimal neural network based on some criteria. The criteria are the following ones:
Test 4 architectures with one, two, three, four hidden layers + output layer
Learning rates to be tested: 0.1,0.01,0.001
Epochs to be tested: 10,50,100
Input dimensions = 20
The output should be a table showing each combination (36 rows). For example, with one hidden layer, lr = 0.1, epochs = 10, the accuracy was X.
Please, see my code below:
#Function to create the model
def create_model(layers,learn_rate):
    model = Sequential()
    for i, nodes in enumerate(layers):
        if i==0:
            model.add(Dense(nodes),input_dim = 20,activation = 'relu')
        else:
            model.add(Dense(nodes),activation = 'relu')
    model.add(Dense(units = 4,activation = 'softmax'))
    model.compile(optimizer=adam(lr=learn_rate), loss='categorical_crossentropy',metrics=['accuracy'])
    return model
#Initialization of variables
#Here there are the four possible types of layers with the neurons in each.
layers = [[20], [40, 20], [45, 30, 15],[32,16,8,4]]
learn_rate = [0.1,0.01,0.001]
epochs = [10,50,100]
#GridSearchCV for hyperparameter tuning
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
model = KerasClassifier(build_fn = create_model, verbose = 0)
param_grid = dict(layers = layers,learn_rate = learn_rate,epochs = epochs)
grid = GridSearchCV(estimator = model, param_grid = param_grid,cv = 3)
grid_result = grid.fit(train_x,train_y)
But when I´m running the code I get the following error:
RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x000001AA272C7748>, as the constructor either does not set or modifies parameter layers
The "Cannot clone object" error is not the main problem; it is a consequence of another error in the model-building function.
You had some errors in create_model(): the keyword arguments were passed to model.add() instead of Dense(). Look at the errors that appeared before the cloning error in your output.
Here is the fixed function:
from keras import optimizers
def create_model(layers, learn_rate):
    model = Sequential()
    for i, nodes in enumerate(layers):
        if i==0:
            # First hidden layer also declares the input dimension;
            # the keyword arguments belong inside Dense(...), not model.add(...)
            model.add(Dense(nodes,input_dim = 20,activation = 'relu'))
        else:
            model.add(Dense(nodes,activation = 'relu'))
    # Output layer: 4 classes
    model.add(Dense(units = 4,activation = 'softmax'))
    model.compile(optimizer=optimizers.Adam(lr=learn_rate), loss='categorical_crossentropy',metrics=['accuracy'])
    return model
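As for the 36-row table the question asks for, the combinations can be enumerated as in the sketch below. It only builds the table skeleton: the column names are made up for the illustration, and the accuracy column is a placeholder to be filled in, for example from `grid_result.cv_results_` once the search has run.

```python
from itertools import product

layers = [[20], [40, 20], [45, 30, 15], [32, 16, 8, 4]]
learn_rate = [0.1, 0.01, 0.001]
epochs = [10, 50, 100]

# One row per combination; "accuracy" stays empty until a model is trained.
rows = [
    {"hidden_layers": len(l), "layers": l, "learn_rate": lr,
     "epochs": e, "accuracy": None}
    for l, lr, e in product(layers, learn_rate, epochs)
]

print(len(rows))  # 4 architectures * 3 learning rates * 3 epoch settings
for row in rows[:3]:
    print(row)
```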
As you can see, I'm stuck with my LSTM model. I'm trying to predict the number of tons to produce per month. When I train the model, the accuracy is almost constant, with only minimal variation.
I tried different combinations of activations, initializers and parameters, and the accuracy doesn't increase.
I don't know whether the problem is my data or my model, or whether this value is the maximum accuracy the model can reach.
Here is the code (if you notice some unused libraries, it's because I made some changes since the first version):
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn import preprocessing
import keras
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
from keras.optimizers import Adam
import warnings
%matplotlib inline
from plotly.offline import iplot
import matplotlib.pyplot as plt
import chart_studio.plotly as py
import plotly.offline as pyoff
import plotly.graph_objs as go
df_ventas = pd.read_csv('/content/drive/My Drive/proyectoPanimex/DEOPE.csv', parse_dates=['Data Emissão'], index_col=0, squeeze=True)
#df_ventas = df_ventas.resample('M').sum().reset_index()
df_ventas = df_ventas.drop(columns= ['weekday', 'month'], axis=1)
df_ventas = df_ventas.reset_index()
df_ventas = df_ventas.rename(columns= {'Data Emissão':'Fecha','Un':'Cantidad'})
df_ventas['dia'] = [x.day for x in df_ventas.Fecha]
df_ventas['mes']=[x.month for x in df_ventas.Fecha]
df_ventas['anio']=[x.year for x in df_ventas.Fecha]
df_ventas = df_ventas[:-48]
df_ventas = df_ventas.drop(columns='Fecha')
df_diff = df_ventas.copy()
df_diff['cantidad_anterior'] = df_diff['Cantidad'].shift(1)
df_diff = df_diff.dropna()
df_diff['diferencia'] = (df_diff['Cantidad'] - df_diff['cantidad_anterior'])
df_supervised = df_diff.drop(['cantidad_anterior'],axis=1)
#adding lags
for inc in range(1,31):
nombre_columna = 'retraso_' + str(inc)
df_supervised[nombre_columna] = df_supervised['diferencia'].shift(inc)
df_supervised = df_supervised.dropna()
df_supervisedNumpy = df_supervised.to_numpy()
train = df_supervisedNumpy
scaler = MinMaxScaler(feature_range=(0, 1))
X_train = scaler.fit(train)
train = train.reshape(train.shape[0], train.shape[1])
train_scaled = scaler.transform(train)
X_train, y_train = train_scaled[:, 1:], train_scaled[:, 0:1]
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
model = Sequential()
act = 'tanh'
actF = 'relu'
model.add(LSTM(200, activation = act, input_dim=34, return_sequences=True ))
model.add(LSTM(200, activation= act))
model.add(Dense(200, activation= act))
model.add(Dense(1, activation= actF))
optimizer = keras.optimizers.Adam(lr=0.00001)
model.compile(optimizer=optimizer, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size = 100,
epochs = 50, verbose = 1)
hist = pd.DataFrame(history.history)
hist['Epoch'] = history.epoch
History plot:
       loss       acc  Epoch
0  0.847146  0.344070      0
1  0.769400  0.344070      1
2  0.703548  0.344070      2
3  0.698137  0.344070      3
4  0.653952  0.344070      4
As you can see, the only value that changes is the loss, but what is going on with the accuracy? I'm just starting out with machine learning and don't yet know enough to spot my own errors. Thanks!
A Dense(1, activation='softmax') will always freeze and not learn anything (softmax over a single unit always outputs 1)
A Dense(1, activation='relu') will very probably freeze and not learn anything (once the pre-activation goes negative, both the output and its gradient are 0)
A Dense(1, activation='sigmoid') is ideal for classification (binary) problems and somewhat good for regression with values between 0 and 1.
A Dense(1, activation='tanh') is somewhat good for regression with values between -1 and 1
A Dense(1, activation='softplus') is somewhat good for regression with values between 0 and +infinity
A Dense(1, activation='linear') is good for regression in general with no limits (but it's highly recommended that the data be normalized beforehand)
For regression, you can't use accuracy. The metrics 'mae' and 'mse' can be used instead, but note that they don't give a "relative" difference; they give the mean "absolute" difference, one linear, the other squared.
Your output activation should be linear for continuous prediction or softmax for classification. Also multiply your learning rate by 100. Your loss should be mean_absolute_error. You could also easily divide the number of LSTM units by a factor of 10. The tanh should be replaced by relu or the like.
As for your accuracy problem, it makes no sense to use accuracy, since you're not trying to classify. For metrics, you can use mae: you're trying to measure how far the prediction is from the actual target on a continuous scale. Accuracy is for categories, not continuous data.
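The difference between these metrics can be seen on a handful of numbers. The sketch below uses made-up targets and predictions, and the "exact match" figure is only a crude stand-in for the kind of question an accuracy metric asks, not what Keras computes internally.

```python
import numpy as np

# Hypothetical scaled targets and predictions, for illustration only.
y_true = np.array([0.20, 0.50, 0.90, 0.40])
y_pred = np.array([0.25, 0.40, 0.90, 0.46])

mae = np.mean(np.abs(y_true - y_pred))               # absolute error, linear
mse = np.mean((y_true - y_pred) ** 2)                # absolute error, squared
mape = np.mean(np.abs((y_true - y_pred) / y_true))   # relative error
exact_match = np.mean(y_true == y_pred)              # fraction of exact hits

print(f"MAE={mae:.4f}  MSE={mse:.6f}  MAPE={mape:.2%}  exact={exact_match:.2f}")
```

The error metrics shrink smoothly as predictions get closer to the targets, whereas a hit-or-miss measure barely moves, which matches the constant accuracy column in the question.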
I have time series data for almost 5 years. Using this data, I want to forecast the next 2 years. How do I do this?
I consulted many websites on this. I noticed that predictions are mostly made on the same data that was used for training; they do not forecast into the future, for example the next 30 days. Is it possible to achieve this with TensorFlow? If so, how?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
dataset_train = pd.read_csv(r'C:\Users\Kavin\source\repos\SampleTensorFlow\SampleTensorFlow\data\traindataset.csv')
training_set = dataset_train.iloc[:, 1:2].values
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
X_train = []
y_train = []
for i in range(60, 2035):
X_train.append(training_set_scaled[i-60:i, 0])
y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
regressor = Sequential()
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(LSTM(units = 50))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
dataset_test = pd.read_csv(r'C:\Users\Kavin\source\repos\SampleTensorFlow\SampleTensorFlow\data\testdataset.csv')
result = dataset_test[['Date','Open']]
real_stock_price = dataset_test.iloc[:, 1:2].values
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
X_test = []
for i in range(60, 76):
X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
result['PredictedResult'] = pd.Series(predicted_stock_price.ravel(), index=result.index)
result.to_csv(r"C:\Users\Kavin\Downloads\PredictedStocks.csv", index=False)
ax = plt.gca()
result.plot(kind='line', x='Date', y='Open', color='red', label = 'Real Stock Price', ax=ax)
result.plot(kind='line', x='Date', y='PredictedResult', color='blue', label = 'Predicted Stock Price', ax=ax)
For any machine learning problem, you want to ask yourself: "What do I want to predict, and what data do I have?"
In your case you want to predict values at some time in the future; let's call that horizon T.
We suppose that your current data is labelled, i.e. for each sample/row (x) you have a corresponding value (y). Let xt be the timestamp of your x data.
If you want to predict y at time xt + T, then you must feed your algorithm data in which, for each sample x, the corresponding label is y at time xt + T.
This way your algorithm will "learn" to predict the value of y at time xt + T from data at time xt.
With pandas, this can be achieved with shift.
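A minimal sketch of that idea, assuming a daily series and a horizon of two rows. The column names `value` and `target` and the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical daily series; "value" stands in for the quantity to forecast.
df = pd.DataFrame(
    {"value": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

T = 2  # forecast horizon in rows (here: days)

# The label of each row is the value T steps later.
df["target"] = df["value"].shift(-T)

# The last T rows have no known future value yet, so drop them for training.
supervised = df.dropna()
print(supervised)
```

The rows dropped at the end are exactly the ones you would feed to the trained model to obtain forecasts for dates beyond the data.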
Time is mostly an abstraction; it is better to think in terms of sequences. To predict the next, as yet unknown, step in a sequence, give the DL model the correct input_shape and pass to predict() the same set of features, with NEW values, that you consider to be the basis for predicting the next moment (e.g. with an encoder-decoder, ED, model).
That said, I still think that an encoder-decoder seq2seq model only produces decoded output that was present in the past (before encoding), and only if the decoder can correctly reconstruct the features from the encoded data (it is not always possible to reconstruct features similar to those that were encoded).
So I still consider the example in TF to be the best for your goal, though I am not sure how adequate the prediction will be (whether it will come true), since DL, like ML based on Bayesian statistics, only gives a likelihood.
If your dependency is continuous in time and you have found, or know, the function that describes it, then of course you can predict any number of steps forward, for any horizon you like, e.g. if you have discovered a trend or cyclicity (e.g. daily, in which case time can be treated as a feature).
Another approach is differencing: a technique that removes the trend and seasonality of a time series in order to make it stationary.
That's all; there is no further mystery to dependency and backpropagation.
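The differencing mentioned above can be sketched in a few lines of NumPy. The series is made up; the point is only that first differences remove a trend and that the original scale can be recovered with a cumulative sum.

```python
import numpy as np

# Made-up trending series, for illustration only.
series = np.array([100.0, 102.0, 105.0, 109.0, 114.0])

# First-order differencing: each value minus the previous one.
diffed = np.diff(series)

# Inverting the transform: start value plus the running sum of differences.
restored = series[0] + np.cumsum(diffed)

print(diffed)
print(restored)
```

A model trained on `diffed` predicts changes rather than levels, so its forecasts need the same cumulative-sum step to get back to the original units.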
I've been working on this neural network with the intent to predict TBA (time based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine, and gives me some predictions, however I'm not quite satisfied with the results. It fails to notice some very obvious correlations that I can clearly see by myself. Here is my current code:
# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
maxi = 0.96
mini = 0.7
# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values
# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]
# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]
# Scale data
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)  # the scaler must be fitted (on the training split) before transform
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]
# Number of stocks in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi
# Session
net = tf.InteractiveSession()
# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])
# Initialize1s
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg",
distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()
# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))
# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))
# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))
# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))
# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)
# Init
net.run(tf.global_variables_initializer())
# Fit neural net
batch_size = 10
mse_train = []
mse_test = []
# Run
epochs = 10
for e in range(epochs):
# Shuffle training data
shuffle_indices = np.random.permutation(np.arange(len(y_train)))
X_train = X_train[shuffle_indices]
y_train = y_train[shuffle_indices]
# Minibatch training
for i in range(0, len(y_train) // batch_size):
start = i * batch_size
batch_x = X_train[start:start + batch_size]
batch_y = y_train[start:start + batch_size]
# Run optimizer with batch
net.run(opt, feed_dict={X: batch_x, Y: batch_y})
# Show progress
if np.mod(i, 50) == 0:
mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))
pred = net.run(out, feed_dict={X: X_test})
I have tried tweaking the number of hidden layers, the number of nodes per layer and the number of epochs, and I have tried different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.
Thanks in advance to anyone who managed to read through all of that.
It would be much easier to help if you shared a small dataset that illustrates the problem. However, I will state some of the issues with non-standard datasets and how to overcome them.
Possible solutions
Regularization and validation-based optimization: methods that are always good to try when looking for some extra accuracy. See dropout methods here (original paper), and an overview here.
Unbalanced data: sometimes the time series categories/events behave like anomalies, or are simply unbalanced. If you read a book, words like "the" or "it" will appear far more often than "warehouse". This can become a problem if your main task is to detect the word "warehouse" and you train your network (even LSTMs) in the traditional way. A way to overcome this is to balance the samples (create balanced datasets) or to give more weight to low-frequency categories.
Model structure: sometimes fully connected layers are not enough. Take computer vision problems, for instance, where we train using convolution layers. The convolution and pooling layers enforce structure on the model that is suitable for images. This is also a form of regularization, since those layers have fewer parameters. In time-series problems, convolutions are also possible and turn out to work just fine. See the example in Conditional Time Series Forecasting with Convolutional Neural Networks.
The above suggestions are presented in the order in which I would suggest trying them.
Good luck!
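One more thing that may be worth checking in the question's code (an observation, not a confirmed diagnosis): `out` is transposed to shape (1, N) while `Y` is fed with shape (N, 1). Elementwise operations broadcast such a pair to (N, N), so the loss would average over every prediction/target pair instead of matching pairs. The NumPy sketch below reproduces the shape arithmetic with made-up numbers.

```python
import numpy as np

y = np.array([[0.70], [0.80], [0.90]])     # targets, shape (3, 1) like Y
pred = np.array([[0.70], [0.80], [0.90]])  # perfect predictions, shape (3, 1)

out = pred.T                               # shape (1, 3), like tf.transpose(...)

broadcast_sq = (out - y) ** 2              # shape (3, 3): all pairs
aligned_sq = (pred - y) ** 2               # shape (3, 1): matching pairs only

print(broadcast_sq.shape, broadcast_sq.mean())
print(aligned_sq.shape, aligned_sq.mean())
```

Even with perfect predictions the broadcast loss is non-zero, and minimising it pushes every prediction towards the mean of the targets. If this applies, dropping the transpose (or feeding `Y` as a flat vector) would be the thing to try.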