Hyperparameter tuning to decide optimal neural network - python

I would like to find the optimal neural network according to the following criteria:
Test 4 architectures with one, two, three, and four hidden layers, each followed by an output layer
Learning rates to be tested: 0.1, 0.01, 0.001
Epochs to be tested: 10, 50, 100
Input dimensions = 20
The output should be a table showing each combination (36 rows). For example: with one hidden layer, lr = 0.1, and epochs = 10, the accuracy was X.
Please see my code below:
#Function to create the model
def create_model(layers,learn_rate):
    model = Sequential()
    for i, nodes in enumerate(layers):
        if i==0:
            model.add(Dense(nodes),input_dim = 20,activation = 'relu')
        else:
            model.add(Dense(nodes),activation = 'relu')
    model.add(Dense(units = 4,activation = 'softmax'))
    model.compile(optimizer=adam(lr=learn_rate), loss='categorical_crossentropy',metrics=['accuracy'])
    return model
#Initialization of variables
#Here there are the four possible types of layers with the neurons in each.
layers = [[20], [40, 20], [45, 30, 15],[32,16,8,4]]
learn_rate = [0.1,0.01,0.001]
epochs = [10,50,100]
#GridSearchCV for hyperparameter tuning
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
model = KerasClassifier(build_fn = create_model, verbose = 0)
param_grid = dict(layers = layers,learn_rate = learn_rate,epochs = epochs)
grid = GridSearchCV(estimator = model, param_grid = param_grid,cv = 3)
grid_result = grid.fit(train_x,train_y)
But when I run the code I get the following error:
RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x000001AA272C7748>, as the constructor either does not set or modifies parameter layers

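"Cannot clone object" is not the main problem. It is a consequence of another error in the model-building function.
You had syntax errors in create_model(): the closing parenthesis after Dense(nodes) comes too early, so input_dim and activation are passed to model.add() instead of Dense(). Please look at the errors that appear before the cloning error in your output.
Here is the fixed function: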
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense

def create_model(layers, learn_rate):
    model = Sequential()
    for i, nodes in enumerate(layers):
        if i==0:
            # input_dim and activation now belong to Dense(), not model.add()
            model.add(Dense(nodes,input_dim = 20,activation = 'relu'))
        else:
            model.add(Dense(nodes,activation = 'relu'))
    model.add(Dense(units = 4,activation = 'softmax'))
    model.compile(optimizer=optimizers.Adam(lr=learn_rate), loss='categorical_crossentropy',metrics=['accuracy'])
    return model
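As a rough sketch of the 36-row table the question asks for, the grid can be enumerated with the standard library before any training happens. Nothing is trained here, so the accuracy field is only a placeholder, and the names rows and hidden_layers are illustrative rather than part of the original code. The layer specs are written as tuples instead of lists. That is an assumption, based on reports that the clone error can persist with list-valued parameters: a deep copy of a list is a new object, while a deep copy of a tuple of ints is the same object, which the last lines check.

```python
import copy
from itertools import product

# Layer specs as tuples (assumption: tuples instead of the original lists).
layers = [(20,), (40, 20), (45, 30, 15), (32, 16, 8, 4)]
learn_rate = [0.1, 0.01, 0.001]
epochs = [10, 50, 100]

# One row per combination; accuracy is a placeholder to fill in after training.
rows = [
    {"hidden_layers": len(spec), "learn_rate": lr, "epochs": ep, "accuracy": None}
    for spec, lr, ep in product(layers, learn_rate, epochs)
]
print(len(rows))  # 4 architectures * 3 learning rates * 3 epoch settings

# Deep-copying a tuple of ints returns the same object; a list is copied.
tuple_spec, list_spec = (40, 20), [40, 20]
same_tuple = copy.deepcopy(tuple_spec) is tuple_spec
same_list = copy.deepcopy(list_spec) is list_spec
print(same_tuple, same_list)
```

After a real grid search, the placeholder could be filled from the mean test scores that GridSearchCV stores in grid_result.cv_results_.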

Related

TF2, Tensorflow Probability random seed generator and VAE

Playing around with Variational Autoencoders for some days. I am trying to fit a small toy function with a small model.
I first implemented the model using the Keras Functional API, with the following code:
def define_tfp_encoder(latent_dim, n_inputs=2, kl_weight=1):
    prior = tfd.MultivariateNormalDiag(loc=tf.zeros(latent_dim))
    input_x = Input((n_inputs,))
    input_c = Input((1,))
    dense = Dense(25, activation='relu', name='tfpenc/dense_1')(input_x)
    dense = Dense(32, activation='relu', name='tfpenc/dense_2')(dense)
    dense_z_params = Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim), name='tfpenc/z_params')(dense)
    dense_z = tfpl.MultivariateNormalTriL(latent_dim, name='tfpenc/z')(dense_z_params)
    #activity_regularizer=tfpl.KLDivergenceRegularizer(prior) # weight=kl_weight
    kld = tfpl.KLDivergenceAddLoss(prior, name='tfpenc/kld_add')(dense_z)
    model = Model(inputs=input_x, outputs=kld)
    return model

def define_tfp_decoder(latent_dim, n_inputs=2):
    input_c = Input((1,), name='tfpdec/cond_input')
    input_n = Input((latent_dim,))
    dense = Dense(15, activation='relu', name='tfpdec/dense_1')(input_n)
    dense = Dense(32, activation='relu', name='tfpdec/dense_2')(dense)
    dense = Dense(tfpl.IndependentNormal.params_size(n_inputs), name='tfpdec/output')(dense)
    output = tfpl.IndependentNormal((n_inputs,))(dense)
    model = Model(input_n, output)
    return model

def get_custom_unconditional_vae():
    latent_size = 5
    encoder = define_tfp_encoder(latent_dim=latent_size)
    decoder = define_tfp_decoder(latent_dim=latent_size)
    encoder.trainable = True
    decoder.trainable = True
    x = encoder.input
    z = encoder.output
    out = decoder(z)
    vae = Model(inputs=x, outputs=out)
    vae.compile(loss=lambda x, pred: -pred.log_prob(x), optimizer='adam')
    return encoder, decoder, vae
The VAE model was then trained for 3000 epochs.
However, it only produced garbage, even though the quadratic function it had to fit is very simple.
Here is the odd part:
When I create the exact same model using the Sequential API, it works as expected and the desired function is approximated nicely.
And it gets even stranger:
After running tf.random.set_seed(None), the model created using the Functional API also works as expected. What am I missing or misunderstanding? I assume there are some differences regarding tf.random.set_seed between the Sequential and the Functional API, but which?
Thanks in advance,
codax
EDIT: I forgot to mention that setting a seed (e.g. tf.random.set_seed(123)) leads to identical results for both models, with neither fitting the desired function.
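For reference, the objective that the code above appears to assemble (the negative log-likelihood passed to compile, plus the KL term added by KLDivergenceAddLoss) can be written as below. The symbols q_phi, p_theta, and p(z) are notation introduced here for the encoder, decoder, and prior; they do not appear in the original post. In training, the expectation is presumably estimated from a single sample of z per input.

```latex
\mathcal{L}(x) \;=\; -\,\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;+\; \mathrm{KL}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big),
\qquad p(z) = \mathcal{N}(0, I)
```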

LSTM produces identical forecast for each input

I've been working on reproducing a CNN-LSTM model for PV power forecasting from the literature for the past four weeks for my master's thesis in Energy Science (http://www.mdpi.com/2076-3417/8/8/1286). However, I've been stuck on a seemingly simple issue: any configuration of LSTM model that I've tried yields one of two things:
Ridiculous output that makes no sense whatsoever (flat line, complete stochasticity, negative values, you name it)
Exactly the same (very believable) PV power forecast for every input.
I've done my best to reproduce the issue with as little code as possible:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential
from tensorflow.python.keras.layers import CuDNNLSTM
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from time import time
SUN_UP, SUN_DOWN = '03:00:00', '23:00:00'
df = pd.read_csv('../Model_Xander/CNN-LSTM-wang/pv_data/all_data_resample-15T_interpolate-4.csv',
                 index_col = 0,
                 parse_dates = True)
df = pd.DataFrame(df['151'])
df = df.between_time(SUN_UP, SUN_DOWN)
TIME_STEPS_PER_DAY = len(df.loc['1-1-2016'])
print('each day consists of ' + str(TIME_STEPS_PER_DAY) + ' time steps of 15 minutes')
df = df.values
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)
df = np.nan_to_num(df_scaled, nan = -1)
#df = np.float16(df)
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index, step):
        indices = range(i-history_size, i)
        data.append(dataset[indices])
        if single_step:
            labels.append(target[i+target_size])
        else:
            labels.append(target[i:i+target_size])
    return np.array(data), np.array(labels)
TRAIN_TEST_SPLIT = round(((2/3)*len(df)))
TARGET_COL = df[:,0]
HISTORY_SIZE = TIME_STEPS_PER_DAY * 10
TARGET_SIZE = TIME_STEPS_PER_DAY
STEP = TIME_STEPS_PER_DAY
x_train, y_train = multivariate_data(df, TARGET_COL, 0, TRAIN_TEST_SPLIT, HISTORY_SIZE, TARGET_SIZE, STEP)
x_test, y_test = multivariate_data(df, TARGET_COL, TRAIN_TEST_SPLIT, None, HISTORY_SIZE, TARGET_SIZE, STEP)
lstm = Sequential()
lstm.add(Input(shape = (x_train.shape[1], x_train.shape[2])))
lstm.add(Masking(mask_value = -1))
lstm.add(LSTM(units = 100,
              kernel_initializer = keras.initializers.Orthogonal(),
              bias_initializer = keras.initializers.Constant(value=0.1),
              return_sequences = True))
lstm.add(LSTM(units = 100,
              kernel_initializer = keras.initializers.Orthogonal(),
              bias_initializer = keras.initializers.Constant(value=0.1),
              return_sequences = False))
lstm.add(Dense(units = 100, activation = 'relu',
               kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.add(Dense(units = y_test.shape[1], activation = 'relu',
               kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.compile(loss = 'mse', optimizer = 'adam')
lstm.summary()
begin = time()
history = lstm.fit(x_train, y_train,
                   epochs = 5,
                   batch_size = 24,
                   validation_data = (x_test, y_test),
                   verbose = 1,
                   shuffle = False)
end = time()
print('it took ' + str(round(end-begin)) + ' seconds to train 5 epochs')
print(history.history)
predict = lstm.predict(x_test)
print(predict.shape)
plt.figure()
for i in range(10, 20):
    plt.plot(predict[i,:])
plt.figure()
for i in range(0, x_test.shape[0]):
    plt.plot(predict[i,:])
The problem is clearly visible in the last plot:
[Figure: plot of 350 predictions overlaid on top of one another]
As you can see, all forecasts are identical, and I have run out of ideas on how to combat this issue.
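One way to put a number on "all forecasts are identical" is to measure how much the predictions vary across samples at each time step. The sketch below uses plain lists. The function forecast_spread and the toy data are hypothetical, standing in for the rows of predict.

```python
from statistics import pstdev

def forecast_spread(predictions):
    """Return the mean, over time steps, of the std. dev. across forecasts."""
    # zip(*predictions) groups the values of all forecasts at the same time step.
    per_step = [pstdev(step_values) for step_values in zip(*predictions)]
    return sum(per_step) / len(per_step)

# Toy data: three identical forecasts vs. three that differ.
identical = [[0.0, 1.0, 0.5]] * 3
varied = [[0.0, 1.0, 0.5], [0.2, 0.8, 0.5], [0.1, 1.2, 0.4]]

print(forecast_spread(identical))  # zero spread: every forecast is the same
print(forecast_spread(varied))     # positive spread: forecasts differ
```

A spread close to zero on the test set would confirm numerically what the overlaid plot shows.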
As far as I could deduce, there are a number of possible causes. First, my dataset contains a large number of NaNs. I've done my best to combat that issue with three methods:
Resampling from very high resolution (10 seconds) to standard resolution (15 minutes)
Interpolating up to 4 consecutive NaNs with linear interpolation (any more seems unjustified to me)
Adding the Masking layer that an observant reader might have noticed in the model definition above
Even after these steps, my dataset still contains a large number of NaNs. I'm not sure what to do about it, or whether the Masking layer is even doing its intended job. I do know that the Masking layer does not work with CuDNNLSTM, and that my normal LSTM model runs a lot slower with the Masking layer.
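To make the gap-filling step concrete, here is a minimal sketch of linear interpolation limited to short NaN runs. It is not the preprocessing actually used for the dataset: the helper fill_short_gaps and the toy series are hypothetical. Note also that this version leaves longer gaps entirely untouched, which may differ from how a library routine with a limit argument behaves.

```python
import math

def fill_short_gaps(values, max_gap=4):
    """Linearly interpolate NaN runs of length <= max_gap that have valid neighbours."""
    out = list(values)
    i, n = 0, len(out)
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i
        while i < n and math.isnan(out[i]):
            i += 1
        gap = i - start
        # Fill only interior gaps that are short enough.
        if start > 0 and i < n and gap <= max_gap:
            left, right = out[start - 1], out[i]
            for k in range(gap):
                out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return out

nan = float("nan")
# A gap of 2 (filled) followed by a gap of 5 (left as NaN).
series = [1.0, nan, nan, 4.0, nan, nan, nan, nan, nan, 10.0]
filled = fill_short_gaps(series)
print(filled)
```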
The best I've been able to accomplish in terms of obtaining differently shaped predictions for differently shaped inputs is this:
[Figure: differently shaped output for differently shaped inputs]
However, as you can see, this is just the same shape with a slightly different amplitude.
Another thing I've noticed is that when I input data from 9 other sensors as features (each with a similar number and location of NaNs), the amplitude changes per prediction (yay), but the shape remains the same across all predictions:
[Figure: different amplitude, but the same shape]
I will be uploading my model to my university's cluster (for the 200th time) to train for more than 5 epochs. Who knows, maybe today is my lucky day. If anyone knows how to combat these issues, I would be very glad and thankful to hear your thoughts.
EDIT:
In light of the lessons learned from the response below, I made the following changes:
Added regularization and dropout to combat overfitting (which, if left unchecked, leads to the average being forecast for every input)
Set return_sequences = True on the last LSTM layer
Added a Flatten layer after the last LSTM layer
Removed NaN values from my dataset, which removes the need for the Masking layer and enables the use of the CuDNNLSTM layer (which trains on the GPU, if I understand it correctly)
However, now that each day has a unique forecast, I noticed that increasing the number of units in the LSTM layer beyond some point between 20 and 50 (I tested 20 and 50) brings back the problem of each day having the exact same forecast. I am still stumped as to why this is. (See below for the model I used to produce unique forecasts for each day.)
lstm = Sequential()
lstm.add(Input(shape = (x_train.shape[1], x_train.shape[2])))
lstm.add(CuDNNLSTM(units = 50,
                   kernel_initializer = keras.initializers.Orthogonal(),
                   kernel_regularizer = keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_initializer = keras.initializers.Constant(value=0.1),
                   return_sequences = True))
#lstm.add(Dropout(rate=0.2))
lstm.add(CuDNNLSTM(units = 50,
                   kernel_initializer = keras.initializers.Orthogonal(),
                   kernel_regularizer = keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_initializer = keras.initializers.Constant(value=0.1),
                   return_sequences = True))
lstm.add(Dropout(rate = 0.2))
lstm.add(Flatten())
lstm.add(Dense(units = int(0.5*x_train.shape[1]), activation = 'relu',
               kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.add(Dropout(rate = 0.2))
lstm.add(Dense(units = y_test.shape[1], activation = 'relu',
               kernel_initializer = keras.initializers.TruncatedNormal(mean=0, stddev=0.05),
               bias_initializer = keras.initializers.Constant(value=0.1)))
lstm.compile(loss = 'mse', optimizer = 'adam')
lstm.summary()
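For readers following the data preparation, the windowing done by multivariate_data in the first listing (with single_step=False and end_index=None) can be illustrated on a toy series. This is a simplified sketch with plain lists, and make_windows is a hypothetical name. Each history window is paired with the target window that immediately follows it.

```python
def make_windows(series, history_size, target_size, step):
    """Pair each history window with the target window that follows it."""
    data, labels = [], []
    # Same bounds as the original: start after one full history window,
    # stop target_size before the end of the series.
    for i in range(history_size, len(series) - target_size, step):
        data.append(series[i - history_size:i])
        labels.append(series[i:i + target_size])
    return data, labels

# Toy series: 10 "time steps", 4 steps of history, 2 steps to forecast.
series = list(range(10))
x, y = make_windows(series, history_size=4, target_size=2, step=2)
for history, target in zip(x, y):
    print(history, "->", target)
```

In the question, history_size is ten days of time steps and both target_size and step are one day, so each sample maps the previous ten days to the following day.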

Output of hidden layer for every epoch and storing that in a list in keras?

I have a Keras MLP with a single hidden layer containing a specific number of nodes. I want to extract the activation values of all neurons in that hidden layer when a batch is passed through, do this at every epoch, and store the results in a list to explore. My setup is as follows.
class myNetwork:
    # Architecture of our neural network.
    def multilayerPerceptron(self, Num_Nodes_hidden,input_features,output_dims,activation_function = 'relu', learning_rate=0.001,
                             momentum_val=0.00):
        model = Sequential()
        model.add(Dense(Num_Nodes_hidden, input_dim =input_features, activation=activation_function))
        model.add(Dense(output_dims,activation='softmax'))
        model.compile(loss = "categorical_crossentropy",
                      optimizer=SGD(lr = learning_rate, momentum = momentum_val),
                      metrics=['accuracy'])
        return model
Below is the call from another part of my code, where I use LambdaCallback to save the weights. I want something similar, but this time saving the actual activation values of the hidden layer.
from keras.callbacks import LambdaCallback
import pickle
from keras.callbacks import ModelCheckpoint
from keras.callbacks import CSVLogger
# setting_parameters and calling inputs.
val = myNetwork()
vals = val.multilayerPerceptron(8,4,3,'relu',0.01)
batch_size_val = 20
number_iters = 200
weights_ih = []
weights_ho = []
activation_vals = []
get_activtaion = LambdaCallback(on_epoch_end=lambda batch, logs: activation_vals.append("What should I put Here"))
print_weights = LambdaCallback(on_epoch_end=lambda batch, logs: weights_ih.append(vals.layers[0].get_weights()))
print_weights_1 = LambdaCallback(on_epoch_end=lambda batch, logs: weights_ho.append(vals.layers[1].get_weights()))
history_callback = vals.fit(X_train, Y_train,
                            batch_size=batch_size_val,
                            epochs=number_iters,
                            verbose=0,
                            validation_data=(X_test, Y_test),
                            callbacks = [csv_logger,print_weights,print_weights_1,get_activtaion])
I am confused about what I should put in the get_activtaion callback. Please let me know what should go there so that I get the activation values for all the samples of the batch under the weights of that epoch.
weights_callback to get weights of each layer:
weights_list = [] #[epoch][layer][unit(l-1)][unit(l)]
def save_weights(model):
    inner_list = []
    for layer in model.layers:
        inner_list.append(layer.get_weights()[0])
    weights_list.append(inner_list)
weights_callback = LambdaCallback(on_epoch_end = lambda batch, logs:save_weights(model))
activations_callback to get output of each layer:
from keras import backend as K
activations_list = [] #[epoch][layer][0][X][unit]
def save_activations(model):
    # X_input_vectors is the array of input samples to evaluate; it is not defined in this snippet
    outputs = [layer.output for layer in model.layers]
    functors = [K.function([model.input],[out]) for out in outputs]
    layer_activations = [f([X_input_vectors]) for f in functors]
    activations_list.append(layer_activations)
activations_callback = LambdaCallback(on_epoch_end = lambda batch, logs:save_activations(model))
apply callbacks:
result = model.fit(... , callbacks = [weights_callback, activations_callback], ...)
Refs:
Keras: Interpreting the output of get_weights()
Keras, How to get the output of each layer?
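Conceptually, what the activations callback records for a Dense hidden layer with ReLU is relu(X·W + b), evaluated on the chosen inputs with the weights as they stand at the end of that epoch. The sketch below computes this by hand with plain lists, to show the [sample][unit] layout of one stored entry. The weights, the inputs, and the helper dense_relu are made up for illustration and are not part of Keras.

```python
def dense_relu(batch, weights, bias):
    """Return relu(x @ W + b) for each sample x in batch."""
    activations = []
    for x in batch:
        row = []
        for j in range(len(bias)):
            z = sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            row.append(max(0.0, z))  # ReLU clips negative pre-activations to 0
        activations.append(row)
    return activations

# Two samples with 2 features each; hidden layer with 3 units.
batch = [[1.0, 2.0], [0.0, -1.0]]
weights = [[1.0, -1.0, 0.5], [0.0, 1.0, 0.5]]  # indexed [input][unit]
bias = [0.0, 0.0, -1.0]

hidden = dense_relu(batch, weights, bias)
print(hidden)  # one row per sample, one column per hidden unit
```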

Grid Search for Keras with multiple inputs

I am trying to do a grid search over my hyperparameters to tune a deep learning architecture. My model takes multiple inputs, and I am trying to use sklearn's grid search API. The problem is that the grid search API only accepts a single array as input, and the code fails when it checks the dimensions of the data. (My input has shape 5 * number of data points, while the sklearn API expects number of data points * feature dimension.) My code looks something like this:
from keras.layers import Concatenate, Reshape, Input, Embedding, Dense, Dropout
from keras.models import Model
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def model(hyparameters):
    a = Input(shape=(1,))
    b = Input(shape=(1,))
    c = Input(shape=(1,))
    d = Input(shape=(1,))
    e = Input(shape=(1,))
    # Some operations and I get a single output --> out
    model = Model([a, b, c, d, e], out)
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

k_model = KerasClassifier(build_fn=model, epochs=150, batch_size=512, verbose=2)
# define the grid search parameters
param_grid = ...  # hyperparameter options dict (elided in the original)
grid = GridSearchCV(estimator=k_model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit([a_input, b_input, c_input, d_input, e_input], encoded_outputs)
This is a workaround for using GridSearchCV with a Keras model that has multiple inputs. The trick is to merge all the inputs into a single array. I create a dummy model that receives a SINGLE input and then splits it into the desired parts using Lambda layers. The procedure can easily be adapted to your own data structure.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Concatenate, Dense
from tensorflow.keras.models import Model
from sklearn.model_selection import GridSearchCV

def createMod(optimizer='Adam'):
    combi_input = Input((3,)) # (None, 3)
    a_input = Lambda(lambda x: tf.expand_dims(x[:,0],-1))(combi_input) # (None, 1)
    b_input = Lambda(lambda x: tf.expand_dims(x[:,1],-1))(combi_input) # (None, 1)
    c_input = Lambda(lambda x: tf.expand_dims(x[:,2],-1))(combi_input) # (None, 1)
    ## do something
    c = Concatenate()([a_input, b_input, c_input])
    x = Dense(32)(c)
    out = Dense(1,activation='sigmoid')(x)
    model = Model(combi_input, out)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

## recreate multiple inputs
n_sample = 1000
a_input, b_input, c_input = [np.random.uniform(0,1, n_sample) for _ in range(3)]
y = np.random.randint(0,2, n_sample)
## merge inputs
combi_input = np.stack([a_input, b_input, c_input], axis=-1)
model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=createMod, verbose=0)
batch_size = [10, 20]
epochs = [10, 5]
optimizer = ['adam','SGD']
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(combi_input, y)
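The merge-then-split idea can be seen without Keras. In the sketch below (plain lists, made-up toy values), separate per-input columns are stacked into rows of shape (n_samples, n_inputs), which is the layout scikit-learn expects. They are then split back into columns, which is the job the Lambda layers do inside the model.

```python
# Three separate inputs, one value per sample (toy data).
a_input = [0.1, 0.2, 0.3, 0.4]
b_input = [1.0, 2.0, 3.0, 4.0]
c_input = [10, 20, 30, 40]

# Merge: one row per sample, one column per input -> (n_samples, 3).
combi_input = [list(row) for row in zip(a_input, b_input, c_input)]

# Split: recover each input as its own column, like x[:, k] in the model.
a_back = [row[0] for row in combi_input]
b_back = [row[1] for row in combi_input]
c_back = [row[2] for row in combi_input]

print(len(combi_input), len(combi_input[0]))  # samples, inputs per sample
```

Because the first axis is now the sample axis, cross-validation can slice rows without separating a sample from its other inputs.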

Keras: Wrong Number of Training Epochs

I'm trying to build a class to quickly initialize and train an autoencoder for rapid prototyping. One thing I'd like to be able to do is quickly adjust the number of epochs I train for. However, it seems like no matter what I do, the model trains each layer for 100 epochs! I'm using the tensorflow backend.
Here is the code from the two offending methods.
def pretrain(self, X_train, nb_epoch = 10):
    data = X_train
    for ae in self.pretrains:
        ae.fit(data, data, nb_epoch = nb_epoch)
        ae.layers[0].output_reconstruction = False
        ae.compile(optimizer='sgd', loss='mse')
        data = ae.predict(data)
.........
def fine_train(self, X_train, nb_epoch):
    weights = [ae.layers[0].get_weights() for ae in self.pretrains]
    dims = self.dims
    encoder = containers.Sequential()
    decoder = containers.Sequential()
    ## add special input encoder
    encoder.add(Dense(output_dim = dims[1], input_dim = dims[0],
                      weights = weights[0][0:2], activation = 'linear'))
    ## add the rest of the encoders
    for i in range(1, len(dims) - 1):
        encoder.add(Dense(output_dim = dims[i+1],
                          weights = weights[i][0:2], activation = self.act))
    ## add the decoders from the end
    decoder.add(Dense(output_dim = dims[len(dims) - 2], input_dim = dims[len(dims) - 1],
                      weights = weights[len(dims) - 2][2:4], activation = self.act))
    for i in range(len(dims) - 2, 1, -1):
        decoder.add(Dense(output_dim = dims[i - 1],
                          weights = weights[i-1][2:4], activation = self.act))
    ## add the output layer decoder
    decoder.add(Dense(output_dim = dims[0],
                      weights = weights[0][2:4], activation = 'linear'))
    masterAE = AutoEncoder(encoder = encoder, decoder = decoder)
    masterModel = models.Sequential()
    masterModel.add(masterAE)
    masterModel.compile(optimizer = 'sgd', loss = 'mse')
    masterModel.fit(X_train, X_train, nb_epoch = nb_epoch)
    self.model = masterModel
Any suggestions on how to fix the problem would be appreciated. My original suspicion was that it was something to do with tensorflow, so I tried running with the theano backend but encountered the same problem.
Here is a link to the full program.
According to the Keras documentation, the fit method uses a default of 100 training epochs (nb_epoch=100):
fit(X, y, batch_size=128, nb_epoch=100, verbose=1, callbacks=[], validation_split=0.0, validation_data=None, shuffle=True, show_accuracy=False, class_weight=None, sample_weight=None)
I'm not sure how you are running these methods, but following the "Typical usage" from the original code, you should be able to run something like the following (adjusting the variable num_epoch as required):
#Typical usage:
num_epoch = 10
ae = JPAutoEncoder(dims)
ae.pretrain(X_train, nb_epoch = num_epoch)
ae.train(X_train, nb_epoch = num_epoch)
ae.predict(X_val)
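The effect of such a default can be shown with a stand-in function. The sketch below uses a hypothetical fit and two hypothetical wrappers (none of them the Keras API). If a wrapper does not forward its own nb_epoch argument, the callee silently falls back to its default of 100. Whether this is what happens in the linked program is an assumption; the snippet only illustrates the mechanism.

```python
def fit(data, nb_epoch=100):
    """Stand-in for a training call; returns how many epochs it would run."""
    return nb_epoch

def train_forwarding(data, nb_epoch=10):
    # The caller's value is passed through explicitly.
    return fit(data, nb_epoch=nb_epoch)

def train_not_forwarding(data, nb_epoch=10):
    # nb_epoch is accepted but never passed on, so fit() uses its default.
    return fit(data)

print(train_forwarding([], nb_epoch=5))
print(train_not_forwarding([], nb_epoch=5))
```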
