How can I tune neural network architecture using KerasTuner? - python

I'm trying to use KerasTuner to automatically tune the neural network architecture, i.e., the number of hidden layers and the number of nodes in each hidden layer. Currently, the neural network architecture is defined using one parameter NN_LAYER_SIZES. For example,
NN_LAYER_SIZES = [128, 128, 128, 128]
indicates the NN has 4 hidden layers and each hidden layer has 128 nodes.
KerasTuner has the following hyperparameter types (https://keras.io/api/keras_tuner/hyperparameters/):
Int
Float
Boolean
Choice
It seems none of these hyperparameter types fits my use case. So I wrote the following code to scan the number of hidden layers and the number of nodes. However, it's not been recognized as a hyperparameter.
number_of_hidden_layer = hp.Int("layer_number", min_value=2, max_value=5, step=1)
number_of_nodes = hp.Int("node_number", min_value=4, max_value=8, step=1)
NN_LAYER_SIZES = [2**number_of_nodes for _ in range(number_of_hidden_layer)]
Any suggestions on how to make it right?

Maybe treat the number of layers as a hyperparameter by iterating through it when building your model. That way you can experiment with different numbers of layers combined with different numbers of nodes:
import tensorflow as tf
import keras_tuner as kt
def model_builder(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    units = hp.Int('units', min_value=32, max_value=512, step=32)
    layers = hp.Int('layers', min_value=2, max_value=5, step=1)
    for _ in range(layers):
        model.add(tf.keras.layers.Dense(units=units, activation='relu'))
    model.add(tf.keras.layers.Dense(10))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
(img_train, label_train), (_, _) = tf.keras.datasets.fashion_mnist.load_data()
img_train = img_train.astype('float32') / 255.0
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3)
tuner.search(img_train, label_train, epochs=50, validation_split=0.2)
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)
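To get a feel for how large this search space is, the two hp.Int ranges above can be enumerated in plain Python. This is only a back-of-the-envelope sketch: int_range is a made-up helper, and it assumes max_value is inclusive, as the KerasTuner documentation describes for hp.Int.

```python
from itertools import product


def int_range(min_value, max_value, step):
    """Values an integer hyperparameter can take, max_value inclusive (hypothetical helper)."""
    return list(range(min_value, max_value + 1, step))


units_values = int_range(32, 512, 32)   # width shared by all hidden layers
layers_values = int_range(2, 5, 1)      # number of hidden layers

# Every (number of layers, units per layer) pair the tuner could propose.
configs = list(product(layers_values, units_values))
print(len(units_values), len(layers_values), len(configs))
```

Hyperband samples configurations from this space rather than enumerating all of them, so the count is mainly useful for judging how coarse or fine the ranges are.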

If you want more control and versatility in your architecture tuning, I recommend you check out my answer to "Keras Tuner: select number of units conditional on number of layers". The intuition is to define one hyperparameter for the number of nodes in each layer individually, like so:
neurons_first_layer = hp.Choice('neurons_first_layer', [16,32,64,128])
neurons_second_layer = hp.Choice('neurons_second_layer', [0,16,32,64,])
I implemented the build function such that a layer with 0 nodes vanishes entirely. That way, if neurons_second_layer = 0, the ANN has no second layer.
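The linked build function is not reproduced here, so the following is only a rough sketch of the "0 nodes means no layer" idea, not that answer's actual code. hidden_layer_sizes and PresetHP are hypothetical names; PresetHP just stands in for the hp object so the snippet runs without TensorFlow installed.

```python
def hidden_layer_sizes(hp):
    """Sample one width per layer; a width of 0 means the layer is skipped."""
    first = hp.Choice('neurons_first_layer', [16, 32, 64, 128])
    second = hp.Choice('neurons_second_layer', [0, 16, 32, 64])
    return [n for n in (first, second) if n > 0]


class PresetHP:
    """Hypothetical stand-in for keras_tuner.HyperParameters, for demonstration only."""

    def __init__(self, preset):
        self.preset = preset

    def Choice(self, name, values):
        value = self.preset[name]
        if value not in values:
            raise ValueError(f"{value} is not a valid choice for {name}")
        return value


one_layer = hidden_layer_sizes(PresetHP({'neurons_first_layer': 64, 'neurons_second_layer': 0}))
two_layers = hidden_layer_sizes(PresetHP({'neurons_first_layer': 64, 'neurons_second_layer': 32}))
print(one_layer, two_layers)
```

Inside a real model builder you would loop over the returned list and add one Dense layer per entry.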

Related

Surrogate model for [parameter vector] to [time series]

Say I have a function F that takes in a parameter vector P (say, a 5-element vector), and produces a (numerical) time series Y[t] of length T (eg T=100, so t=1,...,100). The function could be complicated (eg enzyme reaction models)
I want to make a neural network that predicts the output (Y[t]) that would result from feeding a new parameter set (P') into the function. How can this be done?
A simple feed-forward network can work, but it requires a very large number of output nodes, and doesn't take into account the temporal correlation / relationships between points. Is it possible/better to use a RNN or Transformer instead?
An RNN might work for you. Here is some example code in Keras to get you started:
import tensorflow as tf
param_length = 5
time_length = 100
hidden_size = 20
model = tf.keras.Sequential([
    # Encode input parameters.
    tf.keras.layers.Dense(hidden_size, input_shape=[param_length]),
    # Generate a sequence.
    tf.keras.layers.RepeatVector(time_length),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
])
model.compile(loss="mse", optimizer="nadam")
# train_x has shape (n_samples, param_length); the model outputs (n_samples, time_length, 1).
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=10)
The first Dense layer converts the input parameters to a hidden state. The LSTM units then generate the time sequence. You will need to experiment with hyperparameters such as the number of Dense and LSTM layers, the size of the hidden layers, etc.
One more thing you can try is a different loss function, for example:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)
model.compile(loss=tf.keras.losses.Huber(), optimizer="nadam", metrics=["mae"])
history = model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500,
                    callbacks=[early_stopping_cb])
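To see why swapping MSE for Huber can matter, here is a small plain-Python comparison on a made-up list of residuals. It is a sketch of the standard Huber formula with delta = 1.0 (which I believe is the Keras default), not a call into Keras itself.

```python
def mse(errors):
    """Mean squared error over a list of residuals."""
    return sum(e * e for e in errors) / len(errors)


def huber(errors, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear beyond delta."""
    def one(e):
        a = abs(e)
        return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)
    return sum(one(e) for e in errors) / len(errors)


# Two small residuals and one outlier (illustrative values only).
errors = [0.1, -0.2, 5.0]
mse_value = mse(errors)
huber_value = huber(errors)
print(mse_value, huber_value)
```

The single outlier dominates the MSE far more than the Huber loss, which is the usual argument for trying Huber on noisy time series.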

Image sequence detection with Keras, Convolutional and Stateful Neural Network

I am trying to write a pretty complicated neural network (at least for me) in keras that needs to combine both a common CNN structure and an LSTM/GRU layer.
Basically, I have a dataset of climatological maps of the Mediterranean sea, each map details the wind, pressure and other parameters. I am studying Medicanes (Mediterranean hurricanes) and my goal is to create a neural network that can classify each map with a label zero if there is no trace of such hurricanes or one if the map contains one.
In order to achieve that I need a network with two parts:
feature extractor (normal CNN).
temporal layer (LSTM/GRU).
The main cause of this is that each map is correlated with the previous one because the formation and life cycle of a Medicane can take several days to complete.
Important note: the dataset is too big to be uploaded all at once so I have to work one batch at a time.
I am working with Keras and I found it pretty challenging to adapt its standard framework to my needs so I have come up with some peculiar flow to feed my data into the network.
In particular, I found it hard to pass both my batch size and my time-step parameter to the GRU layer using a more standard alternative.
This is what I tried:
I am positively sure I have overcomplicated the task, but, as I said I am not very proficient with Keras and TensorFlow.
The main problem was that I could not find a way to import the data both in a batch (for RAM reasons) and in a sequence of 10-15 pictures (to be used as the time steps in the GRU layer).
I solved this problem by importing batches of 120 maps in order (no shuffle). I then turn these batches into the sequences of images I need, re-batch the sequences, and feed them to the model manually.
Data Import
batch_size=120
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
"./Figures_1/Train",
validation_split=None,
subset=None,
labels="inferred",
label_mode="binary",
color_mode="rgb",
interpolation='bilinear',
batch_size=batch_size,
image_size=(600, 600),
shuffle=False,
seed=123
)
Get a sequence of Images
Here, I break down the 120 map batches into sequences of 60 observations, and I return each sequence one at a time.
sequence_lengh=60
def sequence_x(train_dataset):
    x_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))),dtype=object)
    for element in range(0,x_numpy.shape[0]):
        for i in range(0, x_numpy.shape[0],sequence_lengh):
            x_seq = x_numpy[element][i:i+sequence_lengh]
            yield x_seq
def sequence_y(train_dataset):
    y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))),dtype=object)
    for element in range(0,y_numpy.shape[0]):
        for i in range(0, y_numpy.shape[0],sequence_lengh):
            y_seq = y_numpy[element][i:i+sequence_lengh]
            yield y_seq
CNN Model
I build the CNN model based on a pre-trained DenseNet
from keras.layers import TimeDistributed, GRU
def build_convnet(shape=(600, 600, 3)):
    inputs = keras.Input(shape = shape)
    x = inputs
    # preprocessing
    x = keras.applications.densenet.preprocess_input(x)
    #Convbase
    x = convBase(x)
    x = layers.Flatten()(x)
    # Fine tuning
    x = keras.layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    x = keras.layers.Dense(512, activation='relu')(x)
    x = keras.layers.GlobalMaxPool2D()
    return x
GRU Model
I build the time part of the network with a GRU layer
def action_model(shape=(15, 600, 600, 3), nbout=15):
    # Create our convnet with (112, 112, 3) input shape
    convnet = build_convnet(shape[1:]) #[1:]
    # then create our final model
    model = keras.Sequential()
    # add the convnet with (5, 112, 112, 3) shape
    model.add(TimeDistributed(convnet, input_shape=shape))
    # here, you can also use GRU or LSTM
    model.add(GRU(64))
    # and finally, we make a decision network
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(15, activation='softmax'))
    return model
Transfer Learning
I retrain a part of the GRU
convBase = DenseNet121(include_top=False, weights=None, input_shape=(600,600,3), pooling="avg")
for layer in convBase.layers:
    if 'conv5' in layer.name:
        layer.trainable = True
for layer in convBase.layers:
    if 'conv4' in layer.name:
        layer.trainable = True
Model Compile
Model compilation ( image size= 600x600x3)
INSHAPE=(15, 600, 600, 3) # (5, 112, 112, 3)
model = action_model(INSHAPE, 1)
optimizer = keras.optimizers.Adam(0.001)
model.compile(
optimizer,
'categorical_crossentropy',
metrics='accuracy'
)
Model Fit
Here I manually batch my data. I turn an array (60, 600, 600, 3) into a (4,15,600,600) array. Meaning 4 batches each one containing a 15-map long sequence.
epochs = 10
for value in range(0, epochs):
    train_x, train_y = sequence_x(train_ds), sequence_y(train_ds)
    val_x, val_y = sequence_x(validation_ds), sequence_y(validation_ds)
    for i in range(0,278): #
        x = next(train_x, "none")
        y = next(train_y, "none")
        if (x!="none" or y!="none"):
            if (np.any(x) and np.any(y)):
                x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
                y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))
                y_stack=y_stack.reshape(4,15)
                model.fit(x=x_stack, y=y_stack,
                          validation_data=None,
                          batch_size=None,
                          shuffle=False
                          )
            else:
                continue
        else:
            continue
The idea is to get a model that, when presented with a sequence of images, can categorize each one of them with a 0 or a 1 if they have a Medicane or not.
The model does compile without any errors but the results it provides are horrible:
What am I doing incorrectly? Is there a more effective way to write all of this?
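On the manual batching step described above (turning 60 ordered maps into 4 sequences of 15), the grouping can be written without hard-coded slice boundaries. This is a plain-Python sketch on a list of frame indices; split_into_sequences is a hypothetical helper, not part of Keras or TensorFlow.

```python
def split_into_sequences(items, sequence_length):
    """Split an ordered list into consecutive, non-overlapping sequences.

    Any leftover items that do not fill a whole sequence are dropped.
    """
    n_full = len(items) // sequence_length
    return [items[i * sequence_length:(i + 1) * sequence_length] for i in range(n_full)]


frames = list(range(60))                      # stand-in for 60 ordered maps
sequences = split_into_sequences(frames, 15)  # 4 sequences of 15 time steps
print(len(sequences), [seq[0] for seq in sequences])
```

With a contiguous NumPy array of shape (60, 600, 600, 3), a reshape to (4, 15, 600, 600, 3) should produce the same grouping as stacking the four slices by hand.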

Is there a rule to find and set the number of neurons for the hidden layers of DNN?

My situation is that: multiclass classification problem, with 5 features (columns in my data), 15 classes, single label.
My model is : one input layer with 5 neurons, just one hidden layer with ReLU, and one output layer with softmax.
I have two questions:
How many neurons for the input layer? Is it certain that it is set according to the number of features plus bias? I tried tweaking the number of neurons in the input layer, say 77 neurons, the performance improved so I am confused.
I tried a randomized search to find the number of hidden layers, the number of neurons, and the learning rate. I used RandomizedSearchCV in scikit-learn, and best_params then displays something like this:
{'learning_rate': 0.0023716395806862335, 'n_layer': 1, 'n_neurons': 291}
So the question is: if best_params showed 'n_layer': 2 and 'n_neurons': 291, should that be interpreted as 2 hidden layers with 291 neurons in each layer?
Thank you in advance!
To answer your first question: the input layer's shape is set by the number of features. Your problem has 5 features, so the input layer needs to be 5. In my example there are 784 features, so the input layer shape is 784.
Yes, there is a systematic way to find the number of neurons in a layer for a DNN. I highly recommend Keras Tuner. KerasTuner finds the best hyperparameter values for your models with the Bayesian Optimization, Hyperband, and Random Search algorithms. I wrote an example on the fashion_mnist dataset with the model you describe in your question. I use epochs=2; you can run this search with more epochs for your problem. For this problem, KerasTuner finds that the best number of neurons for the first layer is 416 (<- you want to find this) and the best learning_rate is 0.001.
# !pip install -q -U keras-tuner
import tensorflow as tf
import keras_tuner as kt
(img_train, label_train), (img_test, label_test) = tf.keras.datasets.fashion_mnist.load_data()
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_train = img_train.reshape(60000, -1)
img_test = img_test.astype('float32') / 255.0
img_test = img_test.reshape(10000, -1)
label_train = tf.keras.utils.to_categorical(label_train, 10)
label_test = tf.keras.utils.to_categorical(label_test, 10)
def model_builder(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(784,)))
    # Tune the number of units in the first Dense layer
    # Choose an optimal value between 32-512
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(tf.keras.layers.Dense(units=hp_units, activation='relu'))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))
    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss = 'categorical_crossentropy', metrics = ['accuracy'])
    return model
tuner = kt.Hyperband(model_builder,objective='val_accuracy',max_epochs=3,
factor=3,directory='my_dir',project_name='intro_to_kt')
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
tuner.search(img_train, label_train, epochs=2, validation_split=0.2, callbacks=[stop_early])
# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"BEST num neurons for Dense Layer : {best_hps.get('units')}")
print(f"BEST learning_rate : {best_hps.get('learning_rate')}")
Output:
Trial 11 Complete [00h 00m 14s]
val_accuracy: 0.8530833125114441
Best val_accuracy So Far: 0.8823333382606506
Total elapsed time: 00h 01m 03s
INFO:tensorflow:Oracle triggered exit
BEST num neurons for Dense Layer : 416 # <- You want this
BEST learning_rate : 0.001
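On the second question, how best_params maps to an architecture depends on how the build function passed to RandomizedSearchCV was written. The sketch below assumes the common pattern where n_neurons is reused for every hidden layer, and counts the resulting trainable parameters. dense_param_count is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers.

    layer_sizes = [n_inputs, hidden_1, ..., n_outputs]
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


n_features, n_classes = 5, 15
best_params = {'n_layer': 2, 'n_neurons': 291}

# Assumption: the same width is used for each of the n_layer hidden layers.
hidden = [best_params['n_neurons']] * best_params['n_layer']
sizes = [n_features] + hidden + [n_classes]
print(sizes, dense_param_count(sizes))

# The fashion_mnist example above: 784 inputs, 416 hidden units, 10 classes.
print(dense_param_count([784, 416, 10]))
```

The first entry of sizes is fixed by the data (5 features), which is why only the hidden widths are worth tuning.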

Make fixed timestep length LSTM Keras model free timestep length

I have a Keras LSTM multitask model that performs two tasks. One is a sequence tagging task (so I predict a label per token). The other is a global classification task over the whole sequence using a CNN that is stacked on the hidden states of the LSTM.
In my setup (don't ask why) I only need the CNN task during training, but the labels it predicts have no use in the final product. In Keras, one can train an LSTM model without specifying the input sequence length, like this:
l_input = Input(shape=(None,), dtype="int32", name=input_name)
However, if I add the CNN stacked on the LSTM hidden states I need to set a fixed sequence length for the model.
l_input = Input(shape=(timesteps_size,), dtype="int32", name=input_name)
The problem is that once I have trained the model with a fixed timestep_size I can no longer use it to predict longer sequences.
In other frameworks this is not a problem. But in Keras, I cannot get rid of the CNN and change the expected input shape of the model once it has been trained.
Here is a simplified version of the model
l_input = Input(shape=(timesteps_size,), dtype="int32")
l_embs = Embedding(len(input.keys()), 100)(l_input)
l_blstm = Bidirectional(GRU(300, return_sequences=True))(l_embs)
# Sequential output
l_out1 = TimeDistributed(Dense(len(labels.keys()),
activation="softmax"))(l_blstm)
# Global output
conv1 = Conv1D( filters=5 , kernel_size=10 )( l_embs )
conv1 = Flatten()(MaxPooling1D(pool_size=2)( conv1 ))
conv2 = Conv1D( filters=5 , kernel_size=8 )( l_embs )
conv2 = Flatten()(MaxPooling1D(pool_size=2)( conv2 ))
conv = Concatenate()( [conv1,conv2] )
conv = Dense(50, activation="relu")(conv)
l_out2 = Dense( len(global_labels.keys()) ,activation='softmax')(conv)
model = Model(input=input, output=[l_out1, l_out2])
optimizer = Adam()
model.compile(optimizer=optimizer,
loss="categorical_crossentropy",
metrics=["accuracy"])
I would like to know if anyone here has faced this issue, and if there are any solutions to delete layers from a model after training and, more important, how to reshape input layer sizes after training.
Thanks
The variable timestep length is a problem not because of the convolution layers (in fact, a good thing about convolution layers is that they do not depend on the input size) but because of the Flatten layers, which need an input of a specified size. You can use global pooling layers instead. Further, I think stacking convolution and pooling layers on top of each other might give a better result than using two separate convolution layers and merging them (although this depends on the specific problem and dataset you are working on). Considering these two points, it might be better to write your model like this:
# Global output
conv1 = Conv1D(filters=16, kernel_size=5)(l_embs)
conv1 = MaxPooling1D(pool_size=2)(conv1)
conv2 = Conv1D(filters=32, kernel_size=5)(conv1)
conv2 = MaxPooling1D(pool_size=2)(conv2)
gpool = GlobalAveragePooling1D()(conv2)
x = Dense(50, activation="relu")(gpool)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(x)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
You may need to tune the number of conv+maxpool layers, number of filters, kernel size and even add dropout or batch normalization layers.
As a side note, using TimeDistributed on a Dense layer is redundant as the Dense layer is applied on the last axis.
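The Flatten versus global pooling point can be checked with a little shape arithmetic. The sketch below assumes the Keras defaults for the layers in the suggested model ('valid' padding, stride 1 for Conv1D, pooling stride equal to pool_size). The function names are made up for illustration.

```python
def conv1d_len(length, kernel_size):
    """Output length of a Conv1D with 'valid' padding and stride 1."""
    return length - kernel_size + 1


def maxpool1d_len(length, pool_size=2):
    """Output length of MaxPooling1D with stride equal to pool_size and 'valid' padding."""
    return length // pool_size


def feature_sizes(timesteps, filters=32):
    """Features reaching the Dense layer after conv(5) -> pool -> conv(5) -> pool."""
    length = maxpool1d_len(conv1d_len(timesteps, 5))
    length = maxpool1d_len(conv1d_len(length, 5))
    flatten_features = length * filters   # grows with the sequence length
    global_pool_features = filters        # independent of the sequence length
    return flatten_features, global_pool_features


for t in (60, 100):
    print(t, feature_sizes(t))
```

Because the Dense layer's weight matrix is sized by its input features, only the global pooling variant yields the same shape for every sequence length.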

How to decide the size of layers in Keras' Dense method?

Below is a simple example of a multi-class classification task with the Iris data.
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.regularizers import l2
from keras.utils import np_utils
#np.random.seed(1335)
# Prepare data
iris = sns.load_dataset("iris")
iris.head()
X = iris.values[:, 0:4]
y = iris.values[:, 4]
# Make test and train set
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.5, random_state=0)
################################
# Evaluate Keras Neural Network
################################
# Make ONE-HOT
def one_hot_encode_object_array(arr):
    '''One hot encode a numpy array of objects (e.g. strings)'''
    uniques, ids = np.unique(arr, return_inverse=True)
    return np_utils.to_categorical(ids, len(uniques))
train_y_ohe = one_hot_encode_object_array(train_y)
test_y_ohe = one_hot_encode_object_array(test_y)
model = Sequential()
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
# Actual modelling
# If you increase the epoch the accuracy will increase until it drop at
# certain point. Epoch 50 accuracy 0.99, and after that drop to 0.977, with
# epoch 70
hist = model.fit(train_X, train_y_ohe, verbose=0, nb_epoch=100, batch_size=1)
score, accuracy = model.evaluate(test_X, test_y_ohe, batch_size=16, verbose=0)
print("Test fraction correct (NN-Score) = {:.2f}".format(score))
print("Test fraction correct (NN-Accuracy) = {:.2f}".format(accuracy))
My question is how do people usually decide the size of layers?
For example based on code above we have:
model.add(Dense(16, input_shape=(4,),
activation="tanh",
W_regularizer=l2(0.001)))
model.add(Dense(3, activation='sigmoid'))
Here the first parameter of Dense is 16 in the first layer and 3 in the second.
Why do the two layers use different values for Dense?
How do we choose the best value for Dense?
Basically it is just trial and error. These values are called hyperparameters and should be tuned on a validation set (split your original data into train/validation/test).
Tuning just means trying different combinations of parameters and keeping the one with the lowest loss or the best accuracy on the validation set, depending on the problem.
There are two basic methods:
Grid search: For each parameter, decide on a range and the steps within that range, like 8 to 64 neurons in powers of two (8, 16, 32, 64), and try each combination of the parameters. This obviously requires an exponential number of models to be trained and tested and takes a lot of time.
Random search: Do the same, but just define a range for each parameter and try a random set of parameters, drawn from a uniform distribution over each range. You can try as many parameter sets as you want, for as long as you can. This is just an informed random guess.
Unfortunately there is no other way to tune such parameters. As for layers having different numbers of neurons, that could come from the tuning process, or you can also see it as dimensionality reduction, like a compressed version of the previous layer.
There is no known way to determine a good network structure just from the number of inputs or outputs. It depends on the number of training examples, the batch size, the number of epochs, and basically every significant parameter of the network.
Moreover, a high number of units can introduce problems like overfitting and exploding gradients. On the other hand, a low number of units can cause a model to have high bias and low accuracy. Once again, it depends on the size of the data used for training.
Sadly, the only way is to try different values and see which give you the best fit. You may choose the combination that gives you the lowest training and validation loss, as well as the best accuracy for your dataset, as said in the previous post.
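The two methods above can be sketched in a few lines of plain Python. The search space and function names are illustrative assumptions, and the learning rate is drawn log-uniformly here, which is a common choice but a deviation from the plain uniform sampling described above.

```python
import itertools
import random

# Illustrative search space; the values are assumptions, not recommendations.
search_space = {
    'n_neurons': [8, 16, 32, 64],
    'n_layers': [1, 2, 3],
    'learning_rate': [1e-2, 1e-3, 1e-4],
}


def grid_search_candidates(space):
    """Every combination of the listed values (grows multiplicatively)."""
    keys = list(space)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(space[k] for k in keys))]


def random_search_candidates(n_trials, seed=0):
    """n_trials random draws from ranges instead of fixed grids."""
    rng = random.Random(seed)
    return [
        {
            'n_neurons': rng.randint(8, 64),
            'n_layers': rng.randint(1, 3),
            'learning_rate': 10 ** rng.uniform(-4, -2),
        }
        for _ in range(n_trials)
    ]


grid = grid_search_candidates(search_space)
sampled = random_search_candidates(10)
print(len(grid), len(sampled))
```

Each candidate dictionary would then be used to build and train one model, keeping the one with the best validation score.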
You could do some proportion on your number of units value, something like:
# Build the model
model = Sequential()
model.add(Dense(num_classes * 8, input_shape=(shape_value,), activation = 'relu' ))
model.add(Dropout(0.5))
model.add(Dense(num_classes * 4, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes * 2, activation = 'relu'))
model.add(Dropout(0.2))
#Output layer
model.add(Dense(num_classes, activation = 'softmax'))
The model above is an example of a classification network. num_classes is the number of different categories the system has to choose from. For instance, in the iris dataset, we have:
Iris Setosa
Iris Versicolour
Iris Virginica
num_classes = 3
However, this could lead to worse results than other, arbitrary values. We need to fit the parameters to the training dataset by trying different values and then analysing the results to find the best combination.
My suggestion is to use EarlyStopping(). Then check the number of epochs it ran for, along with the accuracy and the loss on the test set.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
rlp = lrd = ReduceLROnPlateau(monitor = 'val_loss',patience = 2,verbose = 1,factor = 0.8, min_lr = 1e-6)
es = EarlyStopping(verbose=1, patience=2)
his = classifier.fit(X_train, y_train, epochs=500, batch_size = 128, validation_split=0.1, verbose = 1, callbacks=[lrd,es])
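For intuition about what patience=2 does in the snippet above, here is a simplified plain-Python model of patience-based early stopping. It is a sketch of the general idea, not Keras' actual implementation, and the validation losses are made up.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best loss), both 0-based."""
    best = float('inf')
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop.
    return len(val_losses) - 1, best_epoch


val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.50]
stopped_at, best_epoch = simulate_early_stopping(val_losses, patience=2)
print(stopped_at, best_epoch)
```

Note that with such a small patience the run stops before the later drop to 0.50 is ever seen, so a patience of 2 can be too aggressive when the validation loss is noisy.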
