How to visualize a keras neural network with trained weights? - python

I have created a sequential model using keras package similar to this:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
# Adding the input layer and the first hidden layer
model.add(Dense(6, activation='relu',input_dim = 11))
# Adding the second hidden layer (The real model contains many hidden layers)
# Adding the output layer
model.add(Dense( 1, activation = 'sigmoid'))
Then I used keras visualizer to get a visualization of the neural network without weights.
# Compiling the ANN
classifier.compile(optimizer='Adamax', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(..., y_train.to_numpy(), batch_size=10, epochs=100)  # first argument (the training features) is missing in the source
I want to print trained weights of the model for this kind of visualization. Is there any library or module that I can use for that? Any suggestion will be helpful. Here is the picture of the designed neural network without printing weights.

Option1: deepreplay
One workaround is the Deep Replay package, which you can import as a library.
With this package you can visualize and animate the training process, and most likely print the trained weights, as in the following example:
# install FFMPEG (to generate animations)
#!apt-get install ffmpeg
# install actual deepreplay package
#!pip install deepreplay
from keras.initializers import glorot_normal, glorot_uniform, he_normal, he_uniform
from keras.layers import Dense
from keras.models import Sequential
from deepreplay.callbacks import ReplayData
from deepreplay.datasets.ball import load_data
from deepreplay.plot import compose_plots, compose_animations
from deepreplay.replay import Replay
from matplotlib import pyplot as plt
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
X, y = load_data(n_dims=10)
activation = 'relu'
initializer_name = 'he_uniform'
initializer = eval(initializer_name)(seed=13)
title = 'Activation: ReLU - Initializer: {}'.format(initializer_name)
group_name = 'relu_{}'.format(initializer_name)
filename = f'{group_name}_{initializer_name}_{activation}_weight_initializers.h5'
# Model builder function
def build_model(n_layers, input_dim, units, activation, initializer):
    if isinstance(units, list):
        assert len(units) == n_layers
    else:
        units = [units] * n_layers
    model = Sequential()
    # Adds first hidden layer with input_dim parameter
    model.add(Dense(units=units[0], input_dim=input_dim, activation=activation,
                    kernel_initializer=initializer, name='h1'))
    # Adds remaining hidden layers
    for i in range(2, n_layers + 1):
        model.add(Dense(units=units[i - 1], activation=activation,
                        kernel_initializer=initializer, name='h{}'.format(i)))
    # Adds output layer
    model.add(Dense(units=1, activation='sigmoid', kernel_initializer=initializer, name='o'))
    # Compiles the model
    model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])
    return model
replaydata = ReplayData(X, y, filename=filename, group_name=group_name)
# Create the MLP model with 5 hidden layers, 10 input neurons, and 100 units per hidden layer
model = build_model(n_layers=5, input_dim=10, units=100, activation=activation, initializer=initializer)
# Fit the model over 10 epochs with a batch size of 16
model.fit(X, y, epochs=10, batch_size=16, callbacks=[replaydata])
# Plot the results
replay = Replay(replay_filename=filename, group_name=group_name)
fig = plt.figure(figsize=(12, 6))
ax_zvalues = plt.subplot2grid((2, 2), (0, 0))
ax_weights = plt.subplot2grid((2, 2), (0, 1))
ax_activations = plt.subplot2grid((2, 2), (1, 0))
ax_gradients = plt.subplot2grid((2, 2), (1, 1))
wv = replay.build_weights(ax_weights)
gv = replay.build_gradients(ax_gradients)
# Z-values
zv = replay.build_outputs(ax_zvalues, before_activation=True,
exclude_outputs=True, include_inputs=False)
# Activations
av = replay.build_outputs(ax_activations, exclude_outputs=True, include_inputs=False)
# Save plots
fig = compose_plots([zv, wv, av, gv], epoch=0, title=title)
fig.savefig('part2.png', format='png', dpi=120)
# Animate & save mp4
sample_anim = compose_animations([zv, wv, av, gv])
sample_anim.save('part2.mp4', dpi=120, fps=5)
The output results, visualized as violin plots over 10 epochs:
The top-right subplot shows how the weights change across the layers over ten epochs. The other subplots show the Z-values, the activations, and the gradients.
Note1: If you want to learn how to interpret violin plots, please check these posts: post1, post2, post3
Note2: Training starts from weights set by an initializer, so the starting weights can differ from run to run. The common initialization schemes are as follows:
Xavier / Glorot
By default, the kernel initializer is glorot_uniform when you use the keras module (reference). For further information, see this post and the paper "Understanding the difficulty of training deep feedforward neural networks". It is also possible to initialize the weights of a neural network manually; you can check this post.
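As a rough illustration of what glorot_uniform does, the sketch below draws weights uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). It uses only the Python standard library rather than Keras, and the helper names glorot_uniform_limit and glorot_uniform_sample are hypothetical. The layer sizes (11 inputs, 6 units) are borrowed from the first Dense layer in the question.

```python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    """Return the bound of the Glorot/Xavier uniform distribution."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform_sample(fan_in, fan_out, seed=None):
    """Draw a fan_in x fan_out weight matrix (list of lists) from U(-limit, limit)."""
    rng = random.Random(seed)
    limit = glorot_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# First hidden layer of the question's model: 11 inputs, 6 units
limit = glorot_uniform_limit(11, 6)
weights = glorot_uniform_sample(11, 6, seed=13)
print(f"limit = {limit:.4f}")
print(f"min = {min(map(min, weights)):.4f}, max = {max(map(max, weights)):.4f}")
```

Every sampled weight stays inside the bound, so wider layers (larger fan_in + fan_out) start from smaller weights.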
Note3: This package currently has a bug that prevents it from running in a Google Colab notebook. This is still an open issue; see its GitHub repo and the related Stack Overflow post. It is therefore better to try it on your local machine.
Option2: W&B
Another ML tool, W&B (Weights & Biases), can also be imported as a library to solve your problem.
Once you sign up and log in to your account following the instructions, you can use its API to track and visualize all the pieces of your ML pipeline, including weights, biases, and other parameters:
import wandb
from wandb.keras import WandbCallback
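# Step 1: Initialize a W&B run
wandb.init(project="my-project")  # hypothetical project name; the original call is missing in the source
# Step 2: Save model inputs and hyperparameters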
config = wandb.config
config.learning_rate = 0.01
# Model training code here ...
import tensorflow as tf
from tensorflow import keras
Optimiser=tf.keras.optimizers.Adam(learning_rate =0.001)
model.compile(loss=loss, optimizer=Optimiser, metrics=['accuracy'])
wandb.log({"loss": loss})
# Step 3: Add WandbCallback
model.fit(X, y, epochs=10, batch_size=16, callbacks=[WandbCallback()])
Once you run your model, you can check the graph info in the Model section, which is highlighted in blue on the left side:
I hope this answer helps you out; if so, you can accept it as an answer ✅.


Image sequence detection with Keras, Convolutional and Stateful Neural Network

I am trying to write a pretty complicated neural network (at least for me) in keras that needs to combine both a common CNN structure and an LSTM/GRU layer.
Basically, I have a dataset of climatological maps of the Mediterranean Sea; each map details the wind, pressure, and other parameters. I am studying Medicanes (Mediterranean hurricanes), and my goal is to create a neural network that classifies each map with a label of zero if there is no trace of such a hurricane, or one if the map contains one.
In order to achieve that I need a network with two parts:
feature extractor (normal CNN).
temporal layer (LSTM/GRU).
The main cause of this is that each map is correlated with the previous one because the formation and life cycle of a Medicane can take several days to complete.
Important note: the dataset is too big to be uploaded all at once so I have to work one batch at a time.
I am working with Keras and I found it pretty challenging to adapt its standard framework to my needs so I have come up with some peculiar flow to feed my data into the network.
In particular, I found it hard to pass both my batch size and my time-step parameter to the GRU layer using a more standard alternative.
This is what I tried:
I am positively sure I have overcomplicated the task, but, as I said I am not very proficient with Keras and TensorFlow.
The main problem was that I could not find a way to import the data both in a batch (for RAM reasons) and in a sequence of 10-15 pictures (to be used as the time steps in the GRU layer).
I solved this problem by importing batches of 120 maps in order (no shuffle). I then turned these batches into the sequences of images I needed, re-batched the sequences, and fed them to the model manually.
Data Import
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
image_size=(600, 600),
Get a sequence of Images
Here, I break down the 120 map batches into sequences of 60 observations, and I return each sequence one at a time.
def sequence_x(train_dataset):
    x_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))), dtype=object)
    for element in range(0, x_numpy.shape[0]):
        for i in range(0, x_numpy.shape[0], sequence_lengh):
            x_seq = x_numpy[element][i:i+sequence_lengh]
            yield x_seq

def sequence_y(train_dataset):
    y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))), dtype=object)
    for element in range(0, y_numpy.shape[0]):
        for i in range(0, y_numpy.shape[0], sequence_lengh):
            y_seq = y_numpy[element][i:i+sequence_lengh]
            yield y_seq
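The splitting described above can also be sketched without TensorFlow. The generator below is a minimal version of the idea, not the asker's code: it walks over ordered batches and yields consecutive, non-overlapping slices of a fixed length. The names split_into_sequences and sequence_length are hypothetical, and plain lists of integers stand in for the image batches. The inner loop steps over the length of each batch.

```python
def split_into_sequences(batches, sequence_length):
    """Yield consecutive, non-overlapping slices of each batch.

    `batches` is an iterable of ordered batches (lists here; arrays work too).
    The last slice of a batch is shorter if the batch size is not a multiple
    of `sequence_length`.
    """
    for batch in batches:
        # Step over the items of this batch, not over the number of batches
        for start in range(0, len(batch), sequence_length):
            yield batch[start:start + sequence_length]

# Two toy "batches" of 120 ordered items each, split into sequences of 60
toy_batches = [list(range(120)), list(range(120, 240))]
sequences = list(split_into_sequences(toy_batches, sequence_length=60))
print(len(sequences))                      # number of sequences produced
print(sequences[0][0], sequences[0][-1])   # first and last item of the first sequence
```

Because the batches are read in order and never shuffled, each yielded sequence keeps the temporal order of the maps.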
CNN Model
I build the CNN model based on a pre-trained DenseNet
from keras.layers import TimeDistributed, GRU
def build_convnet(shape=(600, 600, 3)):
    inputs = keras.Input(shape = shape)
    x = inputs
    # preprocessing
    x = keras.applications.densenet.preprocess_input(x)
    x = convBase(x)
    x = layers.Flatten()(x)
    # Fine tuning
    x = keras.layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    x = keras.layers.Dense(512, activation='relu')(x)
    x = keras.layers.GlobalMaxPool2D()
    return x
GRU Model
I build the time part of the network with a GRU layer
def action_model(shape=(15, 600, 600, 3), nbout=15):
    # Create our convnet with (112, 112, 3) input shape
    convnet = build_convnet(shape[1:]) #[1:]
    # then create our final model
    model = keras.Sequential()
    # add the convnet with (5, 112, 112, 3) shape
    model.add(TimeDistributed(convnet, input_shape=shape))
    # here, you can also use GRU or LSTM
    # and finally, we make a decision network
    model.add(Dense(1024, activation='relu'))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(15, activation='softmax'))
    return model
Transfer Learning
I retrain part of the convolutional base (the DenseNet conv4 and conv5 blocks)
convBase = DenseNet121(include_top=False, weights=None, input_shape=(600,600,3), pooling="avg")
for layer in convBase.layers:
    if 'conv5' in layer.name:
        layer.trainable = True
for layer in convBase.layers:
    if 'conv4' in layer.name:
        layer.trainable = True
Model Compile
Model compilation ( image size= 600x600x3)
INSHAPE=(15, 600, 600, 3) # (5, 112, 112, 3)
model = action_model(INSHAPE, 1)
optimizer = keras.optimizers.Adam(0.001)
Model Fit
Here I manually batch my data. I turn a (60, 600, 600, 3) array into a (4, 15, 600, 600, 3) array, meaning 4 batches, each containing a 15-map-long sequence.
epochs = 10
for value in range(0, epochs):
    train_x, train_y = sequence_x(train_ds), sequence_y(train_ds)
    val_x, val_y = sequence_x(validation_ds), sequence_y(validation_ds)
    for i in range(0, 278):
        x = next(train_x, "none")
        y = next(train_y, "none")
        if (x!="none" or y!="none"):
            if (np.any(x) and np.any(y)):
                x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
                y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))
                y_stack = y_stack.reshape(4, 15)
                # model.fit(..., y=y_stack, ...) follows here; the rest of the call is truncated in the source
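To make the shapes concrete, here is a small NumPy sketch of the manual re-batching step. It assumes tiny 8x8 placeholder "maps" instead of 600x600 ones to keep memory low, and the variable names are illustrative. Stacking four consecutive 15-map slices gives a 5-D array, and because the slices are contiguous, a plain reshape gives the same result.

```python
import numpy as np

sequence_length = 15                 # time steps per sequence
height, width, channels = 8, 8, 3    # small stand-in for 600 x 600 x 3

# One ordered chunk of 60 maps and their 0/1 labels (random placeholders)
rng = np.random.default_rng(0)
x = rng.random((60, height, width, channels))
y = rng.integers(0, 2, size=60)

# Stack four consecutive slices, as in the training loop above
x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))

# Equivalent reshape: (60, H, W, C) -> (4, 15, H, W, C)
x_reshaped = x.reshape(-1, sequence_length, height, width, channels)
y_reshaped = y.reshape(-1, sequence_length)

print(x_stack.shape, y_stack.shape)
```

The reshape form also works when the chunk holds a different multiple of 15 maps, whereas the hard-coded slices assume exactly 60.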
The idea is to get a model that, when presented with a sequence of images, can categorize each one of them with a 0 or a 1 if they have a Medicane or not.
The model does compile without any errors but the results it provides are horrible:
What am I doing incorrectly? Is there a more effective way to write all of this?

How can I tune neural network architecture using KerasTuner?

I'm trying to use KerasTuner to automatically tune the neural network architecture, i.e., the number of hidden layers and the number of nodes in each hidden layer. Currently, the neural network architecture is defined using one parameter NN_LAYER_SIZES. For example,
NN_LAYER_SIZES = [128, 128, 128, 128]
indicates the NN has 4 hidden layers and each hidden layer has 128 nodes.
KerasTuner has several hyperparameter types, such as hp.Int and hp.Choice.
It seems none of these hyperparameter types fits my use case. So I wrote the following code to scan the number of hidden layers and the number of nodes. However, it's not been recognized as a hyperparameter.
number_of_hidden_layer = hp.Int("layer_number", min_value=2, max_value=5, step=1)
number_of_nodes = hp.Int("node_number", min_value=4, max_value=8, step=1)
NN_LAYER_SIZES = [2**number_of_nodes for _ in range(number_of_hidden_layer)]
Any suggestions on how to make it right?
Maybe treat the number of layers as a hyperparameter by iterating through it when building your model. That way you can experiment with different numbers of layers combined with different numbers of nodes:
import tensorflow as tf
import keras_tuner as kt
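def model_builder(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    units = hp.Int('units', min_value=32, max_value=512, step=32)
    layers = hp.Int('layers', min_value=2, max_value=5, step=1)
    for _ in range(layers):
        model.add(tf.keras.layers.Dense(units=units, activation='relu'))
    # Output layer and compile step are assumed here (10 Fashion-MNIST classes); they are missing in the source
    model.add(tf.keras.layers.Dense(10))
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
(img_train, label_train), (_, _) = tf.keras.datasets.fashion_mnist.load_data()
img_train = img_train.astype('float32') / 255.0
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',  # assumed; this argument is missing in the source
                     factor=3)
tuner.search(img_train, label_train, epochs=50, validation_split=0.2)
# Rebuild the model with the best hyperparameters found, then train it
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)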
If you want more control and versatility in your architecture tuning, I recommend you check out my answer to "Keras Tuner: select number of units conditional on number of layers". The intuition is to define one hyperparameter for the number of nodes in each layer individually, like so:
neurons_first_layer = hp.Choice('neurons_first_layer', [16,32,64,128])
neurons_second_layer = hp.Choice('neurons_second_layer', [0,16,32,64,])
I implemented the build function so that a layer with 0 nodes vanishes entirely. That way, if neurons_second_layer = 0, the ANN has no second layer.
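A minimal sketch of that idea, independent of KerasTuner: the helper below turns per-layer node choices into a list of layer sizes and drops every layer whose choice is 0. The function name layer_sizes_from_choices is hypothetical. In a real build function the values would come from hp.Choice calls like the ones above, and the resulting list could be used the same way as NN_LAYER_SIZES in the question.

```python
def layer_sizes_from_choices(*neurons_per_layer):
    """Return the sizes of the hidden layers, skipping layers with 0 nodes."""
    return [n for n in neurons_per_layer if n > 0]

# Second layer chosen as 0 -> the network has a single hidden layer
print(layer_sizes_from_choices(64, 0))    # [64]
# Both layers present
print(layer_sizes_from_choices(128, 32))  # [128, 32]
```

The build function can then loop over the returned list and add one Dense layer per entry.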

How to bypass portion of neural network in TensorFlow for some (but not all) features

In my TensorFlow model I have some data that I feed into a stack of CNNs before it goes into a few fully connected layers. I have implemented that with Keras' Sequential model. However, I now have some data that should bypass the CNN and be fed directly into the first fully connected layer: it contains values and labels that are part of the input data but should not undergo convolutions because it is not image data.
Is such a thing possible with tensorflow.keras, or should I do that with tensorflow.nn instead? As far as I understand, in Keras' Sequential models the input goes in one end and comes out the other with no special wiring in the middle.
Am I correct that to do this I have to use tensorflow.concat on the data from the last CNN layer and the data that bypasses the CNNs before feeding it into the first fully connected layer?
Here is a simple example in which the operation is to sum the activations from different subnets:
import keras
import numpy as np
import tensorflow as tf
from keras.layers import Input, Dense, Activation
# this represents your cnn model
def nn_model(input_x):
    feature_maker = Dense(10, activation='relu')(input_x)
    feature_maker = Dense(20, activation='relu')(feature_maker)
    feature_maker = Dense(1, activation='linear')(feature_maker)
    return feature_maker
# a list of input layers, of course the input shapes can be different
input_layers = [Input(shape=(3, )) for _ in range(2)]
coupled_feature = [nn_model(input_x) for input_x in input_layers]
# assume you take the sum of the outputs
coupled_feature = keras.layers.Add()(coupled_feature)
prediction = Dense(1, activation='relu')(coupled_feature)
model = keras.models.Model(inputs=input_layers, outputs=prediction)
model.compile(loss='mse', optimizer='adam')
# example training set
x_1 = np.linspace(1, 90, 270).reshape(90, 3)
x_2 = np.linspace(1, 90, 270).reshape(90, 3)
y = np.random.rand(90)
inputs_x = [x_1, x_2]
model.fit(inputs_x, y, batch_size=32, epochs=10)
You can actually plot the model to gain more intuition
from keras.utils.vis_utils import plot_model
plot_model(model, show_shapes=True)
The model of the above code looks like this
With a little remodeling and the functional API you can:
#create the CNN - it can also be a sequential
cnn_input = Input(image_shape)
cnn_output = Conv2D(...)(cnn_input)
cnn_output = Conv2D(...)(cnn_output)
cnn_output = MaxPooling2D()(cnn_output)
cnn_model = Model(cnn_input, cnn_output)
#create the FC model - can also be a sequential
fc_input = Input(fc_input_shape)
fc_output = Dense(...)(fc_input)
fc_output = Dense(...)(fc_output)
fc_model = Model(fc_input, fc_output)
There is a lot of space for creativity, this is just one of the ways.
#create the full model
full_input = Input(image_shape)
full_output = cnn_model(full_input)
full_output = fc_model(full_output)
full_model = Model(full_input, full_output)
You can use any of the three models in any way you want. They share the layers and the weights, so internally they are the same.
Saving and loading the full model might be quirky. I'd probably save the other two separately and when loading create the full model again.
Notice also that if you save two models that share the same layers, after loading they will probably not share these layers anymore. (Another reason for saving/loading only fc_model and cnn_model, while creating full_model again from code)
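Coming back to the original question: the bypass itself amounts to concatenating the flattened CNN output with the extra features along the feature axis before the first fully connected layer (in Keras, a Concatenate layer in the functional API plays this role). The NumPy sketch below only illustrates the shape logic. All sizes, names, and the random weights are made-up placeholders, not part of either answer.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4

# Stand-in for the flattened output of the CNN stack: 32 features per sample
cnn_features = rng.random((batch_size, 32))
# Non-image data that bypasses the CNN: 5 features per sample
bypass_features = rng.random((batch_size, 5))

# Join both along the feature axis: (4, 32) + (4, 5) -> (4, 37)
fc_input = np.concatenate([cnn_features, bypass_features], axis=1)

# First fully connected layer (ReLU) with 8 units and random placeholder weights
weights = rng.standard_normal((fc_input.shape[1], 8))
bias = np.zeros(8)
fc_output = np.maximum(fc_input @ weights + bias, 0.0)

print(fc_input.shape, fc_output.shape)
```

The bypassed features reach the fully connected layer unchanged; only the image branch passes through convolutions.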

Keras: What is the difference between model and layers?

EDIT: This video by François Chollet says that Layer + training/eval methods = Model
In keras documentation it says that models are made up of layers. However in this section it shows that a model can be made up of models.
import keras
from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten
from keras.models import Model
# First, define the vision modules
digit_input = Input(shape=(27, 27, 1))
x = Conv2D(64, (3, 3))(digit_input)
x = Conv2D(64, (3, 3))(x)
x = MaxPooling2D((2, 2))(x)
out = Flatten()(x)
vision_model = Model(digit_input, out)
# Then define the tell-digits-apart model
digit_a = Input(shape=(27, 27, 1))
digit_b = Input(shape=(27, 27, 1))
# The vision model will be shared, weights and all
out_a = vision_model(digit_a)
out_b = vision_model(digit_b)
concatenated = keras.layers.concatenate([out_a, out_b])
out = Dense(1, activation='sigmoid')(concatenated)
classification_model = Model([digit_a, digit_b], out)
So, what is the effective difference between Model and layers? Is it just for code readability or does it serve some function?
In Keras, a network is a directed acyclic graph (DAG) of layers. A model is a network with added training and evaluation routines.
The framework allows you to build network DAGs out of both individual layers and other DAGs. The latter is what you're seeing in the example and what seems to be causing the confusion.
The difference is that models can be trained (they have a fit method), while layers do not have such a method and need to be part of a Model instance to be trained. Generally speaking, layers in isolation aren't useful.
The idea of the Functional API to use models inside models is that you can define one model, and reuse its weights as part of another model, in a way that weights are shared. This is not possible with layers alone.
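The weight-sharing point can be illustrated with a toy example in plain Python. This is not the Keras API: ToyLayer and ToyModel are hypothetical classes that only mimic the idea that a model is a callable graph of layers, and that calling the same model instance on two inputs reuses the same weights.

```python
class ToyLayer:
    """A 'layer' that multiplies its input by a single weight."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return self.weight * x


class ToyModel:
    """A 'model' that chains layers; it is callable, so it can be reused like a layer."""
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


shared = ToyLayer(weight=2.0)
vision_model = ToyModel([shared, ToyLayer(weight=0.5)])

# The same model instance processes both inputs, so both branches share weights
out_a = vision_model(3.0)
out_b = vision_model(5.0)

# Updating the shared weight (as training would) affects both branches
shared.weight = 4.0
new_a = vision_model(3.0)
new_b = vision_model(5.0)
print(out_a, out_b, new_a, new_b)
```

This mirrors how vision_model is applied to both digit_a and digit_b in the question's example.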

Keras dynamic graphs

It is known that Keras is based on static graphs. However, it seems to me that it is not really the case since I can make my graph implementation dynamic.
Here is a simple example proving my claim:
import keras.backend as K
import numpy as np
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Input, Dropout
from keras.losses import mean_squared_error
def get_mnist():
    np.random.seed(1234) # set seed for deterministic ordering
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_all = np.concatenate((x_train, x_test), axis = 0)
    Y = np.concatenate((y_train, y_test), axis = 0)
    X = x_all.reshape(-1,x_all.shape[1]*x_all.shape[2])
    p = np.random.permutation(X.shape[0])
    X = X[p].astype(np.float32)*0.02
    Y = Y[p]
    return X, Y
X, Y = get_mnist()
drop = K.variable(0.2)
input = Input(shape=(784,))
x = Dropout(rate=drop.value)(input)
x = Dense(128, activation="relu", name="encoder_layer")(x)
decoder = Dense(784, activation="relu", name="decoder_layer")(x)
autoencoder = Model(inputs=input, outputs=decoder)
autoencoder.compile(optimizer='adadelta', loss=mean_squared_error)
autoencoder.fit(X, X, batch_size=256, epochs=300)
K.set_value(drop, 0.5)
autoencoder.fit(X, X, batch_size=256, epochs=300)
It is obvious that we can change the value of drop at any time, even after compiling the model.
If it is a Static Graph, how should I be able to do so?
Am I missing the point?
In case I do, what is the real interpretation of Dynamic Graphs?
It's true that you can change drop at any time, but that doesn't mean that Keras supports a dynamic graph. You are most likely used to seeing a neural network node described as a linear function followed by an activation function; by stacking these nodes you get a neural network. From that picture, it is natural to reason that dropout removes nodes dynamically. However, this is not how Keras implements dropout. Dropout sets the node's output to zero, which can be done by setting all of the weights in the respective node to zero. The two views are equivalent, since the premise is that the next layer receives no output from the dropped node. However, the first requires a dynamic graph, while the second works with both dynamic and static graphs, and it is the implementation that Keras uses. Thus, there is no need for a dynamic graph for this operation.
In a dynamic graph, the model itself changes. The dropout operation doesn't change the graph architecture (number of nodes, number of layers, etc.) after the initial construction of the graph.
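A small sketch may help here. It mimics dropout as an element-wise mask in plain Python: dropped units are set to zero and kept units are rescaled by 1 / (1 - rate) (the "inverted dropout" convention, assumed here for illustration). The function name dropout_mask is hypothetical. Whatever the rate, the output has the same length as the input, which is the sense in which the graph structure stays fixed.

```python
import random

def dropout_mask(values, rate, seed=None):
    """Zero each value with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]

activations = [1.0] * 10

# Changing the rate changes which outputs are zeroed, not the number of outputs
low = dropout_mask(activations, rate=0.2, seed=0)
high = dropout_mask(activations, rate=0.5, seed=0)
print(len(low), len(high))
print(low)
print(high)
```

Only the values flowing through the fixed structure depend on the rate, which is why changing drop after compilation does not require a dynamic graph.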
Certainly not, because the dropout layer doesn't really affect the network architecture.
