It is known that Keras is based on static graphs. However, it seems to me that this is not really the case, since I can make my graph implementation dynamic.
Here is a simple example proving my claim:
import keras.backend as K
import numpy as np
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Input, Dropout
from keras.losses import mean_squared_error
def get_mnist():
    np.random.seed(1234)  # set seed for deterministic ordering
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_all = np.concatenate((x_train, x_test), axis=0)
    Y = np.concatenate((y_train, y_test), axis=0)
    X = x_all.reshape(-1, x_all.shape[1] * x_all.shape[2])
    p = np.random.permutation(X.shape[0])
    X = X[p].astype(np.float32) * 0.02
    Y = Y[p]
    return X, Y
X, Y = get_mnist()
drop = K.variable(0.2)
input = Input(shape=(784,))
x = Dropout(rate=drop.value)(input)
x = Dense(128, activation="relu", name="encoder_layer")(x)
decoder = Dense(784, activation="relu", name="decoder_layer")(x)
autoencoder = Model(inputs=input, outputs=decoder)
autoencoder.compile(optimizer='adadelta', loss= mean_squared_error)
autoencoder.fit(X, X, batch_size=256, epochs=300)
K.set_value(drop, 0.5)
autoencoder.fit(X, X, batch_size=256, epochs=300)
It is obvious that we can change the value of drop at any time, even after compiling the model.
If it is a static graph, how am I able to do so?
Am I missing the point?
If I am, what is the real interpretation of dynamic graphs?
It's true that you can change drop at any time, but that doesn't mean that Keras supports a dynamic graph. You are most likely used to seeing a neural network node described as a linear function followed by an activation, with a network built by stacking such nodes; by that reasoning, dropout takes nodes out of the network dynamically. However, this is not how Keras implements dropout. Dropout sets a node's output to zero. This can also be done by setting all of the weights in the respective node to zero, and the two operations are equivalent, since the premise is that the next layer receives no output from the dropped node. The first method requires a dynamic graph; the second can be done with both a dynamic and a static graph, and it is the implementation Keras uses. Thus, there is no need for a dynamic graph for this operation.
A dynamic graph means the model itself changes. The dropout operation just doesn't change the graph architecture (number of nodes, number of layers, etc.) after the initial construction of the graph.
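To make this concrete, here is a minimal sketch (plain NumPy, my own illustrative code, not the actual Keras kernels) of dropout as a masking operation: the operation applied to the tensor is always the same, only the sampled mask values change, so the graph never needs to be rebuilt:
import numpy as np

def dropout_forward(x, rate, training=True):
    # Dropout as masking: zero activations instead of removing nodes.
    if not training:
        return x
    # A fresh binary mask is sampled per call; the ops themselves stay fixed.
    mask = (np.random.rand(*x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)  # inverted-dropout scaling

activations = np.ones((2, 4), dtype=np.float32)
print(dropout_forward(activations, rate=0.5))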
Certainly not, because the dropout layer doesn't really affect the network architecture.
I have created a sequential model using the Keras package, similar to this:
from keras.layers import Dense
from keras.models import Sequential
model = Sequential()
# Adding the input layer and the first hidden layer
model.add(Dense(6, activation='relu', input_dim=11))
# Adding the second hidden layer (the real model contains many hidden layers)
model.add(Dense(6, activation='relu'))
# Adding the output layer
model.add(Dense(1, activation='sigmoid'))
Then I used keras visualizer to get a visualization of the neural network without weights.
# Compiling the ANN
model.compile(optimizer='Adamax', loss='binary_crossentropy', metrics=['accuracy'])
model_history = model.fit(X_train, y_train.to_numpy(), batch_size=10, epochs=100)
I want to print the trained weights of the model onto this kind of visualization. Is there any library or module that I can use for that? Any suggestion will be helpful. Here is the picture of the designed neural network without the weights.
Option 1: deepreplay
There is a workaround in the form of a package/module called Deep Replay that you can import as a library to address your problem.
With this package you can visualize/animate, and most probably print, the trained weights using the following example:
# install FFMPEG (to generate animations)
#!apt-get install ffmpeg
# install actual deepreplay package
#!pip install deepreplay
from keras.initializers import glorot_normal, glorot_uniform, he_normal, he_uniform
from keras.layers import Dense
from keras.models import Sequential
from deepreplay.callbacks import ReplayData
from deepreplay.datasets.ball import load_data
from deepreplay.plot import compose_plots, compose_animations
from deepreplay.replay import Replay
from matplotlib import pyplot as plt
plt.rcParams['animation.ffmpeg_path'] = '/usr/bin/ffmpeg'
X, y = load_data(n_dims=10)
activation = 'relu'
initializer_name = 'he_uniform'
initializer = eval(initializer_name)(seed=13)
title = 'Activation: ReLU - Initializer: {}'.format(initializer_name)
group_name = 'relu_{}'.format(initializer_name)
filename = f'{group_name}_{initializer_name}_{activation}_weight_initializers.h5'
# Model builder function
def build_model(n_layers, input_dim, units, activation, initializer):
    if isinstance(units, list):
        assert len(units) == n_layers
    else:
        units = [units] * n_layers
    model = Sequential()
    # Adds first hidden layer with input_dim parameter
    model.add(Dense(units=units[0],
                    input_dim=input_dim,
                    activation=activation,
                    kernel_initializer=initializer,
                    name='h1'))
    # Adds remaining hidden layers
    for i in range(2, n_layers + 1):
        model.add(Dense(units=units[i-1],
                        activation=activation,
                        kernel_initializer=initializer,
                        name='h{}'.format(i)))
    # Adds output layer
    model.add(Dense(units=1, activation='sigmoid', kernel_initializer=initializer, name='o'))
    # Compiles the model
    model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])
    return model
replaydata = ReplayData(X, y, filename=filename, group_name=group_name)
# Create the MLP model with 5 hidden layers, 10 input neurons and 100 units per hidden layer
model = build_model(n_layers=5, input_dim=10, units=100, activation=activation, initializer=initializer)
# fit the model over 10 epochs with batch size of 16
model.fit(X, y, epochs=10, batch_size=16, callbacks=[replaydata])
# Plot the results
replay = Replay(replay_filename=filename, group_name=group_name)
fig = plt.figure(figsize=(12, 6))
ax_zvalues = plt.subplot2grid((2, 2), (0, 0))
ax_weights = plt.subplot2grid((2, 2), (0, 1))
ax_activations = plt.subplot2grid((2, 2), (1, 0))
ax_gradients = plt.subplot2grid((2, 2), (1, 1))
wv = replay.build_weights(ax_weights)
gv = replay.build_gradients(ax_gradients)
# Z-values
zv = replay.build_outputs(ax_zvalues, before_activation=True,
                          exclude_outputs=True, include_inputs=False)
# Activations
av = replay.build_outputs(ax_activations, exclude_outputs=True, include_inputs=False)
# Save plots
fig = compose_plots([zv, wv, av, gv], epoch=0, title=title)
fig.savefig('part2.png', format='png', dpi=120)
# Animate & save mp4
sample_anim = compose_animations([zv, wv, av, gv])
sample_anim.save('part2.mp4', dpi=120, fps=5)
The output results are visualized with violin plots over the 10 epochs:
The top-right subplot shows how the weights change across the layers over the ten epochs. The other subplots illustrate how the Z-values, activations, and gradients evolve.
Note 1: If you are interested in interpreting violin plots, please check these posts: post1, post2, post3
Note 2: Please note that training starts from some initializer, and different initializers can produce different initial weights. The common initialization schemes are as follows:
Random
Xavier / Glorot
He
By default, the kernel initializer is glorot_uniform when you use the Keras module (reference), but you can check this post and the paper Understanding the difficulty of training deep feedforward neural networks for further info. It is also possible to initialize the weights of a NN manually; you can check this post.
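As a quick illustration of both (standard Keras API), you can pick an initializer explicitly instead of the glorot_uniform default, or overwrite the weights by hand:
import numpy as np
from keras.initializers import RandomNormal
from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
# Explicit initializer instead of the glorot_uniform default
model.add(Dense(6, input_dim=11, activation='relu',
                kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=13)))
model.add(Dense(1, activation='sigmoid'))

# Or set the weights manually: a Dense layer expects [kernel, bias]
kernel = np.zeros((11, 6), dtype=np.float32)
bias = np.zeros(6, dtype=np.float32)
model.layers[0].set_weights([kernel, bias])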
Note 3: This package currently has a bug and can't be run in a Google Colab notebook; this is still an open issue on its GitHub repo, and there is a related post on Stack Overflow. So it is better to try it on your own local machine.
Option 2: wandb
There is another ML tool, called W&B (Weights and Biases), that you can import as a library to address your problem.
Once you sign up and log in to your account following the instructions, you can use this API to track and visualize all the pieces of your ML pipeline, including the weights and biases and other parameters:
import wandb
from wandb.keras import WandbCallback
# Step 1: Initialize the W&B run
wandb.init(project='project_name')
# Step 2: Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01
# Model training code here ... (assumes `model` was built earlier)
import tensorflow as tf
from tensorflow import keras
loss = tf.keras.losses.MeanSquaredError()
optimiser = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss=loss, optimizer=optimiser, metrics=['accuracy'])
# Step 3: Add WandbCallback, which logs the loss and metrics during training
model.fit(X, y, epochs=10, batch_size=16, callbacks=[WandbCallback()])
Once you run your model, you can check the graph info in the Model section, which is shown on the left side in blue:
Hope this answer helps you out; if it does, you can accept it as the answer ✅.
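Whichever option you choose, note that the trained weights themselves can always be read straight off the model with the standard Keras get_weights() API, if all you need is to print them:
# Print the trained weights of every layer (a Dense layer returns [kernel, bias])
for layer in model.layers:
    weights = layer.get_weights()
    print(layer.name, [w.shape for w in weights])
    for w in weights:
        print(w)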
I don't understand why my Keras model is under-estimating the target. I include a minimal example below. If I simplify the model architecture, the predictions are closer to the truth. What confuses me is this: if the complex model overfits, why aren't the predictions extremely close to the training values instead of being systematically off like that? (The plot is for the training data.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from sklearn.metrics import mean_squared_error
def create_dataset(num_series=1, num_steps=1000, period=500, mu=1, sigma=0.3):
    noise = np.random.normal(mu, sigma, size=(num_series, num_steps))
    sin_minusPi_Pi = np.sin(np.tile(np.linspace(-np.pi, np.pi, period), int(num_steps / period)))
    sin_Zero_2Pi = np.sin(np.tile(np.linspace(0, 2 * np.pi, period), int(num_steps / period)))
    pattern = np.concatenate((np.tile(sin_minusPi_Pi.reshape(1, -1),
                                      (int(np.ceil(num_series / 2)), 1)),
                              np.tile(sin_Zero_2Pi.reshape(1, -1),
                                      (int(np.floor(num_series / 2)), 1))),
                             axis=0)
    target = noise + pattern
    return target[0]
avail=create_dataset(mu=5)
window_size = 7
def getdata(data, window_size):
    # Sliding windows of length window_size as X, the next value as y
    X, y = np.array([1] * window_size), np.array([])
    for i in range(window_size, len(data)):
        X = np.vstack((X, data[i-window_size:i]))
        y = np.append(y, data[i:i+1])
    return X[1:], y  # drop the dummy first row
X,y = getdata(avail,window_size)
def train_model(X, y, a_dim=100, epoch=50, batch_size=32, d=0.2):
    model = Sequential()
    model.add(Dense(a_dim, activation='relu', input_dim=X.shape[1]))
    model.add(Dropout(d))
    model.add(Dense(a_dim, activation='relu'))
    model.add(Dropout(d))
    model.add(Dense(a_dim, activation='relu'))
    model.add(Dropout(d))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X, y, epochs=epoch, batch_size=batch_size, verbose=2)
    return model
model = train_model(X,y)
plt.plot(model.predict(X)[:,0])
plt.plot(y)
plt.show()
It is because your model has been built for this behaviour.
The main purpose of a model is to obtain an optimized function which fits any unseen data as well as possible. Aiming for this, your model starts to learn from the training samples. The problem is that during the training process your model tends to overfit on your training samples, losing its generalization ability, i.e. the ability to perform well on unseen data.
To avoid this, we use some techniques, one of them being dropout.
You used it as well:
model.add(Dropout(d))
with the default parameter d=0.2:
def train_model(X,y,a_dim=100,epoch=50,batch_size=32,d=0.2):
As I wrote above, this is there to avoid overfitting on your training data, and it is why your model systematically under-estimates the training targets while getting a better estimate on unseen data.
Passing a different dropout value to your train_model() function, you will get a better fit to your training data:
model = train_model(X, y, d=0.0)
plt.plot(model.predict(X)[:,0])
plt.plot(y)
plt.show()
Out: (plot: with d=0.0 the predictions track the training data much more closely)
In my TensorFlow model I have some data that I feed into a stack of CNNs before it goes into a few fully connected layers. I have implemented that with Keras' Sequential model. However, I now have some data that should not go into the CNN but instead should be fed directly into the first fully connected layer, because it contains values and labels that are part of the input but should not undergo convolutions, as they are not image data.
Is such a thing possible with tensorflow.keras, or should I do that with tensorflow.nn instead? As far as I understand Keras' Sequential models, the input goes in one end and comes out the other, with no special wiring in the middle.
Am I correct that to do this I have to use tensorflow.concat on the data from the last CNN layer and the data that bypasses the CNNs before feeding it into the first fully connected layer?
Here is a simple example in which the operation is to sum the activations of different subnets:
import keras
import numpy as np
import tensorflow as tf
from keras.layers import Input, Dense, Activation
tf.reset_default_graph()
# this represents your cnn model
def nn_model(input_x):
    feature_maker = Dense(10, activation='relu')(input_x)
    feature_maker = Dense(20, activation='relu')(feature_maker)
    feature_maker = Dense(1, activation='linear')(feature_maker)
    return feature_maker
# a list of input layers, of course the input shapes can be different
input_layers = [Input(shape=(3, )) for _ in range(2)]
coupled_feature = [nn_model(input_x) for input_x in input_layers]
# assume you take the sum of the outputs
coupled_feature = keras.layers.Add()(coupled_feature)
prediction = Dense(1, activation='relu')(coupled_feature)
model = keras.models.Model(inputs=input_layers, outputs=prediction)
model.compile(loss='mse', optimizer='adam')
# example training set
x_1 = np.linspace(1, 90, 270).reshape(90, 3)
x_2 = np.linspace(1, 90, 270).reshape(90, 3)
y = np.random.rand(90)
inputs_x = [x_1, x_2]
model.fit(inputs_x, y, batch_size=32, epochs=10)
You can actually plot the model to gain more intuition
from keras.utils.vis_utils import plot_model
plot_model(model, show_shapes=True)
The model defined by the code above looks like this:
With a little remodeling and the functional API you can:
#create the CNN - it can also be a sequential
cnn_input = Input(image_shape)
cnn_output = Conv2D(...)(cnn_input)
cnn_output = Conv2D(...)(cnn_output)
cnn_output = MaxPooling2D()(cnn_output)
....
cnn_model = Model(cnn_input, cnn_output)
#create the FC model - can also be a sequential
fc_input = Input(fc_input_shape)
fc_output = Dense(...)(fc_input)
fc_output = Dense(...)(fc_output)
fc_model = Model(fc_input, fc_output)
There is a lot of space for creativity, this is just one of the ways.
#create the full model
full_input = Input(image_shape)
full_output = cnn_model(full_input)
full_output = fc_model(full_output)
full_model = Model(full_input, full_output)
You can use any of the three models in any way you want. They share the layers and the weights, so internally they are the same.
Saving and loading the full model might be quirky. I'd probably save the other two separately and when loading create the full model again.
Notice also that if you save two models that share the same layers, after loading they will probably not share these layers anymore. (Another reason for saving/loading only fc_model and cnn_model, while creating full_model again from code)
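To answer the tensorflow.concat part of the question directly: in Keras the idiomatic tool is a Concatenate layer that merges the flattened CNN features with the bypass input right before the first fully connected layer. A minimal sketch (all shapes and layer sizes here are made up for illustration):
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Concatenate

image_in = Input(shape=(32, 32, 3))   # image data that goes through the CNN
extra_in = Input(shape=(5,))          # values/labels that bypass the CNN

x = Conv2D(16, 3, activation='relu')(image_in)
x = MaxPooling2D()(x)
x = Flatten()(x)

# Merge the CNN features with the bypass data before the FC layers
merged = Concatenate()([x, extra_in])
out = Dense(64, activation='relu')(merged)
out = Dense(1, activation='sigmoid')(out)

model = Model(inputs=[image_in, extra_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')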
BatchNormalization (BN) operates slightly differently when in training and in inference. In training, it uses the average and variance of the current mini-batch to scale its inputs; this means that the exact result of the application of batch normalization depends not only on the current input, but also on all other elements of the mini-batch. This is clearly not desirable when in inference mode, where we want a deterministic result. Therefore, in that case, a fixed statistic of the global average and variance over the entire training set is used.
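Schematically, the two modes compute the following (a NumPy sketch of the standard BN equations, not the actual TF kernels):
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)  # statistics of the current mini-batch
    else:
        mean, var = moving_mean, moving_var        # fixed statistics from training
    return gamma * (x - mean) / np.sqrt(var + eps) + beta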
In Tensorflow, this behavior is controlled by a boolean switch training that needs to be specified when calling the layer, see https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization. How do I deal with this switch when using Keras high-level API? Am I correct in assuming that it is dealt with automatically, depending whether we are using model.fit(x, ...) or model.predict(x, ...)?
To test this, I have written this example. We start with a random distribution and we want to classify whether the input is positive or negative. However, we also have a test dataset coming from a different distribution where the inputs are displaced by 2 (and consequently the labels check whether x>2).
import numpy as np
from math import ceil
from tensorflow.python.data import Dataset
from tensorflow.python.keras import Input, Model
from tensorflow.python.keras.layers import Dense, BatchNormalization
np.random.seed(18)
xt = np.random.randn(10_000, 1)
yt = np.array([[int(x > 0)] for x in xt])
train_data = Dataset.from_tensor_slices((xt, yt)).shuffle(10_000).repeat().batch(32).prefetch(2)
xv = np.random.randn(100, 1)
yv = np.array([[int(x > 0)] for x in xv])
valid_data = Dataset.from_tensor_slices((xv, yv)).repeat().batch(32).prefetch(2)
xs = np.random.randn(100, 1) + 2
ys = np.array([[int(x > 2)] for x in xs])
test_data = Dataset.from_tensor_slices((xs, ys)).repeat().batch(32).prefetch(2)
x = Input(shape=(1,))
a = BatchNormalization()(x)
a = Dense(8, activation='sigmoid')(a)
a = BatchNormalization()(a)
y = Dense(1, activation='sigmoid')(a)
model = Model(inputs=x, outputs=y, )
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_data, epochs=10, steps_per_epoch=ceil(10_000 / 32), validation_data=valid_data,
          validation_steps=ceil(100 / 32))
zs = model.predict(test_data, steps=ceil(100 / 32))
print(np.mean([ys[i] == int(zs[i] > 0.5) for i in range(100)]))  # fraction of correctly labeled examples
Running the code prints the value 0.5, meaning that half the examples are labeled properly. This is what I would expect if the system were using the global statistics of the training set to implement BN.
If we change the BN layers to read
x = Input(shape=(1,))
a = BatchNormalization()(x, training=True)
a = Dense(8, activation='sigmoid')(a)
a = BatchNormalization()(a, training=True)
y = Dense(1, activation='sigmoid')(a)
and run the code again, we find 0.87. Always forcing the training state, the percentage of correct predictions changes. This is consistent with the idea that model.predict(x, ...) is now using the statistics of the mini-batch to implement BN, and is therefore able to slightly "correct" the mismatch between the source distributions of the training and test data.
Is that correct?
If I'm understanding your question correctly, then yes, Keras automatically manages training vs. inference behavior based on fit vs. predict/evaluate. The flag is called learning_phase, and it determines the behavior of batch norm, dropout, and potentially other things. The current learning phase can be read with keras.backend.learning_phase() and set with keras.backend.set_learning_phase().
https://keras.io/backend/#learning_phase
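For example (the backend API as of the Keras version this answer targets):
import keras.backend as K

print(K.learning_phase())  # 0 = inference, 1 = training

# Force training behavior (mini-batch BN statistics, active dropout)
# even for calls that would otherwise run in inference mode:
K.set_learning_phase(1)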
I'm trying to get the activation values for each layer in this baseline autoencoder built using Keras, since I want to add a sparsity penalty to the loss function based on the Kullback-Leibler (KL) divergence, as shown here, page 14.
In this scenario, I'm going to calculate the KL divergence for each layer and then sum all of them with the main loss function, e.g. mse.
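For reference, the per-layer penalty described in those notes is the KL divergence between the target sparsity p and the average activation p_hat of the layer; transcribed as a small helper with Keras backend ops (my own formulation of that standard formula):
import keras.backend as K

def sparsity_kl(p, p_hat):
    # KL(p || p_hat) between two Bernoulli distributions
    return (p * K.log(p / p_hat)
            + (1 - p) * K.log((1 - p) / (1 - p_hat)))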
I therefore made a script in Jupyter to do that, but every time I try to compile I get ZeroDivisionError: integer division or modulo by zero.
This is the code:
import numpy as np
from keras.layers import Conv2D, Activation
from keras.models import Sequential
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32')
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(4,4), padding='same',
                 name='encoder', input_shape=(128,128,1)))
model.add(Activation('relu'))
# get the average activation
A = K.mean(x=model.output)
# calculate the value for the KL divergence
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, A)],axis=0)
# decoder
model.add(Conv2D(filters=1, kernel_size=(4,4), padding='same', name='decoder'))
model.add(Activation('relu'))
B = K.mean(x=model.output)
kl = K.concatenate([kl, losses.kullback_leibler_divergence(p, B)],axis=0)
This seems to be the cause:
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in _normalize_axis(axis, ndim)
989 else:
990 if axis is not None and axis < 0:
991 axis %= ndim <----------
992 return axis
993
so there might be something wrong in the mean calculation. If I print the value I get
Tensor("Mean_10:0", shape=(), dtype=float32)
which is quite strange, because the weights and biases are initialised to non-zero values. Thus, there might also be something wrong in the way I'm getting the activation values.
I really wouldn't know how to fix it; I'm not much of a skilled programmer.
Could anyone help me in understanding where I'm wrong?
First, you shouldn't be doing calculations outside layers. The model must keep track of all calculations.
If you need a specific calculation to be done in the middle of the model, you should use a Lambda layer.
If you need a specific output to be used in the loss function, you should split your model for that output and do the calculations inside a custom loss function.
Here, I used a Lambda layer to calculate the mean, and a customLoss to calculate the Kullback-Leibler divergence.
import numpy as np
from keras.layers import *
from keras.models import Model
from keras import backend as K
from keras import losses
x_train = np.random.rand(128,128).astype('float32')
kl = K.placeholder(dtype='float32') #you'll probably not need this anymore, since losses will be treated individually in each output.
beta = K.constant(value=5e-1)
p = K.constant(value=5e-2)
# encoder
inp = Input((128,128,1))
lay = Convolution2D(filters=16,kernel_size=(4,4),padding='same', name='encoder',activation='relu')(inp)
#apply the mean using a lambda layer:
intermediateOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(lay)
# decoder
finalOut = Convolution2D(filters=1, kernel_size=(4,4), padding='same', name='decoder', activation='relu')(lay)
#but from that, let's also calculate a mean output for loss:
meanFinalOut = Lambda(lambda x: K.mean(x),output_shape=(1,))(finalOut)
#Now, you have to create a model taking one input and those three outputs:
splitModel = Model(inp,[intermediateOut,meanFinalOut,finalOut])
And finally, compile your model with your custom loss function (we will define that later). But since I don't know if you're actually using the final output (not mean) for training, I'll suggest creating one model for training and another for predicting:
trainingModel = Model(inp,[intermediateOut,meanFinalOut])
trainingModel.compile(...,loss=customLoss)
predictingModel = Model(inp,finalOut)
#you don't need to compile the predicting model since you're only training the trainingModel
#both will share the same weights, you train one, and predict in the other
Our custom loss function should then deal with the Kullback-Leibler divergence.
def customLoss(p, mean):
    return  # your own Kullback-Leibler expression (I don't know how it works, but maybe Keras' one can be used with single values?)
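A possible concrete version, assuming the Bernoulli KL formula from the question's reference (an untested sketch; K.clip/K.epsilon guard against division by zero and log of zero):
import keras.backend as K

p_target = 5e-2  # the question's target sparsity p

def customLoss(y_true, y_pred):
    # y_pred is the mean activation produced by the Lambda layer;
    # y_true is ignored, so pass dummy targets when fitting.
    p_hat = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    return (p_target * K.log(p_target / p_hat)
            + (1 - p_target) * K.log((1 - p_target) / (1 - p_hat)))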
Alternatively, if you want a single loss function to be called instead of two:
summedMeans = Add()([intermediateOut, meanFinalOut])
trainingModel = Model(inp, summedMeans)