Replace BackPropagation functionality of Keras Layers - python

I am using the YOLO v3 Keras implementation and wish to add a guided backpropagation module to further analyse the behaviour of the network.
Inside the yolo.py file, I have added this function in attempt to compute backprop:
def propb_guided(self, image, layer_index):
    from tensorflow.python.ops import nn_ops, gen_nn_ops
    from tensorflow.python.framework import ops

    box_confidence = self.yolo_model.layers[layer_index].output[..., 4:5]
    box_class_probs = self.yolo_model.layers[layer_index].output[..., 5:]
    #box_class_probs = tf.Print(box_class_probs, [tf.shape(box_class_probs)], message="box class probs")
    scores = tf.reduce_sum(box_class_probs[0] * box_confidence[0], axis=2)
    #scores = tf.contrib.layers.flatten(scores)
    print(self.yolo_model.input.get_shape(), "input blabal")
    scores = tf.Print(scores, [tf.shape(scores)], message="scores")

    #grads = tf.map_fn(lambda x: tf.gradients(x, image)[0], tf.contrib.layers.flatten(scores), dtype=tf.float32)
    # gradients are the 1,1,w,h,c shape, c = 3 because RGB
    grads = tf.reduce_mean(tf.gradients(scores, self.yolo_model.input)[0][0], axis=2)
    grads = tf.Print(grads, [tf.shape(grads)], message="grad shape")

    # prepare image for forward prop
    if self.model_image_size != (None, None):
        assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
        assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
        boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
    else:
        new_image_size = (image.width - (image.width % 32),
                          image.height - (image.height % 32))
        boxed_image = letterbox_image(image, new_image_size)
    image_data = np.array(boxed_image, dtype='float32')

    print(image_data.shape)
    image_data /= 255.
    image_data = np.expand_dims(image_data, 0)  # Add batch dimension.

    # replace all relu layers with guided relu:
    # TODO replacing backprop layers doesn't work. The gradient override map doesnt work in this case
    #ops.RegisterGradient("GuidedReluBP")
    def _GuidedReluGrad(op, grad):
        #return tf.where(0. < grad, gen_nn_ops.relu_grad(grad, op.outputs[0]), tf.zeros(tf.shape(grad)))
        return 100000000

    tf_graph = K.get_session().graph
    layers = [op.name for op in tf_graph.get_operations() if op.type == "Maximum"]
    print(layers, "layers are")

    with tf_graph.gradient_override_map({'Maximum': 'GuidedReluBP'}):
        activation = sess.run(grads, feed_dict={
            self.yolo_model.input: image_data,
            self.input_image_shape: [image.size[1], image.size[0]],
            K.learning_phase(): 0})

    return activation
Right now, the function takes the output of the last layer (shape (?, ?, 255)) and multiplies the 5th filter (box confidence) with the class logits (filters 6 to 80, called box_class_probs). It then sums the products over all filters and stores the result in the scores tensor.
It then calculates the gradient of scores with respect to the input image and stores it in grads (at the tf.Print, grads has shape (416, 416), the width and height of the input image).
At the end (where the comment says 'replace all relu layers with guided relu'), I want to get all of the Keras leaky ReLU layers and replace their backpropagation mechanism. I noticed that the Keras leaky ReLU layer ends in a 'Maximum' operation, so I tried to replace each Maximum operation's gradient with the guided ReLU gradient I have defined. However, this does not work: the activation variable returned by the function is the same whether I enable the RegisterGradient code or not.
How do I replace the backpropagation mechanism of each leaky ReLU inside a Keras graph? This is so that I can implement guided backprop inside the Keras YOLO v3 implementation.
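(For reference, the usual TF1-style override pattern looks like the sketch below: gradient_override_map only affects ops created inside the with block, so the graph has to be (re)built within it rather than merely run there. This is a minimal sketch assuming plain Relu ops; build_model is a hypothetical constructor, and LeakyReLU/Maximum ops would need their own gradient rule.)
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import gen_nn_ops

@ops.RegisterGradient("GuidedRelu")
def _guided_relu_grad(op, grad):
    # ReLU gradient, additionally masked so that only positive upstream
    # gradients flow back (the guided backpropagation rule)
    return tf.where(grad > 0.,
                    gen_nn_ops.relu_grad(grad, op.outputs[0]),
                    tf.zeros_like(grad))

graph = tf.get_default_graph()
with graph.gradient_override_map({'Relu': 'GuidedRelu'}):
    # the model (and the tf.gradients call) must be constructed inside
    # this block; ops created earlier keep their original gradient
    model = build_model()  # hypothetical model constructor
    grads = tf.gradients(model.output, model.input)[0]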

Related

How does tensorflow handle training data passed to a neural network?

I am having an issue with my code, which I modified from https://keras.io/examples/generative/wgan_gp/ . Instead of the data being images, my data is a (1001,2) array of sequential data: the first column is the time and the second the velocity measurements. I'm getting this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14704/3651127346.py in <module>
21 # Training the WGAN-GP model
22 tic = time.perf_counter()
---> 23 WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
24 toc = time.perf_counter()
25 time_elapsed(toc-tic)
~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~\Anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
ValueError: in user code:
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1021, in train_function *
return step_function(self, iterator)
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\sissonn\Anaconda3\lib\site-packages\keras\engine\training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 141, in train_step
gp = self.gradient_penalty(batch_size, x_real, x_fake)
File "C:\Users\sissonn\AppData\Local\Temp/ipykernel_14704/3074469771.py", line 106, in gradient_penalty
alpha = tf.random.uniform(batch_size,1,1)
ValueError: Shape must be rank 1 but is rank 0 for '{{node random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0](strided_slice)' with input shapes: [].
And here is my code:
import time
from tqdm.notebook import tqdm
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import numpy as np
import matplotlib.pyplot as plt

def define_generator(latent_dim):
    # This function creates the generator model using the functional API.
    # Layers...
    # Input Layer
    inputs = Input(shape=latent_dim, name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(2, activation='linear', name='OUPUT_LAYER')(x)
    # Instantiating the generator model
    model = Model(inputs=inputs, outputs=outputs, name='GENERATOR')
    return model

def generator_loss(fake_logits):
    # This function calculates and returns the WGAN-GP generator loss.
    # Expected value of critic ouput from fake images
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = -expectation_fake
    return loss

def define_critic():
    # This function creates the critic model using the functional API.
    # Layers...
    # Input Layer
    inputs = Input(shape=2, name='INPUT_LAYER')
    # 1st hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_1')(inputs)
    # 2nd hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_2')(x)
    # 3rd hidden layer
    x = Dense(300, activation='relu', name='HIDDEN_LAYER_3')(x)
    # 4th hidden layer
    x = Dense(150, activation='relu', name='HIDDEN_LAYER_4')(x)
    # 5th hidden layer
    x = Dense(50, activation='relu', name='HIDDEN_LAYER_5')(x)
    # Output layer
    outputs = Dense(1, activation='linear', name='OUPUT_LAYER')(x)
    # Instantiating the critic model
    model = Model(inputs=inputs, outputs=outputs, name='CRITIC')
    return model

def critic_loss(real_logits, fake_logits):
    # This function calculates and returns the WGAN-GP critic loss.
    # Expected value of critic output from real images
    expectation_real = tf.reduce_mean(real_logits)
    # Expected value of critic output from fake images
    expectation_fake = tf.reduce_mean(fake_logits)
    # Loss to minimize
    loss = expectation_fake - expectation_real
    return loss
class define_wgan(keras.Model):
    # This class creates the WGAN-GP object.
    # Attributes:
    #   critic = the critic model.
    #   generator = the generator model.
    #   latent_dim = defines generator input dimension.
    #   critic_steps = defines how many times the discriminator gets trained for each training cycle.
    #   gp_weight = defines and returns the critic gradient for the gradient penalty term.
    # Methods:
    #   compile() = defines the optimizer and loss function of both the critic and generator.
    #   gradient_penalty() = calcuates and returns the gradient penalty term in the WGAN-GP loss function.
    #   train_step() = performs the WGAN-GP training by updating the critic and generator weights
    #                  and returns the loss for both. Called by fit().
    def __init__(self, gen, critic, latent_dim, n_critic_train, gp_weight):
        super().__init__()
        self.critic = critic
        self.generator = gen
        self.latent_dim = latent_dim
        self.critic_steps = n_critic_train
        self.gp_weight = gp_weight

    def compile(self, generator_loss, critic_loss):
        super().compile()
        self.generator_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.critic_optimizer = keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5, beta_2=0.9)
        self.generator_loss_function = generator_loss
        self.critic_loss_function = critic_loss

    def gradient_penalty(self, batch_size, x_real, x_fake):
        # Random uniform samples of points between distribution.
        # "alpha" must be a tensor so that "x_interp" will also be a tensor.
        alpha = tf.random.uniform(batch_size,1,1)
        # Data interpolated between real and fake distributions
        x_interp = alpha*x_real + (1-alpha)*x_fake
        # Calculating critic output gradient wrt interpolated data
        with tf.GradientTape() as gp_tape:
            gp_tape.watch(x_interp)
            critc_output = self.discriminator(x_interp, training=True)
        grad = gp_tape.gradient(critic_output, x_interp)[0]
        # Calculating norm of gradient
        grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad)))
        # calculating gradient penalty
        gp = tf.reduce_mean((norm - 1.0)**2)
        return gp

    def train_step(self, x_real):
        # Critic training
        # Getting batch size for creating latent vectors
        print(x_real)
        batch_size = tf.shape(x_real)[0]
        print(batch_size)
        # Critic training loop
        for i in range(self.critic_steps):
            # Generating latent vectors
            latent = tf.random.normal(shape=(batch_size, self.latent_dim))
            with tf.GradientTape() as tape:
                # Obtaining fake data from generator
                x_fake = self.generator(latent, training=True)
                # Critic output from fake data
                fake_logits = self.critic(x_fake, training=True)
                # Critic output from real data
                real_logits = self.critic(x_real, training=True)
                # Calculating critic loss
                c_loss = self.critic_loss_function(real_logits, fake_logits)
                # Calcuating gradient penalty
                gp = self.gradient_penalty(batch_size, x_real, x_fake)
                # Adjusting critic loss with gradient penalty
                c_loss = c_loss + gp_weight*gp
            # Calculating gradient of critic loss wrt critic weights
            critic_grad = tape.gradient(c_loss, self.critic.trainable_variables)
            # Updating critic weights
            self.critic_optimizer.apply_gradients(zip(critic_gradient, self.critic.trainable_variables))

        # Generator training
        # Generating latent vectors
        latent = tf.random.normal(shape=(batch_size, self.latent_dim))
        with tf.GradientTape() as tape:
            # Obtaining fake data from generator
            x_fake = self.generator(latent, training=True)
            # Critic output from fake data
            fake_logits = self.critic(x_fake, training=True)
            # Calculating generator loss
            g_loss = self.generator_loss_function(fake_logits)
        # Calculating gradient of generator loss wrt generator weights
        genertor_grad = tape.gradient(g_loss, self.generator.trainable_variables)
        # Updating generator weights
        self.generator_optimizer.apply_gradients(zip(generator_gradient, self.generator.trainable_variables))
        return g_loss, c_loss
class GAN_monitor(keras.callbacks.Callback):
    def __init__(self, n_samples, latent_dim):
        self.n_samples = n_samples
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        latent = tf.random.normal(shape=(self.n_samples, self.latent_dim))
        generated_data = self.model.generator(latent)
        plt.plot(generated_data)
        plt.savefig('Epoch _'+str(epoch)+'.png', dpi=300)

data = np.genfromtxt('Flight_1.dat', dtype='float', encoding=None, delimiter=',')[0:1001,0]
time_span = np.linspace(0,20,1001)
dataset = np.concatenate((time_span[:,np.newaxis], data[:,np.newaxis]), axis=1)
dataset.shape
# Training Parameters
latent_dim = 100
n_epochs = 10
n_critic_train = 5
gp_weight = 10
batch_Size = 100
# Instantiating the generator and discriminator models
gen = define_generator(latent_dim)
critic = define_critic()
# Instantiating the WGAN-GP object
WGAN = define_wgan(gen, critic, latent_dim, n_critic_train, gp_weight)
# Compling the WGAN-GP model
WGAN.compile(generator_loss, critic_loss)
# Instantiating custom Keras callback
cbk = GAN_monitor(n_samples=1, latent_dim=latent_dim)
# Training the WGAN-GP model
tic = time.perf_counter()
WGAN.fit(dataset, batch_size=batch_Size, epochs=n_epochs, callbacks=[cbk])
toc = time.perf_counter()
time_elapsed(toc-tic)
The issue is the shape I am providing to tf.random.uniform() for the assignment of alpha. I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example, so I don't know how to specify the shape for my example. Furthermore, I don't understand this line in the Keras example:
batch_size = tf.shape(real_images)[0]
In this example 'real_images' is a (60000, 28, 28, 1) array and it gets passed to the fit() method, which then passes it to the train_step() method. (It gets passed as "train_images", but they are the same variable.) If I add a line that prints out 'real_images' before this tf.shape(), this is what it produces:
Tensor("IteratorGetNext:0", shape=(None, 28, 28, 1), dtype=float32)
Why is the 60000 now None? Then, I added a line that printed out "batch_size" after the tf.shape() and this is what it produces:
Tensor("strided_slice:0", shape=(), dtype=int32)
I googled "tf strided_slice", but all I could find is the method tf.strided_slice(). So what exactly is the value of "batch_size", and why are the printed values of variables so ambiguous when they are tensors? In fact, if I type:
tf.shape(train_images)[0]
in another cell of my Jupyter notebook, I get a completely different output:
<tf.Tensor: shape=(), dtype=int32, numpy=60000>
I really need to understand this Keras example in order to successfully implement this code for my data. Any help is appreciated.
BTW: I am using only one set of data for now, but once I get the GAN running, I will provide multiple sets of these (1001,2) datasets. Also, if you want to test the code yourself, replacing the "dataset" variable with any (1001,2) numpy array should suffice. Thank You.
'Why is the 60000 now None?': In defining TensorFlow models, the first dimension (batch_size) is None. Getting under the hood of what goes on with TensorFlow and how it uses graphs for computation can be quite complex. But for your understanding right now, all you need to know is that batch_size does not need to be specified when defining the model, hence None. This is essential, as it allows a model to be defined once but then trained with and applied to datasets of an arbitrary number of examples. For example, when training you may provide the model with a batch of 256 images at a time, but when using the trained model for inference, it's very likely that you might only want the input to be a single image. Therefore the actual value of the first dimension of the input only matters once the computation is about to begin.
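A quick illustration of this (a toy two-feature model, separate from the code in the question):
import tensorflow as tf
from tensorflow.keras import layers

# Toy model: the batch dimension is left as None at definition time
model = tf.keras.Sequential([layers.Input(shape=(2,)), layers.Dense(1)])
print(model.input_shape)                 # (None, 2)

# The same model then accepts any batch size once real data flows through it
print(model(tf.zeros((256, 2))).shape)   # (256, 1)
print(model(tf.zeros((1, 2))).shape)     # (1, 1)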
'I don't fully understand why the shape input is (batch_size, 1, 1, 1) in the Keras example': The reason for this size is that you want a different random value, alpha, for each image. You have batch_size number of images, hence batch_size in the first dimension, but it is just a single value in tensor format, so it only needs size 1 in all other dimensions. The reason it has 4 dimensions overall is so that it can be used in calculations with your inputs, which are 4-D image tensors with a shape of something like (batch_size, img_h, img_w, 3) for color images with 3 RGB channels.
In terms of understanding your error, Shape must be rank 1 but is rank 0, this is saying that the function you are using, tf.random.uniform, requires a rank-1 tensor for its shape argument, i.e. something with 1 dimension, but it is being passed a rank-0 tensor, i.e. a scalar value. It is possible from your code that you are just passing it the value of batch_size rather than a shape. This might work instead:
alpha = tf.random.uniform([batch_size, 1, 1, 1])
The first parameter of this function is its shape and so it is important to have the [] there. Check out the documentation on this function in order to make sure you're using it correctly - https://www.tensorflow.org/api_docs/python/tf/random/uniform.
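Applied to the (1001, 2) sequential data in this question, one alpha per sample broadcast across the two columns should be enough; a minimal sketch with a made-up batch size:
import tensorflow as tf

batch_size = 4
x_real = tf.random.normal((batch_size, 2))  # stand-in for (time, velocity) rows
x_fake = tf.random.normal((batch_size, 2))

# Shape [batch_size, 1]: one alpha per sample, broadcast over the 2 columns
alpha = tf.random.uniform([batch_size, 1], minval=0.0, maxval=1.0)
x_interp = alpha * x_real + (1 - alpha) * x_fake
print(x_interp.shape)  # (4, 2)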

How to override gradient for the nonlinearity functions in lasagne?

I have a model for which I need to compute the gradients of the output w.r.t. the model's input. But I want to apply some custom gradients for some of the nonlinearity functions applied on some of the model's layers. So I tried the idea explained here, which computes the nonlinear rectifier (ReLU) in the forward pass but modifies the gradients of ReLU in the backward pass. I added the following two classes:
The helper class that allows us to replace a nonlinearity with an Op
that has the same output, but a custom gradient
class ModifiedBackprop(object):

    def __init__(self, nonlinearity):
        self.nonlinearity = nonlinearity
        self.ops = {}  # memoizes an OpFromGraph instance per tensor type

    def __call__(self, x):
        # OpFromGraph is oblique to Theano optimizations, so we need to move
        # things to GPU ourselves if needed.
        if theano.sandbox.cuda.cuda_enabled:
            maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
        else:
            maybe_to_gpu = lambda x: x
        # We move the input to GPU if needed.
        x = maybe_to_gpu(x)
        # We note the tensor type of the input variable to the nonlinearity
        # (mainly dimensionality and dtype); we need to create a fitting Op.
        tensor_type = x.type
        # If we did not create a suitable Op yet, this is the time to do so.
        if tensor_type not in self.ops:
            # For the graph, we create an input variable of the correct type:
            inp = tensor_type()
            # We pass it through the nonlinearity (and move to GPU if needed).
            outp = maybe_to_gpu(self.nonlinearity(inp))
            # Then we fix the forward expression...
            op = theano.OpFromGraph([inp], [outp])
            # ...and replace the gradient with our own (defined in a subclass).
            op.grad = self.grad
            # Finally, we memoize the new Op
            self.ops[tensor_type] = op
        # And apply the memoized Op to the input we got.
        return self.ops[tensor_type](x)
The subclass that does guided backpropagation through a nonlinearity:
class GuidedBackprop(ModifiedBackprop):
    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
Then I used them in my code as follows:
import lasagne as nn

model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])

relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)

for layer in relu_layers:
    layer.nonlinearity = modded_relu

prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):
    model_out = prop[sample, 'z']  # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in)
    # gradients = theano.grad(model_out, wrt=model_in)
    get_gradients = theano.function(inputs=[model_in], outputs=gradients)
    grads = get_gradients(X_batch)  # gradient dimension: X_batch == model_in(64, 20, 32)
    grads = np.array(grads)
    grads = grads[sample]
Now when I run the code, it works without any error, and the shape of the output is also correct. But that's because it executes the default theano.grad function, not the one that is supposed to override it. In other words, the grad() function in the class GuidedBackprop is never invoked.
I can't understand what the issue is. Is there a solution?
If this is an unresolved issue, is there an implementation of a Theano Op that can achieve such functionality, or some other way to override the gradient for specific nonlinearity functions applied on some of the model's layers?
Are you trying to feed the value of the model's output back into the model layer's input for the gradient calculation? Something like:
group_1_ShoryuKen_Left = tf.constant([ 0,0,0,0,0,1,0,0,0,0,0,0, 0,0,0,0,0,1,0,1,0,0,0,0, 0,0,0,0,0,0,0,1,0,0,0,0, 0,0,0,0,0,0,0,0,0,1,0,0 ], shape=(1, 1, 48), dtype=tf.float32)
## layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
b_out = layer_2(group_1_ShoryuKen_Left)
layer_2.set_weights(layer_1.get_weights())

how do I implement Gaussian blurring layer in Keras?

I have an autoencoder and I need to add a Gaussian noise layer after my output. I need a custom layer to do this, but I really do not know how to produce it; I need to produce it using tensors.
What should I do if I want to implement the above equation in the call part of the following code?
class SaltAndPepper(Layer):

    def __init__(self, ratio, **kwargs):
        super(SaltAndPepper, self).__init__(**kwargs)
        self.supports_masking = True
        self.ratio = ratio

    # the definition of the call method of custom layer
    def call(self, inputs, training=None):
        def noised():
            shp = K.shape(inputs)[1:]
            **what should I put here????**
            return out
        return K.in_train_phase(noised(), inputs, training=training)

    def get_config(self):
        config = {'ratio': self.ratio}
        base_config = super(SaltAndPepper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
I also tried to implement it using a Lambda layer, but it does not work.
If you are looking for additive or multiplicative Gaussian noise, then they have already been implemented as layers in Keras: GaussianNoise (additive) and GaussianDropout (multiplicative).
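For instance, a minimal usage sketch of those built-in layers (the stddev/rate values here are arbitrary):
from keras.layers import Input, Dense, GaussianNoise, GaussianDropout
from keras.models import Model

inp = Input(shape=(64,))
x = GaussianNoise(stddev=0.1)(inp)  # additive zero-centred noise, active at training time only
x = GaussianDropout(rate=0.1)(x)    # multiplicative 1-centred noise, active at training time only
out = Dense(64)(x)
model = Model(inputs=inp, outputs=out)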
However, if you are specifically looking for the blurring effect as in Gaussian blur filters in image processing, then you can simply use a depth-wise convolution layer (to apply the filter on each input channel independently) with fixed weights to get the desired output (Note that you need to generate the weights of Gaussian kernel to set them as the weights of DepthwiseConv2D layer. For that you can use the function introduced in this answer):
import numpy as np
from keras.layers import DepthwiseConv2D
kernel_size = 3 # set the filter size of Gaussian filter
kernel_weights = ... # compute the weights of the filter with the given size (and additional params)
# assuming that the shape of `kernel_weights` is `(kernel_size, kernel_size)`
# we need to modify it to make it compatible with the number of input channels
in_channels = 3 # the number of input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1)
kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1) # apply the same filter on all the input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1) # for shape compatibility reasons
# define your model...
# somewhere in your model you want to apply the Gaussian blur,
# so define a DepthwiseConv2D layer and set its weights to kernel weights
g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')
g_layer_out = g_layer(the_input_tensor_for_this_layer) # apply it on the input Tensor of this layer
# the rest of the model definition...
# do this BEFORE calling `compile` method of the model
g_layer.set_weights([kernel_weights])
g_layer.trainable = False # the weights should not change during training
# compile the model and start training...
After a while trying to figure out how to do this with the code @today provided, I have decided to share my final code with anyone who might need it in the future. I have created a very simple model that only applies the blurring to the input data:
import numpy as np
from keras.layers import DepthwiseConv2D
from keras.layers import Input
from keras.models import Model

def gauss2D(shape=(3,3), sigma=0.5):
    m, n = [(ss-1.)/2. for ss in shape]
    y, x = np.ogrid[-m:m+1, -n:n+1]
    h = np.exp(-(x*x + y*y) / (2.*sigma*sigma))
    h[h < np.finfo(h.dtype).eps*h.max()] = 0
    sumh = h.sum()
    if sumh != 0:
        h /= sumh
    return h

def gaussFilter():
    kernel_size = 3
    kernel_weights = gauss2D(shape=(kernel_size, kernel_size))
    in_channels = 1  # the number of input channels
    kernel_weights = np.expand_dims(kernel_weights, axis=-1)
    kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1)  # apply the same filter on all the input channels
    kernel_weights = np.expand_dims(kernel_weights, axis=-1)  # for shape compatibility reasons
    inp = Input(shape=(3,3,1))
    g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')(inp)
    model_network = Model(inputs=inp, outputs=g_layer)
    model_network.layers[1].set_weights([kernel_weights])
    model_network.trainable = False  # can be applied to a given layer only as well
    return model_network

a = np.array([[[1, 2, 3], [4, 5, 6], [4, 5, 6]]])
filt = gaussFilter()
print(a.reshape((1,3,3,1)))
print(filt.predict(a.reshape(1,3,3,1)))
For testing purposes the data are only of shape (1, 3, 3, 1). The function gaussFilter() creates a very simple model with only an input and one convolution layer that provides Gaussian blurring with the weights defined in the function gauss2D(). You can add parameters to the function to make it more dynamic, e.g. shape, kernel size, channels. According to my findings, the weights can only be applied after the layer has been added to the model.
As for the error AttributeError: 'float' object has no attribute 'dtype': just change K.sqrt to math.sqrt and it will work.

How to get logits from a sequential model in keras/tensorflow? [duplicate]

I have trained a binary classification model with a CNN, and here is my code:
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (16, 16, 32)
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters*2, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
# (8, 8, 64) = (2048)
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(2))  # define a binary classification problem
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          verbose=1,
          validation_data=(x_test, y_test))
And here I want to get the output of each layer, just like in TensorFlow. How can I do that?
You can easily get the outputs of any layer by using: model.layers[index].output
For all layers use this:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test, 1.]) for func in functors]
print(layer_outs)
Note: To simulate Dropout use learning_phase as 1. in layer_outs otherwise use 0.
Edit: (based on comments)
K.function creates theano/tensorflow tensor functions which is later used to get the output from the symbolic graph given the input.
Now K.learning_phase() is required as an input because many Keras layers like Dropout/BatchNormalization depend on it to change behavior during training and test time.
So if you remove the dropout layer in your code you can simply use:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functors = [K.function([inp], [out]) for out in outputs] # evaluation functions
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test]) for func in functors]
print(layer_outs)
Edit 2: More optimized
I just realized that the previous answer is not that optimized, as for each function evaluation the data will be transferred CPU->GPU memory and the tensor calculations for the lower layers need to be done over and over.
Instead, this is a much better way, as you don't need multiple functions but a single function giving you the list of all outputs:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers] # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
From https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer
One simple way is to create a new Model that will output the layers that you are interested in:
from keras.models import Model
model = ... # include here your original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
Alternatively, you can build a Keras function that will return the output of a certain layer given a certain input, for example:
from keras import backend as K
# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
[model.layers[3].output])
layer_output = get_3rd_layer_output([x])[0]
Based on all the good answers of this thread, I wrote a library to fetch the output of each layer. It abstracts all the complexity and has been designed to be as user-friendly as possible:
https://github.com/philipperemy/keract
It handles almost all the edge cases.
Hope it helps!
Following looks very simple to me:
model.layers[idx].output
Above is a tensor object, so you can modify it using operations that can be applied to a tensor object.
For example, to get the shape: model.layers[idx].output.get_shape()
idx is the index of the layer and you can find it from model.summary()
This answer is based on: https://stackoverflow.com/a/59557567/2585501
To print the output of a single layer:
from tensorflow.keras import backend as K
layerIndex = 1
func = K.function([model.get_layer(index=0).input], model.get_layer(index=layerIndex).output)
layerOutput = func([input_data]) # input_data is a numpy array
print(layerOutput)
To print output of every layer:
from tensorflow.keras import backend as K
for layerIndex, layer in enumerate(model.layers):
    func = K.function([model.get_layer(index=0).input], layer.output)
    layerOutput = func([input_data])  # input_data is a numpy array
    print(layerOutput)
I wrote this function for myself (in Jupyter) and it was inspired by indraforyou's answer. It will plot all the layer outputs automatically. Your images must have a (x, y, 1) shape where 1 stands for 1 channel. You just call plot_layer_outputs(...) to plot.
%matplotlib inline
import matplotlib.pyplot as plt
from keras import backend as K

def get_layer_outputs():
    test_image = YOUR IMAGE GOES HERE!!!
    outputs = [layer.output for layer in model.layers]  # all layer outputs
    comp_graph = [K.function([model.input] + [K.learning_phase()], [output]) for output in outputs]  # evaluation functions

    # Testing
    layer_outputs_list = [op([test_image, 1.]) for op in comp_graph]
    layer_outputs = []

    for layer_output in layer_outputs_list:
        print(layer_output[0][0].shape, end='\n-------------------\n')
        layer_outputs.append(layer_output[0][0])

    return layer_outputs

def plot_layer_outputs(layer_number):
    layer_outputs = get_layer_outputs()

    x_max = layer_outputs[layer_number].shape[0]
    y_max = layer_outputs[layer_number].shape[1]
    n = layer_outputs[layer_number].shape[2]

    L = []
    for i in range(n):
        L.append(np.zeros((x_max, y_max)))

    for i in range(n):
        for x in range(x_max):
            for y in range(y_max):
                L[i][x][y] = layer_outputs[layer_number][x][y][i]

    for img in L:
        plt.figure()
        plt.imshow(img, interpolation='nearest')
From: https://github.com/philipperemy/keras-visualize-activations/blob/master/read_activations.py
import keras.backend as K

def get_activations(model, model_inputs, print_shape_only=False, layer_name=None):
    print('----- activations -----')
    activations = []
    inp = model.input

    model_multi_inputs_cond = True
    if not isinstance(inp, list):
        # only one input! let's wrap it in a list.
        inp = [inp]
        model_multi_inputs_cond = False

    outputs = [layer.output for layer in model.layers if
               layer.name == layer_name or layer_name is None]  # all layer outputs

    funcs = [K.function(inp + [K.learning_phase()], [out]) for out in outputs]  # evaluation functions

    if model_multi_inputs_cond:
        list_inputs = []
        list_inputs.extend(model_inputs)
        list_inputs.append(0.)
    else:
        list_inputs = [model_inputs, 0.]

    # Learning phase. 0 = Test mode (no dropout or batch normalization)
    # layer_outputs = [func([model_inputs, 0.])[0] for func in funcs]
    layer_outputs = [func(list_inputs)[0] for func in funcs]
    for layer_activations in layer_outputs:
        activations.append(layer_activations)
        if print_shape_only:
            print(layer_activations.shape)
        else:
            print(layer_activations)
    return activations
Previous solutions were not working for me. I handled this issue as shown below.
layer_outputs = []
for i in range(1, len(model.layers)):
    tmp_model = Model(model.layers[0].input, model.layers[i].output)
    tmp_output = tmp_model.predict(img)[0]
    layer_outputs.append(tmp_output)
Wanted to add this as a comment (but don't have high enough rep.) to @indraforyou's answer, to correct for the issue mentioned in @mathtick's comment. To avoid the InvalidArgumentError: input_X:Y is both fed and fetched exception, simply replace the line outputs = [layer.output for layer in model.layers] with outputs = [layer.output for layer in model.layers][1:], i.e.
adapting indraforyou's minimal working example:
from keras import backend as K
inp = model.input # input placeholder
outputs = [layer.output for layer in model.layers][1:] # all layer outputs except first (input) layer
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
# Testing
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = functor([test, 1.])
print(layer_outs)
P.S. My attempts at things such as outputs = [layer.output for layer in model.layers[1:]] did not work.
Assuming you have:
1- Keras pre-trained model.
2- Input x as an image or set of images. The resolution of the image should be compatible with the dimensions of the input layer. For example, 80*80*3 for a 3-channel (RGB) image.
3- The name of the output layer to get the activation. For example, the "flatten_2" layer. This should be included in the layer_names variable, which represents the names of the layers of the given model.
4- batch_size is an optional argument.
Then you can easily use the get_activations function to get the activation of the output layer for a given input x and pre-trained model:
import six
import numpy as np
import keras.backend as k
from numpy import float32

def get_activations(x, model, layer, batch_size=128):
    """
    Return the output of the specified layer for input `x`. `layer` is specified by layer index (between 0 and
    `nb_layers - 1`) or by name. The number of layers can be determined by counting the results returned by
    calling `layer_names`.

    :param x: Input for computing the activations.
    :type x: `np.ndarray`. Example: x.shape = (80, 80, 3)
    :param model: pre-trained Keras model. Including weights.
    :type model: keras.engine.sequential.Sequential. Example: model.input_shape = (None, 80, 80, 3)
    :param layer: Layer for computing the activations
    :type layer: `int` or `str`. Example: layer = 'flatten_2'
    :param batch_size: Size of batches.
    :type batch_size: `int`
    :return: The output of `layer`, where the first dimension is the batch size corresponding to `x`.
    :rtype: `np.ndarray`. Example: activations.shape = (1, 2000)
    """
    layer_names = [layer.name for layer in model.layers]
    if isinstance(layer, six.string_types):
        if layer not in layer_names:
            raise ValueError('Layer name %s is not part of the graph.' % layer)
        layer_name = layer
    elif isinstance(layer, int):
        if layer < 0 or layer >= len(layer_names):
            raise ValueError('Layer index %d is outside of range (0 to %d included).'
                             % (layer, len(layer_names) - 1))
        layer_name = layer_names[layer]
    else:
        raise TypeError('Layer must be of type `str` or `int`.')

    layer_output = model.get_layer(layer_name).output
    layer_input = model.input
    output_func = k.function([layer_input], [layer_output])

    # Apply preprocessing
    if x.shape == k.int_shape(model.input)[1:]:
        x_preproc = np.expand_dims(x, 0)
    else:
        x_preproc = x
    assert len(x_preproc.shape) == 4

    # Determine shape of expected output and prepare array
    output_shape = output_func([x_preproc[0][None, ...]])[0].shape
    activations = np.zeros((x_preproc.shape[0],) + output_shape[1:], dtype=float32)

    # Get activations with batching
    for batch_index in range(int(np.ceil(x_preproc.shape[0] / float(batch_size)))):
        begin, end = batch_index * batch_size, min((batch_index + 1) * batch_size, x_preproc.shape[0])
        activations[begin:end] = output_func([x_preproc[begin:end]])[0]

    return activations
In case you run into one of the following cases:
error: InvalidArgumentError: input_X:Y is both fed and fetched
case of multiple inputs
You need to do the following changes:
filter out the input layers in the outputs variable
a minor change in the functors loop
Minimum example:
from keras.engine.input_layer import InputLayer
inp = model.input
outputs = [layer.output for layer in model.layers if not isinstance(layer, InputLayer)]
functors = [K.function(inp + [K.learning_phase()], [x]) for x in outputs]
layer_outputs = [fun([x1, x2, xn, 1]) for fun in functors]
Well, other answers are very complete, but there is a very basic way to "see", not to "get" the shapes.
Just do a model.summary(). It will print all layers and their output shapes. "None" values will indicate variable dimensions, and the first dimension will be the batch size.
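For instance, with a small toy model (unrelated to the question's network):
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10),
])
model.summary()  # each row shows a layer and an output shape such as (None, 26, 26, 32)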
Generally, output size can be calculated as
[(W−K+2P)/S]+1
where
W is the input volume - in your case you have not given us this
K is the Kernel size - in your case 2 == "filter"
P is the padding - in your case 2
S is the stride - in your case 3
Another, prettier formulation: output_size = (W − K + 2P) / S + 1
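As a quick worked example with made-up numbers: an input of width W = 28 with kernel size K = 2, padding P = 2 and stride S = 3 gives [(28 − 2 + 2·2)/3] + 1 = [30/3] + 1 = 11, so that dimension of the output is 11.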

First Neural Network, (MLP), from Scratch, Python -- Questions

I understand how a neural network with backpropagation is supposed to work, and I know how sklearn's MLPClassifier and its fit function work. I am creating my own because I'd like to know the details better. I will first show my code (with comments) and then discuss my problems.
import numpy as np
import scipy as sp
import sklearn as ML

# z: the linear combination of the previous layer
#
# returns the activation for the node
#
def sigmoid(z):
    a = 1 / (1 + np.exp(-z))
    return a

# z: the contribution of a layer
#
# returns the derivative of the sigmoid evaluated at z
#
def sig_grad(z):
    d = (1 - sigmoid(z))*sigmoid(z)
    return d

# input: the data we want to train the network with
# hidden_layers: the number of nodes in the hidden layers
# num_layers: how many hidden layers between the input layer and the output layer
# num_output: how many outputs there are... this becomes relevant when we input many features.
#
# returns the activations determined
# and the linear combinations of previous layer's nodes for each layer
#
def feedforward(input, hidden_layers, num_layers, num_output, thresh, weights):
    #initialize the vector for inputs AND threshold values
    X = np.hstack([thresh[0], input])
    #intialize the activations list
    A = []
    #intialize the linear combos for each layer
    Z = []
    w = list(weights)
    #place ones in the first row of each layer of weights for the threshold
    w[0] = np.vstack([np.ones([1,hidden_layers]), w[0]])
    for i in range(1,num_layers):
        w[i] = np.vstack([np.ones([1,hidden_layers]), weights[i]])
    w[-1] = np.vstack([np.ones([1,num_output]), w[-1]])
    #the first layer of weights are initialized outside function
    #cycle through the hidden layers
    for i in range(1, num_layers+1):
        Z.append(np.dot(X, w[i-1])); S = sigmoid(Z[i-1]); A.append(S); X = np.hstack([thresh[i], A[i-1]])
    #find the output/last layer activations
    Z.append(np.dot(X, w[-1])); S = sigmoid(Z[-1]); A.append(S)
    return A, Z
#
# truth: what we know the output should be
# activations: the activations determined at each node by the sigmoid
#              function in the previous feedforward pass
# combos: the linear combinations at each layer in the prev. ff pass
# num_layers: the number of hidden layers
#
# error: the errors determined at each layer; will be needed for gradient descent
#
def backprop(input, truth, activations, combos, num_layers, weights):
    #initialize an array of errors for each hidden layer and the output layer
    error = [0 for x in range(0,num_layers+1)]
    #intialize the lists containing the gradients w.r.t. weights and threshold
    derivW = []; derivb = []
    #set the output layer since its error is computed differently than the others
    error[num_layers] = (activations[num_layers] - truth)*sig_grad(combos[num_layers])
    #find the rate of change for weights and thresh for connections to output
    derivW.append(activations[num_layers-1]*error[num_layers]); derivb.append(np.sum(error[num_layers]))
    if(num_layers > 1):
        #find the errors for each of the hidden layers
        for i in range(num_layers - 1, 0, -1):
            error[i] = np.dot(weights[i+1],error[i+1])*sig_grad(combos[i])
            derivW.append(np.outer(activations[i-1], error[i])); derivb.append(np.sum(error[i]))
    #
    #finding the derivative for weights of input to next layer
    #
    error[0] = np.dot(weights[i],error[i])*sig_grad(combos[0])
    derivW.append(np.outer(input, error[0])); derivb.append(np.sum(error[0]))
    return derivW, derivb
#
# weights: our networks weights to update via gradient descent
# thresh: the threshold values to update for our system
# derivb: the derivative of our cost function with respect to b for each layer
# derivW: the derivative of our cost function with respect to W for each layer
# stepsize: the stepsize we want to take, determines how big of a step we take
#
# returns the updated weights and threshold values for our network
def gradDesc(weights, thresh, derivb, derivW, stepsize, num_layers):
    #perform gradient descent
    for j in range(100):
        for i in range(0, num_layers + 1):
            weights[i] = weights[i] - stepsize*derivW[num_layers-i]
            thresh[i] = thresh[i] - stepsize*derivb[num_layers-i]
    return weights, thresh

#input: the data to send through the network
#hidden_layers: the number of hidden_layers between the input layer and the output layer
#num_layers: the number of nodes in the hidden layer
#num_output: the number of nodes in the output layer
#
#returns the output of the network
#
def nNetwork(input, truth, hidden_layers, num_layers, num_output, maxiter, stepsize):
    #assuming that input is an array where each element is an input/sample
    #we also need to know the size of each sample itself
    m = input.size
    thresh = np.random.randn(num_layers + 1, 1)
    thresh_weights = np.ones([num_layers + 1, 1])
    # initialize the weights as a list because each layer might have
    # a different number of weights
    weights = []; weights.append(np.random.randn(m,hidden_layers))
    if(num_layers > 1):
        for i in range(1, num_layers):
            weights.append(np.random.randn(hidden_layers, hidden_layers))
    weights.append(np.random.randn(hidden_layers, num_output))
    for i in range(maxiter):
        activations, combos = feedforward(input, hidden_layers, num_layers, num_output, thresh, weights)
        derivW, derivb = backprop(input, truth, activations, combos, num_layers, weights)
        weights, thresh = gradDesc(weights, thresh, derivb, derivW, stepsize, num_layers)
    return weights, thresh

def main():
    # a very, very simple neural network
    input = np.array([1,0,0])
    truth = 0
    hidden_layers = 3
    num_layers = 2
    num_output = 1
    #train the network
    w, t = nNetwork(input, truth, hidden_layers, num_layers, num_output, maxiter = 10, stepsize = 0.001)
    #test the network on a new set of arguments
    #activations, combos = feedforward(new_input, hidden_layers = 3, num_layers = 2, thresh = t, weights = w)

main()
I've tested this code on simple examples where there are n inputs of one dimension and an output of n dimensions (I'm not yet able to work out the bugs when I type import NN.py into the console, but it works when I run it piece by piece). I have a few questions to help me better understand what is going on when the n inputs have m dimensions each, for example the digits data in Python (there are 1797 samples, and each sample is 64x1 -- an 8x8 image vectorized).
1) Is each of the 64 pixels considered an input? If so, is the neural net trained one image at a time? This would be an easy fix for me.
2) If the neural net is trained all images at once, what are suggestions for modifying my code?
3) Obviously the output for an image comes in the form 0, 1, 2, ..., or 9. But does the output come in the form of a 10x1 vector, with a 1 at the digit the image represents and 0's elsewhere? If so, my prediction vector would have its highest value where the 1 should be, right? (See the sketch at the end of this question.)
4) Then, I'm not quite sure how #3 would look if #2 is true.
I apologize for the long note. Thanks for taking a look and helping me understand better!
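Regarding question 3, here is a minimal numpy sketch of the usual one-hot encoding convention (an assumption about how digit labels are typically encoded, not something taken from the code above):
import numpy as np

labels = np.array([3, 0, 9])              # three example digit labels
one_hot = np.eye(10)[labels]              # each row: a 10-vector with a 1 at the digit's index
print(one_hot[0])                         # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
pred_digits = np.argmax(one_hot, axis=1)  # recover digits from 10-dim outputs: [3 0 9]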
