Set weight and bias tensors of tensorflow conv2d operation - python

I have been given a trained neural network in torch and I need to rebuild it exactly in tensorflow. I believe I have correctly defined the network's architecture in tensorflow but I am having trouble transferring the weight and bias tensors. Using a third party package, I converted all the weight and bias tensors from the torch network to numpy arrays then wrote them to disk. I can load them back into my python program but I cannot figure out a way to assign them to the corresponding layers in my tensorflow network.
For instance, I have a convolution layer defined in tensorflow as
kernel_1 = tf.Variable(tf.truncated_normal([11,11,3,64], stddev=0.1))
conv_kernel_1 = tf.nn.conv2d(input, kernel_1, [1,4,4,1], padding='SAME')
biases_1 = tf.Variable(tf.zeros([64]))
bias_layer_1 = tf.nn.bias_add(conv_kernel_1, biases_1)
According to the tensorflow documentation, the tf.nn.conv2d operation uses the shape defined in the kernel_1 variable to construct the weight tensor. However, I cannot figure out how to access that weight tensor to set it to the weight array I have loaded from file.
Is it possible to explicitly set the weight tensor? And if so, how?
(The same question applies to bias tensor.)

If you have the weights and biases in a NumPy array, it should be easy to connect them into your TensorFlow network:
weights_1_array = ... # ndarray of weights for layer 1
biases_1_array = ... # ndarray of biases for layer 1
conv_kernel_1 = tf.nn.conv2d(input, weights_1_array, [1, 4, 4, 1], padding='SAME')
bias_layer_1 = tf.nn.bias_add(conv_kernel_1, biases_1_array)
Note that you must ensure that weights_1_array and biases_1_array are in the correct data format. See the documentation for tf.nn.conv2d() for an explanation of the required filter shape.
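One thing to watch when coming from Torch: Torch stores convolution kernels as (out_channels, in_channels, height, width), while tf.nn.conv2d() expects (filter_height, filter_width, in_channels, out_channels). A sketch of the conversion (torch_weights_array is a hypothetical name for the array you wrote to disk), plus a variant that keeps the transferred weights trainable:
import numpy as np
import tensorflow as tf

# Assumed layout of the exported Torch kernel: (out, in, height, width).
weights_1_array = np.transpose(torch_weights_array, (2, 3, 1, 0))

# If you want to fine-tune the transferred weights, wrap them in a
# Variable instead of feeding the ndarray (which becomes a constant):
kernel_1 = tf.Variable(weights_1_array, dtype=tf.float32)
conv_kernel_1 = tf.nn.conv2d(input, kernel_1, [1, 4, 4, 1], padding='SAME')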

Related

Tensorflow - building LSTM model - need for tf.keras.layers.Dense()

Python 3.7, TensorFlow.
I am experimenting with time series forecasting in TensorFlow.
I understand the second line creates an LSTM RNN, i.e. a recurrent neural network of type Long Short-Term Memory.
Why do we need to add a Dense(1) layer in the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
The tutorial for Dense() says
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Could you rephrase or elaborate on the need for Dense() here?
The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer which consumes the input sequence (each step of size #features) and produces a latent representation of size 32 (its final hidden state). You want to predict a single value, so you need to convert this latent representation of size 32 into a single value. Hence, you add the following line
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense Layer (Fully-Connected Neural Network) with one neuron in the output which, obviously, produces a single value. Look at it as a way to transform an intermediate result of higher dimensionality into the final result.
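To make the shapes concrete, here is a minimal sketch with made-up dimensions (120 time steps, 7 features per step):
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(120, 7)))  # (batch, 120, 7) -> (batch, 32)
model.add(tf.keras.layers.Dense(1))                        # (batch, 32) -> (batch, 1)
print(model.output_shape)  # (None, 1)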
Well, in the tutorial you are following (Time series forecasting), they are trying to forecast temperature (6 hours ahead), for which they use an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
A Dense layer is nothing but a regular fully-connected NN layer. In this case you are bringing the output dimensionality down to 1, which should bear some (not necessarily linear) relationship to the temperature you are trying to predict. There are other layers you could use as well. Check out Keras Layers.
If you are confused about the input and output shape of LSTM, check out I/O Shape.

How to extract Keras layer weights as trainable parameter?

I'm training a GAN-like model, but not exactly the same. I'm using Keras with the TensorFlow backend.
I have two Keras models G and D. I want to output the weight parameters of a target layer in G as the input of model D, and use the result of D.predict(G.weights) as part of the loss function for G, i.e. D is not trainable, but the argument G.weights is trainable. In this way I want to further train G.weights.
I tried to use
def custom_loss(ytrue, ypred):
    ### Something to do with ytrue and ypred
    weight = self.G.get_layer('target').get_weights()
    loss += self.D.predict(weight)
    return loss
but apparently it does not work since weight is just a numpy array and is not trainable.
Is there a way to get the weights of a model that are still trainable in Keras? I'm new to Keras and know very little about TensorFlow. I would really appreciate it if someone could help!
As you mention, layer.get_weights() returns the current values of the weights. What you want to feed for prediction are the nodes in the computation graph representing those weights. You can use layer.trainable_weights instead, which will return two tf.Variables that you can feed to another layer/model.
Note that there is one variable for the unit-to-unit connections and another one for the bias. If you want to get a flattened tensor from them you could do something like:
from keras import backend as K
from keras.layers import Flatten
...
ww, bias = self.G.get_layer('target').trainable_weights
# K.concatenate (there is no K.concat); the (5, 1) reshape assumes ww has 5 rows.
flattened_weights = Flatten()(K.concatenate([ww, K.reshape(bias, (5, 1))], axis=1))
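To keep everything differentiable, D should then be called on this symbolic tensor rather than through D.predict(), which runs the graph and returns a plain NumPy array. A sketch, assuming D accepts an input of the flattened shape:
# Calling D symbolically lets gradients flow back into G's target layer;
# D.predict() would detach the result from the graph.
d_score = self.D(flattened_weights)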

Keras Convolutional layer with kernels initialized identically, according to a predefined initializer?

Is it possible in Keras to create a convolutional layer (Conv2D) whose kernels are initialized according to an initializer (like he_normal, glorot_uniform, etc.), but are all identical to each other?
In other words, I'd like to initialize one of the kernels using kernel_initializer='he_normal', and then copy and use that initialized weight matrix to initialize all of the other kernels with, in that layer only.
In semi-pseudo-code fashion, this is (similar in concept) to what I'm looking for:
n_filters = 64
x = Conv2D(1, 3, ..., kernel_initializer='he_normal')
he_normal_kernel = ...                            # copy the kernel weight matrix that was just created
he_normal_kernels = he_normal_kernel * n_filters  # make n_filters copies of that matrix
x = Conv2D(n_filters, 3, ..., kernel_initializer=he_normal_kernels)  # use those copies to initialize this layer
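For concreteness, one way I imagine this could work (a sketch I have not verified; the inputs tensor is hypothetical) is a custom callable passed as kernel_initializer. Note that he_normal's fan-in only depends on the receptive field and input channels, so drawing a single filter keeps the intended statistics:
import tensorflow as tf

def tiled_he_normal(shape, dtype=None):
    # Draw ONE he_normal filter of shape (kh, kw, in_channels, 1),
    # then tile it so every output filter starts out identical.
    kh, kw, in_ch, n_filters = shape
    single = tf.keras.initializers.he_normal()((kh, kw, in_ch, 1), dtype=dtype)
    return tf.tile(single, [1, 1, 1, n_filters])

x = tf.keras.layers.Conv2D(64, 3, kernel_initializer=tiled_he_normal)(inputs)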
I have no strong preference about exactly how to implement this, as long as it works as intended.
While we're here, are there any inherent theoretical reasons why this might be a bad idea?
Thanks!

Implementing weight normalization using TensorFlow layers' `kernel_constraint`

Some of the TensorFlow layers, such as tf.layers.dense and tf.layers.conv2d, take a kernel_constraint argument, which according to the TF API docs is an
Optional projection function to be applied to the kernel after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights).
In [1], Salimans et al. present a neural network normalization technique called weight normalization, which normalizes the weight vectors of the network layers, in contrast to, for example, batch normalization [2], which normalizes the actual data batch flowing through the layer. In some cases the computational overhead of the weight normalization method is lower, and it can also be used in cases where the use of batch normalization is not feasible.
My question is: is it possible to implement the weight normalization using the abovementioned TensorFlow layers' kernel_constraint? Assuming x is an input with shape (batch, height, width, channels), I thought I could implement it as follows:
x = tf.layers.conv2d(
    inputs=x,
    filters=16,
    kernel_size=(3, 3),
    strides=(1, 1),
    kernel_constraint=lambda kernel: (
        tf.nn.l2_normalize(kernel, list(range(kernel.shape.ndims - 1)))))
What would be a simple test case to validate/invalidate my solution?
[1] SALIMANS, Tim; KINGMA, Diederik P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In: Advances in Neural Information Processing Systems. 2016. p. 901-909.
[2] IOFFE, Sergey; SZEGEDY, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
Despite the title, the paper by Salimans and Kingma suggests decoupling the norm of the weights from their direction, rather than actually normalizing the weights (i.e. setting their l2 norm to one, as you suggested).
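Concretely, the paper reparameterizes each weight vector as w = g * v / ||v||, where the magnitude g is a separately trained parameter. A rough sketch of that reparameterization (distinct from your kernel_constraint) for a conv kernel:
import tensorflow as tf

# Weight normalization as reparameterization: direction v and a
# per-filter magnitude g are both trained by gradient descent.
v = tf.Variable(tf.truncated_normal([3, 3, 3, 16], stddev=0.1))  # direction
g = tf.Variable(tf.ones([16]))                                   # magnitude per filter
w = g * tf.nn.l2_normalize(v, [0, 1, 2])  # each filter: unit direction scaled by g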
If you want to verify that your code has the intended effect even if it is not what they proposed, you can get the weights of the model and check their norm.
In pseudo-code:
import numpy as np

model = tf.keras.models.Model(inputs=inputs, outputs=x)
weights = model.get_weights()[i]  # checking the weights of the i-th layer
flat_weights = weights.flatten()
print(np.linalg.norm(flat_weights, 2))
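Since your constraint normalizes over every axis except the last, a sharper check is that each output filter individually ends up with (approximately) unit norm after an optimizer step, because the constraint is applied after the update. Building on the snippet above:
kernel = model.get_weights()[i]  # conv kernel, shape (kh, kw, in_ch, filters)
per_filter = kernel.reshape(-1, kernel.shape[-1])
print(np.linalg.norm(per_filter, axis=0))  # each entry should be close to 1.0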

Batch normalization with 3D convolutions in TensorFlow

I'm implementing a model relying on 3D convolutions (for a task that is similar to action recognition) and I want to use batch normalization (see [Ioffe & Szegedy 2015]). I could not find any tutorial focusing on 3D convs, hence I'm making a short one here which I'd like to review with you.
The code below refers to TensorFlow r0.12 and it explicitly instantiates variables - I mean, I'm not using tf.contrib.learn except for the tf.contrib.layers.batch_norm() function. I'm doing this both to better understand how things work under the hood and to have more implementation freedom (e.g., variable summaries).
I will get to the 3D convolution case smoothly by first writing the example for a fully-connected layer, then for a 2D convolution and finally for the 3D case. While going through the code, it would be great if you could check if everything is done correctly - the code runs, but I'm not 100% sure about the way I apply batch normalization. I end this post with a more detailed question.
import tensorflow as tf
# This flag is used to allow/prevent batch normalization params updates
# depending on whether the model is being trained or used for prediction.
training = tf.placeholder_with_default(True, shape=())
Fully-connected (FC) case
# Input.
INPUT_SIZE = 512
u = tf.placeholder(tf.float32, shape=(None, INPUT_SIZE))
# FC params: weights only, no bias as per [Ioffe & Szegedy 2015].
FC_OUTPUT_LAYER_SIZE = 1024
w = tf.Variable(tf.truncated_normal(
    [INPUT_SIZE, FC_OUTPUT_LAYER_SIZE], dtype=tf.float32, stddev=1e-1))
# Layer output with no activation function (yet).
fc = tf.matmul(u, w)
# Batch normalization.
fc_bn = tf.contrib.layers.batch_norm(
    fc,
    center=True,
    scale=True,
    is_training=training,
    scope='fc-batch_norm')
# Activation function.
fc_bn_relu = tf.nn.relu(fc_bn)
print(fc_bn_relu) # Tensor("Relu:0", shape=(?, 1024), dtype=float32)
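One detail that applies to all three cases in this post: by default, tf.contrib.layers.batch_norm only collects the moving-average update ops into tf.GraphKeys.UPDATE_OPS rather than running them automatically, so (per its docstring) the train op should depend on them. A sketch, with the optimizer and loss assumed to be defined elsewhere:
# Make the train op run the batch-norm moving-average updates first.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)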
2D convolutional (CNN) layer case
# Input: 640x480 RGB images (whitened input, hence tf.float32).
INPUT_HEIGHT = 480
INPUT_WIDTH = 640
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))
# CNN params: weights only, no bias as per [Ioffe & Szegedy 2015].
CNN_FILTER_HEIGHT = 3 # Space dimension.
CNN_FILTER_WIDTH = 3 # Space dimension.
CNN_FILTERS = 128
w = tf.Variable(tf.truncated_normal(
    [CNN_FILTER_HEIGHT, CNN_FILTER_WIDTH, INPUT_CHANNELS, CNN_FILTERS],
    dtype=tf.float32, stddev=1e-1))
# Layer output with no activation function (yet).
CNN_LAYER_STRIDE_VERTICAL = 1
CNN_LAYER_STRIDE_HORIZONTAL = 1
CNN_LAYER_PADDING = 'SAME'
cnn = tf.nn.conv2d(
    input=u, filter=w,
    strides=[1, CNN_LAYER_STRIDE_VERTICAL, CNN_LAYER_STRIDE_HORIZONTAL, 1],
    padding=CNN_LAYER_PADDING)
# Batch normalization.
cnn_bn = tf.contrib.layers.batch_norm(
    cnn,
    data_format='NHWC',  # Matching the "cnn" tensor which has shape (?, 480, 640, 128).
    center=True,
    scale=True,
    is_training=training,
    scope='cnn-batch_norm')
# Activation function.
cnn_bn_relu = tf.nn.relu(cnn_bn)
print(cnn_bn_relu) # Tensor("Relu_1:0", shape=(?, 480, 640, 128), dtype=float32)
3D convolutional (CNN3D) layer case
# Input: sequence of 9 160x120 RGB images (whitened input, hence tf.float32).
INPUT_SEQ_LENGTH = 9
INPUT_HEIGHT = 120
INPUT_WIDTH = 160
INPUT_CHANNELS = 3
u = tf.placeholder(tf.float32, shape=(None, INPUT_SEQ_LENGTH, INPUT_HEIGHT, INPUT_WIDTH, INPUT_CHANNELS))
# CNN params: weights only, no bias as per [Ioffe & Szegedy 2015].
CNN3D_FILTER_LENGTH = 3 # Time dimension.
CNN3D_FILTER_HEIGHT = 3 # Space dimension.
CNN3D_FILTER_WIDTH = 3 # Space dimension.
CNN3D_FILTERS = 96
w = tf.Variable(tf.truncated_normal(
    [CNN3D_FILTER_LENGTH, CNN3D_FILTER_HEIGHT, CNN3D_FILTER_WIDTH, INPUT_CHANNELS, CNN3D_FILTERS],
    dtype=tf.float32, stddev=1e-1))
# Layer output with no activation function (yet).
CNN3D_LAYER_STRIDE_TEMPORAL = 1
CNN3D_LAYER_STRIDE_VERTICAL = 1
CNN3D_LAYER_STRIDE_HORIZONTAL = 1
CNN3D_LAYER_PADDING = 'SAME'
cnn3d = tf.nn.conv3d(
    input=u, filter=w,
    strides=[1, CNN3D_LAYER_STRIDE_TEMPORAL, CNN3D_LAYER_STRIDE_VERTICAL, CNN3D_LAYER_STRIDE_HORIZONTAL, 1],
    padding=CNN3D_LAYER_PADDING)
# Batch normalization.
cnn3d_bn = tf.contrib.layers.batch_norm(
    cnn3d,
    data_format='NHWC',  # Matching the "cnn3d" tensor which has shape (?, 9, 120, 160, 96).
    center=True,
    scale=True,
    is_training=training,
    scope='cnn3d-batch_norm')
# Activation function.
cnn3d_bn_relu = tf.nn.relu(cnn3d_bn)
print(cnn3d_bn_relu) # Tensor("Relu_2:0", shape=(?, 9, 120, 160, 96), dtype=float32)
What I would like to make sure is whether the code above exactly implements batch normalization as described in [Ioffe & Szegedy 2015] at the end of Sec. 3.2:
For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a minibatch, over all locations. [...] Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.
UPDATE
I guess the code above is also correct for the 3D conv case. In fact, when I define my model, if I print all the trainable variables I also see the expected number of beta and gamma variables. For instance:
Tensor("conv3a/conv3d_weights/read:0", shape=(3, 3, 3, 128, 256), dtype=float32)
Tensor("BatchNorm_2/beta/read:0", shape=(256,), dtype=float32)
Tensor("BatchNorm_2/gamma/read:0", shape=(256,), dtype=float32)
This looks OK to me since, due to BN, one pair of beta and gamma is learned for each feature map (256 pairs in total).
[Ioffe & Szegedy 2015]: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
This is a great post about 3D batchnorm; it often goes unnoticed that batchnorm can be applied to any tensor of rank greater than 1. Your code is correct, but I couldn't help but add a few important notes on this:
A "standard" 2D batchnorm (which accepts a 4D tensor) can be significantly faster in TensorFlow than 3D or higher, because it supports the fused_batch_norm implementation, which applies one kernel operation:
Fused batch norm combines the multiple operations needed to do batch normalization into a single kernel. Batch norm is an expensive process that for some models makes up a large percentage of the operation time. Using fused batch norm can result in a 12%-30% speedup.
There is an issue on GitHub to support 3D filters as well, but there hasn't been any recent activity and at this point the issue is closed unresolved.
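If the speed difference matters, one possible workaround (a sketch I have not benchmarked, and note the fused argument may not exist yet in r0.12) is to fold the temporal dimension into a spatial one before normalizing, which is valid because the batch-norm statistics only depend on the channel axis, and then restore the shape:
# Fold depth into height so the fused 2D batch-norm path can be used,
# then reshape back; per-channel statistics are unaffected by the folding.
s = tf.shape(cnn3d)  # (N, 9, 120, 160, 96)
x4d = tf.reshape(cnn3d, [s[0], s[1] * s[2], s[3], CNN3D_FILTERS])
x4d_bn = tf.contrib.layers.batch_norm(
    x4d, fused=True, center=True, scale=True, is_training=training)
cnn3d_bn = tf.reshape(x4d_bn, s)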
Although the original paper prescribes using batchnorm before ReLU activation (and that's what you did in the code above), there is evidence that it's probably better to use batchnorm after the activation. Here's a comment on Keras GitHub by Francois Chollet:
... I can guarantee that recent code written by Christian [Szegedy] applies relu before BN. It is still occasionally a topic of debate, though.
For anyone interested in applying the idea of normalization in practice, there have been recent research developments of this idea, namely weight normalization and layer normalization, which fix certain disadvantages of the original batchnorm; for example, they work better for LSTMs and recurrent networks.
