Layer Normalization with Python Keras

I'm studying the paper "An Introduction to Deep Learning for the Physical Layer". While implementing the proposed network with Python Keras, I need to normalize the output of one of the layers.
One way is simple L2 normalization (||X||^2 = 1), where X is the output tensor of the previous layer. I can implement simple L2 normalization with the following code:
from keras import backend as K
from keras.layers import Lambda

Lambda(lambda x: K.l2_normalize(x, axis=1))
The other way, which is what I want to know about, is enforcing ||X||^2 ≤ 1.
Is there any way that constrains the value of layer outputs?

You can apply a constraint on the layer weights (kernels) of some Keras layers. For example, on a Dense() layer:
from keras.constraints import max_norm
from keras.layers import Dense
model.add(Dense(units, kernel_constraint=max_norm(1.)))
However, Keras layers do not accept an activity_constraint argument. They do accept activity_regularizer, and you could use that to implement the first kind of normalization more easily.
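For illustration only (this exact regularizer is not from the thread; the 0.01 penalty weight and the layer sizes are arbitrary placeholders), an activity regularizer that softly pushes each sample's output toward unit L2 norm could look roughly like this:
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense

def unit_norm_penalty(activity):
    # Penalize deviation of each sample's L2 norm from 1 (a soft constraint)
    norms = K.sqrt(K.sum(K.square(activity), axis=1))
    return 0.01 * K.sum(K.square(norms - 1.0))

model = Sequential()
model.add(Dense(64, input_shape=(16,), activity_regularizer=unit_norm_penalty))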
You can also clip the output values of any layer so that they have a maximum norm of 1.0 (although I'm not sure whether this is what you're looking for). For example, if you're using the TensorFlow backend, you can define a custom activation function that clips the output of the layer by norm:
import tensorflow as tf

def norm_clip(x):
    return tf.clip_by_norm(x, 1, axes=[1])
And use it in your model like:
model.add(Dense(units, activation=norm_clip))
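For completeness, here is a minimal sketch (assuming the TensorFlow backend; the layer sizes are placeholders) that applies the same norm clipping as a standalone Lambda layer, which enforces ||X|| ≤ 1, and hence ||X||^2 ≤ 1, on any layer's output:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Lambda

model = Sequential()
model.add(Dense(64, input_shape=(16,)))  # placeholder sizes
model.add(Lambda(lambda x: tf.clip_by_norm(x, 1.0, axes=[1])))  # per-sample norm <= 1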

Related

Adaptation module design for stacking two CNNs

I'm trying to stack two different CNNs using an adaptation module to bridge them, but I'm having a hard time determining the adaptation module's layer hyperparameters correctly.
To be more precise, I would like to train the adaptation module to bridge two convolutional layers:
Layer A with output shape: (29,29,256)
Layer B with input shape: (8,8,384)
So, after Layer A, I sequentially add the adaptation module, for which I choose:
A Conv2D layer with 384 filters and kernel size (3,3) / Output shape: (29,29,384)
A MaxPool2D layer with pool size (2,2), strides (4,4), and padding "same" / Output shape: (8,8,384)
Finally, I try to add layer B to the model, but I get the following error from tensorflow:
InvalidArgumentError: Dimensions must be equal, but are 384 and 288 for '{{node batch_normalization_159/FusedBatchNormV3}} = FusedBatchNormV3[T=DT_FLOAT, U=DT_FLOAT, data_format="NHWC", epsilon=0.001, exponential_avg_factor=1, is_training=false](Placeholder, batch_normalization_159/scale, batch_normalization_159/ReadVariableOp, batch_normalization_159/FusedBatchNormV3/ReadVariableOp, batch_normalization_159/FusedBatchNormV3/ReadVariableOp_1)' with input shapes: [?,8,8,384], [288], [288], [288], [288].
Here's a minimal reproducible example:
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.mobilenet import MobileNet
from keras.layers import Conv2D, MaxPool2D
from keras.models import Sequential

mobile_model = MobileNet(weights='imagenet')
server_model = InceptionResNetV2(weights='imagenet')

hybrid = Sequential()

# First part of MobileNet, frozen
for i, layer in enumerate(mobile_model.layers):
    if i <= 36:
        layer.trainable = False
        hybrid.add(layer)

# Adaptation module
hybrid.add(Conv2D(384, kernel_size=(3,3), padding='same'))
hybrid.add(MaxPool2D(pool_size=(2,2), strides=(4,4), padding='same'))

# Last part of InceptionResNetV2, frozen
for i, layer in enumerate(server_model.layers):
    if i >= 610:
        layer.trainable = False
        hybrid.add(layer)
Sequential models only support models where the layers are arranged like a linked list: each layer takes the output of only one layer, and each layer's output is fed to only a single layer. Your two base models contain residual blocks, which breaks that assumption and turns the model architecture into a directed acyclic graph (DAG).
To do what you want to do, you'll need to use the Functional API. With the Functional API, you explicitly control the intermediate activations aka KerasTensors.
For the first model, you can skip that extra work and just create a new model from a subset of the existing graph, like this:
sub_mobile = keras.models.Model(mobile_model.inputs, mobile_model.layers[36].output)
Wiring up some of the layers of the second model is much more difficult. It's easy to slice off the end of a Keras model; it's much harder to slice off the beginning, because of the need for a tf.keras.Input placeholder. To do this successfully, you'd need to write a model-walking algorithm that goes through the layers, tracks the output KerasTensors, and then calls each layer with the new inputs to create new output KerasTensors.
You could avoid all that work by simply finding some source code for an InceptionResNet and adding layers via Python rather than introspecting an existing model. Here's one which may fit the bill.
https://github.com/yuyang-huang/keras-inception-resnet-v2/blob/master/inception_resnet_v2.py
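If it helps, here is a rough sketch (not a drop-in solution: the cut index 36 is taken from your code, and the 224x224x3 input shape is just the MobileNet default) of how the truncated first model and the adaptation module could be wired together with the Functional API:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

mobile_model = keras.applications.MobileNet(weights='imagenet')
# Reuse everything up to layer 36 as a single frozen sub-model
sub_mobile = keras.Model(mobile_model.inputs, mobile_model.layers[36].output)
sub_mobile.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
x = sub_mobile(inputs)
x = layers.Conv2D(384, kernel_size=(3, 3), padding='same')(x)
x = layers.MaxPool2D(pool_size=(2, 2), strides=(4, 4), padding='same')(x)
# ... call a rebuilt InceptionResNet tail on x here, then:
hybrid = keras.Model(inputs, x)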

How to clip layer output in MLP with `tf.keras.activations.relu()`?

According to the documentation, tf.keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0) seems to clip x within [threshold, max_value], but x must be specified. How can I use it to clip the output of a layer in a neural network? Or is there a more convenient way to do so?
Suppose I want to output the linear combination of all elements of a 10-by-10 2D-array only when the result is between 0 and 5.
import tensorflow as tf
from tensorflow import keras
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[10, 10]))
model.add(keras.layers.Dense(1, activation='relu'))  # output layer
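One possible way (not from the original thread) is to wrap relu with the desired bounds and pass the wrapper as the activation, for example:
import tensorflow as tf
from tensorflow import keras

def clipped_relu(x):
    # relu with max_value clips the output to the range [0, 5]
    return tf.keras.activations.relu(x, max_value=5.0)

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[10, 10]))
model.add(keras.layers.Dense(1, activation=clipped_relu))  # output layer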

Tensorflow - building LSTM model - need for tf.keras.layers.Dense()

Python 3.7, TensorFlow.
I am experimenting with time series forecasting in TensorFlow.
I understand that the second line below creates an LSTM RNN, i.e. a recurrent neural network of type Long Short-Term Memory.
Why do we need to add a Dense(1) layer at the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
The tutorial for Dense() says:
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Could you rephrase or elaborate on the need for Dense() here?
The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer which transforms each input step of size #features into a latent representation of size 32. You want to predict a single value, so you need to convert this latent representation of size 32 into a single value. Hence, you add the following line:
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense layer (a fully connected neural network layer) with a single output neuron, which, obviously, produces a single value. Look at it as a way to transform an intermediate result of higher dimensionality into the final result.
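For intuition, a tiny sketch (the batch size, number of timesteps, and feature count below are made up for illustration) showing how the shapes change:
import numpy as np
import tensorflow as tf

x = np.random.rand(8, 10, 3).astype("float32")  # (batch, timesteps, features)

lstm = tf.keras.layers.LSTM(32)
dense = tf.keras.layers.Dense(1)

h = lstm(x)   # (8, 32): one 32-dimensional summary per input sequence
y = dense(h)  # (8, 1): a single predicted value per sequence
print(h.shape, y.shape)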
In the tutorial you are following, Time series forecasting, they are trying to forecast temperature (6 hours ahead), for which they use an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
A Dense layer is nothing but a regular fully connected NN layer. In this case you are bringing the output dimensionality down to 1, which should bear some (not necessarily linear) relationship to the temperature you are trying to predict. There are other layers you could use as well; check out Keras Layers.
If you are confused about the input and output shapes of the LSTM, check out I/O Shape.

Custom activation function Keras: Applying different activation to different layers

The input to my custom activation function is going to be a 19 × 19 × 5 tensor, say x. The function needs to apply a sigmoid to the first channel, i.e. x[:,:,0:1], and ReLU to the remaining channels, i.e. x[:,:,1:5]. I have defined a custom activation function with the following code:
import tensorflow as tf

def custom_activation(x):
    return tf.concat([tf.sigmoid(x[:, :, :, 0:1]), tf.nn.relu(x[:, :, :, 1:5])], axis=3)

get_custom_objects().update({'custom_activation': Activation(custom_activation)})
The fourth dimension comes into the picture because the input I receive in custom_activation has the batch size as an additional dimension, so the input tensor has shape [batch_size, 19, 19, 5].
Could someone tell me if this is the correct way to do it?
Keras activations are designed to work on arbitrarily sized layers of almost any imaginable feed-forward network (e.g. tanh, ReLU, softmax, etc.). The transformation you describe sounds specific to a particular layer in the architecture you are using. As a result, I would recommend accomplishing the task with a Lambda layer:
from keras.layers import Lambda

def custom_activation_shape(input_shape):
    # Ensure the input is a rank-4 tensor
    assert len(input_shape) == 4
    # Ensure the last dimension has 5 channels
    assert input_shape[3] == 5
    return input_shape  # shape is unchanged
This can then be added to your model using:
Lambda(custom_activation, output_shape=custom_activation_shape)
However, if you intend to use this transformation after many different layers in your network, and thus would truly like a custom defined Activation, see How do you create a custom activation function with Keras?, which suggests doing what you wrote in your question.
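For reference, a hypothetical usage sketch (the preceding Conv2D and its input shape are placeholders; the only requirement is that the layer before the Lambda produces the assumed (19, 19, 5) feature map), using the custom_activation and custom_activation_shape defined above:
from keras.models import Sequential
from keras.layers import Conv2D, Lambda

model = Sequential()
model.add(Conv2D(5, (3, 3), padding='same', input_shape=(19, 19, 8)))  # placeholder input shape
model.add(Lambda(custom_activation, output_shape=custom_activation_shape))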

cannot modify keras layer dropout parameter prior to backend function evaluation

I have trained a network model using keras which includes multiple dropout layers. I have also implemented a stochastic predictor function (using the keras backend) which allows me to get predictions with dropout turned "on."
import keras.backend as K

F = K.function([model.layers[0].input, K.learning_phase()], [model.layers[-1].output])
I call the function using
output = F([x_test[0:1], 1])
where x_test is a sample input.
Currently, the dropout rate used in this predictor function is the same as the dropout rate used for training. I would like to set a different dropout rate without retraining the network.
I wrote a script to change all dropout layer rates:
for layer in [l for l in model.layers if "dropout" in l.name.lower()]:
    layer.rate = 0.5
However, when I call my custom function, the output does not change. For example, if my trained network uses a rate of 0 (or K.epsilon()), repeated function calls yield the same result. Changing the dropout rate to 0.5 should yield different results on each call, yet this is not the case: changing the dropout rate has no effect.
What does work is extracting a single layer, changing the rate, and calling that single layer:
L = my_net.model.layers[0]
L.rate = 0.5
L_out1 = K.eval(L.call(x_test[0], training=True))
L.rate = K.epsilon()
L_out2 = K.eval(L.call(x_test[0], training=True))
Here, L_out1 and L_out2 are unique. I don't know how to implement this functionality across the whole network.
What is it about the backend function that prevents my model changes from being effective?
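One possible workaround (an assumption on my part, not something confirmed in this thread): the dropout rate is baked into the symbolic graph when the backend function is built, so changing layer.rate afterwards does not affect it. Rebuilding the graph after editing the rates, for example by cloning the model and copying over the trained weights, could look roughly like this:
import keras
import keras.backend as K

# Assumes `model` and `x_test` from the question above
for layer in model.layers:
    if "dropout" in layer.name.lower():
        layer.rate = 0.5

rebuilt = keras.models.clone_model(model)  # rebuilds the layers from their (updated) configs
rebuilt.set_weights(model.get_weights())   # keep the trained weights

F = K.function([rebuilt.layers[0].input, K.learning_phase()], [rebuilt.layers[-1].output])
output = F([x_test[0:1], 1])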
