I need to initialize custom Conv2D kernels with weights
W = a1b1 + a2b2 + ... + anbn
where W = custom weight of Conv2D layer to initialise with
a = random weight Tensors as keras.backend.variable(np.random.uniform()), shape=(64, 1, 10)
b = fixed basis filters defined as keras.backend.constant(...), shape=(10, 11, 11)
W = K.sum(a[:, :, :, None, None] * b[None, None, :, :, :], axis=2) #shape=(64, 1, 11, 11)
I want my model to update the 'W' values with only changing the 'a's while keeping the 'b's constant.
I pass the custom 'W's as
Conv2D(64, kernel_size=(11, 11), activation='relu', kernel_initializer=kernel_init_L1)(img)
where kernel_init_L1 returns keras.backend.variable(K.reshape(w_L1, (11, 11, 1, 64)))
Problem:
I am not sure if this is the correct way to do this. Is it possible to specify in Keras which ones are trainable and which are not. I know that layers can be set trainable = True but i am not sure about weights.
I think the implementation is incorrect because I get similar results from my model with or without the custom initializations.
It would be immensely helpful if someone can point out any mistakes in my approach or provide a way to verify it.
Warning about your shapes: If your kernel size is (11,11), and assuming you have 64 input channels and 1 output channel, your final kernel shape must be (11,11,64,1).
You should probably be going for a[None,None] and b[:,:,:,None,None].
class CustomConv2D(Conv2D):
def __init__(self, filters, kernel_size, kernelB = None, **kwargs):
super(CustomConv2D, self).__init__(filters, kernel_size,**kwargs)
self.kernelB = kernelB
def build(self, input_shape):
#use the input_shape to calculate the shapes of A and B
#if needed, pay attention to the "data_format" used.
#this is an actual weight, because it uses `self.add_weight`
self.kernelA = self.add_weight(
shape=shape_of_kernel_A + (1,1), #or (1,1) + shape_of_A
initializer='glorot_uniform', #or select another
name='kernelA',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
#this is an ordinary var that will participate in the calculation
#not a weight, not updated
if self.kernelB is None:
self.kernelB = K.constant(....)
#use the shape already containing the new axes
#in the original conv layer, this property would be the actual kernel,
#now it's just a var that will be used in the original's "call" method
self.kernel = K.sum(self.kernelA * self.kernelB, axis=2)
#important: the resulting shape should be:
#(kernelSizeX, kernelSizeY, input_channels, output_channels)
#the following are remains of the original code for "build" in Conv2D
#use_bias is True by default
if self.use_bias:
self.bias = self.add_weight(shape=(self.filters,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
# Set input spec.
self.input_spec = InputSpec(ndim=self.rank + 2,
axes={channel_axis: input_dim})
self.built = True
Hints for custom layers
When you create a custom layer from zero (derived from Layer), you should have these methods:
__init__(self, ... parameters ...) - this is the creator, it's called when you create a new instance of your layer. Here, you store the values the user passed as parameters. (In a Conv2D, the init would have the "filters", "kernel_size", etc.)
build(self, input_shape) - this is where you should create the weights (all learnable vars are created here, based on the input shape)
compute_output_shape(self,input_shape) - here you return the output shape based on the input shape
call(self,inputs) - Here you perform the actual layer calculations
Since we're not creating this layer from zero, but deriving it from Conv2D, everything is ready, all we did was to "change" the build method and replace what would be considered the kernel of the Conv2D layer.
More on custom layers: https://keras.io/layers/writing-your-own-keras-layers/
The call method for conv layers is here in class _Conv(Layer):.
Related
I've built an input convex neural network in Tensorflow following this ArXiv paper that is a scalar output feed-forward model. The first hidden layer is dense and subsequent layers are custom that takes two inputs: the output from the previous layer (kernel) and the model input (passthrough). Separate weights are applied to each. This allows a positive weights regularizer to be applied to kernel weights but not the passthrough. I calculate the regularizer and add it using self.add_loss in the call method of the custom layer. I'm also using custom activation functions that are squared leaky ReLU and leaky ReLU.
When I am training this network I am able to calculate a gradient for the bias in the first dense layer but I get a warning that no gradient exists for the bias in the custom layer. When I add #tf.function to my activation functions the warning goes away but the gradient is 0. Furthermore, loss.numpy() throws an error when I use #tf.function and run in a local Jupyter notebook (but not in Colab).
Any ideas why the bias gradient exists for the dense but not the custom layer and how to calculate the bias gradient for all layers? A minimal working example is provided in this Colab notebook. Much appreciated!
Below is my custom layer. It's very similar to the standard dense layer.
class DensePartiallyConstrained(Layer):
'''
A custom layer inheriting from `tf.keras.layers.Layers` class.
This class is a fully-connected layer with two inputs. This allows
for different constraints on the weights of each input. This enables
a passthrough of the inputs to each hidden layer to have no
weight constraints while the input from the previous layer can have
a positive constraint. It also allows for different initializations
of the weight values for each input.
Most of this code and documentation was borrowed from the
`tf.keras.layers.Dense` documentation on Github (thanks!).
'''
def __init__(self,
units,
activation = None,
use_bias = True,
kernel_initializer = 'glorot_uniform',
passthrough_initializer = 'glorot_uniform',
bias_initializer = 'zeros',
kernel_constraint = None,
passthrough_constraint = None,
bias_constraint = None,
activity_regularizer = None,
regularizer_constant = 1.0,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(DensePartiallyConstrained, self).__init__(
activity_regularizer = regularizers.get(activity_regularizer), **kwargs)
self.units = int(units)
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.passthrough_initializer = initializers.get(passthrough_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.passthrough_constraint = constraints.get(passthrough_constraint)
self.bias_constraint = constraints.get(bias_constraint)
# This is for add_loss in call() method
self.regularizer_constant = regularizer_constant
# What does this do?
self.supports_masking = True
self.kernel_input_spec = InputSpec(min_ndim=2)
self.passthrough_input_spec = InputSpec(min_ndim=2)
def build(self, input_shape):
# Input shapes provided as list [kernel, passthrough]
kernel_input_shape, passthrough_input_shape = input_shape
# Check for proper datatype
dtype = dtypes.as_dtype(self.dtype or K.floatx())
if not (dtype.is_floating or dtype.is_complex):
raise TypeError('Unable to build `DensePartiallyConstrained` layer with non-floating point '
'dtype %s' % (dtype,))
# Check kernel input dimensions
kernel_input_shape = tensor_shape.TensorShape(kernel_input_shape)
if tensor_shape.dimension_value(kernel_input_shape[-1]) is None:
raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
'should be defined. Found `None`.')
kernel_last_dim = tensor_shape.dimension_value(kernel_input_shape[-1])
self.kernel_input_spec = InputSpec(min_ndim=2,
axes={-1: kernel_last_dim})
# Check passthrough input dimensions
passthrough_input_shape = tensor_shape.TensorShape(passthrough_input_shape)
if tensor_shape.dimension_value(passthrough_input_shape[-1]) is None:
raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
'should be defined. Found `None`.')
passthrough_last_dim = tensor_shape.dimension_value(passthrough_input_shape[-1])
self.passthrough_input_spec = InputSpec(min_ndim=2,
axes={-1: passthrough_last_dim})
# Add weights to kernel (between layer connections)
self.kernel = self.add_weight(name = 'kernel',
shape = [kernel_last_dim, self.units],
initializer = self.kernel_initializer,
constraint = self.kernel_constraint,
dtype = self.dtype,
trainable = True)
# Add weight to input passthrough
self.passthrough = self.add_weight(name = 'passthrough',
shape = [passthrough_last_dim, self.units],
initializer = self.passthrough_initializer,
constraint = self.passthrough_constraint,
dtype = self.dtype,
trainable = True)
# Add weights to bias
if self.use_bias:
self.bias = self.add_weight(name = 'bias',
shape = [self.units,],
initializer = self.bias_initializer,
constraint = self.bias_constraint,
dtype = self.dtype,
trainable = True)
else:
self.bias = None
self.built = True
super(DensePartiallyConstrained, self).build(input_shape)
def call(self, inputs):
# Inputs provided as list [kernel, passthrough]
kernel_input, passthrough_input = inputs
# Calculate weights regularizer
self.add_loss(self.regularizer_constant * tf.reduce_sum(tf.square(tf.math.maximum(tf.negative(self.kernel), 0.0))))
# Calculate layer output
outputs = tf.add(tf.matmul(kernel_input, self.kernel), tf.matmul(passthrough_input, self.passthrough))
if self.use_bias:
outputs = tf.add(outputs, self.bias)
if self.activation is not None:
return self.activation(outputs)
return outputs
And my activation functions:
##tf.function
def squared_leaky_ReLU(x, alpha = 0.2):
return tf.square(tf.maximum(x, alpha * x))
##tf.function
def leaky_ReLU(x, alpha = 0.2):
return tf.maximum(x, alpha * x)
Edit:
With a tensorflow update I can now access loss.numpy() when using #tf.function with my activation functions. This returns 0 gradients for the bias in all of my custom layers.
I'm beginning to think that the lack of gradient for the bias terms in the custom layer might have something to do with my loss function:
minimax loss
where
regularizer
is regularization for the weights in the custom layer kernel only. The loss for g(x) is based on the gradient with respect to the inputs, so it doesn't contain any information about the bias (the bias in f(x) update normally). Still though, if this is the case I don't understand why the bias in the first hidden dense layer of g(y) is updated? The networks are identical other than f(x) has a positive constraint on the kernel weights.
I have an autoencoder and I need to add a Gaussian noise layer after my output. I need a custom layer to do this, but I really do not know how to produce it, I need to produce it using tensors.
what should I do if I want to implement the above equation in the call part of the following code?
class SaltAndPepper(Layer):
def __init__(self, ratio, **kwargs):
super(SaltAndPepper, self).__init__(**kwargs)
self.supports_masking = True
self.ratio = ratio
# the definition of the call method of custom layer
def call(self, inputs, training=None):
def noised():
shp = K.shape(inputs)[1:]
**what should I put here????**
return out
return K.in_train_phase(noised(), inputs, training=training)
def get_config(self):
config = {'ratio': self.ratio}
base_config = super(SaltAndPepper, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
I also try to implement using lambda layer but it dose not work.
If you are looking for additive or multiplicative Gaussian noise, then they have been already implemented as a layer in Keras: GuassianNoise (additive) and GuassianDropout (multiplicative).
However, if you are specifically looking for the blurring effect as in Gaussian blur filters in image processing, then you can simply use a depth-wise convolution layer (to apply the filter on each input channel independently) with fixed weights to get the desired output (Note that you need to generate the weights of Gaussian kernel to set them as the weights of DepthwiseConv2D layer. For that you can use the function introduced in this answer):
import numpy as np
from keras.layers import DepthwiseConv2D
kernel_size = 3 # set the filter size of Gaussian filter
kernel_weights = ... # compute the weights of the filter with the given size (and additional params)
# assuming that the shape of `kernel_weighs` is `(kernel_size, kernel_size)`
# we need to modify it to make it compatible with the number of input channels
in_channels = 3 # the number of input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1)
kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1) # apply the same filter on all the input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1) # for shape compatibility reasons
# define your model...
# somewhere in your model you want to apply the Gaussian blur,
# so define a DepthwiseConv2D layer and set its weights to kernel weights
g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')
g_layer_out = g_layer(the_input_tensor_for_this_layer) # apply it on the input Tensor of this layer
# the rest of the model definition...
# do this BEFORE calling `compile` method of the model
g_layer.set_weights([kernel_weights])
g_layer.trainable = False # the weights should not change during training
# compile the model and start training...
After a while trying to figure out how to do this with the code #today has provided, I have decided to share my final code with anyone possibly needing it in future. I have created a very simple model that is only applying the blurring on the input data:
import numpy as np
from keras.layers import DepthwiseConv2D
from keras.layers import Input
from keras.models import Model
def gauss2D(shape=(3,3),sigma=0.5):
m,n = [(ss-1.)/2. for ss in shape]
y,x = np.ogrid[-m:m+1,-n:n+1]
h = np.exp( -(x*x + y*y) / (2.*sigma*sigma) )
h[ h < np.finfo(h.dtype).eps*h.max() ] = 0
sumh = h.sum()
if sumh != 0:
h /= sumh
return h
def gaussFilter():
kernel_size = 3
kernel_weights = gauss2D(shape=(kernel_size,kernel_size))
in_channels = 1 # the number of input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1)
kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1) # apply the same filter on all the input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1) # for shape compatibility reasons
inp = Input(shape=(3,3,1))
g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')(inp)
model_network = Model(input=inp, output=g_layer)
model_network.layers[1].set_weights([kernel_weights])
model_network.trainable= False #can be applied to a given layer only as well
return model_network
a = np.array([[[1, 2, 3], [4, 5, 6], [4, 5, 6]]])
filt = gaussFilter()
print(a.reshape((1,3,3,1)))
print(filt.predict(a.reshape(1,3,3,1)))
For testing purposes the data are only of shape 1,3,3,1, the function gaussFilter() creates a very simple model with only input and one convolution layer that provides Gaussian blurring with weights defined in the function gauss2D(). You can add parameters to the function to make it more dynamic, e.g. shape, kernel size, channels. The weights according to my findings can be applied only after the layer was added to the model.
As the Error: AttributeError: 'float' object has no attribute 'dtype', just change K.sqrt to math.sqrt, then it will work.
I defined a funciton in tensorflow as follows:
def generator(keep_prob, z, out_channel_dim, alphag1, is_train=True):
"""
Create the generator network
:param z: Input z
:param out_channel_dim: The number of channels in the output image
:param is_train: Boolean if generator is being used for training
:return: The tensor output of the generator
"""
# TODO: Implement Function
# when it is training reuse=False
# when it is not training reuse=True
alpha=alphag1
with tf.variable_scope('generator',reuse=not is_train):
layer = tf.layers.dense(z, 3*3*512,activation=None,\
kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))
layer = tf.reshape(layer, [-1, 3,3,512])
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.0001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 256, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.00001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 128, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.000001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, 64, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
layer = tf.layers.batch_normalization(layer, training=is_train)
layer = tf.maximum(layer*alpha, layer)
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.0000001, dtype=tf.float32)
#layer = tf.nn.dropout(layer,keep_prob)
layer = tf.layers.conv2d_transpose(layer, out_channel_dim, 4, strides=2, padding='same',\
kernel_initializer=tf.contrib.layers.xavier_initializer_conv2d(uniform=False))
#layer = layer+tf.random_normal(shape=tf.shape(layer), mean=0.0, stddev=0.00000001, dtype=tf.float32)
layer = tf.tanh(layer)
return layer
This is complicated such that to track each variable in each layer is difficult.
I later used tf.train.Saver() and saver.save to save everything after training.
Now I would like to restore this function so that I can use it to do further manipulations while keeping the trained weigts of each layer unchanged.
I found online that most function like tf.get_default_graph().get_tensor_by_name or some other functions were limited to restore only the values of the variables but not this function.
For example the input z of this function generator(keep_prob, z, out_channel_dim, alphag1, is_train=True) is a tensor from another function.
I want to restore this function so that I can use two new tensors z1 and z2 with she same shape as z.
layer1 = generator(keep_prob, z1, out_channel_dim, alphag1, is_train=False)
layer2 = generator(keep_prob, z2, out_channel_dim, alphag1, is_train=False)
layer = layer1 - layer2
and I can put this new tensor layer into another function.
Here layer1 and layer2 use the function with the saved weights.
The thing that is difficlut is that when I use the function generator I have to specifiy it with the trianed weights which was stored using Saver(). I find it difficult to specify this function with its weights. For, 1. too many layers to track off and 2. I don't know how to specify weights for tf.layers.conv2().
So are there anyone who know how to solve this issue?
This is a general question:
I save the whole model into file and need to restore part of the model into part of new model.
Here name_map is a dict:the key is new name in graph and value is name in ckpt file.
def get_restore_saver(self, name_map, restore_optimise_var=True):
var_grp = {_.op.name:_ for _ in tf.global_variables()}
varm = {}
for main_name in var_grp:
if main_name in name_map:
varm[name_map[main_name]] = var_grp[main_name]
elif restore_optimise_var: # I use adam to optimise
var_arr = main_name.split('/')
tail = var_arr[-1]
_ = '/'.join(var_arr[: -1])
if tail in ['Adam', 'Adam_1', 'ExponentialMovingAverage'] and _ in name_map:
varm[name_map[_] + "/" + tail] = var_grp[main_name]
return tf.train.Saver(varm)
Why do you need to restore the function and what does that even mean? If you need to use a model, you have to restore the corresponding graph. What your function does is defining nodes of the graph. You may use your function to build or rebuild that graph again and then load weights stored somewhere using Saver() or you may restore graph from the protobuf file.
In order to rebuild the graph, try to invoke invoke your function somewhere output_layer=generator(keep_prob, z, out_channel_dim, alphag1, is_train=True) and than use Saver class as usual to restore weights. Your function does not compute, it defines a part or whole of the graph. All computations are performed by the graph.
In the last case you will find useful the following thread. Usually, you will need to know names of the input and output layers. That can be obtained by the code:
[n.name for n in tf.get_default_graph().as_graph_def().node]
After a long time of searching, it seems that maybe the following is a solution.
Define all the variables in advance,i.e.layer1 = generator(keep_prob, z1,
out_channel_dim, alphag1, is_train=False)
layer2 = generator(keep_prob, z2, out_channel_dim, alphag1, is_train=False)
layer = layer1 - layer2.
Now you can use tf.get_collection to find the operators.
It seems that tensorflow will not give you the pre defined functions. It keeps the graph and values only but not in the form of function. One needs to set everything needed in the furture in the graph or one should keep track of every weights, even too many.
I'm trying to reshape a tensor from [A, B, C, D] into [A, B, C * D] and feed it into a dynamic_rnn. Assume that I don't know the B, C, and D in advance (they're a result of a convolutional network).
I think in Theano such reshaping would look like this:
x = x.flatten(ndim=3)
It seems that in TensorFlow there's no easy way to do this and so far here's what I came up with:
x_shape = tf.shape(x)
x = tf.reshape(x, [batch_size, x_shape[1], tf.reduce_prod(x_shape[2:])]
Even when the shape of x is known during graph building (i.e. print(x.get_shape()) prints out absolute values, like [10, 20, 30, 40] after the reshaping get_shape() becomes [10, None, None]. Again, still assume the initial shape isn't known so I can't operate with absolute values.
And when I'm passing x to a dynamic_rnn it fails:
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
Why is reshape unable to handle this case? What is the right way of replicating Theano's flatten(ndim=n) in TensorFlow with tensors of rank 4 and more?
It is not a flaw in reshape, but a limitation of tf.dynamic_rnn.
Your code to flatten the last two dimensions is correct. And, reshape behaves correctly too: if the last two dimensions are unknown when you define the flattening operation, then so is their product, and None is the only appropriate value that can be returned at this time.
The culprit is tf.dynamic_rnn, which expects a fully-defined feature shape during construction, i.e. all dimensions apart from the first (batch size) and the second (time steps) must be known. It is a bit unfortunate perhaps, but the current implementation does not seem to allow RNNs with a variable number of features, à la FCN.
I tried a simple code according to your requirements. Since you are trying to reshape a CNN output, the shape of X is same as the output of CNN in Tensorflow.
HEIGHT = 100
WIDTH = 200
N_CHANELS =3
N_HIDDEN =64
X = tf.placeholder(tf.float32, shape=[None,HEIGHT,WIDTH,N_CHANELS],name='input') # output of CNN
shape = X.get_shape().as_list() # get the shape of each dimention shape[0] =BATCH_SIZE , shape[1] = HEIGHT , shape[2] = HEIGHT = WIDTH , shape[3] = N_CHANELS
input = tf.reshape(X, [-1, shape[1] , shape[2] * shape[3]])
print(input.shape) # prints (?, 100, 600)
#Input for tf.nn.dynamic_rnn should be in the shape of [BATCH_SIZE, N_TIMESTEPS, INPUT_SIZE]
#Therefore, according to the reshape N_TIMESTEPS = 100 and INPUT_SIZE= 600
#create the RNN here
lstm_layers = tf.contrib.rnn.BasicLSTMCell(N_HIDDEN, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_layers, input, dtype=tf.float32)
Hope this helps.
I found a solution to this by using .get_shape().
Assuming 'x' is a 4-D Tensor.
This will only work with the Reshape Layer. As you were making changes to the architecture of the model, this should work.
x = tf.keras.layers.Reshape(x, [x.get_shape()[0], x.get_shape()[1], x.get_shape()[2] * x.get_shape()][3])
Hope this works!
If you use the tf.keras.models.Model or tf.keras.layers.Layer wrapper, the build method provides a nice way to do this.
Here's an example:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv1D, Conv2D, Conv2DTranspose, Attention, Layer, Reshape
class VisualAttention(Layer):
def __init__(self, channels_out, key_is_value=True):
super(VisualAttention, self).__init__()
self.channels_out = channels_out
self.key_is_value = key_is_value
self.flatten_images = None # see build method
self.unflatten_images = None # see build method
self.query_conv = Conv1D(filters=channels_out, kernel_size=1, padding='same')
self.value_conv = Conv1D(filters=channels_out, kernel_size=4, padding='same')
self.key_conv = self.value_conv if key_is_value else Conv1D(filters=channels_out, kernel_size=4, padding='same')
self.attention_layer = Attention(use_scale=False, causal=False, dropout=0.)
def build(self, input_shape):
b, h, w, c = input_shape
self.flatten_images = Reshape((h*w, c), input_shape=(h, w, c))
self.unflatten_images = Reshape((h, w, self.channels_out), input_shape=(h*w, self.channels_out))
def call(self, x, training=True):
x = self.flatten_images(x)
q = self.query_conv(x)
v = self.value_conv(x)
inputs = [q, v] if self.key_is_value else [q, v, self.key_conv(x)]
output = self.attention_layer(inputs=inputs, training=training)
return self.unflatten_images(output)
# test
import numpy as np
x = np.arange(8*28*32*3).reshape((8, 28, 32, 3)).astype('float32')
model = VisualAttention(8)
y = model(x)
print(y.shape)
Has anyone been able to mix feedforward layers and recurrent layers in Tensorflow?
For example:
input->conv->GRU->linear->output
I can imagine one can define his own cell with feedforward layers and no state which can then be stacked using the MultiRNNCell function, something like:
cell = tf.nn.rnn_cell.MultiRNNCell([conv_cell,GRU_cell,linear_cell])
This would make life a whole lot easier...
can't you just do the following:
rnnouts, _ = rnn(grucell, inputs)
linearout = [tf.matmul(rnnout, weights) + bias for rnnout in rnnouts]
etc.
This tutorial gives an example of how to use convolutional layers together with recurrent ones. For example, having last convolution layers like this:
...
l_conv4_a = conv_pre(l_pool3, 16, (5, 5), scope="l_conv4_a")
l_pool4 = pool(l_conv3_a, scope="l_pool4")
l_flatten = flatten(l_pool4, scope="flatten")
and having defined RNN cell:
_, shape_state = tf.nn.dynamic_rnn(cell=shape_cell,
inputs=tf.expand_dims(batch_norm(x_shape_pl), 2), dtype=tf.float32, scope="shape_rnn")
You can concatenate both outputs and use it as the input to the next layer:
features = tf.concat(concat_dim=1, values=[x_margin_pl, shape_state, x_texture_pl, l_flatten], name="features")
Or you can just use the output of CNN layer as the input to the RNN cell:
_, shape_state = tf.nn.dynamic_rnn(cell=shape_cell,
inputs=l_flatten, dtype=tf.float32, scope="shape_rnn")
This is what I have so far; improvements welcome:
class LayerCell(rnn_cell_impl.RNNCell):
def __init__(self, tf_layer, **kwargs):
''' :param tf_layer: a tensorflow layer, e.g. tf.layers.Conv2D or
tf.keras.layers.Conv2D. NOT tf.layers.conv2d !
Can pass all other layer params as well, just need to give the
parameter name: paramname=param'''
self.layer_fn = tf_layer(**kwargs)
def __call__(self, inputs, state, scope=None):
''' Every `RNNCell` must implement `call` with
the signature `(output, next_state) = call(input, state)`. The optional
third input argument, `scope`, is allowed for backwards compatibility
purposes; but should be left off for new subclasses.'''
return (self.layer_fn(inputs), state)
def __str__(self):
return "Cell wrapper of " + str(self.layer_fn)
def __getattr__(self, attr):
'''credits to https://stackoverflow.com/questions/1382871/dynamically-attaching-a-method-to-an-existing-python-object-generated-with-swig/1383646#1383646'''
return getattr(self.layer_fn, attr)
#property
def state_size(self):
"""size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers
or TensorShapes.
"""
return (0,)
#property
def output_size(self):
"""Integer or TensorShape: size of outputs produced by this cell."""
# use with caution; could be uninitialized
return self.layer_fn.output_shape
(Naturally, don't use with recurrent layers because state-keeping will be destroyed.)
Seems to work with: tf.layers.Conv2D, tf.keras.layers.Conv2D, tf.keras.layers.Activation, tf.layers.BatchNormalization
Does NOT work with: tf.keras.layers.BatchNormalization.
At least it failed for me when using it in a tf.while loop; complaining about combining variables from different frames, similar to here. Maybe keras uses tf.Variable() instead of tf.get_variable() ...?
Usage:
cell0 = tf.contrib.rnn.ConvLSTMCell(conv_ndims=2, input_shape=[40, 40, 3], output_channels=16, kernel_shape=[5, 5])
cell1 = LayerCell(tf.keras.layers.Conv2D, filters=8, kernel_size=[5, 5], strides=(1, 1), padding='same')
cell2 = LayerCell(tf.layers.BatchNormalization, axis=-1)
inputs = np.random.rand(10, 40, 40, 3).astype(np.float32)
multicell = tf.contrib.rnn.MultiRNNCell([cell0, cell1, cell2])
state = multicell.zero_state(batch_size=10, dtype=tf.float32)
output = multicell(inputs, state)