How can I specify input dimension in neural_tangent.stax framework? - python

I have a code defining structure of a model
from neural_tangents import stax
from neural_tangents.stax import Dense
from jax import jit
def model(
W_std,
b_std,
width,
depth,
activation,
parameterization
):
"""Construct fully connected NN model and infinite width NTK & NNGP kernel
function.
Args:
W_std (float): Weight standard deviation.
b_std (float): Bias standard deviation.
width (int): Hidden layer width.
depth (int): Number of hidden layers.
activation (string): Activation function string, 'erf' or 'relu'.
parameterization (string): Parameterization string, 'ntk' or 'standard'.
Returns:
`(init_fn, apply_fn, kernel_fn)`
"""
act = activation_fn(activation)
layers_list = [Dense(width, W_std, b_std, parameterization=parameterization)]
def layer_block():
return stax.serial(act(), Dense(width, W_std, b_std, parameterization=parameterization))
for _ in range(depth-1):
layers_list += [layer_block()]
layers_list += [act(), Dense(1, W_std, b_std, parameterization=parameterization)]
# print (f"---- layer list is {layers_list} ------")
init_fn, apply_fn, kernel_fn = stax.serial(*layers_list)
apply_fn = jit(apply_fn)
return init_fn, apply_fn, kernel_fn
I can't see where I can establish dimension of input. By default it is 1, but I need to adapt this structure to inputs of higher dimension. width parameter in Dense specifies only output dimension. How can I change input dimension?
Code is from here

The key is that Dense doesn't require input dimension. It is specified in init_fn function:
init_fn, apply_fn, kernel_fn = model(
W_std,
b_std,
width,
depth,
activation,
parameterization
)
_, init_params = init_fn(key, input.shape)

Related

Obtain the output of intermediate layer (Functional API) and use it in SubClassed API

In the keras doc, it says that if we want to pick the intermediate layer's output of the model (sequential and functional), all we need to do as follows:
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = keras.Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model(data)
So, here we get two models, the intermediate_layer_model is the sub-model of its parent model. And they're independent as well. Likewise, if we get the intermediate layer's output feature maps of the parent model (or base model), and do some operation with it and get some output feature maps from this operation, then we can also impute this output feature maps back to the parent model.
input = tf.keras.Input(shape=(size,size,3))
model = tf.keras.applications.DenseNet121(input_tensor = input)
layer_name = "conv1_block1" # for example
output_feat_maps = SomeOperationLayer()(model.get_layer(layer_name).output)
# assume, they're able to add up
base = Add()([model.output, output_feat_maps])
# bind all
imputed_model = tf.keras.Model(inputs=[model.input], outputs=base)
So, in this way we have one modified model. It's quite easy with functional API. All the keras imagenet models are written with functional API (mostly). In model subclassing API, we can use these models. My concern here is, what to do if we need the intermediate output feature maps of these functional API models' inside call function.
class Subclass(tf.keras.Model):
def __init__(self, dim):
super(Subclass, self).__init__()
self.dim = dim
self.base = DenseNet121(input_shape=self.dim)
# building new model with the desired output layer of base model
self.mid_layer_model = tf.keras.Model(self.base.inputs,
self.base.get_layer(layer_name).output)
def call(self, inputs):
# forward with base model
x = self.base(inputs)
# forward with mid_layer_model
mid_feat = self.mid_layer_model(inputs)
# do some op with it
mid_x = SomeOperationLayer()(mid_feat)
# assume, they're able to add up
out = tf.keras.layers.add([x, mid_x])
return out
The issue is, here we've technically two models in a joint fashion. But unlike building a model like this, here we simply want the intermediate output feature maps (from some inputs) of the base model forward manner and use it somewhere else and get some output. Like this
mid_x = SomeOperationLayer()(self.base.get_layer(layer_name).output)
But it gives ValueError: Graph disconnected. So, currently, we have to build a new model from the base model based on our desired intermediate layer. In the init method we define or create new self.mid_layer_model model that gives our desired output feature maps like this: mid_feat = self.mid_layer_model(inputs). Next, we take the mid_faet and do some operation and get some output and lastly add them with tf.keras.layers.add([x, mid_x]). So by creating a new model with desired intermediate out works but by the same time, we repeat the same operation twice i.e the base model and its subset model. Maybe I'm missing something obvious, please add up something. Is it how it is! or there some strategies we can adopt. I've asked in the forum here, no response yet.
Update
Here is a working example. Let's say we have a custom layer like this
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Add
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
class ConvBlock(tf.keras.layers.Layer):
def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
super(ConvBlock, self).__init__()
# conv layer
self.conv = tf.keras.layers.Conv2D(kernel_num,
kernel_size=kernel_size,
strides=strides, padding=padding)
# batch norm layer
self.bn = tf.keras.layers.BatchNormalization()
def call(self, input_tensor, training=False):
x = self.conv(input_tensor)
x = self.bn(x, training=training)
return tf.nn.relu(x)
And we want to impute this layer into an ImageNet model and construct a model like this
input = tf.keras.Input(shape=(32, 32, 3))
base = DenseNet121(weights=None, input_tensor = input)
# get output feature maps of at certain layer, ie. conv2_block1_0_relu
cb = ConvBlock()(base.get_layer("conv2_block1_0_relu").output)
flat = Flatten()(cb)
dense = Dense(1000)(flat)
# adding up
adding = Add()([base.output, dense])
model = tf.keras.Model(inputs=[base.input], outputs=adding)
from tensorflow.keras.utils import plot_model
plot_model(model,
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)
Here the computation from input to layer conv2_block1_0_relu is computed one time. Next, if we want to translate this functional API to subclassing API, we had to build a model from the base model's input to layer conv2_block1_0_relu first. Like
class ModelWithMidLayer(tf.keras.Model):
def __init__(self, dim=(32, 32, 3)):
super().__init__()
self.dim = dim
self.base = DenseNet121(input_shape=self.dim, weights=None)
# building sub-model from self.base which gives
# desired output feature maps: ie. conv2_block1_0_relu
self.mid_layer = tf.keras.Model(self.base.inputs,
self.base.get_layer("conv2_block1_0_relu").output)
self.flat = Flatten()
self.dense = Dense(1000)
self.add = Add()
self.cb = ConvBlock()
def call(self, x):
# forward with base model
bx = self.base(x)
# forward with mid layer
mx = self.mid_layer(x)
# make same shape or do whatever
mx = self.dense(self.flat(mx))
# combine
out = self.add([bx, mx])
return out
def build_graph(self):
x = tf.keras.layers.Input(shape=(self.dim))
return tf.keras.Model(inputs=[x], outputs=self.call(x))
mwml = ModelWithMidLayer()
plot_model(mwml.build_graph(),
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)
Here model_1 is actually a sub-model from DenseNet, which probably leads the whole model (ModelWithMidLayer) to compute the same operation twice. If this observation is correct, then this gives us concern.
I thought it might be much complex but it's actually rather very simple. We just need to build a model with desired output layers at the __init__ method and use it normally in the call method.
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Add
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
class ConvBlock(tf.keras.layers.Layer):
def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
super(ConvBlock, self).__init__()
# conv layer
self.conv = tf.keras.layers.Conv2D(kernel_num,
kernel_size=kernel_size,
strides=strides, padding=padding)
# batch norm layer
self.bn = tf.keras.layers.BatchNormalization()
def call(self, input_tensor, training=False):
x = self.conv(input_tensor)
x = self.bn(x, training=training)
return tf.nn.relu(x)
class ModelWithMidLayer(tf.keras.Model):
def __init__(self, dim=(32, 32, 3)):
super().__init__()
self.dim = dim
self.base = DenseNet121(input_shape=self.dim, weights=None)
# building sub-model from self.base which gives
# desired output feature maps: ie. conv2_block1_0_relu
self.mid_layer = tf.keras.Model(
inputs=[self.base.inputs],
outputs=[
self.base.get_layer("conv2_block1_0_relu").output,
self.base.output])
self.flat = Flatten()
self.dense = Dense(1000)
self.add = Add()
self.cb = ConvBlock()
def call(self, x):
# forward with base model
bx = self.mid_layer(x)[1] # output self.base.output
# forward with mid layer
mx = self.mid_layer(x)[0] # output base.get_layer("conv2_block1_0_relu").output
# make same shape or do whatever
mx = self.dense(self.flat(mx))
# combine
out = self.add([bx, mx])
return out
def build_graph(self):
x = tf.keras.layers.Input(shape=(self.dim))
return tf.keras.Model(inputs=[x], outputs=self.call(x))
mwml = ModelWithMidLayer()
tf.keras.utils.plot_model(mwml.build_graph(),
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)

Tensorflow gradients do not exist for bias in custom layer

I've built an input convex neural network in Tensorflow following this ArXiv paper that is a scalar output feed-forward model. The first hidden layer is dense and subsequent layers are custom that takes two inputs: the output from the previous layer (kernel) and the model input (passthrough). Separate weights are applied to each. This allows a positive weights regularizer to be applied to kernel weights but not the passthrough. I calculate the regularizer and add it using self.add_loss in the call method of the custom layer. I'm also using custom activation functions that are squared leaky ReLU and leaky ReLU.
When I am training this network I am able to calculate a gradient for the bias in the first dense layer but I get a warning that no gradient exists for the bias in the custom layer. When I add #tf.function to my activation functions the warning goes away but the gradient is 0. Furthermore, loss.numpy() throws an error when I use #tf.function and run in a local Jupyter notebook (but not in Colab).
Any ideas why the bias gradient exists for the dense but not the custom layer and how to calculate the bias gradient for all layers? A minimal working example is provided in this Colab notebook. Much appreciated!
Below is my custom layer. It's very similar to the standard dense layer.
class DensePartiallyConstrained(Layer):
'''
A custom layer inheriting from `tf.keras.layers.Layers` class.
This class is a fully-connected layer with two inputs. This allows
for different constraints on the weights of each input. This enables
a passthrough of the inputs to each hidden layer to have no
weight constraints while the input from the previous layer can have
a positive constraint. It also allows for different initializations
of the weight values for each input.
Most of this code and documentation was borrowed from the
`tf.keras.layers.Dense` documentation on Github (thanks!).
'''
def __init__(self,
units,
activation = None,
use_bias = True,
kernel_initializer = 'glorot_uniform',
passthrough_initializer = 'glorot_uniform',
bias_initializer = 'zeros',
kernel_constraint = None,
passthrough_constraint = None,
bias_constraint = None,
activity_regularizer = None,
regularizer_constant = 1.0,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(DensePartiallyConstrained, self).__init__(
activity_regularizer = regularizers.get(activity_regularizer), **kwargs)
self.units = int(units)
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.passthrough_initializer = initializers.get(passthrough_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.passthrough_constraint = constraints.get(passthrough_constraint)
self.bias_constraint = constraints.get(bias_constraint)
# This is for add_loss in call() method
self.regularizer_constant = regularizer_constant
# What does this do?
self.supports_masking = True
self.kernel_input_spec = InputSpec(min_ndim=2)
self.passthrough_input_spec = InputSpec(min_ndim=2)
def build(self, input_shape):
# Input shapes provided as list [kernel, passthrough]
kernel_input_shape, passthrough_input_shape = input_shape
# Check for proper datatype
dtype = dtypes.as_dtype(self.dtype or K.floatx())
if not (dtype.is_floating or dtype.is_complex):
raise TypeError('Unable to build `DensePartiallyConstrained` layer with non-floating point '
'dtype %s' % (dtype,))
# Check kernel input dimensions
kernel_input_shape = tensor_shape.TensorShape(kernel_input_shape)
if tensor_shape.dimension_value(kernel_input_shape[-1]) is None:
raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
'should be defined. Found `None`.')
kernel_last_dim = tensor_shape.dimension_value(kernel_input_shape[-1])
self.kernel_input_spec = InputSpec(min_ndim=2,
axes={-1: kernel_last_dim})
# Check passthrough input dimensions
passthrough_input_shape = tensor_shape.TensorShape(passthrough_input_shape)
if tensor_shape.dimension_value(passthrough_input_shape[-1]) is None:
raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
'should be defined. Found `None`.')
passthrough_last_dim = tensor_shape.dimension_value(passthrough_input_shape[-1])
self.passthrough_input_spec = InputSpec(min_ndim=2,
axes={-1: passthrough_last_dim})
# Add weights to kernel (between layer connections)
self.kernel = self.add_weight(name = 'kernel',
shape = [kernel_last_dim, self.units],
initializer = self.kernel_initializer,
constraint = self.kernel_constraint,
dtype = self.dtype,
trainable = True)
# Add weight to input passthrough
self.passthrough = self.add_weight(name = 'passthrough',
shape = [passthrough_last_dim, self.units],
initializer = self.passthrough_initializer,
constraint = self.passthrough_constraint,
dtype = self.dtype,
trainable = True)
# Add weights to bias
if self.use_bias:
self.bias = self.add_weight(name = 'bias',
shape = [self.units,],
initializer = self.bias_initializer,
constraint = self.bias_constraint,
dtype = self.dtype,
trainable = True)
else:
self.bias = None
self.built = True
super(DensePartiallyConstrained, self).build(input_shape)
def call(self, inputs):
# Inputs provided as list [kernel, passthrough]
kernel_input, passthrough_input = inputs
# Calculate weights regularizer
self.add_loss(self.regularizer_constant * tf.reduce_sum(tf.square(tf.math.maximum(tf.negative(self.kernel), 0.0))))
# Calculate layer output
outputs = tf.add(tf.matmul(kernel_input, self.kernel), tf.matmul(passthrough_input, self.passthrough))
if self.use_bias:
outputs = tf.add(outputs, self.bias)
if self.activation is not None:
return self.activation(outputs)
return outputs
And my activation functions:
##tf.function
def squared_leaky_ReLU(x, alpha = 0.2):
return tf.square(tf.maximum(x, alpha * x))
##tf.function
def leaky_ReLU(x, alpha = 0.2):
return tf.maximum(x, alpha * x)
Edit:
With a tensorflow update I can now access loss.numpy() when using #tf.function with my activation functions. This returns 0 gradients for the bias in all of my custom layers.
I'm beginning to think that the lack of gradient for the bias terms in the custom layer might have something to do with my loss function:
minimax loss
where
regularizer
is regularization for the weights in the custom layer kernel only. The loss for g(x) is based on the gradient with respect to the inputs, so it doesn't contain any information about the bias (the bias in f(x) update normally). Still though, if this is the case I don't understand why the bias in the first hidden dense layer of g(y) is updated? The networks are identical other than f(x) has a positive constraint on the kernel weights.

how do I implement Gaussian blurring layer in Keras?

I have an autoencoder and I need to add a Gaussian noise layer after my output. I need a custom layer to do this, but I really do not know how to produce it, I need to produce it using tensors.
what should I do if I want to implement the above equation in the call part of the following code?
class SaltAndPepper(Layer):
def __init__(self, ratio, **kwargs):
super(SaltAndPepper, self).__init__(**kwargs)
self.supports_masking = True
self.ratio = ratio
# the definition of the call method of custom layer
def call(self, inputs, training=None):
def noised():
shp = K.shape(inputs)[1:]
**what should I put here????**
return out
return K.in_train_phase(noised(), inputs, training=training)
def get_config(self):
config = {'ratio': self.ratio}
base_config = super(SaltAndPepper, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
I also try to implement using lambda layer but it dose not work.
If you are looking for additive or multiplicative Gaussian noise, then they have been already implemented as a layer in Keras: GuassianNoise (additive) and GuassianDropout (multiplicative).
However, if you are specifically looking for the blurring effect as in Gaussian blur filters in image processing, then you can simply use a depth-wise convolution layer (to apply the filter on each input channel independently) with fixed weights to get the desired output (Note that you need to generate the weights of Gaussian kernel to set them as the weights of DepthwiseConv2D layer. For that you can use the function introduced in this answer):
import numpy as np
from keras.layers import DepthwiseConv2D
kernel_size = 3 # set the filter size of Gaussian filter
kernel_weights = ... # compute the weights of the filter with the given size (and additional params)
# assuming that the shape of `kernel_weighs` is `(kernel_size, kernel_size)`
# we need to modify it to make it compatible with the number of input channels
in_channels = 3 # the number of input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1)
kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1) # apply the same filter on all the input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1) # for shape compatibility reasons
# define your model...
# somewhere in your model you want to apply the Gaussian blur,
# so define a DepthwiseConv2D layer and set its weights to kernel weights
g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')
g_layer_out = g_layer(the_input_tensor_for_this_layer) # apply it on the input Tensor of this layer
# the rest of the model definition...
# do this BEFORE calling `compile` method of the model
g_layer.set_weights([kernel_weights])
g_layer.trainable = False # the weights should not change during training
# compile the model and start training...
After a while trying to figure out how to do this with the code #today has provided, I have decided to share my final code with anyone possibly needing it in future. I have created a very simple model that is only applying the blurring on the input data:
import numpy as np
from keras.layers import DepthwiseConv2D
from keras.layers import Input
from keras.models import Model
def gauss2D(shape=(3,3),sigma=0.5):
m,n = [(ss-1.)/2. for ss in shape]
y,x = np.ogrid[-m:m+1,-n:n+1]
h = np.exp( -(x*x + y*y) / (2.*sigma*sigma) )
h[ h < np.finfo(h.dtype).eps*h.max() ] = 0
sumh = h.sum()
if sumh != 0:
h /= sumh
return h
def gaussFilter():
kernel_size = 3
kernel_weights = gauss2D(shape=(kernel_size,kernel_size))
in_channels = 1 # the number of input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1)
kernel_weights = np.repeat(kernel_weights, in_channels, axis=-1) # apply the same filter on all the input channels
kernel_weights = np.expand_dims(kernel_weights, axis=-1) # for shape compatibility reasons
inp = Input(shape=(3,3,1))
g_layer = DepthwiseConv2D(kernel_size, use_bias=False, padding='same')(inp)
model_network = Model(input=inp, output=g_layer)
model_network.layers[1].set_weights([kernel_weights])
model_network.trainable= False #can be applied to a given layer only as well
return model_network
a = np.array([[[1, 2, 3], [4, 5, 6], [4, 5, 6]]])
filt = gaussFilter()
print(a.reshape((1,3,3,1)))
print(filt.predict(a.reshape(1,3,3,1)))
For testing purposes the data are only of shape 1,3,3,1, the function gaussFilter() creates a very simple model with only input and one convolution layer that provides Gaussian blurring with weights defined in the function gauss2D(). You can add parameters to the function to make it more dynamic, e.g. shape, kernel size, channels. The weights according to my findings can be applied only after the layer was added to the model.
As the Error: AttributeError: 'float' object has no attribute 'dtype', just change K.sqrt to math.sqrt, then it will work.

Keras Conv2D custom kernel initialization

I need to initialize custom Conv2D kernels with weights
W = a1b1 + a2b2 + ... + anbn
where W = custom weight of Conv2D layer to initialise with
a = random weight Tensors as keras.backend.variable(np.random.uniform()), shape=(64, 1, 10)
b = fixed basis filters defined as keras.backend.constant(...), shape=(10, 11, 11)
W = K.sum(a[:, :, :, None, None] * b[None, None, :, :, :], axis=2) #shape=(64, 1, 11, 11)
I want my model to update the 'W' values with only changing the 'a's while keeping the 'b's constant.
I pass the custom 'W's as
Conv2D(64, kernel_size=(11, 11), activation='relu', kernel_initializer=kernel_init_L1)(img)
where kernel_init_L1 returns keras.backend.variable(K.reshape(w_L1, (11, 11, 1, 64)))
Problem:
I am not sure if this is the correct way to do this. Is it possible to specify in Keras which ones are trainable and which are not. I know that layers can be set trainable = True but i am not sure about weights.
I think the implementation is incorrect because I get similar results from my model with or without the custom initializations.
It would be immensely helpful if someone can point out any mistakes in my approach or provide a way to verify it.
Warning about your shapes: If your kernel size is (11,11), and assuming you have 64 input channels and 1 output channel, your final kernel shape must be (11,11,64,1).
You should probably be going for a[None,None] and b[:,:,:,None,None].
class CustomConv2D(Conv2D):
def __init__(self, filters, kernel_size, kernelB = None, **kwargs):
super(CustomConv2D, self).__init__(filters, kernel_size,**kwargs)
self.kernelB = kernelB
def build(self, input_shape):
#use the input_shape to calculate the shapes of A and B
#if needed, pay attention to the "data_format" used.
#this is an actual weight, because it uses `self.add_weight`
self.kernelA = self.add_weight(
shape=shape_of_kernel_A + (1,1), #or (1,1) + shape_of_A
initializer='glorot_uniform', #or select another
name='kernelA',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
#this is an ordinary var that will participate in the calculation
#not a weight, not updated
if self.kernelB is None:
self.kernelB = K.constant(....)
#use the shape already containing the new axes
#in the original conv layer, this property would be the actual kernel,
#now it's just a var that will be used in the original's "call" method
self.kernel = K.sum(self.kernelA * self.kernelB, axis=2)
#important: the resulting shape should be:
#(kernelSizeX, kernelSizeY, input_channels, output_channels)
#the following are remains of the original code for "build" in Conv2D
#use_bias is True by default
if self.use_bias:
self.bias = self.add_weight(shape=(self.filters,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
# Set input spec.
self.input_spec = InputSpec(ndim=self.rank + 2,
axes={channel_axis: input_dim})
self.built = True
Hints for custom layers
When you create a custom layer from zero (derived from Layer), you should have these methods:
__init__(self, ... parameters ...) - this is the creator, it's called when you create a new instance of your layer. Here, you store the values the user passed as parameters. (In a Conv2D, the init would have the "filters", "kernel_size", etc.)
build(self, input_shape) - this is where you should create the weights (all learnable vars are created here, based on the input shape)
compute_output_shape(self,input_shape) - here you return the output shape based on the input shape
call(self,inputs) - Here you perform the actual layer calculations
Since we're not creating this layer from zero, but deriving it from Conv2D, everything is ready, all we did was to "change" the build method and replace what would be considered the kernel of the Conv2D layer.
More on custom layers: https://keras.io/layers/writing-your-own-keras-layers/
The call method for conv layers is here in class _Conv(Layer):.

Keras_ERROR : "cannot import name '_time_distributed_dense"

Since the Keras wrapper does not support attention model yet, I'd like to refer to the following custom attention.
https://github.com/datalogue/keras-attention/blob/master/models/custom_recurrents.py
But the problem is, when I run the code above, it returns following error:
ImportError: cannot import name '_time_distributed_dense'
It looks like no more _time_distributed_dense is supported by keras over 2.0.0
the only parts that use _time_distributed_dense module is the part below:
def call(self, x):
# store the whole sequence so we can "attend" to it at each timestep
self.x_seq = x
# apply the a dense layer over the time dimension of the sequence
# do it here because it doesn't depend on any previous steps
# thefore we can save computation time:
self._uxpb = _time_distributed_dense(self.x_seq, self.U_a, b=self.b_a,
input_dim=self.input_dim,
timesteps=self.timesteps,
output_dim=self.units)
return super(AttentionDecoder, self).call(x)
In which way should I change the _time_distrubuted_dense(self ... ) part?
I just posted from An Chen's answer of the GitHub issue (the page or his answer might be deleted in the future)
def _time_distributed_dense(x, w, b=None, dropout=None,
input_dim=None, output_dim=None,
timesteps=None, training=None):
"""Apply `y . w + b` for every temporal slice y of x.
# Arguments
x: input tensor.
w: weight matrix.
b: optional bias vector.
dropout: wether to apply dropout (same dropout mask
for every temporal slice of the input).
input_dim: integer; optional dimensionality of the input.
output_dim: integer; optional dimensionality of the output.
timesteps: integer; optional number of timesteps.
training: training phase tensor or boolean.
# Returns
Output tensor.
"""
if not input_dim:
input_dim = K.shape(x)[2]
if not timesteps:
timesteps = K.shape(x)[1]
if not output_dim:
output_dim = K.shape(w)[1]
if dropout is not None and 0. < dropout < 1.:
# apply the same dropout pattern at every timestep
ones = K.ones_like(K.reshape(x[:, 0, :], (-1, input_dim)))
dropout_matrix = K.dropout(ones, dropout)
expanded_dropout_matrix = K.repeat(dropout_matrix, timesteps)
x = K.in_train_phase(x * expanded_dropout_matrix, x, training=training)
# collapse time dimension and batch dimension together
x = K.reshape(x, (-1, input_dim))
x = K.dot(x, w)
if b is not None:
x = K.bias_add(x, b)
# reshape to 3D tensor
if K.backend() == 'tensorflow':
x = K.reshape(x, K.stack([-1, timesteps, output_dim]))
x.set_shape([None, None, output_dim])
else:
x = K.reshape(x, (-1, timesteps, output_dim))
return x
You could just add this on your Python code.

Categories