I was looking for implementing attention in my seq2seq model. I know that keras has built-in attention layers (both bahdanau and luong), but I'm trying to fit a custom attention layer. I understand the math behind it but I'm not able to implement it in the desired way. I found out some custom made Additive attention layer but they only work with gradient tape and not with keras functional API.
Here's the custom attention layer -
class Attention(tf.keras.layers.Layer):
def __init__(self, units, **kwargs):
super(Attention, self).__init__(**kwargs)
self.units = units
def call(self, features, hidden):
hidden_with_time_axis = tf.expand_dims(hidden, 1)
score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
attention_weights = tf.nn.softmax(self.V(score), axis=1)
context_vector = attention_weights * features
context_vector = tf.reduce_sum(context_vector, axis=1)
return context_vector
def build(self, input_shape):
self.W1 = tf.keras.layers.Dense(self.units)
self.W2 = tf.keras.layers.Dense(self.units)
self.V = tf.keras.layers.Dense(1)
def get_config(self):
config = super().get_config()
config.update({'units': self.units})
return config
Here feature refers to the encoder output and hidden means the previous decoder hidden state. The keras attention layer takes the decoder and encoder outputs (batch,steps,units) as the inputs and outputs a 3D vector (most probably the context vector) which is concatenated with the decoder output and then fed to the fully connected layer. But in the case of the custom layer, it outputs a 2D vector which doesn't seem to work, unlike the former.
Related
I would like to restrict a Dense layer such that only some of the weights are trainable. I have thought about making a custom layer and passing a boolean numpy array of the same shape as the weight matrix to set the corresponding layer weight to be trainable (all other weights I would want to be 0). But I'm not sure how to do this in Tensorflow. Basically I need to know how to make a tensorflow tensor where some entries are constants of 0 and other entries are variables. Below I show my code example.
In the code below
self.M=self.M[trainable_arr]
Obviously doesn't work and I need the correct alternative to this. Any good ideas? There is no simple pattern in which variables I want to be trainable or not.
class sparse_layer(tf.keras.layers.Layer):
def __init__(self, kernel_regularizer=None,latent_d=10,name=None,trainable_arr=None):
super(sparse_layer, self).__init__(name=name)
#self.w = self.add_weight(shape=(input_dim, input_dim), initializer="random_normal", trainable=True)
self.kernel_regularizer1 = tf.keras.regularizers.get(kernel_regularizer)
def build(self, input_shape):
#self.M = self.add_weight(shape=(int(latent_d), int(input_shape[-1])), initializer="random_normal", trainable=True,regularizer=self.kernel_regularizer1)
n=trainable_arr.sum().sum()
w = tf.Variable(tf.zeros(n))
#self.M=tf.constant(tf.zeros( (int(latent_d), int(input_shape[-1])) ))
self.M=tf.Variable(tf.zeros( (int(latent_d), int(input_shape[-1])) ))
self.M=self.M[trainable_arr]
# for i in self.M.shape[0]:
# for j in self.M.shape[1]:
#self.M=tf.multiply(self.M,trainable)
def call(self, inputs):
dx=tf.transpose(inputs)
return tf.transpose(tf.matmul(self.M,dx))
I am developing an LSTM autoencoder model for anomaly detection. I have my keras model setup as below:
from keras.models import Sequential
from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Masking, Reshape
def create_RNN_with_attention():
x=Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
attention_layer = attention()(RNN_layer_1)
dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
model=Model(x,output)
model.compile(loss='mae', optimizer='adam')
return model
Notice the attention layer that I added, attention_layer. Before adding this, the model compiled perfectly, however after adding this attention_layer - the model is throwing out the following error: ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1
My attention layer is setup as follows:
import keras.backend as K
class attention(Layer):
def __init__(self,**kwargs):
super(attention,self).__init__(**kwargs)
def build(self,input_shape):
self.W=self.add_weight(name='attention_weight', shape=(input_shape[-1],1),
initializer='random_normal', trainable=True)
self.b=self.add_weight(name='attention_bias', shape=(input_shape[1],1),
initializer='zeros', trainable=True)
super(attention, self).build(input_shape)
def call(self,x):
# Alignment scores. Pass them through tanh function
e = K.tanh(K.dot(x,self.W)+self.b)
# Remove dimension of size 1
e = K.squeeze(e, axis=-1)
# Compute the weights
alpha = K.softmax(e)
# Reshape to tensorFlow format
alpha = K.expand_dims(alpha, axis=-1)
# Compute the context vector
context = x * alpha
context = K.sum(context, axis=1)
return context
The idea of the attention mask is to allow the model to focus on more prominent features as is trains.
Why am I getting the error above and how can I fix this?
I think that the problem lies in this line:
RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
This layer outputs a tensor of shape (batch_size, 64). So this means that you output a vector and then run attention mechanism on w.r.t. to the batch dimension instead of a sequential dimension. This also means that you output with a squeezed batch dimension that is not acceptable for any keras layer. This is why the Repeat layer raises error as it expects vector of at least shape (batch_dimension, dim).
If you want to run attention mechanism over a sequence then you should switch the line mentioned above to:
RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
In TF 1.x, it was possible to build layers with custom variables. Here's an example:
import numpy as np
import tensorflow as tf
def make_custom_getter(custom_variables):
def custom_getter(getter, name, **kwargs):
if name in custom_variables:
variable = custom_variables[name]
else:
variable = getter(name, **kwargs)
return variable
return custom_getter
# Make a custom getter for the dense layer variables.
# Note: custom variables can result from arbitrary computation;
# for the sake of this example, we make them just constant tensors.
custom_variables = {
"model/dense/kernel": tf.constant(
np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
"model/dense/bias": tf.constant(
np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
custom_getter = make_custom_getter(custom_variables)
# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with tf.variable_scope("model", custom_getter=custom_getter):
Layer = tf.layers.Dense(64)
hiddens = Layer(x)
print(Layer.variables)
The printed variables of the constructed dense layer will be custom tensors we specified in the custom_variables dict:
[<tf.Tensor 'custom_kernel:0' shape=(784, 64) dtype=float32>, <tf.Tensor 'custom_bias:0' shape=(64,) dtype=float32>]
This allows us to create layers/models that use provided tensors in custom_variables directly as their weights, so that we could further differentiate the output of the layers/models with respect to any tensors that custom_variables may depend on (particularly useful for implementing functionality in modulating sub-nets, parameter generation, meta-learning, etc.).
Variable scopes used to make it easy to nest all off graph-building inside scopes with custom getters and build models on top of the provided tensors as their parameters. Since sessions and variable scopes are no longer advisable in TF 2.0 (and all of that low-level stuff is moved to tf.compat.v1), what would be the best practice to implement the above using Keras and TF 2.0?
(Related issue on GitHub.)
Answer based on the comment below
Given you have:
kernel = createTheKernelVarBasedOnWhatYouWant() #shape (784, 64)
bias = createTheBiasVarBasedOnWhatYouWant() #shape (64,)
Make a simple function copying the code from Dense:
def custom_dense(x):
inputs, kernel, bias = x
outputs = K.dot(inputs, kernel)
outputs = K.bias_add(outputs, bias, data_format='channels_last')
return outputs
Use the function in a Lambda layer:
layer = Lambda(custom_dense)
hiddens = layer([x, kernel, bias])
Warning: kernel and bias must be produced from a Keras layer, or come from an kernel = Input(tensor=the_kernel_var) and bias = Input(tensor=bias_var)
If the warning above is bad for you, you can always use kernel and bias "from outside", like:
def custom_dense(inputs):
outputs = K.dot(inputs, kernel) #where kernel is not part of the arguments anymore
outputs = K.bias_add(outputs, bias, data_format='channels_last')
return outputs
layer = Lambda(custom_dense)
hiddens = layer(x)
This last option makes it a bit more complicated to save/load models.
Old answer
You should probably use a Keras Dense layer and set its weights in a standard way:
layer = tf.keras.layers.Dense(64, name='the_layer')
layer.set_weights([np.random.rand(784, 64), np.random.rand(64)])
If you need that these weights are not trainable, before compiling the keras model you set:
model.get_layer('the_layer').trainable=False
If you want direct access to the variables as tensors, they are:
kernel = layer.kernel
bias = layer.bias
There are plenty of other options, but that depends on your exact intention, which is not clear in your question.
Below is a general-purpose solution that works with arbitrary Keras models in TF2.
First, we need to define an auxiliary function canonical_variable_name and a context manager custom_make_variable with the following signatures (see implementation in meta-blocks library).
def canonical_variable_name(variable_name: str, outer_scope: str):
"""Returns the canonical variable name: `outer_scope/.../name`."""
# ...
#contextlib.contextmanager
def custom_make_variable(
canonical_custom_variables: Dict[str, tf.Tensor], outer_scope: str
):
"""A context manager that overrides `make_variable` with a custom function.
When building layers, Keras uses `make_variable` function to create weights
(kernels and biases for each layer). This function wraps `make_variable` with
a closure that infers the canonical name of the variable being created (of the
form `outer_scope/.../var_name`) and looks it up in the `custom_variables` dict
that maps canonical names to tensors. The function adheres the following logic:
* If there is a match, it does a few checks (shape, dtype, etc.) and returns
the found tensor instead of creating a new variable.
* If there is a match but checks fail, it throws an exception.
* If there are no matching `custom_variables`, it calls the original
`make_variable` utility function and returns a newly created variable.
"""
# ...
Using these functions, we can create arbitrary Keras models with custom tensors used as variables:
import numpy as np
import tensorflow as tf
canonical_custom_variables = {
"model/dense/kernel": tf.constant(
np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
"model/dense/bias": tf.constant(
np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with custom_make_variable(canonical_custom_variables, outer_scope="model"):
Layer = tf.layers.Dense(64)
hiddens = Layer(x)
print(Layer.variables)
Not entirely sure I understand your question correctly, but it seems to me that it should be possible to do what you want with a combination of custom layers and keras functional api.
Custom layers allow you to build any layer you want in a way that is compatible with Keras, e.g.:
class MyDenseLayer(tf.keras.layers.Layer):
def __init__(self, num_outputs):
super(MyDenseLayer, self).__init__()
self.num_outputs = num_outputs
def build(self, input_shape):
self.kernel = self.add_weight("kernel",
shape=[int(input_shape[-1]),
self.num_outputs],
initializer='normal')
self.bias = self.add_weight("bias",
shape=[self.num_outputs,],
initializer='normal')
def call(self, inputs):
return tf.matmul(inputs, self.kernel) + self.bias
and the functional api allows you to access the outputs of said layers and re-use them:
inputs = keras.Input(shape=(784,), name='img')
x1 = MyDenseLayer(64, activation='relu')(inputs)
x2 = MyDenseLayer(64, activation='relu')(x1)
outputs = MyDenseLayer(10, activation='softmax')(x2)
model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')
Here x1 and x2 can be connected to other subnets.
In pytorch a classification network model is defined as this,
class Net(torch.nn.Module):
def __init__(self, n_feature, n_hidden, n_output):
super(Net, self).__init__()
self.hidden = torch.nn.Linear(n_feature, n_hidden) # hidden layer
self.out = torch.nn.Linear(n_hidden, n_output) # output layer
def forward(self, x):
x = F.relu(self.hidden(x)) # activation function for hidden layer
x = self.out(x)
return x
Is softmax applied here? In my understanding, things should be like,
class Net(torch.nn.Module):
def __init__(self, n_feature, n_hidden, n_output):
super(Net, self).__init__()
self.hidden = torch.nn.Linear(n_feature, n_hidden) # hidden layer
self.relu = torch.nn.ReLu(inplace=True)
self.out = torch.nn.Linear(n_hidden, n_output) # output layer
self.softmax = torch.nn.Softmax(dim=n_output)
def forward(self, x):
x = self.hidden(x) # activation function for hidden layer
x = self.relu(x)
x = self.out(x)
x = self.softmax(x)
return x
I understand that F.relu(self.relu(x)) is also applying relu, but the first block of code doesn't apply softmax, right?
Latching on to what #jodag was already saying in his comment, and extending it a bit to form a full answer:
No, PyTorch does not automatically apply softmax, and you can at any point apply torch.nn.Softmax() as you want. But, softmax has some issues with numerical stability, which we want to avoid as much as we can. One solution is to use log-softmax, but this tends to be slower than a direct computation.
Especially when we are using Negative Log Likelihood as a loss function (in PyTorch, this is torch.nn.NLLLoss, we can utilize the fact that the derivative of (log-)softmax+NLLL is actually mathematically quite nice and simple, which is why it makes sense to combine the both into a single function/element. The result is then torch.nn.CrossEntropyLoss. Again, note that this only applies directly to the last layer of your network, any other computation is not affected by any of this.
I am working with the keras-capsnet implementation of Capsule Networks, and am trying to apply the same layer to 30 images per sample.
The weights are initialized within the init and build arguments for the class, shown below. I have successfully shared the weights between the primary routing layers which just use tf.layers.conv2d, where I can assign them the same name and set reuse = True.
Does anyone know how to initialize weights in a Keras custom layer so that they may be reused? I am much more familiar with the tensorflow API than with the Keras one!
def __init__(self, num_capsule, dim_capsule, routings=3,
kernel_initializer='glorot_uniform',
**kwargs):
super(CapsuleLayer, self).__init__(**kwargs)
self.num_capsule = num_capsule
self.dim_capsule = dim_capsule
self.routings = routings
self.kernel_initializer = initializers.get(kernel_initializer)
def build(self, input_shape):
assert len(input_shape) >= 3, "The input Tensor should have shape=[None, input_num_capsule, input_dim_capsule]"
self.input_num_capsule = input_shape[1]
self.input_dim_capsule = input_shape[2]
# Weights are initialized here each time the layer is called
self.W = self.add_weight(shape=[self.num_capsule, self.input_num_capsule,
self.dim_capsule, self.input_dim_capsule],
initializer=self.kernel_initializer,
name='W')
self.built = True
The answer was simple. Set up a layer without calling it on input, and then use that built layer to call the data individually.