I am creating a custom layer with weights that need to be multiplied by element-wise before activation. I can get it to work when the output and input is the same shape. The problem occurs when I have a first order array as input with a second order array as output. tensorflow.multiply supports broadcasting, but when I try to use it in Layer.call(x, self.kernel)
to multiply x by the self.kernel Variable it complains that they are different shapes saying:
ValueError: Dimensions must be equal, but are 4 and 3 for 'my_layer_1/Mul' (op: 'Mul') with input shapes: [?,4], [4,3].
here is my code:
from keras import backend as K
from keras.engine.topology import Layer
import tensorflow as tf
from keras.models import Sequential
import numpy as np
class MyLayer(Layer):
def __init__(self, output_dims, **kwargs):
self.output_dims = output_dims
super(MyLayer, self).__init__(**kwargs)
def build(self, input_shape):
# Create a trainable weight variable for this layer.
self.kernel = self.add_weight(name='kernel',
shape=self.output_dims,
initializer='ones',
trainable=True)
super(MyLayer, self).build(input_shape) # Be sure to call this somewhere!
def call(self, x):
#multiply wont work here?
return K.tf.multiply(x, self.kernel)
def compute_output_shape(self, input_shape):
return (self.output_dims)
mInput = np.array([[1,2,3,4]])
inShape = (4,)
net = Sequential()
outShape = (4,3)
l1 = MyLayer(outShape, input_shape= inShape)
net.add(l1)
net.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
p = net.predict(x=mInput, batch_size=1)
print(p)
Edit:
Given input shape (4,) and output shape (4,3) the weight matrix should be the same shape as the output and initialized with ones. So in the above code the input is [1,2,3,4], the weight matrix should be [[1,1,1,1],[1,1,1,1],[1,1,1,1]] and the output should look like [[1,2,3,4],[1,2,3,4],[1,2,3,4]]
Before multiplying, you need to repeat the elements to increase the shape.
You can use K.repeat_elements for that. (import keras.backend as K)
class MyLayer(Layer):
#there are some difficulties for different types of shapes
#let's use a 'repeat_count' instead, increasing only one dimension
def __init__(self, repeat_count,**kwargs):
self.repeat_count = repeat_count
super(MyLayer, self).__init__(**kwargs)
def build(self, input_shape):
#first, let's get the output_shape
output_shape = self.compute_output_shape(input_shape)
weight_shape = (1,) + output_shape[1:] #replace the batch size by 1
self.kernel = self.add_weight(name='kernel',
shape=weight_shape,
initializer='ones',
trainable=True)
super(MyLayer, self).build(input_shape) # Be sure to call this somewhere!
#here, we need to repeat the elements before multiplying
def call(self, x):
if self.repeat_count > 1:
#we add the extra dimension:
x = K.expand_dims(x, axis=1)
#we replicate the elements
x = K.repeat_elements(x, rep=self.repeat_count, axis=1)
#multiply
return x * self.kernel
#make sure we comput the ouptut shape according to what we did in "call"
def compute_output_shape(self, input_shape):
if self.repeat_count > 1:
return (input_shape[0],self.repeat_count) + input_shape[1:]
else:
return input_shape
here is another solution that is based on the answer by Daniel Möller, but uses tf.multiply like the original code.
class MyLayer(Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyLayer, self).__init__(**kwargs)
def build(self, input_shape):
# Create a trainable weight variable for this layer.
output_shape = self.compute_output_shape(input_shape)
self.kernel = self.add_weight(name='kernel',
shape=(1,) + output_shape[1:],
initializer='ones',
trainable=True)
super(MyLayer, self).build(input_shape) # Be sure to call this somewhere!
def call(self, x):
return K.tf.multiply(x, self.kernel)
def compute_output_shape(self, input_shape):
return (input_shape[0],self.output_dim)+input_shape[1:]
Related
I am more or less new to the field of neural networks and python, just a couple of months of work.
I am interested in this case developed in matlab https://it.mathworks.com/help/images/image-processing-operator-approximation-using-deep-learning.html
However, I would like to try to implement this using Keras.
I have three questions regarding the two custom layers this net uses, whose codes are found here:
https://github.com/catsymptote/Salsa_cryptanalysis/blob/master/matlab/workspace/adaptiveNormalizationMu.m
https://github.com/catsymptote/Salsa_cryptanalysis/blob/master/matlab/workspace/adaptiveNormalizationLambda.m
I have not really/deeply understood what these layers actually do
Is my temptative implementation of adaptiveNormalizationMu correct on Keras? Based on what I
understood, this layer just multiplies the output of the BN layer for an adaptive scale
parameter, mu. I wrote the code following the example reported here
https://www.tutorialspoint.com/keras/keras_customized_layer.htm
I am struggling with the variables input_shape and output_shape of the code I wrote following the tutorial.
Considering batch size BS, images with dimensions dim1 and dim2, 1 channel, I would love the input to have dimension (BS, dim1, dim2, 1), and output to have the same, since it is a mere scaling. How to be coherent with the code written in matlab in the mathworks example, where the only input argument is numberOfFilters? I don't know where to introduce this parameter in the code I am trying to write. I would love not to fix the input dimension, so that I can re-use this layer at different depths of the network, but correctly choose the "depht" (like the number of filters for a standard conv2D layer)
Thank you so much for the help
F.
###
from keras import backend as K
from keras.layers import Layer
class MyAdaptiveNormalizationMu(Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyAdaptiveNormalizationMu, self).__init__(**kwargs)
def build(self, input_shape):
self.mu = self.add_weight(name = 'mu',
shape = (input_shape[1], self.output_dim),
initializer = 'random_normal', trainable = True)
super(MyAdaptiveNormalizationMu, self).build(input_shape)
def call(self, input_data):
return input_data * self.mu
def compute_output_shape(self, input_shape): return (input_shape[0], self.output_dim)
from keras.models import Sequential
batch_size = 16
dim1 = 8
dim2 = 8
channels = 1
input_shape = (batch_size, dim1, dim2, channels)
output_shape = input_shape
model = Sequential()
model.add(MyAdaptiveNormalizationMu(output_dim=?, input_shape=?))
EDIT: I provide a second realization attempt, which seems to compile. It should do what I think adaptiveNormalizationLambda and adaptiveNormalizationMu do: multiply the input for a learnable weight matrix. However, i am still unsure if the layer is doing what it is supposed to, and if I got correctly the sense of those layers.
from keras.layers import Layer, Input
from keras.models import Model
import numpy as np
class Multiply_Weights(Layer):
def __init__(self, **kwargs):
super(Multiply_Weights, self).__init__(**kwargs)
def build(self, input_shape):
# Create a trainable weight variable for this layer.
self.kernel = self.add_weight(name='kernel',
shape=(input_shape[1], input_shape[2]),
initializer='RandomNormal',
trainable=True)
super(Multiply_Weights, self).build(input_shape)
def call(self, x, **kwargs):
# Implicit broadcasting occurs here.
# Shape x: (BATCH_SIZE, N, M)
# Shape kernel: (N, M)
# Shape output: (BATCH_SIZE, N, M)
return x * self.kernel
def compute_output_shape(self, input_shape):
return input_shape
N = 3
M = 4
BATCH_SIZE = 1
a = Input(shape=(N, M))
layer = Multiply_Weights()(a)
model = Model(inputs=a,
outputs=layer)
a = np.ones(shape=(BATCH_SIZE, N, M))
pred = model.predict(a)
print(pred)
I'm trying to build a (custom) trainable matrix-multiplication layer in TensorFlow, but things aren't working out... More precisely, my model should look like this:
x -> A(x) x
where A(x) is a feed-forward network with values in the n x n matrix (and thus depends on the input x) and A(x) is matrix by vector multiplication.
Here's what I've coded-up:
class custom_layer(tf.keras.layers.Layer):
def __init__(self, units=16, input_dim=32):
super(custom_layer, self).__init__()
self.units = units
def build(self, input_shape):
self.Tw1 = self.add_weight(name='Weights_1 ',
shape=(input_shape[-1], input_shape[-1]),
initializer='GlorotUniform',
trainable=True)
self.Tw2 = self.add_weight(name='Weights_2 ',
shape=(input_shape[-1], (self.units)**2),
initializer='GlorotUniform',
trainable=True)
self.Tb = self.add_weight(name='basies',
shape=(input_shape[-1],),
initializer='GlorotUniform',#Previously 'ones'
trainable=True)
def call(self, input):
# Build Vector-Valued Feed-Forward Network
ffNN = tf.matmul(input, self.Tw1) + self.Tb
ffNN = tf.nn.relu(ffNN)
ffNN = tf.matmul(ffNN, self.Tw2)
# Map to Matrix
ffNN = tf.reshape(ffNN, [self.units,self.units])
# Multiply Matrix-Valued function with input data
x_out = tf.matmul(ffNN,input)
# Return Output
return x_out
Now I build the model:
input_layer = tf.keras.Input(shape=[2])
output_layer = custom_layer(2)(input_layer)
model = tf.keras.Model(inputs=[input_layer], outputs=[output_layer])
# Compile Model
#----------------#
# Define Optimizer
optimizer_on = tf.keras.optimizers.SGD(learning_rate=10**(-1))
# Compile
model.compile(loss = 'mse',
optimizer = optimizer_on,
metrics = ['mse'])
# Fit Model
#----------------#
model.fit(data_x, data_y, epochs=(10**1), verbose=0)
and then I get this error message:
InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 4
[[node model_62/reconfiguration_unit_70/Reshape (defined at <ipython-input-176-0b494fa3fc75>:46) ]] [Op:__inference_distributed_function_175181]
Errors may have originated from an input operation.
Input Source operations connected to node model_62/reconfiguration_unit_70/Reshape:
model_62/reconfiguration_unit_70/MatMul_1 (defined at <ipython-input-176-0b494fa3fc75>:41)
Function call stack:
distributed_function
Thoughts:
It seems like something is wrong with the network dimensions but I can't figure what/how to repair it...
I am attempting to create a custom, Dense layer in Keras to tie weights in an Autoencoder. I have tried following an example for doing this in convolutional layers here, but it seemed like some of the steps did not apply for the Dense layer (also, the code is from over two years ago).
By tying weights, I want the decode layer to use the transposed weight matrix of the encode layer. This approach is also taken in this article (page 5). Below is the relevant quote from the article:
Here, we choose both the encoding and decoding activation function to be sigmoid function and only consider the
tied weights case, in which W ′ = WT
(where WT
is the
transpose of W ) as most existing deep learning methods
do.
In the quote above, W is the weight matrix in the encode layer and W' (equal to the transpose of W) is the weight matrix in the decode layer.
I did not change too much in the dense layer. I added a tied_to parameter to the constructor, which allows you to pass the layer you want to tie it to. The only other change was to the build function, the snippet for this is below:
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
if self.tied_to is not None:
self.kernel = K.transpose(self.tied_to.kernel)
self._non_trainable_weights.append(self.kernel)
else:
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
self.built = True
Below is the __init__ method, the only change here was the addition of the tied_to parameter.
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
tied_to=None,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(Dense, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
self.tied_to = tied_to
The call function was not edited, but it is below for reference.
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias, data_format='channels_last')
if self.activation is not None:
output = self.activation(output)
return output
Above, I added a conditional to check if the tied_to parameter was set, and if so, set the layer's kernel to the transpose of the tied_to layer's kernel.
Below is the code used to instantiate the model. It is done using Keras's sequential API and DenseTied is my custom layer.
# encoder
#
encoded1 = Dense(2, activation="sigmoid")
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1)
# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)
After training the model, below is the model summary and weights.
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_7 (Dense) (None, 2) 10
_________________________________________________________________
dense_tied_7 (DenseTied) (None, 4) 12
=================================================================
Total params: 22
Trainable params: 14
Non-trainable params: 8
________________________________________________________________
autoencoder.layers[0].get_weights()[0]
array([[-2.122982 , 0.43029135],
[-2.1772149 , 0.16689162],
[-1.0465667 , 0.9828905 ],
[-0.6830663 , 0.0512633 ]], dtype=float32)
autoencoder.layers[-1].get_weights()[1]
array([[-0.6521988 , -0.7131109 , 0.14814234, 0.26533198],
[ 0.04387903, -0.22077179, 0.517225 , -0.21583867]],
dtype=float32)
As you can see, the weights reported by autoencoder.get_weights() do not seem to be tied.
So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer? I was able to run the code, and it is currently training. It seems that the loss function is decreasing reasonably as well. My fear is that this will only set them equal when the model is build, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.
Thanks Mikhail Berlinkov,
One imporant remark: This code runs under Keras, but not in eager mode in TF2.0. It runs, but it trains badly.
The critical point is, how the object stores the transposed weight.
self.kernel = K.transpose(self.tied_to.kernel)
In non eager mode this creates a graph the right way. In eager mode this fails, probably because the value of a transposed variable is stored at build (== the first call), and then used at subsequent calls.
However: the solution is to store the variable unaltered at build,
and put the transpose operation into the call method.
I spent several days to figure this out, and I am happy if this helps anyone.
So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer?
Yes, it's valid.
My fear is that this will only set them equal when the model is build, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.
It actually ties them in a computation graph, you can check in printing model.summary() that there's just one copy of these trainable weights. Also, after training your model you can check weights of corresponding layers with model.get_weights(). When the model is build there're no weights yet actually, just placeholders for them.
random.seed(1)
class DenseTied(Layer):
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
tied_to=None,
**kwargs):
self.tied_to = tied_to
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super().__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
if self.tied_to is not None:
self.kernel = K.transpose(self.tied_to.kernel)
self._non_trainable_weights.append(self.kernel)
else:
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.built = True
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) >= 2
assert input_shape[-1] == self.units
output_shape = list(input_shape)
output_shape[-1] = self.units
return tuple(output_shape)
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias, data_format='channels_last')
if self.activation is not None:
output = self.activation(output)
return output
# input_ = Input(shape=(16,), dtype=np.float32)
# encoder
#
encoded1 = Dense(4, activation="sigmoid", input_shape=(4,), use_bias=True)
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1, use_bias=False)
# autoencoder
#
autoencoder = Sequential()
# autoencoder.add(input_)
autoencoder.add(encoded1)
autoencoder.add(decoded1)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
print(autoencoder.summary())
autoencoder.fit(x=np.random.rand(100, 4), y=np.random.randint(0, 1, size=(100, 4)))
print(autoencoder.layers[0].get_weights()[0])
print(autoencoder.layers[1].get_weights()[0])
I am trying to concatenate the hidden units. For example, I have 3 units, h1,h2,h3 then I want the new layer to have [h1;h1],[h1;h2],[h1;h3],[h2;h1]....
So, I have tried:
class MyLayer(Layer):
def __init__(self,W_regularizer=None,W_constraint=None, **kwargs):
self.init = initializers.get('glorot_uniform')
self.W_regularizer = regularizers.get(W_regularizer)
self.W_constraint = constraints.get(W_constraint)
super(MyLayer, self).__init__(**kwargs)
def build(self, input_shape):
assert len(input_shape) == 3
# Create a trainable weight variable for this layer.
self.W = self.add_weight((input_shape[-1],input_shape[-1]),
initializer=self.init,
name='{}_W'.format(self.name),
regularizer=self.W_regularizer,
constraint=self.W_constraint,
trainable=True)
super(MyLayer, self).build(input_shape)
def call(self, x,input_shape):
conc=K.concatenate([x[:, :-1, :], x[:, 1:, :]],axis=1)# help needed here
uit = K.dot(conc, self.W)# W has input_shape[-1],input_shape[-1]
return uit
def compute_output_shape(self, input_shape):
return input_shape[0], input_shape[1],input_shape[-1]
I am not sure what should I return for the second argument of my output shape.
from keras.layers import Input, Lambda, LSTM
from keras.models import Model
import keras.backend as K
from keras.layers import Lambda
lstm=LSTM(64, return_sequences=True)(input)
something=MyLayer()(lstm)
You could implement the concatenation operation that you described by leveraging itertools.product in order to compute the cartesian product of the temporal dimension's indices. The call method could be coded as follows:
def call(self, x):
prod = product(range(nb_timesteps), repeat=2)
conc_prod = []
for i, j in prod:
c = K.concatenate([x[:, i, :], x[:, j, :]], axis=-1) # Shape=(batch_size, 2*nb_features)
c_expanded = c[:, None, :] # Shape=(batch_size, 1, 2*nb_features)
conc_prod.append(c_expanded)
conc = K.concatenate(conc_prod, axis=1) # Shape=(batch_size, nb_timesteps**2, 2*nb_features)
uit = K.dot(conc, self.W) # W has shape 2*input_shape[-1], input_shape[-1]
return uit # Shape=(batch_size, nb_timesteps**2, nb_features)
In the example that you provided, nb_timesteps would be 3. Note also that the shape of the weights should be (2*input_shape[-1], input_shape[-1]) for the dot product to be valid.
Disclaimer: I am not sure what you want to achieve.
I have a simple model with one custom layer which works fine in the normal case.
When I switched to eager execution via tf.enable_eager_execution(), I got stuck on a weird error.
Here is the code so far:
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Layer, Input
from tensorflow.keras.losses import kullback_leibler_divergence
tf.enable_eager_execution()
class ClusteringLayer(Layer):
def __init__(self, output_dim, input_dim=None, alpha=1.0, **kwargs):
self.output_dim = output_dim
self.input_dim = input_dim
self.alpha = alpha
super(ClusteringLayer, self).__init__(**kwargs)
def build(self, input_shape):
self.W = self.add_weight(name='kernel', shape=(self.output_dim, input_shape[1]), initializer='uniform', trainable=True)
super(ClusteringLayer, self).build(input_shape)
def call(self, x, mask=None):
q = 1.0/(1.0 + K.sqrt(K.sum(K.square(K.expand_dims(x, 1) - self.W), axis=2))**2 /self.alpha)
q = q**((self.alpha+1.0)/2.0)
q = K.transpose(K.transpose(q)/K.sum(q, axis=1))
return q
def compute_output_shape(self, input_shape):
return (input_shape[0], self.output_dim)
def clustering_loss(y_true, y_pred):
a = K.square(y_pred) / K.sum(y_pred, axis=0)
p = K.transpose(K.transpose(a) / K.sum(a, axis=1))
loss = kullback_leibler_divergence(p, y_pred)
return loss
input1 = Input(shape=(10,), name="input")
out = ClusteringLayer(output_dim = 5, name='clustering')(input1)
model = Model(inputs=input1, outputs=out)
model.compile(optimizer=tf.train.AdamOptimizer(1e-3), loss={'clustering' : clustering_loss})
X = np.random.random((20, 10)).astype(np.float32)
Y = np.random.random((20, 5)).astype(np.float32)
model.fit(x={'input' : X}, y={'clustering' : Y}, batch_size=1, epochs=10)
The error message is related to the "fit" function:
AssertionError: Could not compute output DeferredTensor('None', shape=(5,), dtype=float32)
When I tried to check the output of my custom layer, I was surprised to find that this layer is generating two outputs. The first one is ambiguous and undesired.
Code:
input1 = Input(shape=(10,), name="input")
layer = ClusteringLayer(output_dim = 5, name='clustering')
out = layer(input1)
print(out)
Output:
[<DeferredTensor 'None' shape=(?,) dtype=float32>, <DeferredTensor 'None' shape=(5,) dtype=float32>]
Even when I changed my custom layer with the simplistic custom layer from the Keras documentation, I got the same error:
AssertionError: Could not compute output DeferredTensor('None', shape=(5,), dtype=float32)
I asked the question in GitHub since it seems more like a bug.
They have recommended using a workaround until they fix the internal problem.
I m quoting from here :github
As a workaround, you could wrap the output shape returned by
compute_output_shape in a TensorShape. For example:
TensorShape((input_shape[0], self.output_dim)). Let me know if this
works.