In TF 1.x, it was possible to build layers with custom variables. Here's an example:
import numpy as np
import tensorflow as tf
def make_custom_getter(custom_variables):
    def custom_getter(getter, name, **kwargs):
        if name in custom_variables:
            variable = custom_variables[name]
        else:
            variable = getter(name, **kwargs)
        return variable
    return custom_getter
# Make a custom getter for the dense layer variables.
# Note: custom variables can result from arbitrary computation;
# for the sake of this example, we make them just constant tensors.
custom_variables = {
    "model/dense/kernel": tf.constant(
        np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
    "model/dense/bias": tf.constant(
        np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
custom_getter = make_custom_getter(custom_variables)
# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with tf.variable_scope("model", custom_getter=custom_getter):
    Layer = tf.layers.Dense(64)
    hiddens = Layer(x)
print(Layer.variables)
The printed variables of the constructed dense layer will be custom tensors we specified in the custom_variables dict:
[<tf.Tensor 'custom_kernel:0' shape=(784, 64) dtype=float32>, <tf.Tensor 'custom_bias:0' shape=(64,) dtype=float32>]
This allows us to create layers/models that use provided tensors in custom_variables directly as their weights, so that we could further differentiate the output of the layers/models with respect to any tensors that custom_variables may depend on (particularly useful for implementing functionality in modulating sub-nets, parameter generation, meta-learning, etc.).
Variable scopes used to make it easy to nest all of the graph-building inside scopes with custom getters and build models on top of the provided tensors as their parameters. Since sessions and variable scopes are no longer advisable in TF 2.0 (and all of that low-level stuff is moved to tf.compat.v1), what would be the best practice to implement the above using Keras and TF 2.0?
(Related issue on GitHub.)
Answer based on the comment below
Given you have:
kernel = createTheKernelVarBasedOnWhatYouWant() #shape (784, 64)
bias = createTheBiasVarBasedOnWhatYouWant() #shape (64,)
Make a simple function copying the code from Dense:
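from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda

def custom_dense(x):
    # x is a list: [inputs, kernel, bias]
    inputs, kernel, bias = x
    outputs = K.dot(inputs, kernel)
    outputs = K.bias_add(outputs, bias, data_format='channels_last')
    return outputs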
Use the function in a Lambda layer:
layer = Lambda(custom_dense)
hiddens = layer([x, kernel, bias])
Warning: kernel and bias must be produced by a Keras layer, or come from an Input, e.g. kernel = Input(tensor=the_kernel_var) and bias = Input(tensor=bias_var).
If the warning above is bad for you, you can always use kernel and bias "from outside", like:
def custom_dense(inputs):
    outputs = K.dot(inputs, kernel)  # where kernel is not part of the arguments anymore
    outputs = K.bias_add(outputs, bias, data_format='channels_last')
    return outputs
layer = Lambda(custom_dense)
hiddens = layer(x)
This last option makes it a bit more complicated to save/load models.
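When the goal is only to differentiate through weights that are themselves computed tensors, eager TF 2 code can also skip the Lambda layer and apply the dense computation as a plain function under tf.GradientTape. The following is a minimal sketch, assuming a hypothetical upstream parameter theta from which the kernel is generated; theta, functional_dense, and the tiling rule are illustrative and not part of the answer above.

```python
import tensorflow as tf


def functional_dense(inputs, kernel, bias):
    """Dense computation that takes its weights as ordinary tensors."""
    return tf.matmul(inputs, kernel) + bias


# Hypothetical upstream parameter; the kernel is computed from it.
theta = tf.Variable(tf.random.normal([784]))
x = tf.random.normal([1, 784])

with tf.GradientTape() as tape:
    # Illustrative "parameter generation": every kernel column equals theta.
    kernel = tf.tile(theta[:, tf.newaxis], [1, 64])  # shape (784, 64)
    bias = tf.zeros([64])
    hiddens = functional_dense(x, kernel, bias)      # shape (1, 64)
    loss = tf.reduce_sum(hiddens)

# The gradient flows through the generated kernel back to theta.
grad = tape.gradient(loss, theta)
print(grad.shape)
```

Because kernel is never stored as a layer variable, nothing needs to be overridden; the trade-off is that the computation is no longer a Keras layer with its own saved weights.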
Old answer
You should probably use a Keras Dense layer and set its weights in a standard way:
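layer = tf.keras.layers.Dense(64, name='the_layer')
# The layer must be built before set_weights, otherwise it has no weights to fill.
layer.build((None, 784))
layer.set_weights([np.random.rand(784, 64), np.random.rand(64)])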
If these weights should not be trainable, set the following before compiling the Keras model:
model.get_layer('the_layer').trainable=False
If you want direct access to the variables as tensors, they are:
kernel = layer.kernel
bias = layer.bias
There are plenty of other options, but that depends on your exact intention, which is not clear in your question.
Below is a general-purpose solution that works with arbitrary Keras models in TF2.
First, we need to define an auxiliary function canonical_variable_name and a context manager custom_make_variable with the following signatures (see implementation in meta-blocks library).
import contextlib
from typing import Dict

import tensorflow as tf


def canonical_variable_name(variable_name: str, outer_scope: str):
    """Returns the canonical variable name: `outer_scope/.../name`."""
    # ...


@contextlib.contextmanager
def custom_make_variable(
    canonical_custom_variables: Dict[str, tf.Tensor], outer_scope: str
):
    """A context manager that overrides `make_variable` with a custom function.

    When building layers, Keras uses `make_variable` function to create weights
    (kernels and biases for each layer). This function wraps `make_variable` with
    a closure that infers the canonical name of the variable being created (of the
    form `outer_scope/.../var_name`) and looks it up in the `custom_variables` dict
    that maps canonical names to tensors. The function adheres to the following logic:

    * If there is a match, it does a few checks (shape, dtype, etc.) and returns
      the found tensor instead of creating a new variable.
    * If there is a match but checks fail, it throws an exception.
    * If there are no matching `custom_variables`, it calls the original
      `make_variable` utility function and returns a newly created variable.
    """
    # ...
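The lookup rule described in the docstring can be illustrated in isolation. The sketch below is not the meta-blocks implementation: lookup_or_create and create_fn are hypothetical names standing in for the wrapped make_variable closure, and the checks are limited to shape and dtype.

```python
import tensorflow as tf


def lookup_or_create(name, shape, dtype, custom_variables, create_fn):
    """Returns a matching custom tensor, or falls back to `create_fn`.

    Hypothetical helper: `create_fn(name, shape, dtype)` plays the role of
    the original variable-creation function.
    """
    if name not in custom_variables:
        # No match: create a fresh variable as usual.
        return create_fn(name, shape, dtype)
    tensor = custom_variables[name]
    # Match: validate before substituting the tensor for a variable.
    if not tensor.shape.is_compatible_with(shape) or tensor.dtype != dtype:
        raise ValueError(
            f"Custom tensor for {name!r} has shape {tensor.shape} and dtype "
            f"{tensor.dtype.name}; expected shape {shape} and dtype {dtype.name}."
        )
    return tensor


def create_fn(name, shape, dtype):
    # Stand-in for the default behaviour: a zero-initialized tf.Variable.
    # The name is ignored here to keep the sketch short.
    return tf.Variable(tf.zeros(shape, dtype=dtype))


custom = {"model/dense/kernel": tf.ones([784, 64], dtype=tf.float32)}
kernel = lookup_or_create("model/dense/kernel", (784, 64), tf.float32, custom, create_fn)
bias = lookup_or_create("model/dense/bias", (64,), tf.float32, custom, create_fn)
```

Here kernel is the tensor supplied in the dict, while bias falls through to the default path and becomes a regular variable.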
Using these functions, we can create arbitrary Keras models with custom tensors used as variables:
import numpy as np
import tensorflow as tf
canonical_custom_variables = {
    "model/dense/kernel": tf.constant(
        np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
    "model/dense/bias": tf.constant(
        np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with custom_make_variable(canonical_custom_variables, outer_scope="model"):
    # tf.layers is not available in TF2; use the Keras layer instead.
    Layer = tf.keras.layers.Dense(64)
    hiddens = Layer(x)
print(Layer.variables)
I'm not entirely sure I understand your question correctly, but it seems to me that it should be possible to do what you want with a combination of custom layers and the Keras functional API.
Custom layers allow you to build any layer you want in a way that is compatible with Keras, e.g.:
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs],
                                      initializer='normal')
        self.bias = self.add_weight("bias",
                                    shape=[self.num_outputs,],
                                    initializer='normal')

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel) + self.bias
and the functional api allows you to access the outputs of said layers and re-use them:
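from tensorflow import keras

inputs = keras.Input(shape=(784,), name='img')
# MyDenseLayer takes no `activation` argument, so activations are separate layers.
x1 = keras.layers.Activation('relu')(MyDenseLayer(64)(inputs))
x2 = keras.layers.Activation('relu')(MyDenseLayer(64)(x1))
outputs = keras.layers.Activation('softmax')(MyDenseLayer(10)(x2))
model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')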
Here x1 and x2 can be connected to other subnets.
Related
So, I'm trying to create a custom layer in TensorFlow 2.4.1, using a function for a neuron I defined.
# NOTE: this is not the actual neuron I want to use,
# it's just a simple example.
def neuron(x, W, b):
    return W @ x + b
Where the W and b it gets would be of shape (1, x.shape[0]) and (1, 1) respectively. This means this is like a single neuron in a dense layer. So, I want to create a dense layer by stacking however many of these individual neurons I want.
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()  # handles standard arguments
        self.n_units = n_units  # Number of neurons to be in the layer

    def build(self, input_shape):
        # Create weights and biases for all neurons individually
        for i in range(self.n_units):
            # Create weights and bias for ith neuron
            ...

    def call(self, inputs):
        # Compute outputs for all neurons
        ...
        # Concatenate outputs to create layer output
        ...
        return output
How can I create a layer as a stack of individual neurons (also in a way it can train)? I have roughly outlined the idea for the layer in the above code, but the answer doesn't need to follow that as a blueprint.
Finally, yes, I'm aware that to create a dense layer you don't need to go about it in such a roundabout way (you just need one weight matrix and one bias vector), but in my actual use case this is necessary. Thanks!
I'm the person who asked this question, and I have found a way to do it by dynamically creating variables and operations.
First, let's re-define the neuron to use tensorflow operations:
def neuron(x, W, b):
    return tf.add(tf.matmul(W, x), b)
Then, let's create the layer (this uses the blueprint laid out in the question):
class Layer(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super(Layer, self).__init__()
        self.n_units = n_units

    def build(self, input_shape):
        for i in range(self.n_units):
            exec(f'self.kernel_{i} = self.add_weight("kernel_{i}", shape=[1, int(input_shape[0])])')
            exec(f'self.bias_{i} = self.add_weight("bias_{i}", shape=[1, 1])')

    def call(self, inputs):
        for i in range(self.n_units):
            exec(f'out_{i} = neuron(inputs, self.kernel_{i}, self.bias_{i})')
        return eval(f'tf.concat([{", ".join([ f"out_{i}" for i in range(self.n_units) ])}], axis=0)')
As you can see, we're using exec and eval to dynamically create variables and perform operations.
That's it! We can perform a few checks to see if TensorFlow could use this:
# Check to see if it outputs the correct thing
layer = Layer(5) # With 5 neurons, it should return a (5, 6)
print(layer(tf.zeros([10, 6])))
# Check to see if it has the right trainable parameters
print(layer.trainable_variables)
# Check to see if TensorFlow can find the gradients
layer = Layer(5)
x = tf.ones([10, 6])
with tf.GradientTape() as tape:
    z = layer(x)
print(f"Parameter: {layer.trainable_variables[2]}")
print(f"Gradient: {tape.gradient(z, layer.trainable_variables[2])}")
This solution works, but it's not very elegant... I wonder if there's a better way to do it, some magical TF method that can map the neuron to create a layer, I'm too inexperienced to know for the moment. So, please answer if you have a (better) answer, I'll be happy to accept it :)
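One way to avoid exec and eval, sketched here as a suggestion and not taken from the post above, is to keep the per-neuron weights in Python lists and loop over them in call. The class name StackedNeurons is made up for this example, and the input layout follows the post (features along axis 0). Weights created with add_weight are registered with the layer, so they still show up in trainable_variables.

```python
import tensorflow as tf


def neuron(x, W, b):
    # Single neuron from the post: W has shape (1, features), b has shape (1, 1).
    return tf.add(tf.matmul(W, x), b)


class StackedNeurons(tf.keras.layers.Layer):
    def __init__(self, n_units=5):
        super().__init__()
        self.n_units = n_units

    def build(self, input_shape):
        n_features = int(input_shape[0])
        # One (kernel, bias) pair per neuron, stored in plain lists.
        self.kernels = [
            self.add_weight(name=f"kernel_{i}", shape=(1, n_features))
            for i in range(self.n_units)
        ]
        self.biases = [
            self.add_weight(name=f"bias_{i}", shape=(1, 1))
            for i in range(self.n_units)
        ]

    def call(self, inputs):
        outputs = [neuron(inputs, W, b) for W, b in zip(self.kernels, self.biases)]
        return tf.concat(outputs, axis=0)


layer = StackedNeurons(5)
x = tf.ones([10, 6])
with tf.GradientTape() as tape:
    z = layer(x)  # one output row per neuron
grads = tape.gradient(z, layer.trainable_variables)
```

The same checks as above apply: z has one row per neuron, and every entry of grads corresponds to one kernel or bias.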
I'm trying to create a custom Keras Layer that is composed of a variable number of other Keras layers. I want access to both the grouped layers treated as a single layer with an input and output as well as access to the individual sublayers.
I'm just wondering, what is the "proper" method for performing this form of layer abstraction in Keras?
I'm not sure if this should be done through Keras's Layer API, or if I can work with name scopes and wrapping layers in a function and then select all layers under some name scope. (e.g. model.get_layer('custom1/*'), but I don't think that works, and name_scope doesn't appear to apply to layer names)
One of the other issues with using Keras Layers for this is that you need to assign the child variables as direct attributes (I assume it makes use of the descriptor API) to the class in order for the trainable weights to be added to the model (which is messier when we don't have a fixed number of layers, meaning we'd need to hack something together with setattr(), which, ew).
Btw, this isn't my actual code, but is a barebones simplification of what I'm trying to accomplish.
class CustomLayer(Layer):
    def __init__(self, *a, n_layers=2, **kw):
        self.n_layers = n_layers
        super().__init__(*a, **kw)

    def call(self, x):
        for i in range(self.n_layers):
            x = Dense(64, activation='relu')(x)
            x = Dropout(0.2)(x)
        return x
input_shape, n_classes = (None, 100), 10
x = input = Input(input_shape)
x = CustomLayer()(x)
x = CustomLayer()(x)
x = Dense(n_classes, activation='softmax')(x)
model = Model([input], [x])
'''
So this works fine
'''
print([l.name for l in model.layers])
print(model.layers[1].name)
print(model.layers[1].output)
# Output: ['input', 'custom_1', 'custom_2', 'dense']
# Output: 'custom_1'
# Output: the custom_1 output tensor
'''
But I'm not sure how to do something to the effect of this ...
'''
print(model.layers[1].layers[0].name)
print(model.layers[1].layers[0].output)
# Output: custom_1/dense
# Output: custom_1/dense output tensor
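A common pattern for this kind of grouping, given here as a sketch and not as a definitive answer, is to create the sub-layers once in __init__ and keep them in list attributes. Keras tracks layers stored in list attributes, so their weights are collected by the parent without any setattr tricks. The attribute names dense_layers and dropout_layers are chosen for this example.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Layer


class CustomLayer(Layer):
    def __init__(self, n_layers=2, **kwargs):
        super().__init__(**kwargs)
        # Sub-layers are created once here, not on every call.
        self.dense_layers = [Dense(64, activation="relu") for _ in range(n_layers)]
        self.dropout_layers = [Dropout(0.2) for _ in range(n_layers)]

    def call(self, x, training=False):
        for dense, dropout in zip(self.dense_layers, self.dropout_layers):
            x = dropout(dense(x), training=training)
        return x


block = CustomLayer(n_layers=2)
y = block(tf.zeros([3, 100]))

# The group behaves as one layer, and each sub-layer stays reachable.
print(tuple(y.shape))
print([layer.name for layer in block.dense_layers])
print(len(block.trainable_weights))
```

Individual sub-layers are then reached through the parent's attributes, for example block.dense_layers[0], instead of a name-scope lookup.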
I am working with the keras-capsnet implementation of Capsule Networks, and am trying to apply the same layer to 30 images per sample.
The weights are initialized within the __init__ and build methods of the class, shown below. I have successfully shared the weights between the primary routing layers, which just use tf.layers.conv2d, where I can assign them the same name and set reuse = True.
Does anyone know how to initialize weights in a Keras custom layer so that they may be reused? I am much more familiar with the tensorflow API than with the Keras one!
def __init__(self, num_capsule, dim_capsule, routings=3,
             kernel_initializer='glorot_uniform',
             **kwargs):
    super(CapsuleLayer, self).__init__(**kwargs)
    self.num_capsule = num_capsule
    self.dim_capsule = dim_capsule
    self.routings = routings
    self.kernel_initializer = initializers.get(kernel_initializer)

def build(self, input_shape):
    assert len(input_shape) >= 3, "The input Tensor should have shape=[None, input_num_capsule, input_dim_capsule]"
    self.input_num_capsule = input_shape[1]
    self.input_dim_capsule = input_shape[2]
    # Weights are initialized here each time the layer is called
    self.W = self.add_weight(shape=[self.num_capsule, self.input_num_capsule,
                                    self.dim_capsule, self.input_dim_capsule],
                             initializer=self.kernel_initializer,
                             name='W')
    self.built = True
The answer was simple. Set up a layer without calling it on input, and then use that built layer to call the data individually.
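In Keras, weights belong to the layer instance, so sharing them means creating the instance once and calling it several times. The following is a minimal sketch with a plain Dense layer standing in for the capsule layer; the shapes and the count of three inputs are illustrative.

```python
import tensorflow as tf

# Create the layer once; its weights are built on the first call.
shared = tf.keras.layers.Dense(8)

# Hypothetical inputs: three "images" of 16 features each, batch size 4.
images = [tf.random.normal([4, 16]) for _ in range(3)]

# Every call reuses the same kernel and bias.
outputs = [shared(image) for image in images]

# Only one kernel and one bias exist, no matter how many calls were made.
print(len(shared.trainable_weights))
```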
What is the difference between tf.layers.dense and tf.nn.xw_plus_b in TF?
What is the default activation used in tf.layers.dense when "activation" argument is passed as None?
tf.nn.xw_plus_b is a low-level operation that only computes x*W+b and requires existing variables.
tf.layers.dense is a high-level "layer" that creates variables, applies an activation, and can set constraints and apply regularization.
According to the documentation, the default activation is linear (no activation).
activation: Activation function (callable). Set it to None to maintain
a linear activation.
Update
In TensorFlow 1.12, the Dense layer inherits from keras.layers.Dense (code):
@tf_export('layers.Dense')
class Dense(keras_layers.Dense, base.Layer):
Keras implementation of this layer does the following (code):
def call(self, inputs):
    inputs = ops.convert_to_tensor(inputs, dtype=self.dtype)
    rank = common_shapes.rank(inputs)
    if rank > 2:
        # Broadcasting is required for the inputs.
        outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
        # Reshape the output back to the original ndim of the input.
        if not context.executing_eagerly():
            shape = inputs.get_shape().as_list()
            output_shape = shape[:-1] + [self.units]
            outputs.set_shape(output_shape)
    else:
        outputs = gen_math_ops.mat_mul(inputs, self.kernel)
    if self.use_bias:
        outputs = nn.bias_add(outputs, self.bias)
    if self.activation is not None:
        return self.activation(outputs)  # pylint: disable=not-callable
    return outputs
So it is not implemented using tf.nn.xw_plus_b but uses two separate operations.
To answer your question: a Dense layer without activation, constraints, and regularization should do the same as tf.nn.xw_plus_b.
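This equivalence can be checked numerically. The sketch below assumes TensorFlow 2, where the op is exposed as tf.compat.v1.nn.xw_plus_b and the layer as tf.keras.layers.Dense; the shapes are arbitrary.

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal([5, 3])

# Dense layer with the default activation=None.
dense = tf.keras.layers.Dense(4)
dense_out = dense(x)

# Reuse the layer's weights with the low-level op.
kernel, bias = dense.get_weights()
low_level_out = tf.compat.v1.nn.xw_plus_b(x, kernel, bias)

# Compare both results element-wise.
same = np.allclose(dense_out.numpy(), low_level_out.numpy(), atol=1e-5)
print(same)
```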
I am trying to build a custom variational autoencoder network in which the decoder weights are initialized with the transpose of the weights from the encoder layer. I couldn't find anything native to tf.contrib.layers.fully_connected for this, so I used tf.assign instead. Here's my code for the layers:
def inference_network(inputs, hidden_units, n_outputs):
    """Layer definition for the encoder layer."""
    net = inputs
    with tf.variable_scope('inference_network', reuse=tf.AUTO_REUSE):
        for layer_idx, hidden_dim in enumerate(hidden_units):
            net = layers.fully_connected(
                net,
                num_outputs=hidden_dim,
                weights_regularizer=layers.l2_regularizer(training_params.weight_decay),
                scope='inf_layer_{}'.format(layer_idx))
            add_layer_summary(net)
        z_mean = layers.fully_connected(net, num_outputs=n_outputs, activation_fn=None)
        z_log_sigma = layers.fully_connected(
            net, num_outputs=n_outputs, activation_fn=None)
    return z_mean, z_log_sigma
def generation_network(inputs, decoder_units, n_x):
    """Define the decoder network."""
    net = inputs  # inputs here is the latent representation.
    with tf.variable_scope("generation_network", reuse=tf.AUTO_REUSE):
        assert(len(decoder_units) >= 2)
        # First layer does not have a regularizer
        net = layers.fully_connected(
            net,
            decoder_units[0],
            scope="gen_layer_0",
        )
        for idx, decoder_unit in enumerate([decoder_units[1], n_x], 1):
            net = layers.fully_connected(
                net,
                decoder_unit,
                scope="gen_layer_{}".format(idx),
                weights_regularizer=layers.l2_regularizer(training_params.weight_decay)
            )
    # Assign the transpose of weights to the respective layers
    tf.assign(tf.get_variable("generation_network/gen_layer_1/weights"),
              tf.transpose(tf.get_variable("inference_network/inf_layer_1/weights")))
    tf.assign(tf.get_variable("generation_network/gen_layer_1/bias"),
              tf.get_variable("generation_network/inf_layer_0/bias"))
    tf.assign(tf.get_variable("generation_network/gen_layer_2/weights"),
              tf.transpose(tf.get_variable("inference_network/inf_layer_0/weights")))
    return net  # x_recon
It is wrapped using this tf.slim arg_scope:
def _autoencoder_arg_scope(activation_fn):
    """Create an argument scope for the network based on its parameters."""
    with slim.arg_scope([layers.fully_connected],
                        weights_initializer=layers.xavier_initializer(),
                        biases_initializer=tf.initializers.constant(0.0),
                        activation_fn=activation_fn) as arg_sc:
        return arg_sc
However, I'm getting the error: ValueError: Trying to share variable VarAutoEnc/generation_network/gen_layer_1/weights, but specified dtype float32 and found dtype float64_ref.
I have narrowed this down to the get_variable call, but I don't know why it's failing.
If there is a way to initialize a tf.contrib.layers.fully_connected layer from another fully connected layer without a tf.assign operation, that solution is fine with me.
I can't reproduce your error. Here is a minimalistic runnable example that does the same as your code:
import tensorflow as tf
with tf.contrib.slim.arg_scope([tf.contrib.layers.fully_connected],
                               weights_initializer=tf.contrib.layers.xavier_initializer(),
                               biases_initializer=tf.initializers.constant(0.0)):
    i = tf.placeholder(tf.float32, [1, 30])
    with tf.variable_scope("inference_network", reuse=tf.AUTO_REUSE):
        tf.contrib.layers.fully_connected(i, 30, scope="gen_layer_0")
    with tf.variable_scope("generation_network", reuse=tf.AUTO_REUSE):
        tf.contrib.layers.fully_connected(i, 30, scope="gen_layer_0",
                                          weights_regularizer=tf.contrib.layers.l2_regularizer(0.01))
with tf.variable_scope("", reuse=tf.AUTO_REUSE):
    tf.assign(tf.get_variable("generation_network/gen_layer_0/weights"),
              tf.transpose(tf.get_variable("inference_network/gen_layer_0/weights")))
The code runs without a ValueError. If you get a ValueError running this, it is probably a bug that has been fixed in a later TensorFlow version (I tested on 1.9). Otherwise, the error comes from a part of your code that you don't show in the question.
By the way, assign will return an op that will perform the assignment once the returned op is run in a session. So you will want to return the output of all assign calls in the generation_network function. You can bundle all assign ops into one using tf.group.
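The point about running the assign ops can be sketched as follows. This is an illustration with made-up variable names (enc_w, dec_w, enc_b, dec_b) and not the asker's network; it uses the tf.compat.v1 aliases so that it also runs in graph mode under TensorFlow 2.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1

graph = tf.Graph()
with graph.as_default():
    enc_w = tf1.get_variable("enc_w", shape=[3, 2], initializer=tf1.ones_initializer())
    dec_w = tf1.get_variable("dec_w", shape=[2, 3], initializer=tf1.zeros_initializer())
    enc_b = tf1.get_variable("enc_b", shape=[2], initializer=tf1.ones_initializer())
    dec_b = tf1.get_variable("dec_b", shape=[2], initializer=tf1.zeros_initializer())

    # Creating assign ops does nothing by itself; they must be run.
    tie_weights = tf.group(
        tf1.assign(dec_w, tf.transpose(enc_w)),
        tf1.assign(dec_b, enc_b),
    )
    init = tf1.global_variables_initializer()

with tf1.Session(graph=graph) as sess:
    sess.run(init)
    before = sess.run(dec_w)  # still the zero initializer
    sess.run(tie_weights)     # performs both assignments
    after = sess.run(dec_w)   # now the transpose of enc_w
```

A function such as generation_network would return the grouped op alongside its output so the caller can run it once after initialization.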