In the code below, self.gen is instantiated while using the make_gen_block function which is only defined later outside the __init__ attribute.
How is this possible?
Shouldn't make_gen_block be defined before using it to instantiate self.gen so when __init__ is called, make_gen_block can be found within __init__ scope?
Thanks
class Generator(nn.Module):
'''
Generator Class
Values:
z_dim: the dimension of the noise vector, a scalar
im_chan: the number of channels in the images, fitted for the dataset used, a scalar
(MNIST is black-and-white, so 1 channel is your default)
hidden_dim: the inner dimension, a scalar
'''
def __init__(self, z_dim=10, im_chan=1, hidden_dim=64):
super(Generator, self).__init__()
self.z_dim = z_dim
# Build the neural network
self.gen = nn.Sequential(
self.make_gen_block(z_dim, hidden_dim * 4),
self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
self.make_gen_block(hidden_dim * 2, hidden_dim),
self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True))
def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, final_layer=False):
'''
Function to return a sequence of operations corresponding to a generator block of DCGAN,
corresponding to a transposed convolution, a batchnorm (except for in the last layer), and an activation.
Parameters:
input_channels: how many channels the input feature representation has
output_channels: how many channels the output feature representation should have
kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
stride: the stride of the convolution
final_layer: a boolean, true if it is the final layer and false otherwise
(affects activation and batchnorm)
'''
# Steps:
# 1) Do a transposed convolution using the given parameters.
# 2) Do a batchnorm, except for the last layer.
# 3) Follow each batchnorm with a ReLU activation.
# 4) If its the final layer, use a Tanh activation after the deconvolution.
# Build the neural block
if not final_layer:
return nn.Sequential(
#### START CODE HERE ####
nn.ConvTranspose2d(input_channels, output_channels,kernel_size,stride),
nn.BatchNorm2d(output_channels),
nn.ReLU())
#### END CODE HERE ####
else: # Final Layer
return nn.Sequential(
#### START CODE HERE ####
nn.ConvTranspose2d(input_channels, output_channels,kernel_size,stride),
#### END CODE HERE ####
nn.Tanh())
def unsqueeze_noise(self, noise):
'''
Function for completing a forward pass of the generator: Given a noise tensor,
returns a copy of that noise with width and height = 1 and channels = z_dim.
Parameters:
noise: a noise tensor with dimensions (n_samples, z_dim)
'''
return noise.view(len(noise), self.z_dim, 1, 1)
def forward(self, noise):
'''
Function for completing a forward pass of the generator: Given a noise tensor,
returns generated images.
Parameters:
noise: a noise tensor with dimensions (n_samples, z_dim)
'''
x = self.unsqueeze_noise(noise)
return self.gen(x)
def get_noise(n_samples, z_dim, device='cpu'):
'''
Function for creating noise vectors: Given the dimensions (n_samples, z_dim)
creates a tensor of that shape filled with random numbers from the normal distribution.
Parameters:
n_samples: the number of samples to generate, a scalar
z_dim: the dimension of the noise vector, a scalar
device: the device type
'''
return torch.randn(n_samples, z_dim, device=device)
Note that the call to make_gen_block is actually calling self.make_gen_block. The self is important. You can see in the signature of __init__ that self is injected as the first argument. The method can be referenced because self has been passed into the __init__ method (so it is within the scope), and self is of type Generator, which has a make_gen_block method defined for it. The self instance of the class has already been constructed prior to the calling of the __init__ method.
When the class is instantiated, the __new__ method is called first, which constructs the instance and then the __init__ method is called, with the new instance injected (as self) into the method.
Related
My code is a simple Sequential network like:
self.net = nn.Sequential(
nn.Linear(s_dim, 256),
nn.Softplus(),
nn.Linear(256, 256),
nn.Softplus(),
nn.Linear(256, a_dim)
)
I want to initialize the weights of every layer, and make them follow the Normal distribution with (0,1).
You can loop over the nn.Sequential and initialize each linear layer using normal (gaussian) distribution. The sample code are as follows:
# this method can be defined outside your model class
def weights_init(m):
if isinstance(m, nn.Linear):
torch.nn.init.normal_(m.weight, mean=0.0, std=1.0)
torch.nn.init.zero_(m.bias)
# define init method inside your model class
def init_with_normal(self):
self.net.apply(weights_init)
So, I'm trying to create a custom layer in TensorFlow 2.4.1, using a function for a neuron I defined.
# NOTE: this is not the actual neuron I want to use,
# it's just a simple example.
def neuron(x, W, b):
return W # x + b
Where the W and b it gets would be of shape (1, x.shape[0]) and (1, 1) respectively. This means this is like a single neuron in a dense layer. So, I want to create a dense layer by stacking however many of these individual neurons I want.
class Layer(tf.keras.layers.Layer):
def __init__(self, n_units=5):
super(Layer, self).__init__() # handles standard arguments
self.n_units = n_units # Number of neurons to be in the layer
def build(self, input_shape):
# Create weights and biases for all neurons individually
for i in range(self.n_units):
# Create weights and bias for ith neuron
...
def call(self, inputs):
# Compute outputs for all neurons
...
# Concatenate outputs to create layer output
...
return output
How can I create a layer as a stack of individual neurons (also in a way it can train)? I have roughly outlined the idea for the layer in the above code, but the answer doesn't need to follow that as a blueprint.
Finally; yes I'm aware that to create a dense layer you don't need to go about it in such a roundabout way (you just need 1 weight and bias matrix), but in my actual use case, this is neccessary. Thanks!
So, person who asked this question here, I have found a way to do it, by dynamically creating variables and operations.
First, let's re-define the neuron to use tensorflow operations:
def neuron(x, W, b):
return tf.add(tf.matmul(W, x), b)
Then, let's create the layer (this uses the blueprint layed out in the question):
class Layer(tf.keras.layers.Layer):
def __init__(self, n_units=5):
super(Layer, self).__init__()
self.n_units = n_units
def build(self, input_shape):
for i in range(self.n_units):
exec(f'self.kernel_{i} = self.add_weight("kernel_{i}", shape=[1, int(input_shape[0])])')
exec(f'self.bias_{i} = self.add_weight("bias_{i}", shape=[1, 1])')
def call(self, inputs):
for i in range(self.n_units):
exec(f'out_{i} = neuron(inputs, self.kernel_{i}, self.bias_{i})')
return eval(f'tf.concat([{", ".join([ f"out_{i}" for i in range(self.n_units) ])}], axis=0)')
As you can see, we're using exec and eval to dynamically create variables and perform operations.
That's it! We can perform a few checks to see if TensorFlow could use this:
# Check to see if it outputs the correct thing
layer = Layer(5) # With 5 neurons, it should return a (5, 6)
print(layer(tf.zeros([10, 6])))
# Check to see if it has the right trainable parameters
print(layer.trainable_variables)
# Check to see if TensorFlow can find the gradients
layer = Layer(5)
x = tf.ones([10, 6])
with tf.GradientTape() as tape:
z = layer(x)
print(f"Parameter: {layer.trainable_variables[2]}")
print(f"Gradient: {tape.gradient(z, layer.trainable_variables[2])}")
This solution works, but it's not very elegant... I wonder if there's a better way to do it, some magical TF method that can map the neuron to create a layer, I'm too inexperienced to know for the moment. So, please answer if you have a (better) answer, I'll be happy to accept it :)
I am currently trying to get custom keras layers to work, you can see a simplified version here:
class MyLayer(Layer):
def __init__(self, **kwargs):
super(MyLayer, self).__init__(**kwargs)
def build(self, input_shape):
print("input_shape: "+str(input_shape))
self.kernel = self.add_weight(name='kernel',
shape=(input_shape[1], self.output_dim),
initializer='uniform',
trainable=True)
super(MyLayer, self).build(input_shape)
def call(self, x):
print("input tensor: "+str(x))
return K.dot(x, self.kernel)
inputs = Input(shape=(3,3), dtype='float', name='inputs')
results = MyLayer(input_shape=(3,3))(inputs)
The resulting console output is this:
input_shape: (None, 3, 3)
input tensor: Tensor("inputs:0", shape=(?, 3, 3), dtype=float32)
As you can see, the input_shape that the layer gets is not (3,3) as I specified but actually (None,3,3). Why is that?
The shape of the input tensor is also shaped ( ?, 3,3) which I thought to be a consequence of the weird input_shape ( None, 3,3). But the input tensor also has this shape with a third dimension if you replace super(MyLayer, self).build(input_shape) with super(MyLayer, self).build((3,3)). What is this mysterious third dimension keras automatically adds and why does it do that?
It is nothing mysterious, it is the batch dimension, since keras (and most DL frameworks), make computations on batches of data at a time, since this increases parallelism, and it maps directly to batches in Stochastic Gradient Descent.
Your layer needs to support computation on batches, so the batch dimension is always present in input and output data, and it is automatically added by keras to the input_shape.
This newly added dimension refers to the batch dimension, i.e., in your case, you will be passing batches of 3x3 dimensional tensors. This additional None dimension refers to the batch dimension, which is unknown during the creation of the graph.
If you have a look at the Input layer explanation in Core Layers webpage for Keras,
https://keras.io/layers/core/, you will see that the shape argument you are passing when creating the Input layer is defined as the following:
shape: A shape tuple (integer), not including the batch size. For instance, shape=(32,) indicates that the expected input will be batches of 32-dimensional vectors.
If you want to specify the batch size, you can do this instead:
inputs = Input(batch_shape=(batch_size,height,width,channel))
In TF 1.x, it was possible to build layers with custom variables. Here's an example:
import numpy as np
import tensorflow as tf
def make_custom_getter(custom_variables):
def custom_getter(getter, name, **kwargs):
if name in custom_variables:
variable = custom_variables[name]
else:
variable = getter(name, **kwargs)
return variable
return custom_getter
# Make a custom getter for the dense layer variables.
# Note: custom variables can result from arbitrary computation;
# for the sake of this example, we make them just constant tensors.
custom_variables = {
"model/dense/kernel": tf.constant(
np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
"model/dense/bias": tf.constant(
np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
custom_getter = make_custom_getter(custom_variables)
# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with tf.variable_scope("model", custom_getter=custom_getter):
Layer = tf.layers.Dense(64)
hiddens = Layer(x)
print(Layer.variables)
The printed variables of the constructed dense layer will be custom tensors we specified in the custom_variables dict:
[<tf.Tensor 'custom_kernel:0' shape=(784, 64) dtype=float32>, <tf.Tensor 'custom_bias:0' shape=(64,) dtype=float32>]
This allows us to create layers/models that use provided tensors in custom_variables directly as their weights, so that we could further differentiate the output of the layers/models with respect to any tensors that custom_variables may depend on (particularly useful for implementing functionality in modulating sub-nets, parameter generation, meta-learning, etc.).
Variable scopes used to make it easy to nest all off graph-building inside scopes with custom getters and build models on top of the provided tensors as their parameters. Since sessions and variable scopes are no longer advisable in TF 2.0 (and all of that low-level stuff is moved to tf.compat.v1), what would be the best practice to implement the above using Keras and TF 2.0?
(Related issue on GitHub.)
Answer based on the comment below
Given you have:
kernel = createTheKernelVarBasedOnWhatYouWant() #shape (784, 64)
bias = createTheBiasVarBasedOnWhatYouWant() #shape (64,)
Make a simple function copying the code from Dense:
def custom_dense(x):
inputs, kernel, bias = x
outputs = K.dot(inputs, kernel)
outputs = K.bias_add(outputs, bias, data_format='channels_last')
return outputs
Use the function in a Lambda layer:
layer = Lambda(custom_dense)
hiddens = layer([x, kernel, bias])
Warning: kernel and bias must be produced from a Keras layer, or come from an kernel = Input(tensor=the_kernel_var) and bias = Input(tensor=bias_var)
If the warning above is bad for you, you can always use kernel and bias "from outside", like:
def custom_dense(inputs):
outputs = K.dot(inputs, kernel) #where kernel is not part of the arguments anymore
outputs = K.bias_add(outputs, bias, data_format='channels_last')
return outputs
layer = Lambda(custom_dense)
hiddens = layer(x)
This last option makes it a bit more complicated to save/load models.
Old answer
You should probably use a Keras Dense layer and set its weights in a standard way:
layer = tf.keras.layers.Dense(64, name='the_layer')
layer.set_weights([np.random.rand(784, 64), np.random.rand(64)])
If you need that these weights are not trainable, before compiling the keras model you set:
model.get_layer('the_layer').trainable=False
If you want direct access to the variables as tensors, they are:
kernel = layer.kernel
bias = layer.bias
There are plenty of other options, but that depends on your exact intention, which is not clear in your question.
Below is a general-purpose solution that works with arbitrary Keras models in TF2.
First, we need to define an auxiliary function canonical_variable_name and a context manager custom_make_variable with the following signatures (see implementation in meta-blocks library).
def canonical_variable_name(variable_name: str, outer_scope: str):
"""Returns the canonical variable name: `outer_scope/.../name`."""
# ...
#contextlib.contextmanager
def custom_make_variable(
canonical_custom_variables: Dict[str, tf.Tensor], outer_scope: str
):
"""A context manager that overrides `make_variable` with a custom function.
When building layers, Keras uses `make_variable` function to create weights
(kernels and biases for each layer). This function wraps `make_variable` with
a closure that infers the canonical name of the variable being created (of the
form `outer_scope/.../var_name`) and looks it up in the `custom_variables` dict
that maps canonical names to tensors. The function adheres the following logic:
* If there is a match, it does a few checks (shape, dtype, etc.) and returns
the found tensor instead of creating a new variable.
* If there is a match but checks fail, it throws an exception.
* If there are no matching `custom_variables`, it calls the original
`make_variable` utility function and returns a newly created variable.
"""
# ...
Using these functions, we can create arbitrary Keras models with custom tensors used as variables:
import numpy as np
import tensorflow as tf
canonical_custom_variables = {
"model/dense/kernel": tf.constant(
np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
"model/dense/bias": tf.constant(
np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with custom_make_variable(canonical_custom_variables, outer_scope="model"):
Layer = tf.layers.Dense(64)
hiddens = Layer(x)
print(Layer.variables)
Not entirely sure I understand your question correctly, but it seems to me that it should be possible to do what you want with a combination of custom layers and keras functional api.
Custom layers allow you to build any layer you want in a way that is compatible with Keras, e.g.:
class MyDenseLayer(tf.keras.layers.Layer):
def __init__(self, num_outputs):
super(MyDenseLayer, self).__init__()
self.num_outputs = num_outputs
def build(self, input_shape):
self.kernel = self.add_weight("kernel",
shape=[int(input_shape[-1]),
self.num_outputs],
initializer='normal')
self.bias = self.add_weight("bias",
shape=[self.num_outputs,],
initializer='normal')
def call(self, inputs):
return tf.matmul(inputs, self.kernel) + self.bias
and the functional api allows you to access the outputs of said layers and re-use them:
inputs = keras.Input(shape=(784,), name='img')
x1 = MyDenseLayer(64, activation='relu')(inputs)
x2 = MyDenseLayer(64, activation='relu')(x1)
outputs = MyDenseLayer(10, activation='softmax')(x2)
model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')
Here x1 and x2 can be connected to other subnets.
I'm reading through some pytorch tutorials. Below is the definition of a residual block. However in the forward method each function handle only takes one argument out while in the __init__ function these functions have different number of input arguments:
# Residual Block
class ResidualBlock(nn.Module):
def __init__(self, in_channels, out_channels, stride=1, downsample=None):
super(ResidualBlock, self).__init__()
self.conv1 = conv3x3(in_channels, out_channels, stride)
self.bn1 = nn.BatchNorm2d(out_channels)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(out_channels, out_channels)
self.bn2 = nn.BatchNorm2d(out_channels)
self.downsample = downsample
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
Does anyone know how this works?
Is it a standard python class inheritance feature or is this specific to pytorch?
you define the layer in the init function, which means the parameters. In the forward function you only input the data that needs to be processed with the predefined settings from init. The nn.whatever builds a function with the settings you pass to it. Then this function can be used in forward and this function only takes one argument.
You define different layers of your network architecture in the constructor of the class (__init__ function). Essentially, when you create an instance of different layers, you initialize them with your settings parameters.
For example, when you declare the first convolution layer, self.conv1, you give the parameters required to initialize the layer. In the forward function, you just simply call the layers with the input to get the corresponding output. For example, in out = self.conv2(out), you take the output of the previous layer and give it as an input the next self.conv2 layer.
Please note, during initialization, you give information to the layer that what kind/shape of input will be provided to that layer. For example, you tell the first convolution layer that what will be number of input and output channels in your input. In the forward method, you just need to pass the input, that's it.