How should I initialize my network with PyTorch?

My code is a simple Sequential network like:
self.net = nn.Sequential(
    nn.Linear(s_dim, 256),
    nn.Softplus(),
    nn.Linear(256, 256),
    nn.Softplus(),
    nn.Linear(256, a_dim)
)
I want to initialize the weights of every layer so that they follow the normal distribution N(0, 1).

You can iterate over the nn.Sequential and initialize each nn.Linear layer using the normal (Gaussian) distribution. Sample code is as follows:
# this method can be defined outside your model class
def weights_init(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.normal_(m.weight, mean=0.0, std=1.0)
        torch.nn.init.zeros_(m.bias)

# define this init method inside your model class
def init_with_normal(self):
    self.net.apply(weights_init)
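For completeness, here is a minimal sketch of how this could be wired up end to end; the model class name and the dimensions below are made up for illustration, not taken from the question:

import torch
import torch.nn as nn

def weights_init(m):
    # initialize only the linear layers; leave activations untouched
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=1.0)
        nn.init.zeros_(m.bias)

class Policy(nn.Module):  # hypothetical model class, for illustration only
    def __init__(self, s_dim=4, a_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, 256),
            nn.Softplus(),
            nn.Linear(256, 256),
            nn.Softplus(),
            nn.Linear(256, a_dim),
        )
        # apply() walks every submodule recursively and calls weights_init on each
        self.net.apply(weights_init)

model = Policy()
print(model.net[0].weight.std())  # should be roughly 1.0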

Related

How can `make_gen_block` be defined after `__init__` in the attached code?

In the code below, self.gen is instantiated using the make_gen_block function, which is only defined later, outside of __init__.
How is this possible?
Shouldn't make_gen_block be defined before it is used to instantiate self.gen, so that make_gen_block can be found in scope when __init__ is called?
Thanks
class Generator(nn.Module):
    '''
    Generator Class
    Values:
        z_dim: the dimension of the noise vector, a scalar
        im_chan: the number of channels in the images, fitted for the dataset used, a scalar
            (MNIST is black-and-white, so 1 channel is your default)
        hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, z_dim=10, im_chan=1, hidden_dim=64):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        # Build the neural network
        self.gen = nn.Sequential(
            self.make_gen_block(z_dim, hidden_dim * 4),
            self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
            self.make_gen_block(hidden_dim * 2, hidden_dim),
            self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True),
        )

    def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, final_layer=False):
        '''
        Function to return a sequence of operations corresponding to a generator block of DCGAN,
        corresponding to a transposed convolution, a batchnorm (except for in the last layer), and an activation.
        Parameters:
            input_channels: how many channels the input feature representation has
            output_channels: how many channels the output feature representation should have
            kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
            stride: the stride of the convolution
            final_layer: a boolean, true if it is the final layer and false otherwise
                (affects activation and batchnorm)
        '''
        # Steps:
        #   1) Do a transposed convolution using the given parameters.
        #   2) Do a batchnorm, except for the last layer.
        #   3) Follow each batchnorm with a ReLU activation.
        #   4) If it is the final layer, use a Tanh activation after the deconvolution.
        # Build the neural block
        if not final_layer:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.ReLU(),
            )
        else:  # Final layer
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.Tanh(),
            )

    def unsqueeze_noise(self, noise):
        '''
        Function for completing a forward pass of the generator: Given a noise tensor,
        returns a copy of that noise with width and height = 1 and channels = z_dim.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
        '''
        return noise.view(len(noise), self.z_dim, 1, 1)

    def forward(self, noise):
        '''
        Function for completing a forward pass of the generator: Given a noise tensor,
        returns generated images.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
        '''
        x = self.unsqueeze_noise(noise)
        return self.gen(x)

def get_noise(n_samples, z_dim, device='cpu'):
    '''
    Function for creating noise vectors: Given the dimensions (n_samples, z_dim),
    creates a tensor of that shape filled with random numbers from the normal distribution.
    Parameters:
        n_samples: the number of samples to generate, a scalar
        z_dim: the dimension of the noise vector, a scalar
        device: the device type
    '''
    return torch.randn(n_samples, z_dim, device=device)
Note that the call to make_gen_block is actually a call to self.make_gen_block; the self is important. You can see in the signature of __init__ that self is injected as the first argument. The method can be referenced because self has been passed into __init__ (so it is within scope), and self is an instance of Generator, which has a make_gen_block method defined on it. The instance has already been constructed before __init__ is called.
When the class is instantiated, the __new__ method is called first; it constructs the instance, and then __init__ is called with the new instance injected into the method as self.
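To see that the order in which methods appear in a class body does not matter, here is a minimal standalone sketch (plain Python, unrelated to the GAN code above):

class Example:
    def __init__(self):
        # self already refers to a fully constructed Example instance,
        # so attribute lookup finds `helper` on the class at call time
        self.value = self.helper()

    def helper(self):  # defined after __init__ in the class body
        return 42

print(Example().value)  # 42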

The difference between loading model parameters with load_state_dict and with nn.Parameter in PyTorch

When I want to assign part of a pre-trained model's parameters to a module defined in a new PyTorch model, I get two different outputs using two different methods.
The network is defined as follows:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.resnet = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
        self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])
        # freeze_model (not shown in the question) presumably sets requires_grad=False
        self.freeze_model(self.resnet)
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, x):
        out = self.resnet(x)
        out = out.flatten(start_dim=1)
        out = self.classifier(out)
        return out
What I want is to assign pre-trained parameters to the classifier in the net module. I used two different ways for this task.
# First way
net.load_state_dict(torch.load('model_CNN_pretrained.ptl'))
# Second way
params = torch.load('model_CNN_pretrained.ptl')
net.classifier[1].weight = nn.Parameter(params['classifier.1.weight'], requires_grad=False)
net.classifier[1].bias = nn.Parameter(params['classifier.1.bias'], requires_grad=False)
net.classifier[3].weight = nn.Parameter(params['classifier.3.weight'], requires_grad=False)
net.classifier[3].bias = nn.Parameter(params['classifier.3.bias'], requires_grad=False)
The parameters were assigned correctly, but I got two different outputs from the same input data. The first method works correctly, but the second doesn't. Could someone point out the difference between these two methods?
Finally, I found out where the problem is.
During the pre-training process, the buffers in the BatchNorm2d layers of the ResNet18 model (running mean and variance) were updated even though requires_grad was set to False for the parameters. These buffers are recomputed from the input data while the model is in model.train() mode and remain unchanged after model.eval(). load_state_dict restores those updated buffers along with all the parameters, whereas the second method only copies the classifier's weights and biases, so the ResNet's BatchNorm statistics differ between the two approaches.
There is a link about how to freeze the BN layer.
How to freeze BN layers while training the rest of network (mean and var wont freeze)
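As a hedged illustration of why the two approaches diverge: a state dict contains buffers (running_mean, running_var, num_batches_tracked) in addition to parameters, and only load_state_dict restores them. The sketch below assumes the Net class and checkpoint file from the question:

import torch

net = Net()  # the model class defined in the question
params = torch.load('model_CNN_pretrained.ptl')

# The state dict holds BatchNorm buffers as well as weights and biases:
print([k for k in params if k.endswith('running_mean')][:2])

# Option 1: restores parameters AND buffers in one call.
net.load_state_dict(params)

# Option 2: assigning nn.Parameter objects copies only weights/biases;
# the BatchNorm buffers would have to be copied separately, e.g.:
with torch.no_grad():
    for name, buf in net.named_buffers():
        buf.copy_(params[name])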

Syntax of pytorch.nn functions in neural networks: classes or functions?

I am puzzled by the syntax of the functions used in neural networks in pytorch.
Here is an example of how one can define a linear transformation layer: (cf. https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())  # torch.Size([128, 30])
Can someone explain to me where the expression nn.Linear(20, 30)(input) comes from? It confuses me a bit.
Indeed, one can define a neural network class with such a constructor, for example:
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, p=0):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size, bias=True)
        self.fc2 = nn.Linear(hidden_size, hidden_size, bias=True)
        self.fc3 = nn.Linear(hidden_size, hidden_size, bias=True)
        self.fc4 = nn.Linear(hidden_size, num_classes, bias=False)
        self.dropout = nn.Dropout(p=p)
and I was trying to write an attribute using a numpy function, like:
self.enter_reshape = np.reshape(-1, input_size * input_size)
self.exit_reshape = np.reshape(input_size, num_classes / input_size)
or, using the view function from pytorch:
self.reshape = view(-1, self.num_flat_features())
The closest thing I know of is partial functions and closures, where one could write f(z)(x)(y). I looked into the definition of Linear, and I saw that Linear is an object, but I don't see where they redefined the __call__ magic method, which I thought would be used here when one calls the object.
So basically, can someone explain what is going on with such writing, and also, would it be possible to give the neural network the numpy or view functions as attributes?
torch.nn.Linear inherits from torch.nn.Module (see source code), which in turn defines __call__ method.
You can see source code for torch.nn.Module here. This class allows users to make their own Modules by inheritance (as you did in your example) and is a base for all PyTorch defined modules like nn.Linear (see available methods, documentation here).
Its __call__ essentially calls forward while also running registered hooks (and registering them), checking TorchScript, etc. (see the source code here, with the relevant line here).
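To illustrate the mechanism in isolation, here is a minimal sketch of a plain Python class that defines __call__ (not PyTorch code):

class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # runs when an instance is "called" like a function
        return self.factor * x

double = Scaler(2)
print(double(21))     # 42
print(Scaler(3)(14))  # 42 -- the same pattern as nn.Linear(20, 30)(input)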
would it be possible to give to the neural network the numpy or view
functions as attributes?
From the example you've given, what you are trying to do is probably a partial function (or a lambda, as in the example below) saved as an attribute (though that is pretty uncommon, and I have honestly never seen it), like this:
import torch

class MyModule(torch.nn.Module):
    def __init__(self, shape: int = -1):
        super().__init__()  # required
        self.reshape = lambda tensor: torch.reshape(tensor, (shape,))

    def forward(self, tensor):
        return self.reshape(tensor)

module = MyModule()
module(torch.randn(4, 5, 6)).shape  # torch.Size([120])
You shouldn't use numpy with PyTorch unless you really need it and/or there is no sensible PyTorch counterpart (although you can, if you convert the torch.Tensor to numpy). Also, you shouldn't do anything like the code above, as it's really confusing; just save attributes (anything like input_shape, hidden_dim, output_size) and use them in forward:
class MyModule(torch.nn.Module):
    def __init__(self, shape: int = -1):
        super().__init__()  # required
        self.shape = shape

    def forward(self, tensor):
        return torch.reshape(tensor, (self.shape,))
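For completeness, the class above is used exactly like the lambda-based variant (this snippet reuses the MyModule just defined):

module = MyModule(shape=-1)                # flatten everything into one dimension
print(module(torch.randn(4, 5, 6)).shape)  # torch.Size([120])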

TensorFlow 2.0 Keras layers with custom tensors as variables

In TF 1.x, it was possible to build layers with custom variables. Here's an example:
import numpy as np
import tensorflow as tf

def make_custom_getter(custom_variables):
    def custom_getter(getter, name, **kwargs):
        if name in custom_variables:
            variable = custom_variables[name]
        else:
            variable = getter(name, **kwargs)
        return variable
    return custom_getter

# Make a custom getter for the dense layer variables.
# Note: custom variables can result from arbitrary computation;
# for the sake of this example, we make them just constant tensors.
custom_variables = {
    "model/dense/kernel": tf.constant(
        np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
    "model/dense/bias": tf.constant(
        np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
custom_getter = make_custom_getter(custom_variables)

# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with tf.variable_scope("model", custom_getter=custom_getter):
    Layer = tf.layers.Dense(64)
    hiddens = Layer(x)

print(Layer.variables)
The printed variables of the constructed dense layer will be the custom tensors we specified in the custom_variables dict:
[<tf.Tensor 'custom_kernel:0' shape=(784, 64) dtype=float32>, <tf.Tensor 'custom_bias:0' shape=(64,) dtype=float32>]
This allows us to create layers/models that use the tensors provided in custom_variables directly as their weights, so that we can further differentiate the output of the layers/models with respect to any tensors that custom_variables may depend on (particularly useful for modulating sub-nets, parameter generation, meta-learning, etc.).
Variable scopes used to make it easy to nest all of the graph-building inside scopes with custom getters and to build models on top of the provided tensors as their parameters. Since sessions and variable scopes are no longer advisable in TF 2.0 (and all of that low-level stuff has moved to tf.compat.v1), what would be the best practice to implement the above using Keras and TF 2.0?
(Related issue on GitHub.)
Answer based on the comment below
Given you have:
kernel = createTheKernelVarBasedOnWhatYouWant() #shape (784, 64)
bias = createTheBiasVarBasedOnWhatYouWant() #shape (64,)
Make a simple function copying the code from Dense:
def custom_dense(x):
    inputs, kernel, bias = x
    outputs = K.dot(inputs, kernel)
    outputs = K.bias_add(outputs, bias, data_format='channels_last')
    return outputs
Use the function in a Lambda layer:
layer = Lambda(custom_dense)
hiddens = layer([x, kernel, bias])
Warning: kernel and bias must be produced by a Keras layer, or come from kernel = Input(tensor=the_kernel_var) and bias = Input(tensor=bias_var).
If the warning above is bad for you, you can always use kernel and bias "from outside", like:
def custom_dense(inputs):
    outputs = K.dot(inputs, kernel)  # kernel is not part of the arguments anymore
    outputs = K.bias_add(outputs, bias, data_format='channels_last')
    return outputs

layer = Lambda(custom_dense)
hiddens = layer(x)
This last option makes it a bit more complicated to save/load models.
Old answer
You should probably use a Keras Dense layer and set its weights in a standard way:
layer = tf.keras.layers.Dense(64, name='the_layer')
layer.set_weights([np.random.rand(784, 64), np.random.rand(64)])
If you need these weights to be non-trainable, set the following before compiling the Keras model:
model.get_layer('the_layer').trainable = False
If you want direct access to the variables as tensors, they are:
kernel = layer.kernel
bias = layer.bias
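Putting the pieces of this older approach together, a minimal sketch (the single-layer Sequential model here is made up for illustration and is not from the question):

import numpy as np
import tensorflow as tf

# Hypothetical one-layer model, just to demonstrate the pattern.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, name='the_layer', input_shape=(784,)),
])

# Overwrite the layer's weights with externally produced values.
model.get_layer('the_layer').set_weights([np.random.rand(784, 64), np.random.rand(64)])

# Freeze them before compiling if they should not be trained.
model.get_layer('the_layer').trainable = False
model.compile(optimizer='adam', loss='mse')

# Direct access to the underlying variables as tensors:
layer = model.get_layer('the_layer')
print(layer.kernel.shape, layer.bias.shape)  # (784, 64) (64,)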
There are plenty of other options, but that depends on your exact intention, which is not clear in your question.
Below is a general-purpose solution that works with arbitrary Keras models in TF2.
First, we need to define an auxiliary function canonical_variable_name and a context manager custom_make_variable with the following signatures (see implementation in meta-blocks library).
def canonical_variable_name(variable_name: str, outer_scope: str):
    """Returns the canonical variable name: `outer_scope/.../name`."""
    # ...

@contextlib.contextmanager
def custom_make_variable(
    canonical_custom_variables: Dict[str, tf.Tensor], outer_scope: str
):
    """A context manager that overrides `make_variable` with a custom function.

    When building layers, Keras uses the `make_variable` function to create weights
    (kernels and biases for each layer). This function wraps `make_variable` with
    a closure that infers the canonical name of the variable being created (of the
    form `outer_scope/.../var_name`) and looks it up in the `custom_variables` dict
    that maps canonical names to tensors. The function adheres to the following logic:

    * If there is a match, it does a few checks (shape, dtype, etc.) and returns
      the found tensor instead of creating a new variable.
    * If there is a match but the checks fail, it throws an exception.
    * If there are no matching `custom_variables`, it calls the original
      `make_variable` utility function and returns a newly created variable.
    """
    # ...
Using these functions, we can create arbitrary Keras models with custom tensors used as variables:
import numpy as np
import tensorflow as tf

canonical_custom_variables = {
    "model/dense/kernel": tf.constant(
        np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
    "model/dense/bias": tf.constant(
        np.random.rand(64), name="custom_bias", dtype=tf.float32),
}

# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with custom_make_variable(canonical_custom_variables, outer_scope="model"):
    Layer = tf.keras.layers.Dense(64)
    hiddens = Layer(x)

print(Layer.variables)
I'm not entirely sure I understand your question correctly, but it seems to me that it should be possible to do what you want with a combination of custom layers and the Keras functional API.
Custom layers allow you to build any layer you want in a way that is compatible with Keras, e.g.:
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs, activation=None):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs
        # accept an activation so the layer can be used like the built-in Dense
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs],
                                      initializer='normal')
        self.bias = self.add_weight("bias",
                                    shape=[self.num_outputs, ],
                                    initializer='normal')

    def call(self, inputs):
        return self.activation(tf.matmul(inputs, self.kernel) + self.bias)
and the functional API allows you to access the outputs of said layers and re-use them:
inputs = tf.keras.Input(shape=(784,), name='img')
x1 = MyDenseLayer(64, activation='relu')(inputs)
x2 = MyDenseLayer(64, activation='relu')(x1)
outputs = MyDenseLayer(10, activation='softmax')(x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')
Here x1 and x2 can be connected to other subnets.

Pytorch-tutorial: Strange input argument in class definition

I'm reading through some PyTorch tutorials. Below is the definition of a residual block. However, in the forward method each function handle takes only one argument, out, while in the __init__ function these functions take different numbers of input arguments:
# Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        # conv3x3 is a helper defined elsewhere in the tutorial (a 3x3 nn.Conv2d)
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out
Does anyone know how this works?
Is it a standard python class inheritance feature or is this specific to pytorch?
You define the layers in the __init__ function, which fixes their settings (parameters). In the forward function you only pass in the data that needs to be processed with the settings predefined in __init__. Each nn.<whatever> constructor builds a callable with the settings you pass to it; that callable is then used in forward, and it takes only one argument: the input data.
You define the different layers of your network architecture in the constructor of the class (the __init__ function). Essentially, when you create an instance of each layer, you initialize it with your settings (parameters).
For example, when you declare the first convolution layer, self.conv1, you give it the parameters required to initialize the layer. In the forward function, you simply call the layers with the input to get the corresponding output. For example, in out = self.conv2(out), you take the output of the previous layer and give it as input to the next layer, self.conv2.
Please note that during initialization you give the layer information about what kind/shape of input it will receive. For example, you tell the first convolution layer the number of input and output channels. In the forward method, you just need to pass the input, that's it.
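A tiny standalone sketch of the two-phase pattern both answers describe (configuration at construction time, data at call time); the shapes here are arbitrary:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)  # configuration
x = torch.randn(8, 3, 32, 32)  # a batch of 8 RGB images, 32x32
y = conv(x)                    # only the data is passed at call time
print(y.shape)                 # torch.Size([8, 16, 32, 32])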
