Python Unit Testing: How is a function automatically called without providing the function name?

I am looking at the code in vanilla_vae and its unit test test_vae.
In the test_vae snippet below, I am confused about how the self.model(x) call in test_loss(self) invokes the VanillaVAE class's forward method without naming it. Could anyone give me some insight into this?
def setUp(self) -> None:
    # self.model2 = VAE(3, 10)
    self.model = VanillaVAE(3, 10)

def test_loss(self):
    x = torch.randn(16, 3, 64, 64)
    result = self.model(x)
    loss = self.model.loss_function(*result, M_N = 0.005)
    print(loss)

This is because VanillaVAE inherits from BaseVAE, which inherits from nn.Module.
nn.Module defines a __call__ method, the special method that makes instances of a class callable.
The call goes through _call_impl, which is where forward is invoked (along with any registered hooks).
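The dispatch described above can be sketched in plain Python. This is only a minimal illustration: the Module class here is a hypothetical, heavily simplified stand-in for torch.nn.Module (the real one also handles hooks, parameters, and more), and Doubler is a made-up toy model.

```python
class Module:
    """Hypothetical, heavily simplified stand-in for torch.nn.Module."""

    def __call__(self, *args, **kwargs):
        # Calling an instance, e.g. model(x), lands here.
        return self._call_impl(*args, **kwargs)

    def _call_impl(self, *args, **kwargs):
        # The real implementation also runs registered hooks around this call.
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses must define forward()")


class Doubler(Module):
    """Toy 'model' whose forward pass doubles its input."""

    def forward(self, x):
        return 2 * x


model = Doubler()
result = model(21)  # equivalent to model.forward(21) in this sketch
```

So self.model(x) in the test never names forward explicitly; Python resolves the call through __call__ on the instance's class.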

This behavior comes from torch.nn.Module, the PyTorch base class for neural networks. In the forward method, you define how your model runs, from input to output.
This means that every time you pass an input to your model, forward is called automatically and its return value is handed back to you. In this case, as I can see from your link, that is a List[Tensor]:
def forward(self, input: Tensor, **kwargs) -> List[Tensor]:
    mu, log_var = self.encode(input)
    z = self.reparameterize(mu, log_var)
    return [self.decode(z), input, mu, log_var]
Here you can also find a couple of examples of how PyTorch's nn package is used.

Related

torch.nn.BCELoss() and torch.nn.functional.binary_cross_entropy

What is the basic difference between these two loss functions? I have already tried using both of them.

nn.BCELoss and F.binary_cross_entropy are two PyTorch interfaces to the same operation.
The former, torch.nn.BCELoss, is a class that inherits from nn.Module, which makes it convenient to use in the usual two-step OOP (object-oriented programming) fashion: initialize, then use. Initialization sets up parameters and attributes, which is especially useful for stateful operators such as parametrized layers. This is the way to go when implementing classes of your own, for example:
import torch.nn as nn

class Trainer():
    def __init__(self, model):
        self.model = model
        self.loss = nn.BCELoss()  # initialize the loss module once

    def __call__(self, x, y):
        y_hat = self.model(x)
        loss = self.loss(y_hat, y)  # then use it like a function
        return loss
On the other hand, the latter, torch.nn.functional.binary_cross_entropy, is the functional interface. It is actually the underlying operator used by nn.BCELoss, as you can see in its source. You can use this interface directly, but it can become cumbersome with stateful operators. In this particular case, the binary cross-entropy loss does not have parameters (in the most general case), so you could do:
import torch.nn.functional as F

class Trainer():
    def __init__(self, model):
        self.model = model

    def __call__(self, x, y):
        y_hat = self.model(x)
        loss = F.binary_cross_entropy(y_hat, y)  # no loss object to initialize
        return loss
BCELoss is the binary cross-entropy loss.
torch.nn.functional.binary_cross_entropy computes the actual loss inside torch.nn.BCELoss().
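As a quick check of the claim that the two interfaces compute the same thing, the sketch below evaluates both on the same inputs with default settings (mean reduction, no weights). The tensors are random placeholders, not data from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
y_hat = torch.rand(8)                    # placeholder predicted probabilities
y = torch.randint(0, 2, (8,)).float()    # placeholder binary targets as floats

criterion = nn.BCELoss()                 # class interface: initialize ...
loss_module = criterion(y_hat, y)        # ... then use

loss_functional = F.binary_cross_entropy(y_hat, y)  # functional interface

same = torch.allclose(loss_module, loss_functional)
```

With matching arguments, same should be True, so the choice between the two is mostly a matter of code style.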

How to get the params in a custom implementation of a pytorch optimizer?

Quick overview of my issue: I'm implementing a custom optimizer, and all I want to do is add two class variables and an extra instruction to the step() function. Here's the class code for reference:
from typing import Optional

import torch

class DPSGD(torch.optim.SGD):
    def __init__(
        self, noise_multiplier: float = 0.5, l2_norm_clip: float = 1.5, *args, **kwargs
    ) -> None:
        super().__init__(*args, **kwargs)
        self.noise_multiplier = noise_multiplier
        self.l2_norm_clip = l2_norm_clip

    def step(self, closure=None) -> Optional[float]:
        # Forward the closure to SGD.step(), which returns the loss (or None).
        loss = super().step(closure)
        params = []
        # works for getting the params I need but there's gotta be a better way
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    params.append(p)
        # custom function that takes model.parameters()
        privacy.noise_and_clip_parameters(
            params,
            l2_norm_clip=self.l2_norm_clip,
            noise_multiplier=self.noise_multiplier,
        )
        return loss
I don't really understand what the param_groups are. I know that when the model.parameters() are passed to an optimizer like
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
they are taken and converted into the param_groups as a class variable, but I don't know a simple way to just get the original params out as they were. Is there a way to extract those original model.parameters() that's already implemented in the base optimizer class? My goal is to not have to use that nested for loop because I'm planning on extending other torch optimizers in the same manner and for readability I'd rather avoid writing more code than I need to.
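One way to keep the nested loop out of every subclass is to factor it into a small helper. The sketch below is only an illustration: params_with_grad is a hypothetical name, and the SimpleNamespace objects are stand-ins for real parameters so that the example runs without a model. It relies only on param_groups being a list of dicts that each hold a "params" list.

```python
from types import SimpleNamespace


def params_with_grad(param_groups):
    """Flatten optimizer-style param_groups, keeping params that have a gradient."""
    return [
        p
        for group in param_groups
        for p in group["params"]
        if p.grad is not None
    ]


# Stand-in data shaped like optimizer.param_groups (not real tensors).
groups = [
    {
        "params": [
            SimpleNamespace(name="w1", grad=0.1),
            SimpleNamespace(name="b1", grad=None),  # no gradient: skipped
        ],
        "lr": 0.001,
    },
    {"params": [SimpleNamespace(name="w2", grad=-0.3)], "lr": 0.01},
]

selected = [p.name for p in params_with_grad(groups)]
```

Inside step(), the same helper could be called as params_with_grad(self.param_groups), so each extended optimizer needs only one line instead of the loop.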

Empty state_dict with vector or tuple of layers in nn.Module

I switched to a version of my torch.nn.Module with a parametrized number of layers, like Net_par below, only to find out that all the saved state_dicts were empty after quite some optimizing.
Saving the state_dict is the recommended approach (https://pytorch.org/docs/stable/notes/serialization.html#recommend-saving-models), yet layers stored in a list (or tuple, for that matter) are discarded when the state_dict is constructed.
In contrast, torch.save on the whole model works properly, but it adds to the saved data and limits robustness. This feels a little like a bug; can anybody help with a workaround?
Minimal example for comparison between parametrized and fixed layer count:
import torch
import torch.nn as nn

class Net_par(nn.Module):
    def __init__(self, layer_dofs):
        super(Net_par, self).__init__()
        self.layers = []
        for i in range(len(layer_dofs) - 1):
            self.layers.append(nn.Linear(layer_dofs[i], layer_dofs[i + 1]))

    def forward(self, x):
        for i in range(len(self.layers) - 1):
            x = torch.tanh(self.layers[i](x))
        return torch.tanh(self.layers[len(self.layers) - 1](x))

class Net_sta(nn.Module):
    def __init__(self, dof1, dof2):
        super(Net_sta, self).__init__()
        self.layer = nn.Linear(dof1, dof2)

    def forward(self, x):
        return torch.tanh(self.layer(x))

if __name__ == "__main__":
    net_par = Net_par((3, 4))
    net_sta = Net_sta(3, 4)
    print(str(net_par.state_dict()))
    # OrderedDict() <------Why?!
    print(str(net_sta.state_dict()))
    # OrderedDict([('layer.weight', tensor([[...
    # ...]])), ('layer.bias', tensor([... ...]))])
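
You need to use nn.ModuleList() instead of a plain Python list. Modules stored in a plain list are not registered as submodules, so their parameters show up neither in state_dict() nor in parameters().
class Net_par(nn.Module):
    ...
    self.layers = nn.ModuleList([])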
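Spelled out in full, the fix could look like the sketch below. NetPar is an illustrative rewrite of the question's Net_par, not the asker's final code; the only essential change is that the layers live in an nn.ModuleList.

```python
import torch
import torch.nn as nn


class NetPar(nn.Module):
    def __init__(self, layer_dofs):
        super().__init__()
        # nn.ModuleList registers every layer as a submodule.
        self.layers = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(layer_dofs[:-1], layer_dofs[1:])]
        )

    def forward(self, x):
        # tanh after every layer, as in the original Net_par.
        for layer in self.layers:
            x = torch.tanh(layer(x))
        return x


net = NetPar((3, 4, 2))
keys = list(net.state_dict().keys())
out = net(torch.randn(5, 3))
```

Here keys contains one weight and one bias entry per layer (layers.0.weight, layers.0.bias, and so on), so saving the state_dict works as expected.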

Differentiating user-defined Variables when using Keras layers

I want to multiply a Keras layer with my own Variable.
Then, I want to compute the gradients of some loss relative to the variables I have defined.
Here is a simplified MWE of what I am trying to do:
import tensorflow as tf

x = input_shape = tf.keras.layers.Input((10,))
x = tf.keras.layers.Dense(5)(x)
s = tf.Variable(tf.ones((5,)))
x = x*s
model = tf.keras.models.Model(input_shape, x)
X = tf.random.normal((50, 10)) # random sample
with tf.GradientTape() as tape:
    tape.watch(s)
    y = model(X)
    loss = y**2
print(tape.gradient(loss, s)) # why None ??
The print prints None... why?
Notice that I am using eager-execution (TF version 2.0.0).
I managed to fix my problem by sub-classing Model and creating my variable inside the model:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)
        self.s = tf.Variable(tf.ones((5,)))

    def call(self, inputs):
        x = self.dense(inputs)
        x = x * self.s
        return x
Alternatively, defining my own custom layer also works.
There must be some magic going on whereby variables not inside a model are not backpropagated (like in PyTorch).
I will leave the question open because I am curious as to why my code was not working and what a simpler fix would look like.

This might be the explanation. From the documentation, I suspect that differentiating with respect to the model layer "s" (or any other layer, say "x") might not be a meaningful calculation. For example, it is possible to do this:
print(tape.gradient(loss, model.variables))
and obtain the gradients with respect to the model weights/parameters, but differentiating the model with respect to a "layer" is not appropriate. This is speculation on my part at this point. I hope this helps.

Pytorch custom activation functions?

I'm having issues with implementing custom activation functions in Pytorch, such as Swish. How should I go about implementing and using custom activation functions in Pytorch?
There are four possibilities depending on what you are looking for. You will need to ask yourself two questions:
Q1) Will your activation function have learnable parameters?
If yes, you have no choice but to create your activation function as an nn.Module class because you need to store those weights.
If no, you are free to simply create a normal function, or a class, depending on what is convenient for you.
Q2) Can your activation function be expressed as a combination of existing PyTorch functions?
If yes, you can simply write it as a combination of existing PyTorch functions and won't need to create a backward function that defines the gradient.
If no, you will need to write the gradient by hand.
Example 1: SiLU function
The SiLU function f(x) = x * sigmoid(x) does not have any learned weights and can be written entirely with existing PyTorch functions, thus you can simply define it as a function:
def silu(x):
    return x * torch.sigmoid(x)
and then simply use it as you would have torch.relu or any other activation function.
Example 2: SiLU with learned slope
In this case you have one learned parameter, the slope, thus you need to make a class of it.
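class LearnedSiLU(nn.Module):
    def __init__(self, slope = 1):
        super().__init__()
        # Wrap the scaled tensor itself: slope * nn.Parameter(...) would yield a
        # plain tensor that is not registered as a learnable parameter.
        self.slope = torch.nn.Parameter(slope * torch.ones(1))

    def forward(self, x):
        return self.slope * x * torch.sigmoid(x)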
Example 3: with backward
If you have something for which you need to create your own gradient function, you can look at this example: Pytorch: define custom function
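For a rough idea of what a hand-written gradient looks like, here is a minimal sketch built on torch.autograd.Function. It reuses SiLU purely for illustration (autograd could differentiate it on its own), and the class name SiLUWithBackward is made up for this example.

```python
import torch


class SiLUWithBackward(torch.autograd.Function):
    """SiLU, f(x) = x * sigmoid(x), with a hand-written gradient (illustrative)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # f'(x) = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * sig * (1 + x * (1 - sig))


# gradcheck compares the analytic gradient with a numerical estimate;
# it expects double-precision inputs that require gradients.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
gradient_ok = torch.autograd.gradcheck(SiLUWithBackward.apply, (x,))
```

The function is used through SiLUWithBackward.apply(x) rather than by instantiating the class; wrapping that call in an nn.Module's forward makes it usable like any other activation layer.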
You can write a customized activation function like below (e.g. weighted Tanh).
class weightedTanh(nn.Module):
    def __init__(self, weights = 1):
        super().__init__()
        self.weights = weights

    def forward(self, input):
        ex = torch.exp(2*self.weights*input)
        return (ex-1)/(ex+1)
You don't need to worry about backpropagation as long as you use autograd-compatible operations.
I wrote the following SinActivation sub-class of nn.Module to implement the sin activation function.
class SinActivation(torch.nn.Module):
    def __init__(self):
        super(SinActivation, self).__init__()

    def forward(self, x):
        return torch.sin(x)
