I am puzzled by the syntax of the functions used in neural networks in pytorch.
Here is an example of how one can define a linear transformation layer: (cf. https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())
torch.Size([128, 30])
Can someone explain to me where the expression nn.Linear(20, 30)(input) comes from? It puzzles me a bit.
Indeed, one can define a neural network class with a constructor such as this one (for example):
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, p=0):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size, bias=True)
        self.fc2 = nn.Linear(hidden_size, hidden_size, bias=True)
        self.fc3 = nn.Linear(hidden_size, hidden_size, bias=True)
        self.fc4 = nn.Linear(hidden_size, num_classes, bias=False)
        self.dropout = nn.Dropout(p=p)
and I was trying to add attributes using NumPy functions, like:
self.enter_reshape = np.reshape(-1, input_size * input_size)
self.exit_reshape = np.reshape(input_size, num_classes / input_size)
or using PyTorch's view function:
self.reshape = view(-1, self.num_flat_features())
The closest thing I know of is partial functions and closures, where one could write f(z)(x)(y). I looked into the definition of Linear and saw that Linear is a class, but I don't see where it redefines the __call__ magic method, which I thought would be used here when one calls the object.
So basically, can someone explain what is going on with this syntax? And would it be possible to give the neural network the NumPy or view functions as attributes?
torch.nn.Linear inherits from torch.nn.Module (see source code), which in turn defines the __call__ method.
You can see the source code for torch.nn.Module here. This class allows users to create their own modules by inheritance (as you did in your example) and is the base for all modules defined by PyTorch, like nn.Linear (see the available methods in the documentation here).
Its __call__ essentially calls forward, but it also runs (and manages) registered hooks, performs some TorchScript-related checks, etc. (see the source code here, with the relevant line here).
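To illustrate the mechanism, here is a toy, pure-Python sketch of a base class whose __call__ dispatches to forward and then runs forward hooks, roughly mirroring what nn.Module does. The names ToyModule and Scale are hypothetical, and the real nn.Module.__call__ does considerably more. The point is that nn.Linear(20, 30) creates an object, and the following (input) invokes the __call__ it inherits.

```python
# Hypothetical, heavily simplified stand-in for torch.nn.Module.
# This is NOT PyTorch's implementation; it only shows the dispatch idea.
class ToyModule:
    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Same calling convention as PyTorch forward hooks: hook(module, args, output)
        self._forward_hooks.append(hook)

    def forward(self, *args):
        raise NotImplementedError  # subclasses override this

    def __call__(self, *args):
        # 1) run the user-defined forward
        output = self.forward(*args)
        # 2) run the hooks; a hook that returns a value replaces the output
        for hook in self._forward_hooks:
            result = hook(self, args, output)
            if result is not None:
                output = result
        return output


class Scale(ToyModule):
    def __init__(self, factor):
        super().__init__()    # required, otherwise _forward_hooks does not exist
        self.factor = factor  # constructor arguments hold the configuration

    def forward(self, x):
        return [self.factor * v for v in x]


print(Scale(3)([1, 2]))  # [3, 6]: create the object, then call it

m = Scale(3)
m.register_forward_hook(lambda mod, args, out: [v + 1 for v in out])
print(m([1, 2]))  # [4, 7]: forward output, then modified by the hook
```

Subclasses only write forward; everything around it (hooks here, plus more in PyTorch) lives in the inherited __call__, which is why you should call m(x) rather than m.forward(x).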
would it be possible to give to the neural network the numpy or view functions as attributes?
From the example you've given, what you are trying to do is probably to store a partial function (or a lambda, as in the example below) as an attribute (though that is pretty uncommon; to be honest, I've never seen it), like this:
import torch
class MyModule(torch.nn.Module):
    def __init__(self, shape: int = -1):
        super().__init__()  # required
        self.reshape = lambda tensor: torch.reshape(tensor, (shape,))
    def forward(self, tensor):
        return self.reshape(tensor)
module = MyModule()
module(torch.randn(4, 5, 6)).shape  # [120] shape
You shouldn't use NumPy with PyTorch unless you really need it and/or there is no sensible PyTorch counterpart (although you can, if you convert the torch.Tensor to a NumPy array). Also, you shouldn't write anything like the code above, as it's really confusing; just save attributes (such as input_shape, hidden_dim, output_size) and use them in forward:
class MyModule(torch.nn.Module):
    def __init__(self, shape: int = -1):
        super().__init__()  # required
        self.shape = shape
    def forward(self, tensor):
        return torch.reshape(tensor, (self.shape,))
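To relate this to the closures and partial functions mentioned in the question, the plain-Python sketch below shows three ways to get the f(a, b)(x) call syntax. All names are hypothetical; nn.Module-style classes use the third approach, a class that defines __call__.

```python
from functools import partial

# 1) Closure: a function that returns another function
def make_linear_closure(weight, bias):
    def apply(x):
        return weight * x + bias  # weight and bias are captured from the enclosing scope
    return apply

# 2) functools.partial: pre-binds some arguments of an existing function
def linear(x, weight, bias):
    return weight * x + bias

# 3) Callable object: the class defines __call__ (the approach nn.Module takes)
class LinearObject:
    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def __call__(self, x):
        return self.weight * x + self.bias


print(make_linear_closure(2, 1)(10))          # 21
print(partial(linear, weight=2, bias=1)(10))  # 21
print(LinearObject(2, 1)(10))                 # 21
```

All three produce the same result, but only the class-based version keeps its state as named, inspectable attributes (here weight and bias). That kind of attribute storage is what allows a framework like PyTorch to find, move and save a module's parameters.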
Related
I'm programming some callable custom modules in PyTorch and I wanted to know if I'm doing it correctly. Here's an example scenario where I want to construct a module that takes a torch.Tensor as input, performs a learnable linear operation and outputs a diagonal covariance matrix to use in a multivariate distribution downstream.
import torch.nn as nn
from torch.distributions import MultivariateNormal
class Exp(nn.Module):
    def forward(self, x):
        return x.exp()
class Diag(nn.Module):
    def forward(self, x):
        return x.diag_embed()
def init_model(input_size, output_size):
    log_variance_module = nn.Linear(input_size, output_size)
    diag_covariance_module = nn.Sequential(log_variance_module, Exp(), Diag())
    return diag_covariance_module
model = init_model(5, 5)
cov = model(some_input_tensor)
dist = MultivariateNormal(some_mean, cov)
I know that this works, but is it the right design pattern? What is the recommended way to approach these modules?
This looks like the correct design pattern.
Ideally, you would also write your main network as an nn.Module:
class Model(nn.Sequential):
    def __init__(self, input_size, output_size):
        logvar_module = nn.Linear(input_size, output_size)
        super().__init__(logvar_module, Exp(), Diag())
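As a rough mental model of why this composition works, here is a pure-Python sketch in which a Sequential-like container feeds each module's output into the next one. ToySequential, exp and diag are hypothetical stand-ins, not PyTorch code, and the lambda stands in for the nn.Linear step. Because Exp and Diag only implement forward, they slot into the chain like any built-in layer.

```python
import math

class ToySequential:
    """Hypothetical stand-in for nn.Sequential: applies callables in order."""
    def __init__(self, *modules):
        self.modules = list(modules)

    def __call__(self, x):
        for module in self.modules:
            x = module(x)  # the output of one module becomes the input of the next
        return x


def exp(xs):
    # plays the role of Exp(): elementwise exponential
    return [math.exp(v) for v in xs]


def diag(xs):
    # plays the role of Diag(): put a vector on the diagonal of a square matrix
    return [[v if i == j else 0.0 for j in range(len(xs))] for i, v in enumerate(xs)]


model = ToySequential(lambda xs: [2 * v for v in xs], exp, diag)
print(model([0.0, 0.0]))  # [[1.0, 0.0], [0.0, 1.0]]
```

Exponentiating a log-variance also guarantees strictly positive diagonal entries, so the resulting diagonal matrix is a valid covariance for the downstream distribution.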
I am trying to build a model in TensorFlow using the Keras subclassing API.
Q1) Am I correctly creating the layers with layers = [] and then layers.append(GTLayer(...))?
Q2) Calling GTLayer in the __init__ of GTN will run the GTLayer class. Will it call self.conv1 (which will return a tensor A from GTConv) and self.conv2 (which will again return a tensor A from GTConv), and then start the call method of GTLayer to produce H, W? Am I right?
Q3) What happens to the H and W returned in Q2? Will they be stored in the layers list? And when we later call GTN's call method, will it bring up those layers? Am I correct?
Q4) Later, for GTN's call method, I needed linear layers, so I defined model = tf.keras.models.Sequential() and after that initialized self.linear1 and self.linear2. This way I have used both subclassing and Sequential! Is that correct?
Q5) I will finally get loss, y, Ws from calling GTN. Now, if I assign model = GTN(arguments...), how will I do the training and back-propagation steps? Using an optimizer and a loss function? Will it follow model.compile() and model.fit()? Or can we do it differently with Keras subclassing?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
class GTN(layers.Layer):
    def __init__(self, num_edge, num_channels,num_layers,norm):
        super(GTN, self).__init__()
        self.num_edge = num_edge
        self.num_channels = num_channels
        self.num_layers = num_layers
        self.is_norm = norm
        layers = []
        for i in tf.range(num_layers):
            if i == 0:
                layers.append(GTLayer(num_edge, num_channels, first=True))
            else:
                layers.append(GTLayer(num_edge, num_channels, first=False))
        model = tf.keras.models.Sequential()
        self.loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
        self.linear1 = model.add(tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None))
        self.linear2 = model.add(tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None))
    def gcn_conv(self,X,H):
        X = tf.matmul(X, self.weight)
        H = self.norm(H, add=True)
        return tf.matmul(tf.transpose(H),X)
    def call(self, A, X, target_x, target):
        A = tf.expand_dims(A, 0)
        Ws = []
        for i in range(self.num_layers):
            H = self.normalization(H)
            H, W = self.layers[i](A, H)
            Ws.append(W)
        for i in range(self.num_channels):
            X_tmp = tf.nn.relu(self.gcn_conv(X,H[i])).numpy()
            X_ = tf.concat((X_,X_tmp), dim=1)
        X_ = self.linear1(X_)
        X_ = tf.nn.relu(X_).numpy()
        y = self.linear2(X_[target_x])
        loss = self.loss(y, target)
        return loss, y, Ws
class GTLayer(keras.layers.Layer):
    def __init__(self, in_channels, out_channels, first=True):
        super(GTLayer, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.conv1 = GTConv(in_channels, out_channels)
        self.conv2 = GTConv(in_channels, out_channels)
    def call(self, A, H_=None):
        a = self.conv1(A)
        b = self.conv2(A)
        H = tf.matmul( a, b)
        W = [tf.stop_gradient(tf.nn.softmax(self.conv1.weight, axis=1).numpy()),
             tf.stop_gradient(tf.nn.softmax(self.conv2.weight, axis=1).numpy()) ]
        return H,W
class GTConv(keras.layers.Layer):
    def __init__(self, in_channels, out_channels):
        super(GTConv, self).__init__()
    def call(self, A):
        A = tf.add_n(tf.nn.softmax(self.weight))
        return A
Q1
No. There are two possibilities here:
1 - If you want to access a standard layers property of Keras models:
Only Model has a layers property; a keras.layers.Layer doesn't have it.
You are not supposed to modify the layers property of a Model; you should only read it.
The variable you are creating, named layers, is not a property of your class, because you did not use self.layers.
2 - If you just want a list named layers for personal use in your class:
I recommend not using a standard name like this; change it to myLayers or something similar to avoid confusion.
The layers variable you created is not used anywhere else in your code; you just created it and never used it.
Remember that layers = [] just creates a local variable, while self.layers = [] creates an attribute of your class that can be used in the other methods of your class.
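The difference between a local variable and an attribute is plain Python, independent of Keras. Here is a minimal sketch, with hypothetical class names, that also uses a non-standard name (my_layers) as suggested above:

```python
class WithLocal:
    def __init__(self):
        layers = ["layer_a", "layer_b"]  # local name: discarded when __init__ returns

    def count(self):
        # `layers` is not visible here; only attributes stored on self are
        return len(getattr(self, "my_layers", []))


class WithAttribute:
    def __init__(self):
        self.my_layers = ["layer_a", "layer_b"]  # stored on the instance

    def count(self):
        return len(self.my_layers)


print(WithLocal().count())      # 0
print(WithAttribute().count())  # 2
```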
Q2
You are not "calling" GTLayer, you are "creating" GTLayer. This means that you are running GTLayer.__init__().
This distinction is important in Keras:
This is "creating" a layer: layer_instance = GTLayer(...), which runs __init__
This is "calling" a layer: layer_instance(input_tensors), which runs __call__ (which will eventually run call as defined by you)
You can do both in the same line: output_tensors = GTLayer(...)(input_tensors)
So, this is happening in GTN.__init__:
You are "creating" num_layers instances of GTLayer.
This runs GTLayer.__init__() for each instance
This hits the lines self.conv1 = GTConv(in_channels, out_channels) and self.conv2 = GTConv(in_channels, out_channels)
This is also "creating" (not "calling") GTConv.
self.conv1 and self.conv2 are "Layer" instances, not tensors.
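The order in which things run can be traced with a small pure-Python sketch. ToyLayer, Conv and Block are hypothetical stand-ins (Block plays the role of GTLayer and Conv the role of GTConv), and the base class only mimics the idea that calling an instance goes through __call__ to call. Creating Block runs only constructors; numbers (standing in for tensors) are produced only when the instance is called.

```python
log = []  # records which methods run, and in what order


class ToyLayer:
    """Hypothetical, simplified stand-in for keras.layers.Layer."""
    def __init__(self, name):
        self.name = name
        log.append(f"create {name}")

    def __call__(self, x):
        log.append(f"call {self.name}")
        return self.call(x)

    def call(self, x):
        return x


class Conv(ToyLayer):
    def call(self, x):
        return x + 1


class Block(ToyLayer):
    def __init__(self, name):
        super().__init__(name)
        # "creating" sub-layers: runs their __init__, computes nothing
        self.conv1 = Conv(name + ".conv1")
        self.conv2 = Conv(name + ".conv2")

    def call(self, x):
        # "calling" sub-layers: this is where outputs are produced
        return self.conv2(self.conv1(x))


block = Block("block")
print(log)       # ['create block', 'create block.conv1', 'create block.conv2']
print(block(0))  # 2
print(log[3:])   # ['call block', 'call block.conv1', 'call block.conv2']
```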
Q3
No tensor is produced here because you never "called" any layer in GTN.__init__().
(And this is ok. Usually, you "create" layers inside __init__() and "call" layers inside call.)
Your layers local variable will have "instances of GTLayer".
Q4
You mixed two approaches in a strange way.
You can, of course, use a Sequential model if you want, but it's not necessary, and you're not using it correctly: model.add(...) returns None, so self.linear1 and self.linear2 end up being None.
If in call you are calling each layer (that is X_ = self.linear1(X_) and y = self.linear2(X_[target_x])), you don't need a Sequential model at all, and you can just have the following in GTN.__init__() (this is the best approach for subclassing):
self.linear1 = tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None)
self.linear2 = tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None)
But you could have self.submodel = Sequential(...) and then use self.submodel in GTN.call(). But having a Model inside a layer sounds weird and might cause some strange behavior in specific cases. And, of course, the ReLUs should be a part of this submodel.
Q5
I will finally get loss, y, Ws from calling GTN
Returning the loss and weights from call is a very, very strange thing. I have never seen this, and I don't understand why you're doing it this way. This is not standard use of Keras; you'd only try something like this in very specific and otherwise unsolvable cases. I cannot say it will work.
How will I do the training and back-propagation steps?
You should have implemented a keras.models.Model, not a keras.layers.Layer. Only models have the ability to compile and train.
Usually, you wouldn't create a loss in call; you'd specify it in model.compile, unless you're dealing with unconventional losses, such as weight or activity regularization, which really depend on the layer rather than on the model's inputs/outputs.
Extra tips
There is no need to create custom layers if you're not going to create custom trainable weights. It's not wrong, of course, but also not necessary. It can help organize your code, or just add extra complication.
You are trying to use self.weight in your layers, but you never define any weight anywhere.
I'm pretty sure there is a better way to achieve what you want, but I don't know what you want (and that would be something for another question, I think...)
This might be good reading on subclassing: https://www.tensorflow.org/guide/keras/custom_layers_and_models?hl=en
I switched to a version of my torch.nn.Module with a parametrized number of layers, like Net_par below, only to find out after quite a lot of optimizing that all the saved state_dicts were empty.
Saving the state_dict is the recommended approach (https://pytorch.org/docs/stable/notes/serialization.html#recommend-saving-models), yet layers stored in a list (or a tuple, for that matter) are discarded when constructing the state_dict.
By contrast, torch.save on the whole model works properly, but it adds to the saved data and limits robustness. This feels a little like a bug; can anybody help with a workaround?
Minimal example for comparison between parametrized and fixed layer count:
import torch
import torch.nn as nn
class Net_par(nn.Module):
    def __init__(self,layer_dofs):
        super(Net_par, self).__init__()
        self.layers=[]
        for i in range(len(layer_dofs)-1):
            self.layers.append(nn.Linear(layer_dofs[i],layer_dofs[i+1]))
    def forward(self, x):
        for i in range(len(self.layers)-1):
            x = torch.tanh(self.layers[i](x))
        return torch.tanh(self.layers[len(self.layers)-1](x))
class Net_sta(nn.Module):
    def __init__(self,dof1,dof2):
        super(Net_sta, self).__init__()
        self.layer=nn.Linear(dof1,dof2)
    def forward(self, x):
        return torch.tanh(self.layer(x))
if __name__=="__main__":
    net_par=Net_par((3,4))
    net_sta=Net_sta(3,4)
    print(str(net_par.state_dict()))
    #OrderedDict() <------Why?!
    print(str(net_sta.state_dict()))
    #OrderedDict([('layer.weight', tensor([[...
    # ...]])), ('layer.bias', tensor([... ...]))])
You need to use nn.ModuleList() instead of a plain Python list:
class Net_par(nn.Module):
    ...
        self.layers = nn.ModuleList([])
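Why a plain list is invisible: nn.Module intercepts attribute assignment and registers values that are themselves modules (or parameters); state_dict then walks only those registered children. A list is not a module, so the layers inside it are never registered, whereas nn.ModuleList is a module that registers its items under the names "0", "1", and so on. The pure-Python sketch below mimics that idea; every class in it is hypothetical, and the real implementation also handles parameters, buffers and much more.

```python
class ToyModule:
    """Hypothetical sketch of how nn.Module discovers children via __setattr__."""
    def __init__(self):
        # bypass our own __setattr__ while creating the registries
        object.__setattr__(self, "_modules", {})
        object.__setattr__(self, "_params", {})

    def __setattr__(self, name, value):
        if isinstance(value, ToyModule):
            self._modules[name] = value        # registered: visible to state_dict
        object.__setattr__(self, name, value)  # stored as a normal attribute either way

    def state_dict(self, prefix=""):
        state = {prefix + k: v for k, v in self._params.items()}
        for name, child in self._modules.items():
            state.update(child.state_dict(prefix + name + "."))
        return state


class ToyLinear(ToyModule):
    def __init__(self, n_in, n_out):
        super().__init__()
        self._params["weight"] = [[0.0] * n_in for _ in range(n_out)]
        self._params["bias"] = [0.0] * n_out


class ToyModuleList(ToyModule):
    """A list-like container that is itself a module and registers its items."""
    def __init__(self, modules):
        super().__init__()
        for i, module in enumerate(modules):
            setattr(self, str(i), module)  # goes through ToyModule.__setattr__


class WithList(ToyModule):
    def __init__(self):
        super().__init__()
        self.layers = [ToyLinear(3, 4)]  # plain list: contents never registered


class WithModuleList(ToyModule):
    def __init__(self):
        super().__init__()
        self.layers = ToyModuleList([ToyLinear(3, 4)])  # registered


print(sorted(WithList().state_dict()))        # []
print(sorted(WithModuleList().state_dict()))  # ['layers.0.bias', 'layers.0.weight']
```

The same registration also drives model.parameters(), so with a plain list an optimizer built from model.parameters() would not see those layers either.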
While working with keras and tensorflow, I found the following lines of code confusing.
w_init = tf.random_normal_initializer()
self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units),
                                          dtype='float32'),
                     trainable=True)
Also, I have seen something like:
Dense(64, activation='relu')(x)
Therefore, if Dense(...) creates the object for me, how can I follow that with (x)?
Likewise for w_init above. How can I write something like this:
tf.random_normal_initializer()(shape=(input_dim, units), dtype='float32')
Is there such a thing in Python as "ClassName()" followed by "()" when creating an object such as a layer?
While I was looking into closures in Python, I found that a function can return another function. So is this what really happens in Keras?
Any help is much appreciated!!
These are two totally different ways to define models.
Keras
Keras works with the concept of layers. Each line defines a full layer of your network. What you are referring to specifically is Keras' functional API. The idea is to combine layers like this:
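from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model
inp = Input(shape=(28, 28, 1))
x = Conv2D(32, (6, 6), strides=(1, 1), activation='relu')(inp)  # 32 filters of size 6x6
# ... etc ...
x = Flatten()(x)
x = Dense(10, activation='softmax')(x)
model = Model(inputs=[inp], outputs=[x])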
This way you've created a full CNN in just a few lines. Note that you never had to specify the shapes of the weight tensors or the operations that are performed; these are inferred automatically by Keras.
Now, this just needs to be compiled through model.compile(...) and then you can train it through model.fit(...).
Tensorflow
TensorFlow, on the other hand, is a bit more low-level. This means that you have to define the variables and operations by hand. So, to write layers such as a convolution or a fully-connected layer, you'd have to do the following:
import tensorflow as tf  # TensorFlow 1.x API (placeholders and sessions)
# Input placeholders
x = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
y = tf.placeholder(tf.float32, shape=(None, 10))
# Convolution layer
W1 = tf.Variable(tf.truncated_normal([6, 6, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, tf.float32, [32]))
z1 = tf.nn.conv2d(x, W1, strides=[1, 1, 1, 1], padding='SAME') + b1
c1 = tf.nn.relu(z1)
# ... etc ...
# Flatten
flat = tf.reshape(p2, [-1, ...])  # need to calculate the ... by ourselves
# Dense
W3 = tf.Variable(tf.truncated_normal([..., 10], stddev=0.1))  # same size as before
b3 = tf.Variable(tf.constant(0.1, tf.float32, [10]))
fc1 = tf.nn.relu(tf.matmul(flat, W3) + b3)
Two things to note here: there is no explicit definition of a model, and the network has to be trained through a tf.Session, with a feed_dict feeding the data into the placeholders. If you're interested, you'll find several guides online.
Closing notes...
TensorFlow has a much friendlier and easier way to define and train models through eager execution, which will be default in TF 2.0! So the code you posted is in a sense the old way of doing things in tensorflow. It's worth taking a look into TF 2.0, which actually recommends doing things the keras way!
Edit (after comment by OP):
No, a layer is not a closure. A Keras layer is a class that implements a __call__ method, which makes its instances callable. It is implemented as a wrapper around the call method that users typically write.
You can take a look at the implementation here
Basically how this works is:
class MyClass:
    def __init__(self, param):
        self.p = param
    def call(self, x):
        print(x)
If you try to write c = MyClass(1)(3), you'll get a TypeError saying that the MyClass object is not callable. But if you write it like this:
class MyClass:
    def __init__(self, param):
        self.p = param
    def __call__(self, x):
        print(x)
It works now. Essentially keras does it like this:
class MyClass:
    def __init__(self, param):
        self.p = param
    def call(self, x):
        print(x)
    def __call__(self, x):
        return self.call(x)
This way, when you write your own layer, you only implement your own call method, and the __call__ method that wraps it is inherited from Keras' base Layer class.
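The inherited __call__ can also do work of its own. In Keras, for example, the base Layer's __call__ builds the layer's weights on the first call, once the input shape is known. The pure-Python sketch below imitates that lazy build and also shows why w_init(shape=...) works: an initializer is likewise an object configured at creation time and called later. All class names are hypothetical, and the real Keras build receives a full input shape rather than just the input size.

```python
import random


class ToyRandomNormal:
    """Hypothetical stand-in for an initializer object: configure now, call later."""
    def __init__(self, mean=0.0, stddev=0.05, seed=None):
        self.mean, self.stddev = mean, stddev
        self.rng = random.Random(seed)

    def __call__(self, shape):
        rows, cols = shape
        return [[self.rng.gauss(self.mean, self.stddev) for _ in range(cols)]
                for _ in range(rows)]


class ToyLayer:
    """Simplified base class: __call__ builds weights once, then delegates to call."""
    def __init__(self):
        self.built = False

    def build(self, input_dim):
        pass

    def __call__(self, x):
        if not self.built:  # first call: the input size is known only now
            self.build(len(x))
            self.built = True
        return self.call(x)


class ToyDense(ToyLayer):
    def __init__(self, units, initializer=None):
        super().__init__()
        self.units = units
        self.initializer = initializer or ToyRandomNormal(seed=0)

    def build(self, input_dim):
        # weights are created lazily with shape (input_dim, units)
        self.w = self.initializer((input_dim, self.units))

    def call(self, x):
        # x @ w for a single input vector
        return [sum(xi * self.w[i][j] for i, xi in enumerate(x))
                for j in range(self.units)]


layer = ToyDense(2)           # like Dense(2): only configuration so far
y = layer([1.0, 2.0, 3.0])    # like Dense(2)(x): builds a 3x2 weight matrix, then computes
print(len(layer.w), len(layer.w[0]), len(y))  # 3 2 2
```

This lazy build is one reason you never have to pass the input size to Dense in the functional API.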
Just from the syntax, I would say that Dense() returns a function (or, more accurately, a callable object). Similarly, w_init is a callable.
I'm reading through some PyTorch tutorials. Below is the definition of a residual block. However, in the forward method each of these functions takes only one argument (out), while in the __init__ method they are constructed with different numbers of arguments:
import torch.nn as nn
# Assumed helper (not shown in the question): 3x3 convolution with padding 1
def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
# Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out
Does anyone know how this works?
Is it a standard python class inheritance feature or is this specific to pytorch?
You define the layers, i.e. their parameters, in the __init__ function. In the forward function, you only pass in the data that needs to be processed with the predefined settings from __init__. The nn.whatever call builds a callable module with the settings you pass to it. This module can then be used in forward, and it only takes one argument.
You define different layers of your network architecture in the constructor of the class (__init__ function). Essentially, when you create an instance of different layers, you initialize them with your settings parameters.
For example, when you declare the first convolution layer, self.conv1, you give it the parameters required to initialize the layer. In the forward function, you simply call the layers with the input to get the corresponding output. For example, in out = self.conv2(out), you take the output of the previous layer and give it as input to the next layer, self.conv2.
Note that during initialization you tell the layer what kind/shape of input it will receive. For example, you tell the first convolution layer the number of input and output channels. In the forward method, you just need to pass the input; that's it.
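The same split can be shown without PyTorch. In this hypothetical pure-Python sketch, ToyLinear stands in for a layer such as conv3x3 (its weights are fixed to 1.0 instead of learned, for readability) and ToyResidualBlock follows the structure of the forward above. The constructor arguments fix the parameter shapes once, and each call takes only the data.

```python
class ToyLinear:
    """Hypothetical layer: settings go to __init__, data goes to the call."""
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        # parameters sized from the constructor arguments (all ones for clarity)
        self.weight = [[1.0] * in_features for _ in range(out_features)]
        self.bias = [0.0] * out_features

    def __call__(self, x):
        if len(x) != self.in_features:
            raise ValueError(f"expected {self.in_features} inputs, got {len(x)}")
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def relu(xs):
    return [max(0.0, v) for v in xs]


class ToyResidualBlock:
    def __init__(self, features):
        # configuration only: shapes are decided here, once
        self.fc1 = ToyLinear(features, features)
        self.fc2 = ToyLinear(features, features)

    def __call__(self, x):
        residual = x
        out = relu(self.fc1(x))  # only the data is passed at call time
        out = self.fc2(out)
        out = [o + r for o, r in zip(out, residual)]  # skip connection, like out += residual
        return relu(out)


block = ToyResidualBlock(2)
print(block([1.0, 1.0]))   # [5.0, 5.0]
print(block([1.0, -1.0]))  # [1.0, 0.0]
```

Passing an input of the wrong size fails inside the layer, which is the practical consequence of fixing the shapes in the constructor.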