I am not able to understand the line sample_losses = self.forward(output, y) defined under the class Loss.
Which "forward" function is it calling, given that a forward method is already defined for each of the three classes, i.e. Layer_Dense, Activation_ReLU and Activation_Softmax?
import numpy as np

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        print(self.weights)

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)

class Activation_Softmax:
    def forward(self, inputs):
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

class Loss:
    def calculate(self, output, y):
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)
        return data_loss
In PyTorch, calling an instance of a module invokes __call__, which is essentially self.forward() plus any registered hooks; that is how a method of the class gets called directly when the instance itself is called. This mechanism is inherited from nn.Module. For an illustration, see:
https://gist.github.com/nathanhubens/5a9fc090dcfbf03759068ae0fc3df1c9
Or refer to the source code:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L485
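In the from-scratch code in the question, the same idea applies through plain Python method resolution: self.forward in Loss.calculate resolves to whatever forward method a concrete subclass defines. A minimal sketch, assuming Loss is intended to be used as a base class (the subclass Loss_MeanSquaredError below is only an illustration):

import numpy as np

class Loss:
    # Base class: calculate() relies on a forward() provided by a subclass.
    def calculate(self, output, y):
        sample_losses = self.forward(output, y)  # resolved on the concrete subclass
        return np.mean(sample_losses)

class Loss_MeanSquaredError(Loss):
    # Hypothetical subclass: supplies the forward() that calculate() calls.
    def forward(self, y_pred, y_true):
        return np.mean((y_true - y_pred) ** 2, axis=-1)

loss_fn = Loss_MeanSquaredError()
print(loss_fn.calculate(np.array([[0.1, 0.2]]), np.array([[0.0, 0.0]])))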
# MSE Custom Class
import torch
import torch.nn as nn

class MyMSE(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, y_hat, y):
        # note: torch.mean already averages over all elements, so the extra
        # 1 / torch.numel(y) factor divides by the element count twice
        MSE_function = torch.mean((1 / torch.numel(y)) * ((y_hat - y) ** 2))
        return MSE_function

# TEST Custom CLASS
y = torch.tensor([[1, 2, 3], [4, 5, 6]])
y_hat = torch.tensor([[4, 5, 6], [7, 8, 9]])
lossfun = MyMSE()
loss = lossfun(y_hat, y)
print(loss)
I want to write a custom class for MSELoss, but it doesn't work for backward!
loss.backward()
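One possible cause, assuming the usual autograd error here (the question does not show the traceback): backward() needs a floating-point tensor that is connected to the autograd graph, but the integer test tensors above are created without requires_grad, so loss has no grad_fn. A sketch of a test that does support backward, reusing the MyMSE class from above:

import torch

# Floating-point inputs that participate in autograd (hypothetical test data).
y = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
y_hat = torch.tensor([[4., 5., 6.], [7., 8., 9.]], requires_grad=True)

lossfun = MyMSE()
loss = lossfun(y_hat, y)
loss.backward()        # now populates y_hat.grad
print(y_hat.grad)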
I was learning PyTorch and encountered a case where I could not understand what's happening. Here is a class called MLP, with an __init__ function and a forward function. When I pass X as a parameter to the MLP instance net, without using net.forward(X), the forward function seems to be called automatically. Why is this the case?
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()  # nn.Module's params
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

X = torch.rand(2, 20)
net = MLP()
net(X)
"""
output of net(X)
tensor([[ 0.0614, -0.0143, -0.0546, 0.1173, -0.1838, -0.1843, 0.0861, 0.1152,
0.0990, 0.1818],
[-0.0483, -0.0196, 0.0720, 0.1243, 0.0261, -0.2727, -0.0480, 0.1391,
-0.0685, 0.2025]], grad_fn=<AddmmBackward0>)
"""
My initial guess was that forward is the only function in MLP that receives a parameter, but after I added another function that takes the same parameter, calling net(X) still seems to choose the forward function:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()  # nn.Module's params
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward2(self, X):
        print("hello")
        return self.out((self.hidden(X)))

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
net(X)
net.forward(X)
net.forward2(X)
Then I got:
>>> net.forward(X)
tensor([[-0.1273, -0.0338, -0.1412, -0.1321, -0.1213, 0.0589, 0.0752, 0.0066,
-0.0057, -0.1374],
[-0.1660, -0.0044, -0.1765, -0.0451, -0.0386, 0.0824, 0.0486, -0.1293,
0.0511, -0.1285]], grad_fn=<AddmmBackward0>)
>>> net.forward2(X)
hello
tensor([[-0.2027, -0.2304, -0.3597, -0.3741, -0.5000, -0.2698, 0.2464, 0.1709,
-0.2262, -0.1462],
[-0.1168, -0.0417, -0.3584, -0.3133, -0.2366, -0.1521, 0.2428, 0.0043,
-0.1296, -0.2021]], grad_fn=<AddmmBackward0>)
>>> net(X)
tensor([[-0.1273, -0.0338, -0.1412, -0.1321, -0.1213, 0.0589, 0.0752, 0.0066,
-0.0057, -0.1374],
[-0.1660, -0.0044, -0.1765, -0.0451, -0.0386, 0.0824, 0.0486, -0.1293,
0.0511, -0.1285]], grad_fn=<AddmmBackward0>)
What did I miss? I'd really appreciate any help!
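What net(X) actually triggers is nn.Module.__call__, which runs any registered hooks and then dispatches to the method named forward, so forward2 is never considered. A rough plain-Python sketch of that dispatch, a simplification rather than nn.Module's real code:

class SimplifiedModule:
    """Illustration only: the real nn.Module.__call__ also handles hooks, JIT, etc."""
    def __call__(self, *args, **kwargs):
        # ... forward pre-hooks would run here ...
        result = self.forward(*args, **kwargs)   # always dispatches to `forward` by name
        # ... forward post-hooks would run here ...
        return result

class Toy(SimplifiedModule):
    def forward(self, x):
        return x * 2

    def forward2(self, x):   # never called by __call__
        return x * 3

t = Toy()
print(t(10))   # 20 -> __call__ routed to forward, not forward2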
I am a beginner with PyTorch, and I want to build a fully connected model.
The model is very simple, like:
def forward(self, x):
    x = self.relu(self.fc1(x))
    x = self.relu(self.fc2(x))
    return self.fc3(x)
But when I want to add some layers or adjust the hidden layers, I find I have to write lots of redundant code like:
def forward(self, x):
    x = self.relu(self.fc1(x))
    x = self.relu(self.fc2(x))
    x = self.relu(self.fc3(x))
    x = self.relu(self.fc4(x))
    x = self.relu(self.fc5(x))
    ...
    return self.fcn(x)
Besides, if I want to change one layer's feature count, I also have to change the adjacent layers.
So I want to know a more graceful way (maybe more Pythonic and easier for adjusting hyperparameters).
I tried to write code like:
def __init__(self):
    super().__init__()
    self.hidden_num = [2881, 5500, 2048, 20]  # I just want to change this list to try new structures!
    self.fc = [nn.Linear(self.hidden_num[i], self.hidden_num[i + 1]).to(DEVICE)
               for i in range(len(self.hidden_num) - 1)]
    self.relu = nn.ReLU()

def forward(self, x):
    for i in range(len(self.fc)):
        x = self.fc[i](x)
        if i != (len(self.fc) - 1):
            x = self.relu(x)
    return x
But I found this way doesn't work; the model can't be built.
So could anyone tell me how to define a fully connected model like the one above?
(That way I can adjust the model layers just by adjusting the list named hidden_num.)
If you want to keep the same approach, you can use nn.ModuleList to properly register all linear layers inside the module's __init__:
class Model(nn.Module):
    def __init__(self, hidden_num=[2881, 5500, 2048, 20]):
        super().__init__()
        self.fc = nn.ModuleList([
            nn.Linear(hidden_num[i], hidden_num[i + 1])
            for i in range(len(hidden_num) - 1)])

    def forward(self, x):
        for i, m in enumerate(self.fc):
            x = m(x)
            if i != len(self.fc) - 1:
                x = torch.relu(x)
        return x
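Because the layers now live in an nn.ModuleList, they are registered as submodules and show up in parameters() and the state_dict. A quick check, assuming the Model class just defined:

import torch

model = Model(hidden_num=[2881, 5500, 2048, 20])
x = torch.rand(4, 2881)
print(model(x).shape)                  # torch.Size([4, 20])
print(len(list(model.parameters())))   # 6 tensors: weight + bias for each of the 3 Linear layers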
However, you may want to handle all of this logic inside the __init__ function once. One alternative is to use nn.Sequential:
class Model(nn.Module):
    def __init__(self, hidden_num=[2881, 5500, 2048, 20]):
        super().__init__()
        fc = []
        for i in range(len(hidden_num) - 1):
            fc.append(nn.Linear(hidden_num[i], hidden_num[i + 1]))
            if i != len(hidden_num) - 2:  # no ReLU after the last layer
                fc.append(nn.ReLU())
        self.fc = nn.Sequential(*fc)

    def forward(self, x):
        x = self.fc(x)
        return x
Ideally, you would inherit from nn.Sequential directly to avoid re-writing the forward function, which is unnecessary in this case:
class Model(nn.Sequential):
    def __init__(self, hidden_num=[2881, 5500, 2048, 20]):
        fc = []
        for i in range(len(hidden_num) - 1):
            fc.append(nn.Linear(hidden_num[i], hidden_num[i + 1]))
            if i != len(hidden_num) - 2:  # no ReLU after the last layer
                fc.append(nn.ReLU())
        super().__init__(*fc)
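Any of these versions is then driven purely by the hidden_num list. A quick sanity check, assuming the last Model definition above:

import torch

model = Model(hidden_num=[2881, 5500, 2048, 20])
x = torch.rand(8, 2881)        # batch of 8 samples with 2881 features
print(model(x).shape)          # torch.Size([8, 20])
print(model)                   # shows the Linear/ReLU stack built from hidden_num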
I have a model that needs to implement self-attention and this is how I wrote my code:
class SelfAttention(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.multihead_attn = torch.nn.MultiheadAttention(args)

    def forward(self, x):
        return self.multihead_attn.forward(x, x, x)

class ActualModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.inp_layer = nn.Linear(arg1, arg2)
        self.self_attention = SelfAttention(some_args)
        self.out_layer = nn.Linear(arg2, 1)

    def forward(self, x):
        x = self.inp_layer(x)
        x = self.self_attention(x)
        x = self.out_layer(x)
        return x
After loading a checkpoint of ActualModel (for continued training or at prediction time), do I also need to load a saved checkpoint of the SelfAttention class in ActualModel.__init__?
If I create an instance of the SelfAttention class, would the trained weights corresponding to SelfAttention.multihead_attn be loaded when I do torch.load('actual_model.pth'), or would they be reinitialized?
In other words, is this necessary?
class ActualModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.inp_layer = nn.Linear(arg1, arg2)
        self.self_attention = SelfAttention(some_args)
        self.out_layer = nn.Linear(arg2, 1)

    def pred_or_continue_train(self):
        self.self_attention = torch.load('self_attention.pth')

actual_model = torch.load('actual_model.pth')
actual_model.pred_or_continue_train()
actual_model.eval()
In other words, is this necessary?
In short, No.
The SelfAttention module will be loaded automatically, as long as it has been registered as an nn.Module submodule, an nn.Parameter, or a manually registered buffer.
A quick example:
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, fin, n_h):
        super(SelfAttention, self).__init__()
        self.multihead_attn = torch.nn.MultiheadAttention(fin, n_h)

    def forward(self, x):
        return self.multihead_attn.forward(x, x, x)

class ActualModel(nn.Module):
    def __init__(self):
        super(ActualModel, self).__init__()
        self.inp_layer = nn.Linear(10, 20)
        self.self_attention = SelfAttention(20, 1)
        self.out_layer = nn.Linear(20, 1)

    def forward(self, x):
        x = self.inp_layer(x)
        x = self.self_attention(x)
        x = self.out_layer(x)
        return x

m = ActualModel()
for k, v in m.named_parameters():
    print(k)
You will get the following, where self_attention is successfully registered:
inp_layer.weight
inp_layer.bias
self_attention.multihead_attn.in_proj_weight
self_attention.multihead_attn.in_proj_bias
self_attention.multihead_attn.out_proj.weight
self_attention.multihead_attn.out_proj.bias
out_layer.weight
out_layer.bias
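So saving and loading the parent ActualModel is enough; the nested SelfAttention weights travel with it. A minimal sketch of the usual state_dict workflow (the file name is just an example):

# Save: the state_dict already contains the self_attention.multihead_attn.* entries.
torch.save(m.state_dict(), 'actual_model.pth')

# Load: rebuild the model and restore all weights, including the nested module's.
m2 = ActualModel()
m2.load_state_dict(torch.load('actual_model.pth'))
m2.eval()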
I am trying to follow an implementation of an attention decoder
from keras.layers.recurrent import Recurrent
...

class AttentionDecoder(Recurrent):
    ...

    #######################################################
    # The functionality of init, build and call is clear
    #######################################################

    def __init__(self, units, output_dim,
                 activation='tanh',
                 ...

    def build(self, input_shape):
        ...

    def call(self, x):
        self.x_seq = x
        ...
        return super(AttentionDecoder, self).call(x)

    ##################################################################
    # What is the purpose of 'get_initial_state' and 'step' functions
    # Do these functions override the Recurrent base class functions?
    ##################################################################

    def get_initial_state(self, inputs):
        # apply the matrix on the first time step to get the initial s0.
        s0 = activations.tanh(K.dot(inputs[:, 0], self.W_s))

        # from keras.layers.recurrent to initialize a vector of (batchsize,
        # output_dim)
        y0 = K.zeros_like(inputs)      # (samples, timesteps, input_dims)
        y0 = K.sum(y0, axis=(1, 2))    # (samples, )
        y0 = K.expand_dims(y0)         # (samples, 1)
        y0 = K.tile(y0, [1, self.output_dim])

        return [y0, s0]

    def step(self, x, states):
        ytm, stm = states
        ...
        return yt, [yt, st]
The AttentionDecoder class inherits from Recurrent, the abstract base class for recurrent layers (keras.layers.recurrent.Recurrent).
How do the get_initial_state and step functions work within the class (who calls them, when, etc.)? If these functions relate to the base class Recurrent, where can I find the relevant documentation?
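For context, the general pattern in Keras-style recurrent layers is that the base class's call() obtains the initial states from get_initial_state() and then unrolls the sequence, calling step() once per time step with the running states (in Keras the unrolling is done by the backend's rnn routine). The following is a plain-Python sketch of that control flow, not Keras's actual implementation:

class RecurrentSketch:
    """Simplified stand-in for the Recurrent base class (illustration only)."""
    def call(self, inputs):
        # inputs: a list of timesteps; the real class works on tensors via K.rnn
        states = self.get_initial_state(inputs)    # subclass hook
        outputs = []
        for x_t in inputs:
            y_t, states = self.step(x_t, states)   # subclass hook, called per time step
            outputs.append(y_t)
        return outputs

class CumulativeSum(RecurrentSketch):
    # Toy subclass: overrides the same two hooks that AttentionDecoder overrides.
    def get_initial_state(self, inputs):
        return [0]

    def step(self, x, states):
        (total,) = states
        total = total + x
        return total, [total]

print(CumulativeSum().call([1, 2, 3]))   # [1, 3, 6]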