Linear Regression in PyTorch

It's a simple regression problem. But no matter how much I try, I can't get the answer I want. I'm guessing the prediction for [4., 8.] should be 32 (4 * 8), but the code returns 25. Why is that?
This is my full source code:
import torch
import torch.nn as nn
import torch.optim as op

X = torch.FloatTensor([[1., 2.],[2., 4.],[3., 6.]])
Y = torch.FloatTensor([[2.],[8.],[18.]])

class TEST(nn.Module):
    def __init__(self):
        super(TEST,self).__init__()
        self.l1 = nn.Linear(2,1)

    def forward(self, input):
        x = self.l1(input)
        return x

epochs = 2000
lr = 0.001
model = TEST()
loss_func = nn.MSELoss()
optimizer = op.SGD(model.parameters(), lr=lr)

for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X)
    loss = loss_func(output, Y)
    loss.backward()
    optimizer.step()
    if epoch%10 == 0:
        print('loss[{}] : {}'.format(epoch, loss))

XX = torch.FloatTensor([[4., 8.]])
print(model(XX))
This is the output of the code:
loss[1920] : 0.8891088366508484
loss[1930] : 0.8890921473503113
loss[1940] : 0.8890781402587891
loss[1950] : 0.8890655636787415
loss[1960] : 0.8890505433082581
loss[1970] : 0.8890388011932373
loss[1980] : 0.889029324054718
loss[1990] : 0.8890181183815002
tensor([[25.3124]], grad_fn=<AddmmBackward>)

You are trying to approximate y = x1*x2, but you are using a single linear layer, i.e. a purely linear model. What you end up learning are weights a and b (plus a bias c) such that y ≈ a*x1 + b*x2 + c, and that family of functions cannot represent the mapping (x1, x2) -> x1*x2. In fact, for your three training points the best possible linear fit has an MSE of 8/9 ≈ 0.889, which is exactly the plateau you see in your loss log, and that fit predicts 76/3 ≈ 25.33 for the input [4., 8.], which matches the 25.3124 your code prints.
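To make the point concrete, here is a minimal sketch (not from the original answer; the names are arbitrary): if you hand the model the product x1*x2 as a third input feature, a single linear layer can represent the target exactly (y = 1 * x1*x2 + 0), so the training loss is no longer stuck at the 8/9 floor. The exact numbers you get still depend on the learning rate and the number of epochs.

import torch
import torch.nn as nn
import torch.optim as op

X = torch.FloatTensor([[1., 2.], [2., 4.], [3., 6.]])
Y = torch.FloatTensor([[2.], [8.], [18.]])

# Add the product x1*x2 as an extra column: [x1, x2, x1*x2]
X_feat = torch.cat([X, X[:, :1] * X[:, 1:]], dim=1)

model = nn.Linear(3, 1)
loss_func = nn.MSELoss()
optimizer = op.SGD(model.parameters(), lr=0.005)

for epoch in range(20000):
    optimizer.zero_grad()
    loss = loss_func(model(X_feat), Y)
    loss.backward()
    optimizer.step()

XX = torch.FloatTensor([[4., 8., 4. * 8.]])  # the test point needs the same extra feature
print(loss.item(), model(XX))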

Related

torch.Linear weight doesn't update

# import blah blah

# activation function
Linear = torch.nn.Linear(6,1)
sig = torch.nn.Sigmoid()

# optimizer
optim = torch.optim.SGD(Linear.parameters(), lr = 0.001)

# input
# x => (891,6)

# output
y = y.reshape(891,1)

# cost function
loss_f = torch.nn.BCELoss()

for iter in range(10):
    for i in range(1000):
        optim.zero_grad()
        forward = sig(Linear(x)) > 0.5
        forward = forward.to(torch.float32)
        forward.requires_grad = True
        loss = loss_f(forward, y)
        loss.backward()
        optim.step()
In this code, I want to update Linear.weight and Linear.bias, but it doesn't work.
I thought the optimizer might not know which weight and bias to update, so I tried changing
optim = torch.optim.SGD(Linear.parameters() ,lr = 0.001)
to
optim = torch.optim.SGD([Linear.weight, Linear.bias] ,lr = 0.001)
but it still didn't work.
// I want to explain my problem in more detail but my English level is low 🥲 sorry
BCELoss is defined as
loss(x, y) = -[ y * log(x) + (1 - y) * log(1 - x) ]
As you can see, the input x is expected to be a probability. Your use of sig(Linear(x)) > 0.5 is therefore wrong. Moreover, sig(Linear(x)) > 0.5 returns a boolean tensor with no autograd history, which breaks the computation graph. You do set requires_grad=True on it explicitly, but since the graph is already broken, backpropagation cannot reach the linear layer, so its weights are never learned/changed.
Correct sample usage:
import torch
import numpy as np

Linear = torch.nn.Linear(6,1)
sig = torch.nn.Sigmoid()

# optimizer
optim = torch.optim.SGD(Linear.parameters(), lr = 0.001)

# Sample data
x = torch.rand(891,6)
y = torch.rand(891,1)

loss_f = torch.nn.BCELoss()

for iter in range(10):
    optim.zero_grad()
    output = sig(Linear(x))
    loss = loss_f(output, y)
    loss.backward()
    optim.step()
    print(Linear.bias.item())
Output:
0.10717090964317322
0.10703673213720322
0.10690263658761978
0.10676861554384232
0.10663467645645142
0.10650081932544708
0.10636703670024872
0.10623333603143692
0.10609971731901169
0.10596618056297302
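As a side note (not part of the original answer): PyTorch also provides torch.nn.BCEWithLogitsLoss, which combines the sigmoid and the binary cross-entropy in a single, numerically more stable call, so the explicit Sigmoid module can be dropped. A minimal sketch of the same loop with it:

import torch

Linear = torch.nn.Linear(6, 1)
optim = torch.optim.SGD(Linear.parameters(), lr=0.001)
loss_f = torch.nn.BCEWithLogitsLoss()  # applies the sigmoid internally

x = torch.rand(891, 6)
y = torch.rand(891, 1)

for _ in range(10):
    optim.zero_grad()
    loss = loss_f(Linear(x), y)  # pass raw logits, not probabilities
    loss.backward()
    optim.step()
    print(Linear.bias.item())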

GRU loss decreased to 0.9 but not further, PyTorch

This is the code I am using for experimenting with a GRU:
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import *

class N(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(5,2)
        self.layers = 4
        self.gru = nn.GRU(2, 512, self.layers, batch_first=True)
        self.bat = nn.BatchNorm1d(4)
        self.bat1 = nn.BatchNorm1d(4)
        self.bat2 = nn.BatchNorm1d(4)
        self.fc = nn.Linear(512,100)
        self.fc1 = nn.Linear(100,100)
        self.fc2 = nn.Linear(100,5)
        self.s = nn.Softmax(dim=-1)

    def forward(self,x):
        h0 = torch.zeros(self.layers, x.size(0), 512).requires_grad_()
        x = self.embed(x)
        x,hn = self.gru(x,h0)
        x = self.bat(x)
        x = self.fc(x)
        x = nn.functional.relu(x)
        x = self.bat1(x)
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.bat2(x)
        x = self.fc2(x)
        softmaxed = self.s(x)
        return softmaxed

inp = torch.tensor([[4,3,2,1],[2,3,4,1],[4,1,2,3],[1,2,3,4]])
out = torch.tensor([[3,2,1,4],[3,2,4,1],[1,2,3,4],[2,3,4,1]])
k = 0
n = N()
opt = torch.optim.Adam(n.parameters(),lr=0.0001)

while k<10000:
    print(inp.shape)
    o = n(inp)
    o = o.view(-1, o.size(-1))
    out = out.view(-1)
    loss = nn.functional.cross_entropy(o.view(-1,o.size(-1)),out.view(-1)-1)
    acc = ((torch.argmax(o, dim=1) == (out -1)).sum().item() / out.size(0))
    if k==10000:
        print(torch.argmax(o, dim=1))
        print(out-1)
        exit()
    print(loss,acc)
    loss.backward()
    opt.step()
    opt.zero_grad()
    k+=1

print(o[0])
Truncated output:
torch.Size([4, 4])
tensor(0.9593, grad_fn=<NllLossBackward>) 0.9375
torch.Size([4, 4])
tensor(0.9593, grad_fn=<NllLossBackward>) 0.9375
tensor([4.8500e-01, 9.7813e-06, 5.1498e-01, 6.2428e-06, 7.5929e-06],
grad_fn=<SelectBackward>)
The loss is 0.9593 and the accuracy reaches 0.9375. For such simple input data, why is the GRU loss this large? Is there anything wrong with this code? I used cross_entropy as the loss function and Adam as the optimizer. The learning rate is 0.001; I tried multiple learning rates, but all gave the same final result. I added batch normalization, which sped up training, but the loss and accuracy stayed the same. Why doesn't the loss decrease to 0.2 or so?
I think it's because you are using the cross-entropy loss function, which in PyTorch combines log-softmax and negative log-likelihood. Since your model already applies a softmax before returning the output, you actually end up calculating the negative log-likelihood of a softmax of a softmax. Try removing the final softmax from your model.
PyTorch documentation for cross entropy loss: https://pytorch.org/docs/stable/nn.functional.html#cross-entropy
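A minimal sketch of that change (assuming the rest of the model and the training loop stay the same): return the raw logits from forward and let cross_entropy apply log-softmax internally; apply softmax only when you need probabilities at inference time.

# Hypothetical modification of the forward pass: return raw logits, no softmax.
def forward(self, x):
    h0 = torch.zeros(self.layers, x.size(0), 512)
    x = self.embed(x)
    x, hn = self.gru(x, h0)
    x = self.bat(x)
    x = nn.functional.relu(self.fc(x))
    x = self.bat1(x)
    x = nn.functional.relu(self.fc1(x))
    x = self.bat2(x)
    return self.fc2(x)  # logits

# cross_entropy then applies log-softmax + NLL itself:
# loss = nn.functional.cross_entropy(o.view(-1, o.size(-1)), out.view(-1) - 1)
# For probabilities at inference time: probs = torch.softmax(logits, dim=-1)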

Loss going to NaN after a few iterations

In my model, the input is graph data in the form of an edge index and node features. After a few iterations of training on the graph data, the loss (EDIT: which is a combination of an MSE loss and a negated loss term, i.e. L1 + (-L2)) becomes NaN. Both L1 and -L2 become NaN after around 40 iterations.
The learning rate is 0.00001. I also checked for invalid input data, but found none.
from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
import networkx as nx
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data

class Model(nn.Module):
    def __init__(self, nin, nhid1, nout, inp_l, hid_l, out_l=1):
        super(Model, self).__init__()
        self.g1 = GCNConv(in_channels= nin, out_channels= nhid1)
        self.g2 = GCNConv(in_channels= nhid1, out_channels= nout)
        self.dropout = 0.5
        self.lay1 = nn.Linear(inp_l ,hid_l)
        self.lay2 = nn.Linear(hid_l ,out_l)

    def forward(self, x, adj):
        x = F.relu(self.g1(x, adj))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.g2(x, adj)
        x = self.lay1(x)
        x = F.relu(x)
        x = self.lay2(x)
        x = F.relu(x)
        return x
The inputs to the model:
x (Tensor, optional) – node feature matrix with shape [num_nodes, num_node_features]
edge_index (LongTensor, optional) – graph connectivity in COO format with shape [2, num_edges]
Here num_nodes = 1000; num_node_features = 1; num_edges = 5000.
GCNConv is a graph embedding layer: it takes the edge list and the node features and returns a [num_nodes, dim] matrix.
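For reference, a minimal shape check of that layer (a sketch, not from the original question; assumes torch_geometric is installed):

import torch
from torch_geometric.nn import GCNConv

conv = GCNConv(in_channels=1, out_channels=128)
x = torch.rand(1000, 1)                          # [num_nodes, num_node_features]
edge_index = torch.randint(0, 1000, (2, 5000))   # [2, num_edges] in COO format
out = conv(x, edge_index)
print(out.shape)                                 # torch.Size([1000, 128])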
EDIT 2: Added how the loss is calculated
def train_model(epoch):
    model = Model(nin = 1, nhid1=128, nout=128, inp_l=128, hid_l=64, out_l=1).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.00001)
    model.train()
    t = time.time()
    optimizer.zero_grad()
    Y = model(features, adjacency_list)
    Y1 = func(Y)  # Y1 values are calculated from Y by passing it through a function func
                  # to obtain a vector of the same size as Y
    loss1 = ((Y1-Y)**2).mean()  # MSE loss function
    loss2 = -Y.abs().mean()     # this loss is implemented to prevent Y values going to 0; notice the "-" sign
    loss_train = loss1 + loss2
    loss_train.backward(retain_graph=True)
    nn.utils.clip_grad_norm_(model.parameters(), 0.5)
    optimizer.step()
    if epoch%20==0:
        print("MSE loss = ", loss1, "\t", "Mean Loss = ", loss2)
        print('Epoch: {:04d}'.format(epoch+1),
              'loss_train: {:.4f}'.format(loss_train.item()),
              'time: {:.4f}s'.format(time.time() - t))
        print("\n\n")
    return Y

Why does regularization in PyTorch and scratch code not match, and what is the formula used for regularization in PyTorch?

I have been trying to do L2 regularization on a binary classification model in PyTorch, but the results from PyTorch and from my scratch code do not match.
Pytorch code:
class LogisticRegression(nn.Module):
    def __init__(self,n_input_features):
        super(LogisticRegression,self).__init__()
        self.linear=nn.Linear(4,1)
        self.linear.weight.data.fill_(0.0)
        self.linear.bias.data.fill_(0.0)

    def forward(self,x):
        y_predicted=torch.sigmoid(self.linear(x))
        return y_predicted

model=LogisticRegression(4)
criterion=nn.BCELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=0.05,weight_decay=0.1)
dataset=Data()
train_data=DataLoader(dataset=dataset,batch_size=1096,shuffle=False)
num_epochs=1000

for epoch in range(num_epochs):
    for x,y in train_data:
        y_pred=model(x)
        loss=criterion(y_pred,y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
Scratch Code:
def sigmoid(z):
    s = 1/(1+ np.exp(-z))
    return s

def yinfer(X, beta):
    return sigmoid(beta[0] + np.dot(X,beta[1:]))

def cost(X, Y, beta, lam):
    sum = 0
    sum1 = 0
    n = len(beta)
    m = len(Y)
    for i in range(m):
        sum = sum + Y[i]*(np.log( yinfer(X[i],beta)))+ (1 -Y[i])*np.log(1-yinfer(X[i],beta))
    for i in range(0, n):
        sum1 = sum1 + beta[i]**2
    return (-sum + (lam/2) * sum1)/(1.0*m)

def pred(X,beta):
    if ( yinfer(X, beta) > 0.5):
        ypred = 1
    else :
        ypred = 0
    return ypred

beta = np.zeros(5)
iterations = 1000
arr_cost = np.zeros((iterations,4))
print(beta)
n = len(Y_train)

for i in range(iterations):
    Y_prediction_train=np.zeros(len(Y_train))
    Y_prediction_test=np.zeros(len(Y_test))
    for l in range(len(Y_train)):
        Y_prediction_train[l]=pred(X[l,:],beta)
    for l in range(len(Y_test)):
        Y_prediction_test[l]=pred(X_test[l,:],beta)
    train_acc = format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100)
    test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
    arr_cost[i,:] = [i,cost(X,Y_train,beta,lam),train_acc,test_acc]
    temp_beta = np.zeros(len(beta))

    ''' main code from below '''
    for j in range(n):
        temp_beta[0] = temp_beta[0] + yinfer(X[j,:], beta) - Y_train[j]
        temp_beta[1:] = temp_beta[1:] + (yinfer(X[j,:], beta) - Y_train[j])*X[j,:]
    for k in range(0, len(beta)):
        temp_beta[k] = temp_beta[k] + lam * beta[k]  # regularization here
    temp_beta = temp_beta / (1.0*n)
    beta = beta - alpha*temp_beta
[Plots of the loss, the training accuracy, and the testing accuracy were attached here.]
Can someone please tell me why this is happening? The L2 regularization value (lam) is 0.1.
Great question. I dug through the PyTorch documentation and found the answer, and it is quite subtle. Basically there are two ways to apply L2 regularization: in one, the regularization term added to the gradient is weight_decay * weight, and in the other it is (weight_decay / batch_size) * weight (for a summary, jump to the last section).
PyTorch uses the first type, in which the regularization factor is not divided by the batch size.
Here's a sample code which demonstrates that:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim

class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.linear.weight.data.fill_(1.0)
        self.linear.bias.data.fill_(1.0)

    def forward(self, x):
        return self.linear(x)

model = model()
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1.0)

input = torch.tensor([[2], [4]], dtype=torch.float32)
target = torch.tensor([[7], [11]], dtype=torch.float32)

optimizer.zero_grad()
pred = model(input)
loss = F.mse_loss(pred, target)

print(f'input: {input[0].data, input[1].data}')
print(f'prediction: {pred[0].data, pred[1].data}')
print(f'target: {target[0].data, target[1].data}')
print(f'\nMSEloss: {loss.item()}\n')

loss.backward()

print('Before updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data, gradient]: {model.linear.weight.data, model.linear.weight.grad}')
print(f'bias [data, gradient]: {model.linear.bias.data, model.linear.bias.grad}')
print('--------------------------------------------------------------------------')

optimizer.step()

print('After updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data]: {model.linear.weight.data}')
print(f'bias [data]: {model.linear.bias.data}')
print('--------------------------------------------------------------------------')
which outputs:
input: (tensor([2.]), tensor([4.]))
prediction: (tensor([3.]), tensor([5.]))
target: (tensor([7.]), tensor([11.]))
MSEloss: 26.0
Before updation:
--------------------------------------------------------------------------
weight [data, gradient]: (tensor([[1.]]), tensor([[-32.]]))
bias [data, gradient]: (tensor([1.]), tensor([-10.]))
--------------------------------------------------------------------------
After updation:
--------------------------------------------------------------------------
weight [data]: tensor([[4.1000]])
bias [data]: tensor([1.9000])
--------------------------------------------------------------------------
Here m = batch size = 2, lr = alpha = 0.1, and lambda = weight_decay = 1.
Now consider the weight tensor, which has value = 1 and grad = -32.
Case 1 (type 1 regularization):
weight = weight - lr * (grad + weight_decay * weight)
weight = 1 - 0.1 * (-32 + 1 * 1)
weight = 4.1
Case 2 (type 2 regularization):
weight = weight - lr * (grad + (weight_decay / batch_size) * weight)
weight = 1 - 0.1 * (-32 + (1/2) * 1)
weight = 4.15
From the output we can see that the updated weight is 4.1000, which confirms that PyTorch uses type 1 regularization.
So, finally, your code is following type 2 regularization. To match PyTorch, just change the last lines to this:
    # for k in range(0, len(beta)):
    #     temp_beta[k] = temp_beta[k] + lam * beta[k]  # regularization removed from here
    temp_beta = temp_beta / (1.0*n)
    beta = beta - alpha*(temp_beta + lam * beta)
Also, PyTorch loss functions do not include the regularization term (it is implemented inside the optimizers), so remove the regularization term from your custom cost function as well.
In summary:
PyTorch uses this regularization update: weight = weight - lr * (grad + weight_decay * weight).
Regularization is implemented inside the optimizers (the weight_decay parameter).
PyTorch loss functions do not include the regularization term.
The bias is also regularized when weight_decay is used.
To use regularization, try:
torch.optim.<optimizer_name>(model.parameters(), lr=lr, weight_decay=lam)
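As a side note (not from the original answer): because weight_decay also regularizes the bias, you can exclude the bias by passing parameter groups to the optimizer. A minimal sketch:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)

# Apply weight decay to the weight only, not to the bias.
optimizer = torch.optim.SGD(
    [
        {'params': [model.weight], 'weight_decay': 0.1},
        {'params': [model.bias], 'weight_decay': 0.0},
    ],
    lr=0.05,
)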

Function with torch.mm showing error while using torch.optim

I am kind of a newbie with PyTorch, so please forgive me if the question is basic. I am trying to minimize a function using PyTorch's optim. The function includes matrix multiplication. The details are given below.
First I have these tensors:
Xv.requires_grad_()
XT.requires_grad_()
My Objective Function:
def errorFun(x):
    ax = x[0]
    ay = x[1]
    x0 = x[2]
    y0 = x[3]
    A = torch.tensor([[ax, 0., x0], [0., ay, y0], [0., 0., 1.]], dtype=torch.float64)
    B = torch.tensor([[b11, b12, b13], [b21, b22, b23], [b31, b32, b33]], dtype=torch.float64)
    H = torch.mm(A, B)
    Ps = torch.mm(H, X)
    px = Ps[0,:]
    py = Ps[1,:]
    PX = torch.stack([px, py], dim=0)
    PX.requires_grad_()
    return mseloss(PX, XT)
I am minimizing it:
for ii in range(n_optim_steps):
    optimizer.zero_grad()
    loss = errorFun(params)
    #print('Step # {}, loss: {}'.format(ii, loss.item()))
    loss.backward()
    # Access gradient if necessary
    grad = params.grad.data
    optimizer.step()
But I am getting this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-54-84b874448a25> in <module>()
77 loss.backward()
78 # Access gradient if necessary
---> 79 grad = params.grad.data
80 optimizer.step()
81
AttributeError: 'NoneType' object has no attribute 'data'
Thanks in advance.
I am not sure I understand your task, but it seems you are not using PyTorch the way it was designed to be used.
There are five things you have to have:
Data for training;
A task you are interested in performing;
A parameterized model (e.g. a neural network);
A cost function to be minimized;
An optimizer.
Consider this simple example:
Data: vectors containing random numbers;
Task: sum the numbers of the vector;
Model: a linear regressor (i.e. a 1-layer neural net);
Cost function: mean squared error;
Optimizer: stochastic gradient descent.
The implementation:
import torch
import torch.nn as nn
from torch.optim import SGD

input_size = 5
batch_size = 32  # any reasonable batch size works here

model = nn.Linear(input_size, 1)
opt = SGD(model.parameters(), lr=0.01)
loss_func = nn.MSELoss()

for _ in range(100):
    data = torch.rand(batch_size, input_size)
    target = data.sum(dim=1, keepdim=True)  # keep shape (batch_size, 1) to match the prediction
    opt.zero_grad()
    pred = model(data)
    loss = loss_func(pred, target)
    loss.backward()
    opt.step()
