I'm having problems trying to practice logistic regression in PyTorch.
I want to use the CIFAR10 dataset, but I can't get the training loop to work: when the linear layer is executed I receive a NotImplementedError.
I probably have more than one error that I am not seeing because, as I said, I am still learning.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms
import torch.nn.functional as F
from tqdm import tqdm
import torch.nn as nn
#IMPORTING DATA
datatrain = datasets.CIFAR10(root="./datasets",
                             train=True,
                             transform=transforms.ToTensor(),
                             download=True)
datatest = datasets.CIFAR10(root="./datasets",
                            train=False,
                            transform=transforms.ToTensor(),
                            download=True)
print(f'Number of CIFAR train examples {len(datatrain)}')
print(f'Number of CIFAR test examples {len(datatest)}')
train_loader = torch.utils.data.DataLoader(datatrain, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(datatest, batch_size=100, shuffle=False)
data_train_iter = iter(train_loader)
images, labels = next(data_train_iter)
print("Shape of the minibatch of images: {}".format(images.shape))
print("Shape of the minibatch of labels: {}".format(labels.shape))
#n_samples, n_features = images.shape, labels.shape
#print(n_samples, n_features)
#MODEL
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = nn.Linear(3072, 10)

    def foward(self, x):
        return self.linear(x)
# Initialize model
model = Model()
# Criterion
criterion = nn.CrossEntropyLoss()
# Optimizer
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(),
                            lr=learning_rate)
# Iterate through train set minibatches
for images, labels in tqdm(train_loader):
    # Zero out the gradients
    optimizer.zero_grad()
    # Forward pass
    x = images.view(-1, 32*32*3)
    y = model(x)
    loss = criterion(y, labels)
    loss.backward()
    optimizer.step()
## Testing
correct = 0
total = len(datatest)
with torch.no_grad():
    # Iterate through test set minibatches
    for images, labels in tqdm(test_loader):
        # Forward pass
        x = images.view(-1, 32*32*3)
        y = model(x)
        predictions = torch.argmax(y, dim=1)
        correct += torch.sum((predictions == labels).float())
print('Test accuracy: {}'.format(correct/total))
Thanks!
It is due to a spelling error of forward in your Model class. You have written it as foward. Please correct the spelling in
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = nn.Linear(3072, 10)

    def forward(self, x):  # You had written it as `foward`
        return self.linear(x)
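For context on why the symptom is a NotImplementedError: calling model(x) goes through nn.Module.__call__, which dispatches to self.forward. The base nn.Module only defines forward as a stub that raises NotImplementedError, and a method named foward does not override that stub. A minimal reproduction:
import torch
import torch.nn as nn

class Broken(nn.Module):
    def foward(self, x):  # typo: does NOT override nn.Module.forward
        return x

Broken()(torch.zeros(1))  # raises NotImplementedError via the base forward stub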
I am trying to make an MLP classifier in PyTorch. The error is produced by the code in the final chunk. I'm not sure why NumPy is even involved with this; can someone please point me in the right direction?
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision.datasets import ImageFolder

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = ImageFolder(data_dir, transform=transforms.Compose([transforms.Resize((224,224)), transforms.ToTensor()]))
trainloader = torch.utils.data.DataLoader(data, batch_size=600, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(data, batch_size=150, shuffle=True, num_workers=2)

dataiter = iter(trainloader)
x_train, y_train = next(dataiter)
x_train = x_train.view(600, -1).to(device)
y_train = y_train.to(device)
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(150528, 9408)
        self.layer2 = torch.nn.Linear(9408, 3)

    def forward(self, x):
        # here we define the (forward) computational graph,
        # in terms of the tensors, and elt-wise non-linearities
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x
def train_show(network, data, targ, lossFunc, optimiser, epochs):
    lossHistory = []  # just to show a plot later...
    accuHistory = []

    for t in range(epochs):
        optimiser.zero_grad()      # gradients accumulate by default, so don't forget to do this
        y = network.forward(data)  # the forward pass
        loss = lossFunc(y, targ)   # recompute the loss
        loss.backward()            # runs autograd, to get the gradients needed by optimiser
        optimiser.step()           # take a step

        # just housekeeping and reporting
        accuracy = torch.mean((torch.argmax(y, dim=1) == targ).float())
        lossHistory.append(loss.detach().item())
        accuHistory.append(accuracy.detach())

    plt.figure(figsize=(10,5))
    plt.subplot(1,2,1)
    plt.plot(lossHistory, 'r'); plt.title("loss"); plt.xlabel("epochs")
    plt.subplot(1,2,2)
    plt.plot(accuHistory, 'b'); plt.title("accuracy")
net = Net().to(device)
lossFunction = torch.nn.CrossEntropyLoss()
optimiser = torch.optim.SGD(net.parameters(), lr=0.01)
train_show(net, x_train, y_train, lossFunction, optimiser, 50)
The plotting function you are using, plt.plot, works on NumPy arrays, not on torch.Tensors. When you pass accuHistory to it, matplotlib tries to convert each tensor to a NumPy array, and that conversion is what fails.
Please see this answer for more details.
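A minimal fix along those lines is to store plain Python floats instead of tensors inside train_show, so matplotlib receives numbers it can plot:
# .item() turns the 0-dim tensors into Python floats (and .cpu() handles GPU tensors)
lossHistory.append(loss.detach().cpu().item())
accuHistory.append(accuracy.detach().cpu().item())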
Let me preface this by saying that I am new to PyTorch. I wrote this simple program for binary classification. I also created a CSV with two columns of random values, where the "ok" column is 1 only if the other two values both fall between bounds I chose in advance. Example:
diam_int,diam_est,ok
37.782,125.507,0
41.278,115.15,1
42.248,115.489,1
29.582,113.141,0
37.428,107.247,0
32.947,123.233,0
37.146,121.537,0
38.537,110.032,0
26.553,113.752,0
27.369,121.144,0
41.632,108.178,0
27.655,111.279,0
29.779,109.268,0
43.695,115.649,1
44.587,116.126,0
Everything seems to be done correctly: the loss actually decreases (it climbs back up slightly after many epochs, but I don't think that's a problem). However, when I test my Net after training, with a sample batch from the training set, what I get is always a prediction below 0.5 (so the estimated output is always 0), with a completely random trend.
with torch.no_grad():
    pred = net(trainSet[10])
    trueVal = ySet[10]
    for i in range(len(trueVal)):
        print(trueVal[i], pred[i])
Here is my Net class:
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 32)
        self.fc2 = nn.Linear(32, 64)
        self.fc3 = nn.Linear(64, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return torch.sigmoid(x)
Here is my main script:
import torch
import torch.optim as optim
import torch.nn.functional as F
import pandas as pd
from net import Net
df = pd.read_csv("test.csv")
y = torch.Tensor(df["ok"])
ySet = torch.split(y, 32)
df.drop(["ok"], axis=1, inplace=True)
data = F.normalize(torch.Tensor(df.values), dim=1)
trainSet = torch.split(data, 32)
net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)
lossFunction = torch.nn.BCELoss()
EPOCHS = 300
for epoch in range(EPOCHS):
    for i, X in enumerate(trainSet):
        optimizer.zero_grad()
        output = net(X)
        target = ySet[i].reshape(-1, 1)
        loss = lossFunction(output, target)
        loss.backward()
        optimizer.step()
    if epoch % 20 == 0:
        print(loss)
What am I doing wrong? Thanks in advance for the help!
Your model is underfitting. Increasing the number of epochs to (say) 3000 makes the model predict perfectly on the examples you showed.
However, after that many epochs the model may be overfitting. A good practice is to use validation data (separate the generated data into train and validation sets) and check the validation loss in each epoch. When the validation loss starts increasing, you are beginning to overfit, and that is the point to stop training (early stopping).
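A minimal sketch of that early-stopping idea, reusing trainSet/ySet from the question (the split point and the stop-at-first-rise rule are arbitrary illustrative choices; real code would use some patience):
# hold out the last 3 batches as a validation set (arbitrary split, for illustration)
trainBatches, valBatches = trainSet[:-3], trainSet[-3:]
yTrain, yVal = ySet[:-3], ySet[-3:]

bestValLoss = float("inf")
for epoch in range(EPOCHS):
    for i, X in enumerate(trainBatches):
        optimizer.zero_grad()
        loss = lossFunction(net(X), yTrain[i].reshape(-1, 1))
        loss.backward()
        optimizer.step()

    with torch.no_grad():  # validation pass, no gradients needed
        valLoss = sum(lossFunction(net(X), yVal[i].reshape(-1, 1))
                      for i, X in enumerate(valBatches)) / len(valBatches)

    if valLoss < bestValLoss:
        bestValLoss = valLoss
    else:
        break  # validation loss stopped improving: stop before overfitting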
I am working on a Graph Neural Network (GNN) that can create embeddings of the input graph for use in other applications, like Reinforcement Learning.
I started from the spektral library example TUDataset classification with GIN and modified it to divide the network into two parts: the first part produces the embeddings and the second part produces the classification. My goal is to train this network with supervised learning on a dataset with graph labels, e.g. TUDataset, and, once it is trained, use the first part (embedding generation) in other applications.
I am getting different results from my approach on two different datasets. TUDataset shows improved loss and accuracy with this new approach, whereas the other, local dataset shows a significant increase in the loss.
Can I get any feedback on whether my approach to creating embeddings is appropriate, or any suggestions for further improvement?
Here is the code I use to generate graph embeddings:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader
from spektral.datasets import TUDataset
from spektral.layers import GINConv, GlobalAvgPool
################################################################################
# PARAMETERS
################################################################################
learning_rate = 1e-3 # Learning rate
channels = 128 # Hidden units
layers = 3 # GIN layers
epochs = 300 # Number of training epochs
batch_size = 32 # Batch size
################################################################################
# LOAD DATA
################################################################################
dataset = TUDataset("PROTEINS", clean=True)
# Parameters
F = dataset.n_node_features # Dimension of node features
n_out = dataset.n_labels # Dimension of the target
# Train/test split
idxs = np.random.permutation(len(dataset))
split = int(0.9 * len(dataset))
idx_tr, idx_te = np.split(idxs, [split])
dataset_tr, dataset_te = dataset[idx_tr], dataset[idx_te]
loader_tr = DisjointLoader(dataset_tr, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(dataset_te, batch_size=batch_size, epochs=1)
################################################################################
# BUILD MODEL
################################################################################
class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(
                GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
            )
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])
        return self.dense1(x)
# Build model
model = GIN0(channels, layers)
model_op = Sequential()
model_op.add(Dropout(0.5, input_shape=(channels,)))
model_op.add(Dense(n_out, activation="softmax"))
opt = Adam(lr=learning_rate)
loss_fn = CategoricalCrossentropy()
################################################################################
# FIT MODEL
################################################################################
#tf.function(input_signature=loader_tr.tf_signature(), experimental_relax_shapes=True)
def train_step(inputs, target):
    with tf.GradientTape(persistent=True) as tape:
        node2vec = model(inputs, training=True)
        predictions = model_op(node2vec, training=True)
        loss = loss_fn(target, predictions)
        loss += sum(model.losses)
    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    gradients2 = tape.gradient(loss, model_op.trainable_variables)
    opt.apply_gradients(zip(gradients2, model_op.trainable_variables))
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))
    return loss, acc
print("Fitting model")
current_batch = 0
model_lss = model_acc = 0
for batch in loader_tr:
    lss, acc = train_step(*batch)
    model_lss += lss.numpy()
    model_acc += acc.numpy()
    current_batch += 1
    if current_batch == loader_tr.steps_per_epoch:
        model_lss /= loader_tr.steps_per_epoch
        model_acc /= loader_tr.steps_per_epoch
        print("Loss: {}. Acc: {}".format(model_lss, model_acc))
        model_lss = model_acc = 0
        current_batch = 0
################################################################################
# EVALUATE MODEL
################################################################################
def tolist(predictions):
    result = []
    for item in predictions:
        result.append((float(item[0]), float(item[1])))
    return result
loss_data = []
print("Testing model")
model_lss = model_acc = 0
for batch in loader_te:
    inputs, target = batch
    node2vec = model(inputs, training=False)
    predictions = model_op(node2vec, training=False)
    predictions_list = tolist(predictions)
    loss_data.append(zip(target, predictions_list))
    model_lss += loss_fn(target, predictions)
    model_acc += tf.reduce_mean(categorical_accuracy(target, predictions))
model_lss /= loader_te.steps_per_epoch
model_acc /= loader_te.steps_per_epoch
print("Done. Test loss: {}. Test acc: {}".format(model_lss, model_acc))
for batchi in loss_data:
    for item in batchi:
        print(list(item), '\n')
Your approach to generating graph embeddings is correct: the GIN0 model will return a vector for a given input graph.
This code, however, seems odd:
gradients = tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(gradients, model.trainable_variables))
gradients2 = tape.gradient(loss, model_op.trainable_variables)
opt.apply_gradients(zip(gradients2, model_op.trainable_variables))
What this does is compute gradients through the same graph twice, once for model's variables and once for model_op's, which is why the tape has to be persistent. It is redundant rather than wrong: each set of weights still gets updated once per step.
When you compute the loss in the context of a tf.GradientTape, all computations that went into the final value are tracked. This means that if you call loss = foo(bar(x)), the tape can give you gradients for the weights of both foo and bar, so you can ask for all of them in a single tape.gradient call and update both models with one apply_gradients.
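For example, a sketch of that single-pass version, keeping the names from the question's code (the behaviour should be equivalent, just without the second gradient computation):
def train_step(inputs, target):
    with tf.GradientTape() as tape:  # persistent=True is no longer needed
        node2vec = model(inputs, training=True)
        predictions = model_op(node2vec, training=True)
        loss = loss_fn(target, predictions)
        loss += sum(model.losses)
    # one backward pass covering the variables of both sub-models
    all_vars = model.trainable_variables + model_op.trainable_variables
    gradients = tape.gradient(loss, all_vars)
    opt.apply_gradients(zip(gradients, all_vars))
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))
    return loss, acc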
Besides this, I don't see issues with the code, so the difference will mostly come down to the local dataset you are using.
Cheers
I am wondering what I am doing wrong when trying to see how the weights changed during training.
My loss goes down considerably, but it appears that the initialized weights are the same as the trained weights. Am I looking in the wrong place? I would appreciate any insight you might have!
import torch
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F
# setup GPU/CPU processing
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# initialize model
class mlp1(torch.nn.Module):
    def __init__(self, num_features, num_hidden, num_classes):
        super(mlp1, self).__init__()
        self.num_classes = num_classes
        self.input_layer = torch.nn.Linear(num_features, num_hidden)
        self.out_layer = torch.nn.Linear(num_hidden, num_classes)

    def forward(self, x):
        x = self.input_layer(x)
        x = torch.sigmoid(x)
        logits = self.out_layer(x)
        probas = torch.softmax(logits, dim=1)
        return logits, probas
# instantiate model
model = mlp1(num_features=28*28, num_hidden=100, num_classes=10).to(device)
# check initial weights
weight_check_pre = model.state_dict()['input_layer.weight'][0][0:25]
# optim
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# download data
train_dataset = datasets.MNIST(root='data',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)
# data loader
train_dataloader = DataLoader(dataset=train_dataset,
                              batch_size=100,
                              shuffle=True)
# train
NUM_EPOCHS = 1
for epoch in range(NUM_EPOCHS):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_dataloader):
        # send data to device
        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)
        # forward
        logits, probas = model(features)
        # loss
        loss = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        # now update weights
        optimizer.step()
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %03d/%03d | Loss: %.4f'
                  % (epoch+1, NUM_EPOCHS, batch_idx,
                     len(train_dataloader), loss))
# check post training
weight_check_post = model.state_dict()['input_layer.weight'][0][0:25]
# compare
weight_check_pre == weight_check_post # all equal
That is because both variables are views into the same underlying weight tensor: state_dict() returns references to the live parameters, and slicing a tensor gives you a view, not a copy. Both snapshots therefore always reflect the current weights and will always compare equal.
You can do this to get actual copies of the slices:
import copy
# check initial weights
weight_check_pre = copy.deepcopy(model.state_dict()['input_layer.weight'][0][0:25])
...
# check post training
weight_check_post = copy.deepcopy(model.state_dict()['input_layer.weight'][0][0:25])
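Since these slices are tensors, calling .clone() on them is an equivalent way to take the snapshots:
# .clone() materializes a copy that no longer shares storage with the live weights
weight_check_pre = model.state_dict()['input_layer.weight'][0][0:25].clone()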
I can't understand how, in the function train() below, the variables (data, target) are chosen.
def train(args, model, device, federated_train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
        model.send(data.location) # <-- NEW: send the model to the right location
I guess they are two tensors representing two random images from the training dataset, but then the loss function
loss = F.nll_loss(output, target)
is calculated at every iteration with a different target?
I also have a different question: I trained the network on images of cats, then tested it on images of cars, and the accuracy reached is 97%. How is this possible? Is that a proper value, or am I doing something wrong?
Here is the entire code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import syft as sy # <-- NEW: import the Pysyft library
hook = sy.TorchHook(torch) # <-- NEW: hook PyTorch ie add extra functionalities to support Federated Learning
bob = sy.VirtualWorker(hook, id="bob") # <-- NEW: define remote worker bob
alice = sy.VirtualWorker(hook, id="alice") # <-- NEW: and alice
class Arguments():
    def __init__(self):
        self.batch_size = 64
        self.test_batch_size = 1000
        self.epochs = 2
        self.lr = 0.01
        self.momentum = 0.5
        self.no_cuda = False
        self.seed = 1
        self.log_interval = 30
        self.save_model = False
args = Arguments()
use_cuda = not args.no_cuda and torch.cuda.is_available()
torch.manual_seed(args.seed)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
federated_train_loader = sy.FederatedDataLoader( # <-- this is now a FederatedDataLoader
    datasets.MNIST("C:\\users...\\train", train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ]))
    .federate((bob, alice)), # <-- NEW: we distribute the dataset across all the workers, it's now a FederatedDataset
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST("C:\\Users...\\test", train=False, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.test_batch_size, shuffle=True, **kwargs)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
def train(args, model, device, federated_train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
        model.send(data.location) # <-- NEW: send the model to the right location
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        model.get() # <-- NEW: get the model back
        if batch_idx % args.log_interval == 0:
            loss = loss.get() # <-- NEW: get the loss back
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * args.batch_size, len(federated_train_loader) * args.batch_size,
                100. * batch_idx / len(federated_train_loader), loss.item()))
def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=args.lr) # TODO momentum is not supported at the moment
for epoch in range(1, args.epochs + 1):
    train(args, model, device, federated_train_loader, optimizer, epoch)
    test(args, model, device, test_loader)

if (args.save_model):
    torch.save(model.state_dict(), "mnist_cnn.pt")
Consider it like this: when you hook torch, all your torch tensors get additional functionality - methods like .send() and .federate(), and attributes like .location and ._objects. Your data and target, which were once torch tensors, became pointers to tensors residing on different VirtualWorker objects because of .federate((bob, alice)).
Now data and target have additional attributes, including .location, which returns the location of the tensor that the pointer called data/target points to.
Federated learning sends the global model to this location, as seen in model.send(data.location).
Now model is a pointer residing at the same location, and data is also a pointer residing there. Hence, when you take the output as output = model(data), output will also reside there, and all we (the central server, or in other words the VirtualWorker called 'me') get is a pointer to that output.
Now, regarding your doubt about the loss calculation: since output and target both reside at that same location, the loss calculation also happens there. The same goes for backprop and the optimizer step.
Finally, there is model.get(): this is where the central server pulls the remote model back using the pointer called model. (I'm not sure if it should be model = model.get(), though.)
So anything with .get() will be pulled from that worker and returned in our Python statement. Also note that .get() removes the object from its location when called, so use .copy().get() if you are going to need it there afterwards.
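A minimal sketch of that pointer flow, assuming a hooked torch and the VirtualWorker bob from the question's setup:
x = torch.tensor([1.0, 2.0]).send(bob)  # the data now lives on bob; x is a pointer
print(x.location)                       # the VirtualWorker bob
y = x + x                               # computed remotely on bob; y is another pointer
y = y.get()                             # pull the result back; it is removed from bob
print(y)                                # tensor([2., 4.])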