How to train network on images of different sizes Pytorch - python

I am trying to feed a dataset of images to a neural network and I am getting the error below.
I don't know what the cause might be; the images all have different sizes.
I have also tried changing the batch size and the kernel sizes, but without success.
File "c:\Users\david\Desktop\cs_agent\main.py", line 49, in <module>
for i, data in enumerate(train_loader, 0):
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
data = self._next_data()
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 570, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\fetch.py", line 52, in fetch
return self.collate_fn(data)
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 172, in default_collate
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 172, in <listcomp>
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\data\_utils\collate.py", line 138, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 300, 535] at entry 0 and [3, 1080, 1920] at entry 23
This is my main file:
import numpy as np
import matplotlib.pyplot as plt
import torch
import dataset
import os
from torch.utils.data import DataLoader
import torch.nn as nn
import torchvision
import check_device
import neural_network
import torch.optim as optim
EPS = 1.e-7
LR=0.5
WEIGHT_DECAY=0.5
batch_size =50
#DATA LOADING ###################################################################################################################
test_dataset =dataset.csHeadBody(csv_file="images\\test_labels.csv",root_dir="images\\test")
train_dataset =dataset.csHeadBody(csv_file="images\\train_labels.csv",root_dir="images\\train")
train_loader =DataLoader(dataset =train_dataset,batch_size=batch_size,shuffle=True)
test_loader =DataLoader(dataset=test_dataset,batch_size=batch_size,shuffle=True)
#DATA LOADING ###################################################################################################################END
#NEURAL NET #####################################################################################################################################################
net=neural_network.Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
#NEURAL NET END ######################################################################################
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        print(data)
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
print('Finished Training')
And this is my dataset file:
import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class csHeadBody(Dataset):
    def __init__(self, csv_file, root_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        # column 0 of the CSV holds the file name, column 1 the label
        img_path = os.path.join(self.root_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
This is my neural network architecture:
import torch.nn.functional as F
import torch.nn as nn
import torch

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 535, 535)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

You need to adjust the parameters of your convolutional and linear layers. The first argument is the number of input channels (3 for standard RGB images in conv1), then the number of output channels and then the convolution kernel size. To clarify, I've used named arguments in the code below. The code works for images of a square input size of 224x224 pixels (standard imagenet size, adjust if needed). If you want image size agnostic code you could use something like global average pooling (mean of each channel in the last conv layer). The net below supports both:
class Net(nn.Module):
    def __init__(self, use_global_average_pooling: bool = False):
        super().__init__()
        self.use_global_average_pooling = use_global_average_pooling
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
        self.pool = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
        if use_global_average_pooling:
            self.fc_gap = nn.Linear(64, 10)
        else:
            self.fc_1 = nn.Linear(54 * 54 * 64, 84)  # 54 img side times 64 out channels from conv2
            self.fc_2 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # img side: (224 - 2) // 2 = 111
        x = self.pool(F.relu(self.conv2(x)))  # img side: (111 - 2) // 2 = 54
        if self.use_global_average_pooling:
            # global average pooling: mean over height and width, one value per channel
            x = x.mean(dim=(-1, -2))
            # no ReLU here: these are the logits that go into CrossEntropyLoss
            x = self.fc_gap(x)
        else:  # use all features
            x = torch.flatten(x, 1)
            x = F.relu(self.fc_1(x))
            x = self.fc_2(x)
        return x
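To make the global average pooling idea concrete, here is a minimal sketch (not tested against your dataset). The `features` stack and the `pooled_features` helper are hypothetical stand-ins for the convolutional part of the net above. It shows that averaging over height and width gives a 64-value vector per image regardless of the input size.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the convolutional part of the network above.
features = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

def pooled_features(image_batch):
    """Return one 64-dimensional vector per image, whatever the image size."""
    with torch.no_grad():
        feature_maps = features(image_batch)   # [N, 64, H', W']
    return feature_maps.mean(dim=(-2, -1))     # average over H' and W' -> [N, 64]

small = pooled_features(torch.rand(1, 3, 300, 535))
square = pooled_features(torch.rand(1, 3, 224, 224))
print(small.shape, square.shape)
```

Note that this only makes the network size agnostic. The default DataLoader collate function still stacks the images of a batch into one tensor, so images inside a batch need the same size (or a batch size of 1) either way.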
Additionally, the torchvision.io.read_image function used in your Dataset returns a uint8 tensor with integer values from 0 to 255. You'll want floating point values for your network, so you have to divide the result by 255 to get values in the [0, 1] range. Furthermore, neural networks work best with normalized inputs (subtracting the mean and then dividing by the standard deviation of your training dataset). I've added normalization to the image transforms below. For convenience, it is using the imagenet mean and standard deviation, which should work fine if your images are similar to imagenet images (otherwise you can calculate them on your own images).
Note that the resizing might distort your images (doesn't keep the original aspect ratio). Often this is no problem, but if it is you might want to pad your images with a constant color (e.g. black) to resize them to the required dimensions (there are also transforms for this in the torchvision library).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
transforms = torchvision.transforms.Compose([
torchvision.transforms.Lambda(lambda x: x / 255.),
torchvision.transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
torchvision.transforms.Resize((224, 224)),
])
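If the distortion from the plain resize turns out to be a problem, the padding amounts can be computed by hand. This is a small sketch; `square_padding` is a hypothetical helper, not a torchvision function. The tuple it returns is in (left, top, right, bottom) order, which is the order torchvision.transforms.functional.pad accepts for a 4-element padding.

```python
def square_padding(height, width):
    """Return (left, top, right, bottom) padding that makes an image square."""
    side = max(height, width)
    pad_w = side - width    # total padding needed along the width
    pad_h = side - height   # total padding needed along the height
    left, top = pad_w // 2, pad_h // 2
    # any odd leftover pixel goes to the right / bottom
    return left, top, pad_w - left, pad_h - top

# The two image sizes from the error message in the question (height, width)
print(square_padding(300, 535))    # (0, 117, 0, 118)
print(square_padding(1080, 1920))  # (0, 420, 0, 420)
```

After padding to a square, the resize to 224x224 keeps the original aspect ratio of the image content.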
You might also need to adjust the code in your Dataset to load images as an RGB image (if they also have an alpha channel). This can be done like this:
image = read_image(img_path, mode=torchvision.io.image.ImageReadMode.RGB)
You can then initialise your Dataset using:
test_dataset = dataset.csHeadBody(csv_file="images\\test_labels.csv", root_dir="images\\test", transform=transforms)
train_dataset = dataset.csHeadBody(csv_file="images\\train_labels.csv", root_dir="images\\train", transform=transforms)
I haven't tested the code, let me know if it doesn't work!

Related

Making predictions on new images using a CNN in pytorch

I'm new to PyTorch, and I have been stuck on this problem for a while. I have trained a CNN for classifying X-ray images. The images can be found on this Kaggle page: https://www.kaggle.com/prashant268/chest-xray-covid19-pneumonia/ .
I managed to get good accuracy on both the training and test data, but when I try to make predictions on new images I get the same (wrong class) output for every image. Here's my model in detail.
import os
import matplotlib.pyplot as plt
import numpy as np
import torch
import glob
import torch.nn.functional as F
import torch.nn as nn
from torchvision.transforms import transforms
from torch.utils.data import DataLoader
from torch.optim import Adam
from torch.autograd import Variable
import torchvision
import pathlib
from google.colab import drive
drive.mount('/content/drive')
epochs = 20
batch_size = 128
learning_rate = 0.001
#Data Transformation
transformer = transforms.Compose([
transforms.Resize((224,224)),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.5,0.5,0.5], [0.5,0.5,0.5])
])
#Load data with DataLoader
train_path = '/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/Data/train'
test_path = '/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/Data/test'
train_loader = DataLoader(torchvision.datasets.ImageFolder(train_path,transform = transformer), batch_size= batch_size, shuffle= True)
test_loader = DataLoader(torchvision.datasets.ImageFolder(test_path,transform = transformer), batch_size= batch_size, shuffle= False)
root = pathlib.Path(train_path)
classes = sorted([j.name.split('/')[-1] for j in root.iterdir()])
print(classes)
train_count = len(glob.glob(train_path+'/**/*.jpg')) + len(glob.glob(train_path+'/**/*.png')) + len(glob.glob(train_path+'/**/*.jpeg'))
test_count = len(glob.glob(test_path+'/**/*.jpg')) + len(glob.glob(test_path+'/**/*.png')) + len(glob.glob(test_path+'/**/*.jpeg'))
print(train_count,test_count)
#Create the CNN
class CNN(nn.Module):
    def __init__(self):
        super(CNN,self).__init__()
        '''nout = [(width + 2*padding - kernel_size) / stride] + 1 '''
        # [128,3,224,224]
        self.conv1 = nn.Conv2d(in_channels = 3, out_channels = 12, kernel_size = 5)
        # [4,12,220,220]
        self.pool1 = nn.MaxPool2d(2,2) #reduces the images by a factor of 2
        # [4,12,110,110]
        self.conv2 = nn.Conv2d(in_channels = 12, out_channels = 24, kernel_size = 5)
        # [4,24,106,106]
        self.pool2 = nn.MaxPool2d(2,2)
        # [4,24,53,53] which becomes the input of the fully connected layer
        self.fc1 = nn.Linear(in_features = (24 * 53 * 53), out_features = 120)
        self.fc2 = nn.Linear(in_features = 120, out_features = 84)
        self.fc3 = nn.Linear(in_features = 84, out_features = len(classes)) #final layer, output will be the number of classes

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = x.view(-1, 24 * 53 * 53)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
# Training the model
model = CNN()
loss_function = nn.CrossEntropyLoss() #includes the softmax activation function
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
n_total_steps = len(train_loader)
for epoch in range(epochs):
n_correct = 0
n_samples = 0
for i, (images, labels) in enumerate(train_loader):
# Forward pass
outputs = model(images)
_, predicted = torch.max(outputs, 1)
n_samples += labels.size(0)
n_correct += (predicted == labels).sum().item()
loss = loss_function(outputs, labels)
# Backpropagation and optimization
optimizer.zero_grad() #empty gradients
loss.backward()
optimizer.step()
acc = 100.0 * n_correct / n_samples
print(f'Epoch [{epoch+1}/{epochs}], Step [{i+1}/{n_total_steps}], Accuracy: {round(acc,2)} %, Loss: {loss.item():.4f}')
print('Done!!')
# Testing the model
with torch.no_grad():
n_correct = 0
n_samples = 0
n_class_correct = [0 for i in range(3)]
n_class_samples = [0 for i in range(3)]
for images, labels in test_loader:
outputs = model(images)
# max returns (value ,index)
_, predicted = torch.max(outputs, 1)
n_samples += labels.size(0)
n_correct += (predicted == labels).sum().item()
acc = 100.0 * n_correct / n_samples
print(f'Accuracy of the network: {acc} %')
torch.save(model.state_dict(),'/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/model.model')
For loading the model and trying to make predictions on new images, the code is as follows:
checkpoint = torch.load('/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/model.model')
model = CNN()
model.load_state_dict(checkpoint)
model.eval()
#Data Transformation
transformer = transforms.Compose([
transforms.Resize((224,224)),
transforms.ToTensor(),
transforms.Normalize([0.5,0.5,0.5], [0.5,0.5,0.5])
])
#Making preidctions on new data
from PIL import Image
def prediction(img_path,transformer):
    image = Image.open(img_path).convert('RGB')
    image_tensor = transformer(image)
    image_tensor = image_tensor.unsqueeze_(0) #add a batch dimension: [3,224,224] -> [1,3,224,224]
    input_img = Variable(image_tensor)
    output = model(input_img)
    #print(output)
    index = output.data.numpy().argmax()
    pred = classes[index]
    return pred

pred_path = '/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/Test_images/Data/'
test_imgs = glob.glob(pred_path+'/*')
for i in test_imgs:
    print(prediction(i,transformer))
I'm guessing the problem must be in the way that I am preprocessing the data, although I cannot find my mistake. Any help will be deeply appreciated, since I have been stuck on this for a while now.
P.S. I can share my notebook as well, if it is of any help.
Here is a way to debug this that should narrow down where the problem most likely is, which makes it much easier to fix.
The process relies on the fact that your CNN performs well on the test set. First, temporarily set your test loader batch size to 1. Then, in your test loop, at the point where you count the correct predictions, run the following code:
#Your code
outputs = model(images) # With batch size 1: one image and one output.
_, predicted = torch.max(outputs, 1)
#Altered code:
correct = (predicted == labels).sum().item() # Either 1 or 0, since there is only one image per batch
#My new code:
if correct:
    # The prediction was right: drop the batch dimension to get a single image
    single_correct_image = images.squeeze(0)
    # Then convert the tensor image into a PIL image
    pil_image = transforms.ToPILImage()(single_correct_image)
    # Save the PIL image. save() needs a file path, not a directory (the file name here is an arbitrary choice).
    pil_image.save("/content/correct_image.png")
    # Terminate the testing process. Ignore the ValueError this raises.
    raise ValueError("terminating process")
Now you have an image saved to disk that you know the model classifies correctly in the test set. The next step is to open that image and run it through your prediction function. One of two things can happen, and each tells you something about your situation:
If your model returns the wrong answer, then something differs between your prediction code and your testing code. One uses torch.max (and a sum over the matches), the other uses NumPy's argmax. You can use print statements to debug what is going on there; perhaps there is a conversion error, or the output's format is not what you expect.
If your code returns the right answer, then your model is simply failing to generalise to new images. I suggest running more trial cases with the above process.
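One caveat with this process: a tensor taken from the test loader has already been through Normalize, so converting it straight to a PIL image will not reproduce the original picture, and the prediction function would then normalize it a second time. A minimal sketch of undoing the normalization before saving follows. It assumes the mean and std of 0.5 used in the question's transformer, and `denormalize` is a hypothetical helper.

```python
import torch

# Assumption: the same values as Normalize([0.5,0.5,0.5], [0.5,0.5,0.5]) in the question.
MEAN = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
STD = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

def denormalize(image_tensor):
    """Map a normalized [3, H, W] tensor back to the [0, 1] range."""
    return (image_tensor * STD + MEAN).clamp(0.0, 1.0)

# Round trip on random data standing in for an image
original = torch.rand(3, 8, 8)
normalized = (original - MEAN) / STD
restored = denormalize(normalized)
print(torch.allclose(restored, original, atol=1e-6))
```

In the test loop this would be applied to the squeezed image just before the ToPILImage conversion.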
For additional reference, if you are still stuck after this, I suggest using this notebook as a guide to which parts of the code to at least inspect.
https://www.kaggle.com/salvation23/xray-cnn-pytorch
Sarthak Jain

Dimensionality of tensor from my WaveNet incompatible with PyTorch cross_entropy function

I've been working on a project to write my own Python implementation of WaveNet, as published by DeepMind in 2016.
Preprocessing includes mu law encoding, and one hot encoding. The model itself functions well, my problem lies in the loss function torch.nn.functional.cross_entropy used during training, found here: https://pytorch.org/docs/stable/nn.functional.html
Particularly, the relation between my output and my target tensors, namely
input_tensor.shape = tensor([1, 256, 225332]) # [batch_size, sample_size, audio_length]
output.shape = tensor([1, 256, 225332])
According to F.cross_entropy, I must have output = (N, C) and target = input_tensor = (N).
My supervisor told me to do the following:
output = output.T.reshape(-1, 256) = tensor([225332, 256])
target = input_tensor.T.long() = tensor([225332, 256, 1]) # This needs to be 1-dimensional, help?
For anyone interested in the explicit code, below:
NOTE - the receptive field is not padded, so just for debugging purposes I have subtracted it, while I do know this is not natural.
>>> output.T.reshape(-1, 256).shape
torch.Size([225332, 256])
>>> input_tensor[:, :, model.input_size - model.output_size:].T.shape
torch.Size([225332, 256, 1])
>>> loss = F.cross_entropy(output.T.reshape(-1, 256), input_tensor[:, :, model.input_size - model.output_size:].T.long().to(device))
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.3.3\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "C:\Users\JaQtae\anaconda3\envs\CortiGit\lib\site-packages\torch\nn\functional.py", line 2693, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "C:\Users\JaQtae\anaconda3\envs\CortiGit\lib\site-packages\torch\nn\functional.py", line 2388, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1D target tensor expected, multi-target not supported
Somewhat of a novice-in-training with ML and AI, particularly the PyTorch library.
Would appreciate any advice regarding how I should tackle this issue.
The training:
model = Wavenet(layers=3,blocks=2,output_size=32).to(device)
model.apply(initialize) # Initialize causalconv1d() with xavier_uniform_ weights and bias of 0.
model.train()
optimizer = optim.Adam(model.parameters(), lr=0.0003)
for i, batch in tqdm(enumerate(train_loader)):
mu_enc_my_x = encode_mu_law(x=batch, mu=256)
input_tensor = one_hot_encoding(mu_enc_my_x)
input_tensor = input_tensor.to(device)
output = model(input_tensor)
# TODO: Inspect input/output formats, maybe something wrong....
loss = F.cross_entropy(output.T.reshape(-1, 256), input_tensor[:,:,model.input_size - model.output_size:].long().to(device)) # subtract receptive field instead of pad it, workaround for quick debugging of loss-issue.
print("\nLoss:", loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()
if i % 1000 == 0:
print("\nSaving model")
torch.save(model.state_dict(), "wavenet.pt")
The purpose is to get my loss function to work properly, so that I can generate sound files. The current ones with my bad loss function obviously return pure noise.
My full model if any help.
"""
Wavenet model
Sources:
https://github.com/kan-bayashi/PytorchWaveNetVocoder/blob/master/wavenet_vocoder/nets/wavenet.py
https://github.com/r9y9/wavenet_vocoder/blob/master/wavenet_vocoder/wavenet.py
https://github.com/Dankrushen/Wavenet-PyTorch/blob/master/wavenet/models.py
https://github.com/vincentherrmann/pytorch-wavenet
"""
from torch import nn
import torch
#TODO: Add local and global conditioning
def initialize(m):
"""
Initialize CNN with Xavier_uniform weight and 0 bias.
"""
if isinstance(m, torch.nn.Conv1d):
nn.init.xavier_uniform_(m.weight)
nn.init.constant_(m.bias, 0.0)
class CausalConv1d(torch.nn.Module):
"""
Causal Convolution for WaveNet
Causality can be introduced with padding as (kernel_size - 1) * dilation (see Keras documentation)
or it can be introduced as follows according to Golbin.
https://github.com/golbin/WaveNet/blob/05545339096c3a1d9909d96fb19da4fbae28d8c6/wavenet/networks.py#L38
Else, look at the following article, several ways to implement it using PyTorch:
https://github.com/pytorch/pytorch/issues/1333
- Jakob
"""
def __init__(self, in_channels, out_channels, kernel_size, dilation = 1, bias = True):
super(CausalConv1d, self).__init__()
# padding=1 for same size(length) between input and output for causal convolution
self.dilation = dilation
self.kernel_size = kernel_size
self.in_channels = in_channels
self.out_channels = out_channels
self.padding = padding = (kernel_size-1) * dilation # kernelsize = 2, -1 * dilation = 1, = 1. - Jakob.
self.conv = torch.nn.Conv1d(in_channels, out_channels,
kernel_size, padding=padding, dilation=dilation,
bias=bias) # Fixed for WaveNet but not sure
def forward(self, x):
output = self.conv(x)
if self.padding != 0:
output = output[:, :, :-self.padding]
return output
class Wavenet(nn.Module):
def __init__(self,
layers=3,
blocks=2,
dilation_channels=32,
residual_block_channels=512,
skip_connection_channels=512,
output_channels=256,
output_size=32,
kernel_size=3
):
super(Wavenet, self).__init__()
self.layers = layers
self.blocks = blocks
self.dilation_channels = dilation_channels
self.residual_block_channels = residual_block_channels
self.skip_connection_channels = skip_connection_channels
self.output_channels = output_channels
self.kernel_size = kernel_size
self.output_size = output_size
# initialize dilation variables
receptive_field = 1
init_dilation = 1
# List of layers and connections
self.dilations = []
self.residual_convs = nn.ModuleList()
self.filter_conv_layers = nn.ModuleList()
self.gate_conv_layers = nn.ModuleList()
self.skip_convs = nn.ModuleList()
# First convolutional layer
self.first_conv = CausalConv1d(in_channels=self.output_channels,
out_channels=residual_block_channels,
kernel_size = 2)
# Building the Modulelists for the residual blocks
for b in range(blocks):
additional_scope = kernel_size - 1
new_dilation = 1
for i in range(layers):
# dilations of this layer
self.dilations.append((new_dilation, init_dilation))
# dilated convolutions
self.filter_conv_layers.append(nn.Conv1d(in_channels=residual_block_channels, out_channels=dilation_channels, kernel_size=kernel_size, dilation=new_dilation))
self.gate_conv_layers.append(nn.Conv1d(in_channels=residual_block_channels, out_channels=dilation_channels, kernel_size=kernel_size, dilation=new_dilation))
# 1x1 convolution for residual connection
self.residual_convs.append(nn.Conv1d(in_channels=dilation_channels, out_channels=residual_block_channels, kernel_size=1))
# 1x1 convolution for skip connection
self.skip_convs.append(nn.Conv1d(in_channels=dilation_channels,
out_channels=skip_connection_channels,
kernel_size=1))
# Update receptive field and dilation
receptive_field += additional_scope
additional_scope *= 2
init_dilation = new_dilation
new_dilation *= 2
# Last two convolutional layers
self.last_conv_1 = nn.Conv1d(in_channels=skip_connection_channels,
out_channels=skip_connection_channels,
kernel_size=1)
self.last_conv_2 = nn.Conv1d(in_channels=skip_connection_channels,
out_channels=output_channels,
kernel_size=1)
#Calculate model receptive field and the required input size for the given output size
self.receptive_field = receptive_field
self.input_size = receptive_field + output_size - 1
def forward(self, input):
# Feed first convolutional layer with input
x = self.first_conv(input)
# Initialize skip connection
skip = 0
# Residual block
for i in range(self.blocks * self.layers):
(dilation, init_dilation) = self.dilations[i]
# Residual connection bypassing dilated convolution block
residual = x
# input to dilated convolution block
filter = self.filter_conv_layers[i](x)
filter = torch.tanh(filter)
gate = self.gate_conv_layers[i](x)
gate = torch.sigmoid(gate)
x = filter * gate
# Feed into 1x1 convolution for skip connection
s = self.skip_convs[i](x)
#Adding skip & Match size with decreasing dimensionality of x
if skip is not 0:
skip = skip[:, :, -s.size(2):]
skip = s + skip # Sum all skip connections
# Feed into 1x1 convolution for residual connection
x = self.residual_convs[i](x)
#Adding Residual & Match size with decreasing dimensionality of x
x = x + residual[:, :, dilation * (self.kernel_size - 1):]
# print(x.shape)
x = torch.relu(skip)
#Last conv layers
x = torch.relu(self.last_conv_1(x))
x = self.last_conv_2(x)
soft = torch.nn.Softmax(dim=1)
x = soft(x)
return x
EDIT: added code snippet of train for clarity, and full model
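For reference, a minimal sketch of the shapes F.cross_entropy accepts for sequence data, using random tensors rather than the model above (the sizes are made up). The input can stay as [N, C, L] raw scores while the target holds class indices of shape [N, L], which can be recovered from a one-hot tensor with argmax over the class dimension. Note also that F.cross_entropy applies log_softmax internally, so it expects unnormalized scores; this suggests, as an assumption about the model rather than a tested fix, that the final Softmax in forward would be dropped during training.

```python
import torch
import torch.nn.functional as F

batch, n_classes, length = 1, 256, 100          # made-up sizes for illustration

logits = torch.randn(batch, n_classes, length)  # raw scores, no softmax applied
class_ids = torch.randint(0, n_classes, (batch, length))

# One-hot encoded signal laid out like the question's input_tensor: [N, C, L]
one_hot = F.one_hot(class_ids, n_classes).permute(0, 2, 1).float()

# Recover integer class indices from the one-hot tensor: [N, C, L] -> [N, L]
target = one_hot.argmax(dim=1)

# Option 1: pass [N, C, L] scores and [N, L] targets directly
loss_direct = F.cross_entropy(logits, target)

# Option 2: flatten to [N*L, C] scores and a 1D [N*L] target
flat_logits = logits.permute(0, 2, 1).reshape(-1, n_classes)
flat_target = target.reshape(-1)
loss_flat = F.cross_entropy(flat_logits, flat_target)

print(target.shape, flat_target.shape)
```

Both options compute the same mean loss. The important part in either case is that the target is an integer tensor with one dimension fewer than the scores.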

Image Generator for 3D volumes in keras with data augmentation

Since the ImageDataGenerator by keras is not suitable for 3D volumes, I started to write my own generator for keras (semantic segmentation, not classification!).
1) If there is anybody out there that has adapted the ImageDataGenerator code to work with 3D volumes, please share it! This guy has done it for videos.
2) According to this tutorial I wrote a custom generator.
import glob
import os
import keras
import numpy as np
import skimage
from imgaug import augmenters as iaa
class DataGenerator(keras.utils.Sequence):
"""Generates data for Keras"""
"""This structure guarantees that the network will only train once on each sample per epoch"""
def __init__(self, list_IDs, im_path, label_path, batch_size=4, dim=(128, 128, 128),
n_classes=4, shuffle=True, augment=False):
'Initialization'
self.dim = dim
self.batch_size = batch_size
self.list_IDs = list_IDs
self.im_path = im_path
self.label_path = label_path
self.n_classes = n_classes
self.shuffle = shuffle
self.augment = augment
self.on_epoch_end()
def __len__(self):
'Denotes the number of batches per epoch'
return int(np.floor(len(self.list_IDs) / self.batch_size))
def __getitem__(self, index):
'Generate one batch of data'
# Generate indexes of the batch
indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
# Find list of IDs
list_IDs_temp = [self.list_IDs[k] for k in indexes]
# Generate data
X, y = self.__data_generation(list_IDs_temp)
return X, y
def on_epoch_end(self):
'Updates indexes after each epoch'
self.indexes = np.arange(len(self.list_IDs))
if self.shuffle == True:
np.random.shuffle(self.indexes)
def __data_generation(self, list_IDs_temp):
if self.augment:
pass
if not self.augment:
X = np.empty([self.batch_size, *self.dim])
Y = np.empty([self.batch_size, *self.dim, self.n_classes])
# Generate data
for i, ID in enumerate(list_IDs_temp):
img_X = skimage.io.imread(os.path.join(im_path, ID))
X[i,] = img_X
img_Y = skimage.io.imread(os.path.join(label_path, ID))
Y[i,] = keras.utils.to_categorical(img_Y, num_classes=self.n_classes)
X = X.reshape(self.batch_size, *self.dim, 1)
return X, Y
params = {'dim': (128, 128, 128),
'batch_size': 4,
'im_path': "some/path/for/the/images/",
'label_path': "some/path/for/the/label_images",
'n_classes': 4,
'shuffle': True,
'augment': True}
partition = {}
im_path = "some/path/for/the/images/"
label_path = "some/path/for/the/label_images/"
images = glob.glob(os.path.join(im_path, "*.tif"))
images_IDs = [name.split("/")[-1] for name in images]
partition['train'] = images_IDs
training_generator = DataGenerator(partition['train'], **params)
My images have the size (128, 128, 128) and when I load them in I get a 5D tensor of size (batch_size, depth, height, width, number_of_channels), e.g. (4, 128, 128, 128, 1). For the label_images (which have the same dimensions and are single channel coded (value 1 = label 1, value 2 = label 2, value 3 = label 3 and value 0 = label 4 or background)) I get a binary representation of the labels with the to_categorical() function from keras and end up with a 5D tensor, e.g. (4, 128, 128, 128, 4). The images and label_images have the same name and are located in different folders.
As I only have very few images, I would like to extend the total number of images through image augmentation. How would I do that with this generator? I have successfully tested the imgaug package, but instead of adding images to my set I only transform the existing images (e.g. flip them horizontally)
Edit: I had a misconception about data augmentation. See this article about image augmentation. The images are passed in with random transformations applied on the fly, rather than being added to the dataset. Now I just have to gather enough data and set the parameters with imgaug. I will update this soon.
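As an illustration of on-the-fly augmentation for 3D volumes, here is a minimal NumPy-only sketch. `random_flip_3d` is a hypothetical helper, not part of keras or imgaug. The point for semantic segmentation is that the volume and its label must receive exactly the same random transformation.

```python
import numpy as np

def random_flip_3d(volume, label, rng):
    """Flip a volume and its label along the same randomly chosen spatial axes."""
    for axis in range(3):  # depth, height, width
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
            label = np.flip(label, axis=axis)
    # np.flip returns views; copy so the results are ordinary contiguous arrays
    return np.ascontiguousarray(volume), np.ascontiguousarray(label)

rng = np.random.default_rng(seed=0)
volume = rng.random((8, 8, 8))
# Hypothetical 2-class one-hot label derived from the volume, shape (8, 8, 8, 2)
label = np.stack([volume > 0.5, volume <= 0.5], axis=-1).astype(np.float32)

aug_volume, aug_label = random_flip_3d(volume, label, rng)
print(aug_volume.shape, aug_label.shape)
```

A call like this could sit inside the data generation method, applied to each sample as it is loaded, so every epoch sees a differently transformed copy of the same images.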
I found an implementation of a Keras customDataGenerator for 3D volumes. Here is a GitHub link. The implementation can easily be expanded to include new augmentation techniques. Here is a minimal working example that I am using in my project (3D volume semantic segmentation), based on the implementation I shared in the link:
from generator import customImageDataGenerator

def generator(images, groundtruth, batch):
    """Load a batch of augmented images"""
    gen = customImageDataGenerator(mirroring=True,
                                   rotation_90=True,
                                   transpose_axes=True
                                   )
    for b in gen.flow(x=images, y=groundtruth, batch_size=batch):
        yield (b[0], (b[1]).astype(float))

# images = (123, 48,48,48,1)
# groundtruth = (123, 48,48,48,1)
history = model.fit(
    x=generator(images, groundtruth, batchSize),
    validation_data=(imagesTest, groundtruthTest),
    steps_per_epoch=len(images) // batchSize,  # integer division: the number of steps must be a whole number
    epochs=epochs,
    callbacks=[callbacks],
)

What transformation do I need to do in order to run dataset through neural network?

I'm new to deep learning and PyTorch, but I hope someone can help me out with this. My dataset contains images of different sizes. I'm trying to create a simple neural network that can classify images. However, I'm getting mismatch errors.
Neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 32, 3)
        self.fc1 = nn.Linear(32 * 3 * 3, 200)
        self.fc2 = nn.Linear(200, 120)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()
My first convolution layer has 1 input channel, because I transform the images to grayscale images. 32 output channels was an arbitrary decision. The final fully-connected layer has 120 output channels, because there are 120 different classes.
Determine transformations and assign training set and validation set
transform = transforms.Compose(
[transforms.Grayscale(1),
transforms.RandomCrop((32,32)),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
data_dir = 'dataset'
full_dataset = datasets.ImageFolder(os.path.join(data_dir, 'train'), transform = transform)
train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
trainset, valset = torch.utils.data.random_split(full_dataset, [train_size, val_size])
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
shuffle=True, num_workers=2)
valloader = torch.utils.data.DataLoader(valset, batch_size=4,
shuffle=False, num_workers=2)
classes = full_dataset.classes
I transform the images to grayscale, because they are gray anyway. I crop the images to 32, because the images have different sizes and I figured that they must all be the same size when putting it through the neural network. Everything is working fine so far.
Train neural network
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
When running this last piece of code, I get the following error: size mismatch, m1: [3584 x 28], m2: [288 x 200] at /Users/soumith/miniconda2/conda-bld/pytorch_1532623076075/work/aten/src/TH/generic/THTensorMath.cpp:2070 when the following line is being executed: outputs = net(inputs)
My code is a variation of the code provided in this PyTorch tutorial. Can someone tell me what I'm doing wrong?
UPDATE
I updated the neural network class to this:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
But now I get an error at loss = criterion(outputs, labels):
Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /Users/soumith/miniconda2/conda-bld/pytorch_1532623076075/work/aten/src/THNN/generic/ClassNLLCriterion.c:93
In your first configuration, self.fc1 is configured incorrectly. Its input needs to have 32 * 28 * 28 features instead of 32 * 3 * 3, because your images are 32 * 32 and the kernel size and stride are 3 and 1 respectively. See this video for a simpler explanation. Try adjusting your second configuration yourself now; if you can't, comment below.
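The shape arithmetic behind a size mismatch like this can be checked by hand. The following is a minimal sketch in plain Python, assuming no padding and no dilation; `conv_out` and `pool_out` are hypothetical helper names, not part of PyTorch. It traces a 32x32 crop through the layers of the updated network from the question and compares the number of classes with the size of the last layer.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial output size of max pooling; stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 32                   # RandomCrop((32, 32)) from the question
size = conv_out(size, 5)    # conv1: 5x5 kernel, stride 1
size = pool_out(size, 2)    # 2x2 max pooling
size = conv_out(size, 5)    # conv2: 5x5 kernel, stride 1
size = pool_out(size, 2)    # 2x2 max pooling
flat_features = 16 * size * size   # conv2 has 16 output channels
print(size, flat_features)

num_classes = 120           # stated in the question
fc3_outputs = 10            # nn.Linear(84, 10) in the updated network
labels_fit = (num_classes - 1) < fc3_outputs   # largest label must be < outputs
print(labels_fit)
```

For the updated network the flattened size agrees with `nn.Linear(16 * 5 * 5, 120)`, so the convolutional part is consistent. The label check fails, though: with 120 classes the labels run from 0 to 119, while the last layer only produces 10 outputs, which is the likely cause of the `cur_target < n_classes` assertion. The same helpers can be applied to the first configuration; note that `conv_out(32, 3)` is 30 and `conv_out(32, 5)` is 28, so the right flattened size depends on the kernel sizes and the number of layers actually used.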

TensorFlow model gets zero loss

import tensorflow as tf
import numpy as np
import os
import re
import PIL
def read_image_label_list(img_directory, folder_name):
    # Input:
    #   - img_directory: root directory that contains the image folders
    #   - folder_name: name of the folder ('test' or 'train')
    # Output:
    #   - List of paths of the files in the folder
    #   - Label associated with each file (only filled for the training folder)
    cat_label = 1
    dog_label = 0
    filenames = []
    labels = []
    dir_list = os.listdir(os.path.join(img_directory, folder_name))  # All image names in 'folder_name'
    # Loop through all images in directory
    for d in dir_list:
        if re.search("train", folder_name):
            if re.search("cat", d):  # True if the image filename contains 'cat'
                labels.append(cat_label)
            else:
                labels.append(dog_label)
        filenames.append(os.path.join(img_directory, folder_name, d))
    return filenames, labels
# Define convolutional layer
def conv_layer(input, channels_in, channels_out):
    w_1 = tf.get_variable("weight_conv", [5, 5, channels_in, channels_out], initializer=tf.contrib.layers.xavier_initializer())
    b_1 = tf.get_variable("bias_conv", [channels_out], initializer=tf.zeros_initializer())
    conv = tf.nn.conv2d(input, w_1, strides=[1, 1, 1, 1], padding="SAME")
    activation = tf.nn.relu(conv + b_1)
    return activation

# Define fully connected layer
def fc_layer(input, channels_in, channels_out):
    w_2 = tf.get_variable("weight_fc", [channels_in, channels_out], initializer=tf.contrib.layers.xavier_initializer())
    b_2 = tf.get_variable("bias_fc", [channels_out], initializer=tf.zeros_initializer())
    activation = tf.nn.relu(tf.matmul(input, w_2) + b_2)
    return activation

# Define parse function that reads an image path and decodes it into a tensor
def _parse_function(img_path, label):
    img_file = tf.read_file(img_path)
    img_decoded = tf.image.decode_image(img_file, channels=3)
    img_decoded.set_shape([None, None, 3])
    img_decoded = tf.image.resize_images(img_decoded, (28, 28), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    img_decoded = tf.image.per_image_standardization(img_decoded)
    img_decoded = tf.cast(img_decoded, dtype=tf.float32)
    label = tf.one_hot(label, 1)
    return img_decoded, label
tf.reset_default_graph()
# Define parameters
EPOCHS = 10
BATCH_SIZE_training = 64
learning_rate = 0.001
img_dir = 'C:/Users/tharu/PycharmProjects/cat_vs_dog/data'
batch_size = 128
# Define data
features, labels = read_image_label_list(img_dir, "train")
# Define dataset
dataset = tf.data.Dataset.from_tensor_slices((features, labels))  # Takes slices in 0th dimension
dataset = dataset.map(_parse_function)
dataset = dataset.batch(batch_size)
iterator = dataset.make_initializable_iterator()
# Get next batch of data from iterator
x, y = iterator.get_next()
# Create the network (use different variable scopes for reuse of variables)
with tf.variable_scope("conv1"):
    conv_1 = conv_layer(x, 3, 32)
    pool_1 = tf.nn.max_pool(conv_1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
with tf.variable_scope("conv2"):
    conv_2 = conv_layer(pool_1, 32, 64)
    pool_2 = tf.nn.max_pool(conv_2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    flattened = tf.contrib.layers.flatten(pool_2)
with tf.variable_scope("fc1"):
    fc_1 = fc_layer(flattened, 7*7*64, 1024)
with tf.variable_scope("fc2"):
    logits = fc_layer(fc_1, 1024, 1)
# Define loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf.cast(y, dtype=tf.int32)))
# Define optimizer
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
with tf.Session() as sess:
    # Initialize all the variables
    sess.run(tf.global_variables_initializer())
    # Train the network
    for i in range(EPOCHS):
        # Initialize iterator so that it starts at beginning of training set for each epoch
        sess.run(iterator.initializer)
        print("EPOCH", i)
        while True:
            try:
                _, epoch_loss = sess.run([train, loss])
            except tf.errors.OutOfRangeError:  # Error given when out of data
                if i % 2 == 0:
                    # [train_accuaracy] = sess.run([accuracy])
                    # print("Step ", i, "training accuracy = %{}".format(train_accuaracy))
                    print(epoch_loss)
                break
I've spent a few hours trying to figure out systematically why I've been getting 0 loss when I run this model.
Features = list of file locations for each image (e.g. ['\data\train\cat.0.jpg', '\data\train\cat.1.jpg'])
Labels = [Batch_size, 1] one_hot vector
Initially I thought it was because there was something wrong with my data. But I've viewed the data after resizing and the images seem fine.
Then I tried a few different loss functions because I thought maybe I'm misunderstanding what the TensorFlow function softmax_cross_entropy does, but that didn't fix anything.
I've tried running just the 'logits' section to see what the output is. This is just a small sample and the numbers seem fine to me:
[[0.06388957]
[0. ]
[0.16969752]
[0.24913025]
[0.09961276]]
Surely then the softmax_cross_entropy function should be able to compute this loss given that the corresponding labels are 0 or 1? I'm not sure if I'm missing something. Any help would be greatly appreciated.
As documented:
logits and labels must have the same shape, e.g. [batch_size, num_classes] and the same dtype (either float16, float32, or float64).
Since you mentioned your label is "[Batch_size, 1] one_hot vector", I would assume both your logits and labels have shape [Batch_size, 1]. This will certainly lead to zero loss: the softmax over a single logit is always 1, so the cross-entropy is -log(1) = 0 whatever the label is. Conceptually speaking, you have only 1 class (num_classes=1) and the prediction cannot be wrong (loss=0).
So at least for your labels, you should transform them with tf.one_hot(indices=labels, depth=num_classes). Your prediction logits should also have shape [batch_size, num_classes].
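To see why a single output column always gives zero loss, here is a small sketch in plain Python rather than TensorFlow. The helpers `softmax` and `cross_entropy` are hypothetical names that mirror the math only, and the logit values are borrowed from the sample printed in the question.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_labels):
    """Cross-entropy between one-hot labels and softmax(logits) for one example."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_labels, probs))


# One class: the softmax of a single logit is always 1.0, so the loss is 0
# for every logit value and every label value.
single = [cross_entropy([z], [t])
          for z in (0.0639, 0.0, 0.2491)
          for t in (0.0, 1.0)]
print(single)

# Two classes: the loss now depends on the logits and is positive here.
two_class = cross_entropy([0.0639, 0.2491], [1.0, 0.0])
print(two_class)
```

With two output columns and labels of depth 2, the loss carries information the optimizer can use; with one column it is constant, so there is nothing to minimize.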
Alternatively, you can use sparse_softmax_cross_entropy_with_logits, where:
A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported.
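As a rough illustration of what the sparse variant computes per example, the sketch below takes integer class indices instead of one-hot vectors. It is plain Python, not the TensorFlow implementation; the helper names and the logit values are made up for the example, and the label convention (0 = dog, 1 = cat) follows the question.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]


def sparse_cross_entropy(logits, label):
    """Loss for one example; `label` is an integer class index."""
    return -log_softmax(logits)[label]


def dense_cross_entropy(logits, one_hot):
    """Loss for one example; `one_hot` has one entry per class."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))


batch_logits = [[2.0, 0.5], [0.1, 1.2]]   # made-up values, shape [batch_size, 2]
batch_labels = [0, 1]                     # shape [batch_size]: 0 = dog, 1 = cat
one_hot = [[1.0, 0.0], [0.0, 1.0]]        # the same labels with depth 2

sparse = [sparse_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
dense = [dense_cross_entropy(z, t) for z, t in zip(batch_logits, one_hot)]
print(sparse)
print(dense)
```

Both forms give the same per-example loss, so the sparse version mainly saves the one-hot conversion; in either case the logits need one column per class.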
