PyTorch: Image not displaying properly

I have the following code portion:
dataset = trainDataset()
train_loader = DataLoader(dataset,batch_size=1,shuffle=True)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
images = []
image_labels = []
for i, data in enumerate(train_loader,0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
    inputs, labels = inputs.float(), labels.float()
    images.append(inputs)
    image_labels.append(labels)
image = images[7]
image = image.numpy()
image = image.reshape(416,416,3)
img = Image.fromarray(image,'RGB')
img.show()
The issue is that the image doesn't display properly. For instance, the dataset I have contains images of cats and dogs, but the image displayed looks as shown below. Why is that?
EDIT 1
So, after @flawr's nice explanation, I have the following:
image = images[7]
image = image[0,...].permute([1,2,0])
image = image.numpy()
img = Image.fromarray(image,'RGB')
img.show()
And, the image looks as shown below. Not sure if it is a Numpy thing or the way the image is represented and displayed? I would like to also kindly note that I get a different display of the image at every run, but it is pretty much something close to the image displayed below.
EDIT 2
I think the issue now is with how to represent the image. By referring to this solution, I now get the following:
image = images[7]
image = image[0,...].permute([1,2,0])
image = image.numpy()
image = (image * 255).astype(np.uint8)
img = Image.fromarray(image,'RGB')
img.show()
Which produces the following image as expected :-)

In PyTorch you usually represent pictures with tensors of shape
(channels, height, width)
You then seem to reshape it to what you expect would be
(height, width, channels)
Note that these tensors or arrays are actually stored as 1d "array", and the multiple dimensions just come from defining strides (check out How to understand numpy strides for layman?).
In your particular case this means that consecutive values (that were basically values of the same colour channel and the same row) are now interpreted as different colour channels.
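As a small NumPy illustration of that strides idea (a sketch with made-up numbers, not the 416x416 image from the question):
import numpy as np

a = np.arange(12).reshape(3, 2, 2)   # think of it as a tiny (channels, height, width) image
print(a.strides)                      # byte steps per axis; the data itself is one flat buffer
print(a.reshape(2, 2, 3).ravel())     # 0..11 in the original order: reshape only regroups the brackets
print(a.transpose(1, 2, 0).ravel())   # a different order: transposing really rearranges the values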
So let's say you have a 2x2 image with 3 colour channels. Let's say it is a chessboard pattern. In PyTorch that would look something like the following array of shape (3, 2, 2):
[[[1,0],[0,1]],[[1,0],[0,1]],[[1,0],[0,1]]]
The underlying internal array is just
[ 1,0 , 0,1 , 1,0 , 0,1 , 1,0 , 0,1 ]
So reshaping to (2, 2, 3) would look like so:
[[[1,0,0],[1,1,0]],[[0,1,1],[0,0,1]]]
which immediately shows how the image will be completely jumbled. Reshaping really just means setting the brackets in different places!
So what you probably want instead of reshape is permute([1, 2, 0]) (called transpose in NumPy), which will actually rearrange the data.
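To make the difference concrete, here is a small sketch of the chessboard example above, comparing the two operations:
import torch

# chessboard image of shape (channels, height, width) = (3, 2, 2)
chw = torch.tensor([[[1, 0], [0, 1]],
                    [[1, 0], [0, 1]],
                    [[1, 0], [0, 1]]])
print(chw.reshape(2, 2, 3))   # brackets regrouped, pixels jumbled: [[[1,0,0],[1,1,0]],[[0,1,1],[0,0,1]]]
print(chw.permute(1, 2, 0))   # data rearranged, each pixel keeps its channels: [[[1,1,1],[0,0,0]],[[0,0,0],[1,1,1]]]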

Related

how to detect and slice boxes out of a numpy array

I've got a wide image with 2 images inside it. These 2 images could be seen as 'boxes' in the big image, and the numpy array would look like this:
[
[200,200,200,157,200,200,200,238,256,167,234,266,154,200,200,200,157,200,200,200,238,256,167,234,266,154,200,200,200,157,200,200,200],
[200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200],
[200,144,200,200,132,200,200,238,256,167,234,266,154,200,144,200,200,132,200,200,238,256,167,234,266,154,200,144,200,200,132,200,200],
[200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200],
[200,200,166,200,200,200,200,238,256,167,234,266,154,200,200,166,200,200,200,200,238,256,167,234,266,154,200,200,166,200,200,200,200],
[182,200,200,200,200,200,200,238,256,167,234,266,154,182,200,200,200,200,200,200,238,256,167,234,266,154,182,200,200,200,200,200,200]
]
Because I applied a median filter, the surrounding pixels are all 200 with a little bit of noise here and there. My question is: how can I extract those 2 sub-images from this big image and put each into its own array so I have the pictures separately? My guess would be to slice them out or maybe use edge detection, but I haven't succeeded in doing so yet. The array in the question is a mockup but represents what the data looks like, because the real array is too big for the output in Visual Studio. Underneath is what a picture really looks like; different pictures have different 'white spaces' and numbers of sub-pictures in them. The size is fixed and always 28 x 200.
I am not sure whether my assumptions about your image data are correct. The following algorithm only works if the image itself does not contain "flat regions" of 200.
import numpy as np
from matplotlib import pyplot as plt
data = np.array([
[200,200,200,157,200,200,200,238,256,167,234,266,154,200,200,200,157,200,200,200,238,256,167,234,266,154,200,200,200,157,200,200,200],
[200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200],
[200,144,200,200,132,200,200,238,256,167,234,266,154,200,144,200,200,132,200,200,238,256,167,234,266,154,200,144,200,200,132,200,200],
[200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200,238,256,167,234,266,154,200,200,200,200,200,200,200],
[200,200,166,200,200,200,200,238,256,167,234,266,154,200,200,166,200,200,200,200,238,256,167,234,266,154,200,200,166,200,200,200,200],
[182,200,200,200,200,200,200,238,256,167,234,266,154,182,200,200,200,200,200,200,238,256,167,234,266,154,182,200,200,200,200,200,200]
])
# filter all regions of (more or less) constant 200
median_data = np.median(data, axis=0)
diff_data = np.append([0], np.diff(median_data))
img_region = (diff_data != 0) & (median_data != 200)
# get image regions, identify longest image as "true" image length
idx_pairs = np.where(np.diff(np.hstack(([False],img_region,[False]))))[0]
region_lengths = np.diff(idx_pairs)[::2]
longest_region = np.max(np.diff(idx_pairs)[::2])
# fix broken images
for i_region, region_length in enumerate(region_lengths):
    if region_length != longest_region:
        try:
            img_region[slice(idx_pairs[i_region*2], idx_pairs[i_region*2]+longest_region)] = True
        except IndexError:
            pass # removed mini-regions
idx_pairs = np.where(np.diff(np.hstack(([False], img_region, [False]))))[0]
# get number of images
n_img = np.sum(np.diff(img_region.astype(int)) == 1)
image_data = data[:, img_region]
# slice images
images = np.split(image_data, n_img, axis=1)
for img in images:
    plt.figure()
    plt.imshow(img)
plt.show()
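As a quick sanity check of the sketch above, you can inspect how many sub-images were found and their shapes; with the mock array this should report two sub-images of shape (6, 6):
print(len(images))                  # number of detected sub-images
print([im.shape for im in images])  # [(6, 6), (6, 6)] for the mock array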

Mapping tensor in pytorch

I have the following two tensors:
img is a RGB image of shape (224,224,3)
uvs is a tensor with the same spatial size, e.g. (224, 224, 2), that maps to coordinates (x,y). In other words, it provides (x,y) coordinates for every pixel of the input image.
I want to create now a new output image tensor that contains on index (x,y) the value of the input image. So the output should be an image as well with the pixels rearranged according to the mapping tensor.
Small toy example:
img = [[c1,c2], [c3, c4]] where c is a RGB color [r, g, b]
uvs = [[[0,0], [1,1]],[[0,1], [1,0]]]
out = [[c1, c3], [c4, c2]]
How would one achieve such a thing in pytorch in a fast vectorized manner?
Try with:
out = img[uvs[...,0], uvs[...,1]]
I was able to solve it (with the help of Quang Hoang's answer):
out[uvs[...,0], uvs[...,1]] = img
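Here is a small runnable sketch of the toy example from the question using this scatter-style indexing (the concrete colour values for c1..c4 are made up):
import torch

# img = [[c1, c2], [c3, c4]] with made-up RGB colours, shape (2, 2, 3)
img = torch.tensor([[[1., 0., 0.], [0., 1., 0.]],
                    [[0., 0., 1.], [1., 1., 0.]]])
# uvs holds the target (x, y) coordinate for every pixel, shape (2, 2, 2)
uvs = torch.tensor([[[0, 0], [1, 1]],
                    [[0, 1], [1, 0]]])
out = torch.zeros_like(img)
# scatter each input pixel to the position stored in uvs
out[uvs[..., 0], uvs[..., 1]] = img
print(out)   # [[c1, c3], [c4, c2]]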
What you need is torch.nn.functional.grid_sample(). You can do something like this:
import torch
width, height, channels = (224, 224, 3)
# Note that the image is channel-first (CHW format). In this example, I'm using a float image, so the values must be in the range (0, 1).
img = torch.rand((channels, height, width))
# Create the indices of shape (224, 224, 2). Any other size would work too.
col_indices = torch.arange(width, dtype=torch.float32)
row_indices = torch.arange(height, dtype=torch.float32)
uvs = torch.stack(torch.meshgrid([col_indices, row_indices]), dim=-1)
# Transform the indices from pixel coordinates to the range [-1, 1] such that:
# * top-left corner of the input = (-1, -1)
# * bottom-right corner of the input = (1, 1)
# This is required for grid_sample() to work properly.
uvs[..., 0] = (uvs[..., 0] / width) * 2 - 1
uvs[..., 1] = (uvs[..., 1] / height)* 2 - 1
# Do the "mapping" operation (this does a bilinear interpolation) using `uvs` coordinates.
# Note that grid_sample() requires a batch dimension, so need to use `unsqueeze()`, then
# get rid of it using squeeze().
mapped = torch.nn.functional.grid_sample(
    img.unsqueeze(0),
    uvs.unsqueeze(0),
    mode='bilinear',
    align_corners=True,
)
# The final image is in HWC format.
result = mapped.squeeze(0).permute(1, 2, 0)
Side note: I found your question by searching for a solution for a related problem I had for a while. While I was writing an answer to your question, I realized what bug was causing the problem I was facing. By helping you I effectively helped myself, so I hope this helps you! :)

Why does torchvision.utils.make_grid() return copies of the wanted grid?

In the code example below I cannot understand why the output tensor, grid, has a shape of (3, 28, 280). I understand why it is 28 in height and 280 in width, but not the 3. It seems, from running plt.imshow() on all 3 of the 28x280 arrays along axis 0, that they are identical copies, since printing any one of them gives me the image I want.
Also, I do not understand why I can pass grid as an argument to plt.imshow(), given that it is supposed to take in a 2D array, not a 3D one as grid clearly is.
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
train_set = torchvision.datasets.FashionMNIST(
    root = './pytorch_obj_classifier/data/FashionMNIST',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)
# assumed: the DataLoader was omitted from the snippet; batch_size=10 matches nrow=10 below
train_loader = torch.utils.data.DataLoader(train_set, batch_size=10)
sample = next(iter(train_loader))
image,label = sample
print(image.shape)
grid = torchvision.utils.make_grid(image,padding=0, nrow=10)
print(grid.shape)
plt.figure(figsize=(15,15))
grid = np.transpose(grid,(1,2,0))
grid1 = grid[:,:,0]
grid2 = grid[:,:,1]
grid3 = grid[:,:,2]
plt.imshow(grid1,cmap = 'gray')
plt.imshow(grid2,cmap = 'gray')
plt.imshow(grid3,cmap = 'gray')
plt.imshow(grid,cmap = 'gray')
The MNIST dataset consists of grayscale images. If you look at the implementation details of torchvision.utils.make_grid, single-channel images get their channel copied three times:
if tensor.dim() == 4 and tensor.size(1) == 1:  # single-channel images
    tensor = torch.cat((tensor, tensor, tensor), 1)
As for matplotlib.pyplot.imshow, it can take 2D or 3D inputs:
The image data. Supported array shapes are:
(M, N): an image with scalar data. The data is visualized using a colormap.
(M, N, 3): an image with RGB values (0-1 float or 0-255 int).
(M, N, 4): an image with RGBA values (0-1 float or 0-255 int), i.e. including transparency.
Generally speaking, we wouldn't refer to dimensions but rather describe tensors by their shape (the size of each of their axes). In PyTorch, images always have three axes and follow the shape (channel, height, width). Even a single-channel image is considered a 3D tensor of shape (1, height, width) rather than a 2D tensor of shape (height, width). This is to be consistent with cases where you have more than one channel, which is very common (cf. convolutional neural networks).
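As a quick check that the three channel planes really are identical copies (a small sketch, inspecting grid right after make_grid, i.e. before the np.transpose call above):
print(torch.equal(grid[0], grid[1]))   # True: the single channel was simply repeated
print(torch.equal(grid[0], grid[2]))   # True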

Images dimensions error in python

Trying to match two images to find out the similarity score between them, but it shows a dimension error and I am unable to fix the issue. My code is given below:
from skimage.measure import compare_ssim
#import argparse
#import imutils
import cv2
img1="1.png"
img2="2.png"
# load the two input images
imageA = cv2.imread(img1)
imageB = cv2.imread(img2)
# convert the images to grayscale
grayA = cv2.cvtColor(imageA, cv2.COLOR_BGR2GRAY)
grayB = cv2.cvtColor(imageB, cv2.COLOR_BGR2GRAY)
# compute the Structural Similarity Index (SSIM) between the two
# images, ensuring that the difference image is returned
(score, diff) = compare_ssim(grayA, grayB, full=True)
diff = (diff * 255).astype("uint8")
print("SSIM: {}".format(score))
This gives an error:
raise ValueError('Input images must have the same dimensions.')
ValueError: Input images must have the same dimensions.
How to fix this issue?
Amending Saurav Panda's answer:
You can resize one of the images to the size of the other image like this:
imageB=cv2.resize(imageB,imageA.shape)
note that
(H, W) = imageA.shape
# to resize and set the new width and height
imageB = cv2.resize(imageB, (W, H))
the cv2.resize function expects its size argument as (W, H). This is the reverse order of the image's .shape, which is (H, W), so you need to catch that, or you'll get the same error when comparing non-square images.
You can do this in many ways:
In the first method, you can pick a fixed size that is smaller than the actual dimensions of the images and resize both images to this same size, e.g. resize all images to (150, 150).
In the second method, you can resize one of the images to the size of the other image.
Try this code:
imageB=cv2.resize(imageB,imageA.shape)
This will work for you, but if the difference in dimensions between the two images is very large, you may sometimes lose some data. You can compare both the x and y dimensions, find the smallest ones, and then resize both images to those smallest x and y dimensions, as sketched below.
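A minimal sketch of that smallest-common-size idea, assuming imageA and imageB have already been loaded with cv2.imread as in the question:
import cv2

hA, wA = imageA.shape[:2]
hB, wB = imageB.shape[:2]
H, W = min(hA, hB), min(wA, wB)
# cv2.resize expects (width, height), the reverse of .shape's (height, width)
imageA_small = cv2.resize(imageA, (W, H))
imageB_small = cv2.resize(imageB, (W, H))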
The error
'Input images must have the same dimensions.'
Tells you that the function you called expects input images of the same dimensions and that you did not do this.
You fix that either by providing input images that have the same dimensions, or by not calling that function when the images have different dimensions and you cannot change that for whatever reason.
Compare imageA.shape and imageB.shape after loading the images from file.
For simple debugging:
print(imageA.shape)
print(imageB.shape)
You can use TensorFlow. See this link; you can modify your data accordingly.

Check if cifar10 is converted well

Recently I followed a few tutorials on machine learning, and now I want to test if I can make some image recognition program by myself. For this I want to use the CIFAR 10 dataset, but I think I have a small problem in the conversion of the dataset.
For those who are not familiar with this set: the dataset comes as lists of n rows and 3072 columns, in which the first 1024 columns represent the red values, the second 1024 the green values and the last are the blue values. Each row is a single image (size 32x32) and the pixel rows are stacked after each other (the first 32 values are the red values for the top-most row of pixels, etc.).
What I wanted to do with this dataset is to transform it to a 4D tensor (with numpy), so I can view the images with matplotlib's .imshow(). The tensor I made has this shape: (n, 32, 32, 3), so the first 'dimension' stores all images, the second stores rows of pixels, the third stores individual pixels and the last represents the rgb values of those pixels. Here is the function I made that should do this:
def rawToRgb(data):
    length = data.shape[0]
    # convert to flat img array with rgb pixels
    newAr = np.zeros([length, 1024, 3])
    for img in range(length):
        for pixel in range(1024):
            newAr[img, pixel, 0] = data[img, pixel]
            newAr[img, pixel, 1] = data[img, pixel+1024]
            newAr[img, pixel, 2] = data[img, pixel+2048]
    # convert to 2D img array
    newAr2D = newAr.reshape([length, 32, 32, 3])
    # plt.imshow(newAr2D[5998])
    # plt.show()
    return newAr2D
Which takes a single parameter (a tensor of shape (n, 3072)). I have commented out the pyplot code, as it is only for testing, but when testing I noticed that everything seems to be OK: I can recognise the shapes of the objects in the images, but I am not sure if the colours are good or not, as I get some oddly-coloured images as well as some pretty normal images. Here are a few examples: purple plane, blue cat, normal horse, blue frog.
Can anyone tell me whether I am making a mistake or not?
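As a side note, the same conversion can also be written without the Python loops; a minimal vectorized sketch (assuming data has shape (n, 3072) with the red/green/blue planes laid out as described above):
import numpy as np

def rawToRgbVectorized(data):
    # (n, 3072) -> (n, 3, 32, 32): the three 1024-value colour planes become the channel axis
    planes = data.reshape(-1, 3, 32, 32)
    # (n, 3, 32, 32) -> (n, 32, 32, 3): move the channel axis last for imshow()
    return planes.transpose(0, 2, 3, 1)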
The images that appear oddly-coloured are the negative of the actual image, so you need to subtract each pixel value from 255 to get the true value. If you simply want to see what the original images look like, use:
from scipy.misc import imread
import matplotlib.pyplot as plt
img = imread(file_path)
plt.imshow(255 - img)
plt.show()
The original cause of the problem is that the CIFAR-10 data stores the pixel values on a scale of 0-255, but matplotlib's imshow() method (which I assume you are using) expects inputs between 0 and 1. Given an input that is not scaled between 0 and 1, imshow() does some normalization internally, which causes some images to become negatives.
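If you instead want imshow() to show the converted array with its true colours, a minimal fix (assuming data is the (n, 3072) CIFAR array and rawToRgb() is the function from the question) is to hand it either 0-255 integers or 0-1 floats:
import numpy as np
import matplotlib.pyplot as plt

rgb = rawToRgb(data)                  # float values on a 0-255 scale
plt.imshow(rgb[0].astype(np.uint8))   # integers in 0-255 are displayed as-is
# or: plt.imshow(rgb[0] / 255.0)      # floats in 0-1 are displayed as-is
plt.show()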
