Why does transposing a numpy array rotate it 90 degrees? - python

I am trying to read images from an lmdb dataset, augment each one and then save them into another dataset for being used in my trainings.
The axes of these images were changed to (3,32,32) when they were saved into the lmdb dataset, so in order to augment them I had to transpose them back into their actual shape.
The problem is whenever I try to display them using either matplotlib's show() method or scipy's toimage(), they show a rotated version of the image.
So we have :
img_set = np.transpose(data_train,(0,3,2,1))
#trying to display an image using pyplot, makes it look like this:
plt.subplot(1,2,1)
plt.imshow(img_set[0])
showing the same image using toimage :
Now if I don't transpose data_train, pyplot's show() generates an error while
toimage() displays the image well:
What is happening here?
When I feed the transposed data_train to my augmenter, I also get the result rotated just like previous examples.
Now I'm not sure whether this is a displaying issue, or the actual images are indeed rotated!
What should I do ?

First, look closely. The transposed array is not rotated but mirrored on the diagonal (i.e. X and Y axes are swapped).
The original shape is (3,32,32), which I interpret as (RGB, X, Y). However, imshow expects an array of shape MxNx3 - the color information must be in the last dimension.
By transposing the array you invert the order of dimensions: (RGB, X, Y) becomes (Y, X, RGB). This is fine for matplotlib because the color information is now in the last dimension but X and Y are swapped, too. If you want to preserve the order of X, Y you can tell transpose to do so:
import numpy as np
img = np.zeros((3, 32, 64)) # non-square image for illustration
print(img.shape) # (3, 32, 64)
print(np.transpose(img).shape) # (64, 32, 3)
print(np.transpose(img, [1, 2, 0]).shape) # (32, 64, 3)
When using imshow to display an image be aware of the following pitfalls:
It treats the image as a matrix, so the dimensions of the array are interpreted as (ROW, COLUMN, RGB), which is equivalent to (VERTICAL, HORIZONTAL, COLOR) or (Y, X, RGB).
It changes direction of the y axis so the upper left corner is img[0, 0]. This is different from matplotlib's normal coordinate system where (0, 0) is the bottom left.
Example:
import matplotlib.pyplot as plt
img = np.zeros((32, 64, 3))
img[1, 1] = [1, 1, 1] # marking a pixel near the upper left corner white
plt.imshow(img)
Note that the smaller first dimension corresponds to the vertical direction of the image.
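For the batch in the question, which has a leading image axis, the same idea applies with one extra index. The following is a minimal sketch, assuming data_train is laid out as (image, channel, row, column); the array is random stand-in data, not the asker's dataset.

```python
import numpy as np

# Stand-in for data_train: 4 images stored as (N, channel, row, column).
rng = np.random.default_rng(0)
data_train = rng.integers(0, 256, size=(4, 3, 32, 32), dtype=np.uint8)

# Move the channel axis to the end and keep rows before columns.
img_set = np.transpose(data_train, (0, 2, 3, 1))
print(img_set.shape)  # (4, 32, 32, 3)

# The permutation from the question also yields (4, 32, 32, 3) for square
# images, but it swaps rows and columns, so each image is mirrored on
# its diagonal.
mirrored = np.transpose(data_train, (0, 3, 2, 1))
print(np.array_equal(mirrored[0], img_set[0].swapaxes(0, 1)))  # True
```

Because the images are square, the shape alone cannot reveal the swap, which is probably why it only shows up when the image is displayed.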

Related

How can I plot a normalized RGB map

I have a numpy array where each element has 3 values (RGB) from 0 to 255, and it spans from [0, 0, 0] to [255, 255, 255] with 256 elements evenly spaced. I want to plot it as a 16 by 16 grid but have no idea how to map the colors (as the numpy array) to the data to create the grid.
import numpy as np
# create an evenly spaced RGB representation as integers
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, dtype=int)
# convert the evenly spaced integers to RGB representation
rgb_colors = np.array(tuple(((((255<<16)&k)>>16), ((255<<8)&k)>>8, (255)&k) for k in all_colors_int))
# data to fit the rgb_colors as colors into a plot as a 16 by 16 numpy array
data = np.array(tuple((k,p) for k in range(16) for p in range(16)))
So, how to map the rgb_colors as colors to the data data into a grid plot?
There's quite a bit going on here, and I think it's valuable to talk about it.
linspace
I suggest you read the linspace documentation.
https://numpy.org/doc/stable/reference/generated/numpy.linspace.html
If you want a 16x16 grid, then you should start by generating 16x16=256 values. However, if you inspect the shape of the all_colors_int array, you'll notice that it has only generated 50 values, which is the default value of the linspace num argument.
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, dtype=int)
print(all_colors_int.shape) # (50,)
Make sure you specify this third 'num' argument to generate the correct quantity of RGB pixels.
As a further side note, (255 << 16) + (255 << 8) + 255 is equivalent to 2^24 - 1 (written 2**24 - 1 in Python, where ^ means XOR). The 2^N - 1 formula is the usual way to fill the lowest N bits of an integer with 1s.
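As a small sketch of the two points above, the snippet below passes num=256 explicitly and checks the bit-shift expression against the power-of-two form. The name max_rgb is introduced here only for illustration.

```python
import numpy as np

max_rgb = (255 << 16) + (255 << 8) + 255
print(max_rgb == 2**24 - 1)  # True

# Request 256 samples explicitly instead of relying on the default of 50.
all_colors_int = np.linspace(0, max_rgb, num=256, dtype=int)
print(all_colors_int.shape)  # (256,)
print(all_colors_int[0], all_colors_int[-1])  # 0 16777215
```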
numpy is faster
On your next line, your for loop manually iterates over all of the elements in python.
rgb_colors = np.array(tuple(((((255<<16)&k)>>16), ((255<<8)&k)>>8, (255)&k) for k in all_colors_int))
While this might work, this isn't considered the correct way to use numpy arrays.
You can directly perform bitwise operations to the entire numpy array without the python for loop. For example, to extract bits [16, 24) (which is usually the red channel in an RGB integer):
# Shift over so the 16th bit is now bit 0, then select only the first 8 bits.
RedChannel = (all_colors_int >> 16) & 255
Building the grid
There are many ways to do this in numpy, however I would suggest this approach.
Images are usually represented with a 3-dimensional numpy array, usually of the form
(HEIGHT, WIDTH, CHANNELS)
First, reshape your numpy int array into the 16x16 grid that you want.
reshaped = all_colors_int.reshape((16, 16))
Again, the numpy documentation is really great, give it a read:
https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
Now, extract the red, green and blue channels, as described above, from this reshaped array. If you operate directly on the numpy array, you won't need a nested for-loop to iterate over the 16x16 grid, numpy will handle this for you.
RedChannel = (reshaped >> 16) & 255
GreenChannel = ... # TODO
BlueChannel = ... # TODO
And then finally, we can convert our three 16x16 grids into a single 16x16x3 grid using the numpy stack function
https://numpy.org/doc/stable/reference/generated/numpy.stack.html
grid_rgb = np.stack((
RedChannel,
GreenChannel,
BlueChannel
), axis=2).astype(np.uint8)
Notice two things here
When we 'stack' arrays, we create a new dimension. The axis=2 argument tells numpy to add this new dimension at index 2 (i.e. the third axis). Without this, the shape of our grid would be (3, 16, 16) instead of (16, 16, 3)
The .astype(np.uint8) casts all of the values in this numpy array into a uint8 data type. This is so the grid is compatible with other image manipulation libraries, such as openCV, and PIL.
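Putting the steps together, a sketch could look like the following. The green and blue lines are an assumption about how the TODO placeholders would be filled in, following the same shift-and-mask pattern as the red channel, and num=256 is passed to linspace so that the reshape succeeds.

```python
import numpy as np

all_colors_int = np.linspace(0, 2**24 - 1, num=256, dtype=int)
reshaped = all_colors_int.reshape((16, 16))

red_channel = (reshaped >> 16) & 255
green_channel = (reshaped >> 8) & 255  # assumed: bits [8, 16)
blue_channel = reshaped & 255          # assumed: bits [0, 8)

# Stack along a new last axis to get (HEIGHT, WIDTH, CHANNELS).
grid_rgb = np.stack(
    (red_channel, green_channel, blue_channel), axis=2
).astype(np.uint8)
print(grid_rgb.shape)  # (16, 16, 3)
print(grid_rgb.dtype)  # uint8
```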
Show the image
We can use PIL for this.
If you want to use OpenCV, then remember that OpenCV interprets images as BGR, not RGB, so your red and blue channels will be swapped.
# Show Image
from PIL import Image
Image.fromarray(grid_rgb).show()
If you've done everything right, you'll see an image... And it's all gray.
Why is it gray?
There are over 16 million possible colours. Selecting only 256 of them just so happens to select only pixels with the same R, G and B values which results in an image without any color.
If you want to see some colours, you'll need to either show a bigger image (e.g. 256x256), or alternatively, you can use a dimension that's not a power of two. For example, try a prime number, as this will add a small amount of pseudo-randomness to the RGB selection, e.g. try 17.
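The reason can be checked numerically. With 256 samples the spacing between consecutive values is (2^24 - 1) / 255 = 65793, which is 0x010101 in hexadecimal, so the k-th sample has R = G = B = k. The sketch below verifies this and shows that 17 x 17 samples break the pattern; the variable names are chosen here for illustration.

```python
import numpy as np

values = np.linspace(0, 2**24 - 1, num=256, dtype=int)
step = values[1] - values[0]
print(hex(step))  # 0x10101

r = (values >> 16) & 255
g = (values >> 8) & 255
b = values & 255
# Every sample has identical channels, i.e. it is a shade of gray.
print(np.array_equal(r, g) and np.array_equal(g, b))  # True

# With 17 * 17 = 289 samples the step is no longer a multiple of 0x010101.
values_17 = np.linspace(0, 2**24 - 1, num=17 * 17, dtype=int)
r17 = (values_17 >> 16) & 255
g17 = (values_17 >> 8) & 255
print(np.array_equal(r17, g17))  # False
```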
Best of luck.
Based solely on the title 'How to plot a normalized RGB map' rather than the approach you've provided, it appears that you'd like to plot a colour spectrum in RGB.
The following approach can be taken to manually construct this.
import cv2
import matplotlib.pyplot as plt
import numpy as np
h = np.repeat(np.arange(0, 180), 180).reshape(180, 180)
s = np.ones((180, 180))*255
v = np.ones((180, 180))*255
hsv = np.stack((h, s, v), axis=2).astype('uint8')
rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
plt.imshow(rgb)
Explanation:
It's generally easier to construct (and decompose) a colour palette using the HSV (hue, saturation, value) colour scale; where hue is the colour itself, saturation can be thought of as the intensity and value as the distance from black. Therefore, there's really only one value to worry about, hue. Saturation and value can be set to 255, for 'full intensity'.
cv2 is used here to simply convert the constructed HSV colourscale to RGB and matplotlib is used to plot the image. (I didn't use cv2 for plotting as it doesn't play nicely with Jupyter.)
The actual spectrum values are constructed in numpy.
Breakdown:
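Create the colour spectrum of hue and plug 255 in for the saturation and value. Why is 180 used? For 8-bit images OpenCV stores hue as degrees divided by two, so valid hue values run from 0 to 179.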
h = np.repeat(np.arange(0, 180), 180).reshape(180, 180)
s = np.ones((180, 180))*255
v = np.ones((180, 180))*255
Stack the three channels H+S+V into a 3-dimensional array, convert the array values to unsigned 8-bit integers, and have cv2 convert from HSV to RGB for us, to be lazy and save us working out the math.
hsv = np.stack((h, s, v), axis=2).astype('uint8')
rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
Plot the RGB image.
plt.imshow(rgb)
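If OpenCV is not available, a similar spectrum can be sketched with the standard library's colorsys module. This is a swapped-in technique rather than the approach above: colorsys.hsv_to_rgb takes hue, saturation and value as floats in [0, 1], so the 180 hue steps are rescaled accordingly, and the exact pixel values may differ slightly from what cv2 produces.

```python
import colorsys

import numpy as np

# 180 hue steps rescaled to the [0, 1) range that colorsys expects.
hues = np.arange(180) / 180.0
row_colors = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues])

# Repeat each colour across 180 columns so that every row has one hue.
rgb = np.repeat(row_colors[:, np.newaxis, :], 180, axis=1)
rgb_uint8 = (rgb * 255).round().astype(np.uint8)
print(rgb_uint8.shape)  # (180, 180, 3)
print(rgb_uint8[0, 0])  # [255   0   0]
```

The resulting rgb_uint8 array can be passed to plt.imshow in the same way as the cv2 result.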

Why does torchvision.utils.make_grid() return copies of the wanted grid?

In the coding example below, I cannot understand why the output tensor, grid, has a shape of
3,28,280. I understand why it's 28 in height and 280 in width, but not the 3. Running plt.imshow() on each of the three 28x280 arrays along axis 0 suggests that they are identical copies, since displaying any one of them gives me the image I want.
Also, I do not understand why I can pass grid as an argument to plt.imshow(), given that it is supposed to take a 2D array, not a 3D one as grid clearly is.
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
train_set = torchvision.datasets.FashionMNIST(
root = './pytorch_obj_classifier/data/FashionMNIST',
train = True,
download = True,
transform = transforms.Compose([
transforms.ToTensor()
])
)
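# train_loader was not defined in the original snippet; batch_size=10 is assumed because it matches the 28x280 grid
train_loader = torch.utils.data.DataLoader(train_set, batch_size=10)
sample = next(iter(train_loader))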
image,label = sample
print(image.shape)
grid = torchvision.utils.make_grid(image,padding=0, nrow=10)
print(grid.shape)
plt.figure(figsize=(15,15))
grid = np.transpose(grid,(1,2,0))
grid1 = grid[:,:,0]
grid2 = grid[:,:,1]
grid3 = grid[:,:,2]
plt.imshow(grid1,cmap = 'gray')
plt.imshow(grid2,cmap = 'gray')
plt.imshow(grid3,cmap = 'gray')
plt.imshow(grid,cmap = 'gray')
The FashionMNIST dataset consists of grayscale images. If you look at the implementation details of torchvision.utils.make_grid, single-channel images get their channel copied three times:
if tensor.dim() == 4 and tensor.size(1) == 1: # single-channel images
tensor = torch.cat((tensor, tensor, tensor), 1)
As for matplotlib.pyplot.imshow, it accepts both 2D arrays and 3D arrays whose last axis has 3 or 4 entries:
The image data. Supported array shapes are:
(M, N): an image with scalar data. The data is visualized using a colormap.
(M, N, 3): an image with RGB values (0-1 float or 0-255 int).
(M, N, 4): an image with RGBA values (0-1 float or 0-255 int), i.e. including transparency.
Generally speaking, we wouldn't refer to dimensions but rather describe tensors by their shape (the size along each of their axes). In PyTorch, images are conventionally represented with three axes and a shape of (channel, height, width). This holds even for single-channel images, which are treated as a 3D tensor (1, height, width) instead of a 2D tensor (height, width). This is to be consistent with cases where you have more than one channel, which is very often (cf. convolutional neural networks).
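The effect of the channel copy can be mimicked in plain numpy. This is only an analogy under stated assumptions, not torchvision's implementation: the batch is random stand-in data, and the grid is built by hand as a single row with no padding.

```python
import numpy as np

# Stand-in batch of 10 single-channel 28x28 images:
# (batch, channel, height, width).
rng = np.random.default_rng(0)
batch = rng.random((10, 1, 28, 28), dtype=np.float32)

# Mimic the channel copy: repeat the single channel three times.
batch_rgb = np.concatenate((batch, batch, batch), axis=1)
print(batch_rgb.shape)  # (10, 3, 28, 28)

# Lay the 10 images side by side, as a one-row grid with no padding.
grid = np.concatenate(list(batch_rgb), axis=2)
print(grid.shape)  # (3, 28, 280)

# All three channels are identical, so any one of them is the gray grid.
same = np.array_equal(grid[0], grid[1]) and np.array_equal(grid[1], grid[2])
print(same)  # True
gray_grid = grid[0]  # shape (28, 280), a 2D array for imshow with cmap='gray'
```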

Check if cifar10 is converted well

Recently I followed a few tutorials on machine learning, and now I want to test if I can make some image recognition program by myself. For this I want to use the CIFAR 10 dataset, but I think I have a small problem in the conversion of the dataset.
For those who are not familiar with this set: the dataset comes as lists of n rows and 3072 columns, in which the first 1024 columns represent the red values, the second 1024 the green values and the last 1024 the blue values. Each row is a single image (size 32x32) and the pixel rows are stacked after each other (the first 32 values are the red values for the top-most row of pixels, etc.)
What I wanted to do with this dataset is to transform it to a 4D tensor (with numpy), so I can view the images with matplotlib's .imshow(). The tensor I made has this shape: (n, 32, 32, 3), so the first 'dimension' stores all images, the second stores rows of pixels, the third stores individual pixels and the last represents the rgb values of those pixels. Here is the function I made that should do this:
def rawToRgb(data):
    length = data.shape[0]
    # convert to flat img array with rgb pixels
    newAr = np.zeros([length, 1024, 3])
    for img in range(length):
        for pixel in range(1024):
            newAr[img, pixel, 0] = data[img, pixel]
            newAr[img, pixel, 1] = data[img, pixel+1024]
            newAr[img, pixel, 2] = data[img, pixel+2048]
    # convert to 2D img array
    newAr2D = newAr.reshape([length, 32, 32, 3])
    # plt.imshow(newAr2D[5998])
    # plt.show()
    return newAr2D
Which takes a single parameter (a tensor of shape (n, 3072)). I have commented out the pyplot code, as this is only for testing. When testing, I noticed that everything seems to be OK: I can recognise the shapes of the objects in the images, but I am not sure whether the colours are right, as I get some oddly-coloured images as well as some pretty normal ones. Here are a few examples: purple plane, blue cat, normal horse, blue frog.
Can anyone tell me whether I am making a mistake or not?
The images that appear oddly-coloured are the negative of the actual image, so you need to subtract each pixel value from 255 to get the true value. If you simply want to see what the original images look like, use:
from scipy.misc import imread
import matplotlib.pyplot as plt
img = imread(file_path)
plt.imshow(255 - img)
plt.show()
The original cause of the problem is that the CIFAR-10 data stores the pixel values on a scale of 0-255, but matplotlib's imshow() method (which I assume you are using) expects inputs between 0 and 1. Given an input that is not scaled between 0 and 1, imshow() does some normalization internally, which causes some images to become negatives.
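Since imshow accepts integer RGB data in the 0-255 range, one way to sidestep the scaling issue may be to keep the converted array as uint8 instead of the float array that np.zeros creates by default. The sketch below is a vectorized variant of the question's function under the layout described there; raw_to_rgb is a hypothetical name and the input is random stand-in data rather than the real dataset.

```python
import numpy as np

def raw_to_rgb(data):
    """Convert (n, 3072) CIFAR-10 rows to (n, 32, 32, 3) uint8 images."""
    n = data.shape[0]
    # Each row is [1024 red, 1024 green, 1024 blue], each plane row-major.
    planes = data.reshape(n, 3, 32, 32)
    # Move channels last: (n, channel, row, col) -> (n, row, col, channel).
    return planes.transpose(0, 2, 3, 1).astype(np.uint8)

# Stand-in data with the same layout as described in the question.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(5, 3072))
images = raw_to_rgb(data)
print(images.shape, images.dtype)  # (5, 32, 32, 3) uint8
```

Dividing a float array by 255 before calling imshow would be an alternative with the same intent.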

How do I efficiently transform a numpy image of shape (w, h, 3) to (w,h,5) which has r,g,b,x,y in third axis?

Without using for loop.
How can I attach the x,y coordinate on to each pixel of a rgb image in numpy?
such that
image[0,0,:] = (r,g,b,x,y) where x,y is the coordinate of the pixel
Suppose rgb and xy are your (w,h,3) and (w,h,2) arrays, respectively. Then you can concatenate them along the third axis:
image = np.concatenate((rgb, xy), axis=2)
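The xy array itself can be built without a loop. The following is a minimal sketch using np.indices, assuming that x means the index along the first axis and y the index along the second, as in the question's image[0,0,:] example; the rgb array is an all-zero stand-in.

```python
import numpy as np

rgb = np.zeros((4, 6, 3), dtype=np.uint8)  # stand-in (w, h, 3) image
w, h = rgb.shape[:2]

# np.indices returns shape (2, w, h): first-axis index, then second-axis index.
xy = np.moveaxis(np.indices((w, h)), 0, -1)  # shape (w, h, 2)

image = np.concatenate((rgb, xy), axis=2)
print(image.shape)  # (4, 6, 5)
print(image[2, 3])  # [0 0 0 2 3]
```

Note that the concatenated array takes a common integer dtype wide enough for the coordinates, so it is no longer uint8.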

How can I prevent Numpy/ SciPy gaussian blur from converting image to grey scale?

I want to perform a gaussian blur on an image, but I don't want it to be converted to grey scale. Is there any way to perform this operation and keep the color?
from scipy import misc, ndimage
import scipy
import numpy as np
a = misc.imread('A.jpg')
# A retains its color
misc.imsave('color.jpg', a)
# a_g_blure gets converted to grey scale, I want to prevent this
a_g_blure = ndimage.uniform_filter(a, size=11)
# I want it to keep its color
misc.imsave('now_grey.jpg', a_g_blure)
a is a 3-d array with shape (M, N, 3). The problem is that ndimage.uniform_filter(a, size=11) applies a filter with length 11 to each dimension of a, including the third axis that holds the color channels. When you apply the filter with length 11 to an axis with length 3, the resulting values are all pretty close to the average of the three values, so you get something pretty close to a gray scale. (Depending on the image, you might have some color left.)
What you actually want is to apply a 2-d filter to each color channel separately. You can do this by giving a tuple as the size argument, using a size of 1 for the last axis:
a_g_blure = ndimage.uniform_filter(a, size=(11, 11, 1))
Note: uniform_filter is not a Gaussian blur. For that, you would use scipy.ndimage.gaussian_filter. You might also be interested in the filters provided by scikit-image. In particular, see skimage.filters.gaussian_filter (named skimage.filters.gaussian in newer scikit-image releases).
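The same per-channel idea carries over to scipy.ndimage.gaussian_filter, which accepts one sigma per axis. A minimal sketch, assuming a sigma of 0 on the last axis is used to leave the color channels unmixed; the test image is synthetic stand-in data.

```python
import numpy as np
from scipy import ndimage

# Stand-in color image: left half pure red, right half pure blue.
a = np.zeros((40, 40, 3), dtype=float)
a[:, :20, 0] = 1.0
a[:, 20:, 2] = 1.0

# Blur rows and columns, but use sigma=0 along the channel axis.
blurred = ndimage.gaussian_filter(a, sigma=(5, 5, 0))

print(blurred.shape)          # (40, 40, 3)
print(blurred[..., 1].max())  # 0.0: the empty green channel stays empty
```

If a single scalar sigma were passed instead, the blur would also run across the three channels and pull the result toward gray, as with uniform_filter above.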
For a gaussian blur, I recommend using skimage.filters.gaussian_filter.
from skimage.io import imread
from skimage.filters import gaussian_filter
sigma=5 # blur radius
src_img = imread('path/to/img')
# this will only return grayscale
grayscale_blur = gaussian_filter(src_img, sigma=sigma)
# passing multichannel param as True returns colors
color_blur = gaussian_filter(src_img, sigma=sigma, multichannel=True)
