Image shape conversion in Python and OpenCV

I am new to Python and am having difficulty understanding image shape conversion.
In my Python code, the image I has this shape:
ipdb> I.shape
(720, 1280, 3)
Running this command converts I's shape and stores the result in h5_image:
h5_image = np.transpose(I, (2,0,1)).reshape(data_shape)
Where data_shape is:
ipdb> p data_shape
(1, 3, 720, 1280)
What is the OpenCV function that produces the same output?
In (1, 3, 720, 1280), what does 1 mean?
What is the difference between (3, 720, 1280) and (720, 1280, 3)?

You can think of an image (I) in Python/NumPy as an array with N dimensions.
With a grayscale image, you have a single value for each row and column. That means 2 dimensions, and the shape will be: I.shape --> (rows, cols)
With an RGB image, you have 3 channels: red, green, blue. So you have a total of 3 dimensions: I.shape --> (rows, cols, 3)
With an RGBA image, you have 4 channels: red, green, blue, alpha. Still 3 dimensions: I.shape --> (rows, cols, 4)
These are the common ways to store image data, but of course you can store it any way you like, as long as you know how to read it back. For example, you could keep it as one long 1-D vector together with the image width and height, so you know how to reshape it into 2-D form.
For your more specific questions:
I am not sure what output you are looking for. You can also use transpose() or flip() in OpenCV.
The (1, 3, 720, 1280) only means you have an additional degenerate dimension (often a batch dimension of size 1). To access a pixel you have to write I[0, channel, row, col]. The extra 1 is unnecessary, and it is not a common way to hold a single image. Why do you want to do this? Do you want to save in a specific format (HDF5)?
The only difference is the arrangement of your data. For example, with shape (3, 720, 1280), to get the red channel you write red = I[0, :, :], while with shape (720, 1280, 3) you write red = I[:, :, 0] (the latter is more common).
*There are some performance considerations that depend on the actual arrangement of the image data in memory, but I don't think you need to worry about that right now.
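As a minimal sketch (using a dummy array in place of the actual image), this is what the transpose plus reshape from the question does, and how channel indexing differs between the two layouts:

import numpy as np

# dummy stand-in for the (720, 1280, 3) image from the question
I = np.zeros((720, 1280, 3), dtype=np.uint8)       # (rows, cols, channels)

# move channels to the front, then add the leading size-1 (batch) dimension
chw = np.transpose(I, (2, 0, 1))                   # (3, 720, 1280)
h5_image = chw.reshape((1, 3, 720, 1280))          # (1, 3, 720, 1280)

# channel access in each layout (assuming RGB order)
red_hwc  = I[:, :, 0]        # (720, 1280, 3) layout
red_chw  = chw[0, :, :]      # (3, 720, 1280) layout
red_nchw = h5_image[0, 0]    # (1, 3, 720, 1280) layout: batch 0, channel 0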

Reading a grayscale image in python shows 3 color channels instead of one

I am reading an image in Python from a dataset that contains grayscale .png images.
img_1.png is already a grayscale image.
img=cv2.imread('img_1.png')
print(img.shape)
On reading the image, img.shape shows 3 channels:
(16, 16, 3)
However, it is a grayscale image, so it should have only width and height:
(16, 16)
I have also read the same grayscale image in MATLAB, and there it only shows width and height.
But reading an RGB image in Python and converting it to grayscale does not show 3 channels. Why is that?
gray_img=cv2.imread('rgb_1.png',cv2.IMREAD_GRAYSCALE)
print(gray_img.shape)
output:
(1050, 1400)
Is the output of reading a grayscale image the same as the output of converting an image to grayscale in Python? What is the difference?
I want to read that grayscale image without the extra channel dimension.
cv.imread always converts everything to BGR, unless you tell it not to. Use the cv.IMREAD_UNCHANGED flag to do that.
You are mistaking the number of dimensions for the size of the color dimension.
(h,w) is 2-dimensional, and it's equivalent to (h,w,1), where the color dimension has size 1.
(h,w,3) is 3-dimensional (the tuple has length 3), and the 3 in the last place says the color dimension has size 3 (for three colors).
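As a minimal sketch of the three read modes, assuming the same img_1.png from the question:

import cv2

img_default   = cv2.imread('img_1.png')                        # default: always 3-channel BGR
img_gray      = cv2.imread('img_1.png', cv2.IMREAD_GRAYSCALE)  # force a single channel
img_unchanged = cv2.imread('img_1.png', cv2.IMREAD_UNCHANGED)  # keep whatever the file stores

print(img_default.shape)    # (16, 16, 3)
print(img_gray.shape)       # (16, 16)
print(img_unchanged.shape)  # (16, 16) if the PNG really is single-channel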

Blending 2 BGRA frames OpenCV - the output array type must be explicitly specified in function 'arithm_op'

I have two 4 channel BGRA images that I simply want to call cv2.addWeighted on.
However, running m = cv2.addWeighted(i, 0.5, cutout, 0.5, 0) gives me an error:
When the input arrays in add/subtract/multiply/divide functions have different types, the output array type must be explicitly specified in function 'arithm_op'
I confirmed using .shape that they were both the same size: (538, 1114, 4)
Not sure what I'm doing wrong here
I got the same problem, and I found the mistake I made.
# two images, image and image1
print(image.shape)
print(image1.shape)
print(image.dtype)
print(image1.dtype)
added_image = cv2.addWeighted(image,0.5,image1,0.1,0)
This produced the exact same error as above. After printing the data, I found that even though the two images have the same shape, they can have different dtypes:
(5000, 5000, 3)
(5000, 5000, 3)
uint8
int32
All I had to do was change the dtype of image1 to match image, and then the blend succeeded:
image1 = image1.astype(np.uint8)
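A minimal sketch of the whole fix, with dummy arrays standing in for the two frames from the question:

import cv2
import numpy as np

# dummy stand-ins for the two BGRA frames; image1 has the wrong dtype on purpose
image  = np.zeros((538, 1114, 4), dtype=np.uint8)
image1 = np.zeros((538, 1114, 4), dtype=np.int32)

# addWeighted needs matching shapes AND matching dtypes
if image1.dtype != image.dtype:
    image1 = image1.astype(image.dtype)

blended = cv2.addWeighted(image, 0.5, image1, 0.5, 0)
print(blended.shape, blended.dtype)   # (538, 1114, 4) uint8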

SSIM for image comparison: issue with image shape

I am calculating the Structural Similarity Index between two images. I don't understand what the dimensionality should be. Both images (reference and target) are RGB images.
If I reshape my images to (256*256, 3), I obtain:
ref = Image.open('path1').convert("RGB")
ref_array = np.array(ref).reshape(256*256, 3)
print(ref_array.shape) # (65536, 3)
img = Image.open('path2').convert("RGB")
img_array = np.array(img).reshape(256*256, 3)
print(img_array.shape) # (65536, 3)
ssim = compare_ssim(ref_array,img_array,multichannel=True,data_range=255)
The result is 0.0786.
On the other hand, if I keep the shape (256, 256, 3):
ref = Image.open('path1').convert("RGB")
ref_array = np.array(ref)
print(ref_array.shape) # (256, 256, 3)
img = Image.open('path2').convert("RGB")
img_array = np.array(img)
print(img_array.shape) # (256, 256, 3)
ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)
The result is 0.0583
Which of the two results is correct and why? The documentation does not say anything about it, since it's probably a conceptual problem.
The second one is correct, assuming you have a square-shaped image and not a really long, thin one.
SSIM takes neighbouring pixels into account (for luminance and chrominance masking and for identifying structures). Images can be any shape, but if you tell the algorithm your image is 256*256 by 1 pixel, then the vertical structures will not be taken into account.
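Keeping the images in their natural (H, W, 3) layout, the call from the question looks like the sketch below. Note that in newer scikit-image versions compare_ssim has been renamed structural_similarity (in skimage.metrics, with channel_axis instead of multichannel); the paths are the placeholders from the question.

import numpy as np
from PIL import Image
from skimage.measure import compare_ssim  # skimage.metrics.structural_similarity in newer versions

ref_array = np.array(Image.open('path1').convert("RGB"))   # (256, 256, 3)
img_array = np.array(Image.open('path2').convert("RGB"))   # (256, 256, 3)

# multichannel=True tells SSIM that the last axis is colour, so the
# sliding window (7x7 by default) sees real 2-D neighbourhoods
ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)
print(ssim)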

Iterating over pixels in an image as a numpy array

Let's say I have a numpy array of shape (100, 100, 3) that represents an image in RGB encoding. How do I iterate over the individual pixels of this image?
Specifically I want to map this image with a function.
Note, I got that array from opencv.
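No answer is quoted here, but as a rough sketch of the two usual approaches (explicit loops versus a vectorized operation), with a BGR-to-RGB swap as the example per-pixel mapping:

import numpy as np

img = np.zeros((100, 100, 3), dtype=np.uint8)   # stand-in for the OpenCV image

# explicit iteration: simple but slow for large images
out = np.empty_like(img)
for row in range(img.shape[0]):
    for col in range(img.shape[1]):
        b, g, r = img[row, col]          # OpenCV stores pixels as BGR
        out[row, col] = (r, g, b)

# vectorized equivalent: operate on whole axes at once
out_fast = img[:, :, ::-1]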

copy/reshape high dimensional numpy array

I have a numpy array with a shape like this
x.shape
(100, 1, 300, 300)
Think of this as 100 observations of grayscale images of size 300x300.
Grayscale images have only 1 channel, hence the 1 in the second position of the shape.
I want to convert this to an array of RGB images, with 3 channels.
I just want to copy the grayscale image into the other two channels.
So the final shape would be (100, 3, 300, 300).
How can I do that?
Use np.repeat -
np.repeat(x,3,axis=1)
Sample run -
In [8]: x = np.random.randint(11,99,(2,1,3,4))
In [9]: np.repeat(x,3,axis=1).shape
Out[9]: (2, 3, 3, 4)
