SSIM for image comparison: issue with image shape - python

I am calculating the Structural Similarity Index between two images. I don't understand what the dimensionality should be. Both images (reference and target) are RGB images.
If I reshape my image to (256*256, 3), I get:
from PIL import Image
import numpy as np
from skimage.metrics import structural_similarity as compare_ssim

ref = Image.open('path1').convert("RGB")
ref_array = np.array(ref).reshape(256*256, 3)
print(ref_array.shape)  # (65536, 3)
img = Image.open('path2').convert("RGB")
img_array = np.array(img).reshape(256*256, 3)
print(img_array.shape)  # (65536, 3)
ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)
The result is 0.0786.
On the other hand, if I keep the native (256, 256, 3) shape:
ref = Image.open('path1').convert("RGB")
ref_array = np.array(ref)
print(ref_array.shape) # (256, 256, 3)
img = Image.open('path2').convert("RGB")
img_array = np.array(img)
print(img_array.shape) # (256, 256, 3)
ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)
The result is 0.0583.
Which of the two results is correct, and why? The documentation does not say anything about it; it's probably a conceptual problem.

The second one is correct, assuming you have a square image and not a really long, thin one.
SSIM takes neighbouring pixels into account (for luminance and contrast masking and for identifying structures). Images can be any shape, but if you tell the algorithm your image is 256*256 (= 65536) pixels by 1 pixel in shape, then the vertical structures will not be taken into account.
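For reference, a minimal sketch of the call on the full (256, 256, 3) arrays; note that recent scikit-image releases replace multichannel=True with channel_axis=-1, and the file paths here are placeholders:
from PIL import Image
import numpy as np
from skimage.metrics import structural_similarity

# Keep the native (256, 256, 3) layout so SSIM sees real 2-D neighbourhoods
ref_array = np.array(Image.open('path1').convert("RGB"))
img_array = np.array(Image.open('path2').convert("RGB"))
score = structural_similarity(ref_array, img_array, channel_axis=-1, data_range=255)
print(score)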

Related

Structural Similarity Index (SSIM) in Python (Multichannel error)

I want to calculate the Structural Similarity Index (SSIM) between a generated and a target image (that have been picked randomly from an array of images).
This is what I have tried-
from skimage.metrics import structural_similarity as ssim

print(tar_image.shape)
print(gen_image.shape)

ssim_skimg = ssim(tar_image, gen_image,
                  data_range=gen_image.max() - gen_image.min(),
                  multichannel=True)
print("SSIM: based on scikit-image = ", ssim_skimg)
But I am getting this output:
(1, 128, 128, 3)
(1, 128, 128, 3)
ValueError: win_size exceeds image extent. If the input is a multichannel (color) image, set multichannel=True.
Can someone please tell me where I am going wrong and how I can fix this problem?
You have 3-channel images, so you should use the multichannel=True argument.
Also, you should remove the first (batch) dimension of your images to get (128, 128, 3) shapes:
import numpy as np
from skimage.metrics import structural_similarity as ssim

tar_image = np.zeros((128, 128, 3))
gen_image = np.zeros((128, 128, 3))

ssim_skimg = ssim(tar_image, gen_image, multichannel=True)
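If your arrays still carry the leading batch dimension of 1, as in the question, a minimal sketch of the fix (assuming tar_image and gen_image have shape (1, 128, 128, 3)):
tar = tar_image[0]   # (128, 128, 3)
gen = gen_image[0]   # (128, 128, 3)
ssim_skimg = ssim(tar, gen,
                  data_range=gen.max() - gen.min(),
                  multichannel=True)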

Mapping tensor in pytorch

I have the following two tensors:
img is an RGB image of shape (224, 224, 3)
uvs is a tensor with the same spatial size, e.g. (224, 224, 2), that maps to coordinates (x, y). In other words, it provides an (x, y) coordinate for every pixel of the input image.
I now want to create a new output image tensor that contains at index (x, y) the value of the input image. So the output should also be an image, with the pixels rearranged according to the mapping tensor.
Small toy example:
img = [[c1, c2], [c3, c4]] where each c is an RGB colour [r, g, b]
uvs = [[[0,0], [1,1]],[[0,1], [1,0]]]
out = [[c1, c3], [c4, c2]]
How would one achieve such a thing in pytorch in a fast vectorized manner?
Try with (where idx is the uvs mapping tensor from the question):
out = img[idx[..., 0], idx[..., 1]]
I was able to solve it (with the help of Quang Hoang's answer):
out[idx[..., 0], idx[..., 1]] = img
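To see the scatter version on the toy example from the question, here is a minimal sketch (the RGB values standing in for c1..c4 are arbitrary placeholders):
import torch

# img = [[c1, c2], [c3, c4]], each c an [r, g, b] triple
img = torch.tensor([[[1., 0., 0.], [0., 1., 0.]],
                    [[0., 0., 1.], [1., 1., 1.]]])
uvs = torch.tensor([[[0, 0], [1, 1]],
                    [[0, 1], [1, 0]]])
out = torch.zeros_like(img)
out[uvs[..., 0], uvs[..., 1]] = img
# out is [[c1, c3], [c4, c2]], matching the expected result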
What you need is torch.nn.functional.grid_sample(). You can do something like this:
import torch

width, height, channels = (224, 224, 3)
# Note that the image is channel-first (CHW format). In this example, I'm using a float image, so the values must be in the range (0, 1).
img = torch.rand((channels, height, width))
# Create the indices of shape (224, 224, 2). Any other size would work too.
col_indices = torch.arange(width, dtype=torch.float32)
row_indices = torch.arange(height, dtype=torch.float32)
uvs = torch.stack(torch.meshgrid([col_indices, row_indices]), dim=-1)
# Transform the indices from pixel coordinates to the range [-1, 1] such that:
# * top-left corner of the input = (-1, -1)
# * bottom-right corner of the input = (1, 1)
# This is required for grid_sample() to work properly.
uvs[..., 0] = (uvs[..., 0] / width) * 2 - 1
uvs[..., 1] = (uvs[..., 1] / height) * 2 - 1
# Do the "mapping" operation (this does a bilinear interpolation) using `uvs` coordinates.
# Note that grid_sample() requires a batch dimension, so need to use `unsqueeze()`, then
# get rid of it using squeeze().
mapped = torch.nn.functional.grid_sample(
    img.unsqueeze(0),
    uvs.unsqueeze(0),
    mode='bilinear',
    align_corners=True,
)
# The final image is in HWC format.
result = mapped.squeeze(0).permute(1, 2, 0)
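A quick shape check (not from the original answer) to confirm the layout:
print(mapped.shape)   # torch.Size([1, 3, 224, 224])
print(result.shape)   # torch.Size([224, 224, 3])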
Side note: I found your question by searching for a solution to a related problem I had for a while. While writing an answer to your question, I realized what bug was causing the problem I was facing. By helping you I effectively helped myself, so I hope this helps you! :)

How to convert all images in a batch from rgb to greyscale using average across all channels for each pixel

I have a batch of images with shape (32, 32, 3, 73257); each image is 32x32 and I want to convert each image from RGB to greyscale by taking the average across the channels. I know there are other ways, but my requirement is to take the average.
I cannot come up with the logic to apply this averaging to the whole batch and reduce the 3 channels to 1. Can someone help me? I looked on StackOverflow and elsewhere but could not find a satisfactory answer.
img_arr.shape
>>> (32, 32, 3, 73257)
img_arr = np.average(img_arr, axis=2).astype(int)
img_arr.shape
>>> (32, 32, 73257)
If you want to retain the shape:
img_arr = img_arr[..., np.newaxis, :]
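An equivalent one-liner that keeps the channel axis in place is np.mean with keepdims (equivalent here since no weights are used; not from the answer above):
img_gray = np.mean(img_arr, axis=2, keepdims=True).astype(int)   # (32, 32, 1, 73257)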
You want to average over axis 2 (0-indexed), which holds the channels (= 3).
You can do that with np.average():
import numpy as np
import matplotlib.pyplot as plt

gray_scale = np.average(your_image_array, axis=2)
Then to actually plot the image:
plt.imshow(gray_scale, interpolation='nearest', cmap=plt.get_cmap('gray'))
plt.show()
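Note that with the batched array from the question, gray_scale has shape (32, 32, 73257), so to display a single image you would first index one slice (a sketch assuming the same layout):
plt.imshow(gray_scale[..., 0], cmap='gray')
plt.show()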

Tensorflow: Convert image to rgb if grayscale

I have a dataset of RGB and grayscale images. While iterating over the dataset, I want to detect whether an image is grayscale so that I can convert it to RGB. I wanted to use tf.shape(image) to detect the dimensions of the image. For an RGB image I get something like [1, 100, 100, 3]; for grayscale images the function returns, for example, [1, 100, 100]. I wanted to use len(tf.shape(image)) to check whether the length is 4 (= RGB) or 3 (= grayscale). That did not work.
This is my code so far, which did not work:
def process_image(image):
    # Convert numpy array to tensor
    image = tf.convert_to_tensor(image, dtype=tf.uint8)
    # Take care of grayscale images
    dims = len(tf.shape(image))
    if dims == 3:
        image = np.expand_dims(image, axis=3)
        image = tf.image.grayscale_to_rgb(image)
    return image
Is there an alternative way to convert grayscale images to RGB?
You can use a function like this for that:
import tensorflow as tf
def process_image(image):
    image = tf.convert_to_tensor(image, dtype=tf.uint8)
    image_rgb = tf.cond(tf.rank(image) < 4,
                        lambda: tf.image.grayscale_to_rgb(tf.expand_dims(image, -1)),
                        lambda: tf.identity(image))
    # Add shape information
    s = image.shape
    image_rgb.set_shape(s)
    if s.ndims is not None and s.ndims < 4:
        image_rgb.set_shape(s.concatenate(3))
    return image_rgb
I had a very similar problem; I wanted to load RGB and greyscale images in one go. TensorFlow supports setting the channel number when reading in the images, so if your images have different numbers of channels, this might be what you are looking for:
# to get greyscale:
tf.io.decode_image(raw_img, expand_animations = False, dtype=tf.float32, channels=1)
# to get rgb:
tf.io.decode_image(raw_img, expand_animations = False, dtype=tf.float32, channels=3)
You can even do both on the same image, and inside tf.data.Dataset mappings (a small sketch of this follows after the console example below)!
You now just have to set the channels argument to match the shape you need, so all loaded images will have that shape; then you can reshape without a condition.
This also lets you load a grayscale image directly as RGB in TensorFlow. Here is an example:
>> a = Image.open(r"Path/to/rgb_img.JPG")
>> np.array(a).shape
(666, 1050, 3)
>> a = a.convert('L')
>> np.array(a).shape
(666, 1050)
>> b = np.array(a)
>> im = Image.fromarray(b)
>> im.save(r"Path/to/now_it_is_greyscale.jpg")
>> raw_img = tf.io.read_file(r"Path/to/now_it_is_greyscale.jpg")
>> img = tf.io.decode_image(raw_img, dtype=tf.float32, channels=3)
>> img.shape
TensorShape([666, 1050, 3])
>> img = tf.io.decode_image(raw_img, dtype=tf.float32, channels=1)
>> img.shape
TensorShape([666, 1050, 1])
Use expand_animations=False if you get ValueError: 'images' contains no shape. See: https://stackoverflow.com/a/59944421/9621080
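Putting that together, a minimal sketch of the tf.data usage mentioned above (file_paths is a hypothetical list of image file names):
import tensorflow as tf

def load_rgb(path):
    raw = tf.io.read_file(path)
    # channels=3 forces three channels, so greyscale files come out as RGB too
    return tf.io.decode_image(raw, expand_animations=False, dtype=tf.float32, channels=3)

ds = tf.data.Dataset.from_tensor_slices(file_paths).map(load_rgb)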

After converting the image to a numpy array, I want to keep only 1 channel

I converted some images to numpy arrays.
The image is an RGB image.
The converted numpy array has size (256, 256, 3).
I wanted to keep only the Y channel after converting this RGB image to YCbCr.
What I want is an array of size (256, 256, 1),
so I used [:, :, 0] on the array.
However, that gives me a two-dimensional array, as shown in the code below.
I then reshaped it to an array of size (256, 256, 1),
but I could not display that as an image again.
Below is my code.
from PIL import Image
import numpy as np
img = Image.open('test.bmp') # input image 256 x 256
img = img.convert('YCbCr')
img.show()
print(np.shape(img)) # (256, 256, 3)
arr_img = np.asarray(img)
print(np.shape(arr_img)) # (256, 256, 3)
arr_img = arr_img[:, :, 0]
print(np.shape(arr_img)) # (256, 256)
arr_img = arr_img.reshape(*arr_img.shape, 1)
print(np.shape(arr_img)) # (256, 256, 1)
pi = Image.fromarray(arr_img)
pi.show()  # error: TypeError: Cannot handle this data type
When I forcibly change the two-dimensional array into a three-dimensional one, the image cannot be output.
I want a purely (256, 256, 1) sized array of the Y channel.
I also tried arr_img = arr_img[:, :, 0:1], but I got an error.
How can I output and save an image with only the Y channel, of size (256, 256, 1)?
A single-channel image should actually be 2D, with a shape of just (256, 256). Extracting the Y channel is effectively the same as having a greyscale image, which is just 2D. Adding the third dimension causes the error because Image.fromarray expects just two dimensions for this kind of data.
If you remove the reshape to (256, 256, 1), you will be able to save the image.
Edit:
from PIL import Image
import numpy as np
img = Image.open('test.bmp') # input image 256 x 256
img = img.convert('YCbCr')
arr_img = np.asarray(img) # (256, 256, 3)
arr_img = arr_img[:, :, 0] # (256, 256)
pi = Image.fromarray(arr_img)
pi.show()
# Save image
pi.save('out.bmp')
Try this:
arr_img_1d = np.expand_dims(arr_img, axis=-1)  # shape (256, 256, 1); axis=-1 adds the trailing channel axis
Here is the numpy documentation for the expand_dims function.
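If you do need the (256, 256, 1) array for downstream code (e.g. as a model input), you can keep it and squeeze the channel axis back out before handing it to PIL (a sketch building on the code above; the output file name is a placeholder):
arr_img_3d = np.expand_dims(arr_img, axis=-1)        # (256, 256, 1)
pi = Image.fromarray(arr_img_3d.squeeze(axis=-1))    # back to (256, 256) for PIL
pi.save('y_only.bmp')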
