I want to calculate the Structural Similarity Index (SSIM) between a generated and a target image (that have been picked randomly from an array of images).
This is what I have tried:
from skimage.metrics import structural_similarity as ssim
print(tar_image.shape)
print(gen_image.shape)
ssim_skimg = ssim(tar_image, gen_image,
                  data_range=gen_image.max() - gen_image.min(),
                  multichannel=True)
print("SSIM: based on scikit-image = ", ssim_skimg)
But I am getting this output:
(1, 128, 128, 3)
(1, 128, 128, 3)
ValueError: win_size exceeds image extent. If the input is a multichannel (color) image, set multichannel=True.
Can someone please tell me where I am going wrong and how I can fix this problem?
You have 3-channel images, so multichannel=True is correct; the problem is the leading dimension of size 1, because the default 7x7 window cannot fit into an axis of length 1.
You should remove the first dimension of your images to get (128, 128, 3) shapes:
import numpy as np
from skimage.metrics import structural_similarity as ssim
tar_image = np.zeros((128, 128, 3))
gen_image = np.zeros((128, 128, 3))
ssim_skimg = ssim(tar_image, gen_image, multichannel = True)
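A minimal sketch applying that to the arrays from the question (it assumes tar_image and gen_image are the NumPy arrays whose shapes were printed above; on newer scikit-image versions, replace multichannel=True with channel_axis=-1):
import numpy as np
from skimage.metrics import structural_similarity as ssim

# drop the leading batch dimension: (1, 128, 128, 3) -> (128, 128, 3)
tar = np.squeeze(tar_image, axis=0)
gen = np.squeeze(gen_image, axis=0)

ssim_skimg = ssim(tar, gen,
                  data_range=gen.max() - gen.min(),
                  multichannel=True)
print("SSIM: based on scikit-image = ", ssim_skimg)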
I have trained a model and I am trying to test it.
import tensorflow as tf
import os
import cv2
import matplotlib.pyplot as plt
import numpy as np
# Some paths
paths = {
    "cats": "data\\images\\cats",
    "dogs": "data\\images\\dogs",
    "img": "data\\images"
}
label_to_index = {
    "cat": 0,
    "dog": 1
}
index_to_label = {
    0: "cat",
    1: "dog"
}
animals = {
    # one label per image file, not per character of the path string
    "cats_labels": [label_to_index["cat"] for _ in range(len(os.listdir(paths["cats"])))],
    "dogs_labels": [label_to_index["dog"] for _ in range(len(os.listdir(paths["dogs"])))],
    "cats": [os.path.join(paths["cats"], img) for img in os.listdir(paths["cats"])],
    "dogs": [os.path.join(paths["dogs"], img) for img in os.listdir(paths["dogs"])],
}
# Load model
model = tf.keras.models.load_model('models/cats_dogs_model_1.h5')
# Load one image for test
img = cv2.imread(animals['cats'][0])
# Predictions input needs to be an array
test_img = [img]
# (1) mobilenetv2_1.00_192_input expected to have 4 dimensions, image now has 3 dims ->
# -> add new dim
test_img = [np.expand_dims(img, axis=0) for img in test_img]
print("Shape is ", test_img[0].shape) # New Shape is (1, 375, 500, 3) , was (375, 500, 3)
print("Number if dimentions is ", test_img[0].ndim) # New Number if dimentions is 4 , was 3
# (2) ValueError: Error when checking input: expected mobilenetv2_1.00_192_input
# to have shape (192, 192, 3) but got array with shape (375, 500, 3)
# in the next line I am trying to reshape the image to the required size:
# test_img = [np.reshape(img, (192, 192)) for img in test_img]
# but if I uncomment it new error will be raised:
# ValueError: cannot reshape array of size 562500 into shape (192,192)
predictions = model.predict(test_img) # !!! error raises here
plt.imshow(test_img[0])
label = np.argmax(predictions)
plt.title(label)
plt.show()
But I keep getting errors whenever I try to make the image shape and dimensions valid for model.predict, so I am stuck without understanding how to reshape my image properly. I hope somebody can explain what is wrong with my image transformation, because this part is a black box for me right now.
Errors I encounter:
(1) Error about dimensions - I added a fourth dim, and everything is OK for now, then
(2) Error about invalid input image shape
Reshape is not what you are looking for: reshape only rearranges the existing values across dimensions, it never generates new values to fit a required size. What you want is to resize the images. TensorFlow has a convenient function to resize a batch of images:
import tensorflow as tf
# ...
model = tf.keras.models.load_model('models/cats_dogs_model_1.h5')
img = cv2.imread(animals['cats'][0])
test_img = [img]
test_img = [np.expand_dims(img, axis=0) for img in test_img]
# It is better to have a single tensor instead of a list of tensors, therefore,
# before resizing the images concatenate all them in a tensor
test_img = tf.concat(test_img, axis=0)
test_img = tf.image.resize(test_img, [192, 192])
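To finish the example, here is a rough sketch of feeding the resized batch back into the prediction and plotting code from the question (it assumes the model outputs one score per class, as the np.argmax in the question implies, and reuses the index_to_label dict defined earlier):
predictions = model.predict(test_img)            # test_img now has shape (1, 192, 192, 3)
label = int(np.argmax(predictions))
plt.imshow(test_img[0].numpy().astype("uint8"))  # back to uint8 just for display
plt.title(index_to_label[label])
plt.show()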
np.reshape cannot resize an image. It's used to, you guessed it, reshape an array to another shape without changing the number of elements it contains. For instance, you can reshape a (20, 50) array into a (20, 5, 10) array because 20x50=20x5x10, but you can't reshape a (375, 500, 3) image into a (192, 192, 3) image.
Instead, you should use the resize method from PIL.Image (https://www.google.com/amp/s/www.geeksforgeeks.org/python-pil-image-resize-method/amp/)
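For completeness, a rough sketch of that PIL route (assuming the image was loaded with cv2 as in the question, so its channels are in BGR order and are swapped before handing it to PIL):
import numpy as np
from PIL import Image

img = cv2.imread(animals['cats'][0])                   # (375, 500, 3) BGR array
pil_img = Image.fromarray(img[:, :, ::-1])             # BGR -> RGB
pil_img = pil_img.resize((192, 192))                   # PIL resize takes (width, height)
test_img = np.expand_dims(np.array(pil_img), axis=0)   # (1, 192, 192, 3)
predictions = model.predict(test_img)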
I am calculating the Structural Similarity Index between two images. I don't understand what the dimensionality should be. Both images (reference and target) are RGB images.
If I shape my image as (256*256, 3), I obtain:
ref = Image.open('path1').convert("RGB")
ref_array = np.array(ref).reshape(256*256, 3)
print(ref_array.shape) # (65536, 3)
img = Image.open('path2').convert("RGB")
img_array = np.array(img).reshape(256*256, 3)
print(img_array.shape) # (65536, 3)
ssim = compare_ssim(ref_array,img_array,multichannel=True,data_range=255)
The result is 0.0786.
On the other hand, if I reshape to (256, 256, 3):
ref = Image.open('path1').convert("RGB")
ref_array = np.array(ref)
print(ref_array.shape) # (256, 256, 3)
img = Image.open('path2').convert("RGB")
img_array = np.array(img)
print(img_array.shape) # (256, 256, 3)
ssim = compare_ssim(ref_array, img_array, multichannel=True, data_range=255)
The result is 0.0583.
Which of the two results is correct and why? The documentation does not say anything about it, since it's probably a conceptual problem.
The second one is correct, assuming you have a square shaped image and not a really long thin one.
SSIM takes neighbouring pixels into account (for luminance and contrast masking and for identifying structures). Images can be any shape, but if you tell the algorithm your image is 256*256 pixels by 1 pixel in shape, then the vertical structures will not be taken into account.
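A small sketch illustrating the point with random images (purely to show that the flattened call ignores the 2D structure; on newer scikit-image versions use channel_axis=-1 instead of multichannel=True):
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
img = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)

# (256, 256, 3): SSIM compares 7x7 spatial neighbourhoods, the intended use
good = ssim(ref, img, multichannel=True, data_range=255)
# (65536, 3): SSIM now sees a single line of pixels, so vertical structure is lost
bad = ssim(ref.reshape(256 * 256, 3), img.reshape(256 * 256, 3),
           multichannel=True, data_range=255)
print(good, bad)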
I've taken the input images from a folder and reshaped them as required by the VGG16-places365 model. It is still showing the same error, and I have looked into the Keras documentation for the problem (https://keras.io/applications/#vgg16), yet the error still persists.
if __name__ == '__main__':
    # from urllib.request import urlopen
    import os
    import numpy as np
    import pandas as pd
    from PIL import Image
    from cv2 import resize
    # VGG16_Places365 is assumed to be imported from the Keras VGG16-places365 port

    pred_array = np.empty((0, 6), dtype=float)
    TEST_PATH = '/home/guest/Downloads/content/image/thumb'
    for img in os.listdir(TEST_PATH):
        image = Image.open(os.path.join(TEST_PATH, img))
        image = np.array(image, dtype=np.uint8)
        image = resize(image, (224, 224))
        image = np.expand_dims(image, 0)

        model = VGG16_Places365(weights='places')
        predictions_to_return = 5
        preds = model.predict(image)[0]
        top_preds = np.argsort(preds)[::-1][0:predictions_to_return]

        # load the class labels
        file_name = 'categories_places365.txt'
        if not os.access(file_name, os.W_OK):
            synset_url = 'https://raw.githubusercontent.com/csailvision/places365/master/categories_places365.txt'
            os.system('wget ' + synset_url)
        classes = list()
        with open(file_name) as class_file:
            for line in class_file:
                classes.append(line.strip().split(' ')[0][3:])
        classes = tuple(classes)

        temprow = np.hstack((np.array([img]), top_preds))
        # np.append returns a new array, so assign the result back
        pred_array = np.append(pred_array, temprow.reshape(-1, pred_array.shape[1]), axis=0)

    df = pd.DataFrame(data=pred_array, columns=['File_name', 'Tag_1', 'Tag_2', 'Tag_3', 'Tag_4', 'Tag_5'])
    print(df)
You are probably loading an image with an alpha channel (RGBA) but the VGG16 neural network expects an image without an alpha channel (RGB).
To convert the image from RGBA to RGB, you can either use
image = image.convert("RGB")
on the PIL Image object, i.e. directly after Image.open, or use numpy array slicing on the numpy array object to keep only the first three color channels (dropping the alpha channel) after np.array has been called:
image = image[:, :, :3]
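For example, a sketch of where either fix would go in the loop from the question (only one of the two marked lines is needed):
for img in os.listdir(TEST_PATH):
    image = Image.open(os.path.join(TEST_PATH, img))
    image = image.convert("RGB")       # option 1: drop the alpha channel on the PIL object
    image = np.array(image, dtype=np.uint8)
    image = image[:, :, :3]            # option 2: drop the alpha channel on the numpy array
    image = resize(image, (224, 224))
    image = np.expand_dims(image, 0)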
I took some images and converted them to numpy arrays.
The image is an RGB image.
The converted numpy array has size (256, 256, 3).
I want to extract only the Y channel after converting this RGB image to YCbCr.
What I want is an array of size (256, 256, 1).
So I used [:, :, 0] on the array.
However, I then end up with a two-dimensional image, as shown in the code below.
I created a (256, 256, 1) array using the reshape shown below.
But I could not view it as an image again.
Below is my code.
from PIL import Image
import numpy as np
img = Image.open('test.bmp') # input image 256 x 256
img = img.convert('YCbCr')
img.show()
print(np.shape(img)) # (256, 256, 3)
arr_img = np.asarray(img)
print(np.shape(arr_img)) # (256, 256, 3)
arr_img = arr_img[:, :, 0]
print(np.shape(arr_img)) # (256, 256)
arr_img = arr_img.reshape(*arr_img.shape, 1)
print(np.shape(arr_img)) # (256, 256, 1)
pi = Image.fromarray(arr_img) # error : TypeError: Cannot handle this data type
pi.show()
When I forcibly change the two-dimensional image into a three-dimensional one, the image cannot be output.
I want a purely (256, 256, 1) sized array: the Y channel as an image.
I tried to use arr_img = arr_img[:, :, 0:1] but I got an error.
How can I output an image with only Y (256,256,1) size and save it?
A single-channel image should actually be 2D, with a shape of just (256, 256). Extracting the Y channel is effectively the same as having a greyscale image, which is just 2D. Adding the third dimension causes the error because Image.fromarray expects just two dimensions for a single-channel image.
If you remove the reshape to (256, 256, 1), you will be able to save the image.
Edit:
from PIL import Image
import numpy as np
img = Image.open('test.bmp') # input image 256 x 256
img = img.convert('YCbCr')
arr_img = np.asarray(img) # (256, 256, 3)
arr_img = arr_img[:, :, 0] # (256, 256)
pi = Image.fromarray(arr_img)
pi.show()
# Save image
pi.save('out.bmp')
Try this:
arr_img_1d = np.expand_dims(arr_img, axis=-1)
Here is the numpy documentation for the expand_dims function: https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html
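As a quick sketch of how the two suggestions fit together: keep the 2D array for saving or displaying with PIL, and only add the channel axis for code that really needs a (256, 256, 1) array (e.g. as model input):
import numpy as np
from PIL import Image

arr_img = np.asarray(Image.open('test.bmp').convert('YCbCr'))[:, :, 0]  # (256, 256)
arr_img_1d = np.expand_dims(arr_img, axis=-1)   # (256, 256, 1), e.g. for a model input
Image.fromarray(arr_img).save('y_channel.bmp')  # PIL needs the plain 2D array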
I've trained a Handwritten image classifier using Keras library in Python. Initially I've used standard MNIST dataset for training and testing purpose. But now I want to use my own data set for testing, in which all the images are size 900*1200*3 instead of 28*28*1
So I need to reshape all the images before testing. I'm using the following code to reshape them, but it gives errors.
Code:
bb = lol.reshape(lol.shape[0], 28, 28, 1).astype('float32')
where lol is my numpy array containing 55 images of shape (900,1200,3)
and the error log is as follows:
ValueError Traceback (most recent call last)
<ipython-input-46-87da95da73e9> in <module>()
24 # # you can show every image
25 # img.show()
---> 26 bb = lol.reshape(lol.shape[0], 28, 28, 1).astype('float32')
27 # model = loaded_model
28 # classes = model.predict(bb)
ValueError: cannot reshape array of size 178200000 into shape (55,28,28,1)
So what am I doing wrong? Can I get accurate predictions even after resizing the large images to very small images of 28*28? Thanks for help.
What you are doing is wrong. You can't reshape an array of (55, 900, 1200, 3) into an array of (55, 28, 28, 1), because you are trying to store 55*900*1200*3=178200000 elements in an array that can store only 55*28*28=43120 elements.
You want to do two things:
1) Convert your rgb image (indicated by the last dimension which is the 3 channels) into grayscale (1 channel). The simplest way to do this is (R+B+G)/3. All python libraries that have to do with images (PIL, OpenCV, skimage, tensorflow, keras, etc) have this already implemented. Example:
from skimage.color import rgb2gray
gray = rgb2gray(original)
2) Resize the image from 900x1200 to 28x28. Again you can do this in all major image-related python libraries. Example:
from skimage.transform import resize
resized = resize(gray, (28,28))
Now if you want to do this for all 55 images, you can either write a function that transforms one image and map it across your array, or use a simple for loop and populate your new array one image at a time.
In your case the code should look something like this:
num_images = lol.shape[0]  # 55 in your case
resized_images = np.zeros(shape=(num_images, 28, 28, 1))  # your final array
for i in range(num_images):
    gray = rgb2gray(lol[i, :, :, :])      # gray.shape is (900, 1200)
    resized = resize(gray, (28, 28))      # resized.shape is (28, 28)
    resized_images[i, :, :, 0] = resized  # resized_images.shape stays (55, 28, 28, 1)
It would be more intuitive to process each image individually, which would also give you the best chance of preserving some information.
Try using the PIL library:
import numpy
from PIL import Image
lol = numpy.zeros((55, 900, 1200, 3), dtype=numpy.uint8)  # stand-in for your 55 images
new_array = numpy.zeros((lol.shape[0], 28, 28, 1), dtype=numpy.float32)  # final (55, 28, 28, 1) array
for i in range(lol.shape[0]):
    img = Image.fromarray(lol[i])
    img_resize = img.resize((28, 28))
    img_mono = img_resize.convert('L')  # grayscale, single channel
    arr = numpy.array(img_mono, dtype=numpy.uint8)
    new_array[i, :, :, 0] = arr
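As a final sketch, the resulting array can then be fed to the trained model; loaded_model is the name hinted at in the commented-out lines of the question, and the scaling is only needed if the training images were scaled the same way:
# new_array has shape (55, 28, 28, 1), matching the MNIST-style input
model = loaded_model
classes = model.predict(new_array / 255.0)  # divide by 255 only if the training data was scaled to [0, 1]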