I want to call tf.image.resize_image_with_crop_or_pad(images, height, width) to resize my input images. My input images are all 2-D NumPy arrays of pixels, but the image argument of resize_image_with_crop_or_pad must be a 3-D or 4-D tensor, so the call raises an error. What should I do?
Let's suppose images is an [n, W, H] NumPy ndarray, where n is the number of images and W and H are the width and height of the images.
Convert images to a tensor so that you can use TensorFlow functions on it:
tf_images = tf.constant(images)
Convert tf_images to the image layout used by TensorFlow (i.e. from [n, W, H] to [n, H, W]):
tf_images = tf.transpose(tf_images, perm=[0,2,1])
In TensorFlow every image has a depth channel, so although you're using grayscale images, you have to add a channel of depth 1:
tf_images = tf.expand_dims(tf_images, -1)
Now you can use tf.image.resize_image_with_crop_or_pad to resize the batch (which now has a shape of [n, H, W, 1], a 4-D tensor):
resized = tf.image.resize_image_with_crop_or_pad(tf_images,height,width)
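Putting the steps together, here is a minimal sketch with a dummy batch (note that resize_image_with_crop_or_pad is the TF 1.x name; in TF 2.x the same function is called tf.image.resize_with_crop_or_pad):
import numpy as np
import tensorflow as tf

images = np.random.rand(8, 64, 48).astype(np.float32)   # [n, W, H] dummy grayscale batch
height, width = 32, 32

tf_images = tf.constant(images)                          # [n, W, H]
tf_images = tf.transpose(tf_images, perm=[0, 2, 1])      # [n, H, W]
tf_images = tf.expand_dims(tf_images, -1)                # [n, H, W, 1]
resized = tf.image.resize_image_with_crop_or_pad(tf_images, height, width)
print(resized.shape)                                     # (8, 32, 32, 1)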
I am trying to build a YOLO architecture with a grayscale image as input.
In general, the input of YOLO is an RGB image (a 3-D tensor) of size (N, N, 3), where N is the image size and 3 represents the R, G and B channels.
When reading an image with Pillow, the code below gives me a 2-D image, i.e. of shape (N, N).
from PIL import Image
image = Image.open('sample.jpg')
image = image.convert('L')
However, in order to use this image as input to the YOLO architecture, I need it to have size (N, N, 1), where 1 represents the gray channel, i.e. a 3-D tensor.
If possible, I would be glad if the solution were given as code using Pillow.
I found that there is no way in Pillow to do what I want, so I decided to use NumPy instead.
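A minimal sketch of that NumPy approach (assuming the same sample.jpg as in the question):
from PIL import Image
import numpy as np

image = Image.open('sample.jpg').convert('L')   # PIL grayscale image
arr = np.asarray(image)                         # shape (N, N)
arr = np.expand_dims(arr, axis=-1)              # shape (N, N, 1): adds the gray channel
print(arr.shape)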
In the code example below I cannot understand why the output tensor grid has a shape of (3, 28, 280). I understand why it is 28 in height and 280 in width, but not the 3. Running plt.imshow() on each of the three 28x280 arrays along axis 0 suggests they are identical copies, since printing any one of them gives me the image I want.
I also do not understand why I can pass grid as an argument to plt.imshow(), given that it is supposed to take a 2-D array, not a 3-D one as grid clearly is.
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
train_set = torchvision.datasets.FashionMNIST(
    root = './pytorch_obj_classifier/data/FashionMNIST',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)
# The DataLoader was not shown in the question; batch_size=10 matches the
# ten 28x28 images that make up the (3, 28, 280) grid described below.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=10)
sample = next(iter(train_loader))
image,label = sample
print(image.shape)
grid = torchvision.utils.make_grid(image,padding=0, nrow=10)
print(grid.shape)
plt.figure(figsize=(15,15))
grid = np.transpose(grid,(1,2,0))
grid1 = grid[:,:,0]
grid2 = grid[:,:,1]
grid3 = grid[:,:,2]
plt.imshow(grid1,cmap = 'gray')
plt.imshow(grid2,cmap = 'gray')
plt.imshow(grid3,cmap = 'gray')
plt.imshow(grid,cmap = 'gray')
The FashionMNIST dataset, like MNIST, consists of grayscale images. If you look at the implementation of torchvision.utils.make_grid, single-channel images get their channel copied three times:
if tensor.dim() == 4 and tensor.size(1) == 1: # single-channel images
tensor = torch.cat((tensor, tensor, tensor), 1)
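You can verify this on the grid from the question (reusing grid1, grid2 and grid3 from the code above):
# The three channel slices produced by make_grid are identical copies.
print(np.allclose(grid1, grid2), np.allclose(grid2, grid3))   # True True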
As for matplotlib.pyplot.imshow, it can take 2-D or 3-D inputs:
The image data. Supported array shapes are:
(M, N): an image with scalar data. The data is visualized using a colormap.
(M, N, 3): an image with RGB values (0-1 float or 0-255 int).
(M, N, 4): an image with RGBA values (0-1 float or 0-255 int), i.e. including transparency.
Generally speaking, we wouldn't refer to dimensions but rather describe tensors by their shape (the size along each of their axes). In PyTorch, images always have three axes and a shape of the form (channels, height, width), even for single-channel images: a grayscale image is treated as a 3-D tensor (1, height, width) rather than a 2-D tensor (height, width). This keeps things consistent with the multi-channel case, which is very common (cf. convolutional neural networks).
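For instance (a small illustration, not from the question):
import torch

img_2d = torch.rand(28, 28)        # a bare (height, width) image
img_chw = img_2d.unsqueeze(0)      # (1, 28, 28): channel-first, single channel
print(img_chw.shape)               # torch.Size([1, 28, 28])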
I am trying to write a function that converts RGB to HSI in Python.
I have an image stored as a NumPy ndarray (a tensor, I suppose?) with dimensions (1080, 1920, 3), that is, 1080x1920 pixels in RGB. How can I extract the R/G/B matrices, and once I have H/S/I, how do I concatenate the matrices back into the (1080, 1920, 3) tensor?
Assuming your_image contains the RGB channels, you can extract each channel using the corresponding index:
r = your_image[..., 0]
g = your_image[..., 1]
b = your_image[..., 2]
Note: You may need to normalize the values to the interval [0.0, 1.0]. If so, divide them by 255.0.
Conversely, you can stack the three channels together as follows:
import numpy as np
your_image = np.dstack((r, g, b))
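A quick round-trip check (using a hypothetical random image in place of your_image):
import numpy as np

your_image = np.random.randint(0, 256, size=(1080, 1920, 3), dtype=np.uint8)

r = your_image[..., 0]
g = your_image[..., 1]
b = your_image[..., 2]
print(r.shape)                                   # (1080, 1920)

restacked = np.dstack((r, g, b))
print(restacked.shape)                           # (1080, 1920, 3)
print(np.array_equal(restacked, your_image))     # True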
I am trying to extract edge features like this:
img = io.imread('pic.jpg')
H, W, C = img.shape
features = custom_features(img)
assignments = kmeans_fast(features, num_segments)
segments = assignments.reshape((H, W))
# Display segmentation
plt.imshow(segments, cmap='viridis')
plt.axis('off')
plt.show()
custom_features:
from skimage.filters import prewitt_h,prewitt_v
def custom_features(image):
    """
    Args:
        image - array of shape (H, W, C)
    Returns:
        features - array of shape (H * W, C)
    """
    edges_prewitt_horizontal = prewitt_h(image)
    return edges_prewitt_horizontal
However, I currently get an error because the shape of the image is different from what the prewitt_h function expects:
ValueError: The parameter `image` must be a 2-dimensional array
How can I modify this inside the function such that the returned shape is as desired?
It looks like you need to give prewitt a grayscale image. The Prewitt transform applies a convolution with a 2-dimensional kernel, hence it needs a 2-dimensional image (and yours is 3-D, because you have colors: RGB, 3 channels).
You could add a conversion to grayscale inside your custom_features method (skimage, which you are already using, has a function for that; check it out):
from skimage.filters import prewitt_h,prewitt_v
from skimage.color import rgb2gray
def custom_features(image):
    """
    Args:
        image - array of shape (H, W, C)
    Returns:
        features - array of shape (H * W, C)
    """
    grayscale = rgb2gray(image)
    edges_prewitt_horizontal = prewitt_h(grayscale)
    return edges_prewitt_horizontal
And this should do the trick (I assume the image that the custom_features method receives as input is always an RGB image, because of the shape you defined above).
In case you can get other input types, you can add a check such as if C == 3: so that only RGB images are converted.
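For example, such a guard could look like this (a hypothetical variant of the function above):
from skimage.color import rgb2gray
from skimage.filters import prewitt_h

def custom_features(image):
    # Only convert when the input actually has three color channels
    if image.ndim == 3 and image.shape[2] == 3:
        image = rgb2gray(image)
    return prewitt_h(image)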
By default, skimage.io.imread returns the read JPEG image as a shape-(M, N, 3) array, representing an RGB color image. However, the prewitt functions expect that the input is a single channel image.
To fix this, convert the image to grayscale first with skimage.color.rgb2gray before filtering. Or you could read the image directly as grayscale with skimage.io.imread(f, as_gray=True).
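A minimal sketch of that second route (assuming the same pic.jpg as in the question):
import matplotlib.pyplot as plt
from skimage import io
from skimage.filters import prewitt_h

img_gray = io.imread('pic.jpg', as_gray=True)   # shape (H, W), grayscale floats
edges = prewitt_h(img_gray)

plt.imshow(edges, cmap='gray')
plt.axis('off')
plt.show()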
I've created a script to shift the hue of an image around the colour wheel by any number of steps.
As you might imagine, when I import an image (using PIL) and convert it to a Numpy array, it is this shape: (x, y, (r,g,b)).
I convert this array from RGB to HSV colour space with the Skimage color module (after scaling the RGB values to the range [0,1.0]).
The trouble I am having is manipulation of only one of the HSV values (either H, S, or V) for all pixels. I'd like to efficiently add, multiply, or subtract any of these three dimensions for every 'pixel' in the array.
I have gotten it to work by splitting the HSV values into three separate arrays:
h,s,v = np.dsplit(hsv,3)
manipulating the array in the way I want:
h_new = np.multiply(h,.33)
and then reassembling the array:
hsv_new = np.stack((h_new,s,v))
This doesn't seem like the most efficient way to do this and so my question is:
How can I manipulate each of these dimensions without having to split the array into chunks?
hsv[:,:,0] *= 0.33
modifies the h component of hsv inplace.
hsv[:,:,0] is a "basic slice" of hsv and as such, is a view of the original array.
h, s, v = np.dsplit(hsv, 3)
produces three separate arrays of shape (n, m, 1) (they are actually views of hsv, but your h_new = np.multiply(h, .33) allocates a brand-new array anyway). So after modifying h you still have to rebuild hsv from the pieces, which is extra work and therefore slower.
For notational convenience, you could replace
h,s,v = np.dsplit(hsv, 3)
with
h, s, v = hsv[:,:,0], hsv[:,:,1], hsv[:,:,2]
Then h, s, v will be views of hsv, and modifying h, s, v will automatically affect hsv itself. (So there is no need for hsv_new = np.stack((h_new,s,v))).
Note also that h,s,v = np.dsplit(hsv, 3) makes h, s and v have shape (n, m, 1). Whereas
h, s, v = hsv[:,:,0], hsv[:,:,1], hsv[:,:,2]
makes h, s and v have shape (n, m). That might affect your other code a little bit, but overall I think the latter is nicer.
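A small demonstration of the difference (with a random array standing in for hsv):
import numpy as np

hsv = np.random.rand(4, 5, 3)
first = hsv[0, 0, 0]

h = hsv[:, :, 0]            # basic slice: a view with shape (4, 5)
h *= 0.33                   # in-place multiply updates hsv itself
print(np.isclose(hsv[0, 0, 0], first * 0.33))   # True

h1, s1, v1 = np.dsplit(hsv, 3)
print(h1.shape)             # (4, 5, 1) -- note the trailing length-1 axis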