Extracting matrices from tensors - python

So I am trying to write a function that converts RGB to HSI in Python.
I have an image stored as an np.ndarray (a tensor, I suppose?) with dimensions (1080, 1920, 3), that is, 1080x1920 pixels in RGB. How can I extract the R, G and B matrices? And after I compute H, S and I, how do I concatenate the matrices to get back a (1080, 1920, 3) tensor?

Assuming your_image contains the RGB channels, you can extract each channel using the corresponding index:
r = your_image[..., 0]
g = your_image[..., 1]
b = your_image[..., 2]
Note: You may need to normalize the values to the interval [0.0, 1.0]. If so, divide them by 255.0.
Conversely, you can stack the three channels together as follows:
import numpy as np
your_image = np.dstack((r, g, b))
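Putting the two steps together, here is a minimal sketch of the RGB to HSI conversion the question is after. It assumes the commonly used textbook HSI formulas (hue in radians in [0, 2*pi), saturation and intensity in [0, 1]); the function name rgb_to_hsi and the small eps guard against division by zero are my own choices, not part of the answer above.

```python
import numpy as np


def rgb_to_hsi(image: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB image to an (H, W, 3) float HSI image.

    Assumes the common textbook HSI formulas; hue is in radians, [0, 2*pi).
    """
    eps = 1e-10  # guards against division by zero on gray/black pixels
    rgb = image.astype(np.float64) / 255.0  # normalize to [0.0, 1.0]
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]  # each is an (H, W) matrix

    intensity = (r + g + b) / 3.0
    saturation = 1.0 - np.minimum(np.minimum(r, g), b) / (intensity + eps)
    saturation = np.where(intensity > 0, saturation, 0.0)  # black pixels: S = 0

    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    hue = np.where(b <= g, theta, 2 * np.pi - theta)

    # Stack the three (H, W) matrices back into an (H, W, 3) array
    return np.dstack((hue, saturation, intensity))
```

For a (1080, 1920, 3) uint8 image, rgb_to_hsi(your_image) returns a float64 array of the same shape. Hue is undefined for gray pixels (R = G = B); this sketch simply returns pi/2 there.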

Related

Using numpy.histogram on an array of images

I'm trying to calculate image histograms of a NumPy array of images. The array of images has shape (n_images, width, height, colour_channels) and I want to return an array of shape (n_images, count_in_each_bin (i.e. 255)). This is done via two intermediate steps: averaging the colour channels of each image, then flattening each 2D image to a 1D one.
I think I have successfully done this with the code below, but I have cheated a bit with the for loop at the end. My question is: is there a way of getting rid of the last for loop and using an optimised NumPy function instead?
import numpy as np
# Bin edges 0, 1, ..., 255 -> 255 bins (stands in for self.histogram_bins from the original class)
histogram_bins = np.arange(0, 256)
def histogram_helper(flattened_image: np.ndarray) -> np.ndarray:
    counts, _ = np.histogram(flattened_image, bins=histogram_bins)
    return counts
# Using 10 RGB images of width and height 300
images = np.zeros((10, 300, 300, 3))
# Take the mean of the three colour channels
channel_avg = np.mean(images, axis=3)
# Flatten each image in the array of images, resulting in a 1D representation of each image.
flat_images = channel_avg.reshape(*channel_avg.shape[:-2], -1)
# Now calculate the counts in each of the colour bins for each image in the array.
# This will provide us with a count of how many times each colour appears in an image.
result = np.empty((0, len(histogram_bins) - 1), dtype=np.int32)
for image in flat_images:
    colour_counts = histogram_helper(image)
    colour_counts = colour_counts.reshape(1, -1)
    result = np.concatenate([result, colour_counts])
You don't necessarily need to call np.histogram or np.bincount for this, since pixel values are in the range 0 to N. That means that you can treat them as indices and simply use a counter.
Here's how I would transform the initial images, which I imagine are of dtype np.uint8:
images = np.random.randint(0, 256, size=(10, 5, 5, 3), dtype=np.uint8)  # 10 5x5 images, 3 channels
reshaped = np.round(images.reshape(images.shape[0], -1, images.shape[-1]).mean(-1)).astype(images.dtype)  # shape (10, 25)
Now you can simply count the histograms using unbuffered addition with np.add.at:
result = np.zeros((images.shape[0], 256), int)
index = np.arange(len(images))[:, None]
np.add.at(result, (index, reshaped), 1)
The last operation is in-place and therefore returns None, but the answer will be in result nevertheless.
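A self-contained version of the steps above, cross-checked against the per-image np.histogram loop from the question (using 256 unit-width bins). It also includes an np.bincount variant that offsets each image's values into its own block of 256 bins; that variant is my addition, not part of the answer. The random seed and image sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10 random 5x5 RGB images of dtype uint8 (values 0..255)
images = rng.integers(0, 256, size=(10, 5, 5, 3), dtype=np.uint8)

# Average the colour channels and flatten each image to 1-D: shape (10, 25)
reshaped = np.round(
    images.reshape(images.shape[0], -1, images.shape[-1]).mean(-1)
).astype(np.uint8)

# Vectorized histogram: one row per image, one column per grey level
result = np.zeros((images.shape[0], 256), dtype=int)
index = np.arange(len(images))[:, None]  # shape (10, 1), broadcasts against (10, 25)
np.add.at(result, (index, reshaped), 1)

# Reference: the per-image loop with 256 bins (edges 0, 1, ..., 256)
expected = np.stack([np.histogram(img, bins=np.arange(257))[0] for img in reshaped])

# Alternative: shift image k's values into bins [256*k, 256*k + 255], count once
offsets = index * 256 + reshaped  # shape (10, 25), integer dtype
via_bincount = np.bincount(offsets.ravel(), minlength=len(images) * 256).reshape(len(images), 256)
```

np.add.at is unbuffered, so repeated (image, value) pairs are all counted. A plain result[index, reshaped] += 1 would count each repeated pair only once, which is why the answer uses np.add.at.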

Mapping tensor in pytorch

I have the following two tensors:
img is an RGB image of shape (224,224,3)
uvs is a tensor with the same spatial size, e.g. (224, 224, 2), that maps to coordinates (x,y). In other words, it provides (x,y) coordinates for every pixel of the input image.
I want to create now a new output image tensor that contains on index (x,y) the value of the input image. So the output should be an image as well with the pixels rearranged according to the mapping tensor.
Small toy example:
img = [[c1,c2], [c3, c4]] where each c is an RGB color [r, g, b]
uvs = [[[0,0], [1,1]],[[0,1], [1,0]]]
out = [[c1, c3], [c4, c2]]
How would one achieve such a thing in pytorch in a fast vectorized manner?
Try with (idx here is your uvs tensor):
out = img[idx[...,0], idx[...,1]]
I was able to solve it (with the help of Quang Hoang's answer):
out = torch.zeros_like(img)
out[idx[...,0], idx[...,1]] = img
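The two one-liners above do different things: img[idx[...,0], idx[...,1]] gathers (out[i, j] = img[idx[i, j]]), while out[idx[...,0], idx[...,1]] = img scatters (out[idx[i, j]] = img[i, j]). The sketch below checks both against the toy example from the question. It uses NumPy arrays so it runs without PyTorch; integer (torch.long) tensors support the same advanced-indexing expressions. The colours c1..c4 are replaced by the labels 1..4 for readability.

```python
import numpy as np

# Toy example from the question, with c1..c4 replaced by the labels 1..4
# and a channel axis of size 1 standing in for [r, g, b].
img = np.array([[1, 2], [3, 4]])[..., None]  # shape (2, 2, 1)
uvs = np.array([[[0, 0], [1, 1]], [[0, 1], [1, 0]]])  # shape (2, 2, 2)

# Gather: out[i, j] = img[uvs[i, j]]  (read from the mapped location)
gathered = img[uvs[..., 0], uvs[..., 1]]

# Scatter: out[uvs[i, j]] = img[i, j]  (write to the mapped location)
scattered = np.zeros_like(img)
scattered[uvs[..., 0], uvs[..., 1]] = img

print(gathered[..., 0].tolist())   # [[1, 4], [2, 3]]
print(scattered[..., 0].tolist())  # [[1, 3], [4, 2]]
```

Only the scatter reproduces out = [[c1, c3], [c4, c2]]. With scatter, output pixels that no input pixel maps to keep their initial value (zero here), and if two input pixels map to the same location only one of the writes survives.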
What you need is torch.nn.functional.grid_sample(). You can do something like this:
import torch
width, height, channels = (224, 224, 3)
# Note that the image is channel-first (CHW format). In this example, I'm using a float image with values in the range [0, 1).
img = torch.rand((channels, height, width))
# Create the indices of shape (height, width, 2) = (224, 224, 2), with the x (column) coordinate first. Any other size would work too.
col_indices = torch.arange(width, dtype=torch.float32)
row_indices = torch.arange(height, dtype=torch.float32)
uvs = torch.stack(torch.meshgrid(col_indices, row_indices, indexing='xy'), dim=-1)
# Transform the indices from pixel coordinates to the range [-1, 1] such that:
# * top-left corner of the input = (-1, -1)
# * bottom-right corner of the input = (1, 1)
# This is required for grid_sample() to work properly. With align_corners=True, -1 and 1
# refer to the centers of the corner pixels, hence the division by (size - 1).
uvs[..., 0] = (uvs[..., 0] / (width - 1)) * 2 - 1
uvs[..., 1] = (uvs[..., 1] / (height - 1)) * 2 - 1
# Do the "mapping" operation (this does a bilinear interpolation) using `uvs` coordinates.
# Note that grid_sample() requires a batch dimension, so we need to use `unsqueeze()`, then
# get rid of it using `squeeze()`.
mapped = torch.nn.functional.grid_sample(
    img.unsqueeze(0),
    uvs.unsqueeze(0),
    mode='bilinear',
    align_corners=True,
)
# The final image is in HWC format.
result = mapped.squeeze(0).permute(1, 2, 0)
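The coordinate conventions are the easiest part of grid_sample() to get wrong. As I understand the PyTorch documentation, the last dimension of the grid holds (x, y), i.e. (column, row), and with align_corners=True the values -1 and 1 refer to the centers of the corner pixels, so pixel index i maps to i / (size - 1) * 2 - 1. The sketch below builds such an identity grid with NumPy; normalized_identity_grid is a hypothetical helper name, and torch.from_numpy(grid).unsqueeze(0) would turn the result into a batched grid. Sampling with an identity grid should reproduce the input, which makes it a handy sanity check before plugging in your own uvs.

```python
import numpy as np


def normalized_identity_grid(height: int, width: int) -> np.ndarray:
    """Return a (height, width, 2) grid of (x, y) coordinates in [-1, 1].

    Follows the align_corners=True convention: -1 and 1 are the centers of
    the first and last pixels along each axis. Assumes height, width >= 2.
    """
    cols = np.arange(width, dtype=np.float32)
    rows = np.arange(height, dtype=np.float32)
    # indexing='xy': both outputs have shape (height, width); xs varies along columns
    xs, ys = np.meshgrid(cols, rows, indexing='xy')
    xs = xs / (width - 1) * 2 - 1
    ys = ys / (height - 1) * 2 - 1
    # x first, then y, in the last dimension
    return np.stack((xs, ys), axis=-1)


grid = normalized_identity_grid(3, 4)  # shape (3, 4, 2)
```

Using a non-square size (3 rows, 4 columns) makes a swapped x/y ordering show up immediately as a shape or corner-value mismatch, which a 224x224 example would hide.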
Side note: I found your question while searching for a solution to a related problem I had been stuck on for a while. While I was writing an answer to your question, I realized which bug was causing the problem I was facing. By helping you I effectively helped myself, so I hope this helps you! :)

extract edge features with prewitt_h

I am trying to extract edge features like this:
img = io.imread('pic.jpg')
H, W, C = img.shape
features = custom_features(img)
assignments = kmeans_fast(features, num_segments)
segments = assignments.reshape((H, W))
# Display segmentation
plt.imshow(segments, cmap='viridis')
plt.axis('off')
plt.show()
custom_features:
from skimage.filters import prewitt_h,prewitt_v
def custom_features(image):
    """
    Args:
        img - array of shape (H, W, C)
    Returns:
        features - array of (H * W, C)
    """
    edges_prewitt_horizontal = prewitt_h(image)
    return edges_prewitt_horizontal
However, currently I get an error because the shape of the image is different than what is expected by the prewitt_h function.
ValueError: The parameter `image` must be a 2-dimensional array
How can I modify this inside the function such that the returned shape is as desired?
It looks like you need to give prewitt a grayscale image. The Prewitt transform applies a convolution with a 2-dimensional kernel, hence you need a 2-dimensional image (yours is 3-dimensional because it has colors: RGB, 3 channels).
You could add a conversion to grayscale inside your custom_features method (skimage, which you are already using, has a function for that: rgb2gray):
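from skimage.filters import prewitt_h, prewitt_v
from skimage.color import rgb2gray
def custom_features(image):
    """
    Args:
        image - array of shape (H, W, C)
    Returns:
        features - array of shape (H * W, 1)
    """
    grayscale = rgb2gray(image)  # (H, W, 3) RGB -> (H, W) grayscale
    edges_prewitt_horizontal = prewitt_h(grayscale)  # (H, W)
    # Flatten to one feature row per pixel, matching the (H * W, ...) shape in the docstring
    return edges_prewitt_horizontal.reshape(-1, 1)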
And this should do the trick (I assume the image that custom_features receives as input is always an RGB image, because of the shape you defined above).
If you have other types of images, you can add a check such as if image.shape[-1] == 3: so that only RGB images are converted.
By default, skimage.io.imread returns the read JPEG image as a shape-(M, N, 3) array, representing an RGB color image. However, the prewitt functions expect that the input is a single channel image.
To fix this, convert the image to grayscale first with skimage.color.rgb2gray before filtering. Or you could read the image directly as grayscale with skimage.io.imread(f, as_gray=True).
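If you want more than one feature per pixel for the clustering, one option is to compute several edge maps on the grayscale image and stack them into an (H * W, D) matrix. The sketch below uses a hypothetical edge_features helper that combines prewitt_h and prewitt_v; whether kmeans_fast from the question accepts an (N, D) float array is an assumption based on the (H * W, C) shape in its docstring.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import prewitt_h, prewitt_v


def edge_features(image: np.ndarray) -> np.ndarray:
    """Return an (H * W, 2) feature matrix from an (H, W, 3) RGB image.

    Column 0: horizontal Prewitt response, column 1: vertical Prewitt response.
    """
    gray = rgb2gray(image)  # (H, W) floats in [0, 1] for a uint8 input
    horizontal = prewitt_h(gray)  # (H, W)
    vertical = prewitt_v(gray)  # (H, W)
    # One row per pixel, one column per filter response
    return np.stack((horizontal, vertical), axis=-1).reshape(-1, 2)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
features = edge_features(img)  # shape (24, 2)
```

After clustering, assignments.reshape((H, W)) from the question still works, because row k of the feature matrix corresponds to pixel k in row-major order.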

How to add item to each tuple element of numpy array?

I have a numpy array A of shape (512, 512, 4)
Each element is a tuple: (r, g, b, a). It represents a 512x512 RGBA image.
I have a numpy array B of shape (512, 512, 3)
Each element is a tuple: (r, g, b). It represents a similar, RGB image.
I want to quickly copy all the 'a' (alpha) values from each element of A into the corresponding elements of B (basically transferring the alpha channel).
The resulting B shape would be (512, 512, 4).
How can I achieve this? The algorithm is based on fast pixel manipulation technique laid out here.
Code:
# input_image is loaded using PIL/Pillow
rgb_image = input_image
print(f"Image: {rgb_image}")
rgb_image_array = np.asarray(rgb_image) # convert to numpy array
print(f"Image Array Shape: {rgb_image_array.shape}")
gray_image = rgb_image.convert("L") # convert to grayscale
print(f"Gray image: {gray_image}")
gray_image_array = np.asarray(gray_image)
print(f"Gray image shape: {gray_image_array.shape}")
out_image_array = np.zeros(rgb_image_array.shape, rgb_image_array.dtype)
print(f"Gray image array shape: {out_image_array.shape}")
rows, cols, items = out_image_array.shape
# create lookup table for each gray value to new rgb value
LUT = []
for i in range(256):
    color = gray_to_rgb(i / 256.0, positions, colors)
    LUT.append(color)
LUT = np.array(LUT, dtype=np.uint8)
print(f"LUT shape: {LUT.shape}")
# get final output that uses lookup table technique.
# notice that at this point, we don't have the alpha channel
out_image_array = LUT[gray_image_array]
print(f"output image shape: {out_image_array.shape}")
# How do I get the alpha channel back from rgb_image_array into out_image_array
Output:
Image: <PIL.Image.Image image mode=RGBA size=512x512 at 0x7FDEF5F2F438>
Image Array Shape: (512, 512, 4)
Gray image: <PIL.Image.Image image mode=L size=512x512 at 0x7FDEF5C25CF8>
Gray image shape: (512, 512)
Gray image array shape: (512, 512, 4)
LUT shape: (256, 3)
output image shape: (512, 512, 3)
Using numpy slices:
import numpy as np
A = [[(1,1,1,4)], [(1,1,1,5)]]
B = [[(2,2,2)], [(3,3,3)]]
# A and B are tensors of order 3
A = np.array(A)
B = np.array(B)
print("A=")
print(A)
print("B=")
print(B)
C = np.copy(A)
# assign along all 1st and 2nd dimensions, but only the first three elements of the third dimension
C[:,:,0:3] = B
print("C=")
print(C)
Output:
A=
[[[1 1 1 4]]
[[1 1 1 5]]]
B=
[[[2 2 2]]
[[3 3 3]]]
C=
[[[2 2 2 4]]
[[3 3 3 5]]]
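Instead of copying A and overwriting its colour channels, you can also build the RGBA array directly by appending A's alpha channel to B. A sketch with random stand-in arrays (the real A and B come from the question's code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the question's arrays: A is RGBA, B is RGB, both uint8
A = rng.integers(0, 256, size=(512, 512, 4), dtype=np.uint8)
B = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

# Slice with 3: (not 3) to keep the channel axis: shape (512, 512, 1)
alpha = A[..., 3:]
# Append it after B's three channels -> shape (512, 512, 4)
B_rgba = np.concatenate((B, alpha), axis=-1)
```

Slicing with A[..., 3:] rather than A[..., 3] keeps a trailing axis of size 1, which np.concatenate needs. In the question's code the equivalent would presumably be out_image_array = np.concatenate((LUT[gray_image_array], rgb_image_array[..., 3:]), axis=-1), since both arrays there are uint8.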
Let's be careful about terminology
I have a numpy array A of shape (512, 512, 4) Each element is a tuple: (r, g, b, a). It represents a 512x512 RGBA image.
If A has that shape, and has a numeric dtype (e.g. np.int32), then it has 512*512*4 elements. The only way it can have a tuple element is if the dtype was object. I suspect rather that you have a 512x512 image where each pixel is represented by 4 values.
A[0,0,:]
will be a (4,) shape array representing those 4 values (sometimes called channels) of one pixel.
A[:,:,0]
is the r value for the whole image.
If they really are 3d arrays, then @mocav's solution of copying columns (indexing on the last dimension) into a new array is the right one.
Another possibility is that they are structured 2d arrays with 4 and 3 fields respectively. That would print (str) as tuples, though the repr print will make the compound dtype explicit. But the solution will be similar - make a new array of the right shape and dtype (like A), and copy values by field name from B and A. (I'll wait with details until you clarify the situation).
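For the structured-array case, the "copy by field name" approach might look like the sketch below. The field names, the dtypes rgba_dtype and rgb_dtype, and the small example values (borrowed from the previous answer) are assumptions made for illustration.

```python
import numpy as np

# Hypothetical structured versions of A and B: 2-D arrays whose elements are records
rgba_dtype = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])
rgb_dtype = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1')])

A = np.array([[(1, 1, 1, 4)], [(1, 1, 1, 5)]], dtype=rgba_dtype)  # shape (2, 1)
B = np.array([[(2, 2, 2)], [(3, 3, 3)]], dtype=rgb_dtype)  # shape (2, 1)

# New array with A's compound dtype and B's shape
C = np.zeros(B.shape, dtype=A.dtype)
for name in B.dtype.names:  # copy 'r', 'g', 'b' from B
    C[name] = B[name]
C['a'] = A['a']  # copy the alpha field from A

print(C)
```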

Tensorflow resize_image_with_crop_or_pad

I want to call tf.image.resize_image_with_crop_or_pad(images,height,width) to resize my input images. My input images are all 2-D numpy arrays of pixels, while the image input of resize_image_with_crop_or_pad must be a 3-D or 4-D tensor, so the call raises an error. What should I do?
Let's suppose that images is an [n, W, H] NumPy ndarray, where n is the number of images and W and H are the width and height of the images.
Convert images to a tensor, in order to be able to use tensorflow functions:
tf_images = tf.constant(images)
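Convert tf_images to the image data format used by TensorFlow, i.e. from [n, W, H] to [n, H, W] (skip this step if your arrays are already [n, H, W], which is the usual layout of image arrays in NumPy):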
tf_images = tf.transpose(tf_images, perm=[0,2,1])
In TensorFlow, every image has a depth (channel) dimension, so although you're using grayscale images, we have to add a channel dimension of depth 1 as the last axis:
tf_images = tf.expand_dims(tf_images, -1)
Now you can use tf.image.resize_image_with_crop_or_pad to resize the batch, which now has a shape of [n, H, W, 1] (a 4-D tensor). In TensorFlow 2.x the same function is called tf.image.resize_with_crop_or_pad:
resized = tf.image.resize_image_with_crop_or_pad(tf_images,height,width)
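tf.transpose and tf.expand_dims follow the same axis conventions as np.transpose and np.expand_dims, so the shape bookkeeping can be checked in plain NumPy. The sketch below (with made-up sizes n=5, W=32, H=24) shows that the channel axis has to go last; inserting it at position 2 puts it between H and W instead.

```python
import numpy as np

n, W, H = 5, 32, 24
images = np.zeros((n, W, H), dtype=np.float32)  # [n, W, H], as assumed in the answer

# Swap the last two axes: [n, W, H] -> [n, H, W]
batch = np.transpose(images, (0, 2, 1))
# Append a channel axis of size 1 at the end: [n, H, W] -> [n, H, W, 1]
batch = np.expand_dims(batch, -1)
print(batch.shape)  # (5, 24, 32, 1)

# Inserting the new axis at position 2 instead gives [n, H, 1, W]
wrong = np.expand_dims(np.transpose(images, (0, 2, 1)), 2)
print(wrong.shape)  # (5, 24, 1, 32)
```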
