copy/reshape high dimensional numpy array - python

I have a numpy array with a shape like this
x.shape
(100, 1, 300, 300)
Think of this as 100 observations of grayscale images of size 300x300.
Grayscale images have only 1 channel, hence the 1 in the second position of the shape.
I want to convert this to an array of RGB images, with 3 channels.
I just want to copy the grayscale channel into the two other channels.
So the final shape would be (100, 3, 300, 300).
How can I do that?

Use np.repeat -
np.repeat(x,3,axis=1)
Sample run -
In [8]: x = np.random.randint(11,99,(2,1,3,4))
In [9]: np.repeat(x,3,axis=1).shape
Out[9]: (2, 3, 3, 4)
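And a minimal sketch applying it to the shapes from the question:
import numpy as np
x = np.zeros((100, 1, 300, 300))  # 100 single-channel 300x300 images
rgb = np.repeat(x, 3, axis=1)     # copy the grayscale channel into 3 channels
print(rgb.shape)                  # (100, 3, 300, 300)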

Related

Reshaping a 2D array of images using numpy

I have a numpy array of shape (10, 10, 1024, 1024, 3). This represents a 10x10 grid of images, each of shape (1024, 1024, 3) (1024x1024 color images). I want to reshape this into one array of shape (10*1024, 10*1024, 3), where the 1024x1024 patch of pixels in the upper left of the new image corresponds to the [0, 0] index of my original array. What's the best way to do this using numpy?
This should do the job: np.swapaxes(arr,1,2).reshape(10*1024, 10*1024, 3). Note that swapaxes generates an array of shape (10, 1024, 10, 1024, 3).
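A quick small-scale check of the same idea, with a 2x2 grid of 4x4 RGB tiles standing in for the 10x10 grid of 1024x1024 images:
import numpy as np
arr = np.arange(2*2*4*4*3).reshape(2, 2, 4, 4, 3)  # 2x2 grid of 4x4 RGB tiles
out = np.swapaxes(arr, 1, 2).reshape(2*4, 2*4, 3)  # (2, 4, 2, 4, 3) -> (8, 8, 3)
print(out.shape)                         # (8, 8, 3)
print((out[:4, :4] == arr[0, 0]).all())  # True: grid cell [0, 0] is the upper-left tile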

HSV image formation in python

I have 3 (512×512) numpy arrays representing the Hue, Saturation and Value channels of my desired HSV image, containing float values.
How do I construct a single 512×512 image from these 3 numpy arrays?
To create an HSV image from the 3 channels, you can put them in a numpy array and transpose it to convert the shape from (3, 512, 512) to (512, 512, 3):
hsv = np.transpose([h, s, v], (1, 2, 0))
With OpenCV you can also use cv2.merge.
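A minimal sketch of both routes, with random float arrays standing in for the real channels:
import numpy as np
import cv2

h = np.random.rand(512, 512).astype(np.float32)
s = np.random.rand(512, 512).astype(np.float32)
v = np.random.rand(512, 512).astype(np.float32)
hsv = np.transpose([h, s, v], (1, 2, 0))  # (3, 512, 512) -> (512, 512, 3)
print(hsv.shape)                          # (512, 512, 3)
hsv_cv = cv2.merge([h, s, v])             # OpenCV equivalent, same (512, 512, 3) result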

How to add item to each tuple element of numpy array?

I have a numpy array A of shape (512, 512, 4)
Each element is a tuple: (r, g, b, a). It represents a 512x512 RGBA image.
I have a numpy array B of shape (512, 512, 3)
Each element is a tuple: (r, g, b). It represents a similar, RGB image.
I want to fast copy all the 'a' (alpha) values from each element of A into corresponding elements in B. (basically transferring the alpha channel).
The resulting B shape would be (512, 512, 4).
How can I achieve this? The algorithm is based on the fast pixel manipulation technique laid out here.
Code:
# input_image is loaded using PIL/Pillow
rgb_image = input_image
print(f"Image: {rgb_image}")
rgb_image_array = np.asarray(rgb_image) # convert to numpy array
print(f"Image Array Shape: {rgb_image_array.shape}")
gray_image = rgb_image.convert("L") # convert to grayscale
print(f"Gray image: {gray_image}")
gray_image_array = np.asarray(gray_image)
print(f"Gray image shape: {gray_image_array.shape}")
out_image_array = np.zeros(rgb_image_array.shape, rgb_image_array.dtype)
print(f"Gray image array shape: {out_image_array.shape}")
rows, cols, items = out_image_array.shape
# create a lookup table mapping each gray value to a new RGB value
# (gray_to_rgb, positions and colors are defined elsewhere in my code)
LUT = []
for i in range(256):
    color = gray_to_rgb(i / 256.0, positions, colors)
    LUT.append(color)
LUT = np.array(LUT, dtype=np.uint8)
print(f"LUT shape: {LUT.shape}")
# get final output that uses lookup table technique.
# notice that at this point, we don't have the alpha channel
out_image_array = LUT[gray_image_array]
print(f"output image shape: {out_image_array.shape}")
# How do I get the alpha channel back from rgb_image_array into out_image_array
Output:
Image: <PIL.Image.Image image mode=RGBA size=512x512 at 0x7FDEF5F2F438>
Image Array Shape: (512, 512, 4)
Gray image: <PIL.Image.Image image mode=L size=512x512 at 0x7FDEF5C25CF8>
Gray image shape: (512, 512)
Gray image array shape: (512, 512, 4)
LUT shape: (256, 3)
output image shape: (512, 512, 3)
Using numpy slices:
import numpy as np
A = [[(1,1,1,4)], [(1,1,1,5)]]
B = [[(2,2,2)], [(3,3,3)]]
# A and B are tensors of order 3
A = np.array(A)
B = np.array(B)
print("A=")
print(A)
print("B=")
print(B)
C = np.copy(A)
# assign along all 1st and 2nd dimensions, but only the first three elements of the third dimension
C[:,:,0:3] = B
print("C=")
print(C)
Output:
A=
[[[1 1 1 4]]

 [[1 1 1 5]]]
B=
[[[2 2 2]]

 [[3 3 3]]]
C=
[[[2 2 2 4]]

 [[3 3 3 5]]]
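Applied back to the question, the same slice assignment transfers the alpha channel; the zero arrays below are stand-ins for the arrays in the question's code:
import numpy as np
# stand-ins for the question's arrays
rgb_image_array = np.zeros((512, 512, 4), dtype=np.uint8)  # RGBA source image
out_image_array = np.zeros((512, 512, 3), dtype=np.uint8)  # LUT-mapped RGB image
result = np.empty((512, 512, 4), dtype=out_image_array.dtype)
result[:, :, :3] = out_image_array          # RGB channels from the LUT output
result[:, :, 3] = rgb_image_array[:, :, 3]  # alpha transferred from the source
print(result.shape)                         # (512, 512, 4)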
Let's be careful about terminology
I have a numpy array A of shape (512, 512, 4) Each element is a tuple: (r, g, b, a). It represents a 512x512 RGBA image.
If A has that shape, and has a numeric dtype (e.g. np.int32), then it has 512*512*4 elements. The only way it can have tuple elements is if the dtype is object. I suspect rather that you have a 512x512 image where each pixel is represented by 4 values.
A[0,0,:]
will be a (4,) shape array representing those 4 values (sometimes called channels) of one pixel.
A[:,:,0]
is the r value for the whole image.
If they really are 3d arrays, then @mocav's solution of copying columns (indexing on the last dimension) to a new array is the right one.
Another possibility is that they are structured 2d arrays with 4 and 3 fields respectively. That would print (str) as tuples, though the repr print will make the compound dtype explicit. But the solution will be similar - make a new array of the right shape and dtype (like A), and copy values by field name from B and A. (I'll wait with details until you clarify the situation).
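A quick sketch of the first interpretation (a plain 3d array with a numeric dtype):
import numpy as np
A = np.zeros((512, 512, 4), dtype=np.uint8)  # 512x512 image, 4 values per pixel
print(A[0, 0, :].shape)  # (4,) -- the 4 channel values of one pixel
print(A[:, :, 0].shape)  # (512, 512) -- the r value for the whole image
print(A.size)            # 1048576 == 512*512*4 elements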

Image.fromarray changes size

I have data that I want to store in an image. I created an image with width 100 and height 28, and my matrix has the same shape. When I use Image.fromarray(matrix), the shape changes:
from PIL import Image
img = Image.new('L', (100, 28))
tmp = Image.fromarray(matrix)
print(matrix.shape) # (100, 28)
print(tmp.size) # (28, 100)
img.paste(tmp, (0, 0, 100, 28)) # ValueError: images do not match
When I use img.paste(tmp, (0, 0)) the object is pasted into the image, but the part starting with the x value 28 is missing.
Why does the dimension change?
PIL and numpy have different indexing systems. matrix[a, b] gives you the point at x position b, and y position a, but img.getpixel((a, b)) gives you the point at x position a, and y position b. As a result of this, when you are converting between numpy and PIL matrices, they switch their dimensions. To fix this, you could take the transpose (matrix.transpose()) of the matrix.
Here's what's happening:
import numpy as np
from PIL import Image
img = Image.new('L', (100, 28))
img.putpixel((5, 3), 17)
matrix = np.array(img)
print(matrix[5, 3]) # This returns 0
print(matrix[3, 5]) # This returns 17
matrix = matrix.transpose()
print(matrix[5, 3]) # This returns 17
print(matrix[3, 5]) # This returns 0
NumPy and PIL have different indexing systems. So a (100, 28) numpy array will be interpreted as an image with width 28 and height 100.
If you want a 28x100 image, then you should swap the dimensions for your image instantiation.
img = Image.new('L', (28, 100))
If you want a 100x28 image, then you should transpose the numpy array.
tmp = Image.fromarray(matrix.transpose())
More generally, if you're working with RGB, you can use transpose() to swap only the first two axes.
>>> arr = np.zeros((100, 28, 3))
>>> arr.shape
(100, 28, 3)
>>> arr.transpose(1, 0, 2).shape
(28, 100, 3)
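Putting the pieces together for the original snippet (a sketch, with matrix stood in by a zero array of the stated shape):
import numpy as np
from PIL import Image

matrix = np.zeros((100, 28), dtype=np.uint8)  # 100 rows, 28 columns
img = Image.new('L', (100, 28))               # width 100, height 28
tmp = Image.fromarray(matrix.transpose())     # (28, 100) array -> size (100, 28)
print(tmp.size)                               # (100, 28)
img.paste(tmp, (0, 0, 100, 28))               # sizes now match, no ValueError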

Similar Image shape conversion in Python and OpenCV

I am new to Python and am having difficulty understanding image shape conversion.
In my Python code, the image I has this shape:
ipdb> I.shape
(720, 1280, 3)
Running this command converts I's shape and stores the result in h5_image:
h5_image = np.transpose(I, (2,0,1)).reshape(data_shape)
Where data_shape is:
ipdb> p data_shape
(1, 3, 720, 1280)
What is the equivalent OpenCV function that produces the same output?
In (1, 3, 720, 1280), what does 1 mean?
What is the difference between (3, 720, 1280) and (720, 1280, 3)?
You can think of an image (I) in Python/numpy as a matrix with N dimensions.
If you have a grayscale image, there is a single value for each row and column. This means 2 dimensions, and the shape will be: I.shape --> (rows, cols)
With RGB image, you have 3 channels, red, green, blue. So you have a total of 3 dimensions: I.shape --> (rows, cols, 3)
With RGBA image, you have 4 channels, red, green, blue, alpha. Still 3 dimensions: I.shape --> (rows, cols, 4)
These are the common ways to store image data, but of course you can store it any way you like, as long as you know how to read it. For example, you can keep it as one long vector in 1 dimension, and also store the image width and height, so you know how to read it back into 2D format.
For your more specific questions:
I am not sure what output you are looking for. You can also do transpose() or flip() in OpenCV.
The (1, 3, 720, 1280) only means you have an additional degenerate dimension. To access each pixel you would have to write I[0,channel,row,col]. The leading 1 is unnecessary, and it is not a common way to hold an image array. Why do you want to do this? Do you want to save in a specific format? (HDF5?)
The only difference is in the arrangement of your data. For example, in the case of (3, 720, 1280), to get the red channel you need to write: red = I[0,:,:]. While in the case of (720, 1280, 3) you need to write: red = I[:,:,0] (This is more common).
*There are some performance issues that depend on the actual arrangement of the image data in memory, but I don't think you need to worry about that right now.
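For reference, a minimal sketch of the layout conversion discussed above:
import numpy as np
I = np.zeros((720, 1280, 3), dtype=np.uint8)  # (rows, cols, channels), as OpenCV loads it
chw = np.transpose(I, (2, 0, 1))              # (3, 720, 1280) -- channels first
nchw = chw.reshape(1, 3, 720, 1280)           # add the degenerate leading dimension
red_a = chw[0, :, :]  # red channel in the (3, 720, 1280) layout
red_b = I[:, :, 0]    # same channel in the (720, 1280, 3) layout
print((red_a == red_b).all())                 # True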
