How do I broadcast other dimensions using numpy.matmul? - python

I have a 640x480 RGB image with NumPy shape (480, 640, 3), where the 3 is for the R, G, B channels,
and at each pixel I would like to transform the RGB value according to:
# for each pixel
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        # set the new RGB value at that pixel to
        # (new RGB vector) = (const matrix A) * (old RGB vector) + (const vector B)
        img[i,j,:] = np.reshape((np.matmul(A, np.expand_dims(img[i,j,:], axis = 1)) + B), (3,))
Where A is of shape (3,3) and B is of shape (3,1).
How do I do this with numpy without writing a loop over the pixels?

One way would be with np.einsum -
img = np.einsum('ij,klj->kli',A,data) + B
Another with np.tensordot -
img = np.tensordot(data,A,axes=(-1,-1)) + B
Related - Understanding tensordot
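Here data is the input image array, and I assumed B is a 1-D array of shape (3,). If it is not (for example, shape (3, 1) as in the question), use B.ravel() in place of B.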
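To check that these one-liners match the loop, and to show how np.matmul itself broadcasts over the leading axes, here is a minimal sketch. The small (4, 5, 3) array and the random A and B are stand-ins chosen for illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 5, 3))      # small stand-in for the (480, 640, 3) image
A = rng.random((3, 3))
B = rng.random((3, 1))           # column vector, as in the question

# Reference result: the explicit per-pixel loop.
expected = np.empty_like(img)
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        expected[i, j, :] = (A @ img[i, j, :, None] + B).reshape(3)

b = B.ravel()                    # shape (3,), broadcasts over the last axis

out_einsum = np.einsum('ij,klj->kli', A, img) + b
out_tensordot = np.tensordot(img, A, axes=(-1, -1)) + b

# matmul treats the leading axes as batch dimensions:
# img[..., None] is a (4, 5) stack of (3, 1) column vectors.
out_matmul = (A @ img[..., None] + B)[..., 0]

# Equivalent row-vector form: each pixel row is multiplied by A transposed.
out_rowvec = img @ A.T + b
```

All four results have shape (4, 5, 3). The row-vector form is usually the shortest to write, since it needs no extra axis.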

Related

How do I use np.linspace() to change the pixels colors of a patch in an image

I drew a straight line by changing the color of 50 pixels in the image below. The line is the diagonal line shown on the image
Here is how I did it.
zeros = torch.zeros_like(img) #img is b,c,h,w
x1, y1, x2, y2, r, g, b = 16, 50, 100, 100, 6, 2, 4
rgb = torch.tensor([r, g, b])
cx, cy = x2 - x1, y2 - y1
for t in np.linspace(0, 1, 50):
    px, py = x1 + t * cx, y1 + t * cy
    zeros[0, :, int(py), int(px)] = rgb
mask = (zeros == 0).float()
im = mask * img + (1 - mask) *zeros
It is obvious in np.linspace(0, 1, 50) that I am changing 50 pixels. I want to use the same method to change a specific portion of the image, like a 14 x 14 patch of my image if the image size is 224 x 224 or even more. The range of pixels to cover now will be 196 -> (since there are 196 pixels in a 14 x 14 patch) instead of 50. How do I go about it, please?
I assume that img dimensions are BCHW based on the comment in the first line.
You can change values pixel by pixel in a for loop (which is quite inefficient):
img[0, :, y, x] = rgb
Or you can change the whole patch at once. Assuming img is a NumPy ndarray and you want to reduce the intensity:
img[0, :, 100:200, 100:200] = (img[0, :, 100:200, 100:200] / 1.3).astype(np.uint8)
Pay attention to dimensions and data types: the result has to be converted back to uint8, the same dtype as the original image.
To set some intensity gradient on the patch:
gradient = np.tile(np.linspace(0.2, 0.9, 100), (3, 100, 1))
img[0, :, 100:200, 100:200] = (img[0, :, 100:200, 100:200] * gradient).astype(np.uint8)
Here 0.2-0.9 is intensity and 100 is the patch size. So gradient is 3x100x100 tensor. I create it by taking a vector of intensities of shape (100,) and tiling it so it becomes a 3x100x100 tensor. The last line is a multiplication of each value in img by corresponding value in gradient.
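As a side note, the tiling step is not strictly required, because a (100,) vector already broadcasts against a (3, 100, 100) patch along the last axis. The sketch below compares both forms; the 224x224 image filled with the value 200 is a made-up stand-in.

```python
import numpy as np

img = np.full((1, 3, 224, 224), 200, dtype=np.uint8)   # hypothetical BCHW image

ramp = np.linspace(0.2, 0.9, 100)                      # shape (100,)
tiled = np.tile(ramp, (3, 100, 1))                     # shape (3, 100, 100)

patch = img[0, :, 100:200, 100:200]                    # view of shape (3, 100, 100)
via_tile = (patch * tiled).astype(np.uint8)
via_broadcast = (patch * ramp).astype(np.uint8)        # (3, 100, 100) * (100,) broadcasts

# Write the darkened patch back; pixels outside the patch are untouched.
img[0, :, 100:200, 100:200] = via_broadcast
```

Both products give the same values, so the explicit np.tile mainly helps readability when you want to see the full gradient array.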
When working with images it is much more convenient to represent them in HWC format. You can easily achieve that by writing
img = img[0].transpose(1,2,0)
This will swap dimensions CHW->HWC (as Batch is removed by indexing [0] ). Then you can easily write:
img[y:y+patch_h, x:x+patch_w] = rgb
and NumPy will automatically broadcast the missing dimensions. In this case the 3-element vector rgb is broadcast to a patch_h x patch_w x 3 array. The same broadcasting rules apply to NumPy ndarrays, PyTorch tensors and TensorFlow tensors, but not to plain Python lists. Then you can, for example, flip a patch horizontally with
img[y:y+patch_h, x:x+patch_w] = np.fliplr(img[y:y+patch_h, x:x+patch_w])
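For the 14 x 14 patch from the question, a minimal sketch of the slice assignment in both layouts could look like this. The corner position (x=16, y=50) and the zero-filled images are illustrative assumptions; NumPy is used here, and PyTorch tensors accept the same indexing.

```python
import numpy as np

img = np.zeros((1, 3, 224, 224), dtype=np.uint8)   # hypothetical BCHW image
rgb = np.array([6, 2, 4], dtype=np.uint8)
y, x, patch_h, patch_w = 50, 16, 14, 14            # illustrative top-left corner and size

# BCHW: channels come first, so rgb needs two trailing axes to line up:
# (3, 1, 1) broadcasts to (3, 14, 14).
img[0, :, y:y + patch_h, x:x + patch_w] = rgb[:, None, None]

# HWC: channels are last, so the (3,) vector broadcasts without reshaping.
hwc = np.zeros((224, 224, 3), dtype=np.uint8)
hwc[y:y + patch_h, x:x + patch_w] = rgb
```

Each assignment colors all 196 pixels of the patch in one step, with no loop over np.linspace.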

Retrieve RGB image from 1-dim array containing RGB image data

I pass a C array containing RGB image data to a function in Python to further process the image.
How can I retrieve this image and plot it as well in Python?
the C array named c_data that contains RGB image data was created by
for(k = 0; k < c; ++k){
    for(j = 0; j < h; ++j){
        for(i = 0; i < w; ++i){
            int dst_index = i + w*j + w*h*k;
            int src_index = k + c*i + c*w*j;
            c_data[dst_index] = (float)stb_im[src_index]/255.;
        }
    }
}
the C array is converted into a numpy array and is passed to the Python function with the following header via the parameter named im_data
def read_img_from_c(im_data, im_h, im_w):
    print(im_h)  # 480
    print(im_w)  # 640
    print(im_data.shape)  # (921600,) --> (480*640*3)
I tried to simply reshape the numpy array using
data = im_data.reshape((im_h, im_w, 3))
and create a PIL image object using
img = PIL.Image.fromarray(data, 'RGB')
, but when I run the following command
img.show()
I got the following rather than the original image.
Update: Following the suggestion, I multiplied the normalized pixel values by 255.0, cast the NumPy array to uint8, and plotted it:
im_data = (im_data*255.0).astype(np.uint8)
im_data = im_data.reshape((im_h, im_w, 3))
img = Image.fromarray(im_data, 'RGB')
img.show()
and I got an image with repeated patterns instead of a single large RGB image:
Try multiplying data by 255 again and rounding it to int. I guess the values in RGB tuple should be from range 0-255, not 0-1.
After spending a day trying to recover this image, I found a solution.
The flattened, normalized pixel values are stored in the one-dimensional array im_data channel by channel (this is what dst_index = i + w*j + w*h*k in the C loop produces), so it looks like this:
[ r1 r2 ... rN g1 g2 ... gN b1 b2 ... bN ]
where N is the number of pixels.
So, as a first step, I multiply each value by 255.0 to get pixel values in the range 0-255:
import numpy as np
im_data = (im_data*255.0).astype(np.uint8)
and rather than reshaping the array to (im_h, im_w, 3), I reshape it to (3, im_h, im_w):
im_data = im_data.reshape((3, im_h, im_w))
Then I transpose the resulting array to get the correct image shape, (im_h, im_w, 3):
im_data = np.transpose(im_data, (1, 2, 0))
Finally,
img = Image.fromarray(im_data, 'RGB')
img.show()
and boom:
(the image is one of the MOTChallenge benchmark dataset https://motchallenge.net/)
To be honest, I am not entirely sure how all of this works; I just experimented with array operations.
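One way to see why the (3, im_h, im_w) reshape followed by a transpose works is to replay the C indexing on a tiny array. The sketch below is only an illustration: the 2x3 image size and the arange test data are made up, and np.rint is an addition that guards against values such as 6.9999995 being truncated to 6.

```python
import numpy as np

h, w, c = 2, 3, 3
# Interleaved source, like stb_im: [r1 g1 b1 r2 g2 b2 ...]
stb_im = np.arange(h * w * c, dtype=np.uint8)

# Python replay of the C loop:
# c_data[i + w*j + w*h*k] = stb_im[k + c*i + c*w*j] / 255
c_data = np.empty(h * w * c, dtype=np.float32)
for k in range(c):
    for j in range(h):
        for i in range(w):
            c_data[i + w * j + w * h * k] = stb_im[k + c * i + c * w * j] / 255.0

# c_data is planar (all R, then all G, then all B), so reshape to (C, H, W)
planar = np.rint(c_data * 255.0).astype(np.uint8).reshape(c, h, w)
# Move the channel axis to the end to get the (H, W, C) layout PIL expects
restored = planar.transpose(1, 2, 0)
```

Here restored equals stb_im.reshape(h, w, c), which is the original interleaved image. Reshaping c_data directly to (h, w, 3) mixes values from different channels into one pixel, which would explain the repeated patterns.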

How can I replace a value from one array with a value in the same index of another array?

I have two 3D numpy arrays which represent two images. The shape of each array is (1080, 1920, 3). The number 3 represents the RGB value of each pixel in the image.
My goal is to replace every non-black pixel in the first array with the value of the corresponding pixel (at the same index) in the other array.
How can I do this using only numpy methods?
Use a mask with True/False values
# Pixel values may be normalized to 0..1 or stored as 0..255
first_img = np.random.rand(1080, 1920, 3)
second_img = np.random.rand(1080, 1920, 3)
eps = 0.01 # Black pixel threshold
mask = first_img.sum(axis=2) > eps # True where the pixel is not black
for i in range(first_img.shape[2]):
    # keep black pixels from first_img, take non-black positions from second_img
    first_img[:, :, i] = (first_img[:, :, i] * (1 - mask)) + (mask * second_img[:, :, i])
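The per-channel loop can also be avoided entirely. The following sketch shows two loop-free variants, np.where and boolean indexing, that replace the non-black pixels of the first image as the question asks. The small random arrays, the single black pixel, and the eps threshold are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
first_img = rng.random((4, 6, 3))
second_img = rng.random((4, 6, 3))
first_img[0, 0] = 0.0                      # make one pixel black for the demo

eps = 0.01                                 # black-pixel threshold (an assumption)
non_black = first_img.sum(axis=2) > eps    # shape (H, W), True where the pixel is not black

# Option 1: build a new array; the mask gets a trailing axis so it
# broadcasts over the channel dimension.
result = np.where(non_black[..., None], second_img, first_img)

# Option 2: modify first_img in place with boolean indexing.
first_img[non_black] = second_img[non_black]
```

Both options leave the black pixel at (0, 0) unchanged and copy every other pixel from second_img.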

How to add item to each tuple element of numpy array?

I have a numpy array A of shape (512, 512, 4)
Each element is a tuple: (r, g, b, a). It represents a 512x512 RGBA image.
I have a numpy array B of shape (512, 512, 3)
Each element is a tuple: (r, g, b). It represents a similar, RGB image.
I want to fast copy all the 'a' (alpha) values from each element of A into corresponding elements in B. (basically transferring the alpha channel).
resulting B shape would be (512, 512, 4).
How can I achieve this? The algorithm is based on fast pixel manipulation technique laid out here.
Code:
## . input_image is loaded using PIL/pillow
rgb_image = input_image
print(f"Image: {rgb_image}")
rgb_image_array = np.asarray(rgb_image) # convert to numpy array
print(f"Image Array Shape: {rgb_image_array.shape}")
gray_image = rgb_image.convert("L") # convert to grayscale
print(f"Gray image: {gray_image}")
gray_image_array = np.asarray(gray_image)
print(f"Gray image shape: {gray_image_array.shape}")
out_image_array = np.zeros(rgb_image_array.shape, rgb_image_array.dtype)
print(f"Gray image array shape: {out_image_array.shape}")
rows, cols, items = out_image_array.shape
# create lookup table for each gray value to new rgb value
LUT = []
for i in range(256):
    color = gray_to_rgb(i / 256.0, positions, colors)
    LUT.append(color)
LUT = np.array(LUT, dtype=np.uint8)
print(f"LUT shape: {LUT.shape}")
# get final output that uses lookup table technique.
# notice that at this point, we don't have the alpha channel
out_image_array = LUT[gray_image_array]
print(f"output image shape: {out_image_array.shape}")
# How do I get the alpha channel back from rgb_image_array into out_image_array
Output:
Image: <PIL.Image.Image image mode=RGBA size=512x512 at 0x7FDEF5F2F438>
Image Array Shape: (512, 512, 4)
Gray image: <PIL.Image.Image image mode=L size=512x512 at 0x7FDEF5C25CF8>
Gray image shape: (512, 512)
Gray image array shape: (512, 512, 4)
LUT shape: (256, 3)
output image shape: (512, 512, 3)
Using numpy slices:
import numpy as np
A = [[(1,1,1,4)], [(1,1,1,5)]]
B = [[(2,2,2)], [(3,3,3)]]
# A and B are tensors of order 3
A = np.array(A)
B = np.array(B)
print("A=")
print(A)
print("B=")
print(B)
C = np.copy(A)
# assign along all 1st and 2nd dimensions, but only the first three elements of the third dimension
C[:,:,0:3] = B
print("C=")
print(C)
Output:
A=
[[[1 1 1 4]]
[[1 1 1 5]]]
B=
[[[2 2 2]]
[[3 3 3]]]
C=
[[[2 2 2 4]]
[[3 3 3 5]]]
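If you would rather build the (H, W, 4) result from B than copy A first, concatenating along the channel axis is another option. This is a sketch with small random stand-ins for the two arrays, not the poster's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(4, 4, 4), dtype=np.uint8)   # stand-in for the RGBA array
B = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)   # stand-in for the LUT output

# A[..., 3:] keeps the channel axis, so its shape is (4, 4, 1)
# and it concatenates cleanly with the (4, 4, 3) array.
rgba = np.concatenate((B, A[..., 3:]), axis=2)

# np.dstack does the same and also accepts the 2-D slice A[..., 3].
rgba_dstack = np.dstack((B, A[..., 3]))
```

In the question's code this would correspond to appending rgb_image_array[..., 3:] to out_image_array.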
Let's be careful about terminology
I have a numpy array A of shape (512, 512, 4) Each element is a tuple: (r, g, b, a). It represents a 512x512 RGBA image.
If A has that shape, and has a numeric dtype (e.g. np.int32), then it has 512*512*4 elements. The only way it can have a tuple element is if the dtype was object. I suspect rather that you have a 512x512 image where each pixel is represented by 4 values.
A[0,0,:]
will be a (4,) shape array representing those 4 values (sometimes called channels) of one pixel.
A[:,:,0]
is the r value for the whole image.
If they really are 3d arrays, then #mocav's solution of copying columns (indexing on the last dimension) to a new array is the right one.
Another possibility is that they are structured 2d arrays with 4 and 3 fields respectively. That would print (str) as tuples, though the repr print will make the compound dtype explicit. But the solution will be similar - make a new array of the right shape and dtype (like A), and copy values by field name from B and A. (I'll wait with details until you clarify the situation).
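For the structured-array case, a minimal sketch might look like the following. The field names r, g, b, a and the uint8 field types are assumptions, since the question does not show the actual dtype.

```python
import numpy as np

# Hypothetical compound dtypes; the real field names may differ.
rgba_dtype = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')])
rgb_dtype = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1')])

A = np.zeros((2, 2), dtype=rgba_dtype)     # 2-D array, each element has 4 fields
A['a'] = [[10, 20], [30, 40]]
B = np.zeros((2, 2), dtype=rgb_dtype)      # 2-D array, each element has 3 fields
B['r'], B['g'], B['b'] = 1, 2, 3

# New array with A's shape and dtype; copy color fields from B, alpha from A.
C = np.empty(A.shape, dtype=A.dtype)
for name in B.dtype.names:
    C[name] = B[name]
C['a'] = A['a']
```

Printing C shows each element as a tuple such as (1, 2, 3, 10), while C.shape stays (2, 2).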

Channel mix with Pillow

I would like to do some color transformations, for example given RGB channels
R = G + B / 2
or some other transformation where a channel value is calculated based on the values of other channels of the same pixel.
It seems that .point() function can only operate on one channel. Is there a way to do what I want?
An alternative to using PIL.ImageChops is to convert the image data to a Numpy array. Numpy uses native machine data types and its compiled routines can process array data very quickly compared to doing Python loops on Python numeric objects. So the speed of Numpy code is comparable to the speed of using ImageChops. And you can do all sorts of mathematical operations in Numpy, or using related libraries, like SciPy.
Numpy provides a function np.asarray which can create a Numpy array from PIL data. And PIL.Image has a .fromarray method to load image data from a Numpy array.
Here's a script that shows two different Numpy approaches, as well as an approach based on kennytm's ImageChops code.
#!/usr/bin/env python3
''' PIL Image channel manipulation demo
Replace each RGB channel by the mean of the other 2 channels, i.e.,
R_new = (G_old + B_old) / 2
G_new = (R_old + B_old) / 2
B_new = (R_old + G_old) / 2
This can be done using PIL's own ImageChops functions
or by converting the pixel data to a Numpy array and
using standard Numpy array arithmetic
Written by kennytm & PM 2Ring 2017.03.18
'''
from PIL import Image, ImageChops
import numpy as np
def comp_mean_pil(iname, oname):
    print('Loading', iname)
    img = Image.open(iname)
    #img.show()
    rgb = img.split()
    half = ImageChops.constant(rgb[0], 128)
    rh, gh, bh = [ImageChops.multiply(x, half) for x in rgb]
    rgb = [
        ImageChops.add(gh, bh),
        ImageChops.add(rh, bh),
        ImageChops.add(rh, gh),
    ]
    out_img = Image.merge(img.mode, rgb)
    out_img.show()
    out_img.save(oname)
    print('Saved to', oname)
# Do the arithmetic using 'uint8' arrays, so we must be
# careful that the data doesn't overflow
def comp_mean_npA(iname, oname):
    print('Loading', iname)
    img = Image.open(iname)
    in_data = np.asarray(img)
    # Halve all RGB values
    in_data = in_data // 2
    # Split image data into R, G, B channels
    r, g, b = np.split(in_data, 3, axis=2)
    # Create new channel data
    rgb = (g + b), (r + b), (r + g)
    # Merge channels
    out_data = np.concatenate(rgb, axis=2)
    out_img = Image.fromarray(out_data)
    out_img.show()
    out_img.save(oname)
    print('Saved to', oname)
# Do the arithmetic using 'uint16' arrays, so we don't need
# to worry about data overflow. We can use dtype='float'
# if we want to do more sophisticated operations
def comp_mean_npB(iname, oname):
    print('Loading', iname)
    img = Image.open(iname)
    in_data = np.asarray(img, dtype='uint16')
    # Split image data into R, G, B channels
    r, g, b = in_data.T
    # Transform channel data
    r, g, b = (g + b) // 2, (r + b) // 2, (r + g) // 2
    # Merge channels
    out_data = np.stack((r.T, g.T, b.T), axis=2).astype('uint8')
    out_img = Image.fromarray(out_data)
    out_img.show()
    out_img.save(oname)
    print('Saved to', oname)
# Test
iname = 'Glasses0.png'
oname = 'Glasses0_out.png'
comp_mean = comp_mean_npB
comp_mean(iname, oname)
input image
output image
FWIW, that output image was created using comp_mean_npB.
The calculated channel values produced by the 3 functions can differ from one another by 1, due to the differences in the way they perform the calculations, but of course such differences aren't readily visible. :)
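To make the off-by-one difference and the overflow concern concrete, here is a small sketch using two hand-picked channel values (they are illustrative, not taken from the image). It contrasts halving before adding, as in comp_mean_npA, with adding in a wider type and then halving, as in comp_mean_npB.

```python
import numpy as np

g = np.array([3, 255], dtype=np.uint8)
b = np.array([5, 255], dtype=np.uint8)

# comp_mean_npA style: halve first, then add.
# Stays within uint8, but each odd value loses its remainder.
halve_then_add = g // 2 + b // 2                      # [3, 254]

# comp_mean_npB style: widen to uint16, add, then halve.
add_then_halve = ((g.astype(np.uint16) + b.astype(np.uint16)) // 2).astype(np.uint8)  # [4, 255]

# Adding uint8 arrays directly wraps around modulo 256.
wrapped = g + b                                       # [8, 254]
```

The first two results differ by exactly 1 in both positions, and the last line shows why the uint8 version has to halve before adding.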
For this particular operation, the color transformation can be written as a matrix multiplication, so you could use the convert() method with a custom matrix (assuming no alpha channel):
# img must be in RGB mode (not RGBA):
transformed_img = img.convert('RGB', (
    0, 1, .5, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
))
Otherwise, you can split() the image into 3 or 4 images of each color band, apply whatever operation you like, and finally merge() those bands back to a single image. Again, the original image should be in RGB or RGBA mode.
(red, green, blue, *rest) = img.split()
half_blue = PIL.ImageChops.multiply(blue, PIL.ImageChops.constant(blue, 128))
new_red = PIL.ImageChops.add(green, half_blue)
transformed_img = PIL.Image.merge(img.mode, (new_red, green, blue, *rest))
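The same matrix idea also works on a NumPy array obtained with np.asarray(img). The sketch below applies R = G + B / 2 to a tiny hand-made (1, 2, 3) array. The pixel values are made up, and the rounding and clipping to 0..255 are choices made for this sketch; they are not claimed to match Pillow's exact arithmetic.

```python
import numpy as np

# Rows give the new R, G, B as weighted sums of the old R, G, B.
M = np.array([[0.0, 1.0, 0.5],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Tiny (1, 2, 3) stand-in for np.asarray(img) of an RGB image.
pixels = np.array([[[10, 100, 50], [0, 250, 200]]], dtype=np.uint8)

# Work in float so intermediate sums above 255 do not wrap around.
mixed = pixels.astype(np.float64) @ M.T
out = np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
```

The first pixel becomes (125, 100, 50), and the second is clipped to (255, 250, 200) because 250 + 100 exceeds the uint8 range. The result can be passed to Image.fromarray to get a PIL image back.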
