I'm trying to calculate image histograms of a numpy array of images. The array of images has shape (n_images, width, height, colour_channels) and I want to return an array of shape (n_images, count_in_each_bin), i.e. 255 bins per image. This is done via two intermediate steps of averaging the colour channels for each image and then flattening each 2D image to a 1D one.
I think I have successfully done this with the code below; however, I have cheated a bit with the for loop at the end. My question is this: is there a way of getting rid of that last for loop and using an optimised numpy function instead?
import numpy as np

def histogram_helper(flattened_image: np.array) -> np.array:
    counts, _ = np.histogram(flattened_image, bins=[n for n in range(0, 256)])
    return counts
# Using 10 RGB images of width and height 300
images = np.zeros((10, 300, 300, 3))
# Take the mean of the three colour channels
channel_avg = np.mean(images, axis=3)
# Flatten each image in the array of images, resulting in a 1D representation of each image.
flat_images = channel_avg.reshape(*channel_avg.shape[:-2], -1)
# Now calculate the counts in each of the colour bins for each image in the array.
# This will provide us with a count of how many times each colour appears in an image.
result = np.empty((0, 255), dtype=np.int32)
for image in flat_images:
    colour_counts = histogram_helper(image)
    colour_counts = colour_counts.reshape(1, -1)
    result = np.concatenate([result, colour_counts])
You don't necessarily need to call np.histogram or np.bincount for this, since pixel values are in the range 0 to N. That means that you can treat them as indices and simply use a counter.
Here's how I would transform the initial images, which I imagine are of dtype np.uint8:
images = np.random.randint(0, 256, size=(10, 5, 5, 3), dtype=np.uint8)  # 10 5x5 images, 3 channels
reshaped = np.round(images.reshape(images.shape[0], -1, images.shape[-1]).mean(-1)).astype(images.dtype)
Now you can simply count the histograms using unbuffered addition with np.add.at:
result = np.zeros((images.shape[0], 256), int)
index = np.arange(len(images))[:, None]
np.add.at(result, (index, reshaped), 1)
The last operation is in-place and therefore returns None, but the answer will be in result nevertheless.
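If you want to double-check the counts, a slow per-image np.bincount loop should give the same values (a small sanity check, assuming the reshaped and result arrays from above):
# reference: one bincount per image, stacked into the same (n_images, 256) shape
expected = np.stack([np.bincount(row, minlength=256) for row in reshaped])
assert (expected == result).all()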
Related
I have a numpy array where each element has 3 values (RGB) from 0 to 255, and it spans from [0, 0, 0] to [255, 255, 255] with 256 elements evenly spaced. I want to plot it as a 16 by 16 grid but have no idea how to map the colors (as the numpy array) to the data to create the grid.
import numpy as np
# create an evenly spaced RGB representation as integers
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, dtype=int)
# convert the evenly spaced integers to RGB representation
rgb_colors = np.array(tuple(((((255<<16)&k)>>16), ((255<<8)&k)>>8, (255)&k) for k in all_colors_int))
# data to fit the rgb_colors as colors into a plot as a 16 by 16 numpy array
data = np.array(tuple((k,p) for k in range(16) for p in range(16)))
So, how do I map the rgb_colors as colors onto data to create a grid plot?
There's quite a bit going on here, and I think it's valuable to talk about it.
linspace
I suggest you read the linspace documentation.
https://numpy.org/doc/stable/reference/generated/numpy.linspace.html
If you want a 16x16 grid, then you should start by generating 16x16 = 256 values. However, if you inspect the shape of the all_colors_int array, you'll notice that it has only generated 50 values, which is the default value of the linspace num argument.
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, dtype=int)
print(all_colors_int.shape) # (50,)
Make sure you specify this third 'num' argument to generate the correct quantity of RGB pixels.
As a further side note, (255 << 16) + (255 << 8) + 255 is equivalent to (2^24)-1. The 2^N-1 formula is usually what's used to fill the first N bits of an integer with 1's.
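So for a 16x16 grid, the call could look like this (with num=256 and the equivalent 2**24 - 1 upper bound):
# 256 evenly spaced integers covering the full 24-bit colour range
all_colors_int = np.linspace(0, 2**24 - 1, num=256, dtype=int)
print(all_colors_int.shape)  # (256,)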
numpy is faster
On your next line, your for loop manually iterates over all of the elements in Python.
rgb_colors = np.array(tuple(((((255<<16)&k)>>16), ((255<<8)&k)>>8, (255)&k) for k in all_colors_int))
While this might work, this isn't considered the correct way to use numpy arrays.
You can directly perform bitwise operations to the entire numpy array without the python for loop. For example, to extract bits [16, 24) (which is usually the red channel in an RGB integer):
# Shift over so the 16th bit is now bit 0, then select only the first 8 bits.
RedChannel = (all_colors_int >> 16) & 255
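To see what the shift-and-mask does, here's the same idea applied to a single made-up packed value:
k = 0xAABBCC
print(hex((k >> 16) & 255))  # 0xaa -> bits [16, 24), the red channel
print(hex((k >> 8) & 255))   # 0xbb -> bits [8, 16)
print(hex(k & 255))          # 0xcc -> bits [0, 8)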
Building the grid
There are many ways to do this in numpy, however I would suggest this approach.
Images are usually represented with a 3-dimensional numpy array of the form
(HEIGHT, WIDTH, CHANNELS)
First, reshape your numpy int array into the 16x16 grid that you want.
reshaped = all_colors_int.reshape((16, 16))
Again, the numpy documentation is really great, give it a read:
https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
Now, extract the red, green and blue channels, as described above, from this reshaped array. If you operate directly on the numpy array, you won't need a nested for-loop to iterate over the 16x16 grid; numpy will handle this for you.
RedChannel = (reshaped >> 16) & 255
GreenChannel = ... # TODO
BlueChannel = ... # TODO
And then finally, we can convert our three 16x16 grids into a single 16x16x3 grid using the numpy stack function
https://numpy.org/doc/stable/reference/generated/numpy.stack.html
grid_rgb = np.stack((
RedChannel,
GreenChannel,
BlueChannel
), axis=2).astype(np.uint8)
Notice two things here:
When we 'stack' arrays, we create a new dimension. The axis=2 argument tells numpy to add this new dimension at index 2 (i.e. the third axis). Without this, the shape of our grid would be (3, 16, 16) instead of (16, 16, 3).
The .astype(np.uint8) casts all of the values in this numpy array into a uint8 data type. This is so the grid is compatible with other image manipulation libraries, such as OpenCV and PIL.
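A quick way to convince yourself of the axis behaviour, using three dummy 16x16 arrays:
a = b = c = np.zeros((16, 16))
print(np.stack((a, b, c)).shape)          # (3, 16, 16) - new axis at the front by default
print(np.stack((a, b, c), axis=2).shape)  # (16, 16, 3) - new axis last, image-like layout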
Show the image
We can use PIL for this.
If you want to use OpenCV, then remember that OpenCV interprets images as BGR not RGB and so your channels will be inverted.
# Show Image
from PIL import Image
Image.fromarray(grid_rgb).show()
If you've done everything right, you'll see an image... And it's all gray.
Why is it gray?
There are over 16 million possible colours. Selecting only 256 of them, evenly spaced, means the step between consecutive values is (2^24 - 1)/255 = 65793 = 0x010101, so every selected value has identical R, G and B components, which results in an image without any colour.
If you want to see some colours, you'll need to either show a bigger image (e.g. 256x256) or use a grid dimension that's not a power of two. For example, try a prime number such as 17, as this will add a small amount of pseudo-randomness to the RGB selection.
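You can see the effect in the step size itself (a quick check, using a 17x17 = 289 grid as the alternative):
step_256 = (2**24 - 1) // (256 - 1)   # 65793 == 0x010101 -> R == G == B, gray
step_289 = (2**24 - 1) / (17*17 - 1)  # ~58254.2, not a multiple of 0x010101 -> varied colours
print(hex(step_256), step_289)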
Best of luck.
Based solely on the title 'How to plot a normalized RGB map' rather than the approach you've provided, it appears that you'd like to plot a colour spectrum in RGB.
The following approach can be taken to manually construct this.
import cv2
import matplotlib.pyplot as plt
import numpy as np
h = np.repeat(np.arange(0, 180), 180).reshape(180, 180)
s = np.ones((180, 180))*255
v = np.ones((180, 180))*255
hsv = np.stack((h, s, v), axis=2).astype('uint8')
rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
plt.imshow(rgb)
Explanation:
It's generally easier to construct (and decompose) a colour palette using the HSV (hue, saturation, value) colour scale; where hue is the colour itself, saturation can be thought of as the intensity and value as the distance from black. Therefore, there's really only one value to worry about, hue. Saturation and value can be set to 255, for 'full intensity'.
cv2 is used here to simply convert the constructed HSV colourscale to RGB and matplotlib is used to plot the image. (I didn't use cv2 for plotting as it doesn't play nicely with Jupyter.)
The actual spectrum values are constructed in numpy.
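As a minimal illustration of the conversion, a single fully saturated pixel with hue 0 comes out as pure red:
pixel_hsv = np.uint8([[[0, 255, 255]]])            # one HSV pixel: hue=0, full saturation/value
print(cv2.cvtColor(pixel_hsv, cv2.COLOR_HSV2RGB))  # [[[255   0   0]]] -> red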
Breakdown:
Create the colour spectrum of hue and plug 255 in for the saturation and value. Why is 180 used? Because OpenCV stores hue for 8-bit images in the range [0, 179]: the usual 0-360 degree hue is halved so that it fits in a uint8.
h = np.repeat(np.arange(0, 180), 180).reshape(180, 180)
s = np.ones((180, 180))*255
v = np.ones((180, 180))*255
Stack the three channels H+S+V into a 3-dimensional array, convert the array values to unsigned 8-bit integers, and let cv2 convert from HSV to RGB for us, saving us from working out the math ourselves.
hsv = np.stack((h, s, v), axis=2).astype('uint8')
rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
Plot the RGB image.
plt.imshow(rgb)
When trying to join two images to create one:
img3 = imread('image_home.png')
img4 = imread('image_away.png')
result = np.hstack((img3,img4))
imwrite('Home_vs_Away.png', result)
This error sometimes appears:
all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 192 and the array at index 1 has size 191
How should I proceed to generate the combined image when there is this difference in array sizes and np.hstack does not work?
Note:
I use several images, so the larger image is not always the first and not always the second; it can be quite random which of the two is the smaller or larger one.
You can manually add a row/column with a color of your choice to match the shapes. Or you can simply let cv2.resize handle the resizing for you. In this code I show how to use both methods.
import numpy as np
import cv2
img1 = cv2.imread("image_home.png")
img2 = cv2.imread("image_away.png")
# Method 1 (add a column and a row to the smaller image)
# Note: this assumes img2 is exactly one row and one column smaller than img1.
padded_img = np.ones(img1.shape, dtype="uint8")
color = np.array(img2[-1, -1])  # take the border color
padded_img[:-1, :-1, :] = img2
padded_img[-1, :, :] = color
padded_img[:, -1, :] = color
# Method 2 (let OpenCV handle the resizing)
padded_img = cv2.resize(img2, img1.shape[:2][::-1])
result = np.hstack((img1, padded_img))
cv2.imwrite("Home_vs_Away.png", result)
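Since either image can be the smaller one, a third option is to pad whichever image is shorter before stacking (a sketch assuming both images have 3 channels; cv2.copyMakeBorder with BORDER_REPLICATE repeats the last row here, but any border mode works):
# Method 3 (pad the shorter image at the bottom so the heights match, then stack)
h1, h2 = img1.shape[0], img2.shape[0]
target_h = max(h1, h2)
pad1 = cv2.copyMakeBorder(img1, 0, target_h - h1, 0, 0, cv2.BORDER_REPLICATE)
pad2 = cv2.copyMakeBorder(img2, 0, target_h - h2, 0, 0, cv2.BORDER_REPLICATE)
result = np.hstack((pad1, pad2))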
I have the following two tensors:
img is an RGB image of shape (224, 224, 3)
uvs is a tensor with the same spatial size, e.g. (224, 224, 2), that maps to coordinates (x, y). In other words, it provides (x, y) coordinates for every pixel of the input image.
I now want to create a new output image tensor that contains, at index (x, y), the value of the input image. So the output should be an image as well, with the pixels rearranged according to the mapping tensor.
Small toy example:
img = [[c1, c2], [c3, c4]] where c is an RGB color [r, g, b]
uvs = [[[0,0], [1,1]],[[0,1], [1,0]]]
out = [[c1, c3], [c4, c2]]
How would one achieve such a thing in pytorch in a fast vectorized manner?
Try with:
out = img[idx[...,0], idx[...,1]]
I was able to solve it (with the help of Quang Hoang's answer):
out[idx[...,0], idx[...,1]] = img
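A minimal sketch of that on the toy example from the question (out has to be allocated first, and idx here is the uvs tensor):
import torch

img = torch.arange(4).reshape(2, 2, 1)  # stand-ins for c1..c4, with a single "channel"
uvs = torch.tensor([[[0, 0], [1, 1]], [[0, 1], [1, 0]]])
out = torch.zeros_like(img)
out[uvs[..., 0], uvs[..., 1]] = img     # scatter each pixel to its (x, y) target position
print(out[..., 0])                      # tensor([[0, 2], [3, 1]]), i.e. [[c1, c3], [c4, c2]]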
What you need is torch.nn.functional.grid_sample(). You can do something like this:
width, height, channels = (224, 224, 3)
# Note that the image is channel-first (CHW format). In this example, I'm using a float image, so the values must be in the range (0, 1).
img = torch.rand((channels, height, width))
# Create the indices of shape (224, 224, 2). Any other size would work too.
col_indices = torch.arange(width, dtype=torch.float32)
row_indices = torch.arange(height, dtype=torch.float32)
uvs = torch.stack(torch.meshgrid([col_indices, row_indices]), dim=-1)
# Transform the indices from pixel coordinates to the range [-1, 1] such that:
# * top-left corner of the input = (-1, -1)
# * bottom-right corner of the input = (1, 1)
# This is required for grid_sample() to work properly.
# With align_corners=True below, -1 and 1 map to the centres of the first and last pixels,
# so we normalise by (width - 1) and (height - 1).
uvs[..., 0] = (uvs[..., 0] / (width - 1)) * 2 - 1
uvs[..., 1] = (uvs[..., 1] / (height - 1)) * 2 - 1
# Do the "mapping" operation (this does a bilinear interpolation) using `uvs` coordinates.
# Note that grid_sample() requires a batch dimension, so we need to use `unsqueeze()`, then
# get rid of it using squeeze().
mapped = torch.nn.functional.grid_sample(
img.unsqueeze(0),
uvs.unsqueeze(0),
mode='bilinear',
align_corners=True,
)
# The final image is in HWC format.
result = mapped.squeeze(0).permute(1, 2, 0)
Side note: I found your question while searching for a solution to a related problem I'd had for a while. While I was writing an answer to your question, I realized which bug was causing the problem I was facing. By helping you I effectively helped myself, so I hope this helps you! :)
So my problem is this: I have an RGB image as a numpy array of dimensions (4086, 2048, 3). I split this image into 96x96 patches and get back the positions of these patches in a numpy array; I always get 96x96 patches in every case. If the image dimensions don't divide evenly into 96x96 patches along the x or y axis, I simply shift the last patch back so it overlaps a bit with the patch before it.
Now, with these positions in hand, I want to get rid of all 96x96 patches in which every pixel has the value 255 in all three RGB channels, in the fastest way possible, and I want to get back the positions of all the patches that are not like this.
I would like to know:
What is the fastest way to extract the 96x96 patch positions from the image dimensions? (for now I have a for loop)
How can I get rid of pure white patches (with value 255 on all 3 channels) in the most optimal way? (for now I have a for loop)
I have a lot of images to process like this, with resolutions going up to (39706, 94762, 3), so my for loops quickly become inefficient here. Thanks for your help! (I'll take solutions which make use of the GPU too.)
Here is the pseudo code to give you an idea of how it's done for now:
patches = []
patch_y = 0
y_limit = False
crop_size = 96
slide_width = 4086
slide_height = 2048
# Let's imagine this image_slide contains some 96x96 patches whose pixels are all 255
image_slide = np.random.rand(slide_width, slide_height, 3)
while patch_y < slide_height:
    patch_x = 0
    x_limit = False
    while patch_x < slide_width:
        # Check whether the patch at the given position is white,
        # i.e. all 3 RGB channels are 255 for every pixel
        is_white = PatchExtractor.is_white(patch_x, patch_y, image_slide)
        # Add the patch's position to the list if it's not white
        if not is_white:
            patches.append((patch_x, patch_y))
        if not x_limit and patch_x + crop_size > slide_width - crop_size:
            patch_x = slide_width - crop_size
            x_limit = True
        else:
            patch_x += crop_size
    if not y_limit and patch_y + crop_size > slide_height - crop_size:
        patch_y = slide_height - crop_size
        y_limit = True
    else:
        patch_y += crop_size
return patches
Ideally, I would like to get my patch positions without a for loop; then, once I have them, I can test whether they are white or not without a for loop as well, with as few calls to numpy as possible (so the work is processed in the C layer of numpy and doesn't go back and forth to Python).
As you suspected, you can vectorize all of what you're doing. It takes roughly a small integer multiple of the memory needed for your original image. The algorithm is quite straightforward: pad your image so that an integer number of patches fits in it, cut it up into patches, check whether each patch is all white, and keep the rest:
import numpy as np
# generate some dummy data and shapes
imsize = (1024, 2048)
patchsize = 96
image = np.random.randint(0, 256, size=imsize + (3,), dtype=np.uint8)
# seed some white patches: cut a square hole in the random noise
image[image.shape[0]//2:3*image.shape[0]//2, image.shape[1]//2:3*image.shape[1]//2] = 255
# pad the image to necessary size; memory imprint similar size as the input image
# white pad for simplicity for now
nx,ny = (np.ceil(dim/patchsize).astype(int) for dim in imsize) # number of patches
if imsize[0] % patchsize or imsize[1] % patchsize:
    # we need to pad along at least one dimension
    padded = np.pad(image, ((0, nx * patchsize - imsize[0]),
                            (0, ny * patchsize - imsize[1]), (0, 0)),
                    mode='constant', constant_values=255)
else:
    # no padding needed
    padded = image
# reshape padded image according to patches; doesn't copy memory
patched = padded.reshape(nx, patchsize, ny, patchsize, 3).transpose(0, 2, 1, 3, 4)
# patched is shape (nx, ny, patchsize, patchsize, 3)
# appending .copy() as a last step to the above will copy memory but might speed up
# the next step; time it to find out
# check for white patches; memory imprint the same size as the padded image
filt = ~(patched == 255).all((2, 3, 4))
# filt is a bool, one for each patch that tells us if it's _not_ all white
# (i.e. we want to keep it)
patch_x,patch_y = filt.nonzero() # patch indices of non-whites from 0 to nx-1, 0 to ny-1
patch_pixel_x = patch_x * patchsize # proper pixel indices of each pixel
patch_pixel_y = patch_y * patchsize
patches = np.array([patch_pixel_x, patch_pixel_y]).T
# shape (npatch, 2) which is compatible with a list of tuples
# if you want the actual patches as well:
patch_images = patched[filt, ...]
# shape (npatch, patchsize, patchsize, 3),
# patch_images[i,...] is an image with patchsize * patchsize pixels
As you can see, in the above I used white padding to get a congruent padded image. I believe this is in line with the philosophy of what you're trying to do. If you want to replicate what you're doing in the loop exactly, you can pad your image manually using the overlapping pixels that you'd take into account near the edge. You'd need to allocate a padded image of the right size, then manually slice the overlapping pixels of the original image in order to set the edge pixels in the padded result.
Since you mentioned that your images are huge and consequently padding leads to far too much memory use, you can avoid padding with some elbow grease. You can use slices of your huge image (which doesn't create a copy), but then you have to manually handle the edges where you don't have full slices. Here's how:
def get_patches(img, patchsize):
    """Compute patches on an input image without padding: assume "congruent" patches.
    Returns an array shaped (npatch, 2) of patch pixel positions."""
    mx, my = (val//patchsize for val in img.shape[:-1])
    patched = img[:mx*patchsize, :my*patchsize, :].reshape(mx, patchsize, my, patchsize, 3)
    filt = ~(patched == 255).all((1, 3, 4))
    patch_x, patch_y = filt.nonzero()  # patch indices of non-whites from 0 to mx-1, 0 to my-1
    patch_pixel_x = patch_x * patchsize  # proper pixel positions of each patch
    patch_pixel_y = patch_y * patchsize
    patches = np.stack([patch_pixel_x, patch_pixel_y], axis=-1)
    return patches
# fix the patches that fit inside the image
patches = get_patches(image, patchsize)
# fix edge patches if necessary
all_patches = [patches]
if imsize[0] % patchsize:
    # then we have edge patches along the first dim
    tmp_patches = get_patches(image[-patchsize:, ...], patchsize)
    # correct indices
    all_patches.append(tmp_patches + [imsize[0] - patchsize, 0])
if imsize[1] % patchsize:
    # same along second dim
    tmp_patches = get_patches(image[:, -patchsize:, :], patchsize)
    # correct indices
    all_patches.append(tmp_patches + [0, imsize[1] - patchsize])
if imsize[0] % patchsize and imsize[1] % patchsize:
    # then we have a corner patch we still have to fix
    tmp_patches = get_patches(image[-patchsize:, -patchsize:, :], patchsize)
    # correct indices
    all_patches.append(tmp_patches + [imsize[0] - patchsize, imsize[1] - patchsize])
# gather all the patches into an array of shape (npatch, 2)
patches = np.vstack(all_patches)
# if you also want to grab the actual patch values without looping:
xw, yw = np.mgrid[:patchsize, :patchsize]
patch_images = image[patches[:,0,None,None] + xw, patches[:,1,None,None] + yw, :]
# shape (npatch, patchsize, patchsize, 3),
# patch_images[i,...] is an image with patchsize * patchsize pixels
This will also exactly replicate your looping code, since we're explicitly taking the edge patches such that they overlap with the previous patches (there's no spurious white padding). If you want to have the patches in a given order you'll have to sort them now, though.
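If you do want them in row-major order, one option is np.lexsort (a small sketch on the patches array from above):
# sort primarily by the first coordinate, then by the second within ties
order = np.lexsort((patches[:, 1], patches[:, 0]))
patches = patches[order]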
Suppose I have an ndarray imgs of shape ( num_images, 3, width, height ) that stores a stack of num_images RGB images all of the same size.
I would like to slice/crop from each image a patch of shape (3, pw, ph), but the center location of the patch is different for each image and is given in a centers array of shape (num_images, 2).
Is there a nice/pythonic way of slicing imgs to get patches (of shape (num_images, 3, pw, ph)) such that each patch is centered around its corresponding entry in centers?
For simplicity, it is safe to assume all patches fall within the image boundaries.
Proper slicing is out of the question, because you need to access the underlying data at irregular intervals. You could get the crops with a single "fancy indexing" operation, but you'll need a (very) large indexing array. Therefore I think using a loop is easier and faster.
Compare the following two functions:
def fancy_indexing(imgs, centers, pw, ph):
    n = imgs.shape[0]
    img_i, RGB, x, y = np.ogrid[:n, :3, :pw, :ph]
    corners = centers - [pw//2, ph//2]
    x_i = x + corners[:, 0, None, None, None]
    y_i = y + corners[:, 1, None, None, None]
    return imgs[img_i, RGB, x_i, y_i]
def just_a_loop(imgs, centers, pw, ph):
    crops = np.empty(imgs.shape[:2] + (pw, ph), imgs.dtype)
    for i, (x, y) in enumerate(centers):
        crops[i] = imgs[i, :, x-pw//2:x+pw//2, y-ph//2:y+ph//2]
    return crops
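To compare them, something like the following works (a rough sketch with made-up sizes; an even patch size keeps the slice arithmetic in just_a_loop consistent with the preallocated shape):
import numpy as np
from timeit import timeit

imgs = np.random.randint(0, 256, (128, 3, 100, 100), dtype=np.uint8)
centers = np.random.randint(20, 80, (128, 2))  # keep every patch inside the image
pw = ph = 20

assert np.array_equal(fancy_indexing(imgs, centers, pw, ph),
                      just_a_loop(imgs, centers, pw, ph))
print(timeit(lambda: fancy_indexing(imgs, centers, pw, ph), number=100))
print(timeit(lambda: just_a_loop(imgs, centers, pw, ph), number=100))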