How to resize an image faster in Python

I have an image, i.e. an array of pixel values, let's say 5000x5000 (this is the typical size). Now I want to expand it by a factor of 2 to 10kx10k: the value of pixel (0,0) goes to (0,0), (0,1), (1,0) and (1,1) in the expanded image.
After that I rotate the expanded image using scipy.ndimage.rotate (I believe there is no faster way than this, given the size of my array).
Next I have to resize this 10kx10k array back to the original size, i.e. 5kx5k. To do this I take the average of the pixel values at (0,0), (0,1), (1,0) and (1,1) in the expanded image and put it in (0,0) of the new image.
However, it turns out that this whole thing is an expensive procedure and takes a lot of time, given the size of my array. Is there a faster way to do it?
I am using the following code to expand the original image:
# Assume the original image is already given as original_img
largeImg = np.zeros((10000, 10000), dtype=np.float32)
for j in range(5000):
    for k in range(5000):
        pixel_value = original_img[j][k]
        for x in range(2*k, 2*(k+1)):
            for y in range(2*j, 2*(j+1)):
                largeImg[y][x] = pixel_value
A similar method is used to reduce the image back to the original size after the rotation.

In numpy you can use repeat:
large_img = original_img.repeat(2, axis=1).repeat(2, axis=0)
and
final_img = 0.25 * rotated_img.reshape(5000,2,5000,2).sum(axis=(3,1))
Or use scipy.ndimage.zoom; it can give you smoother results than the numpy methods.
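Putting it together, here is a sketch of the full expand-rotate-reduce pipeline under the question's setup (a 5000x5000 float32 original_img, 2x expansion, a rotation, then 2x2 block averaging); the rotation angle is illustrative and scipy.ndimage.rotate is used for the rotation step:
import numpy as np
from scipy import ndimage

angle = 30.0                                                     # illustrative angle in degrees
large_img = original_img.repeat(2, axis=1).repeat(2, axis=0)     # 5000x5000 -> 10000x10000
rotated_img = ndimage.rotate(large_img, angle, reshape=False)    # keep the 10kx10k shape
h, w = original_img.shape
final_img = 0.25 * rotated_img.reshape(h, 2, w, 2).sum(axis=(3, 1))  # 2x2 block average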

There is a nice library that probably has all the functions you need for handling images, including rotate:
http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rotate
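A minimal usage sketch, assuming the same original_img as above (the angle is illustrative; preserve_range=True keeps the original value range instead of rescaling to [0, 1]):
from skimage.transform import rotate

rotated = rotate(original_img, angle=30, resize=False, preserve_range=True)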


Shifting an image with bilinear interpolation in pytorch

Suppose I have an input x of size [H, W], and also mu_x and mu_y (which may be fractional) representing the number of pixels to shift in the x and y directions. Is there any efficient way in PyTorch, without using C++, to shift the tensor x by mu_x and mu_y units with bilinear interpolation?
To be more precise, say mu_x = 5 and mu_y = 3; we want to shift the image so that it moves rightward 5 pixels and downward 3 pixels, with the pixels that fall outside the [H, W] boundary removed and the newly introduced pixels at the other end of the boundary set to 0. With fractional mu_x and mu_y, we need bilinear interpolation to estimate the resulting image.
Is it possible to implement this with pure PyTorch tensor operations, or do I need to use C++?
I believe you can achieve this by applying grid sampling to your original input, using a grid to guide the sampling process. If you take a coordinate grid of your image and sample with it, the resulting image is equal to the original image. However, you can apply a shift to this grid and therefore sample with the given shift. Grid sampling works with floating-point grids, of course, which means you can apply an arbitrary non-integer shift to your image and choose a sampling mode (bilinear is the default).
This can be implemented out of the box with F.grid_sample. Given an image tensor img, we first construct a pixel grid of that image using torch.meshgrid. Keep in mind the grid used by the sampler must be normalized to [-1, 1]: pixel x=0, y=0 should be mapped to (-1, -1), pixel x=w, y=h mapped to (1, 1), and the center pixel ends up at around (0, 0).
Use two torch.arange with a [0,1]-normalization followed by a remapping to [-1,1]:
>>> c,h,w = img.shape
>>> x, y = torch.arange(h)/(h-1), torch.arange(w)/(w-1)
>>> grid = torch.dstack(torch.meshgrid(x, y))*2-1
So the resulting grid has a shape of (h, w, 2), and the image produced by the sampling process will have the dimensions (c, h, w).
Since we are not working with batched elements, we need to unsqueeze singleton dimensions on both img and grid. Then we can apply F.grid_sample:
>>> sampled = F.grid_sample(img[None], grid[None])
Following this you can apply your arbitrary mu_x, mu_y shift, and even extend this easily to batches of images and shifts. The way you would define your sampling is by defining a shifted grid:
>>> x_s, y_s = (torch.arange(h)+mu_y)/(h-1), (torch.arange(w)+mu_x)/(w-1)
Where mu_x and mu_y are the values in pixels (floating point) by which the image is shifted on the horizontal and vertical axes respectively. To acquire the sampled image, apply F.grid_sample on a grid made up of x_s and y_s:
>>> grid_shifted = torch.dstack(torch.meshgrid(x_s, y_s))*2-1
>>> sampled = F.grid_sample(img[None], grid_shifted[None])
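For reference, here is a self-contained sketch that wraps the steps above into a single function. The name shift_bilinear is illustrative; it assumes img has shape (C, H, W), uses align_corners=True to match the index/(n-1) normalization above, orders the last grid dimension as (x, y) as F.grid_sample expects, and picks the sign so that positive mu_x and mu_y move the content rightward and downward as in the question (torch.meshgrid with indexing="ij" needs a reasonably recent PyTorch):
import torch
import torch.nn.functional as F

def shift_bilinear(img, mu_x, mu_y):
    # img: (C, H, W) float tensor; mu_x, mu_y: fractional shifts in pixels
    c, h, w = img.shape
    # Output pixel (i, j) samples input location (i - mu_y, j - mu_x),
    # which moves the content right by mu_x and down by mu_y.
    ys = (torch.arange(h, dtype=img.dtype) - mu_y) / (h - 1)
    xs = (torch.arange(w, dtype=img.dtype) - mu_x) / (w - 1)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((gx, gy), dim=-1) * 2 - 1          # (H, W, 2), normalized to [-1, 1]
    # padding_mode="zeros" fills samples that fall outside the image with 0.
    out = F.grid_sample(img[None], grid[None], mode="bilinear",
                        padding_mode="zeros", align_corners=True)
    return out[0]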

How to replace pixel value as a tensorflow operation?

I need to replace a pixel value in an image as an operation in the graph. Doing this beforehand is unfortunately not an option as it is part of an optimization process.
As a fix until I come up with a solution, I am simply using tf.py_func(), but since this operation has to be executed a lot, it is very slow and inefficient.
# numpy function to perturb a single pixel in an image
def perturb_image(pixel, img):
    # At each pixel's x,y position, assign its rgb value
    x_pos, y_pos, r, g, b = pixel
    rgb = [r, g, b]
    img[x_pos, y_pos] = rgb
    return img

# pixel is a 1D tensor like [x-dim, y-dim, R, G, B]
# image is a tensor with shape (x-dim, y-dim, 3)
img_perturbed = tf.py_func(perturb_image, [pixel, image], tf.uint8)
One way I thought of is using tf.add(perturbation, image), where both have the same dimensions and perturbation is all zeros except at the pixel location whose RGB values need to be changed to the values defined in pixel from the above code snippet. Unfortunately, I would need to rewrite a lot of code surrounding this operation, which I am trying to avoid.
Can you think of a solution to replace py_func with another tensorflow operation using the same inputs?
Any help is much appreciated.
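One possible direction, sketched here under the assumption that a recent TensorFlow with tf.tensor_scatter_nd_update is available; the name perturb_image_tf is illustrative and the inputs follow the shapes from the snippet above:
import tensorflow as tf

def perturb_image_tf(pixel, image):
    # pixel: 1-D tensor [x, y, r, g, b]; image: tensor of shape (x-dim, y-dim, 3)
    xy = tf.cast(pixel[:2], tf.int32)
    rgb = tf.cast(pixel[2:], image.dtype)
    # Overwrite the single (x, y) location with the new RGB value.
    return tf.tensor_scatter_nd_update(image, indices=xy[None, :], updates=rgb[None, :])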

How to average the non-zero pixels of two images with slight offset [python]

I have two greyscale images with a slight offset (~80% overlap) that I need to average into a single image. The images have padding around them, so the overlap is already accounted for within the image (i.e. the x and y start position of each image is different). The images are aligned at their current offset, similar to a panoramic image.
My current approach (see below) is to use nested for loops, compare the pixel intensities at each position, sum them, and divide by the non-zero count.
combined_image = np.empty(image1.shape)
for row in range(image1.shape[0]):
    for pixel in range(image2.shape[1]):
        temp_array = np.array((image1[row][pixel], image2[row][pixel]))
        combined_image[row][pixel] = np.sum(temp_array) / np.count_nonzero(temp_array)
I believe it works; however, it is rather slow, as these images are 1000 x 1000 pixels. I was wondering if there is a more efficient approach.
Usually, if you're using for loops with numpy, you're not taking advantage of its built-in functionality.
Use broadcast operations.
combined_image = (image1 + image2) / 2
It should be faster and definitely simpler.
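If the non-zero handling from the original loop matters (for instance in the padded regions where only one image contributes), a vectorized version of the same logic might look like this sketch, reusing the variable names from the question:
import numpy as np

stacked = np.stack((image1, image2)).astype(np.float64)
counts = np.count_nonzero(stacked, axis=0)
# Divide the per-pixel sum by the number of non-zero contributions; where both
# pixels are zero, keep the result at zero instead of dividing by zero.
combined_image = stacked.sum(axis=0) / np.maximum(counts, 1)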

How to reshape a 3D numpy array?

I have a list of numpy arrays which are actually input images to my CNN. However, the size of each image is not consistent, and my CNN only takes images of dimension 224x224. How do I reshape each of my images to the given dimension?
print(train_images[key].reshape(224, 224,3))
gives me an output
ValueError: total size of new array must be unchanged
I would be very grateful if anybody could help me with this.
A reshaped array must have the same number of values as the original. What you need is cropping the picture (if it is bigger than 224x224) and padding (if it is smaller than 224x224), or resizing in both cases.
Cropping is simply slicing with correct indexes:
def crop(np_img, size):
    v_start = round((np_img.shape[0] - size[0]) / 2)
    h_start = round((np_img.shape[1] - size[1]) / 2)
    return np_img[v_start:v_start+size[0], h_start:h_start+size[1], :]
Padding is slightly more complex; this will create a zeros array of the desired shape and plug the values of the image into it:
def pad_image(np_img, size):
    # size includes the channel dimension, e.g. (224, 224, 3)
    v_start = round((size[0] - np_img.shape[0]) / 2)
    h_start = round((size[1] - np_img.shape[1]) / 2)
    result = np.zeros(size)
    result[v_start:v_start+np_img.shape[0], h_start:h_start+np_img.shape[1], :] = np_img
    return result
You can also use np.pad function for it:
def pad_image(np_img, size):
    v_dif = size[0] - np_img.shape[0]
    h_dif = size[1] - np_img.shape[1]
    return np.lib.pad(np_img, ((v_dif, 0), (h_dif, 0), (0, 0)), 'constant', constant_values=(0))
You may notice the padding is a bit different in the two functions; I didn't want to overcomplicate the problem, so I only padded the top and left in the second function. I padded both sides in the first one since it was easier to calculate there.
And finally, for resizing you are better off using another library. You can use scipy.misc.imresize; it's pretty straightforward (note that imresize has been removed from recent SciPy releases, so Pillow's Image.resize or skimage.transform.resize are current alternatives). This should do it:
imresize(np_img, size)
Here are a few ways I know to achieve this:
Since you're using Python, you can use cv2.resize() to resize the image to 224x224. The problem here is going to be distortion.
Scale the image so that one side matches the required size (W=224 or H=224) and trim off whatever is extra. There is a loss of information here.
If you have the larger image and a bounding box, add some delta to the bounding box to maintain the aspect ratio and then resize down to the required size.
When you reshape a numpy array, the product of the dimensions must match. If not, it will throw a ValueError like the one you got. There is no way to solve your problem using reshape, AFAIK.
The standard way is to resize the image so that the smaller side equals 224 and then crop it to 224x224. Resizing the image directly to 224x224 may distort it and can lead to erroneous training; for example, a circle might become an ellipse if the image is not square. It is important to maintain the original aspect ratio.
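A short sketch of that resize-then-center-crop approach, assuming cv2 is available (the helper name is illustrative):
import cv2

def resize_and_center_crop(img, out_size=224):
    h, w = img.shape[:2]
    scale = out_size / min(h, w)                     # make the shorter side equal to out_size
    resized = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    rh, rw = resized.shape[:2]
    top, left = (rh - out_size) // 2, (rw - out_size) // 2
    return resized[top:top + out_size, left:left + out_size]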

Python - matplotlib - imshow - How to influence displayed value of unzoomed image

I need to search for outliers in more or less homogeneous images representing some physical array. The images have a resolution much higher than the screen resolution, so every pixel on screen originates from a block of image pixels. Is there a way to customize the algorithm that calculates the displayed value for such a block? In particular, being able to use either the lowest or the highest value of a block would be helpful.
Thanks in advance
Scipy provides several such filters. To get a new image (new) whose pixels are the maximum/minimum over a w*w block of an original image (img), you can use:
new = scipy.ndimage.filters.maximum_filter(img, w)
new = scipy.ndimage.filters.minimum_filter(img, w)
scipy.ndimage.filters has several other filters available.
If the standard filters don't fit your requirements, you can roll your own. To get you started here is an example that shows how to get the minimum in each block in the image. This function reduces the size of the full image (img) by a factor of w in each direction. It returns a smaller image (new) in which each pixel is the minimum pixel in a w*w block of pixels from the original image. The function assumes the image is in a numpy array:
import numpy as np

def condense(img, w):
    # Each output pixel is the minimum over a w*w block of the input image.
    new = np.zeros((img.shape[0]//w, img.shape[1]//w))
    for i in range(0, img.shape[1]//w):
        col1 = i * w
        new[:, i] = img[:, col1:col1+w].reshape(-1, w*w).min(1)
    return new
If you wanted the maximum, replace min with max.
For the condense function to work well, the size of the full image must be a multiple of w in each direction. The handling of non-square blocks or images that don't divide exactly is left as an exercise for the reader.
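As an aside, skimage.measure.block_reduce implements the same block-wise reduction and may save rolling your own (w is the block size as above; blocks that run past the edge are padded, so sizes match exactly only when the image divides evenly):
import numpy as np
from skimage.measure import block_reduce

new_min = block_reduce(img, block_size=(w, w), func=np.min)   # block-wise minimum
new_max = block_reduce(img, block_size=(w, w), func=np.max)   # block-wise maximum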
