I have an image and I want to split it into multiple images using vertical and horizontal strides like a sliding window and the resultant images will all be of same resolution. How can I do that efficiently in Python? I have done this much:
from PIL import Image

def sliding_window(image, stride, imgSize):
    width, height = image.size
    img = []
    for y in range(0, height-imgSize, stride):
        for x in range(0, width-imgSize, stride):
            # Setting the points for cropped image
            left = x
            top = y
            right = x + imgSize
            bottom = y + imgSize
            im1 = image.crop((left, top, right, bottom))
            img.append(im1)
    return img

file = "/home/xxxxxx/yyyyyy.png"
im = Image.open(file)
img = sliding_window(im, 1, 838) # Strides of 1 takes too much time
but this code requires too much RAM and is too time consuming. Please help.
Example :
Sample code : img = sliding_window(im, 200, 300)
The following image is of 800*800 size.
Output :
As you correctly surmised, there is a way to do this with windows that view the original data without copying it. The simplest way is probably to use the relatively new sliding_window_view function:
from numpy.lib.stride_tricks import sliding_window_view
window = sliding_window_view(image, (838, 838), axis=(0, 1))
You don't need an explicit axis for 2D images, but it doesn't hurt and saves you some trouble in the 3D case. If you wanted to adjust the strides, you can just subset the result. For example, for a stride of (3, 4):
window = window[::3, ::4]
Since the window axes must (should) come last in C order, 3D images will have the channels moved to the middle axis. To access the correct shape, you can use something like np.moveaxis or transpose:
np.moveaxis(window[80, 70], 0, -1)
OR
window[80, 70].transpose(1, 2, 0).shape
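Putting the pieces above together, here is a minimal sketch on a tiny synthetic RGB array. It assumes NumPy 1.20 or newer (where sliding_window_view was added); the helper name sliding_windows and the 8x8 test image are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_windows(arr, size, stride):
    """Return strided (size x size) views over the first two axes of arr."""
    # Shape is (rows, cols, channels, size, size) for an H x W x C input.
    windows = sliding_window_view(arr, (size, size), axis=(0, 1))
    # Subsetting the view applies the stride without copying any pixel data.
    return windows[::stride, ::stride]

# Hypothetical 8x8 RGB image standing in for the real one.
image = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
windows = sliding_windows(image, size=3, stride=2)

# Window (1, 2) starts at row 1*2 and column 2*2; move channels back to the end.
patch = np.moveaxis(windows[1, 2], 0, -1)
```

For a PIL image, np.asarray(im) gives the array to pass in. Because the result is a view, nothing is copied until a particular window is actually used.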
I have the following two tensors:
img is a RGB image of shape (224,224,3)
uvs is a tensor with the same spatial size, e.g. (224, 224, 2), that maps to coordinates (x,y). In other words, it provides (x,y) coordinates for every pixel of the input image.
I now want to create a new output image tensor that contains, at index (x,y), the value of the input image. So the output should be an image as well, with the pixels rearranged according to the mapping tensor.
Small toy example:
img = [[c1,c2], [c3, c4]] where c is a RGB color [r, g, b]
uvs = [[[0,0], [1,1]],[[0,1], [1,0]]]
out = [[c1, c3], [c4, c2]]
How would one achieve such a thing in pytorch in a fast vectorized manner?
Try with:
out = img[idx[...,0], idx[...,1]]
I was able to solve it (with the help of Quang Hoang's answer):
out[idx[...,0], idx[...,1]] = img
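To see how the two indexing forms above differ on the toy example from the question, here is a small sketch. It uses NumPy and made-up stand-in colours so that it runs without PyTorch; the same two indexing expressions work on PyTorch tensors when the index tensor has an integer (long) dtype.

```python
import numpy as np

# Hypothetical stand-in colours for c1..c4 (one RGB triple each).
c1, c2, c3, c4 = [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]
img = np.array([[c1, c2], [c3, c4]])
uvs = np.array([[[0, 0], [1, 1]], [[0, 1], [1, 0]]])

# Scatter: the pixel at (i, j) is written to the position stored in uvs[i, j].
scattered = np.zeros_like(img)
scattered[uvs[..., 0], uvs[..., 1]] = img

# Gather: position (i, j) reads the pixel found at the position stored in uvs[i, j].
gathered = img[uvs[..., 0], uvs[..., 1]]
```

Only the scatter form reproduces the expected out = [[c1, c3], [c4, c2]]. With the scatter form, any output position that no entry of uvs points to keeps its initial value (zero here).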
What you need is torch.nn.functional.grid_sample(). You can do something like this:
import torch
width, height, channels = (224, 224, 3)
# Note that the image is channel-first (CHW format). In this example, I'm using a float image, so the values must be in the range [0, 1].
img = torch.rand((channels, height, width))
# Create the indices of shape (height, width, 2), here (224, 224, 2). Any other size would work too.
col_indices = torch.arange(width, dtype=torch.float32)
row_indices = torch.arange(height, dtype=torch.float32)
# indexing='xy' (PyTorch 1.10+) gives grids of shape (height, width) with x (column) first
# and y (row) second, which is the order grid_sample() expects.
uvs = torch.stack(torch.meshgrid(col_indices, row_indices, indexing='xy'), dim=-1)
# Transform the indices from pixel coordinates to the range [-1, 1] such that:
# * top-left corner of the input = (-1, -1)
# * bottom-right corner of the input = (1, 1)
# This is required for grid_sample() to work properly. With align_corners=True,
# -1 and 1 refer to the centers of the corner pixels, hence the division by (size - 1).
uvs[..., 0] = (uvs[..., 0] / (width - 1)) * 2 - 1
uvs[..., 1] = (uvs[..., 1] / (height - 1)) * 2 - 1
# Do the "mapping" operation (this does a bilinear interpolation) using `uvs` coordinates.
# Note that grid_sample() requires a batch dimension, so need to use `unsqueeze()`, then
# get rid of it using squeeze().
mapped = torch.nn.functional.grid_sample(
    img.unsqueeze(0),
    uvs.unsqueeze(0),
    mode='bilinear',
    align_corners=True,
)
# The final image is in HWC format.
result = mapped.squeeze(0).permute(1, 2, 0)
Side note: I found your question while searching for a solution to a related problem I'd had for a while. While I was writing an answer to your question, I realized what bug was causing the problem I was facing. By helping you I effectively helped myself, so I hope this helps you! :)
I'm trying to do some basic image processing. Here is my algorithm:
Take the RGB values of pixels n, n+1 and n+2 in a row and create a new image from these values.
I take the first pixel's red value, the second pixel's green value and the third pixel's blue value and create one pixel from them. This operation continues for every row in the image.
Here is my example code in python :
import glob
import ntpath
import numpy
from PIL import Image

images = glob.glob('balls/*.png')
data_compressed = numpy.zeros((540, 2560, 3), dtype=numpy.uint8)
for image_file in images:
    print(f'Processing [{image_file}]')
    image = Image.open(image_file)
    # numpy.asarray converts the PIL image to an array
    data = numpy.asarray(image)
    for i in range(0, 2559):
        for j in range(0, 539):
            pix_x = j * 3 + 1
            red = data[pix_x - 1, i][0]
            green = data[pix_x, i][1]
            blue = data[pix_x + 1, i][2]
            data_compressed[j, i] = [red, green, blue]
    im = Image.fromarray(data_compressed)
    image_name = ntpath.basename(image_file)
    im.save(f'export/{image_name}')
My input and output images are in RGB format. My code takes 5 seconds per image. I'm open to any ideas for optimizing this task. I can use C++ or any other language if necessary.
data_compressed = np.concatenate((
np.expand_dims(data[0:-2][:,:,0], axis=2),
np.expand_dims(data[1:-1][:,:,1], axis=2),
np.expand_dims(data[2:][:,:,2], axis=2)), axis=2)
Image1 : Original image
Image2: Original image shifted by one pixel
Image3: Original image shifted by two pixel
Take channel 0 of Image1, channel 1 of Image2 and channel 2 of Image3, and concatenate them.
Sample
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
image = Image.open("Lenna.png")
data = np.asarray(image)
data_compressed = np.concatenate((
np.expand_dims(data[0:-2][:,:,0], axis=2),
np.expand_dims(data[1:-1][:,:,1], axis=2),
np.expand_dims(data[2:][:,:,2], axis=2)), axis=2)
new_image = Image.fromarray(data_compressed)
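If you want a stride of 3 pixels, so that each output pixel is computed from its own group of three input pixels, you can use numpy slicing. The stride goes along the same axis as the shifts above (axis 0):
new_image = Image.fromarray(data_compressed[::3])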
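For comparison, here is a sketch that reproduces the loop from the question directly. It assumes, as in that loop, that the groups of three pixels run along axis 0 of the array, so output row j is built from input rows 3j, 3j+1 and 3j+2. The helper name compress_rows and the random test data are made up for illustration.

```python
import numpy as np

def compress_rows(data):
    """Build one output row from every group of three input rows.

    Red comes from the first row of the group, green from the second,
    blue from the third.
    """
    usable = (data.shape[0] // 3) * 3  # ignore leftover rows that do not fill a group
    return np.stack(
        (data[0:usable:3, :, 0], data[1:usable:3, :, 1], data[2:usable:3, :, 2]),
        axis=-1,
    )

# Hypothetical small input standing in for a real image array.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(9, 4, 3), dtype=np.uint8)
data_compressed = compress_rows(data)
```

The slices are views, so the only allocation is the stacked output array, and the per-pixel Python loop disappears entirely.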
Original Image:
Transformed Image with 3 stride:
Well, if you are only looking for a speed-up, you should take a look at the Cython module. It lets you specify the types of your variables and then compiles the script to C code. This can often lead to great improvements in running time.
With plain python there's only so much you can do. Here is a small optimization which can help a bit since it will allocate less memory. Otherwise I would look at Cython/Numba as said previously or using other languages.
data_compressed[j, i, 0] = data[pix_x - 1, i][0]
data_compressed[j, i, 1] = data[pix_x, i][1]
data_compressed[j, i, 2] = data[pix_x + 1, i][2]
I want to perform image translation by a certain amount (shift the image vertically and horizontally).
The problem is that when I paste the cropped image back on the canvas, I just get back a white blank box.
Can anyone spot the issue here?
Many thanks
img_shape = image.shape
# translate image
# percentage of the dimension of the image to translate
translate_factor_x = random.uniform(*translate)
translate_factor_y = random.uniform(*translate)
# initialize a black image the same size as the image
canvas = np.zeros(img_shape)
# get the top-left corner coordinates of the shifted image
corner_x = int(translate_factor_x*img_shape[1])
corner_y = int(translate_factor_y*img_shape[0])
# determine which part of the image will be pasted
mask = image[max(-corner_y, 0):min(img_shape[0], -corner_y + img_shape[0]),
max(-corner_x, 0):min(img_shape[1], -corner_x + img_shape[1]),
:]
# determine which part of the canvas the image will be pasted on
target_coords = [max(0,corner_y),
max(corner_x,0),
min(img_shape[0], corner_y + img_shape[0]),
min(img_shape[1],corner_x + img_shape[1])]
# paste image on selected part of the canvas
canvas[target_coords[0]:target_coords[2], target_coords[1]:target_coords[3],:] = mask
transformed_img = canvas
plt.imshow(transformed_img)
This is what I get:
For image translation, you can make use of the somewhat obscure numpy.roll function. In this example I'm going to use a white canvas so it is easier to visualize.
image = np.full_like(original_image, 255)
height, width = image.shape[:-1]
shift = 100
# shift image
rolled = np.roll(image, shift, axis=[0, 1])
# black out shifted parts
rolled = cv2.rectangle(rolled, (0, 0), (width, shift), 0, -1)
rolled = cv2.rectangle(rolled, (0, 0), (shift, height), 0, -1)
If you want to flip the image so the black part is on the other side, you can use both np.fliplr and np.flipud.
Result:
Here is a simple solution that translates an image by tx and ty pixels using only array indexing, that does not roll over, and handles negative values as well:
tx, ty = 8, 5 # translation on x and y axis, in pixels
N, M = image.shape # N rows (height), M columns (width); assumes a 2D (grayscale) image
image_translated = np.zeros_like(image)
# rows are shifted by ty and columns by tx
image_translated[max(ty,0):N+min(ty,0), max(tx,0):M+min(tx,0)] = image[-min(ty,0):N-max(ty,0), -min(tx,0):M-max(tx,0)]
Example:
(Note that for simplicity it does not handle cases where tx > M or ty > N).
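The same indexing idea can be wrapped in a small function. This is a sketch under a few assumptions: tx shifts along columns and ty along rows, exposed pixels are filled with 0, and a shift larger than the image simply yields an all-zero result. The function name translate is made up for illustration. Using np.zeros_like keeps the dtype of the input, which matters when the result is displayed, since a float canvas holding 0-255 values is not shown the same way as a uint8 one.

```python
import numpy as np

def translate(image, tx, ty):
    """Shift image by tx columns and ty rows, filling exposed pixels with 0."""
    height, width = image.shape[:2]  # works for 2D and H x W x C arrays
    out = np.zeros_like(image)       # same shape and dtype as the input
    if abs(tx) >= width or abs(ty) >= height:
        return out                   # shifted completely out of the frame
    out[max(ty, 0):height + min(ty, 0), max(tx, 0):width + min(tx, 0)] = \
        image[max(-ty, 0):height - max(ty, 0), max(-tx, 0):width - max(tx, 0)]
    return out

# Hypothetical 4x5 grayscale image: shift 2 pixels right and 1 pixel up.
image = np.arange(20, dtype=np.uint8).reshape(4, 5)
shifted = translate(image, tx=2, ty=-1)
```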
I want to change all the pixels in an image to a grey color (r = g = b = 128) if they fall within a certain threshold (if the value is between 50 and 150, change it). I imported the image, and when I try to process it I get the following error: IndexError: index 3474 is out of bounds for axis 0 with size 3474 (the image is 3474x4632).
Here's the code:
from PIL import Image
import numpy as np
import cv2

image = Image.open("texture.jpg")
w, h = image.size
print ("%d %d" % (w, h)) #to be sure what the width and height are
im = np.array(image)
for x in range(0, w):
    for y in range(0, h):
        if (im[x][y][0] <= 150 and im[x][y][0] >= 50):
            im[x][y][0] = 128
            im[x][y][1] = 128
            im[x][y][2] = 128
cv2.imwrite("image2.jpg", im)
And here's the image I'm trying to convert: https://ibb.co/hnjq4p (too large to upload here). Any ideas why it doesn't work?
NumPy reverses the axis order relative to PIL: the first index of the array is the row. image.size is (width, height), whereas im.shape is (height, width, channels), so you should take h, w = im.shape[:2] and index the array as im[y][x] instead. You can verify this by comparing image.size and im.shape.
That said, it will be much better if you do not loop. You can use masking and broadcasting to achieve the for loop task like this:
im[(im[...,0]<=150)&(im[...,0]>=50)] = 128 # will modify im in place
This will be much faster especially on large images like this.
Note that this only checks whether the first channel of the image is between 50 and 150. This is what your for loop does, so I guess it's what you want.
Please check im.shape: you should index your pixels as im[y,x] after converting to a numpy.array.
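As a small illustration of the masking approach, here is a sketch on a hypothetical 2x2 RGB array (the pixel values are made up). The array is indexed as im[row, column], i.e. im[y, x].

```python
import numpy as np

# Hypothetical 2x2 RGB image; shape is (height, width, channels).
im = np.array(
    [[[40, 10, 10], [100, 20, 30]],
     [[150, 0, 255], [200, 5, 5]]],
    dtype=np.uint8,
)

# Boolean mask of shape (height, width): True where the red channel is in [50, 150].
mask = (im[..., 0] >= 50) & (im[..., 0] <= 150)

# Assigning a scalar through the mask sets all three channels of the selected pixels.
im[mask] = 128
```

Here the pixels at (row 0, column 1) and (row 1, column 0) become grey and the other two are left untouched.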
I have 4 videos that I would like to combine into one single video and play all at once using Python. Each of the videos is placed at a set position (e.g. top, bottom, left, right), as in a hologram video like this. Is there any way to implement this? I've found some related sources similar to my problem, but I cannot manage to apply them.
Thank you in advance
You can try to merge all images together by copying them into one black frame. Here is an example with the same image in all 4 places:
import cv2
import numpy as np
#loads images and gets data
img = cv2.imread("img.png")
h,w,_ = img.shape
# creates the resulting image with double the size and 3 channels
output = np.zeros((h * 2, w * 2, 3), dtype="uint8")
# copies the image to the top left
output[0:h, 0:w] = img
# copies the image to the top right
output[0:h, w:w * 2] = img
# copies the image to the bottom right
output[h:h * 2, w:w * 2] = img
# copies the image to the bottom left
output[h:h * 2, 0:w] = img
You can always change the img to something different. Also you can concatenate them like this:
top = np.hstack((img, img))
bottom = np.hstack((img, img))
result = np.vstack((top, bottom))
And the result will be the same.
Here is a sample of the resulting image with this code:
However, your layout is a little different: it needs rotations, and it is not a plain concatenation, so the copying approach is the one to use. An example follows:
# creates a square output image of side w + 2h with 3 channels
output = np.zeros((w+h+h , w + h + h, 3), dtype="uint8")
# top img
output[0:h, h:h+w] = img
# left img (rotated 90°)
output[h:h+w, 0:h] = np.rot90(img,1)
# right img (rotated 270°)
output[h:h + w, h + w:h +w +h] = np.rot90(img,3)
# bottom img (rotated 180°)
output[h+w:h+w+h, h:h+w] = np.rot90(img,2)
and the result is like this:
If you use your image with the black background you will get more or less what you have there. You would need to play maybe with the copying parameters, but basically you do something like:
imgToCopyTo[y1:y2, x1:x2] = imgToCopyFrom
Here y1 and x1 are the top-left coordinates where you want to start the copy, and y2 and x2 are the bottom-right coordinates where the copy ends. Note that y2-y1 must equal the height of imgToCopyFrom and x2-x1 its width; otherwise NumPy raises a shape mismatch error.
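The cross layout above can be wrapped in a function that is applied to one frame from each video at a time. This is a sketch that assumes all four frames have the same shape (h, w, 3); the function name cross_layout and the dummy frames are made up for illustration.

```python
import numpy as np

def cross_layout(top, left, right, bottom):
    """Place four equally sized frames in a cross on a black square canvas."""
    h, w = top.shape[:2]
    side = w + 2 * h
    canvas = np.zeros((side, side, 3), dtype=top.dtype)
    canvas[0:h, h:h + w] = top                         # top, not rotated
    canvas[h:h + w, 0:h] = np.rot90(left, 1)           # left, rotated 90 degrees
    canvas[h:h + w, h + w:side] = np.rot90(right, 3)   # right, rotated 270 degrees
    canvas[h + w:side, h:h + w] = np.rot90(bottom, 2)  # bottom, rotated 180 degrees
    return canvas

# Hypothetical 2x3 frames, each filled with a distinct value so they can be told apart.
frames = [np.full((2, 3, 3), value, dtype=np.uint8) for value in (1, 2, 3, 4)]
combined = cross_layout(*frames)
```

For real videos, the frames could be read one at a time from four cv2.VideoCapture objects with read(), and each combined frame written out with a cv2.VideoWriter.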