Fastest method of splitting image into tiles - python

I have an array of bytes which are the RGB values of an image. For example the first three bytes of the array will be the RGB value of the top left pixel. i.e. a[0] is R, a[1] is G and a[2] is B.
This image is actually a grid of images, typically arranged in 2x2 form. Here's an example.
I'm currently using PIL to split the image into 4 sub-images. This is the code I'm currently using.
def split_image_to_tiles(im, grid_width, grid_height):
    # This treats the image `im` as a grid_width x grid_height grid of images.
    w, h = im.size
    w_step = w // grid_width
    h_step = h // grid_height
    tiles = []
    for y in range(0, grid_height):
        for x in range(0, grid_width):
            x1 = x * w_step
            y1 = y * h_step
            x2 = x1 + w_step
            y2 = y1 + h_step
            t = im.crop((x1, y1, x2, y2))
            tiles.append(t)
    return tiles
This works, but it isn't particularly fast. Is there a better or faster way?

This may not answer your question since I'm using Numpy and OpenCV, but I just had a very similar problem, where I wanted to split a grayscale image into a 2D array of tiles/subimages. I ended up doing it with
height, width = image.shape
tiles = image.reshape((GRID_HEIGHT, height // GRID_HEIGHT,
                       GRID_WIDTH, width // GRID_WIDTH)).swapaxes(1, 2)
For a color image with more than one channel:
height, width = image.shape[:2]
tiles = image.reshape((GRID_HEIGHT, height // GRID_HEIGHT,
                       GRID_WIDTH, width // GRID_WIDTH, image.shape[2])).swapaxes(1, 2)
After that you could simply do tiles[y, x] to reference the tile at that index.
Especially compared to other image processing operations, this method is effectively instantaneous.
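Here is a minimal, self-contained sketch of the reshape/swapaxes approach above, wrapped in a hypothetical helper `split_into_tiles` (the name and the divisibility check are my additions). It assumes the image is already a NumPy array whose height and width are exact multiples of the grid size. A PIL image can be converted with `np.asarray(im)`.

```python
import numpy as np

def split_into_tiles(image, grid_height, grid_width):
    """Split an (H, W) or (H, W, C) array into a grid of tiles without copying.

    Returns an array indexed as tiles[y, x] (tile row, tile column).
    """
    height, width = image.shape[:2]
    if height % grid_height or width % grid_width:
        raise ValueError("image size must be a multiple of the grid size")
    tile_h, tile_w = height // grid_height, width // grid_width
    # (H, W, ...) -> (grid_h, tile_h, grid_w, tile_w, ...), then swap the middle
    # two axes so the grid indices come first: (grid_h, grid_w, tile_h, tile_w, ...)
    return image.reshape(
        (grid_height, tile_h, grid_width, tile_w) + image.shape[2:]
    ).swapaxes(1, 2)
```

Because `reshape` and `swapaxes` only rearrange strides, the result is a view: no pixel data is copied, which is why the call is so cheap. The flip side is that writing into a tile also writes into the original image, so call `.copy()` on a tile if you need it to be independent.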

In terms of complexity there is nothing to gain: the work stays O(N), where N is the number of tiles you want to get from the image.
With that in mind, you should run a profiler to find out where the time is really being spent. As you might guess, im.crop is where the CPU spends most of its time.
This is a typical CPU-bound problem. Short of optimizing the cropping itself, the only way to speed it up is to use as many processes as there are tiles. Why processes?
Because the work is not I/O-bound, the GIL matters: threads would not run the cropping in parallel, whereas each process gets its own Python interpreter and can use a CPU core without contention.
So my recommendation is to use the multiprocessing Python module.

Related

Generating a scatterplot from a greyscale intensity map

Using matplotlib (or anything else, if something better exists), I want to populate a scatterplot using a grayscale image as its distribution. I have found many resources for creating heat maps from images, but not the other way around.
The input image will be like this one.
I think I understand what you're going for, but I'm not certain. I also don't really understand what this would be used for so I'm extra uncertain about this answer, but here goes:
So by loading the image we can evaluate each pixel position and its intensity. We can use that intensity as a "fitness" value and probabilistically add it to our plot so that we can get some of that "density" of points that you want to see. I picked a really simple equation as a decider (I just cubed the value), but feel free to replace that with whatever you want.
import random

import cv2
import matplotlib.pyplot as plt

# select func: keeps a pixel with probability roughly (value / 255) ** 3
def selection(value):
    # int() avoids uint8 overflow when cubing a NumPy pixel value
    return int(value) ** 3 >= random.randint(0, 255 ** 3)

# populate the sample
def populate(img):
    # get res
    h, w = img.shape
    # go through and populate
    sx = []
    sy = []
    for y in range(h):
        for x in range(w):
            val = img[y, x]
            # use intensity to decide if it gets in
            # replace with what you want this function to look like
            if selection(val):
                sx.append(x)
                sy.append(h - y)  # opencv is top-left origin
    return sx, sy

# I'm using opencv to pull the image into code, use whatever you like
# matplotlib can also do something similar, but I'm not familiar with its format
img = cv2.imread("circ.png")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# let's take a sample
sx, sy = populate(img)

# find the bigger square size
h, w = img.shape
side = max(h, w)

# make a square graph
fig, ax = plt.subplots()
ax.scatter(sx, sy, s=4)
ax.set_xlim((0, side))
ax.set_ylim((0, side))
ax.set_aspect("equal")  # both limits are (0, side), so this is the old ratio of 1
fig.savefig("out.png", dpi=600)
plt.show()
Feel free to replace opencv with whatever image library you're comfortable with. I'm pretty sure matplotlib can open images as well, but openCV is what I'm most familiar with so I used that.
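For larger images, the per-pixel Python loop in `populate` will be the slow part. The sketch below applies the same cubic selection rule to the whole array at once with NumPy. `populate_vectorized` is a hypothetical name, and it uses `numpy.random.Generator` instead of the `random` module, so its random draws will not match the loop version.

```python
import numpy as np

def populate_vectorized(img, rng=None):
    """Vectorized version of the per-pixel selection loop (sketch).

    Keeps pixel (x, y) when value**3 >= a uniform random integer in [0, 255**3],
    the same rule as selection(), evaluated for every pixel at once.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Promote to int64 first so cubing cannot overflow uint8
    cubed = np.asarray(img).astype(np.int64) ** 3
    # endpoint=True makes the upper bound inclusive, like random.randint
    draws = rng.integers(0, 255 ** 3, size=cubed.shape, endpoint=True)
    rows, cols = np.nonzero(cubed >= draws)
    h = cubed.shape[0]
    return cols, h - rows  # flip y because image rows start at the top
```

The returned arrays can be passed straight to `ax.scatter(sx, sy, s=4)` in the plotting code above.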
As far as I can tell, you're trying to generate random coordinates that follow a distribution described by a grayscale image: the brighter each point, the more likely that point's coordinates will be generated. Your problem can thus be solved by a rejection sampler, as follows.
Assume you know the width and height of the image in pixels, call them w and h.
1. Generate two random numbers: one in the interval [0, w) and one in [0, h). These are the x and y coordinates, respectively.
2. Get the pixel at the given coordinates x and y in the image. This can be done using interpolation, but describing interpolation techniques is beyond the scope of this answer. For this reason, we will use only the nearest pixel ("nearest neighbor") in the image: take the pixel at coordinate floor(x) and floor(y) (and step 1 devolves to generating random integers). Convert the pixel somehow to a number p in the interval [0, 1]; in this answer we will assume black is 0 and white is 1, to simplify matters.
3. With probability p, return the point (x, y). Otherwise, go to step 1.
Roughly speaking, the time complexity of this algorithm depends on the number of "bright points" the input image has, compared to the number of "dark points". In general, the "brighter" the image, the higher the acceptance rate (and the faster the algorithm runs).
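The three steps of this rejection sampler can be sketched in NumPy as follows. This is a minimal illustration, not a tuned implementation. `rejection_sample` is a hypothetical helper that assumes an 8-bit grayscale array (0 = black, 255 = white), uses nearest-neighbour lookup as described, and returns coordinates with the origin at the top-left corner of the image.

```python
import numpy as np

def rejection_sample(img, n, rng=None):
    """Draw n (x, y) points whose density follows a grayscale image (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Convert pixels to acceptance probabilities: black = 0, white = 1
    p = np.asarray(img, dtype=np.float64) / 255.0
    if p.max() <= 0:
        raise ValueError("an all-black image would never accept a point")
    h, w = p.shape
    points = []
    while len(points) < n:
        # Step 1: a uniform candidate in [0, w) x [0, h)
        x = rng.uniform(0, w)
        y = rng.uniform(0, h)
        # Step 2: nearest-neighbour lookup; min() guards against rounding up to w or h
        col = min(int(x), w - 1)
        row = min(int(y), h - 1)
        # Step 3: accept with probability p, otherwise go back to step 1
        if rng.random() < p[row, col]:
            points.append((x, y))
    return np.array(points)
```

For a scatter plot you would flip y (for example `h - pts[:, 1]`) as in the earlier answer. Drawing candidates in batches instead of one at a time would be much faster in Python, at the cost of a less direct mapping to the steps.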

Numpy point cloud to image

I have a point cloud which looks something like this:
The red dots are the points, the black dots are the red dots projected to the xy plane. Although it is not visible in the plot, each point also has a value, which is added to the given pixel when the point is moved to the xy plane. The points are represented by a numpy (np) array like so:
points=np.array([[x0,y0,z0,v0],[x1,y1,z1,v1],...[xn,yn,zn,vn]])
The obvious way to put these points into some image would be through a simple loop, like so:
image = np.zeros(img_size)
for point in points:
    # each point = [x, y, z, v]; x and y must be integers to index the image
    image[tuple(point[0:2].astype(int))] += point[3]
Now this works fine, but it is very slow. So I was wondering if there is some way of speeding it up using vectorization, slicing and other clever numpy/python tricks, since in reality I have to do this many times for large point clouds. I had come up with something using np.put:
def points_to_image(xs, ys, vs, img_size):
    img = np.zeros(img_size)
    coords = np.stack((ys, xs))
    # put the 2D coordinates into linear array coordinates
    abs_coords = np.ravel_multi_index(coords, img_size)
    np.put(img, abs_coords, vs)
    return img
(in this case the points are pre-split into vectors containing the x, y and v components). While this works fine, it of course only puts the last point to each given pixel, i.e. it is not additive.
Many thanks for your help!
Courtesy of @Paul Panzer:
def points_to_image(xs, ys, ps, img_size):
    coords = np.stack((ys, xs))
    abs_coords = np.ravel_multi_index(coords, img_size)
    # bincount sums all weights that share a linear index, so duplicates add up
    img = np.bincount(abs_coords, weights=ps, minlength=img_size[0]*img_size[1])
    img = img.reshape(img_size)
    return img
On my machine, the loop version takes 0.4432s vs 0.0368s using vectorization. So a neat 12x speedup.
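For comparison, NumPy's unbuffered `np.add.at` also makes the accumulation additive, which is exactly what `np.put` (and fancy-index `+=`) lacks: with those, a duplicate index is written only once. The sketch below uses hypothetical function names and checks that `np.add.at` agrees with the bincount approach on points that share pixels. `np.add.at` has historically been slower than `np.bincount` in many NumPy versions, so it is worth timing both on your data.

```python
import numpy as np

def points_to_image_add_at(xs, ys, ps, img_size):
    """Accumulate point values into an image with np.add.at (sketch)."""
    img = np.zeros(img_size)
    # Unbuffered: every (y, x) occurrence contributes, even repeated ones
    np.add.at(img, (ys, xs), ps)
    return img

def points_to_image_bincount(xs, ys, ps, img_size):
    """Same bincount approach as the answer, returning the image."""
    abs_coords = np.ravel_multi_index((ys, xs), img_size)
    img = np.bincount(abs_coords, weights=ps, minlength=img_size[0] * img_size[1])
    return img.reshape(img_size)
```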
============ EDIT ============
Quick update: using torch...
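import torch

def points_to_image_torch(xs, ys, ps, sensor_size=(180, 240)):
    # index_put_ needs integer (long) indices and values of the same dtype as img
    xt, yt = torch.from_numpy(xs).long(), torch.from_numpy(ys).long()
    pt = torch.from_numpy(ps)
    img = torch.zeros(sensor_size, dtype=pt.dtype)
    img.index_put_((yt, xt), pt, accumulate=True)
    return img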
I get all the way down to 0.00749 s. And that's still all happening on the CPU, so a 59x speedup over the Python loop. I also had a go at running it on the GPU, and it doesn't seem to make a difference in speed; I guess with accumulate=True it's probably using some sort of atomics on the GPU that slow it all down.

Optimize performance to resize with maximum pixel value

I want to reduce one dimension of an image by keeping only the maximum pixel value of each set of pixels. I implemented this in Python:
def pixel_max_resize(img, h, w):
    imr = np.zeros((h,w), dtype=np.uint8)
    r = int(h/w)
    for j in range(0,w):
        imr[:,j] = np.amax(img[:,j*r:j*r+r], axis = 1)
    return imr
This function is a lot slower than a cv2.resize to the same size (by a factor of 5-10). Does anyone have an idea how to speed up this function? Is there a list comprehension formulation that could speed up the process?
I'm not 100% sure what you are trying to achieve since your code throws an error if the target height is not equal to the source height. Anyway, here is a function that resizes an image based on the maximum value of each subsample area. It's about 3-5 times faster than your code.
def pixel_max_resize(img, h, w):
    source_h, source_w = img.shape
    return img.reshape(h,source_h // h,-1,source_w // w).swapaxes(1,2).reshape(h,w,-1).max(axis=2)
(Caveat: source width and height must be an integer multiple of target width and height respectively)
Explanation:
The source 2D image is partitioned into a 3D array so that the first and second axes have the size of the target height and width, and the third axis contains the values of all pixels to be subsampled for one target pixel. max() over this axis returns the maximum value for each subsample.
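As a variant of the same idea (a sketch; the name `pixel_max_resize_blocks` is hypothetical), the `swapaxes` and the second `reshape` can be skipped by taking the maximum over both within-block axes directly, since `ndarray.max` accepts a tuple of axes. This avoids reshaping a non-contiguous view, which forces NumPy to make an intermediate copy. The same divisibility caveat applies.

```python
import numpy as np

def pixel_max_resize_blocks(img, h, w):
    """Max-pool a 2D image down to (h, w) by block maxima (sketch)."""
    source_h, source_w = img.shape
    if source_h % h or source_w % w:
        raise ValueError("source size must be an integer multiple of the target size")
    # (H, W) -> (h, H//h, w, W//w): axes 1 and 3 index pixels inside one block
    blocks = img.reshape(h, source_h // h, w, source_w // w)
    return blocks.max(axis=(1, 3))
```

To reduce only the width, as in the question, keep the height unchanged: `pixel_max_resize_blocks(img, img.shape[0], new_w)`.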

Efficient way to map a function over numpy matrix?

I'm trying to reproduce an image algorithm in Python, but I can't match the performance of PIL.
For simplicity, take interpolation as an example. Suppose we have a matrix Im of luminance values. For any point (x, y), we can compute the interpolated value from its four nearest grid neighbours:
$$g(x,y) = \tfrac{1}{4}\left[f(\lfloor x\rfloor,\lfloor y\rfloor) + f(\lfloor x\rfloor+1,\lfloor y\rfloor) + f(\lfloor x\rfloor,\lfloor y\rfloor+1) + f(\lfloor x\rfloor+1,\lfloor y\rfloor+1)\right]$$
(The code below actually weights the four neighbours bilinearly by the fractional offsets a and b rather than averaging them equally.)
Here is part of the code. It takes tens of seconds to resize an image, which is inefficient. Also, it's not an element-wise mapping function: it involves the whole matrix, or more precisely, the neighbouring points of each point.
im = np.matrix(...)  # A 512*512 image
axis = [x/(2047/511.) for x in range(2048)]
axis = [(x,y) for x in axis for y in axis]  # resize to 2048*2048
im_temp = []
for (x, y) in axis:
    (l, k) = np.floor((x, y)).astype(int)
    a, b = x-l, y-k
    temp = (1-a)*(1-b)*im[l+1,k+1] + a*(1-b)*im[l+2,k+1] + (1-a)*b*im[l+1,k+2] + a*b*im[l+2,k+2]
    im_temp.append(temp)
im_resized = np.asmatrix(im_temp).reshape((2048,2048)).astype(int)
How can we implement this algorithm more efficiently, without the two for loops?
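A hedged sketch of how the loop could be vectorized with plain NumPy: compute the source coordinates for all output rows and columns once, then gather the four neighbours with broadcast fancy indexing and blend them with the bilinear weights used in the loop. `resize_bilinear` is a hypothetical helper. It uses the same end-point-aligned mapping as `x/(2047/511.)` and clamps at the border, but it does not reproduce the `+1`/`+2` index offsets in the question's code, whose purpose isn't clear from the snippet. It expects a 2D array of at least 2×2.

```python
import numpy as np

def resize_bilinear(im, out_h, out_w):
    """Bilinear resize of a 2D array with vectorized NumPy indexing (sketch)."""
    im = np.asarray(im, dtype=np.float64)
    in_h, in_w = im.shape
    # Source coordinates for every output row/column; first and last samples
    # land exactly on the first and last source pixels
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    # Top-left neighbour, clamped so that index + 1 stays inside the image
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 2)
    # Fractional offsets shaped for broadcasting: b varies by row, a by column
    b = (ys - y0)[:, None]
    a = (xs - x0)[None, :]
    r, c = y0[:, None], x0[None, :]
    # Gather the four neighbours for all output pixels at once
    tl, tr = im[r, c], im[r, c + 1]
    bl, br = im[r + 1, c], im[r + 1, c + 1]
    return (1 - a) * (1 - b) * tl + a * (1 - b) * tr + (1 - a) * b * bl + a * b * br
```

For the 512×512 case this would be called as `resize_bilinear(np.asarray(im), 2048, 2048)`. If SciPy is available, `scipy.ndimage.map_coordinates` with `order=1` performs the same kind of bilinear lookup for arbitrary coordinate arrays.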

Efficient processing of pixel + neighborhood in numpy image

I have a range image of a scene. I traverse the image and calculate the average change in depth under the detection window. The detection window changes size based on the average depth of the pixels surrounding the current location. I accumulate the average change to produce a simple response image.
Most of the time is spent in the for loop: it takes about 40+ s for a 512x52 image on my machine, and I was hoping for some speed-up. Is there a more efficient/faster way to traverse the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or should I go learn Cython?
EDIT: I have reduced running time to about 18s by using scipy.misc.imread() instead of skimage.io.imread(). Not sure what the difference is, I will try to investigate.
Here is a simplified version of the code:
import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0,0,0,0)
    r0 = max(a[0],b[0])
    c0 = max(a[1],b[1])
    r1 = min(a[2],b[2])
    c1 = min(a[3],b[3])
    # Do we have a valid intersection?
    if r1 > r0 and c1 > c0:
        rect = (r0,c0,r1,c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)  # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)  # integrate to find num points which have depth
boundary = (0,0,depth_src.shape[0]-1,depth_src.shape[1]-1)  # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602,2.044)  # width, height

start_time = time.time()
for (r,c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)

    # Calculate average of depth of points around current pixel
    scale = integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale/2)
    c0 = c - (model[1] * scale/2)
    r1 = r + (model[0] * scale/2)
    c1 = c + (model[1] * scale/2)

    # Used scale optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0,c0,r1,c1), boundary)
    depth_count = integrate(depth_pts,r0,c0,r1,c1)
    if depth_count:
        depth_sum = integrate(depth_intg,r0,c0,r1,c1)
        avg_change = depth_sum / depth_count
        # Accumulate response
        out_img[r0:r1,c0:c1] += avg_change
print(time.time() - start_time, " seconds")

plt.imshow(out_img)
plt.gray()
plt.show()
Michael, interesting question. It seems that the main performance problem you have is that each pixel in the image has two integrate() functions computed on it, one of size 3×3 and the other of a size which is not known in advance. Calculating individual integrals in this way is extremely inefficient, regardless of what numpy functions you use; it's an algorithmic issue, not an implementation issue. Consider an image of size N×N. You can calculate all integrals of any size K×K in that image using only approximately 4·N² operations, not (as one might naively expect) N²·K². The way you do that is to first calculate an image of sliding sums over a window K in each row, and then sliding sums over the result in each column. Updating each sliding sum to move to the next pixel requires only adding the newest pixel in the current window and subtracting the oldest pixel in the previous window, thus two operations per pixel regardless of window size. We do have to do that twice (for rows and columns), therefore 4 operations per pixel.
I am not sure if there is a sliding window sum built into numpy, but this answer suggests a couple of ways to do it, using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly accomplish the same with one loop over columns and one loop over rows (taking slices to extract a row/column).
Example:
import numpy

# img is a 2D ndarray
# K is the size of sums to calculate using sliding window
# float accumulators avoid overflow when img is uint8
row_sums = numpy.zeros(img.shape, dtype=numpy.float64)
for i in range( img.shape[0] ):
    if i >= K:
        row_sums[i,:] = row_sums[i-1,:] - img[i-K,:] + img[i,:]
    elif i > 0:
        row_sums[i,:] = row_sums[i-1,:] + img[i,:]
    else: # i == 0
        row_sums[i,:] = img[i,:]
col_sums = numpy.zeros(img.shape, dtype=numpy.float64)
for j in range( img.shape[1] ):
    if j >= K:
        col_sums[:,j] = col_sums[:,j-1] - row_sums[:,j-K] + row_sums[:,j]
    elif j > 0:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
    else: # j == 0
        col_sums[:,j] = row_sums[:,j]
# here col_sums[i,j] should be equal to numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) if i >= K-1 and j >= K-1
# first K-1 rows and columns in col_sums contain partial sums and can be ignored
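The two loops can also be replaced by cumulative sums, which NumPy does have built in: a window sum of length K is the difference of two entries of a cumulative sum. Here is a sketch under that assumption (`box_sums` is a hypothetical name). Unlike the loop version, it returns only the fully covered windows, indexed by their top-left corner.

```python
import numpy as np

def box_sums(img, K):
    """All K x K window sums via cumulative sums (sketch).

    out[i, j] == img[i:i+K, j:j+K].sum(), so out has shape (H-K+1, W-K+1).
    """
    img = np.asarray(img, dtype=np.float64)
    # A leading zero row makes c[i] the sum of img[0:i] along axis 0
    c = np.cumsum(np.pad(img, ((1, 0), (0, 0))), axis=0)
    rows = c[K:, :] - c[:-K, :]  # K-tall vertical sums, shape (H-K+1, W)
    # Same trick along axis 1 on the partial result
    c = np.cumsum(np.pad(rows, ((0, 0), (1, 0))), axis=1)
    return c[:, K:] - c[:, :-K]  # K-wide horizontal sums
```

Each pass is a single cumsum plus one subtraction over the array, so the cost per pixel does not depend on K, matching the operation count argued above.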
How do you best apply that to your case? I think you might want to pre-compute the integrals for 3x3 (average depth) and also for several larger sizes, and use the value of the 3x3 to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need might be limited, or artificially limiting it might still work acceptably well, just pick the nearest size. Calculating all integrals together using sliding sums is so much more efficient that I am almost certain it is worth calculating them for a lot of sizes you would never use at a particular pixel, especially if some of the sizes are large.
P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels which are farther from the edge than the max integral size, or (b) add margins to the image of the max integral size on all sides, filling the margins with either zeros or nans, or (c) (best approach) use slices to take care of this automatically: a slice index outside the boundary of an ndarray is automatically limited to the boundary, except of course negative indexes are wrapped around.
EDIT: added example of sliding window sums
