I want to reduce one dimension of an image by keeping only the maximum pixel value of each set of pixels. I implemented this in Python:
def pixel_max_resize(img, h, w):
    imr = np.zeros((h, w), dtype=np.uint8)
    r = int(h / w)
    for j in range(0, w):
        imr[:, j] = np.amax(img[:, j*r:j*r+r], axis=1)
    return imr
This function is a lot slower than a cv2.resize to the same size (by a factor of 5-10). Does anyone have an idea how to optimize the speed of this function? Is there a list comprehension formulation that could speed up the process?
I'm not 100% sure what you are trying to achieve since your code throws an error if the target height is not equal to the source height. Anyway, here is a function that resizes an image based on the maximum value of each subsample area. It's about 3-5 times faster than your code.
def pixel_max_resize(img, h, w):
    source_h, source_w = img.shape
    return img.reshape(h, source_h // h, -1, source_w // w).swapaxes(1, 2).reshape(h, w, -1).max(axis=2)
(Caveat: source width and height must be an integer multiple of target width and height respectively)
Explanation:
The source 2D image is partitioned into a 3D array so that the first and second axes have the size of the target height and width, and the third axis contains the values of all pixels to be subsampled for one target pixel. max() over this axis returns the maximum value for each subsample.
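As a quick sanity check, here is a toy run of the function above (the 4x6 input and the expected output are made up just for illustration):
import numpy as np

# 4x6 toy image; pixel_max_resize(img, 2, 3) pools each 2x2 block down to its maximum
img = np.arange(24, dtype=np.uint8).reshape(4, 6)
print(pixel_max_resize(img, 2, 3))
# [[ 7  9 11]
#  [19 21 23]]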
I would like to compute the average luminescence value vs distance to the center of an image. The approach I am thinking about is to
compute the distance between pixels in image and image center
group pixels with same distance
compute the average value of pixels for each group
plot graph of distance vs average intensity
To compute the first step I use this:
h, w = gray.shape
dist_img = np.zeros(gray.shape, dtype=np.uint8)
for y in range(0, h):
    for x in range(0, w):
        cy = gray.shape[0] / 2
        cx = gray.shape[1] / 2
        dist = math.sqrt(((x - cx) ** 2) + ((y - cy) ** 2))
        dist_img[y, x] = dist
Unfortunately it gives a different result from the one which I compute from here:
distance = math.sqrt(((1 - gray.shape[0]/2)**2 )+((1 - gray.shape[1]/2 )**2))
When I test it for pixel (1,1), I get 20 from the first code and 3605 from the second.
I would appreciate suggestions on how to correct the loop and hints on how to start with the other points. Or maybe there is another way to achieve what I would like to do.
You are setting up dist_img with an np.uint8 dtype. This 8-bit unsigned integer can hold values between 0 and 255, so 3605 cannot be properly represented. Use a higher bit depth for your distance image dtype, such as np.uint32.
distance = math.sqrt(((1 - gray.shape[0]/2)**2 )+((1 - gray.shape[1]/2 )**2))
Careful: gray.shape will give you (height, width), i.e. (y, x). The other code correctly assigns gray.shape[0]/2 to the y center; this one mixes it up and uses the height for the x coordinate.
Your algorithm seems good enough, so I would suggest you stick with it. You can achieve something similar to the first two steps by converting the image to polar space (e.g. with OpenCV's linearPolar), but that may be harder to debug.
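For reference, here is a minimal vectorized sketch of the distance image (step 1), assuming gray is a 2D grayscale array; note the float result so that large distances are not truncated:
import numpy as np

gray = np.zeros((480, 640))          # placeholder for your grayscale image
h, w = gray.shape
cy, cx = h / 2, w / 2
y, x = np.indices((h, w))            # per-pixel row and column coordinates
dist_img = np.sqrt((x - cx) ** 2 + (y - cy) ** 2)   # float array, no uint8 truncation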
I have an array of bytes which are the RGB values of an image. For example the first three bytes of the array will be the RGB value of the top left pixel. i.e. a[0] is R, a[1] is G and a[2] is B.
This image is actually a grid of images, typically arranged in 2x2 form. Here's an example.
I'm currently using PIL to split the image into 4 sub-images. This is the code I'm currently using.
def split_image_to_tiles(im, grid_width, grid_height):
    # This treats the image `im` as a square grid of images.
    w, h = im.size
    w_step = w / grid_width
    h_step = h / grid_height

    tiles = []
    for y in xrange(0, grid_height):
        for x in xrange(0, grid_width):
            x1 = x * w_step
            y1 = y * h_step
            x2 = x1 + w_step
            y2 = y1 + h_step
            t = im.crop((x1, y1, x2, y2))
            tiles.append(t)
    return tiles
This works, but it isn't particularly fast. Is there a better or faster way?
This may not answer your question since I'm using Numpy and OpenCV, but I just had a very similar problem, where I wanted to split a grayscale image into a 2D array of tiles/subimages. I ended up doing it with
height, width = image.shape
tiles = image.reshape((GRID_HEIGHT, height/GRID_HEIGHT,
                       GRID_WIDTH, width/GRID_WIDTH)).swapaxes(1, 2)
For a color image with more than one channel:
height, width = image.shape[:2]
tiles = image.reshape((GRID_HEIGHT, height/GRID_HEIGHT,
                       GRID_WIDTH, width/GRID_WIDTH, image.shape[2])).swapaxes(1, 2)
After that you could simply do tiles[y, x] to reference the tile at that index.
Especially compared to other image processing operations, this method is effectively instantaneous.
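If you are starting from a PIL image, a rough sketch of the round trip might look like this (the file name and the 2x2 grid are placeholders; the image dimensions must be divisible by the grid size):
import numpy as np
from PIL import Image

GRID_WIDTH, GRID_HEIGHT = 2, 2

image = np.asarray(Image.open("grid.png").convert("RGB"))   # hypothetical file name
height, width = image.shape[:2]
tiles = image.reshape((GRID_HEIGHT, height // GRID_HEIGHT,
                       GRID_WIDTH, width // GRID_WIDTH, -1)).swapaxes(1, 2)

# back to PIL, e.g. the top-right tile; ascontiguousarray because the view is not contiguous
top_right = Image.fromarray(np.ascontiguousarray(tiles[0, 1]))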
In terms of complexity there is not much you can do: it will stay O(N), where N is the number of tiles you want to get from the image.
That said, you should run a profiler to see where the time is really being spent. As you can guess, im.crop is the method where the CPU spends most of its time.
This is a typical CPU-bound problem. Short of optimizing the cropping itself, the only way to do better is to use as many processes as there are tiles to extract. Why processes? Since we are not I/O bound, the GIL matters, and we want to make sure that each Python interpreter gets the CPU without contention.
My recommendation, then, is to use the multiprocessing Python module.
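A rough sketch of what that could look like (untested; the helper names are mine, and whether it actually beats a single process depends on image size, since each worker re-opens the file and the cropped tiles have to be pickled back to the parent, which assumes a Pillow version whose Image objects can be pickled):
from multiprocessing import Pool
from PIL import Image

def crop_tile(args):
    # each worker opens the file itself so we never pickle the source Image
    path, box = args
    with Image.open(path) as im:
        tile = im.crop(box)
        tile.load()                  # force the crop before the file is closed
        return tile

def split_image_to_tiles_mp(path, grid_width, grid_height):
    with Image.open(path) as im:
        w, h = im.size
    w_step, h_step = w // grid_width, h // grid_height
    boxes = [(x * w_step, y * h_step, (x + 1) * w_step, (y + 1) * h_step)
             for y in range(grid_height) for x in range(grid_width)]
    with Pool() as pool:
        return pool.map(crop_tile, [(path, box) for box in boxes])
On platforms that spawn rather than fork new processes you would also need the usual if __name__ == '__main__': guard around the Pool usage.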
I have a range image of a scene. I traverse the image and calculate the average change in depth under the detection window. The detection windows changes size based on the average depth of the surrounding pixels of the current location. I accumulate the average change to produce a simple response image.
Most of the time is spent in the for loop; it takes about 40+ seconds for a 512x52 image on my machine. I was hoping for some speed-up. Is there a more efficient/faster way to traverse the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or shall I go and learn Cython?
EDIT: I have reduced running time to about 18s by using scipy.misc.imread() instead of skimage.io.imread(). Not sure what the difference is, I will try to investigate.
Here is a simplified version of the code:
import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0, 0, 0, 0)
    r0 = max(a[0], b[0])
    c0 = max(a[1], b[1])
    r1 = min(a[2], b[2])
    c1 = min(a[3], b[3])
    # Do we have a valid intersection?
    if r1 > r0 and c1 > c0:
        rect = (r0, c0, r1, c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)     # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)  # integrate to find num points which have depth
boundary = (0, 0, depth_src.shape[0]-1, depth_src.shape[1]-1)  # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602, 2.044)  # width, height

start_time = time.time()
for (r, c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)
    # Calculate average of depth of points around current pixel
    scale = integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0
    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale / 2)
    c0 = c - (model[1] * scale / 2)
    r1 = r + (model[0] * scale / 2)
    c1 = c + (model[1] * scale / 2)
    # Use the scale-optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0, c0, r1, c1), boundary)
    depth_count = integrate(depth_pts, r0, c0, r1, c1)
    if depth_count:
        depth_sum = integrate(depth_intg, r0, c0, r1, c1)
        avg_change = depth_sum / depth_count
        # Accumulate response
        out_img[r0:r1, c0:c1] += avg_change
print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()
Michael, interesting question. It seems that the main performance problem is that each pixel in the image has two integrate() calls computed on it, one over a 3x3 region and the other over a region whose size is not known in advance. Calculating individual integrals in this way is extremely inefficient, regardless of which numpy functions you use; it's an algorithmic issue, not an implementation issue.
Consider an image of size N*N. You can calculate all integrals of any size K*K in that image using only approximately 4*N*N operations, not (as one might naively expect) N*N*K*K. The way to do that is to first calculate an image of sliding sums over a window of width K in each row, and then sliding sums over the result in each column. Updating each sliding sum to move to the next pixel requires only adding the newest pixel in the current window and subtracting the oldest pixel of the previous window, i.e. two operations per pixel regardless of window size. We have to do that twice (for rows and columns), hence 4 operations per pixel.
I am not sure if there is a sliding window sum built into numpy, but this answer suggests a couple of ways to do it, using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly accomplish the same with one loop over columns and one loop over rows (taking slices to extract a row/column).
Example:
# img is a 2D ndarray
# K is the size of sums to calculate using sliding window
row_sums = numpy.zeros_like(img)
for i in range(img.shape[0]):
    if i == 0:
        row_sums[i, :] = img[i, :]
    elif i < K:
        # window still growing: just accumulate
        row_sums[i, :] = row_sums[i-1, :] + img[i, :]
    else:
        # full window: add the newest row, drop the oldest
        row_sums[i, :] = row_sums[i-1, :] + img[i, :] - img[i-K, :]

col_sums = numpy.zeros_like(img)
for j in range(img.shape[1]):
    if j == 0:
        col_sums[:, j] = row_sums[:, j]
    elif j < K:
        col_sums[:, j] = col_sums[:, j-1] + row_sums[:, j]
    else:
        col_sums[:, j] = col_sums[:, j-1] + row_sums[:, j] - row_sums[:, j-K]

# here col_sums[i, j] equals numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) when i >= K-1 and j >= K-1
# the first K-1 rows and columns of col_sums contain partial (growing-window) sums
How do you best apply that to your case? I think you might want to pre-compute the integrals for 3x3 (average depth) and also for several larger sizes, and use the value of the 3x3 to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need might be limited, or artificially limiting it might still work acceptably well, just pick the nearest size. Calculating all integrals together using sliding sums is so much more efficient that I am almost certain it is worth calculating them for a lot of sizes you would never use at a particular pixel, especially if some of the sizes are large.
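For what it's worth, once you fix a window size K you can also get every KxK sum at once from a single integral image built with two cumsum calls; this is a sketch of the same idea in vectorized form (the function name is mine):
import numpy as np

def window_sums(img, K):
    # padded integral image: ii[i, j] == img[:i, :j].sum()
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # result[i, j] == img[i:i+K, j:j+K].sum(), with shape (H-K+1, W-K+1)
    return ii[K:, K:] - ii[:-K, K:] - ii[K:, :-K] + ii[:-K, :-K]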
P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels which are farther from the edge than the max integral size, or (b) add margins to the image of the max integral size on all sides, filling the margins with either zeros or nans, or (c) (best approach) use slices to take care of this automatically: a slice index outside the boundary of an ndarray is automatically limited to the boundary, except of course negative indexes are wrapped around.
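To illustrate option (c): slices clip silently at the upper end, so you only have to guard the lower bound yourself, e.g.:
import numpy as np

a = np.arange(10)
print(a[7:15])             # [7 8 9]        -- the upper bound is clipped to the array size
print(a[max(0, 3 - 5):6])  # [0 1 2 3 4 5]  -- clamp the lower bound so it does not go negative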
EDIT: added example of sliding window sums
So, I'm teaching myself Python with this tutorial and I'm stuck on exercise number 13, which says:
Write a function to uniformly shrink or enlarge an image. Your function should take an image along with a scaling factor. To shrink the image the scale factor should be between 0 and 1; to enlarge the image the scaling factor should be greater than 1.
This is not meant as a question about PIL, but to ask which algorithm to use so I can code it myself.
I've found some similar questions like this one, but I don't know how to translate this into Python.
Any help would be appreciated.
I've come to this:
import image

win = image.ImageWin()
img = image.Image("cy.png")
factor = 2

W = img.getWidth()
H = img.getHeight()
newW = int(W * factor)
newH = int(H * factor)
newImage = image.EmptyImage(newW, newH)

for col in range(newW):
    for row in range(newH):
        p = img.getPixel(col, row)
        newImage.setPixel(col * factor, row * factor, p)

newImage.draw(win)
win.exitonclick()
I should do this in a function, but that doesn't matter right now. The arguments for the function would be (image, factor). You can try it in the tutorial's ActiveCode. It produces a stretched image with empty columns.
Your code as shown is simple and effective for what's known as a Nearest Neighbor resize, except for one little bug:
p = img.getPixel(col/factor,row/factor)
newImage.setPixel(col,row,p)
Edit: since you're sending a floating point coordinate into getPixel you're not limited to Nearest Neighbor - you can implement any interpolation algorithm you want inside. The simplest thing to do is simply truncate the coordinates to int which will cause pixels to be replicated when factor is greater than 1, or skipped when factor is less than 1.
Mark has the correct approach. To get a smoother result, you replace:
p = img.getPixel(col/factor,row/factor)
with a function that takes floating point coordinates and returns a pixel interpolated from several neighboring points in the source image. For linear interpolation it takes the four nearest neighbors; for higher-order interpolation it takes a larger number of surrounding pixels.
For example, if col/factor = 3.75 and row/factor = 1.9, a linear interpolation would take the source pixels at (3,1), (3,2), (4,1), and (4,2) and give a result between those 4 rgb values, weighted most heavily to the pixel at (4,2).
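A rough sketch of such a function against the tutorial's image module (I'm assuming its Pixel objects expose getRed()/getGreen()/getBlue() and an image.Pixel(r, g, b) constructor; treat this as untested):
import image

def bilinear_pixel(img, x, y):
    # x, y are floating point coordinates in the source image
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, img.getWidth() - 1)
    y1 = min(y0 + 1, img.getHeight() - 1)
    fx, fy = x - x0, y - y0

    def rgb(px):
        return (px.getRed(), px.getGreen(), px.getBlue())

    tl, tr = rgb(img.getPixel(x0, y0)), rgb(img.getPixel(x1, y0))
    bl, br = rgb(img.getPixel(x0, y1)), rgb(img.getPixel(x1, y1))

    out = []
    for c in range(3):
        top = tl[c] * (1 - fx) + tr[c] * fx       # interpolate along x on the top row
        bottom = bl[c] * (1 - fx) + br[c] * fx    # interpolate along x on the bottom row
        out.append(int(round(top * (1 - fy) + bottom * fy)))  # then along y
    return image.Pixel(*out)
In the resize loop you would then call newImage.setPixel(col, row, bilinear_pixel(img, col / factor, row / factor)).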
You can do that using the Python Imaging Library.
Image.resize() should do what you want.
See http://effbot.org/imagingbook/image.htm
EDIT
Since you want to program this yourself without using a module, I have added an extra solution.
You will have to use the following algorithm.
load your image
extract its size
calculate the desired size (height * factor, width * factor)
create a new EmptyImage with the desired size
Use a nested loop to go through the pixels (row by column) in your image.
Then (for shrinking) you remove some pixels every once in a while, or (for enlarging) you duplicate some pixels in your image.
If you want to get fancy, you could smooth the added or removed pixels by averaging the rgb values with their neighbours.
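If it helps to see the whole thing spelled out, here is a rough nearest-neighbour version of that algorithm against the same image module (an untested sketch, without the smoothing step):
import image

def resize(img, factor):
    # each target pixel copies the nearest source pixel (pixels are duplicated
    # when factor > 1 and skipped when factor < 1)
    new_w = int(img.getWidth() * factor)
    new_h = int(img.getHeight() * factor)
    new_img = image.EmptyImage(new_w, new_h)
    for col in range(new_w):
        for row in range(new_h):
            p = img.getPixel(int(col / factor), int(row / factor))
            new_img.setPixel(col, row, p)
    return new_img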
I have code which creates a square image with dimensions 4x4 arcsec, running from -2 arcsec to +2 arcsec on an 80x80 grid. To this I want to add another image.
This second image is created through an FFT of an 80x80 grid and thus starts out in Fourier space. After the FFT, I want the image to have exactly the same dimensions in real space as the first image.
Because Fourier space represents the scales, and the wavenumber is defined as k = 2*pi/x (although in this case I think numpy.fft uses the definition k = 1/x), I thought the largest scale would have to correspond to the smallest k-value and the smallest scale to the largest k-value.
So if x_max = 2 (the dimensions in the x-direction of the first image) and dim_x = 80 (the number of columns in the grid):
k_x,max = 1/(2*x_max/dim_x)
k_x,min = 1/(2*x_max)
and let the grid in Fourier-space run from k_x,min to k_x,max (same for the y-direction)
I hope I explained this clearly enough, but I haven't been able to find any confirmation or explanation for this in the literature about FFTs, and I would really like to know if this is correct.
Thanks in advance
This is not correct. The k-space values will range from -N/2*omega_0 to (N-1)/2*omega_0, where omega_0 = 2*pi/(max(x)-min(x)) is the fundamental frequency set by the total sample length, and N is the number of samples. So for your case you get something along the lines of this:
N = len(x)
dx = x[-1]-x[0]
k = np.linspace(-N*np.pi/dx, (N-1)*np.pi/dx, N)
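If you just want a grid that follows numpy.fft's own conventions, you can also let numpy build it for you; a sketch for the 80-sample, -2 to +2 arcsec case from the question (assuming evenly spaced x):
import numpy as np

N = 80
x = np.linspace(-2, 2, N)                 # arcsec
d = x[1] - x[0]                           # sample spacing
k = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(N, d))
# k runs from -pi/d up to just below +pi/d in steps of 2*pi/(N*d)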