Speed up this interpolation in python

I have an image processing problem I'm currently solving in python, using numpy and scipy. Briefly, I have an image that I want to apply many local contractions to. My prototype code is working, and the final images look great. However, processing time has become a serious bottleneck in our application. Can you help me speed up my image processing code?
I've tried to boil down our code to the 'cartoon' version below. Profiling suggests that I'm spending most of my time on interpolation. Are there obvious ways to speed up execution?
import cProfile, pstats
import numpy
# In recent SciPy releases, shift and map_coordinates are also available
# directly as scipy.ndimage.shift and scipy.ndimage.map_coordinates.
from scipy.ndimage import interpolation
def get_centered_subimage(
    center_point, window_size, image):
    x, y = numpy.round(center_point).astype(int)
    xSl = slice(max(x-window_size-1, 0), x+window_size+2)
    ySl = slice(max(y-window_size-1, 0), y+window_size+2)
    subimage = image[xSl, ySl]
    interpolation.shift(
        subimage, shift=(x, y)-center_point, output=subimage)
    return subimage[1:-1, 1:-1]
"""In real life, this is experimental data"""
im = numpy.zeros((1000, 1000), dtype=float)
"""In real life, this mask is a non-zero pattern"""
window_radius = 10
mask = numpy.zeros((2*window_radius+1, 2*window_radius+1), dtype=float)
"""The x, y coordinates in the output image"""
new_grid_x = numpy.linspace(0, im.shape[0]-1, 2*im.shape[0])
new_grid_y = numpy.linspace(0, im.shape[1]-1, 2*im.shape[1])
"""The grid we'll end up interpolating onto"""
grid_step_x = new_grid_x[1] - new_grid_x[0]
grid_step_y = new_grid_y[1] - new_grid_y[0]
# Cast to int: these radii are used as array sizes and slice indices below
subgrid_radius = numpy.floor(
    (-1 + window_radius * 0.5 / grid_step_x,
     -1 + window_radius * 0.5 / grid_step_y)).astype(int)
subgrid = (
    window_radius + 2 * grid_step_x * numpy.arange(
        -subgrid_radius[0], subgrid_radius[0] + 1),
    window_radius + 2 * grid_step_y * numpy.arange(
        -subgrid_radius[1], subgrid_radius[1] + 1))
subgrid_points = ((2*subgrid_radius[0] + 1) *
                  (2*subgrid_radius[1] + 1))
"""The coordinates of the set of spots we want to contract. In real
life, this set is non-random:"""
numpy.random.seed(0)
num_points = 10000
center_points = numpy.random.random(2*num_points).reshape(num_points, 2)
center_points[:, 0] *= im.shape[0]
center_points[:, 1] *= im.shape[1]
"""The output image"""
final_image = numpy.zeros(
    (new_grid_x.shape[0], new_grid_y.shape[0]), dtype=float)
def profile_me():
    for m, cp in enumerate(center_points):
        """Take an image centered on each illumination point"""
        spot_image = get_centered_subimage(
            center_point=cp, window_size=window_radius, image=im)
        if spot_image.shape != (2*window_radius+1, 2*window_radius+1):
            continue #Skip to the next spot
        """Mask the image"""
        masked_image = mask * spot_image
        """Resample the image"""
        # Cast to int: the grid index is used to slice final_image below
        nearest_grid_index = numpy.round(
            (cp - (new_grid_x[0], new_grid_y[0])) /
            (grid_step_x, grid_step_y)).astype(int)
        nearest_grid_point = (
            (new_grid_x[0], new_grid_y[0]) +
            (grid_step_x, grid_step_y) * nearest_grid_index)
        new_coordinates = numpy.meshgrid(
            subgrid[0] + 2 * (nearest_grid_point[0] - cp[0]),
            subgrid[1] + 2 * (nearest_grid_point[1] - cp[1]))
        resampled_image = interpolation.map_coordinates(
            masked_image,
            (new_coordinates[0].reshape(subgrid_points),
             new_coordinates[1].reshape(subgrid_points))
            ).reshape(2*subgrid_radius[1]+1,
                      2*subgrid_radius[0]+1).T
        """Add the recentered image back to the scan grid"""
        final_image[
            nearest_grid_index[0]-subgrid_radius[0]:
            nearest_grid_index[0]+subgrid_radius[0]+1,
            nearest_grid_index[1]-subgrid_radius[1]:
            nearest_grid_index[1]+subgrid_radius[1]+1,
            ] += resampled_image
cProfile.run('profile_me()', 'profile_results')
p = pstats.Stats('profile_results')
p.strip_dirs().sort_stats('cumulative').print_stats(10)
Vague explanation of what the code does:
We start with a pixellated 2D image, and a set of arbitrary (x, y) points in our image that don't generally fall on an integer grid. For each (x, y) point, I want to multiply the image by a small mask centered precisely on that point. Next we contract/expand the masked region by a finite amount, before finally adding this processed sub-image to a final image, which may not have the same pixel size as the original image. (Not my finest explanation. Ah well).

I'm pretty sure that, as you said, the bulk of the calculation time is spent in interpolation.map_coordinates(…), which gets called once per iteration over center_points, here 10,000 times. Generally, when working with the numpy/scipy stack, you want repetitive work over a large array to happen inside native Numpy/Scipy functions, i.e. in a C loop over homogeneous data, rather than explicitly in Python.
One strategy that might speed up the interpolation, at the cost of using more memory, is:
First, fetch all the subimages (here named masked_image) into a 3-dimensional array ((2*window_radius+1) x (2*window_radius+1) x number of center points)
Make a ufunc (read that, it's useful) that wraps the work that has to be done on each subimage, using numpy.frompyfunc, which should return another 3-dimensional array ((2*subgrid_radius[0]+1) x (2*subgrid_radius[1]+1) x number of center points). In short, this creates a vectorized version of the Python function that can be broadcast element-wise over an array.
Build the final image by summing over the third dimension.
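As a rough sketch of the "keep the loop in C" idea (a swapped-in variant, not the frompyfunc route described above): map_coordinates accepts an N-dimensional input, so a stack of sub-images can be resampled in a single call by passing the stack index as an extra coordinate. The array names, sizes and random offsets below are made up for illustration, and order=1 is used so that an integer index along the stack axis selects exactly one sub-image. Whether this is faster in practice depends on the sizes involved and on the extra memory it needs.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
n_spots, size = 50, 21                          # hypothetical sizes
stack = rng.random((n_spots, size, size))       # stand-in for the masked sub-images
offsets = rng.uniform(-0.5, 0.5, (n_spots, 2))  # stand-in sub-pixel shift per spot
grid = np.arange(2.0, size - 2, 2.0)            # sample positions inside each sub-image

# Broadcastable coordinate arrays: stack index, row position, column position.
idx = np.arange(n_spots, dtype=float)[:, None, None]
rows = grid[None, :, None] + offsets[:, 0, None, None]
cols = grid[None, None, :] + offsets[:, 1, None, None]
coords = np.stack(np.broadcast_arrays(idx, rows, cols))  # shape (3, n_spots, g, g)

# One call resamples every sub-image.
batched = ndimage.map_coordinates(stack, coords, order=1)

# Reference: the same work done one sub-image at a time in a Python loop.
looped = np.stack([
    ndimage.map_coordinates(
        stack[i], np.stack(np.broadcast_arrays(rows[i], cols[i])), order=1)
    for i in range(n_spots)
])
print(batched.shape, np.allclose(batched, looped))
```

The resampled patches would still have to be accumulated into the output image at their individual positions afterwards.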
Hope that gets you closer to your goals!

Related

Padding scipy affine_transform output to show non-overlapping regions of transformed images

I have source (src) image(s) I wish to align to a destination (dst) image using an Affine Transformation whilst retaining the full extent of both images during alignment (even the non-overlapping areas).
I am already able to calculate the Affine Transformation rotation and offset matrix, which I feed to scipy.ndimage.affine_transform to recover the dst-aligned src image.
The problem is that, when the images are not fully overlapping, the resultant image is cropped to only the common footprint of the two images. What I need is the full extent of both images, placed on the same pixel coordinate system. This question is almost a duplicate of this one - and the excellent answer and repository there provides this functionality for OpenCV transformations. I unfortunately need this for scipy's implementation.
Much too late, after repeatedly hitting a brick wall trying to translate the above question's answer to scipy, I came across this issue and subsequently followed to this question. The latter question did give some insight into the wonderful world of scipy's affine transformation, but I have as yet been unable to crack my particular needs.
The transformations from src to dst can have translations and rotation. I can get translations only working (an example is shown below) and I can get rotations only working (largely hacking around the below and taking inspiration from the use of the reshape argument in scipy.ndimage.interpolation.rotate). However, I am getting thoroughly lost combining the two. I have tried to calculate what should be the correct offset (see this question's answers again), but I can't get it working in all scenarios.
Translation-only working example of padded affine transformation, which follows largely this repo, explained in this answer:
from scipy.ndimage import rotate, affine_transform
import numpy as np
import matplotlib.pyplot as plt
nblob = 50
shape = (200, 100)
buffered_shape = (300, 200) # buffer for rotation and translation
def affine_test(angle=0, translate=(0, 0)):
    np.random.seed(42)
    # Maximum translation allowed is half difference between shape and buffered_shape
    # Generate a buffered_shape-sized base image with random blobs
    base = np.zeros(buffered_shape, dtype=np.float32)
    random_locs = np.random.choice(np.arange(2, buffered_shape[0] - 2), nblob * 2, replace=False)
    i = random_locs[:nblob]
    j = random_locs[nblob:]
    for k, (_i, _j) in enumerate(zip(i, j)):
        # Use different values, just to make it easier to distinguish blobs
        base[_i - 2 : _i + 2, _j - 2 : _j + 2] = k + 10
    # Impose a rotation and translation on source
    src = rotate(base, angle, reshape=False, order=1, mode="constant")
    bsc = (np.array(buffered_shape) / 2).astype(int)
    sc = (np.array(shape) / 2).astype(int)
    src = src[
        bsc[0] - sc[0] + translate[0] : bsc[0] + sc[0] + translate[0],
        bsc[1] - sc[1] + translate[1] : bsc[1] + sc[1] + translate[1],
    ]
    # Cut-out destination from the centre of the base image
    dst = base[bsc[0] - sc[0] : bsc[0] + sc[0], bsc[1] - sc[1] : bsc[1] + sc[1]]
    src_y, src_x = src.shape
    def get_matrix_offset(centre, angle, scale):
        """Follows OpenCV.getRotationMatrix2D"""
        angle = angle * np.pi / 180
        alpha = scale * np.cos(angle)
        beta = scale * np.sin(angle)
        return (
            np.array([[alpha, beta], [-beta, alpha]]),
            np.array(
                [
                    (1 - alpha) * centre[0] - beta * centre[1],
                    beta * centre[0] + (1 - alpha) * centre[1],
                ]
            ),
        )
    # Obtain the rotation matrix and offset that describes the transformation
    # between src and dst
    matrix, offset = get_matrix_offset(np.array([src_y / 2, src_x / 2]), angle, 1)
    offset = offset - translate
    # Determine the outer bounds of the new image
    lin_pts = np.array([[0, src_x, src_x, 0], [0, 0, src_y, src_y]])
    transf_lin_pts = np.dot(matrix.T, lin_pts) - offset[::-1].reshape(2, 1)
    # Find min and max bounds of the transformed image
    min_x = np.floor(np.min(transf_lin_pts[0])).astype(int)
    min_y = np.floor(np.min(transf_lin_pts[1])).astype(int)
    max_x = np.ceil(np.max(transf_lin_pts[0])).astype(int)
    max_y = np.ceil(np.max(transf_lin_pts[1])).astype(int)
    # Add translation to the transformation matrix to shift to positive values
    anchor_x, anchor_y = 0, 0
    if min_x < 0:
        anchor_x = -min_x
    if min_y < 0:
        anchor_y = -min_y
    shifted_offset = offset - np.dot(matrix, [anchor_y, anchor_x])
    # Create padded destination image
    dst_h, dst_w = dst.shape[:2]
    pad_widths = [anchor_y, max(max_y, dst_h) - dst_h, anchor_x, max(max_x, dst_w) - dst_w]
    dst_padded = np.pad(
        dst,
        ((pad_widths[0], pad_widths[1]), (pad_widths[2], pad_widths[3])),
        "constant",
        constant_values=-1,
    )
    dst_pad_h, dst_pad_w = dst_padded.shape
    # Create the aligned and padded source image
    source_aligned = affine_transform(
        src,
        matrix.T,
        offset=shifted_offset,
        output_shape=(dst_pad_h, dst_pad_w),
        order=3,
        mode="constant",
        cval=-1,
    )
    # Plot the images
    fig, axes = plt.subplots(1, 4, figsize=(10, 5), sharex=True, sharey=True)
    axes[0].imshow(src, cmap="viridis", vmin=-1, vmax=nblob)
    axes[0].set_title("Source")
    axes[1].imshow(dst, cmap="viridis", vmin=-1, vmax=nblob)
    axes[1].set_title("Dest")
    axes[2].imshow(source_aligned, cmap="viridis", vmin=-1, vmax=nblob)
    axes[2].set_title("Source aligned to Dest padded")
    axes[3].imshow(dst_padded, cmap="viridis", vmin=-1, vmax=nblob)
    axes[3].set_title("Dest padded")
    plt.show()
e.g.:
affine_test(0, (-20, 40))
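gives: [figure: four panels titled "Source", "Dest", "Source aligned to Dest padded" and "Dest padded"]
With a zoom in showing the aligned source in the padded images: [figure]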
I require the full extent of the src and dst images aligned on the same pixel coordinates, with both rotations and translations.
Any help is greatly appreciated!

Complexity analysis
The problem is to determine three parameters: the rotation angle and the x and y displacements.
Suppose you have a grid of candidate values for the angle and for the x and y displacements, each of size O(n), and that your images are of size O(n x n). Rotating, translating and comparing the images each take O(n^2), and there are O(n^3) candidate transforms to try, so a brute-force search has complexity O(n^5), which is probably why you are asking the question.
However, the displacement part can be computed more efficiently by finding the maximum correlation using Fourier transforms. A 2D FFT of an n x n image costs O(n^2 log n), so the complete correlation matrix can be computed in O(n^2 log n); finding its maximum takes O(n^2), so determining the best displacement costs O(n^2 log n) overall. You still have to search for the best angle, and with O(n) candidate angles the overall complexity of the search is O(n^3 log n). Remember that we are using Python, which may add significant overhead, so this complexity only gives an idea of how difficult the problem is. I have handled problems like this before, so I start confident.
Preparing some example
I will start by downloading an image and applying rotation and centering the image padding with zeros.
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
# `im` below is the downloaded image; the loading code is not shown here.
def centralized(a, width, height):
    '''
    Image centralized to the given width and height
    by padding with zeros (black)
    '''
    assert width >= a.shape[0] and height >= a.shape[1]
    ap = np.zeros((width, height) + a.shape[2:], a.dtype)
    ccx = (width - a.shape[0])//2
    ccy = (height - a.shape[1])//2
    ap[ccx:ccx+a.shape[0], ccy:ccy+a.shape[1], ...] = a
    return ap
def image_pair(im, width, height, displacement=(0,0), angle=0):
    '''
    Build a pair of images as numpy arrays
    from the input image.
    Both images will be padded with zeros (black),
    roughly centralized,
    and will have the specified shape.
    Make sure that the width and height chosen are enough
    to fit the rotated image.
    '''
    a = np.array(im)
    a1 = centralized(a, width, height)
    a2 = centralized(ndimage.rotate(a, angle), width, height)
    a2 = np.roll(a2, displacement, axis=(0,1))
    return a1, a2
def random_transform():
    angle = np.random.rand() * 360
    displacement = np.random.randint(-100, 100, 2)
    return displacement, angle
a1, a2 = image_pair(im, 512, 512, *random_transform())
plt.subplot(121)
plt.imshow(a1)
plt.subplot(122)
plt.imshow(a2)
The displacement search
The first thing is to compute the correlation between the two images
def compute_correlation(a1, a2):
    A1 = np.fft.rfftn(a1, axes=(0,1))
    A2 = np.fft.rfftn(a2, axes=(0,1))
    C = np.fft.irfftn(np.sum(A1 * np.conj(A2), axis=2))
    return C
Then, let's create an example without rotation and confirm that the index of the maximum correlation gives the displacement that fits one image to the other.
displacement, _ = random_transform()
a1, a2 = image_pair(im, 521, 512, displacement, angle=0)
C = compute_correlation(a1, a2)
np.unravel_index(np.argmax(C), C.shape), displacement
a3 = np.roll(a2, np.unravel_index(np.argmax(C), C.shape), axis=(0,1))
assert np.all(a3 == a1)
With rotation or interpolation this result may not be exact but it gives the displacement that will give us the closest possible alignment.
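A self-contained version of the same check, on a synthetic single-channel image rather than the sunflower picture, might look like the sketch below. The random image, its size, the chosen shift and the name find_shift are all made up for illustration.

```python
import numpy as np

def find_shift(a1, a2):
    """Return the roll that, applied to a2, best matches a1 (circular correlation)."""
    A1 = np.fft.rfftn(a1)
    A2 = np.fft.rfftn(a2)
    # Passing s explicitly keeps the output shape equal to the input shape,
    # including when the last axis has odd length.
    C = np.fft.irfftn(A1 * np.conj(A2), s=a1.shape)
    return np.unravel_index(np.argmax(C), C.shape)

rng = np.random.default_rng(0)
a1 = rng.random((64, 48))                  # hypothetical grayscale image
true_shift = (5, -7)
a2 = np.roll(a1, true_shift, axis=(0, 1))  # displaced copy of a1

shift_back = find_shift(a1, a2)            # the inverse displacement, modulo the image size
restored = np.roll(a2, shift_back, axis=(0, 1))
print(shift_back, np.array_equal(restored, a1))
```

The recovered indices are the inverse of the applied displacement modulo the image size, which is why rolling a2 by them reproduces a1.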
Let's put this in a function for future use
def get_aligned(a1, a2, angle):
    a1_rotated = ndimage.rotate(a1, angle, reshape=False)
    C = compute_correlation(a2, a1_rotated)
    found_displacement = np.unravel_index(np.argmax(C), C.shape)
    a1_aligned = np.roll(a1_rotated, found_displacement, axis=(0,1))
    return a1_aligned
Searching for the angle
Now we can proceed in two steps: first we compute the maximum correlation for each candidate angle, then we use the angle that gives the highest correlation to find the alignment.
displacement, angle = random_transform()
a1, a2 = image_pair(im, 521, 512, displacement, angle)
C_max = []
C_argmax = []
angle_guesses = np.arange(0, 360, 5)
for angle_guess in angle_guesses:
    a1_rotated = ndimage.rotate(a1, angle_guess, reshape=False)
    C = compute_correlation(a1_rotated, a2)
    i = np.argmax(C)
    v = C.reshape(-1)[i]
    C_max.append(v)
    C_argmax.append(i)
Let's see what the correlation looks like
plt.plot(angle_guesses, C_max);
We have a clear winner looking at this curve, even though a sunflower has some rotational symmetry.
Let's apply the transformation to the original image and see what it looks like
a1_aligned = get_aligned(a1, a2, angle_guesses[np.argmax(C_max)])
plt.subplot(121)
plt.imshow(a2)
plt.subplot(122)
plt.imshow(a1_aligned)
Great, I wouldn't have done better than this manually.
I am using a sunflower image for aesthetic reasons, but the procedure is the same for any type of image. I use RGB to show that the image may have one additional dimension, i.e. a feature vector per pixel instead of a scalar feature; if your feature is a scalar, you can reshape your data to (width, height, 1).

Working code below in case anyone else has this need of scipy's affine transformations:
def affine_test(angle=0, translate=(0, 0), shape=(200, 100), buffered_shape=(300, 200), nblob=50):
    # Maximum translation allowed is half difference between shape and buffered_shape
    np.random.seed(42)
    # Generate a buffered_shape-sized base image
    base = np.zeros(buffered_shape, dtype=np.float32)
    random_locs = np.random.choice(np.arange(2, buffered_shape[0] - 2), nblob * 2, replace=False)
    i = random_locs[:nblob]
    j = random_locs[nblob:]
    for k, (_i, _j) in enumerate(zip(i, j)):
        base[_i - 2 : _i + 2, _j - 2 : _j + 2] = k + 10
    # Impose a rotation and translation on source
    src = rotate(base, angle, reshape=False, order=1, mode="constant")
    bsc = (np.array(buffered_shape) / 2).astype(int)
    sc = (np.array(shape) / 2).astype(int)
    src = src[
        bsc[0] - sc[0] + translate[0] : bsc[0] + sc[0] + translate[0],
        bsc[1] - sc[1] + translate[1] : bsc[1] + sc[1] + translate[1],
    ]
    # Cut-out destination from the centre of the base image
    dst = base[bsc[0] - sc[0] : bsc[0] + sc[0], bsc[1] - sc[1] : bsc[1] + sc[1]]
    src_y, src_x = src.shape
    def get_matrix_offset(centre, angle, scale):
        """Follows OpenCV.getRotationMatrix2D"""
        angle_rad = angle * np.pi / 180
        alpha = np.round(scale * np.cos(angle_rad), 8)
        beta = np.round(scale * np.sin(angle_rad), 8)
        return (
            np.array([[alpha, beta], [-beta, alpha]]),
            np.array(
                [
                    (1 - alpha) * centre[0] - beta * centre[1],
                    beta * centre[0] + (1 - alpha) * centre[1],
                ]
            ),
        )
    matrix, offset = get_matrix_offset(
        np.array([((src_y - 1) / 2) - translate[0], ((src_x - 1) / 2) - translate[1]]),
        angle,
        1,
    )
    offset += np.array(translate)
    # Build the 3x3 homogeneous transform and invert it
    M = np.column_stack((matrix, offset))
    M = np.vstack((M, [0, 0, 1]))
    iM = np.linalg.inv(M)
    imatrix = iM[:2, :2]
    ioffset = iM[:2, 2]
    # Determine the outer bounds of the new image
    lin_pts = np.array([[0, src_y-1, src_y-1, 0], [0, 0, src_x-1, src_x-1]])
    transf_lin_pts = np.dot(matrix, lin_pts) + offset.reshape(2, 1) # - np.array(translate).reshape(2, 1) # both?
    # Find min and max bounds of the transformed image
    min_x = np.floor(np.min(transf_lin_pts[1])).astype(int)
    min_y = np.floor(np.min(transf_lin_pts[0])).astype(int)
    max_x = np.ceil(np.max(transf_lin_pts[1])).astype(int)
    max_y = np.ceil(np.max(transf_lin_pts[0])).astype(int)
    # Add translation to the transformation matrix to shift to positive values
    anchor_x, anchor_y = 0, 0
    if min_x < 0:
        anchor_x = -min_x
    if min_y < 0:
        anchor_y = -min_y
    dot_anchor = np.dot(imatrix, [anchor_y, anchor_x])
    shifted_offset = ioffset - dot_anchor
    # Create padded destination image
    dst_y, dst_x = dst.shape[:2]
    pad_widths = [anchor_y, max(max_y, dst_y) - dst_y, anchor_x, max(max_x, dst_x) - dst_x]
    dst_padded = np.pad(
        dst,
        ((pad_widths[0], pad_widths[1]), (pad_widths[2], pad_widths[3])),
        "constant",
        constant_values=-10,
    )
    dst_pad_y, dst_pad_x = dst_padded.shape
    # Create the aligned and padded source image
    source_aligned = affine_transform(
        src,
        imatrix,
        offset=shifted_offset,
        output_shape=(dst_pad_y, dst_pad_x),
        order=3,
        mode="constant",
        cval=-10,
    )
E.g. running:
affine_test(angle=-25, translate=(10, -40))
will show: [figure]
and zoomed in: [figure]
Apologies the code is not nicely written as is.
Note that, running this in the wild, I noticed it cannot handle any change in scale between the images. I am not certain this isn't caused by how I calculate the transformation, so it is a caveat worth noting, and checking, if you are aligning images with different scales.
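For what it's worth, the inversion step in the code above (iM = np.linalg.inv(M)) is needed because scipy.ndimage.affine_transform maps each output coordinate back to an input coordinate, input = matrix @ output + offset, rather than pushing input pixels forward. The sketch below illustrates that convention on a single marked pixel; the helper name forward_to_scipy, the 30 x 30 test image and the chosen rotation, centre and translation are assumptions made for this example only.

```python
import numpy as np
from scipy.ndimage import affine_transform

def forward_to_scipy(R, t, c):
    """Convert the forward map out = R @ (in - c) + c + t into the
    (matrix, offset) pair expected by affine_transform, which evaluates
    in = matrix @ out + offset for every output pixel."""
    R_inv = np.linalg.inv(R)
    offset = c - R_inv @ (c + t)
    return R_inv, offset

R = np.array([[0.0, -1.0], [1.0, 0.0]])  # quarter turn in (row, col) index space
c = np.array([10.0, 10.0])               # rotation centre (hypothetical)
t = np.array([3.0, 5.0])                 # translation applied after the rotation

img = np.zeros((30, 30))
img[10, 12] = 1.0                        # single marked pixel

matrix, offset = forward_to_scipy(R, t, c)
out = affine_transform(img, matrix, offset=offset, order=0)

# Where the forward map sends the marked pixel
expected = R @ (np.array([10.0, 12.0]) - c) + c + t
print(expected, np.argwhere(out == 1.0))
```

Padding for non-overlapping regions then amounts to adding the anchor shift to this forward map before inverting it, and passing a larger output_shape.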

If you have two images that are similar (or the same) and you want to align them, you can do it using the two functions rotate and shift:
from scipy.ndimage import rotate, shift
First you need to find the difference in angle between the two images, angle_to_rotate; with that, you apply a rotation to src:
angle_to_rotate = 25
rotated_src = rotate(src, angle_to_rotate , reshape=True, order=1, mode="constant")
With reshape=True you avoid losing information from your original src matrix, and the result is padded so that the image can be translated around the 0,0 indexes. You can calculate this translation as (x*cos(angle), y*sin(angle)), where x and y are the dimensions of the image, but it probably won't matter.
Now you need to translate the rotated image onto the destination; for that you can use the shift function:
rot_translated_src = shift(rotated_src , [distance_x, distance_y])
In this case there is no reshape (because otherwise you wouldn't have any real translation), so if the image was not previously padded some information will be lost.
But you can do some padding with
np.pad(src, number, mode='constant')
To calculate distance_x and distance_y you need to find a point that can serve as a reference between rotated_src and the destination, then just calculate the distance along the x and y axes.
Summary
Make some padding in src, and dst
Find the angular distance between them.
Rotate src with scipy.ndimage.rotate using reshape=True
Find the horizontal and vertical distance distance_x, distance_y between the rotated image and dst
Translate your 'rotated_src' with scipy.ndimage.shift
Code
from scipy.ndimage import rotate, shift
import matplotlib.pyplot as plt
import numpy as np
First we make the destination image:
# make and plot dest
dst = np.ones([40,20])
dst = np.pad(dst,10)
dst[17,[14,24]]=4
dst[27,14:25]=4
dst[26,[14,25]]=4
rotated_dst = rotate(dst, 20, order=1)
plt.imshow(dst) # plot it
plt.imshow(rotated_dst)
plt.show()
We make the Source image:
# make_src image and plot it
src = np.zeros([40,20])
src = np.pad(src,10)
src[0:20,0:20]=1
src[7,[4,14]]=4
src[17,4:15]=4
src[16,[4,15]]=4
plt.imshow(src)
plt.show()
Then we align the src to the destination:
rotated_src = rotate(src, 20, order=1) # rotate by the angle found (20 here); reshape=True is the default
plt.imshow(rotated_src)
plt.show()
distance_y = 8 # find these distances by comparing rotated_src and dst
distance_x = 12 # use any visual reference, or even the corners
translated_src = shift(rotated_src, [distance_y,distance_x])
plt.imshow(translated_src)
plt.show()
P.S.: If you have trouble finding the angle and the distances programmatically, please leave a comment with a bit more insight into what can be used as a reference (for example, the frame of the image or some image features / data).

Remove for loops for faster execution - vectorize

As a part of my academic project, I am working on a linear filter for an image. Below is the code; it uses only NumPy (no other external libraries), and I want to eliminate the for loops by vectorizing or by any other means. How can I achieve vectorization for faster execution? Thanks for the help.
Inputs -
Image.shape - (568, 768)
weightArray.shape - (3, 3)
def apply_filter(image: np.array, weight_array: np.array) -> np.array:
    rows, cols = image.shape
    height, width = weight_array.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for rrow in range(rows - height + 1):
        for ccolumn in range(cols - width + 1):
            for hheight in range(height):
                for wwidth in range(width):
                    imgval = image[rrow + hheight, ccolumn + wwidth]
                    filterval = weight_array[hheight, wwidth]
                    output[rrow, ccolumn] += imgval * filterval
    return output

Vectorization is the process of converting each explicit for loop into a 1-dimensional array operation.
In Python, this will involve reimagining your data in terms of slices.
In the code below, I've provided a working vectorization of the kernel loop. This shows how to approach vectorization, but since it is only optimizing the 3x3 array, it doesn't give you the biggest available gains.
If you want to see a larger improvement, you'll vectorize the image array, which I've templated for you as well -- but left some as an exercise.
import numpy as np
from PIL import Image
## no vectorization
def applyFilterMethod1(image: np.array, weightArray: np.array) -> np.array:
    rows, cols = image.shape ; height, width = weightArray.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for rrow in range(rows - height + 1):
        for ccolumn in range(cols - width + 1):
            for hheight in range(height):
                for wwidth in range(width):
                    imgval = image[rrow + hheight, ccolumn + wwidth]
                    filterval = weightArray[hheight, wwidth]
                    output[rrow, ccolumn] += imgval * filterval
    return output
## vectorize the kernel loop (~3x improvement)
def applyFilterMethod2(image: np.array, weightArray: np.array) -> np.array:
    rows, cols = image.shape ; height, width = weightArray.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for rrow in range(rows - height + 1):
        for ccolumn in range(cols - width + 1):
            imgval = image[rrow:rrow + height, ccolumn:ccolumn + width]
            filterval = weightArray[:, :]
            output[rrow, ccolumn] = sum(sum(imgval * filterval))
    return output
## vectorize the image loop (~50x improvement)
def applyFilterMethod3(image: np.array, weightArray: np.array) -> np.array:
    rows, cols = image.shape ; height, width = weightArray.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for hheight in range(height):
        for wwidth in range(width):
            imgval = 0 ## TODO -- construct a compatible slice
            filterval = weightArray[hheight, wwidth]
            output[:, :] += imgval * filterval
    return output
src = Image.open("input.png")
sb = np.asarray(src)
cb = np.array([[1,2,1],[2,4,2],[1,2,1]])
cb = cb/sum(sum(cb)) ## normalize
db = applyFilterMethod2(sb, cb)
dst = Image.fromarray(db)
dst.convert("L").save("output.png")
#src.show() ; dst.show()
Note: You could probably remove all four for loops, with some additional complexity. However, because this would only eliminate the overhead of 9 iterations (in this example), I don't estimate that it would yield any additional performance gains over applyFilterMethod3. Furthermore, although I haven't attempted it, the way I imagine it would be done might add more overhead than it would remove.
FYI: This is a standard image convolution (supporting only grayscale as implemented). I always like to point out that, in order to be mathematically correct, this would need to compensate for the gamma compression that is implicit in nearly every default image encoding -- but this little detail is often ignored.
Discussion
This type of vectorization is often necessary in Python, specifically, because the standard Python interpreter is extremely inefficient at processing large for loops. Explicitly iterating over each pixel of an image, therefore, wastes a lot of time. Ultimately, though, the vectorized implementation does not change the amount of real work performed, so we're only talking about eliminating an overhead aspect of the algorithm.
However, vectorization has a side-benefit: Parallelization. Lumping a large amount of data processing onto a single operator gives the language/library more flexibility in how to optimize the execution. This might include executing your embarrassingly parallel operation on a GPU -- if you have the right tools, for example the Tensorflow image module.
Python's seamless support for array programming is one reason that it has become highly popular for use in machine learning, which can be extremely compute intensive.
Solution
Here's the solution to imgval assignment, which was left as an exercise above.
imgval = image[hheight:hheight+rows - height+1, wwidth:wwidth+cols - width +1]
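With that slice in place, the image-vectorized method can be written out in full. This is a sketch rather than a tuned implementation; the name apply_filter_vectorized and the small random test arrays are made up here, and the result is compared against a direct per-pixel sum.

```python
import numpy as np

def apply_filter_vectorized(image, weights):
    """Loop only over the kernel entries; each step adds one shifted,
    weighted view of the whole image to the output."""
    rows, cols = image.shape
    height, width = weights.shape
    out_rows, out_cols = rows - height + 1, cols - width + 1
    output = np.zeros((out_rows, out_cols))
    for h in range(height):
        for w in range(width):
            output += image[h:h + out_rows, w:w + out_cols] * weights[h, w]
    return output

rng = np.random.default_rng(1)
img = rng.random((7, 9))   # hypothetical small grayscale image
ker = rng.random((3, 3))   # hypothetical 3x3 weights

# Reference result computed window by window
reference = np.array([[np.sum(img[r:r + 3, c:c + 3] * ker) for c in range(7)]
                      for r in range(5)])
result = apply_filter_vectorized(img, ker)
print(result.shape, np.allclose(result, reference))
```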

You can construct an array of sliced views of the image, each shifted by the indices of the weights array, and then multiply it by the weights and take the sum.
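def apply_filter(image: np.array, weights: np.array) -> np.array:
    height, width = weights.shape
    out_rows = image.shape[0] - height + 1
    out_cols = image.shape[1] - width + 1
    indices = np.indices(weights.shape).T.reshape(weights.size, 2)
    # Explicit end indices keep the last output row and column,
    # which negative end indices such as -height+r would drop
    views = np.array([image[r:r+out_rows, c:c+out_cols] for r, c in indices])
    return np.inner(views.T, weights.T.flatten()).T # sum product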
(I had to transpose and reshape at several points to get the data into the desired shapes and order. There may be simpler ways.)
There is still a sneaky for loop in the form of a list comprehension over the weights indices, but we minimize the operations inside the for loop to creating a set of slice views. The loop could potentially be avoided using sliding_window_view, but it's not clear if that would improve performance; or stride_tricks.as_strided (see answers to this question).
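For reference, a loop-free variant along the lines of the sliding_window_view idea could look like the sketch below. It assumes NumPy 1.20 or later (where numpy.lib.stride_tricks.sliding_window_view is available); the function name apply_filter_windows and the test arrays are made up, and no claim is made here about which version is faster.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def apply_filter_windows(image, weights):
    # windows[i, j] is a read-only view of the patch image[i:i+h, j:j+w]
    windows = sliding_window_view(image, weights.shape)
    # Multiply every patch by the weights and sum over the patch axes
    return np.einsum("ijkl,kl->ij", windows, weights)

rng = np.random.default_rng(2)
img = rng.random((8, 6))   # hypothetical small image
ker = rng.random((3, 3))   # hypothetical weights

# Reference result computed window by window
reference = np.array([[np.sum(img[r:r + 3, c:c + 3] * ker) for c in range(4)]
                      for r in range(6)])
result = apply_filter_windows(img, ker)
print(result.shape, np.allclose(result, reference))
```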

Rotating 1D numpy array of radial intensities into 2D array of spatial intensities

I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wants me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
EDIT:
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. It's actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency if you prefer) in each radius. R is the scale factor for my radius as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
    n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
    m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
    a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
    for x in range(iRes):
        for y in range(iRes):
            if (x**2 + y**2) < (i * iRes)**2:
                if (x**2 + y**2) >= (i * iRes)**2: # Checks for radius, and puts in cartesian space
                    radJ[x+iRes,y+iRes] = (n-m) / a # Put in actual intensity bins
                    radJ[x+iRes,-y+iRes] = (n-m) / a
                    radJ[-x+iRes,y+iRes] = (n-m) / a
                    radJ[-x+iRes,-y+iRes] = (n-m) / a

Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, x in enumerate(X[0, :]):
    for k, y in enumerate(Y[:, 0]):
        current_radius = np.sqrt(x ** 2 + y ** 2)
        profilegrid[i, k] = interpol_index(current_radius)
print(profilegrid)
This will give you exactly what you are looking for. You just have to take your array and compute a symmetric array ri_data_r that has the same length as your data array and contains the distance between each data point and the middle of the array. The code does this automatically.
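If the nested loops become a bottleneck, the same idea can be expressed without them by computing the radius of every pixel at once and interpolating the profile at those radii. The sketch below uses np.interp instead of interp1d, and it assumes that the profile is sampled at integer radii 0, 1, 2, ... (in pixels) and that pixels beyond the last radius should be zero; the exponential profile is a made-up stand-in for real data.

```python
import numpy as np

profile = np.exp(-np.linspace(0.0, 3.0, 50))  # hypothetical radial intensities
radii = np.arange(profile.size)               # assumed radius of each sample, in pixels

n = profile.size
coords = np.arange(-(n - 1), n)               # pixel offsets from the centre
X, Y = np.meshgrid(coords, coords)
R = np.hypot(X, Y)                            # radius of every pixel

# Linear interpolation of the profile at each pixel radius; 0 outside the profile
image = np.interp(R, radii, profile, right=0.0)
print(image.shape)
```

The resulting array is Cartesian and can be passed straight to imshow; the centre pixel sits at index (n - 1, n - 1).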

I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of the desired order (here we use order=0, nearest-neighbor). This method is slower but more precise and needs less memory than the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between both solutions stemmed from the problem of handling the center of the final image where the tilts overlap the most, i.e. the first one would just add values with each tilt ending up out of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use the np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
    # Credit goes to skimage.transform.radon
    assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
    center = vector.size // 2
    square = np.zeros((vector.size, vector.size))
    square[center, :] = vector
    rad_angle = np.deg2rad(deg_angle)
    cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
    R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
                  [-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
                  [0, 0, 1]])
    # Approx. 80% of time is spent in this function
    return warp(square, R, clip=False, output_shape=(vector.size, vector.size))
def place_vectors(vectors, deg_angles):
    matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
    global_min, global_max = 0, 0
    for i, deg_angle in enumerate(deg_angles):
        tilt = rotate_vector(vectors[i], deg_angle)
        global_min = tilt.min() if global_min > tilt.min() else global_min
        global_max = tilt.max() if global_max < tilt.max() else global_max
        matrix += tilt
        # clip in every step so overlapping tilts stay in the original range
        matrix = np.clip(matrix, global_min, global_max)
    return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
    assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
    square = np.ones([vector.size, vector.size]) * np.nan
    radius = vector.size // 2
    r_values = np.linspace(-radius, radius, vector.size)
    rad_angle = np.deg2rad(deg_angle)
    # the np.int alias was removed in NumPy 1.24; the builtin int is equivalent
    ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(int)
    ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(int)
    ind_x = np.clip(ind_x, 0, vector.size-1)
    ind_y = np.clip(ind_y, 0, vector.size-1)
    square[ind_y, ind_x] = vector
    return square
def place_vectors(vectors, deg_angles):
    matrices = []
    for deg_angle, vector in zip(deg_angles, vectors):
        matrices.append(rotate_vector(vector, deg_angle))
    matrix = np.nanmean(np.array(matrices), axis=0)
    return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
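The sanity check calls create_circular_mask, whose code is not included. A minimal sketch of such a helper follows; the signature is inferred from the call above, and the body is an assumption rather than the author's original code.

```python
import numpy as np

def create_circular_mask(h, w, center=None, radius=None):
    """Boolean (h, w) mask that is True inside a circle (assumed helper)."""
    if center is None:
        center = (w // 2, h // 2)  # (x, y) of the image centre
    if radius is None:
        # largest circle that fits between the centre and the image borders
        radius = min(center[0], center[1], w - center[0], h - center[1])
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((xx - center[0]) ** 2 + (yy - center[1]) ** 2)
    return dist <= radius

mask = create_circular_mask(200, 200, center=None, radius=12)
print(mask.shape, mask.sum())
```

Indexing with the result, as in inplace[mask], then selects only the pixels inside the circle M.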
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
f'Output of rotate_vector(), angle={angle}°\n'
f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated, cmap=plt.cm.Greys_r)
plt.subplot(122)
plt.title(
f'Output of place_vectors(), r={r}, n={n}\n'
f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
plt.imshow(inplace)
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])

Is there Implementation of Hawkes Process in PyMC?

I want to use a Hawkes process to model some data. I could not find out whether PyMC supports Hawkes processes. More specifically, I want an observed variable that follows a Hawkes process, and I want to learn a posterior over its parameters.
If it is not supported, could I define it in PyMC in some way, e.g. with @deterministic?

It's been quite a long time since your question, but I worked it out in PyMC today, so I thought I'd share the gist of my implementation for other people who might come across the same problem. We're going to infer the parameters λ and α of a Hawkes process. I'm not going to cover the temporal scale parameter β; I'll leave that as an exercise for the reader.
First let's generate some data:
import numpy as np
def hawkes_intensity(mu, alpha, points, t):
    # intensity at time t: baseline mu plus decayed contributions of past points
    p = np.array(points)
    p = p[p <= t]
    p = np.exp(p - t)
    return mu + alpha * np.sum(p)
def simulate_hawkes(mu, alpha, window):
    t = 0
    points = []
    lambdas = []
    while t < window:
        m = hawkes_intensity(mu, alpha, points, t)
        s = np.random.exponential(scale=1/m)
        ratio = hawkes_intensity(mu, alpha, points, t + s)
        t = t + s
        if t < window:
            points.append(t)
            lambdas.append(ratio)
        else:
            break
    points = np.sort(np.array(points, dtype=np.float32))
    lambdas = np.array(lambdas, dtype=np.float32)
    return points, lambdas
# parameters
window = 1000
mu = 8
alpha = 0.25
points, lambdas = simulate_hawkes(mu, alpha, window)
num_points = len(points)
We just generated some temporal points using functions that I adapted from here: https://nbviewer.jupyter.org/github/MatthewDaws/PointProcesses/blob/master/Temporal%20points%20processes.ipynb
Now, the trick is to create a matrix of size (num_points, num_points) that contains the temporal distance of the ith point from all the other points, so the (i, j) entry of the matrix is the time interval separating the ith point from the jth. This matrix is used to compute the sum of the exponentials of the Hawkes process, i.e. the self-exciting part. The way to create this matrix, as well as the sum of the exponentials, is a bit tricky. I'd recommend checking every line yourself so you can see what each one does.
tile = np.tile(points, num_points).reshape(num_points, num_points)
tile = np.clip(points[:, None] - tile, 0, np.inf)
tile = np.tril(np.exp(-tile), k=-1)
Σ = np.sum(tile, axis=1)[:-1] # this is our self-exciting sum term
We have the points, and we have a vector Σ containing the sum of the excitation terms for each point.
The duration between two consecutive events of a Hawkes process follows an exponential distribution with parameter λ = λ0 + ∑ excitation. This is what we are going to model, but first we have to compute the duration between two consecutive points of our generated data.
interval = points[1:] - points[:-1]
We're now ready for inference:
# assumption: PyMC3, where pm.sample returns a trace indexable by variable name
import pymc3 as pm
import matplotlib.pyplot as plt
with pm.Model() as model:
    λ = pm.Exponential("λ", 1)
    α = pm.Uniform("α", 0, 1)
    lam = pm.Deterministic("lam", λ + α * Σ)
    interarrival = pm.Exponential(
        "interarrival", lam, observed=interval)
    trace = pm.sample(2000, tune=4000)
pm.plot_posterior(trace, var_names=["λ", "α"])
plt.show()
print(np.mean(trace["λ"]))
print(np.mean(trace["α"]))
7.829
0.284
Note: the tile matrix can become quite large if you have many data points.
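If the matrix does not fit in memory, the same sums can be accumulated recursively, because with an exponential kernel of unit decay rate each sum is the previous one, plus one for the previous point, decayed over the gap between the two points. The sketch below assumes sorted points; the function name excitation_sums is hypothetical.

```python
import numpy as np

def excitation_sums(points):
    """Return A with A[i] = sum over j < i of exp(-(t_i - t_j)), in O(n) memory."""
    points = np.asarray(points, dtype=float)
    sums = np.zeros(points.size)
    for i in range(1, points.size):
        # decay everything accumulated so far, plus the previous point itself
        decay = np.exp(-(points[i] - points[i - 1]))
        sums[i] = decay * (1.0 + sums[i - 1])
    return sums

rng = np.random.default_rng(0)
demo_points = np.sort(rng.uniform(0, 10, 20))
demo_sums = excitation_sums(demo_points)[:-1]  # same slicing as the matrix version
print(demo_sums.shape)
```

This gives the same values as the row sums of the lower-triangular matrix, before the final [:-1] slice, without ever storing the (num_points, num_points) array.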

High performance variable blurring in very big images using Python

I have a large collection of large images (e.g. 15000x15000 pixels) that I would like to blur. I need to blur the images using a distance function, so the further I move away from some areas in the image, the heavier the blurring should be. I have a distance map describing how far a given pixel is from those areas.
Due to the large number of images I have to consider performance. I have looked at NumPy/SciPy; they have some great functions, but they seem to use a fixed kernel size, and I need to reduce or increase the kernel size depending on the distance to the previously mentioned areas.
How can I solve this problem in Python?
UPDATE: My solution so far based on the answer by rth:
# cython: boundscheck=False
# cython: cdivision=True
# cython: wraparound=False
import numpy as np
cimport numpy as np
def variable_average(int [:, ::1] data, int[:,::1] kernel_size):
    cdef int width, height, i, j, ii, jj
    width = data.shape[1]
    height = data.shape[0]
    cdef double [:, ::1] data_blurred = np.empty([width, height])
    cdef double res
    cdef int sigma, weight
    for i in range(width):
        for j in range(height):
            weight = 0
            res = 0
            sigma = kernel_size[i, j]
            for ii in range(i - sigma, i + sigma + 1):
                for jj in range(j - sigma, j + sigma + 1):
                    if ii < 0 or ii >= width or jj < 0 or jj >= height:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res/weight
    return data_blurred
Test:
data = np.random.randint(256, size=(1024,1024))
kernel = np.random.randint(256, size=(1024,1024)) + 1
result = np.asarray(variable_average(data, kernel))
With the above settings, the method takes around 186 seconds to run. Is that the best I can expect to squeeze out of this method, or are there optimizations I can use to further increase the performance (still using Python)?

As you have noted, the related SciPy functions do not support variable-size blurring. You could implement this in pure Python with for loops, then use Cython, Numba or PyPy to get C-like performance.
Here is a low-level Python implementation that uses NumPy only for data storage,
import numpy as np
def variable_blur(data, kernel_size):
    """ Blur with a variable window size
    Parameters:
      - data: 2D ndarray of floats or integers
      - kernel_size: 2D ndarray of integers, same shape as data
    Returns:
      2D ndarray
    """
    data_blurred = np.empty(data.shape)
    Ni, Nj = data.shape
    for i in range(Ni):
        for j in range(Nj):
            res = 0.0
            weight = 0
            sigma = kernel_size[i, j]
            for ii in range(i - sigma, i+sigma+1):
                for jj in range(j - sigma, j+sigma+1):
                    # skip window positions that fall outside the image
                    if ii<0 or ii>=Ni or jj < 0 or jj >= Nj:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res/weight
    return data_blurred
data = np.random.rand(50, 20)
# the np.int alias was removed in NumPy 1.24; the builtin int is equivalent
kernel_size = 3*np.ones((50, 20), dtype=int)
variable_blur(data, kernel_size)
that calculates an arithmetic average of pixels with a variable kernel size. It is a bad implementation with respect to NumPy, in the sense that it is not vectorized. However, this makes it convenient to port to other high-performance solutions:
Cython: simply statically typing variables, and compiling should give you C-like performance,
def variable_blur(double [:, ::1] data, long [:, ::1] kernel_size):
    cdef double [:, ::1] data_blurred = np.empty(data.shape)
    cdef Py_ssize_t Ni, Nj
    Ni = data.shape[0]
    Nj = data.shape[1]
    for i in range(Ni):
        # [...] etc.
see this post for a complete example, as well as the compilation notes.
Numba: wrapping the above function with the @jit decorator should be mostly sufficient.
PyPy: installing PyPy plus the experimental numpy branch could be another alternative worth trying, although you would then have to use PyPy for all your code, which might not be possible at present.
Once you have a fast implementation, you can use multiprocessing, etc. to process different images in parallel, if need be. You could even parallelize the outer for loop with OpenMP in Cython.
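For the plain arithmetic average used above there is also a NumPy-only route that avoids the inner loops: a summed-area table (integral image) gives the sum over any rectangle from four lookups, whatever the window size. This is a different technique from the loop port described in this answer, shown only as a sketch; the name variable_blur_integral is hypothetical, and kernel_size is assumed to hold non-negative integers.

```python
import numpy as np

def variable_blur_integral(data, kernel_size):
    """Variable-window mean via a summed-area table (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    k = np.asarray(kernel_size, dtype=int)
    ni, nj = data.shape
    # sat[a, b] holds the sum of data[:a, :b]; row 0 and column 0 stay zero
    sat = np.zeros((ni + 1, nj + 1))
    sat[1:, 1:] = data.cumsum(axis=0).cumsum(axis=1)
    i, j = np.indices(data.shape)
    # half-open window bounds, clipped to the image like the loop version
    i0 = np.clip(i - k, 0, ni)
    i1 = np.clip(i + k + 1, 0, ni)
    j0 = np.clip(j - k, 0, nj)
    j1 = np.clip(j + k + 1, 0, nj)
    total = sat[i1, j1] - sat[i0, j1] - sat[i1, j0] + sat[i0, j0]
    count = (i1 - i0) * (j1 - j0)  # number of pixels actually inside the image
    return total / count

data = np.random.rand(50, 20)
kernel_size = 3 * np.ones((50, 20), dtype=int)
blurred = variable_blur_integral(data, kernel_size)
print(blurred.shape)
```

The cost per pixel no longer depends on the window size, at the price of several index arrays as large as the image; for very large images, processing in tiles would keep memory bounded.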

I came across this while googling and thought I would share my own solution which is mostly vectorized and doesn't include any for loops on pixels. You can approximate a Gaussian blur by running a box blur multiple times in a row. So the approach I decided to use is to iteratively box blur the image, but to vary the number of iterations per pixel using a weighting function.
If you need a large blur radius, the number of iterations grows quadratically, so consider increasing the ksize.
Here is the implementation
import cv2
import numpy as np
def variable_blur(im, sigma, ksize=3):
    """Blur an image with a variable Gaussian kernel.
    Parameters
    ----------
    im: numpy array, (h, w)
    sigma: numpy array, (h, w)
    ksize: int
        The box blur kernel size. Should be an odd number >= 3.
    Returns
    -------
    im_blurred: numpy array, (h, w)
    """
    variance = box_blur_variance(ksize)
    # Number of times to blur per-pixel
    num_box_blurs = 2 * sigma**2 / variance
    # Number of rounds of blurring
    max_blurs = int(np.ceil(np.max(num_box_blurs))) * 3
    # Approximate blurring a variable number of times
    blur_weight = num_box_blurs / max_blurs
    current_im = im
    for i in range(max_blurs):
        next_im = cv2.blur(current_im, (ksize, ksize))
        current_im = next_im * blur_weight + current_im * (1 - blur_weight)
    return current_im
def box_blur_variance(ksize):
    x = np.arange(ksize) - ksize // 2
    x, y = np.meshgrid(x, x)
    return np.mean(x**2 + y**2)
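The iteration count in variable_blur relies on variances adding up under repeated convolution: n box blurs behave like one blur whose variance is n times the box variance. A small 1D check of that property, as a sketch only (the helper names are hypothetical, and it uses plain NumPy instead of cv2):

```python
import numpy as np

def repeated_box_kernel(ksize, n):
    """Kernel equivalent to applying a 1D box blur of width ksize n times."""
    box = np.full(ksize, 1.0 / ksize)
    kernel = np.array([1.0])
    for _ in range(n):
        kernel = np.convolve(kernel, box)
    return kernel

def kernel_variance(kernel):
    """Variance of a normalized, symmetric kernel around its centre."""
    x = np.arange(kernel.size) - (kernel.size - 1) / 2
    return float(np.sum(kernel * x**2))

ksize, n = 3, 6
kernel = repeated_box_kernel(ksize, n)
box_var = (ksize**2 - 1) / 12  # variance of one 1D box of odd width ksize
print(kernel_variance(kernel), n * box_var)
```

For ksize=3 the 1D box variance is 2/3, while box_blur_variance(3) returns 4/3 because it sums the x and y contributions; the factor 2 in num_box_blurs compensates for that.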
And here is an example
im = np.random.rand(300, 300)
sigma = 3
# Variable
x = np.linspace(0, 1, im.shape[1])
y = np.linspace(0, 1, im.shape[0])
x, y = np.meshgrid(x, y)
sigma_arr = sigma * (x + y)
im_variable = variable_blur(im, sigma_arr)
# Gaussian
ksize = sigma * 8 + 1
im_gauss = cv2.GaussianBlur(im, (ksize, ksize), sigma)
# Gaussian replica
sigma_arr = np.full_like(im, sigma)
im_approx = variable_blur(im, sigma_arr)
Blurring results
The plot is:
Top left: Source image
Top right: Variable blurring
Bottom left: Gaussian blurring
Bottom right: Approximated Gaussian blurring
