Faster iteration on for loop with 2d arrays - python

I have an optimization problem when computing errors for disparity map estimation.
To compute the errors I created a class with a method for each error metric, and I need to iterate over every pixel to get the error.
The arrays are big because I'm iterating over 1937 x 1217 images. Do you know how to optimize this?
Here is the code of my method:
EDIT:
def mreError(self):
    s_gt = self.ref_disp_norm
    s_all = self.disp_bin
    s_r = self.disp_norm

    s_gt = s_gt.astype(np.float32)
    s_r = s_r.astype(np.float32)

    n, m = s_gt.shape
    all_arr = []
    for i in range(0, n):
        for j in range(0, m):
            if s_all[i, j] == 255:
                if s_gt[i, j] == 0:
                    sub_mre = 0
                else:
                    sub_mre = np.abs(s_gt[i, j] - s_r[i, j]) / s_gt[i, j]
                all_arr.append(sub_mre)

    mre_all = np.mean(all_arr)
    return mre_all

A straight-up vectorisation of your method would be:
def method_1(self):
    # get s_gt, s_all, s_r
    sub_mre = np.zeros(s_gt.shape, dtype=np.float32)
    idx = s_gt != 0
    sub_mre[idx] = np.abs((s_gt[idx] - s_r[idx]) / s_gt[idx])
    return np.mean(sub_mre[s_all == 255])
But since you're doing your averaging only for pixels where s_all is 255, you could also filter for those first and then do the rest
def method_2(self):
    idx = s_all == 255
    s_gt = s_gt[idx].astype(np.float32)
    s_r = s_r[idx].astype(np.float32)

    sub_mre = np.zeros_like(s_gt)
    idx = s_gt != 0
    sub_mre[idx] = np.abs((s_gt[idx] - s_r[idx]) / s_gt[idx])
    return np.mean(sub_mre)
Personally, I would favour the first method unless the second one turns out to be much faster. If the function is called only once, spending, say, 40 ms instead of 5 ms is not noticeable, and the readability of the function matters more.

You could simply use array operations instead of applying the operators to every element inside a for loop:
import numpy as np

# Creating 2000x2000 test data
s_gt = np.random.randint(0, 2, (2000, 2000)).astype(np.float32)
s_r = np.random.randint(0, 2, (2000, 2000)).astype(np.float32)
s_all = np.random.randint(0, 256, (2000, 2000)).astype(np.float32)

def calc(s_gt, s_r, s_all):
    n, m = s_gt.shape
    all_arr = []
    for i in range(0, n):
        for j in range(0, m):
            if s_gt[i, j] == 0:
                sub_mre = 0
            else:
                sub_mre = np.abs(s_gt[i, j] - s_r[i, j]) / s_gt[i, j]
            if s_all[i, j] == 255:
                all_arr.append(sub_mre)
    mre_all = np.mean(all_arr)
    return mre_all

def calc_optimized(s_gt, s_r, s_all):
    sub_mre = np.abs((s_gt - s_r) / s_gt)
    sub_mre[s_gt == 0] = 0
    return np.mean(sub_mre[s_all == 255])
When I test the speed of the two different approaches:
%time calc(s_gt, s_r, s_all)
Wall time: 27.6 s
Out[53]: 0.24686379928315413
%time calc_optimized(s_gt, s_r, s_all)
Wall time: 63.3 ms
__main__:34: RuntimeWarning: divide by zero encountered in true_divide
__main__:34: RuntimeWarning: invalid value encountered in true_divide
Out[54]: 0.2468638
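The two RuntimeWarnings come from dividing by zero before the zero entries are overwritten. If you want to avoid them, one option (a sketch, not part of the original answer) is to restrict the division to the pixels where s_gt is non-zero:

def calc_optimized_safe(s_gt, s_r, s_all):
    # divide only where s_gt != 0; the remaining entries keep the initial 0.0
    sub_mre = np.divide(np.abs(s_gt - s_r), s_gt,
                        out=np.zeros_like(s_gt), where=s_gt != 0)
    return np.mean(sub_mre[s_all == 255])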

You can also make the image greyscale first (this will speed up the calculations substantially). Go check this link for how you can do it.
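For reference, a minimal greyscale conversion in plain NumPy (my own sketch, assuming an H x W x 3 RGB array; the weights are the common ITU-R BT.601 luma coefficients):

import numpy as np

def to_grey(rgb):
    # rgb: H x W x 3 array -> H x W float32 luma image
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights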


Array won't update with new values in while-loop

I am having a problem updating my x0x1 array with the new values in my while loop. When I print the array out to check each generation of the iteration, it prints out the same values each time (despite me getting different z vector values). I'm not sure why.
I am getting the z-values for each iteration of the for loop, but for some reason my newly generated x0x1 arrays are not being updated in each generation of the while loop.
import numpy as np
import numpy.random

def calculateFunctionValueMatrix(x0x1, N):
    functionValueArray = np.zeros((N-1, 1))
    for i in range(0, N-1):
        functionValueArray[i] = calculateFunctionValue(x0x1[i])
    return functionValueArray

def calculateFunctionValue(x):
    function = (x[0]-1)**2 + 5*((x[1]-x[0]**2)**2)
    return function

def x0x1Array(N):
    x0x1 = np.random.uniform(-2, 2, size=(N-1, 2))
    generateCandidateVector(x0x1)

def generateCandidateVector(x0x1):
    print(x0x1)
    K = 10
    F = 0.8
    N = 10
    generation = 1
    while generation <= K:
        for i in range(0, N-1):
            x0_ind, x1_ind, x2_ind = np.random.choice(len(x0x1), 3)
            x0 = x0x1[x0_ind]
            x1 = x0x1[x1_ind]
            x2 = x0x1[x2_ind]
            vectorZ = x0 + F*(x1-x2)
            print("this is vector", vectorZ)
            if calculateFunctionValue(vectorZ) < calculateFunctionValue(x0x1[i]):
                vectorZ = x0x1[i]
            elif calculateFunctionValue(vectorZ) > calculateFunctionValue(x0x1[i]):
                x0x1[i] = x0x1[i]
        print(x0x1)
        if np.std(calculateFunctionValueMatrix(x0x1, N)) < 0.01:
            print("Optimal Solution Found")
        generation = generation + 1

def main():
    N = 50
    x0x1Array(N)

main()

Vectorize else-if statement function using numpy

I have an array of 3-dimensional vectors vec and I want to find a perpendicular vector res_vec to each of those vectors.
Using other methods I got some numerically unstable behaviour, so I now just check for the smallest component of each vector and set it to zero, while exchanging the two remaining components and negating one of them. However, this is not the main concern; it seems to work just right, but it is slow.
So my question is whether my code/functionality can be rewritten so that I can eliminate the for loop and vectorize it using some clever numpy tricks.
So far all my attempts at doing so have failed.
This is the code:
for i in range(10000):
    index_min = np.argsort(np.abs(vec[i]))

    if index_min[0] == 0:  # x smallest magnitude
        res_vec = np.array([0, -vec[i][2], vec[i][1]])
    elif index_min[0] == 1:  # y smallest magnitude
        res_vec = np.array([vec[i][2], 0, -vec[i][0]])
    elif index_min[0] == 2:  # z smallest magnitude
        res_vec = np.array([-vec[i][1], vec[i][0], 0])
The array vec contains data of the form (3D row-vectors):
print(vec) -->
[[ 0.57743925 0.57737595 -0.5772355 ]
[ 0.5776141 0.5777615 -0.57667464]
[ 0.5772779 0.5785899 -0.57618046]
...
[ 0.5764752 0.5781902 -0.5773842 ]
[ 0.5764985 0.578053 -0.57749826]
[ 0.5764546 0.5784942 -0.57710016]]
print(vec.ndim) -->
2
print(vec.shape) -->
(32000, 3)
As your question is about vectorizing your code, you can look at the code below, which compares your for-loop version (Timer 1, see code below) with Feri's vectorized version (Timer 2); the performance is improved significantly. I also found that using boolean indexing (Timer 3) can speed up your code even more, although the code is a little less aesthetic:
import numpy as np
import time

# Preparation of test data
R = 32000
vec = 2 * np.random.rand(R, 3) - 1

# For-loop version
t_start = time.time()
res_vec = np.zeros(vec.shape)
for i in range(R):
    index_min = np.argsort(np.abs(vec[i]))
    if index_min[0] == 0:  # x smallest magnitude
        res_vec[i, :] = np.array([0, -vec[i][2], vec[i][1]])
    elif index_min[0] == 1:  # y smallest magnitude
        res_vec[i, :] = np.array([vec[i][2], 0, -vec[i][0]])
    elif index_min[0] == 2:  # z smallest magnitude
        res_vec[i, :] = np.array([-vec[i][1], vec[i][0], 0])
print(f'Timer 1: {time.time()-t_start}s')

# Feri's formula
t_start = time.time()
res_vec2 = np.zeros(vec.shape)
index_min = np.argmin(np.abs(vec), axis=1)
res_vec2[range(R), (index_min+1) % 3] = -vec[range(R), (index_min+2) % 3]
res_vec2[range(R), (index_min+2) % 3] = vec[range(R), (index_min+1) % 3]
print(f'Timer 2: {time.time()-t_start}s')

# Boolean indexing
t_start = time.time()
res_vec3 = np.zeros(vec.shape)
index_min = np.argmin(np.abs(vec), axis=1)
res_vec3[index_min == 0, 1] = -vec[index_min == 0, 2]
res_vec3[index_min == 0, 2] = vec[index_min == 0, 1]
res_vec3[index_min == 1, 0] = vec[index_min == 1, 2]
res_vec3[index_min == 1, 2] = -vec[index_min == 1, 0]
res_vec3[index_min == 2, 0] = -vec[index_min == 2, 1]
res_vec3[index_min == 2, 1] = vec[index_min == 2, 0]
print(f'Timer 3: {time.time()-t_start}s')

print('Results 1&2 are equal' if np.linalg.norm(res_vec-res_vec2) == 0 else 'Results 1&2 differ')
print('Results 1&3 are equal' if np.linalg.norm(res_vec-res_vec3) == 0 else 'Results 1&3 differ')
Output:
% python3 script.py
Timer 1: 0.24681901931762695s
Timer 2: 0.020949125289916992s
Timer 3: 0.0034308433532714844s
Results 1&2 are equal
Results 1&3 are equal
index_min = np.argmin(np.abs(vec), axis=1)
vec_c = vec.copy()
vec[range(len(vec)), index_min] = 0.
vec[range(len(vec)), (index_min + 1) % 3] = -vec_c[range(len(vec)), (index_min + 2) % 3]
vec[range(len(vec)), (index_min + 2) % 3] = vec_c[range(len(vec)), (index_min + 1) % 3]
Sorting each entire vector is unnecessary when you only care about the index of its smallest component. Do this:
for i in range(10000):
    index_min = np.argmin(np.abs(vec[i]))

    if index_min == 0:  # x smallest magnitude
        res_vec = np.array([0, -vec[i][2], vec[i][1]])
    elif index_min == 1:  # y smallest magnitude
        res_vec = np.array([vec[i][2], 0, -vec[i][0]])
    else:
        res_vec = np.array([-vec[i][1], vec[i][0], 0])
You could improve this further by using Numba to JIT compile the loop. That would also let you avoid creating the unnecessary temporary array from np.abs() because you could write a custom argmin() that uses the absolute value of each element as it goes.
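A minimal sketch of that idea (my own illustration, not code from the answer): a Numba-jitted loop with a hand-rolled argmin over absolute values, so no temporary array from np.abs() is allocated:

import numpy as np
from numba import njit

@njit
def perpendicular_vectors(vec):
    out = np.empty_like(vec)
    for i in range(vec.shape[0]):
        # argmin of |component| without building an abs() temporary
        idx = 0
        best = abs(vec[i, 0])
        for j in range(1, 3):
            a = abs(vec[i, j])
            if a < best:
                best, idx = a, j
        if idx == 0:
            out[i, 0], out[i, 1], out[i, 2] = 0.0, -vec[i, 2], vec[i, 1]
        elif idx == 1:
            out[i, 0], out[i, 1], out[i, 2] = vec[i, 2], 0.0, -vec[i, 0]
        else:
            out[i, 0], out[i, 1], out[i, 2] = -vec[i, 1], vec[i, 0], 0.0
    return out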
You can also avoid the temporaries produced by the unary - operator if you do this:
for i in range(10000):
    index_min = np.argmin(np.abs(vec[i]))
    res_vec = np.empty_like(vec[i])
    if index_min == 0:  # x smallest magnitude
        res_vec[0] = 0
        # out= needs a writable array, so use length-1 slices rather than scalar elements
        np.negative(vec[i][2:3], out=res_vec[1:2])
        res_vec[2] = vec[i][1]
    # etc
The idea being that np.negative will write the negated values directly into res_vec whereas - on its own will always produce a new allocated array that you don't need.
Although you say it's not the main issue, I thought I'd add this in case it is of interest.
A method I've found to have good stability to find a (unit) vector orthogonal to a given (non-zero) vector is to use Householder reflectors. These are orthogonal and symmetric (hence their own inverses) matrices defined by a non-zero vector h as
Q = I - 2*h*h'/(h'*h)
Given a non-zero vector v there is an algorithm to compute (the h defining) a Householder reflector Q that maps v to a multiple of (1,0,0)'. It follows that Q*(0,1,0)' is orthogonal to v.
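A quick numerical check of that claim (my own sketch): build h from v as in the routine below, form Q explicitly, and verify that Q*v is a multiple of (1,0,0)' and that Q*(0,1,0)' is orthogonal to v:

import numpy as np

v = np.array([0.3, -1.2, 0.7])
s = np.copysign(np.linalg.norm(v), v[0])
h = v + s * np.array([1.0, 0.0, 0.0])            # Householder vector mapping v to -s*(1,0,0)'
Q = np.eye(3) - 2.0 * np.outer(h, h) / (h @ h)   # reflector: orthogonal and symmetric

print(Q @ v)               # ~ [-s, 0, 0], i.e. a multiple of (1, 0, 0)'
print(np.dot(Q[:, 1], v))  # ~ 0, so Q*(0,1,0)' is orthogonal to v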
In case this sounds expensive, here is C code (sorry, I don't speak Python) that, given v, fills u with a vector orthogonal to v:
static void ovec( const double* v, double* restrict u)
{
    double lv = sqrt( v[0]*v[0] + v[1]*v[1] + v[2]*v[2]); // length of v
    double s = copysign ( lv, v[0]);  // s has abs value lv, sign of v[0]
    double h = v[0] + s;              // first component of householder vector for Q
                                      // other components are v[1] and v[2]
    double a = -1.0/(s*h);            // householder scale
    // apply reflector to (0,1,0)'
    double b = a*v[1];
    u[0] = b*h; u[1] = 1.0 + b*v[1]; u[2] = b*v[2];
}
A couple of things I like about this are that the same method can be used in higher dimensions, and that it is easy to extend it to make an orthogonal basis, where one vector is parallel to v, and the others are mutually orthogonal and orthogonal to v.
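For completeness, a direct NumPy translation of the C routine above (my own port, vectorised over an N x 3 array of non-zero vectors):

import numpy as np

def ovec(v):
    # v: (N, 3) array of non-zero vectors; returns (N, 3) vectors orthogonal to them
    lv = np.linalg.norm(v, axis=1)   # length of each v
    s = np.copysign(lv, v[:, 0])     # |s| = lv, sign of v[:, 0]
    h = v[:, 0] + s                  # first component of the Householder vector
    a = -1.0 / (s * h)               # Householder scale
    b = a * v[:, 1]                  # apply the reflector to (0, 1, 0)'
    return np.stack([b * h, 1.0 + b * v[:, 1], b * v[:, 2]], axis=1)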

How can I determine whether or not a point is enclosed in a n-gon in matplotlib? [duplicate]

I found two main methods to check whether a point lies inside a polygon. One is the ray-tracing method used here, which is the most recommended answer; the other is using matplotlib's path.contains_points (which seems a bit obscure to me). I will have to check lots of points continuously. Does anybody know if either of these two is more recommendable than the other, or if there are even better third options?
UPDATE:
I checked the two methods and matplotlib looks much faster.
from time import time
import numpy as np
import matplotlib.path as mpltPath

# regular polygon for testing
lenpoly = 100
polygon = [[np.sin(x)+0.5, np.cos(x)+0.5] for x in np.linspace(0, 2*np.pi, lenpoly)[:-1]]

# random set of points to test
N = 10000
points = np.random.rand(N, 2)

# Ray tracing
def ray_tracing_method(x, y, poly):
    n = len(poly)
    inside = False
    p1x, p1y = poly[0]
    for i in range(n+1):
        p2x, p2y = poly[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside

start_time = time()
inside1 = [ray_tracing_method(point[0], point[1], polygon) for point in points]
print("Ray Tracing Elapsed time: " + str(time()-start_time))

# Matplotlib mplPath
start_time = time()
path = mpltPath.Path(polygon)
inside2 = path.contains_points(points)
print("Matplotlib contains_points Elapsed time: " + str(time()-start_time))
which gives,
Ray Tracing Elapsed time: 0.441395998001
Matplotlib contains_points Elapsed time: 0.00994491577148
The same relative difference was obtained using a triangle instead of the 100-sided polygon. I will also check shapely, since it looks like a package devoted to exactly this kind of problem.
You can consider shapely:
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
point = Point(0.5, 0.5)
polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
print(polygon.contains(point))
From the methods you've mentioned I've only used the second, path.contains_points, and it works fine. In any case, depending on the precision you need for your test, I would suggest creating a numpy bool grid with all nodes inside the polygon set to True (False if not). If you are going to test a lot of points this might be faster (although notice this relies on making the test within a "pixel" tolerance):
from matplotlib import path
import matplotlib.pyplot as plt
import numpy as np
first = -3
size = (3-first)/100
xv,yv = np.meshgrid(np.linspace(-3,3,100),np.linspace(-3,3,100))
p = path.Path([(0,0), (0, 1), (1, 1), (1, 0)]) # square with legs length 1 and bottom left corner at the origin
flags = p.contains_points(np.hstack((xv.flatten()[:,np.newaxis],yv.flatten()[:,np.newaxis])))
grid = np.zeros((101,101),dtype='bool')
grid[((xv.flatten()-first)/size).astype('int'),((yv.flatten()-first)/size).astype('int')] = flags
xi,yi = np.random.randint(-300,300,100)/100,np.random.randint(-300,300,100)/100
vflag = grid[((xi-first)/size).astype('int'),((yi-first)/size).astype('int')]
plt.imshow(grid.T,origin='lower',interpolation='nearest',cmap='binary')
plt.scatter(((xi-first)/size).astype('int'),((yi-first)/size).astype('int'),c=vflag,cmap='Greens',s=90)
plt.show()
The result is a plot of the boolean grid with the test points coloured by their inside/outside flag.
If speed is what you need and extra dependencies are not a problem, you may find numba quite useful (it is now pretty easy to install, on any platform). The classic ray_tracing approach you proposed can easily be ported to numba by using the numba @jit decorator and casting the polygon to a numpy array. The code should look like:
from numba import jit

@jit(nopython=True)
def ray_tracing(x, y, poly):
    n = len(poly)
    inside = False
    p2x = 0.0
    p2y = 0.0
    xints = 0.0
    p1x, p1y = poly[0]
    for i in range(n+1):
        p2x, p2y = poly[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside
The first execution will take a little longer than any subsequent call:
%%time
polygon=np.array(polygon)
inside1 = [numba_ray_tracing_method(point[0], point[1], polygon) for point in points]
CPU times: user 129 ms, sys: 4.08 ms, total: 133 ms
Wall time: 132 ms
Which, after compilation will decrease to:
CPU times: user 18.7 ms, sys: 320 µs, total: 19.1 ms
Wall time: 18.4 ms
If you need speed at the first call of the function you can then pre-compile the code in a module using pycc. Store the function in a src.py like:
from numba import jit
from numba.pycc import CC

cc = CC('nbspatial')

@cc.export('ray_tracing', 'b1(f8, f8, f8[:,:])')
@jit(nopython=True)
def ray_tracing(x, y, poly):
    n = len(poly)
    inside = False
    p2x = 0.0
    p2y = 0.0
    xints = 0.0
    p1x, p1y = poly[0]
    for i in range(n+1):
        p2x, p2y = poly[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside

if __name__ == "__main__":
    cc.compile()
Build it with python src.py and run:
import nbspatial
import numpy as np
lenpoly = 100
polygon = [[np.sin(x)+0.5, np.cos(x)+0.5] for x in np.linspace(0, 2*np.pi, lenpoly)[:-1]]
# random points set of points to test
N = 10000
# making a list instead of a generator to help debug
points = zip(np.random.random(N),np.random.random(N))
polygon = np.array(polygon)
%%time
result = [nbspatial.ray_tracing(point[0], point[1], polygon) for point in points]
CPU times: user 20.7 ms, sys: 64 µs, total: 20.8 ms
Wall time: 19.9 ms
In the numba code I used:
'b1(f8, f8, f8[:,:])'
In order to compile with nopython=True, each var needs to be declared before the for loop.
In the prebuild src code the line:
@cc.export('ray_tracing', 'b1(f8, f8, f8[:,:])')
is used to declare the function name and its I/O var types: a boolean output b1, and two floats f8 plus a two-dimensional array of floats f8[:,:] as inputs.
Edit Jan/4/2021
For my use case, I need to check if multiple points are inside a single polygon - In such a context, it is useful to take advantage of numba parallel capabilities to loop over a series of points. The example above can be changed to:
from numba import jit, njit
import numba
import numpy as np

@jit(nopython=True)
def pointinpolygon(x, y, poly):
    n = len(poly)
    inside = False
    p2x = 0.0
    p2y = 0.0
    xints = 0.0
    p1x, p1y = poly[0]
    for i in numba.prange(n+1):
        p2x, p2y = poly[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside

@njit(parallel=True)
def parallelpointinpolygon(points, polygon):
    D = np.empty(len(points), dtype=numba.boolean)
    for i in numba.prange(0, len(D)):
        D[i] = pointinpolygon(points[i, 0], points[i, 1], polygon)
    return D
Note: pre-compiling the above code will not enable the parallel capabilities of numba (parallel CPU target is not supported by pycc/AOT compilation) see: https://github.com/numba/numba/issues/3336
Test:
import numpy as np
lenpoly = 100
polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)[:-1]]
polygon = np.array(polygon)
N = 10000
points = np.random.uniform(-1.5, 1.5, size=(N, 2))
For N=10000 on a 72 core machine, returns:
%%timeit
parallelpointinpolygon(points, polygon)
# 480 µs ± 8.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Edit 17 Feb '21:
fixing loop to start from 0 instead of 1 (thanks @mehdi):
for i in numba.prange(0, len(D))
Edit 20 Feb '21:
Following up on the comparison made by @mehdi, I am adding a GPU-based method below. It uses the point_in_polygon method from the cuspatial library:
import numpy as np
import cudf
import cuspatial

N = 100000002
lenpoly = 1000
polygon = [[np.sin(x)+0.5, np.cos(x)+0.5] for x in np.linspace(0, 2*np.pi, lenpoly)]
polygon = np.array(polygon)
points = np.random.uniform(-1.5, 1.5, size=(N, 2))

x_pnt = points[:, 0]
y_pnt = points[:, 1]
x_poly = polygon[:, 0]
y_poly = polygon[:, 1]

result = cuspatial.point_in_polygon(
    x_pnt,
    y_pnt,
    cudf.Series([0], index=['geom']),
    cudf.Series([0], name='r_pos', dtype='int32'),
    x_poly,
    y_poly,
)
Following @Mehdi's comparison, for N=100000002 and lenpoly=1000 I got the following results:
time_parallelpointinpolygon: 161.54760098457336
time_mpltPath: 307.1664695739746
time_ray_tracing_numpy_numba: 353.07356882095337
time_is_inside_sm_parallel: 37.45389246940613
time_is_inside_postgis_parallel: 127.13793849945068
time_is_inside_rapids: 4.246025562286377
hardware specs:
CPU Intel xeon E1240
GPU Nvidia GTX 1070
Notes:
The cuspatial.point_in_polygon method is quite robust and powerful; it offers the ability to work with multiple and complex polygons (I guess at the expense of performance).
The numba methods can also be 'ported' to the GPU - it will be interesting to see a comparison that includes a port to CUDA of the fastest method mentioned by @Mehdi (is_inside_sm).
Your test is good, but it measures only one specific situation:
we have one polygon with many vertices, and a long array of points to check against it.
Moreover, I suppose that you're measuring not
matplotlib-inside-polygon-method vs ray-method,
but
matplotlib-somehow-optimized-iteration vs simple-list-iteration
Let's make N independent comparisons (N pairs of point and polygon)?
# ... your code...

lenpoly = 100
polygon = [[np.sin(x)+0.5, np.cos(x)+0.5] for x in np.linspace(0, 2*np.pi, lenpoly)[:-1]]

M = 10000

start_time = time()
# Ray tracing
for i in range(M):
    x, y = np.random.random(), np.random.random()
    inside1 = ray_tracing_method(x, y, polygon)
print("Ray Tracing Elapsed time: " + str(time()-start_time))

# Matplotlib mplPath
start_time = time()
for i in range(M):
    x, y = np.random.random(), np.random.random()
    inside2 = path.contains_points([[x, y]])
print("Matplotlib contains_points Elapsed time: " + str(time()-start_time))
Result:
Ray Tracing Elapsed time: 0.548588991165
Matplotlib contains_points Elapsed time: 0.103765010834
Matplotlib is still much better, but not 100 times better.
Now let's try a much simpler polygon...
lenpoly = 5
# ... same code
result:
Ray Tracing Elapsed time: 0.0727779865265
Matplotlib contains_points Elapsed time: 0.105288982391
Comparison of different methods
I found other methods to check whether a point is inside a polygon (here). I tested only two of them (is_inside_sm and is_inside_postgis), and the results were the same as for the other methods.
Thanks to @epifanio, I parallelized the codes and compared them with @epifanio's and @user3274748's (ray_tracing_numpy) methods. Note that both methods had a bug, so I fixed them as shown in their code below.
One more thing that I found is that the code provided for creating a polygon does not generate a closed path, np.linspace(0,2*np.pi,lenpoly)[:-1]. As a result, the codes provided in the above GitHub repository may not work properly. So it's better to create a closed path (the first and last points should be the same), as in the snippet below.
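For example, a closed version of the test polygon used throughout this thread (simply keep the endpoint of linspace so the last vertex equals the first):

import numpy as np

lenpoly = 100
# keep the endpoint of linspace -> the last vertex equals the first, i.e. a closed path
polygon = np.array([[np.sin(x)+0.5, np.cos(x)+0.5] for x in np.linspace(0, 2*np.pi, lenpoly)])
assert np.allclose(polygon[0], polygon[-1])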
Codes
Method 1: parallelpointinpolygon
from numba import jit, njit
import numba
import numpy as np

@jit(nopython=True)
def pointinpolygon(x, y, poly):
    n = len(poly)
    inside = False
    p2x = 0.0
    p2y = 0.0
    xints = 0.0
    p1x, p1y = poly[0]
    for i in numba.prange(n+1):
        p2x, p2y = poly[i % n]
        if y > min(p1y, p2y):
            if y <= max(p1y, p2y):
                if x <= max(p1x, p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x, p1y = p2x, p2y
    return inside

@njit(parallel=True)
def parallelpointinpolygon(points, polygon):
    D = np.empty(len(points), dtype=numba.boolean)
    for i in numba.prange(0, len(D)):  # <-- Fixed here, must start from zero
        D[i] = pointinpolygon(points[i, 0], points[i, 1], polygon)
    return D
Method 2: ray_tracing_numpy_numba
@jit(nopython=True)
def ray_tracing_numpy_numba(points, poly):
    x, y = points[:, 0], points[:, 1]
    n = len(poly)
    inside = np.zeros(len(x), np.bool_)
    p2x = 0.0
    p2y = 0.0
    p1x, p1y = poly[0]
    for i in range(n+1):
        p2x, p2y = poly[i % n]
        idx = np.nonzero((y > min(p1y, p2y)) & (y <= max(p1y, p2y)) & (x <= max(p1x, p2x)))[0]
        if len(idx):  # <-- Fixed here. If idx is empty, skip the comparisons below.
            if p1y != p2y:
                xints = (y[idx]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
            if p1x == p2x:
                inside[idx] = ~inside[idx]
            else:
                idxx = idx[x[idx] <= xints]
                inside[idxx] = ~inside[idxx]
        p1x, p1y = p2x, p2y
    return inside
Method 3: Matplotlib contains_points
path = mpltPath.Path(polygon, closed=True)  # <-- Very important to mention that the path
                                            #     is closed (the default is False)
Method 4: is_inside_sm (got it from here)
@jit(nopython=True)
def is_inside_sm(polygon, point):
    length = len(polygon)-1
    dy2 = point[1] - polygon[0][1]
    intersections = 0
    ii = 0
    jj = 1

    while ii < length:
        dy = dy2
        dy2 = point[1] - polygon[jj][1]

        # consider only lines which are not completely above/below/right from the point
        if dy*dy2 <= 0.0 and (point[0] >= polygon[ii][0] or point[0] >= polygon[jj][0]):
            # non-horizontal line
            if dy < 0 or dy2 < 0:
                F = dy*(polygon[jj][0] - polygon[ii][0])/(dy-dy2) + polygon[ii][0]
                if point[0] > F:  # if line is left from the point, the ray moving towards left will intersect it
                    intersections += 1
                elif point[0] == F:  # point on line
                    return 2
            # point on upper peak (dy2=dx2=0) or horizontal line (dy=dy2=0 and dx*dx2<=0)
            elif dy2 == 0 and (point[0] == polygon[jj][0] or (dy == 0 and (point[0]-polygon[ii][0])*(point[0]-polygon[jj][0]) <= 0)):
                return 2

        ii = jj
        jj += 1

    # print('intersections =', intersections)
    return intersections & 1

@njit(parallel=True)
def is_inside_sm_parallel(points, polygon):
    ln = len(points)
    D = np.empty(ln, dtype=numba.boolean)
    for i in numba.prange(ln):
        D[i] = is_inside_sm(polygon, points[i])
    return D
Method 5: is_inside_postgis (got it from here)
@jit(nopython=True)
def is_inside_postgis(polygon, point):
    length = len(polygon)
    intersections = 0

    dx2 = point[0] - polygon[0][0]
    dy2 = point[1] - polygon[0][1]
    ii = 0
    jj = 1

    while jj < length:
        dx = dx2
        dy = dy2
        dx2 = point[0] - polygon[jj][0]
        dy2 = point[1] - polygon[jj][1]
        F = (dx-dx2)*dy - dx*(dy-dy2)
        if 0.0 == F and dx*dx2 <= 0 and dy*dy2 <= 0:
            return 2
        if (dy >= 0 and dy2 < 0) or (dy2 >= 0 and dy < 0):
            if F > 0:
                intersections += 1
            elif F < 0:
                intersections -= 1
        ii = jj
        jj += 1

    # print('intersections =', intersections)
    return intersections != 0

@njit(parallel=True)
def is_inside_postgis_parallel(points, polygon):
    ln = len(points)
    D = np.empty(ln, dtype=numba.boolean)
    for i in numba.prange(ln):
        D[i] = is_inside_postgis(polygon, points[i])
    return D
Benchmark
Timing for 10 million points:
parallelpointinpolygon Elapsed time: 4.0122294425964355
Matplotlib contains_points Elapsed time: 14.117807388305664
ray_tracing_numpy_numba Elapsed time: 7.908452272415161
sm_parallel Elapsed time: 0.7710440158843994
is_inside_postgis_parallel Elapsed time: 2.131121873855591
Here is the code.
import matplotlib.pyplot as plt
import matplotlib.path as mpltPath
from time import time
import numpy as np

np.random.seed(2)

time_parallelpointinpolygon = []
time_mpltPath = []
time_ray_tracing_numpy_numba = []
time_is_inside_sm_parallel = []
time_is_inside_postgis_parallel = []
n_points = []

for i in range(1, 10000002, 1000000):
    n_points.append(i)

    lenpoly = 100
    polygon = [[np.sin(x)+0.5, np.cos(x)+0.5] for x in np.linspace(0, 2*np.pi, lenpoly)]
    polygon = np.array(polygon)
    N = i
    points = np.random.uniform(-1.5, 1.5, size=(N, 2))

    # Method 1
    start_time = time()
    inside1 = parallelpointinpolygon(points, polygon)
    time_parallelpointinpolygon.append(time()-start_time)

    # Method 2
    start_time = time()
    path = mpltPath.Path(polygon, closed=True)
    inside2 = path.contains_points(points)
    time_mpltPath.append(time()-start_time)

    # Method 3
    start_time = time()
    inside3 = ray_tracing_numpy_numba(points, polygon)
    time_ray_tracing_numpy_numba.append(time()-start_time)

    # Method 4
    start_time = time()
    inside4 = is_inside_sm_parallel(points, polygon)
    time_is_inside_sm_parallel.append(time()-start_time)

    # Method 5
    start_time = time()
    inside5 = is_inside_postgis_parallel(points, polygon)
    time_is_inside_postgis_parallel.append(time()-start_time)

plt.plot(n_points, time_parallelpointinpolygon, label='parallelpointinpolygon')
plt.plot(n_points, time_mpltPath, label='mpltPath')
plt.plot(n_points, time_ray_tracing_numpy_numba, label='ray_tracing_numpy_numba')
plt.plot(n_points, time_is_inside_sm_parallel, label='is_inside_sm_parallel')
plt.plot(n_points, time_is_inside_postgis_parallel, label='is_inside_postgis_parallel')
plt.xlabel("N points")
plt.ylabel("time (sec)")
plt.legend(loc='best')
plt.show()
CONCLUSION
The fastest algorithms are:
1- is_inside_sm_parallel
2- is_inside_postgis_parallel
3- parallelpointinpolygon (@epifanio)
I will just leave this here; I rewrote the code above using numpy, maybe somebody finds it useful:
def ray_tracing_numpy(x, y, poly):
    n = len(poly)
    inside = np.zeros(len(x), np.bool_)
    p2x = 0.0
    p2y = 0.0
    xints = 0.0
    p1x, p1y = poly[0]
    for i in range(n+1):
        p2x, p2y = poly[i % n]
        idx = np.nonzero((y > min(p1y, p2y)) & (y <= max(p1y, p2y)) & (x <= max(p1x, p2x)))[0]
        if p1y != p2y:
            xints = (y[idx]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
        if p1x == p2x:
            inside[idx] = ~inside[idx]
        else:
            idxx = idx[x[idx] <= xints]
            inside[idxx] = ~inside[idxx]
        p1x, p1y = p2x, p2y
    return inside
Wrapped ray_tracing into
def ray_tracing_mult(x, y, poly):
    return [ray_tracing(xi, yi, poly[:-1, :]) for xi, yi in zip(x, y)]
Tested on 100000 points, results:
ray_tracing_mult 0:00:00.850656
ray_tracing_numpy 0:00:00.003769
Pure numpy vectorized implementation of the even-odd rule.
The other answers are either a slow Python loop or require external dependencies or Cython treatment.
import numpy as np

def points_in_polygon(polygon, pts):
    pts = np.asarray(pts, dtype='float32')
    polygon = np.asarray(polygon, dtype='float32')
    contour2 = np.vstack((polygon[1:], polygon[:1]))
    test_diff = contour2 - polygon
    mask1 = (pts[:, None] == polygon).all(-1).any(-1)
    m1 = (polygon[:, 1] > pts[:, None, 1]) != (contour2[:, 1] > pts[:, None, 1])
    slope = ((pts[:, None, 0] - polygon[:, 0]) * test_diff[:, 1]) - (test_diff[:, 0] * (pts[:, None, 1] - polygon[:, 1]))
    m2 = slope == 0
    mask2 = (m1 & m2).any(-1)
    m3 = (slope < 0) != (contour2[:, 1] < polygon[:, 1])
    m4 = m1 & m3
    count = np.count_nonzero(m4, axis=-1)
    mask3 = ~(count % 2 == 0)
    mask = mask1 | mask2 | mask3
    return mask
N = 1000000
lenpoly = 1000
polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)]
polygon = np.array(polygon,dtype='float32')
points = np.random.uniform(-1.5, 1.5, size=(N, 2)).astype('float32')
mask = points_in_polygon(polygon, points)
1 million points with a polygon of size 1000 took 44 s.
It's orders of magnitude slower than the other implementations, but still faster than the Python loop, and it only uses numpy.
inpoly is the gold standard for doing in-polygon checks in Python, and can handle huge queries:
https://github.com/dengwirda/inpoly-python
simple usage:
from inpoly import inpoly2
import numpy as np

xmin, xmax, ymin, ymax = 0, 1, 0, 1
x0, y0, x1, y1 = 0.5, 0.5, 0, 1

# define any n-sided polygon
p = np.array([[xmin, ymin],
              [xmax, ymin],
              [xmax, ymax],
              [xmin, ymax],
              [xmin, ymin]])

# define some coords
coords = np.array([[x0, y0],
                   [x1, y1]])

# get boolean mask for points if in or on polygon perimeter
isin, ison = inpoly2(coords, p)
the C implementation in the backend is lightning fast

Vectorizing for loop with repeated indices in python

I am trying to optimize a snippet that gets called a lot (millions of times) so any type of speed improvement (hopefully removing the for-loop) would be great.
I am computing a correlation function of some j'th particle with all others
C_j(|r-r'|) = sqrt(E((s_j(r')-s_k(r))^2)) averaged over k.
My idea is to have a variable corrfun which bins data into some bins (the r, defined elsewhere). I find which bin of r each s_k belongs to, and this is stored in ind. So ind[0] is the index of r (and thus of corrfun) that the first point corresponds to. Multiple points can fall into the same bin (in fact I want the bins to be big enough to contain multiple points), so I sum together all of the (s_j(r')-s_k(r))^2 and then divide by the number of points in that bin (stored in the variable rw). The code I ended up with for this is the following (np is numpy):
for k, v in enumerate(ind):
    if j == k:
        continue
    corrfun[v] += (s[k]-s[j])**2
    rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))
Note, the rw2 business is because I want to avoid divide-by-zero problems, but I do return the rw array and I want to be able to differentiate between the rw=0 and rw=1 elements. Perhaps there is a more elegant solution for this as well.
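(As an aside, a sketch of one such alternative: np.divide with a where mask does the division only for the non-empty bins and leaves rw itself untouched; note that rw2 = rw above does not actually copy the array, so rw gets modified too.)

import numpy as np

# hypothetical small example; corrfun and rw play the same roles as in the question
corrfun = np.array([4.0, 0.0, 9.0])
rw = np.array([2, 0, 1])

# divide only where rw > 0, leave the empty bins at 0, and keep rw unchanged
corrfun = np.sqrt(np.divide(corrfun, rw, out=np.zeros_like(corrfun), where=rw > 0))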
Is there a way to make the for-loop faster? While I would like to not add the self interaction (j==k) I am even ok with having self interaction if it means I can get significantly faster calculation (length of ind ~ 1E6 so self interaction is probably insignificant anyways).
Thank you!
Ilya
Edit:
Here is the full code. Note, in the full code I am averaging over j as well.
import numpy as np

def twopointcorr(x, y, s, dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)
    print(r)
    corrfun = r*0
    rw = r*0
    print(maxR)
    # go through all points
    for j in range(0, n-1):
        hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
        ind = [np.abs(r-h).argmin() for h in hypot]
        for k, v in enumerate(ind):
            if j == k:
                continue
            corrfun[v] += (s[k]-s[j])**2
            rw[v] += 1
    rw2 = rw
    rw2[rw < 1] = 1
    corrfun = np.sqrt(np.divide(corrfun, rw2))
    return r, corrfun, rw
I debug-test it the following way:
from twopointcorr import twopointcorr
import numpy as np
import matplotlib.pyplot as plt
import time
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
print('running two point corr functinon')
start_time = time.time()
r,corrfun,rw = twopointcorr(x,y,s,0.1)
print("--- Execution time is %s seconds ---" % (time.time() - start_time))
fig1=plt.figure()
plt.plot(r, corrfun,'-x')
fig2=plt.figure()
plt.plot(r, rw,'-x')
plt.show()
Again, the main issue is that in the real dataset n~1E6. I can resample to make it smaller, of course, but I would love to actually crank through the dataset.
Here is code that uses broadcasting, hypot, round, and bincount to remove all the loops:
def twopointcorr2(x, y, s, dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)

    osub = lambda x: np.subtract.outer(x, x)

    ind = np.clip(np.round(np.hypot(osub(x), osub(y)) / dr), 0, len(r)-1).astype(int)
    rw = np.bincount(ind.ravel())
    rw[0] -= len(x)
    corrfun = np.bincount(ind.ravel(), (osub(s)**2).ravel())
    return r, corrfun, rw
to compare, I modified your code as follows:
def twopointcorr(x, y, s, dr):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR, dr)
    corrfun = r*0
    rw = r*0
    for j in range(0, n):
        hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
        ind = [np.abs(r-h).argmin() for h in hypot]
        for k, v in enumerate(ind):
            if j == k:
                continue
            corrfun[v] += (s[k]-s[j])**2
            rw[v] += 1
    return r, corrfun, rw
and here is the code to check the results:
import numpy as np
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
r1, corrfun1, rw1 = twopointcorr(x,y,s,0.1)
r2, corrfun2, rw2 = twopointcorr2(x,y,s,0.1)
assert np.allclose(r1, r2)
assert np.allclose(corrfun1, corrfun2)
assert np.allclose(rw1, rw2)
and the %timeit results:
%timeit twopointcorr(x,y,s,0.1)
%timeit twopointcorr2(x,y,s,0.1)
outputs:
1 loop, best of 3: 5.16 s per loop
10 loops, best of 3: 134 ms per loop
Your original code on my system runs in about 5.7 seconds. I fully vectorized the inner loop and got it to run in 0.39 seconds. Simply replace your "go through all points" loop with this:
import scipy.spatial.distance

points = np.column_stack((x, y))
hypots = scipy.spatial.distance.cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(int)

# go through all points
for j in range(n):  # n.b. previously n-1, not sure why
    ind = inds[j]
    np.add.at(corrfun, ind, (s - s[j])**2)
    np.add.at(rw, ind, 1)
    rw[ind[j]] -= 1  # subtract self
The first observation was that your hypot code was computing 2D distances, so I replaced that with cdist from SciPy to do it all in a single call. The second was that the inner for loop was slow, and thanks to an insightful comment from @hpaulj I vectorized that as well using np.add.at().
Since you asked how to vectorize the inner loop as well, I did that later. It now takes 0.25 seconds to run, for a total speedup of over 20x. Here's the final code:
points = np.column_stack((x, y))
hypots = scipy.spatial.distance.cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(int)

sn = np.tile(s, (n, 1))   # n copies of s
diffs = (sn - sn.T)**2    # squares of pairwise differences
np.add.at(corrfun, inds, diffs)
rw = np.bincount(inds.flatten(), minlength=len(r))
np.subtract.at(rw, inds.diagonal(), 1)  # subtract self
This uses more memory but does produce a substantial speedup vs. the single-loop version above.
OK, so as it turns out, outer products are incredibly memory expensive; however, using the answers from @HYRY and @JohnZwinck I was able to make code that is still roughly linear in n in memory and computes fast (0.5 seconds for the test case):
import numpy as np

def twopointcorr(x, y, s, dr, maxR=-1):
    width = np.max(x)-np.min(x)
    height = np.max(y)-np.min(y)
    n = len(x)
    if maxR < dr:
        maxR = np.sqrt((width/2)**2 + (height/2)**2)
    r = np.arange(0, maxR+dr, dr)

    corrfun = r*0
    rw = r*0
    for j in range(0, n):
        ind = np.clip(np.round(np.hypot(x[j]-x, y[j]-y) / dr), 0, len(r)-1).astype(int)
        np.add.at(corrfun, ind, (s - s[j])**2)
        np.add.at(rw, ind, 1)
    rw[0] -= n
    corrfun = np.sqrt(np.divide(corrfun, np.maximum(rw, 1)))
    r = np.delete(r, -1)
    rw = np.delete(rw, -1)
    corrfun = np.delete(corrfun, -1)
    return r, corrfun, rw

Python, how to optimize this code

I tried to optimize the code below but I cannot figure out how to improve the computation speed. I tried Cython but the performance was the same as in Python.
Is it possible to improve the performance without rewriting everything in C/C++?
Thanks for any help
import numpy as np

heightSequence = 400
widthSequence = 400
nHeights = 80

DOF = np.zeros((heightSequence, widthSequence), dtype=np.float64)
contrast = np.float64(np.random.rand(heightSequence, widthSequence, nHeights))

initDOF = np.zeros([heightSequence, widthSequence], dtype=np.float64)
initContrast = np.zeros([heightSequence, widthSequence, nHeights], dtype=np.float64)
initHeight = np.float64(np.r_[0:nHeights:1.0])
initPixelContrast = np.array(([0 for ii in range(nHeights)]), dtype=np.float64)

# for each row
for row in range(heightSequence):
    # for each col
    for col in range(widthSequence):
        # initialize variables
        height = initHeight  # array ndim = 1
        c = initPixelContrast  # array ndim = 1

        # for each height
        for indexHeight in range(0, nHeights):
            # get contrast profile for current pixel
            tempC = contrast[:, :, indexHeight]
            c[indexHeight] = tempC[row, col]

        # save original contrast
        # originalC = c
        # originalHeight = height

        # remove profile before maximum and after minimum contrast
        idxMaxContrast = np.argmax(c)
        c = c[idxMaxContrast:]
        height = height[idxMaxContrast:]
        idxMinContrast = np.argmin(c) + 1
        c = c[0:idxMinContrast]
        height = height[0:idxMinContrast]

        # remove some refraction
        if (len(c) <= 1) | (np.max(c) <= 0):
            DOF[row, col] = 0
        else:
            # linear fitting of profile contrast
            P = np.polyfit(height, c, 1)
            m = P[0]
            q = P[1]

            # remove some refraction
            if m >= 0:
                DOF[row, col] = 0
            else:
                DOF[row, col] = -q / m

    print('row=%i/%i' % (row, heightSequence))

# set range of DOF
DOF[DOF < 0] = 0
DOF[DOF > nHeights] = 0
By looking at the code it seems that you can get rid of the two outer loops completely, converting the code to a vectorised form. However, the np.polyfit call must then be replaced by some other expression, but the coefficients for a linear fit are easy to find, also in vectorised form. The last if/else can then be turned into an np.where call.
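A rough sketch of that idea (my own illustration, not the answer's code; it fits a line per pixel over the full height profile, so it skips the argmax/argmin trimming step, which is harder to vectorise):

import numpy as np

heightSequence, widthSequence, nHeights = 400, 400, 80
contrast = np.random.rand(heightSequence, widthSequence, nHeights)
heights = np.arange(nHeights, dtype=np.float64)

# least-squares slope m and intercept q of contrast vs. height, per pixel (vectorised polyfit of degree 1)
h_mean = heights.mean()
c_mean = contrast.mean(axis=2)
m = ((heights - h_mean) * (contrast - c_mean[..., None])).sum(axis=2) / ((heights - h_mean)**2).sum()
q = c_mean - m * h_mean

# the final if/else becomes an np.where; replace non-negative slopes by 1.0 in the divisor to avoid warnings
DOF = np.where(m < 0, -q / np.where(m < 0, m, 1.0), 0.0)

# set range of DOF, as in the original code
DOF[(DOF < 0) | (DOF > nHeights)] = 0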
