I would like to perform image segmentation using the maximum likelihood algorithm implemented in Python.
The mean vectors and covariance matrices of the classes are known, and by iterating over the image (which is quite big: 5100x7020) we can calculate for each pixel the probability that it belongs to the given class.
Written straightforwardly in Python:
import numpy as np
from numpy.linalg import inv
from numpy.linalg import det
...
probImage1 = []
probImage1Vector = []
norm = 1.0 / (np.power((2*np.pi), 3/2) * np.sqrt(np.linalg.det(covMatrixClass1)))  # 3 = length of the pixel vectors
covMatrixInverz = np.linalg.inv(covMatrixClass1)
for x in range(x_img):
    for y in range(y_img):
        X = realImage[x,y]
        pixelValueDifference = X - meanVectorClass1
        mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
        mult2 = np.dot(covMatrixInverz,pixelValueDifference)
        multMult = np.dot(mult1,mult2)
        expo = np.exp(multMult)
        probImage1Vector.append(np.multiply(norm,expo))
    probImage1.append(probImage1Vector)
    probImage1Vector = []
The problem is that this code is very slow on large images.
Calculations like the vector subtraction and multiplication consume a lot of time, even though they operate only on 1x3 vectors.
Could you please give a hint on how to speed up this code? I would really appreciate it. Sorry if I was not clear; I am still a beginner in Python.
Taking a closer look at:
mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
mult2 = np.dot(covMatrixInverz,pixelValueDifference)
multMult = np.dot(mult1,mult2)
We see that the operation is basically:
A.T (d) C (d) A # where `(d)` is the dot-product
Those three steps could be easily expressed as one string notation in np.einsum, like so -
np.einsum('k,lk,l->',pA,covMatrixInverz,-0.5*pA)
Performing this across both iterators i(=x) and j(=y), we would have a fully vectorized expression -
np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA)
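On NumPy versions that support it (an assumption about your setup), you could also let einsum search for a faster contraction order by passing optimize=True, where pA = realImage - meanVectorClass1 as in the benchmarked functions below:

np.einsum('ijk,lk,ijl->ij', pA, covMatrixInverz, -0.5*pA, optimize=True)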
Alternatively, we could perform the first part of the sum-reduction with np.tensordot -
mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
output = np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized)
Benchmarking
Listing all approaches as functions -
# Original code posted by OP to return array
def org_app(meanVectorClass1, realImage, covMatrixInverz, norm):
    probImage1 = []
    probImage1Vector = []
    x_img, y_img = realImage.shape[:2]
    for x in range(x_img):
        for y in range(y_img):
            X = realImage[x,y]
            pixelValueDifference = X - meanVectorClass1
            mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
            mult2 = np.dot(covMatrixInverz,pixelValueDifference)
            multMult = np.dot(mult1,mult2)
            expo = np.exp(multMult)
            probImage1Vector.append(np.multiply(norm,expo))
        probImage1.append(probImage1Vector)
        probImage1Vector = []
    return np.asarray(probImage1).reshape(x_img,y_img)
def vectorized(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
    return np.exp(np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized))*norm

def vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    return np.exp(np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA))*norm
Timings -
In [19]: # Setup inputs
...: meanVectorClass1 = np.array([23.96000000, 58.159999, 61.5399])
...:
...: covMatrixClass1 = np.array([[ 514.20040404, 461.68323232, 364.35515152],
...: [ 461.68323232, 519.63070707, 446.48848485],
...: [ 364.35515152, 446.48848485, 476.37212121]])
...: covMatrixInverz = np.linalg.inv(covMatrixClass1)
...:
...: norm = 0.234 # Random float number
...: realImage = np.random.rand(1000,2000,3)
...:
In [20]: out1 = org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
...: out2 = vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
...: out3 = vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
...: print(np.allclose(out1, out2))
...: print(np.allclose(out1, out3))
...:
True
True
In [21]: %timeit org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 27.8 s per loop
In [22]: %timeit vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 182 ms per loop
In [23]: %timeit vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 275 ms per loop
Looks like the fully vectorized einsum + tensordot hybrid solution is doing pretty well!
For a further performance boost, one can also look into the numexpr module to speed up the exponential computations on large arrays.
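As a minimal sketch of what that might look like, assuming numexpr is installed (the contraction is the same as in vectorized above; only the final elementwise exponential is offloaded to numexpr, which evaluates it in parallel chunks):

import numpy as np
import numexpr as ne

def vectorized_numexpr(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
    quad = np.einsum('ijk,ijk->ij', -0.5*pA, mult2_vectorized)
    # numexpr evaluates the elementwise expression using multiple threads
    return ne.evaluate('exp(quad) * norm')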
As a first step, I would get rid of unnecessary function calls like transpose, dot, and multiply. These are all simple calculations that you should be doing inline. When you can actually see what you are doing, instead of hiding things inside functions, it will be easier to understand the performance problems.
The fundamental issue here is that this appears to be at least a quartic complexity operation. You might want to simply multiply out how many operations you are doing in all of your loops. Is it 500 million, 2 billion, 350 billion? How many?
To get control of performance, you need to understand how many instructions you are executing. A modern computer can do on the order of a billion instructions per second, but if memory movement is involved, it can be substantially slower.
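As a rough back-of-the-envelope example for the 5100x7020 image size quoted in the question (the exact counts depend on how you tally the 3x3 matrix-vector product, so treat this as an estimate only):

pixels = 5100 * 7020                  # about 3.6e7 pixels
ops_per_pixel = 3 + 2*3*3 + 2*3 + 1   # subtraction, 3x3 mat-vec, dot product, exp (very roughly)
print(pixels * ops_per_pixel)         # on the order of 1e9 arithmetic operations

The arithmetic itself is cheap; it is the per-pixel Python-level loop overhead that dominates, which is why the vectorized versions above are so much faster.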
Related
The question is simple: here is my current algorithm. It is terribly slow because of the loops over the arrays. Is there a way to change it in order to avoid the loops and take advantage of NumPy array types?
import numpy as np
def loopingFunction(listOfVector1, listOfVector2):
    resultArray = []
    for vector1 in listOfVector1:
        result = 0
        for vector2 in listOfVector2:
            result += np.dot(vector1, vector2) * vector2[2]
        resultArray.append(result)
    return np.array(resultArray)
listOfVector1x = np.linspace(0,0.33,1000)
listOfVector1y = np.linspace(0.33,0.66,1000)
listOfVector1z = np.linspace(0.66,1,1000)
listOfVector1 = np.column_stack((listOfVector1x, listOfVector1y, listOfVector1z))
listOfVector2x = np.linspace(0.33,0.66,1000)
listOfVector2y = np.linspace(0.66,1,1000)
listOfVector2z = np.linspace(0, 0.33, 1000)
listOfVector2 = np.column_stack((listOfVector2x, listOfVector2y, listOfVector2z))
result = loopingFunction(listOfVector1, listOfVector2)
I am supposed to deal with really big arrays that have way more than 1000 vectors each. So if you have any advice, I'll take it.
The obligatory np.einsum benchmark
r2 = np.einsum('ij, kj, k->i', listOfVector1, listOfVector2, listOfVector2[:,2], optimize=['einsum_path', (1, 2), (0, 1)])
#%timeit result: 10000 loops, best of 5: 116 µs per loop
np.testing.assert_allclose(result, r2)
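If you want to see why this contraction order is chosen, np.einsum_path can report the path and its estimated cost (just a diagnostic sketch, not part of the timing above):

path, info = np.einsum_path('ij, kj, k->i', listOfVector1, listOfVector2,
                            listOfVector2[:, 2], optimize='optimal')
print(info)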
Just for fun, I wrote an optimized Numba implementation that outperforms all the others. It is based on the einsum optimization of @MichaelSzczesny's answer.
import numpy as np
import numba as nb
# This decorator asks Numba to eagerly compile the code using
# the provided signature string (containing the parameter types).
@nb.njit('(float64[:,::1], float64[:,::1])')
def loopingFunction_numba(listOfVector1, listOfVector2):
    n, m = listOfVector1.shape
    assert m == 3
    result = np.empty(n)
    s1 = s2 = s3 = 0.0
    for i in range(n):
        factor = listOfVector2[i, 2]
        s1 += listOfVector2[i, 0] * factor
        s2 += listOfVector2[i, 1] * factor
        s3 += listOfVector2[i, 2] * factor
    for i in range(n):
        result[i] = listOfVector1[i, 0] * s1 + listOfVector1[i, 1] * s2 + listOfVector1[i, 2] * s3
    return result
result = loopingFunction_numba(listOfVector1, listOfVector2)
Here are timings on my i5-9600KF processor:
Initial: 1052.0 ms
ymmx: 5.121 ms
MichaelSzczesny: 75.40 us
MechanicPig: 3.36 us
Numba: 2.74 us
Optimal lower bound: 0.66 us
This solution is ~384_000 times faster than the original one. Note that it does not even use the SIMD instructions of the processor, which would result in a ~4x speed-up on my machine. That is only possible with transposed inputs, which are much more SIMD-friendly than the current layout. Transposition may also speed up other answers, like the one from MechanicPig, since BLAS can often benefit from it. The resulting code would reach the symbolic 1_000_000x speed-up factor!
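For illustration, here is a rough, untested sketch of what such a transposed (structure-of-arrays) variant could look like, with each coordinate passed as its own contiguous 1D array; the function name and calling convention are made up for this sketch:

import numpy as np
import numba as nb

@nb.njit('(float64[::1], float64[::1], float64[::1], float64[::1], float64[::1], float64[::1])')
def loopingFunction_numba_soa(x1, y1, z1, x2, y2, z2):
    n = x1.shape[0]
    result = np.empty(n)
    s1 = s2 = s3 = 0.0
    # accumulate sum_k z2[k] * v2_k, one scalar per coordinate
    for i in range(x2.shape[0]):
        factor = z2[i]
        s1 += x2[i] * factor
        s2 += y2[i] * factor
        s3 += z2[i] * factor
    # dot each v1_i with the accumulated vector
    for i in range(n):
        result[i] = x1[i] * s1 + y1[i] * s2 + z1[i] * s3
    return result

# Hypothetical usage: pass contiguous columns
# cols1 = np.ascontiguousarray(listOfVector1.T)
# cols2 = np.ascontiguousarray(listOfVector2.T)
# result = loopingFunction_numba_soa(cols1[0], cols1[1], cols1[2], cols2[0], cols2[1], cols2[2])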
You can at least remove the two for loops to save a lot of time; use matrix computation directly:
import time
import numpy as np
def loopingFunction(listOfVector1, listOfVector2):
    resultArray = []
    for vector1 in listOfVector1:
        result = 0
        for vector2 in listOfVector2:
            result += np.dot(vector1, vector2) * vector2[2]
        resultArray.append(result)
    return np.array(resultArray)

def loopingFunction2(listOfVector1, listOfVector2):
    resultArray = np.sum(np.dot(listOfVector1, listOfVector2.T) * listOfVector2[:,2], axis=1)
    return resultArray
listOfVector1x = np.linspace(0,0.33,1000)
listOfVector1y = np.linspace(0.33,0.66,1000)
listOfVector1z = np.linspace(0.66,1,1000)
listOfVector1 = np.column_stack((listOfVector1x, listOfVector1y, listOfVector1z))
listOfVector2x = np.linspace(0.33,0.66,1000)
listOfVector2y = np.linspace(0.66,1,1000)
listOfVector2z = np.linspace(0, 0.33, 1000)
listOfVector2 = np.column_stack((listOfVector2x, listOfVector2y, listOfVector2z))
import time
t0 = time.time()
result = loopingFunction(listOfVector1, listOfVector2)
print('time old version',time.time() - t0)
t0 = time.time()
result2 = loopingFunction2(listOfVector1, listOfVector2)
print('time matrix computation version',time.time() - t0)
print('Are the results the same',np.allclose(result,result2))
Which gives
time old version 1.174513578414917
time matrix computation version 0.011968612670898438
Are the results the same True
Basically, the fewer loops the better.
Avoid nested loops and adjust the calculation order: each output element is sum_k (v1_i · v2_k) * v2_k[2], which by linearity equals v1_i · (sum_k v2_k[2] * v2_k), so the inner sum over listOfVector2 can be computed once and reused. This is about 20 times faster than the optimized np.einsum and nearly 400_000 times faster than the original program:
>>> out = listOfVector1.dot(listOfVector2[:, 2].dot(listOfVector2))
>>> np.allclose(out, loopingFunction(listOfVector1, listOfVector2))
True
Test:
>>> timeit(lambda: loopingFunction(listOfVector1, listOfVector2), number=1)
1.4389081999834161
>>> timeit(lambda: listOfVector1.dot(listOfVector2[:, 2].dot(listOfVector2)), number=400_000)
1.3162514999858104
>>> timeit(lambda: np.einsum('ij, kj, k->i', listOfVector1, listOfVector2, listOfVector2[:, 2], optimize=['einsum_path', (1, 2), (0, 1)]), number=18_000)
1.3501517999975476
I have an image mask stored as a 2D numpy array where the values indicate the presence of objects that have been segmented in the image (0 = no object, 1..n = object 1 through n). I want to get a single coordinate for each object representing the center of the object. It doesn't have to be a perfectly accurate centroid or center of gravity. I'm just taking the mean of the x and y indices of all cells in the array that contain each object. I'm wondering if there's a faster way to do this than my current method:
for obj in np.unique(mask):
    if obj == 0:
        continue
    x, y = np.mean(np.where(mask == obj), axis=1)
Here is a reproducible example:
import numpy as np
mask = np.array([
[0,0,0,0,0,2,0,0,0,0],
[0,1,1,0,2,2,2,0,0,0],
[0,0,1,0,2,2,2,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,3,3,3,0,0,4,0,0,0],
[0,0,0,0,0,4,4,4,0,0],
[0,0,0,0,0,0,4,0,0,0],
])
points = []
for obj in np.unique(mask):
    if obj == 0:
        continue
    points.append(np.mean(np.where(mask == obj), axis=1))
print(points)
This outputs:
[array([1.33333333, 1.66666667]),
array([1.28571429, 5. ]),
array([4., 2.]),
array([5., 6.])]
I came up with another way to do it that seems to be about 3x faster:
import numpy as np
mask = np.array([
[0,0,0,0,0,2,0,0,0,0],
[0,1,1,0,2,2,2,0,0,0],
[0,0,1,0,2,2,2,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,3,3,3,0,0,4,0,0,0],
[0,0,0,0,0,4,4,4,0,0],
[0,0,0,0,0,0,4,0,0,0],
])
flat = mask.flatten()
split = np.unique(np.sort(flat), return_index=True)[1]
points = []
for inds in np.split(flat.argsort(), split)[2:]:
    points.append(np.array(np.unravel_index(inds, mask.shape)).mean(axis=1))
print(points)
I wonder if the for loop can be replaced with a numpy operation which would likely be even faster.
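For example, one loop-free possibility I have not benchmarked would be np.bincount with weights, which sums the row and column indices per label in a single pass (a sketch, assuming the labels are consecutive integers 0..n as in the example above):

flat = mask.ravel()
rows, cols = np.divmod(np.arange(mask.size), mask.shape[1])
counts = np.bincount(flat)
mean_rows = np.bincount(flat, weights=rows) / counts
mean_cols = np.bincount(flat, weights=cols) / counts
points = np.column_stack([mean_rows, mean_cols])[1:]   # drop label 0 (background)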
You can copy this answer (give them an upvote too if this answer works for you) and use sparse matrices instead of np arrays. However, this only proves to be quicker for large arrays, with increasing speed boosts the larger your array is:
import numpy as np, time
from scipy.sparse import csr_matrix
def compute_M(data):
    cols = np.arange(data.size)
    return csr_matrix((cols, (np.ravel(data), cols)),
                      shape=(data.max() + 1, data.size))

def get_indices_sparse(data,M):
    #M = compute_M(data)
    return [np.mean(np.unravel_index(row.data, data.shape),1) for R,row in enumerate(M) if R>0]

def gen_random_mask(C, n, m):
    mask = np.zeros([n,m],int)
    for i in range(C):
        x = np.random.randint(n)
        y = np.random.randint(m)
        mask[x:x+np.random.randint(n-x),y:y+np.random.randint(m-y)] = i
    return mask

N = 100
C = 4

for S in [10,100,1000,10000]:
    mask = gen_random_mask(C, S, S)
    print('Time for size {:d}x{:d}:'.format(S,S))
    s = time.time()
    for _ in range(N):
        points = []
        for obj in np.unique(mask):
            if obj == 0:
                continue
            points.append(np.mean(np.where(mask == obj), axis=1))
        points_np = np.array(points)
    print('NP: {:f}'.format((time.time() - s)/N))
    mask_s = compute_M(mask)
    s = time.time()
    for _ in range(100):
        points = get_indices_sparse(mask,mask_s)
    print('Sparse: {:f}'.format((time.time() - s)/N))
    np.testing.assert_equal(points,points_np)
Which results in the timings of:
Time for size 10x10:
NP: 0.000066
Sparse: 0.000226
Time for size 100x100:
NP: 0.000207
Sparse: 0.000253
Time for size 1000x1000:
NP: 0.018662
Sparse: 0.004472
Time for size 10000x10000:
NP: 2.545973
Sparse: 0.501061
The problem likely comes from np.where(mask == obj), which iterates over the whole mask array over and over. This is a problem when there are a lot of objects. You can solve it efficiently using a group-by strategy. However, NumPy does not yet provide such an operation. You could implement it using a sort followed by a split, but a sort is generally not efficient here. An alternative is to ask NumPy to return the inverse indices from the unique call so that you can then accumulate the values per object (like a reduce-by-key where the reduction operator is addition and the keys are the object labels). The mean can be obtained with a simple division at the end.
objects, inverts, counts = np.unique(mask, return_counts=True, return_inverse=True)
# Reduction by object
x = np.full(len(objects), 0.0)
y = np.full(len(objects), 0.0)
xPos = np.repeat(np.arange(mask.shape[0]), mask.shape[1])
yPos = np.tile(np.arange(mask.shape[1]), reps=mask.shape[0])
np.add.at(x, inverts, xPos)
np.add.at(y, inverts, yPos)
# Compute the final mean from the sum
x /= counts
y /= counts
# Discard the first item (when obj == 0)
x = x[1:]
y = y[1:]
If you need something faster, you could use Numba and perform the reduction manually (and possibly in parallel).
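For reference, a minimal, untested sketch of such a manual Numba reduction might look like this (assuming integer labels 0..n_objects with 0 as background; the function name is made up for this sketch):

import numpy as np
import numba as nb

@nb.njit
def centroids_numba(mask, n_objects):
    sum_x = np.zeros(n_objects + 1)
    sum_y = np.zeros(n_objects + 1)
    count = np.zeros(n_objects + 1)
    # single pass over the mask, accumulating per-label sums and counts
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            obj = mask[i, j]
            count[obj] += 1
            sum_x[obj] += i
            sum_y[obj] += j
    # drop the background (label 0) and convert sums to means
    return sum_x[1:] / count[1:], sum_y[1:] / count[1:]

# x_means, y_means = centroids_numba(mask, mask.max())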
EDIT: if you really need a list as output, you can use points = list(np.stack([x, y]).T), but using lists instead of NumPy arrays is rather slow (and not memory efficient either).
Because the mask values number the segments, they can be used directly as indices into numpy arrays. Combined with Cython, this can be used to achieve a strong speed-up.
In Jupyter start with loading Cython:
%load_ext Cython
then use the Cython cell magic and a single pass over the whole array to calculate the means:
%%cython -a

import cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def calc_xy_mean4(int[:,:] mask, int number_of_maskvalues):
    cdef int[:] sum_x = np.zeros(number_of_maskvalues, dtype='int')
    cdef int[:] sum_y = np.zeros(number_of_maskvalues, dtype='int')
    n = np.zeros(number_of_maskvalues, dtype='int')
    cdef int[:] n_mv = n
    mean_x = np.zeros(number_of_maskvalues, dtype='float')
    mean_y = np.zeros(number_of_maskvalues, dtype='float')
    cdef double[:] mean_x_mv = mean_x
    cdef double[:] mean_y_mv = mean_y

    cdef int x_max = mask.shape[0]
    cdef int y_max = mask.shape[1]
    cdef int segment_index
    cdef int x
    cdef int y

    for x in range(x_max):
        for y in range(y_max):
            segment_index = mask[x,y]
            n_mv[segment_index] += 1
            sum_x[segment_index] += x
            sum_y[segment_index] += y

    for segment_index in range(number_of_maskvalues):
        mean_x_mv[segment_index] = sum_x[segment_index]/n[segment_index]
        mean_y_mv[segment_index] = sum_y[segment_index]/n[segment_index]

    return mean_x, mean_y, n
and call it with timeit magic
mask = np.array([
[0,0,0,0,0,2,0,0,0,0],
[0,1,1,0,2,2,2,0,0,0],
[0,0,1,0,2,2,2,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,3,3,3,0,0,4,0,0,0],
[0,0,0,0,0,4,4,4,0,0],
[0,0,0,0,0,0,4,0,0,0],
])
%timeit calc_xy_mean4(mask, 5)
On my machine, this Cython solution is 9 times faster than the original code.
6.32 µs ± 18.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and if we run the same instruction without the timeit magic:
calc_xy_mean4(mask, 5)
we obtain as output:
(array([3.07692308, 1.33333333, 1.28571429, 4. , 5. ]),
array([4.59615385, 1.66666667, 5. , 2. , 6. ]),
array([52, 3, 7, 3, 5]))
I need to compute AB⁻¹ in Python / Numpy for two matrices A and B (B being square, of course).
I know that np.linalg.inv() would allow me to compute B⁻¹, which I can then multiply with A.
I also know that B⁻¹A is actually better computed with np.linalg.solve().
Inspired by that, I decided to rewrite AB⁻¹ in terms of np.linalg.solve().
I got to a formula, based on the identity (AB)ᵀ = BᵀAᵀ, which uses np.linalg.solve() and .transpose():
np.linalg.solve(a.transpose(), b.transpose()).transpose()
that seems to be doing the job:
import numpy as np
n, m = 4, 2
np.random.seed(0)
a = np.random.random((n, n))
b = np.random.random((m, n))
print(np.matmul(b, np.linalg.inv(a)))
# [[ 2.87169378 -0.04207382 -1.10553758 -0.83200471]
# [-1.08733434 1.00110176 0.79683577 0.67487591]]
print(np.linalg.solve(a.transpose(), b.transpose()).transpose())
# [[ 2.87169378 -0.04207382 -1.10553758 -0.83200471]
# [-1.08733434 1.00110176 0.79683577 0.67487591]]
print(np.all(np.isclose(np.matmul(b, np.linalg.inv(a)), np.linalg.solve(a.transpose(), b.transpose()).transpose())))
# True
and also comes up much faster for sufficiently large inputs:
n, m = 400, 200
np.random.seed(0)
a = np.random.random((n, n))
b = np.random.random((m, n))
print(np.all(np.isclose(np.matmul(b, np.linalg.inv(a)), np.linalg.solve(a.transpose(), b.transpose()).transpose())))
# True
%timeit np.matmul(b, np.linalg.inv(a))
# 100 loops, best of 3: 13.3 ms per loop
%timeit np.linalg.solve(a.transpose(), b.transpose()).transpose()
# 100 loops, best of 3: 7.71 ms per loop
My question is: does this identity always stand correct or there are some corner cases I am overlooking?
In general, np.linalg.solve(B, A) is equivalent to B⁻¹A. The rest is just math.
In all cases, (AB)ᵀ = BᵀAᵀ: https://math.stackexchange.com/q/1440305/295281.
Not necessary for this case, but for invertible matrices, (AB)⁻¹ = B⁻¹A⁻¹: https://math.stackexchange.com/q/688339/295281.
For an invertible matrix, it is also the case that (A⁻¹)ᵀ = (Aᵀ)⁻¹: https://math.stackexchange.com/q/340233/295281.
From that it follows that (AB⁻¹)ᵀ = (B⁻¹)ᵀAᵀ = (Bᵀ)⁻¹Aᵀ. As long as B is invertible, you should have no issues with the transformation you propose in any case.
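As a quick numerical sanity check of these identities (random matrices are almost surely invertible; this is illustrative, not a proof):

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))
B = rng.random((4, 4))
# (A⁻¹)ᵀ == (Aᵀ)⁻¹
assert np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))
# (AB⁻¹)ᵀ == (Bᵀ)⁻¹Aᵀ
assert np.allclose((A @ np.linalg.inv(B)).T, np.linalg.inv(B.T) @ A.T)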
It's a classic question, but I believe many people are still searching for answers.
This question is different from this one, since my question is about an operation between two sparse vectors (not matrices).
I wrote a blog post about how Cosine Scipy Spatial Distance (SSD) gets slower as the dimension of the data gets higher (because it works on dense vectors). The post is in Indonesian, but the code, my experiment settings & results should be easily understandable regardless of the language (at the bottom of the post).
Currently this solution is more than 70 times faster for high-dimensional data (compared to SSD) & more memory efficient:
import numpy as np
def fCosine(u,v): # u,v CSR vectors, Cosine Dissimilarity
    uData = u.data; vData = v.data
    denominator = np.sqrt(np.sum(uData**2)) * np.sqrt(np.sum(vData**2))
    if denominator>0:
        uCol = u.indices; vCol = v.indices # np array
        intersection = set(np.intersect1d(uCol,vCol))
        uI = np.array([u1 for i,u1 in enumerate(uData) if uCol[i] in intersection])
        vI = np.array([v2 for j,v2 in enumerate(vData) if vCol[j] in intersection])
        return 1-np.dot(uI,vI)/denominator
    else:
        return float("inf")
Is it possible to improve that function even further (pure Python, or via JIT/Cython)?
Here is an alternative, alt_fCosine, which (on my machine) is about 3x faster for CSR vectors of size 10**5 with 10**4 non-zero elements:
import scipy.sparse as sparse
import numpy as np
import math
def fCosine(u,v): # u,v CSR vectors, Cosine Dissimilarity
    uData = u.data; vData = v.data
    denominator = np.sqrt(np.sum(uData**2)) * np.sqrt(np.sum(vData**2))
    if denominator>0:
        uCol = u.indices; vCol = v.indices # np array
        intersection = set(np.intersect1d(uCol,vCol))
        uI = np.array([u1 for i,u1 in enumerate(uData) if uCol[i] in intersection])
        vI = np.array([v2 for j,v2 in enumerate(vData) if vCol[j] in intersection])
        return 1-np.dot(uI,vI)/denominator
    else:
        return float("inf")

def alt_fCosine(u,v):
    uData, vData = u.data, v.data
    denominator = math.sqrt(np.sum(uData**2) * np.sum(vData**2))
    if denominator>0:
        uCol, vCol = u.indices, v.indices
        uI = uData[np.in1d(uCol, vCol)]
        vI = vData[np.in1d(vCol, uCol)]
        return 1-np.dot(uI,vI)/denominator
    else:
        return float("inf")
# Check that they return the same result
N = 10**5
u = np.round(10*sparse.random(1, N, density=0.1, format='csr'))
v = np.round(10*sparse.random(1, N, density=0.1, format='csr'))
assert np.allclose(fCosine(u, v), alt_fCosine(u, v))
alt_fCosine replaces two list comprehensions, a call to np.intersect1d, and the formation of a Python set with two calls to np.in1d and advanced indexing.
For N = 10**5:
In [322]: %timeit fCosine(u, v)
100 loops, best of 3: 5.73 ms per loop
In [323]: %timeit alt_fCosine(u, v)
1000 loops, best of 3: 1.62 ms per loop
In [324]: 5.73/1.62
Out[324]: 3.537037037037037
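One further micro-optimization worth trying (an assumption on my part, not benchmarked above): the column indices of a single canonical CSR row are unique, so np.in1d can be told to skip its own deduplication step:

# inside alt_fCosine, assuming canonical CSR input with unique column indices per row
uI = uData[np.in1d(uCol, vCol, assume_unique=True)]
vI = vData[np.in1d(vCol, uCol, assume_unique=True)]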
I have two lists of coordinates:
l1 = [[x,y,z],[x,y,z],[x,y,z],[x,y,z],[x,y,z]]
l2 = [[x,y,z],[x,y,z],[x,y,z]]
I want to find the shortest pairwise distance between l1 and l2. Distance between two coordinates is simply:
numpy.linalg.norm(l1_element - l2_element)
So how do I use numpy to efficiently apply this operation to each pair of elements?
Here is a quick performance analysis of the four methods presented so far:
import numpy
import scipy
from itertools import product
from scipy.spatial.distance import cdist
from scipy.spatial import cKDTree as KDTree
n = 100
l1 = numpy.random.randint(0, 100, size=(n,3))
l2 = numpy.random.randint(0, 100, size=(n,3))
# by @Phillip
def a(l1,l2):
    return min(numpy.linalg.norm(l1_element - l2_element) for l1_element,l2_element in product(l1,l2))

# by @Kasra
def b(l1,l2):
    return numpy.min(numpy.apply_along_axis(
        numpy.linalg.norm,
        2,
        l1[:, None, :] - l2[None, :, :]
    ))

# mine
def c(l1,l2):
    return numpy.min(scipy.spatial.distance.cdist(l1,l2))

# just checking that numpy.min is indeed faster.
def c2(l1,l2):
    return min(scipy.spatial.distance.cdist(l1,l2).reshape(-1))

# by @BrianLarsen
def d(l1,l2):
    # make KDTrees for both sets of points
    t1 = KDTree(l1)
    t2 = KDTree(l2)
    # we need a distance to not look beyond; if you have real knowledge use it, otherwise guess
    maxD = numpy.linalg.norm(l1[0] - l2[0]) # this could be the closest pair, but anything further is certainly not
    # get a sparse matrix of all the distances
    ans = t1.sparse_distance_matrix(t2, maxD)
    # get the minimum distance and points involved
    minD = min(ans.values())
    return minD

for x in (a,b,c,c2,d):
    print("Timing variant", x.__name__, ':', flush=True)
    print(x(l1,l2), flush=True)
    %timeit x(l1,l2)
    print(flush=True)
For n=100
Timing variant a :
2.2360679775
10 loops, best of 3: 90.3 ms per loop
Timing variant b :
2.2360679775
10 loops, best of 3: 151 ms per loop
Timing variant c :
2.2360679775
10000 loops, best of 3: 136 µs per loop
Timing variant c2 :
2.2360679775
1000 loops, best of 3: 844 µs per loop
Timing variant d :
2.2360679775
100 loops, best of 3: 3.62 ms per loop
For n=1000
Timing variant a :
0.0
1 loops, best of 3: 9.16 s per loop
Timing variant b :
0.0
1 loops, best of 3: 14.9 s per loop
Timing variant c :
0.0
100 loops, best of 3: 11 ms per loop
Timing variant c2 :
0.0
10 loops, best of 3: 80.3 ms per loop
Timing variant d :
0.0
1 loops, best of 3: 933 ms per loop
Using newaxis and broadcasting, l1[:, None, :] - l2[None, :, :] is an array of the pairwise difference vectors. You can reduce this array to an array of norms using apply_along_axis and then take the min:
numpy.min(numpy.apply_along_axis(
    numpy.linalg.norm,
    2,
    l1[:, None, :] - l2[None, :, :]
))
Of course, this only works if l1 and l2 are numpy arrays, so if your lists in the question weren't pseudo-code, you'll have to add l1 = numpy.array(l1); l2 = numpy.array(l2).
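Note that numpy.linalg.norm also accepts an axis argument, which avoids the per-row Python callback that apply_along_axis makes; I have not benchmarked it against the variants above:

numpy.min(numpy.linalg.norm(l1[:, None, :] - l2[None, :, :], axis=2))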
You can use itertools.product to get all the combinations, then use min:
l1 = [[x,y,z],[x,y,z],[x,y,z],[x,y,z],[x,y,z]]
l2 = [[x,y,z],[x,y,z],[x,y,z]]
from itertools import product
min(numpy.linalg.norm(l1_element - l2_element) for l1_element,l2_element in product(l1,l2))
If you have many, many, many points this is a great use for a KDTree. It is totally overkill for this example, but a good learning experience, really fast for a certain class of problems, and it can give a bit more information on the number of points within a certain distance.
import numpy as np
from scipy.spatial import cKDTree as KDTree
#sample data
l1 = [[0,0,0], [4,5,6], [7,6,7], [4,5,6]]
l2 = [[100,3,4], [1,0,0], [10,15,16], [17,16,17], [14,15,16], [-34, 5, 6]]
# make them arrays
l1 = np.asarray(l1)
l2 = np.asarray(l2)
# make KDTrees for both sets of points
t1 = KDTree(l1)
t2 = KDTree(l2)
# we need a distance to not look beyond; if you have real knowledge use it, otherwise guess
maxD = np.linalg.norm(l1[-1] - l2[-1]) # this could be the closest pair, but anything further is certainly not
# get a sparse matrix of all the distances
ans = t1.sparse_distance_matrix(t2, maxD)
# get the minimum distance and points involved
minA = min([(d, k) for k, d in ans.items()])
print("Minimum distance is {0} between l1={1} and l2={2}".format(minA[0], l1[minA[1][0]], l2[minA[1][1]]))
What this does is make a KDTree for each set of points, then find all the distances for points within the guess distance and give back the minimum distance and the points involved. This post has a writeup of how a KDTree works.
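An alternative sketch using cKDTree.query, which directly returns, for each point of l1, the distance to (and index of) its nearest neighbour in l2 (this assumes l1 and l2 are already numpy arrays as above):

import numpy as np
from scipy.spatial import cKDTree as KDTree

t2 = KDTree(l2)
dists, idx = t2.query(l1)   # nearest neighbour in l2 for every point of l1
i = int(np.argmin(dists))
print("Minimum distance is {0} between l1={1} and l2={2}".format(dists[i], l1[i], l2[idx[i]]))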