Faster Python Cosine dissimilarity between Scipy CSR "vectors"

Faster Python Cosine dissimilarity between Scipy CSR "vectors" - python

It's a classic question, but I believe many people still searching for answers.
This question is a different than this one, since my question is operation between two sparse vectors (not a matrix).
I wrote a blog post about how Cosine Scipy Spatial Distance (SSD) is getting slower when the dimension of the data is getting higher (because it works on dense vectors). The post is in Indonesian language, but the code, my experiment settings & results should be easily understandable regardless of the language (at the bottom of the post).
Currently this solution is more than 70 times faster for high dimension data (compared to SSD) & more memory efficient:
import numpy as np
def fCosine(u,v): # u,v CSR vectors, Cosine Dissimilarity
uData = u.data; vData = v.data
denominator = np.sqrt(np.sum(uData**2)) * np.sqrt(np.sum(vData**2))
if denominator>0:
uCol = u.indices; vCol = v.indices # np array
intersection = set(np.intersect1d(uCol,vCol))
uI = np.array([u1 for i,u1 in enumerate(uData) if uCol[i] in intersection])
vI = np.array([v2 for j,v2 in enumerate(vData) if vCol[j] in intersection])
return 1-np.dot(uI,vI)/denominator
else:
return float("inf")
Is it possible to even further improve that function (Pythonic or via JIT/Cython???).

Here is an alternative, alt_fCosine, which (on my machine) is about 3x faster for CSR vectors of size 10**5 and 10**4 non-zero elements:
import scipy.sparse as sparse
import numpy as np
import math
def fCosine(u,v): # u,v CSR vectors, Cosine Dissimilarity
uData = u.data; vData = v.data
denominator = np.sqrt(np.sum(uData**2)) * np.sqrt(np.sum(vData**2))
if denominator>0:
uCol = u.indices; vCol = v.indices # np array
intersection = set(np.intersect1d(uCol,vCol))
uI = np.array([u1 for i,u1 in enumerate(uData) if uCol[i] in intersection])
vI = np.array([v2 for j,v2 in enumerate(vData) if vCol[j] in intersection])
return 1-np.dot(uI,vI)/denominator
else:
return float("inf")
def alt_fCosine(u,v):
uData, vData = u.data, v.data
denominator = math.sqrt(np.sum(uData**2) * np.sum(vData**2))
if denominator>0:
uCol, vCol = u.indices, v.indices
uI = uData[np.in1d(uCol, vCol)]
vI = vData[np.in1d(vCol, uCol)]
return 1-np.dot(uI,vI)/denominator
else:
return float("inf")
# Check that they return the same result
N = 10**5
u = np.round(10*sparse.random(1, N, density=0.1, format='csr'))
v = np.round(10*sparse.random(1, N, density=0.1, format='csr'))
assert np.allclose(fCosine(u, v), alt_fCosine(u, v))
alt_fCosine replaces two list comprehensions, a call to np.intersection1d
and the formation of a Python set with two calls to np.in1d and advanced
indexing.
For N = 10**5:
In [322]: %timeit fCosine(u, v)
100 loops, best of 3: 5.73 ms per loop
In [323]: %timeit alt_fCosine(u, v)
1000 loops, best of 3: 1.62 ms per loop
In [324]: 5.73/1.62
Out[324]: 3.537037037037037

Related

What is a fast(er) way to get the center points of objects represented in a 2D numpy array?

I have an image mask stored as a 2D numpy array where the values indicate the presence of objects that have been segmented in the image (0 = no object, 1..n = object 1 through n). I want to get a single coordinate for each object representing the center of the object. It doesn't have to be a perfectly accurate centroid or center of gravity. I'm just taking the mean of the x and y indices of all cells in the array that contain each object. I'm wondering if there's a faster way to do this than my current method:
for obj in np.unique(mask):
if obj == 0:
continue
x, y = np.mean(np.where(mask == obj), axis=1)
Here is a reproducible example:
import numpy as np
mask = np.array([
[0,0,0,0,0,2,0,0,0,0],
[0,1,1,0,2,2,2,0,0,0],
[0,0,1,0,2,2,2,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,3,3,3,0,0,4,0,0,0],
[0,0,0,0,0,4,4,4,0,0],
[0,0,0,0,0,0,4,0,0,0],
])
points = []
for obj in np.unique(mask):
if obj == 0:
continue
points.append(np.mean(np.where(mask == obj), axis=1))
print(points)
This outputs:
[array([1.33333333, 1.66666667]),
array([1.28571429, 5. ]),
array([4., 2.]),
array([5., 6.])]

I came up with another way to do it that seems to be about 3x faster:
import numpy as np
mask = np.array([
[0,0,0,0,0,2,0,0,0,0],
[0,1,1,0,2,2,2,0,0,0],
[0,0,1,0,2,2,2,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,3,3,3,0,0,4,0,0,0],
[0,0,0,0,0,4,4,4,0,0],
[0,0,0,0,0,0,4,0,0,0],
])
flat = mask.flatten()
split = np.unique(np.sort(flat), return_index=True)[1]
points = []
for inds in np.split(flat.argsort(), split)[2:]:
points.append(np.array(np.unravel_index(inds, mask.shape)).mean(axis=1))
print(points)
I wonder if the for loop can be replaced with a numpy operation which would likely be even faster.

You can copy this answer (give them an upvote too if this answer works for you) and use sparse matrices instead of np arrays. However, this only proves to be quicker for large arrays, with increasing speed boosts the larger your array is:
import numpy as np, time
from scipy.sparse import csr_matrix
def compute_M(data):
cols = np.arange(data.size)
return csr_matrix((cols, (np.ravel(data), cols)),
shape=(data.max() + 1, data.size))
def get_indices_sparse(data,M):
#M = compute_M(data)
return [np.mean(np.unravel_index(row.data, data.shape),1) for R,row in enumerate(M) if R>0]
def gen_random_mask(C, n, m):
mask = np.zeros([n,m],int)
for i in range(C):
x = np.random.randint(n)
y = np.random.randint(m)
mask[x:x+np.random.randint(n-x),y:y+np.random.randint(m-y)] = i
return mask
N = 100
C = 4
for S in [10,100,1000,10000]:
mask = gen_random_mask(C, S, S)
print('Time for size {:d}x{:d}:'.format(S,S))
s = time.time()
for _ in range(N):
points = []
for obj in np.unique(mask):
if obj == 0:
continue
points.append(np.mean(np.where(mask == obj), axis=1))
points_np = np.array(points)
print('NP: {:f}'.format((time.time() - s)/N))
mask_s = compute_M(mask)
s = time.time()
for _ in range(100):
points = get_indices_sparse(mask,mask_s)
print('Sparse: {:f}'.format((time.time() - s)/N))
np.testing.assert_equal(points,points_np)
Which results in the timings of:
Time for size 10x10:
NP: 0.000066
Sparse: 0.000226
Time for size 100x100:
NP: 0.000207
Sparse: 0.000253
Time for size 1000x1000:
NP: 0.018662
Sparse: 0.004472
Time for size 10000x10000:
NP: 2.545973
Sparse: 0.501061

The problem likely comes from np.where(mask == obj) which iterates on the whole mask array over and over. This is a problem when there are a lot of objects. You can solve this problem efficiently using a group-by strategy. However, Numpy do not yet provide such an operation. You can implement that using a sort followed by a split. But a sort is generally not efficient. An alternative method is to ask Numpy to return the index in the unique call so that you can then accumulate the value regarding the object (like a reduce-by-key where the reduction operator is an addition and the key are object integers). The mean can be obtained using a simple division in the end.
objects, inverts, counts = np.unique(mask, return_counts=True, return_inverse=True)
# Reduction by object
x = np.full(len(objects), 0.0)
y = np.full(len(objects), 0.0)
xPos = np.repeat(np.arange(mask.shape[0]), mask.shape[1])
yPos = np.tile(np.arange(mask.shape[1]), reps=mask.shape[0])
np.add.at(x, inverts, xPos)
np.add.at(y, inverts, yPos)
# Compute the final mean from the sum
x /= counts
y /= counts
# Discard the first item (when obj == 0)
x = x[1:]
y = y[1:]
If you need something faster, you could use Numba and perform the reduction manually (and possibly in parallel).
EDIT: if you really need a list in output, you can use points = list(np.stack([x, y]).T) but this is rather slow to use lists instead of Numpy arrays (and not memory efficient either).

Because the mask values number the segments they can be directly used as indices into numpy arrays. Combined with Cython this can be used to achieve a strong speed-up.
In Jupyter start with loading Cython:
%load_ext Cython
then use Python magic and a single pass over the whole array to calculate the means:
%%cython -a
import cython
import numpy as np
cimport numpy as np
#cython.boundscheck(False) # turn off bounds-checking for entire function
#cython.wraparound(False) # turn off negative index wrapping for entire function
def calc_xy_mean4(int[:,:] mask, int number_of_maskvalues):
cdef int[:] sum_x = np.zeros(number_of_maskvalues, dtype='int')
cdef int[:] sum_y = np.zeros(number_of_maskvalues, dtype='int')
n = np.zeros(number_of_maskvalues, dtype='int')
cdef int[:] n_mv = n
mean_x = np.zeros(number_of_maskvalues, dtype='float')
mean_y = np.zeros(number_of_maskvalues, dtype='float')
cdef double[:] mean_x_mv = mean_x
cdef double[:] mean_y_mv = mean_y
cdef int x_max = mask.shape[0]
cdef int y_max = mask.shape[1]
cdef int segment_index
cdef int x
cdef int y
for x in range(x_max):
for y in range(y_max):
segment_index = mask[x,y]
n_mv[segment_index] += 1
sum_x[segment_index] += x
sum_y[segment_index] += y
for segment_index in range(number_of_maskvalues):
mean_x_mv[segment_index] = sum_x[segment_index]/n[segment_index]
mean_y_mv[segment_index] = sum_y[segment_index]/n[segment_index]
return mean_x, mean_y, n
and call it with timeit magic
mask = np.array([
[0,0,0,0,0,2,0,0,0,0],
[0,1,1,0,2,2,2,0,0,0],
[0,0,1,0,2,2,2,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,3,3,3,0,0,4,0,0,0],
[0,0,0,0,0,4,4,4,0,0],
[0,0,0,0,0,0,4,0,0,0],
])
%timeit calc_xy_mean4(mask, 5)
This Cython solution is on my machine 9 times faster than the original code.
6.32 µs ± 18.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and if we run the same instruction without the timeit magic:
calc_xy_mean4(mask, 5)
we obtain as output:
(array([3.07692308, 1.33333333, 1.28571429, 4. , 5. ]),
array([4.59615385, 1.66666667, 5. , 2. , 6. ]),
array([52, 3, 7, 3, 5]))

Computing `AB⁻¹` with `np.linalg.solve()`

I need to compute AB⁻¹ in Python / Numpy for two matrices A and B (B being square, of course).
I know that np.linalg.inv() would allow me to compute B⁻¹, which I can then multiply with A.
I also know that B⁻¹A is actually better computed with np.linalg.solve().
Inspired by that, I decided to rewrite AB⁻¹ in terms of np.linalg.solve().
I got to a formula, based on the identity (AB)ᵀ = BᵀAᵀ, which uses np.linalg.solve() and .transpose():
np.linalg.solve(a.transpose(), b.transpose()).transpose()
that seems to be doing the job:
import numpy as np
n, m = 4, 2
np.random.seed(0)
a = np.random.random((n, n))
b = np.random.random((m, n))
print(np.matmul(b, np.linalg.inv(a)))
# [[ 2.87169378 -0.04207382 -1.10553758 -0.83200471]
# [-1.08733434 1.00110176 0.79683577 0.67487591]]
print(np.linalg.solve(a.transpose(), b.transpose()).transpose())
# [[ 2.87169378 -0.04207382 -1.10553758 -0.83200471]
# [-1.08733434 1.00110176 0.79683577 0.67487591]]
print(np.all(np.isclose(np.matmul(b, np.linalg.inv(a)), np.linalg.solve(a.transpose(), b.transpose()).transpose())))
# True
and also comes up much faster for sufficiently large inputs:
n, m = 400, 200
np.random.seed(0)
a = np.random.random((n, n))
b = np.random.random((m, n))
print(np.all(np.isclose(np.matmul(b, np.linalg.inv(a)), np.linalg.solve(a.transpose(), b.transpose()).transpose())))
# True
%timeit np.matmul(b, np.linalg.inv(a))
# 100 loops, best of 3: 13.3 ms per loop
%timeit np.linalg.solve(a.transpose(), b.transpose()).transpose()
# 100 loops, best of 3: 7.71 ms per loop
My question is: does this identity always stand correct or there are some corner cases I am overlooking?

In general, np.linalg.solve(B, A) is equivalent to B-1A. The rest is just math.
In all cases, (AB)T = BTAT: https://math.stackexchange.com/q/1440305/295281.
Not necessary for this case, but for invertible matrices, (AB)-1 = B-1A-1: https://math.stackexchange.com/q/688339/295281.
For an invertible matrix, it is also the case that (A-1)T = (AT)-1: https://math.stackexchange.com/q/340233/295281.
From that it follows that (AB-1)T = (B-1)TAT = (BT)-1AT. As long as B is invertible, you should have no issues with the transformation you propose in any case.

Manipulation of numpy 2-D array

I have a 2-D numpy array like x = array([[ 1., 5.],[ 3., 4.]]), I have to compare each row with every other row in the matrix and create an new array of minimum values from both the rows and take the sum of minimum row and save it in a new matrix. Finally I will get a symmetric matrix.
Eg: I compare array [1,5] with itself. New 2-D array is array([[ 1., 5.],[ 1., 5.]]), I create a minimum array along axis=0 i.e [ 1., 5.] then take the sum of array which will be 6. Similarly I repeat the operation for all the rows and I end up with a 2*2 matrix array([[ 6, 5.],[ 5, 7.]]).
import numpy as np
x=np.array([[1,5],[3,4]])
y=np.zeros((len(x),len(x)))
for i in range(len(x)):
array_a=x[i]
for j in range(len(x)):
array_b=x[j]
array_c=np.array([array_a,array_b])
min_array=np.min(array_c,axis=0)
array_sum=np.sum(min_array)
y[i,j]=array_sum
My 2-D array is very big and performing above mentioned operations are taking lot of time. I am new to python so any suggestion to improve the performance will be really helpful.

The obvious improvement to save roughly half the time is to run only on i>=j indices. For elegance and some saving you can also use less variables.
import numpy as np
import time
x=np.random.randint(0, 10, (500, 500))
y=np.zeros((len(x),len(x)))
# OP version
t0 = time.time()
for i in range(len(x)):
array_a=x[i]
for j in range(len(x)):
array_b=x[j]
array_c=np.array([array_a,array_b])
min_array=np.min(array_c,axis=0)
array_sum=np.sum(min_array)
y[i,j]=array_sum
print(time.time() - t0)
z=np.zeros((len(x),len(x)))
# modified version
t0 = time.time()
for i in range(len(x)):
for j in range(i, len(x)):
z[i, j]=np.sum(np.min([x[i], x[j]], axis=0))
z[j, i] = z[i, j]
print(time.time() - t0)
# verify that the result are the same
print(np.all(z == y))
The results on my machine:
4.2974278926849365
2.746302604675293
True

The obvious way to speed up your code would be to do all the looping in numpy. I had a first solution (f2 in the code below), which would generate a matrix that contained all the combinations that need to be compared and then reduced that matrix into the final result performing the np.min and np.sum commands. Unfortunately that method is quite memory consuming and therefore becomes slow when the matrices are big, because the intermediate matrix is NxNx2xN for a NxN input matrix.
However, I found a different solution that uses one for loop (f3 below) and appears to be reasonably fast. The speed-up to the original posted by the OP is about 4 times for a 1000x1000 matrix. Here the codes with some tests:
import numpy as np
import timeit
def f(x):
y = np.zeros_like(x)
for i in range(x.shape[0]):
a = x[i]
for j in range(x.shape[1]):
b = x[j]
y[i,j] = np.sum(np.min([a,b], axis=0))
return y
def f2(x):
y = np.empty((x.shape[0],1,2,x.shape[0]))
y[:,0,0,:] = x[:,:]
y = np.repeat(y, x.shape[0],axis=1)
y[:,:,1,:] = x[:,:]
return np.sum(np.min(y,axis=2),axis=2)
def f3(x):
y = np.empty_like(x)
for i in range(x.shape[1]):
y[:,i] = np.sum(np.minimum(x[i,:],x[:,:]),axis=1)
return y
##some testing that the functions work
x = np.array([[1,5],[3,4]])
a=f(x)
b=f2(x)
c=f3(x)
print(np.all(a==b))
print(np.all(a==c))
x = np.array([[1,7,5],[2,3,8],[5,2,4]])
a=f(x)
b=f2(x)
c=f3(x)
print(np.all(a==b))
print(np.all(a==c))
x = np.random.randint(0,10,(100,100))
a=f(x)
b=f2(x)
c=f3(x)
print(np.all(a==b))
print(np.all(a==c))
##some speed testing:
print('-'*50)
print("speed test small")
x = np.random.randint(0,100,(100,100))
print("original")
print(min(timeit.Timer(
'f(x)',
setup = 'from __main__ import f,x',
).repeat(3,10)))
print("using np.repeat")
print(min(timeit.Timer(
'f2(x)',
setup = 'from __main__ import f2,x',
).repeat(3,10)))
print("one for loop")
print(min(timeit.Timer(
'f3(x)',
setup = 'from __main__ import f3,x',
).repeat(3,10)))
print('-'*50)
print("speed test big")
x = np.random.randint(0,100,(1000,1000))
print("original")
print(min(timeit.Timer(
'f(x)',
setup = 'from __main__ import f,x',
).repeat(3,1)))
print("one for loop")
print(min(timeit.Timer(
'f3(x)',
setup = 'from __main__ import f3,x',
).repeat(3,1)))
And here the output:
True
True
True
True
True
True
--------------------------------------------------
speed test small
original
1.3070102719939314
using np.repeat
0.15176948899170384
one for loop
0.029766165011096746
--------------------------------------------------
speed test big
original
17.505746565002482
one for loop
4.437685210024938
With other words, f2 is pretty fast for matrices that don't exhaust your memory, but especially for big matrices, f3 is the fastest that I could find.
EDIT:
Inspired by #Aguy's answer and this post, here still a modification that only computes the lower triangle of the matrix and then copies the results to the upper triangle:
def f4(x):
y = np.empty_like(x)
for i in range(x.shape[1]):
y[i:,i] = np.sum(np.minimum(x[i,:],x[i:,:]),axis=1)
i_upper = np.triu_indices(x.shape[1],1)
y[i_upper] = y.T[i_upper]
return y
The speed test for the 1000x1000 matrix now gives
speed test big
original
18.71281115297461
one for loop over lower triangle
2.0939957330119796
EDIT 2:
Here still a version that uses numba for speed up. According to this post it is better to write the loops explicitly in this case:
import numba as nb
#nb.jit(nopython=True)
def f_nb(x):
res = np.empty_like(x)
for j in range(res.shape[1]):
for i in range(j,res.shape[0]):
res[j,i] = res[i,j] = np.sum(np.minimum(x[i,:], x[j,:]))
return res
And the relevant speed tests give:
0.015975199989043176 for a 100x100 matrix
0.37946902704425156 for a 1000x1000 matrix
467.06363476096885 for a 10000x10000 matrix
The 10000x10000 speed test for f4 didn't seem to want to finish at all, so I left it out. If your matrices get much bigger than that, you might actually run into memory problems -- did you consider this?

Segmentation using maximum likelihood algorithm on images using python

I would like to perform image segmentation using maximum likelihood algorithm implemented in python.
The mean vectors of the classes, and covariance matrices are known, and iterating over the images (which are quite big...5100X7020) we can calculate for each pixel the probability of being part of the given class.
Simply written in Python
import numpy as np
from numpy.linalg import inv
from numpy.linalg import det
...
probImage1 = []
probImage1Vector = []
norm = 1.0 / (np.power((2*np.pi), i/2) * np.sqrt(np.linalg.det(covMatrixClass1)))
covMatrixInverz = np.linalg.inv(covMatrixClass1)
for x in xrange(x_img):
for y in xrange(y_img):
X = realImage[x,y]
pixelValueDifference = X - meanVectorClass1
mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
mult2 = np.dot(covMatrixInverz,pixelValueDifference)
multMult = np.dot(mult1,mult2)
expo = np.exp(multMult)
probImage1Vector.append(np.multiply(norm,expo))
probImage1.append(probImage1Vector)
probImage1Vector = []
The problem that this code is very slow when performing on large images...
The calculations like vector subtraction and multiplication consumes a lot of time, even though they are only 1X3 vectors.
Could you please give a hint how to speed up this code? I would really appreciate. Sorry, if I was not clear I am still beginner in python.

Taking a closer look at :
mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
mult2 = np.dot(covMatrixInverz,pixelValueDifference)
multMult = np.dot(mult1,mult2)
We see that the operation is basically :
A.T (d) C (d) A # where `(d)` is the dot-product
Those three steps could be easily expressed as one string notation in np.einsum, like so -
np.einsum('k,lk,l->',pA,covMatrixInverz,-0.5*pA)
Performing this across both iterators i(=x) and j(=y), we would have a fully vectorized expression -
np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA))
Alternatively, we could perform the first part of sume-reduction with np.tensordot -
mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
output = np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized)
Benchmarking
Listing all approaches as functions -
# Original code posted by OP to return array
def org_app(meanVectorClass1, realImage, covMatrixInverz, norm):
probImage1 = []
probImage1Vector = []
x_img, y_img = realImage.shape[:2]
for x in xrange(x_img):
for y in xrange(y_img):
X = realImage[x,y]
pixelValueDifference = X - meanVectorClass1
mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
mult2 = np.dot(covMatrixInverz,pixelValueDifference)
multMult = np.dot(mult1,mult2)
expo = np.exp(multMult)
probImage1Vector.append(np.multiply(norm,expo))
probImage1.append(probImage1Vector)
probImage1Vector = []
return np.asarray(probImage1).reshape(x_img,y_img)
def vectorized(meanVectorClass1, realImage, covMatrixInverz, norm):
pA = realImage - meanVectorClass1
mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
return np.exp(np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized))*norm
def vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm):
pA = realImage - meanVectorClass1
return np.exp(np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA))*norm
Timings -
In [19]: # Setup inputs
...: meanVectorClass1 = np.array([23.96000000, 58.159999, 61.5399])
...:
...: covMatrixClass1 = np.array([[ 514.20040404, 461.68323232, 364.35515152],
...: [ 461.68323232, 519.63070707, 446.48848485],
...: [ 364.35515152, 446.48848485, 476.37212121]])
...: covMatrixInverz = np.linalg.inv(covMatrixClass1)
...:
...: norm = 0.234 # Random float number
...: realImage = np.random.rand(1000,2000,3)
...:
In [20]: out1 = org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
...: out2 = vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
...: out3 = vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
...: print np.allclose(out1, out2)
...: print np.allclose(out1, out3)
...:
True
True
In [21]: %timeit org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 27.8 s per loop
In [22]: %timeit vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 182 ms per loop
In [23]: %timeit vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 275 ms per loop
Looks like the fully vectorized einsum + tensordot hybrid solution is doing pretty good!
For further performance boost, one can also look into numexpr module to speedup the exponential computations on large arrays.

As a first step, I would get rid of unnecessary function calls like transpose, dot, and multiply. These are all simple calculations which you should be doing inline. When you can actually see what you are doing, instead of hiding things inside of functions, it will be easier to understand the performance problems.
The fundamental issue here is that this appears to be at least a quartic complexity operation. You might want to simply multiply out how many operations you are doing in all of your loops. Is it 500 million, 2 billion, 350 billion? How many?
To get control of performance you need to understand how many instructions you are doing. A modern computer can do about 1 billion instructions per second, but if memory movements are involved, it can be substantially slower.

numpy how to find index of nearest value according to a thresold value of multi dimensional array? [duplicate]

How do I find the nearest value in a numpy array? Example:
np.find_nearest(array, value)

import numpy as np
def find_nearest(array, value):
array = np.asarray(array)
idx = (np.abs(array - value)).argmin()
return array[idx]
Example usage:
array = np.random.random(10)
print(array)
# [ 0.21069679 0.61290182 0.63425412 0.84635244 0.91599191 0.00213826
# 0.17104965 0.56874386 0.57319379 0.28719469]
print(find_nearest(array, value=0.5))
# 0.568743859261

IF your array is sorted and is very large, this is a much faster solution:
def find_nearest(array,value):
idx = np.searchsorted(array, value, side="left")
if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
return array[idx-1]
else:
return array[idx]
This scales to very large arrays. You can easily modify the above to sort in the method if you can't assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.

With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):
def find_nearest(a, a0):
"Element in nd array `a` closest to the scalar value `a0`"
idx = np.abs(a - a0).argmin()
return a.flat[idx]
Or, written as a single line:
a.flat[np.abs(a - a0).argmin()]

Summary of answer: If one has a sorted array then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either.
If you have an unsorted array then if array is large, one should consider first using an O(n logn) sort and then bisection, and if array is small then method 2 seems the fastest.
First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first - note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.
Bisection:
def bisection(array,value):
'''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
to indicate that ``value`` is out of range below and above respectively.'''
n = len(array)
if (value < array[0]):
return -1
elif (value > array[n-1]):
return n
jl = 0# Initialize lower
ju = n-1# and upper limits.
while (ju-jl > 1):# If we are not yet done,
jm=(ju+jl) >> 1# compute a midpoint with a bitshift
if (value >= array[jm]):
jl=jm# and replace either the lower limit
else:
ju=jm# or the upper limit, as appropriate.
# Repeat until the test condition is satisfied.
if (value == array[0]):# edge cases at bottom
return 0
elif (value == array[n-1]):# and top
return n-1
else:
return jl
Now I'll define the code from the other answers, they each return an index:
import math
import numpy as np
def find_nearest1(array,value):
idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
return idx
def find_nearest2(array, values):
indices = np.abs(np.subtract.outer(array, values)).argmin(0)
return indices
def find_nearest3(array, values):
values = np.atleast_1d(values)
indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
out = array[indices]
return indices
def find_nearest4(array,value):
idx = (np.abs(array-value)).argmin()
return idx
def find_nearest5(array, value):
idx_sorted = np.argsort(array)
sorted_array = np.array(array[idx_sorted])
idx = np.searchsorted(sorted_array, value, side="left")
if idx >= len(array):
idx_nearest = idx_sorted[len(array)-1]
elif idx == 0:
idx_nearest = idx_sorted[0]
else:
if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
idx_nearest = idx_sorted[idx-1]
else:
idx_nearest = idx_sorted[idx]
return idx_nearest
def find_nearest6(array,value):
xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
return xi
Now I'll time the codes:
Note methods 1,2,4,5 don't correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.
array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)
(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop
For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 - 1000 times faster). For smaller arrays it's ~2-100 times faster.

Here is a fast vectorized version of #Dimitri's solution if you have many values to search for (values can be multi-dimensional array):
# `values` should be sorted
def get_closest(array, values):
# make sure array is a numpy array
array = np.array(array)
# get insert positions
idxs = np.searchsorted(array, values, side="left")
# find indexes where previous index is closer
prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
idxs[prev_idx_is_less] -= 1
return array[idxs]
Benchmarks
> 100 times faster than using a for loop with #Demitri's solution`
>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

Here's an extension to find the nearest vector in an array of vectors.
import numpy as np
def find_nearest_vector(array, value):
idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
return array[idx]
A = np.random.random((10,2))*100
""" A = array([[ 34.19762933, 43.14534123],
[ 48.79558706, 47.79243283],
[ 38.42774411, 84.87155478],
[ 63.64371943, 50.7722317 ],
[ 73.56362857, 27.87895698],
[ 96.67790593, 77.76150486],
[ 68.86202147, 21.38735169],
[ 5.21796467, 59.17051276],
[ 82.92389467, 99.90387851],
[ 6.76626539, 30.50661753]])"""
pt = [6, 30]
print find_nearest_vector(A,pt)
# array([ 6.76626539, 30.50661753])

If you don't want to use numpy this will do it:
def find_nearest(array, value):
n = [abs(i-value) for i in array]
idx = n.index(min(n))
return array[idx]

Here's a version that will handle a non-scalar "values" array:
import numpy as np
def find_nearest(array, values):
indices = np.abs(np.subtract.outer(array, values)).argmin(0)
return array[indices]
Or a version that returns a numeric type (e.g. int, float) if the input is scalar:
def find_nearest(array, values):
values = np.atleast_1d(values)
indices = np.abs(np.subtract.outer(array, values)).argmin(0)
out = array[indices]
return out if len(out) > 1 else out[0]

Here is a version with scipy for #Ari Onasafari, answer "to find the nearest vector in an array of vectors"
In [1]: from scipy import spatial
In [2]: import numpy as np
In [3]: A = np.random.random((10,2))*100
In [4]: A
Out[4]:
array([[ 68.83402637, 38.07632221],
[ 76.84704074, 24.9395109 ],
[ 16.26715795, 98.52763827],
[ 70.99411985, 67.31740151],
[ 71.72452181, 24.13516764],
[ 17.22707611, 20.65425362],
[ 43.85122458, 21.50624882],
[ 76.71987125, 44.95031274],
[ 63.77341073, 78.87417774],
[ 8.45828909, 30.18426696]])
In [5]: pt = [6, 30] # <-- the point to find
In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point
Out[6]: array([ 8.45828909, 30.18426696])
#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)
In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393
In [9]: index # <-- The locations of the neighbors
Out[9]: 9
#then
In [10]: A[index]
Out[10]: array([ 8.45828909, 30.18426696])

For large arrays, the (excellent) answer given by #Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:
The function below works whether or not the input array is sorted.
The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.
Note that the function below also handles a specific edge case that would lead to a bug in the original function written by #Demitri. Otherwise, my algorithm is identical to his.
def find_idx_nearest_val(array, value):
idx_sorted = np.argsort(array)
sorted_array = np.array(array[idx_sorted])
idx = np.searchsorted(sorted_array, value, side="left")
if idx >= len(array):
idx_nearest = idx_sorted[len(array)-1]
elif idx == 0:
idx_nearest = idx_sorted[0]
else:
if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
idx_nearest = idx_sorted[idx-1]
else:
idx_nearest = idx_sorted[idx]
return idx_nearest

All the answers are beneficial to gather the information to write efficient code. However, I have written a small Python script to optimize for various cases. It will be the best case if the provided array is sorted. If one searches the index of the nearest point of a specified value, then bisect module is the most time efficient. When one search the indices correspond to an array, the numpy searchsorted is most efficient.
import numpy as np
import bisect
xarr = np.random.rand(int(1e7))
srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)
In [63]: %time bisect.bisect_left(xlist, 0.3)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 22.2 µs
np.searchsorted(xar, 0.3, side="left")
In [64]: %time np.searchsorted(xar, 0.3, side="left")
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 98.9 µs
randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")
%time np.searchsorted(xar, randpts, side="left")
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 1.2 ms
If we follow the multiplicative rule, then numpy should take ~100 ms which implies ~83X faster.

I think the most pythonic way would be:
num = 65 # Input number
array = np.random.random((10))*100 # Given array
nearest_idx = np.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)
This is the basic code. You can use it as a function if you want

This is a vectorized version of unutbu's answer:
def find_nearest(array, values):
array = np.asarray(array)
# the last dim must be 1 to broadcast in (array - values) below.
values = np.expand_dims(values, axis=-1)
indices = np.abs(array - values).argmin(axis=-1)
return array[indices]
image = plt.imread('example_3_band_image.jpg')
print(image.shape) # should be (nrows, ncols, 3)
quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)
quantiled_image = find_nearest(quantiles, image)
print(quantiled_image.shape) # should be (nrows, ncols, 3)

Maybe helpful for ndarrays:
def find_nearest(X, value):
return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

For 2d array, to determine the i, j position of nearest element:
import numpy as np
def find_nearest(a, a0):
idx = (np.abs(a - a0)).argmin()
w = a.shape[1]
i = idx // w
j = idx - i * w
return a[i,j], i, j

Here is a version that works with 2D arrays, using scipy's cdist function if the user has it, and a simpler distance calculation if they don't.
By default, the output is the index that is closest to the value you input, but you can change that with the output keyword to be one of 'index', 'value', or 'both', where 'value' outputs array[index] and 'both' outputs index, array[index].
For very large arrays, you may need to use kind='euclidean', as the default scipy cdist function may run out of memory.
This is maybe not the absolute fastest solution, but it is quite close.
def find_nearest_2d(array, value, kind='cdist', output='index'):
# 'array' must be a 2D array
# 'value' must be a 1D array with 2 elements
# 'kind' defines what method to use to calculate the distances. Can choose one
# of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
# arrays. Otherwise, cdist is much faster.
# 'output' defines what the output should be. Can be 'index' (default) to return
# the index of the array that is closest to the value, 'value' to return the
# value that is closest, or 'both' to return index,value
import numpy as np
if kind == 'cdist':
try: from scipy.spatial.distance import cdist
except ImportError:
print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
kind = 'euclidean'
index = np.where(array == value)[0] # Make sure the value isn't in the array
if index.size == 0:
if kind == 'cdist': index = np.argmin(cdist([value],array)[0])
elif kind == 'euclidean': index = np.argmin(np.sum((np.array(array)-np.array(value))**2.,axis=1))
else: raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
if output == 'index': return index
elif output == 'value': return array[index]
elif output == 'both': return index,array[index]
else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")

For those searching for multiple nearest, modifying the accepted answer:
import numpy as np
def find_nearest(array, value, k):
array = np.asarray(array)
idx = np.argsort(abs(array - value))[:k]
return array[idx]
See:
https://stackoverflow.com/a/66937734/11671779

import numpy as np
def find_nearest(array, value):
array = np.array(array)
z=np.abs(array-value)
y= np.where(z == z.min())
m=np.array(y)
x=m[0,0]
y=m[1,0]
near_value=array[x,y]
return near_value
array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

This one handles any number of queries, using numpy searchsorted, so after sorting the input arrays, is just as fast.
It works on regular grids in 2d, 3d ... too:
#!/usr/bin/env python3
# keywords: nearest-neighbor regular-grid python numpy searchsorted Voronoi
import numpy as np
#...............................................................................
class Near_rgrid( object ):
""" nearest neighbors on a Manhattan aka regular grid
1d:
near = Near_rgrid( x: sorted 1d array )
nearix = near.query( q: 1d ) -> indices of the points x_i nearest each q_i
x[nearix[0]] is the nearest to q[0]
x[nearix[1]] is the nearest to q[1] ...
nearpoints = x[nearix] is near q
If A is an array of e.g. colors at x[0] x[1] ...,
A[nearix] are the values near q[0] q[1] ...
Query points < x[0] snap to x[0], similarly > x[-1].
2d: on a Manhattan aka regular grid,
streets running east-west at y_i, avenues north-south at x_j,
near = Near_rgrid( y, x: sorted 1d arrays, e.g. latitide longitude )
I, J = near.query( q: nq × 2 array, columns qy qx )
-> nq × 2 indices of the gridpoints y_i x_j nearest each query point
gridpoints = np.column_stack(( y[I], x[J] )) # e.g. street corners
diff = gridpoints - querypoints
distances = norm( diff, axis=1, ord= )
Values at an array A definded at the gridpoints y_i x_j nearest q: A[I,J]
3d: Near_rgrid( z, y, x: 1d axis arrays ) .query( q: nq × 3 array )
See Howitworks below, and the plot Voronoi-random-regular-grid.
"""
def __init__( self, *axes: "1d arrays" ):
axarrays = []
for ax in axes:
axarray = np.asarray( ax ).squeeze()
assert axarray.ndim == 1, "each axis should be 1d, not %s " % (
str( axarray.shape ))
axarrays += [axarray]
self.midpoints = [_midpoints( ax ) for ax in axarrays]
self.axes = axarrays
self.ndim = len(axes)
def query( self, queries: "nq × dim points" ) -> "nq × dim indices":
""" -> the indices of the nearest points in the grid """
queries = np.asarray( queries ).squeeze() # or list x y z ?
if self.ndim == 1:
assert queries.ndim <= 1, queries.shape
return np.searchsorted( self.midpoints[0], queries ) # scalar, 0d ?
queries = np.atleast_2d( queries )
assert queries.shape[1] == self.ndim, [
queries.shape, self.ndim]
return [np.searchsorted( mid, q ) # parallel: k axes, k processors
for mid, q in zip( self.midpoints, queries.T )]
def snaptogrid( self, queries: "nq × dim points" ):
""" -> the nearest points in the grid, 2d [[y_j x_i] ...] """
ix = self.query( queries )
if self.ndim == 1:
return self.axes[0][ix]
else:
axix = [ax[j] for ax, j in zip( self.axes, ix )]
return np.array( axix )
def _midpoints( points: "array-like 1d, *must be sorted*" ) -> "1d":
points = np.asarray( points ).squeeze()
assert points.ndim == 1, points.shape
diffs = np.diff( points )
assert np.nanmin( diffs ) > 0, "the input array must be sorted, not %s " % (
points.round( 2 ))
return (points[:-1] + points[1:]) / 2 # floats
#...............................................................................
Howitworks = \
"""
How Near_rgrid works in 1d:
Consider the midpoints halfway between fenceposts | | |
The interval [left midpoint .. | .. right midpoint] is what's nearest each post --
| | | | points
| . | . | . | midpoints
^^^^^^ . nearest points[1]
^^^^^^^^^^^^^^^ nearest points[2] etc.
2d:
I, J = Near_rgrid( y, x ).query( q )
I = nearest in `x`
J = nearest in `y` independently / in parallel.
The points nearest [yi xj] in a regular grid (its Voronoi cell)
form a rectangle [left mid x .. right mid x] × [left mid y .. right mid y]
(in any norm ?)
See the plot Voronoi-random-regular-grid.
Notes
-----
If a query point is exactly halfway between two data points,
e.g. on a grid of ints, the lines (x + 1/2) U (y + 1/2),
which "nearest" you get is implementation-dependent, unpredictable.
"""
Murky = \
""" NaNs in points, in queries ?
"""
__version__ = "2021-10-25 oct denis-bz-py"

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Faster Python Cosine dissimilarity between Scipy CSR "vectors" - python

Related

What is a fast(er) way to get the center points of objects represented in a 2D numpy array?

Computing `AB⁻¹` with `np.linalg.solve()`

Manipulation of numpy 2-D array

Segmentation using maximum likelihood algorithm on images using python

numpy how to find index of nearest value according to a thresold value of multi dimensional array? [duplicate]

Categories

Resources