Vectorize mask of squared euclidean distance in Python - python

I'm running code to generate a mask of locations in B closer than some distance D to locations in A.
N = [[0 for j in range(length_B)] for i in range(length_A)]
dSquared = D*D
for i in range(length_A):
    for j in range(length_B):
        if ((A[i][0]-B[j][0])**2 + (A[i][1]-B[j][1])**2) <= dSquared:
            N[i][j] = 1
For lists of A and B that are tens of thousands of locations long, this code takes a while. I'm pretty sure there's a way to vectorize this though to make it run much faster. Thank you.

It's easier to visualize this code with 2d array indexing:
for i in range(length_A):
    for j in range(length_B):
        dist = (A[i,0] - B[j,0])**2 + (A[i,1] - B[j,1])**2
        if dist <= dSquared:
            N[i, j] = 1
That dist expression looks like
((A[i,:] - B[j,:])**2).sum()
With 2 elements this array expression might not be faster, but it should help us rethink the problem.
We can perform the i, j outer operation with broadcasting
A[:,None,:] - B[None,:,:] # 3d difference array
dist=((A[:,None,:] - B[None,:,:])**2).sum(axis=-1) # (lengthA,lengthB) array
Compare this to dSquared, and use the resulting boolean array as a mask for setting elements of N to 1:
N = np.zeros((length_A, length_B), dtype=int)
N[dist <= dSquared] = 1
I haven't tested this code, so there may be bugs, but I think the basic idea is there. It may also show enough of the thought process to let you work out the details for other cases.
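A minimal, self-contained sketch of the broadcasting idea above, checked against the plain double loop on a small random input. The function name within_distance_mask and the sample sizes are illustrative choices, not part of the question.

```python
import numpy as np

def within_distance_mask(A, B, D):
    """Return an integer (len(A), len(B)) mask: 1 where A[i] is within D of B[j]."""
    diff = A[:, None, :] - B[None, :, :]      # shape (len(A), len(B), 2)
    sq_dist = (diff ** 2).sum(axis=-1)        # shape (len(A), len(B))
    return (sq_dist <= D * D).astype(int)

rng = np.random.default_rng(0)
A = rng.random((5, 2))
B = rng.random((7, 2))
D = 0.4

N = within_distance_mask(A, B, D)

# Reference result from the plain double loop
N_loop = np.zeros((len(A), len(B)), dtype=int)
for i in range(len(A)):
    for j in range(len(B)):
        if (A[i, 0] - B[j, 0]) ** 2 + (A[i, 1] - B[j, 1]) ** 2 <= D * D:
            N_loop[i, j] = 1
```

Keep in mind that the intermediate difference array holds len(A) * len(B) * 2 floats, so for tens of thousands of points on each side it may not fit in memory in one go.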

You can use scipy's cdist that is supposedly pretty efficient for such distance calculations, like so -
from scipy.spatial.distance import cdist
N = (cdist(A,B,'sqeuclidean') <= dSquared).astype(int)
As suggested in @hpaulj's solution, one can also use broadcasting. From the code posted in the question, it looks like we are dealing with Nx2 shaped arrays, so we can slice the first and second columns and perform broadcasted subtractions on them separately. The benefit is that we never build a 3D array, which keeps memory use down and may also translate into a performance boost. Thus, the squared euclidean distances would be calculated like so -
sq_eucl_dist = (A[:,None,0] - B[:,0])**2 + (A[:,None,1] - B[:,1])**2
Let's time all these three approaches for squared euclidean distance calculations.
Runtime test -
In [75]: # Input arrays
    ...: A = np.random.rand(200,2)
    ...: B = np.random.rand(200,2)
    ...:
In [76]: %timeit ((A[:,None,:] - B[None,:,:])**2).sum(axis=-1) # @hpaulj's solution
1000 loops, best of 3: 1.9 ms per loop
In [77]: %timeit (A[:,None,0] - B[:,0])**2 + (A[:,None,1] - B[:,1])**2
1000 loops, best of 3: 401 µs per loop
In [78]: %timeit cdist(A,B,'sqeuclidean')
1000 loops, best of 3: 249 µs per loop

I second the suggestions to use Numpy above. The looping code is also doing a lot more indexing into A than it needs to. You could use something like:
import numpy as np
dimension = 10000
A = np.random.rand(dimension, 2) + 0.0
B = np.random.rand(dimension, 2) + 1.0
N = []
d = 1.0
for i in range(len(A)):
    distances = np.linalg.norm(B - A[i,:], axis=1)
    for j in range(len(distances)):
        if distances[j] <= d:
            N.append((i,j))
print(len(N))
It is going to be pretty hard to get decent performance for this out of pure Python. I would also point out that the more-dimensional array solutions will require a... lot... of memory.

Insofar as your matrix N is likely to be sparse, scipy.spatial.cKDTree will give much better time complexity than any approach based on computing all distances by brute force:
from scipy.spatial import cKDTree
cKDTree(A).sparse_distance_matrix(cKDTree(B), max_distance=D)
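As a hedged illustration of the tree-based idea, the sketch below uses cKDTree.query_ball_tree rather than sparse_distance_matrix, because it returns neighbour indices directly, which is all a mask needs. The array sizes and the choice to store the mask as (i, j) pairs are assumptions made for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
A = rng.random((200, 2))
B = rng.random((300, 2))
D = 0.05

# For each point in A, the indices of the points in B within distance D
neighbours = cKDTree(A).query_ball_tree(cKDTree(B), r=D)

# Store the mask as (i, j) index pairs instead of a dense array
pairs = [(i, j) for i, js in enumerate(neighbours) for j in js]

# Brute-force check, affordable only because this input is small
sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
expected = {(int(i), int(j)) for i, j in zip(*np.nonzero(sq_dist <= D * D))}
```

When only a small fraction of the pairs fall within D, the list of pairs stays far smaller than a dense len(A) x len(B) mask.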

Related

Optimize Python: Large arrays, memory problems

I'm having a speed problem running Python / NumPy code. I don't know how to make it faster; maybe someone else does?
Assume there is a surface with two triangulations, one fine (..._fine) with M points and one coarse with N points. There is also data on the coarse mesh at every point (N floats). I'm trying to do the following:
For every point on the fine mesh, find the k closest points on the coarse mesh and take the mean of their values. In short: interpolate data from coarse to fine.
My code currently looks like the following. With large data (in my case M = 2e6, N = 1e4) it runs for about 25 minutes, I guess because the explicit for loop is not pushed down into numpy. Any ideas how to solve this with smart indexing? Full M x N arrays blow up the RAM.
import numpy as np
# p_fine.shape == (m, 3), p.shape == (n, 3), data_coarse.shape == (n,)
data_fine = np.empty((m,))
for i, ps in enumerate(p_fine):
    data_fine[i] = np.mean(data_coarse[np.argsort(np.linalg.norm(ps-p,axis=1))[:k]])
Cheers!
First of all, thanks for the detailed help.
Divakar, your solutions gave a substantial speed-up. With my data, the code ran for just under 2 minutes, depending a bit on the chunk size.
I also tried my way around sklearn and ended up with
def sklearnSearch_v3(p, p_fine, k):
    neigh = NearestNeighbors(n_neighbors=k)
    neigh.fit(p)
    return data_coarse[neigh.kneighbors(p_fine)[1]].mean(axis=1)
which ended up being quite fast. For my data sizes, the following setup
import numpy as np
from sklearn.neighbors import NearestNeighbors
m,n = 2000000,20000
p_fine = np.random.rand(m,3)
p = np.random.rand(n,3)
data_coarse = np.random.rand(n)
k = 3
yields
%timeit sklearnSearch_v3(p, p_fine, k)
1 loop, best of 3: 7.46 s per loop
Approach #1
We are working with large sized datasets and memory is an issue, so I will try to optimize the computations within the loop. Now, we can use np.einsum to replace np.linalg.norm part and np.argpartition in place of actual sorting with np.argsort, like so -
out = np.empty((m,))
for i, ps in enumerate(p_fine):
    subs = ps-p
    sq_dists = np.einsum('ij,ij->i',subs,subs)
    out[i] = data_coarse[np.argpartition(sq_dists,k)[:k]].sum()
out = out/k
Approach #2
Now, as another approach we can also use Scipy's cdist for a fully vectorized solution, like so -
from scipy.spatial.distance import cdist
out = data_coarse[np.argpartition(cdist(p_fine,p),k,axis=1)[:,:k]].mean(1)
But, since we are memory bound here, we can perform these operations in chunks. Basically, we take chunks of rows from the tall array p_fine, which has millions of rows, and run cdist on each chunk, so that every iteration produces a chunk of output elements instead of a single scalar. This cuts the number of loop iterations by a factor equal to the chunk length.
So, finally we would have an implementation like so -
out = np.empty((m,))
L = 10 # Length of chunk (to be used as a param)
for start in range(0, m, L): # the last chunk may be shorter than L
    p_fine_slice = p_fine[start:start+L]
    dists = cdist(p_fine_slice,p)
    out[start:start+L] = data_coarse[np.argpartition(dists,k,axis=1)[:,:k]].mean(1)
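The chunking idea can be sketched as a self-contained function. This version is an illustration under a few assumptions: it computes squared distances with plain NumPy broadcasting instead of cdist (so it runs without SciPy), the name knn_mean_chunked and the chunk parameter are made up for the example, and the sizes are tiny so that the result can be compared with the one-point-at-a-time loop.

```python
import numpy as np

def knn_mean_chunked(p, p_fine, data_coarse, k, chunk=1000):
    """Mean of data_coarse over the k nearest coarse points, per fine point."""
    m = p_fine.shape[0]
    out = np.empty(m)
    for start in range(0, m, chunk):                      # last chunk may be shorter
        block = p_fine[start:start + chunk]               # shape (c, 3)
        diff = block[:, None, :] - p[None, :, :]          # shape (c, n, 3)
        sq_dists = np.einsum('cnd,cnd->cn', diff, diff)   # shape (c, n)
        nearest = np.argpartition(sq_dists, k, axis=1)[:, :k]
        out[start:start + chunk] = data_coarse[nearest].mean(axis=1)
    return out

rng = np.random.default_rng(2)
p = rng.random((50, 3))
p_fine = rng.random((230, 3))     # deliberately not a multiple of the chunk size
data_coarse = rng.random(50)
k = 3

result = knn_mean_chunked(p, p_fine, data_coarse, k, chunk=100)

# Reference: the one-point-at-a-time loop from the question
reference = np.array([
    data_coarse[np.argsort(np.linalg.norm(ps - p, axis=1))[:k]].mean()
    for ps in p_fine
])
```

Squared distances are enough here because only the ranking matters. Note that the broadcasted difference array is three times larger than the distance matrix cdist would return, so the chunk size that fits in memory is correspondingly smaller.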
Runtime test
Setup -
# Setup inputs
m,n = 20000,100
p_fine = np.random.rand(m,3)
p = np.random.rand(n,3)
data_coarse = np.random.rand(n)
k = 5
def original_approach(p,p_fine,m,n,k):
    data_fine = np.empty((m,))
    for i, ps in enumerate(p_fine):
        data_fine[i] = np.mean(data_coarse[np.argsort(np.linalg.norm(ps-p,axis=1))[:k]])
    return data_fine
def proposed_approach(p,p_fine,m,n,k):
    out = np.empty((m,))
    for i, ps in enumerate(p_fine):
        subs = ps-p
        sq_dists = np.einsum('ij,ij->i',subs,subs)
        out[i] = data_coarse[np.argpartition(sq_dists,k)[:k]].sum()
    return out/k
def proposed_approach_v2(p,p_fine,m,n,k,len_per_iter):
    L = len_per_iter
    out = np.empty((m,))
    num_iter = m//L
    for j in range(num_iter):
        p_fine_slice = p_fine[L*j:L*j+L]
        out[L*j:L*j+L] = data_coarse[np.argpartition(cdist(p_fine_slice,p),k,axis=1)[:,:k]].sum(1)
    return out/k
Timings -
In [134]: %timeit original_approach(p,p_fine,m,n,k)
1 loops, best of 3: 1.1 s per loop
In [135]: %timeit proposed_approach(p,p_fine,m,n,k)
1 loops, best of 3: 539 ms per loop
In [136]: %timeit proposed_approach_v2(p,p_fine,m,n,k,len_per_iter=100)
10 loops, best of 3: 63.2 ms per loop
In [137]: %timeit proposed_approach_v2(p,p_fine,m,n,k,len_per_iter=1000)
10 loops, best of 3: 53.1 ms per loop
In [138]: %timeit proposed_approach_v2(p,p_fine,m,n,k,len_per_iter=2000)
10 loops, best of 3: 63.8 ms per loop
So, there's about 2x improvement with the first proposed approach and 20x over the original approach with the second one at the sweet spot with the len_per_iter param set at 1000. Hopefully this will bring down your 25 minutes runtime to little over a minute. Not bad I guess!

Efficient ways to iterate over several numpy arrays and process current and previous elements?

I've read a lot about different techniques for iterating over numpy arrays recently and it seems that consensus is not to iterate at all (for instance, see a comment here). There are several similar questions on SO, but my case is a bit different as I have to combine "iterating" (or not iterating) and accessing previous values.
Let's say there are N (N is small, usually 4, might be up to 7) 1-D numpy arrays of float128 in a list X, all arrays are of the same size. To give you a little insight, these are data from PDE integration, each array stands for one function, and I would like to apply a Poincare section. Unfortunately, the algorithm should be both memory- and time-efficient since these arrays are sometimes ~1Gb each, and there are only 4Gb of RAM on board (I've just learnt about memmap'ing of numpy arrays and now consider using them instead of regular ones).
One of these arrays is used for "filtering" the others, so I start with secaxis = X.pop(idx). Now I have to locate pairs of indices where (secaxis[i-1] > 0 and secaxis[i] < 0) or (secaxis[i-1] < 0 and secaxis[i] > 0) and then apply simple algebraic transformations to remaining arrays, X (and save results). Worth mentioning, data shouldn't be wasted during this operation.
There are multiple ways for doing that, but none of them seem efficient (and elegant enough) to me. One is a C-like approach, where you just iterate in a for-loop:
import array # better than lists
res = [ array.array('d') for _ in X ]
for i in xrange(1,secaxis.size):
    if condition: # see above
        co = -secaxis[i-1]/secaxis[i]
        for j in xrange(N):
            res[j].append( (X[j][i-1] + co*X[j][i])/(1+co) )
This is clearly very inefficient and besides not a Pythonic way.
Another way is to use numpy.nditer, but I haven't figured out yet how one accesses the previous value, though it allows iterating over several arrays at once:
# without secaxis = X.pop(idx)
it = numpy.nditer(X)
for vec in it:
    # vec[idx] is current value, how do you get the previous (or next) one?
Third possibility is to first find sought indices with efficient numpy slices, and then use them for bulk multiplication/addition. I prefer this one for now:
res = []
inds, = numpy.where((secaxis[:-1] < 0) * (secaxis[1:] > 0) +
                    (secaxis[:-1] > 0) * (secaxis[1:] < 0))
coefs = -secaxis[inds] / secaxis[inds+1] # array of coefficients
for f in X: # loop is done only N-1 times, that is, 3 to 6
    res.append( (f[inds] + coefs*f[inds+1]) / (1+coefs) )
But this is seemingly done in 7 + 2*(N - 1) passes, moreover, I'm not sure about secaxis[inds] type of addressing (it is not slicing and generally it has to find all elements by indices just like in the first method, doesn't it?).
Finally, I've also tried using itertools and it resulted in monstrous and obscure structures, which might stem from the fact that I'm not very familiar with functional programming:
def filt(x):
    return (x[0] < 0 and x[1] > 0) or (x[0] > 0 and x[1] < 0)
import array
from itertools import izip, tee, ifilter
res = [ array.array('d') for _ in X ]
iters = [iter(x) for x in X] # N-1 iterators in a list
prev, curr = tee(izip(*iters)) # 2 similar iterators, each of which
                               # consists of N-1 iterators
next(curr, None) # one of them is now for current value
seciter = tee(iter(secaxis))
next(seciter[1], None)
for x in ifilter(filt, izip(seciter[0], seciter[1], prev, curr)):
    co = - x[0]/x[1]
    for r, p, c in zip(res, x[2], x[3]):
        r.append( (p+co*c) / (1+co) )
Not only does this look very ugly, it also takes an awful lot of time to complete.
So, I have the following questions:
Of all these methods, is the third one indeed the best? If so, what can be done to improve the last one?
Are there any other, better ones yet?
Out of sheer curiosity, is there a way to solve the problem using nditer?
Finally, will I be better off using memmap versions of numpy arrays, or will it probably slow things down a lot? Maybe I should only load secaxis array into RAM, keep others on disk and use third method?
(bonus question) List of equal in length 1-D numpy arrays comes from loading N .npy files whose sizes aren't known beforehand (but N is). Would it be more efficient to read one array, then allocate memory for one 2-D numpy array (slight memory overhead here) and read remaining into that 2-D array?
The numpy.where() version is fast enough; you can speed it up a little with method3(). If the > condition can be changed to >=, you can also use method4().
import numpy as np
a = np.random.randn(100000)
def method1(a):
    idx = []
    for i in range(1, len(a)):
        if (a[i-1] > 0 and a[i] < 0) or (a[i-1] < 0 and a[i] > 0):
            idx.append(i)
    return idx
def method2(a):
    inds, = np.where((a[:-1] < 0) * (a[1:] > 0) +
                     (a[:-1] > 0) * (a[1:] < 0))
    return inds + 1
def method3(a):
    m = a < 0
    p = a > 0
    return np.where((m[:-1] & p[1:]) | (p[:-1] & m[1:]))[0] + 1
def method4(a):
    return np.where(np.diff(a >= 0))[0] + 1
assert np.allclose(method1(a), method2(a))
assert np.allclose(method2(a), method3(a))
assert np.allclose(method3(a), method4(a))
%timeit method1(a)
%timeit method2(a)
%timeit method3(a)
%timeit method4(a)
the %timeit result:
1 loop, best of 3: 294 ms per loop
1000 loops, best of 3: 1.52 ms per loop
1000 loops, best of 3: 1.38 ms per loop
1000 loops, best of 3: 1.39 ms per loop
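The caveat about changing > to >= matters only when the data contains exact zeros. The small sketch below restates method3 and method4 under the illustrative names strict_crossings and diff_crossings and shows one input where they agree and one where they differ.

```python
import numpy as np

def strict_crossings(a):
    # Same logic as method3: strictly positive on one side, strictly negative on the other
    m = a < 0
    p = a > 0
    return np.where((m[:-1] & p[1:]) | (p[:-1] & m[1:]))[0] + 1

def diff_crossings(a):
    # Same logic as method4: any change in the truth value of a >= 0
    return np.where(np.diff(a >= 0))[0] + 1

no_zero = np.array([1.0, 2.0, -1.0, -3.0, 4.0])
with_zero = np.array([1.0, 0.0, -1.0])

print(strict_crossings(no_zero), diff_crossings(no_zero))      # both report indices 2 and 4
print(strict_crossings(with_zero), diff_crossings(with_zero))  # empty versus index 2
```

With random floating-point data an exact zero is very unlikely, which is why the assertions in the benchmark pass.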
I'll need to read your post in more detail, but will start with some general observations (from previous iteration questions).
There isn't an efficient way of iterating over arrays in Python, though there are things that slow things down. I like to distinguish between the iteration mechanism (nditer, for x in A:) and the action (alist.append(...), x[i+1] += 1). The big time consumer is usually the action, done many times, not the iteration mechanism itself.
Letting numpy do the iteration in compiled code is the fastest.
xdiff = x[1:] - x[:-1]
is much faster than
xdiff = np.zeros(x.shape[0]-1)
for i in range(x.shape[0]-1):
    xdiff[i] = x[i+1] - x[i]
The np.nditer isn't any faster.
nditer is recommended as a general iteration tool in compiled code. But its main value lies in handling broadcasting and coordinating the iteration over several arrays (input/output). And you need to use buffering and c like code to get the best speed from nditer (I'll look up a recent SO question).
https://stackoverflow.com/a/39058906/901925
Don't use nditer without studying the relevant iteration tutorial page (the one that ends with a cython example).
=========================
Just judging from experience, this approach will be fastest. Yes it's going to iterate over secaxis a number of times, but those are all done in compiled code, and will be much faster than any iteration in Python. And the for f in X: iteration is just a few times.
res = []
inds, = numpy.where((secaxis[:-1] < 0) * (secaxis[1:] > 0) +
                    (secaxis[:-1] > 0) * (secaxis[1:] < 0))
coefs = -secaxis[inds] / secaxis[inds+1] # array of coefficients
for f in X:
    res.append( (f[inds] + coefs*f[inds+1]) / (1+coefs) )
@HYRY has explored alternatives for making the where step faster. But as you can see the differences aren't that big. Other possible tweaks:
inds1 = inds+1
coefs = -secaxis[inds] / secaxis[inds1]
coefs1 = coefs+1
for f in X:
    res.append(( f[inds] + coefs*f[inds1]) / coefs1)
If X was an array, res could be an array as well.
res = (X[:,inds] + coefs*X[:,inds1])/coefs1
But for small N I suspect the list res is just as good. Don't need to make the arrays any bigger than necessary. The tweaks are minor, just trying to avoid recalculating things.
=================
This use of np.where is just np.nonzero. That actually makes two passes of the array, once with np.count_nonzero to determine how many values it will return, and create the return structure (list of arrays of now known length). And a second loop to fill in those indices. So multiple iterations are fine if it keeps action simple.
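To make the vectorized version concrete, here is a small worked sketch that assumes the remaining arrays have been stacked into a 2-D array X with one function per row. The function name section_values and the toy data are illustrative only.

```python
import numpy as np

def section_values(secaxis, X):
    """Linearly interpolate each row of X at the sign changes of secaxis."""
    neg, pos = secaxis < 0, secaxis > 0
    inds = np.where((neg[:-1] & pos[1:]) | (pos[:-1] & neg[1:]))[0]
    inds1 = inds + 1
    coefs = -secaxis[inds] / secaxis[inds1]
    return (X[:, inds] + coefs * X[:, inds1]) / (1 + coefs)

secaxis = np.array([1.0, -1.0, -3.0, 1.0])
X = np.array([[0.0, 2.0, 4.0, 6.0],        # a linear function of the index
              [10.0, 10.0, 10.0, 10.0]])   # a constant function
res = section_values(secaxis, X)
print(res)
```

secaxis changes sign between indices 0 and 1 (halfway) and between indices 2 and 3 (three quarters of the way), so the first row interpolates to 1.0 and 5.5 while the constant row stays at 10.0.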

Third order moment calculation - numpy

In python, I have an array X with N rows (the number of examples) and n columns (the number of features).
If I want to calculate the second order moment matrix C
C[i,j] = E(x_i x_j)
then I have two possibilities:
First, do the loop:
for i in range(N):
    for j in range(n):
        for k in range(n):
            C[j,k] = C[j,k] + X[i,j]*X[i,k]/N
Second, and simpler, use the numpy matrix product:
import numpy as np
C = np.transpose(X).dot(X)/N
In practice the second version is much faster.
If now I want to calculate the third order moment matrix T
T[i,j,k] = E(x_i x_j x_k)
then the loop alternative is easy:
for i in range(N):
    for j in range(n):
        for k in range(n):
            for m in range(n):
                T[j,k,m] = T[j,k,m] + X[i,j]*X[i,k]*X[i,m]/N
Is there a fast way using numpy libraries to calculate this last tensor, like for the second order moment?
You can use NumPy's einsum notation to solve both your second and third order cases.
Second order :
np.einsum('ij,ik->jk',X,X)/N
Third order :
np.einsum('ij,ik,il->jkl',X,X,X)/N
As can be seen, it would be easier/intuitive to extend this to higher order cases with it.
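A quick way to convince yourself that the einsum subscripts match the definitions is to compare them with the explicit loops on a tiny input. The sizes below are arbitrary and chosen only to keep the loops cheap.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 6, 3                      # 6 examples, 3 features
X = rng.random((N, n))

C = np.einsum('ij,ik->jk', X, X) / N            # second order, shape (n, n)
T = np.einsum('ij,ik,il->jkl', X, X, X) / N     # third order, shape (n, n, n)

# Explicit loops for reference
C_loop = np.zeros((n, n))
T_loop = np.zeros((n, n, n))
for i in range(N):
    for j in range(n):
        for k in range(n):
            C_loop[j, k] += X[i, j] * X[i, k] / N
            for m in range(n):
                T_loop[j, k, m] += X[i, j] * X[i, k] * X[i, m] / N
```

In both einsum strings the repeated index i is the one summed over, which is the sum over examples in the expectation.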
I know it is not perfect in terms of speed, but why not use np.power(x, 3).sum() / N? It is slower than the dot product, but faster than looping.
In [1]: import numpy as np
In [2]: x = np.random.rand(10000)
In [3]: x.dot(x.T)
Out[3]: 3373.6189738897856
In [4]: %timeit(x.dot(x.T))
The slowest run took 48.74 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.39 µs per loop
In [5]: %timeit(np.power(x, 2).sum())
The slowest run took 4.14 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 140 µs per loop
In [6]: np.power(x, 2).sum()
Out[6]: 3373.6189738897865
Btw, that's how I calculate the moments...

Numpy elementwise product of 3d array

I have two 3d arrays A and B with shape (N, 2, 2) that I would like to multiply along the N-axis, with a matrix product on each pair of 2x2 matrices. With a loop implementation, it looks like
C[i] = dot(A[i], B[i])
Is there a way I could do this without using a loop? I've looked into tensordot, but haven't been able to get it to work. I think I might want something like tensordot(a, b, axes=([1,2], [2,1])) but that's giving me an NxN matrix.
It seems you are doing matrix-multiplications for each slice along the first axis. For the same, you can use np.einsum like so -
np.einsum('ijk,ikl->ijl',A,B)
We can also use np.matmul -
np.matmul(A,B)
On Python 3.5 and later, this matmul operation simplifies with the @ operator -
A @ B
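As a small sanity check, the three spellings can be compared with the explicit loop from the question. The array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((5, 2, 2))
B = rng.random((5, 2, 2))

# Loop version from the question: one 2x2 matrix product per slice
C_loop = np.array([A[i].dot(B[i]) for i in range(A.shape[0])])

C_einsum = np.einsum('ijk,ikl->ijl', A, B)
C_matmul = np.matmul(A, B)
C_operator = A @ B
```

np.matmul treats the leading axis as a batch dimension and multiplies the trailing 2x2 matrices pairwise, which is why no explicit loop is needed.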
Benchmarking
Approaches -
def einsum_based(A,B):
    return np.einsum('ijk,ikl->ijl',A,B)
def matmul_based(A,B):
    return np.matmul(A,B)
def forloop(A,B):
    N = A.shape[0]
    C = np.zeros((N,2,2))
    for i in range(N):
        C[i] = np.dot(A[i], B[i])
    return C
Timings -
In [44]: N = 10000
    ...: A = np.random.rand(N,2,2)
    ...: B = np.random.rand(N,2,2)
In [45]: %timeit einsum_based(A,B)
    ...: %timeit matmul_based(A,B)
    ...: %timeit forloop(A,B)
100 loops, best of 3: 3.08 ms per loop
100 loops, best of 3: 3.04 ms per loop
100 loops, best of 3: 10.9 ms per loop
You just need to perform the operation on the first dimension of your tensors, which is labeled by 0:
c = tensordot(a, b, axes=(0,0))
This will work as you wish. Also you don't need a list of axes, because it's just along one dimension you're performing the operation. With axes([1,2],[2,1]) you're cross multiplying the 2nd and 3rd dimensions. If you write it in index notation (Einstein summing convention) this corresponds to c[i,j] = a[i,k,l]*b[j,k,l], thus you're contracting the indices you want to keep.
EDIT: Ok, the problem is that the tensor product of two 3d objects is a 6d object. Since each contraction removes a pair of indices, there's no way you'll get a 3d object from a tensordot operation alone. The trick is to split your calculation in two: first you do the tensordot on the indices of the matrix product, and then you take a tensor diagonal in order to reduce your 4d object to 3d. In one command:
d = np.diagonal(np.tensordot(a,b,axes=(2,1)), axis1=0, axis2=2)
In tensor notation d[i,j,k] = c[i,j,i,k] = a[i,j,l]*b[i,l,k]. Note that np.diagonal puts the diagonal axis last, so the returned array has shape (2, 2, N); np.moveaxis(d, -1, 0) restores the (N, 2, 2) layout.
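The two-step idea can be checked numerically. This sketch assumes the contraction runs over the last axis of a and the middle axis of b, i.e. axes=(2, 1), which is what the index notation a[i,j,l]*b[i,l,k] implies; the shapes in the comments are for N = 4.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.random((4, 2, 2))
b = rng.random((4, 2, 2))

c = np.tensordot(a, b, axes=(2, 1))       # c[i,j,m,k], shape (4, 2, 4, 2)
d = np.diagonal(c, axis1=0, axis2=2)      # keeps m == i, shape (2, 2, 4)
d = np.moveaxis(d, -1, 0)                 # move the diagonal axis first, shape (4, 2, 2)

expected = np.matmul(a, b)                # batched matrix product for comparison
```

The intermediate c holds N*N matrix products of which only N are kept, so for large N this route does far more work and uses far more memory than einsum or matmul.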

Python - Intersection of 2D Numpy Arrays

I'm desperately searching for an efficient way to check if two 2D numpy Arrays intersect.
So what I have is two arrays with an arbitrary amount of 2D arrays like:
A=np.array([[2,3,4],[5,6,7],[8,9,10]])
B=np.array([[5,6,7],[1,3,4]])
C=np.array([[1,2,3],[6,6,7],[10,8,9]])
All I need is a True if there is at least one vector intersecting with another one of the other array, otherwise a false. So it should give results like this:
f(A,B) -> True
f(A,C) -> False
I'm kind of new to python and at first I wrote my program with Python lists, which works but of course is very inefficient. The Program takes days to finish so I am working on a numpy.array solution now, but these arrays really are not so easy to handle.
Here's Some Context about my Program and the Python List Solution:
What I'm doing is something like a self-avoiding random walk in 3 dimensions. http://en.wikipedia.org/wiki/Self-avoiding_walk. But instead of doing a random walk and hoping that it will reach a desirable length (e.g. I want chains built up of 1000 beads) without reaching a dead end, I do the following:
I create a "flat" Chain with the desired length N:
X=[]
for i in range(0,N+1):
    X.append((i,0,0))
Now i fold this flat chain:
1. randomly choose one of the elements ("pivotelement")
2. randomly choose one direction (either all elements to the left or to the right of the pivotelement)
3. randomly choose one out of 9 possible rotations in space (3 axes * 3 possible rotations 90°,180°,270°)
4. rotate all the elements of the chosen direction with the chosen rotation
5. check if the new elements of the chosen direction intersect with the other direction
6. no intersection -> accept the new configuration, else -> keep the old chain.
Steps 1.-6. have to be done a large amount of times (e.g. for a chain of length 1000, ~5000 Times) so these steps have to be done efficiently. My List-based solution for this is the following:
def PivotFold(chain):
    randPiv=random.randint(1,N) #Chooses a random pivotelement, N is the Chainlength
    Pivot=chain[randPiv] #get that pivotelement
    C=[] #C is going to be a shifted copy of the chain
    intersect=False
    for j in range (0,N+1): # Here i shift the whole chain to get the pivotelement to the origin, so i can use simple rotations around the origin
        C.append((chain[j][0]-Pivot[0],chain[j][1]-Pivot[1],chain[j][2]-Pivot[2]))
    rotRand=random.randint(1,18) # rotRand is used to choose a direction and a Rotation (2 possible direction * 9 rotations = 18 possibilitys)
    #Rotations around Z-Axis
    if rotRand==1:
        for j in range (randPiv,N+1):
            C[j]=(-C[j][1],C[j][0],C[j][2])
            if C[0:randPiv].__contains__(C[j])==True:
                intersect=True
                break
    elif rotRand==2:
        for j in range (randPiv,N+1):
            C[j]=(C[j][1],-C[j][0],C[j][2])
            if C[0:randPiv].__contains__(C[j])==True:
                intersect=True
                break
    ...etc
    if intersect==False: # return C if there was no intersection in C
        Shizz=C
    else:
        Shizz=chain
    return Shizz
The function PivotFold(chain) will be applied to the initially flat chain X a large number of times. It's pretty naively written, so maybe you have some pro tips to improve it. I thought that numpy arrays would be good because I can efficiently shift and rotate entire chains without looping over all the elements ...
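For illustration, one pivot step could be sketched with numpy as below. This is only a sketch under assumptions: the chain is stored as an (N+1, 3) integer array, the rotation is passed in as a 3x3 integer matrix, only the "right of the pivot" direction is handled, and the names pivot_fold, ROT_Z_90 and ROT_Z_180 are made up for the example. The overlap test still goes through a Python set of tuples.

```python
import numpy as np

# 90 and 180 degree rotations about the z-axis, as integer matrices
ROT_Z_90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
ROT_Z_180 = ROT_Z_90 @ ROT_Z_90

def pivot_fold(chain, pivot_idx, rot):
    """Rotate chain[pivot_idx:] about the pivot; keep the old chain on overlap."""
    shifted = chain - chain[pivot_idx]        # move the pivot to the origin
    head = shifted[:pivot_idx]
    tail = shifted[pivot_idx:] @ rot.T        # rotate every tail bead at once
    head_set = set(map(tuple, head))
    if any(tuple(bead) in head_set for bead in tail):
        return chain                          # self-intersection: reject the move
    return np.vstack([head, tail])

flat = np.array([(i, 0, 0) for i in range(5)])
folded = pivot_fold(flat, 2, ROT_Z_90)        # bends the chain into an L shape
rejected = pivot_fold(flat, 2, ROT_Z_180)     # folds the tail back onto the head
print(folded)
```

As in the list-based version, an accepted move returns the chain shifted so that the pivot sits at the origin, while a rejected move returns the original chain unchanged.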
This should do it:
In [11]:
def f(arrA, arrB):
    return not set(map(tuple, arrA)).isdisjoint(map(tuple, arrB))
In [12]:
f(A, B)
Out[12]:
True
In [13]:
f(A, C)
Out[13]:
False
In [14]:
f(B, C)
Out[14]:
False
To find intersection? OK, set sounds like a logical choice.
But numpy.array or list are not hashable? OK, convert them to tuple.
That is the idea.
A numpy way of doing it involves rather unreadable broadcasting:
In [34]:
(A[...,np.newaxis]==B[...,np.newaxis].T).all(1)
Out[34]:
array([[False, False],
       [ True, False],
       [False, False]], dtype=bool)
In [36]:
(A[...,np.newaxis]==B[...,np.newaxis].T).all(1).any()
Out[36]:
True
Some timeit result:
In [38]:
#Dan's method
%timeit set_comp(A,B)
10000 loops, best of 3: 34.1 µs per loop
In [39]:
#Avoiding lambda will speed things up
%timeit f(A,B)
10000 loops, best of 3: 23.8 µs per loop
In [40]:
#numpy way probably will be slow, unless the size of the array is very big (my guess)
%timeit (A[...,np.newaxis]==B[...,np.newaxis].T).all(1).any()
10000 loops, best of 3: 49.8 µs per loop
Also the numpy method will be RAM hungry, as A[...,np.newaxis]==B[...,np.newaxis].T step creates a 3D array.
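The same comparison can be spelled with explicit None indexing, which some may find easier to follow than the newaxis-plus-transpose form. The function name rows_intersect is an illustrative choice; the arrays are the ones from the question.

```python
import numpy as np

def rows_intersect(a, b):
    """True if at least one row of a equals a row of b."""
    equal = a[:, None, :] == b[None, :, :]   # shape (len(a), len(b), row_length)
    return bool(equal.all(axis=-1).any())

A = np.array([[2, 3, 4], [5, 6, 7], [8, 9, 10]])
B = np.array([[5, 6, 7], [1, 3, 4]])
C = np.array([[1, 2, 3], [6, 6, 7], [10, 8, 9]])

print(rows_intersect(A, B))   # A and B share the row [5, 6, 7]
print(rows_intersect(A, C))   # no row of A appears in C
```

The boolean intermediate still has len(a) * len(b) * row_length entries, so the memory concern above applies to this form as well.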
Using the same idea outlined here, you could do the following:
def make_1d_view(a):
    a = np.ascontiguousarray(a)
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(dt).ravel()
def f(a, b):
    return len(np.intersect1d(make_1d_view(a), make_1d_view(b))) != 0
>>> f(A, B)
True
>>> f(A, C)
False
This doesn't work for floating point types (it will not consider +0.0 and -0.0 the same value), and np.intersect1d uses sorting, so it has linearithmic, not linear, performance. You may be able to squeeze out some performance by replicating the source of np.intersect1d in your code and, instead of checking the length of the returned array, calling np.any on the boolean indexing array.
You can also get the job done with some np.tile and np.swapaxes business!
def intersect2d(X, Y):
    """
    Function to find intersection of two 2D arrays.
    Returns index of rows in X that are common to Y.
    """
    X = np.tile(X[:,:,None], (1, 1, Y.shape[0]) )
    Y = np.swapaxes(Y[:,:,None], 0, 2)
    Y = np.tile(Y, (X.shape[0], 1, 1))
    eq = np.all(np.equal(X, Y), axis = 1)
    eq = np.any(eq, axis = 1)
    return np.nonzero(eq)[0]
To answer the question more specifically, you'd only need to check if the returned array is empty.
This should be much faster, since it is not O(n^2) like the for-loop solution, but it isn't fully numpythonic. Not sure how better to leverage numpy here. Each row is converted to a tuple so that element order is preserved; a frozenset would treat [8,9,10] and [10,8,9] as the same vector.
def set_comp(a, b):
    sets_a = set(map(lambda x: tuple(x), a))
    sets_b = set(map(lambda x: tuple(x), b))
    return not sets_a.isdisjoint(sets_b)
I think you want True if the two arrays share a row. You can use this:
def f(A, B):
    for i in A:
        for j in B:
            if np.array_equal(i, j):
                return True
    return False
This problem can be solved efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
len(npi.intersection(A, B)) > 0
