I've been struggling with an algorithm tied to comparisons with 3d triangle vectors. Unfortunately it's very slow in places, and I've gone back and forth on different methods to try to improve it. One thing I'm struggling with is speeding up a distance calculation.
I have two groups of triangles, each broken down into three points, each of which is a 3d float vector (xyz). The calculations I'm using are:
diffverts = numpy.zeros( ( ntris*3, ntesttris*3, 3 ), dtype = 'float32')
diffverts += triverts.reshape(ntris*3, 1, 3 )
diffverts -= ttriverts.reshape(1, ntesttris*3, 3 )
vertdist = ( diffverts[:,:,0]**2 + diffverts[:,:,1]**2 + diffverts[:,:,2]**2 ) ** 0.5
This calculation is faster than:
diffverts = triverts.reshape(ntris*3, 1, 3 ) - ttriverts.reshape(1, ntesttris*3, 3 )
vertdist = ( diffverts[:,:,0]**2 + diffverts[:,:,1]**2 + diffverts[:,:,2]**2 ) ** 0.5
Is there a faster method to populate the diffverts part (which takes longest) and/or the distance part, which is also quite time-consuming? This code is called many times due to the number of groups to test. Also, trying to work just on indexes to the verts causes me other issues with further calculations when trying to get back to some boolean tests (i.e. this is only one of a set of calculations, so keeping things at the tri point level works best).
I'm using numpy and python
The problem is that brute-force testing of all triangles against each other takes quadratic time. It is better to use a data structure specialized for such computations. Luckily, scipy contains one.
Take a look at scipy.spatial.cKDTree. The help should be self-explanatory.
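A minimal sketch of what that might look like here, assuming triverts and ttriverts can be viewed as (n, 3) float arrays of vertex coordinates (the r=0.1 cutoff is an arbitrary placeholder):

from scipy.spatial import cKDTree

# build the tree once over the test vertices, then query it with the others
tree = cKDTree(ttriverts.reshape(-1, 3))

# distance to (and index of) the nearest test vertex, for every vertex
dist, idx = tree.query(triverts.reshape(-1, 3), k=1)

# or: all test vertices within a cutoff radius of each vertex,
# which avoids ever building the full pairwise distance matrix
close = tree.query_ball_point(triverts.reshape(-1, 3), r=0.1)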
I think diffverts is taking up enough memory to cause cache misses. Unfortunately while this solution is very elegant, you're probably better off computing the whole distance in one go, to avoid having to save an n*m*3 array of intermediate values. As ugly as it is, I would just do nested for loops.
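One way to get that "whole distance in one go" effect without hand-written loops (a different route from the nested-for-loops suggestion above, offered only as a sketch, and again assuming the vertex arrays can be viewed as (n, 3) and (m, 3)):

from scipy.spatial.distance import cdist

# full (n, m) Euclidean distance matrix, computed in C without
# materializing the (n, m, 3) difference array
vertdist = cdist(triverts.reshape(-1, 3), ttriverts.reshape(-1, 3))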
Related
Can one get the Cartesian outer product of two sets of vectors without using two for loops? It is slow because the data is large.
[[f(x,y) for x in vectors] for y in vectors]
I am trying to do an agglomerative clustering project in python and for this, I need to create a distance matrix.
This is the code that I have to define the function for the distance matrix:
def distance_matrix(vectors):
    s = np.zeros((len(vectors), len(vectors)))
    for i in range(len(vectors)):
        for v in range(len(vectors)):
            s[i, v] = dissimilarity(vectors[i], vectors[v])
    return s
What it should do is take a list of NumPy arrays and return a 2D NumPy array d where the entry d[i,j] contains the dissimilarity between vectors[i] and vectors[j].
In this case, vectors is the list of NumPy arrays and the dissimilarity is calculated by:
def dissimilarity(v1, v2):
    return 1 - (v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
or in other words, the dissimilarity is the cosine dissimilarity between two 1D NumPy arrays.
My goal is to find a way to get the distance matrix without the double for loops but still have the computational time be very small.
In general one cannot do this. In general (though not in this case), the computation time will not change, because the work is physically there; no matter what accounting tricks one uses to rewrite it (a single for loop that acts like two for loops, or a recursive implementation), at the end of the day the work must still be done, irrespective of the order you do it in.
Is there a way to get rid of the for loops? Even if it slows down the run time a bit. – Cat_Smithyyyy03
Your case is special. While one normally has no reason to expect a speedup is possible, you are effectively asking for a tensor product, or in this case a matrix product. In a general matrix multiplication AB=C, the product C is basically the collection of dot products {a dot b} of every row vector a of A with every column vector b of B.
So, for your case, first normalize all your vectors (so the normalization is not required later), then stack them to form A, then let B=transp(A), then do a matrix multiply.
[[a0 a1 a2]      [[a0 b0 ... z0      [[aa ab ac ... az]
 [b0 b1 b2]       [a1 b1 ... z1]      [ba bb bc ... bz]
     ...      x   [a2 b2 ... z2]]  =        ...
 [z0 z1 z2]]                          [za zb zc ... zz]]
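In code, a minimal sketch of that normalize-then-multiply idea (assuming vectors is a list of equal-length 1D arrays, as in the question):

import numpy as np

A = np.stack(vectors).astype(float)
A /= np.linalg.norm(A, axis=1, keepdims=True)   # normalize once, up front
s = 1.0 - A @ A.T                               # s[i, j] is the cosine dissimilarity of vectors[i] and vectors[j]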
Interestingly, you can now plug this into a fast matrix-multiply algorithm, which is actually faster than O(dim * #vecs^2). Hopefully it is also optimized for matrices multiplied by their own transpose, which produce a symmetric result (that might save a factor of 2 of the work... maybe there is some flag like matrixmult(a, b, outputWillBeSymmetric)).
This is faster than "O(N^2)", unintuitively: the rewrite exposes substructure in the problem that can be exploited to get below O(dim * #vecs^2), namely the fact that you are computing the outer product of the same set of vectors with itself. A fast matrix-multiply algorithm can take advantage of this.
edit: my original answer was wrong
You have a set of size N and you wish to compute f(a,b) for all a and b in the set.
Unless you know some values are trivial, there is no way to be asymptotically faster than this, because you have to imagine the worst case: Every pair f(a,b) may be unique... so there's no way to do less than roughly N^2 work.
However, since your function f is symmetric, you can do half the work and then mirror it:
N = len(vectors)
for i in range(N):
    for v in range(i, N):  # upper triangle only: half the work
        dissim = dissimilarity(vectors[i], vectors[v])  # or whatever f(a, b) is
        s[i, v] = dissim
        s[v, i] = dissim
(You can avoid calculating your metric in the reflexive case f(a,a) because it's trivial, but that doesn't make things asymptotically faster since the fraction of such work N/N^2 tends to zero as N increases, so it's not that great an optimization... it is reasonable only if you are working with a very small number of vectors.)
Whether you should optimize further depends on whether you need to. Such code should be able to easily handle millions of small vectors. Next steps:
Is there something fishy that is making my code very slow? What could it be? We don't have enough information to comment.
If there is nothing fishy of the sort, you can try rephrasing as matrix operations, in order to stay as much as possible in numpy's optimized C routines instead of bouncing back and forth into python. This is ugly and I would avoid doing it, because your code readability will decrease.
If you are dealing with hundreds of millions of vectors, perhaps consider a more cache-friendly blocked approach, where you loop over blocks first and then over the indices inside each pair of blocks: for blockI in range(N//10**6): for blockV in range(N//10**6): for i in range(blockI*10**6, (blockI+1)*10**6): for v in range(blockV*...): (a rough blocked sketch follows this list).
If you're dealing with billions of vectors, then look into leveraging gpgpu. This is quite ideal for the gpu and might be a factor of thousands speedup.
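Picking up the blocked idea from the list above, here is a rough sketch that tiles the normalized-matrix formulation so only one (block x block) tile is computed at a time; the block size is an arbitrary placeholder, not a tuned value:

import numpy as np

B = 4096                                       # block size (assumption)
A = np.stack(vectors).astype(float)
A /= np.linalg.norm(A, axis=1, keepdims=True)
N = len(A)
s = np.empty((N, N))
for i0 in range(0, N, B):
    for j0 in range(0, N, B):
        # one tile of the dissimilarity matrix at a time
        s[i0:i0 + B, j0:j0 + B] = 1.0 - A[i0:i0 + B] @ A[j0:j0 + B].T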
I have a vector A of size 100k+ and I want to calculate the distance between every element of this vector and every other element. I am trying to solve this problem in R, using its built-in adist function and also trying to use the stringdist package.
The problem is that it is computationally very heavy and it keeps running for days without ending.
The end problem that I am trying to solve is finding duplicates or near-duplicates using a distance measure and then building some sort of classification model around it.
The code I am using currently is
# declare an empty data frame and append data to it
matchedStr_vecA <- data.frame(row_index = integer(),
                              col_index = integer(),
                              vecA_i = character(),
                              vecA_j = character(),
                              dist_diff_vecA = double(),
                              stringsAsFactors = FALSE)
k = 1 # (keeps track of the pointer to the data frame)

# Run 2 nested loops to calculate half of the matrix (off the diagonal -
# the diagonal elements will be zero and the other half is its mirror image)
for (i in 1:length(vecA)) {
  for (j in 1:length(vecA)) {
    if (i < j) {
      dist_diff_vecA <- stringdist(vecA[i], vecA[j], method = "lv")
      matchedStr_vecA[k, ] <- c(i, j, vecA[i], vecA[j], dist_diff_vecA)
      k <- k + 1
    }
  }
}
Please help me bring this computation from O(n^2) to O(n). I am fine with using Python as well. I was told this can be solved using dynamic programming, but I am not sure how to implement it.
Thanks all
I had the very same problem of calculating a distance matrix and I solved it successfully in Python. The crucial element of the solution, making sure the calculations are split equally between threads, is discussed in this question:
How to split diagonal matrix into equal number of items each along one of axis?
There are two things to point out:
The distance between two points is typically symmetric, so you can exploit this and calculate the distance between elements i and j once, then either store it or reuse it as the distance between j and i.
The algorithm cannot be optimized below O(n^2) unless you are OK with imprecise results. And since you are new to programming I would not even consider going that way.
You should be able to parallelize the calculations using index splitting as I suggested in the question above for a near-optimal solution.
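For the Python side, a minimal sketch of the parallel, upper-triangle-only idea (the python-Levenshtein package, the placeholder strings, and the chunksize are my assumptions, not part of the answer above):

from itertools import combinations
from multiprocessing import Pool

import Levenshtein  # assumption: the python-Levenshtein package provides the distance

vecA = ["string one", "string two", "string three"]  # replace with your 100k+ strings

def pair_distance(pair):
    # one unit of work: the Levenshtein distance for a single (i, j) pair
    i, j = pair
    return i, j, Levenshtein.distance(vecA[i], vecA[j])

if __name__ == "__main__":
    with Pool() as pool:
        # only pairs with i < j are computed; d(j, i) equals d(i, j)
        dists = pool.map(pair_distance,
                         combinations(range(len(vecA)), 2),
                         chunksize=10000)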
I have been browsing through the questions, and could find some help, but I prefer having confirmation by asking it directly. So here is my problem.
I have a (NumPy) array u of dimension N, from which I want to build a square N x N matrix k. Basically, each matrix element k(i,j) is defined as k(i,j) = exp(-|u_i - u_j|^2).
My first naive way to do it was like this, which is, I believe, Fortran-like:
for i in range(N):
    for j in range(N):
        k[i][j] = np.exp(np.sum(-(u[i]-u[j])**2))
However, this is extremely slow. For N=1000, for example, it is taking around 15 seconds.
My other way to proceed is the following (inspired by other questions/answers):
i, j = np.ogrid[:N,:N]
k = np.exp(np.sum(-(u[i]-u[j])**2,axis=2))
This is way faster, as for N=1000, the result is almost instantaneous.
So I have two questions.
1) Why is the first method so slow, and why is the second one so fast?
2) Is there a faster way to do it? For N=10000, it is starting to take quite some time already, so I really don't know if this was the "right" way to do it.
Thank you in advance!
P.S: the matrix is symmetric, so there must also be a way to make the process faster by calculating only the upper half of the matrix, but my question was more related to the way to manipulate arrays, etc.
First, a small remark: there is no need to use np.sum if u can be rewritten as u = np.arange(N), which seems to be the case since you wrote that it is of dimension N.
1) First question:
Accessing indices in Python is slow, so it is best not to use [] if there is a way to avoid it. Also, you call np.exp and np.sum many times, whereas they can be called once on whole vectors and matrices. So your second proposal is better, since you compute your k all at once instead of element by element.
2) Second question:
Yes there is. You should consider using only numpy functions and not using indices (around 3 times faster):
k = np.exp(-np.power(np.subtract.outer(u,u),2))
(NB: You can keep **2 instead of np.power, which is a bit faster but has smaller precision)
edit (Take into account that u is an array of tuples)
With tuple data, it's a bit more complicated:
ma = np.subtract.outer(u[:,0],u[:,0])**2
mb = np.subtract.outer(u[:,1],u[:,1])**2
k = np.exp(-np.add(ma, mb))
You have to use np.subtract.outer twice, since calling it once on the whole array would return a 4-dimensional array (and compute lots of useless data), whereas u[i]-u[j] returns a 3-dimensional array.
I used np.add instead of np.sum since it keeps the array dimensions.
NB: I checked with
N = 10000
u = np.random.random_sample((N,2))
It returns the same as your proposals (but 1.7 times faster).
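As a side note (an alternative I am adding here, not part of the answer above), scipy can compute all the squared pairwise distances directly, and it works unchanged for u of shape (N, d) with any d:

import numpy as np
from scipy.spatial.distance import cdist

k = np.exp(-cdist(u, u, 'sqeuclidean'))  # squared Euclidean distances, then the exponential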
I am trying to do a double integral by first interpolating the data to make a surface. I am using numba to try and speed this process up, but it's just taking too long.
Here is my code, with the images needed to run the code located here and here.
Noting that your code has a quadruple-nested set of for loops, I focused on optimizing the inner pair. Here's the old code:
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i,j)
        '''create an r vector '''
        r=(i*distX,j*distY,z)
        for x in xrange(img.shape[0]):
            for y in xrange(img.shape[1]):
                '''create a ksi vector, then calculate
                its norm, and the dot product of r and ksi'''
                ksi=(x*distX,y*distY,z)
                ksiNorm=np.linalg.norm(ksi)
                ksiDotR=float(np.dot(ksi,r))
                '''calculate the integrand'''
                temp[x,y]=img[x,y]*np.exp(1j*k*ksiDotR/ksiNorm)
        '''interpolate so that we can do the integral and take the integral'''
        temp2=rbs(a,b,temp.real)
        K[i,j]=temp2.integral(0,n,0,m)
Since K and img are each about 2000x2000, the innermost statements need to be executed sixteen trillion times. This is simply not practical using Python, but we can shift the work into C and/or Fortran using NumPy to vectorize. I did this one careful step at a time to try to make sure the results will match; here's what I ended up with:
'''create all r vectors'''
R = np.empty((K.shape[0], K.shape[1], 3))
R[:,:,0] = np.repeat(np.arange(K.shape[0]), K.shape[1]).reshape(K.shape) * distX
R[:,:,1] = np.arange(K.shape[1]) * distY
R[:,:,2] = z
'''create all ksi vectors'''
KSI = np.empty((img.shape[0], img.shape[1], 3))
KSI[:,:,0] = np.repeat(np.arange(img.shape[0]), img.shape[1]).reshape(img.shape) * distX
KSI[:,:,1] = np.arange(img.shape[1]) * distY
KSI[:,:,2] = z
# vectorized 2-norm; see http://stackoverflow.com/a/7741976/4323
KSInorm = np.sum(np.abs(KSI)**2,axis=-1)**(1./2)
# loop over entire K, which is same shape as img, rows first
# this loop populates K, one pixel at a time (so can be parallelized)
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        KSIdotR = np.dot(KSI, R[i,j])
        temp = img * np.exp(1j * k * KSIdotR / KSInorm)
        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i,j] = temp2.integral(0, n, 0, m)
The inner pair of loops is now completely gone, replaced by vectorized operations done in advance (at a space cost linear in the size of the inputs).
This reduces the time per iteration of the outer two loops from 340 seconds to 1.3 seconds on my Macbook Air 1.6 GHz i5, without using Numba. Of the 1.3 seconds per iteration, 0.68 seconds are spent in the rbs function, which is scipy.interpolate.RectBivariateSpline. There is probably room to optimize further--here are some ideas:
Reenable Numba. I don't have it on my system. It may not make much difference at this point, but easy for you to test.
Do more domain-specific optimization, such as trying to simplify the fundamental calculations being done. My optimizations are intended to be lossless, and I don't know your problem domain so I can't optimize as deeply as you may be able to.
Try to vectorize the remaining loops. This may be tough unless you are willing to replace the scipy RBS function with something supporting multiple calculations per call.
Get a faster CPU. Mine is pretty slow; you can probably get a speedup of at least 2x simply by using a better computer than my tiny laptop.
Downsample your data. Your test images are 2000x2000 pixels, but contain fairly little detail. If you cut their linear dimensions by 2-10x, you'd get a huge speedup.
So that's it for me for now. Where does this leave you? Assuming a slightly better computer and no further optimization work, even the optimized code would take about a month to process your test images. If you only have to do this once, maybe that's fine. If you need to do it more often, or need to iterate on the code as you try different things, you probably need to keep optimizing--starting with that RBS function which consumes more than half the time now.
Bonus tip: your code would be a lot easier to deal with if it didn't have nearly identical variable names like k and K, or use j both as a variable name and as the complex-number suffix (0j).
I have a 1D array of data and wish to extract the spatial variation. The standard way to do this, which I wish to pythonize, is to perform a moving linear regression on the data and save the gradient...
def nssl_kdp(phidp, distance, fitlen):
    kdp=zeros(phidp.shape, dtype=float)
    myshape=kdp.shape
    for swn in range(myshape[0]):
        print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            print "ray ", rayn+1
            small=[polyfit(distance[a:a+2*fitlen], phidp[swn, rayn, a:a+2*fitlen],1)[0] for a in xrange(myshape[2]-2*fitlen)]
            kdp[swn, rayn, :]=array((list(itertools.chain(*[fitlen*[small[0]], small, fitlen*[small[-1]]]))))
    return kdp
This works well but is SLOW... I need to do this 17*360 times...
I imagine the overhead is in the iterator in the [... for a in xrange(...)] line... Is there an implementation of a moving fit in numpy/scipy?
the calculation for linear regression is based on the sum of various values. so you could write a more efficient routine that modifies the sum as the window moves (adding one point and subtracting an earlier one).
this will be much more efficient than repeating the process every time the window shifts, but is open to rounding errors. so you would need to restart occasionally.
you can probably do better than this for equally spaced points by pre-calculating all the x dependencies, but i don't understand your example in detail so am unsure whether it's relevant.
so i guess i'll just assume that it is.
the slope is (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²) - http://easycalculation.com/statistics/learn-regression.php
for evenly spaced data the denominator is fixed (since you can shift the x axis to the start of the window without changing the gradient). the (ΣX) in the numerator is also fixed (for the same reason). so you only need to be concerned with ΣXY and ΣY. the latter is trivial - just add the new value and subtract the old one. the former decreases by the ΣY of the points that stay in the window (each X weighting drops by 1) and increases by (N-1)·Y_new for the incoming point (relabelling the x values as 0 ... N-1 at each step).
i suspect that's not clear. what i am saying is that the formula for the slope does not need to be completely recalculated each step. particularly because, at each step, you can rename the X values as 0,1,...N-1 without changing the slope. so almost everything in the formula is the same. all that changes are two terms, which depend on Y as Y_0 "drops out" of the window and Y_N "moves in".
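a minimal sketch of that incremental update, assuming a 1d signal y at unit x spacing and a window of n points (the function name and interface are mine, just for illustration):

import numpy as np

def rolling_slope(y, n):
    y = np.asarray(y, dtype=float)
    x = np.arange(n)
    sx = x.sum()                        # fixed: 0 + 1 + ... + (n-1)
    denom = n * (x**2).sum() - sx**2    # fixed denominator
    sy = y[:n].sum()
    sxy = (x * y[:n]).sum()
    slopes = [(n * sxy - sx * sy) / denom]
    for i in range(n, len(y)):
        # window slides one step: y[i-n] drops out, y[i] moves in
        sxy += (n - 1) * y[i] - (sy - y[i - n])   # update sum of x*y
        sy += y[i] - y[i - n]                     # update sum of y
        slopes.append((n * sxy - sx * sy) / denom)
    return np.array(slopes)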
I've used these moving window functions from the somewhat old scikits.timeseries module with some success. They are implemented in C, but I haven't managed to use them in a situation where the moving window varies in size (not sure if you need that functionality).
http://pytseries.sourceforge.net/lib.moving_funcs.html
Head here for downloads (if using Python 2.7+, you'll probably need to compile the extension itself -- I did this for 2.7 and it works fine):
http://sourceforge.net/projects/pytseries/files/scikits.timeseries/0.91.3/
I/we might be able to help you more if you clean up your example code a bit. I'd consider defining some of the arguments/objects in lines 7 and 8 (where you're defining 'small') as variables, so that you don't end row 8 with so many hard-to-follow parentheses.
Ok.. I have what seems to be a solution.. not an answer per se, but a way of doing a moving, multi-point differential... I have tested this and the result looks very, very similar to a moving regression... I used a 1D sobel filter (a ramp from -1 to 1 convolved with the data):
def KDP(phidp, dx, fitlen):
    kdp=np.zeros(phidp.shape, dtype=float)
    myshape=kdp.shape
    for swn in range(myshape[0]):
        #print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            #print "ray ", rayn+1
            kdp[swn, rayn, :]=sobel(phidp[swn, rayn,:], window_len=fitlen)/dx
    return kdp

def sobel(x, window_len=11):
    """Sobel differential filter for calculating KDP
    output:
        differential signal (unscaled for gate spacing)
    example:
    """
    s=np.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]
    #print(len(s))
    w=2.0*np.arange(window_len)/(window_len-1.0) - 1.0
    #print w
    w=w/(abs(w).sum())
    y=np.convolve(w, s, mode='valid')
    return -1.0*y[window_len/2:len(x)+window_len/2]/(window_len/3.0)
this runs QUICK!