I'm fitting a line to 3D points with OpenCV fitLine. What's the best way to calculate residuals of the resulting fit? Or, since I want residuals in addition to the fit, is there a better method than fitLine?
The following works, but there must be a better (faster) way.
# fit points
u, v, w, x, y, z = cv2.fitLine(points, cv2.DIST_L2, 0, 1, 0.01)
v = np.array([u[0], v[0], w[0]])
p = np.array([x[0], y[0], z[0]])
# rotate fit to z axis
k = np.cross(v, [0, 0, 1])
mag = np.linalg.norm(k)
R, _ = cv2.Rodrigues(k * np.arcsin(mag) / mag)
# rotate points and calculate distance to z-axis
rot_points = np.dot(R, (points-p).T)
err = rot_points[0]**2 + rot_points[1]**2
I'm assuming that fitLine computes the residuals err while estimating the line, so it is a waste to have to recompute them myself. Basically, knowing that I want the line and the residuals, is there a better alternative than fitLine, which only returns the line?
I am not aware of any direct way to get the sum of residuals out of cv2.fitLine itself, so I will focus solely on speeding up the existing code. Benchmarking with a relatively high number of points shows that most of the runtime is spent in the last two lines, where rot_points and err are computed. Also, the last row of rot_points is never used when computing err, so we should be able to shave off some runtime by slicing out only the first two rows.
Let's dive into researching efficient ways to obtain rot_points and err.
1) rot_points = np.dot(R, (points-p).T)
This step broadcasts points-p into a temporary (N, 3) array that is immediately consumed by the matrix multiplication. That temporary, and its memory traffic, can be avoided by distributing the multiplication: multiply R by points and by p separately, then subtract. Bringing in the slicing to the first two rows as discussed earlier, we can get the needed part of rot_points like so -
rot_points_row2 = np.dot(R[:2], (points.T)) - np.dot(R[:2],p[:,None])
2) err = rot_points[0]**2 + rot_points[1]**2
This second step could be sped up with np.einsum for efficient squaring and sum-reduction, like so -
err = np.einsum('ij,ij->j',rot_points_row2,rot_points_row2)
For a relatively small number of points, such as 2000, the step that computes mag = np.linalg.norm(k) might also become significant in terms of runtime. To speed that up, one could again use np.einsum, like so -
mag = np.sqrt(np.einsum('i,i->',k,k))
Runtime test
Let's use a random array of 2000 points in 3D space as the input points and look at the associated runtime numbers with the original approach and the proposed ones for the last two lines.
In [44]: # Setup input points
...: N = 2000
...: points = np.random.rand(N,3)
...:
...: u, v, w, x, y, z = cv2.fitLine(points, cv2.DIST_L2, 0, 1, 0.01)
...: v = np.array([u[0], v[0], w[0]])
...: p = np.array([x[0], y[0], z[0]])
...:
...: # rotate fit to z axis
...: k = np.cross(v, [0, 0, 1])
...: mag = np.linalg.norm(k)
...: R, _ = cv2.Rodrigues(k * np.arcsin(mag) / mag)
...:
...: # rotate points and calculate distance to z-axis
...: rot_points = np.dot(R, (points-p).T)
...: err = rot_points[0]**2 + rot_points[1]**2
...:
Let's run our proposed methods and verify their outputs against the original outputs -
In [45]: rot_points_row2 = np.dot(R[:2], (points.T)) - np.dot(R[:2],p[:,None])
...: err2 = np.einsum('ij,ij->j',rot_points_row2,rot_points_row2)
...:
In [46]: np.allclose(rot_points[:2],rot_points_row2)
Out[46]: True
In [47]: np.allclose(err,err2)
Out[47]: True
Finally and most importantly, let's time these sections of code -
In [48]: %timeit np.dot(R, (points-p).T) # Original code
10000 loops, best of 3: 79.5 µs per loop
In [49]: %timeit np.dot(R[:2], (points.T)) - np.dot(R[:2],p[:,None]) # Proposed
10000 loops, best of 3: 49.7 µs per loop
In [50]: %timeit rot_points[0]**2 + rot_points[1]**2 # Original code
100000 loops, best of 3: 12.6 µs per loop
In [51]: %timeit np.einsum('ij,ij->j',rot_points_row2,rot_points_row2) # Proposed
100000 loops, best of 3: 11.7 µs per loop
When the number of points is increased, the savings look even more promising. With N = 5000000 points, we get -
In [59]: %timeit np.dot(R, (points-p).T) # Original code
1 loops, best of 3: 410 ms per loop
In [60]: %timeit np.dot(R[:2], (points.T)) - np.dot(R[:2],p[:,None]) # Proposed
1 loops, best of 3: 254 ms per loop
In [61]: %timeit rot_points[0]**2 + rot_points[1]**2 # Original code
10 loops, best of 3: 144 ms per loop
In [62]: %timeit np.einsum('ij,ij->j',rot_points_row2,rot_points_row2) # Proposed
10 loops, best of 3: 77.5 ms per loop
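Putting the pieces together, here is a consolidated sketch (the wrapper function and its name are mine, not from the original code) of the fit-plus-residuals pipeline with the proposed replacements dropped in -
import numpy as np
import cv2

def fit_line_with_residuals(points):
    # Fit a 3D line with cv2.fitLine and also return the squared residuals.
    # Assumes points is an (N, 3) float array and the fitted direction is
    # not already parallel to the z-axis (otherwise mag would be zero).
    u, v, w, x, y, z = cv2.fitLine(points, cv2.DIST_L2, 0, 1, 0.01)
    v = np.array([u[0], v[0], w[0]])   # unit direction of the fitted line
    p = np.array([x[0], y[0], z[0]])   # a point on the fitted line

    # Rotation that maps the fitted direction onto the z-axis
    k = np.cross(v, [0, 0, 1])
    mag = np.sqrt(np.einsum('i,i->', k, k))
    R, _ = cv2.Rodrigues(k * np.arcsin(mag) / mag)

    # Only the first two rotated coordinates are needed for the residuals
    rot_xy = np.dot(R[:2], points.T) - np.dot(R[:2], p[:, None])
    err = np.einsum('ij,ij->j', rot_xy, rot_xy)
    return v, p, err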
Just wanted to mention that I compared PCA with fitLine for speed. fitLine was faster than PCA for all the cases where the number of points was greater than ~1000.
PCA directly gives you the eigenvectors, so you don't need the Rodrigues step (though that time should be negligible). So maybe the key to optimization lies in the rest of the code, unless there's a faster way to fit the model than fitLine or PCA.
I don't remember the PCA related math very well, so I could be wrong in the paragraph that follows.
Eigenvalues give us the variance along each dimension of the new eigenspace. If we consider a simple 2D case, I think you can take the smaller eigenvalue and multiply it with the number of points N (or is it N-1 ?) in the dataset to obtain the squared-residual-sum. Similarly we can extend it to the 3D case. As PCA gives us the eigenvalues, it takes simple scalar multiplications and additions to get the squared-residual-sum.
I'm adding the code (c++) for your reference.
RNG rng;
float a = -0.1, b = 0.1;
int rows = 3, cols = 1024*2;
Mat data = Mat::zeros(rows, cols, CV_32F);
for (int i = 0; i < cols; i++)
{
    Vec3f& v = data.at<Vec3f>(i);
    v = Vec3f(i+rng.uniform(a, b), i+rng.uniform(a, b), i+rng.uniform(a, b));
}
Mat datat = data.t();
Vec6f line;
fitLine(datat, line, CV_DIST_L2, 0, 1, 0.01);
PCA pca(datat, Mat(), CV_PCA_DATA_AS_ROW);
cout << "fitLine:\n" << line << endl;
cout << "\nPCA:\n" << pca.eigenvalues << endl;
cout << pca.eigenvectors << endl;
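To numerically sanity-check the eigenvalue-to-residual claim from the paragraph above, here is a hedged NumPy sketch (my own toy data and code, not part of the original post). Using the scatter matrix of the centered points sidesteps the N versus N-1 question: the sum of its two smaller eigenvalues should equal the sum of squared distances to the best-fit line.
import numpy as np

# Toy data: noisy points along a 3D line
np.random.seed(0)
t = np.linspace(0, 10, 2000)
points = np.outer(t, [1.0, 2.0, 3.0]) + 0.05 * np.random.randn(t.size, 3)

centered = points - points.mean(axis=0)
scatter = centered.T.dot(centered)       # 3x3 scatter matrix (covariance times N, up to ddof)
evals, evecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order

# Sum of the two smaller eigenvalues of the scatter matrix
resid_from_pca = evals[0] + evals[1]

# Explicit sum of squared distances of each point to the best-fit line
direction = evecs[:, -1]                 # eigenvector of the largest eigenvalue
proj = centered.dot(direction)
resid_explicit = np.sum(np.sum(centered**2, axis=1) - proj**2)

print(np.isclose(resid_from_pca, resid_explicit))   # expected: True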
Related
My problem is the following. I have two arrays X and Y of shape n, p where p >> n (e.g. n = 50, p = 10000).
I also have a mask mask (1-d array of booleans of size p) with respect to p, of small density (e.g. np.mean(mask) is 0.05).
I try to compute, as fast as possible, the inner product of X and Y with respect to mask: the output inner is an array of shape n, n, and is such that inner[i, j] = np.sum(X[i, np.logical_not(mask)] * Y[j, np.logical_not(mask)]).
I have tried using the numpy.ma library, but it is quite slow for my use:
import numpy as np
import numpy.ma as ma
n, p = 50, 10000
density = 0.05
mask = np.array(np.random.binomial(1, density, size=p), dtype=np.bool_)
mask_big = np.ones(n)[:, None] * mask[None, :]
X = np.random.randn(n, p)
Y = np.random.randn(n, p)
X_ma = ma.array(X, mask=mask_big)
Y_ma = ma.array(Y, mask=mask_big)
But then, on my machine, X_ma.dot(Y_ma.T) is about 5 times slower than X.dot(Y.T)...
To begin with, I think part of the problem is that .dot does not know that the mask only applies along p, but I don't know if it's possible to exploit this information.
I'm looking for a way to perform the computation without being much slower than the naive dot.
Thanks a lot !
We can use matrix multiplication on both the full and the masked versions: subtracting the masked product from the full product yields the desired output -
inner = X.dot(Y.T)-X[:,mask].dot(Y[:,mask].T)
Or simply use the inverted mask, though this would be slower for a sparse mask -
inner = X[:,~mask].dot(Y[:,~mask].T)
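As a quick sanity check (not in the original answer), the two formulations can be verified against each other with the same X, Y and mask as above -
inner1 = X.dot(Y.T) - X[:, mask].dot(Y[:, mask].T)
inner2 = X[:, ~mask].dot(Y[:, ~mask].T)
print(np.allclose(inner1, inner2))   # expected: True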
Timings -
In [34]: np.random.seed(0)
...: p,n = 10000,50
...: X = np.random.rand(n,p)
...: Y = np.random.rand(n,p)
...: mask = np.random.rand(p)>0.95
In [35]: mask.mean()
Out[35]: 0.0507
In [36]: %timeit X.dot(Y.T)-X[:,mask].dot(Y[:,mask].T)
100 loops, best of 3: 2.54 ms per loop
In [37]: %timeit X[:,~mask].dot(Y[:,~mask].T)
100 loops, best of 3: 4.1 ms per loop
In [39]: %%timeit
...: inner = np.empty((n,n))
...: for i in range(X.shape[0]):
...: for j in range(X.shape[0]):
...: inner[i, j] = np.sum(X[i, ~mask] * Y[j, ~mask])
1 loop, best of 3: 302 ms per loop
I have a 3-d Numpy array flow as follows:
flow = np.random.uniform(low=-1.0, high=1.0, size=(720,1280,2))
# Suppose flow[..., 0] are the x-coordinates and flow[..., 1] are the y-coordinates.
Need to calculate the angle for each x,y point. Here is how I have implemented it:
def calcAngle(a):
    assert(len(a) == 2)
    (x, y) = a
    # angle_deg = 0
    angle_deg = np.angle(x + y * 1j, deg=True)
    return angle_deg
fangle = np.apply_along_axis(calcAngle, axis=2, arr=flow)
# The above statement takes 14.0389318466 seconds to execute
The calculation of the angle at each point takes about 14 seconds to execute on my MacBook Pro.
Is there a way I could speed this up, probably by using some matrix operation, rather than processing each pixel one at a time.
You can use numpy.arctan2() to get the angle in radians, and then convert to degrees with numpy.rad2deg():
fangle = np.rad2deg(np.arctan2(flow[:,:,1], flow[:,:,0]))
On my computer, this is a little faster than Divakar's version:
In [17]: %timeit np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
10 loops, best of 3: 44.5 ms per loop
In [18]: %timeit np.rad2deg(np.arctan2(flow[:,:,1], flow[:,:,0]))
10 loops, best of 3: 35.4 ms per loop
A more efficient way to use np.angle() is to create a complex view of flow. If flow is an array of type np.float64 with shape (m, n, 2), then flow.view(np.complex128)[:,:,0] will be an array of type np.complex128 with shape (m, n):
fangle = np.angle(flow.view(np.complex128)[:,:,0], deg=True)
This appears to be a smidge faster than using arctan2 followed by rad2deg (but the difference is not far above the measurement noise of timeit):
In [47]: %timeit np.angle(flow.view(np.complex128)[:,:,0], deg=True)
10 loops, best of 3: 35 ms per loop
Note that this might not work if flow was created as the transpose of some other array, or as a slice of another array using steps bigger than 1.
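If that situation applies, one hedged workaround (my own suggestion, not from the original answer) is to force a contiguous copy before taking the complex view -
# np.ascontiguousarray copies only when the array is not already C-contiguous
flow_c = np.ascontiguousarray(flow, dtype=np.float64)
fangle = np.angle(flow_c.view(np.complex128)[:, :, 0], deg=True)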
numpy.angle supports vectorized operation, so just feed it the two slices along the last axis to get the final output, like so -
fangle = np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
Verification -
In [9]: flow = np.random.uniform(low=-1.0, high=1.0, size=(720,1280,2))
In [17]: out1 = np.apply_along_axis(calcAngle, axis=2, arr=flow)
In [18]: out2 = np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
In [19]: np.allclose(out1, out2)
Out[19]: True
Runtime test -
In [10]: %timeit np.apply_along_axis(calcAngle, axis=2, arr=flow)
1 loop, best of 3: 8.27 s per loop
In [11]: %timeit np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
10 loops, best of 3: 47.6 ms per loop
In [12]: 8270/47.6
Out[12]: 173.73949579831933
173x+ speedup!
I have a vector, a, which I wish to cross with every point in a defined 3D space.
import numpy as np
# Grid
x = np.arange(-4,4,0.1)
y = np.arange(-4,4,0.1)
z = np.arange(-4,4,0.1)
a = [1,0,0]
result = [[] for i in range(3)]
for j in range(len(x)):             # loop on x coords
    for k in range(len(y)):         # loop on y coords
        for l in range(len(z)):     # loop on z coords
            r = [x[j], y[k], z[l]]
            result[0].append(np.cross(a, r)[0])
            result[1].append(np.cross(a, r)[1])
            result[2].append(np.cross(a, r)[2])
This produces an array containing the cross product of a with every point in space. However, the process takes far too long due to the nested loops. Is there any way to exploit vectorization (meshgrid perhaps?) to make this faster?
Here's one vectorized approach -
np.cross(a, np.array(np.meshgrid(x,y,z)).transpose(2,1,3,0)).reshape(-1,3).T
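For reference in the sample run and timings below, original_app is assumed to be the question's triple loop wrapped into a function, along these lines -
def original_app(x, y, z, a):
    # The question's nested-loop approach, wrapped for benchmarking (assumed definition)
    result = [[] for i in range(3)]
    for j in range(len(x)):             # loop on x coords
        for k in range(len(y)):         # loop on y coords
            for l in range(len(z)):     # loop on z coords
                r = [x[j], y[k], z[l]]
                result[0].append(np.cross(a, r)[0])
                result[1].append(np.cross(a, r)[1])
                result[2].append(np.cross(a, r)[2])
    return result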
Sample run -
In [403]: x = np.random.rand(4)
...: y = np.random.rand(5)
...: z = np.random.rand(6)
...:
In [404]: result = original_app(x,y,z,a)
In [405]: out = np.cross(a, np.array(np.meshgrid(x,y,z)).\
transpose(2,1,3,0)).reshape(-1,3).T
In [406]: np.allclose(result[0], out[0])
Out[406]: True
In [407]: np.allclose(result[1], out[1])
Out[407]: True
In [408]: np.allclose(result[2], out[2])
Out[408]: True
Runtime test -
# Original setup used in the question
In [393]: # Grid
...: x = np.arange(-4,4,0.1)
...: y = np.arange(-4,4,0.1)
...: z = np.arange(-4,4,0.1)
...:
# Original approach
In [397]: %timeit original_app(x,y,z,a)
1 loops, best of 3: 21.5 s per loop
# @Denziloe's soln
In [395]: %timeit [np.cross(a, r) for r in product(x, y, z)]
1 loops, best of 3: 7.34 s per loop
# Proposed in this post
In [396]: %timeit np.cross(a, np.array(np.meshgrid(x,y,z)).\
transpose(2,1,3,0)).reshape(-1,3).T
100 loops, best of 3: 16 ms per loop
More than 1000x speedup over the original approach and more than 450x over the loopy approach from the other answer.
This takes a couple of seconds to run on my machine:
from itertools import product
result = [np.cross(a, r) for r in product(x, y, z)]
I don't know if that's fast enough for you, but there are a lot of calculations involved. It's certainly cleaner, and it removes at least some redundancy (the original calculates np.cross(a, r) three times per point). It also gives the result in a slightly different format, but this is the natural way to store the result and is hopefully fine for your purposes.
In a research paper, the author introduces an exterior product between two (3*3) matrices A and B, resulting in C:
C(i, j) = sum(k=1..3, l=1..3, m=1..3, n=1..3) eps(i,k,l)*eps(j,m,n)*A(k,m)*B(l,n)
where eps(a, b, c) is the Levi-Civita symbol.
I am wondering how to vectorize such a mathematical operator in Numpy instead of implementing 6 nested loops (for i, j, k, l, m, n) naively.
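For concreteness, here is a direct (deliberately slow) transcription of the definition, with the 3x3x3 Levi-Civita tensor eps built explicitly; it is only a reference sketch against which the vectorized answers below can be checked -
import numpy as np

# Levi-Civita symbol eps(a, b, c) as a 3x3x3 array
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

def exterior_product_naive(A, B):
    # C(i, j) = sum over k, l, m, n of eps(i,k,l) * eps(j,m,n) * A(k,m) * B(l,n)
    C = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                for l in range(3):
                    for m in range(3):
                        for n in range(3):
                            C[i, j] += eps[i, k, l] * eps[j, m, n] * A[k, m] * B[l, n]
    return C

# Example check against the einsum one-liner from the answers below
A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
print(np.allclose(exterior_product_naive(A, B),
                  np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)))   # expected: True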
It looks like a purely sum-reduction based problem, without any requirement to keep axes aligned between the inputs. So, I would suggest a matrix-multiplication based solution for tensors using np.tensordot.
Thus, one solution could be implemented in three steps -
# Matrix-multiplication between first eps and A.
# Thus losing second axis from eps and first from A : k
parte1 = np.tensordot(eps,A,axes=((1),(0)))
# Matrix-multiplication between second eps and B.
# Thus losing third axis from eps and second from B : n
parte2 = np.tensordot(eps,B,axes=((2),(1)))
# Finally, we are left with two products : ilm & jml.
# We need to lose lm and ml from these inputs respectively to get ij.
# So, we need to lose the last two dims from the products, but flipped.
out = np.tensordot(parte1,parte2,axes=((1,2),(2,1)))
Runtime test
Approaches -
def einsum_based1(eps, A, B):  # @unutbu's soln1
    return np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)

def einsum_based2(eps, A, B):  # @unutbu's soln2
    return np.einsum('ilm,jml->ij',
                     np.einsum('ikl,km->ilm', eps, A),
                     np.einsum('jmn,ln->jml', eps, B))

def tensordot_based(eps, A, B):
    parte1 = np.tensordot(eps, A, axes=((1), (0)))
    parte2 = np.tensordot(eps, B, axes=((2), (1)))
    return np.tensordot(parte1, parte2, axes=((1, 2), (2, 1)))
Timings -
In [5]: # Setup inputs
...: N = 20
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:
In [6]: %timeit einsum_based1(eps, A, B)
1 loops, best of 3: 773 ms per loop
In [7]: %timeit einsum_based2(eps, A, B)
1000 loops, best of 3: 972 µs per loop
In [8]: %timeit tensordot_based(eps, A, B)
1000 loops, best of 3: 214 µs per loop
Bigger dataset -
In [12]: # Setup inputs
...: N = 100
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:
In [13]: %timeit einsum_based2(eps, A, B)
1 loops, best of 3: 856 ms per loop
In [14]: %timeit tensordot_based(eps, A, B)
10 loops, best of 3: 49.2 ms per loop
You could use einsum which implements Einstein summation notation:
C = np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)
or for better performance, apply einsum to two arrays at a time:
C = np.einsum('ilm,jml->ij',
              np.einsum('ikl,km->ilm', eps, A),
              np.einsum('jmn,ln->jml', eps, B))
np.einsum computes a sum of products.
The subscript specifier 'ikl,jmn,km,ln->ij' tells np.einsum that
the first eps has subscripts i,k,l,
the second eps has subscripts j,m,n,
A has subscripts k,m,
B has subscripts l,n,
the output array has subscripts i,j
Thus, the summation is over products of the form
eps(i,k,l) * eps(j,m,n) * A(k,m) * B(l,n)
All subscripts not in the output array are summed over.
I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).
What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.
I'm expecting my output to be an array with the shape N X M.
N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
Here are some things to note:
The numpy function correlate requires input arrays to be one-dimensional.
The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
The scipy.stats function pearsonr requires input arrays to be one-dimensional.
Correlation (default 'valid' case) between two 2D arrays:
You can simply use matrix-multiplication np.dot like so -
out = np.dot(arr_one,arr_two.T)
With the default "valid" mode, the correlation between each pairwise row combination (row1, row2) of the two input arrays corresponds to the matrix-multiplication result at position (row1, row2).
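As a small hedged check (not in the original answer), each entry of that product matches np.correlate in its default 'valid' mode for the corresponding row pair, assuming both arrays have the same number of columns -
import numpy as np

arr_one = np.random.rand(4, 10)
arr_two = np.random.rand(5, 10)

out = np.dot(arr_one, arr_two.T)

# np.correlate in 'valid' mode on equal-length 1D rows returns a single value
check = np.array([[np.correlate(r1, r2)[0] for r2 in arr_two] for r1 in arr_one])
print(np.allclose(out, check))   # expected: True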
Row-wise Correlation Coefficient calculation for two 2D arrays:
def corr2_coeff(A, B):
    # Row-wise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Finally get corr coeff
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB
Benchmarking
This section compares the runtime performance of the proposed approach against the generate_correlation_map and loopy pearsonr based approaches listed in the other answer (taken from the function test_generate_correlation_map(), minus the value-correctness verification code at its end). Please note that the timings for the proposed approach also include an initial check for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.
Case #1:
In [106]: A = np.random.rand(1000, 100)
In [107]: B = np.random.rand(1000, 100)
In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000, 100)
In [111]: B = np.random.rand(5000, 100)
In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000, 10)
In [115]: B = np.random.rand(10000, 10)
In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other loopy pearsonr based approach seemed too slow, but here are the runtimes for one small dataset size (an assumed definition of pearsonr_based is sketched just below) -
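pearsonr_based is not defined in the original post; it is assumed to be the nested scipy.stats.pearsonr loop from the test function further below, wrapped along these lines -
from scipy.stats import pearsonr
import numpy as np

def pearsonr_based(A, B):
    # Assumed loopy reference: one pearsonr call per row pair
    out = np.empty((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            out[i, j] = pearsonr(A[i], B[j])[0]
    return out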
In [118]: A = np.random.rand(1000, 100)
In [119]: B = np.random.rand(1000, 100)
In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop
@Divakar provides a great option for computing the unscaled correlation, which is what I originally asked for.
In order to calculate the correlation coefficient, a bit more is required:
import numpy as np

def generate_correlation_map(x, y):
    """Correlate each n with each m.

    Parameters
    ----------
    x : np.array
      Shape N X T.
    y : np.array
      Shape M X T.

    Returns
    -------
    np.array
      N X M array in which each element is a correlation coefficient.
    """
    mu_x = x.mean(1)
    mu_y = y.mean(1)
    n = x.shape[1]
    if n != y.shape[1]:
        raise ValueError('x and y must ' +
                         'have the same number of timepoints.')
    s_x = x.std(1, ddof=n - 1)
    s_y = y.std(1, ddof=n - 1)
    cov = np.dot(x,
                 y.T) - n * np.dot(mu_x[:, np.newaxis],
                                   mu_y[np.newaxis, :])
    return cov / np.dot(s_x[:, np.newaxis], s_y[np.newaxis, :])
Here's a test of this function, which passes:
from scipy.stats import pearsonr

def test_generate_correlation_map():
    x = np.random.rand(10, 10)
    y = np.random.rand(20, 10)
    desired = np.empty((10, 20))
    for n in range(x.shape[0]):
        for m in range(y.shape[0]):
            desired[n, m] = pearsonr(x[n, :], y[m, :])[0]
    actual = generate_correlation_map(x, y)
    np.testing.assert_array_almost_equal(actual, desired)
For those interested in computing the Pearson correlation coefficient between a 1D and 2D array, I wrote the following function, where x is a 1D array and y a 2D array.
def pearsonr_2D(x, y):
    """Computes the Pearson correlation coefficient,
    where x is a 1D and y a 2D array."""
    upper = np.sum((x - np.mean(x)) * (y - np.mean(y, axis=1)[:, None]), axis=1)
    lower = np.sqrt(np.sum(np.power(x - np.mean(x), 2)) *
                    np.sum(np.power(y - np.mean(y, axis=1)[:, None], 2), axis=1))
    rho = upper / lower
    return rho
Example run:
>>> x
Out[1]: array([1, 2, 3])
>>> y
Out[2]: array([[ 1, 2, 3],
[ 6, 7, 12],
[ 9, 3, 1]])
>>> pearsonr_2D(x, y)
Out[3]: array([ 1. , 0.93325653, -0.96076892])