Suppose I have two arrays. A has size n by d, and B has size t by d. Suppose I want to output an array C, where C[i, j] gives the cubed L3 norm between A[i] and B[j] (both of these have size d). i.e.
C[i, j] = |A[i, 0]-B[j, 0]|**3 + |A[i, 1]-B[j, 1]|**3 + ... + |A[i, d-1]-B[j, d-1]|**3
Can anyone point me to a really efficient way to do this? I have tried using a double for loop and some array operations, but the runtime is incredibly slow.
Edit: I know TensorFlow.norm works, but how could I implement this myself?
In accordance with @gph's answer, here is an explicit example of applying NumPy's np.linalg.norm, using broadcasting to avoid any loops:
import numpy as np
n, m, d = 200, 300, 3
A = np.random.random((n,d))
B = np.random.random((m,d))
C = np.linalg.norm(A[:,None,:] - B[None,...], ord=3, axis = -1)
# The 'raw' option would be:
C2 = (np.sum(np.abs(A[:,None,:] - B[None,...])**3, axis = -1))**(1/3)
Both ways seem to take around the same time to execute. (Note that both compute the L3 norm itself; the cubed value from the question is just that norm cubed, or the raw version without the final 1/3 power.) I've tried to use np.einsum, but let's just say my algebra skills have seen better days. Perhaps you have better luck (a nice resource is this wiki by Divakar).
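If SciPy is available (an assumption on my part, not something the question mentions), scipy.spatial.distance.cdist with the Minkowski metric computes the same pairwise L3 distances and is worth benchmarking:
from scipy.spatial.distance import cdist
C3 = cdist(A, B, 'minkowski', p=3)  # pairwise L3 norms, shape (n, m)
C3_cubed = C3**3                    # cube it if you want the summed |diff|**3 from the question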
There may be a few optimizations to speed this up, but the performance isn't going to be anywhere near that of specialized math packages. Those packages use BLAS and LAPACK to vectorize the operations. They also avoid a lot of type-checking overhead by enforcing types at the time you set a value. If performance is important, there really is no way you are going to do better than a numerical computing package like numpy.
Use a 3rd-party library written in C or create your own
numpy has a linalg module which should be able to compute your L3 norm for each A[i] - B[j]
If numpy works for you, take a look at numba's JIT, which can compile and cache some (numpy) code to run orders of magnitude faster (successive runs will take advantage of the cache). I believe you will need to write the parts to accelerate as functions that get called many times in order to benefit from it.
roughly
import numpy as np
from numpy import linalg as LA
from numba import jit as numbajit
@numbajit
def L3norm(A, B, i, j):
    # L3 norm of the difference between row i of A and row j of B
    return LA.norm(A[i] - B[j], ord=3)

# may be able to apply JIT here too
def operate(A, B, n, t):
    # generate array
    C = np.zeros(shape=(n, t))
    for i in range(n):
        for j in range(t):
            C[i, j] = L3norm(A, B, i, j)
    return C
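A rough usage example, assuming the decorated function compiles in your environment (the sizes below are placeholders):
n, t, d = 200, 300, 3
A = np.random.random((n, d))
B = np.random.random((t, d))
C = operate(A, B, n, t)  # first call includes JIT compilation; later calls reuse it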
Related
In my code (written in Python 2.7), I create two numpy arrays, A and B. I then use them to assemble a larger matrix, H, with the following code
H = np.block([[A, B], [-B, -A]])
Various computations follow, involving substantial amounts of numpy manipulations and for loops. As a result, I would like to use Numba to optimize the code. However, it appears that the numpy block function is unsupported in Numba. The matrices A and B are not terribly large, so I'm fine using a function that may not be as optimized as np.block, but I would still like to assemble H in a block matrix fashion for the purpose of readability. Are there any functions that would accomplish this?
Just to make @hpaulj's comment concrete, making some basic assumptions about your input data and not doing any sort of error checking:
import numpy as np
import numba as nb

@nb.njit
def nb_block(X):
    xtmp1 = np.concatenate(X[0], axis=1)
    xtmp2 = np.concatenate(X[1], axis=1)
    return np.concatenate((xtmp1, xtmp2), axis=0)
The following also works:
@nb.njit
def nb_block2(X):
    xtmp1 = np.hstack(X[0])
    xtmp2 = np.hstack(X[1])
    return np.vstack((xtmp1, xtmp2))
and the performance of the two differs with different sized arrays. You should benchmark your own application.
Then calling:
A = np.zeros((50,30))
B = np.ones((50,30))
X = np.block([[A, B], [-B, -A]])
Y = nb_block(((A, B), (-B, -A))) # Note the tuple-of-tuples vs list-of-lists
np.allclose(X, Y) # True
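Since which variant wins depends on the array sizes, a quick benchmark along these lines (just a sketch, reusing the arrays above) can settle it for your shapes:
import timeit
# warm up so JIT compilation time is excluded from the measurement
nb_block(((A, B), (-B, -A)))
nb_block2(((A, B), (-B, -A)))
print(timeit.timeit(lambda: nb_block(((A, B), (-B, -A))), number=1000))
print(timeit.timeit(lambda: nb_block2(((A, B), (-B, -A))), number=1000))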
Suppose I have two arrays A and B with dimensions (n1,m1,m2) and (n2,m1,m2), respectively. I want to compute the matrix C with dimensions (n1,n2) such that C[i,j] = sum((A[i,:,:] - B[j,:,:])^2). Here is what I have so far:
import numpy as np
A = np.array(range(1,13)).reshape(3,2,2)
B = np.array(range(1,9)).reshape(2,2,2)
C = np.zeros(shape=(A.shape[0], B.shape[0]) )
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        C[i,j] = np.sum(np.square(A[i,:,:] - B[j,:,:]))
C
What is the most efficient way to do this? In R I would use a vectorized approach, such as outer. Is there a similar method for Python?
Thanks.
You can use scipy's cdist, which is pretty efficient for such calculations after reshaping the input arrays to 2D, like so -
from scipy.spatial.distance import cdist
C = cdist(A.reshape(A.shape[0],-1),B.reshape(B.shape[0],-1),'sqeuclidean')
Now, the above approach is memory efficient and thus a better choice when working with large data sizes. For small input arrays, one can also use np.einsum and leverage NumPy broadcasting, like so -
diffs = A[:,None]-B
C = np.einsum('ijkl,ijkl->ij',diffs,diffs)
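Another option worth trying (my addition, not part of the original answer) is to expand (a-b)^2 = a^2 - 2ab + b^2 and push the cross term through a single matrix product, which avoids materializing the full difference array:
A2 = A.reshape(A.shape[0], -1)   # (n1, m1*m2)
B2 = B.reshape(B.shape[0], -1)   # (n2, m1*m2)
C = (A2**2).sum(1)[:, None] + (B2**2).sum(1) - 2 * A2.dot(B2.T)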
I have two 3D arrays and want to identify 2D elements in one array, which have one or more similar counterparts in the other array.
This works in Python 3:
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
a_index = np.zeros(A.shape[0])
for a in range(A.shape[0]):
    for b in range(B.shape[0]):
        if np.allclose(A[a,:,:].reshape(-1, A.shape[1]), B[b,:,:].reshape(-1, B.shape[1]),
                       rtol=1e-04, atol=1e-06):
            a_index[a] = 1
            break
np.nonzero(a_index)[0]
But of course this approach is awfully slow. Please tell me that there is a more efficient way (and what it is). Thanks.
You are trying to do an all-nearest-neighbor type query. This is something that has special O(n log n) algorithms, but I'm not aware of a Python implementation. However, you can use a regular nearest-neighbor query, which is also O(n log n), just a bit slower; for example scipy.spatial.KDTree or cKDTree.
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
import scipy.spatial
tree = scipy.spatial.cKDTree(A.reshape(25000, 4))
results = tree.query_ball_point(B.reshape(25000, 4), r=1e-04, p=1)
print([r for r in results if r != []])
# [[14252], [1972], [7108], [13369], [23171]]
query_ball_point() is not an exact equivalent to allclose() but it is close enough, especially if you don't care about the rtol parameter to allclose(). You also get a choice of metric (p=1 for city block, or p=2 for Euclidean).
P.S. Consider using query_ball_tree() for very large data sets. Both A and B have to be indexed in that case.
P.S. I'm not sure what effect the 2d-ness of the elements should have; the sample code I gave treats them as 1d and that is identical at least when using city block metric.
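For reference, the query_ball_tree() variant mentioned above would look roughly like this (a sketch, reusing the reshaped arrays; not benchmarked):
tree_A = scipy.spatial.cKDTree(A.reshape(25000, 4))
tree_B = scipy.spatial.cKDTree(B.reshape(25000, 4))
matches = tree_A.query_ball_tree(tree_B, r=1e-04, p=1)  # one list of B-indices per row of A
print([i for i, m in enumerate(matches) if m])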
From the docs of np.allclose, we have:
If the following equation is element-wise True, then allclose returns True.
absolute(a - b) <= (atol + rtol * absolute(b))
Using that criteria, we can have a vectorized implementation using broadcasting, customized for the stated problem, like so -
# Setup parameters
rtol,atol = 1e-04, 1e-06
# Use np.allclose criteria to detect true/false across all pairwise elements
mask = np.abs(A[:,None,] - B) <= (atol + rtol * np.abs(B))
# Use the problem context to get final output
out = np.nonzero(mask.all(axis=(2,3)).any(1))[0]
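With 25000 x 25000 pairs the intermediate mask gets large (roughly 2.5 GB of booleans), so if memory becomes a problem you could process A in chunks; a rough sketch:
out = []
chunk = 1000  # rows of A per step; tune for your memory budget
for s in range(0, A.shape[0], chunk):
    m = np.abs(A[s:s+chunk, None] - B) <= (atol + rtol * np.abs(B))
    out.extend(s + np.nonzero(m.all(axis=(2, 3)).any(1))[0])
out = np.array(out)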
I have to do many loops of the following type
for i in range(len(a)):
    for j in range(i+1):
        c[i] += a[j]*b[i-j]
where a and b are short arrays (of the same size, which is between about 10 and 50). This can be done efficiently using a convolution:
import numpy as np
np.convolve(a, b)
However, this gives me the full convolution (i.e. the vector is too long, compared to the for loop above). If I use the 'same' option in convolve, I get the central part, but what I want is the first part. Of course, I can chop off what I don't need from the full vector, but I would like to get rid of the unnecessary computation time if possible.
Can someone suggest a better vectorization of the loops?
You could write a small C extension in Cython:
# cython: boundscheck=False
cimport numpy as np
import numpy as np # zeros_like
ctypedef np.float64_t np_t
def convolve_cy_np(np.ndarray[np_t] a not None,
                   np.ndarray[np_t] b not None,
                   np.ndarray[np_t] c=None):
    if c is None:
        c = np.zeros_like(a)
    cdef Py_ssize_t i, j, n = c.shape[0]
    with nogil:
        for i in range(n):
            for j in range(i + 1):
                c[i] += a[j] * b[i - j]
    return c
It performs well for n=10..50 compared to np.convolve(a,b)[:len(a)] on my machine.
Also it seems like a job for numba.
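For completeness, a numba version of the same double loop might look like this (a sketch, assuming numba is installed; I have not benchmarked it against the Cython version above):
import numba
import numpy as np

@numba.njit
def convolve_nb(a, b):
    # same triangular sum as the pure-Python loop, compiled by numba
    c = np.zeros_like(a)
    for i in range(len(a)):
        for j in range(i + 1):
            c[i] += a[j] * b[i - j]
    return c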
There is no way to express this triangular sum with vectorized array manipulations in numpy beyond the convolution itself. Your best bet is to compute the full np.convolve(a, b) and trim off what you don't need, i.e. np.convolve(a, b)[:len(a)]. That's probably going to be around 10x faster than the double loop in pure Python that you have above. You could also roll your own with Cython if you are really concerned about the speed, but it likely won't be much, if any, faster than np.convolve().
Can anyone direct me to the section of the NumPy manual where I can find functions to accomplish root mean square calculations?
(I know this can be accomplished using np.mean and np.abs; isn't there a built-in? If not, why? Just curious, no offense.)
Can anyone also explain the complications of matrices vs. arrays (just in the following case)?
U is a matrix (T-by-N, or you could say T cross N), and Ue is another matrix (T-by-N).
I define k as a numpy array (U[ind,:] is still a matrix) in the following fashion:
k = np.array(U[ind,:])
When I print k or type k in ipython, it displays the following:
k = array([[2, .3, ....., 9]])
You see the double square brackets (which make it multi-dimensional, I guess), which give it the shape (1, N).
But I can't assign it to an array defined in this way:
l = np.zeros(N)
whose shape is (N,).
l[:] = k[:]
error:
matrix dimensions incompatible
Is there a way to accomplish the vector assignment I intend to do? Please don't tell me to do l = k (that defeats the purpose; I get different errors in the program, and I know the reasons. If needed I can attach the piece of code).
Writing a loop is the dumb way, which I'm using for the time being.
I hope I was able to explain the problems I'm facing.
Regards.
For the RMS, I think this is the clearest:
from numpy import mean, sqrt, square, arange
a = arange(10) # For example
rms = sqrt(mean(square(a)))
The code reads like you say it: "root-mean-square".
For rms, the fastest expression I have found for small x.size (~ 1024) and real x is:
def rms(x):
    return np.sqrt(x.dot(x)/x.size)
This seems to be around twice as fast as the linalg.norm version (ipython %timeit on a really old laptop).
If you want complex arrays handled more appropriately then this also would work:
def rms(x):
    return np.sqrt(np.vdot(x, x)/x.size)
However, this version is nearly as slow as the norm version and only works for flat arrays.
For the RMS, how about
norm(V)/sqrt(V.size)
I don't know why it's not built in. I like
def rms(x, axis=None):
    return sqrt(mean(x**2, axis=axis))
If you have nans in your data, you can do
def nanrms(x, axis=None):
    return sqrt(nanmean(x**2, axis=axis))
Try this:
U = np.zeros((N,N))
ind = 1
k = np.zeros(N)
k[:] = U[ind,:]
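If U is actually an np.matrix (which always keeps two dimensions, hence the (1, N) row), another option is to convert the row to a flat ndarray before assigning; this is my suggestion rather than part of the answer above:
k = np.asarray(U[ind, :]).ravel()  # shape (N,), whether U is a matrix or an ndarray
l = np.zeros(N)
l[:] = k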
I use this for RMS, all in NumPy; it also takes an optional axis argument, similar to other NumPy functions:
import numpy as np
rms = lambda V, axis=None: np.sqrt(np.mean(np.square(V), axis))
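For example, applied to a 2-D array (shapes here are just for illustration):
V = np.random.random((4, 1000))
print(rms(V))          # one scalar over all elements
print(rms(V, axis=1))  # one RMS per row, shape (4,)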
If you have complex vectors and are using PyTorch, the vector norm is the fastest approach on both CPU and GPU:
import torch
batch_size, length = 512, 4096
batch = torch.randn(batch_size, length, dtype=torch.complex64)
scale = 1 / torch.sqrt(torch.tensor(float(length)))
rms_power = batch.norm(p=2, dim=-1, keepdim=True)  # L2 norm per vector; RMS = norm / sqrt(length)
batch_rms = batch / (rms_power * scale)  # each vector divided by its RMS value
Using a batched vdot like goodboy's approach is 60% slower than the above. Using a naïve method similar to deprecated's approach is 85% slower than the above.