Einsum formula for repeating dimensions - python

I have this piece of code:
other = np.random.rand(m, n, o)
prev = np.random.rand(m, n, o, m, n, o)
mu = np.zeros((m, n, o, m, n, o))
for c in range(m):
    for i in range(n):
        for j in range(o):
            mu[c,i,j,c,i,j] = other[c,i,j]*prev[c,i,j,c,i,j]
I'd like to simplify it using einsum notation (and possibly save time by skipping the Python for loops). However, after a few tries I'm still not sure how to approach the problem. My current attempt is:
np.einsum('cijklm,cij->cijklm', prev, other)
It does not achieve the same result as the for-loop version.

With shapes (2,3,4), I get:
In [52]: mu.shape
Out[52]: (2, 3, 4, 2, 3, 4)
This einsum expression complains that dimensions are repeated in the output:
In [53]: np.einsum('cijcij,cij->cijcij', prev, other).shape
Traceback (most recent call last):
File "<ipython-input-53-92862a0865a2>", line 1, in <module>
np.einsum('cijcij,cij->cijcij', prev, other).shape
File "<__array_function__ internals>", line 180, in einsum
File "/usr/local/lib/python3.8/dist-packages/numpy/core/einsumfunc.py", line 1359, in einsum
return c_einsum(*operands, **kwargs)
ValueError: einstein sum subscripts string includes output subscript 'c' multiple times
Without the repeat:
In [55]: x=np.einsum('cijcij,cij->cij', prev, other)
In [56]: x.shape
Out[56]: (2, 3, 4)
Nonzero values match:
In [57]: np.allclose(mu[np.nonzero(mu)].ravel(), x.ravel())
Out[57]: True
Or by extracting the diagonals from mu:
In [59]: I,J,K = np.ix_(np.arange(2),np.arange(3),np.arange(4))
In [60]: mu[I,J,K,I,J,K].shape
Out[60]: (2, 3, 4)
In [61]: np.allclose(mu[I,J,K,I,J,K],x)
Out[61]: True
Your einsum satisfies the same 'diagonals' test:
In [68]: y=np.einsum('cijklm,cij->cijklm', prev, other)
In [69]: y.shape
Out[69]: (2, 3, 4, 2, 3, 4)
In [70]: np.allclose(y[I,J,K,I,J,K],x)
Out[70]: True
So the mu values are also present in y, but distributed in a different way. But the arrays are too big to readily view and compare.
OK, each y[i,j,k] is the same, and equal to x. In mu most of these values are 0, with only selected diagonals being nonzero.
While einsum can generate the same nonzero values, it cannot distribute them along the 3d diagonals the way your loop does.
Changing your mu calculation to produce a 3d array:
In [76]: nu = np.zeros((m,n,o))
    ...: for c in range(m):
    ...:     for i in range(n):
    ...:         for j in range(o):
    ...:             nu[c,i,j] = other[c,i,j]*prev[c,i,j,c,i,j]
    ...:
In [77]: np.allclose(nu,x)
Out[77]: True
edit
We can assign einsum result to the diagonals with:
In [134]: out = np.zeros((2,3,4,2,3,4))
In [135]: out[I,J,K,I,J,K] = x
In [136]: np.allclose(out, mu)
Out[136]: True
Conceptually it may be simpler than the as_strided solution, and it may be just as fast: as_strided, while it does make a view, is not as fast as a reshape kind of view.
In [143]: %%timeit
     ...: out = np.zeros((m, n, o, m, n, o))
     ...: mu_view = np.lib.stride_tricks.as_strided(out,
     ...:     shape=(m, n, o),
     ...:     strides=[sum(mu.strides[i::3]) for i in range(3)])
     ...: np.einsum('cijcij,cij->cij', prev, other, out=mu_view)
     ...:
31.6 µs ± 69.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [144]: %%timeit
     ...: out = np.zeros((2,3,4,2,3,4))
     ...: out[I,J,K,I,J,K] = np.einsum('cijcij,cij->cij', prev, other)
     ...:
18.5 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
or including the I,J,K generation in the time loop
In [146]: %%timeit
     ...: I,J,K = np.ix_(np.arange(2),np.arange(3),np.arange(4))
     ...: out = np.zeros((2,3,4,2,3,4))
     ...: out[I,J,K,I,J,K] = np.einsum('cijcij,cij->cij', prev, other)
40.4 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
or creating the IJK directly:
In [151]: %%timeit
     ...: I,J,K = np.arange(2)[:,None,None],np.arange(3)[:,None],np.arange(4)
     ...: out = np.zeros((2,3,4,2,3,4))
     ...: out[I,J,K,I,J,K] = np.einsum('cijcij,cij->cij', prev, other)
25.1 µs ± 38.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

It is not possible to get this result using np.einsum() alone, but you can try this:
import numpy as np
from numpy.lib.stride_tricks import as_strided
m, n, o = 2, 3, 5
np.random.seed(0)
other = np.random.rand(m, n, o)
prev = np.random.rand(m, n, o, m, n, o)
mu = np.zeros((m, n, o, m, n, o))
mu_view = as_strided(mu,
                     shape=(m, n, o),
                     strides=[sum(mu.strides[i::3]) for i in range(3)]
                     )
np.einsum('cijcij,cij->cij', prev, other, out=mu_view)
The array mu should then be the same as the one produced by the nested loops in the question.
Some explanation. Regardless of the shape of a numpy array, internally its elements are stored in a contiguous block of memory. Part of an array's structure is its strides, which specify how many bytes one needs to jump when one of the indices of an array element is incremented by 1. Thus, in a 2-dimensional array arr, arr.strides[0] is the number of bytes separating an element arr[i, j] from arr[i+1, j] and arr.strides[1] is the number of bytes separating arr[i, j] from arr[i, j+1]. Using the strides information numpy can find a given element in an array based on its indices. See e.g. this post for more details.
numpy.lib.stride_tricks.as_strided is a function that creates a view of a given array with custom-made strides. By specifying strides, one can change which array element corresponds to which indices. In the code above this is used to create mu_view, which is a view of mu with the property that the element mu_view[c, i, j] is the element mu[c, i, j, c, i, j]. This is done by specifying the strides of mu_view in terms of the strides of mu. For example, the distance between mu_view[c, i, j] and mu_view[c+1, i, j] is set to be the distance between mu[c, i, j, c, i, j] and mu[c+1, i, j, c+1, i, j], which is mu.strides[0] + mu.strides[3].
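For intuition, here is a small self-contained check (a sketch assuming the same m, n, o = 2, 3, 5 and a fresh mu array, as in the snippet above) that the custom strides really do map mu_view[c, i, j] onto mu[c, i, j, c, i, j]:
import numpy as np
from numpy.lib.stride_tricks import as_strided

m, n, o = 2, 3, 5
mu = np.zeros((m, n, o, m, n, o))

# stride along axis 0 of the view = bytes between mu[c,i,j,c,i,j] and mu[c+1,i,j,c+1,i,j],
# i.e. mu.strides[0] + mu.strides[3]; same idea for the other two axes
view_strides = [mu.strides[i] + mu.strides[i + 3] for i in range(3)]
assert view_strides == [sum(mu.strides[i::3]) for i in range(3)]  # matches the expression above

mu_view = as_strided(mu, shape=(m, n, o), strides=view_strides)
mu_view[1, 2, 3] = 7.0
print(mu[1, 2, 3, 1, 2, 3])   # 7.0 -- writing through the view hits the generalized diagonal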

Related

Calculating Kernel matrix using numpy methods

I have data of shape d x N (each column is a feature vector).
I have this code for calculating the kernel matrix:
def kernel(x1, x2):
    return x1.T @ x2

data = np.array([[1,2,3], [1,2,3], [1,2,3]])
result = []
for i in range(data.shape[1]):
    current_result = []
    for j in range(data.shape[1]):
        x1 = data[:, i]
        x2 = data[:, j]
        current_result.append(kernel(x1, x2))
    result.append(current_result)
np.array(result)
and I am getting this result:
array([[ 3,  6,  9],
       [ 6, 12, 18],
       [ 9, 18, 27]])
The problem is that this code is too slow, so I tried to use np.vectorize:
vec = np.vectorize(kernel, signature='(n),(n)->()')
vec(data, data)
But I am getting the wrong result:
array([14, 14, 14])
what am I doing wrong?
When tested with bigger dimensions of your problem, for instance (100, 200), and with random numbers to ensure robustness, there are several ways:
import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

def kernel_kenny(a):
    result = []
    for i in range(a.shape[1]):
        current_result = []
        for j in range(a.shape[1]):
            x1 = a[:, i]
            x2 = a[:, j]
            current_result.append(kernel(x1, x2))
        result.append(current_result)
    return np.array(result)

a = np.random.random((100,200))
res1 = kernel_kenny(a)

# perhaps einsum signature might help you to understand the calculations
res2 = np.einsum('ji,jk->ik', a, a, optimize=True)
# or the following if you want to explicitly specify the transpose
# res2 = np.einsum('ij,jk->ik', a.T, a, optimize=True)

# or simply ...
res3 = a.T @ a
Here are the sanity checks:
np.allclose(res1,res2)
>>> True
np.allclose(res1,res3)
>>> True
and timings:
%timeit kernel_kenny(a)
>>> 83.2 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ji,jk->ik', a, a, optimize=True)
>>> 325 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a.T @ a
>>> 82 µs ± 9.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
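As a side note on the np.vectorize attempt in the question: with signature='(n),(n)->()', calling vec(data, data) pairs the rows of data elementwise, so it returns the dot product of each row with itself (hence [14, 14, 14]) rather than all pairwise products. A hedged sketch of how that same vectorized kernel could be broadcast over all column pairs (still much slower than the einsum/@ versions, shown only to explain the discrepancy):
import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

data = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
vec = np.vectorize(kernel, signature='(n),(n)->()')

# The columns of data are the feature vectors, so broadcast them against each other:
cols = data.T                                     # shape (N, d)
gram = vec(cols[:, None, :], cols[None, :, :])    # loops over all (N, N) pairs
print(gram)
# [[ 3  6  9]
#  [ 6 12 18]
#  [ 9 18 27]]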

Multiply array of n 3x3 rotation matrices with 3d array of 3-vectors

I have a 3d array of position vectors p [np.shape(p) yields (Nx, Ny, Nz, 3)] and an array Rn of n rotation matrices [np.shape(R) yields (n, 3, 3)].
I am trying to get an array PR of shape (n, Nx, Ny, Nz, 3) where the i-th (0 < i < n) entry at dimension 0 is the 3d array of position vectors p rotated by the 3x3 rotation matrix at index i of array Rn.
theta = np.arange(0, 2*np.pi, np.pi/50)
phi = np.arange(0, np.pi, np.pi/100)
a = np.arange(100)
b = np.arange(50)
p = np.array(np.meshgrid(a, b, a, indexing="xy"))
p = np.moveaxis(p, 1, 2)
p = np.moveaxis(p, 0, 3)
# np.shape(p) => (100,50,100,3)
Rn = np.array([np.array([np.cos(theta)*np.cos(phi), np.cos(theta)*np.sin(phi), -np.sin(theta)]),
               np.array([-np.sin(phi), np.cos(phi), np.zeros(np.shape(phi))]),
               np.array([np.cos(phi)*np.sin(theta), np.sin(theta)*np.sin(phi), np.cos(theta)])])
Rn = np.moveaxis(Rn , 1, 2)
Rn = np.moveaxis(Rn , 0, 1)
# np.shape(Rn) => (100, 3, 3)
So far I have attempted the following, unsuccessfully.
PR= np.matmul(Rn, p)
What is the most efficient way to perform this operation? I know how to perform this using For loops, but in the interest of efficiency I have been trying to keep things vectorized within numpy.
Two possible solutions are -
np.einsum("ijkl,nal->nijka", p, Rn, optimize=True)
td = np.moveaxis(np.tensordot(p, Rn, axes=((-1), (-1))), 3, 0)
I will also compare these solutions with other answers in this thread.
p = np.random.rand(10, 20, 30, 3)
Rn = np.random.rand(100, 3, 3)
es = np.einsum("ijkl,nal->nijka", p, Rn, optimize=True)
td = np.moveaxis(np.tensordot(p, Rn, axes=((-1), (-1))), 3, 0)
d = np.squeeze(np.moveaxis(np.dot(Rn, p[..., None]), 1, -2), -1)
out = ((Rn @ p.reshape(-1,3).T)
       .reshape(Rn.shape[0],3,-1)
       .swapaxes(1,2)
       .reshape(-1, *p.shape)
      )
print(np.allclose(es, out))
print(np.allclose(td, out))
print(np.allclose(d, out))
All give True.
If you try benchmarking their performance,
%timeit np.einsum("ijkl,nal->nijka", p, Rn, optimize=True)
%timeit np.moveaxis(np.tensordot(p, Rn, axes=((-1), (-1))), 3, 0)
%timeit ((Rn @ p.reshape(-1,3).T).reshape(Rn.shape[0],3,-1).swapaxes(1,2).reshape(-1, *p.shape))
%timeit np.moveaxis(np.squeeze(np.dot(Rn, p[..., None]), -1), 1, -1)
Gives,
3.91 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.15 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.45 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.1 ms ± 98.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
For an array of the given size on my system.
einsum and tensordot seem to have comparable performance, while the @ solution seems the fastest. The dot solution seems unreasonably slow though; I am not sure why, since I would have imagined it uses @ under the hood.
Let's try:
out = ((Rn @ p.reshape(-1,3).T)
       .reshape(Rn.shape[0],3,-1)
       .swapaxes(1,2)
       .reshape(-1, *p.shape)
      )
You don't need to do any fancy packaging since np.dot already takes care of the product of dimensions (unlike np.matmul, which broadcasts the leading dimensions together).
There are two additional steps:
You need to add a trailing dimension to p to make it the product of 3x3 by 3x1 matrices.
The result will have shape (n, 3, Nx, Ny, Nz, 1) because of the product. You will want to move the second dimension to the second to last and squeeze out the last one:
np.moveaxis(np.squeeze(np.dot(Rn, p[..., None]), -1), 1, -1)
OR
np.squeeze(np.moveaxis(np.dot(Rn, p[..., None]), 1, -2), -1)
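A quick shape walk-through of that np.dot route (reusing the random test shapes from the comparison above, so this is just an illustrative sketch) may make the squeeze/moveaxis steps clearer:
import numpy as np

Rn = np.random.rand(100, 3, 3)        # n rotation matrices
p = np.random.rand(10, 20, 30, 3)     # grid of 3-vectors

step1 = np.dot(Rn, p[..., None])      # dot contracts Rn's last axis with p's second-to-last
print(step1.shape)                    # (100, 3, 10, 20, 30, 1)

step2 = np.squeeze(step1, -1)         # drop the trailing length-1 axis
step3 = np.moveaxis(step2, 1, -1)     # move the vector-component axis to the end
print(step3.shape)                    # (100, 10, 20, 30, 3)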

scipy pdist getting only two closest neighbors

I've been computing pairwise distances with scipy, and I am trying to get distances to two of the closest neighbors. My current working solution is:
dists = squareform(pdist(xs.todense()))
dists = np.sort(dists, axis=1)[:, 1:3]
However, the squareform method is spatially very expensive and somewhat redundant in my case. I only need the two closest distances, not all of them. Is there a simple workaround?
Thanks!
The relation between the linear index and the (i, j) indices of the upper-triangle distance matrix is not directly, or easily, invertible (see note 2 in the squareform doc).
However, by looping over all indices the inverse relation can be obtained:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
def inverse_condensed_indices(idx, n):
    k = 0
    for i in range(n):
        for j in range(i+1, n):
            if k == idx:
                return (i, j)
            k += 1
    else:
        return None
# test
points = np.random.rand(8, 2)
distances = pdist(points)
sorted_idx = np.argsort(distances)
n = points.shape[0]
ij = [inverse_condensed_indices(idx, n)
      for idx in sorted_idx[:2]]
# graph
plt.figure(figsize=(5, 5))
for i, j in ij:
    x = [points[i, 0], points[j, 0]]
    y = [points[i, 1], points[j, 1]]
    plt.plot(x, y, '-', color='red');
plt.plot(points[:, 0], points[:, 1], '.', color='black');
plt.xlim(0, 1); plt.ylim(0, 1);
It seems to be a little faster than using squareform:
%timeit squareform(range(28))
# 9.23 µs ± 63 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit inverse_condensed_indices(27, 8)
# 2.38 µs ± 25 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
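If the Python-level search in inverse_condensed_indices ever becomes the bottleneck, the condensed vector returned by pdist lists pairs in the same order as np.triu_indices(n, k=1), so the lookup can be vectorized. A small sketch reusing the points/distances/sorted_idx names from above:
import numpy as np
from scipy.spatial.distance import pdist

points = np.random.rand(8, 2)
distances = pdist(points)
sorted_idx = np.argsort(distances)
n = points.shape[0]

# (i, j) for every condensed entry, in the same order pdist uses
rows, cols = np.triu_indices(n, k=1)
ij = list(zip(rows[sorted_idx[:2]], cols[sorted_idx[:2]]))   # the two closest pairs
print(ij)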

Find largest row in a matrix with numpy (row with highest length)

I have a massive array with rows and columns. Some rows are larger than others. I need to get the maximum row length, that is, the length of the longest row. I wrote a simple function for this, but I wanted it to be as fast as possible, like numpy fast. Currently, it looks like this:
Example array:
values = [
[1,2,3],
[4,5,6,7,8,9],
[10,11,12,13]
]
def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width
Is there any way to accomplish this with numpy?
In [261]: values = [
...: [1,2,3],
...: [4,5,6,7,8,9],
...: [10,11,12,13]
...: ]
...:
In [262]:
In [262]: values
Out[262]: [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10, 11, 12, 13]]
In [263]: def values_max_width(values):
     ...:     max_width = 1
     ...:     for row in values:
     ...:         if len(row) > max_width:
     ...:             max_width = len(row)
     ...:     return max_width
     ...:
In [264]: values_max_width(values)
Out[264]: 6
In [265]: [len(v) for v in values]
Out[265]: [3, 6, 4]
In [266]: max([len(v) for v in values])
Out[266]: 6
In [267]: np.max([len(v) for v in values])
Out[267]: 6
Your loop and the list comprehension are similar in speed, np.max is much slower - it has to first turn the list into an array.
In [268]: timeit max([len(v) for v in values])
656 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [269]: timeit np.max([len(v) for v in values])
13.9 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [271]: timeit values_max_width(values)
555 ns ± 13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
If you are starting with a list, it's a good idea to thoroughly test the list implementation. numpy is fast when it is doing compiled array stuff, but creating an array from a list is time consuming.
Making an array directly from values isn't much help. The result is an object dtype array:
In [272]: arr = np.array(values)
In [273]: arr
Out[273]:
array([list([1, 2, 3]), list([4, 5, 6, 7, 8, 9]), list([10, 11, 12, 13])],
dtype=object)
Math on such an array is hit-or-miss, and always slower than math on pure numeric arrays. We can iterate on such an array, but that iteration is slower than on a list.
In [275]: values_max_width(arr)
Out[275]: 6
In [276]: timeit values_max_width(arr)
1.3 µs ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Not sure how you can make it faster. I've tried using np.max over the length of each item, but that will take even longer:
import numpy as np
import time
values = []
for k in range(100000):
    values.append(list(np.random.randint(100, size=np.random.randint(1000))))
def timeit(func):
    def wrapper(*args, **kwargs):
        now = time.time()
        retval = func(*args, **kwargs)
        print('{} took {:.5f}s'.format(func.__name__, time.time() - now))
        return retval
    return wrapper
@timeit
def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width
@timeit
def value_max_width_len(values):
    return np.max([len(l) for l in values])
values_max_width(values)
value_max_width_len(values)
values_max_width took 0.00598s
value_max_width_len took 0.00994s
Edit
As @Mstaino suggested, using map does make this code faster:
@timeit
def value_max_width_len(values):
    return max(map(len, values))
values_max_width took 0.00598s
value_max_width_len took 0.00499s

How to convert the output of meshgrid to the corresponding array of points?

I want to create a list of points that would correspond to a grid. So if I want to create a grid of the region from (0, 0) to (1, 1), it would contain the points (0, 0), (0, 1), (1, 0) and (1, 1).
I know that this can be done with the following code:
g = np.meshgrid([0,1],[0,1])
np.append(g[0].reshape(-1,1),g[1].reshape(-1,1),axis=1)
Yielding the result:
array([[0, 0],
[1, 0],
[0, 1],
[1, 1]])
My question is twofold:
Is there a better way of doing this?
Is there a way of generalizing this to higher dimensions?
I just noticed that the documentation in numpy provides an even faster way to do this:
X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
This can easily be generalized to more dimensions using the linked meshgrid2 function and mapping 'ravel' to the resulting grid.
g = meshgrid2(x, y, z)
positions = np.vstack(map(np.ravel, g))
The result is about 35 times faster than the zip method for a 3D array with 1000 ticks on each axis.
Source: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde
To compare the two methods consider the following sections of code:
Create the proverbial tick marks that will help to create the grid.
In [23]: import numpy as np
In [34]: from numpy import asarray
In [35]: x = np.random.rand(100,1)
In [36]: y = np.random.rand(100,1)
In [37]: z = np.random.rand(100,1)
Define the function that mgilson linked to for the meshgrid:
In [38]: def meshgrid2(*arrs):
   ....:     arrs = tuple(reversed(arrs))
   ....:     lens = map(len, arrs)
   ....:     dim = len(arrs)
   ....:     sz = 1
   ....:     for s in lens:
   ....:         sz *= s
   ....:     ans = []
   ....:     for i, arr in enumerate(arrs):
   ....:         slc = [1]*dim
   ....:         slc[i] = lens[i]
   ....:         arr2 = asarray(arr).reshape(slc)
   ....:         for j, sz in enumerate(lens):
   ....:             if j != i:
   ....:                 arr2 = arr2.repeat(sz, axis=j)
   ....:         ans.append(arr2)
   ....:     return tuple(ans)
Create the grid and time the two functions.
In [39]: g = meshgrid2(x, y, z)
In [40]: %timeit pos = np.vstack(map(np.ravel, g)).T
100 loops, best of 3: 7.26 ms per loop
In [41]: %timeit zip(*(x.flat for x in g))
1 loops, best of 3: 264 ms per loop
Are your gridpoints always integral? If so, you could use numpy.ndindex
print list(np.ndindex(2,2))
Higher dimensions:
print list(np.ndindex(2,2,2))
Unfortunately, this does not meet the requirements of the OP since the integral assumption (starting with 0) is not met. I'll leave this answer only in case someone else is looking for the same thing where those assumptions are true.
Another way to do this relies on zip:
g = np.meshgrid([0,1],[0,1])
zip(*(x.flat for x in g))
This portion scales nicely to arbitrary dimensions. Unfortunately, np.meshgrid doesn't scale well to multiple dimensions, so that part will need to be worked out, or (assuming it works), you could use this SO answer to create your own ndmeshgrid function.
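As a side note, recent numpy versions accept more than two 1-D arrays in np.meshgrid, so an N-dimensional variant of this zip-style idea (sketched here with made-up tick arrays) could look like:
import numpy as np

ticks = [np.array([0, 1]), np.array([0, 1]), np.array([0, 1])]   # example 1-D tick arrays
g = np.meshgrid(*ticks, indexing='ij')                           # one coordinate grid per axis
points = np.stack([c.ravel() for c in g], axis=-1)               # shape (n_points, n_dims)
print(points.shape)   # (8, 3)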
Yet another way to do it is:
np.indices((2,2)).T.reshape(-1,2)
Which can be generalized to higher dimensions, e.g.:
In [60]: np.indices((2,2,2)).T.reshape(-1,3)
Out[60]:
array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[1, 1, 0],
[0, 0, 1],
[1, 0, 1],
[0, 1, 1],
[1, 1, 1]])
To get the coordinates of a grid from 0 to 1, a reshape can do the work. Here are examples for 2D and 3D. Also works with floats.
grid_2D = np.mgrid[0:2:1, 0:2:1]
points_2D = grid_2D.reshape(2, -1).T
grid_3D = np.mgrid[0:2:1, 0:2:1, 0:2:1]
points_3D = grid_3D.reshape(3, -1).T
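For instance, a sketch of the same pattern with a float step (assuming a 0.5 spacing) would be:
import numpy as np

grid = np.mgrid[0:1.5:0.5, 0:1.5:0.5]   # float steps behave like np.arange
points = grid.reshape(2, -1).T          # shape (9, 2): (0, 0), (0, 0.5), ..., (1, 1)
print(points)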
A simple example in 3D (can be extended to N-dimensions I guess, but beware of the final dimension and RAM usage):
import numpy as np
ndim = 3
xmin = 0.
ymin = 0.
zmin = 0.
length_x = 1000.
length_y = 1000.
length_z = 50.
step_x = 1.
step_y = 1.
step_z = 1.
x = np.arange(xmin, length_x, step_x)
y = np.arange(ymin, length_y, step_y)
z = np.arange(zmin, length_z, step_z)
%timeit xyz = np.array(np.meshgrid(x, y, z)).T.reshape(-1, ndim)
in: 2.76 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which yields:
In [2]: xyz
Out[2]:
array([[  0.,   0.,   0.],
       [  0.,   1.,   0.],
       [  0.,   2.,   0.],
       ...,
       [999., 997.,  49.],
       [999., 998.,  49.],
       [999., 999.,  49.]])
In [4]: xyz.shape
Out[4]: (50000000, 3)
Python 3.6.9
Numpy: 1.19.5
I am using the following to convert the meshgrid output to an M x 2 array. Also, changing the list of vectors to iterators can make it really fast.
import numpy as np
# Without iterators
x_vecs = [np.linspace(0,1,1000), np.linspace(0,1,1000)]
%timeit np.reshape(np.meshgrid(*x_vecs),(2,-1)).T
6.85 ms ± 93.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# With iterators
x_vecs = iter([np.linspace(0,1,1000), np.linspace(0,1,1000)])
%timeit np.reshape(np.meshgrid(*x_vecs),(2,-1)).T
5.78 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For an N-D array, using a generator:
vec_dim = 3
res = 100
# Without iterators
x_vecs = [np.linspace(0,1,res) for i in range(vec_dim)]
>>> %timeit np.reshape(np.meshgrid(*x_vecs),(vec_dim,-1)).T
11 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# With iterators
x_vecs = (np.linspace(0,1,res) for i in range(vec_dim))
>>> %timeit np.reshape(np.meshgrid(*x_vecs),(vec_dim,-1)).T
5.54 µs ± 32.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
