How to replace the diagonal of a numpy array with another array - python

I have an array
import numpy as np
X = np.array([[0.7513, 0.6991, 0.5472, 0.2575],
              [0.2551, 0.8909, 0.1386, 0.8407],
              [0.5060, 0.9593, 0.1493, 0.2543],
              [0.5060, 0.9593, 0.1493, 0.2543]])
y = np.array([[1,2,3,4]])
How can I replace the diagonal of X with y? I could write a loop, but is there a faster way?

A fast and reliable method is np.einsum:
>>> diag_view = np.einsum('ii->i', X)
This creates a view of the diagonal:
>>> diag_view
array([0.7513, 0.8909, 0.1493, 0.2543])
This view is writable:
>>> diag_view[None] = y
>>> X
array([[1.    , 0.6991, 0.5472, 0.2575],
       [0.2551, 2.    , 0.1386, 0.8407],
       [0.506 , 0.9593, 3.    , 0.2543],
       [0.506 , 0.9593, 0.1493, 4.    ]])
This works for contiguous and non-contiguous arrays and is very fast:
contiguous:
loop 21.146424998732982
diag_indices 2.595232878000388
einsum 1.0271988900003635
flatten 1.5372659160002513
non contiguous:
loop 20.133818001340842
diag_indices 2.618005960001028
einsum 1.0305795049989683
Traceback (most recent call last): <- flatten does not work here
...
How does it work? Under the hood, einsum does an advanced version of @Julien's trick: it adds up the strides of arr:
>>> arr.strides
(3200, 16)
>>> np.einsum('ii->i', arr).strides
(3216,)
One can convince oneself that this will always work as long as arr is laid out with fixed strides, which is always the case for numpy arrays.
While this use of einsum is pretty neat, it is also almost impossible to discover if one doesn't already know about it. So spread the word!
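To see the same stride arithmetic done explicitly, here is a minimal sketch (an illustration of the idea only, not einsum's actual implementation) that builds an identical diagonal view with np.lib.stride_tricks.as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided

# non-contiguous array with strides (3200, 16), as above
arr = np.zeros((200, 200))[::2, ::2]
# one step of the summed strides moves one row down and one column right,
# i.e. along the diagonal
diag = as_strided(arr, shape=(min(arr.shape),), strides=(sum(arr.strides),))
diag[...] = 1.0  # writes through to arr's diagonal
assert (np.einsum('ii->i', arr) == 1.0).all()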
Code to recreate the timings and the crash:
import numpy as np
n = 100
arr = np.zeros((n, n))
replace = np.ones(n)

def loop():
    for i in range(len(arr)):
        arr[i, i] = replace[i]

def other():
    l = len(arr)
    arr.shape = -1
    arr[::l+1] = replace
    arr.shape = l, l

def di():
    arr[np.diag_indices(arr.shape[0])] = replace

def es():
    np.einsum('ii->i', arr)[...] = replace

from timeit import timeit
print('\ncontiguous:')
print('loop         ', timeit(loop, number=1000)*1000)
print('diag_indices ', timeit(di))
print('einsum       ', timeit(es))
print('flatten      ', timeit(other))

arr = np.zeros((2*n, 2*n))[::2, ::2]
print('\nnon contiguous:')
print('loop         ', timeit(loop, number=1000)*1000)
print('diag_indices ', timeit(di))
print('einsum       ', timeit(es))
print('flatten      ', timeit(other))  # raises: assigning to .shape fails on a non-contiguous array

This should be pretty fast (especially for bigger arrays; for your example it's about twice as slow):
arr = np.zeros((4,4))
replace = [1,2,3,4]
l = len(arr)
arr.shape = -1
arr[::l+1] = replace
arr.shape = l,l
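Why the step l+1 hits the diagonal: in the flattened array, element (i, i) sits at flat index i*l + i = i*(l+1). A minimal check:
import numpy as np
l = 4
# flat index of each diagonal element (i, i) of an (l, l) array
flat_idx = [np.ravel_multi_index((i, i), (l, l)) for i in range(l)]
print(flat_idx)                  # [0, 5, 10, 15]
print(list(range(0, l*l, l+1)))  # the same indices, via the slice step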
Test on bigger array:
n = 100
arr = np.zeros((n,n))
replace = np.ones(n)
def loop():
    for i in range(len(arr)):
        arr[i, i] = replace[i]

def other():
    l = len(arr)
    arr.shape = -1
    arr[::l+1] = replace
    arr.shape = l, l
%timeit(loop())
%timeit(other())
14.7 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.55 µs ± 24.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Use diag_indices for a vectorized solution:
X[np.diag_indices(X.shape[0])] = y
array([[1.    , 0.6991, 0.5472, 0.2575],
       [0.2551, 2.    , 0.1386, 0.8407],
       [0.506 , 0.9593, 3.    , 0.2543],
       [0.506 , 0.9593, 0.1493, 4.    ]])
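For reference, np.diag_indices(n) simply returns a tuple of row and column index arrays that together address the main diagonal:
>>> np.diag_indices(4)
(array([0, 1, 2, 3]), array([0, 1, 2, 3]))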


How to vectorize a 2 level loop in NumPy

Based on the comments, I have revised the example:
Consider the following code
import numpy as np

def subspace_angle(A, B):
    M = A.T @ B
    s = np.linalg.svd(M, compute_uv=False)
    return s[0]

def principal_angles(bases):
    k = bases.shape[0]
    r = np.zeros((k, k))
    for i in range(k):
        x = bases[i]
        r[i, i] = subspace_angle(x, x)
        for j in range(i):
            y = bases[j]
            r[i, j] = subspace_angle(x, y)
            r[j, i] = r[i, j]
    r = np.minimum(1, r)
    return np.rad2deg(np.arccos(r))
Following is an example use:
bases = []
# number of subspaces
k = 5
# ambient dimension
n = 8
# subspace dimension
m = 4
for i in range(k):
    X = np.random.randn(n, m)
    Q, R = np.linalg.qr(X)
    bases.append(Q)
# combine the orthonormal bases for all the subspaces
bases = np.array(bases)
# compute the smallest principal angles between each pair of subspaces
print(np.round(principal_angles(bases), 2))
Is there a way to avoid the two-level for loops in the principal_angles function, so that the code could be sped up?
As a result of this code, the matrix r is symmetric. Since subspace_angle could be compute-heavy depending on the array size, it is important to avoid computing it twice for r[i,j] and r[j,i].
On the comment about JIT: I am actually writing the code with Google/JAX. The two-level loop does get JIT-compiled, giving performance benefits, but the JIT compilation time is quite high (possibly due to the two levels of for loops). I am wondering if there is a better way to write this code so that it compiles faster.
I started to copy your code into an ipython session, getting a (5,8,4) shaped bases. But then I realized that func is undefined. So by commenting that out, I get:
In [6]: def principal_angles(bases):
   ...:     k = bases.shape[0]
   ...:     r = np.zeros((k, k))
   ...:     for i in range(k):
   ...:         x = bases[i]
   ...:         # r[i, i] = func(x, x)
   ...:         for j in range(i):
   ...:             y = bases[j]
   ...:             r[i, j] = subspace_angle(x, y)
   ...:             # r[j, i] = r[i, j]
   ...:     return r
   ...:     # r = np.minimum(1, r)
   ...:     # return np.rad2deg(np.arccos(r))
   ...:
In [7]: r=principal_angles(bases)
In [8]: r.shape
Out[8]: (5, 5)
Since both matmul and svd can work with higher dimensions, i.e. batches, I wonder if it's possible to call subspace_angle with all bases, rather than iteratively.
We have to think carefully about what shapes we pass it, and how they evolve.
def subspace_angle(A, B):
    M = A.T @ B
    s = np.linalg.svd(M, compute_uv=False)
    return s[0]
(Oops, my OS just crashed the terminal, so I'll have to get back to this later.)
So A and B are (8,4), A.T is (4,8), and A.T @ B is (4,4).
If they were (5,8,4), A.transpose(0,2,1) would be (5,4,8), and M would be (5,4,4).
I believe np.linalg.svd accepts that M, returning a (5,4) array of singular values (one set per batch element).
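A quick shape check of that idea (a sketch, assuming a (5,8,4) bases as above):
import numpy as np
bases = np.random.randn(5, 8, 4)
M = bases.transpose(0, 2, 1) @ bases    # (5,4,8) @ (5,8,4) -> (5,4,4)
s = np.linalg.svd(M, compute_uv=False)  # svd is applied per batch element
print(M.shape, s.shape)                 # (5, 4, 4) (5, 4)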
In [29]: r=principal_angles(bases)
In [30]: r
Out[30]:
array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.99902153, 0.        , 0.        , 0.        , 0.        ],
       [0.99734371, 0.95318936, 0.        , 0.        , 0.        ],
       [0.99894054, 0.99790422, 0.87577343, 0.        , 0.        ],
       [0.99840093, 0.92809283, 0.99896121, 0.98286429, 0.        ]])
Let's try that with the whole bases. Use broadcasting to get the 'outer' product on the first dimension:
In [31]: M = bases[:,None,:,:].transpose(0,1,3,2) @ bases
In [32]: r1=np.linalg.svd(M, compute_uv=False)
In [33]: M.shape
Out[33]: (5, 5, 4, 4)
In [34]: r1.shape
Out[34]: (5, 5, 4)
To match your s[0] I have to use (need to review the svd docs):
In [35]: r1[:,:,0]
Out[35]:
array([[1.        , 0.99902153, 0.99734371, 0.99894054, 0.99840093],
       [0.99902153, 1.        , 0.95318936, 0.99790422, 0.92809283],
       [0.99734371, 0.95318936, 1.        , 0.87577343, 0.99896121],
       [0.99894054, 0.99790422, 0.87577343, 1.        , 0.98286429],
       [0.99840093, 0.92809283, 0.99896121, 0.98286429, 1.        ]])
The time savings aren't massive here, but should grow when the first dimension is larger than 5:
In [36]: timeit r=principal_angles(bases)
320 µs ± 554 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [37]: %%timeit
...: M = bases[:,None,:,:].transpose(0,1,3,2) @ bases
...: r1=np.linalg.svd(M, compute_uv=False)[:,:,0]
...:
...:
190 µs ± 450 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
This may be enough to get you started with a more refined "vectorization".
After some more thinking and experimenting with the np.triu_indices function, I have come up with the following solution, which avoids extra unnecessary computation.
def vectorized_principal_angles(subspaces):
    # number of subspaces
    n = subspaces.shape[0]
    # indices for the upper triangular matrix
    i, j = np.triu_indices(n, k=1)
    # prepare all the possible pairs of A and B
    A = subspaces[i]
    B = subspaces[j]
    # compute the Hermitian transpose of each matrix in the A array
    AH = np.conjugate(np.transpose(A, axes=(0, 2, 1)))
    # compute M = A^H B for each matrix pair
    M = np.matmul(AH, B)
    # compute the SVD for each matrix in M
    s = np.linalg.svd(M, compute_uv=False)
    # keep only the first singular value for each M
    s = s[:, 0]
    # prepare the result matrix
    # it is known in advance that the diagonal elements will be 1
    r = 0.5 * np.eye(n)
    r[i, j] = s
    # symmetrize the matrix
    r = r + r.T
    # final result
    return r
Here is what is going on:
np.triu_indices(n, k=1) gives the indices of the n(n-1)/2 possible pairs of distinct matrices.
All the remaining computation is limited to those n(n-1)/2 pairs only.
Finally, the array of scalar values is put back into a square symmetric result matrix.
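As a sanity check (a sketch, assuming the bases array and both functions defined above; the loop version additionally applies minimum/arccos, which we undo here):
# convert the loop version's degrees back to cosines for comparison
r_loop = np.cos(np.deg2rad(principal_angles(bases)))
r_vec = np.minimum(1, vectorized_principal_angles(bases))
assert np.allclose(r_loop, r_vec)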
Thank you @hpaulj for your solution. It helped me a lot in finding the right direction.

Trouble vectorizing a function

For a project, I need to generate samples from a function, and I would like to generate those samples as quickly as possible.
I have this example (in the final version, the lambda function will be provided in the arguments). The goal is to generate the ys of n points, the xs being linearly spaced between start and stop, using the lambda function.
def get_ys(coefficients, num_outputs=20, start=0., stop=1.):
    function = lambda x, args: args[0]*(x-args[1])**2 + args[2]*(x-args[3]) + args[4]
    xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
    ys = [function(x, coefficients) for x in xs]
    return ys
%%time
for i in range(1000):
    xs = np.random.random(5)
    ys = get_ys(xs)
Wall time: 622 ms
I am trying to vectorize it, and found numpy.apply_along_axis:
%%time
n = 1000
xs = np.random.random((n,5))
ys = np.apply_along_axis(get_ys, 1, xs)
Wall time: 616 ms
Unfortunately it is still pretty slow :/
I am not so familiar with function vectorization; can someone guide me a little on how to improve the speed of this script?
Thanks!
Edit:
example of input/output:
xs = np.ones(5)
ys = get_ys(xs)
[1.0, 0.9501385041551247, 0.9058171745152355, 0.8670360110803323, 0.8337950138504155, 0.8060941828254848, 0.7839335180055402, 0.7673130193905817, 0.7562326869806094, 0.7506925207756232, 0.7506925207756232, 0.7562326869806094, 0.7673130193905817, 0.7839335180055401, 0.8060941828254847, 0.8337950138504155, 0.8670360110803323, 0.9058171745152354, 0.9501385041551246, 1.0]
def get_ys(coefficients, num_outputs=20, start=0., stop=1.):
    function = lambda x, args: args[0]*(x-args[1])**2 + args[2]*(x-args[3]) + args[4]
    xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
    ys = [function(x, coefficients) for x in xs]
    return ys
You are trying to get around calling get_ys 1000 times, once for each row of xs.
What would it take to pass xs as a whole to get_ys? In other words, what if coefficients were (n,5) instead of (5,)?
xs would still be (20,), and the ys would have the same shape (right)?
The lambda is written to expect a scalar x and (5,) args. Can it be changed to work with a (20,) x and (n,5) args?
As a first step, what does function produce if given all of xs at once? That is, instead of
ys = [function(x, coefficients) for x in xs]
try
ys = function(xs, coefficients)
As written, your code iterates (at slow Python speeds) over the n (1000) rows and over the 20 linspace points, so function is called 20,000 times. That's what makes your code slow.
Let's try that change.
A sample run with your function:
In [126]: np.array(get_ys(np.arange(5)))
Out[126]:
array([-2. , -1.89473684, -1.78947368, -1.68421053, -1.57894737,
-1.47368421, -1.36842105, -1.26315789, -1.15789474, -1.05263158,
-0.94736842, -0.84210526, -0.73684211, -0.63157895, -0.52631579,
-0.42105263, -0.31578947, -0.21052632, -0.10526316, 0. ])
Replace the list comprehension with just one call to function:
In [127]: def get_ys1(coefficients, num_outputs=20, start=0., stop=1.):
     ...:     function = lambda x, args: args[0]*(x-args[1])**2 + args[2]*(x-args[3]) + args[4]
     ...:
     ...:     xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
     ...:     ys = function(xs, coefficients)
     ...:     return ys
     ...:
     ...:
Same values:
In [128]: get_ys1(np.arange(5))
Out[128]:
array([-2. , -1.89473684, -1.78947368, -1.68421053, -1.57894737,
-1.47368421, -1.36842105, -1.26315789, -1.15789474, -1.05263158,
-0.94736842, -0.84210526, -0.73684211, -0.63157895, -0.52631579,
-0.42105263, -0.31578947, -0.21052632, -0.10526316, 0. ])
Comparative timings:
In [129]: timeit np.array(get_ys(np.arange(5)))
345 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [130]: timeit get_ys1(np.arange(5))
89.2 µs ± 162 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
That's what we mean by "vectorization": replacing Python-level iteration (a list comprehension) with an equivalent that makes fuller use of numpy array methods.
I suspect we can move on to work with a (n,5) coefficients, but this should be enough to get you started.
fully vectorized
By broadcasting the (n,5) against (20,) I can get a function that does not have any python loops:
def get_ys2(coefficients, num_outputs=20, start=0., stop=1.):
    function = lambda x, args: args[:,0]*(x-args[:,1])**2 + args[:,2]*(x-args[:,3]) + args[:,4]
    xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
    ys = function(xs[:,None], coefficients)
    return ys.T
And with a (1,5) input:
In [156]: get_ys2(np.arange(5)[None,:])
Out[156]:
array([[-2. , -1.89473684, -1.78947368, -1.68421053, -1.57894737,
-1.47368421, -1.36842105, -1.26315789, -1.15789474, -1.05263158,
-0.94736842, -0.84210526, -0.73684211, -0.63157895, -0.52631579,
-0.42105263, -0.31578947, -0.21052632, -0.10526316, 0. ]])
With your test case:
In [146]: n = 1000
...: xs = np.random.random((n,5))
...: ys = np.apply_along_axis(get_ys, 1, xs)
In [147]: ys.shape
Out[147]: (1000, 20)
Two timings:
In [148]: timeit ys = np.apply_along_axis(get_ys, 1, xs)
...:
106 ms ± 303 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [149]: timeit ys = np.apply_along_axis(get_ys1, 1, xs)
...:
88 ms ± 98.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
And testing this:
In [150]: ys2 = get_ys2(xs)
In [151]: ys2.shape
Out[151]: (1000, 20)
In [152]: np.allclose(ys, ys2)
Out[152]: True
In [153]: timeit ys2 = get_ys2(xs)
424 µs ± 484 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
It matches the values and improves the speed a lot.
In the new function, args can now be (n,5). And if x is (20,1), the result is (20,n), which I transpose on the return.
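To make the shape bookkeeping concrete, a small sketch of how the broadcast unfolds (the names here are just for illustration):
import numpy as np
args = np.random.random((1000, 5))  # (n, 5) coefficients
xs = np.linspace(0., 1., 20)        # (20,)
x = xs[:, None]                     # (20, 1)
# each args[:, k] is (n,); (20, 1) against (n,) broadcasts to (20, n)
ys = args[:,0]*(x - args[:,1])**2 + args[:,2]*(x - args[:,3]) + args[:,4]
print(ys.shape)                     # (20, 1000); get_ys2 returns ys.T -> (1000, 20)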

Python: Sum of all permutations of outer products of numpy arrays of arrays

I have a numpy array of arrays Ai, and I want each outer product (np.outer(Ai[i], Ai[j])) to be summed with a scaling multiplier to produce H. I can step through and build the outer products, then tensordot them with a matrix of scaling factors, but I think this could be significantly simplified; I haven't figured out a general/efficient way to do it for ND. How can Arr2D and H be produced more easily? Note: Arr2D could be 64 2D arrays rather than 8x8 2D arrays.
Ai = np.random.random((8, 101))
Arr2D = np.zeros((Ai.shape[0], Ai.shape[0], Ai.shape[1], Ai.shape[1]))
Arr2D[:,:,:,:] = np.asarray([np.outer(Ai[i], Ai[j])
                             for i in range(Ai.shape[0])
                             for j in range(Ai.shape[0])]).reshape(
                                 Ai.shape[0], Ai.shape[0], Ai[0].size, Ai[0].size)
arr = np.random.random(Ai.shape[0] * Ai.shape[0])
arr2D = arr.reshape(Ai.shape[0], Ai.shape[0])
H = np.tensordot(Arr2D, arr2D, axes=([0,1],[0,1]))
Good setup to leverage einsum!
np.einsum('ij,kl,ik->jl',Ai,Ai,arr2D,optimize=True)
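To convince yourself the subscripts are right: H[j,l] = sum over i,k of Ai[i,j] * Ai[k,l] * arr2D[i,k], which is exactly 'ij,kl,ik->jl'. A quick check against the original construction:
import numpy as np
Ai = np.random.random((8, 101))
arr2D = np.random.random((8, 8))
Arr2D = np.asarray([np.outer(Ai[i], Ai[j]) for i in range(8)
                    for j in range(8)]).reshape(8, 8, 101, 101)
H_ref = np.tensordot(Arr2D, arr2D, axes=([0, 1], [0, 1]))
H_es = np.einsum('ij,kl,ik->jl', Ai, Ai, arr2D, optimize=True)
assert np.allclose(H_ref, H_es)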
Timings -
In [71]: # Setup inputs
...: Ai = np.random.random((8,101))
...: arr = np.random.random( (Ai.shape[0] * Ai.shape[0]) )
...: arr2D = arr.reshape(Ai.shape[0], Ai.shape[0])
In [74]: %%timeit  # Original soln
    ...: Arr2D = np.zeros((Ai.shape[0], Ai.shape[0], Ai.shape[1], Ai.shape[1]))
    ...: Arr2D[:,:,:,:] = np.asarray([np.outer(Ai[i], Ai[j])
    ...:                              for i in range(Ai.shape[0])
    ...:                              for j in range(Ai.shape[0])]).reshape(
    ...:                                  Ai.shape[0], Ai.shape[0], Ai[0].size, Ai[0].size)
    ...: H = np.tensordot(Arr2D, arr2D, axes=([0,1],[0,1]))
100 loops, best of 3: 4.5 ms per loop
In [75]: %timeit np.einsum('ij,kl,ik->jl',Ai,Ai,arr2D,optimize=True)
10000 loops, best of 3: 146 µs per loop
30x+ speedup there!

Replace looping-over-axes with broadcasting, pt 2

Earlier I asked a similar question where the answer used np.dot, taking advantage of the fact that a dot product involves a sum of products. (To my understanding.)
Now I have a similar issue where I don't think dot will apply, because in place of a sum I want to take an element-wise diagonal. If it does, I haven't been able to apply it correctly.
Given a matrix x and array err:
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
My current implementation with loop is:
res = np.array([np.sqrt(np.diagonal(x * err[i])) for i in range(err.shape[0])])
print(res)
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
which takes the diagonal of x * err[i] for each i. Could this be vectorized? In other words, can the output of x * err be 3-dimensional, with np.diagonal then yielding a 2D array with one row per diagonal?
Program:
import numpy as np
x = np.matrix([[ 0.02984406, -0.00257266],
               [-0.00257266,  0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
diag = np.diagonal(x)
ans = np.sqrt(diag*err[:,np.newaxis])  # sqrt of the outer product
print(ans)

# use the out keyword to avoid allocating a new array every time
ans = np.empty(x.shape, dtype=x.dtype)
for i in range(100):
    ans = np.multiply(diag, err[:,np.newaxis], out=ans)
    ans = np.sqrt(ans, out=ans)
Result:
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
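Why the outer-product form works: for a scalar e, diagonal(x * e) equals diagonal(x) * e, so the loop over err collapses into a single broadcasted multiply. A quick check using the inputs above:
loop = np.array([np.sqrt(np.diagonal(x * e)) for e in err])
vec = np.sqrt(np.diagonal(x) * err[:, np.newaxis])
assert np.allclose(loop, vec)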
Here's an approach that uses ndarray.flat to get a view into the diagonal of x (with .A1 returning it as a flat ndarray rather than a matrix), then broadcasts the element-wise multiplication, like so -
np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Sample run -
In [108]: x = np.matrix([[ 0.02984406, -0.00257266],
...: [-0.00257266, 0.00320312]])
...:
...: err = np.array([ 7.6363226 , 13.16548267])
...:
In [109]: np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Out[109]:
array([[ 0.47738755, 0.15639712],
[ 0.62682649, 0.20535487]])
Runtime test to see how the view helps over np.diagonal, which creates a copy -
In [104]: x = np.matrix(np.random.rand(5000,5000))
In [105]: err = np.random.rand(5000)
In [106]: %timeit np.diagonal(x)*err[:,np.newaxis]
10 loops, best of 3: 66.8 ms per loop
In [107]: %timeit x.flat[::x.shape[1]+1].A1 * err[:,None]
10 loops, best of 3: 37.7 ms per loop

Constructing a 3D cube of points from a list

I have a list pts containing N points (Python floats). I wish to construct a NumPy array of dimension N*N*N*3 such that the array is equivalent to:
for i in xrange(0, N):
    for j in xrange(0, N):
        for k in xrange(0, N):
            arr[i,j,k,0] = pts[i]
            arr[i,j,k,1] = pts[j]
            arr[i,j,k,2] = pts[k]
I am wondering how I can exploit the array broadcasting rules of NumPy and functions such as tile to simplify this.
I think that the following should work:
pts = np.array(pts) #Skip if pts is a numpy array already
lp = len(pts)
arr = np.zeros((lp,lp,lp,3))
arr[:,:,:,0] = pts[:,None,None] #None is the same as np.newaxis
arr[:,:,:,1] = pts[None,:,None]
arr[:,:,:,2] = pts[None,None,:]
A quick test:
import numpy as np
import timeit
def meth1(pts):
    pts = np.array(pts)  # skip if pts is a numpy array already
    lp = len(pts)
    arr = np.zeros((lp,lp,lp,3))
    arr[:,:,:,0] = pts[:,None,None]  # None is the same as np.newaxis
    arr[:,:,:,1] = pts[None,:,None]
    arr[:,:,:,2] = pts[None,None,:]
    return arr

def meth2(pts):
    lp = len(pts)
    N = lp
    arr = np.zeros((lp,lp,lp,3))
    for i in xrange(0, N):
        for j in xrange(0, N):
            for k in xrange(0, N):
                arr[i,j,k,0] = pts[i]
                arr[i,j,k,1] = pts[j]
                arr[i,j,k,2] = pts[k]
    return arr

pts = range(10)
a1 = meth1(pts)
a2 = meth2(pts)
print np.all(a1 == a2)

NREPEAT = 10000
print timeit.timeit('meth1(pts)', 'from __main__ import meth1,pts', number=NREPEAT)
print timeit.timeit('meth2(pts)', 'from __main__ import meth2,pts', number=NREPEAT)
results in:
True
0.873255968094 #my way
11.4249279499 #original
So this new method is an order of magnitude faster as well.
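For reference, on newer NumPy versions (np.stack requires numpy >= 1.10, and this sketch uses Python 3 syntax) the same cube can be built in one line, equivalent to meth1:
import numpy as np
pts = np.linspace(0., 1., 10)
# meshgrid with indexing='ij' gives three (10,10,10) grids; stacking them on a
# new last axis yields arr[i,j,k] == (pts[i], pts[j], pts[k])
arr = np.stack(np.meshgrid(pts, pts, pts, indexing='ij'), axis=-1)
print(arr.shape)  # (10, 10, 10, 3)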
import numpy as np
N = 10
pts = xrange(0,N)
l = [ [ [ [ pts[i],pts[j],pts[k] ] for k in xrange(0,N) ] for j in xrange(0,N) ] for i in xrange(0,N) ]
x = np.array(l, np.int32)
print x.shape # (10,10,10,3)
This can be done in two lines:
def meth3(pts):
    arrs = np.broadcast_arrays(*np.ix_(pts, pts, pts))
    return np.concatenate([a[...,None] for a in arrs], axis=3)
However, this method is not as fast as mgilson's answer, because concatenate is annoyingly slow. A generalized version of his answer performs roughly as well, though, and can generate the result you want (i.e. an n-dimensional cartesian product contained within an n-dimensional grid) for any set of arrays.
def meth4(arrs):  # or meth4(*arrs) for a simplified interface
    arr = np.empty([len(a) for a in arrs] + [len(arrs)])
    for i, a in enumerate(np.ix_(*arrs)):
        arr[...,i] = a
    return arr
This accepts any sequence of sequences, as long as it can be converted into a sequence of numpy arrays:
>>> meth4([[0, 1], [2, 3]])
array([[[ 0.,  2.],
        [ 0.,  3.]],

       [[ 1.,  2.],
        [ 1.,  3.]]])
And the cost of this generality isn't too high -- it's only twice as slow for small pts arrays:
>>> (meth4([pts, pts, pts]) == meth1(pts)).all()
True
>>> %timeit meth4([pts, pts, pts])
10000 loops, best of 3: 27.4 us per loop
>>> %timeit meth1(pts)
100000 loops, best of 3: 13.1 us per loop
And it's actually a bit faster for larger ones (although the speed gain is probably due to my use of empty instead of zeros):
>>> pts = np.linspace(0, 1, 100)
>>> %timeit meth4([pts, pts, pts])
100 loops, best of 3: 13.4 ms per loop
>>> %timeit meth1(pts)
100 loops, best of 3: 16.7 ms per loop
