I want to mask a numpy array a with mask. The mask doesn't have exactly the same shape as a, but it is possible to mask a anyway, presumably because the extra trailing dimension of a has size 1 and broadcasting kicks in.
a.shape
>>> (3, 9, 31, 2, 1)
mask.shape
>>> (3, 9, 31, 2)
masked_a = ma.masked_array(a, mask)
The same logic, however, does not apply to array b, which has 5 elements in its last dimension.
ext_mask = mask[..., np.newaxis] # extending or not extending has same effect
ext_mask.shape
>>> (3, 9, 31, 2, 1)
b.shape
>>> (3, 9, 31, 2, 5)
masked_b = ma.masked_array(b, ext_mask)
>>> numpy.ma.core.MaskError: Mask and data not compatible: data size is 8370, mask size is 1674.
How can I create a (3, 9, 31, 2, 5) mask from a (3, 9, 31, 2) mask by expanding any True value in the last dimension of the (3, 9, 31, 2) mask to [True, True, True, True, True] (and False respectively)?
This gives the desired result:
masked_b = ma.masked_array(*np.broadcast_arrays(b, ext_mask))
I have not profiled this method, but it should be faster than allocating a new mask. According to the documentation, no data is copied:
These arrays are views on the original arrays. They are typically not
contiguous. Furthermore, more than one element of a broadcasted array
may refer to a single memory location. If you need to write to the
arrays, make copies first.
It is possible to verify the no-copying behavior:
bb, mb = np.broadcast_arrays(b, ext_mask)
print(mb.shape) # (3, 9, 31, 2, 5) - same shape as b
print(mb.base.shape) # (3, 9, 31, 2) - the shape of the original mask
print(mb.strides) # (558, 62, 2, 1, 0) - that's how it works: 0 stride
Pretty impressive how the numpy developers implemented broadcasting: values are repeated by using a stride of 0 along the last dimension. Wow!
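As an aside (my addition, not part of the original answer), the same zero-stride view can be obtained more directly with np.broadcast_to:
mb_view = np.broadcast_to(ext_mask, b.shape)   # read-only view, no data copied
print(mb_view.strides)                         # last stride is 0, same trick as above
masked_b = ma.masked_array(b, mb_view)         # fine as long as the mask is not modified later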
Edit
I compared the speed of broadcasting and allocating with this code:
import numpy as np
from numpy import ma
a = np.random.randn(30, 90, 31, 2, 1)
b = np.random.randn(30, 90, 31, 2, 5)
mask = np.random.randn(30, 90, 31, 2) > 0
ext_mask = mask[..., np.newaxis]
def broadcasting(a=a, b=b, ext_mask=ext_mask):
    mb1 = ma.masked_array(*np.broadcast_arrays(b, ext_mask))

def allocating(a=a, b=b, ext_mask=ext_mask):
    m2 = np.empty(b.shape, dtype=bool)
    m2[:] = ext_mask
    mb2 = ma.masked_array(b, m2)
Broadcasting is clearly faster than allocating, here:
# array size: (30, 90, 31, 2, 5)
In [23]: %timeit broadcasting()
The slowest run took 10.39 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.4 µs per loop
In [24]: %timeit allocating()
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 982 µs per loop
Note that I had to increase array size for the difference in speed to become apparent. With the original array dimensions allocating was slightly faster than broadcasting:
# array size: (3, 9, 31, 2, 5)
In [28]: %timeit broadcasting()
The slowest run took 9.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39 µs per loop
In [29]: %timeit allocating()
The slowest run took 9.22 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.6 µs per loop
The broadcasting solution's runtime seems not to depend on array size.
I am having some problems implementing the following equation in a performant way using Python:
beta and gamma are cartesian coordinates {x,y} and b, m are index values which can be quite big (n=10000). I have a working version of the code, shown below for the simple case of l=2 and m,b = 4 (l and m always have the same length). I checked the code using timeit, and the bottleneck is the element-wise multiplication with an array of size (3,3) and the reshaping of the resulting array into shape (3m,3m).
Does anybody have an idea how to increase the performance? (I also noticed that my current version suffers quite a big overhead for large values of l...)
import numpy as np
g_l3 = np.array([[1, 4, 5],[2, 6, 7]])                    # shape (l, 3)
g_l33 = g_l3.reshape(-1, 3, 1) * g_l3.reshape(-1, 1, 3)   # (l, 3, 3): outer products over the coordinates
A_lm = np.arange(1, 9, 1).reshape(2, 4)                   # (l, m)
B_lb = np.arange(7, 15, 1).reshape(2, 4)                  # (l, b)
AB_lmb = A_lm.reshape(-1, 4, 1) * B_lb.reshape(-1, 1, 4)  # (l, m, b): outer products over the indices
D_lmb33 = np.sum(g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4, 4, 1, 1), axis=0)  # (m, b, 3, 3)
D = np.concatenate(np.concatenate(D_lmb33, axis=2), axis=0)  # (12, 12) block matrix
In [387]: %%timeit
...: g_l3 = np.array([[1, 4, 5],[2, 6, 7]])
...
...: D_lmb33 = np.sum(g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4,
...: 4, 1, 1), axis=0)
...: D = np.concatenate(np.concatenate(D_lmb33, axis=2), axis=0)
...:
...:
70.7 µs ± 226 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Examining the pieces, and rewriting the reshape with newaxis, which is visually clearer to me - though basically the same speed:
In [388]: g_l3.shape
Out[388]: (2, 3)
In [389]: g_l33.shape
Out[389]: (2, 3, 3)
In [390]: np.allclose(g_l33, g_l3[:,:,None]*g_l3[:,None,:])
Out[390]: True
In [391]: AB_lmb.shape
Out[391]: (2, 4, 4)
In [392]: np.allclose(AB_lmb, A_lm[:,:,None]*B_lb[:,None,:])
Out[392]: True
So these are the common outer products on the last dimension of 2d arrays.
And another outer product:
In [393]: temp=g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4, 4, 1, 1)
In [394]: temp.shape
Out[394]: (2, 4, 4, 3, 3)
In [396]: np.allclose(temp, g_l33[:,None,None,:,:] * AB_lmb[:, :,:, None,None])
Out[396]: True
These probably could be combined into one expression, but that's not necessary.
D_lmb33 sums on the leading dimension:
In [405]: D_lmb33.shape
Out[405]: (4, 4, 3, 3)
The double concatenate can also be done with a transpose and reshape:
In [406]: np.allclose(D_lmb33.transpose(1,2,0,3).reshape(12,12),D)
Out[406]: True
Overall your code appears to make efficient use of numpy. For a large leading dimension that (N,4,4,3,3) intermediate array could be large, and take time. But within numpy itself there isn't an alternative; I don't think the algebra allows us to do the sum earlier. Whether numba or numexpr could help is another question.
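For what it's worth (my addition, not profiled), the whole chain can probably be collapsed into a single einsum call; for the small example above it reproduces D:
D_mb33 = np.einsum('li,lj,lm,lb->mbij', g_l3, g_l3, A_lm, B_lb)   # sum over l in one step
D_alt = D_mb33.transpose(1, 2, 0, 3).reshape(12, 12)              # same block layout as D
print(np.allclose(D_alt, D))                                      # True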
Is there any quicker way to physically transpose a large 2D numpy matrix than array.transpose.copy()? And are there any routines for doing it with efficient memory use?
It may be worth looking at what transpose does, just so we are clear about what you mean by 'physically transposing'.
Start with a small (4,3) array:
In [51]: arr = np.array([[1,2,3],[10,11,12],[22,23,24],[30,32,34]])
In [52]: arr
Out[52]:
array([[ 1, 2, 3],
[10, 11, 12],
[22, 23, 24],
[30, 32, 34]])
This is stored with a 1d data buffer, which we can display with ravel:
In [53]: arr.ravel()
Out[53]: array([ 1, 2, 3, 10, 11, 12, 22, 23, 24, 30, 32, 34])
and strides which tell it to step columns by 8 bytes, and rows by 24 (3*8):
In [54]: arr.strides
Out[54]: (24, 8)
We can also ravel in 'F' order - that traverses the array column by column (stepping down the rows first):
In [55]: arr.ravel(order='F')
Out[55]: array([ 1, 10, 22, 30, 2, 11, 23, 32, 3, 12, 24, 34])
While [53] is a view, [55] is a copy.
Now the transpose:
In [57]: arrt=arr.T
In [58]: arrt
Out[58]:
array([[ 1, 10, 22, 30],
[ 2, 11, 23, 32],
[ 3, 12, 24, 34]])
This is a view; we can traverse the [53] data buffer, going down the rows with 8-byte steps. Doing calculations with arrt is basically just as fast as with arr. With the strided iteration, order 'F' is just as fast as order 'C'.
In [59]: arrt.strides
Out[59]: (8, 24)
An 'F' ravel of the transpose recovers the original order:
In [60]: arrt.ravel(order='F')
Out[60]: array([ 1, 2, 3, 10, 11, 12, 22, 23, 24, 30, 32, 34])
But doing a 'C' ravel creates a copy, same as [55]:
In [61]: arrt.ravel(order='C')
Out[61]: array([ 1, 10, 22, 30, 2, 11, 23, 32, 3, 12, 24, 34])
Making a copy of the transpose produces an array holding the transposed data in 'C' order. This is your 'physical transpose':
In [62]: arrc = arrt.copy()
In [63]: arrc.strides
Out[63]: (32, 8)
Raveling a transpose in 'C' order as done in [61] does make a copy, but usually we don't need to make the copy explicitly. I think the only reason to do so is to avoid several redundant copies in later calculations.
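As an aside (my addition, not the original answer's), the usual idiom for forcing such a physical, C-contiguous transpose is np.ascontiguousarray:
arrc2 = np.ascontiguousarray(arr.T)   # equivalent to arr.T.copy(); guarantees C order
print(arrc2.flags['C_CONTIGUOUS'])    # True
print(arrc2.strides)                  # (32, 8), same as arrc above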
I assume that you need to do a row-wise operation that uses the CPU cache more efficiently if rows are contiguous in memory, and you don't have enough memory available to make a copy.
Wikipedia has an article on in-place matrix transposition. It turns out that such a transposition is nontrivial. Here is a follow-the-cycles algorithm as described there:
import numpy as np
from numba import njit

@njit  # comment this line for debugging
def transpose_in_place(a):
    """In-place matrix transposition for a rectangular matrix.
    https://stackoverflow.com/a/62507342/6228891
    Parameter:
    - a: 2D array. Unless it's a square matrix, it will be scrambled
      in the process.
    Return:
    - transposed array, using the same in-memory data storage as the
      input array.
    This algorithm is typically 10x slower than a.T.copy().
    Only use it if you are short on memory.
    """
    if a.shape == (1, 1):
        return a  # special case
    n, m = a.shape

    # find max length L of permutation cycle by starting at a[0,1].
    # k is the index in the flat buffer; i, j are the indices in a.
    L = 0
    k = 1
    while True:
        j = k % m
        i = k // m
        k = n*j + i
        L += 1
        if k == 1:
            break

    permut = np.zeros(L, dtype=np.int32)

    # Now do the permutations, one cycle at a time
    seen = np.full(n*m, False)
    aflat = a.reshape(-1)  # flat view
    for k0 in range(1, n*m-1):
        if seen[k0]:
            continue
        # construct cycle
        k = k0
        permut[0] = k0
        q = 1  # size of permutation array
        while True:
            seen[k] = True
            # note that this is slightly faster than the formula
            # on Wikipedia, k = n*k % (n*m-1)
            i = k // m
            j = k - i*m
            k = n*j + i
            if k == k0:
                break
            permut[q] = k
            q += 1
        # apply cyclic permutation
        tmp = aflat[permut[q-1]]
        aflat[permut[1:q]] = aflat[permut[:q-1]]
        aflat[permut[0]] = tmp

    aT = aflat.reshape(m, n)
    return aT

def test_transpose(n, m):
    a = np.arange(n*m).reshape(n, m)
    aT = a.T.copy()
    assert np.all(transpose_in_place(a) == aT)

def roundtrip_inplace(a):
    a = transpose_in_place(a)
    a = transpose_in_place(a)

def roundtrip_copy(a):
    a = a.T.copy()
    a = a.T.copy()

if __name__ == '__main__':
    test_transpose(1, 1)
    test_transpose(3, 4)
    test_transpose(5, 5)
    test_transpose(1, 5)
    test_transpose(5, 1)
    test_transpose(19, 29)
Even though I'm using numba.njit here so that the loops in the transpose function are compiled, it's still quite a bit slower than a copy-transpose.
n, m = 1000, 10000
a_big = np.arange(n*m, dtype=np.float64).reshape(n, m)
%timeit -r2 -n10 roundtrip_copy(a_big)
54.5 ms ± 153 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
%timeit -r2 -n1 roundtrip_inplace(a_big)
614 ms ± 141 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Whatever you do will require O(n^2) time and memory. I would assume that .transpose and .copy (written in C) will be the most efficient choice for your application.
Edit: this assumes you actually need to copy the matrix
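(A side note of mine, not part of that answer: often the copy can be skipped entirely, because most numpy operations accept the strided transposed view directly.)
a = np.random.rand(1000, 500)
at = a.T                              # O(1): just a view, no data moved
res = at @ np.random.rand(1000, 3)    # works on the view; no explicit copy needed in user code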
I have a 2D numpy array, say array1 with values. array1 is of dimensions 2x4. I want to create a 4D numpy array array2 with dimensions 20x20x2x4 and I wish to replicate the array array1 to get this array.
That is, if array1 was
[[1, 2, 3, 4],
[5, 6, 7, 8]]
I want
array2[0, 0] = array1
array2[0, 1] = array1
array2[0, 2] = array1
array2[0, 3] = array1
# etc.
How can I do this?
One approach with initialization -
array2 = np.empty((20,20) + array1.shape,dtype=array1.dtype)
array2[:] = array1
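For the runtime test below, these two lines are wrapped in a function (the wrapper is mine; the name matches what the timings call):
def initialization_based(array1):
    # allocate the output once, then let broadcasting fill it
    array2 = np.empty((20, 20) + array1.shape, dtype=array1.dtype)
    array2[:] = array1
    return array2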
Runtime test -
In [400]: array1 = np.arange(1,9).reshape(2,4)
In [401]: array1
Out[401]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
# @MSeifert's soln
In [402]: %timeit np.tile(array1, (20, 20, 1, 1))
100000 loops, best of 3: 8.01 µs per loop
# Proposed soln in this post
In [403]: %timeit initialization_based(array1)
100000 loops, best of 3: 4.11 µs per loop
# @MSeifert's soln for READONLY-view
In [406]: %timeit np.broadcast_to(array1, (20, 20, 2, 4))
100000 loops, best of 3: 2.78 µs per loop
There are two easy ways:
np.broadcast_to:
array2 = np.broadcast_to(array1, (20, 20, 2, 4)) # array2 is a READONLY-view
and np.tile:
array2 = np.tile(array1, (20, 20, 1, 1)) # array2 is a normal numpy array
If you don't want to modify your array2 then np.broadcast_to should be really fast and simple. Otherwise np.tile or assigning to a newly allocated array (see Divakar's answer) should be preferred.
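A quick way to check the read-only behaviour of the broadcast_to view (my addition):
view = np.broadcast_to(array1, (20, 20, 2, 4))
print(view.flags.writeable)   # False - assigning into it raises ValueError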
I got the answer:
array2[:, :, :, :] = array1.copy()
This should work fine, assuming array2 has already been allocated with shape (20, 20, 2, 4). (The .copy() is not actually needed; the slice assignment copies the values anyway.)
I am looking for a more optimized way to convert a (n,n) or (n,n,1) matrix to a (n,n,3) matrix. I start out with an (n,n,3), but my dimensions get reduced after I perform a sum over the second axis to (n,n). Essentially, I want to keep the original size of the array and have the second axis just repeated 3 times. The reason I need this is that I will later be broadcasting it with another (n,n,3) array, but they need the same dimensions.
My current method works, but does not seem elegant.
a=np.random.random((n,n))
b=a.flatten().tolist()
a=np.array(zip(b,b,b))   # Python 2 zip; Python 3 would need list(zip(b,b,b))
a.shape=n,n,3
This setup has the desired result, but is clunky and hard to follow. Is there perhaps a way to go directly from an (n,n) to an (n,n,3) by duplicating the second index? or perhaps a way to not downsize the array to begin with?
None or np.newaxis is a common way of adding a dimension to an array. reshape with (3,3,1) works just as well:
In [64]: arr=np.arange(9).reshape(3,3)
In [65]: arr1 = arr[...,None]
In [66]: arr1.shape
Out[66]: (3, 3, 1)
repeat, as a function or method, replicates values along that new dimension:
In [72]: arr2=arr1.repeat(3,axis=2)
In [73]: arr2.shape
Out[73]: (3, 3, 3)
In [74]: arr2[0,0,:]
Out[74]: array([0, 0, 0])
But you might not need to do this. With broadcasting a (3,3,1) works with a (3,3,3).
In [75]: (arr1+arr2).shape
Out[75]: (3, 3, 3)
In fact it will broadcast with a (3,) to produce (3,3,3).
In [77]: arr1+np.ones(3,int)
Out[77]:
array([[[1, 1, 1],
[2, 2, 2],
...
[[7, 7, 7],
[8, 8, 8],
[9, 9, 9]]])
So arr1+np.zeros(3,int) is another way of expanding that (3,3,1) to (3,3,3).
The broadcasting rules are:
(3,3,1) + (3,) => (3,3,1) + (1,1,3) => (3,3,3)
Broadcasting adds dimensions at the start as needed.
When you sum on an axis, you can keep the original number of dimensions with a parameter:
In [78]: arr2.sum(axis=2).shape
Out[78]: (3, 3)
In [79]: arr2.sum(axis=2, keepdims=True).shape
Out[79]: (3, 3, 1)
This is handy if you want to subtract the mean from an array along any dimension:
arr2-arr2.mean(axis=2, keepdims=True)
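Applied back to the original problem, the reduction itself can keep the trailing axis, so no re-expansion is needed afterwards (a sketch with a hypothetical (n, n, 3) array):
n = 4
other = np.random.random((n, n, 3))     # the array you later broadcast against
a = other.sum(axis=2, keepdims=True)    # (n, n, 1) instead of (n, n)
out = other - a                         # broadcasts straight to (n, n, 3)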
You can first create a new axis (axis=2) on a and then use np.repeat along this new axis:
np.repeat(a[:,:,None], 3, axis = 2)
Or, as another approach, flatten the array, repeat the elements and then reshape:
np.repeat(a.ravel(), 3).reshape(n,n,3)
The result comparison:
import numpy as np
n = 4
a=np.random.random((n,n))
b=a.flatten().tolist()
a1=np.array(zip(b,b,b))
a1.shape=n,n,3
# a1 is the result from the original method
(np.repeat(a[:,:,None], 3, axis = 2) == a1).all()
# True
(np.repeat(a.ravel(), 3).reshape(4,4,3) == a1).all()
# True
Timing: using the built-in numpy.repeat also shows a speed-up:
import numpy as np
n = 4
a=np.random.random((n,n))
def rep():
b=a.flatten().tolist()
a1=np.array(zip(b,b,b))
a1.shape=n,n,3
%timeit rep()
# 100000 loops, best of 3: 7.11 µs per loop
%timeit np.repeat(a[:,:,None], 3, axis = 2)
# 1000000 loops, best of 3: 1.64 µs per loop
%timeit np.repeat(a.ravel(), 3).reshape(4,4,3)
# 1000000 loops, best of 3: 1.9 µs per loop
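Since the expanded array is only needed for broadcasting against another (n, n, 3) array, a read-only zero-copy view would also do (my addition, not timed above):
a3 = np.broadcast_to(a[:, :, None], (n, n, 3))   # view with stride 0 on the last axis; read-only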
This is motivated by my answer here.
Given array A with shape (n0,n1), and array J with shape (n0), I'd like to create an array B with shape (n0) such that
B[i] = A[i,J[i]]
I'd also like to be able to generalize this to k-dimensional arrays, where A has shape (n0,n1,...,nk) and J has shape (n0,n1,...,n(k-1))
There are messy, flattening ways of doing this that make assumptions about index order:
import numpy as np
B = A.ravel()[ J+A.shape[-1]*np.arange(0,np.prod(J.shape)).reshape(J.shape) ]
The question is, is there a way to do this that doesn't rely on flattening arrays and dealing with indexes manually?
For 2d A and 1d J, this indexing works:
A[np.arange(J.shape[0]), J]
Which can be applied to more dimensions by reshaping to 2d (and back):
A.reshape(-1, A.shape[-1])[np.arange(np.prod(A.shape[:-1])).reshape(J.shape), J]
For 3d A this works:
A[np.arange(J.shape[0])[:,None], np.arange(J.shape[1])[None,:], J]
where the first two arange indices broadcast to the same shape as J.
With functions in lib.index_tricks, this can be expressed as:
A[np.ogrid[0:J.shape[0],0:J.shape[1]]+[J]]
A[np.ogrid[slice(J.shape[0]),slice(J.shape[1])]+[J]]
or for multiple dimensions:
A[np.ix_(*[np.arange(x) for x in J.shape])+(J,)]
A[np.ogrid[[slice(k) for k in J.shape]]+[J]]
For small A and J (eg 2*3*4), J.choose(np.rollaxis(A,-1)) is faster. All of the extra time is in preparing the index tuple. np.ix_ is faster than np.ogrid.
np.choose has a size limit. At its upper end it is slower than ix_:
In [610]: Abig=np.arange(31*31).reshape(31,31)
In [611]: Jbig=np.arange(31)
In [612]: Jbig.choose(np.rollaxis(Abig,-1))
Out[612]:
array([ 0, 32, 64, 96, 128, 160, ... 960])
In [613]: timeit Jbig.choose(np.rollaxis(Abig,-1))
10000 loops, best of 3: 73.1 µs per loop
In [614]: timeit Abig[np.ix_(*[np.arange(x) for x in Jbig.shape])+(Jbig,)]
10000 loops, best of 3: 22.7 µs per loop
In [635]: timeit Abig.ravel()[Jbig+Abig.shape[-1]*np.arange(0,np.prod(Jbig.shape)).reshape(Jbig.shape) ]
10000 loops, best of 3: 44.8 µs per loop
I did similar indexing tests at https://stackoverflow.com/a/28007256/901925, and found that flat indexing was faster for much larger arrays (e.g. n0=1000). That's where I learned about the 32-array limit for choose.
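More recent numpy (1.15+) also has np.take_along_axis, which expresses this directly without building the index grids (my addition, not part of the answers above):
B = np.take_along_axis(A, J[..., None], axis=-1)[..., 0]   # works for any number of leading dimensions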
It doesn't solve your problem exactly, but choose() should nevertheless help:
>>> A = array(range(1, 28)).reshape(3, 3, 3)
>>> B = array([0, 0, 0, 1, 1, 1, 2, 2, 2]).reshape(3, 3)
>>> B.choose(A)
array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
It selects among the first dimension instead of the last.