I want to reshape a numpy array arr to a shape of (before, at, after) for any one axis of arr. How to do this faster?
The axis has been normalized: 0 <= axis < arr.ndim
Program:
import numpy as np
def f(arr, axis):
shape = arr.shape
before = int(np.product(shape[:axis]))
at = shape[axis]
return arr.reshape(before, at, -1)
Test:
a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
print(f(a, 2).shape)
Result:
(6, 4, 5)
shape is a tuple, and the desired result is also a tuple. Convert to/from arrays to use np.prod or some other array function will take time. So if we can do the same with plain Python code we might save time.
For example with shape:
In [309]: shape
Out[309]: (2, 3, 4, 5)
In [310]: np.prod(shape)
Out[310]: 120
In [311]: functools.reduce(operator.mul,shape)
Out[311]: 120
In [312]: timeit np.prod(shape)
13.6 µs ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [313]: timeit functools.reduce(operator.mul,shape)
647 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The python version is noticeably faster. I had to import functools and operator to get the multiplication equivalent of sum (Python3).
Or to get the new shape tuple:
In [314]: axis=2
In [315]: (functools.reduce(operator.mul,shape[:axis]),shape[axis],-1)
Out[315]: (6, 4, -1)
In [316]: timeit (functools.reduce(operator.mul,shape[:axis]),shape[axis],-1)
739 ns ± 30.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
comparing the proposed reduceat:
In [318]: tuple(np.multiply.reduceat(shape, (0, axis, axis+1)))
Out[318]: (6, 4, 5)
In [319]: timeit tuple(np.multiply.reduceat(shape, (0, axis, axis+1)))
11.3 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If your axis is really in the middle you can use np.multiply.reduceat
shape = (2, 3, 4, 5, 6)
axis = 2
np.multiply.reduceat(shape, (0, axis, axis+1))
# array([ 6, 4, 30])
axis = 3
np.multiply.reduceat(shape, (0, axis, axis+1))
# array([24, 5, 6])
If you want the zeroth or last axis you'll have to special case, though.
Related
There is a random 1D array m_0
np.array([0, 1, 2])
I need to generate two 1D arrays:
np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Is there faster way to do it than this one:
import numpy as np
import time
N = 3
m_0 = np.arange(N)
t = time.time()
m_1 = np.tile(m_0, N)
m_2 = np.repeat(m_0, N)
t = time.time() - t
Size of m_0 is 10**3
You could use itertools.product to form the Cartesian product of m_0 with itself, then take the result apart again to get your two arrays.
import numpy as np
from itertools import product
N = 3
m_0 = np.arange(N)
m_2, m_1 = map(np.array, zip(*product(m_0, m_0)))
# m_1 is now array([0, 1, 2, 0, 1, 2, 0, 1, 2])
# m_2 is now array([0, 0, 0, 1, 1, 1, 2, 2, 2])
However, for large N this is probably quite a bit less performant than your solution, as it probably can't use many of NumPy's SIMD optimizations.
For alternatives and comparisons, you'll probably want to look at the answers to Cartesian product of x and y array points into single array of 2D points.
I guess you could try reshape:
>>> np.reshape([m_0]*3, (-1,), order='C')
array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> np.reshape([m_0]*3, (-1,), order='F')
array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Should be tiny bit faster for larger arrays.
>>> m_0 = np.random.randint(0, 10**3, size=(10**3,))
>>> %timeit np.tile([m_0]*10**3, N)
5.85 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit np.reshape([m_0]*10**3, (-1,), order='C')
1.94 ms ± 46.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
You can slightly improve speed if you reuse your first variable to create the second.
N=1000
%timeit t = np.arange(N); a = np.tile(t, N); b = np.repeat(t, N)
%timeit t = np.arange(N); a = np.tile(t, N); b = np.reshape(a.reshape((N,N)),-1,'F')
7.55 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.54 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If you insist on speeding it up further, you can specify the dtype of your array.
%timeit t = np.arange(N,dtype=np.uint16); a = np.tile(t, N); b = np.repeat(t, N)
%timeit t = np.arange(N,dtype=np.uint16); a = np.tile(t, N); b = np.reshape(a.reshape((N,N)),-1,'F')
6.03 ms ± 587 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.2 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Be sure to keep the data type limit in mind.
I'm looking for a numpy equivalent of my suboptimal Python code. The calculation I want to do can be summarized by:
The average of the peak of each section for each row.
Here the code with a sample array and list of indices. Sections can be of different sizes.
x = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]])
indices = [2]
result = np.empty((1, x.shape[0]))
for row in x:
splited = np.array_split(row, indexes)
peak = [np.amax(a) for a in splited]
result[0, i] = np.average(peak)
Which gives: result = array([[3., 7.]])
What is the optimized numpy way to suppress both loop?
You could just take off the for loop and use axis instead:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
Output:
array([3., 7.])
Benchmark:
x_large = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]] * 1000)
%%timeit
result = []
for row in x_large:
splited = np.array_split(row, indices)
peak = [np.amax(a) for a in splited]
result.append(np.average(peak))
# 29.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
# 37.4 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Validation:
np.array_equal(result, result2)
# True
I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B's values within A.
For example:
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9
You can use np.in1d with np.nonzero -
np.nonzero(np.in1d(A,B))[0]
You can also use np.searchsorted, if you care about maintaining the order -
np.searchsorted(A,B)
For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would add in my favorite broadcasting too in the mix to solve a generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
Have you tried searchsorted?
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
A.searchsorted(B)
# array([0, 6, 9])
Just for completeness: If the values in A are non negative and reasonably small:
lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices = lookup[B]
I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.
As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.
I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.
In [1]: import numpy as np
In [2]: from numba import njit
In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])
In [5]: np.where(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [6]: np.nonzero(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [7]: np.searchsorted(a, b)
...: array([0, 6, 9])
In [8]: np.searchsorted(a, np.intersect1d(a, b))
...: array([0, 6, 9])
In [9]: [i for i, x in enumerate(a) if x in b]
...: [0, 6, 9]
In [10]: #njit
...: def func(a, b):
...: return [i for i, x in enumerate(a) if x in b]
In [11]: func(a, b)
...: [0, 6, 9]
Now, let's compare the timing performance of these solutions.
In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
By default, numpy is row major.
Therefore, the following results are accepted naturally to me.
a = np.random.rand(5000, 5000)
%timeit a[0,:].sum()
3.57 µs ± 13.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a[:,0].sum()
38.8 µs ± 8.19 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Because it is a row major order, it is natural to calculate faster by a [0,:].
However, if use the numpy sum function, the result is different.
%timeit a.sum(axis=0)
16.9 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a.sum(axis=1)
29.5 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If use the numpy sum function, it is faster to compute it along the column.
So My point is why the speed along the axis = 0 (calculated along column) is faster than the along the axis = 1(along row).
For example
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]], order='C')
In the row major order, [1,2,3] and [4,5,6], [7,8,9] are allocated to adjacent memory, respectively.
Therefore, the speed calculated along axis = 1 should be faster than axis = 0.
However, when using numpy sum function, it is faster to calculate along the column (axis = 0).
How can you explain this?
Thanks
You don't compute the same thing.
The first two commands only compute one row/column out of the entire array.
a[0, :].sum().shape # sums just the first row only
()
The second two commands, sum the entire contents of the 2D array, but along a certain axis. That way, you don't get a single result (as in the first two commands), but an 1D array of sums.
a.sum(axis=0).shape # computes the row-wise sum for each column
(5000,)
In summary, the two sets of commands do different things.
a
array([[1, 6, 9, 1, 6],
[5, 6, 9, 1, 3],
[5, 0, 3, 5, 7],
[2, 8, 3, 8, 6],
[3, 4, 8, 5, 0]])
a[0, :]
array([1, 6, 9, 1, 6])
a[0, :].sum()
23
a.sum(axis=0)
array([16, 24, 32, 20, 22])
I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B's values within A.
For example:
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9
You can use np.in1d with np.nonzero -
np.nonzero(np.in1d(A,B))[0]
You can also use np.searchsorted, if you care about maintaining the order -
np.searchsorted(A,B)
For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would add in my favorite broadcasting too in the mix to solve a generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
Have you tried searchsorted?
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
A.searchsorted(B)
# array([0, 6, 9])
Just for completeness: If the values in A are non negative and reasonably small:
lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices = lookup[B]
I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.
As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.
I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.
In [1]: import numpy as np
In [2]: from numba import njit
In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])
In [5]: np.where(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [6]: np.nonzero(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [7]: np.searchsorted(a, b)
...: array([0, 6, 9])
In [8]: np.searchsorted(a, np.intersect1d(a, b))
...: array([0, 6, 9])
In [9]: [i for i, x in enumerate(a) if x in b]
...: [0, 6, 9]
In [10]: #njit
...: def func(a, b):
...: return [i for i, x in enumerate(a) if x in b]
In [11]: func(a, b)
...: [0, 6, 9]
Now, let's compare the timing performance of these solutions.
In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)