By default, NumPy arrays are stored in row-major (C) order.
The following results therefore make sense to me.
a = np.random.rand(5000, 5000)
%timeit a[0,:].sum()
3.57 µs ± 13.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a[:,0].sum()
38.8 µs ± 8.19 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Because the array is in row-major order, it is natural that a[0,:] sums faster.
However, if I use the NumPy sum function along an axis, the result is different.
%timeit a.sum(axis=0)
16.9 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a.sum(axis=1)
29.5 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
With the NumPy sum function, it is faster to compute along the columns.
So my question is: why is summing along axis=0 (down the columns) faster than along axis=1 (along the rows)?
For example
a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]], order='C')
In row-major order, [0, 1, 2], [3, 4, 5], and [6, 7, 8] each occupy contiguous memory.
Therefore, summing along axis=1 should be faster than along axis=0.
However, when using the NumPy sum function, it is faster to sum along the columns (axis=0).
How can you explain this?
Thanks
You don't compute the same thing.
The first two commands only compute one row/column out of the entire array.
a[0, :].sum().shape # sums just the first row only
()
The second two commands sum the entire contents of the 2D array, but along a certain axis. That way, you don't get a single result (as in the first two commands), but a 1D array of sums.
a.sum(axis=0).shape # computes the row-wise sum for each column
(5000,)
In summary, the two sets of commands do different things.
a
array([[1, 6, 9, 1, 6],
[5, 6, 9, 1, 3],
[5, 0, 3, 5, 7],
[2, 8, 3, 8, 6],
[3, 4, 8, 5, 0]])
a[0, :]
array([1, 6, 9, 1, 6])
a[0, :].sum()
23
a.sum(axis=0)
array([16, 24, 32, 20, 22])
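As for why axis=0 is nonetheless fast on a C-ordered array: the axis-0 reduction can be carried out as a running accumulation over whole contiguous rows, so every pass reads memory in order. A rough sketch of that access pattern (illustrative only, not how NumPy is implemented internally):

```python
import numpy as np

a = np.random.rand(1000, 1000)  # C-ordered (row-major)

# a.sum(axis=0) can stream over contiguous rows, keeping one running
# total per column -- each pass over `row` touches memory sequentially.
manual_axis0 = np.zeros(a.shape[1])
for row in a:              # each `row` is contiguous in memory
    manual_axis0 += row

assert np.allclose(manual_axis0, a.sum(axis=0))
```

Summing along axis=1 has no such whole-row accumulation pattern available per output element, so the contiguous layout helps axis=0 despite the intuition from single-row slicing.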
In https://numpy.org/doc/stable/reference/generated/numpy.einsum.html it is mentioned, under "Broadcasting and scalar multiplication":
np.einsum('..., ...', 3, c)
array([[ 0,  3,  6],
       [ 9, 12, 15]])
It seems einsum can mimic the alpha/beta prefactors of DGEMM
(http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html).
Does that imply that including the scalar multiplication inside einsum as one step will be faster than the two steps (1) A,B->C and (2) C * prefactor?
I tried to extend https://ajcr.net/Basic-guide-to-einsum/ as
import numpy as np
A = np.array([0, 1, 2])
B = np.array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
C = np.einsum('i,ij->i', 2., A, B)
print(C)
and got ValueError: einstein sum subscripts string contains too many subscripts for operand.
So, my question is, is there any method to include scalar factor inside einsum and accelerate the calculation?
I haven't used this scalar feature, but here's how it works:
In [422]: np.einsum('i,ij->i',A,B)
Out[422]: array([ 0, 22, 76])
In [423]: np.einsum(',i,ij->i',2,A,B)
Out[423]: array([ 0, 44, 152])
The time savings appear to be minor:
In [424]: timeit np.einsum(',i,ij->i',2,A,B)
11.5 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [425]: timeit 2*np.einsum('i,ij->i',A,B)
12.3 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
another example:
In [427]: np.einsum(',i,,ij->i',3,A,2,B)
Out[427]: array([ 0, 132, 456])
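To confirm that the scalar really is just another operand with an empty subscript, here is a quick equivalence check using the A and B from the question:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[0, 1,  2,  3],
              [4, 5,  6,  7],
              [8, 9, 10, 11]])

# A scalar operand gets an empty subscript ('') in the signature, so the
# leading '' in ',i,ij->i' belongs to the prefactor 2.
left = np.einsum(',i,ij->i', 2, A, B)
right = 2 * np.einsum('i,ij->i', A, B)
assert np.array_equal(left, right)  # both are [0, 44, 152]
```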
I want to add a different number to each row of the matrix below.
array([[ 6, 6, 6, 6],
[ 1, -5, -11, -17],
[ 1, 7, 13, 19]], dtype=int64)
For example I want to add this array to the matrix:
array([-4, -3, 0])
Adding the -4 from the array to the first row gives array([2, 2, 2, 2], dtype=int64).
The whole matrix should then look like this:
array([[ 2, 2, 2, 2],
[ -2, -8, -14, -20],
[ 1, 7, 13, 19]], dtype=int64)
I could of course transform the 1d array to a matrix, but I wanted to know if there is maybe another option.
You can do it in several ways:
Using .reshape: it will create a "column-vector" instead of a "row-vector"
a + b.reshape((-1,1))
Creating a new array then transposing it:
a + np.array([b]).T
Using numpy.atleast_2d:
a + np.atleast_2d(b).T
All of them with the same output:
array([[ 2, 2, 2, 2],
[ -2, -8, -14, -20],
[ 1, 7, 13, 19]])
Performance
%%timeit
a = np.random.randint(0,10,(2000,100))
b = np.random.randint(0,10,2000)
a + b.reshape((-1,1))
#3.39 ms ± 43.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
a = np.random.randint(0,10,(2000,100))
b = np.random.randint(0,10,2000)
a + np.array([b]).T
#3.4 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
a = np.random.randint(0,10,(2000,100))
b = np.random.randint(0,10,2000)
a + np.atleast_2d(b).T
#3.37 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
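For reference, an equivalent spelling indexes with np.newaxis (or None) to get the same column view without calling reshape; a minimal sketch:

```python
import numpy as np

a = np.array([[ 6,   6,   6,   6],
              [ 1,  -5, -11, -17],
              [ 1,   7,  13,  19]])
b = np.array([-4, -3, 0])

# b[:, np.newaxis] has shape (3, 1), so broadcasting adds one value per row.
c = a + b[:, np.newaxis]
print(c)
```

All four spellings produce the same (n, 1) view of b, so performance should be comparable.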
I'm looking for a numpy equivalent of my suboptimal Python code. The calculation I want to do can be summarized by:
The average of the peak of each section for each row.
Here is the code with a sample array and list of indices. Sections can be of different sizes.
x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
indices = [2]
result = np.empty((1, x.shape[0]))
for i, row in enumerate(x):
    splited = np.array_split(row, indices)
    peak = [np.amax(a) for a in splited]
    result[0, i] = np.average(peak)
Which gives: result = array([[3., 7.]])
What is the optimized NumPy way to eliminate both loops?
You can drop the for loop and use the axis argument instead:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x, indices, 1)], axis=0)
Output:
array([3., 7.])
Benchmark:
x_large = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]] * 1000)
%%timeit
result = []
for row in x_large:
splited = np.array_split(row, indices)
peak = [np.amax(a) for a in splited]
result.append(np.average(peak))
# 29.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
# 37.4 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Validation:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
np.array_equal(result, result2)
# True
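A fully array-level variant, avoiding even the list comprehension, is np.maximum.reduceat, which takes each section's maximum in a single call (a sketch, assuming sections are defined by the same split indices):

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
indices = [2]

# Prepending 0 turns the split points into segment start offsets;
# np.maximum.reduceat then takes each section's maximum in one call.
peaks = np.maximum.reduceat(x, [0] + list(indices), axis=1)
result3 = peaks.mean(axis=1)
print(result3)  # [3. 7.]
```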
Let’s say I have two NumPy arrays, a and b:
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
b = np.array([8,9])
And I would like to append the same array b to every row (i.e. adding multiple columns) to get an array, c:
c = np.array([
    [1, 2, 3, 8, 9],
    [2, 3, 4, 8, 9]
])
How can I do this easily and efficiently in NumPy?
I am especially concerned about its behaviour with big datasets (where a is much bigger than b): is there any way around creating many (i.e. a.shape[0]) copies of b?
Related to this question, but with multiple values.
Here's one way. I assume it's efficient because it's vectorised. It relies on the fact that in matrix multiplication, pre-multiplying a row by the column (1, 1) will produce two stacked copies of the row.
import numpy as np
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
b = np.array([[8,9]])
np.concatenate([a, np.array([[1],[1]]).dot(b)], axis=1)
Out: array([[1, 2, 3, 8, 9],
[2, 3, 4, 8, 9]])
Note that b is specified slightly differently (as a two-dimensional array).
Is there any way around creating many copies of b?
The final result contains those copies (and numpy arrays are literally arrays of values in memory), so I don't see how.
An alternative to concatenate approach is to make a recipient array, and copy values to it:
In [483]: a = np.arange(300).reshape(100,3)
In [484]: b=np.array([8,9])
In [485]: res = np.zeros((100,5),int)
In [486]: res[:,:3]=a
In [487]: res[:,3:]=b
sample timings
In [488]: %%timeit
...: res = np.zeros((100,5),int)
...: res[:,:3]=a
...: res[:,3:]=b
...:
...:
6.11 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [491]: timeit np.concatenate((a, b.repeat(100).reshape(2,-1).T),1)
7.74 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [164]: timeit np.concatenate([a, np.ones([a.shape[0],1], dtype=int).dot(np.array([b]))], axis=1)
8.58 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The way I solved this initially was :
c = np.concatenate([a, np.tile(b, (a.shape[0],1))], axis = 1)
But this feels very inefficient...
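If the worry is about intermediate copies rather than the final result, np.broadcast_to gives a read-only, zero-copy view of b; the repeated rows are only materialized when np.concatenate writes the output. A sketch:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])
b = np.array([8, 9])

# broadcast_to returns a read-only view (no intermediate copies of b);
# the repeated rows appear only when concatenate fills the result.
c = np.concatenate([a, np.broadcast_to(b, (a.shape[0], b.size))], axis=1)
print(c)
```

As noted above, the final array necessarily contains those copies; this only avoids an extra temporary like the one np.tile builds.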
I want to reshape a NumPy array arr to a shape of (before, at, after) for any one axis of arr. How can I do this faster?
The axis has been normalized: 0 <= axis < arr.ndim
Program:
import numpy as np
def f(arr, axis):
    shape = arr.shape
    before = int(np.prod(shape[:axis]))
    at = shape[axis]
    return arr.reshape(before, at, -1)
Test:
a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
print(f(a, 2).shape)
Result:
(6, 4, 5)
shape is a tuple, and the desired result is also a tuple. Converting to/from arrays to use np.prod or some other array function takes time, so if we can do the same with plain Python code we might save time.
For example with shape:
In [309]: shape
Out[309]: (2, 3, 4, 5)
In [310]: np.prod(shape)
Out[310]: 120
In [311]: functools.reduce(operator.mul,shape)
Out[311]: 120
In [312]: timeit np.prod(shape)
13.6 µs ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [313]: timeit functools.reduce(operator.mul,shape)
647 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The Python version is noticeably faster. I had to import functools and operator to get the multiplication equivalent of sum (in Python 3).
Or to get the new shape tuple:
In [314]: axis=2
In [315]: (functools.reduce(operator.mul,shape[:axis]),shape[axis],-1)
Out[315]: (6, 4, -1)
In [316]: timeit (functools.reduce(operator.mul,shape[:axis]),shape[axis],-1)
739 ns ± 30.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
comparing the proposed reduceat:
In [318]: tuple(np.multiply.reduceat(shape, (0, axis, axis+1)))
Out[318]: (6, 4, 5)
In [319]: timeit tuple(np.multiply.reduceat(shape, (0, axis, axis+1)))
11.3 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
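On Python 3.8+, math.prod offers the same pure-Python reduction without importing functools and operator; a small sketch:

```python
import math

shape = (2, 3, 4, 5)
axis = 2

# math.prod performs the same pure-Python product as
# functools.reduce(operator.mul, ...); math.prod(()) is 1, so axis=0 works too.
new_shape = (math.prod(shape[:axis]), shape[axis], -1)
print(new_shape)  # (6, 4, -1)
```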
If your axis is really in the middle you can use np.multiply.reduceat
shape = (2, 3, 4, 5, 6)
axis = 2
np.multiply.reduceat(shape, (0, axis, axis+1))
# array([ 6, 4, 30])
axis = 3
np.multiply.reduceat(shape, (0, axis, axis+1))
# array([24, 5, 6])
If you want the zeroth or last axis you'll have to special case, though.