The numpy.einsum documentation (https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) mentions broadcasting and scalar multiplication:
np.einsum('..., ...', 3, c)
array([[ 0,  3,  6],
       [ 9, 12, 15]])
It seems einsum can mimic the alpha/beta prefactors in DGEMM
(http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html).
Does that imply that including the scalar multiplication inside einsum as one step is faster than two steps: (1) A,B->C and (2) C*prefactor?
I tried to extend the example from https://ajcr.net/Basic-guide-to-einsum/ as
import numpy as np
A = np.array([0, 1, 2])
B = np.array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
C = np.einsum('i,ij->i', 2., A, B)
print(C)
and got ValueError: einstein sum subscripts string contains too many subscripts for operand.
So my question is: is there a way to include a scalar factor inside einsum, and does it accelerate the calculation?
I haven't used this scalar feature, but here's how it works:
In [422]: np.einsum('i,ij->i',A,B)
Out[422]: array([ 0, 22, 76])
In [423]: np.einsum(',i,ij->i',2,A,B)
Out[423]: array([ 0, 44, 152])
The time savings appear to be minor:
In [424]: timeit np.einsum(',i,ij->i',2,A,B)
11.5 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [425]: timeit 2*np.einsum('i,ij->i',A,B)
12.3 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
another example:
In [427]: np.einsum(',i,,ij->i',3,A,2,B)
Out[427]: array([ 0, 132, 456])
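To double-check that folding the scalar into the call gives the same result as scaling afterwards, a quick sanity check (using the same A and B as in the question) might look like:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[0, 1,  2,  3],
              [4, 5,  6,  7],
              [8, 9, 10, 11]])

# A scalar operand gets an empty subscript slot before its comma.
inside = np.einsum(',i,ij->i', 2, A, B)

# The same computation done in two steps.
outside = 2 * np.einsum('i,ij->i', A, B)

assert np.array_equal(inside, outside)
print(inside)  # [  0  44 152]
```

Either way the scalar is applied once per output element, so the difference is bookkeeping inside einsum rather than a separate pass over C.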
Related
I want to add a different number to each row of the matrix below.
array([[ 6, 6, 6, 6],
[ 1, -5, -11, -17],
[ 1, 7, 13, 19]], dtype=int64)
For example I want to add this array to the matrix:
array([-4, -3, 0])
Add the -4 from the array to the first row, so it becomes array([2, 2, 2, 2], dtype=int64).
The whole matrix should then look like this:
array([[ 2, 2, 2, 2],
[ -2, -8, -14, -20],
[ 1, 7, 13, 19]], dtype=int64)
I could of course transform the 1d array to a matrix, but I wanted to know if there is another option.
You can do it in several ways:
Using .reshape: it will create a "column-vector" instead of a "row-vector"
a + b.reshape((-1,1))
Creating a new array then transposing it:
a + np.array([b]).T
Using numpy.atleast_2d:
a + np.atleast_2d(b).T
All of them with the same output:
array([[ 2, 2, 2, 2],
[ -2, -8, -14, -20],
[ 1, 7, 13, 19]])
Performance
%%timeit
a = np.random.randint(0,10,(2000,100))
b = np.random.randint(0,10,2000)
a + b.reshape((-1,1))
#3.39 ms ± 43.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
a = np.random.randint(0,10,(2000,100))
b = np.random.randint(0,10,2000)
a + np.array([b]).T
#3.4 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
a = np.random.randint(0,10,(2000,100))
b = np.random.randint(0,10,2000)
a + np.atleast_2d(b).T
#3.37 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
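For completeness, the same column shape can also be obtained with None/np.newaxis indexing, a common idiom that avoids spelling out the shape explicitly:

```python
import numpy as np

a = np.array([[ 6,  6,   6,   6],
              [ 1, -5, -11, -17],
              [ 1,  7,  13,  19]])
b = np.array([-4, -3, 0])

# b[:, None] has shape (3, 1), so it broadcasts across the columns of a
c = a + b[:, None]
print(c)
# [[  2   2   2   2]
#  [ -2  -8 -14 -20]
#  [  1   7  13  19]]
```

This is equivalent to the .reshape((-1,1)) variant above; it also returns a view, so the performance should be in the same ballpark.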
I'm looking for a numpy equivalent of my suboptimal Python code. The calculation I want to do can be summarized by:
The average of the peak of each section for each row.
Here is the code with a sample array and list of indices. Sections can be of different sizes.
x = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]])
indices = [2]
result = np.empty((1, x.shape[0]))
for i, row in enumerate(x):
    splited = np.array_split(row, indices)
    peak = [np.amax(a) for a in splited]
    result[0, i] = np.average(peak)
Which gives: result = array([[3., 7.]])
What is the optimized NumPy way to eliminate both loops?
You could just drop the for loop and use the axis argument instead:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x, indices, 1)], axis=0)
Output:
array([3., 7.])
Benchmark:
x_large = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]] * 1000)
%%timeit
result = []
for row in x_large:
splited = np.array_split(row, indices)
peak = [np.amax(a) for a in splited]
result.append(np.average(peak))
# 29.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
# 37.4 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Validation:
np.array_equal(result, np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0))
# True
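Since the sections are defined purely by split indices, the inner list comprehension can also be avoided with np.maximum.reduceat, which computes segment maxima along an axis in a single call (a sketch; each reduceat segment runs from one start index to the next):

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
indices = [2]

# Segment starts are 0 plus the split points, so the segments
# are columns [0:2) and [2:4), matching np.array_split(x, indices, 1).
starts = np.r_[0, indices]
result = np.maximum.reduceat(x, starts, axis=1).mean(axis=1)
print(result)  # [3. 7.]
```

This stays entirely in compiled code, so it should scale better than building a Python list of sub-arrays.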
Let’s say I have two NumPy arrays, a and b:
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
b = np.array([8,9])
And I would like to append the same array b to every row (i.e. adding multiple columns) to get an array c:
c = np.array([
    [1, 2, 3, 8, 9],
    [2, 3, 4, 8, 9]
])
How can I do this easily and efficiently in NumPy?
I am especially concerned about its behaviour with big datasets (where a is much bigger than b): is there any way around creating many copies (i.e. a.shape[0] of them) of b?
Related to this question, but with multiple values.
Here's one way. I assume it's efficient because it's vectorised. It relies on the fact that in matrix multiplication, pre-multiplying a row by the column (1, 1) will produce two stacked copies of the row.
import numpy as np
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
b = np.array([[8,9]])
np.concatenate([a, np.array([[1],[1]]).dot(b)], axis=1)
Out: array([[1, 2, 3, 8, 9],
[2, 3, 4, 8, 9]])
Note that b is specified slightly differently (as a two-dimensional array).
Is there any way around creating many copies of b?
The final result contains those copies (and numpy arrays are literally arrays of values in memory), so I don't see how.
An alternative to the concatenate approach is to make a recipient array and copy values into it:
In [483]: a = np.arange(300).reshape(100,3)
In [484]: b=np.array([8,9])
In [485]: res = np.zeros((100,5),int)
In [486]: res[:,:3]=a
In [487]: res[:,3:]=b
sample timings
In [488]: %%timeit
...: res = np.zeros((100,5),int)
...: res[:,:3]=a
...: res[:,3:]=b
...:
...:
6.11 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [491]: timeit np.concatenate((a, b.repeat(100).reshape(2,-1).T),1)
7.74 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [164]: timeit np.concatenate([a, np.ones([a.shape[0],1], dtype=int).dot(np.array([b]))], axis=1)
8.58 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The way I solved this initially was :
c = np.concatenate([a, np.tile(b, (a.shape[0],1))], axis = 1)
But this feels very inefficient...
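One way to at least postpone the copies is np.broadcast_to, which produces a read-only view of b repeated over the rows; nothing is materialized until np.concatenate builds c (a sketch of that idea):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])
b = np.array([8, 9])

# broadcast_to returns a zero-copy view with the repeated shape;
# the actual copies only happen once, inside concatenate.
b_rows = np.broadcast_to(b, (a.shape[0], b.size))
c = np.concatenate([a, b_rows], axis=1)
print(c)
# [[1 2 3 8 9]
#  [2 3 4 8 9]]
```

The final array still contains a.shape[0] copies of b, as the answer above notes it must, but no intermediate tiled array is allocated along the way.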
By default, numpy is row major.
Therefore, the following results are accepted naturally to me.
a = np.random.rand(5000, 5000)
%timeit a[0,:].sum()
3.57 µs ± 13.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a[:,0].sum()
38.8 µs ± 8.19 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Because the order is row-major, it is natural that a[0,:] is faster to compute.
However, if I use the numpy sum function, the result is different.
%timeit a.sum(axis=0)
16.9 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a.sum(axis=1)
29.5 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
With the numpy sum function, it is faster to compute along the columns.
So my question is: why is the sum along axis=0 (along columns) faster than along axis=1 (along rows)?
For example
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]], order='C')
In row-major order, [0, 1, 2], [3, 4, 5], and [6, 7, 8] are each allocated in adjacent memory.
Therefore, summing along axis=1 should be faster than along axis=0.
However, when using the numpy sum function, it is faster to calculate along the columns (axis=0).
How can you explain this?
Thanks
You don't compute the same thing.
The first two commands only compute one row/column out of the entire array.
a[0, :].sum().shape # sums just the first row only
()
The second two commands sum the entire contents of the 2D array, but along a certain axis. That way, you don't get a single result (as in the first two commands), but a 1D array of sums.
a.sum(axis=0).shape # computes the row-wise sum for each column
(5000,)
In summary, the two sets of commands do different things.
a
array([[1, 6, 9, 1, 6],
[5, 6, 9, 1, 3],
[5, 0, 3, 5, 7],
[2, 8, 3, 8, 6],
[3, 4, 8, 5, 0]])
a[0, :]
array([1, 6, 9, 1, 6])
a[0, :].sum()
23
a.sum(axis=0)
array([16, 24, 32, 20, 22])
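As for the timing puzzle itself: the commonly cited explanation is that memory layout, not the axis label, drives the speed. For a C-ordered array, a.sum(axis=0) can stream over contiguous rows and accumulate element-wise into the output, which vectorizes well, whereas a.sum(axis=1) needs a horizontal reduction within each row. A quick sketch to convince yourself that layout is the real variable (the results are identical either way; on a Fortran-ordered copy the fast axis flips):

```python
import numpy as np

a = np.random.rand(500, 500)   # C-ordered (row-major) by default
af = np.asfortranarray(a)      # same values, column-major layout

# The sums are identical regardless of layout; only the timings differ,
# and on the Fortran-ordered copy the faster axis is reversed.
assert np.allclose(a.sum(axis=0), af.sum(axis=0))
assert np.allclose(a.sum(axis=1), af.sum(axis=1))
print(a.flags['C_CONTIGUOUS'], af.flags['F_CONTIGUOUS'])  # True True
```

Timing a.sum(axis=0) against af.sum(axis=0) with %timeit should show the asymmetry swap sides.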
I have a 2D numpy array with 'n' unique values.
I want to produce a binary matrix, where all values are replaced with
'zero' and a value which I specify is assigned as 'one'.
For example, I have an array as follows and I want all instances
of 35 to be assigned 'one':
array([[12, 35, 12, 26],
[35, 35, 12, 26]])
I am trying to get the following output:
array([[0, 1, 0, 0],
[1, 1, 0, 0]])
What is the most efficient way to do it in Python?
import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
(x == 35).astype(int)
will give you:
array([[0, 1, 0, 0],
[1, 1, 0, 0]])
The == operator in numpy performs an element-wise comparison, and when converting booleans to ints True is encoded as 1 and False as 0.
Another elegant way, compared to the other solutions, is to just use np.isin():
>>> arr
array([[12, 35, 12, 26],
[35, 35, 12, 26]])
# get the result as binary matrix
>>> np.isin(arr, 35).astype(np.uint8)
array([[0, 1, 0, 0],
[1, 1, 0, 0]])
np.isin() would return a boolean mask with True values where the given element (here 35) is present in the original array, and False elsewhere.
Another variant would be to cast the boolean result using np.asarray() with data type np.uint8 for better speed:
In [18]: np.asarray(np.isin(x, 35), dtype=np.uint8)
Out[18]:
array([[0, 1, 0, 0],
[1, 1, 0, 0]], dtype=uint8)
Benchmarking
By explicitly casting the boolean result to uint8, we can gain more than 3x better performance. (Thanks to @Divakar for pointing this out!) See timings below:
# setup (large) input array
In [3]: x = np.arange(25000000)
In [4]: x[0] = 35
In [5]: x[1000000] = 35
In [6]: x[2000000] = 35
In [7]: x[-1] = 35
In [8]: x = x.reshape((5000, 5000))
# timings
In [20]: %timeit np.where(x==35, 1, 0)
427 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [21]: %timeit (x == 35) + 0
450 ms ± 72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [22]: %timeit (x == 35).astype(np.uint8)
126 ms ± 37.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# the fastest choice to go for!
In [23]: %timeit np.isin(x, 35).astype(np.uint8)
115 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [24]: %timeit np.asarray(np.isin(x, 35), dtype=np.uint8)
117 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
If you want a real warhorse, then use numexpr as in:
In [8]: import numexpr as ne
In [9]: %timeit ne.evaluate("x==35").astype(np.uint8)
23 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
This is ca. 20x faster than the slowest approach using NumPy based computations.
Finally, if views are okay, we can get similarly dramatic speedups using NumPy itself.
In [13]: %timeit (x == 35).view(np.uint8)
20.1 ms ± 93.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [15]: %timeit np.isin(x, 35).view(np.uint8)
30.2 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
(Again, thanks to @Divakar for mentioning these super nice tricks!)
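The .view(np.uint8) trick works because a NumPy boolean is stored as a single byte holding 0 or 1, so reinterpreting the buffer as uint8 changes only the dtype, with no copy. A small sanity check:

```python
import numpy as np

x = np.array([[12, 35, 12, 26],
              [35, 35, 12, 26]])

mask = (x == 35)
out = mask.view(np.uint8)   # same bytes, reinterpreted as 0/1 integers

print(out)
# [[0 1 0 0]
#  [1 1 0 0]]

# No data was copied: the view shares memory with the boolean mask.
assert np.shares_memory(out, mask)
```

The caveat is exactly the one noted above: out is a view, so modifying it (or mask) modifies the other.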
import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
(x == 35) + 0
array([[0, 1, 0, 0],
[1, 1, 0, 0]])
Another option would be to use np.where; this solution is slower than @yuji's solution (see timings below) but it is more flexible if you want to do anything other than putting in zeros and ones (see example below).
import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
np.where(x==35, 1, 0)
which yields
array([[0, 1, 0, 0],
[1, 1, 0, 0]])
One can read it like where x is equal to 35 put in a 1, everywhere else insert a 0.
As written, you now have great flexibility, you can e.g. also do the following:
np.where(x==35, np.sqrt(x), x - 3)
array([[ 9. , 5.91607978, 9. , 23. ],
[ 5.91607978, 5.91607978, 9. , 23. ]])
So everywhere, where x is equal to 35, you get the square root and from all other values you subtract 3.
Timings:
%timeit np.where(x==35, 1, 0)
100000 loops, best of 3: 5.85 µs per loop
%timeit (x == 35).astype(int)
100000 loops, best of 3: 3.23 µs per loop
%timeit np.isin(x, 35).astype(int)
10000 loops, best of 3: 18.7 µs per loop
%timeit (x == 35) + 0
100000 loops, best of 3: 5.85 µs per loop
If your array is a numpy array, you can use the == operator on it to return a boolean array, then use .astype to turn that into zeros and ones.
import numpy as np
my_array = np.array([[12, 35, 12, 26],
[35, 35, 12, 26]])
indexed = (my_array == 35).astype(int)
print(indexed)
I like @yuji's approach. Very elegant!
Just for the sake of diversity, here is another answer with a lot of labor...
>>> import numpy as np
>>> x = np.array([[12, 35, 12, 26],[35, 35, 12, 26]])
>>> x
array([[12, 35, 12, 26],
[35, 35, 12, 26]])
>>> y=np.zeros(x.shape)
>>> y[np.where(x==35)] = np.ones(len(np.where(x==35)[0]))
>>> y
array([[ 0., 1., 0., 0.],
[ 1., 1., 0., 0.]])