Say I have two arrays a and b,
a.shape = (5,2,3)
b.shape = (2,3)
then c = a * b will give me an array c of shape (5,2,3) with c[i,j,k] = a[i,j,k]*b[j,k].
Now the situation is,
a.shape = (5,2,3)
b.shape = (2,3,8)
and I want c to have a shape (5,2,3,8) with c[i,j,k,l] = a[i,j,k]*b[j,k,l].
How to do this efficiently? My a and b are actually quite large.
This should work:
a[..., numpy.newaxis] * b[numpy.newaxis, ...]
Usage:
In : a = numpy.random.randn(5,2,3)
In : b = numpy.random.randn(2,3,8)
In : c = a[..., numpy.newaxis]*b[numpy.newaxis, ...]
In : c.shape
Out: (5, 2, 3, 8)
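As a sanity check, the broadcast product can be compared against the element-wise definition from the question. The sketch below is only illustrative: the name `c_loop` and the fixed seed are assumptions, not part of the answer. It also suggests that the explicit `b[numpy.newaxis, ...]` is optional, since broadcasting pads missing leading axes on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
a = rng.standard_normal((5, 2, 3))
b = rng.standard_normal((2, 3, 8))

# Broadcasting: (5, 2, 3, 1) * (1, 2, 3, 8) -> (5, 2, 3, 8)
c = a[..., np.newaxis] * b[np.newaxis, ...]

# Missing leading axes are padded automatically, so this is equivalent
c_short = a[..., np.newaxis] * b

# Reference: explicit loops over c[i,j,k,l] = a[i,j,k] * b[j,k,l]
c_loop = np.empty((5, 2, 3, 8))
for i in range(5):
    for j in range(2):
        for k in range(3):
            for l in range(8):
                c_loop[i, j, k, l] = a[i, j, k] * b[j, k, l]

print(c.shape)
print(np.allclose(c, c_loop), np.array_equal(c, c_short))
```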
Ref: Array Broadcasting in numpy
I think the following should work:
import numpy as np
a = np.random.normal(size=(5,2,3))
b = np.random.normal(size=(2,3,8))
c = np.einsum('ijk,jkl->ijkl',a,b)
and:
In [5]: c.shape
Out[5]: (5, 2, 3, 8)
In [6]: a[0,0,1]*b[0,1,2]
Out[6]: -0.041308376453821738
In [7]: c[0,0,1,2]
Out[7]: -0.041308376453821738
np.einsum can be a bit tricky to use, but is quite powerful for this sort of indexing problem:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
Also note that this requires numpy >= v1.6.0
I'm not sure about efficiency for your particular problem, but if it doesn't perform as well as needed, definitely look into using Cython with explicit for loops, and possibly parallelize it using prange
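Since the result is much larger than either input, it may help to allocate it once and reuse the buffer. This is a minimal sketch, assuming the full output fits in memory. `out` is a documented parameter of both `np.einsum` and `np.multiply`; the buffer name `c` and the copy `c_einsum` are just illustrative.

```python
import numpy as np

a = np.random.normal(size=(5, 2, 3))
b = np.random.normal(size=(2, 3, 8))

# Preallocate the (5, 2, 3, 8) result once
c = np.empty(a.shape + b.shape[-1:])

# Write the einsum result directly into the buffer
np.einsum('ijk,jkl->ijkl', a, b, out=c)
c_einsum = c.copy()

# Same computation via broadcasting, again writing into the buffer
np.multiply(a[..., np.newaxis], b, out=c)

print(np.allclose(c_einsum, c))
```

Whether this makes a measurable difference depends on array sizes and the NumPy build, so it is worth timing on the real data.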
UPDATE
In [18]: %timeit np.einsum('ijk,jkl->ijkl',a,b)
100000 loops, best of 3: 4.78 us per loop
In [19]: %timeit a[..., np.newaxis]*b[np.newaxis, ...]
100000 loops, best of 3: 12.2 us per loop
In [20]: a = np.random.normal(size=(50,20,30))
In [21]: b = np.random.normal(size=(20,30,80))
In [22]: %timeit np.einsum('ijk,jkl->ijkl',a,b)
100 loops, best of 3: 16.6 ms per loop
In [23]: %timeit a[..., np.newaxis]*b[np.newaxis, ...]
100 loops, best of 3: 16.6 ms per loop
In [2]: a = np.random.normal(size=(500,20,30))
In [3]: b = np.random.normal(size=(20,30,800))
In [4]: %timeit np.einsum('ijk,jkl->ijkl',a,b)
1 loops, best of 3: 3.31 s per loop
In [5]: %timeit a[..., np.newaxis]*b[np.newaxis, ...]
1 loops, best of 3: 2.6 s per loop
I'm something of a newbie in numpy, so I'm sorry if this question has already been asked. I'm looking for a vectorized solution that can run multiple cumsums over groups of different sizes within a one-dimensional numpy array.
my_vector=np.array([1,2,3,4,5])
size_of_groups=np.array([3,2])
I would like something like
np.cumsum.group(my_vector,size_of_groups)
[1,3,6,4,9]
I do not want a solution with loops. Either numpy functions or numpy operations.
Not sure about numpy, but pandas can do this pretty easily with a groupby + cumsum:
import pandas as pd
s = pd.Series(my_vector)
s.groupby(s.index.isin(size_of_groups.cumsum()).cumsum()).cumsum()
0 1
1 3
2 6
3 4
4 9
dtype: int64
Here's a vectorized solution -
def intervaled_cumsum(ar, sizes):
    # Make a copy to be used as output array
    out = ar.copy()
    # Get cumulative values of array
    arc = ar.cumsum()
    # Get cumsumed indices to be used to place differentiated values into
    # input array's copy
    idx = sizes.cumsum()
    # Place differentiated values that, when cumulatively summed later on,
    # would give us the desired intervaled cumsum
    out[idx[0]] = ar[idx[0]] - arc[idx[0]-1]
    out[idx[1:-1]] = ar[idx[1:-1]] - np.diff(arc[idx[:-1]-1])
    return out.cumsum()
Sample run -
In [114]: ar = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
...: sizes = np.array([3,2,2,3,2])
In [115]: intervaled_cumsum(ar, sizes)
Out[115]: array([ 1, 3, 6, 4, 9, 6, 13, 8, 17, 27, 11, 23])
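For comparison, the same result can be reached by taking one global cumsum and subtracting, from each group, the running total accumulated before that group started. This is a sketch of a different technique, not the answer's method. The function name `intervaled_cumsum_repeat` is hypothetical, and it assumes every entry of `sizes` is positive and that the sizes sum to `len(ar)`.

```python
import numpy as np

def intervaled_cumsum_repeat(ar, sizes):
    ar = np.asarray(ar)
    sizes = np.asarray(sizes)
    # Global running total over the whole array
    arc = ar.cumsum()
    # Running total reached just before each group starts (0 for the first group)
    ends = sizes.cumsum()
    offsets = np.concatenate(([0], arc[ends[:-1] - 1]))
    # Subtract each group's starting offset from all of its elements
    return arc - np.repeat(offsets, sizes)

print(intervaled_cumsum_repeat(np.array([1, 2, 3, 4, 5]), np.array([3, 2])))
```

Unlike the indexing-based version, this variant also handles the case of a single group, at the cost of one extra temporary array from `np.repeat`.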
Benchmarking
Other approach(es) -
# @cᴏʟᴅsᴘᴇᴇᴅ's solution
import pandas as pd
def pandas_soln(my_vector, sizes):
    s = pd.Series(my_vector)
    return s.groupby(s.index.isin(sizes.cumsum()).cumsum()).cumsum().values
The given sample used two intervals, of lengths 3 and 2. Keeping group sizes of 2 and 3, the timings below simply use a larger number of groups.
Timings -
In [146]: N = 10000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [147]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
10000 loops, best of 3: 178 µs per loop
1000 loops, best of 3: 1.82 ms per loop
In [148]: N = 100000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [149]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
100 loops, best of 3: 3.91 ms per loop
100 loops, best of 3: 17.3 ms per loop
In [150]: N = 1000000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [151]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
10 loops, best of 3: 31.6 ms per loop
1 loop, best of 3: 357 ms per loop
Here is an unconventional solution. Not very fast, though. (Even a bit slower than pandas).
>>> from scipy import linalg
>>>
>>> N = len(my_vector)
>>> D = np.repeat((*zip((1,-1)),), N, axis=1)
>>> D[1, np.cumsum(size_of_groups) - 1] = 0
>>>
>>> linalg.solve_banded((1, 0), D, my_vector)
array([1., 3., 6., 4., 9.])
Suppose we have
an n-dimensional numpy.array A
a numpy.array B with dtype=int and shape of (n, m)
How do I index A by B so that the result is an array of shape (m,), with values taken from the positions indicated by the columns of B?
For example, consider this code that does what I want when B is a python list:
>>> a = np.arange(27).reshape(3,3,3)
>>> a[[0, 1, 2], [0, 0, 0], [1, 1, 2]]
array([ 1, 10, 20]) # the result we're after
>>> bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
>>> a[bl]
array([ 1, 10, 20]) # also works when indexing with a python list
>>> a[bl].shape
(3,)
However, when B is a numpy array, the result is different:
>>> b = np.array(bl)
>>> a[b].shape
(3, 3, 3, 3)
Now, I can get the desired result by casting B into a tuple, but surely that cannot be the proper/idiomatic way to do it?
>>> a[tuple(b)]
array([ 1, 10, 20])
Is there a numpy function to achieve the same without casting B to a tuple?
One alternative would be converting to linear indices and then indexing with np.take, or indexing into the flattened version -
np.take(a,np.ravel_multi_index(b, a.shape))
a.flat[np.ravel_multi_index(b, a.shape)]
Custom np.ravel_multi_index for performance boost
We could implement a custom version to simulate the behaviour of np.ravel_multi_index to boost the performance, like so -
def ravel_index(b, shp):
    return np.concatenate((np.asarray(shp[1:])[::-1].cumprod()[::-1],[1])).dot(b)
Using it, the desired output would be found in two ways -
np.take(a,ravel_index(b, a.shape))
a.flat[ravel_index(b, a.shape)]
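To see what the custom version computes, the sketch below splits out the row-major strides (in elements) and checks the result against `np.ravel_multi_index` on the small example from the question. The intermediate name `strides` is introduced here for illustration only. Note that, unlike `np.ravel_multi_index` with its default `mode='raise'`, this version does no bounds checking.

```python
import numpy as np

def ravel_index(b, shp):
    # Row-major strides in elements, e.g. shape (3, 3, 3) -> [9, 3, 1]
    strides = np.concatenate((np.asarray(shp[1:])[::-1].cumprod()[::-1], [1]))
    # Linear index = sum over axes of stride * index
    return strides.dot(b)

a = np.arange(27).reshape(3, 3, 3)
b = np.array([[0, 1, 2], [0, 0, 0], [1, 1, 2]])

lin = ravel_index(b, a.shape)
print(lin)
print(np.array_equal(lin, np.ravel_multi_index(b, a.shape)))
print(np.take(a, lin))
```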
Benchmarking
Additionally incorporating the tuple-based method from the question and the map-based one from @Kanak's post.
Case #1 : dims = 3
In [23]: a = np.random.randint(0,9,([20]*3))
In [24]: b = np.random.randint(0,20,(a.ndim,1000000))
In [25]: %timeit a[tuple(b)]
...: %timeit a[map(np.ravel, b)]
...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
...: %timeit np.take(a,ravel_index(b, a.shape))
...: %timeit a.flat[ravel_index(b, a.shape)]
100 loops, best of 3: 6.56 ms per loop
100 loops, best of 3: 6.58 ms per loop
100 loops, best of 3: 6.95 ms per loop
100 loops, best of 3: 9.17 ms per loop
100 loops, best of 3: 6.31 ms per loop
100 loops, best of 3: 8.52 ms per loop
Case #2 : dims = 6
In [29]: a = np.random.randint(0,9,([10]*6))
In [30]: b = np.random.randint(0,10,(a.ndim,1000000))
In [31]: %timeit a[tuple(b)]
...: %timeit a[map(np.ravel, b)]
...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
...: %timeit np.take(a,ravel_index(b, a.shape))
...: %timeit a.flat[ravel_index(b, a.shape)]
10 loops, best of 3: 40.9 ms per loop
10 loops, best of 3: 40 ms per loop
10 loops, best of 3: 20 ms per loop
10 loops, best of 3: 29.9 ms per loop
100 loops, best of 3: 15.7 ms per loop
10 loops, best of 3: 25.8 ms per loop
Case #3 : dims = 10
In [32]: a = np.random.randint(0,9,([4]*10))
In [33]: b = np.random.randint(0,4,(a.ndim,1000000))
In [34]: %timeit a[tuple(b)]
...: %timeit a[map(np.ravel, b)]
...: %timeit np.take(a,np.ravel_multi_index(b, a.shape))
...: %timeit a.flat[np.ravel_multi_index(b, a.shape)]
...: %timeit np.take(a,ravel_index(b, a.shape))
...: %timeit a.flat[ravel_index(b, a.shape)]
10 loops, best of 3: 60.7 ms per loop
10 loops, best of 3: 60.1 ms per loop
10 loops, best of 3: 27.8 ms per loop
10 loops, best of 3: 38 ms per loop
100 loops, best of 3: 18.7 ms per loop
10 loops, best of 3: 29.3 ms per loop
So, it makes sense to look for alternatives when working with higher-dimensional inputs and with large data.
Another alternative that fits your need involves the use of np.ravel
>>> a[map(np.ravel, b)]
array([ 1, 10, 20])
However, it is not fully numpy-based.
Performance concerns
Updated following the comments below.
Be that as it may, your approach is better than mine, but not better than any of @Divakar's.
import numpy as np
import timeit
a = np.arange(27).reshape(3,3,3)
bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
b = np.array(bl)
imps = "from __main__ import np,a,b"
reps = 100000
tup_cas_t = timeit.Timer("a[tuple(b)]", imps).timeit(reps)
map_rav_t = timeit.Timer("a[map(np.ravel, b)]", imps).timeit(reps)
fla_rp1_t = timeit.Timer("np.take(a,np.ravel_multi_index(b, a.shape))", imps).timeit(reps)
fla_rp2_t = timeit.Timer("a.flat[np.ravel_multi_index(b, a.shape)]", imps).timeit(reps)
print tup_cas_t/map_rav_t ## 0.505382211881
print tup_cas_t/fla_rp1_t ## 1.18185817386
print tup_cas_t/fla_rp2_t ## 1.71288705886
Are you looking for numpy.ndarray.tolist() ?
>>> a = np.arange(27).reshape(3,3,3)
>>> bl = [[0, 1, 2], [0, 0, 0], [1, 1, 2]]
>>> b = np.array(bl)
>>> a[b.tolist()]
array([ 1, 10, 20])
Or index with one array per axis, which is quite similar to list indexing:
>>> a[np.array([0, 1, 2]), np.array([0, 0, 0]), np.array([1, 1, 2])]
array([ 1, 10, 20])
However, as you can see from the previous link, indexing an array a with an array b directly means you are indexing only the first axis of a with the whole b array, which can lead to confusing output.
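The difference between the two forms can be made concrete with the arrays from the question. This is a small illustrative sketch; the names `first_axis` and `per_axis` are not from the original posts.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)
b = np.array([[0, 1, 2], [0, 0, 0], [1, 1, 2]])

# An index array applies to the first axis only: a[b] is a[b, :, :]
first_axis = a[b]
print(first_axis.shape)                      # (3, 3, 3, 3)
print(np.array_equal(first_axis, a[b, :, :]))

# A tuple supplies one index array per axis: a[b[0], b[1], b[2]]
per_axis = a[tuple(b)]
print(per_axis)                              # [ 1 10 20]
```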
I want to generate a cyclic sequence of numbers like [A B C A B C] with arbitrary length N. I tried:
import numpy as np
def cyclic(N):
    x = np.array([1.0,2.0,3.0]) # The main sequence
    y = np.tile(x,N//3) # Repeats the sequence N//3 times
    return y
but the problem with my code is that if I enter any integer that isn't divisible by three, the result has a smaller length than the N I expected. I know this is a very newbie question, but I really got stuck.
You can just use numpy.resize
x = np.array([1.0, 2.0, 3.0])
y = np.resize(x, 13)
y
Out[332]: array([ 1., 2., 3., 1., 2., 3., 1., 2., 3., 1., 2., 3., 1.])
WARNING: This answer does not extend to 2D, as resize flattens the array before repeating it.
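A short sketch of what that warning means in practice, plus one possible way to cycle whole rows instead. The row-cycling line uses `np.take` with modulo indices, which is an illustration chosen here rather than something the answer proposes; `x2d`, `flat_cycle` and `row_cycle` are made-up names.

```python
import numpy as np

x2d = np.array([[1, 2],
                [3, 4]])

# np.resize treats the input as flat: 1, 2, 3, 4, 1, 2
flat_cycle = np.resize(x2d, 6)

# Cycling whole rows instead: pick row indices modulo the number of rows
N = 5
row_cycle = np.take(x2d, np.arange(N) % len(x2d), axis=0)

print(flat_cycle)
print(row_cycle)
```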
Approach #1 : Here's one approach to handle generic sequences, using modulus to generate the cyclic indices -
def cyclic_seq(x, N):
    return np.take(x, np.mod(np.arange(N),len(x)))
Approach #2 : For performance, here's another method that tiles the sequence enough times to cover N elements (the ceiling of N/len(x)) and then slices out the first N elements -
def cyclic_seq_v2(x, N):
    return np.tile(x,(N+len(x)-1)//len(x))[:N]
Sample runs -
In [81]: cyclic_seq([6,9,2,1,7],14)
Out[81]: array([6, 9, 2, 1, 7, 6, 9, 2, 1, 7, 6, 9, 2, 1])
In [82]: cyclic_seq_v2([6,9,2,1,7],14)
Out[82]: array([6, 9, 2, 1, 7, 6, 9, 2, 1, 7, 6, 9, 2, 1])
Runtime test
In [327]: x = np.random.randint(0,9,(3))
In [328]: %timeit np.resize(x, 10000) # @Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),10000)) # @Chris soln
...: %timeit cyclic_seq(x,10000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,10000) # Approach #2 from this post
...:
1000 loops, best of 3: 296 µs per loop
10000 loops, best of 3: 185 µs per loop
10000 loops, best of 3: 120 µs per loop
10000 loops, best of 3: 28.7 µs per loop
In [329]: x = np.random.randint(0,9,(30))
In [330]: %timeit np.resize(x, 10000) # @Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),10000)) # @Chris soln
...: %timeit cyclic_seq(x,10000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,10000) # Approach #2 from this post
...:
10000 loops, best of 3: 38.8 µs per loop
10000 loops, best of 3: 101 µs per loop
10000 loops, best of 3: 115 µs per loop
100000 loops, best of 3: 13.2 µs per loop
In [331]: %timeit np.resize(x, 100000) # @Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),100000)) # @Chris soln
...: %timeit cyclic_seq(x,100000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,100000) # Approach #2 from this post
...:
1000 loops, best of 3: 297 µs per loop
1000 loops, best of 3: 942 µs per loop
1000 loops, best of 3: 1.13 ms per loop
10000 loops, best of 3: 88.3 µs per loop
On performance, approach #2 seems to be working quite well.
First tile it past the required length (using math.ceil), then resize it down to N
import numpy as np
import math
def cyclic(N):
    x = np.array([1.0,2.0,3.0]) # The main sequence
    y = np.tile(x, math.ceil(N / 3.0))
    y = np.resize(y, N)
    return y
After taking Daniel Forsman's suggestion, it can be simplified as
import numpy as np
def cyclic(N):
    x = np.array([1.0,2.0,3.0]) # The main sequence
    y = np.resize(x, N)
    return y
because np.resize automatically tiles the response in 1D
You can use itertools.cycle, an infinite iterator, for this:
>>> import itertools
>>> it = itertools.cycle([1,2,3])
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
1
To get a sequence of a specific length (N), combine it with itertools.islice:
>>> list(itertools.islice(itertools.cycle([1,2,3]),11))
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
EDIT: as you can see in Divakar's benchmark, this approach is generally intermediate in terms of speed compared to other answers. I recommend this solution when you want an iterator returned rather than a list or numpy array.
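If a numpy array is wanted in the end, the iterator can be consumed with `np.fromiter` rather than going through a list. This is a minimal sketch; the dtype and the variable names are assumptions made for the example.

```python
import itertools
import numpy as np

N = 11
it = itertools.islice(itertools.cycle([1, 2, 3]), N)

# Passing count lets NumPy allocate the output once instead of growing it
y = np.fromiter(it, dtype=np.int64, count=N)
print(y)
```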
You can use itertools cycle for that.
In [3]: from itertools import cycle
In [4]: for x in cycle(['A','B','C']):
   ...:     print(x)
...:
C
A
B
C
A
B
C
A
B
C
A
B
C
A
B
C
A
B
Edit:
If you want to implement it without loops, you are going to need recursive functions. Solutions based on itertools cycle and the like are just hiding the loops behind the imported function.
In [5]: def repeater(arr, n):
   ...:     yield arr[0]
   ...:     yield arr[1]
   ...:     yield arr[2]
   ...:     if n == 0:
   ...:         return
   ...:     else:
   ...:         yield from repeater(arr, n-1)
...:
I need to count the number of zero elements in numpy arrays. I'm aware of the numpy.count_nonzero function, but there appears to be no analog for counting zero elements.
My arrays are not very large (typically less than 1E5 elements) but the operation is performed several millions of times.
Of course I could use len(arr) - np.count_nonzero(arr), but I wonder if there's a more efficient way to do it.
Here's a MWE of how I do it currently:
import numpy as np
import timeit
arrs = []
for _ in range(1000):
    arrs.append(np.random.randint(-5, 5, 10000))
def func1():
    for arr in arrs:
        zero_els = len(arr) - np.count_nonzero(arr)
print(timeit.timeit(func1, number=10))
A 2x faster approach would be to just use np.count_nonzero() but with the condition as needed.
In [3]: arr
Out[3]:
array([[1, 2, 0, 3],
[3, 9, 0, 4]])
In [4]: np.count_nonzero(arr==0)
Out[4]: 2
In [5]: def func_cnt():
   ...:     for arr in arrs:
   ...:         zero_els = np.count_nonzero(arr==0)
   ...:         # here, it counts the frequency of zeroes actually
You can also use np.where() but it's slower than np.count_nonzero()
In [6]: np.where( arr == 0)
Out[6]: (array([0, 1]), array([2, 2]))
In [7]: len(np.where( arr == 0)[0])
Out[7]: 2
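The three counting variants can be checked against each other on a small array. The array below is a made-up example with three zeros, chosen so that the count differs from the number of dimensions. `np.where` returns one index array per dimension, so the count has to come from the length of one of those arrays, not from the length of the tuple.

```python
import numpy as np

arr = np.array([[1, 0, 0, 3],
                [3, 9, 0, 4]])

n_direct = np.count_nonzero(arr == 0)
# arr.size counts all elements; len(arr) would only give the first dimension
n_complement = arr.size - np.count_nonzero(arr)
# One index array per dimension; count via the first of them
n_where = len(np.where(arr == 0)[0])

print(n_direct, n_complement, n_where)
```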
Efficiency: (in descending order)
In [8]: %timeit func_cnt()
10 loops, best of 3: 29.2 ms per loop
In [9]: %timeit func1()
10 loops, best of 3: 46.5 ms per loop
In [10]: %timeit func_where()
10 loops, best of 3: 61.2 ms per loop
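When all arrays have the same length, as in the MWE, another option is to stack them once and count along an axis in a single call. This is a sketch under that equal-length assumption and is not taken from the answer; whether it is faster overall depends on the cost of building the stacked array, so it should be timed on real data.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
arrs = [rng.integers(-5, 5, 10000) for _ in range(1000)]

# One (1000, 10000) array instead of 1000 separate arrays
stacked = np.stack(arrs)

# Zero count per original array, computed in a single call
zeros_per_arr = np.count_nonzero(stacked == 0, axis=1)

print(zeros_per_arr.shape)
```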
More speedups with accelerators
It is now possible to achieve a speed boost of more than 3 orders of magnitude with the help of JAX if you have access to accelerators (GPU/TPU). Another advantage of using JAX is that the NumPy code needs very little modification to make it JAX compatible. Below is a reproducible example:
In [1]: import jax.numpy as jnp
In [2]: from jax import jit
# set up inputs
In [3]: arrs = []
In [4]: for _ in range(1000):
   ...:     arrs.append(np.random.randint(-5, 5, 10000))
# JIT'd function that performs the counting task
In [5]: @jit
   ...: def func_cnt():
   ...:     for arr in arrs:
   ...:         zero_els = jnp.count_nonzero(arr==0)
# efficiency test
In [8]: %timeit func_cnt()
15.6 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I have two matrices with different dimensions that I would like to multiply using numpy's einsum:
C(24, 79) and D(1, 1, 24, 1). I want to obtain the matrix with the dimension (1, 1, 79, 1).
I have tried to multiply them in two ways:
tmp = np.einsum('px, klpj ->klxj', C, D)
tmp = np.einsum('xp, klpj ->klxj', C, D)
and I'm obtaining different results. Why? What is the correct way of multiplying these matrices?
Owing to the singleton dimensions that don't really result in sum-reduction, we can introduce matrix-multiplication with np.tensordot or np.dot to have two more approaches to solve it -
np.tensordot(C,D,axes=([0],[2])).swapaxes(0,2)
D.ravel().dot(C).reshape(1,1,C.shape[1],1)
Verify results -
In [26]: tmp = np.einsum('px, klpj ->klxj', C, D)
In [27]: out = np.tensordot(C,D,axes=([0],[2])).swapaxes(0,2)
In [28]: np.allclose(out, tmp)
Out[28]: True
In [29]: out = D.ravel().dot(C).reshape(1,1,C.shape[1],1)
In [30]: np.allclose(out, tmp)
Out[30]: True
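It may help to spell out what the subscripts mean. In `'px,klpj->klxj'` the first axis of C is labelled `p` and is summed against the `p` axis of D, so for the shapes given in the question this reduces to a plain matrix product with `C.T`. In `'xp,klpj->klxj'` the label `p` would instead refer to C's second axis (length 79), which does not line up with D's length-24 axis for those shapes. The sketch below uses random data, and `ref` is a name introduced here for illustration.

```python
import numpy as np

C = np.random.rand(24, 79)
D = np.random.rand(1, 1, 24, 1)

# 'px,klpj->klxj' means out[k, l, x, j] = sum over p of C[p, x] * D[k, l, p, j]
tmp = np.einsum('px,klpj->klxj', C, D)

# With k = l = 0 this is the plain matrix product C.T @ D[0, 0]
ref = (C.T @ D[0, 0])[np.newaxis, np.newaxis]

print(tmp.shape)
print(np.allclose(tmp, ref))
```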
Runtime test -
In [31]: %timeit np.einsum('px, klpj ->klxj', C, D)
100000 loops, best of 3: 5.84 µs per loop
In [32]: %timeit np.tensordot(C,D,axes=([0],[2])).swapaxes(0,2)
100000 loops, best of 3: 18.5 µs per loop
In [33]: %timeit D.ravel().dot(C).reshape(1,1,C.shape[1],1)
100000 loops, best of 3: 3.29 µs per loop
With bigger datasets, you would see noticeable benefits with matrix-multiplication -
In [36]: C = np.random.rand(240,790)
...: D = np.random.rand(1,1,240,1)
...:
In [37]: %timeit np.einsum('px, klpj ->klxj', C, D)
...: %timeit np.tensordot(C,D,axes=([0],[2])).swapaxes(0,2)
...: %timeit D.ravel().dot(C).reshape(1,1,C.shape[1],1)
...:
1000 loops, best of 3: 182 µs per loop
10000 loops, best of 3: 84.9 µs per loop
10000 loops, best of 3: 55.5 µs per loop