Select array elements with variable index bounds in numpy - python

This might not be possible, as the intermediate array would have variable-length rows.
What I am trying to accomplish is assigning a value to the elements of an array whose index is delimited by my array of bounds. As an example:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
__assign(array, bounds, 1)
after the assignment should result in
array = [
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 1]
]
I have tried something like this in various iterations without success:
ind = np.arange(array.shape[0])
array[ind, bounds[ind][0]:bounds[ind][1]] = 1
I am trying to avoid loops as this function will be called a lot. Any ideas?

I'm by no means a Numpy expert, but from the different array indexing options I could find, this was the fastest solution I could figure out:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
for i, x in enumerate(bounds):
    cols = slice(x[0], x[1])
    array[i, cols] = 1
Here we iterate through the list of bounds and reference the columns using slices.
I also tried first constructing a list of column indices and a list of row indices, as below, but it was far slower: more than 10 seconds versus 0.04 seconds on my laptop for a 10,000 x 10,000 array. I guess the slices make a huge difference.
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
cols = []
rows = []
for i, x in enumerate(bounds):
    cols += list(range(x[0], x[1]))
    rows += (x[1] - x[0]) * [i]
# print(cols) [1, 1, 2, 1, 2, 3]
# print(rows) [0, 1, 1, 2, 2, 2]
array[rows, cols] = 1

One of the issues with a purely NumPy method to solve this is that there is no method to 'slice' a NumPy array along an axis using bounds taken from another array. So the resulting expanded bounds end up as a variable-length list of lists such as [[1],[1,2],[1,2,3]]. You can then use np.eye and np.sum over axis=0 to get the required output.
bounds = np.array([[1,2], [1,3], [1,4]])
result = np.stack([np.sum(np.eye(4)[slice(*i)], axis=0) for i in bounds])
print(result)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
I tried various ways of being able to slice the np.eye(4) from [start:stop] over a NumPy array of starts and stops but sadly you will need an iteration to accomplish this.
EDIT: Another way to do this without writing an explicit loop (np.apply_along_axis still iterates in Python internally, so it is not truly vectorized) is -
def f(b):
    o = np.sum(np.eye(4)[b[0]:b[1]], axis=0)
    return o
np.apply_along_axis(f, 1, bounds)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
EDIT: If you are looking for a superfast solution but can tolerate a single for loop then the fastest approach based on my simulations among all answers on this thread is -
def h(bounds):
    zz = np.zeros((len(bounds), bounds.max()))
    for z,b in zip(zz,bounds):
        z[b[0]:b[1]]=1
    return zz
h(bounds)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
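For comparison, here is a loop-free sketch that does not come from the answers above. Comparing a row of column numbers against each row's start and stop via broadcasting yields a boolean mask, which can then drive the assignment. It assumes each row of bounds is a half-open [start, stop) pair, as in the question. The names cols, start, stop and mask are illustrative.

```python
import numpy as np

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))

# Column numbers 0..n_cols-1, shape (n_cols,)
cols = np.arange(array.shape[1])
# Start and stop of each row as column vectors, shape (n_rows, 1)
start = bounds[:, 0, None]
stop = bounds[:, 1, None]

# Broadcasting (n_rows, 1) against (n_cols,) gives an (n_rows, n_cols) mask
mask = (cols >= start) & (cols < stop)
array[mask] = 1
print(array)
```

The mask needs a temporary array of the same shape as the target, so for very wide arrays with narrow ranges the slice-based loop may still be competitive; this is worth timing on your own data.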

Using numba.njit decorator
import numpy as np
import numba
@numba.njit
def numba_assign_in_range(arr, bounds, val):
    for i in range(len(bounds)):
        s, e = bounds[i]
        arr[i, s:e] = val
    return arr
test_size = int(1e6) * 2
bounds = np.zeros((test_size, 2), dtype='int32')
bounds[:, 0] = 1
bounds[:, 1] = np.random.randint(0, 100, test_size)
a = np.zeros((test_size, 100))
with numba.njit
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs
without numba.njit
CPU times: user 3.54 s, sys: 1.63 ms, total: 3.54 s
Wall time: 3.55 s

Related

Vectorization: Each row of the mask contains the column indices to mask for the corresponding row of the array

I have an array and a mask array. They have the same rows. Each row of the mask contains the indices to mask the array for the corresponding row. How to do the vectorization instead of using for loop?
Codes like this:
a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])
# I'd like a vectorized way to do this (because the rows and cols are large):
a[0, mask[0]] = 1
a[1, mask[1]] = 1
This is what I want to obtain:
array([[0., 0., 1., 1.],
[1., 1., 0., 0.]])
==================================
The question has been answered by @mozway, but @AhmedAEK questioned the efficiency of the for-loop solution compared with the vectorized one. So I did an efficiency comparison:
N = 5000
M = 10000
a = np.zeros((N, M))
# choice without replacement
mask = np.random.rand(N, M).argpartition(3, axis=1)[:,:3]
def t1():
    for i in range(N):
        a[i, mask[i]] = 1
def t2():
    a[np.arange(a.shape[0])[:, None], mask] = 1
Then I timed both functions with %timeit in Jupyter; the results were posted as a screenshot.
You can use:
a[[[0],[1]], mask] = 1
Or, programmatically generating the rows slicer:
a[np.arange(a.shape[0])[:,None], mask] = 1
output:
array([[0., 0., 1., 1.],
[1., 1., 0., 0.]])
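To see why the second form works: the row index has shape (n, 1) and mask has shape (n, k), so NumPy broadcasts them to a common (n, k) shape and pairs every column index in a row of mask with that row's number. The sketch below checks this on the small example. It also shows np.put_along_axis, which is not mentioned in the answers above but should perform the same assignment; treat that part as an addition.

```python
import numpy as np

a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])

# Row numbers as a column vector: shape (2, 1)
rows = np.arange(a.shape[0])[:, None]
# Broadcasting pairs each row number with every column index in that row
paired_rows, paired_cols = np.broadcast_arrays(rows, mask)
print(paired_rows)  # row number repeated across each row of mask
a[rows, mask] = 1

# Alternative (an addition, not from the answers): put_along_axis
b = np.zeros((2, 4))
np.put_along_axis(b, mask, 1, axis=1)
print(np.array_equal(a, b))
```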

frequency of unique values for 2d numpy array

I have a 2-dimensional numpy array of following format:
How can I print the frequency of the unique elements in this 2D numpy array, so that it returns count([1. 0.]) = 1 and count([0. 1.]) = 1? I know how to do this using loops, but is there a better, more Pythonic way to do it?
You can use numpy.unique() with axis=0 and return_counts=True. It returns a tuple containing the unique rows and the counts for those rows.
np.unique(arr, return_counts=True, axis=0)
OUTPUT:
(array([[0, 1],
[1, 0]]), array([1, 1], dtype=int64))
You can use collections.Counter; it gives you a dictionary-like object with the rows (as tuples) as keys and the number of occurrences as values.
import collections
import numpy as np
y = np.array([[1., 0.], [0., 1.], [0., 1.]])
counter = collections.Counter(map(tuple, y))
print(counter[0., 1.]) # 2
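The two answers can be combined: a minimal sketch, assuming the same three-row array y as in the Counter example, that turns the output of np.unique into a dictionary keyed by row. The name freq is illustrative.

```python
import numpy as np

y = np.array([[1., 0.], [0., 1.], [0., 1.]])

# Unique rows (axis=0) and how often each one occurs
rows, counts = np.unique(y, axis=0, return_counts=True)

# Map each unique row (as a tuple) to its count
freq = {tuple(row.tolist()): int(n) for row, n in zip(rows, counts)}
print(freq)
```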

How to reshape a 1-d array into an array of shape (1,4,5)?

I have these vectors :
a = [1,2,3,4]
b = [1,2,3,5]
and I would like to have this at the end:
A = [ [1,0,0,0,0]
[0,1,0,0,0]
[0,0,1,0,0]
[0,0,0,1,0] ]
B = [ [1,0,0,0,0]
[0,1,0,0,0]
[0,0,1,0,0]
[0,0,0,0,1] ]
I have been using np.reshape from python this way:
A = np.reshape(a,(1,4,1))
B = np.reshape(b,(1,4,1))
And it does just partially the job as I have the following result:
A = [[1]
[2]
[3]
[4]]
B = [[1]
[2]
[3]
[5]]
Ideally I would like something like this:
A = np.reshape(a,(1,4,(1,5)))
but when reading the docs, this is not possible.
Thanks in advance for your help
Alternatively, numpy can assign a value to multiple row/column indexes in one go, for example:
In [1]: import numpy as np
In [2]: b = [1,2,3,5]
...:
...:
In [3]: zero = np.zeros([4,5])
In [4]: brow, bcol = range(len(b)), np.array(b) -1 # logical transform
In [5]: zero[brow, bcol] = 1
In [6]: zero
Out[6]:
array([[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1.]])
What you're trying to do is not actually a reshape, as you alter the structure of the data.
Make a new array with the shape you want:
A = np.zeros(myshape)
B = np.zeros(myshape)
and then index those arrays
n = 0
for i_a, i_b in zip(a, b):
    A[n, i_a - 1] = 1
    B[n, i_b - 1] = 1
    n += 1
The i_a/i_b - 1 in the assignment is only there to make 1 index the 0th element. This also only works if a and b have the same length. Make this two loops if they are not the same length.
There might be a more elegant solution but this should get the job done :)
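One candidate for a more compact form, offered as a sketch rather than as either answer's method, is to index the rows of an identity matrix. It assumes the values in a and b are 1-based positions and that the output width is 5, as in the question's example. The name n_classes is illustrative.

```python
import numpy as np

a = [1, 2, 3, 4]
b = [1, 2, 3, 5]
n_classes = 5  # assumed output width, taken from the question's example

# Row i of the identity matrix is the one-hot vector for position i;
# subtract 1 because the input values are 1-based
A = np.eye(n_classes)[np.array(a) - 1]
B = np.eye(n_classes)[np.array(b) - 1]
print(B)

# Add a leading axis if the (1, 4, 5) shape from the title is needed
A3 = A[None]
print(A3.shape)
```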

Numpy indexing 3-dimensional array into 2-dimensional array

I have a three-dimensional array of the following structure:
x = np.array([[[1,2],
[3,4]],
[[5,6],
[7,8]]], dtype=np.double)
Additionally, I have an index array
idx = np.array([[0,1],[1,3]], dtype=int)
Each row of idx defines the row/column indices for the placement of each sub-array along the 0 axis in x into a two-dimensional array K that is initialized as
K = np.zeros((4,4), dtype=np.double)
I would like to use fancy indexing/broadcasting to perform the indexing without a for loop. I currently do it this way:
for i, id in enumerate(idx):
    idx_grid = np.ix_(id,id)
    K[idx_grid] += x[i]
Such that the result is:
>>> K = array([[ 1., 2., 0., 0.],
[ 3., 9., 0., 6.],
[ 0., 0., 0., 0.],
[ 0., 7., 0., 8.]])
Is this possible to do with fancy indexing?
Here's one alternative way. With x, idx and K defined as in your question:
indices = (idx[:,None] + K.shape[1]*idx).ravel('f')
np.add.at(K.ravel(), indices, x.ravel())
Then we have:
>>> K
array([[ 1., 2., 0., 0.],
[ 3., 9., 0., 6.],
[ 0., 0., 0., 0.],
[ 0., 7., 0., 8.]])
To perform unbuffered inplace addition on NumPy arrays you need to use np.add.at (to avoid using += in a for loop).
However, it's slightly problematic to pass a list of 2D index arrays, and corresponding arrays to add at these indices, to np.add.at. This is because the function interprets these lists of arrays as higher-dimensional arrays and IndexErrors are raised.
It's much simpler to pass in 1D arrays. You can temporarily ravel K and x to give you a 1D array of zeros and a 1D array of values to add to those zeros. The only fiddly part is constructing a corresponding 1D array of indices from idx at which to add the values. This can be done via broadcasting with arithmetical operators and then ravelling, as shown above.
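A related variant, sketched here as an assumption rather than as this answer's method: instead of flattening, the row and column indices can be given to np.add.at as a tuple of two arrays that broadcast against each other, much as np.ix_ does for each block. The names rows and cols are illustrative.

```python
import numpy as np

x = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]], dtype=np.double)
idx = np.array([[0, 1], [1, 3]])
K = np.zeros((4, 4))

# For block i: row index idx[i, r], column index idx[i, c]
rows = idx[:, :, None]  # shape (n, 2, 1)
cols = idx[:, None, :]  # shape (n, 1, 2)

# Unbuffered add: repeated (row, col) pairs such as (1, 1) accumulate
np.add.at(K, (rows, cols), x)
print(K)
```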
The intended operation is one of an accumulation of values from x into places indexed by idx. You could think of those idx places as bins of a histogram data and the x values as the weights that you need to accumulate for those bins. Now, to perform such a binning operation, np.bincount could be used. Here's one such implementation with it -
# Get size info of expected output
N = idx.max()+1
# Extend idx to cover two axes, equivalent to `np.ix_`
idx1 = idx[:,None,:] + N*idx[:,:,None]
# "Accumulate" values from x into places indexed by idx1
K = np.bincount(idx1.ravel(),x.ravel()).reshape(N,N)
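To check the bincount idea on the question's small example, the sketch below compares it with the original loop. Two things here are additions: N is fixed at 4 (the size of K in the question) instead of being derived from idx, and minlength is passed so that the result still has N*N bins when the largest flat index never occurs.

```python
import numpy as np

x = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]], dtype=np.double)
idx = np.array([[0, 1], [1, 3]])
N = 4  # size of K, taken from the question rather than from idx.max() + 1

# Reference result from the original loop
K_loop = np.zeros((N, N))
for i, row in enumerate(idx):
    K_loop[np.ix_(row, row)] += x[i]

# Flat index row * N + col for every element of every block
flat = idx[:, None, :] + N * idx[:, :, None]
# minlength (an addition here) pads the result to N*N bins if needed
K_bin = np.bincount(flat.ravel(), weights=x.ravel(), minlength=N * N).reshape(N, N)
print(np.allclose(K_loop, K_bin))
```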
Runtime tests -
1) Create inputs:
In [361]: # Create x and idx, with idx having unique elements in each row of idx,
...: # as otherwise the intended operation is not clear
...:
...: nrows = 100
...: max_idx = 100
...: ncols_idx = 2
...:
...: x = np.random.rand(nrows,ncols_idx,ncols_idx)
...: idx = np.random.randint(0,max_idx,(nrows,ncols_idx))
...:
...: valid_mask = ~np.any(np.diff(np.sort(idx,axis=1),axis=1)==0,axis=1)
...:
...: x = x[valid_mask]
...: idx = idx[valid_mask]
...:
2) Define functions:
In [362]: # Define the original and proposed (bincount based) approaches
...:
...: def org_approach(x,idx):
...:     N = idx.max()+1
...:     K = np.zeros((N,N), dtype=np.double)
...:     for i, id in enumerate(idx):
...:         idx_grid = np.ix_(id,id)
...:         K[idx_grid] += x[i]
...:     return K
...:
...:
...: def bincount_approach(x,idx):
...:     N = idx.max()+1
...:     idx1 = idx[:,None,:] + N*idx[:,:,None]
...:     return np.bincount(idx1.ravel(),x.ravel()).reshape(N,N)
...:
3) Finally time them:
In [363]: %timeit org_approach(x,idx)
100 loops, best of 3: 2.13 ms per loop
In [364]: %timeit bincount_approach(x,idx)
10000 loops, best of 3: 32 µs per loop
I do not think it is efficiently possible, since you have += in the loop. This means, you would have to "blow up" your array idx by one dimension and reduce it again by utilizing np.sum(x[...], axis=...).
A minor optimization would be:
import numpy as np
xx = np.array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]], dtype=np.double)
idx = np.array([[0, 1], [1, 3]], dtype=int)
K0, K1 = np.zeros((4, 4), dtype=np.double), np.zeros((4, 4), dtype=np.double)
for k, i in enumerate(idx):
    idx_grid = np.ix_(i, i)
    K0[idx_grid] += xx[k]
for x, i in zip(xx, idx):
    K1[np.ix_(i, i)] += x
print("K1 == K0:", np.allclose(K1, K0)) # prints: K1 == K0: True
PS: Do not use id as a variable name, since it shadows the Python built-in function id().

(Numpy) Index list to boolean array

Input:
array length (Integer)
indexes (Set or List)
Output:
A boolean numpy array that has the value 1 at the given indexes and 0 elsewhere.
Example:
Input: array_length=10, indexes={2,5,6}
Output:
[0,0,1,0,0,1,1,0,0,0]
Here is my simple implementation:
def indexes2booleanvec(size, indexes):
    v = numpy.zeros(size)
    for index in indexes:
        v[index] = 1.0
    return v
Is there a more elegant way to implement this?
One way is to avoid the loop:
In [7]: fill = np.zeros(array_length) # array_length = 10
In [8]: fill[indexes] = 1 # indexes = [2,5,6]
In [9]: fill
Out[9]: array([ 0., 0., 1., 0., 0., 1., 1., 0., 0., 0.])
Another way to do it (in one line):
np.isin(np.arange(array_length), indexes)
However this is slower than Zero's solution.
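A small sketch of the first approach with two adjustments that are assumptions on my part: the array is created with dtype=bool so the result is a true boolean array, and the indexes are converted with list() because the question allows a set, which cannot be used directly for fancy indexing.

```python
import numpy as np

array_length = 10
indexes = {2, 5, 6}  # a set, as in the question's example

# dtype=bool gives a true boolean array; a set must be converted to a list
# before it can be used for fancy indexing
mask = np.zeros(array_length, dtype=bool)
mask[list(indexes)] = True
print(mask.astype(int))
```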
