Input:
array length (Integer)
indexes (Set or List)
Output:
A boolean numpy array that has the value 1 at the given indexes and 0 everywhere else.
Example:
Input: array_length=10, indexes={2,5,6}
Output:
[0,0,1,0,0,1,1,0,0,0]
Here is my simple implementation:
import numpy
def indexes2booleanvec(size, indexes):
    v = numpy.zeros(size)
    for index in indexes:
        v[index] = 1.0
    return v
Is there a more elegant way to implement this?
One way is to avoid the loop by using fancy indexing:
In [7]: fill = np.zeros(array_length) # array_length = 10
In [8]: fill[indexes] = 1 # indexes = [2,5,6]
In [9]: fill
Out[9]: array([ 0., 0., 1., 0., 0., 1., 1., 0., 0., 0.])
Another way to do it (in one line), which returns a true boolean array:
np.isin(np.arange(array_length), indexes)
However, this is slower than Zero's solution.
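Both answers can be checked side by side in a small sketch (not from either answer). It assumes `dtype=bool` so the fancy-indexing version returns a true boolean array (the `np.zeros(...)` versions above return floats), and it converts the index set to a list first, since NumPy does not accept a Python `set` as a fancy index and the `np.isin` documentation warns that sets do not work as expected as its second argument.

```python
import numpy as np

array_length = 10
indexes = {2, 5, 6}          # a set, as in the question
idx = sorted(indexes)        # NumPy will not fancy-index with a set, so use a list

# Fancy indexing into a bool array gives a true boolean result.
v = np.zeros(array_length, dtype=bool)
v[idx] = True

# np.isin tests each position 0..array_length-1 for membership in idx.
w = np.isin(np.arange(array_length), idx)

print(v.astype(int))         # [0 0 1 0 0 1 1 0 0 0]
```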
Related
I have an array and a mask array with the same number of rows. Each row of the mask contains the column indices to set in the corresponding row of the array. How can I vectorize this instead of using a for loop?
Code like this:
a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])
# I'd like a vectorized way to do this (because the rows and cols are large):
a[0, mask[0]] = 1
a[1, mask[1]] = 1
This is what I want to obtain:
array([[0., 0., 1., 1.],
[1., 1., 0., 0.]])
==================================
The question has been answered by @mozway, but @AhmedAEK questioned the efficiency of the vectorized solution compared to the for-loop one. So I did an efficiency comparison:
import numpy as np
N = 5000
M = 10000
a = np.zeros((N, M))
# choose 3 distinct column indices per row (choice without replacement)
mask = np.random.rand(N, M).argpartition(3, axis=1)[:,:3]
def t1():
    for i in range(N):
        a[i, mask[i]] = 1
def t2():
    a[np.arange(a.shape[0])[:, None], mask] = 1
Then I timed both functions with %timeit in Jupyter (the timings were posted as a screenshot).
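Outside Jupyter, a similar comparison can be sketched with the standard-library `timeit` module. The sizes here (N=500, M=1000) are assumptions chosen so it runs quickly, each timed call also includes allocating a fresh `np.zeros` array, and the assertion checks that both versions mark exactly the same cells. Actual timings depend on the machine.

```python
import timeit
import numpy as np

N, M = 500, 1000   # assumption: much smaller than the original N=5000, M=10000
rng = np.random.default_rng(0)
# 3 distinct column indices per row (choice without replacement)
mask = rng.random((N, M)).argpartition(3, axis=1)[:, :3]

def t1(a):
    # one fancy-indexing assignment per row
    for i in range(N):
        a[i, mask[i]] = 1
    return a

def t2(a):
    # a single assignment; the (N, 1) row index broadcasts against the (N, 3) mask
    a[np.arange(a.shape[0])[:, None], mask] = 1
    return a

# Both approaches should produce identical arrays.
assert np.array_equal(t1(np.zeros((N, M))), t2(np.zeros((N, M))))

print("loop:      ", timeit.timeit(lambda: t1(np.zeros((N, M))), number=5))
print("vectorized:", timeit.timeit(lambda: t2(np.zeros((N, M))), number=5))
```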
You can use:
a[[[0],[1]], mask] = 1
Or, programmatically generating the rows slicer:
a[np.arange(a.shape[0])[:,None], mask] = 1
output:
array([[0., 0., 1., 1.],
[1., 1., 0., 0.]])
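NumPy also has a dedicated function for this pattern, `np.put_along_axis`, which pairs each row of the index array with the matching row of the target along the given axis. A minimal sketch on the example data (not part of the original answer):

```python
import numpy as np

a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])

# Row i of `mask` gives the column indices to set in row i of `a`.
# The scalar value 1 is broadcast to the shape of `mask`.
np.put_along_axis(a, mask, 1, axis=1)
print(a)
```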
This might not be possible, as the intermediate array would have variable-length rows.
What I am trying to accomplish is assigning a value to the elements of an array whose column index falls within the bounds given by my array of bounds (start inclusive, stop exclusive). As an example:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
__assign(array, bounds, 1)
After the assignment, the result should be:
array = [
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 1]
]
I have tried several variations of something like this, without success:
ind = np.arange(array.shape[0])
array[ind, bounds[ind][0]:bounds[ind][1]] = 1
I am trying to avoid loops as this function will be called a lot. Any ideas?
I'm by no means a Numpy expert, but from the different array indexing options I could find, this was the fastest solution I could figure out:
import numpy as np
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
for i, x in enumerate(bounds):
    cols = slice(x[0], x[1])
    array[i, cols] = 1
Here we iterate through the list of bounds and reference the columns using slices.
I also tried the approach below, which first constructs a list of column indices and a list of row indices, but it was much slower: 10+ seconds vs. 0.04 seconds on my laptop for a 10 000 x 10 000 array. I guess the slices make a huge difference.
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
cols = []
rows = []
for i, x in enumerate(bounds):
    cols += list(range(x[0], x[1]))
    rows += (x[1] - x[0]) * [i]
# print(cols) [1, 1, 2, 1, 2, 3]
# print(rows) [0, 1, 1, 2, 2, 2]
array[rows, cols] = 1
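The row and column index arrays can also be built without a Python loop, using `np.repeat` and a cumulative-sum offset trick. This is a sketch of that alternative, not something benchmarked in the answer; `lengths` is clipped at 0 as an assumption so that bounds with stop < start simply contribute no cells.

```python
import numpy as np

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))

starts, stops = bounds[:, 0], bounds[:, 1]
lengths = np.maximum(stops - starts, 0)        # cells per row; 0 if stop <= start

# Row index repeated once per selected cell: [0, 1, 1, 2, 2, 2]
rows = np.repeat(np.arange(len(bounds)), lengths)

# Position of each cell within its own row's run: [0, 0, 1, 0, 1, 2]
run_starts = np.cumsum(lengths) - lengths      # where each row's run begins
offsets = np.arange(lengths.sum()) - np.repeat(run_starts, lengths)

# Column index = that row's start + offset: [1, 1, 2, 1, 2, 3]
cols = np.repeat(starts, lengths) + offsets

array[rows, cols] = 1
```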
One of the issues with a purely NumPy approach is that there is no way to slice a NumPy array along an axis using bounds taken from another array. The expanded bounds therefore become a variable-length list of lists such as [[1],[1,2],[1,2,3]]. You can then slice np.eye with each pair of bounds and use np.sum over axis=0 to get the required output.
import numpy as np
bounds = np.array([[1,2], [1,3], [1,4]])
result = np.stack([np.sum(np.eye(4)[slice(*i)], axis=0) for i in bounds])
result
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
I tried various ways to slice np.eye(4)[start:stop] with NumPy arrays of starts and stops, but unfortunately you need an iteration to accomplish this.
EDIT: Another way to do this without an explicit for loop is the following (note that np.apply_along_axis still loops in Python internally, so it is not truly vectorized):
def f(b):
    o = np.sum(np.eye(4)[b[0]:b[1]], axis=0)
    return o
np.apply_along_axis(f, 1, bounds)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
EDIT: If you are looking for a very fast solution and can tolerate a single for loop, then, based on my simulations, the fastest approach among all answers in this thread is:
def h(bounds):
    # the number of columns is taken from the largest stop value (4 here)
    zz = np.zeros((len(bounds), bounds.max()))
    for z, b in zip(zz, bounds):
        z[b[0]:b[1]] = 1
    return zz
h(bounds)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
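If only the 0/1 result is needed (rather than literally slicing `np.eye`), one loop-free option is to compare a column-index range against each row's bounds with broadcasting. This is a different technique from the answers above, sketched on the example data; note that it materializes a full boolean mask of shape (rows, columns).

```python
import numpy as np

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))

cols = np.arange(array.shape[1])        # shape (4,)
starts = bounds[:, [0]]                 # shape (3, 1)
stops = bounds[:, [1]]                  # shape (3, 1)

# Broadcasting (4,) against (3, 1) gives a (3, 4) mask of start <= col < stop.
in_range = (cols >= starts) & (cols < stops)
array[in_range] = 1
```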
Using the numba.njit decorator:
import numpy as np
import numba
@numba.njit
def numba_assign_in_range(arr, bounds, val):
    for i in range(len(bounds)):
        s, e = bounds[i]
        arr[i, s:e] = val
    return arr
test_size = int(1e6) * 2
bounds = np.zeros((test_size, 2), dtype='int32')
bounds[:, 0] = 1
bounds[:, 1] = np.random.randint(0, 100, test_size)
a = np.zeros((test_size, 100))
With numba.njit:
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs
Without numba.njit:
CPU times: user 3.54 s, sys: 1.63 ms, total: 3.54 s
Wall time: 3.55 s
I know that in numpy I can compute the element-wise minimum of two vectors with
numpy.minimum(v1, v2)
What if I have a list of vectors of equal dimension, V = [v1, v2, v3, v4] (but a list, not an array)? Taking numpy.minimum(*V) doesn't work. What's the preferred thing to do instead?
*V works if V has only 2 arrays. np.minimum is a ufunc and takes 2 arguments.
As a ufunc it has a .reduce method, so it can be applied repeatedly to a list of inputs.
In [321]: np.minimum.reduce([np.arange(3), np.arange(2,-1,-1), np.ones((3,))])
Out[321]: array([ 0., 1., 0.])
I suspect the np.min approach is faster, but that could depend on the array and list size.
In [323]: np.array([np.arange(3), np.arange(2,-1,-1), np.ones((3,))]).min(axis=0)
Out[323]: array([ 0., 1., 0.])
The ufunc also has an accumulate method, which shows the result of each stage of the reduction. Here it's not too interesting, but I could tweak the inputs to change that.
In [325]: np.minimum.accumulate([np.arange(3), np.arange(2,-1,-1), np.ones((3,))])
Out[325]:
array([[ 0., 1., 2.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
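For reference, the same reduction can also be written with the standard library's `functools.reduce`, applying `np.minimum` pairwise from left to right. This sketch, using the inputs above, checks that it agrees with `np.minimum.reduce` and with `np.amin(..., axis=0)`:

```python
import functools
import numpy as np

V = [np.arange(3), np.arange(2, -1, -1), np.ones((3,))]

r_reduce = np.minimum.reduce(V)                 # ufunc reduction over axis 0
r_functools = functools.reduce(np.minimum, V)   # minimum(minimum(v1, v2), v3)
r_amin = np.amin(V, axis=0)                     # converts V to a 2D array first
```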
Convert to NumPy array and perform ndarray.min along the first axis -
np.asarray(V).min(0)
Or simply use np.amin, which under the hood converts the input to an array before finding the minimum along that axis -
np.amin(V,axis=0)
Sample run -
In [52]: v1 = [2,5]
In [53]: v2 = [4,5]
In [54]: v3 = [4,4]
In [55]: v4 = [1,4]
In [56]: V = [v1, v2, v3, v4]
In [57]: np.asarray(V).min(0)
Out[57]: array([1, 4])
In [58]: np.amin(V,axis=0)
Out[58]: array([1, 4])
If you need the final output as a list, call .tolist() on the result.
I have these vectors:
a = [1,2,3,4]
b = [1,2,3,5]
and I would like to end up with this:
A = [ [1,0,0,0,0]
[0,1,0,0,0]
[0,0,1,0,0]
[0,0,0,1,0] ]
B = [ [1,0,0,0,0]
[0,1,0,0,0]
[0,0,1,0,0]
[0,0,0,0,1] ]
I have been using np.reshape this way:
A = np.reshape(a,(1,4,1))
B = np.reshape(b,(1,4,1))
It only partially does the job, as I get the following result:
A = [[1]
[2]
[3]
[4]]
B = [[1]
[2]
[3]
[5]]
Ideally I would like something like this:
A = np.reshape(a,(1,4,(1,5)))
but from reading the docs, this is not possible.
Thanks in advance for your help
Alternatively, NumPy can assign values to multiple (row, column) index pairs in one go, for example:
In [1]: import numpy as np
In [2]: b = [1,2,3,5]
In [3]: zero = np.zeros([4,5])
In [4]: brow, bcol = range(len(b)), np.array(b) -1 # shift 1-based values to 0-based column indices
In [5]: zero[brow, bcol] = 1
In [6]: zero
Out[6]:
array([[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1.]])
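Since each row is a one-hot vector, another option (not from the original answers) is to select rows of an identity matrix built with `np.eye`. The sketch assumes, as in the question, that the values are 1-based and that 5 columns are enough:

```python
import numpy as np

a = [1, 2, 3, 4]
b = [1, 2, 3, 5]
n_cols = 5      # assumption: values are 1-based and never exceed 5

# Row k of the identity matrix is the one-hot vector for column k.
A = np.eye(n_cols)[np.array(a) - 1]
B = np.eye(n_cols)[np.array(b) - 1]
```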
What you're trying to do is not actually a reshape, as you alter the structure of the data.
Make a new array with the shape you want:
import numpy as np
myshape = (4, 5)  # 4 rows, 5 columns, as in the question
A = np.zeros(myshape)
B = np.zeros(myshape)
and then index those arrays:
n = 0
for i_a, i_b in zip(a, b):
    A[n, i_a - 1] = 1
    B[n, i_b - 1] = 1
    n += 1
The - 1 in i_a - 1 and i_b - 1 is only there so that the value 1 maps to column 0. This also only works if a and b have the same length; use two separate loops if they do not.
There might be a more elegant solution but this should get the job done :)
I had a weird behaviour trying to change the value of an element of a numpy array today, and I would like to understand why it didn't work. I have two arrays (a and b), and I want to change the values of b where a > 0.
import numpy as np
a = np.array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
b = np.array([[ 5., 0., 0.],
[ 0., 5., 0.],
[ 0., 0., 5.]])
mask = a > 0
print(b[mask][0])
=> 5.0
b[mask][0] = 10
print(b[mask][0])
=> 5.0
Could someone please explain why the assignment b[mask][0] didn't change my value 5.0?
b[mask] uses boolean-mask indexing, which returns a copy of the selected elements rather than a view into b. So b[mask][0] = 10 is effectively:
c = b[mask]
c[0] = 10
The data elements of c are not (in general) a contiguous block of the elements of b.
b[mask] = 10
b[mask] = [10, 11, 12]
You can assign values to b[mask] when it is the only thing on the left. You need to change all the masked elements.
If you need to change one or two, then first change the mask so it selects only those elements.
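A minimal sketch of narrowing the mask first, using the arrays from the question: the hypothetical `first` mask keeps only the first `True` position of `mask`, and a single indexing operation on the left then modifies `b` in place.

```python
import numpy as np

a = np.eye(3)
b = 5 * np.eye(3)
mask = a > 0

# Hypothetical narrower mask: keep only the first True position of `mask`.
first = np.zeros_like(mask)                     # all-False boolean array
first.flat[np.flatnonzero(mask)[0]] = True

b[first] = 10       # single indexing on the left, so b itself is modified
```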
In general
b[...][...] = ...
is not good practice. Sometimes it works (if the first indexing is a slice that produces a view), but you shouldn't count on it. It takes a while to fully grasp the difference between a view and a copy.
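The view/copy difference can be checked directly with `np.shares_memory`. In this sketch (arrays modeled on the question), a basic slice shares memory with `b`, so chained assignment through it writes into `b`, while the boolean-indexed result does not:

```python
import numpy as np

b = 5 * np.eye(3)
mask = b > 0

view = b[0:2]       # basic slice: a view onto b's data
copy = b[mask]      # boolean index: a new array
print(np.shares_memory(b, view))   # True
print(np.shares_memory(b, copy))   # False

b[0:2][0, 0] = 10   # first index is a slice, so this writes into b
b[mask][0] = 99     # first index is a mask, so only a temporary copy changes
print(b[0, 0])      # 10.0
```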
The [] get translated by the Python interpreter into calls to __getitem__ or __setitem__. The following pairs are equivalent:
c = b[mask]
c = b.__getitem__(mask)
b[mask] = 10
b.__setitem__(mask, 10)
b[mask][0] = 10
b.__getitem__(mask).__setitem__(0, 10)
b[mask][0] = 10 is two operations: a get followed by a set. The set operates on the result of the get, so it modifies b only if the result of the get is a view.