frequency of unique values for 2d numpy array - python

I have a 2-dimensional numpy array whose rows look like [1. 0.] and [0. 1.].
How do I get the frequency of the unique rows in this 2D numpy array, so that it returns count([1. 0.]) = 1 and count([0. 1.]) = 1? I know how to do this using loops, but is there a better, more pythonic way?

You can use numpy.unique() with axis=0 and return_counts=True. It returns a tuple with the unique rows and the count for each of them.
np.unique(arr, return_counts=True, axis=0)
OUTPUT:
(array([[0, 1],
        [1, 0]]), array([1, 1], dtype=int64))
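A complete, runnable version (a minimal sketch, assuming the input array from the question consists of the rows [1., 0.] and [0., 1.]):
import numpy as np

arr = np.array([[1., 0.], [0., 1.]])
uniques, counts = np.unique(arr, return_counts=True, axis=0)
for row, count in zip(uniques, counts):
    print(row, count)
# [0. 1.] 1
# [1. 0.] 1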

You can use collections.Counter. It gives you a dictionary-like object with the rows (converted to tuples, since arrays are not hashable) as keys and the number of occurrences as values:
import collections
import numpy as np

y = np.array([[1., 0.], [0., 1.], [0., 1.]])
counter = collections.Counter(map(tuple, y))
print(counter[0., 1.])  # 2
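Printing the whole counter shows every row alongside its count:
print(counter)
# Counter({(0.0, 1.0): 2, (1.0, 0.0): 1})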

Related

Vectorization: Each row of the mask contains the column indices to mask for the corresponding row of the array

I have an array and a mask array with the same number of rows. Each row of the mask contains the column indices to set for the corresponding row of the array. How can I vectorize this instead of using a for loop?
Code like this:
a = np.zeros((2, 4))
mask = np.array([[2, 3], [0, 1]])
# I'd like a vectorized way to do this (because the rows and cols are large):
a[0, mask[0]] = 1
a[1, mask[1]] = 1
This is what I want to obtain:
array([[0., 0., 1., 1.],
       [1., 1., 0., 0.]])
==================================
The question has been answered by @mozway, but @AhmedAEK questioned the relative efficiency of the for-loop solution and the vectorized one. So I did an efficiency comparison:
import numpy as np

N = 5000
M = 10000
a = np.zeros((N, M))
# choice without replacement
mask = np.random.rand(N, M).argpartition(3, axis=1)[:, :3]

def t1():
    for i in range(N):
        a[i, mask[i]] = 1

def t2():
    a[np.arange(a.shape[0])[:, None], mask] = 1
Then I used %timeit in Jupyter to compare the two (the timing screenshot is not reproduced here).
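For reference, the same comparison can be run outside Jupyter with the timeit module, continuing the snippet above (a sketch; the actual numbers depend on the machine):
from timeit import timeit

print('for-loop:  ', timeit(t1, number=10))
print('vectorized:', timeit(t2, number=10))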
You can use:
a[[[0], [1]], mask] = 1
Or, programmatically generating the row indexer:
a[np.arange(a.shape[0])[:, None], mask] = 1
Output:
array([[0., 0., 1., 1.],
       [1., 1., 0., 0.]])

How to replace multiple pixels with corresponding values in numpy?

I have two numpy arrays and one list, like this:
import numpy as np
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
y = np.ones((3,3))
idx = [[1,1],[2,2]]
x
>>> array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
y
>>> array([[1., 1., 1.],
           [1., 1., 1.],
           [1., 1., 1.]])
I would like to get a z array like this:
z
>>> array([[1., 1., 1.],
           [1., 5., 1.],
           [1., 1., 9.]])
As in the z array above, I would like to replace the elements of y at the indices given in idx with the elements of x at those same indices, without using a for loop.
With a for loop, I can do it like this:
for i in range(len(idx)):
    y[idx[i][0]][idx[i][1]] = x[idx[i][0]][idx[i][1]]
This is a simple example, but my actual arrays are much bigger, so a for loop takes too much time. How can I do this?
Don't use a for loop; it defeats one of the main reasons to use NumPy: vectorized operations.
import numpy as np
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
y = np.ones((3,3))
# Rows and columns to filter.
idx = np.array([[1,1],[2,2]]).T
# Copying desired elements of x to y.
y[tuple(idx)] = x[tuple(idx)]
The transpose of the index array is taken because NumPy expects a tuple of row indices and column indices, (row_idx, col_idx), which in this case is ([1, 2], [1, 2]).
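For reference, this is what the index tuple looks like after the transpose:
print(tuple(idx))
# (array([1, 2]), array([1, 2]))  -> the elements at positions (1, 1) and (2, 2)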
Let numpy do the looping for you:
idx = np.array([[1,1],[2,2]])
y[idx[:,0],idx[:,1]] = x[idx[:,0],idx[:,1]]
print(y)
Output:
[[1. 1. 1.]
 [1. 5. 1.]
 [1. 1. 9.]]

Select array elements with variable index bounds in numpy

This might not be possible, as the intermediate array would have variable-length rows.
What I am trying to accomplish is assigning a value to the elements of an array whose indices fall between the bounds given in my array of bounds. As an example:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
__assign(array, bounds, 1)
after the assignment should result in
array = [
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 1]
]
I have tried something like this in various iterations without success:
ind = np.arange(array.shape[0])
array[ind, bounds[ind][0]:bounds[ind][1]] = 1
I am trying to avoid loops as this function will be called a lot. Any ideas?
I'm by no means a Numpy expert, but from the different array indexing options I could find, this was the fastest solution I could figure out:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
for i, x in enumerate(bounds):
    cols = slice(x[0], x[1])
    array[i, cols] = 1
Here we iterate through the list of bounds and reference the columns using slices.
I tried the approach below of first constructing a list of column indices and a list of row indices, but it was way slower: 10+ seconds versus 0.04 seconds on my laptop for a 10,000 x 10,000 array. I guess the slices make a huge difference.
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
cols = []
rows = []
for i, x in enumerate(bounds):
    cols += list(range(x[0], x[1]))
    rows += (x[1] - x[0]) * [i]
# print(cols) [1, 1, 2, 1, 2, 3]
# print(rows) [0, 1, 1, 2, 2, 2]
array[rows, cols] = 1
One of the issues with a purely NumPy method for this is that there is no way to 'slice' a NumPy array using bounds taken from another array along an axis. So the expanded bounds end up as a variable-length list of lists such as [[1], [1, 2], [1, 2, 3]]. Then you can use np.eye and np.sum over axis=0 to get the required output.
bounds = np.array([[1,2], [1,3], [1,4]])
result = np.stack([np.sum(np.eye(4)[slice(*i)], axis=0) for i in bounds])
print(result)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
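To see why this works, take the middle row of bounds, [1, 3]: np.eye(4)[1:3] picks out rows 1 and 2 of the identity matrix, and summing them over axis=0 gives the indicator row for columns 1 and 2:
np.eye(4)[1:3]
# array([[0., 1., 0., 0.],
#        [0., 0., 1., 0.]])
np.sum(np.eye(4)[1:3], axis=0)
# array([0., 1., 1., 0.])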
I tried various ways of being able to slice the np.eye(4) from [start:stop] over a NumPy array of starts and stops but sadly you will need an iteration to accomplish this.
EDIT: Another way to do this without an explicit Python loop is -
def f(b):
    o = np.sum(np.eye(4)[b[0]:b[1]], axis=0)
    return o
np.apply_along_axis(f, 1, bounds)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
EDIT: If you are looking for a superfast solution but can tolerate a single for loop then the fastest approach based on my simulations among all answers on this thread is -
def h(bounds):
    zz = np.zeros((len(bounds), bounds.max()))
    for z, b in zip(zz, bounds):
        z[b[0]:b[1]] = 1
    return zz

h(bounds)
array([[0., 1., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 1., 1.]])
Using the numba.njit decorator:
import numpy as np
import numba

@numba.njit
def numba_assign_in_range(arr, bounds, val):
    for i in range(len(bounds)):
        s, e = bounds[i]
        arr[i, s:e] = val
    return arr
test_size = int(1e6) * 2
bounds = np.zeros((test_size, 2), dtype='int32')
bounds[:, 0] = 1
bounds[:, 1] = np.random.randint(0, 100, test_size)
a = np.zeros((test_size, 100))
with numba.njit
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs
without numba.njit
CPU times: user 3.54 s, sys: 1.63 ms, total: 3.54 s
Wall time: 3.55 s
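One way the two cases above might be timed (an assumed sketch, not the original measurement code) is to call the compiled function and its pure-Python counterpart, which numba exposes as .py_func:
%time numba_assign_in_range(a, bounds, 1)          # compiled with numba.njit
%time numba_assign_in_range.py_func(a, bounds, 1)  # plain Python version, no JIT
Note that the first call to the compiled function also includes JIT compilation time, so it is usually timed after a warm-up call.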

Element-wise minimum of multiple vectors in numpy

I know that in numpy I can compute the element-wise minimum of two vectors with
numpy.minimum(v1, v2)
What if I have a list of vectors of equal dimension, V = [v1, v2, v3, v4] (but a list, not an array)? Taking numpy.minimum(*V) doesn't work. What's the preferred thing to do instead?
*V works only if V has exactly 2 arrays, because np.minimum is a ufunc and takes 2 arguments.
As a ufunc it has a .reduce method, so it can be applied repeatedly across a list of inputs.
In [321]: np.minimum.reduce([np.arange(3), np.arange(2,-1,-1), np.ones((3,))])
Out[321]: array([ 0., 1., 0.])
I suspect the np.min approach is faster, but that could depend on the array and list size.
In [323]: np.array([np.arange(3), np.arange(2,-1,-1), np.ones((3,))]).min(axis=0)
Out[323]: array([ 0., 1., 0.])
The ufunc also has an accumulate method, which shows the result of each stage of the reduction. Here it's not too interesting, but I could tweak the inputs to change that.
In [325]: np.minimum.accumulate([np.arange(3), np.arange(2,-1,-1), np.ones((3,))])
...:
Out[325]:
array([[ 0., 1., 2.],
       [ 0., 1., 0.],
       [ 0., 1., 0.]])
Convert to NumPy array and perform ndarray.min along the first axis -
np.asarray(V).min(0)
Or simply use np.amin; under the hood, it converts the input to an array before finding the minimum along that axis -
np.amin(V,axis=0)
Sample run -
In [52]: v1 = [2,5]
In [53]: v2 = [4,5]
In [54]: v3 = [4,4]
In [55]: v4 = [1,4]
In [56]: V = [v1, v2, v3, v4]
In [57]: np.asarray(V).min(0)
Out[57]: array([1, 4])
In [58]: np.amin(V,axis=0)
Out[58]: array([1, 4])
If you need the final output as a list, append .tolist() to the result.
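For example:
In [59]: np.amin(V, axis=0).tolist()
Out[59]: [1, 4]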

Assign multiple values to multiple slices of a numpy array at once

I have a numpy array, a list of start/end indexes that define ranges within the array, and a list of values, where the number of values is the same as the number of ranges. Doing this assignment in a loop is currently very slow, so I'd like to assign the values to the corresponding ranges in the array in a vectorized way. Is this possible to do?
Here's a concrete, simplified example:
a = np.zeros([10])
Here's a list of start indexes and a list of end indexes that define ranges within a:
starts = [0, 2, 4, 6]
ends = [2, 4, 6, 8]
And here's a list of values I'd like to assign to each range:
values = [1, 2, 3, 4]
I have two problems. The first is that I can't figure out how to index into the array using multiple slices at the same time, since the list of ranges is constructed dynamically in the actual code. Once I'm able to extract the ranges, I'm not sure how to assign multiple values at once - one value per range.
Here's how I've tried creating a list of slices and the problems I've run into when using that list to index into the array:
slices = [slice(start, end) for start, end in zip(starts, ends)]
In [97]: a[slices]
...
IndexError: too many indices for array
In [98]: a[np.r_[slices]]
...
IndexError: arrays used as indices must be of integer (or boolean) type
If I use a static list, I can extract multiple slices at once, but then assignment doesn't work the way I want:
In [106]: a[np.r_[0:2, 2:4, 4:6, 6:8]] = [1, 2, 3]
/usr/local/bin/ipython:1: DeprecationWarning: assignment will raise an error in the future, most likely because your index result shape does not match the value array shape. You can use `arr.flat[index] = values` to keep the old behaviour.
#!/usr/local/opt/python/bin/python2.7
In [107]: a
Out[107]: array([ 1., 2., 3., 1., 2., 3., 1., 2., 0., 0.])
What I actually want is this:
np.array([1., 1., 2., 2., 3., 3., 4., 4., 0., 0.])
This will do the trick in a fully vectorized manner:
starts, ends, values = np.asarray(starts), np.asarray(ends), np.asarray(values)
counts = ends - starts
idx = np.ones(counts.sum(), dtype=int)
idx[np.cumsum(counts)[:-1]] -= counts[:-1]
idx = np.cumsum(idx) - 1 + np.repeat(starts, counts)
a[idx] = np.repeat(values, counts)
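Tracing the intermediate values for the example above shows how the flat index is built:
# counts                     -> array([2, 2, 2, 2])
# np.cumsum(counts)[:-1]     -> array([2, 4, 6])  (flat positions where a new range begins)
# idx after the subtraction  -> [1, 1, -1, 1, -1, 1, -1, 1]
# np.cumsum(idx) - 1         -> [0, 1, 0, 1, 0, 1, 0, 1]  (offset of each element within its range)
# np.repeat(starts, counts)  -> [0, 0, 2, 2, 4, 4, 6, 6]  (start of each range, repeated)
# final idx                  -> [0, 1, 2, 3, 4, 5, 6, 7]
# np.repeat(values, counts)  -> [1, 1, 2, 2, 3, 3, 4, 4]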
One possibility is to zip the starts and ends with the values and expand the indices and values manually:
import numpy as np

starts = [0, 2, 4, 6]
ends = [2, 4, 6, 8]
values = [1, 2, 3, 4]
a = np.zeros(10)
# calculate the index and value arrays by zipping the starts, ends and values and expanding them
idx, val = zip(*[(list(range(s, e)), [v] * (e - s)) for s, e, v in zip(starts, ends, values)])
# assign values
a[np.array(idx).flatten()] = np.array(val).flatten()
a
# array([ 1., 1., 2., 2., 3., 3., 4., 4., 0., 0.])
Or write a for loop to assign the values one range at a time:
for s, e, v in zip(starts, ends, values):
    a[slice(s, e)] = v
a
# array([ 1., 1., 2., 2., 3., 3., 4., 4., 0., 0.])
