Element-wise minimum of multiple vectors in numpy - python

I know that in numpy I can compute the element-wise minimum of two vectors with
numpy.minimum(v1, v2)
What if I have a list of vectors of equal dimension, V = [v1, v2, v3, v4] (but a list, not an array)? Taking numpy.minimum(*V) doesn't work. What's the preferred thing to do instead?

*V works if V has only 2 arrays. np.minimum is a ufunc and takes 2 arguments.
As a ufunc it has a .reduce method, so it can be applied repeatedly to a list of inputs.
In [321]: np.minimum.reduce([np.arange(3), np.arange(2,-1,-1), np.ones((3,))])
Out[321]: array([ 0., 1., 0.])
I suspect the np.min approach is faster, but that could depend on the array and list size.
In [323]: np.array([np.arange(3), np.arange(2,-1,-1), np.ones((3,))]).min(axis=0)
Out[323]: array([ 0., 1., 0.])
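A minimal sketch comparing the two approaches above on the same list of vectors. The third variant, functools.reduce(np.minimum, V), is an extra pure-Python fold added here for comparison and is not part of the original answer. All three should agree.

```python
import functools

import numpy as np

# A list (not an array) of equal-length vectors
V = [np.arange(3), np.arange(2, -1, -1), np.ones((3,))]

# ufunc reduction: applies np.minimum pairwise across the list
via_reduce = np.minimum.reduce(V)

# Stack into a (3, 3) array, then take the minimum down the first axis
via_min = np.array(V).min(axis=0)

# Python-level fold: np.minimum(np.minimum(V[0], V[1]), V[2])
via_fold = functools.reduce(np.minimum, V)

print(via_reduce)  # [0. 1. 0.]
```

Which one is fastest depends on the number and length of the vectors, so it is worth timing on your own data.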
The ufunc also has an accumulate method, which shows the result of each stage of the reduction. Here it's not too interesting, but I could tweak the inputs to change that.
In [325]: np.minimum.accumulate([np.arange(3), np.arange(2,-1,-1), np.ones((3,))])
Out[325]:
array([[ 0., 1., 2.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
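To make accumulate more informative, here is a small sketch with hypothetical integer inputs chosen so that the running minimum changes at every stage. The last row of the accumulation is the same as the reduce result.

```python
import numpy as np

# Hypothetical inputs: each new vector lowers at least one position
V = [[5, 5, 5], [4, 6, 3], [6, 2, 7]]

stages = np.minimum.accumulate(V)   # row k = minimum of V[0] .. V[k]
print(stages)
# [[5 5 5]
#  [4 5 3]
#  [4 2 3]]

final = np.minimum.reduce(V)        # same as the last row of stages
print(final)  # [4 2 3]
```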

Convert to NumPy array and perform ndarray.min along the first axis -
np.asarray(V).min(0)
Or simply use np.amin, which under the hood converts the input to an array before finding the minimum along that axis -
np.amin(V,axis=0)
Sample run -
In [52]: v1 = [2,5]
In [53]: v2 = [4,5]
In [54]: v3 = [4,4]
In [55]: v4 = [1,4]
In [56]: V = [v1, v2, v3, v4]
In [57]: np.asarray(V).min(0)
Out[57]: array([1, 4])
In [58]: np.amin(V,axis=0)
Out[58]: array([1, 4])
If you need the final output as a list, call .tolist() on the result.
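Putting the sample run and the .tolist() suggestion together. The zip(*V) line is an extra pure-Python comparison added here, not part of the answer; it avoids NumPy entirely but runs at Python speed.

```python
import numpy as np

v1, v2, v3, v4 = [2, 5], [4, 5], [4, 4], [1, 4]
V = [v1, v2, v3, v4]

# np.amin converts the nested list to a (4, 2) array, then reduces over axis 0
as_list = np.amin(V, axis=0).tolist()

# Pure-Python equivalent: zip(*V) groups the i-th entries of every vector
pure_python = [min(column) for column in zip(*V)]

print(as_list)      # [1, 4]
print(pure_python)  # [1, 4]
```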

Related

Trying to understand signature in numpy.vectorize

I am trying to understand the signature functionality in numpy.vectorize. I have some examples, but they did not help much with my understanding.
>>> import scipy.stats
>>> pearsonr = np.vectorize(scipy.stats.pearsonr, signature='(n),(n)->(),()')
>>> pearsonr([[0, 1, 2, 3]], [[1, 2, 3, 4], [4, 3, 2, 1]])
(array([ 1., -1.]), array([ 0., 0.]))
>>> convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
>>> convolve(np.eye(4), [1, 2, 1])
array([[1., 2., 1., 0., 0., 0.],
       [0., 1., 2., 1., 0., 0.],
       [0., 0., 1., 2., 1., 0.],
       [0., 0., 0., 1., 2., 1.]])
>>>import numpy as np
>>>qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')
>>>qr(np.random.normal(size=(1, 3, 2)))
(array([[-0.31622777, -0.9486833 ],
[-0.9486833 , 0.31622777]]),
array([[-3.16227766, -4.42718872, -5.69209979],
[ 0. , -0.63245553, -1.26491106]]))
>>>import scipy
>>>logm = np.vectorize(scipy.linalg.logm, signature='(m,m)->(m,m)')
>>>logm(np.random.normal(size=(1, 3, 2)))
array([[[ 1.08226288, -2.29544602],
[ 2.12599894, -1.26335203]]])
Can someone please explain the syntax and functionality of the signatures
signature='(n),(n)->(),()'
signature='(n),(m)->(k)'
signature='(m,n)->(m,k),(k,n)'
signature='(m,m)->(m,m)'
used in the aforementioned examples? If we didn't use the signatures, how would the examples be implemented in a simpler, more naive way?
Any help is highly appreciated.
The aforementioned examples can be found here and here.
I think the explanation would be clearer if we knew the 'signature' of the individual functions - what they expect, and what they produce. But I can make some deductions from the code you show.
>>> pearsonr = np.vectorize(scipy.stats.pearsonr, signature='(n),(n)->(),()')
>>> pearsonr([[0, 1, 2, 3]], [[1, 2, 3, 4], [4, 3, 2, 1]])
(array([ 1., -1.]), array([ 0., 0.]))
This is called with (1,4) and (2,4) arrays (well, lists that become such arrays). They broadcast together to (2,4). The stats function is then called twice, once for each row of the pair, getting two (4,) arrays, and returning 2 scalar values (the correlation coefficient and the p-value).
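The '(n),(n)->(),()' case can be reproduced without SciPy. In this sketch, r_and_dot is a hypothetical stand-in for scipy.stats.pearsonr (it returns a correlation coefficient and a dot product, not a p-value); the point is only to show how vectorize broadcasts the (1,4) and (2,4) inputs and calls the function once per row. The explicit loop at the end is the naive equivalent.

```python
import numpy as np

def r_and_dot(x, y):
    """Hypothetical stand-in for pearsonr: two 1-D arrays in, two scalars out."""
    r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
    return r, np.dot(x, y)

# '(n),(n)->(),()': both inputs share one core dimension of length n,
# and the function returns two scalars (outputs with no core dimensions)
vec = np.vectorize(r_and_dot, signature='(n),(n)->(),()')

x = [[0, 1, 2, 3]]                  # shape (1, 4)
y = [[1, 2, 3, 4], [4, 3, 2, 1]]    # shape (2, 4)
r, d = vec(x, y)                    # loop dims (1,) and (2,) broadcast to (2,)
print(r, d)

# Naive equivalent: broadcast the inputs, then call the function row by row
xb, yb = np.broadcast_arrays(np.asarray(x), np.asarray(y))
r_loop = np.array([r_and_dot(a, b)[0] for a, b in zip(xb, yb)])
```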
>>> convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
>>> convolve(np.eye(4), [1, 2, 1])
array([[1., 2., 1., 0., 0., 0.],
       [0., 1., 2., 1., 0., 0.],
       [0., 0., 1., 2., 1., 0.],
       [0., 0., 0., 1., 2., 1.]])
This is called with (4,4) and (3,) arrays. I think convolve gets called 4 times, once for each row of the eye, getting the same [1,2,1] each time. The result is a 4-row array (with 6 columns, determined by convolve itself, not vectorize).
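This also answers the "naive way" part of the question for the convolve example: the vectorized call should match a plain list comprehension that calls np.convolve once per row of np.eye(4) and stacks the results. A minimal sketch:

```python
import numpy as np

kernel = [1, 2, 1]

# '(n),(m)->(k)': a length-n row and a length-m kernel give a length-k output
convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
vectorized = convolve(np.eye(4), kernel)

# Naive equivalent: one np.convolve call per row of the identity matrix
looped = np.stack([np.convolve(row, kernel) for row in np.eye(4)])

print(vectorized.shape)  # (4, 6): full convolution length is n + m - 1 = 4 + 3 - 1
```

vectorize does not make this faster; it mainly handles the broadcasting and output allocation for you.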
>>>import numpy as np
>>>qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')
>>>qr(np.random.normal(size=(1, 3, 2)))
(array([[-0.31622777, -0.9486833 ],
[-0.9486833 , 0.31622777]]),
array([[-3.16227766, -4.42718872, -5.69209979],
[ 0. , -0.63245553, -1.26491106]]))
Signature: np.linalg.qr(a, mode='reduced')
a : array_like, shape (M, N)
'reduced' : returns q, r with dimensions (M, K), (K, N) (default)
The vectorize signature just repeats the information in the docs.
a is a (1,3,2) shape array, so qr is called once (for the 1st dimension) with a (3,2) array. The result is 2 arrays with (m,k) and (k,n) shapes, here (3,2) and (2,2). When I run it I get an added size-1 dimension: (1,3,2) and (1,2,2). The numbers differ because the input is random:
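A short sketch checking those shapes. The seeded generator is only there so the run is repeatable; the reconstruction check q @ r relies on matmul broadcasting over the leading size-1 loop dimension.

```python
import numpy as np

qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')

rng = np.random.default_rng(0)    # seeded only for repeatability
a = rng.normal(size=(1, 3, 2))    # one (3, 2) matrix plus a leading loop dimension

q, r = qr(a)                      # qr is called once, on a[0]
print(q.shape, r.shape)           # (1, 3, 2) (1, 2, 2)

# In the default 'reduced' mode, q @ r reconstructs each matrix in the stack
print(np.allclose(q @ r, a))      # True
```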
In [120]: qr = np.vectorize(np.linalg.qr, signature='(m,n)->(m,k),(k,n)')
...: qr(np.random.normal(size=(1, 3,2)))
Out[120]:
(array([[[-0.61362528, 0.09161174],
[ 0.63682861, -0.52978942],
[-0.46681188, -0.84316692]]]),
array([[[-0.65301725, -1.00494992],
[ 0. , 0.8068886 ]]]))
>>>import scipy
>>> logm = np.vectorize(scipy.linalg.logm, signature='(m,m)->(m,m)')
>>>logm(np.random.normal(size=(1, 3, 2)))
array([[[ 1.08226288, -2.29544602],
[ 2.12599894, -1.26335203]]])
scipy.linalg.logm expects a square array, and returns one of the same shape.
Calling logm with a (1,3,2) produces an error, because (3,2) is not a square array:
ValueError: inconsistent size for core dimension 'm': 2 vs 3
Calling scipy.linalg.logm directly produces the same error, worded differently:
linalg.logm(np.random.normal(size=(3, 2)))
ValueError: expected square array_like input
When I say the function is called twice, or something like that, I'm ignoring the test call that's used to determine the return dtype.
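The core-dimension check can be seen without SciPy. In this sketch np.linalg.inv stands in for scipy.linalg.logm, since both map an (m,m) matrix to an (m,m) matrix. A stack of square matrices works, while a (3,2) core is rejected by vectorize before the function is ever called.

```python
import numpy as np

# '(m,m)->(m,m)': both core dimensions must have the same size (square matrix)
inv = np.vectorize(np.linalg.inv, signature='(m,m)->(m,m)')

stack = np.stack([np.eye(3) * 2.0, np.eye(3) * 4.0])   # shape (2, 3, 3)
print(inv(stack).shape)                                 # (2, 3, 3)

try:
    inv(np.ones((1, 3, 2)))                             # core (3, 2) is not square
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```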

frequency of unique values for 2d numpy array

I have a 2-dimensional numpy array of the following format:
Now, how can I print the frequency of the unique rows in this 2d numpy array, so that it returns count([1. 0.]) = 1 and count([0. 1.]) = 1? I know how to do this using loops, but is there a better, more pythonic way to do this?
You can use numpy.unique() with axis=0 and return_counts=True. It will return a tuple with the unique rows and the counts for these rows.
np.unique(arr, return_counts=True, axis=0)
OUTPUT:
(array([[0, 1],
[1, 0]]), array([1, 1], dtype=int64))
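To get a lookup table like the one in the question, the two arrays returned by np.unique can be zipped into a dict keyed by row tuples. A minimal sketch, borrowing the example array from the Counter answer below:

```python
import numpy as np

y = np.array([[1., 0.], [0., 1.], [0., 1.]])

# axis=0 treats each row as one item; return_counts adds how often each row occurs
rows, counts = np.unique(y, axis=0, return_counts=True)

# Pair each unique row (as a hashable tuple) with its count
freq = {tuple(row): int(n) for row, n in zip(rows.tolist(), counts)}
print(freq)  # {(0.0, 1.0): 2, (1.0, 0.0): 1}
```

Note that np.unique returns the rows sorted lexicographically, which is why (0.0, 1.0) comes first.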
You can use collections.Counter; it will give you a dictionary with the rows (as tuples) as keys and their number of occurrences as values:
import collections
import numpy as np
y = np.array([[1., 0.], [0., 1.], [0., 1.]])
counter = collections.Counter(map(tuple, y))
print(counter[0., 1.]) # 2

Select array elements with variable index bounds in numpy

This might not be possible, as the intermediate array would have variable-length rows.
What I am trying to accomplish is assigning a value to the elements of an array whose index falls within the limits given by my array of bounds. As an example:
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
__assign(array, bounds, 1)
after the assignment should result in
array = [
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 1, 1]
]
I have tried something like this in various iterations without success:
ind = np.arange(array.shape[0])
array[ind, bounds[ind][0]:bounds[ind][1]] = 1
I am trying to avoid loops as this function will be called a lot. Any ideas?
I'm by no means a Numpy expert, but from the different array indexing options I could find, this was the fastest solution I could figure out:
import numpy as np
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
for i, x in enumerate(bounds):
    cols = slice(x[0], x[1])
    array[i, cols] = 1
Here we iterate through the list of bounds and reference the columns using slices.
I tried the approach below, which first constructs a list of column indices and a list of row indices, but it was much slower: over 10 seconds versus 0.04 seconds on my laptop for a 10 000 x 10 000 array. I guess the slices make a huge difference.
bounds = np.array([[1,2], [1,3], [1,4]])
array = np.zeros((3,4))
cols = []
rows = []
for i, x in enumerate(bounds):
    cols += list(range(x[0], x[1]))
    rows += (x[1] - x[0]) * [i]
# print(cols) [1, 1, 2, 1, 2, 3]
# print(rows) [0, 1, 1, 2, 2, 2]
array[rows, cols] = 1
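The Python loop that builds rows and cols can itself be replaced with np.repeat and np.cumsum. This is a sketch of that idea, assuming every stop is greater than or equal to its start; it has not been timed against the slice loop.

```python
import numpy as np

bounds = np.array([[1, 2], [1, 3], [1, 4]])
array = np.zeros((3, 4))

starts, stops = bounds[:, 0], bounds[:, 1]
lengths = stops - starts                   # number of columns to set in each row

# Row index repeated once per column to be set: [0, 1, 1, 2, 2, 2]
rows = np.repeat(np.arange(len(bounds)), lengths)

# Position of each entry within its own row's run: [0, 0, 1, 0, 1, 2]
offsets = np.arange(lengths.sum()) - np.repeat(np.cumsum(lengths) - lengths, lengths)

# Shift by each row's start to get the column indices: [1, 1, 2, 1, 2, 3]
cols = np.repeat(starts, lengths) + offsets

array[rows, cols] = 1
```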
One of the issues with a purely NumPy method to solve this is that there is no method to 'slice' a NumPy array over an axis using bounds from another array. So the expanded bounds end up becoming a variable-length list of lists such as [[1],[1,2],[1,2,3]]. Then you can use np.eye and np.sum over axis=0 to get the required output.
bounds = np.array([[1,2], [1,3], [1,4]])
result = np.stack([np.sum(np.eye(4)[slice(*i)], axis=0) for i in bounds])
print(result)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
I tried various ways of being able to slice the np.eye(4) from [start:stop] over a NumPy array of starts and stops but sadly you will need an iteration to accomplish this.
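For comparison, a loop-free alternative (a different technique from the answers here) sidesteps slicing altogether: compare every column position against each row's start and stop by broadcasting, and use the resulting boolean mask for the assignment. assign_in_bounds is a hypothetical helper name. The trade-off is that it builds a full boolean mask the size of the array.

```python
import numpy as np

def assign_in_bounds(array, bounds, val):
    """Set array[i, start_i:stop_i] = val for every row i, with no Python loop."""
    cols = np.arange(array.shape[1])             # shape (ncols,)
    starts = bounds[:, [0]]                      # shape (nrows, 1)
    stops = bounds[:, [1]]                       # shape (nrows, 1)
    mask = (cols >= starts) & (cols < stops)     # broadcasts to (nrows, ncols)
    array[mask] = val
    return array

bounds = np.array([[1, 2], [1, 3], [1, 4]])
result = assign_in_bounds(np.zeros((3, 4)), bounds, 1)
print(result)
```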
EDIT: Another way to do this without an explicit loop (np.apply_along_axis still iterates in Python internally, so it is not truly vectorized) is -
def f(b):
    o = np.sum(np.eye(4)[b[0]:b[1]], axis=0)
    return o
np.apply_along_axis(f, 1, bounds)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
EDIT: If you are looking for a superfast solution but can tolerate a single for loop then the fastest approach based on my simulations among all answers on this thread is -
def h(bounds):
    zz = np.zeros((len(bounds), bounds.max()))
    for z,b in zip(zz,bounds):
        z[b[0]:b[1]]=1
    return zz
h(bounds)
array([[0., 1., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 1., 1.]])
Using the numba.njit decorator:
import numpy as np
import numba
@numba.njit
def numba_assign_in_range(arr, bounds, val):
    for i in range(len(bounds)):
        s, e = bounds[i]
        arr[i, s:e] = val
    return arr
test_size = int(1e6) * 2
bounds = np.zeros((test_size, 2), dtype='int32')
bounds[:, 0] = 1
bounds[:, 1] = np.random.randint(0, 100, test_size)
a = np.zeros((test_size, 100))
with numba.njit
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.2 µs
without numba.njit
CPU times: user 3.54 s, sys: 1.63 ms, total: 3.54 s
Wall time: 3.55 s

Numpy indexing set 1 to max value and zero's to all others

I think I've misunderstood something with indexing in numpy.
I have a 3D-numpy array of shape (dim_x, dim_y, dim_z) and I want to find the maximum along the third axis (dim_z), and set its value to 1 and all the others to zero.
The problem is that I end up with several 1s in the same row, even if the values are different.
Here is the code :
>>> test = np.random.rand(2,3,2)
>>> test
array([[[ 0.13110146, 0.07138861],
[ 0.84444158, 0.35296986],
[ 0.97414498, 0.63728852]],
[[ 0.61301975, 0.02313646],
[ 0.14251848, 0.91090492],
[ 0.14217992, 0.41549218]]])
>>> result = np.zeros_like(test)
>>> result[:test.shape[0], np.arange(test.shape[1]), np.argmax(test, axis=2)]=1
>>> result
array([[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]],
[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]]])
I was expecting to end up with:
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
I'm probably missing something here. From what I've understood, 0:dim_x, np.arange(dim_y) returns dim_x of dim_y tuples, and np.argmax(test, axis=dim_z) has the shape (dim_x, dim_y), so if the indexing is of the form [x, y, z], a pair [x, y] is not supposed to appear twice.
Could someone explain where I'm wrong? Thanks in advance.
What we are looking for
We get the argmax indices along the last axis -
idx = np.argmax(test, axis=2)
For the given sample data, we have idx :
array([[0, 0, 0],
[0, 1, 1]])
Now, idx covers the first and second axes, while getting those argmax indices.
To assign the corresponding ones in the output, we need to create range arrays for the first two axes covering the lengths along those and aligned according to the shape of idx. Now, idx is a 2D array of shape (m,n), where m = test.shape[0] and n = test.shape[1].
Thus, the range arrays for assignment into first two axes of output must be -
X = np.arange(test.shape[0])[:,None]
Y = np.arange(test.shape[1])
Notice, the extension of the first range array to 2D is needed to have it aligned against the rows of idx and Y would align against the cols of idx -
In [239]: X
Out[239]:
array([[0],
[1]])
In [240]: Y
Out[240]: array([0, 1, 2])
Schematically put -
idx :
Y array
--------->
x x x | X array
x x x |
v
The fault in original code
Your code was -
result[:test.shape[0], np.arange(test.shape[1]), ..
This is essentially :
result[:, np.arange(test.shape[1]), ...
So, you are selecting all elements along the first axis, instead of only the ones that correspond to the idx indices. In the process, you were selecting many more elements than required for the assignment, and hence you were seeing many more 1s than expected in the result array.
The correction
Thus, the only correction needed was indexing into the first axis with the range array and a working solution would be -
result[np.arange(test.shape[0])[:,None], np.arange(test.shape[1]), ...
The alternative(s)
Alternatively, using the range arrays created earlier with X and Y -
result[X,Y,idx] = 1
Another way to get X,Y would be with np.ogrid -
m,n = test.shape[:2]
X,Y = np.ogrid[:m,:n]
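The full fix put together on the question's sample data, as a runnable sketch. The second half swaps in np.put_along_axis, an alternative not used in the answer, which takes indices with the same number of dimensions as the target, hence idx[..., None].

```python
import numpy as np

test = np.array([[[0.13110146, 0.07138861],
                  [0.84444158, 0.35296986],
                  [0.97414498, 0.63728852]],
                 [[0.61301975, 0.02313646],
                  [0.14251848, 0.91090492],
                  [0.14217992, 0.41549218]]])   # sample data from the question

idx = np.argmax(test, axis=2)        # shape (2, 3)
m, n = test.shape[:2]
X, Y = np.ogrid[:m, :n]              # shapes (2, 1) and (1, 3); broadcast against idx

result = np.zeros_like(test)
result[X, Y, idx] = 1                # exactly one 1 per (row, column) pair

# Alternative: put_along_axis writes value 1 at idx along the last axis
alt = np.zeros_like(test)
np.put_along_axis(alt, idx[..., None], 1, axis=2)
```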
I think there's a problem with mixing basic (slice) and advanced indexing. It's easier to see when selecting values from an array than with this assignment, but it can result in transposed axes. For a problem like this it is better to use advanced indexing all around, as provided by ix_:
In [24]: test = np.random.rand(2,3,2)
In [25]: idx=np.argmax(test,axis=2)
In [26]: idx
Out[26]:
array([[1, 0, 1],
[0, 1, 1]], dtype=int32)
with basic and advanced:
In [31]: res1 = np.zeros_like(test)
In [32]: res1[:, np.arange(test.shape[1]), idx]=1
In [33]: res1
Out[33]:
array([[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]],
[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]]])
with advanced:
In [35]: I,J = np.ix_(range(test.shape[0]), range(test.shape[1]))
In [36]: I
Out[36]:
array([[0],
[1]])
In [37]: J
Out[37]: array([[0, 1, 2]])
In [38]: res2 = np.zeros_like(test)
In [40]: res2[I, J , idx]=1
In [41]: res2
Out[41]:
array([[[ 0., 1.],
[ 1., 0.],
[ 0., 1.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
On further thought, the use of the slice for the 1st dimension is just wrong if the goal is to set or find the 6 argmax values:
In [54]: test
Out[54]:
array([[[ 0.15288242, 0.36013289],
[ 0.90794601, 0.15265616],
[ 0.34014976, 0.53804266]],
[[ 0.97979479, 0.15898605],
[ 0.04933804, 0.89804999],
[ 0.10199319, 0.76170911]]])
In [55]: test[I, J, idx]
Out[55]:
array([[ 0.36013289, 0.90794601, 0.53804266],
[ 0.97979479, 0.89804999, 0.76170911]])
In [56]: test[:, J, idx]
Out[56]:
array([[[ 0.36013289, 0.90794601, 0.53804266],
[ 0.15288242, 0.15265616, 0.53804266]],
[[ 0.15898605, 0.04933804, 0.76170911],
[ 0.97979479, 0.89804999, 0.76170911]]])
With the slice it selects a (2,2,3) set of values from test (or res), not the intended (2,3). There are 2 extra rows.
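A quick shape check of the two selections discussed above, on seeded random data (seeded only for repeatability). With the slice in front, the broadcast (2,3) index shape is appended after the sliced axis, giving (2,2,3); with advanced indexing on all three axes, the result has one value per (row, column) pair.

```python
import numpy as np

rng = np.random.default_rng(1)
test = rng.random((2, 3, 2))
idx = np.argmax(test, axis=2)                                 # shape (2, 3)
I, J = np.ix_(range(test.shape[0]), range(test.shape[1]))     # shapes (2, 1), (1, 3)

print(test[I, J, idx].shape)   # (2, 3): one value per (row, column) pair
print(test[:, J, idx].shape)   # (2, 2, 3): the slice keeps a whole extra axis

# The all-advanced selection picks exactly the per-(row, column) maxima
print(np.array_equal(test[I, J, idx], test.max(axis=2)))     # True
```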
Here is an easier way to do it:
>>> test == test.max(axis=2, keepdims=1)
array([[[ True, False],
[ True, False],
[ True, False]],
[[ True, False],
[False, True],
[False, True]]], dtype=bool)
...and if you really want that as floating-point 1.0 and 0.0, then convert it:
>>> (test==test.max(axis=2, keepdims=1)).astype(float)
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Here is a way to do it with only one winner per row-column combo (i.e. no ties, as discussed in comments):
rowmesh, colmesh = np.meshgrid(range(test.shape[0]), range(test.shape[1]), indexing='ij')
maxloc = np.argmax(test, axis=2)
flatind = np.ravel_multi_index( [rowmesh, colmesh, maxloc ], test.shape )
result = np.zeros_like(test)
result.flat[flatind] = 1
UPDATE after reading hpaulj's answer:
rowmesh, colmesh = np.ix_(range(test.shape[0]), range(test.shape[1]))
is a more-efficient, more numpythonic, alternative to my meshgrid call (the rest of the code stays the same)
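A small sketch contrasting the two answers on a hypothetical input with a tie: comparing against the max marks every tied position, while the ravel_multi_index version (with the np.ix_ update, passed as a tuple) marks only the first maximum, because np.argmax returns the first occurrence.

```python
import numpy as np

# Hypothetical input with a tie in cell (0, 0): both layers hold 3.0
test = np.array([[[3., 3.],
                  [1., 2.]]])                         # shape (1, 2, 2)

# Comparison with the max: every tied position becomes 1
tied = (test == test.max(axis=2, keepdims=True)).astype(float)

# ravel_multi_index with np.ix_: exactly one 1 per (row, column) pair
rowmesh, colmesh = np.ix_(range(test.shape[0]), range(test.shape[1]))
maxloc = np.argmax(test, axis=2)                      # first maximum on ties
flatind = np.ravel_multi_index((rowmesh, colmesh, maxloc), test.shape)
result = np.zeros_like(test)
result.flat[flatind] = 1

print(tied.tolist())     # [[[1.0, 1.0], [0.0, 1.0]]]
print(result.tolist())   # [[[1.0, 0.0], [0.0, 1.0]]]
```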
The issue of why your approach fails is hard to explain, but here's one place where intuition could start: your slicing approach says "all rows, times all columns, times a certain sequence of layers". How many elements is that slice in total? By contrast, how many elements do you actually want to set to 1? It can be instructive to look at the values you get when you view the corresponding test values of the slice you're trying to assign to:
>>> test[:, :, maxloc].shape
(2, 3, 2, 3) # oops! it's because maxloc itself is 2x3
>>> test[:, :, maxloc]
array([[[[ 0.13110146, 0.13110146, 0.13110146],
[ 0.13110146, 0.07138861, 0.07138861]],
[[ 0.84444158, 0.84444158, 0.84444158],
[ 0.84444158, 0.35296986, 0.35296986]],
[[ 0.97414498, 0.97414498, 0.97414498],
[ 0.97414498, 0.63728852, 0.63728852]]],
[[[ 0.61301975, 0.61301975, 0.61301975],
[ 0.61301975, 0.02313646, 0.02313646]],
[[ 0.14251848, 0.14251848, 0.14251848],
[ 0.14251848, 0.91090492, 0.91090492]],
[[ 0.14217992, 0.14217992, 0.14217992],
[ 0.14217992, 0.41549218, 0.41549218]]]]) # note the repetition, because in maxloc you're repeatedly asking for layer 0 sometimes, and sometimes repeatedly for layer 1

(Numpy) Index list to boolean array

Input:
array length (Integer)
indexes (Set or List)
Output:
A boolean numpy array that has a value of 1 at the indexes and 0 everywhere else.
Example:
Input: array_length=10, indexes={2,5,6}
Output:
[0,0,1,0,0,1,1,0,0,0]
Here is my simple implementation:
def indexes2booleanvec(size, indexes):
    v = numpy.zeros(size)
    for index in indexes:
        v[index] = 1.0
    return v
Is there more elegant way to implement this?
One way is to avoid the loop
In [7]: fill = np.zeros(array_length) # array_length = 10
In [8]: fill[indexes] = 1 # indexes = [2,5,6]
In [9]: fill
Out[9]: array([ 0., 0., 1., 0., 0., 1., 1., 0., 0., 0.])
Another way to do it (in one line):
np.isin(np.arange(array_length), indexes)
However this is slower than Zero's solution.
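Since the question allows the indexes to be a set, note that both answers need a list or array there: fancy indexing does not accept a set, and the NumPy docs warn that np.isin does not work as expected when given a set. A minimal sketch with a hypothetical helper name, using dtype=bool because the question asks for a boolean array:

```python
import numpy as np

def indexes_to_bool(size, indexes):
    """Return a boolean array of length `size` that is True at the given positions."""
    mask = np.zeros(size, dtype=bool)
    mask[list(indexes)] = True       # convert a set to a list before fancy indexing
    return mask

array_length, indexes = 10, {2, 5, 6}
via_assign = indexes_to_bool(array_length, indexes)

# np.isin also gets a list here rather than the set itself
via_isin = np.isin(np.arange(array_length), list(indexes))

print(via_assign.astype(int).tolist())   # [0, 0, 1, 0, 0, 1, 1, 0, 0, 0]
```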
