numpy.unique has an optional argument return_counts. From the docs:
return_counts bool, optional If True, also return the number of times
each unique item appears in ar.
New in version 1.9.0.
This is straightforward for a 1-D array. However, I'm trying to get the unique values and counts for each row of a matrix. Here is a sample matrix:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
])
When I apply np.unique:
np.unique(m_sample, axis=1, return_counts=True)
(array([[1, 1, 2],
[2, 2, 2],
[3, 3, 3],
[1, 5, 4]]), array([1, 1, 1]))
I'm not really sure what the returned matrix here represents, much less so the counts array. Is this perhaps a bug in numpy (or maybe a case the developer did not consider)? Am I misunderstanding how to use the parameters in this case?
When you specify an axis, np.unique returns the unique subarrays indexed along that axis. To see this better, assume that one of the rows repeats:
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5],
[1, 2, 1]
])
In this case np.unique(m_sample, axis=0, return_counts=True) gives:
(array([[1, 2, 1],
[1, 4, 5],
[2, 2, 2],
[3, 3, 3]]),
array([2, 1, 1, 1]))
The first element of this tuple lists the unique rows of the array, and the second gives how many times each of those rows appears. In this example, the row [1, 2, 1] appears twice.
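The axis=1 call from the question can be read the same way, only with columns instead of rows. The following is a small sketch, assuming the original four-row m_sample from the question; it compares the axis=1 result with the unique rows of the transpose:

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# axis=1 treats each column as one item: the result holds the unique
# columns (sorted lexicographically) and how often each column occurs.
cols, counts = np.unique(m_sample, axis=1, return_counts=True)

# The same information, obtained as unique rows of the transpose.
rows_of_t, counts_t = np.unique(m_sample.T, axis=0, return_counts=True)

print(cols)
print(counts)
print(np.array_equal(cols, rows_of_t.T))
```

All three columns of this matrix are distinct, which is why every count is 1 and the returned matrix is just the input with its columns reordered.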
To get unique values in each row you can try, for example, the following:
import numpy as np
m_sample = np.array([
[1, 2, 1],
[2, 2, 2],
[3, 3, 3],
[1, 4, 5]
])
s = np.sort(m_sample, axis=1)
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]
It gives:
[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
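If the per-row counts are needed as well, a plain loop over the rows is probably the simplest option, since each row can have a different number of unique values. This is only a sketch (the name per_row is made up for illustration) and it is not vectorized, so it may be slow for matrices with very many rows:

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# One (values, counts) pair per row; the results are ragged, so they are
# kept in a list rather than forced into a single array.
per_row = [np.unique(row, return_counts=True) for row in m_sample]

for values, counts in per_row:
    print(values, counts)
```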
I have a multidimension array that looks like this:
my_array = np.arange(2)[:,None,None] *np.arange(4)[:, None]*np.arange(8)
I am looking for a multidimensional equivalent of the 2-D argmax
In particular, I am looking for the argmax of the maxima along axis = 2. I tried reshaping first, but reshaping loses the original index information of the array, so it probably won't work. I don't know how to do this and would appreciate any help. Thank you in advance.
EDIT: Desired output is:
[(0,0,0),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5),(1,3,6),(1,3,7)]
This is exactly the array of the indices of the maxima along axis = 2.
For finding such argmax indices along the last axis of a 3D ndarray, we can use something along these lines -
In [66]: idx = my_array.reshape(-1,my_array.shape[-1]).argmax(0)
In [67]: r,c = np.unravel_index(idx,my_array.shape[:-1])
In [68]: l = np.arange(len(idx))
In [69]: np.c_[r,c,l]
Out[69]:
array([[0, 0, 0],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7]])
To extend this to a generic ndarray -
In [99]: R = np.unravel_index(idx,my_array.shape[:-1])
In [104]: np.hstack((np.c_[R],l[:,None]))
Out[104]:
array([[0, 0, 0],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[1, 3, 4],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7]])
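The steps above can be collected into one self-contained function. This is a sketch rather than a library routine; the function name argmax_per_last_index is made up for illustration:

```python
import numpy as np

def argmax_per_last_index(a):
    """For each position k on the last axis, return the full index of the
    maximum of a[..., k] taken over all leading axes (first hit on ties)."""
    # Collapse all leading axes into one, keeping the last axis intact.
    flat = a.reshape(-1, a.shape[-1])
    # Flat position of the maximum within the collapsed leading axes.
    idx = flat.argmax(axis=0)
    # Convert the flat positions back to indices over the leading axes.
    lead = np.unravel_index(idx, a.shape[:-1])
    last = np.arange(a.shape[-1])
    # One row per position on the last axis: (i0, i1, ..., k).
    return np.column_stack(lead + (last,))

my_array = np.arange(2)[:, None, None] * np.arange(4)[:, None] * np.arange(8)
result = argmax_per_last_index(my_array)
print(result)
```

Because only the leading axes are collapsed, the reshape does not lose the index information: np.unravel_index recovers it from the flat position.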
I am not able to understand integer array indexing in numpy.
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
Could someone please explain what is happening here?
x[[0,1,2],[0,1,0]]
[0,1,2] <- here you specify which arrays you will be using
[0,1,0] <- here you choose elements from each of specified arrays
So you get element 0 from array 0, element 1 from array 1, and so on.
In [76]: x = np.array([[1, 2], [3, 4], [5, 6]])
In [77]: x
Out[77]:
array([[1, 2],
[3, 4],
[5, 6]])
Because the 1st and 2nd indexing lists match in size, their values are paired up to select elements from x. I'll illustrate it with list indexing:
In [78]: x[[0, 1, 2], [0, 1, 0]]
Out[78]: array([1, 4, 5])
In [79]: list(zip([0, 1, 2], [0, 1, 0]))
Out[79]: [(0, 0), (1, 1), (2, 0)]
In [80]: [x[i,j] for i,j in zip([0, 1, 2], [0, 1, 0])]
Out[80]: [1, 4, 5]
Or more explicitly, it is returning x[0,0], x[1,1] and x[2,0], as a 1d array. Another way to think of it is that you've picked the [0,1,0] elements from the 3 rows (respectively).
I find it easiest to understand as follows:
In [179]: x = np.array([[1, 2], [3, 4], [5, 6]])
In [180]: x
Out[180]:
array([[1, 2],
[3, 4],
[5, 6]])
Say we want to select 1, 4, and 5 from this matrix. That is, column 0 of row 0, column 1 of row 1, and column 0 of row 2. Now index with two arrays (one for each dimension of the matrix), where we populate these arrays with the rows and then the columns we are interested in:
In [181]: rows = np.array([0, 1, 2])
In [182]: cols = np.array([0, 1, 0])
In [183]: x[rows, cols]
Out[183]: array([1, 4, 5])
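The pairing can be checked against an explicit loop, and the same rule extends to index arrays of different shapes, which are broadcast against each other first. A short sketch (the names paired, looped, first_col and grid are only for illustration):

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 0])

# Index arrays of equal length are paired element by element.
paired = x[rows, cols]                      # x[0,0], x[1,1], x[2,0]
looped = np.array([x[i, j] for i, j in zip(rows, cols)])

# A scalar index is broadcast against the other index array.
first_col = x[rows, 0]                      # x[0,0], x[1,0], x[2,0]

# A (3, 1) row index and a (2,) column index broadcast to shape (3, 2):
# every selected row is combined with every selected column.
grid = x[rows[:, None], np.array([1, 0])]

print(paired, looped, first_col)
print(grid)
```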
I have a matrix with dimensions (2,5) and a vector of values to fill that matrix with. What is the best way to do this? I can think of three methods, but I have trouble using np.empty with fill, and np.full, without loops:
x=np.array(range(0,10))
mat=x.reshape(2,5)
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
mat=np.empty((2,5))
newMat=mat.fill(x) # Error: The x has to be scalar
mat=np.full((2,5),x) # Error: The x has to be scalar
full and fill are for setting all elements to the same value:
In [557]: np.full((2,5),10)
Out[557]:
array([[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10]])
Assigning an array works provided the shapes match (in the broadcasting sense). Here arr is assumed to be an existing (2,5) integer array:
In [558]: arr[...] = x.reshape(2,5) # make source the same shape as target
In [559]: arr
Out[559]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [560]: arr.flat = x # make target same shape as source
In [561]: arr
Out[561]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
arr.flat and arr.ravel() are equivalent. Well, not quite:
In [562]: arr.flat = x.reshape(2,5) # don't need the [:] with flat #wim
In [563]: arr
Out[563]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
In [564]: arr.ravel()[:] = x.reshape(2,5)
ValueError: could not broadcast input array from shape (2,5) into shape (10)
In [565]: arr.ravel()[:] = x.reshape(2,5).flat
flat works with any shape source, even ones that require replication
In [570]: arr.flat = [1,2,3]
In [571]: arr
Out[571]:
array([[1, 2, 3, 1, 2],
[3, 1, 2, 3, 1]])
More broadcasted inputs
In [572]: arr[...] = np.ones((2,1))
In [573]: arr
Out[573]:
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
In [574]: arr[...] = np.arange(5)
In [575]: arr
Out[575]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
An example of the problem Eric mentioned. The ravel (or other reshape) of a transpose is (often) a copy. So writing to that does not modify the original.
In [578]: arr.T.ravel()[:]=10
In [579]: arr
Out[579]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [580]: arr.T.flat=10
In [581]: arr
Out[581]:
array([[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10]])
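Whether ravel returned a view or a copy can be checked with np.shares_memory. The sketch below, which assumes a fresh C-contiguous (2,5) array, reproduces the difference between writing through ravel and through flat on a transpose:

```python
import numpy as np

arr = np.zeros((2, 5), dtype=int)

# For a contiguous array, ravel can return a view of the same memory.
print(np.shares_memory(arr, arr.ravel()))

# The transpose is not C-contiguous, so ravel has to make a copy.
print(np.shares_memory(arr, arr.T.ravel()))

arr.T.ravel()[:] = 10    # writes into the temporary copy only
unchanged = arr.copy()   # snapshot: still all zeros

arr.T.flat = 10          # flat iterates over arr's own memory
print(unchanged)
print(arr)
```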
ndarray.flat returns an object which can modify the contents of the array by direct assignment:
>>> array = np.empty((2,5), dtype=int)
>>> vals = range(10)
>>> array.flat = vals
>>> array
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
If that seems kind of magical to you, then read about the descriptor protocol.
Warning: assigning to flat does not raise exceptions for size mismatch. If there are not enough values on the right hand side of the assignment, the data will be rolled/repeated. If there are too many values, only the first few will be used.
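The warning can be illustrated with a short sketch that compares flat assignment against ordinary assignment, which does check shapes. The variable names are only for illustration:

```python
import numpy as np

arr = np.zeros((2, 5), dtype=int)

# Too few values: they are repeated until the array is full.
arr.flat = [1, 2, 3]
short = arr.copy()

# Too many values: only the first arr.size of them are used.
arr.flat = range(12)
long = arr.copy()

# Ordinary assignment refuses a source that cannot be broadcast.
try:
    arr[...] = np.arange(12)
    raised = False
except ValueError:
    raised = True

print(short)
print(long)
print(raised)
```

If a silent size mismatch would hide a bug, checking len(vals) == arr.size before assigning to flat, or using arr[...] = np.reshape(vals, arr.shape), is the safer choice.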
If you want a 10x2 matrix of 5:
np.ones((10,2))*5
If you have a list of values and just want them in a particular shape:
datavalues = [1,2,3,4,5,6,7,8,9,10]
np.reshape(datavalues,(2,5))
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
I want to apply a list (or array) of indices to a numpy array and get a result with the shape (len(indices), shape of one indexing operation).
Is there any way to use a list of indices directly, without using a for loop, as I did in the minimal example shown below?
c = np.random.randint(0, 5, size=(4, 5))
indices = [[0, slice(0, 4)], [1, slice(0, 4)], [1, slice(0, 4)], [2, slice(0, 4)]]
# desired result using a for loop
res = []
for idx in indices:
    res.append(c[idx])
Note that this indices list is not representative of my real problem; it only serves as an example. In general it is generated at runtime.
However, each index operation returns the same shape.
It seems that you are basically slicing the first 2 rows and first 4 columns of the 2D input array and then splitting the result into rows. You can do the slicing with c[:2,:4] and then split the rows with np.vsplit to get a one-liner solution like so -
res_out = np.vsplit(c[:2,:4],2)
Sample run -
In [10]: c
Out[10]:
array([[0, 2, 5, 1, 0],
[1, 5, 5, 0, 3],
[0, 1, 0, 6, 6],
[2, 6, 2, 3, 3]])
In [11]: indices
Out[11]: [[0, slice(0, 4, None)], [1, slice(0, 4, None)]]
In [12]: # desired result using a for loop
...: res = []
...: for idx in indices:
...: res.append(c[idx])
...:
In [13]: res
Out[13]: [array([0, 2, 5, 1]), array([1, 5, 5, 0])]
In [14]: np.vsplit(c[:2,:4],2)
Out[14]: [array([[0, 2, 5, 1]]), array([[1, 5, 5, 0]])]
Please note that the output from np.vsplit would be a list of 2D arrays, rather than a list of 1D arrays as with the posted code in the question.
Your example can be rewritten as a list comprehension:
In [121]: [c[idx] for idx in indices]
Out[121]:
[array([4, 2, 1, 2]),
array([3, 2, 2, 3]),
array([3, 2, 2, 3]),
array([0, 3, 4, 4])]
which can be turned into a nice 2d array:
In [122]: np.array([c[idx] for idx in indices])
Out[122]:
array([[4, 2, 1, 2],
[3, 2, 2, 3],
[3, 2, 2, 3],
[0, 3, 4, 4]])
Here np.array() is a form of concatenation, joining the arrays along a new axis.
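One caveat for recent NumPy releases: to my knowledge, a list that mixes integers and slices is no longer interpreted as a multidimensional index there, so c[idx] with idx = [0, slice(0, 4)] can raise an IndexError. Converting each index to a tuple works in old and new versions alike. A minimal sketch, using a deterministic stand-in for the random c from the question:

```python
import numpy as np

# Deterministic stand-in for the random array c in the question.
c = np.arange(20).reshape(4, 5)
indices = [[0, slice(0, 4)], [1, slice(0, 4)], [1, slice(0, 4)], [2, slice(0, 4)]]

# tuple(idx) makes the index equivalent to writing c[0, 0:4] and so on.
res = np.array([c[tuple(idx)] for idx in indices])

print(res.shape)
print(res)
```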
Since the 2nd index is the same for all rows (slice(4)), this indexing also works:
In [123]: c[[0,1,1,2],slice(4)] # or [...,:4]
Out[123]:
array([[4, 2, 1, 2],
[3, 2, 2, 3],
[3, 2, 2, 3],
[0, 3, 4, 4]])
Repetition on the 1st axis is not a problem. Differing slices in the 2nd take some more manipulation. Except for this special :4 case, you will have to turn the slices into ranges. There's no way of indexing one dimension with multiple slices.
The case where the slices all have same length, but different 'start' values, is similar to the one discussed in https://stackoverflow.com/a/28007256/901925 access-multiple-elements-of-an-array.
In [135]: c.flat[[i*c.shape[1]+np.arange(j.start,j.stop) for i,j in indices]]
Out[135]:
array([[4, 2, 1, 2],
[3, 2, 2, 3],
[3, 2, 2, 3],
[0, 3, 4, 4]])
The indices that I generate this way are:
In [136]: [i*c.shape[1]+np.arange(j.start,j.stop) for i,j in indices]
Out[136]:
[array([0, 1, 2, 3]),
array([5, 6, 7, 8]),
array([5, 6, 7, 8]),
array([10, 11, 12, 13])]
It works fine if indices is somewhat irregular: indices1 = [[0, slice(0, 3)], [1, slice(2, 5)], [1, slice(1, 4)], [2, slice(0, 3)]]
My earlier answer looks at some other ways of indexing. But often indexing on a flattened array is fastest, even if you take into account the calculation required to generate the index array.
If the slices vary in length, then you are stuck with generating a list of arrays, or an hstack of such a list:
In [158]: indices2 = [[0, slice(0, 2)], [1, slice(2, 5)],
[1, slice(0, 4)], [2, slice(0, 5)]]
In [159]: c.flat[np.hstack([i*c.shape[1]+np.arange(j.start,j.stop)
for i,j in indices2])]
Out[159]: array([4, 2, 2, 3, 1, 3, 2, 2, 3, 0, 3, 4, 4, 3])
In [160]: [c.flat[i*c.shape[1]+np.arange(j.start,j.stop)] for i,j in indices2]
Out[160]: [array([4, 2]), array([2, 3, 1]), array([3, 2, 2, 3]),
array([0, 3, 4, 4, 3])]
In [161]: np.hstack(_)
Out[161]: array([4, 2, 2, 3, 1, 3, 2, 2, 3, 0, 3, 4, 4, 3])
More on the varying, but equal-length, slices:
In [190]: indices1 = [[0, slice(0, 3)], [1, slice(2, 5)], [1, slice(1, 4)], [2, slice(0, 3)]]
In [191]: c.flat[[i*c.shape[1]+np.arange(j.start,j.stop) for i,j in indices1]]
Out[191]:
array([[4, 2, 1],
[2, 3, 1],
[2, 2, 3],
[0, 3, 4]])
In [193]: rows = [[i] for i,j in indices1]
In [200]: cols=[np.arange(j.start,j.stop) for i,j in indices1]
In [201]: c[rows,cols]
Out[201]:
array([[4, 2, 1],
[2, 3, 1],
[2, 2, 3],
[0, 3, 4]])
In this case rows is a column-shaped (vertical) list, so it can be broadcast against cols.
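Both approaches for equal-length slices can be compared on a deterministic array. This is a sketch under the assumption that every slice has explicit start and stop values and the same length; c is a stand-in for the random array above, and c.ravel() is used here instead of c.flat for the 2-D index:

```python
import numpy as np

# Deterministic stand-in: c[i, j] == 5*i + j, so results are easy to check.
c = np.arange(20).reshape(4, 5)
indices1 = [[0, slice(0, 3)], [1, slice(2, 5)], [1, slice(1, 4)], [2, slice(0, 3)]]

# Row indices as a (4, 1) column, column indices as a (4, 3) block.
rows = np.array([i for i, _ in indices1])[:, None]
cols = np.array([np.arange(s.start, s.stop) for _, s in indices1])

# Broadcasting pairs each row index with its own range of columns.
by_broadcast = c[rows, cols]

# Equivalent flat indexing: position = row * number_of_columns + column.
flat_idx = rows * c.shape[1] + cols
by_flat = c.ravel()[flat_idx]

print(by_broadcast)
print(np.array_equal(by_broadcast, by_flat))
```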