Given indexes for each row, how to return the corresponding elements in a 2-d matrix?
For instance, given the array np.array([[1,2,3,4],[4,5,6,7]]), I expect the output [[1,2],[4,5]] for indxs = np.array([[0,1],[0,1]]). Below is what I've tried:
import numpy as np
a = np.array([[1,2,3,4],[4,5,6,7]])
indxs = np.array([[0,1],[0,1]]) # means: return the elements located at 0 and 1 for each row
# I tried this, but it returns an array with shape (2, 2, 4)
a[indxs]
You are getting your array twice because a[[0,1]] selects rows 0 and 1 of your array a, which together are your entire array.
In[]: a[[0,1]]
Out[]: array([[1, 2, 3, 4],
[4, 5, 6, 7]])
You can get the desired output using slices. That would be the easiest way.
a = np.array([[1,2,3,4],[4,5,6,7]])
a[:,0:2]
Out []: array([[1, 2],
[4, 5]])
If you are still interested in indexing, you could also get your output with:
In[]: [list(a[[0],[0,1]]),list(a[[1],[0,1]])]
Out[]: [[1, 2], [4, 5]]
The NumPy documentation gives you a really nice overview on how indexes work.
In [120]: indxs = np.array([[0,1],[0,1]])
In [121]: a = np.array([[1,2,3,4],[4,5,6,7]])
You need to provide an index for the first dimension, one that broadcasts with indxs.
In [122]: a[np.arange(2)[:,None], indxs]
Out[122]:
array([[1, 2],
[4, 5]])
indxs has shape (2,n), so you need a (2,1) row-index array to get a (2,n) result.
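As a minimal sketch of the broadcasting idea, the snippet below builds the (2,1) row index explicitly and compares it with np.take_along_axis. The latter is not mentioned in the answers above; it is offered here as an assumed alternative that requires NumPy 1.15 or later.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
indxs = np.array([[0, 1], [0, 1]])

# Row indices shaped (2, 1) broadcast against the (2, 2) column indices,
# so element [i, j] of the result is a[i, indxs[i, j]].
rows = np.arange(a.shape[0])[:, None]
picked = a[rows, indxs]

# Assumed alternative: take_along_axis pairs each row with its own indices.
picked_alt = np.take_along_axis(a, indxs, axis=1)

print(picked)
print(picked_alt)
```

Both variants keep working when each row asks for different columns, which plain slicing such as a[:, 0:2] cannot express.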
I have arrays
A = np.array([
[1,6,5],
[2,3,4],
[8,3,0]
])
B = np.array([
[3,2,1],
[9,2,8],
[2,1,8]
])
Doing a = np.argwhere(A > 4) gives me an array of position/indexes of values in array A that are greater than 4 i.e. [[0 1], [0 2], [2 0]].
I need to use these positions from a = np.argwhere(A > 4) to set the values of array B at those specific positions to zero, i.e. array B should now be
B = np.array([
[3,0,0],
[9,2,8],
[0,1,8]
])
I am big time stuck any help with this will be really appreciated.
Thank You :)
It should be as simple as:
B[A > 4] = 0
In general, though, note that the indices returned by np.where are meant to be applied to numpy.ndarray objects, so you could have done:
B[np.where(A > 4)] = 0
Generally I don't use np.where with a condition like this; I just use the boolean mask directly, as in John Zwinck's answer. But it is probably important to understand that you could:
>>> B[np.where(A > 4)] = 0
>>> B
array([[3, 0, 0],
[9, 2, 8],
[0, 1, 8]])
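If the positions really do come from np.argwhere, as in the question, they can still be used for assignment. The sketch below is one possible way: it transposes the (n, 2) result into a tuple of row and column arrays. The names pairs, B_from_pairs and B_from_mask are illustrative only.

```python
import numpy as np

A = np.array([[1, 6, 5], [2, 3, 4], [8, 3, 0]])
B = np.array([[3, 2, 1], [9, 2, 8], [2, 1, 8]])

# argwhere returns one (row, col) pair per match, shape (n_matches, 2).
pairs = np.argwhere(A > 4)

# Transposing yields one array of rows and one of columns; as a tuple
# they index B the same way np.where(A > 4) would.
B_from_pairs = B.copy()
B_from_pairs[tuple(pairs.T)] = 0

# The boolean mask gives the same result without the intermediate step.
B_from_mask = B.copy()
B_from_mask[A > 4] = 0

print(B_from_pairs)
```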
I have a numpy array which stores a set of indices I need to access another numpy array.
I tried to use a for loop but it doesn't work as I expected.
The situation is like this:
>>> a
array([[1, 2],
[3, 4]])
>>> c
array([[0, 0],
[0, 1]])
>>> a[c[0]]
array([[1, 2],
[1, 2]])
>>> a[0,0] # the result I want
1
Above is a simplified version of my actual code, where the c array is much larger so I have to use a for loop to get every index.
Convert it to a tuple:
>>> a[tuple(c[0])]
1
This works because list and array indices trigger advanced indexing, while tuples are (mostly) treated as basic indexing.
Index a with the columns of c, passing the first column as the row indices and the second as the column indices:
In [23]: a[c[:,0], c[:,1]]
Out[23]: array([1, 2])
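To put the two answers side by side, here is a small sketch that fetches the same elements once with a loop over tuples and once with a single vectorized lookup. The names looped and vectorized are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
c = np.array([[0, 0], [0, 1]])

# One element per iteration: a tuple is read as (row, col).
looped = [a[tuple(pair)] for pair in c]

# All elements at once: rows from column 0 of c, columns from column 1.
vectorized = a[c[:, 0], c[:, 1]]

print(looped)
print(vectorized)
```

For a large c the vectorized form avoids the Python-level loop entirely.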
Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array (a "row vector" of shape (3,)). If you just want to loop over it, that's fine, but if you want to hstack it with some other array of dimension 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector of shape (3, 1), so that you can do a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
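The shape difference is easiest to see by printing it. The sketch below compares a few ways of selecting column 0; the slice and newaxis variants are not in the answer above and are shown as assumed alternatives.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

flat = test[:, 0]                         # integer index drops the axis: shape (3,)
col_list = test[:, [0]]                   # list index keeps the axis: shape (3, 1), a copy
col_slice = test[:, 0:1]                  # slice keeps the axis: shape (3, 1), a view
col_newaxis = test[:, 0][:, np.newaxis]   # re-add the dropped axis: shape (3, 1)

print(flat.shape, col_list.shape, col_slice.shape, col_newaxis.shape)

# Any of the (3, 1) variants can be stacked next to a 3xN array.
stacked = np.hstack((test, col_slice))
print(stacked)
```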
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in the column at index 1 (the second column) of the array
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it in the form of a "row vector" (an array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect arr), the number of bytes to step between consecutive elements when traversing each of them is different. The values below correspond to 4-byte integers (e.g. int32):
arr_col1_view.strides[0] # 8 bytes
arr_col1_copy.strides[0] # 4 bytes
see strides and this answer.
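A self-contained sketch of the view/copy checks follows. The dtype is pinned to int32 here as an assumption, so that the stride values do not depend on the platform's default integer size; np.shares_memory is used as an additional check that is not part of the original answer.

```python
import numpy as np

# dtype fixed to int32 (an assumption) so strides are platform-independent.
arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

col_view = arr[:, 1]          # basic slicing: shares memory with arr
col_copy = arr[:, 1].copy()   # independent, contiguous buffer

print(col_view.base is arr)                  # True
print(np.shares_memory(arr, col_copy))       # False
print(col_view.strides, col_copy.strides)    # (8,) (4,)

# Writing through the view changes arr; the copy is unaffected.
col_view[0] = 99
print(arr[0, 1], col_copy[0])                # 99 2
```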
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of column 1, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different stride lengths mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, that is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and are OK with sacrificing memory for speed, then making a copy is the way to go.
If we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally, let me note that transposing an array and using row-slicing is the same as using column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array. For the row-major A:
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
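The stride relationships described above can be checked on a smaller array without any timing. This is only a sketch: the array size, the int32 dtype, and the names A_c and A_f are assumptions chosen for illustration.

```python
import numpy as np

A_c = np.zeros((1000, 500), dtype=np.int32)   # row-major ('C'), the default
A_f = np.asfortranarray(A_c)                  # column-major ('F') copy

print(A_c.strides, A_f.strides)   # (2000, 4) (4, 4000)

# A column of the C-ordered array jumps a whole row between elements...
print(A_c[:, 1].strides)          # (2000,)
# ...while a column of the F-ordered array is contiguous.
print(A_f[:, 1].strides)          # (4,)

# Transposing only swaps shape and strides, so a row of A_c.T
# has the same stride as a column of A_c.
print(A_c.T[1, :].strides)        # (2000,)
print(np.shares_memory(A_c, A_c.T))
```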
To get several independent columns, just use:
>>> test[:,[0,2]]
and you will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5L
Then you can select the 2nd through 4th columns this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is not really multidimensional; it is a 2-dimensional array, and you can slice out whichever columns you wish.
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # replace a and b with the start and stop column indices
I have a 2x2 numpy array :
x = array(([[1,2],[4,5]]))
which I must merge (or stack, if you wish) with a one-dimensional array :
y = array(([3,6]))
by adding it to the end of the rows, thus making a 2x3 numpy array that would output like so :
array([[1, 2, 3],
[4, 5, 6]])
now the proposed method for this in the numpy guides is :
hstack((x,y))
however this doesn't work, returning the following error :
ValueError: arrays must have same number of dimensions
The only workaround possible seems to be to do this :
hstack((x, array(([y])).T ))
which works, but looks and sounds rather hackish. It seems there is no other way to transpose the given array so that hstack is able to digest it. I was wondering, is there a cleaner way to do this? Wouldn't there be a way for numpy to guess what I wanted to do?
unutbu's answer works in general, but in this case there is also np.column_stack
>>> x
array([[1, 2],
[4, 5]])
>>> y
array([3, 6])
>>> np.column_stack((x,y))
array([[1, 2, 3],
[4, 5, 6]])
Also works:
In [22]: np.append(x, y[:, np.newaxis], axis=1)
Out[22]:
array([[1, 2, 3],
[4, 5, 6]])
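For comparison, here is a sketch of a few equivalent ways to append the 1-D array as a new column. The np.c_ variant is not mentioned in the answers above and is included as an assumed alternative; the result names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [4, 5]])
y = np.array([3, 6])

# column_stack turns 1-D inputs into columns before stacking.
via_column_stack = np.column_stack((x, y))

# hstack works once y is reshaped to (2, 1).
via_hstack = np.hstack((x, y[:, np.newaxis]))

# Assumed alternative: np.c_ also treats 1-D arrays as columns.
via_c = np.c_[x, y]

print(via_column_stack)
```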