How do I access the ith column of a NumPy multidimensional array? - python

Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?

To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.

>>> test[:,0]
array([1, 3, 5])
this command gives you a row vector, if you just want to loop over it, it's fine, but if you want to hstack with some other array with dimension 3xN, you will have
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector, so that you can do concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])

And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])

You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])

Although the question has been answered, let me mention some nuances.
Let's say you are interested in the first column of the array
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it in the form of "row vector" (array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of the 1st column of the arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of the 1st column of the arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect the arr), the number of byte-steps for traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes
arr_col1_copy.strides[0] # 4 bytes
see strides and this answer.
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of the first column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different number of strides mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, it is not always true for the reason that making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create the A_col1_copy). However if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are ok with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes

To get several and indepent columns, just:
> test[:,[0,2]]
you will get colums 0 and 2

>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5L
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])

This is not multidimensional. It is 2 dimensional array. where you want to access the columns you wish.
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b

Related

Given the indexes corresponding to each row, get the corresponding elements from a matrix

Given indexes for each row, how to return the corresponding elements in a 2-d matrix?
For instance, In array of np.array([[1,2,3,4],[4,5,6,7]]) I expect to see the output [[1,2],[4,5]] given indxs = np.array([[0,1],[0,1]]). Below is what I've tried:
a= np.array([[1,2,3,4],[4,5,6,7]])
indxs = np.array([[0,1],[0,1]]) #means return the elements located at 0 and 1 for each row
#I tried this, but it returns an array with shape (2, 2, 4)
a[idxs]
The reason you are getting two times your array is that when you do a[[0,1]] you are selecting the rows 0 and 1 from your array a, which are indeed your entire array.
In[]: a[[0,1]]
Out[]: array([[1, 2, 3, 4],
[4, 5, 6, 7]])
You can get the desired output using slides. That would be the easiest way.
a = np.array([[1,2,3,4],[4,5,6,7]])
a[:,0:2]
Out []: array([[1, 2],
[4, 5]])
In case you are still interested on indexing, you could also get your output doing:
In[]: [list(a[[0],[0,1]]),list(a[[1],[0,1]])]
Out[]: [[1, 2], [4, 5]]
The NumPy documentation gives you a really nice overview on how indexes work.
In [120]: indxs = np.array([[0,1],[0,1]])
In [121]: a= np.array([[1,2,3,4],[4,5,6,7]])
...: indxs = np.array([[0,1],[0,1]]) #
You need to provide an index for the first dimension, one that broadcasts with with indxs.
In [122]: a[np.arange(2)[:,None], indxs]
Out[122]:
array([[1, 2],
[4, 5]])
indxs is (2,n), so you need a (2,1) array to give a (2,n) result

Is there an `np.repeat` which acts on an existing array?

I have a large NumPy array which I want to fill with new data on each iteration of a loop. The array is filled with data repeated along axis 0, for example:
[[1, 5],
[1, 5],
[1, 5],
[1, 5]]
I know how to create this array from scratch in each iteration:
x = np.repeat([[1, 5]], 4, axis=0)
However, I don't want to create a new array every time, because it's a very large array (much larger than 4x2). Instead, I want to create the array in advance using the above code, and then just fill the array with new data on each iteration.
But np.repeat() returns a new array, rather than acting on an existing array. Is there an equivalent of np.repeat() for filling an existing array?
As we noted in comments, you can use a broadcasting assignment to fill your 2d array with a 1d array-like of the appropriate size:
x[...] = [1, 5]
If by any chance your large array always contains the same items in each row (i.e. you won't change these preset values later), you can almost certainly use broadcasting in later parts of your code and just work with an initial x such as
x = np.array([[1, 5]])
This array has shape (1, 2) which is broadcast-compatible with other arrays of shape (4, 2) you might have in the above example.
If you always need the same values in each row and for some reason you can't use broadcasting (both cases are highly unlikely), you can use broadcast_to to create an array with an explicit 2d shape without copying memory:
x_bc = np.broadcast_to([1, 5], (4, 2)) # broadcast 1d [1, 5] to shape (4, 2)
This might work because it has the right shape with only 2 unique elements in memory:
>>> x_bc
array([[1, 5],
[1, 5],
[1, 5],
[1, 5]])
>>> x_bc.strides
(0, 8)
However you can't mutate it, because it's a read-only view:
>>> x_bc[0, :] = [2, 4]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-35-ae12ecfe3c5e> in <module>
----> 1 x_bc[0, :] = [2, 4]
ValueError: assignment destination is read-only
So, if you only need the same values in each row and you can't use broadcasting and you want to mutate those same rows later, you can use stride tricks to map the same 1d data to a 2d array:
>>> x_in = np.array([1, 5])
... x_strided = np.lib.stride_tricks.as_strided(x_in, shape=(4,) + x_in.shape,
... strides=(0,) + x_in.strides[-1:])
>>> x_strided
array([[1, 5],
[1, 5],
[1, 5],
[1, 5]])
>>> x_strided[0, :] = [2, 4]
>>> x_strided
array([[2, 4],
[2, 4],
[2, 4],
[2, 4]])
Which gives you a 2d array of fixed shape that always contains one unique row, and mutating any of the rows mutates the rest (since the underlying data corresponds to only a single row). Handle with care, because if you ever want to have two different rows you'll have to do something else.

No fortran order in numpy.array

I see no fortran order in:
import numpy as np
In [143]: np.array([[1,2],[3,4]],order='F')
Out[143]:
array([[1, 2],
[3, 4]])
But in the following it works:
In [139]: np.reshape(np.arange(9),newshape=(3,3),order='F')
Out[139]:
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
So what am I doing wrong in the first one?
When you call numpy.array to create an array from an existing Python object, it will give you an object with whatever shape that the original Python object has. So,
np.array([[1,2],[3,4]], ...)
Will always give you,
np.array([[1, 2],
[3, 4]])
Which is exactly what you typed in, so it should not come as a surprise. Fortran order and C order do not describe the shape of the data, they describe the memory layout. When you print out an object, NumPy doesn't show you what the memory layout is, it only shows you the shape.
You can witness that the array truly is stored in Fortran order when you flatten it with the "K" order, which keeps the original order of the elements:
>>> a = np.array([[1,2],[3,4]], order="F")
>>> a.flatten(order="K")
array([1, 3, 2, 4])
This is what truly distinguishes Fortran from C order: the memory layout. Most NumPy functions do not force you to consider memory layout, instead, different layouts are handled transparently.
It sounds like what you want is to transpose, reversing the axis order. This can be done simply:
>>> b = numpy.transpose(a)
>>> b
array([[1, 3],
[2, 4]])
This does not create a new array, but a new view of the same array:
>>> b.base is a
True
If you want the data to have the memory layout 1 2 3 4 and have a Fortran order view of that [[1, 3], [2, 4]], the efficient way to do this is to store the existing array with C order and then transpose it, which results in a Fortran-order array with the desired contents and requires no extra copies.
>>> a = np.array([[1, 2], [3, 4]]).transpose()
>>> a.flatten(order="K")
array([1, 2, 3, 4])
>>> a
array([[1, 3],
[2, 4]])
If you store the original with Fortran order, the transposition will result in C order, so you don't want that (or maybe all you care about is the transposition, and memory order is not important?). In either case, the array will look the same in NumPy.
>>> a = np.array([[1, 2], [3, 4]], order="F").transpose()
>>> a.flatten(order="K")
array([1, 3, 2, 4])
>>> a
array([[1, 3],
[2, 4]])
Your two means of constructing the 2D array are not at all equivalent. In the first, you specified the structure of the array. In the second, you formed an array and then reshaped to your liking.
>>> np.reshape([1,2,3,4],newshape=(2,2),order='F')
array([[1, 3],
[2, 4]])
Again, for comparison, even if you ask for the reshape and format change to FORTRAN, you'll get your specified structure:
>>> np.reshape([[1,2],[3,4]],newshape=(2,2),order='F')
array([[1, 2],
[3, 4]])

substract element to row array in python [duplicate]

This question already has answers here:
numpy subtract every row of matrix by vector
(3 answers)
numpy subtract/add 1d array from 2d array
(2 answers)
Closed 5 years ago.
I have two numpy array a and b
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
b = np.array([1,2,3])
array([1, 2, 3])
I would like to substract to each row of a the correspondent element of b (ie. to the first row of a, the first element of b, etc)
so that c is
array([[0, 1, 2],
[2, 3, 4],
[4, 5, 6]])
Is there a python command to do this?
Is there a python command to do this?
Yes, the - operator.
In addition you need to make b into a column vector so that broadcasting can do the rest for you:
a - b[:, np.newaxis]
# array([[0, 1, 2],
# [2, 3, 4],
# [4, 5, 6]])
yup! You just need to make b a column vector first
a - b[:, np.newaxis]
Reshape b into a column vector, then subtract:
a - b.reshape(3, 1)
b isn't altered in place, but the result of the reshape method call will be the column vector:
array([[1],
[2],
[3]])
Allowing the "shape" of the subtraction you wanted. A little more general reshape operation would be:
b.reshape(b.size, 1)
Taking however many elements b has, and molding them into an N x 1 vector.
Update: A quick benchmark shows kazemakase's answer, using b[:, np.newaxis] as the reshaping strategy, to be ~7% faster. For small vectors, those few extra fractions of a µs won't matter. But for large vectors or inner loops, prefer his approach. It's a less-general reshape, but more performant for this use.

Numpy: Concatenating multidimensional and unidimensional arrays

I have a 2x2 numpy array :
x = array(([[1,2],[4,5]]))
which I must merge (or stack, if you wish) with a one-dimensional array :
y = array(([3,6]))
by adding it to the end of the rows, thus making a 2x3 numpy array that would output like so :
array([[1, 2, 3],
[4, 5, 6]])
now the proposed method for this in the numpy guides is :
hstack((x,y))
however this doesn't work, returning the following error :
ValueError: arrays must have same number of dimensions
The only workaround possible seems to be to do this :
hstack((x, array(([y])).T ))
which works, but looks and sounds rather hackish. It seems there is not other way to transpose the given array, so that hstack is able to digest it. I was wondering, is there a cleaner way to do this? Wouldn't there be a way for numpy to guess what I wanted to do?
unutbu's answer works in general, but in this case there is also np.column_stack
>>> x
array([[1, 2],
[4, 5]])
>>> y
array([3, 6])
>>> np.column_stack((x,y))
array([[1, 2, 3],
[4, 5, 6]])
Also works:
In [22]: np.append(x, y[:, np.newaxis], axis=1)
Out[22]:
array([[1, 2, 3],
[4, 5, 6]])

Categories