In Python numpy, when declaring matrices I use the np.array([[row 1], [row 2], . . . [row n]]) form. This declares a matrix row-wise. Is there any facility in Python to declare a matrix column-wise? I would expect something like np.array([[col 1], [col 2], . . . [col n]], parameter = 'column-wise'), so that a matrix with n columns is produced.
I know this can be achieved by transposing. But is there a way for the input to np.array([...], parameter = '...') to be treated as rows or as columns depending on the parameter value I provide?
Note: np.array() is just used as a placeholder here. Any function with the desired facility will do.
In [65]: np.array([[1,2,3],[4,5,6]])
Out[65]:
array([[1, 2, 3],
[4, 5, 6]])
There's a whole family of concatenate functions that help you join arrays in various ways.
stack with default axis behaves much like np.array:
In [66]: np.stack([[1,2,3],[4,5,6]], axis=0)
Out[66]:
array([[1, 2, 3],
[4, 5, 6]])
np.vstack also does this.
But to make columns:
In [67]: np.stack([[1,2,3],[4,5,6]], axis=1)
Out[67]:
array([[1, 4],
[2, 5],
[3, 6]])
np.column_stack([[1,2,3],[4,5,6]]) does the same.
Transposing is also an option: np.array([[1,2,3],[4,5,6]]).T.
All these '*stack' functions end up using np.concatenate, so it's worth your time to learn to use it directly. You may need to add dimensions to the inputs.
[67] does (under the covers):
In [72]: np.concatenate((np.array([1,2,3])[:,None], np.array([4,5,6])[:,None]),axis=1)
Out[72]:
array([[1, 4],
[2, 5],
[3, 6]])
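As a quick check of the routes mentioned above, here is a minimal sketch (the variable names are my own) confirming that they all build the same column-wise array:

```python
import numpy as np

# Two sequences that we want to end up as the COLUMNS of the result.
cols = [[1, 2, 3], [4, 5, 6]]

via_stack = np.stack(cols, axis=1)
via_column_stack = np.column_stack(cols)
via_transpose = np.array(cols).T
# Give each 1-D input a trailing axis, then join along that axis.
via_concatenate = np.concatenate([np.array(c)[:, None] for c in cols], axis=1)

expected = np.array([[1, 4], [2, 5], [3, 6]])
all_agree = all(
    np.array_equal(result, expected)
    for result in (via_stack, via_column_stack, via_transpose, via_concatenate)
)
print(all_agree, via_stack.shape)
```

Which one to use is mostly a matter of taste; the transpose returns a view of the row-wise array, while the stacking functions build a new array.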
At the time of array creation itself, you could use numpy.transpose() instead of numpy.array(), because numpy.transpose() takes any "array-like" object as input:
my_array = np.transpose ([[1,2,3],[4,5,6]])
print (my_array)
Output:
[[1 4]
[2 5]
[3 6]]
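As far as I know NumPy has no such keyword on np.array, but the behaviour asked for in the question can be wrapped in a small helper. The sketch below is hypothetical: the name make_matrix and its by parameter are my own inventions, not part of NumPy, and it assumes 2-D input.

```python
import numpy as np

def make_matrix(sequences, by="rows"):
    """Hypothetical helper (not a NumPy function).

    Treat each inner sequence as a row (by="rows") or as a
    column (by="columns") of the resulting 2-D array.
    """
    arr = np.array(sequences)
    if by == "rows":
        return arr
    if by == "columns":
        return arr.T  # a transposed view of arr, no data is copied
    raise ValueError("by must be 'rows' or 'columns'")

row_wise = make_matrix([[1, 2, 3], [4, 5, 6]], by="rows")
col_wise = make_matrix([[1, 2, 3], [4, 5, 6]], by="columns")
print(row_wise.shape, col_wise.shape)
```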
I have an array: [1, 2, 3, 4, 5, 6]. I would like to use the numpy.reshape() function so that I end up with this array:
[[1, 4],
[2, 5],
[3, 6]
]
I'm not sure how to do this. I keep ending up with this, which is not what I want:
[[1, 2],
[3, 4],
[5, 6]
]
These do the same thing:
In [57]: np.reshape([1,2,3,4,5,6], (3,2), order='F')
Out[57]:
array([[1, 4],
[2, 5],
[3, 6]])
In [58]: np.reshape([1,2,3,4,5,6], (2,3)).T
Out[58]:
array([[1, 4],
[2, 5],
[3, 6]])
Normally values are 'read' across the rows in Python/numpy. This is called row-major or 'C' order. Reading down the columns is 'F' order, for FORTRAN, and is common in MATLAB, which has Fortran roots.
If you take the 'F' order, make a new copy and string it out, you'll get a different order:
In [59]: np.reshape([1,2,3,4,5,6], (3,2), order='F').copy().ravel()
Out[59]: array([1, 4, 2, 5, 3, 6])
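To make the difference between the two orders concrete, here is a small sketch (variable names are my own) that fills the same six values both ways and then reads them back out:

```python
import numpy as np

flat = np.arange(1, 7)                    # array([1, 2, 3, 4, 5, 6])

by_rows = flat.reshape(3, 2)              # default order='C': fill across each row
by_cols = flat.reshape(3, 2, order="F")   # order='F': fill down each column

# Reading back in the same order that was used to fill recovers the input;
# reading the 'F'-filled array in the default 'C' order interleaves the values.
read_f = by_cols.ravel(order="F")
read_c = by_cols.ravel()

print(by_rows.tolist())   # [[1, 2], [3, 4], [5, 6]]
print(by_cols.tolist())   # [[1, 4], [2, 5], [3, 6]]
print(read_f.tolist())    # [1, 2, 3, 4, 5, 6]
print(read_c.tolist())    # [1, 4, 2, 5, 3, 6]
```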
You can set the order in np.reshape; in your case you can use 'F'. See the docs for details.
>>> arr
array([1, 2, 3, 4, 5, 6])
>>> arr.reshape(-1, 2, order = 'F')
array([[1, 4],
[2, 5],
[3, 6]])
The reason that you are getting that particular result is that arrays are normally allocated in C order. That means that reshaping by itself is not sufficient. You have to tell numpy to change the order of the axes when it steps along the array. Any number of operations will allow you to do that:
Set the axis order to F. F is for Fortran, which, like MATLAB, conventionally uses column-major order:
a.reshape(3, 2, order='F')
Swap the axes after reshaping:
np.swapaxes(a.reshape(2, 3), 0, 1)
Transpose the result:
a.reshape(2, 3).T
Roll the second axis forward:
np.rollaxis(a.reshape(2, 3), 1)
Notice that all but the first case require you to reshape to the transpose.
You can even manually arrange the data
np.stack((a[:3], a[3:]), axis=1)
Note that this will make many unnecessary copies. If you want the data copied, just do
a.reshape(3, 2, order='F').copy()
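The following sketch (assuming a is the six-element array from the question) checks that the alternatives listed above agree, and uses np.shares_memory to show which results are views of a and which are fresh copies:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
target = a.reshape(3, 2, order="F")           # [[1, 4], [2, 5], [3, 6]]

swapped = np.swapaxes(a.reshape(2, 3), 0, 1)  # reshape to the transpose, then swap
transposed = a.reshape(2, 3).T                # same idea via .T
stacked = np.stack((a[:3], a[3:]), axis=1)    # manual arrangement

same_values = all(
    np.array_equal(result, target) for result in (swapped, transposed, stacked)
)
transposed_is_view = np.shares_memory(a, transposed)  # still backed by a's buffer
stacked_is_view = np.shares_memory(a, stacked)        # np.stack allocates new data

print(same_values, transposed_is_view, stacked_is_view)
```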
I see no fortran order in:
import numpy as np
In [143]: np.array([[1,2],[3,4]],order='F')
Out[143]:
array([[1, 2],
[3, 4]])
But in the following it works:
In [139]: np.reshape(np.arange(9),newshape=(3,3),order='F')
Out[139]:
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
So what am I doing wrong in the first one?
When you call numpy.array to create an array from an existing Python object, it will give you an object with whatever shape that the original Python object has. So,
np.array([[1,2],[3,4]], ...)
Will always give you,
np.array([[1, 2],
[3, 4]])
Which is exactly what you typed in, so it should not come as a surprise. Fortran order and C order do not describe the shape of the data, they describe the memory layout. When you print out an object, NumPy doesn't show you what the memory layout is, it only shows you the shape.
You can witness that the array truly is stored in Fortran order when you flatten it with the "K" order, which keeps the original order of the elements:
>>> a = np.array([[1,2],[3,4]], order="F")
>>> a.flatten(order="K")
array([1, 3, 2, 4])
This is what truly distinguishes Fortran from C order: the memory layout. Most NumPy functions do not force you to consider memory layout; instead, different layouts are handled transparently.
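Besides flattening with "K" order, the layout can also be inspected through the array's flags. A minimal sketch, with variable names of my own choosing:

```python
import numpy as np

c_arr = np.array([[1, 2], [3, 4]])             # default C (row-major) layout
f_arr = np.array([[1, 2], [3, 4]], order="F")  # same values, column-major layout

# The two arrays compare equal element by element...
same_values = np.array_equal(c_arr, f_arr)

# ...but they report different memory layouts.
c_is_c = c_arr.flags["C_CONTIGUOUS"]
f_is_f = f_arr.flags["F_CONTIGUOUS"]
f_is_c = f_arr.flags["C_CONTIGUOUS"]

print(same_values, c_is_c, f_is_f, f_is_c)     # True True True False
print(c_arr.flatten(order="K").tolist())       # [1, 2, 3, 4]
print(f_arr.flatten(order="K").tolist())       # [1, 3, 2, 4]
```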
It sounds like what you want is to transpose, reversing the axis order. This can be done simply:
>>> b = np.transpose(a)
>>> b
array([[1, 3],
[2, 4]])
This does not create a new array, but a new view of the same array:
>>> b.base is a
True
If you want the data to have the memory layout 1 2 3 4 and have a Fortran order view of that [[1, 3], [2, 4]], the efficient way to do this is to store the existing array with C order and then transpose it, which results in a Fortran-order array with the desired contents and requires no extra copies.
>>> a = np.array([[1, 2], [3, 4]]).transpose()
>>> a.flatten(order="K")
array([1, 2, 3, 4])
>>> a
array([[1, 3],
[2, 4]])
If you store the original with Fortran order, the transposition will result in C order, so you don't want that (or maybe all you care about is the transposition, and memory order is not important?). In either case, the array will look the same in NumPy.
>>> a = np.array([[1, 2], [3, 4]], order="F").transpose()
>>> a.flatten(order="K")
array([1, 3, 2, 4])
>>> a
array([[1, 3],
[2, 4]])
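The point that transposing flips the layout without copying can be checked directly. This is only a sketch; the names are mine:

```python
import numpy as np

c_arr = np.array([[1, 2], [3, 4]])  # C-contiguous original
t = c_arr.T                         # transposed view

shares = np.shares_memory(c_arr, t)                # same underlying buffer
t_is_fortran = t.flags["F_CONTIGUOUS"]             # the view is in Fortran order
strides_reversed = t.strides == c_arr.strides[::-1]

# The round trip: transposing a Fortran-ordered array gives a C-ordered view.
back_is_c = np.array([[1, 2], [3, 4]], order="F").T.flags["C_CONTIGUOUS"]

print(shares, t_is_fortran, strides_reversed, back_is_c)
```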
Your two means of constructing the 2D array are not at all equivalent. In the first, you specified the structure of the array. In the second, you formed an array and then reshaped to your liking.
>>> np.reshape([1,2,3,4],newshape=(2,2),order='F')
array([[1, 3],
[2, 4]])
Again, for comparison, even if you ask for the reshape and format change to FORTRAN, you'll get your specified structure:
>>> np.reshape([[1,2],[3,4]],newshape=(2,2),order='F')
array([[1, 2],
[3, 4]])
Let a be a list in python.
a = [1,2,3]
When matrix transpose is applied to a, we get:
np.matrix(a).transpose()
matrix([[1],
[2],
[3]])
I am looking to generalize this functionality and will next illustrate what I am looking to do with the help of an example. Let b be another list.
b = [[1, 2], [2, 3], [3, 4]]
In a, the list items are 1, 2, and 3. I would like to consider each of [1,2], [2,3], and [3,4] as list items in b, only for the purpose of performing a transpose. I would like the output to be as follows:
array([[[1,2]],
[[2,3]],
[[3,4]]])
In general, I would like to be able to specify what a list item would look like, and perform a matrix transpose based on that.
I could just write a few lines of code to do the above, but my purpose of asking this question is to find out if there is an inbuilt numpy functionality or a pythonic way, to do this.
EDIT: unutbu's output below matches the output that I have above. However, I wanted a solution that would work for a more general case. I have posted another input/output below. My initial example wasn't descriptive enough to convey what I wanted to say. Let items in b be [1,2], [2,3], [3,4], and [5,6]. Then the output given below would be of doing a matrix transpose on higher dimension elements. More generally, once I describe what an 'item' would look like, I would like to know if there is a way to do something like a transpose.
Input: b = [[[1, 2], [2, 3]], [[3, 4], [5,6]]]
Output: array([[[1,2], [3,4]],
[[2,3], [5,6]]])
Your desired array has shape (3,1,2). b has shape (3,2). To stick an extra axis in the middle, use b[:,None,:], or (equivalently) b[:, np.newaxis, :]. Look for "newaxis" in the section on Basic Slicing.
In [178]: b = np.array([[1, 2], [2, 3], [3, 4]])
In [179]: b
Out[179]:
array([[1, 2],
[2, 3],
[3, 4]])
In [202]: b[:,None,:]
Out[202]:
array([[[1, 2]],
[[2, 3]],
[[3, 4]]])
Another useful tool is np.swapaxes:
In [222]: b = np.array([[[1, 2], [2, 3]], [[3, 4], [5,6]]])
In [223]: b.swapaxes(0,1)
Out[223]:
array([[[1, 2],
[3, 4]],
[[2, 3],
[5, 6]]])
For this 3-D array, the transpose b.T (which reverses all the axes) is the same as swapping the first and last axes, b.swapaxes(0,-1):
In [226]: b.T
Out[226]:
array([[[1, 3],
[2, 5]],
[[2, 4],
[3, 6]]])
In [227]: b.swapaxes(0,-1)
Out[227]:
array([[[1, 3],
[2, 5]],
[[2, 4],
[3, 6]]])
Summary:
Use np.newaxis (or None) to add new axes. (Thus, increasing the dimension of the array)
Use np.swapaxes to swap any two axes.
Use np.transpose to permute all the axes at once. (Thanks to @jorgeca for pointing this out.)
Use np.rollaxis to "rotate" the axes.
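The effect of each tool in the summary is easiest to see on the shape of an array whose axes all have different lengths. A small sketch, using an arbitrary (2, 3, 4) array of my own:

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

added = b[:, np.newaxis]                # insert a new length-1 axis at position 1
swapped = b.swapaxes(0, 1)              # exchange axes 0 and 1
permuted = np.transpose(b, (2, 0, 1))   # explicit permutation of all axes
rolled = np.rollaxis(b, 2)              # move axis 2 to the front
reversed_axes = b.T                     # reverse the order of all axes

print(added.shape)          # (2, 1, 3, 4)
print(swapped.shape)        # (3, 2, 4)
print(permuted.shape)       # (4, 2, 3)
print(rolled.shape)         # (4, 2, 3)
print(reversed_axes.shape)  # (4, 3, 2)
```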
Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array of shape (3,). If you just want to loop over it, that's fine, but if you want to hstack it with some other array of dimension 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector of shape (3, 1), so that you can use it in a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
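The difference comes down to the shapes involved. Here is a short sketch (variable names are my own) comparing the indexing forms, including a length-1 slice, which also keeps the column two-dimensional:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]        # integer index drops the axis: shape (3,)
col_2d = test[:, [0]]      # list index keeps the axis: shape (3, 1)
col_slice = test[:, 0:1]   # length-1 slice also keeps it: shape (3, 1)

print(col_1d.shape, col_2d.shape, col_slice.shape)

# hstack accepts the 2-D column but rejects the 1-D one.
joined = np.hstack((test, col_2d))
hstack_failed = False
try:
    np.hstack((test, col_1d))
except ValueError:
    hstack_failed = True
print(joined.shape, hstack_failed)
```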
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in a single column of the array, here the column at index 1 (i.e. the second column):
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it in the form of a "row vector" (a 1-D array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect the arr), the number of byte-steps for traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes (assuming a 4-byte integer dtype such as int32)
arr_col1_copy.strides[0] # 4 bytes
see strides and this answer.
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of that column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different strides mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, that is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and are OK with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
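The stride figures depend on the dtype and the shape, so here is a sketch that pins both down explicitly. The array is smaller than the one timed above and is only meant to show the stride arithmetic, not to reproduce the timings:

```python
import numpy as np

# 1000 rows x 500 columns of 4-byte integers: one row occupies 500 * 4 = 2000 bytes.
A = np.zeros((1000, 500), dtype=np.int32)

view_stride = A[:, 1].strides[0]           # step over a whole row per element
copy_stride = A[:, 1].copy().strides[0]    # contiguous copy: one item per step
transposed_stride = A.T[1, :].strides[0]   # same data, same stride as the view

A_f = np.asfortranarray(A)                 # column-major copy of A
fortran_stride = A_f[:, 1].strides[0]      # columns are now contiguous

print(view_stride, copy_stride, transposed_stride, fortran_stride)
```

With these assumptions the values printed are 2000, 4, 2000 and 4.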
To get several independent columns, just use:
>>> test[:,[0,2]]
You will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is not really multidimensional; it is a 2-dimensional array, and you can slice out whichever columns you wish:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b
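One nuance when choosing between a slice and a list of column indices: as far as I know, a slice returns a view onto the original data, whereas indexing with a list returns a copy. A minimal sketch with an arbitrary 2x5 array:

```python
import numpy as np

test = np.arange(10).reshape(2, 5)

sliced = test[:, 1:4]     # columns 1..3, selected with a slice
picked = test[:, [0, 2]]  # columns 0 and 2, selected with a list of indices

sliced_is_view = np.shares_memory(test, sliced)
picked_is_view = np.shares_memory(test, picked)

# Writing through the slice changes the original array; writing to the
# list-indexed result does not.
sliced[0, 0] = 100
picked[0, 0] = -1

print(sliced_is_view, picked_is_view)
print(test[0].tolist())
```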
I have a 2x2 numpy array :
x = array(([[1,2],[4,5]]))
which I must merge (or stack, if you wish) with a one-dimensional array :
y = array(([3,6]))
by adding it to the end of the rows, thus making a 2x3 numpy array that would output like so :
array([[1, 2, 3],
[4, 5, 6]])
Now, the proposed method for this in the numpy guides is :
hstack((x,y))
however this doesn't work, returning the following error :
ValueError: arrays must have same number of dimensions
The only workaround possible seems to be to do this :
hstack((x, array(([y])).T ))
which works, but looks and sounds rather hackish. It seems there is no other way to transpose the given array so that hstack is able to digest it. I was wondering, is there a cleaner way to do this? Wouldn't there be a way for numpy to guess what I wanted to do?
unutbu's answer works in general, but in this case there is also np.column_stack
>>> x
array([[1, 2],
[4, 5]])
>>> y
array([3, 6])
>>> np.column_stack((x,y))
array([[1, 2, 3],
[4, 5, 6]])
Also works:
In [22]: np.append(x, y[:, np.newaxis], axis=1)
Out[22]:
array([[1, 2, 3],
[4, 5, 6]])
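For comparison, the sketch below (names are my own) puts the options side by side. The common idea is to give y a second axis so that it becomes a 2x1 column before joining; np.column_stack does that step for you:

```python
import numpy as np

x = np.array([[1, 2], [4, 5]])
y = np.array([3, 6])

with_hstack = np.hstack((x, y[:, np.newaxis]))                   # add the axis by hand
with_column_stack = np.column_stack((x, y))                      # 1-D inputs become columns
with_concatenate = np.concatenate((x, y.reshape(-1, 1)), axis=1)

expected = [[1, 2, 3], [4, 5, 6]]
print(with_hstack.tolist() == expected)
print(with_column_stack.tolist() == expected)
print(with_concatenate.tolist() == expected)
```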