Numpy: Concatenating multidimensional and unidimensional arrays - python

I have a 2x2 numpy array :
x = array(([[1,2],[4,5]]))
which I must merge (or stack, if you wish) with a one-dimensional array :
y = array(([3,6]))
by adding it to the end of the rows, thus making a 2x3 numpy array that would output like so :
array([[1, 2, 3],
[4, 5, 6]])
now the proposed method for this in the numpy guides is :
hstack((x,y))
however this doesn't work, returning the following error :
ValueError: arrays must have same number of dimensions
The only workaround possible seems to be to do this :
hstack((x, array(([y])).T ))
which works, but looks and sounds rather hackish. It seems there is no other way to transpose the given array so that hstack is able to digest it. I was wondering, is there a cleaner way to do this? Wouldn't there be a way for numpy to guess what I wanted to do?

unutbu's answer works in general, but in this case there is also np.column_stack
>>> x
array([[1, 2],
[4, 5]])
>>> y
array([3, 6])
>>> np.column_stack((x,y))
array([[1, 2, 3],
[4, 5, 6]])

Also works:
In [22]: np.append(x, y[:, np.newaxis], axis=1)
Out[22]:
array([[1, 2, 3],
[4, 5, 6]])
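For comparison, here is a minimal sketch that puts the variants from this thread side by side and checks that they agree. The names expected, via_column_stack, via_hstack and via_concatenate are only illustrative; the reshape(-1, 1) spelling is an extra equivalent I am adding, not something from the answers above.

```python
import numpy as np

x = np.array([[1, 2], [4, 5]])
y = np.array([3, 6])

expected = np.array([[1, 2, 3], [4, 5, 6]])

# column_stack turns 1-D inputs into columns before joining
via_column_stack = np.column_stack((x, y))

# hstack works once y has an explicit second axis, i.e. shape (2, 1)
via_hstack = np.hstack((x, y[:, np.newaxis]))

# the same idea spelled with concatenate and reshape
via_concatenate = np.concatenate((x, y.reshape(-1, 1)), axis=1)

for result in (via_column_stack, via_hstack, via_concatenate):
    assert np.array_equal(result, expected)
```

In all three cases the real work is giving y a second axis; column_stack just does that step for you.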

Related

Does Python support declaring a matrix column-wise?

In Python numpy when declaring matrices I use np.array([[row 1], [row 2], . . . [row n]]) form. This is declaring a matrix row-wise. Is there any facility in Python to declare a matrix column-wise? I would expect something like - np.array([[col 1], [col 2], . . . [col n]], parameter = 'column-wise') so that a matrix with n columns is produced.
I know such a thing can be achieved via transposing. But is there a way for np.array([...], parameter = '...') being considered as a row or column based on the parameter value I provide?
***np.array() is just used as a dummy here. Any function with above desired facility will do.
In [65]: np.array([[1,2,3],[4,5,6]])
Out[65]:
array([[1, 2, 3],
[4, 5, 6]])
There's a whole family of concatenate functions, that help you join arrays in various ways.
stack with default axis behaves much like np.array:
In [66]: np.stack([[1,2,3],[4,5,6]], axis=0)
Out[66]:
array([[1, 2, 3],
[4, 5, 6]])
np.vstack also does this.
But to make columns:
In [67]: np.stack([[1,2,3],[4,5,6]], axis=1)
Out[67]:
array([[1, 4],
[2, 5],
[3, 6]])
np.column_stack([[1,2,3],[4,5,6]]) does the same.
transposing is also an option: np.array([[1,2,3],[4,5,6]]).T.
All these '*stack' functions end up using np.concatenate, so it's worth your time to learn to use it directly. You may need to add dimensions to the inputs.
[67] does (under the covers):
In [72]: np.concatenate((np.array([1,2,3])[:,None], np.array([4,5,6])[:,None]),axis=1)
Out[72]:
array([[1, 4],
[2, 5],
[3, 6]])
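To make the relationship between these functions concrete, here is a small sketch that builds the same column-wise matrix four ways and checks that the results match. The variable names (cols, as_columns and so on) are made up for the example.

```python
import numpy as np

cols = [[1, 2, 3], [4, 5, 6]]  # each inner list is meant to be one column

by_stack = np.stack(cols, axis=1)
by_column_stack = np.column_stack(cols)
by_transpose = np.array(cols).T

# concatenate needs 2-D inputs, so give each list a trailing axis first
as_columns = [np.asarray(c)[:, None] for c in cols]  # each has shape (3, 1)
by_concatenate = np.concatenate(as_columns, axis=1)

for result in (by_stack, by_column_stack, by_transpose):
    assert np.array_equal(result, by_concatenate)

print(by_concatenate)
```

The only step the convenience functions save you is the [:, None] that adds the missing dimension.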
At the time of array-creation itself, you could use numpy.transpose() instead of numpy.array(), because numpy.transpose() takes any "array-like" object as input:
my_array = np.transpose ([[1,2,3],[4,5,6]])
print (my_array)
Output:
[[1 4]
[2 5]
[3 6]]

How to index a numpy array of dimension N with a 1-dimensional array of shape (N,)

I would like to index an array of dimension N using an array of size (N,).
For example, let us consider a case where N is 2.
import numpy as np
foo = np.arange(9).reshape(3,3)
bar = np.array((2,1))
>>> foo
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> bar
array([2, 1])
>>> foo[bar[0], bar[1]]
7
This works fine. However, with this method, I would need to write N times bar[i], which is not a nice solution if N is high.
The following command does not give the result that I need:
>>> foo[bar]
array([[6, 7, 8],
       [3, 4, 5]])
What could I do to get the result that I want in a nice and concise way?
I think you can turn bar into tuple:
foo[tuple(bar)]
# 7
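The same trick carries over to higher N. A minimal sketch with N = 3, reusing the names foo and bar from the question but with a shape and values chosen only for illustration:

```python
import numpy as np

foo = np.arange(24).reshape(2, 3, 4)  # N = 3
bar = np.array([1, 2, 3])             # one index per axis

# An integer array used directly as the index selects along the first
# axis only; a tuple supplies one index per axis.
element = foo[tuple(bar)]

assert element == foo[1, 2, 3]
print(element)
```

Here tuple(bar) plays the role of writing out bar[0], bar[1], bar[2] by hand.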

NumPy using the reshape function to reshape an array [duplicate]

I have an array: [1, 2, 3, 4, 5, 6]. I would like to use the numpy.reshape() function so that I end up with this array:
[[1, 4],
[2, 5],
[3, 6]
]
I'm not sure how to do this. I keep ending up with this, which is not what I want:
[[1, 2],
[3, 4],
[5, 6]
]
These do the same thing:
In [57]: np.reshape([1,2,3,4,5,6], (3,2), order='F')
Out[57]:
array([[1, 4],
[2, 5],
[3, 6]])
In [58]: np.reshape([1,2,3,4,5,6], (2,3)).T
Out[58]:
array([[1, 4],
[2, 5],
[3, 6]])
Normally values are 'read' across the rows in Python/numpy. This is called row-major or 'C' order. Read down is 'F', for FORTRAN, and is common in MATLAB, which has Fortran roots.
If you take the 'F' order, make a new copy and string it out, you'll get a different order:
In [59]: np.reshape([1,2,3,4,5,6], (3,2), order='F').copy().ravel()
Out[59]: array([1, 4, 2, 5, 3, 6])
You can set the order in np.reshape, in your case you can use 'F'. See docs for details
>>> arr
array([1, 2, 3, 4, 5, 6])
>>> arr.reshape(-1, 2, order = 'F')
array([[1, 4],
[2, 5],
[3, 6]])
The reason that you are getting that particular result is that arrays are normally allocated in C order. That means that reshaping by itself is not sufficient. You have to tell numpy to change the order of the axes when it steps along the array. Any number of operations will allow you to do that:
Set the axis order to F. F is for Fortran, which, like MATLAB, conventionally uses column-major order:
a.reshape(3, 2, order='F')
Swap the axes after reshaping:
np.swapaxes(a.reshape(2, 3), 0, 1)
Transpose the result:
a.reshape(2, 3).T
Roll the second axis forward:
np.rollaxis(a.reshape(2, 3), 1)
Notice that all but the first case require you to reshape to the transpose.
You can even manually arrange the data
np.stack((a[:3], a[3:]), axis=1)
Note that this will make many unnecessary copies. If you want the data copied, just do
a.reshape(3, 2, order='F').copy()
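A quick way to convince yourself that these options agree is to run them against the target from the question. This is only a sketch: it assumes the order='F' case reshapes straight to the target shape (3, 2), while the others reshape to (2, 3) first, and it uses np.moveaxis in place of np.rollaxis (a substitution on my part, not something from the answer). The names target and candidates are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
target = np.array([[1, 4], [2, 5], [3, 6]])

candidates = {
    "order='F'": a.reshape(3, 2, order='F'),         # reshape directly to (3, 2)
    "transpose": a.reshape(2, 3).T,                  # reshape to (2, 3), then flip
    "swapaxes": np.swapaxes(a.reshape(2, 3), 0, 1),
    "moveaxis": np.moveaxis(a.reshape(2, 3), 1, 0),
    "stack": np.stack((a[:3], a[3:]), axis=1),       # manual arrangement
}

for name, result in candidates.items():
    assert np.array_equal(result, target), name
```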

How to properly concatenate two 1D arrays without flattening?

How do I concatenate properly two numpy vectors without flattening the result? This is really obvious with append, but it gets shamefully messy when turning to numpy.
I've tried concatenate (with and without an explicit axis), hstack, and vstack, all without success.
In [1]: a
Out[1]: array([1, 2, 3])
In [2]: b
Out[2]: array([6, 7, 8])
In [3]: c = np.concatenate((a,b),axis=0)
In [4]: c
Out[4]: array([1, 2, 3, 6, 7, 8])
Note that the code above works indeed if a and b are lists instead of numpy arrays.
The output I want:
Out[4]: array([[1, 2, 3], [6, 7, 8]])
EDIT
vstack works indeed for a and b as in above. It does not in my real life case, where I want to iteratively fill an empty array with vectors of some dimension.
hist=[]
for i in range(len(filenames)):
    fileload = np.load(filenames[i])
    maxarray.append(fileload['maxamp'])
    hist_t, bins_t = np.histogram(maxarray[i], bins=np.arange(0,4097,4))
    hist = np.vstack((hist,hist_t))
SOLUTION:
I found the solution: you have to properly initialize the array e.g.: How to add a new row to an empty numpy array
For np.concatenate to work here, the input arrays need two dimensions: you want to stack them as rows of a 2-D result, and the input arrays have only one dimension.
You can use np.vstack here, which as explained in the docs:
It is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1,N)
a = np.array([1, 2, 3])
b = np.array([6, 7, 8])
np.vstack([a, b])
array([[1, 2, 3],
[6, 7, 8]])
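For the loop in the question, the empty list hist=[] is most likely what breaks vstack, since it has no columns to match against the first histogram. One common pattern, sketched below, is to collect the rows in a plain Python list and stack once at the end. The data is a stand-in: samples and bins are invented here, and the file loading from the question is not reproduced.

```python
import numpy as np

# Stand-in data: the question loads these arrays from files instead
samples = [np.array([1, 5, 9]), np.array([2, 2, 7]), np.array([0, 4, 8])]
bins = np.arange(0, 13, 4)  # edges 0, 4, 8, 12 -> 3 bins

rows = []  # plain Python list, cheap to append to
for sample in samples:
    counts, _ = np.histogram(sample, bins=bins)
    rows.append(counts)

hist = np.vstack(rows)  # one stack at the end, shape (3, 3)
print(hist)
```

If you would rather keep calling vstack inside the loop, starting from np.empty((0, 3), dtype=int) instead of [] should also work, at the cost of copying the whole array on every iteration.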

How do I access the ith column of a NumPy multidimensional array?

Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array of shape (3,). If you just want to loop over it, that's fine, but if you want to hstack it with some other array of shape 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector, so that you can do concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
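The difference between these forms is easiest to see from the shapes. A short sketch, with the names flat, column and sliced chosen only for illustration; the length-1 slice test[:, 0:1] is a third option not mentioned above:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

flat = test[:, 0]      # integer index drops the axis -> shape (3,)
column = test[:, [0]]  # list index keeps the axis    -> shape (3, 1)
sliced = test[:, 0:1]  # a length-1 slice also keeps it

assert flat.shape == (3,)
assert column.shape == (3, 1)
assert sliced.shape == (3, 1)

# The slice is a view of test, while the list index makes a copy
assert np.shares_memory(sliced, test)
assert not np.shares_memory(column, test)

# Either 2-D form can be handed to hstack
assert np.array_equal(np.hstack((test, sliced)), np.hstack((test, column)))
```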
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in column 1 (the second column) of the array
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it in the form of "row vector" (array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 (index 1) of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 (index 1) of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect the arr), the number of byte-steps for traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes (assuming a 4-byte integer dtype)
arr_col1_copy.strides[0] # 4 bytes
see strides and this answer.
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of that column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different strides mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, it is not always true for the reason that making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create the A_col1_copy). However if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are ok with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
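The stride claims above can be checked on the small array without any timing. This sketch fixes the dtype to int32 explicitly so the byte counts do not depend on the platform default; the names view, copy and fortran are illustrative.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)  # 4-byte items

view = arr[:, 1]
copy = arr[:, 1].copy()

assert view.base is arr
assert copy.base is not arr

assert view.strides == (8,)  # steps over one full row of two int32 values
assert copy.strides == (4,)  # contiguous

# In column-major order a column slice is contiguous without copying it
fortran = np.asfortranarray(arr)
assert fortran[:, 1].strides == (4,)

# Transposing only swaps shape and strides, so both routes step the same way
assert arr.T[1, :].strides == arr[:, 1].strides
```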
To get several independent columns, just use:
>>> test[:, [0, 2]]
and you will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is not really multidimensional; it is a 2-dimensional array, and you can slice out whichever columns you wish.
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b
