I have a 'row' vector cast as a numpy ndarray. I would simply like to make it a 'column' vector (I don't care too much about the type as long as it is compatible with matplotlib). Here is an example of what I'm trying:
import numpy as np
a = np.ndarray(shape=(1,4), dtype=float, order='F')
print(a.shape)
a.T #I think this performs the transpose?
print(a.shape)
The output looks like this:
(1, 4)
(1, 4)
I was hoping to get:
(1, 4)
(4, 1)
Can someone point me in the right direction? I have seen that the transpose in numpy doesn't do anything to a 1D array. But is this a 1D array?
Transposing an array does not happen in place. Writing a.T creates a view of the transpose of the array a, but this view is then lost immediately since no variable is assigned to it. a remains unchanged.
You need to write a = a.T to bind the name a to the transpose:
>>> a = a.T
>>> a.shape
(4, 1)
In your example a is indeed a 2D array. Transposing a 1D array (with shape (n,)) does not change that array at all.
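For example (a small sketch of the options, assuming you ultimately just want a column-shaped array to hand to matplotlib):

import numpy as np

a = np.ones((1, 4))
print(a.T.shape)               # (4, 1): transposing a 2D "row" works fine
v = np.ones(4)                 # a true 1D array, shape (4,)
print(v.T.shape)               # (4,): transposing a 1D array is a no-op
print(v.reshape(-1, 1).shape)  # (4, 1)
print(v[:, np.newaxis].shape)  # (4, 1)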
You can alter the shape 'in place', which for a (1, 4) array has the same effect as a.T, but see the comment by Mr E on whether it's needed, i.e.
...
print(a.shape)
a.shape = (4, 1)
print(a.shape)
You probably don't want or need the singular dimension, unless you are trying to force a broadcasting operation.
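For illustration, a minimal sketch of the kind of broadcasting that the singleton dimension enables (combining a column with a row produces a 2D grid):

import numpy as np

col = np.arange(4).reshape(4, 1)   # shape (4, 1)
row = np.arange(3)                 # shape (3,), behaves like a row here
print((col + row).shape)           # (4, 3): broadcasting built a grid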
From the NumPy for MATLAB users guide:
You can treat rank-1 arrays as either row or column vectors. dot(A,v)
treats v as a column vector, while dot(v,A) treats v as a row vector.
This can save you having to type a lot of transposes.
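For illustration, a minimal sketch of what the quoted passage describes:

import numpy as np

A = np.arange(6).reshape(2, 3)
v = np.arange(3)

print(np.dot(A, v).shape)               # (2,): v treated as a column vector
print(np.dot(np.arange(2), A).shape)    # (3,): a length-2 vector treated as a row vector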
I have a numpy array of shape (29, 10) and a list of 29 elements, and I want to end up with an array of shape (29, 11).
I am basically converting the list to a numpy array and trying to vstack them, but it complains about the dimensions not being the same.
Toy example
a = np.zeros((29,10))
a.shape
(29,10)
b = np.array(['A']*29)
b.shape
(29,)
np.vstack((a, b))
ValueError: all the input array dimensions except for the concatenation axis must match exactly
The dimensions do actually match, so why am I getting this error, and how can I solve it?
I think you are looking for np.hstack.
np.hstack((a, b.reshape(-1,1)))
Moreover, b must be 2-dimensional, which is why I used reshape.
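As a quick check of the suggestion (note, as an aside, that mixing the float zeros with string labels makes NumPy promote the whole result to a string dtype):

import numpy as np

a = np.zeros((29, 10))
b = np.array(['A'] * 29)

c = np.hstack((a, b.reshape(-1, 1)))
print(c.shape)   # (29, 11)
print(c.dtype)   # a string dtype, since the float zeros were promoted to strings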
The problem is that you want to append a 1D array to a 2D array.
Also, for the dimension you've given for b, you are probably looking for hstack.
Try this:
a = np.zeros((29,10))
a.shape
(29,10)
b = np.array(['A']*29)[:,None] #to ensure 2D structure
b.shape
(29,1)
np.hstack((a, b))
If you do want to vertically stack, you'd need this:
a = np.zeros((29,10))
a.shape
(29,10)
b = np.array(['A']*10)[None,:] #to ensure 2D structure
b.shape
(1,10)
np.vstack((a, b))
Instead of an n-dimensional array, let's take a 3D array to illustrate my question:
>>> import numpy as np
>>> arr = np.ones(24).reshape(2, 3, 4)
So I have an array of shape (2, 3, 4). I would like to concatenate/fuse the 2nd and 3rd axes together to get an array of shape (2, 12).
I wrongly thought I could do it easily with np.concatenate:
>>> np.concatenate(arr, axis=1).shape
(3, 8)
I found a way to do it by a combination of np.rollaxis and np.concatenate but it is increasingly ugly as the array goes up in dimension:
>>> np.rollaxis(np.concatenate(np.rollaxis(arr, 0, 3), axis=0), 0, 2).shape
(2, 12)
Is there any simple way to accomplish this? It seems very trivial, so there must exist some function, but I cannot seem to find it.
EDIT: Indeed I could use np.reshape, but that means computing the dimensions of the merged axis first. Is it possible without accessing/computing the shape beforehand?
On recent Python versions (3.5+, which allow this kind of unpacking inside a call) you can merge axes k and k+1 with:
anew = a.reshape(*a.shape[:k], -1, *a.shape[k+2:])
I recommend against directly assigning to .shape since it doesn't work on sufficiently noncontiguous arrays.
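Applied to the (2, 3, 4) example from the question, with k = 1, a minimal sketch:

import numpy as np

arr = np.ones(24).reshape(2, 3, 4)
k = 1                                               # merge axes k and k+1
anew = arr.reshape(*arr.shape[:k], -1, *arr.shape[k+2:])
print(anew.shape)                                   # (2, 12)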
Let's say that you have n dimensions in your array and that you want to fuse the adjacent axes i and i+1:
shape = a.shape
new_shape = list(shape[:i]) + [-1] + list(shape[i+2:])
a.shape = new_shape
I am wondering why in numpy there are 'one-dimensional' arrays of shape (length, 1) and also one-dimensional arrays of shape (length,) without a second value.
I am running into this quite frequently, e.g. when using np.concatenate() which then requires a reshape step beforehand (or I could directly use hstack/vstack).
I can't think of a reason why this behavior is desirable. Can someone explain?
Edit:
It was suggested by one of the comments that my question is a possible duplicate. I am more interested in the underlying logic of NumPy, not in the fact that there is a distinction between 1d and 2d arrays, which I think is the point of the linked thread.
The data of a ndarray is stored as a 1d buffer - just a block of memory. The multidimensional nature of the array is produced by the shape and strides attributes, and the code that uses them.
The numpy developers chose to allow for an arbitrary number of dimensions, so the shape and strides are represented as tuples of any length, including 0 and 1.
In contrast, MATLAB was built around FORTRAN programs that were developed for matrix operations. In the early days everything in MATLAB was a 2d matrix. It was later generalized to allow more than 2d, but never less. The numpy np.matrix still follows that old 2d MATLAB constraint.
If you come from the MATLAB world you are used to these 2 dimensions, and to the distinction between a row vector and a column vector. But in math and physics that isn't influenced by MATLAB, a vector is a 1d array. Python lists are inherently 1d, as are C arrays. To get 2d you have to have lists of lists, or arrays of pointers to arrays, with x[1][2]-style indexing.
Look at the shape and strides of this array and its variants (the strides shown here are for a 4-byte integer dtype):
In [48]: x=np.arange(10)
In [49]: x.shape
Out[49]: (10,)
In [50]: x.strides
Out[50]: (4,)
In [51]: x1=x.reshape(10,1)
In [52]: x1.shape
Out[52]: (10, 1)
In [53]: x1.strides
Out[53]: (4, 4)
In [54]: x2=np.concatenate((x1,x1),axis=1)
In [55]: x2.shape
Out[55]: (10, 2)
In [56]: x2.strides
Out[56]: (8, 4)
MATLAB adds new dimensions at the end. It orders its values like an order='F' array, and can readily change an (n,1) matrix to an (n,1,1,1). numpy's default is order='C', and it readily expands an array's dimensions at the start. Understanding this is essential when taking advantage of broadcasting.
Thus x1 + x is a (10,1) + (10,) => (10,1) + (1,10) => (10,10).
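For example:

import numpy as np

x = np.arange(10)          # shape (10,)
x1 = x.reshape(10, 1)      # shape (10, 1)
print((x1 + x).shape)      # (10, 10): (10,1) + (1,10) broadcasts to a square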
Because of broadcasting, a (n,) array is more like a (1,n) one than a (n,1) one. A 1d array is more like a row matrix than a column one.
In [64]: np.matrix(x)
Out[64]: matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [65]: _.shape
Out[65]: (1, 10)
The point with concatenate is that it requires matching dimensions. It does not use broadcasting to adjust dimensions. There are a bunch of stack functions that ease this constraint, but they do so by adjusting the dimensions before using concatenate. Look at their code (readable Python).
So a proficient numpy user needs to be comfortable with that generalized shape tuple, including the empty () (0d array), (n,) 1d, and up. For more advanced stuff understanding strides helps as well (look for example at the strides and shape of a transpose).
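For instance, np.vstack first promotes its 1d inputs to 2d and then concatenates; a simplified sketch of the idea (not the library's exact code):

import numpy as np

x = np.arange(10)                      # shape (10,)
print(np.vstack((x, x)).shape)         # (2, 10)

# roughly what vstack does internally:
print(np.concatenate((np.atleast_2d(x), np.atleast_2d(x)), axis=0).shape)  # (2, 10)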
Much of it is a matter of syntax. In Python, (x) isn't a tuple at all (the parentheses are simply redundant); (x,), however, is.
The difference between (x,) and (x, 1) goes even further. You can take a look at the examples in previous questions on this topic. Quoting one such example, this is a 1D numpy array:
>>> np.array([1, 2, 3]).shape
(3,)
But this one is 2D:
>>> np.array([[1, 2, 3]]).shape
(1, 3)
Reshape does not make a copy unless it needs to, so it should be safe to use.
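A quick, minimal way to convince yourself of this:

import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)              # a view onto the same buffer (when possible)
print(np.shares_memory(a, b))    # True
b[0, 0] = 99
print(a[0])                      # 99: both names see the same data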
I'm trying to input vectors into a numpy matrix by doing:
eigvec[:,i] = null
However I keep getting the error:
ValueError: could not broadcast input array from shape (20,1) into shape (20)
I've tried using flatten and reshape, but nothing seems to work.
The shapes in the error message are a good clue.
In [161]: x = np.zeros((10,10))
In [162]: x[:,1] = np.ones((1,10)) # or x[:,1] = np.ones(10)
In [163]: x[:,1] = np.ones((10,1))
...
ValueError: could not broadcast input array from shape (10,1) into shape (10)
In [166]: x[:,1].shape
Out[166]: (10,)
In [167]: x[:,[1]].shape
Out[167]: (10, 1)
In [168]: x[:,[1]] = np.ones((10,1))
When the shape of the destination matches the shape of the new value, the copy works. It also works in some cases where the new value can be 'broadcasted' to fit. But it does not try more general reshaping. Also note that indexing with a scalar reduces the dimension.
I can guess that
eigvec[:,i] = null.flat
would work (null.flatten() should work too). In fact, it looks like NumPy complains because you are assigning a pseudo-1D array (shape (20, 1)) to a 1D array which is considered to be oriented differently (shape (1, 20), if you wish).
Another solution would be:
eigvec[:,i] = null.T
where you properly transpose the "vector" null.
The fundamental point here is that NumPy has "broadcasting" rules for converting between arrays with different numbers of dimensions. In the case of conversions between 2D and 1D, a 1D array of size n is broadcast into a 2D array of shape (1, n) (and not (n, 1)). More generally, missing dimensions are added to the left of the original dimensions.
The observed error message basically said that shapes (20,) and (20, 1) are not compatible: this is because (20,) becomes (1, 20) (and not (20, 1)). In fact, one is a column matrix, while the other is a row matrix.
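A small demonstration of that rule:

import numpy as np

a = np.ones((20, 1))
b = np.ones(20)              # (20,) is treated like (1, 20) when broadcasting
print((a + b).shape)         # (20, 20), not (20, 1)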
I generally use MATLAB and Octave, and I have recently switched to Python and numpy.
In numpy when I define an array like this
>>> a = np.array([[2,3],[4,5]])
it works great, and the size of the array is
>>> a.shape
(2, 2)
which is also the same as in MATLAB.
But when I extract the entire first column and check its size,
>>> b = a[:,0]
>>> b.shape
(2,)
I get size (2,). What is this? I expected the size to be (2, 1). Perhaps I have misunderstood a basic concept. Can anyone clarify this for me?
A 1D numpy array* is literally 1D - it has no size in any second dimension, whereas in MATLAB, a '1D' array is actually 2D, with a size of 1 in its second dimension.
If you want your array to have size 1 in its second dimension you can use its .reshape() method:
a = np.zeros(5,)
print(a.shape)
# (5,)
# explicitly reshape to (5, 1)
print(a.reshape(5, 1).shape)
# (5, 1)
# or use -1 in the first dimension, so that its size in that dimension is
# inferred from its total length
print(a.reshape(-1, 1).shape)
# (5, 1)
Edit
As Akavall pointed out, I should also mention np.newaxis as another method for adding a new axis to an array. Although I personally find it a bit less intuitive, one advantage of np.newaxis over .reshape() is that it allows you to add multiple new axes in an arbitrary order without explicitly specifying the shape of the output array, which is not possible with the .reshape(-1, ...) trick:
a = np.zeros((3, 4, 5))
print(a[np.newaxis, :, np.newaxis, ..., np.newaxis].shape)
# (1, 3, 1, 4, 5, 1)
np.newaxis is just an alias of None, so you could do the same thing a bit more compactly using a[None, :, None, ..., None].
* An np.matrix, on the other hand, is always 2D, and will give you the indexing behavior you are familiar with from MATLAB:
a = np.matrix([[2, 3], [4, 5]])
print(a[:, 0].shape)
# (2, 1)
For more info on the differences between arrays and matrices, see here.
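As an aside, with a plain ndarray you can also keep the second dimension directly in the indexing (a short sketch):

import numpy as np

a = np.array([[2, 3], [4, 5]])
print(a[:, 0].shape)     # (2,)   -- a scalar index drops that dimension
print(a[:, 0:1].shape)   # (2, 1) -- a length-1 slice keeps it
print(a[:, [0]].shape)   # (2, 1) -- so does indexing with a list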
Typing help(np.shape) gives some insight into what is going on here. For starters, you can get the output you expect by typing:
b = np.array([a[:,0]])
Basically, numpy defines things a little differently from MATLAB. In the numpy environment, a vector has only one dimension, and an array is a vector of vectors, so it can have more. In your first example, your array is a vector of two vectors, i.e.:
a = np.array([[vec1], [vec2]])
So a has two dimensions, and in your example the number of elements in both dimensions is the same, 2. Your array is therefore 2 by 2. When you take a slice out of this, you are reducing the number of dimensions that you have by one. In other words, you are taking a vector out of your array, and that vector only has one dimension, which also has 2 elements, but that's it. Your vector is now 2 by _. There is nothing in the second spot because the vector is not defined there.
You could think of it in terms of spaces too. Your first array is in the space R^(2x2) and your second vector is in the space R^(2). This means that the array is defined on a different (and bigger) space than the vector.
That was a lot to basically say that you took a slice out of your array, and unlike MATLAB, numpy does not represent vectors (1 dimensional) in the same way as it does arrays (2 or more dimensions).