Confusion in size of a numpy array - python

Python numpy array 'size' confuses me a lot
a = np.array([1,2,3])
a.size = (3, )
------------------------
b = np.array([[2,1,3,5],
[2,2,5,1],
[3,6,99,5]])
b.size = (3,4)
'b' makes sense since it has 3 rows and 4 columns in each
But how is 'a' size = (3, ) ? Shouldn't it be (1,3) since its 1 row and 3 columns?

You should resist the urge to think of numpy arrays as having rows and columns, but instead consider them as having dimensions and shape. This is an important point which differentiates np.array and np.matrix:
x = np.array([1, 2, 3])
print(x.ndim, x.shape) # 1 (3,)
y = np.matrix([1, 2, 3])
print(y.ndim, y.shape) # 2 (1, 3)
An n-D array can only use n integer(s) to represent its shape. Therefore, a 1-D array only uses 1 integer to specify its shape.
In practice, combining calculations between 1-D and 2-D arrays is not a problem for numpy, and it is syntactically clean since the @ matrix multiplication operator was introduced in Python 3.5. Therefore, there is rarely a need to resort to np.matrix in order to satisfy the urge to see expected row and column counts.
In the rare instances where 2 dimensions are required, you can still use numpy.array with some manipulation:
a = np.array([1, 2, 3])[:, None] # equivalent to np.array([[1], [2], [3]])
print(a.ndim, a.shape) # 2 (3, 1)
b = np.array([[1, 2, 3]]) # equivalent to np.array([1, 2, 3])[:, None].T
print(b.ndim, b.shape) # 2 (1, 3)
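As a rough illustration of the point about the @ operator (the array values here are made up for this sketch and do not come from the question), a 1-D array can be multiplied with a 2-D array directly, without first turning it into a row or a column:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # 2-D array, shape (2, 3)
v = np.array([1, 2, 3])          # 1-D array, shape (3,)

left = M @ v       # v is treated as a column here; the result has shape (2,)
right = v @ M.T    # v is treated as a row here; the result has shape (2,)

print(left, left.shape)
print(right, right.shape)
```

In both cases the result is again 1-D, so no (1, N) or (N, 1) bookkeeping is needed.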

No, a numpy.ndarray with shape (1, 3) would look like:
np.array([[1,2,3]])
Think about how the shape corresponds to indexing:
arr[0, ...] #First row
I still have three more options, namely:
arr[0,0]
arr[0,1]
arr[0,2]
Try doing that with a 1 dimensional array
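A small sketch of that experiment, with the names row and flat chosen here only for illustration: the (1, 3) array accepts two indices, while the (3,) array accepts only one.

```python
import numpy as np

row = np.array([[1, 2, 3]])   # shape (1, 3): two indices are valid
flat = np.array([1, 2, 3])    # shape (3,): only one index is valid

print(row[0, 2])
print(flat[2])

# Using two indices on a 1-D array raises IndexError.
try:
    flat[0, 2]
    two_indices_ok = True
except IndexError:
    two_indices_ok = False
print(two_indices_ok)
```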

I think you meant ndarray.shape. In that case, there's no need for confusion. Quoting the documentation from ndarray.shape:
Tuple of array dimensions.
ndarray.shape simply returns a shape tuple.
In [21]: a.shape
Out[21]: (3,)
This simply means that a is a 1D array with 3 entries.
If you want the shape tuple to be (1, 3), a has to become a 2D array. For that you need to use:
In [23]: a = a[np.newaxis, :]
In [24]: a.shape
Out[24]: (1, 3)
Since array b is 2D, the shape tuple has two entries.
In [22]: b.shape
Out[22]: (3, 4)
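To make the distinction between size and shape concrete, here is a short sketch using the two arrays from the question: size is the total number of elements (a plain integer), while shape is a tuple with one entry per dimension.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[2, 1, 3, 5],
              [2, 2, 5, 1],
              [3, 6, 99, 5]])

# ndim is the number of dimensions, shape has one entry per dimension,
# and size is the product of the entries in shape.
for name, arr in (("a", a), ("b", b)):
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "size:", arr.size)
```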

Related

numpy's transpose method can't convert 1D row ndarray to a column one [duplicate]

Let's consider a as a 1D row/horizontal array:
import numpy as np
N = 10
a = np.arange(N) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a.shape # (10,)
Now I want b to be a 1D column/vertical array, the transpose of a:
b = a.transpose() # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b.shape # (10,)
but the .transpose() method returns an identical ndarray with the exact same shape!
What I expected to see was
np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
which can be achieved by
c = a.reshape(a.shape[0], 1) # or c = a; c.shape = (c.shape[0], 1)
c.shape # (10, 1)
and to my surprise, it has a shape of (10, 1) instead of (1, 10).
In Octave/Scilab I could do:
N = 10
b = 0:(N-1)
a = b'
size(b) % ans = 1 10
size(a) % ans = 10 1
I understand that numpy ndarrays are not matrices (as discussed here), but the behavior of the numpy's transpose function just doesn't make sense to me! I would appreciate it if you could help me understand how this behavior makes sense and what am I missing here.
P.S. So what I have understood so far is that b = a.transpose() is the equivalent of b = a; b.shape = b.shape[::-1] which if you had a "2D array" of (N, 1) would return a (1, N) shaped array, as you would expect from a transpose operator. However, numpy seems to treat the "1D array" of (N,) as a 0D scalar. I think they should have named this method something else, as this is very misleading/confusing IMHO.
To understand the numpy array better, you should take a look at this review paper: The NumPy array: a structure for efficient numerical computation
In short, numpy ndarrays have this attribute called the stride, which is
the number of bytes to skip in memory to proceed to the next element.
For a (10, 10) array of bytes, for example, the strides may be (10,
1), in other words: proceed one byte to get to the next column and ten
bytes to locate the next row.
For your ndarray a, a.strides = (8,), which shows that it is only 1 dimensional, and that to get to the next element on this single dimension, you need to advance 8 bytes in memory (assuming the default integer is 64-bit, as it is on most platforms).
Strides are useful for representing transposes:
By modifying strides, for example, an array can be transposed or
reshaped at zero cost (no memory needs to be copied).
So if there was a 2-dimensional ndarray, say b = np.ones((3,5)) for example, then b.strides = (40, 8), while b.transpose().strides = (8, 40). So, as you see, a transposed 2D ndarray is simply the exact same array whose strides have been reordered. And since your 1D ndarray has only 1 dimension, swapping the values of its strides (i.e. taking its transpose) doesn't do anything.
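The stride values quoted above can be checked with a short sketch. The dtypes are set explicitly here (an assumption made so that each element is 8 bytes regardless of platform):

```python
import numpy as np

b = np.ones((3, 5), dtype=np.float64)   # 8-byte elements, C order
bt = b.transpose()

print(b.strides, bt.strides)            # the two stride entries are swapped
print(np.shares_memory(b, bt))          # the transpose is a view, not a copy

a = np.arange(10, dtype=np.int64)       # 1-D array with a single stride
print(a.strides, a.transpose().strides) # nothing to swap
```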
As you already mentioned, numpy arrays are not matrices. The definition of the transpose function is as follows:
Permute the dimensions of an array.
This means that numpy's transpose method reorders the axes of the array. As a 1D array has only one axis, there is no other axis to swap it with, so you need to add a dimension before transpose has any effect. This behavior also makes sense for consistency with higher-dimensional (3D, 4D, ...) arrays.
There is a clean way to achieve what you want:
N = 10
a = np.arange(N)
a[ :, np.newaxis]
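A brief sketch of what happens once the extra axis exists (col, row and cube are names introduced here for illustration only): transpose now has two axes to swap, and for a 3D array it reverses all of them by default.

```python
import numpy as np

N = 10
a = np.arange(N)

col = a[:, np.newaxis]    # shape (10, 1)
row = col.T               # shape (1, 10): the two axes are swapped

cube = np.zeros((2, 3, 4))
print(col.shape, row.shape, cube.transpose().shape)
```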

How to make numpy array with 2D shape

I have simple array like this
x = np.array([1,2,3,4])
In [3]: x.shape
Out[3]: (4,)
But I don't want shape to return (4,), but (4,1). How can I achieve this?
Generally in NumPy you would declare a matrix or vector using two square brackets. It's a common misconception to use single square brackets for a single-dimensional matrix or vector.
Here is an example:
a = np.array([[1,2,3,4], [5,6,7,8]])
a.shape # (2,4) -> Multi-Dimensional Matrix
In a similar way, if I want a single-dimensional matrix, I just remove the data, not the outer square brackets.
a = np.array([[1,2,3,4]])
a.shape # (1,4) -> Row Matrix
b = np.array([[1], [2], [3], [4]])
b.shape # (4, 1) -> Column Matrix
When you use single square brackets, you get a 1-D array of shape (4,) rather than a row or column matrix.
Always enclose your data within another pair of square brackets for such a single-dimensional matrix (as if you were entering the data for a multi-dimensional matrix), just without the data for the extra dimensions.
Also: You could also always reshape
x = np.array([1,2,3,4])
x = x.reshape(4,1)
x.shape # (4,1)
One Line:
x = np.array([1,2,3,4]).reshape(4,1)
x.shape # (4,1)
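One detail worth keeping in mind, sketched below with a hypothetical variable col: for a contiguous array like this one, reshape returns a view on the same data rather than a copy, so writing through the reshaped array also changes the original.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
col = x.reshape(4, 1)             # shape (4, 1), same underlying buffer

print(np.shares_memory(x, col))   # True for this contiguous array
col[0, 0] = 99                    # writing through the view...
print(x)                          # ...is visible in the original 1-D array
```

If an independent array is needed, x.reshape(4, 1).copy() is one way to get it.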
If you want a column vector use
x2 = x[:, np.newaxis]
x2.shape # (4, 1)
Alternatively, you could reshape the array yourself:
arr1 = np.array([1,2,3,4])
print(arr1.shape)
# (4,)
arr2 = arr1.reshape((4,1))
print(arr2.shape)
# (4, 1)
You could of course reshape the array when you create it:
arr1 = np.array([1,2,3,4]).reshape((4,1))
If you want to change the array in place as suggested by @FHTMitchell in the comments:
arr1.resize((4, 1))
Below achieves what you want. However, I strongly suggest you look at why exactly you need shape to return (4, 1). Most matrix-type operations are possible without this explicit casting.
x = np.array([1,2,3,4])
y = np.matrix(x)
z = y.T
x.shape # (4,)
y.shape # (1, 4)
z.shape # (4, 1)
You can use zip to transpose at python (non-numpy) level:
>>> a = [1, 2, 3, 4]
>>>
>>> *zip(a),
((1,), (2,), (3,), (4,))
>>>
>>> import numpy as np
>>> np.array([*zip(a)])
array([[1],
[2],
[3],
[4]])
Please note that while this is convenient in terms of key strokes it is a bit wasteful given that a tuple object has to be constructed for every list element whereas reshaping an array comes essentially for free. So do not use this on long lists.

What is the difference between an array with shape (N,1) and one with shape (N)? And how to convert between the two?

Python newbie here coming from a MATLAB background.
I have a 1 column array and I want to move that column into the first column of a 3 column array. With a MATLAB background this is what I would do:
import numpy as np
A = np.zeros([150,3]) #three column array
B = np.ones([150,1]) #one column array which needs to replace the first column of A
#MATLAB-style solution:
A[:,0] = B
However this does not work because the "shape" of A is (150,3) and the "shape" of B is (150,1). And apparently the command A[:,0] results in a "shape" of (150).
Now, what is the difference between (150,1) and (150)? Aren't they the same thing: a column vector? And why isn't Python "smart enough" to figure out that I want to put the column vector, B, into the first column of A?
Is there an easy way to convert a 1-column vector with shape (N,1) to a 1-column vector with shape (N)?
I am new to Python and this seems like a really silly thing that MATLAB does much better...
Several things are different. In numpy, arrays may be 0d, 1d, or higher. In MATLAB, 2d is the smallest (and at one time the only) dimensionality. MATLAB readily expands dimensions at the end because it is Fortran ordered. numpy is by default C ordered, and most readily expands dimensions at the front.
In [1]: A = np.zeros([5,3])
In [2]: A[:,0].shape
Out[2]: (5,)
Simple indexing reduces a dimension, regardless of whether it's A[0,:] or A[:,0]. Contrast that with what happens to a 3d MATLAB matrix, A(1,:,:) vs. A(:,:,1).
numpy does broadcasting, adjusting dimensions during operations like sum and assignment. One basic rule is that dimensions may be automatically expanded toward the start if needed:
In [3]: A[:,0] = np.ones(5)
In [4]: A[:,0] = np.ones([1,5])
In [5]: A[:,0] = np.ones([5,1])
...
ValueError: could not broadcast input array from shape (5,1) into shape (5)
It can change (5,) LHS to (1,5), but can't change it to (5,1).
Another broadcasting example, +:
In [6]: A[:,0] + np.ones(5);
In [7]: A[:,0] + np.ones([1,5]);
In [8]: A[:,0] + np.ones([5,1]);
Now the (5,) works with (5,1), but that's because it becomes (1,5), which together with (5,1) produces (5,5) - an outer product broadcasting:
In [9]: (A[:,0] + np.ones([5,1])).shape
Out[9]: (5, 5)
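The resulting shapes can be inspected without doing any arithmetic. This sketch uses np.broadcast, which reports the shape that two operands would be broadcast to; the names v, row and col are chosen here for illustration.

```python
import numpy as np

v = np.zeros(5)          # shape (5,)
row = np.ones((1, 5))    # shape (1, 5)
col = np.ones((5, 1))    # shape (5, 1)

# (5,) is padded at the front to (1, 5) before shapes are compared.
print(np.broadcast(v, row).shape)   # (5,) with (1, 5)
print(np.broadcast(v, col).shape)   # (5,) with (5, 1)
print((v + col).shape)              # same shape as the line above
```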
In Octave
>> x = ones(2,3,4);
>> size(x(1,:,:))
ans =
1 3 4
>> size(x(:,:,1))
ans =
2 3
>> size(x(:,1,1) )
ans =
2 1
>> size(x(1,1,:) )
ans =
1 1 4
To do the assignment that you want, adjust either side:
Index in a way that preserves the number of dimensions:
In [11]: A[:,[0]].shape
Out[11]: (5, 1)
In [12]: A[:,[0]] = np.ones([5,1])
transpose the (5,1) to (1,5):
In [13]: A[:,0] = np.ones([5,1]).T
flatten/ravel the (5,1) to (5,):
In [14]: A[:,0] = np.ones([5,1]).flat
In [15]: A[:,0] = np.ones([5,1])[:,0]
squeeze, ravel also work.
Some quick tests in Octave indicate that it is more forgiving when it comes to dimension mismatches. But numpy prioritizes consistency. Once the broadcasting rules are understood, the behavior makes sense.
Use the squeeze method to eliminate dimensions of size 1.
A[:,0] = B.squeeze()
Or just create B one-dimensional to begin with:
B = np.ones([150])
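Putting the suggestions together for the arrays from the question, a minimal sketch (A1, A2, A3 and direct_ok are names introduced here so that each variant starts from a fresh array):

```python
import numpy as np

B = np.ones([150, 1])            # one-column array, shape (150, 1)

# The MATLAB-style assignment fails: (150, 1) cannot go into shape (150,).
A = np.zeros([150, 3])
try:
    A[:, 0] = B
    direct_ok = True
except ValueError:
    direct_ok = False

A1 = np.zeros([150, 3])
A1[:, 0] = B.squeeze()           # drop the size-1 axis on the right-hand side

A2 = np.zeros([150, 3])
A2[:, [0]] = B                   # keep two dimensions on the left-hand side

A3 = np.zeros([150, 3])
A3[:, 0] = B[:, 0]               # index the single column out of B

print(direct_ok, np.array_equal(A1, A2), np.array_equal(A1, A3))
```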
The fact that NumPy maintains a distinction between a 1D array and a 2D array with one of its dimensions being 1 is reasonable, especially when one begins working with n-dimensional arrays.
To answer the question in the title: there is an evident structural difference between an array of shape (3,) such as
[1, 2, 3]
and an array of shape (3, 1) such as
[[1], [2], [3]]

How to do dyadics-like operations in numpy

I have two 2-D arrays A and B. I want to get a 3-D array C, whose relation with A and B is:
C_mnl=A_mn*B_ml
How can I do this elegantly in numpy?
numpy.einsum can do that:
a = np.arange(6).reshape(3,2) # a.shape = (3, 2)
b = np.arange(12).reshape(3,4) # b.shape = (3, 4)
c = np.einsum('mn,ml->mnl', a, b) # c.shape = (3, 2, 4)
You can also use broadcasting -
C = A[...,None]*B[:,None,:]
Explanation
1. A[...,None] adds a new axis as the last axis with None (an alias for np.newaxis), pushing all existing dimensions to the front. Thus, this would be the same as A[:,:,None].
2. Similarly, B[:,None,:] adds a new axis between the existing dimensions.
With steps 1 and 2, we have the axes of the input arrays aligned, so elementwise multiplication with broadcasting results in the desired output of shape (m,n,l).
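A quick check that the two approaches agree, as a sketch using the same small example arrays as in the einsum answer (C_einsum and C_bcast are names chosen here):

```python
import numpy as np

A = np.arange(6).reshape(3, 2)     # shape (m, n) = (3, 2)
B = np.arange(12).reshape(3, 4)    # shape (m, l) = (3, 4)

C_einsum = np.einsum('mn,ml->mnl', A, B)
C_bcast = A[..., None] * B[:, None, :]   # (3, 2, 1) * (3, 1, 4) -> (3, 2, 4)

print(C_einsum.shape, np.array_equal(C_einsum, C_bcast))
# Spot-check one entry against the definition C_mnl = A_mn * B_ml.
print(C_bcast[2, 1, 3] == A[2, 1] * B[2, 3])
```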

Convert NumPy vector to 2D array / matrix

What is the best way to convert a vector to a 2-dimensional array?
For example, a vector b of size (10, )
a = np.random.rand(10,10)
b = a[1, :]
b.shape
Out: (10L,)
can be converted to array of size (10,1) as
b = b.reshape(len(b), 1)
Is there a more concise way to do it?
Since you lose a dimension when indexing with a[1, :], the lost dimension needs to be replaced to maintain a 2D shape. With this in mind, you can make the selection using the syntax:
b = a[1, :, None]
Then b has the required shape of (10, 1). Note that None is the same as np.newaxis and inserts a new axis of length 1.
(This is the same thing as writing b = a[1, :][:, None] but uses only one indexing operation, hence saves a few microseconds.)
If you want to continue using reshape (which is also fine for this purpose), it's worth remembering that you can use -1 for (at most) one axis to have NumPy figure out what the correct length should be instead:
b.reshape(-1, 1)
Use np.newaxis:
In [139]: b.shape
Out[139]: (10,)
In [140]: b=b[:,np.newaxis]
In [142]: b.shape
Out[142]: (10, 1)
I think clearest way of doing this is by using np.expand_dims, which basically adds an axis to the array. If you use axis=-1, a new axis will be added as the last dimension.
b = np.expand_dims(b, axis=-1)
or if you want to be more concise:
b = np.expand_dims(b, -1)
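The options suggested so far can be compared side by side. In this sketch the input array is built with np.arange instead of random values (an assumption made only so the result is reproducible); all four expressions produce the same (10, 1) column.

```python
import numpy as np

a = np.arange(100.0).reshape(10, 10)
b = a[1, :]                              # 1-D, shape (10,)

columns = [
    a[1, :, None],                       # index and add the axis in one step
    b[:, np.newaxis],                    # add a trailing axis afterwards
    b.reshape(-1, 1),                    # let NumPy infer the first length
    np.expand_dims(b, axis=-1),          # explicit "add an axis" call
]

print([c.shape for c in columns])
print(all(np.array_equal(columns[0], c) for c in columns))
```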
Although the question is old, I think it is still worth answering.
Slicing instead of indexing keeps the result 2-D (here a row of shape (1, 10)):
b = a[1:2, :]
you can use np.asmatrix(b) as well
a.shape #--> (12,)
np.asmatrix(a).shape #--> (1, 12)
np.asmatrix(a).T.shape #--> (12, 1)
