Very Basic Numpy array dimension visualization - python

I'm a beginner to numpy with no experience in matrices. I understand basic 1d and 2d arrays but I'm having trouble visualizing a 3d numpy array like the one below. How do the following python lists form a 3d array with height, length and width? Which are the rows and columns?
b = np.array([[[1, 2, 3],[4, 5, 6]],
[[7, 8, 9],[10, 11, 12]]])

The anatomy of an ndarray in NumPy looks like this red cube below: (source: Physics Dept, Cornell Uni)
Once you leave the 2D space and enter 3D or higher dimensional spaces, the concept of rows and columns doesn't make much sense anymore. But still you can intuitively understand 3D arrays. For instance, considering your example:
In [41]: b
Out[41]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
In [42]: b.shape
Out[42]: (2, 2, 3)
Here the shape of b is (2, 2, 3). You can think of it as two (2x3) matrices stacked to form a 3D array. To access the first matrix, index into the array with b[0]; to access the second matrix, use b[1].
# gives you the 2D array (i.e. matrix) at position `0`
In [43]: b[0]
Out[43]:
array([[1, 2, 3],
[4, 5, 6]])
# gives you the 2D array (i.e. matrix) at position 1
In [44]: b[1]
Out[44]:
array([[ 7, 8, 9],
[10, 11, 12]])
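To go one step beyond picking out whole matrices, here is a minimal sketch that indexes the same array along each axis. The variable names are only illustrative.

```python
import numpy as np

b = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

first_matrix = b[0]        # shape (2, 3): the first stacked matrix
second_row = b[0, 1]       # shape (3,): row 1 of the first matrix
single_value = b[1, 0, 2]  # scalar: matrix 1, row 0, column 2
first_rows = b[:, 0, :]    # shape (2, 3): row 0 taken from every matrix

print(first_matrix.shape)   # (2, 3)
print(second_row.tolist())  # [4, 5, 6]
print(single_value)         # 9
print(first_rows.tolist())  # [[1, 2, 3], [7, 8, 9]]
```

Each integer index removes one axis, while a `:` keeps that axis in the result.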
However, once you enter 4D space or higher, it becomes very hard to make sense of the arrays themselves, since we humans have a hard time visualizing four or more dimensions. So one would rather just consider the ndarray.shape attribute and work with that.
More information about how we build higher dimensional arrays using (nested) lists:
For 1D arrays, the array constructor needs a sequence (tuple, list, etc) but conventionally list is used.
In [51]: oneD = np.array([1, 2, 3,])
In [52]: oneD.shape
Out[52]: (3,)
For 2D arrays, it's a list of lists, but it can also be a tuple of lists, a tuple of tuples, etc.:
In [53]: twoD = np.array([[1, 2, 3], [4, 5, 6]])
In [54]: twoD.shape
Out[54]: (2, 3)
For 3D arrays, it's a list of lists of lists:
In [55]: threeD = np.array([[[1, 2, 3], [2, 3, 4]], [[5, 6, 7], [6, 7, 8]]])
In [56]: threeD.shape
Out[56]: (2, 2, 3)
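As a quick check of the rule above, the following sketch builds arrays from tuples as well as lists and confirms that ndim always equals the length of the shape tuple. The names one_d, two_d and three_d are just for illustration.

```python
import numpy as np

one_d = np.array((1, 2, 3))                # a tuple instead of a list
two_d = np.array(([1, 2, 3], (4, 5, 6)))   # a tuple holding a list and a tuple
three_d = np.array([[[1, 2, 3], [2, 3, 4]],
                    [[5, 6, 7], [6, 7, 8]]])

for arr in (one_d, two_d, three_d):
    # ndim is the number of axes, i.e. the length of the shape tuple
    print(arr.ndim, arr.shape, len(arr.shape) == arr.ndim)
```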
P.S. Internally, the ndarray is stored in a memory block as shown in the below picture. (source: Enthought)

Related

Understanding Numpy dimensions of arrays

I started looking into NumPy using the book 'Python for Data Analysis'. Why is the array dimension for arr2d "2" instead of "3"? And why is the dimension for arr3d "3" instead of "2"?
I thought the dimension of an array was based on the number of rows. Or does this not apply to higher-dimensional arrays?
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d.shape
Output: (3, 3)
arr2d.ndim
Output: 2
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d.shape
Output: (2, 2, 3)
arr3d.ndim
Output: 3
The dimension of the array is not based on the number of rows.
It is based on how deeply the brackets [] are nested in the list that you pass to np.array().
For example:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In arr2d the brackets are nested 2 deep ([[]]): it starts with 2 opening brackets ([[) and ends with 2 closing brackets (]]), so it is a 2D array of shape (3,3), i.e. 3 rows and 3 columns.
Similarly:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
In arr3d the brackets are nested 3 deep ([[[]]]): it starts with 3 opening brackets ([[[) and ends with 3 closing brackets (]]]), so it is a 3D array of shape (2,2,3), i.e. 2 arrays of 2 rows and 3 columns each.
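The bracket-counting rule can be sketched in code. The helper nesting_depth below is hypothetical (it is not part of NumPy) and assumes non-empty, regularly nested lists.

```python
import numpy as np

def nesting_depth(obj):
    """Count how many list levels wrap the innermost values (hypothetical helper)."""
    depth = 0
    while isinstance(obj, list):
        depth += 1
        obj = obj[0]  # assumes every level is non-empty
    return depth

list2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
list3d = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

print(nesting_depth(list2d), np.array(list2d).ndim)  # 2 2
print(nesting_depth(list3d), np.array(list3d).ndim)  # 3 3
```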
Numpy stores its ndarrays as contiguous blocks of memory. Each element is stored in a sequential manner every n bytes after the previous.
(images referenced from this excellent SO post)
So if your 3D array looks like this -
np.arange(0,16).reshape(2,2,4)
#array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7]],
#
# [[ 8, 9, 10, 11],
# [12, 13, 14, 15]]])
Then in memory it is stored as -
When retrieving an element (or a block of elements), NumPy calculates how many bytes (the stride) it needs to traverse to get to the next element along a given axis. So, for the above example with 8-byte elements (this depends on the datatype), it has to traverse 8 bytes along axis=2, 8*4 bytes along axis=1, and 8*8 bytes along axis=0.
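The byte counts above can be read directly from the strides attribute. This sketch fixes the dtype to int64 so that each element is 8 bytes regardless of platform. The offset arithmetic is only an illustration of how an index maps to a position in the flat memory block.

```python
import numpy as np

arr = np.arange(16, dtype=np.int64).reshape(2, 2, 4)

print(arr.itemsize)  # 8
print(arr.strides)   # (64, 32, 8)

# Byte offset of element [i, j, k] from the start of the data block
i, j, k = 1, 0, 3
offset = i * arr.strides[0] + j * arr.strides[1] + k * arr.strides[2]
flat_index = offset // arr.itemsize

print(offset, flat_index)                  # 88 11
print(arr.ravel()[flat_index], arr[i, j, k])  # 11 11
```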
With this in mind, let's understand what dimensions are in numpy.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr2d.shape, arr3d.shape)
(3, 3) (2, 2, 3)
These can be considered a 2D matrix and a 3D tensor respectively. Here is an intuitive diagram to show what this would look like.
A 1D numpy array (ndim=1) is a vector, 2D is a matrix, and 3D is a rank-3 tensor, which can be imagined as a cube. The number of values it can store is equal to array.shape[0] * array.shape[1] * array.shape[2], which in your second case is 2*2*3.
Vector (n,) -> (axis0,) #elements
Matrix (m,n) -> (axis0, axis1) #rows, columns
Tensor3 (l,m,n) -> (axis0, axis1, axis2)
Tensor4 (l,m,n,o) -> (axis0, axis1, axis2, axis3)
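The relationship between shape and number of stored values can be verified with a short sketch. It uses math.prod from the standard library (Python 3.8 or later).

```python
import math
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for arr in (arr2d, arr3d):
    # size is the product of the lengths of all axes
    print(arr.ndim, arr.shape, arr.size, math.prod(arr.shape))
# 2 (3, 3) 9 9
# 3 (2, 2, 3) 12 12
```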

What's the difference between shape(150,) and shape (150,1)?

What's the difference between shape(150,) and shape (150,1)?
I think they are the same, I mean they both represent a column vector.
Both hold the same values, but one is a 1D vector and the other is a 2D matrix with a single column. Here's an example:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([[1], [2], [3], [4], [5]])
print(x.shape)
print(y.shape)
And the output is:
(5,)
(5, 1)
Although they both hold the same values and take up the same amount of memory, NumPy does not treat them as the same thing.
Regarding "I think they are the same, I mean they both represent a column vector": no, they are not, and certainly not according to NumPy (ndarrays).
The main difference is that the
shape (150,) => is a 1D array, whereas
shape (150,1) => is a 2D array
Questions like this seem to come from two misconceptions:
not realizing that (5,) is a 1 element tuple.
expecting MATLAB like matrices
Make an array with the handy arange function:
In [424]: x = np.arange(5)
In [425]: x.shape
Out[425]: (5,) # 1 element tuple
In [426]: x.ndim
Out[426]: 1
numpy does not automatically make matrices (2d arrays). It does not follow MATLAB in that regard.
We can reshape that array, adding a 2nd dimension. The result is a view (sooner or later you need to learn what that means):
In [427]: y = x.reshape(5,1)
In [428]: y.shape
Out[428]: (5, 1)
In [429]: y.ndim
Out[429]: 2
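Since the reshaped array is a view, it shares its data with the original. A minimal sketch of what that means in practice follows. The variable z is an illustrative name for an independent copy.

```python
import numpy as np

x = np.arange(5)
y = x.reshape(5, 1)            # a view: new shape, same underlying data

print(np.shares_memory(x, y))  # True

y[0, 0] = 99                   # writing through the view...
print(x.tolist())              # [99, 1, 2, 3, 4]  ...changes x as well

z = x.reshape(5, 1).copy()     # an explicit copy owns its own data
z[1, 0] = -1
print(x[1])                    # 1  (x is unaffected)
```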
The display of these 2 arrays is very different. Same numbers, but the layout and number of brackets is very different, reflecting the respective shapes:
In [430]: x
Out[430]: array([0, 1, 2, 3, 4])
In [431]: y
Out[431]:
array([[0],
[1],
[2],
[3],
[4]])
The shape difference may seem academic - until you try to do math with the arrays:
In [432]: x+x
Out[432]: array([0, 2, 4, 6, 8]) # element wise sum
In [433]: x+y
Out[433]:
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
How did that end up producing a (5,5) array? Broadcasting a (5,) array with a (5,1) array!
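The broadcasting step can be made explicit with a short sketch: shapes are aligned from the right, so (5,) is treated as (1, 5), and every length-1 axis is stretched to match the other operand.

```python
import numpy as np

x = np.arange(5)     # shape (5,)
y = x.reshape(5, 1)  # shape (5, 1)

# The shape the two operands broadcast to
print(np.broadcast(x, y).shape)         # (5, 5)

total = x + y
explicit = x[np.newaxis, :] + y         # same sum with the extra axis spelled out
print(np.array_equal(total, explicit))  # True

# Element [i, j] of the result is y[i, 0] + x[j]
print(total[3, 2], y[3, 0] + x[2])      # 5 5
```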

Understanding non-homogeneous numpy arrays

I have recently started numpy and noticed a peculiar thing.
import numpy as np
a = np.array([[1,2,3], [4,5,9, 8]])
print a.shape, "shape"
print a[1, 0]
The shape, in this case, comes out to be 2L. However, if I make a homogeneous numpy array as
a = np.array([[1,2,3], [4,5,6]]), then a.shape gives (2L, 3L). I understand that the shape of a non-homogeneous array is difficult to represent as a tuple.
Additionally, print a[1,0] for the non-homogeneous array that I created earlier gives a traceback, IndexError: too many indices for array. Doing the same on the homogeneous array gives back the correct element, 4.
Noticing these two peculiarities, I am curious to know how Python looks at non-homogeneous numpy arrays at a low level.
Thank You in advance
When the sublists differ in length, np.array falls back to creating an object dtype array:
In [272]: a = np.array([[1,2,3], [4,5,9, 8]])
In [273]: a
Out[273]: array([[1, 2, 3], [4, 5, 9, 8]], dtype=object)
This array is similar to the list we started with. Both store the sublists as pointers. The sublists exist elsewhere in memory.
With equal-length sublists, it can create a 2d array, with integer elements:
In [274]: a2 = np.array([[1,2,3], [4,5,9]])
In [275]: a2
Out[275]:
array([[1, 2, 3],
[4, 5, 9]])
In fact to confirm my claim that the sublists are stored elsewhere in memory, let's try to change one:
In [276]: alist = [[1,2,3], [4,5,9, 8]]
In [277]: a = np.array(alist)
In [278]: a
Out[278]: array([[1, 2, 3], [4, 5, 9, 8]], dtype=object)
In [279]: a[0].append(4)
In [280]: a
Out[280]: array([[1, 2, 3, 4], [4, 5, 9, 8]], dtype=object)
In [281]: alist
Out[281]: [[1, 2, 3, 4], [4, 5, 9, 8]]
That would not work in the case of a2. a2 has its own data storage, independent of the source list.
The basic point is that np.array tries to create an n-d array where possible. If it can't it falls back on to creating an object dtype array. And, as has been discussed in other questions, it sometimes raises an error. It is also tricky to intentionally create an object array.
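Here is a minimal sketch of creating an object array intentionally rather than relying on the fallback. As far as I know, recent NumPy releases raise a ValueError for ragged input unless dtype=object is given, so being explicit is the safer route. The names ragged, a, b and c are illustrative.

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5, 9, 8]]

# Asking for dtype=object explicitly avoids depending on the automatic fallback.
a = np.array(ragged, dtype=object)
print(a.shape, a.dtype)  # (2,) object

# Pre-allocating and filling element by element always gives a 1d object array,
# even when the sublists happen to have equal length.
b = np.empty(2, dtype=object)
b[0] = [1, 2, 3]
b[1] = [4, 5, 6]
print(b.shape)           # (2,)

# By contrast, equal-length sublists with dtype=object still become 2d.
c = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)
print(c.shape)           # (2, 3)
```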
The shape of a is easy, (2,). A single element tuple. a is a 1d array. But that shape does not convey information about the elements of a. And the same goes for the elements of alist. len(alist) is 2. An object array can have a more complex shape, e.g. a.reshape(1,2,1), but it still just contains pointers.
a contains two 4-byte pointers; a2 contains six 4-byte integers:
In [282]: a.itemsize
Out[282]: 4
In [283]: a.nbytes
Out[283]: 8
In [284]: a2.nbytes
Out[284]: 24
In [285]: a2.itemsize
Out[285]: 4
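The exact byte counts depend on the platform. The transcript above shows 4-byte pointers, whereas a 64-bit Python build would typically report 8. The sketch below compares the object array's itemsize with the pointer size of the running interpreter, and fixes a2 to int32 so that its numbers are reproducible.

```python
import struct
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 9, 8]], dtype=object)
a2 = np.array([[1, 2, 3], [4, 5, 9]], dtype=np.int32)

pointer_size = struct.calcsize("P")  # bytes per pointer in this Python build

print(a.itemsize == pointer_size)    # True: each element is one pointer
print(a.nbytes == 2 * pointer_size)  # True: two pointers, nothing else
print(a2.itemsize, a2.nbytes)        # 4 24
```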

slice a 3d numpy array using a 2d numpy array

Is it possible to slice a 3d array using a 2d array? I'm assuming it can be done, but that you would have to specify the axis.
If I have 3 arrays, such that:
A = [[1,2,3,4,5],
[1,3,5,7,9],
[5,4,3,2,1]] # shape (3,5)
B1 = [[1],
[2],
[3]] # shape (3, 1)
B2 = [[4],
[3],
[4]] # shape (3,1)
Is it possible to slice A using B1 and B2 like:
Out = A[B1:B2]
so that it would return me:
Out = [[2,3,4,5],
[5, 7],
[2, 1]]
or would this not work if the slices created arrays in Out of different lengths?
Numpy is optimized for homogeneous arrays of numbers with fixed dimensions, so it does not support varying row or column sizes.
However, you can achieve what you want by using a list of arrays (assuming A, B1 and B2 are NumPy arrays with the shapes given above):
Out = [A[i, B1[i, 0]:B2[i, 0] + 1] for i in range(len(B1))]
Here's one approach using vectorization -
n_range = np.arange(A.shape[1])
elems = A[(n_range >= B1) & (n_range <= B2)]
idx = (B2 - B1 + 1).ravel().cumsum()
out = np.split(elems,idx)[:-1]
The trick is to use broadcasting to create a mask of the elements to be selected for the output, and then to split the array of those elements at the specified positions to get a list of arrays.
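To see what the broadcasting step produces, this sketch prints the intermediate mask for the arrays from the question. The names cols and mask are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [1, 3, 5, 7, 9],
              [5, 4, 3, 2, 1]])
B1 = np.array([[1], [2], [3]])  # inclusive start column per row
B2 = np.array([[4], [3], [4]])  # inclusive end column per row

cols = np.arange(A.shape[1])        # shape (5,)
mask = (cols >= B1) & (cols <= B2)  # (5,) against (3, 1) broadcasts to (3, 5)

print(mask.astype(int))
# [[0 1 1 1 1]
#  [0 0 1 1 0]
#  [0 0 0 1 1]]

# Boolean indexing walks the mask row by row, so the selected
# elements come out in row order as one flat array.
print(A[mask].tolist())  # [2, 3, 4, 5, 5, 7, 2, 1]
```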
Sample input, output -
In [37]: A
Out[37]:
array([[1, 2, 3, 4, 5],
[1, 3, 5, 7, 9],
[5, 4, 3, 2, 1]])
In [38]: B1
Out[38]:
array([[1],
[2],
[3]])
In [39]: B2
Out[39]:
array([[4],
[3],
[4]])
In [40]: out
Out[40]: [array([2, 3, 4, 5]), array([5, 7]), array([2, 1])]
# Please note that the output is a list of arrays
Your desired result has a different number of terms in each row - that's a strong indicator that a fully vectorized solution is not possible. It is not doing the same thing for each row or each column.
Secondly, n:m translates to slice(n,m). slice only takes integers, not lists or arrays.
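The equivalence between the n:m notation and slice objects can be checked with a small sketch:

```python
import numpy as np

a = np.arange(10)

# a[2:5] is shorthand for indexing with a slice object
print(np.array_equal(a[2:5], a[slice(2, 5)]))  # True

# indices(length) resolves a slice to concrete (start, stop, step) integers
print(slice(2, 5).indices(10))      # (2, 5, 1)
print(slice(None, -2).indices(10))  # (0, 8, 1)
```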
The obvious solution is some sort of iteration over rows:
In [474]: A = np.array([[1,2,3,4,5],
[1,3,5,7,9],
[5,4,3,2,1]]) # shape (3,5)
In [475]: B1=[1,2,3] # no point in making these 2d
In [476]: B2=[5,4,5] # corrected values
In [477]: [a[b1:b2] for a,b1,b2 in zip(A,B1,B2)]
Out[477]: [array([2, 3, 4, 5]), array([5, 7]), array([2, 1])]
This solution works just as well if A is a nested list
In [479]: [a[b1:b2] for a,b1,b2 in zip(A.tolist(),B1,B2)]
Out[479]: [[2, 3, 4, 5], [5, 7], [2, 1]]
The 2 lists could also be converted to an array of 1d indices, and then used to select values from A.ravel(). That would produce a 1d array, e.g.
array([2, 3, 4, 5, 5, 7, 2, 1])
which in theory could be split with np.split - but recent experience with other questions indicates that this doesn't save much time.
If the length of the row selections were all the same we can get a 2d array. Iterative version taking 2 elements per row:
In [482]: np.array([a[b1:b1+2] for a,b1 in zip(A,B1)])
Out[482]:
array([[2, 3],
[5, 7],
[2, 1]])
I've discussed in earlier SO questions how to produce this sort of result with one indexing operation.
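One way to do this with a single indexing operation, sketched here under the assumption that every row takes the same number of elements, is to build row and column index arrays that broadcast against each other. The names width, rows and cols are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [1, 3, 5, 7, 9],
              [5, 4, 3, 2, 1]])
B1 = np.array([1, 2, 3])  # start column for each row
width = 2                 # number of elements taken per row

rows = np.arange(A.shape[0])[:, None]  # shape (3, 1)
cols = B1[:, None] + np.arange(width)  # shape (3, 2)

out = A[rows, cols]                    # index arrays broadcast to (3, 2)
print(out.tolist())                    # [[2, 3], [5, 7], [2, 1]]
```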
On what slice accepts:
In [486]: slice([1,2],[3,4]).indices(10)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-486-0c3514e61cf6> in <module>()
----> 1 slice([1,2],[3,4]).indices(10)
TypeError: slice indices must be integers or None or have an __index__ method
'vectorized' ravel indexing
In [505]: B=np.array([B1,B2])
In [506]: bb=A.shape[1]*np.arange(3)+B
In [508]: ri =np.r_[tuple([slice(i,j) for i,j in bb.T])]
# or np.concatenate([np.arange(i,j) for i,j in bb.T])
In [509]: ri
Out[509]: array([ 1, 2, 3, 4, 7, 8, 13, 14])
In [510]: A.ravel()[ri]
Out[510]: array([2, 3, 4, 5, 5, 7, 2, 1])
It still has an iteration - to generate the slices that go into np.r_ (which expands them into a single indexing array)

How do I convert a 2D numpy array into a 1D numpy array of 1D numpy arrays?

In other words, each element of the outer array will be a row vector from the original 2D array.
As @Jaime already said, a 2D array can be interpreted as an array of 1D arrays. Suppose:
a = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
doing a[0] will return array([1, 2, 3]).
So you don't need to do any conversion.
I think it makes little sense to use numpy arrays for that; you would be missing out on all the advantages of numpy.
I had the same issue when appending a row with a different length to a 2D array.
The only trick I have found so far is to use a list comprehension and append the new row (see below). Probably not optimal, but at least it works ;-)
Hope this can help
>>> x=np.reshape(np.arange(0,9),(3,3))
>>> x
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> row_to_append = np.arange(9,11)
>>> row_to_append
array([ 9, 10])
>>> result=[item for item in x]
>>> result.append(row_to_append)
>>> result
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10])]
np.vsplit splits an array into multiple sub-arrays vertically (row-wise):
x=np.arange(12).reshape(3,4)
In [7]: np.vsplit(x,3)
Out[7]: [array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8, 9, 10, 11]])]
A comprehension could be used to reshape those arrays into 1d ones.
This is a list of arrays, not an array of arrays. Such a sequence of arrays can be recombined with vstack (or hstack, dstack).
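A short sketch of both steps mentioned above: flattening the vsplit pieces into 1d rows with a comprehension, and recombining them with vstack. The names pieces, rows and restored are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

pieces = np.vsplit(x, 3)            # three arrays of shape (1, 4)
rows = [p.ravel() for p in pieces]  # three arrays of shape (4,)
print([r.shape for r in rows])      # [(4,), (4,), (4,)]

restored = np.vstack(rows)          # stack the 1d rows back into a 2d array
print(np.array_equal(restored, x))  # True
```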
np.array([np.arange(3),np.arange(4)])
makes a 2 element array of arrays (recent NumPy versions require dtype=object to be passed for this). But if the arrays in the list are all the same shape (or compatible), it makes a 2d array. In terms of data storage it may not matter whether it is 2d or 1d of 1d arrays.
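If a 1d array whose elements are 1d arrays is really wanted even though the rows have equal length, one approach is to pre-allocate an object array and fill it row by row. This is a sketch, not the only way to do it, and the name outer is illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Pre-allocate a 1d object array, then store each row as a single element.
outer = np.empty(a.shape[0], dtype=object)
for i, row in enumerate(a):
    outer[i] = row

print(outer.shape, outer.dtype)  # (3,) object
print(outer[1].tolist())         # [4, 5, 6]
print(outer[1].ndim)             # 1
```

Filling element by element sidesteps NumPy's attempt to turn equal-length rows back into a 2d array.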
