Understanding Numpy dimensions of arrays

I started looking into NumPy using the book 'Python for Data Analysis'. Why is the dimension of arr2d 2 rather than 3? And why is the dimension of arr3d 3 rather than 2?
I thought the dimension of an array was based on the number of rows. Does that not apply to higher-dimensional arrays?
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d.shape
Output: (3, 3)
arr2d.ndim
Output: 2
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d.shape
Output: (2, 2, 3)
arr3d.ndim
Output: 3

The dimension of an array is not based on the number of rows. It is the nesting depth of the brackets ([]) that you pass to np.array().
For example:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d opens with 2 brackets ([[) and closes with 2 brackets (]]), so it is a 2D array of shape (3, 3), i.e. 3 rows and 3 columns.
Similarly:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d opens with 3 brackets ([[[) and closes with 3 brackets (]]]), so it is a 3D array of shape (2, 2, 3), i.e. 2 blocks of 2 rows and 3 columns each.
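The bracket-counting rule can be checked directly: ndim is always the length of shape. A minimal sketch, reusing the two arrays from the question:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for name, arr in [("arr2d", arr2d), ("arr3d", arr3d)]:
    # ndim counts the axes, so it equals len(shape), not the number of rows
    print(name, "ndim =", arr.ndim, "len(shape) =", len(arr.shape), "shape =", arr.shape)
```

Note that arr2d has 3 rows but ndim 2, while arr3d has 2 outer blocks but ndim 3.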

Numpy stores its ndarrays as contiguous blocks of memory. Each element is stored in a sequential manner every n bytes after the previous.
(images referenced from this excellent SO post)
So if your 3D array looks like this -
np.arange(0,16).reshape(2,2,4)
#array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7]],
#
# [[ 8, 9, 10, 11],
# [12, 13, 14, 15]]])
Then in memory it is stored as a single flat sequence, 0 through 15 in order -
When retrieving an element (or a block of elements), NumPy calculates how many bytes (the stride) it must skip to reach the next element along each axis. In the example above, assuming an 8-byte datatype such as int64, axis=2 has a stride of 8 bytes, axis=1 a stride of 8*4 bytes, and axis=0 a stride of 8*8 bytes.
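The stride arithmetic can be inspected through the strides attribute. A small sketch, with the dtype pinned to int64 (an assumption made here so that every element is 8 bytes regardless of platform):

```python
import numpy as np

# dtype fixed to int64 so each element occupies 8 bytes
a = np.arange(16, dtype=np.int64).reshape(2, 2, 4)
print(a.strides)  # (64, 32, 8): bytes to step along axis 0, 1, 2

# Byte offset of a[1, 0, 2] from the start of the buffer
offset = 1 * a.strides[0] + 0 * a.strides[1] + 2 * a.strides[2]
flat_index = offset // a.itemsize

# The same element, reached through the flat buffer and through indexing
print(flat_index, a.ravel()[flat_index], a[1, 0, 2])
```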
With this in mind, let's understand what dimensions are in numpy.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr2d.shape, arr3d.shape)
(3, 3) (2, 2, 3)
These can be considered a 2D matrix and a 3D tensor respectively. Here is an intuitive diagram to show what this looks like.
A 1D NumPy array (ndim=1) is a vector, a 2D array is a matrix, and a 3D array is a rank-3 tensor, which can be imagined as a cube. The number of values a 3D array can store is array.shape[0] * array.shape[1] * array.shape[2], which for arr3d is 2*2*3 = 12.
Vector (n,) -> (axis0,) #elements
Matrix (m,n) -> (axis0, axis1) #rows, columns
Tensor3 (l,m,n) -> (axis0, axis1, axis2)
Tensor4 (l,m,n,o) -> (axis0, axis1, axis2, axis3)
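To make the list above concrete, here is a short sketch with one array per entry. The shapes are hypothetical, chosen only for illustration; the point is that size is always the product of the axis lengths.

```python
import numpy as np

# Hypothetical example shapes, one per entry in the list above
examples = {
    "vector": np.zeros(4),
    "matrix": np.zeros((3, 4)),
    "3D tensor": np.zeros((2, 3, 4)),
    "4D tensor": np.zeros((2, 2, 3, 4)),
}

for name, arr in examples.items():
    # size is the product of all axis lengths in shape
    product = int(np.prod(arr.shape))
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}, size={arr.size}, product={product}")
```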

Related

Unable to expand numpy array

I have an array containing information about images. It contains information about 21495 images in an array named 'shuffled'.
np.shape(shuffled) = (21495, 1)
np.shape(shuffled[0]) = (1,)
np.shape(shuffled[0][0]) = (128, 128, 3) # (These are the image dimensions, with 3 channels of RGB)
How do I convert this array to an array of shape (21495, 128, 128, 3) to feed to my model?

There are 2 ways that I can think of:
One is to use NumPy's vstack() function, but calling it repeatedly gets quite slow as the array grows.
Another way (which I use) is to append each image array to an initially empty list using .append(), then convert that list to a NumPy array at the end.

Try
np.stack(shuffled[:,0])
stack, a form of concatenate, joins a list (or array) of arrays along a new initial dimension. We need to get rid of the size-1 dimension first, which is what the [:,0] indexing does.
In [23]: arr = np.empty((4,1),object)
In [24]: for i in range(4): arr[i,0] = np.arange(i,i+6).reshape(2,3)
In [25]: arr
Out[25]:
array([[array([[0, 1, 2],
[3, 4, 5]])],
[array([[1, 2, 3],
[4, 5, 6]])],
[array([[2, 3, 4],
[5, 6, 7]])],
[array([[3, 4, 5],
[6, 7, 8]])]], dtype=object)
In [26]: arr.shape
Out[26]: (4, 1)
In [27]: arr[0,0].shape
Out[27]: (2, 3)
In [28]: np.stack(arr[:,0])
Out[28]:
array([[[0, 1, 2],
[3, 4, 5]],
[[1, 2, 3],
[4, 5, 6]],
[[2, 3, 4],
[5, 6, 7]],
[[3, 4, 5],
[6, 7, 8]]])
In [29]: _.shape
Out[29]: (4, 2, 3)
But beware: if the subarrays differ in shape, say one or two are b/w rather than 3-channel, this won't work.
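The mismatch case can be detected before stacking by collecting the set of subarray shapes. The sketch below uses a small hypothetical stand-in for `shuffled`; repeating the gray channel three times is only one possible fix and is an assumption about what the model expects.

```python
import numpy as np

# Hypothetical stand-in for `shuffled`: a (3, 1) object array of images
imgs = np.empty((3, 1), dtype=object)
imgs[0, 0] = np.zeros((4, 4, 3))
imgs[1, 0] = np.zeros((4, 4, 3))
imgs[2, 0] = np.zeros((4, 4))  # grayscale image without a channel axis

# More than one distinct shape means np.stack would raise ValueError
shapes = {im.shape for im in imgs[:, 0]}
print(shapes)

# Assumed fix: copy the single channel three times so every image is (4, 4, 3)
fixed = [im if im.ndim == 3 else np.repeat(im[..., None], 3, axis=-1)
         for im in imgs[:, 0]]
batch = np.stack(fixed)
print(batch.shape)  # (3, 4, 4, 3)
```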

Very Basic Numpy array dimension visualization

I'm a beginner to numpy with no experience in matrices. I understand basic 1d and 2d arrays but I'm having trouble visualizing a 3d numpy array like the one below. How do the following python lists form a 3d array with height, length and width? Which are the rows and columns?
b = np.array([[[1, 2, 3],[4, 5, 6]],
[[7, 8, 9],[10, 11, 12]]])

The anatomy of an ndarray in NumPy looks like this red cube below: (source: Physics Dept, Cornell Uni)
Once you leave the 2D space and enter 3D or higher dimensional spaces, the concept of rows and columns doesn't make much sense anymore. But still you can intuitively understand 3D arrays. For instance, considering your example:
In [41]: b
Out[41]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
In [42]: b.shape
Out[42]: (2, 2, 3)
Here the shape of b is (2, 2, 3). You can think about it like, we've two (2x3) matrices stacked to form a 3D array. To access the first matrix you index into the array b like b[0] and to access the second matrix, you index into the array b like b[1].
# gives you the 2D array (i.e. matrix) at position `0`
In [43]: b[0]
Out[43]:
array([[1, 2, 3],
[4, 5, 6]])
# gives you the 2D array (i.e. matrix) at position 1
In [44]: b[1]
Out[44]:
array([[ 7, 8, 9],
[10, 11, 12]])
However, once you enter 4D space or higher, it becomes very hard to make sense of the arrays themselves, since we humans have a hard time visualizing 4 or more dimensions. So one would rather just look at the ndarray.shape attribute and work with that.
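Working from shape alone can be sketched as follows: each integer index consumes the leading axis, so the shape of the result shrinks from the left. This uses the array b from the question.

```python
import numpy as np

b = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

print(b.shape)        # (2, 2, 3)
print(b[1].shape)     # (2, 3): one index removes axis 0
print(b[1, 0].shape)  # (3,):   two indices remove axes 0 and 1
print(b[1, 0, 2])     # 9:      three indices select a single element
```

The same rule holds for 4D and higher arrays, where a picture is no longer available.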
More information about how we build higher dimensional arrays using (nested) lists:
For 1D arrays, the array constructor needs a sequence (tuple, list, etc) but conventionally list is used.
In [51]: oneD = np.array([1, 2, 3,])
In [52]: oneD.shape
Out[52]: (3,)
For 2D arrays, it's list of lists but can also be tuple of lists or tuple of tuples etc:
In [53]: twoD = np.array([[1, 2, 3], [4, 5, 6]])
In [54]: twoD.shape
Out[54]: (2, 3)
For 3D arrays, it's list of lists of lists:
In [55]: threeD = np.array([[[1, 2, 3], [2, 3, 4]], [[5, 6, 7], [6, 7, 8]]])
In [56]: threeD.shape
Out[56]: (2, 2, 3)
P.S. Internally, the ndarray is stored in a memory block as shown in the below picture. (source: Enthought)

Numpy Search & Slice 3D Array

I'm very new to Python & Numpy and am trying to accomplish the following:
Given, 3D Array:
arr_3d = [[[1,2,3],[4,5,6],[0,0,0],[0,0,0]],
[[3,2,1],[0,0,0],[0,0,0],[0,0,0]],
[[1,2,3],[4,5,6],[7,8,9],[0,0,0]]]
arr_3d = np.array(arr_3d)
Get the indices where [0,0,0] appears in the given 3D array.
Slice the given 3D array from where [0,0,0] appears first.
In other words, I'm trying to remove the padding (In this case: [0,0,0]) from the given 3D array.
Here is what I have tried,
arr_zero = np.zeros(3)
for index in range(0, len(arr_3d)):
    rows, cols = np.where(arr_3d[index] == arr_zero)
    arr_3d[index] = np.array(arr_3d[0][:rows[0]])
But doing this, I keep getting the following error:
Could not broadcast input array from shape ... into shape ...
I'm expecting something like this:
[[[1,2,3],[4,5,6]],
[[3,2,1]],
[[1,2,3],[4,5,6],[7,8,9]]]
Any help would be appreciated.

Get the first occurrence of those indices with an all() reduction along with argmax(), and then slice each 2D slice off the 3D array -
In [106]: idx = (arr_3d == [0,0,0]).all(-1).argmax(-1)
# Output as list of arrays
In [107]: [a[:i] for a,i in zip(arr_3d,idx)]
Out[107]:
[array([[1, 2, 3],
[4, 5, 6]]), array([[3, 2, 1]]), array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])]
# Output as list of lists
In [108]: [a[:i].tolist() for a,i in zip(arr_3d,idx)]
Out[108]: [[[1, 2, 3], [4, 5, 6]], [[3, 2, 1]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
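One edge case worth noting: argmax() returns 0 when a block contains no [0,0,0] row at all, which would slice that block down to nothing. The sketch below is a possible guard, not part of the original answer. It modifies the question's data so that the last block has no padding.

```python
import numpy as np

# Question's data, modified so the last block has no padding row
arr_3d = np.array([[[1, 2, 3], [4, 5, 6], [0, 0, 0], [0, 0, 0]],
                   [[3, 2, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
                   [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1]]])

is_pad = (arr_3d == 0).all(axis=-1)  # True where a row is [0, 0, 0]
first_pad = is_pad.argmax(axis=-1)   # 0 if a block has no padding row

# Fall back to the full block length when no padding row exists
idx = np.where(is_pad.any(axis=-1), first_pad, arr_3d.shape[1])

trimmed = [block[:i].tolist() for block, i in zip(arr_3d, idx)]
print(trimmed)
```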

How to use numpy as_strided (from np.lib.stride_tricks) correctly?

I'm trying to reshape a numpy array using numpy.lib.stride_tricks. This is the guide I'm following: https://stackoverflow.com/a/2487551/4909087
My use case is very similar, with the difference being that I need strides of 3.
Given this array:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
I'd like to get:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Here's what I tried:
import numpy as np
as_strided = np.lib.stride_tricks.as_strided
a = np.arange(1, 10)
as_strided(a, (len(a) - 2, 3), (3, 3))
array([[ 1, 2199023255552, 131072],
[ 2199023255552, 131072, 216172782113783808],
[ 131072, 216172782113783808, 12884901888],
[216172782113783808, 12884901888, 768],
[ 12884901888, 768, 1125899906842624],
[ 768, 1125899906842624, 67108864],
[ 1125899906842624, 67108864, 4]])
I was pretty sure I'd followed the example to a T, but evidently not. Where am I going wrong?

The accepted answer (and discussion) is good, but for the benefit of readers who don't want to run their own test case, I'll try to illustrate what's going on:
In [374]: a = np.arange(1,10)
In [375]: as_strided = np.lib.stride_tricks.as_strided
In [376]: a.shape
Out[376]: (9,)
In [377]: a.strides
Out[377]: (4,)
For a contiguous 1d array, strides is the size of the element, here 4 bytes, an int32. To go from one element to the next it steps forward 4 bytes.
What the OP tried:
In [380]: as_strided(a, shape=(7,3), strides=(3,3))
Out[380]:
array([[ 1, 512, 196608],
[ 512, 196608, 67108864],
[ 196608, 67108864, 4],
[ 67108864, 4, 1280],
[ 4, 1280, 393216],
[ 1280, 393216, 117440512],
[ 393216, 117440512, 7]])
This steps by 3 bytes, crossing int32 boundaries, and gives mostly unintelligible numbers. It might make more sense if the dtype had been bytes or uint8.
Instead using a.strides*2 (tuple replication), or (4,4) we get the desired array:
In [381]: as_strided(a, shape=(7,3), strides=(4,4))
Out[381]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Columns and rows both step one element, resulting in a 1 step moving window. We could have also set shape=(3,7), 3 windows 7 elements long.
In [382]: _.strides
Out[382]: (4, 4)
Changing strides to (8,4) steps 2 elements for each window.
In [383]: as_strided(a, shape=(7,3), strides=(8,4))
Out[383]:
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 5, 6, 7],
[ 7, 8, 9],
[ 9, 25, -1316948568],
[-1316948568, 184787224, -1420192452],
[-1420192452, 0, 0]])
But the shape is too large for these strides, so the result shows bytes past the end of the original data buffer. That could be dangerous (we don't know if those bytes belong to some other object or array). With an array of this size, only the first 4 rows are valid 2-step windows.
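If the installed NumPy is 1.20 or newer (an assumption about your environment), sliding_window_view from the same module builds the windows without hand-written strides, so it cannot read past the end of the buffer. A minimal sketch of the 1-step and 2-step windows:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 10)

# All length-3 windows, advancing one element at a time; a read-only view
windows = sliding_window_view(a, 3)
print(windows.shape)  # (7, 3)

# Keep every second window to get a 2-element step; only valid rows remain
step2 = windows[::2]
print(step2)
```

Here step2 has 4 rows, matching the 4 valid rows in the as_strided output above.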
Now step 3 elements for each row (3*4, 4):
In [384]: as_strided(a, shape=(3,3), strides=(12,4))
Out[384]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [385]: a.reshape(3,3).strides
Out[385]: (12, 4)
This is the same shape and strides as a 3x3 reshape.
We can set negative stride values and 0 values. In fact, negative-step slicing along a dimension with a positive stride will give a negative stride, and broadcasting works by setting 0 strides:
In [399]: np.broadcast_to(a, (2,9))
Out[399]:
array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [400]: _.strides
Out[400]: (0, 4)
In [401]: a.reshape(3,3)[::-1,:]
Out[401]:
array([[7, 8, 9],
[4, 5, 6],
[1, 2, 3]])
In [402]: _.strides
Out[402]: (-12, 4)
However, negative strides require adjusting which element of the original array is the first element of the view, and as_strided has no parameter for that.

I have no idea why you think you need strides of 3. The strides you need are the distance in bytes between one element of a and the next, which you can get from a.strides:
as_strided(a, (len(a) - 2, 3), a.strides*2)
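Using a.strides*2 instead of a hard-coded (4, 4) keeps the call correct for any element size. The sketch below tries a few dtypes, chosen here only for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

results = {}
for dtype in (np.int32, np.int64, np.float32):
    a = np.arange(1, 10, dtype=dtype)
    # a.strides is a 1-tuple; "* 2" repeats it to (itemsize, itemsize)
    w = as_strided(a, shape=(len(a) - 2, 3), strides=a.strides * 2)
    results[dtype.__name__] = (a.strides * 2, w.tolist())
    print(dtype.__name__, a.strides * 2, w[0], w[-1])
```

Each dtype yields the same 7 windows, even though the byte strides differ.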

I was trying to do a similar operation and ran into the same problem.
In your case, as stated in this comment, the problems were:
You were not taking into account the size of each element in memory (int32 = 4 bytes, which can be checked using a.dtype.itemsize).
You didn't specify the right number of bytes to step between windows, which in your case is also 4, since each window advances by only one element.
I wrote a function based on this answer that segments a given array using a window of n elements and a specified number of overlapping elements (the overlap equals the window length minus the number of elements to skip).
I share it here in case someone else needs it, since it took me a while to figure out how stride_tricks works:
import numpy as np

def window_signal(signal, window, overlap):
    """
    Windowing function for data segmentation.
    Parameters:
    ------------
    signal: ndarray
        The signal to segment.
    window: int
        Window length, in samples.
    overlap: int
        Number of samples to overlap
    Returns:
    --------
    nd-array
        A view of the signal array with shape (rows, window),
        where rows = (N-window)//(window-overlap) + 1
    """
    N = signal.reshape(-1).shape[0]
    if window == overlap:
        # Treated as non-overlapping windows
        rows = N // window
        overlap = 0
    else:
        rows = (N - window) // (window - overlap) + 1
        miss = (N - window) % (window - overlap)
        if miss != 0:
            print('Windowing led to the loss of ', miss, ' samples.')
    item_size = signal.dtype.itemsize
    # Bytes to step between the starts of consecutive windows
    strides = (window - overlap) * item_size
    return np.lib.stride_tricks.as_strided(signal, shape=(rows, window),
                                           strides=(strides, item_size))
The solution for this case is, according to your code:
as_strided(a, (len(a) - 2, 3), (4, 4))
Alternatively, using the function window_signal:
window_signal(a, 3, 2)
Both return as output the following array:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])

Apply same permutation for every row in a 2D numpy array

To permute a 1D array A I know that you can run the following code:
import numpy as np
A = np.random.permutation(A)
I have a 2D array and want to apply exactly the same permutation to every row of the array. Is there a way to tell NumPy to do that?

Generate random permutations for the number of columns in A and index into the columns of A, like so -
A[:,np.random.permutation(A.shape[1])]
Sample run -
In [100]: A
Out[100]:
array([[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]])
In [101]: A[:,np.random.permutation(A.shape[1])]
Out[101]:
array([[7, 5, 7, 4, 3],
[3, 5, 2, 0, 2],
[8, 4, 3, 8, 1]])
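The same idea can be written with NumPy's newer Generator interface. This is a sketch; the seed is an arbitrary choice made here for reproducibility. Keeping the permutation in a variable also makes it possible to undo the reordering later.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

A = np.array([[3, 5, 7, 4, 7],
              [2, 5, 2, 0, 3],
              [1, 4, 3, 8, 8]])

perm = rng.permutation(A.shape[1])  # one permutation of the column indices
B = A[:, perm]                      # the same reordering applied to every row
print(perm)
print(B)

# argsort of a permutation gives its inverse, which restores the original order
inverse = np.argsort(perm)
restored = B[:, inverse]
```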

Actually you do not need to do this, from the documentation:
If x is a multi-dimensional array, it is only shuffled along its first
index.
So, taking Divakar's array:
a = np.array([
[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]
])
you can just do: np.random.permutation(a) and get something like:
array([[2, 5, 2, 0, 3],
[3, 5, 7, 4, 7],
[1, 4, 3, 8, 8]])
P.S. if you need to perform column permutations - just do np.random.permutation(a.T).T. Similar things apply to multi-dim arrays.

It depends on what you mean by every row.
If you want to permute all values (regardless of row and column), reshape your array to 1D, permute, and reshape back to 2D.
If you want to permute each row without moving elements between rows, you need to loop through one axis and call permutation. Note that each row then gets its own, different permutation.
for i in range(len(A)):
    A[i] = np.random.permutation(A[i])
It can probably be done more concisely, but that is how it can be done.
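For the independent per-row case, the loop can likely be replaced by Generator.permuted, assuming NumPy 1.20 or newer. A minimal sketch, with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

A = np.array([[3, 5, 7, 4, 7],
              [2, 5, 2, 0, 3],
              [1, 4, 3, 8, 8]])

# axis=1 shuffles within each row independently; returns a copy, A is unchanged
B = rng.permuted(A, axis=1)
print(B)
```

Unlike indexing with a single column permutation, this gives every row its own ordering, so it does not answer the original question of applying one permutation to all rows.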
