Numpy array dimensions are unexpectedly reordered after indexing [duplicate] - python

I have the following minimal example:
a = np.zeros((5,5,5))
a[1,1,:] = [1,1,1,1,1]
print(a[1,:,range(4)])
I would expect as output an array with 5 rows and 4 columns, where we have ones on the second row. Instead it is an array with 4 rows and 5 columns with ones on the second column. What is happening here, and what can I do to get the output I expected?

This is an example of mixed basic and advanced indexing, as discussed in https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The slice dimension has been appended to the end.
With one scalar index this is a marginal case for the ambiguity described there. It's been discussed in previous SO questions and one or more bug/issues.
Numpy sub-array assignment with advanced, mixed indexing
In this case you can replace the range with a slice, and get the expected order:
In [215]: a[1,:,range(4)].shape
Out[215]: (4, 5) # slice dimension last
In [216]: a[1,:,:4].shape
Out[216]: (5, 4)
In [219]: a[1][:,[0,1,3]].shape
Out[219]: (5, 3)
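As a quick check of the shapes above, here is a minimal sketch that compares the mixed index with the two workarounds. It uses np.arange(4) in place of range(4) (an assumption made for clarity; both are treated as an index array), and the variable names are illustrative only.

```python
import numpy as np

a = np.zeros((5, 5, 5))
a[1, 1, :] = 1

idx = np.arange(4)        # an index array triggers advanced indexing

mixed = a[1, :, idx]      # scalar and array separated by a slice
sliced = a[1, :, :4]      # basic indexing only
two_step = a[1][:, idx]   # apply the scalar index first, then the array

print(mixed.shape)     # (4, 5): the slice dimension ends up last
print(sliced.shape)    # (5, 4)
print(two_step.shape)  # (5, 4)
```

If the index really has to be a list or array, transposing the mixed result (mixed.T) also gives the (5, 4) layout.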

Related

What is the Numpy slicing notation in this code?

# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Can someone explain the second line of code with reference to specific documentation? I know it's slicing, but I couldn't find any reference for the notation ":-1" anywhere. Please point to the specific part of the documentation.
Thank you
It is slicing, most probably with numpy, and it is being done on data of shape (610, 14).
Per the docs:
Indexing on ndarrays
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.
1D array
Slicing a 1-dimensional array is much like slicing a list
import numpy as np
np.random.seed(0)
array_1d = np.random.random((5,))
print(len(array_1d.shape))
1
NOTE: The len of the array shape tells you the number of dimensions.
We can use standard python list slicing on the 1D array.
# get the last element
print(array_1d[-1])
0.4236547993389047
# get everything up to but excluding the last element
print(array_1d[:-1])
[0.5488135 0.71518937 0.60276338 0.54488318]
2D array
array_2d = np.random.random((5, 1))
print(len(array_2d.shape))
2
Think of a 2-dimensional array like a data frame. It has rows (the 0th axis) and columns (the 1st axis). numpy grants us the ability to slice these axes independently by separating them with a comma (,).
# the 0th row and all columns
print(array_2d[0, :])
[0.79172504]
# the 1st row and everything after + all columns
print(array_2d[1:, :])
[[0.52889492]
 [0.56804456]
 [0.92559664]
 [0.07103606]]
# the 1st through second to last row + the last column
print(array_2d[1:-1, -1])
[0.52889492 0.56804456 0.92559664]
Your Example
# split into inputs and outputs
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
Note that data must have at least 2 dimensions, i.e. data.ndim >= 2 (otherwise you'd get an IndexError).
This means data[:, :-1] is keeping all "rows" and slicing up to, but not including, the last "column". Likewise, data[:, -1] is keeping all "rows" and selecting only the last "column".
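To make this concrete, here is a small sketch using a stand-in array with the shape mentioned in the question, (610, 14). The values are arbitrary placeholders, not the asker's data.

```python
import numpy as np

# Stand-in for the real dataset: 610 rows, 14 columns (arbitrary values).
data = np.arange(610 * 14).reshape(610, 14)

# All rows, every column except the last / all rows, only the last column.
X, y = data[:, :-1], data[:, -1]

print(X.shape, y.shape)  # (610, 13) (610,)
```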
It's important to know that when you slice an ndarray using a colon (:), you will get an array with the same number of dimensions.
print(len(array_2d[1:, :-1].shape)) # 2
But if you "select" a specific index (i.e. don't use a colon), you may reduce the dimensions.
print(len(array_2d[1, :-1].shape)) # 1, because I selected a single index value on the 0th axis
print(len(array_2d[1, -1].shape)) # 0, because I selected a single index value on both the 0th and 1st axes
You can, however, select a list of indices on either axis (assuming they exist).
print(len(array_2d[[1], [-1]].shape)) # 1
print(len(array_2d[[1, 3], :].shape)) # 2
This slicing notation is explained here https://docs.python.org/3/tutorial/introduction.html#strings
-1 means last element, -2 - second from last, etc. For example, if there are 8 elements in a list, -1 is equivalent to 7 (not 8 because indexing starts from 0)
Keep in mind that nested Python lists are indexed by chaining brackets (for example lst[1][5:7]); chaining two slices, as in lst[1:3][5:7], simply slices the outer list twice. numpy arrays additionally accept a comma-separated index ([8:10, 12:14]) that slices several axes of a multidimensional array at once. However, -1 always means the same thing. Here is the numpy documentation for slicing https://numpy.org/doc/stable/user/basics.indexing.html
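The difference between list-style and numpy-style indexing can be seen in a short sketch. The names nested and arr are made up for illustration.

```python
import numpy as np

nested = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
arr = np.array(nested)

# Chained slices on a list both act on the OUTER list:
# nested[1:3] keeps rows 1 and 2, then [0:1] keeps the first of those.
twice = nested[1:3][0:1]
print(twice)           # [[4, 5, 6, 7]]

# A comma-separated index slices rows and columns independently.
block = arr[1:3, 0:1]
print(block.tolist())  # [[4], [8]]

# -1 means "last" in both worlds.
print(nested[-1][-1], arr[-1, -1])  # 11 11
```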

Call numpy ravel and include an integer multi-index

I have a 4-dimensional array. I am going to turn it into a 1-dim array. I use numpy ravel and it works fine with the default parameters.
However, I would also like the positions/indices in the 4-dim array.
I want something like this as a row in my output.
x,y,z,w,value
With x being the first dimension of my initial array and so on.
The obvious approach is iteration, however I was told to avoid it when I can.
for i in range(test.shape[0]):
    for j in range(test.shape[1]):
        for k in range(test.shape[2]):
            for l in range(test.shape[3]):
                print(i,j,k,l,test[(i,j,k,l)])
It will be too slow when I use a larger dataset.
Is there a way to configure ravel to do this, or any other approach faster than iteration?
Use np.indices with sparse=False, combined with np.concatenate to build the array. np.indices provides the first n columns, and np.concatenate appends the last one:
test = np.random.randint(10, size=(3, 5, 4, 2))
index = np.indices(test.shape, sparse=False) # shape: (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T
A more detailed breakdown:
index is a (4, *test.shape) array, with one entry along the first axis per dimension of test.
To make test concatenatable with index, you need to prepend a unit dimension, which is what test[None, ...] does. None is synonymous with np.newaxis, and Ellipsis, or ..., means "all the remaining dimensions".
When you concatenate along axis=0, you are appending test to the array of indices. Along the first axis, each position now holds 5 values: the four indices followed by the value. The remaining axes reflect the shape of test, but besides that, you have what you want.
The goal is to flatten out the trailing dimensions, so you get a (5, N) array, where N = np.prod(test.shape). That's what the reshape does; the final .T then transposes it to (N, 5), one row per element. test.ndim + 1 is the number of indices plus 1 for the value. -1 can appear at most once in a reshape. It means "infer this size from the total number of elements and the other dimensions".
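The breakdown above can be checked on a small array. The sketch below uses a made-up (2, 3, 2, 2) array and also shows an alternative built from np.unravel_index and np.column_stack. That alternative is not part of the answer above; it is included only as a cross-check.

```python
import numpy as np

# Small 4-D stand-in; values are multiples of 10 so they are easy to spot.
test = np.arange(24).reshape(2, 3, 2, 2) * 10

index = np.indices(test.shape)  # shape: (4, 2, 3, 2, 2)
data = (
    np.concatenate((index, test[None, ...]), axis=0)
    .reshape(test.ndim + 1, -1)  # shape: (5, 24)
    .T                           # shape: (24, 5), rows are x, y, z, w, value
)

# Alternative sketch: recover the multi-index of every flat position.
coords = np.unravel_index(np.arange(test.size), test.shape)
alt = np.column_stack((*coords, test.ravel()))

print(data.shape)  # (24, 5)
print(data[:3])    # first rows: indices followed by the value
```

Both results list the elements in the same order as test.ravel() with its default C ordering.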

Multiplying specific elements in a 4D array

I'm using Python 3 and I have an array oh_array with shape (12, 72, 46, 38). I need to multiply the ranges [20:27], [38:43] and [-16:-1] along axis=1 by 10,
then [17:26] along axis=2 by 10, and then [0:8] along axis=3 by 10. The array has to keep the same size and dimensions, with only these elements changed. I have thought of using a loop with a range, but I do not know if that works across multiple dimensions.
IIUC, use np.multiply.at and np.r_
np.multiply.at(arr, (slice(None),
                     np.r_[20:27, 38:43, -16:-1],
                     slice(None),
                     slice(None)),
               10)
where arr is your array. The ufunc.at method multiplies, in place, the values of arr (the first argument) at the indexes given in the second argument by the number given in the last argument (here, 10). This call handles axis=1; change the indexes accordingly for the other axes.
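Here is a minimal sketch of the same idea on a small made-up array of shape (2, 12, 3, 4), so the effect is easy to inspect. It also compares np.multiply.at with plain in-place fancy-index multiplication. The two agree here because the index ranges do not overlap; with repeated indices, ufunc.at applies the operation once per occurrence.

```python
import numpy as np

# Index ranges along axis 1 (sizes chosen for the small example only).
idx = np.r_[2:4, 6:8, -3:-1]   # -> [2, 3, 6, 7, -3, -2]

arr_at = np.ones((2, 12, 3, 4))
np.multiply.at(arr_at, (slice(None), idx, slice(None), slice(None)), 10)

arr_inplace = np.ones((2, 12, 3, 4))
arr_inplace[:, idx] *= 10      # same effect when idx has no duplicates

print(arr_at[0, :, 0, 0])
```

For a single contiguous range on another axis, a plain slice is enough, for example arr[:, :, 17:26] *= 10.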

Finding the index of similar columns in 2 numpy array

I have two 2d numpy arrays of size (12550,200) and (12550,10). I need to find the set of column indexes of the first array that are matching the 2nd array columns.
Eg:
ar1 = [[1,2,3,4],[4,5,6,7],[1,3,4,5],[6,7,8,5]]
ar2 = [[1,3],[4,6],[1,4],[6,8]]
so matching columns are 1,4,1,4 and 3,6,4,8
I need the index of these columns in ar1 as output i.e., [0,2]
Can anyone help me with Python code that is fast enough, given that the original arrays are large?
Check this out:
ar1 = np.array([[1,2,3,4],[4,5,6,7],[1,3,4,5],[6,7,8,5]])
ar2 = np.array([[1,3],[4,6],[1,4],[6,8]])
np.where((ar1[:,None].T == ar2.T).all(axis=2))[0]
gives
array([0, 2], dtype=int64)
meaning column 0 of ar2 is found at column 0 of ar1, and column 1 of ar2 is found at column 2 of ar1.
The transpose is used because you care about columns rather than rows. The [:,None] is used for broadcasting (i.e. test every column against every other). The all() checks that entire columns match. And finally the [0] element of the np.where result will give you the ar1 column indices where this happens.
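The same comparison can be written without transposes, which some readers may find easier to follow. This is a sketch, not the answer's exact method: it broadcasts every column of ar1 against every column of ar2 and also returns which ar2 column each match belongs to. The names match, cols1 and cols2 are illustrative.

```python
import numpy as np

ar1 = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [1, 3, 4, 5], [6, 7, 8, 5]])
ar2 = np.array([[1, 3], [4, 6], [1, 4], [6, 8]])

# (rows, cols1, 1) == (rows, 1, cols2) -> (rows, cols1, cols2)
# match[j, k] is True when column j of ar1 equals column k of ar2.
match = (ar1[:, :, None] == ar2[:, None, :]).all(axis=0)

cols1, cols2 = np.nonzero(match)
print(cols1)  # [0 2]
print(cols2)  # [0 1]
```

Note that the intermediate boolean array has rows x cols1 x cols2 elements, about 25 million for the (12550, 200) and (12550, 10) arrays in the question.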

Reshaping Numpy Arrays to a multidimensional array

For a numpy array I have found that
x = numpy.array([]).reshape(0,4)
is fine and allows me to append (0,4) arrays to x without the array losing its structure (i.e. it doesn't just become a list of numbers). However, when I try
x = numpy.array([]).reshape(2,3)
it throws an error. Why is this?
This output shows what it means to reshape an array:
np.array([2, 3, 4, 5, 6, 7]).reshape(2, 3)
Output -
array([[2, 3, 4],
       [5, 6, 7]])
So reshaping just means rearranging the existing elements into a new shape. Intuitively, reshape(0, 4) means convert the current array into a format with 0 rows and 4 columns. 0 rows means no elements, so it works because your array is empty. Similarly, (2, 3) means 2 rows and 3 columns, which is 6 elements, and an empty array does not have 6 elements.
reshape is not an 'append' function. It reshapes the array you give it to the dimensions you want.
np.array([]).reshape(0,4) works because you reshape a zero element array to a 0x4(=0 elements) array.
np.array([]).reshape(2,3) doesn't work because you're trying to reshape a zero element array to a 2x3(=6 elements) array.
To create a 2x3 array that you can fill in later, use np.zeros((2,3)) instead.
And in case you're wondering, numpy arrays can't be appended to in place. Functions such as np.append, np.concatenate and np.vstack return a new array and copy the data on every call. If you need to add data repeatedly, collect it in a list, append what you want, and then convert back to a numpy array. Preferably, you only create a numpy array when you don't mean to append data later.
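The points above can be tried out in a few lines. This is a minimal sketch with made-up values: it shows the (0, 4) reshape succeeding, the (2, 3) reshape failing, and two ways of building up rows.

```python
import numpy as np

# 0 rows x 4 columns = 0 elements, so an empty array fits.
x = np.array([]).reshape(0, 4)

# np.vstack does not modify x; it returns a new (1, 4) array.
x = np.vstack([x, [1, 2, 3, 4]])
print(x.shape)  # (1, 4)

# 2 x 3 = 6 elements cannot be made from 0 elements.
try:
    np.array([]).reshape(2, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True

# Collect rows in a list and convert once at the end.
rows = []
for i in range(3):
    rows.append([i, i + 1, i + 2, i + 3])
stacked = np.array(rows)
print(stacked.shape)  # (3, 4)
```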
