array[row][col] vs array[row,col] in Python - python

What is the difference between indexing a 2D array row/col with [row][col] vs [row, col] in numpy/pandas? Are there any implications of using either of these two?
For example:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(arr[1][0])
print(arr[1, 0])
Both give 3.

Single-element indexing
For single-element indexing, as in your example, the result is indeed the same. However, as stated in the docs:
So note that x[0,2] = x[0][2] though the second case is more
inefficient as a new temporary array is created after the first index
that is subsequently indexed by 2.
emphasis mine
Array indexing
In this case, double-indexing is not only less efficient, it simply gives different results. Let's look at an example:
>>> arr = np.array([[1, 2], [3, 4], [5, 6]])
>>> arr[1:][0]
[3 4]
>>> arr[1:, 0]
[3 5]
In the first case, we create a new array after the first index which is all rows from index 1 onwards:
>>> arr[1:]
[[3 4]
[5 6]]
Then we simply take the first element of that new array which is [3 4].
In the second case, we use numpy indexing which doesn't index the elements but indexes the dimensions. So instead of taking the first row, it is actually taking the first column - [3 5].
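One further implication, offered as a sketch rather than something stated in the answer above: when the first index is a list or array (fancy indexing), the intermediate array is a copy, so a chained assignment modifies that copy and leaves the original untouched. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Chained: arr[[0, 2]] builds a new array (a copy), then [0] assigns into that copy.
arr[[0, 2]][0] = 99
unchanged = arr.copy()     # arr still holds its original values here

# Single index expression: assigns directly into arr.
arr[[0, 2], 0] = 99

print(unchanged.tolist())  # [[1, 2], [3, 4], [5, 6]]
print(arr.tolist())        # [[99, 2], [3, 4], [99, 6]]
```

With a plain slice as the first index (for example arr[1:][0] = 99) the intermediate is a view, so the assignment does reach the original; the silent no-op is specific to list or array indices.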

Using [row][col] is one more function call than using [row, col]. When you are indexing an array (or any object, for that matter), you are calling obj.__getitem__ under the hood. Since Python packs the comma-separated indices into a tuple, obj[row][col] is equivalent to calling obj.__getitem__(row).__getitem__(col), whereas obj[row, col] is simply obj.__getitem__((row, col)). Therefore, indexing with [row, col] is more efficient because it has one fewer function call (plus some namespace lookups, but they can normally be ignored).
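The two call patterns can be made visible with a small stand-in object. KeyRecorder below is a hypothetical helper written only for this illustration; it is not part of numpy.

```python
class KeyRecorder:
    """Hypothetical helper: remembers every key passed to __getitem__."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return self  # return self so that chained indexing keeps working


chained = KeyRecorder()
chained[1][0]          # two calls: __getitem__(1), then __getitem__(0)

single = KeyRecorder()
single[1, 0]           # one call: __getitem__((1, 0))

print(chained.keys)    # [1, 0]
print(single.keys)     # [(1, 0)]
```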

Related

List of lists equivalent for numpy 2D indexing [:,0]

I need to access the first element of each list within a list. Usually I do this via numpy arrays by indexing:
import numpy as np
nparr=np.array([[1,2,3],[4,5,6],[7,8,9]])
first_elements = nparr[:,0]
b/c:
print(nparr[0,:])
[1 2 3]
print(nparr[:,0])
[1 4 7]
Unfortunately I have to tackle non-rectangular dynamic arrays now so numpy won't work.
But Python's standard lists behave strangely (at least to me):
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0][:])
[1, 2, 3]
print(pylist[:][0])
[1, 2, 3]
I guess either lists don't support this (which would lead to a second question: what to use instead?) or I got the syntax wrong?
You have a few options. Here's one.
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0]) # [1, 2, 3]
print([row[0] for row in pylist]) # [1, 4, 7]
Alternatively, if you want to transpose pylist (make its rows into columns), you could do the following.
pylist_transpose = [*zip(*pylist)]
print(pylist_transpose[0]) # [1, 4, 7]
pylist_transpose will always be a rectangular array with a number of rows equal to the length of the shortest row in pylist.
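Since the question is about non-rectangular lists, here is a small sketch of how both options behave on ragged input. The sample data and the use of itertools.zip_longest as a padding alternative are assumptions added for illustration.

```python
from itertools import zip_longest

ragged = [[1, 2, 3], [4], [5, 6]]

# First element of each row; works for any row lengths as long as no row is empty.
first_elements = [row[0] for row in ragged]

# zip stops at the shortest row, so only one "column" survives here.
truncated = [list(col) for col in zip(*ragged)]

# zip_longest keeps every column and pads short rows with a fill value.
padded = [list(col) for col in zip_longest(*ragged, fillvalue=None)]

print(first_elements)  # [1, 4, 5]
print(truncated)       # [[1, 4, 5]]
print(padded)          # [[1, 4, 5], [2, None, 6], [3, None, None]]
```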

Numpy array indexing syntax

I am learning numpy newly and confused about syntax used in indexing of arrays. For example:
arr[2, 3]
This means the element at the intersection of the 3rd row and 4th column. What confuses me is the separation of different indices by a comma inside square brackets (like in function arguments). Doing so with Python lists is not valid:
l = [[1, 2], [3, 4]]
l[1, 1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not tuple
So, if this is not valid Python syntax, how do numpy arrays work?
With a plain Python list, indices are not separated by commas. A colon ':' is used for slicing, and each level of nesting gets its own pair of brackets.
In your example above,
l = [[1, 2], [3, 4]]
l[0] is [1, 2] and l[1] is [3, 4], so l[1][1] is 4.
Read further documentation for better understanding.
Thank You
In your given example, you're comparing a numpy array to a list of lists. The main difference between the two is that a numpy array is predictable in terms of shape, data type of its elements, and so on, while a list can contain an arbitrary combination of any other python objects (lists, tuples, strings, etc.)
Take this as an example, say you create a numpy array like so:
arr = np.array([[0, 1], [2, 3], [4, 5]])
Here, the shape of arr is known right after instantiation "arr.shape returns (3,2)", so you can easily index the array with only a comma separated square bracket. On the other hand, take the list example:
l = [[0, 1], [2, 3], [4, 5]]
l[0] # This returns the list [0, 1]
l[0].append("HELLO")
l[0] # This returns the list [0, 1, "HELLO"]
A list is very unpredictable, as there's no way to know what each list element will return to you. So, the way we index a specific element in a list of lists is by using 2 square brackets "e.g. l[0][0]"
What if we created a non-uniform numpy array? Well, you get a similar behaviour to a list of lists:
arr = np.array([[0, 1], [2, 3], [4]], dtype=object) # Without dtype=object, older NumPy versions warn and NumPy 1.24+ raises a ValueError
print(repr(arr)) # Shows: array([list([0, 1]), list([2, 3]), list([4])], dtype=object)
In this case, you can't index the numpy array using [0, 0]. Instead, you have to use two square brackets, just like a list of lists
You can also check the documentation of ndarray for more info.
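A short sketch may help separate syntax from behaviour: l[1, 1] is valid Python syntax, and the comma simply builds a tuple that is handed to the object being indexed. ndarray accepts a tuple key; list does not. The variable names are illustrative.

```python
import numpy as np

nested = [[1, 2], [3, 4]]
arr = np.array(nested)

key = 1, 1                      # the comma builds the tuple (1, 1)
print(arr[key] == arr[1, 1])    # True: ndarray accepts a tuple key

try:
    nested[key]                 # same syntax, but list rejects tuple keys
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

print(nested[1][1])             # 4: lists need one pair of brackets per level
```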

Numpy array indexing behavior

I was playing with numpy array indexing and find this odd behavior. When I index with np.array or list it works as expected:
In[1]: arr = np.arange(10).reshape(5,2)
arr[ [1, 1] ]
Out[1]: array([[2, 3],
[2, 3]])
But when I put tuple, it gives me a single element:
In[1]: arr = np.arange(10).reshape(5,2)
arr[ (1, 1) ]
Out[1]: 3
Also some kind of this strange tuple vs list behavior occurs with arr.flat:
In[1]: arr = np.arange(10).reshape(5,2)
In[2]: arr.flat[ [3, 4] ]
Out[2]: array([3, 4])
In[3]: arr.flat[ (3, 4) ]
Out[3]: IndexError: unsupported iterator index
I can't understand what is going on under the hood. What is the difference between a tuple and a list in this case?
Python 3.5.2
NumPy 1.11.1
What's happening is called fancy indexing, or advanced indexing. There's a difference between indexing with slices, or with a list/array. The trick is that multidimensional indexing actually works with tuples due to the implicit tuple syntax:
import numpy as np
arr = np.arange(10).reshape(5,2)
arr[2,1] == arr[(2,1)] # exact same thing: 2,1 matrix element
However, using a list (or array) inside an index expression will behave differently:
arr[[2,1]]
will index into arr with 2, then with 1, so first it fetches arr[2]==arr[2,:], then arr[1]==arr[1,:], and returns these two rows (row 2 and row 1) as the result.
It gets funkier:
print(arr[1:3,0:2])
print(arr[[1,2],[0,1]])
The first one is regular indexing, and it slices rows 1 to 2 and columns 0 to 1 inclusive; giving you a 2x2 subarray. The second one is fancy indexing, it gives you arr[1,0],arr[2,1] in an array, i.e. it indexes selectively into your array using, essentially, the zip() of the index lists.
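To make the zip() analogy concrete, here is a minimal sketch comparing the slice, the fancy index, and an explicit loop over the zipped index lists. The names rows, cols and by_hand are illustrative.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)
rows, cols = [1, 2], [0, 1]

block = arr[1:3, 0:2]            # slicing: every row/column combination
paired = arr[rows, cols]         # fancy indexing: rows and cols paired up
by_hand = np.array([arr[r, c] for r, c in zip(rows, cols)])

print(block.tolist())    # [[2, 3], [4, 5]]
print(paired.tolist())   # [2, 5]
print(by_hand.tolist())  # [2, 5]
```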
Now here's why flat works like that: it returns a flatiter of your array. From help(arr.flat):
class flatiter(builtins.object)
| Flat iterator object to iterate over arrays.
|
| A `flatiter` iterator is returned by ``x.flat`` for any array `x`.
| It allows iterating over the array as if it were a 1-D array,
| either in a for-loop or by calling its `next` method.
So the resulting iterator from arr.flat behaves as a 1d array. When you do
arr.flat[ [3, 4] ]
you're accessing two elements of that virtual 1d array using fancy indexing; it works. But when you're trying to do
arr.flat[ (3,4) ]
you're attempting to access the (3,4) element of a 1d (!) array, which is an error. The reason that this doesn't throw the usual "too many indices" IndexError is probably only due to the fact that arr.flat itself handles this indexing case.
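If the goal is to reach a 2-D position through arr.flat, one option (a sketch, not something the question asked for) is to convert the position to a flat offset first with np.ravel_multi_index:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# Fancy indexing on the flat view: picks flat positions 3 and 4.
picked = arr.flat[[3, 4]]

# Convert the 2-D position (3, 1) into a flat offset: 3 * 2 + 1 = 7.
flat_pos = int(np.ravel_multi_index((3, 1), arr.shape))

print(picked.tolist())                              # [3, 4]
print(flat_pos, int(arr.flat[flat_pos]), int(arr[3, 1]))  # 7 7 7
```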
In [387]: arr=np.arange(10).reshape(5,2)
With this list, you are selecting 2 rows from arr
In [388]: arr[[1,1]]
Out[388]:
array([[2, 3],
[2, 3]])
It's the same as if you explicitly marked the column slice (with : or ...)
In [389]: arr[[1,1],:]
Out[389]:
array([[2, 3],
[2, 3]])
Using an array instead of a list works: arr[np.array([1,1]),:]. (It also eliminates some ambiguities.)
With the tuple, the result is the same as if you wrote the indexing without the tuple wrapper. So it selects an element with row index of 1, column index of 1.
In [390]: arr[(1,1)]
Out[390]: 3
In [391]: arr[1,1]
Out[391]: 3
The arr[1,1] is translated by the interpreter to arr.__getitem__((1,1)). As is common in Python 1,1 is shorthand for (1,1).
In the arr.flat cases you are indexing the array as if it were 1d. np.arange(10)[[2,3]] selects 2 items, while np.arange(10)[(2,3)] is 2d indexing, hence the error.
A couple of recent questions touch on a messier corner case. Sometimes the list is treated as a tuple. The discussion might be enlightening, but don't go there if it's confusing.
Advanced slicing when passed list instead of tuple in numpy
numpy indexing: shouldn't trailing Ellipsis be redundant?

Index a numpy array with another array

I feel silly, because this is such a simple thing, but I haven't found the answer either here or anywhere else.
Is there no straightforward way of indexing a numpy array with another?
Say I have a 2D array
>> A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
if I want to access element [3,1] I type
>> A[3,1]
8
Now, say I store this index in an array
>> ind = np.array([3,1])
and try using the index this time:
>> A[ind]
array([[7, 8],
[3, 4]])
the result is not A[3,1]
The question is: having arrays A and ind, what is the simplest way to obtain A[3,1]?
Just use a tuple:
>>> A[(3, 1)]
8
>>> A[tuple(ind)]
8
The A[] actually calls the special method __getitem__:
>>> A.__getitem__((3, 1))
8
and using a comma creates a tuple:
>>> 3, 1
(3, 1)
Putting these two basic Python principles together solves your problem.
You can store your index in a tuple in the first place, if you don't need NumPy array features for it.
That is because by giving an array you actually ask
A[[3,1]]
which returns rows 3 and 1 of the 2D array, rather than the single element at row 3, column 1 that you want.
You can use
A[ind[0],ind[1]]
You can also use (if you want more indexes at the same time);
A[indx,indy]
Where indx and indy are numpy arrays of indexes for the first and second dimension accordingly.
See here for all possible indexing methods for numpy arrays: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html
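The approaches from both answers can be sketched together. The points array is a made-up example of several (row, col) pairs stored in one array.

```python
import numpy as np

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
ind = np.array([3, 1])

single = A[tuple(ind)]            # same as A[3, 1]

# Several (row, col) pairs at once; `points` is an illustrative name.
points = np.array([[3, 1], [0, 0], [2, 1]])
indx, indy = points[:, 0], points[:, 1]
several = A[indx, indy]           # A[3, 1], A[0, 0], A[2, 1]

print(int(single))        # 8
print(several.tolist())   # [8, 1, 6]
```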

Acquiring the Minimum array out of Multiple Arrays by order in Python

Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with the lowest value at index 0, and of those two arrays it has the lowest value at index 1. If there were more arrays with similar values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort within each subarray, so the best way seems to be converting to a Python list first. Assuming a is a 2D numpy array whose minimum row you want to pick, you could get it as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
Here's an idea using numpy:
import numpy
a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
# Keep only the rows that hold the smallest value in the current column
while a.shape[0] > 1 and col < a.shape[1]:
    a = a[a[:, col] == a[:, col].min()]
    col += 1
print(a)
This checks column by column until only one row is left.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
[1, 3, 2],
[2, 3, 1],
[3, 2, 1]])
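To pull out only the minimum row for any number of rows and columns, the lexsort idea can be wrapped in a small function. lexicographic_min_row is a hypothetical name, and the random cross-check against min() on plain lists is only there as a sanity test.

```python
import numpy as np

def lexicographic_min_row(a):
    """Hypothetical helper: row of a 2-D array that sorts first lexicographically."""
    order = np.lexsort(a[:, ::-1].T)  # last key is primary, so reverse the columns
    return a[order[0]]

a = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]])
print(lexicographic_min_row(a).tolist())   # [1, 2, 3]

# Cross-check against plain Python list comparison on random data.
rng = np.random.default_rng(0)
b = rng.integers(0, 3, size=(50, 4))
assert lexicographic_min_row(b).tolist() == min(b.tolist())
```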
