Index a numpy array with another array - python

I feel silly, because this is such a simple thing, but I haven't found the answer either here or anywhere else.
Is there no straightforward way of indexing a numpy array with another?
Say I have a 2D array
>> A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
If I want to access element [3,1], I type
>> A[3,1]
8
Now, say I store this index in an array
>> ind = np.array([3,1])
and try using the index this time:
>> A[ind]
array([[7, 8],
[3, 4]])
The result is not A[3,1].
The question is: having arrays A and ind, what is the simplest way to obtain A[3,1]?

Just use a tuple:
>>> A[(3, 1)]
8
>>> A[tuple(ind)]
8
Indexing with A[...] actually calls the special method __getitem__:
>>> A.__getitem__((3, 1))
8
and using a comma creates a tuple:
>>> 3, 1
(3, 1)
Putting these two basic Python principles together solves your problem.
You can store your index in a tuple in the first place, if you don't need NumPy array features for it.
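For example, a minimal sketch of that suggestion (assuming numpy is imported as np, as in the question):
>>> A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> ind = (3, 1)  # store the index as a plain tuple instead of an ndarray
>>> A[ind]
8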

That is because by giving an array you are actually asking for
A[[3,1]]
which selects rows 3 and 1 of the 2D array, instead of the element at row 3, column 1 that you want.
You can use
A[ind[0],ind[1]]
You can also use (if you want several elements at the same time):
A[indx,indy]
where indx and indy are numpy arrays of indices for the first and second dimension, respectively.
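For instance, a small sketch of that kind of indexing (the arrays indx and indy here are made-up examples):
A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
indx = np.array([3, 0])  # row indices
indy = np.array([1, 0])  # column indices
A[indx, indy]            # array([8, 1]) -> the elements A[3, 1] and A[0, 0]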
See here for all possible indexing methods for numpy arrays: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html

Related

List of lists equivalent for numpy 2D indexing [:,0]

I need to access the first element of each list within a list. Usually I do this via numpy arrays by indexing:
import numpy as np
nparr=np.array([[1,2,3],[4,5,6],[7,8,9]])
first_elements = nparr[:,0]
because:
print(nparr[0,:])
[1 2 3]
print(nparr[:,0])
[1 4 7]
Unfortunately I have to tackle non-rectangular dynamic arrays now, so numpy won't work.
But Python's standard lists behave strangely (at least for me):
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0][:])
[1, 2, 3]
print(pylist[:][0])
[1, 2, 3]
I guess either lists don't support this (which would lead to a second question: what should I use instead?) or I got the syntax wrong?
You have a few options. Here's one.
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0]) # [1, 2, 3]
print([row[0] for row in pylist]) # [1, 4, 7]
Alternatively, if you want to transpose pylist (make its rows into columns), you could do the following.
pylist_transpose = [*zip(*pylist)]
print(pylist_transpose[0]) # (1, 4, 7)
pylist_transpose will always be rectangular (a list of tuples), with a number of rows equal to the length of the shortest row in pylist.
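If the rows really are ragged and you don't want zip to truncate to the shortest row, itertools.zip_longest is one option; a sketch (not part of the answer above), padding with None:
from itertools import zip_longest
pylist = [[1, 2, 3], [4, 5], [6]]
pylist_transpose = [*zip_longest(*pylist)]  # pads short rows with None
print(pylist_transpose[0])  # (1, 4, 6)
print(pylist_transpose[2])  # (3, None, None)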

Iterating through a numpy array/matrix and storing it in a separate array

I have a matrix/numpy array A and need to apply a function f to all elements and then store the result in a matrix B. How would I do this? I'm thinking of appending to an empty array B as I go, but is this the best way of doing it?
You can use numpy.vectorize:
A = np.arange(9).reshape((3,3))
#array([[0, 1, 2],
# [3, 4, 5],
# [6, 7, 8]])
my_func = lambda x: x + 1 #Example function
vect_func = np.vectorize(my_func)
B = vect_func(A)
Output
print(B)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Most operations can be done directly on the array itself. For example, if you want to add one to each element of the array, you would simply write:
arr2 = arr1 + 1
Vectorizing may be necessary if the required operation isn't a universal function, but it comes at the expense of execution time. np.vectorize still results in a looped operation, whereas universal functions (like the addition above) run over the whole array in compiled code.
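A rough sketch of that trade-off (timings vary by machine; the point is only that np.vectorize loops in Python while the ufunc runs in compiled code):
import numpy as np
from timeit import timeit
arr = np.arange(100_000)
vect_func = np.vectorize(lambda x: x + 1)         # Python-level loop under the hood
print(timeit(lambda: vect_func(arr), number=10))  # noticeably slower
print(timeit(lambda: arr + 1, number=10))         # ufunc: compiled loop over the whole array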

Rationale for numpy.split returning a list and not an array

I was surprised that numpy.split yields a list and not an array. I would have thought it would be better to return an array, since numpy has put a lot of work into making arrays more useful than lists. Can anyone justify numpy returning a list instead of an array? Why would that be a better programming decision for the numpy developers to have made?
A comment pointed out that if the split is uneven, the result can't be an array, at least not one with the same dtype. At best it would be an object-dtype array.
But let's consider the case of equal-length subarrays:
In [124]: x = np.arange(10)
In [125]: np.split(x,2)
Out[125]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
In [126]: np.array(_) # make an array from that
Out[126]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
But we can get the same array without split - just reshape:
In [127]: x.reshape(2,-1)
Out[127]:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Now look at the code for split. It just passes the task to array_split. Ignoring the details about alternative axes, it just does
sub_arys = []
for i in range(Nsections):
    # st and end come from div_points
    sub_arys.append(sary[st:end])
return sub_arys
In other words, it just steps through the array and returns successive slices. Those are (often) views of the original.
So split is not that sophisticated a function. You could generate such a list of subarrays yourself without a lot of numpy expertise.
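A minimal sketch of doing that by hand, assuming equal-length sections:
import numpy as np
x = np.arange(10)
Nsections = 2
step = len(x) // Nsections
# successive slices -- each one is a view of x, just like np.split returns
sub_arys = [x[k * step:(k + 1) * step] for k in range(Nsections)]
print(sub_arys)  # [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]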
Another point: the documentation notes that split can be reversed with an appropriate stack. concatenate (and family) takes a list of arrays. If given an array of arrays, or a higher-dimensional array, it effectively iterates on the first dimension, e.g. concatenate(arr) => concatenate(list(arr)).
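For instance, a small sketch of reversing split with concatenate (the list form and the 2-d array form behave the same here):
import numpy as np
x = np.arange(10)
parts = np.split(x, 2)                  # a list of two views
print(np.concatenate(parts))            # [0 1 2 3 4 5 6 7 8 9] -- split reversed
print(np.concatenate(np.array(parts)))  # same result: the 2-d array is iterated on its first axis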
Actually, you are right, it returns a list:
import numpy as np
a = np.random.randint(1, 30, (2, 2))
b = np.hsplit(a, 2)
type(b)
type(b) comes back as list, so there is nothing wrong in the documentation. I also first thought the documentation was wrong and it doesn't return an array, but when I checked
type(b[0])
type(b[1])
each returned ndarray.
So it returns a list of ndarrays.

numpy array TypeError: only integer scalar arrays can be converted to a scalar index

i=np.arange(1,4,dtype=int)
a=np.arange(9).reshape(3,3)
and
a
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
a[:,0:1]
>>>array([[0],
[3],
[6]])
a[:,0:2]
>>>array([[0, 1],
[3, 4],
[6, 7]])
a[:,0:3]
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Now I want to vectorize the array to print them all together. I try
a[:,0:i]
or
a[:,0:i[:,None]]
It gives TypeError: only integer scalar arrays can be converted to a scalar index
Short answer:
[a[:,:j] for j in i]
What you are trying to do is not a vectorizable operation. Wikipedia defines vectorization as a batch operation on a single array, instead of on individual scalars:
In computer science, array programming languages (also known as vector or multidimensional languages) generalize operations on scalars to apply transparently to vectors, matrices, and higher-dimensional arrays.
...
... an operation that operates on entire arrays can be called a vectorized operation...
In terms of CPU-level optimization, the definition of vectorization is:
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
The problem with your case is that the result of each individual operation has a different shape: (3, 1), (3, 2) and (3, 3). They can not form the output of a single vectorized operation, because the output has to be one contiguous array. Of course, it can contain (3, 1), (3, 2) and (3, 3) arrays inside of it (as views), but that's what your original array a already does.
What you're really looking for is just a single expression that computes all of them:
[a[:,:j] for j in i]
... but it's not vectorized in the sense of performance optimization. Under the hood it's a plain old for loop that computes each item one by one.
I ran into the problem when venturing to use numpy.concatenate to emulate a C++-like push_back for 2D vectors; if A and B are two 2D numpy arrays, then numpy.concatenate(A,B) yields the error.
The fix was simply to add the missing parentheses: numpy.concatenate( ( A,B ) ), which are required because the arrays to be concatenated constitute a single argument.
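A small sketch of that fix (the array shapes here are arbitrary):
import numpy as np
A = np.zeros((2, 3))
B = np.ones((2, 3))
# np.concatenate(A, B)      # raises the error: the second argument is interpreted as the axis
C = np.concatenate((A, B))  # pass the arrays together as one tuple argument
print(C.shape)              # (4, 3)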
This could be unrelated to this specific problem, but I ran into a similar issue where I used NumPy indexing on a Python list and got the same exact error message:
# incorrect
weights = list(range(1, 129)) + list(range(128, 0, -1))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
# TypeError: only integer scalar arrays can be converted to a scalar index
It turns out I needed to turn weights, a 1D Python list, into a NumPy array before I could apply multi-dimensional NumPy indexing. The code below works:
# correct
weights = np.array(list(range(1, 129)) + list(range(128, 0, -1)))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
Try the following to flatten your array to 1D:
a.reshape(-1)
You can use numpy.ravel to return a flattened array from n-dimensional array:
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> a.ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
I had a similar problem and solved it using list...not sure if this will help or not
classes = list(unique_labels(y_true, y_pred))
This problem arises when we use a vector in place of a scalar. For example, in a for loop the range argument should be a scalar; if you pass a vector there, you get this error. To avoid it, use the length of the vector instead.
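A small sketch of what that means, using the array i from the question above:
import numpy as np
i = np.arange(1, 4)
# range(i) raises the TypeError, because range needs a scalar
for k in range(len(i)):  # use the length, which is a scalar
    print(k)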
I ran across this error while trying to access elements of a list using a 1-D array. I was pointed to this page but didn't find the answer I was looking for.
Let l be the list and myarray be my 1D array. The correct way to access list l using elements of myarray is
np.take(l,myarray)
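For example (the list and array here are made up for illustration):
import numpy as np
l = ['a', 'b', 'c', 'd']
myarray = np.array([0, 2])
# l[myarray] raises the TypeError, since a plain list can't take an array index
print(np.take(l, myarray))  # ['a' 'c']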

Seemingly inconsistent slicing behavior in numpy arrays

I ran across something that seemed to me like inconsistent behavior in Numpy slices. Specifically, please consider the following example:
import numpy as np
a = np.arange(9).reshape(3,3) # a 2d numpy array
y = np.array([1,2,2]) # vector that will be used to index the array
b = a[np.arange(len(a)),y] # a vector (what I want)
c = a[:,y] # a matrix ??
I wanted to obtain a vector such that the i-th element is a[i,y[i]]. I tried two things (b and c above) and was surprised that b and c are not the same... in fact one is a vector and the other is a matrix! I was under the impression that : was shorthand for "all elements" but apparently the meaning is somewhat more subtle.
After trial and error I somewhat understand the difference now (b == np.diag(c)), but would appreciate clarification on why they are different, what exactly using : implies, and how to understand when to use either case.
Thanks!
It's hard to understand advanced indexing (with lists or arrays) without understanding broadcasting.
In [487]: a=np.arange(9).reshape(3,3)
In [488]: idx = np.array([1,2,2])
Index with a (3,) and (3,) producing shape (3,) result:
In [489]: a[np.arange(3),idx]
Out[489]: array([1, 5, 8])
Index with (3,1) and (3,), result is (3,3)
In [490]: a[np.arange(3)[:,None],idx]
Out[490]:
array([[1, 2, 2],
[4, 5, 5],
[7, 8, 8]])
The slice : does basically the same thing. There are subtle differences, but here it's the same.
In [491]: a[:,idx]
Out[491]:
array([[1, 2, 2],
[4, 5, 5],
[7, 8, 8]])
ix_ does the same thing, converting the (3,) & (3,) to (3,1) and (1,3):
In [492]: np.ix_(np.arange(3),idx)
Out[492]:
(array([[0],
[1],
[2]]), array([[1, 2, 2]]))
A broadcasted sum might help visualize the two cases:
In [495]: np.arange(3)*10+idx
Out[495]: array([ 1, 12, 22])
In [496]: np.sum(np.ix_(np.arange(3)*10,idx),axis=0)
Out[496]:
array([[ 1, 2, 2],
[11, 12, 12],
[21, 22, 22]])
When you pass
np.arange(len(a)), y
you can view the result as all the index pairs formed by zipping the two sequences you passed. In this case, indexing by np.arange(len(a)) and y
np.arange(len(a))
# [0, 1, 2]
y
# [1, 2, 2]
effectively takes elements: (0, 1), (1, 2), and (2, 2).
print(a[0, 1], a[1, 2], a[2, 2]) # 0th, 1st, 2nd elements from each indexer
# 1 5 8
In the second case, you take the entire slice along the first dimension. (Nothing before the colon.) So this is all elements along the 0th axis. You then specify with y that you want the 1st, 2nd, and 2nd element along each row. (0-indexed.)
As you pointed out, it may seem a bit unintuitive that the results are different, given that the two row indexers select the same rows on their own:
a[:] == a[np.arange(len(a))]  # element-wise True
However, NumPy advanced indexing cares about what type of data structure you pass when indexing (tuples, integers, etc.). Things can become hairy very quickly.
The detail behind that is this: first consider all NumPy indexing to be of the general form x[obj], where obj is the evaluation of whatever you passed. How NumPy "behaves" depends on what type of object obj is:
Advanced indexing is triggered when the selection object, obj, is a
non-tuple sequence object, an ndarray (of data type integer or bool),
or a tuple with at least one sequence object or ndarray (of data type
integer or bool).
...
The definition of advanced indexing means that x[(1,2,3),] is
fundamentally different than x[(1,2,3)]. The latter is equivalent to
x[1,2,3] which will trigger basic selection while the former will
trigger advanced indexing. Be sure to understand why this occurs.
In your first case, obj = (np.arange(len(a)), y), a tuple containing ndarrays, which fits the description quoted above. This triggers advanced indexing and forces the behavior described above.
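A quick sketch of that x[(1,2,3),] vs x[(1,2,3)] distinction, using an arbitrary 3-d array:
import numpy as np
x = np.arange(64).reshape(4, 4, 4)
print(x[(1, 2, 3)])         # 27 -- same as x[1, 2, 3]: basic indexing, a single element
print(x[(1, 2, 3),].shape)  # (3, 4, 4) -- same as x[[1, 2, 3]]: advanced indexing on axis 0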
As for the second case, [:,y]
When there is at least one slice (:), ellipsis (...) or np.newaxis in
the index (or the array has more dimensions than there are advanced
indexes), then the behaviour can be more complicated. It is like
concatenating the indexing result for each advanced index element.
Demonstrated:
# Concatenate the indexing result for each advanced index element.
np.vstack((a[0, y], a[1, y], a[2, y]))
