How to do 2D slicing? - python

I'm trying to do what I think should be simple:
I make a 2D list:
a = [[1,5],[2,6],[3,7]]
and I want to slide out the first column and tried:
1)
a[:,0]
...
TypeError: list indices must be integers or slices, not tuple
2)
a[:,0:1]
...
TypeError: list indices must be integers or slices, not tuple
3)
a[:][0]
[1, 5]
4)
a[0][:]
[1, 5]
5) got it but is this the way to do it?
aa[0] for aa in a
Using numpy it would be easy but what is the Python way?

2D slicing like a[:, 0] only works for NumPy arrays, not for lists.
However you can transpose (rows become columns and vice versa) nested lists using zip(*a). After transposing, simply slice out the first row:
a = [[1,5],[2,6],[3,7]]
print zip(*a) # [(1, 2, 3), (5, 6, 7)]
print list(zip(*a)[0]) # [1, 2, 3]

What you are trying to do in numerals 1 and 2 works in numpy arrays (or similarly with pandas dataframes), but not with basic python lists. If you want to do it with basic python lists, see the answer from #cricket_007 in the comments to your question.
One of the reasons to use numpy is exactly this - it makes it much easier to slice arrays with multiple dimensions

Use [x[0] for x in a] is the clear and proper way.

Related

Numpy: two dimensional arrays, delete the odd indexes and keep the same array format

My array looks like this:
a = ([1,2],[2,3],[4,5],[3,8])
I did the following to delete odd indexes :
a = [v for i, v in enumerate(a) if i % 2 == 0]
but it dives me now two different arrays instead of one two dimensional:
a= [array([1, 2]), array([4, 5])]
How can I keep the same format as the beginning? thank you!
That is as simple as
a[::2]
which yields the lines with even index.
Use numpy array indexing, not comprehensions:
c = a[list(range(0,len(a),2)),:]
If you define c as the output of a list comprehension, it will return a list of one-dimensional numpy arrays. Instead, using the proper indexing maintains the result a numpy array.
Note than instead of "deleting" the odd indices, what we do is specify what to keep: take all lines with an even index (the list(range(0,len(a),2)) part) and for each line take all elements (the : part)

numpy array TypeError: only integer scalar arrays can be converted to a scalar index

i=np.arange(1,4,dtype=np.int)
a=np.arange(9).reshape(3,3)
and
a
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
a[:,0:1]
>>>array([[0],
[3],
[6]])
a[:,0:2]
>>>array([[0, 1],
[3, 4],
[6, 7]])
a[:,0:3]
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Now I want to vectorize the array to print them all together. I try
a[:,0:i]
or
a[:,0:i[:,None]]
It gives TypeError: only integer scalar arrays can be converted to a scalar index
Short answer:
[a[:,:j] for j in i]
What you are trying to do is not a vectorizable operation. Wikipedia defines vectorization as a batch operation on a single array, instead of on individual scalars:
In computer science, array programming languages (also known as vector or multidimensional languages) generalize operations on scalars to apply transparently to vectors, matrices, and higher-dimensional arrays.
...
... an operation that operates on entire arrays can be called a vectorized operation...
In terms of CPU-level optimization, the definition of vectorization is:
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
The problem with your case is that the result of each individual operation has a different shape: (3, 1), (3, 2) and (3, 3). They can not form the output of a single vectorized operation, because the output has to be one contiguous array. Of course, it can contain (3, 1), (3, 2) and (3, 3) arrays inside of it (as views), but that's what your original array a already does.
What you're really looking for is just a single expression that computes all of them:
[a[:,:j] for j in i]
... but it's not vectorized in a sense of performance optimization. Under the hood it's plain old for loop that computes each item one by one.
I ran into the problem when venturing to use numpy.concatenate to emulate a C++ like pushback for 2D-vectors; If A and B are two 2D numpy.arrays, then numpy.concatenate(A,B) yields the error.
The fix was to simply to add the missing brackets: numpy.concatenate( ( A,B ) ), which are required because the arrays to be concatenated constitute to a single argument
This could be unrelated to this specific problem, but I ran into a similar issue where I used NumPy indexing on a Python list and got the same exact error message:
# incorrect
weights = list(range(1, 129)) + list(range(128, 0, -1))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
# TypeError: only integer scalar arrays can be converted to a scalar index
It turns out I needed to turn weights, a 1D Python list, into a NumPy array before I could apply multi-dimensional NumPy indexing. The code below works:
# correct
weights = np.array(list(range(1, 129)) + list(range(128, 0, -1)))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
try the following to change your array to 1D
a.reshape((1, -1))
You can use numpy.ravel to return a flattened array from n-dimensional array:
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> a.ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
I had a similar problem and solved it using list...not sure if this will help or not
classes = list(unique_labels(y_true, y_pred))
this problem arises when we use vectors in place of scalars
for example in a for loop the range should be a scalar, in case you have given a vector in that place you get error. So to avoid the problem use the length of the vector you have used
I ran across this error when while trying to access elements of a list using a 1-D array. I was suggested this page but I don't the answer I was looking for.
Let l be the list and myarray be my 1D array. The correct way to access list l using elements of myarray is
np.take(l,myarray)

Numpy array indexing behavior

I was playing with numpy array indexing and find this odd behavior. When I index with np.array or list it works as expected:
In[1]: arr = np.arange(10).reshape(5,2)
arr[ [1, 1] ]
Out[1]: array([[2, 3],
[2, 3]])
But when I put tuple, it gives me a single element:
In[1]: arr = np.arange(10).reshape(5,2)
arr[ (1, 1) ]
Out[1]: 3
Also some kind of this strange tuple vs list behavior occurs with arr.flat:
In[1]: arr = np.arange(10).reshape(5,2)
In[2]: arr.flat[ [3, 4] ]
Out[2]: array([3, 4])
In[3]: arr.flat[ (3, 4) ]
Out[3]: IndexError: unsupported iterator index
I can't understand what is going on under the hood? What difference between tuple and list in this case?
Python 3.5.2
NumPy 1.11.1
What's happening is called fancy indexing, or advanced indexing. There's a difference between indexing with slices, or with a list/array. The trick is that multidimensional indexing actually works with tuples due to the implicit tuple syntax:
import numpy as np
arr = np.arange(10).reshape(5,2)
arr[2,1] == arr[(2,1)] # exact same thing: 2,1 matrix element
However, using a list (or array) inside an index expression will behave differently:
arr[[2,1]]
will index into arr with 1, then with 2, so first it fetches arr[2]==arr[2,:], then arr[1]==arr[1,:], and returns these two rows (row 2 and row 1) as the result.
It gets funkier:
print(arr[1:3,0:2])
print(arr[[1,2],[0,1]])
The first one is regular indexing, and it slices rows 1 to 2 and columns 0 to 1 inclusive; giving you a 2x2 subarray. The second one is fancy indexing, it gives you arr[1,0],arr[2,1] in an array, i.e. it indexes selectively into your array using, essentially, the zip() of the index lists.
Now here's why flat works like that: it returns a flatiter of your array. From help(arr.flat):
class flatiter(builtins.object)
| Flat iterator object to iterate over arrays.
|
| A `flatiter` iterator is returned by ``x.flat`` for any array `x`.
| It allows iterating over the array as if it were a 1-D array,
| either in a for-loop or by calling its `next` method.
So the resulting iterator from arr.flat behaves as a 1d array. When you do
arr.flat[ [3, 4] ]
you're accessing two elements of that virtual 1d array using fancy indexing; it works. But when you're trying to do
arr.flat[ (3,4) ]
you're attempting to access the (3,4) element of a 1d (!) array, but this is erroneous. The reason that this doesn't throw an IndexError is probably only due to the fact that arr.flat itself handles this indexing case.
In [387]: arr=np.arange(10).reshape(5,2)
With this list, you are selecting 2 rows from arr
In [388]: arr[[1,1]]
Out[388]:
array([[2, 3],
[2, 3]])
It's the same as if you explicitly marked the column slice (with : or ...)
In [389]: arr[[1,1],:]
Out[389]:
array([[2, 3],
[2, 3]])
Using an array instead of a list works: arr[np.array([1,1]),:]. (It also eliminates some ambiguities.)
With the tuple, the result is the same as if you wrote the indexing without the tuple wrapper. So it selects an element with row index of 1, column index of 1.
In [390]: arr[(1,1)]
Out[390]: 3
In [391]: arr[1,1]
Out[391]: 3
The arr[1,1] is translated by the interpreter to arr.__getitem__((1,1)). As is common in Python 1,1 is shorthand for (1,1).
In the arr.flat cases you are indexing the array as if it were 1d. np.arange(10)[[2,3]] selects 2 items, while np.arange(10)[(2,3)] is 2d indexing, hence the error.
A couple of recent questions touch on a messier corner case. Sometimes the list is treated as a tuple. The discussion might be enlightening, but don't go there if it's confusing.
Advanced slicing when passed list instead of tuple in numpy
numpy indexing: shouldn't trailing Ellipsis be redundant?

Index of multidimensional array

I have a problem using multi-dimensional vectors as indices for multi-dimensional vectors. Say I have C.ndim == idx.shape[0], then I want C[idx] to give me a single element. Allow me to explain with a simple example:
A = arange(0,10)
B = 10+A
C = array([A.T, B.T])
C = C.T
idx = array([3,1])
Now, C[3] gives me the third row, and C[1] gives me the first row. C[idx] then will give me a vstack of both rows. However, I need to get C[3,1]. How would I achieve that given arrays C, idx?
/edit:
An answer suggested tuple(idx). This work's perfectly for a single idx. But:
Let's take it to the next level: say INDICES is a vector where I have stacked vertically arrays of shape idx. tuple(INDICES) will give me one long tuple, so C[tuple(INDICES)] won't work. Is there a clean way of doing this or will I need to iterate over the rows?
If you convert idx to a tuple, it'll be interpreted as basic and not advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
[7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])

Acquiring the Minimum array out of Multiple Arrays by order in Python

Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with lowest value at index 0, and of those two arrays it has the the lowest index 1. If there were more arrays with similar values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
Here's an idea using numpy:
import numpy
a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
while a.shape[0] > 1:
b = numpy.argmin(a[:,col:], axis=1)
a = a[b == numpy.min(b)]
col += 1
print a
This checks column by column until only one row is left.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
[1, 3, 2],
[2, 3, 1],
[3, 2, 1]])

Categories