Python 2D numpy.ndarray slicing without comma

Recently someone told me to extract the first two columns of a 2D numpy.ndarray by
firstTwoCols = some2dMatrix[:2]
Where is this notation from and how does it work?
I'm only familiar with the comma separated slicing like
twoCols = some2dMatrix[:,:2]
The : before the comma selects all rows, and the :2 after the comma selects columns 0 up to but not including 2.

firstTwoCols = some2dMatrix[:2]
This will just extract the first 2 rows with all the columns.
twoCols = some2dMatrix[:,:2] is the one that will extract your first 2 columns for all the rows.

The syntax you describe does not extract the first two columns; it extracts the first two rows. If you specify fewer slices than the array has dimensions, NumPy treats this as equivalent to all further slices being :, so
arr[:2]
is equivalent to
arr[:2, :]
for a 2D array.
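To make the difference concrete, here is a minimal sketch. The array arr and its 4x3 shape are made up for illustration and are not from the question:

```python
import numpy as np

# Hypothetical 4x3 array, used only for illustration.
arr = np.arange(12).reshape(4, 3)

first_two_rows = arr[:2]      # missing trailing axes are implicitly sliced with ":"
same_rows = arr[:2, :]        # explicit form of the same selection
first_two_cols = arr[:, :2]   # all rows, columns 0 and 1

print(first_two_rows.shape)                        # (2, 3)
print(np.array_equal(first_two_rows, same_rows))   # True
print(first_two_cols.shape)                        # (4, 2)
```

So without the comma the slice always applies to the first axis (rows); to reach the columns you need the leading : and the comma.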

Not sure I understand the question but...
If you do:
>>> Matrix = [[x for x in range(1,5)] for x in range(5)]
>>> Matrix
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
then Matrix[:2] selects the first two lists in Matrix: [1, 2, 3, 4], [1, 2, 3, 4]. But if you do:
>>> Matrix[:,:2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
But if you work with a NumPy array instead:
>>> import numpy as np
>>> Matrix = np.array(Matrix)
>>> Matrix[:, :2]
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]])

Related

Using values in 2d matrix as lookups

I have a 2-D numpy array that I would like to use as a series of indexes. The values in the array are all integer values:
array([[3, 3, 3, 2],
       [1, 5, 2, 3],
       [4, 2, 3, 2],
       [2, 3, 1, 3]])
I also have a 1D array with 5 values
array([x,y,z,q,p])
I'd like to use the cells of the 2D array as lookup values into the 1D array. In other words, where the 2D array is equal to 1, return x. Where it's equal to 2, return y, and so on.
This is simple enough to do in matlab, but I'd like a numpy solution that doesn't involve looping.
Thoughts?

Assuming x, y, z, q, p are variables (and not characters).
If you define the first array as (say, src):
src = np.array([[3, 3, 3, 2],
                [1, 5, 2, 3],
                [4, 2, 3, 2],
                [2, 3, 1, 3]])
and the second as (say, lookup):
lookup = np.array([x,y,z,q,p])
You can get the desired output using:
lookup[src - 1]
Or if you want individual outputs:
lookup[src[i, j] - 1]
where (i, j) is the 2-D index of the value that you want to look up.
Please note that the -1 accounts for the offset between the 1-based values in src and NumPy's 0-based indexing (as mentioned in the comment by slothrop).
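A runnable sketch of the lookup, assuming x, y, z, q, p are numbers. The values 10.0 to 50.0 below are placeholders I made up, not values from the question:

```python
import numpy as np

src = np.array([[3, 3, 3, 2],
                [1, 5, 2, 3],
                [4, 2, 3, 2],
                [2, 3, 1, 3]])

# Placeholder values standing in for x, y, z, q, p.
lookup = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Indexing with an integer array returns an array of the same shape as the index.
result = lookup[src - 1]

print(result.shape)   # (4, 4)
print(result[1])      # [10. 50. 20. 30.]
```

The result has the shape of src, with each cell replaced by the matching entry of lookup, and no Python-level loop is involved.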

Finding the intersection

I want the intersection of x and y. Is there any way I can get the output in the format below?
I do not want to use a for loop, since x can be very large.
x=np.array([[1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]])
y=np.array([1,2,0,9,9])
I want output in format:
np.array([[1],[1,2],[2]])
The output can also be a list of lists.
Also consider the case where y is 2D as well (np.array([[1,2,0,9,9],[1,5,6,8,9]])).

You can use NumPy's intersect1d:
arr = [np.intersect1d(z, y).tolist() for z in x]
print(arr) # [[1], [1, 2], [2]]
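For completeness, a self-contained version of the above with the 1D y from the question. Note the list comprehension still iterates over the rows of x in Python. The np.isin part is my own addition, not something the question asked for: it does the membership test for the whole array at once, but gives a boolean mask rather than the ragged list of lists:

```python
import numpy as np

x = np.array([[1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]])
y = np.array([1, 2, 0, 9, 9])

# Row-by-row intersection; intersect1d returns sorted unique values.
rows = [np.intersect1d(row, y).tolist() for row in x]
print(rows)               # [[1], [1, 2], [2]]

# Vectorized membership test: True where an element of x also occurs in y.
mask = np.isin(x, y)
print(mask.sum(axis=1))   # [1 3 1]
```

The mask keeps duplicates (row 1 contains the value 1 twice), so the counts differ from the lengths of the intersect1d results.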

Fill several parts of NumPy array, given a list of indexes

I want to fill a numpy.ndarray with data (32x32 pixel integer pictures, stored as arrays).
From the file name of each picture I know where in my ndarray I want its values to be stored.
I would like to index my ndarray with a list plus something like slice(0), because the picture is stored in the last two dimensions. How do I do that?
I would like to do something like this.
Pseudocode:
data=numpy.ndarray(dim1,dim2,dim3,32,32)
list=function(filename)
data[list,slice(0),slice(0)]=read_image(filename)
Is that possible?
My list has entries specifying the positions in the ndarray [int,int,int], and my read image is a 32x32 integer array (filling the last two dimensions of my ndarray).

To perform this assignment, pass a suitable array in each of the first three dimensions, and : (meaning entire index range) in the last two dimensions.
If your list is, for example,
list = [[1, 2, 3], [4, 2, 0], [5, 3, 4], [2, 2, 2]]
then the array to pass as the first index is [1, 4, 5, 2], and similarly for the other two: [2, 2, 3, 2] and [3, 0, 4, 2]. Complete example with a fake (random) image:
data = np.zeros((6, 7, 8, 32, 32))
list = [[1, 2, 3], [4, 2, 0], [5, 3, 4], [2, 2, 2]]
image = np.random.uniform(size=(32, 32))
ix = np.array(list)
data[ix[:, 0], ix[:, 1], ix[:, 2], :, :] = image
Here ix[:, 0] is [1, 4, 5, 2], ix[:, 1] is [2, 2, 3, 2], and so on.
Reference: NumPy indexing and broadcasting.
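A quick way to check the assignment did what was intended. This is only a sketch: the names positions and images are mine, and the second half (a different image per position) is an assumption about what the asker may eventually want:

```python
import numpy as np

data = np.zeros((6, 7, 8, 32, 32))
positions = np.array([[1, 2, 3], [4, 2, 0], [5, 3, 4], [2, 2, 2]])
image = np.random.uniform(size=(32, 32))

# Same image written to all four positions; (32, 32) broadcasts to (4, 32, 32).
data[positions[:, 0], positions[:, 1], positions[:, 2], :, :] = image
print(np.array_equal(data[1, 2, 3], image))        # True

# Assumed extension: one image per position, stacked along the first axis.
images = np.random.uniform(size=(4, 32, 32))
data[positions[:, 0], positions[:, 1], positions[:, 2], :, :] = images
print(np.array_equal(data[4, 2, 0], images[1]))    # True
```

Positions that are not in the list are left untouched (still all zeros).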

Indexing with lists and arrays in numpy appears inconsistent

Inspired by this other question, I'm trying to wrap my mind around advanced indexing in NumPy and build up more intuitive understanding of how it works.
I've found an interesting case. Here's an array:
>>> y = np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
If I index it with a scalar, I get a scalar of course:
>>> y[4]
4
with a 1D array of integers, I get another 1D array:
>>> idx = [4, 3, 2, 1]
>>> y[idx]
array([4, 3, 2, 1])
so if I index it with a 2D array of integers, I get... what do I get?
>>> idx = [[4, 3], [2, 1]]
>>> y[idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
Oh no! The symmetry is broken. I have to index with a 3D array to get a 2D array!
>>> idx = [[[4, 3], [2, 1]]]
>>> y[idx]
array([[4, 3],
[2, 1]])
What makes numpy behave this way?
To make this more interesting, I noticed that indexing with numpy arrays (instead of lists) behaves how I'd intuitively expect, and 2D gives me 2D:
>>> idx = np.array([[4, 3], [2, 1]])
>>> y[idx]
array([[4, 3],
[2, 1]])
This looks inconsistent from where I'm at. What's the rule here?

The reason is how lists are interpreted as indices for NumPy arrays: lists are interpreted like tuples, and indexing with a tuple is interpreted by NumPy as multidimensional indexing.
Just like arr[1, 2] returns the element arr[1][2], arr[[[4, 3], [2, 1]]] is identical to arr[[4, 3], [2, 1]] and will, according to the rules of multidimensional indexing, return the elements arr[4, 2] and arr[3, 1].
By adding one more list you tell NumPy that you want indexing along the first dimension only, because the outermost list is effectively interpreted as if you had passed in just one "list of indices for the first dimension": arr[[[[4, 3], [2, 1]]]].
From the documentation:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
and:
Warning
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
In such cases it's probably better to use np.take:
>>> y.take([[4, 3], [2, 1]]) # 2D array
array([[4, 3],
[2, 1]])
This function [np.take] does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis.
Or convert the indices to an array. That way NumPy interprets it (arrays are special-cased!) as fancy indexing instead of as "multidimensional indexing":
>>> y[np.asarray([[4, 3], [2, 1]])]
array([[4, 3],
[2, 1]])
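If you want code that does not depend on how a bare nested list is interpreted, be explicit about which of the two you mean. As far as I know, newer NumPy releases (I believe 1.23 and later) treat a nested list index as an array rather than as a tuple, so the IndexError from the question may not reproduce there. The sketch below only uses forms that mean the same thing in old and new versions:

```python
import numpy as np

y = np.arange(10)

# An ndarray index is always fancy indexing: the result takes the index's shape.
idx = np.asarray([[4, 3], [2, 1]])
picked = y[idx]
print(picked.shape)    # (2, 2)

x = np.array([[1, 2], [3, 4], [5, 6]])

# A tuple is always multidimensional indexing: one index list per axis.
pairs = x[([0, 1, 2], [0, 1, 0])]
print(pairs)           # [1 4 5]

# An array selects whole rows along the first axis.
rows = x[np.asarray([0, 2])]
print(rows.shape)      # (2, 2)
```

In short: wrap in a tuple when you mean "one index per axis", and wrap in np.asarray when you mean "an array of indices for the first axis".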

Is there any function in python which can perform the inverse of numpy.repeat function?

For example
x = np.repeat(np.array([[1,2],[3,4]]), 2, axis=1)
gives you
x = array([[1, 1, 2, 2],
[3, 3, 4, 4]])
but is there something which can perform
x = np.*inverse_repeat*(np.array([[1, 1, 2, 2],[3, 3, 4, 4]]), axis=1)
and gives you
x = array([[1,2],[3,4]])

Regular slicing should work. For the axis you want to inverse repeat, use ::number_of_repetitions
x = np.repeat(np.array([[1,2],[3,4]]), 4, axis=0)
x[::4, :] # axis=0
Out:
array([[1, 2],
[3, 4]])
x = np.repeat(np.array([[1,2],[3,4]]), 3, axis=1)
x[:,::3] # axis=1
Out:
array([[1, 2],
[3, 4]])
x = np.repeat(np.array([[[1],[2]],[[3],[4]]]), 5, axis=2)
x[:,:,::5] # axis=2
Out:
array([[[1],
[2]],
[[3],
[4]]])

This should work, and has the exact same signature as np.repeat:
def inverse_repeat(a, repeats, axis):
    if isinstance(repeats, int):
        # integer division keeps the count an int (np.int no longer exists in NumPy)
        indices = np.arange(a.shape[axis] // repeats) * repeats
    else:  # assume array_like of int
        # last position of each repeated block
        indices = np.cumsum(repeats) - 1
    return a.take(indices, axis)
Edit: added support for per-item repeats as well, analogous to np.repeat
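A round-trip check of this idea, as a self-contained sketch. The function is restated here using integer division so that it runs on current NumPy; the test array original is made up for illustration:

```python
import numpy as np

def inverse_repeat(a, repeats, axis):
    """Undo np.repeat(a, repeats, axis) when repeats is known."""
    if isinstance(repeats, int):
        # first position of each block of equal length
        indices = np.arange(a.shape[axis] // repeats) * repeats
    else:
        # last position of each block of varying length
        indices = np.cumsum(repeats) - 1
    return np.take(a, indices, axis=axis)

original = np.array([[1, 2], [3, 4]])

scalar_case = inverse_repeat(np.repeat(original, 3, axis=1), 3, 1)
list_case = inverse_repeat(np.repeat(original, [2, 3], axis=1), [2, 3], 1)

print(np.array_equal(scalar_case, original))   # True
print(np.array_equal(list_case, original))     # True
```

Both cases require that you know the repeats that were used; nothing here tries to infer them from the data.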

For the case where we know the axis and the repeat - and the repeat is a scalar (same value for all elements) we can construct a slicing index like this:
In [1117]: a=np.array([[1, 1, 2, 2],[3, 3, 4, 4]])
In [1118]: axis=1; repeats=2
In [1119]: ind=[slice(None)]*a.ndim
In [1120]: ind[axis]=slice(None,None,repeats)
In [1121]: ind
Out[1121]: [slice(None, None, None), slice(None, None, 2)]
In [1122]: a[tuple(ind)]
Out[1122]:
array([[1, 2],
[3, 4]])
@Eelco's use of take makes it easier to focus on one axis, but requires a list of indices, not a slice.
But repeat does allow for differing repeat counts.
In [1127]: np.repeat(a1,[2,3],axis=1)
Out[1127]:
array([[1, 1, 2, 2, 2],
[3, 3, 4, 4, 4]])
Knowing axis=1 and repeats=[2,3] we should be able to construct the right take indexing (probably with cumsum). Slicing won't work.
But if we only know the axis, and the repeats are unknown, then we probably need some sort of unique or set operation as in @redratear's answer.
In [1128]: a2=np.repeat(a1,[2,3],axis=1)
In [1129]: y=[list(set(c)) for c in a2]
In [1130]: y
Out[1130]: [[1, 2], [3, 4]]
A take solution with list repeats. This should select the last of each repeated block:
In [1132]: np.take(a2,np.cumsum([2,3])-1,axis=1)
Out[1132]:
array([[1, 2],
[3, 4]])
A deleted answer uses unique; here's my row by row use of unique
In [1136]: np.array([np.unique(row) for row in a2])
Out[1136]:
array([[1, 2],
[3, 4]])
unique is better than set for this use since it returns the values in a defined (sorted) order, whereas set order is arbitrary. There's another problem with unique (or set): what if the original had repeated values, e.g. [[1,2,1,3],[3,3,4,1]]?
Here is a case where it would be difficult to deduce the repeat pattern from the result. I'd have to look at all the rows first.
In [1169]: a=np.array([[2,1,1,3],[3,3,2,1]])
In [1170]: a1=np.repeat(a,[2,1,3,4], axis=1)
In [1171]: a1
Out[1171]:
array([[2, 2, 1, 1, 1, 1, 3, 3, 3, 3],
[3, 3, 3, 2, 2, 2, 1, 1, 1, 1]])
But cumsum on a known repeat solves it nicely:
In [1172]: ind=np.cumsum([2,1,3,4])-1
In [1173]: ind
Out[1173]: array([1, 2, 5, 9], dtype=int32)
In [1174]: np.take(a1,ind,axis=1)
Out[1174]:
array([[2, 1, 1, 3],
[3, 3, 2, 1]])
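The scalar-repeat slicing from the start of this answer can be wrapped in a small helper. This is a sketch: the name every_nth is made up, and it assumes every element was repeated the same number of times. The index is built as a tuple of slices, which is the form NumPy accepts for multidimensional indexing:

```python
import numpy as np

def every_nth(a, repeats, axis):
    """Keep one element out of each block of `repeats` equal elements along `axis`."""
    index = [slice(None)] * a.ndim
    index[axis] = slice(None, None, repeats)   # the step is the repeat count
    return a[tuple(index)]                     # index with a tuple, not a list

a = np.array([[1, 1, 2, 2], [3, 3, 4, 4]])
cols = every_nth(a, 2, axis=1)
print(cols.tolist())    # [[1, 2], [3, 4]]

b = np.repeat(np.array([[1, 2], [3, 4]]), 3, axis=0)
rows = every_nth(b, 3, axis=0)
print(rows.tolist())    # [[1, 2], [3, 4]]
```

Because this is basic slicing, the result is a view on the input rather than a copy, unlike the take version.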

>>> import numpy as np
>>> x = np.repeat(np.array([[1,2],[3,4]]), 2, axis=1)
>>> y = [list(set(c)) for c in x]  # removes duplicates within each row
>>> print(y)
[[1, 2], [3, 4]]
You don't need to know the axis or the repeat count. Note that this will not work for x = np.repeat(np.array([[1,1],[3,3]]), 2, axis=1), which is [[1,1,1,1],[3,3,3,3]]; the result will be [[1],[3]].
