Why is numpy behaving differently for array slicing? [duplicate] - python

When doing the slicing, something unexpected happened: the first operation seems to produce a view, but the second produces a copy.
First
First slice the rows, then slice the columns. It seems to be a view:
>>> a = np.arange(12).reshape(3, 4)
>>> a[0:3:2, :][:, [0, 2]] = 100
>>> a
array([[100,   1, 100,   3],
       [  4,   5,   6,   7],
       [100,   9, 100,  11]])
Second
But if I first slice the columns, then slice the rows, it seems to be a copy:
>>> a[:, [0, 2]][0:3:2, :] = 0
>>> a
array([[100,   1, 100,   3],
       [  4,   5,   6,   7],
       [100,   9, 100,  11]])
I am confused because the two methods ultimately target the same positions, so why doesn't the second one actually change the numbers?

The accepted answer by John Zwinck is actually false (I just figured this out the hard way!).
The problem in the question is a combination of doing "l-value indexing" with numpy's fancy indexing.
The following doc explains exactly this case
https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
in the section "But fancy indexing does seem to return views sometimes, doesn't it?"
Edit:
To summarize the above link:
Whether a view or a copy is created is determined by whether the indexing can be represented as a slice.
Exception: If one does "fancy indexing" then always a copy is created. Fancy indexing is something like a[[1,2]].
Exception to the exception: If one does l-value indexing (i.e. the indexing happens to the left of the = sign), then the rule for when a view or a copy is created no longer applies (though see below for a further exception). The Python interpreter will directly assign the values to the left-hand side without creating a copy or a view.
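A minimal sketch to check each of these rules directly, using np.shares_memory (available in NumPy 1.11+) to tell views from copies:
import numpy as np

a = np.arange(12).reshape(3, 4)

# Rule: slice indexing returns a view
v = a[0:3:2, :]
print(np.shares_memory(a, v))    # True  -> view

# Exception: fancy indexing returns a copy
c = a[:, [0, 2]]
print(np.shares_memory(a, c))    # False -> copy

# Exception to the exception: fancy indexing left of `=`
# assigns directly into a; no intermediate copy survives
a[:, [0, 2]] = -1
print(a)                         # columns 0 and 2 are now -1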
To prove that a copy is created in both cases, you can do the operation in two steps:
>>> a = np.arange(12).reshape(3, 4)
>>> b = a[0:3:2, :][:, [0, 2]]
>>> b[:] = 100
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
and
>>> b = a[:, [0, 2]][0:3:2, :]
>>> b[:] = 0
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Just as an aside, the original poster's question is the exact problem stated at the end of the scipy-cookbook link above. No solution is given there. The tricky thing about the question is that there are two indexing operations done in a row.
Exception to the exception to the exception: If there are two indexing operations done in a row on the left hand side (as is the case in this question), the direct assignment in l-value indexing only works if the first indexing operation can be represented as a slice. Otherwise a copy has to be created even though it is l-value indexing.

All that matters is whether you slice by rows or by columns. Slicing by rows can return a view because it is a contiguous segment of the original array. Slicing by column must return a copy because it is not a contiguous segment. For example:
A1 A2 A3
B1 B2 B3
C1 C2 C3
By default, it is stored in memory this way:
A1 A2 A3 B1 B2 B3 C1 C2 C3
So if you want to choose every second row, it is:
[A1 A2 A3] B1 B2 B3 [C1 C2 C3]
That can be described as {start: 0, size: 3, stride: 6}.
But if you want to choose every second column:
[A1] A2 [A3 B1] B2 [B3 C1] C2 [C3]
And there is no way to describe that using a single start, size, and stride. So there is no way to construct such a view.
If you want to be able to view every second column instead of every second row, you can construct your array in column-major aka Fortran order instead:
np.array(a, order='F')
Then it will be stored as such:
A1 B1 C1 A2 B2 C2 A3 B3 C3
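One way to check this layout argument concretely is to inspect .strides, the byte step along each axis; a small sketch, with byte counts assuming an 8-byte integer dtype:
import numpy as np

a = np.arange(9).reshape(3, 3)        # C (row-major) order
print(a.strides)                      # (24, 8): next row is 24 bytes away

f = np.array(a, order='F')            # column-major copy
print(f.strides)                      # (8, 24): next row is 8 bytes away

# selecting every second row is expressible with strides alone,
# so numpy can return a view
print(np.shares_memory(a, a[::2]))    # True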

This is my understanding, for your reference:
a[0:3:2, :]                   # basic indexing, a view
... = a[0:3:2, :][:, [0, 2]]  # getitem version, a copy,
                              # because you use advanced
                              # indexing [:, [0, 2]]
a[0:3:2, :][:, [0, 2]] = ...  # however, the setitem version
                              # acts like a view; setitem is
                              # different from getitem,
                              # this is not C++
a[:, [0, 2]]                  # getitem version, a copy,
                              # because you use advanced indexing
a[:, [0, 2]][0:3:2, :] = 0    # the copy is modified,
                              # but a remains unchanged
If I have any misunderstanding, please point it out.
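One way to see why the setitem version behaves differently is to spell out the dunder calls Python generates; a rough sketch of the desugaring (not literal interpreter output):
import numpy as np

a = np.arange(12).reshape(3, 4)

# a[0:3:2, :][:, [0, 2]] = 100  desugars roughly to:
view = a.__getitem__((slice(0, 3, 2), slice(None)))  # basic indexing: a view
view.__setitem__((slice(None), [0, 2]), 100)         # writes through to a

# a[:, [0, 2]][0:3:2, :] = 0  desugars roughly to:
copy = a.__getitem__((slice(None), [0, 2]))          # fancy indexing: a copy
copy.__setitem__((slice(0, 3, 2), slice(None)), 0)   # writes into the copy only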

Related

Iterating over xarray.DataArray first dimension and its coordinates

Suppose I have the following DataArray
import numpy as np
import xarray

arr = xarray.DataArray(np.arange(6).reshape(2, 3),
                       dims=['A', 'B'],
                       coords=dict(A=['a0', 'a1'],
                                   B=['b0', 'b1', 'b2']))
I want to iterate over the first dimension and do the following (of course I want to do something more complex than printing)
for coor in arr.A.values:
    print(coor, arr.sel(A=coor).values)
and get
a0 [0 1 2]
a1 [3 4 5]
I am new to xarray, so I was wondering whether there was some more natural way to achieve this, something like
for coor, sub_arr in arr.some_method():
    print(coor, sub_arr)
You can simply iterate over the DataArray - each element of the iterator will itself be a DataArray with a single value for the first coordinate:
for a in arr:
    print(a.A.item(), a.values)
prints
a0 [0 1 2]
a1 [3 4 5]
Note the use of the .item() method to access the scalar value of the zero-dimensional array a.A.
To iterate over the second dimension, you can just transpose the data:
for b in arr.T:  # or arr.transpose()
    print(b.B.item(), b.values)
prints
b0 [0 3]
b1 [1 4]
b2 [2 5]
For multidimensional data, you can move the dimension you want to iterate over to the first place using ellipsis:
for x in arr.transpose("B", ...):
    # x has one less dimension than arr, and x.B is a scalar
    do_stuff_with(x)
The documentation on reshaping and reorganizing data has further details.
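For reference, a concrete, runnable version of the ellipsis loop above, using the 2-d arr from the question (assuming a recent xarray; see the note about older versions below):
for x in arr.transpose("B", ...):
    print(x.B.item(), x.values)
prints
b0 [0 3]
b1 [1 4]
b2 [2 5]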
It's an old question, but I find that using groupby is cleaner and makes more intuitive sense to me than using transpose when you want to iterate some dimension other than the first:
for coor, sub_arr in arr.groupby('A'):
    print(coor)
    print(sub_arr)

a0
<xarray.DataArray (B: 3)>
array([0, 1, 2])
Coordinates:
  * B        (B) <U2 'b0' 'b1' 'b2'
    A        <U2 'a0'
a1
<xarray.DataArray (B: 3)>
array([3, 4, 5])
Coordinates:
  * B        (B) <U2 'b0' 'b1' 'b2'
    A        <U2 'a1'
Also it seems that older versions of xarray don't handle the ellipsis correctly (see mgunyho's answer), but groupby still works correctly.
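To make the symmetry explicit, here is a sketch of the same groupby pattern applied to the second dimension (output assuming the same squeezing behaviour shown above):
for coor, sub_arr in arr.groupby('B'):
    print(coor, sub_arr.values)
prints
b0 [0 3]
b1 [1 4]
b2 [2 5]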

deleting rows based on value found in specific column

I am attempting to write code that searches a numpy array for rows where the value in the fifth column is not 50. If it is not, I wish to remove that row.
This is what I have so far:
for rows in range(len(b)):
    if b[:,4].any() != 50:
        b = np.delete(b, b[rows])
However, I keep getting the following error:
too many indices for array
Let's run the calculation with some diagnostic prints. Note where the error occurs. That's important! (We shouldn't just keep trying things without isolating the problem!)
In [2]: b=np.array([[0,1,2],[1,2,3],[2,1,2]])
In [3]: for row in range(len(b)):
   ...:     print(row)
   ...:     if b[:,2].any() != 2:
   ...:         print(b[row])
   ...:         b = np.delete(b, b[row])
   ...:
0
[0 1 2]
1
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-04dc188d9a2b> in <module>()
      1 for row in range(len(b)):
      2     print(row)
----> 3     if b[:,2].any() != 2:
      4         print(b[row])
      5         b = np.delete(b, b[row])
IndexError: too many indices for array
So the error occurs on the 2nd iteration (row 1). Something is wrong with the b after the delete. What is the new value of b?
In [4]: b
Out[4]: array([1, 2, 3, 2, 1, 2])
b is a 1d array, not the 2d array we started with. That explains the error, right? Something must be wrong with the use of delete. Maybe we need to check its documentation?
Look at the axis parameter:
axis : int, optional
    The axis along which to delete the subarray defined by `obj`.
    If `axis` is None, `obj` is applied to the flattened array.
We didn't specify an axis, so the delete was applied to the flattened array, and the result was flattened as well: 1d.
But even if I specify an axis I get an error (I won't get into that), which prompts me to look more carefully at the if condition:
In [10]: b[:,2]
Out[10]: array([2, 3, 2])
In [11]: b[:,2].any()
Out[11]: True
In [12]: b[:,2]!=2
Out[12]: array([False, True, False])
Applying any to the column doesn't make sense; it just checks whether any value in the column is nonzero. Instead we want to test the column against the target, getting a boolean array that matches the column in size.
We can use that boolean directly as a row selection mask:
In [13]: b[_,:]
Out[13]: array([[1, 2, 3]])
No need to iterate.
Another problem with your iteration: you iterate over range(3), i.e. [0, 1, 2]. But inside the loop you try to remove a row from b, changing the size of b. That's going to cause problems when you index b[row] by number, right? When iterating, in Python or numpy, be careful about modifying the object you are iterating over.
Sorry to be long winded about this, but it looks like you need some basic debugging guidance.
Here's a basic list approach:
In [15]: [row for row in b if row[2]!=2]
Out[15]: [array([1, 2, 3])]
I'm iterating on the rows, not their indices, and for each row checking the column value, and keeping that row if the check is True. We could do that with np.delete, but a list comprehension is clearer (and faster).
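For completeness, a sketch of how the same filtering looks with np.delete, where axis=0 keeps the result 2d:
In [16]: np.delete(b, np.flatnonzero(b[:, 2] == 2), axis=0)
Out[16]: array([[1, 2, 3]])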
It would be better to provide b and the desired output, but if I understand correctly, you could use:
import numpy as np
b = np.array([[50, 2, 3, 4, 5, 6],
              [4, 50, 6, 7, 8, 9],
              [1, 1, 1, 1, 50, 9]])

array([[50,  2,  3,  4,  5,  6],
       [ 4, 50,  6,  7,  8,  9],
       [ 1,  1,  1,  1, 50,  9]])
Then you can check which rows contain 50 in the 5th column using
b[:, 4] == 50
array([False, False, True])
and feed this Boolean array back to b to select the desired rows:
b[b[:, 4] == 50]
which leaves you with one row in this case
array([[ 1, 1, 1, 1, 50, 9]])

find indices of grouped-item matches between two arrays

a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])
I have two arrays of integers, each grouped into pairs of consecutive items (i.e. indices [0, 1], [2, 3], and so on).
The pairs of items cannot be found as duplicates in either list, neither in the same nor in the reverse order.
One list is significantly larger and inclusive of the other.
I am trying to figure out an efficient way to get the indices
of the larger list's grouped items that are also in the smaller one.
The desired output in the example above should be:
[2,3,6,7,10,11] #indices
Notice that, as an example, the first group ([3,4]) should not get indices 11,12 as a match because in that case 3 is the second element of [1,3] and 4 the first element of [4,7].
Since you are grouping your arrays by pairs, you can reshape them into 2 columns for comparison. You can then compare each of the elements in the shorter array to the longer array, and reduce the boolean arrays. From there it is a simple matter to get the indices using a reshaped np.arange.
import numpy as np
from functools import reduce
a = np.array([5,8,3,4,2,5,7,8,1,9,1,3,4,7])
b = np.array([3,4,7,8,1,3])
# reshape a and b into columns
a2 = a.reshape((-1,2))
b2 = b.reshape((-1,2))
# create a generator of bools for the row of a2 that holds b2
b_in_a_generator = (np.all(a2==row, axis=1) for row in b2)
# reduce the generator to get an array of boolean that is True for each row
# of a2 that equals one of the rows of b2
ix_bool = reduce(lambda x,y: x+y, b_in_a_generator)
# grab the indices by slicing a reshaped np.arange array
ix = np.arange(len(a)).reshape((-1,2))[ix_bool]
ix
# returns:
array([[ 2,  3],
       [ 6,  7],
       [10, 11]])
If you want a flat array, simply ravel ix
ix.ravel()
# returns
array([ 2, 3, 6, 7, 10, 11])
Here's one approach making use of NumPy view of group of elements -
# Taken from https://stackoverflow.com/a/45313353/
def view1D(a, b):  # a, b are 2d arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)  # both arrays must be contiguous to view
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def grouped_indices(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    idx = sidx[np.searchsorted(a0v, b0v, sorter=sidx)]
    return ((idx*2)[:,None] + [0,1]).ravel()
If some group from b has no match in a, we could filter it out using the mask a0v[idx] == b0v (see the sketch at the end of this answer).
Sample run -
In [345]: a
Out[345]: array([5, 8, 3, 4, 2, 5, 7, 8, 1, 9, 1, 3, 4, 7])
In [346]: b
Out[346]: array([3, 4, 7, 8, 1, 3])
In [347]: grouped_indices(a, b)
Out[347]: array([ 2, 3, 6, 7, 10, 11])
Another one using np.in1d to replace np.searchsorted -
def grouped_indices_v2(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    return (np.flatnonzero(np.in1d(a0v, b0v))[:,None]*2 + [0,1]).ravel()
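A sketch of the filtering mentioned above, for the case where some pair in b might have no match in a (grouped_indices_safe is a made-up name; it reuses view1D from before):
def grouped_indices_safe(a, b):
    a0v, b0v = view1D(a.reshape(-1,2), b.reshape(-1,2))
    sidx = a0v.argsort()
    pos = np.searchsorted(a0v, b0v, sorter=sidx)
    pos = np.clip(pos, 0, len(sidx) - 1)  # guard out-of-range positions
    idx = sidx[pos]
    idx = idx[a0v[idx] == b0v]            # keep only genuine matches
    return ((idx*2)[:,None] + [0,1]).ravel()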

Augmenting a matrix in Python using numpy

I'm trying to augment a matrix in order to solve an equation, but have been unable to. And yes, I saw the "Augment a matrix in NumPy" question; it is not what I need.
So my problem: create an augmented matrix [ A b1 b2 ]
import numpy
a = numpy.array([[1,2],[5,12]])
b1 = numpy.array([-1,3]).T
b2 = numpy.array([1,-5]).T
I've tried the numpy.concatenate function, which returns
ValueError: all the input arrays must have same number of dimensions
Is there a way to augment the matrix, such that I have one array of
[ 1  2 -1  1
  5 12  3 -5 ]
If anyone knows, please inform me! Note that I was doing this in the IPython notebook
(btw, I know that I can't row reduce it with Numpy, it's a university problem, and just was doing the rest in IPython)
Thanks
Matt
You can stack 1D arrays as if they were column vectors using the np.column_stack function. This should do what you are after:
>>> np.column_stack((a, b1, b2))
array([[ 1,  2, -1,  1],
       [ 5, 12,  3, -5]])
I put your code into IPython and asked for the array shapes:
In [1]: a = numpy.array([[1,2],[5,12]])
In [2]: b1 = numpy.array([-1,3]).T
In [3]: b2 = numpy.array([1,-5]).T
In [4]: a.shape
Out[4]: (2, 2)
In [5]: b1.shape
Out[5]: (2,)
In [6]: b2.shape
Out[6]: (2,)
Notice a has 2 dimensions while the others have 1. The .T does nothing on 1d arrays.
Try making b1 a 2d array. Also make sure you are concatenating on the right axis.
In [7]: b1 = numpy.array([[-1,3]]).T
In [9]: b1.shape
Out[9]: (2, 1)
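Continuing the session, a sketch of the full fix: reshape b2 the same way and concatenate along axis=1:
In [10]: b2 = numpy.array([[1,-5]]).T
In [11]: numpy.concatenate((a, b1, b2), axis=1)
Out[11]:
array([[ 1,  2, -1,  1],
       [ 5, 12,  3, -5]])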
numpy.concatenate() requires that all arrays have the same number of dimensions, so you have to augment the 1d vectors to 2d using None or numpy.newaxis like so:
>>> numpy.concatenate((a, b1[:,None], b2[:,None]), axis=1)
array([[ 1,  2, -1,  1],
       [ 5, 12,  3, -5]])
There are also the shorthands r_ and c_ for row/column concatenation, which mimic Matlab's notation:
>>> from numpy import c_
>>> c_[a, b1, b2]
array([[ 1,  2, -1,  1],
       [ 5, 12,  3, -5]])
Look up the source code to understand how they work ;-)

How to access the elements of a 2D array?

I would like to understand how one goes about manipulating the elements of a 2D array.
If I have for example:
a = ( a11 a12 a13 )    b = ( b11 b12 b13 )
    ( a21 a22 a23 )        ( b21 b22 b23 )
I have defined them in Python, for example, as:
a=[[1,1],[2,1],[3,1]]
b=[[1,2],[2,2],[3,2]]
I found that I cannot refer to a[1][1], only to a[1], which gives me [2, 1] as a result.
So I don't understand: how do I access the second row of these arrays, i.e. a21, a22, a23 and b21, b22, b23?
And how would I multiply them, as in c1 = a21*b21, c2 = a22*b22, etc.?
If you have
a=[[1,1],[2,1],[3,1]]
b=[[1,2],[2,2],[3,2]]
Then
a[1][1]
Will work fine. It points to the second column, second row just like you wanted.
I'm not sure what you did wrong.
To multiply the cells in the third column you can just do
c = [a[2][i] * b[2][i] for i in range(len(a[2]))]
Which will work for any number of rows.
Edit: The first number is the column, the second number is the row, with your current layout. They are both numbered from zero. If you want to switch the order you can do
a = zip(*a)
or you can create it that way:
a=[[1, 2, 3], [1, 1, 1]]
If you want to do many calculations with a 2D array, you should use a NumPy array instead of a nested list.
For your question, you can use zip(*a) to transpose it (note that in Python 3 zip returns an iterator, so wrap it in list() to see the result):
In [55]: a=[[1,1],[2,1],[3,1]]
In [56]: zip(*a)
Out[56]: [(1, 2, 3), (1, 1, 1)]
In [57]: zip(*a)[0]
Out[57]: (1, 2, 3)
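Following that advice, a short sketch with the same data as NumPy arrays; the elementwise product the question asks for becomes a single expression (the OP's "second row" a21, a22, a23 is column index 1 in this layout):
In [58]: import numpy as np
In [59]: a = np.array([[1,1],[2,1],[3,1]])
In [60]: b = np.array([[1,2],[2,2],[3,2]])
In [61]: a[:,1] * b[:,1]
Out[61]: array([2, 2, 2])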
Seems to work here:
>>> a=[[1,1],[2,1],[3,1]]
>>> a
[[1, 1], [2, 1], [3, 1]]
>>> a[1]
[2, 1]
>>> a[1][0]
2
>>> a[1][1]
1
Look carefully at how many brackets your array has. I came across a case where a function returned an answer with an extra bracket, like this:
>>> approx
array([[[1192,  391]],
       [[1191,  409]],
       [[1209,  438]],
       [[1191,  409]]])
And this didn't work
>>> approx[1,1]
IndexError: index 1 is out of bounds for axis 1 with size 1
This gets rid of the extra bracket level:
>>> approx[:,0]
array([[1192,  391],
       [1191,  409],
       [1209,  438],
       [1191,  409]])
Now it is possible to use an ordinary element access notation:
>>> approx[:,0][1,1]
409
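Alternatively, since approx has shape (4, 1, 2), indexing all three axes at once also works:
>>> approx[1, 0, 1]
409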
If you have this:
a = [[1, 1], [2, 1], [3, 1]]
you can easily access its elements by using:
print(a[0][1])
a[0][1] = 7
print(a)
a[1][1] does work as expected. Do you mean a11 as the first element of the first row? Because that would be a[0][0].
