Iterating over xarray. DataArray first dimension and its coordinates - python

Suppose I have the following DataArray
arr = xarray.DataArray(np.arange(6).reshape(2,3),
dims=['A', 'B'],
coords=dict(A=['a0', 'a1'],
B=['b0', 'b1', 'b2']))
I want to iterate over the first dimension and do the following (of course I want to do something more complex than printing)
for coor in arr.A.values:
print(coor, arr.sel(A=coor).values)
and get
a0 [0 1 2]
a1 [3 4 5]
I am new to xarray, so I was wondering whether there was some more natural way to achieve this, something like
for coor, sub_arr in arr.some_method():
print(coor, sub_arr)

You can simply iterate over the DataArray - each element of the iterator will itself be a DataArray with a single value for the first coordinate:
for a in arr:
print(a.A.item(), a.values)
prints
a0 [0 1 2]
a1 [3 4 5]
Note the use of the .item() method to access the scalar value of the zero-dimensional array a.A.
To iterate over the second dimension, you can just transpose the data:
for b in arr.T: # or arr.transpose()
print(b.B.item(), b.values)
prints
b0 [0 3]
b1 [1 4]
b2 [2 5]
For multidimensional data, you can move the dimension you want to iterate over to the first place using ellipsis:
for x in arr.transpose("B", ...):
# x has one less dimension than arr, and x.B is a scalar
do_stuff_with(x)
The documentation on reshaping and reorganizing data has further details.

It's an old question, but I find that using groupby is cleaner and makes more intuitive sense to me than using transpose when you want to iterate some dimension other than the first:
for coor, sub_arr in arr.groupby('A'):
print(coor)
print(sub_arr)
a0
<xarray.DataArray (B: 3)>
array([0, 1, 2])
Coordinates:
* B (B) <U2 'b0' 'b1' 'b2'
A <U2 'a0'
a1
<xarray.DataArray (B: 3)>
array([3, 4, 5])
Coordinates:
* B (B) <U2 'b0' 'b1' 'b2'
A <U2 'a1'
Also it seems that older versions of xarray don't handle the ellipsis correctly (see mgunyho's answer), but groupby still works correctly.

Related

How to convert 2D matrix to 3D by embeddings in numpy? [duplicate]

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last index according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of indices to keep the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1];
[1,0,0], [1,1,0], [1,2,1];
[2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.
With a as input array and idx as the indexing one -
np.take_along_axis(a,idx[...,None],axis=-1)[...,0]
Alternatively, with open-grids -
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
You can build corresponding index arrays for the first two dimensions. Those would basically be:
[0 1 2]
[0 1 2]
[0 1 2]
[0 0 0]
[1 1 1]
[2 2 2]
You can construct these using the meshgrid function. I stored them as m1, and m2 in the example:
vals = np.arange(3*3*5).reshape(3, 3, 5) # test sample
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
m3 = np.array([[0, 0, 1], 0, 0, 1], [1, 4, 4]])
sel_vals = vals[m1, m2, m3]
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.

Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D array equivalent of slicing a python list into smaller lists at indexes that store a particular value. I'm running a program that extracts a large amount of data out of a CSV file and copies it into a 2D NumPy array. The basic format of these arrays are something like this:
[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[0 8 9 10]
[9 9 1 4]]
[[1 2 1 4]]
[[1 1 1 2]
[39 23 10 1]]
I've thought about simply iterating down the array and checking if the row has all zeros but the data I'm handling is substantially large. I have potentially millions of rows of data in the text file and I'm trying to find the most efficient approach as opposed to a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
a is your array. You can use any to find all zero rows, remove them, and then use split to split by their indices:
#not_all_zero rows indices
idx = np.flatnonzero(a.any(1))
#all_zero rows indices
idx_zero = np.delete(np.arange(a.shape[0]),idx)
#select not_all_zero rows and split by all_zero row indices
output = np.split(a[idx],idx_zero-np.arange(idx_zero.size))
output:
[array([[ 0, 8, 9, 10],
[ 9, 9, 1, 4]]),
array([[1, 2, 1, 4]]),
array([[ 1, 1, 1, 2],
[39, 23, 10, 1]])]
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. 1 here means to do the reduction along rows, so you get back a boolean array of shape (x.shape[0],), which can be used to directly index x.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.

Numpy - Index last dimension of array with index array

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last index according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of indices to keep the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1];
[1,0,0], [1,1,0], [1,2,1];
[2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.
With a as input array and idx as the indexing one -
np.take_along_axis(a,idx[...,None],axis=-1)[...,0]
Alternatively, with open-grids -
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
You can build corresponding index arrays for the first two dimensions. Those would basically be:
[0 1 2]
[0 1 2]
[0 1 2]
[0 0 0]
[1 1 1]
[2 2 2]
You can construct these using the meshgrid function. I stored them as m1, and m2 in the example:
vals = np.arange(3*3*5).reshape(3, 3, 5) # test sample
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
m3 = np.array([[0, 0, 1], 0, 0, 1], [1, 4, 4]])
sel_vals = vals[m1, m2, m3]
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.

Iterate over numpy array columnwise

np.nditer automatically iterates of the elements of an array row-wise. Is there a way to iterate of elements of an array columnwise?
x = np.array([[1,3],[2,4]])
for i in np.nditer(x):
print i
# 1
# 3
# 2
# 4
What I want is:
for i in Columnwise Iteration(x):
print i
# 1
# 2
# 3
# 4
Is my best bet just to transpose my array before doing the iteration?
For completeness, you don't necessarily have to transpose the matrix before iterating through the elements. With np.nditer you can specify the order of how to iterate through the matrix. The default is usually row-major or C-like order. You can override this behaviour and choose column-major, or FORTRAN-like order which is what you desire. Simply specify an additional argument order and set this flag to 'F' when using np.nditer:
In [16]: x = np.array([[1,3],[2,4]])
In [17]: for i in np.nditer(x,order='F'):
....: print i
....:
1
2
3
4
You can read more about how to control the order of iteration here: http://docs.scipy.org/doc/numpy-1.10.0/reference/arrays.nditer.html#controlling-iteration-order
You could use the shape and slice each column
>>> [x[:, i] for i in range(x.shape[1])]
[array([1, 2]), array([3, 4])]
You could transpose it?
>>> x = np.array([[1,3],[2,4]])
>>> [y for y in x.T]
[array([1, 2]), array([3, 4])]
Or less elegantly:
>>> [np.array([x[j,i] for j in range(x.shape[0])]) for i in range(x.shape[1])]
[array([1, 2]), array([3, 4])]
nditer is not the best iteration tool for this case. It is useful when working toward a compiled (cython) solution, but not in pure Python coding.
Look at some regular iteration strategies:
In [832]: x=np.array([[1,3],[2,4]])
In [833]: x
Out[833]:
array([[1, 3],
[2, 4]])
In [834]: for i in x:print i # print each row
[1 3]
[2 4]
In [835]: for i in x.T:print i # print each column
[1 2]
[3 4]
In [836]: for i in x.ravel():print i # print values in order
1
3
2
4
In [837]: for i in x.T.ravel():print i # print values in column order
1
2
3
4
You comment: I need to fill values into an array based on the index of each cell in the array
What do you mean by index?
A crude 2d iteration with indexing:
In [838]: for i in range(2):
.....: for j in range(2):
.....: print (i,j),x[i,j]
(0, 0) 1
(0, 1) 3
(1, 0) 2
(1, 1) 4
ndindex uses nditer to generate similar indexes
In [841]: for i,j in np.ndindex(x.shape):
.....: print (i,j),x[i,j]
.....:
(0, 0) 1
(0, 1) 3
(1, 0) 2
(1, 1) 4
enumerate is a good Python way of getting both values and indexes:
In [847]: for i,v in enumerate(x):print i,v
0 [1 3]
1 [2 4]
Or you can use meshgrid to generate all the indexes, as arrays
In [843]: I,J=np.meshgrid(range(2),range(2))
In [844]: I
Out[844]:
array([[0, 1],
[0, 1]])
In [845]: J
Out[845]:
array([[0, 0],
[1, 1]])
In [846]: x[I,J]
Out[846]:
array([[1, 2],
[3, 4]])
Note that most of these iterative methods just treat your array as a list of lists. They don't take advantage of the array nature, and will be slow compared to methods that work with the whole x.

Iterate over columns of a NumPy array and elements of another one?

I am trying to replicate the behaviour of zip(a, b) in order to be able to loop simultaneously along two NumPy arrays. In particular, I have two arrays a and b:
a.shape=(n,m)
b.shape=(m,)
I would like to get for every loop a column of a and an element of b.
So far, I have tried the following:
for a_column, b_element in np.nditer([a, b]):
print(a_column)
However, I get printed the element a[0,0] rather than the column a[0,:], which I want.
How can I solve this?
You can still use zip on numpy arrays, because they are iterables.
In your case, you'd need to transpose a first, to make it an array of shape (m,n), i.e. an iterable of length m:
for a_column, b_element in zip(a.T, b):
...
Adapting my answer in shallow iteration with nditer,
nditer and ndindex can be used to iterate over rows or columns by generating indexes.
In [19]: n,m=3,4
In [20]: a=np.arange(n*m).reshape(n,m)
In [21]: b=np.arange(m)
In [22]: it=np.nditer(b)
In [23]: for i in it: print a[:,i],b[i]
[0 4 8] 0
[1 5 9] 1
[ 2 6 10] 2
[ 3 7 11] 3
In [24]: for i in np.ndindex(m):print a[:,i],b[i]
[[0]
[4]
[8]] 0
[[1]
[5]
[9]] 1
[[ 2]
[ 6]
[10]] 2
[[ 3]
[ 7]
[11]] 3
In [25]:
ndindex uses an iterator like: it = np.nditer(b, flags=['multi_index'].
For iteration over a single dimension like this, for i in range(m): works just as well.
Also from the other thread, here's a trick using order to iterate without the indexes:
In [28]: for i,j in np.nditer([a,b],order='F',flags=['external_loop']):
print i,j
[0 4 8] [0 0 0]
[1 5 9] [1 1 1]
[ 2 6 10] [2 2 2]
[ 3 7 11] [3 3 3]
Usually, because of NumPy's ability to broadcast arrays, it is not necessary to iterate over the columns of an array one-by-one. For example, if a has shape (n,m) and b has shape (m,) then you can add a+b and b will broadcast itself to shape (n, m) automatically.
Moreover, your calculation will complete much faster if it can be expressed through operations on the whole array, a, rather than through operations on pieces of a (such as on columns) using a Python for-loop.
Having said that, the easiest way to loop through the columns of a is to iterate over the index:
for i in np.arange(b.shape[0]):
a_column, b_element = a[:, i], b[i]
print(a_column)

Categories