I want to optimize my NumPy code. I am working with large arrays, so efficiency is required, and I try to avoid for-loops where possible.
Let's assume a simple 2-D array:
1 3 5
2 0 1
5 6 2
My task is to take values from each column until their cumulative sum reaches a certain value, trimming the last value taken if needed. Let's call this value clip. After this operation (with clip = 3 in this example) I'll have an array like this:
1 3 3
2 0 0
0 0 0
I had a rather naive idea to calculate it with simple transformations:
# array is the input, clip is the threshold
array_clipped = np.clip(array, 0, clip)
array_clipped_cumsum = np.cumsum(array_clipped, axis=0)
difference = clip - array_clipped_cumsum
difference_trimmed = np.where(difference < 0, difference, 0)
final = array_clipped + difference_trimmed
final_clean = np.where(final >= 0, final, 0)
While this code works, it looks very dirty and not very NumPy-like.
Here is a one-liner:
A = np.random.randint(0,10,(6,4))
A
# array([[0, 8, 7, 6],
# [3, 2, 0, 4],
# [5, 6, 6, 4],
# [4, 5, 0, 3],
# [7, 9, 6, 8],
# [0, 9, 8, 3]])
cap = 15
np.diff(np.minimum(A.cumsum(0),cap),axis=0,prepend=0)
# array([[0, 8, 7, 6],
# [3, 2, 0, 4],
# [5, 5, 6, 4],
# [4, 0, 0, 1],
# [3, 0, 2, 0],
# [0, 0, 0, 0]])
Or in two lines avoiding the slow prepend:
out = np.minimum(A.cumsum(0),cap)
out[1:] -= out[:-1]
out
# array([[0, 8, 7, 6],
# [3, 2, 0, 4],
# [5, 5, 6, 4],
# [4, 0, 0, 1],
# [3, 0, 2, 0],
# [0, 0, 0, 0]])
A cleaner way would be -
# a is input array and clip is the clipping value
c = a.cumsum(0)
out = (a-c+c.clip(max=clip)).clip(min=0)
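Applied to the original 3×3 example with clip = 3, this reproduces the expected output; a quick self-contained check:
import numpy as np

a = np.array([[1, 3, 5],
              [2, 0, 1],
              [5, 6, 2]])
clip = 3

c = a.cumsum(0)
out = (a - c + c.clip(max=clip)).clip(min=0)
# out:
# array([[1, 3, 3],
#        [2, 0, 0],
#        [0, 0, 0]])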
I have an xarray like this:
import xarray as xr
da1 = xr.DataArray([[0, 1, 5, 5], [1, 2, 2, 0], [9, 3, 2, 0]], dims=['x', 'y'])
da2 = xr.DataArray([[0, 2, 9, 3], [0, 0, 7, 0], [0, 2, 6, 0]], dims=['x', 'y'])
da3 = xr.DataArray([[0, 7, 2, 0], [7, 2, 6, 0], [0, 6, 1, 0]], dims=['x', 'y'])
combined = xr.concat([da1, da2, da3], 'band')
It looks like this:
array([[[0, 1, 5, 5],
[1, 2, 2, 0],
[9, 3, 2, 0]],
[[0, 2, 9, 3],
[0, 0, 7, 0],
[0, 2, 6, 0]],
[[0, 7, 2, 0],
[7, 2, 6, 0],
[0, 6, 1, 0]]])
with three dimensions: band, x and y.
I want to set the values in this array to NaN in the situations where all the values across the band dimension are zero. For example, the values at combined.isel(x=0, y=0) should be set to NaN, as all those values are zero, but the values at combined.isel(x=1, y=1) should not be, as only one of the values is zero.
How can I do this?
I've tried using:
combined.where(combined != 0)
but this sets all values that are zero to NaN, which doesn't do what I want.
I then tried something like:
combined.where((combined.isel(band=0) != 0) & (combined.isel(band=1) != 0) & (combined.isel(band=2) != 0))
but the 'and' bit doesn't seem to work properly and it gives a strange (and incorrect) result.
Update: As an extension, I would like to be able to do the same thing but for very small values, rather than zeros. For example, setting all values across the band dimension to NaN if all values across that dimension are < 0.01. Is there an easy way to do this?
Any advice very much appreciated
You can do that with:
combined.where(combined.any(dim = 'band'))
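For the extension in the update (treating very small values like zeros), the same pattern should work with an explicit comparison instead of the implicit non-zero test; a sketch, assuming the 0.01 threshold is applied to the raw values:
# keep an (x, y) location only if at least one band is >= 0.01, else NaN
combined.where((combined >= 0.01).any(dim='band'))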
Suppose I have a Tensor like
a = torch.tensor([[3, 1, 5, 0, 4, 2],
[2, 1, 3, 4, 5, 0],
[0, 4, 5, 1, 2, 3],
[3, 1, 4, 5, 0, 2],
[3, 5, 4, 2, 0, 1],
[5, 3, 0, 4, 1, 2]])
and I want to reorganize the rows of the tensor by applying the transformation a[c] where
c = torch.tensor([0,2,4,1,3,5])
to get
b = torch.tensor([[3, 1, 5, 0, 4, 2],
[0, 4, 5, 1, 2, 3],
[3, 5, 4, 2, 0, 1],
[2, 1, 3, 4, 5, 0],
[3, 1, 4, 5, 0, 2],
[5, 3, 0, 4, 1, 2]])
To do this, I want to generate the tensor c so that I can apply this transformation regardless of the size of tensor a and the step size (which I have taken to be 2 in this example for simplicity). Can anyone let me know how to generate such a tensor for the general case without using an explicit for loop in PyTorch?
You can use torch.index_select, so:
b = torch.index_select(a, 0, c)
The explanation in the official docs is pretty clear.
I also came up with another solution, which reorganizes the rows of tensor a into tensor b without generating the index array c:
step = 2
b = a.view(-1,step,a.size(-1)).transpose(0,1).reshape(-1,a.size(-1))
Thinking about it a little longer, I came up with the solution below for generating the indices:
step = 2
idx = torch.arange(0,a.size(0),step)
# idx = tensor([0, 2, 4])
idx = idx.repeat(int(a.size(0)/idx.size(0)))
# idx = tensor([0, 2, 4, 0, 2, 4])
incr = torch.arange(0,step)
# incr = tensor([0, 1])
incr = incr.repeat_interleave(int(a.size(0)/incr.size(0)))
# incr = tensor([0, 0, 0, 1, 1, 1])
c = incr + idx
# c = tensor([0, 2, 4, 1, 3, 5])
After this, the tensor c can be used to get the tensor b by using
b = a[c.long()]
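As a side note, the same index tensor can be built more compactly with the view/transpose idea from above; a sketch, assuming step divides a.size(0) evenly:
import torch

step = 2
n = a.size(0)
# lay the row indices out in chunks of `step`, then read them column by column
c = torch.arange(n).view(-1, step).t().reshape(-1)
# c = tensor([0, 2, 4, 1, 3, 5]) for n = 6, step = 2
b = a[c]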
I am trying to index an np.array with another array so that I have zeros everywhere after a certain index, but it gives me this error:
TypeError: only integer scalar arrays can be converted to a scalar index
Basically what I would like my code to do is that if I have:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
d = np.array([2, 1, 3])
that I could do something like
a[d:] = 0
to give the output
a = [[ 1 2 3]
[ 4 0 6]
[ 0 0 9]
[ 0 0 0]]
It can be done with array indexing but it doesn't feel natural.
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
d = np.array([2, 1, 3])
col_ix = [ 0, 0, 1, 1, 1, 2 ] # column ix for each item to change
row_ix = [ 2, 3, 1, 2, 3, 3 ] # row index for each item to change
a[ row_ix, col_ix ] = 0
a
# array([[1, 2, 3],
# [4, 0, 6],
# [0, 0, 9],
# [0, 0, 0]])
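If you would rather not write the index lists by hand, they can be derived from d for the general case; a short sketch on a fresh copy of the array:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
d = np.array([2, 1, 3])

# cells (r, c) with r >= d[c] are the ones to zero out
row_ix, col_ix = np.nonzero(np.arange(a.shape[0])[:, None] >= d)
a[row_ix, col_ix] = 0
# a now matches the result above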
With a for loop
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
for ix_col, ix_row in enumerate(d):  # iterate across the columns
    a[ix_row:, ix_col] = 0
a
# array([[1, 2, 3],
# [4, 0, 6],
# [0, 0, 9],
# [0, 0, 0]])
A widely used approach for this kind of problem is to construct a boolean mask, comparing the index array with the appropriate arange:
In [619]: mask = np.arange(4)[:,None]>=d
In [620]: mask
Out[620]:
array([[False, False, False],
[False, True, False],
[ True, True, False],
[ True, True, True]])
In [621]: a[mask]
Out[621]: array([ 5, 7, 8, 10, 11, 12])
In [622]: a[mask] = 0
In [623]: a
Out[623]:
array([[1, 2, 3],
[4, 0, 6],
[0, 0, 9],
[0, 0, 0]])
That's not necessarily faster than iterating over rows (or, in this case, columns). Since slicing is basic indexing, the iterative approach may even be faster, despite performing several assignments.
In [624]: for i,v in enumerate(d):
...: print(a[v:,i])
...:
[0 0]
[0 0 0]
[0]
Generally if a result involves multiple arrays or lists with different lengths, there isn't a "neat" multidimensional solution. Either iterate over those lists, or step back and "think outside the box".
Given a matrix in Python/NumPy that has leading zeros in some of its rows, I need to shift all zeros to the end of each row.
E.g.
0 2 3 4
0 0 1 5
2 3 1 1
should be transformed to
2 3 4 0
1 5 0 0
2 3 1 1
Is there any nice way to do this in NumPy?
To fix rows with leading zeros -
def fix_leading_zeros(a):
    mask = a != 0
    flipped_mask = mask[:, ::-1]
    a[flipped_mask] = a[mask]
    a[~flipped_mask] = 0
    return a
To push all zeros in each row to the back -
def push_all_zeros_back(a):
    # Based on http://stackoverflow.com/a/42859463/3293881
    valid_mask = a != 0
    flipped_mask = valid_mask.sum(1, keepdims=True) > np.arange(a.shape[1]-1, -1, -1)
    flipped_mask = flipped_mask[:, ::-1]
    a[flipped_mask] = a[valid_mask]
    a[~flipped_mask] = 0
    return a
Sample runs -
In [220]: a
Out[220]:
array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
In [221]: fix_leading_zeros(a)
Out[221]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
In [266]: a
Out[266]:
array([[0, 2, 3, 4, 0],
[0, 0, 1, 5, 6],
[2, 3, 0, 1, 0]])
In [267]: push_all_zeros_back(a)
Out[267]:
array([[2, 3, 4, 0, 0],
[1, 5, 6, 0, 0],
[2, 3, 1, 0, 0]])
Leading zeros, with a simple loop:
ar = np.array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
for i in range(ar.shape[0]):
    for j in range(ar.shape[1]):  # bounded retries; prevents an endless loop on an all-zero row
        if ar[i, 0] == 0:
            ar[i] = np.roll(ar[i], -1)
ar
Out[31]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
I have a 2D array filled with some values (column 0) and zeros (the rest of the columns). I would like to do pretty much the same as I do in MS Excel, but using NumPy: fill the rest of the columns with values calculated from the first column. Here is an MWE:
import numpy as np
a = np.zeros(20, dtype=np.int8).reshape(4,5)
b = [1, 2, 3, 4]
b = np.array(b)
a[:, 0] = b
# don't change the first column
for column in a[:, 1:]:
    a[:, column] = column[0]+1
The expected output:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]], dtype=int8)
The resulting output:
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]], dtype=int8)
Any help would be appreciated.
Looping is slow and there is no need to loop to produce the array that you want:
>>> a = np.ones(20, dtype=np.int8).reshape(4,5)
>>> a[:, 0] = b
>>> a
array([[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1]], dtype=int8)
>>> np.cumsum(a, axis=1)
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
What went wrong
Let's start, as in the question, with this array:
>>> a
array([[1, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[3, 0, 0, 0, 0],
[4, 0, 0, 0, 0]], dtype=int8)
Now, using the code from the question, let's do the loop and see what column actually is:
>>> for column in a[:, 1:]:
... print(column)
...
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
As you can see, column is not the index of a column but an array of actual values (iterating over a 2D array yields its rows, so each column here is in fact a row of a[:, 1:]). Consequently, the following does not do what you would hope:
a[:, column] = column[0]+1
Another method
If we want to loop (so that we can do something more complex), here is another approach to generating the desired array:
>>> b = np.array([1, 2, 3, 4])
>>> np.column_stack([b+i for i in range(5)])
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
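If no loop is needed at all, plain broadcasting produces the same array (a quick sketch):
>>> b[:, None] + np.arange(5)
array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])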
Your usage of column is ambiguous: in for column in a[:, 1:] it is treated as an array of column values, but in the body it is treated as an index into the columns. You can try this instead:
for column in range(1, a.shape[1]):
    a[:, column] = a[:, column-1]+1
a
#array([[1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]], dtype=int8)