I have found a solution to cumsum the previous numbers if they are negative:
def func(x):
    for i, value in enumerate(x):
        if i == len(x) - 1:
            break
        if value < 0:
            x[i + 1] += value
    x = x.clip(min=0)
    return x
data = np.array([-3, 4, -2, -2, 6])
print(func(data))
>>>> [0 1 0 0 2]
Is there a vectorized numpy solution? This is a very small data sample, but it will become quite large, and it is 2-D, such as:
data = np.array([[-3, 4, -2, -2, 6],[1, -2, -3, 7, 1]])
And I would like to apply it rowwise.
Broadly speaking, vectorization relies on the fact that many elements of the array can be processed independently of all other elements, so operations can be applied to the whole array at once. However, because your calculation relies on results from previous iterations, it has to run linearly through the data.
Therefore it is probably not possible to fully vectorize your problem. But since the calculation of each row is independent of every other row, there is still some room for vectorization: here is a solution that vectorizes across all columns and just loops over the rows.
def func(x):
    x = x.copy()
    for i in range(len(x) - 1):
        mask = x[i, ...] < 0
        x[i+1, mask, ...] += x[i, mask, ...]
    x = x.clip(min=0)
    return x
data = np.array([[-3, 4, -2, -2, 6],[1, -2, -3, 7, 1]])
func(data.T)
# array([[0, 1],
#        [1, 0],
#        [0, 0],
#        [0, 2],
#        [2, 1]])
I am aware that you want to treat each row individually rather than each column; however, I chose to swap the two because iterating like this over the first dimension of an array is generally more efficient than iterating over the last dimension:
data = np.random.randint(0, 10, size=(10000, 10000))
%timeit colwise_func(data) # 1.08 s ± 35.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit rowwise_func(data) # 2.31 s ± 65.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
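If you need the result back in the question's row-wise layout, a thin wrapper that transposes on the way in and out should do (apply_rowwise is just an illustrative name; it simply applies the column-wise func to the transposed data):
def apply_rowwise(data):
    return func(data.T).T

apply_rowwise(data)
# array([[0, 1, 0, 0, 2],
#        [1, 0, 0, 2, 1]])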
You can use a binarized array of your data. Once you have defined your threshold, binarize it with np.where:
data = np.array([-3, 4, -2, -2, 6])
binarized = np.where(data>0, 1, 0)
# array([0, 1, 0, 0, 1])
The np.where function returns an array the same size as data, where any value above your threshold (here equal to 0) will be set to 1, and all others will be set to 0.
Then simply multiply its cumulative sum by the binarized array itself: the cumulative sum counts the valid values, and the multiplication sets zeros wherever the condition is not met.
np.cumsum(binarized)*binarized
# array([0, 1, 0, 0, 2])
For 2-dimensional arrays use a similar approach, but give the axis you want to sum on. In your case you want it along the rows, so set it to axis=1:
data = np.array([[-3, 4, -2, -2, 6],[1, -2, -3, 7, 1]])
binarized = np.where(data>0, 1, 0)
np.cumsum(binarized, axis=1)*binarized
# array([[0, 1, 0, 0, 2],
#        [1, 0, 0, 2, 3]])
Your function can then simply be:
def func(data, t=0, ax=1):
    b = np.where(data > t, 1, 0)
    return np.cumsum(b, axis=ax) * b
The parameter t sets the threshold, while the parameter ax is the axis to sum over. Passing None sums over all values of the flattened array.
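For instance, with a higher threshold the same function counts the values above that threshold along each row (a small illustration of the t parameter, not part of the original question):
func(np.array([[-3, 4, -2, -2, 6], [1, -2, -3, 7, 1]]), t=2)
# array([[0, 1, 0, 0, 2],
#        [0, 0, 0, 1, 0]])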
Related
Given a bool tensor in pytorch, I would like to have a "lockout period" of N values after each True value along each row. More specifically, in the example below, moving from left to right on any given row I would like to ensure that after each True the following N values are all False.
e.g.
N = 3
input = tensor([[0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
                [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]])
# should output
tensor([[0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]])
I can solve this with a double loop e.g.
for row in input:
    for element in row:
        # if sum of previous N entries > 0 set input[row, element] = 0
However, I would like to solve this either (a) without looping at all or (b) with just a single loop (e.g. for column in input). Is there a way to achieve preferably (a), or otherwise (b)? I cannot assume the input tensor will be sparse or have any particular distribution.
Thanks Naga for the (Edit: since deleted) answer, but I need a pytorch solution and I am not sure that solution is O(n) as stated.
I found the following solution (looping over columns) seems to do the trick.
input = input.to(torch.bool)
for i, col in enumerate(input.t()):
    input[:, i+1:i+1+N] = torch.mul(~col, input[:, i+1:i+1+N].t()).t()
Additionally, based on a quick comparison on the sort of array size I'm interested in, it seems to be around 5x faster than the NumPy-based approach below.
def a():
    input = torch.randint(high=2, size=(200, 100))
    input = input.to(torch.bool)
    N = 10
    for i, col in enumerate(input.t()):
        input[:, i+1:i+1+N] = torch.mul(~col, input[:, i+1:i+1+N].t()).t()

def b():
    N = 10
    a = np.random.randint(0, high=2, size=(200, 100), dtype=int)
    inds = np.where(a == 1)
    for r, c in np.nditer(inds):
        if a[r, c] == 1:
            a[r, c+1:c+N] = 0
%timeit a()
# 100 loops, best of 5: 2.47 ms per loop
%timeit b()
# 100 loops, best of 5: 12.8 ms per loop
I am looking for the fastest way to obtain a list of the nonzero indices of a 2D array per row and per column. The following is a working piece of code:
preds = [matrix[:,v].nonzero()[0] for v in range(matrix.shape[1])]
descs = [matrix[v].nonzero()[0] for v in range(matrix.shape[0])]
Example input:
matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
Example output:
preds = [array([1, 2, 3]), array([2, 3]), array([3]), array([], dtype=int64)]
descs = [array([], dtype=int64), array([0]), array([0, 1]), array([0, 1, 2])]
(The lists are called preds and descs because they refer to the predecessors and descendants in a DAG when the matrix is interpreted as an adjacency matrix but this is not essential to the question.)
Timing example:
For timing purposes, the following matrix is a good representative:
test_matrix = np.zeros(shape=(4096,4096),dtype=np.float32)
for k in range(16):
    test_matrix[256*(k+1):256*(k+2), 256*k:256*(k+1)] = 1
Background: In my code, these two lines take 75% of the time for a 4000x4000 matrix, whereas the ensuing topological sort and DP algorithm take only the remaining quarter. Roughly 5% of the values in the matrix are nonzero, so a sparse-matrix solution may be applicable.
Thank you.
(On suggestion, this is posted here as well: https://scicomp.stackexchange.com/questions/35242/fast-nonzero-indices-per-row-column-for-sparse-2d-numpy-array
There are also answers there to which I will provide timings in the comments. This link contains an accepted answer that is twice as fast.)
If you have enough motivation, Numba can do amazing things.
Here is a quick implementation of the logic you need.
Briefly, it computes the equivalent of np.nonzero() but it includes along the way the information to later dispatch the indices into the format you require.
The information is inspired by sparse.csr.indptr and sparse.csc.indptr.
import numpy as np
import numba as nb

@nb.jit
def cumsum(arr):
    result = np.empty_like(arr)
    cumsum = result[0] = arr[0]
    for i in range(1, len(arr)):
        cumsum += arr[i]
        result[i] = cumsum
    return result

@nb.jit
def count_nonzero(arr):
    arr = arr.ravel()
    n = 0
    for x in arr:
        if x != 0:
            n += 1
    return n

@nb.jit
def row_col_nonzero_nb(arr):
    n, m = arr.shape
    max_k = count_nonzero(arr)
    indices = np.empty((2, max_k), dtype=np.uint32)
    i_offset = np.zeros(n + 1, dtype=np.uint32)
    j_offset = np.zeros(m + 1, dtype=np.uint32)
    k = 0
    for i in range(n):
        for j in range(m):
            if arr[i, j] != 0:
                indices[:, k] = i, j
                i_offset[i + 1] += 1
                j_offset[j + 1] += 1
                k += 1
    return indices, cumsum(i_offset), cumsum(j_offset)

def row_col_idx_nonzero_nb(arr):
    (ii, jj), jj_split, ii_split = row_col_nonzero_nb(arr)
    ii_ = np.argsort(jj)
    ii = ii[ii_]
    return np.split(ii, ii_split[1:-1]), np.split(jj, jj_split[1:-1])
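For example, on the small matrix from the question this should reproduce preds and descs (the exact dtypes may differ from the plain-NumPy version):
m = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]])
preds, descs = row_col_idx_nonzero_nb(m)
# preds -> [array([1, 2, 3]), array([2, 3]), array([3]), array([])]
# descs -> [array([]), array([0]), array([0, 1]), array([0, 1, 2])]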
Compared to your approach (row_col_idx_sep() below), and a bunch of others, as per @hpaulj's answer (row_col_idx_sparse_lil()) and @knl's answer from scicomp.stackexchange.com (row_col_idx_sparse_coo()):
def row_col_idx_sep(arr):
    return (
        [arr[:, j].nonzero()[0] for j in range(arr.shape[1])],
        [arr[i, :].nonzero()[0] for i in range(arr.shape[0])],)

def row_col_idx_zip(arr):
    n, m = arr.shape
    ii = [[] for _ in range(n)]
    jj = [[] for _ in range(m)]
    x, y = np.nonzero(arr)
    for i, j in zip(x, y):
        ii[i].append(j)
        jj[j].append(i)
    return jj, ii

import scipy as sp
import scipy.sparse

def row_col_idx_sparse_coo(arr):
    coo_mat = sp.sparse.coo_matrix(arr)
    csr_mat = coo_mat.tocsr()
    csc_mat = coo_mat.tocsc()
    return (
        np.split(csc_mat.indices, csc_mat.indptr)[1:-1],
        np.split(csr_mat.indices, csr_mat.indptr)[1:-1],)

def row_col_idx_sparse_lil(arr):
    lil_mat = sp.sparse.lil_matrix(arr)
    return lil_mat.T.rows, lil_mat.rows
For inputs generated using:
def gen_input(n, density=0.1, dtype=np.float32):
    arr = np.zeros(shape=(n, n), dtype=dtype)
    indices = tuple(np.random.randint(0, n, (2, int(n * n * density))).tolist())
    arr[indices] = 1.0
    return arr
One would get (your test_matrix had approximately 0.06 non-zero density):
m = gen_input(4096, density=0.06)
%timeit row_col_idx_sep(m)
# 1 loop, best of 3: 767 ms per loop
%timeit row_col_idx_zip(m)
# 1 loop, best of 3: 660 ms per loop
%timeit row_col_idx_sparse_coo(m)
# 1 loop, best of 3: 205 ms per loop
%timeit row_col_idx_sparse_lil(m)
# 1 loop, best of 3: 498 ms per loop
%timeit row_col_idx_nonzero_nb(m)
# 10 loops, best of 3: 130 ms per loop
Indicating this to be close to twice as fast as the fastest scipy.sparse-based approach.
In [182]: arr = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
The data is present in the whole-array nonzero, just not broken up into per row/column arrays:
In [183]: np.nonzero(arr)
Out[183]: (array([1, 2, 2, 3, 3, 3]), array([0, 0, 1, 0, 1, 2]))
In [184]: np.argwhere(arr)
Out[184]:
array([[1, 0],
       [2, 0],
       [2, 1],
       [3, 0],
       [3, 1],
       [3, 2]])
It might be possible to break the array([1, 2, 2, 3, 3, 3]) into sublists, [1,2,3],[2,3],[3],[] based on the other array. But it may take some time to work out the logic for that, and there's no guarantee that it will be faster than your row/column iterations.
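For what it's worth, here is a rough sketch of how that splitting could look, using np.bincount and np.split (variable names are mine, and I have not benchmarked it):
rows, cols = np.nonzero(arr)
n, m = arr.shape
# nonzero() returns indices in row-major order, so the column indices
# can be split directly by per-row counts
descs = np.split(cols, np.bincount(rows, minlength=n).cumsum()[:-1])
# for the per-column lists, re-sort by column first, then split by per-column counts
order = np.argsort(cols, kind='stable')
preds = np.split(rows[order], np.bincount(cols, minlength=m).cumsum()[:-1])
# preds -> [array([1, 2, 3]), array([2, 3]), array([3]), array([])]
# descs -> [array([]), array([0]), array([0, 1]), array([0, 1, 2])]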
Logical operations can reduce the boolean array to column or row, giving the rows or columns where nonzero occurs, but again not ragged:
In [185]: arr!=0
Out[185]:
array([[False, False, False, False],
       [ True, False, False, False],
       [ True,  True, False, False],
       [ True,  True,  True, False]])
In [186]: (arr!=0).any(axis=0)
Out[186]: array([ True, True, True, False])
In [187]: np.nonzero((arr!=0).any(axis=0))
Out[187]: (array([0, 1, 2]),)
In [188]: np.nonzero((arr!=0).any(axis=1))
Out[188]: (array([1, 2, 3]),)
In [189]: arr
Out[189]:
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0]])
The scipy.sparse lil format does generate the data you want:
In [190]: sparse
Out[190]: <module 'scipy.sparse' from '/usr/local/lib/python3.6/dist-packages/scipy/sparse/__init__.py'>
In [191]: M = sparse.lil_matrix(arr)
In [192]: M
Out[192]:
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
with 6 stored elements in List of Lists format>
In [193]: M.rows
Out[193]: array([list([]), list([0]), list([0, 1]), list([0, 1, 2])], dtype=object)
In [194]: M.T
Out[194]:
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
with 6 stored elements in List of Lists format>
In [195]: M.T.rows
Out[195]: array([list([1, 2, 3]), list([2, 3]), list([3]), list([])], dtype=object)
But timing probably isn't any better than your row or column iteration.
So let's say I have an array that looks similar to this:
array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]])
I would like to return the location of the center of the largest sum of values within a certain n*n square. So in this case it would be (2,2) if n = 3. If I let n = 4 it would be the same result.
Does numpy have a method for finding this location?
Approach #1 : We can use SciPy's 2D convolution to get summations in sliding windows of shape (n,n), choose the index of the window with the biggest sum with argmax, and translate it to row, col indices with np.unravel_index, like so -
from scipy.signal import convolve2d as conv2

def largest_sum_pos_app1(a, n):
    idx = conv2(a, np.ones((n,n), dtype=int), 'same').argmax()
    return np.unravel_index(idx, a.shape)
Sample run -
In [558]: a
Out[558]:
array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]])
In [559]: largest_sum_pos_app1(a, n=3)
Out[559]: (2, 2)
Approach #1S (Super charged): We can boost it further by using a uniform filter, like so -
from scipy.ndimage.filters import uniform_filter as unif2D

def largest_sum_pos_app1_mod1(a, n):
    idx = unif2D(a.astype(float), size=n, mode='constant').argmax()
    return np.unravel_index(idx, a.shape)
Approach #2 : Another approach, based on scikit-image's sliding-window creating tool view_as_windows: we create sliding windows of shape (n,n), giving a 4D array whose last two axes of shape (n,n) correspond to the search window size. So we sum along those two axes, take the argmax index, and translate it to the actual row, col positions.
Hence, the implementation would be -
from skimage.util.shape import view_as_windows

def largest_sum_pos_app2(a, n):
    h = (n-1)//2  # half window size
    idx = view_as_windows(a, (n,n)).sum((-2,-1)).argmax()
    return tuple(np.array(np.unravel_index(idx, np.array(a.shape)-n+1))+h)
As also mentioned in the comments, a search square with an even n would be confusing given that it won't have its center at any element coordinate.
Runtime test
In [741]: np.random.seed(0)
In [742]: a = np.random.randint(0,1000,(1000,1000))
In [743]: largest_sum_pos_app1(a, n= 5)
Out[743]: (966, 403)
In [744]: largest_sum_pos_app1_mod1(a, n= 5)
Out[744]: (966, 403)
In [745]: largest_sum_pos_app2(a, n= 5)
Out[745]: (966, 403)
In [746]: %timeit largest_sum_pos_app1(a, n= 5)
...: %timeit largest_sum_pos_app1_mod1(a, n= 5)
...: %timeit largest_sum_pos_app2(a, n= 5)
...:
10 loops, best of 3: 57.6 ms per loop
100 loops, best of 3: 10.1 ms per loop
10 loops, best of 3: 47.7 ms per loop
I want to get the rank of each element, so I use argsort in numpy:
np.argsort(np.array((1,1,1,2,2,3,3,3,3)))
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
it gives the same elements different ranks. Can I get the same rank for equal elements, like:
array([0, 0, 0, 3, 3, 5, 5, 5, 5])
If you don't mind a dependency on scipy, you can use scipy.stats.rankdata, with method='min':
In [14]: a
Out[14]: array([1, 1, 1, 2, 2, 3, 3, 3, 3])
In [15]: from scipy.stats import rankdata
In [16]: rankdata(a, method='min')
Out[16]: array([1, 1, 1, 4, 4, 6, 6, 6, 6])
Note that rankdata starts the ranks at 1. To start at 0, subtract 1 from the result:
In [17]: rankdata(a, method='min') - 1
Out[17]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
If you don't want the scipy dependency, you can use numpy.unique to compute the ranking. Here's a function that computes the same result as rankdata(x, method='min') - 1:
import numpy as np

def rankmin(x):
    u, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    csum = np.zeros_like(counts)
    csum[1:] = counts[:-1].cumsum()
    return csum[inv]
For example,
In [137]: x = np.array([60, 10, 0, 30, 20, 40, 50])
In [138]: rankdata(x, method='min') - 1
Out[138]: array([6, 1, 0, 3, 2, 4, 5])
In [139]: rankmin(x)
Out[139]: array([6, 1, 0, 3, 2, 4, 5])
In [140]: a = np.array([1,1,1,2,2,3,3,3,3])
In [141]: rankdata(a, method='min') - 1
Out[141]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
In [142]: rankmin(a)
Out[142]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
By the way, a single call to argsort() does not give ranks. You can find an assortment of approaches to ranking in the question Rank items in an array using Python/NumPy, including how to do it using argsort().
Alternatively, pandas series has a rank method which does what you need with the min method:
import pandas as pd
pd.Series((1,1,1,2,2,3,3,3,3)).rank(method="min")
# 0    1
# 1    1
# 2    1
# 3    4
# 4    4
# 5    6
# 6    6
# 7    6
# 8    6
# dtype: float64
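To match the 0-based integer ranks asked for in the question, one option (just one way of doing it) is to subtract 1 and pull the values out as integers:
(pd.Series((1, 1, 1, 2, 2, 3, 3, 3, 3)).rank(method="min") - 1).astype(int).values
# array([0, 0, 0, 3, 3, 5, 5, 5, 5])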
With a focus on performance, here's an approach -
def rank_repeat_based(arr):
    idx = np.concatenate(([0], np.flatnonzero(np.diff(arr))+1, [arr.size]))
    return np.repeat(idx[:-1], np.diff(idx))
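A quick sanity check on the sorted sample from the question:
rank_repeat_based(np.array([1, 1, 1, 2, 2, 3, 3, 3, 3]))
# array([0, 0, 0, 3, 3, 5, 5, 5, 5])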
For a generic case with the elements in the input array not already sorted, we would need to use argsort() to keep track of the positions. So, we would have a modified version, like so -
def rank_repeat_based_generic(arr):
    sidx = np.argsort(arr, kind='mergesort')
    idx = np.concatenate(([0], np.flatnonzero(np.diff(arr[sidx]))+1, [arr.size]))
    return np.repeat(idx[:-1], np.diff(idx))[sidx.argsort()]
Runtime test
Testing out all the approaches listed thus far to solve the problem on a large dataset.
Sorted array case :
In [96]: arr = np.sort(np.random.randint(1,100,(10000)))
In [97]: %timeit rankdata(arr, method='min') - 1
1000 loops, best of 3: 635 µs per loop
In [98]: %timeit rankmin(arr)
1000 loops, best of 3: 495 µs per loop
In [99]: %timeit (pd.Series(arr).rank(method="min")-1).values
1000 loops, best of 3: 826 µs per loop
In [100]: %timeit rank_repeat_based(arr)
10000 loops, best of 3: 200 µs per loop
Unsorted case :
In [106]: arr = np.random.randint(1,100,(10000))
In [107]: %timeit rankdata(arr, method='min') - 1
1000 loops, best of 3: 963 µs per loop
In [108]: %timeit rankmin(arr)
1000 loops, best of 3: 869 µs per loop
In [109]: %timeit (pd.Series(arr).rank(method="min")-1).values
1000 loops, best of 3: 1.17 ms per loop
In [110]: %timeit rank_repeat_based_generic(arr)
1000 loops, best of 3: 1.76 ms per loop
I've written a function for the same purpose. It uses pure Python and NumPy only. Please have a look; I've put comments in as well.
def my_argsort(array):
    # this type conversion lets us work with python lists and pandas series
    array = np.array(array)
    # create a mapping for unique values:
    # it's a dictionary where keys are values from the array and
    # values are the desired indices
    unique_values = list(set(array))
    mapping = dict(zip(unique_values, np.argsort(unique_values)))
    # apply the mapping to our array
    # np.vectorize works similarly to map(), and can work with dictionaries
    array = np.vectorize(mapping.get)(array)
    return array
Hope that helps.
Complex solutions are unnecessary for this problem.
> ary = np.sort([1, 1, 1, 2, 2, 3, 3, 3, 3]) # or anything; must be sorted.
> a = np.diff(ary).cumsum(); a
array([0, 0, 1, 1, 2, 2, 2, 2])
> b = np.r_[0, a]; b # ties get first open rank
array([0, 0, 0, 1, 1, 2, 2, 2, 2])
> c = np.flatnonzero(ary[1:] != ary[:-1])
> np.r_[0, 1 + c][b] # ties get last open rank
array([0, 0, 0, 3, 3, 5, 5, 5, 5])
I want to find the minimum-sized 2-dimensional ndarray within an ndarray that contains all values meeting a condition.
For example:
Let's say I have the array
x = np.array([[1, 1,  5,  3, 11,  1],
              [1, 2, 15, 19, 21, 33],
              [1, 8, 17, 22, 21, 31],
              [3, 5,  6, 11, 23, 19]])
and call f(x, x % 2 == 0)
Then the return value of the program would be the array
[[2, 15, 19]
 [8, 17, 22]
 [5,  6, 11]]
Because it is the smallest rectangular array that includes all the even numbers (the condition).
I've found a way to get all the indices for which the condition is true by using np.argwhere and then slicing from the minimum to the maximum indices of the original array. I've also done it using a for loop, but I was wondering if there is a more efficient way to do it using numpy or scipy.
My current method:
def f(arr, cond_arr):
    indices = np.argwhere(cond_arr)
    min = np.amin(indices, axis=0)  # get first row, col meeting cond
    max = np.amax(indices, axis=0)  # get last row, col meeting cond
    return arr[min[0]:max[0] + 1, min[1]:max[1] + 1]
The function is pretty efficient already - but you can do better.
Instead of checking the condition for every row/column and then finding the minimum and maximum, we can collapse the condition into each axis (using reduction with the logical OR) and find the first/last indices:
def f2(arr, cond_arr):
    c0 = np.where(np.logical_or.reduce(cond_arr, axis=0))[0]  # columns with any match
    c1 = np.where(np.logical_or.reduce(cond_arr, axis=1))[0]  # rows with any match
    return arr[c1[0]:c1[-1] + 1, c0[0]:c0[-1] + 1]
How it works:
With the example data, cond_arr looks like this:
>>> (x%2==0).astype(int)
array([[0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 0, 0]])
These are the column conditions:
>>> np.logical_or.reduce(cond_arr, axis=0).astype(int)
array([0, 1, 1, 1, 0, 0])
And these are the row conditions:
>>> np.logical_or.reduce(cond_arr, axis=1).astype(int)
array([0, 1, 1, 1])
Now we only need to find the first/last nonzero element for each of the two arrays.
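For example, for the column condition above those indices come out as (a small illustration using the x from the question):
>>> np.where(np.logical_or.reduce(x % 2 == 0, axis=0))[0]
array([1, 2, 3])
so the first and last matching columns are 1 and 3, which gives the column slice 1:4.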
Is it really faster?
%timeit f(x, x%2 == 0) # 10000 loops, best of 3: 24.6 µs per loop
%timeit f2(x, x%2 == 0) # 100000 loops, best of 3: 12.6 µs per loop
Well, a little bit... but it really shines with larger arrays:
x = np.random.randn(1000, 1000)
c = np.zeros((1000, 1000), dtype=bool)
c[400:600, 400:600] = True
%timeit f(x,c) # 100 loops, best of 3: 5.28 ms per loop
%timeit f2(x,c) # 1000 loops, best of 3: 225 µs per loop
Finally, this version has slightly more overhead but is generic over the number of dimensions:
def f3(arr, cond_arr):
    s = []
    for a in range(arr.ndim):
        # reduce over every axis except a to get the extent of the condition along axis a
        other_axes = tuple(ax for ax in range(arr.ndim) if ax != a)
        c = np.where(np.logical_or.reduce(cond_arr, axis=other_axes))[0]
        s.append(slice(c[0], c[-1] + 1))
    return arr[tuple(s)]
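As a quick sanity check (using the x and condition from the question), the generic version should return the same block as f2:
f3(x, x % 2 == 0)
# array([[ 2, 15, 19],
#        [ 8, 17, 22],
#        [ 5,  6, 11]])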