Conditional selection in array - python

I have the following list of arrays:
[array([10, 1, 7, 3]),
array([ 0, 14, 12, 13]),
array([ 3, 10, 7, 8]),
array([7, 5]),
array([ 5, 12, 3]),
array([14, 8, 10])]
What I want is to mark each row as "1" or "0", conditional on whether the row contains 10 AND 7, 10 AND 3, or 10 AND 8.
np.where((output == 10 & output == 7) | (output == 10 & output == 3) | (output == 10 & output == 8), 1, 0)
returns
array(0)
What's the correct syntax to get at the arrays inside the list?
Expected output:
[ 1, 0, 1, 0, 0, 1 ]
Note:
What is output? After training a CountVectorizer/LDA topic classifier in scikit-learn, the following script assigns topic probabilities to new documents. Topics above the threshold of 0.2 are then stored in an array.
def sortthreshold(x, thresh):
    idx = np.arange(x.size)[x > thresh]
    return idx[np.argsort(x[idx])]

output = []
for x in newdoc:
    y = lda.transform(bowvectorizer.transform([x]))
    output.append(sortthreshold(y[0], 0.2))
Thanks!

Your input data is a plain Python list of NumPy arrays of unequal length, so it can't simply be converted to a 2D NumPy array and therefore can't be processed directly by NumPy. But it can be processed with the usual Python list processing tools.
Here's a list comprehension that uses numpy.isin to test whether a row contains any of (3, 7, 8). We first use simple == testing to see if the row contains 10, and only call isin if it does; the Python and operator will not evaluate its second operand if the first operand is false-ish.
We use np.any to see if any row item passes each test. np.any returns a Boolean value of False or True, but we can pass those values to int to convert them to 0 or 1.
import numpy as np
data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]

mask = np.array([3, 7, 8])
result = [int(np.any(row == 10) and np.any(np.isin(row, mask)))
          for row in data]
print(result)
output
[1, 0, 1, 0, 0, 1]
I've just performed some timeit tests. Curiously, Reblochon Masque's code is faster on the data given in the question, presumably because of the short-circuiting behaviour of plain Python any and the and / or operators. Also, it appears that numpy.in1d is faster than numpy.isin, even though the docs recommend using the latter in new code.
Here's a new version that's about 10% slower than Reblochon's.
mask = np.array([3, 7, 8])
result = [int(any(row == 10) and any(np.in1d(row, mask)))
          for row in data]
Of course, the true speed on large amounts of real data may vary from what my tests indicate. And time may not be an issue: even on my slow old 32 bit single core 2GHz machine I can process the data in the question almost 3000 times in one second.
hpaulj has suggested an even faster way. Here's some timeit test info, comparing the various versions. These tests were performed on my old machine, YMMV.
import numpy as np
from timeit import Timer
the_data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]

def rebloch0(data):
    result = []
    for output in data:
        result.append(1 if np.where((any(output == 10) and any(output == 7)) or
                                    (any(output == 10) and any(output == 3)) or
                                    (any(output == 10) and any(output == 8)), 1, 0) == True else 0)
    return result

def rebloch1(data):
    result = []
    for output in data:
        result.append(1 if np.where((any(output == 10) and any(output == 7)) or
                                    (any(output == 10) and any(output == 3)) or
                                    (any(output == 10) and any(output == 8)), 1, 0) else 0)
    return result

def pm2r0(data):
    mask = np.array([3, 7, 8])
    return [int(np.any(row == 10) and np.any(np.isin(row, mask)))
            for row in data]

def pm2r1(data):
    mask = np.array([3, 7, 8])
    return [int(any(row == 10) and any(np.in1d(row, mask)))
            for row in data]

def hpaulj0(data):
    mask = np.array([3, 7, 8])
    return [int(any(row == 10) and any((row[:, None] == mask).flat))
            for row in data]

def hpaulj1(data, mask=np.array([3, 7, 8])):
    return [int(any(row == 10) and any((row[:, None] == mask).flat))
            for row in data]

functions = (
    rebloch0,
    rebloch1,
    pm2r0,
    pm2r1,
    hpaulj0,
    hpaulj1,
)

# Verify that all functions give the same result
for func in functions:
    print('{:8}: {}'.format(func.__name__, func(the_data)))
print()

def time_test(loops, data):
    timings = []
    for func in functions:
        t = Timer(lambda: func(data))
        result = sorted(t.repeat(3, loops))
        timings.append((result, func.__name__))
    timings.sort()
    for result, name in timings:
        print('{:8}: {:.6f}, {:.6f}, {:.6f}'.format(name, *result))
    print()

time_test(1000, the_data)
typical output
rebloch0: [1, 0, 1, 0, 0, 1]
rebloch1: [1, 0, 1, 0, 0, 1]
pm2r0 : [1, 0, 1, 0, 0, 1]
pm2r1 : [1, 0, 1, 0, 0, 1]
hpaulj0 : [1, 0, 1, 0, 0, 1]
hpaulj1 : [1, 0, 1, 0, 0, 1]
hpaulj1 : 0.140421, 0.154910, 0.156105
hpaulj0 : 0.154224, 0.154822, 0.167101
rebloch1: 0.281700, 0.282764, 0.284599
rebloch0: 0.339693, 0.359127, 0.375715
pm2r1 : 0.367677, 0.368826, 0.371599
pm2r0 : 0.626043, 0.628232, 0.670199
Nice work, hpaulj!

You need to use any (or np.any) combined with np.where, and avoid using | and &, which are bitwise operators in Python and don't short-circuit.
import numpy as np
a = [np.array([10, 1, 7, 3]),
     np.array([0, 14, 12, 13]),
     np.array([3, 10, 7, 8]),
     np.array([7, 5]),
     np.array([5, 12, 3]),
     np.array([14, 8, 10])]

for output in a:
    print(np.where((any(output == 10) and any(output == 7)) or
                   (any(output == 10) and any(output == 3)) or
                   (any(output == 10) and any(output == 8)), 1, 0))
output:
1
0
1
0
0
1
If you want it as a list as the edited question shows:
result = []
for output in a:
    result.append(1 if np.where((any(output == 10) and any(output == 7)) or
                                (any(output == 10) and any(output == 3)) or
                                (any(output == 10) and any(output == 8)), 1, 0) == True else 0)
result
result:
[1, 0, 1, 0, 0, 1]
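Since the and/or expression already evaluates to a plain Python bool, the np.where call and the == True comparison are not strictly needed; a minimal equivalent sketch:
result = [int(any(output == 10) and
              (any(output == 7) or any(output == 3) or any(output == 8)))
          for output in a]
# [1, 0, 1, 0, 0, 1]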

Related

Numpy: How to check if a number is the minimum/maximum among the previous K numbers?

I'm trying to automate a trading strategy which should enter/exit a long position when the current price is the minimum/maximum among the previous k prices.
The result should contain 1 if the current number is maximum among previous k numbers, -1 if it is the minimum and 0 if none of the conditions are true.
For example, if k = 3 and the numpy array = [1, 2, 3, 2, 1, 6], the result should be an array like:
[0, 0, 1, 0, -1, 1].
I tried numpy's max function but don't know how to take into account the previous k numbers instead of a fixed index, and how to fall back to the default of 0 for the first k - 1 numbers, since there aren't k numbers available to compare them with.
I will use Pandas
import pandas as pd
array = [1, 2, 3, 2, 1, 6]
df = pd.DataFrame(array)
df['rolling_max'] = df[0].rolling(3).max()
df['rolling_min'] = df[0].rolling(3).min()
df['result'] = df.apply(lambda row: 1 if row[0] == row['rolling_max'] else (-1 if row[0] == row['rolling_min'] else 0), axis=1)
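If you need the plain list/array from the question, the result column can be pulled out afterwards. A sketch (note that, like the sliding-window answer below, this marks index 3 as -1 rather than 0, because 2 is the minimum of its window):
result = df['result'].to_numpy()
# array([ 0,  0,  1, -1, -1,  1])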
Here is a solution with numpy using numpy.lib.stride_tricks.sliding_window_view, which was introduced in version 1.20.0.
Note that this solution (like the one proposed by Hanwei Tang) does not exactly yield the result you were looking for, because in the second window ([2, 3, 2]) 2 is the minimum value and thus a -1 is returned instead of the zero you requested. But maybe you should rethink whether you really want a zero for the second window or a -1.
EDIT: If a window only contains the same number, i.e. the minimum and maximum are equal, this method returns a zero.
import numpy as np
def rolling_max(a, wsize):
    windows = np.lib.stride_tricks.sliding_window_view(a, wsize)
    return np.max(windows, axis=-1)

def rolling_min(a, wsize):
    windows = np.lib.stride_tricks.sliding_window_view(a, wsize)
    return np.min(windows, axis=-1)

def check_prize(a, wsize):
    rmax = rolling_max(a, wsize)
    rmin = rolling_min(a, wsize)
    ismax = np.where(a[wsize-1:] == rmax, 1, 0)
    ismin = np.where(a[wsize-1:] == rmin, -1, 0)
    result = np.zeros_like(a)
    result[wsize-1:] = ismax + ismin
    return result
a = np.array([1, 2, 3, 2, 1, 6])
check_prize(a, wsize=3)
# Output:
# array([ 0, 0, 1, -1, -1, 1])
b = np.array([1, 2, 4, 3, 1, 6])
check_prize(b, wsize=3)
# Output:
# array([ 0, 0, 1, 0, -1, 1])
c = np.array([1, 2, 2, 2, 1, 6])
check_prize(c, wsize=3)
# Output:
# array([ 0, 0, 1, 0, -1, 1])
Another approach using sliding_window_view with pad:
from numpy.lib.stride_tricks import sliding_window_view as swv
k = 3
a = np.array([1, 2, 3, 2, 1, 6])
# create sliding window
v = swv(np.pad(a.astype(float), (k-1, 0), constant_values=np.nan), k)
# compare each element to min/max of sliding window
out = np.select([np.max(v, 1)==a, np.min(v, 1)==a], [1, -1], 0)
Output: array([ 0, 0, 1, -1, -1, 1])

How to efficiently shuffle some values of a numpy array while keeping their relative order?

I have a numpy array and a mask specifying which entries from that array to shuffle while keeping their relative order. Let's have an example:
In [2]: arr = np.array([5, 3, 9, 0, 4, 1])
In [4]: mask = np.array([True, False, False, False, True, True])
In [5]: arr[mask]
Out[5]: array([5, 4, 1]) # These entries shall be shuffled inside arr, while keeping their order.
In [6]: np.where(mask==True)
Out[6]: (array([0, 4, 5]),)
In [7]: shuffle_array(arr, mask) # I'm looking for an efficient realization of this function!
Out[7]: array([3, 5, 4, 9, 0, 1]) # See how the entries 5, 4 and 1 haven't changed their order.
I've written some code that can do this, but it's really slow.
import numpy as np
def shuffle_array(arr, mask):
    perm = np.arange(len(arr))  # permutation array
    n = mask.sum()
    if n > 0:
        old_true_pos = np.where(mask == True)[0]    # old positions for which mask is True
        old_false_pos = np.where(mask == False)[0]  # old positions for which mask is False
        new_true_pos = np.random.choice(perm, n, replace=False)  # draw new positions
        new_true_pos.sort()
        new_false_pos = np.setdiff1d(perm, new_true_pos)
        new_pos = np.hstack((new_true_pos, new_false_pos))
        old_pos = np.hstack((old_true_pos, old_false_pos))
        perm[new_pos] = perm[old_pos]
    return arr[perm]
To make things worse, I actually have two large matrices A and B with shape (M,N). Matrix A holds arbitrary values, while each row of matrix B is the mask which to use for shuffling one corresponding row of matrix A according to the procedure that I outlined above. So what I want is shuffled_matrix = row_wise_shuffle(A, B).
The only way I have so far found to do it is via my shuffle_array() function and a for loop.
Can you think of any numpy'onic way to accomplish this task avoiding loops? Thank you so much in advance!
For 1d case:
import numpy as np
a = np.arange(8)
b = np.array([1,1,1,1,0,0,0,0])
# Get ordered values
ordered_values = a[np.where(b==1)]
# We'll shuffle both arrays
shuffled_ix = np.random.permutation(a.shape[0])
a_shuffled = a[shuffled_ix]
b_shuffled = b[shuffled_ix]
# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = ordered_values
a_shuffled # Notice that 0, 1, 2, 3 preserves order.
>>>
array([0, 1, 2, 6, 3, 4, 7, 5])
for 2d case, columnwise shuffle (along axis=1):
import numpy as np
a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])
# The code below works for column shuffle (i.e. axis=1).
# Get ordered values
i,j = np.where(b==1)
values = a[i, j]
values
# We'll shuffle both arrays for axis=1
# taken from https://stackoverflow.com/questions/5040797/shuffling-numpy-array-along-a-given-axis
idx = np.random.rand(*a.shape).argsort(axis=1)
a_shuffled = np.take_along_axis(a,idx,axis=1)
b_shuffled = np.take_along_axis(b,idx,axis=1)
# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = values
# Get the result
a_shuffled # see that 4,5 | 6,7,8 | 12,13,14,15 | 20, 21 preserves order
>>>
array([[ 4, 1, 0, 3, 2, 5],
[ 9, 6, 7, 11, 8, 10],
[12, 13, 16, 17, 14, 15],
[23, 20, 19, 22, 21, 18]])
For the 2d case, rowwise shuffle (along axis=0), we can use the same code: first transpose the arrays, then shuffle, then transpose back:
import numpy as np
a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])
# The code below works for column shuffle (i.e. axis=1).
# As you said rowwise, we first transpose
at = a.T
bt = b.T
# Get ordered values
i,j = np.where(bt==1)
values = at[i, j]
values
# We'll shuffle both arrays for axis=1
# taken from https://stackoverflow.com/questions/5040797/shuffling-numpy-array-along-a-given-axis
idx = np.random.rand(*at.shape).argsort(axis=1)
at_shuffled = np.take_along_axis(at,idx,axis=1)
bt_shuffled = np.take_along_axis(bt,idx,axis=1)
# Replace the values with correct order
at_shuffled[np.where(bt_shuffled==1)] = values
# Get the result
a_shuffled = at_shuffled.T
a_shuffled # see that 6,12 | 7, 13 | 8,14,20 | 15, 21 preserves order
>>>
array([[ 6, 7, 2, 3, 10, 17],
[18, 19, 8, 15, 16, 23],
[12, 13, 14, 21, 4, 5],
[ 0, 1, 20, 9, 22, 11]])
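To package this as the row_wise_shuffle(A, B) helper the question asks for (each row of A shuffled according to the corresponding row of B), a minimal sketch wrapping the axis=1 version above:
import numpy as np

def row_wise_shuffle(a, b):
    # Shuffle each row of `a` independently; entries where b == 1 keep
    # their relative left-to-right order within their row.
    i, j = np.where(b == 1)
    values = a[i, j]                                # masked values in row-major order
    idx = np.random.rand(*a.shape).argsort(axis=1)  # independent permutation per row
    a_shuffled = np.take_along_axis(a, idx, axis=1)
    b_shuffled = np.take_along_axis(b, idx, axis=1)
    a_shuffled[np.where(b_shuffled == 1)] = values  # put the masked values back in order
    return a_shuffled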

numpy.where with more than one condition

I am here because I have a question about the function numpy.where.
I need to develop a program that rounds a student's grades to the Danish grading scale.
(The Danish grading scale is a 7-step scale from the best grade (12) to the worst (-3): 12 10 7 4 02 00 −3.)
Here is the array of grades:
grades=np.array([[-3,-2,-1,0],[1,2,3,4],[5,6,7,8],[9,10,11,12]])
and what I am trying to do is this :
gradesrounded=np.where(grades<-1.5, -3, grades)
gradesrounded=np.where(-1.5<=grades and grades<1, 0, grades)
gradesrounded=np.where(grades>=1 and grades<3, 2, grades)
gradesrounded=np.where(grades>=3 and grades<5.5, 4, grades)
gradesrounded=np.where(grades>=5.5 and grades<8.5, 7, grades)
gradesrounded=np.where(grades>=8.5 and grades<11, 10, grades)
gradesrounded=np.where(grades>=11, 12, grades)
print(gradesrounded)
and what I found out is that np.where works when there is one condition (grades below -1.5 or grades over 11, for example), but if there are 2 different conditions (for example np.where(grades>=1 and grades<3, 2, grades)) it won't work.
Do you know how I could fix this ?
Thank you very much.
Another way is np.searchsorted:
scales = np.array([-3,0,2,4,7,10,12])
grades=np.array([[-3,-2,-1,0],[1,2,3,4],[5,6,7,8],[9,10,11,12]])
thresh = [-1.5, 0.5, 2.5, 5.5, 8.5, 10]
out = scales[np.searchsorted(thresh, grades)]
# or
# thresh = [-3, -1.5, 1, 3, 5.5, 8.5, 11]
# out = scales[np.searchsorted(thresh, grades, side='right')-1]
Out:
array([[-3, -3, 0, 0],
[ 2, 2, 4, 4],
[ 4, 7, 7, 7],
[10, 10, 12, 12]])
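For a single grade, the mapping works like this sketch: np.searchsorted(thresh, 6) returns 4 (6 falls between 5.5 and 8.5), so scales[4] gives 7.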
You are using the logical operator and, which doesn't work elementwise on arrays. Use the bitwise operators instead, which operate element by element (and wrap each comparison in parentheses, since & binds more tightly than the comparisons):
np.where((grades>=1) & (grades<3), 2, grades)
Have a look at this: link
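Applied to the whole cascade from the question, the same fix might look like this sketch (each comparison parenthesised and combined with &; note the later calls use the already-rounded array as the fallback so earlier assignments aren't overwritten):
gradesrounded = np.where(grades < -1.5, -3, grades)
gradesrounded = np.where((grades >= -1.5) & (grades < 1), 0, gradesrounded)
gradesrounded = np.where((grades >= 1) & (grades < 3), 2, gradesrounded)
gradesrounded = np.where((grades >= 3) & (grades < 5.5), 4, gradesrounded)
gradesrounded = np.where((grades >= 5.5) & (grades < 8.5), 7, gradesrounded)
gradesrounded = np.where((grades >= 8.5) & (grades < 11), 10, gradesrounded)
gradesrounded = np.where(grades >= 11, 12, gradesrounded)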
This is an excellent case for the np.select() function. The docs can be found here.
The setup is simple:
Create a list of Danish system grades.
Create a list of condition mappings. The case below uses the & operator to link multiple conditions.
Setup:
import numpy as np
# Sample grades.
x = np.array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Define limits and lookup.
grades = [12, 10, 7, 4, 2, 0, -3]
scale = [(x >= 11),
         (x >= 8.5) & (x < 11),
         (x >= 5.5) & (x < 8.5),
         (x >= 3.0) & (x < 5.5),
         (x >= 1.0) & (x < 3.0),
         (x >= -1.5) & (x < 1.0),
         (x < -1.5)]
Use:
Call the np.select function and pass in the two lists created above.
# Map grades to Danish system.
np.select(condlist=scale, choicelist=grades)
Output:
array([-3, -3, 0, 0, 2, 2, 4, 4, 4, 7, 7, 7, 10, 10, 12, 12])
If you want to express the condition as a string, you could also use a pandas DataFrame and its query function. Here is an example:
df = df.query('grades>=1 & grades<3')
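Note that query only filters rows; to actually assign the rounded value from a string expression, one option is to combine DataFrame.eval with .loc (a sketch, column names are mine):
import numpy as np
import pandas as pd

df = pd.DataFrame({'grades': np.array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])})
df['rounded'] = df['grades']
df.loc[df.eval('grades >= 1 & grades < 3'), 'rounded'] = 2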

N-D indexing with defaults in NumPy

Can I index a NumPy N-D array with a fallback to default values for out-of-bounds indexes? Example code below for some imaginary np.get_with_default(a, indexes, default):
import numpy as np
print(np.get_with_default(
    np.array([[1,2,3],[4,5,6]]),  # N-D array
    [(np.array([0, 0, 1, 1, 2, 2]), np.array([1, 2, 2, 3, 3, 5]))],  # N-tuple of indexes along each axis
    13,  # Default for out-of-bounds fallback
))
should print
[2 3 6 13 13 13]
I'm looking for some built-in function for this. If such not exists then at least some short and efficient implementation to do that.
I arrived at this question because I was looking for exactly the same thing. I came up with the following function, which does what you ask for 2 dimensions. It could likely be generalised to N dimensions.
def get_with_defaults(a, xx, yy, nodata):
    # get values from a, clipping the index values to valid ranges
    res = a[np.clip(yy, 0, a.shape[0] - 1), np.clip(xx, 0, a.shape[1] - 1)]
    # compute a mask for both x and y, where all invalid index values are set to true
    myy = np.ma.masked_outside(yy, 0, a.shape[0] - 1).mask
    mxx = np.ma.masked_outside(xx, 0, a.shape[1] - 1).mask
    # replace all values in res with NODATA, where either the x or y index are invalid
    np.choose(myy + mxx, [res, nodata], out=res)
    return res
xx and yy are the index arrays; a is indexed by (y, x).
This gives:
>>> a=np.zeros((3,2),dtype=int)
>>> get_with_defaults(a, (-1, 1000, 0, 1, 2), (0, -1, 0, 1, 2), -1)
array([-1, -1, 0, 0, -1])
As an alternative, the following implementation achieves the same and is more concise:
def get_with_default(a, xx, yy, nodata):
    # get values from a, clipping the index values to valid ranges
    res = a[np.clip(yy, 0, a.shape[0] - 1), np.clip(xx, 0, a.shape[1] - 1)]
    # replace all values in res with NODATA (gets broadcast to the result array), where
    # either the x or y index are invalid
    res[(yy < 0) | (yy >= a.shape[0]) | (xx < 0) | (xx >= a.shape[1])] = nodata
    return res
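For reference, calling the concise version on the same test data gives the same result (a sketch; the index values are passed as NumPy arrays so the elementwise comparisons work):
a = np.zeros((3, 2), dtype=int)
xx = np.array([-1, 1000, 0, 1, 2])
yy = np.array([0, -1, 0, 1, 2])
print(get_with_default(a, xx, yy, -1))
# [-1 -1  0  0 -1]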
I don't know if there is anything in NumPy to do that directly, but you can always implement it yourself. This is not particularly smart or efficient, as it requires multiple advanced indexing operations, but does what you need:
import numpy as np
def get_with_default(a, indices, default=0):
    # Ensure inputs are arrays
    a = np.asarray(a)
    indices = tuple(np.broadcast_arrays(*indices))
    if len(indices) <= 0 or len(indices) > a.ndim:
        raise ValueError('invalid number of indices.')
    # Make mask of indices out of bounds
    mask = np.zeros(indices[0].shape, bool)
    for ind, s in zip(indices, a.shape):
        mask |= (ind < 0) | (ind >= s)
    # Only do masking if necessary
    n_mask = np.count_nonzero(mask)
    # Shortcut for the case where all is masked
    if n_mask == mask.size:
        return np.full_like(a, default)
    if n_mask > 0:
        # Ensure index arrays are contiguous so masking works right
        indices = tuple(map(np.ascontiguousarray, indices))
        for ind in indices:
            # Replace masked indices with zeros
            ind[mask] = 0
    # Get values
    res = a[indices]
    if n_mask > 0:
        # Replace values of masked indices with default value
        res[mask] = default
    return res
# Test
print(get_with_default(
    np.array([[1,2,3],[4,5,6]]),
    (np.array([0, 0, 1, 1, 2, 2]), np.array([1, 2, 2, 3, 3, 5])),
    13
))
# [ 2 3 6 13 13 13]
I also needed a solution to this, but I wanted a solution that worked in N dimensions. I made Markus' solution work for N-dimensions, including selecting from an array with more dimensions than the coordinates point to.
def get_with_defaults(arr, coords, nodata):
    coords, shp = np.array(coords), np.array(arr.shape)
    # Get values from arr, clipping to valid ranges
    res = arr[tuple(np.clip(c, 0, s-1) for c, s in zip(coords, shp))]
    # Set any output where one of the coords was out of range to nodata
    res[np.any(~((0 <= coords) & (coords < shp[:len(coords), None])), axis=0)] = nodata
    return res
import numpy as np
if __name__ == '__main__':
    A = np.array([[1,2,3],[4,5,6]])
    B = np.array([[[1, -9],[2, -8],[3, -7]],[[4, -6],[5, -5],[6, -4]]])
    coords1 = [[0, 0, 1, 1, 2, 2], [1, 2, 2, 3, 3, 5]]
    coords2 = [[0, 0, 1, 1, 2, 2], [1, 2, 2, 3, 3, 5], [1, 1, 1, 1, 1, 1]]
    out1 = get_with_defaults(A, coords1, 13)
    out2 = get_with_defaults(B, coords1, 13)
    out3 = get_with_defaults(B, coords2, 13)
    print(out1)
    # [2, 3, 6, 13, 13, 13]
    print(out2)
    # [[ 2 -8]
    #  [ 3 -7]
    #  [ 6 -4]
    #  [13 13]
    #  [13 13]
    #  [13 13]]
    print(out3)
    # [-8, -7, -4, 13, 13, 13]

Find Distance to Nearest Zero in NumPy Array

Let's say I have a NumPy array:
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
At each index, I want to find the distance to nearest zero value. If the position is a zero itself then return zero as a distance. Afterward, we are only interested in distances to the nearest zero that is to the right of the current position. The super naive approach would be something like:
out = np.full(x.shape[0], x.shape[0]-1)
for i in range(x.shape[0]):
    j = 0
    while i + j < x.shape[0]:
        if x[i+j] == 0:
            break
        j += 1
    out[i] = j
And the output would be:
array([0, 2, 1, 0, 4, 3, 2, 1, 0, 0])
I'm noticing a countdown/decrement pattern in the output in between the zeros. So, I might be able to use the locations of the zeros (i.e., zero_indices = np.argwhere(x == 0).flatten()).
What is the fastest way to get the desired output in linear time?
Approach #1 : Searchsorted to the rescue for linear-time in a vectorized manner (before numba guys come in)!
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
idx_nz = np.flatnonzero(~mask_z)
# Cover for the case when there's no 0 left to the right
# (for same results as with posted loop-based solution)
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]
out = np.zeros(len(x), dtype=int)
idx = np.searchsorted(idx_z, idx_nz)
out[~mask_z] = idx_z[idx] - idx_nz
Approach #2 : Another with some cumsum -
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
# Cover for the case when there's no 0 left to the right
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]
out = idx_z[np.r_[False,mask_z[:-1]].cumsum()] - np.arange(len(x))
Alternatively, last step of cumsum could be replaced by repeat functionality -
r = np.r_[idx_z[0]+1,np.diff(idx_z)]
out = np.repeat(idx_z,r)[:len(x)] - np.arange(len(x))
Approach #3 : Another with mostly just cumsum -
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
pp = np.full(len(x), -1)
pp[idx_z[:-1]] = np.diff(idx_z) - 1
if idx_z[0]==0:
    pp[0] = idx_z[1]
else:
    pp[0] = idx_z[0]
out = pp.cumsum()
# Handle boundary case and assigns 0s at original 0s places
out[idx_z[-1]:] = np.arange(len(x)-idx_z[-1],0,-1)
out[mask_z] = 0
You could work from the other side: keep a counter of how many non-zero entries have passed and assign it to the element in the array. If you see a 0, reset the counter to 0.
Edit: if there is no zero on the right, then you need another check
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
out = x
count = 0
hasZero = False
for i in range(x.shape[0]-1,-1,-1):
    if out[i] != 0:
        if not hasZero:
            out[i] = x.shape[0]-1
        else:
            count += 1
            out[i] = count
    else:
        hasZero = True
        count = 0
print(out)
You can use the difference between the indices of each position and the cumulative max of zero positions to determine the distance to the preceding zero. This can be done forward and backward. The minimum between forward and backward distance to the preceding (or next) zero will be the nearest:
import numpy as np
indices = np.arange(x.size)
zeroes = x==0
forward = indices - np.maximum.accumulate(indices*zeroes) # forward distance
forward[np.cumsum(zeroes)==0] = x.size-1 # handle absence of zero from edge
forward = forward * (x!=0) # set zero positions to zero
zeroes = zeroes[::-1]
backward = indices - np.maximum.accumulate(indices*zeroes) # backward distance
backward[np.cumsum(zeroes)==0] = x.size-1 # handle absence of zero from edge
backward = backward[::-1] * (x!=0) # set zero positions to zero
distZero = np.minimum(forward,backward) # closest distance (minimum)
results:
distZero
# [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
forward
# [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
backward
# [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
Special case where no zeroes are present on outer edges:
x = np.array([3, 1, 2, 0, 4, 5, 6, 0,8,8])
forward: [9 9 9 0 1 2 3 0 1 2]
backward: [3 2 1 0 3 2 1 0 9 9]
distZero: [3 2 1 0 1 2 1 0 1 2]
also works with no zeroes at all
[EDIT] non-numpy solutions ...
if you're looking for an O(N) solution that doesn't require numpy, you can apply this strategy using the accumulate function from itertools:
x = [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
from itertools import accumulate
maxDist = len(x) - 1
zeroes = [maxDist*(v!=0) for v in x]
forward = [*accumulate(zeroes,lambda d,v:min(maxDist,(d+1)*(v!=0)))]
backward = accumulate(zeroes[::-1],lambda d,v:min(maxDist,(d+1)*(v!=0)))
backward = [*backward][::-1]
distZero = [min(f,b) for f,b in zip(forward,backward)]
print("x",x)
print("f",forward)
print("b",backward)
print("d",distZero)
output:
x [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
f [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
b [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
d [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
If you don't want to use any library, you can accumulate the distances manually in a loop:
x = [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
forward,backward = [],[]
fDist = bDist = maxDist = len(x)-1
for f,b in zip(x,reversed(x)):
    fDist = min(maxDist,(fDist+1)*(f!=0))
    forward.append(fDist)
    bDist = min(maxDist,(bDist+1)*(b!=0))
    backward.append(bDist)
backward = backward[::-1]
distZero = [min(f,b) for f,b in zip(forward,backward)]
print("x",x)
print("f",forward)
print("b",backward)
print("d",distZero)
output:
x [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
f [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
b [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
d [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
My first intuition would be to use slicing. If x can be a normal list instead of a numpy array, then you could use
out = [x[i:].index(0) for i,_ in enumerate(x)]
if numpy is necessary then you can use
out = [np.where(x[i:]==0)[0][0] for i,_ in enumerate(x)]
but this is less efficient because you are finding all zero locations to the right of the value and then pulling out just the first. Almost definitely a better way to do this in numpy.
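A guarded variant of the numpy one-liner (a sketch) that falls back to the number of remaining elements when there is no zero to the right, mirroring the naive loop:
out = [int(np.argmax(x[i:] == 0)) if (x[i:] == 0).any() else x.shape[0] - i
       for i in range(x.shape[0])]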
Edit: I am sorry, I misunderstood. This will give you the distance to the nearest zero, whether it is to the left or to the right. But you can use d_right as an intermediate result. This does not cover the edge case of not having any zero to the right, though.
import numpy as np
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
# Get the distance to the closest zero from the left:
zeros = x == 0
zero_locations = np.argwhere(x == 0).flatten()
zero_distances = np.diff(np.insert(zero_locations, 0, 0))
temp = x.copy()
temp[~zeros] = 1
temp[zeros] = -(zero_distances-1)
d_left = np.cumsum(temp) - 1
# Get the distance to the closest zero from the right:
zeros = x[::-1] == 0
zero_locations = np.argwhere(x[::-1] == 0).flatten()
zero_distances = np.diff(np.insert(zero_locations, 0, 0))
temp = x.copy()
temp[~zeros] = 1
temp[zeros] = -(zero_distances-1)
d_right = np.cumsum(temp) - 1
d_right = d_right[::-1]
# Get the smallest distance from both sides:
smallest_distances = np.min(np.stack([d_left, d_right]), axis=0)
# np.array([0, 1, 1, 0, 1, 2, 2, 1, 0, 0])
