What's an easy way to count reversal points of a dataset? - python

I have a dataset ('db') with some reversal points. I would like to implement a counter ('counter_vals') starting at 0 that increases at each reversal point. How can I compute these counter values correctly in a simple way?
import numpy as np
import matplotlib.pyplot as plt
db = np.array([12, 0, 6, 3, 0, -3, -6, -3, -6])
x_vals = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
plt.scatter(x_vals, db)
plt.plot(x_vals, db)
plt.show()
The desired output should look like:
counter_vals = np.array([0, 1, 2, 2, 2, 2, 3, 4, 4])

You might use discrete derivatives. A reversal point, i.e. a point after which the direction of the slope changes, shows up as a non-zero second derivative of the slope signs. Numerical differentiation drops the boundary points, but you can restore them by adding a leading zero and repeating the last element.
db = np.array([12, 0, 6, 3, 0, -3, -6, -3, -6])
deriv0 = np.diff(db) # first derivative
deriv1 = np.sign(np.diff(db)) # signs of first derivative
deriv2 = np.sign(np.diff(deriv1)) # signs of second derivative
deriv3 = np.abs(np.sign(np.diff(deriv1)) ) # absolute value of signs of second derivative
counter_vals = np.cumsum(deriv3)
print("deriv0 =", deriv0)
print("deriv1 =", deriv1)
print("deriv2 =", deriv2)
print("deriv3 =", deriv3)
print()
print("counter_vals =", counter_vals)
deriv0 = [-12 6 -3 -3 -3 -3 3 -3]
deriv1 = [-1 1 -1 -1 -1 -1 1 -1]
deriv2 = [ 1 -1 0 0 0 1 -1]
deriv3 = [1 1 0 0 0 1 1]
counter_vals = [1 2 2 2 2 3 4]
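The padding mentioned above (a leading zero and a repeated last element) can be sketched as follows. This is a minimal sketch, not the answerer's code: the function name `count_reversals` is made up for illustration, and it assumes at least three data points. Note that a flat segment (slope sign 0) would also be counted as a change of direction.

```python
import numpy as np

def count_reversals(values):
    """Running count of slope-direction changes, one value per input point.

    Hypothetical helper; assumes len(values) >= 3.
    """
    values = np.asarray(values)
    slope_sign = np.sign(np.diff(values))   # -1, 0 or +1 per segment
    changed = np.diff(slope_sign) != 0      # True where the direction flips
    counts = np.cumsum(changed)             # running total, length n - 2
    # Pad to length n: counter starts at 0, last point repeats the final count
    return np.concatenate(([0], counts, counts[-1:]))

db = np.array([12, 0, 6, 3, 0, -3, -6, -3, -6])
counter_vals = count_reversals(db)
print(counter_vals)  # [0 1 2 2 2 2 3 4 4]
```

For the sample data this reproduces the `counter_vals` requested in the question.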

Related

Create dataset from another basing on first occurrence of some number

I have some dataset which looks like [3,4,5,-5,4,5,6,3,2-6,6]
I want to create a dataset that will always have 0 for indexes which match first sequence of positive numbers from dataset 1, and 1 for indexes which remain.
So for a = [3,4,5,-5,4,5,6,3,2-6,6] it should be
b = [0,0,0, 1,1,1,1,1,1,1]
How can I produce b from a using pandas and Python?
Since you tagged pandas, here is a solution using a Series:
import pandas as pd
s = pd.Series([3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6])
# find the first index whose value is not greater than zero
idx = (s > 0).idxmin()
# using the index set all the values before as 0, otherwise 1
res = pd.Series(s.index >= idx, dtype=int)
print(res)
Output
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 1
dtype: int64
If you prefer a one-liner:
res = pd.Series(s.index >= (s > 0).idxmin(), dtype=int)
You can use a cummax on the boolean series:
s = pd.Series([3, 4, 5, -5, 4, 5, 6, 3, 2 - 6, 6])
out = s.lt(0).cummax().astype(int)
Output:
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 1
dtype: int64
If you are really working with lists, then pandas is not needed and numpy should be more efficient:
import numpy as np
a = [3,4,5,-5,4,5,6,3,2-6,6]
b = np.maximum.accumulate(np.array(a)<0).astype(int).tolist()
Output: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
And if the list is small, pure python should be preferred:
from itertools import accumulate
b = list(accumulate((int(x<0) for x in a), max))
Output: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
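The two pandas answers above behave differently when the data contains no negative number at all. The sketch below illustrates this with a made-up series `all_positive`. The question does not say what the result should be in that case, so treating "all zeros" as the desired answer is an assumption.

```python
import pandas as pd

# Hypothetical input without any negative value
all_positive = pd.Series([3, 4, 5])

# idxmin on an all-True mask returns the first label (0),
# so every position ends up flagged with 1
via_idxmin = pd.Series(
    all_positive.index >= (all_positive > 0).idxmin(), dtype=int
)

# cummax never sees a True value, so every position stays 0
via_cummax = all_positive.lt(0).cummax().astype(int)

print(via_idxmin.tolist())  # [1, 1, 1]
print(via_cummax.tolist())  # [0, 0, 0]
```

If inputs without a negative value are possible, the `cummax` variant is probably the safer choice.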

Find Distance to Nearest Zero in NumPy Array

Let's say I have a NumPy array:
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
At each index, I want to find the distance to nearest zero value. If the position is a zero itself then return zero as a distance. Afterward, we are only interested in distances to the nearest zero that is to the right of the current position. The super naive approach would be something like:
out = np.full(x.shape[0], x.shape[0]-1)
for i in range(x.shape[0]):
    j = 0
    while i + j < x.shape[0]:
        if x[i+j] == 0:
            break
        j += 1
    out[i] = j
And the output would be:
array([0, 2, 1, 0, 4, 3, 2, 1, 0, 0])
I'm noticing a countdown/decrement pattern in the output between the zeros. So, I might be able to use the locations of the zeros (i.e., zero_indices = np.argwhere(x == 0).flatten()).
What is the fastest way to get the desired output in linear time?
Approach #1 : Searchsorted to the rescue for linear-time in a vectorized manner (before numba guys come in)!
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
idx_nz = np.flatnonzero(~mask_z)
# Cover for the case when there's no 0 left to the right
# (for same results as with posted loop-based solution)
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]
out = np.zeros(len(x), dtype=int)
idx = np.searchsorted(idx_z, idx_nz)
out[~mask_z] = idx_z[idx] - idx_nz
Approach #2 : Another with some cumsum -
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
# Cover for the case when there's no 0 left to the right
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]
out = idx_z[np.r_[False,mask_z[:-1]].cumsum()] - np.arange(len(x))
Alternatively, last step of cumsum could be replaced by repeat functionality -
r = np.r_[idx_z[0]+1,np.diff(idx_z)]
out = np.repeat(idx_z,r)[:len(x)] - np.arange(len(x))
Approach #3 : Another with mostly just cumsum -
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
pp = np.full(len(x), -1)
pp[idx_z[:-1]] = np.diff(idx_z) - 1
if idx_z[0]==0:
    pp[0] = idx_z[1]
else:
    pp[0] = idx_z[0]
out = pp.cumsum()
# Handle boundary case and assigns 0s at original 0s places
out[idx_z[-1]:] = np.arange(len(x)-idx_z[-1],0,-1)
out[mask_z] = 0
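To check a vectorized version against the loop from the question, Approach #1 can be wrapped in a function and compared on a few inputs. This is only a sketch: the function names are invented here, and it assumes a non-empty one-dimensional input.

```python
import numpy as np

def distance_to_next_zero(x):
    """Approach #1 (searchsorted) wrapped in a function; assumes len(x) > 0."""
    x = np.asarray(x)
    mask_z = x == 0
    idx_z = np.flatnonzero(mask_z)
    idx_nz = np.flatnonzero(~mask_z)
    # Sentinel "zero" just past the end when the array does not end in 0
    if x[-1] != 0:
        idx_z = np.r_[idx_z, len(x)]
    out = np.zeros(len(x), dtype=int)
    out[~mask_z] = idx_z[np.searchsorted(idx_z, idx_nz)] - idx_nz
    return out

def distance_to_next_zero_loop(x):
    """Reference implementation equivalent to the loop in the question."""
    n = len(x)
    out = []
    for i in range(n):
        j = 0
        while i + j < n and x[i + j] != 0:
            j += 1
        out.append(j)
    return out

for sample in ([0, 1, 2, 0, 4, 5, 6, 7, 0, 0], [0, 1, 2], [1, 2, 3]):
    assert distance_to_next_zero(sample).tolist() == distance_to_next_zero_loop(sample)
```

With no zero to the right, both versions count the distance to one position past the end of the array, which is the behavior of the loop posted in the question.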
You could work from the other side. Keep a counter on how many non zero digits have passed and assign it to the element in the array. If you see 0, reset the counter to 0
Edit: if there is no zero on the right, then you need another check
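import numpy as np
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
out = x.copy()  # copy so that x itself is not overwritten
count = 0
hasZero = False
# walk from right to left, counting the steps since the last zero
for i in range(x.shape[0]-1,-1,-1):
    if out[i] != 0:
        if not hasZero:
            # no zero to the right of this position
            out[i] = x.shape[0]-1
        else:
            count += 1
            out[i] = count
    else:
        hasZero = True
        count = 0
print(out)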
You can use the difference between the indices of each position and the cumulative max of zero positions to determine the distance to the preceding zero. This can be done forward and backward. The minimum between forward and backward distance to the preceding (or next) zero will be the nearest:
import numpy as np
indices = np.arange(x.size)
zeroes = x==0
forward = indices - np.maximum.accumulate(indices*zeroes) # forward distance
forward[np.cumsum(zeroes)==0] = x.size-1 # handle absence of zero from edge
forward = forward * (x!=0) # set zero positions to zero
zeroes = zeroes[::-1]
backward = indices - np.maximum.accumulate(indices*zeroes) # backward distance
backward[np.cumsum(zeroes)==0] = x.size-1 # handle absence of zero from edge
backward = backward[::-1] * (x!=0) # set zero positions to zero
distZero = np.minimum(forward,backward) # closest distance (minimum)
results:
distZero
# [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
forward
# [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
backward
# [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
Special case where no zeroes are present on outer edges:
x = np.array([3, 1, 2, 0, 4, 5, 6, 0,8,8])
forward: [9 9 9 0 1 2 3 0 1 2]
backward: [3 2 1 0 3 2 1 0 9 9]
distZero: [3 2 1 0 1 2 1 0 1 2]
also works with no zeroes at all
[EDIT] non-numpy solutions ...
if you're looking for an O(N) solution that doesn't require numpy, you can apply this strategy using the accumulate function from itertools:
x = [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
from itertools import accumulate
maxDist = len(x) - 1
zeroes = [maxDist*(v!=0) for v in x]
forward = [*accumulate(zeroes,lambda d,v:min(maxDist,(d+1)*(v!=0)))]
backward = accumulate(zeroes[::-1],lambda d,v:min(maxDist,(d+1)*(v!=0)))
backward = [*backward][::-1]
distZero = [min(f,b) for f,b in zip(forward,backward)]
print("x",x)
print("f",forward)
print("b",backward)
print("d",distZero)
output:
x [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
f [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
b [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
d [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
If you don't want to use any library, you can accumulate the distances manually in a loop:
x = [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
forward,backward = [],[]
fDist = bDist = maxDist = len(x)-1
for f,b in zip(x,reversed(x)):
    fDist = min(maxDist,(fDist+1)*(f!=0))
    forward.append(fDist)
    bDist = min(maxDist,(bDist+1)*(b!=0))
    backward.append(bDist)
backward = backward[::-1]
distZero = [min(f,b) for f,b in zip(forward,backward)]
print("x",x)
print("f",forward)
print("b",backward)
print("d",distZero)
output:
x [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
f [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
b [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
d [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
My first intuition would be to use slicing. If x can be a normal list instead of a numpy array, then you could use
out = [x[i:].index(0) for i,_ in enumerate(x)]
if numpy is necessary then you can use
out = [np.where(x[i:]==0)[0][0] for i,_ in enumerate(x)]
but this is less efficient because you are finding all zero locations to the right of the value and then pulling out just the first. There is almost certainly a better way to do this in numpy.
Edit: I am sorry, I misunderstood. This will give you the distance to the nearest zero, whether it is to the left or to the right. But you can use d_right as an intermediate result. This does not cover the edge case of not having any zero to the right, though.
import numpy as np
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
# Get the distance to the closest zero from the left:
zeros = x == 0
zero_locations = np.argwhere(x == 0).flatten()
zero_distances = np.diff(np.insert(zero_locations, 0, 0))
temp = x.copy()
temp[~zeros] = 1
temp[zeros] = -(zero_distances-1)
d_left = np.cumsum(temp) - 1
# Get the distance to the closest zero from the right:
zeros = x[::-1] == 0
zero_locations = np.argwhere(x[::-1] == 0).flatten()
zero_distances = np.diff(np.insert(zero_locations, 0, 0))
temp = x.copy()
temp[~zeros] = 1
temp[zeros] = -(zero_distances-1)
d_right = np.cumsum(temp) - 1
d_right = d_right[::-1]
# Get the smallest distance from both sides:
smallest_distances = np.min(np.stack([d_left, d_right]), axis=0)
# np.array([0, 1, 1, 0, 1, 2, 2, 1, 0, 0])

compute density map D

You are given two integer numbers n and r, such that 1 <= r < n,
a two-dimensional array W of size n x n.
Each element of this array is either 0 or 1.
Your goal is to compute density map D for array W, using radius of r.
The output density map is also two-dimensional array,
where each value represents the number of 1's in matrix W within the specified radius.
Given the following input array W of size 5 and radius 1 (n = 5, r = 1)
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
Output (using Python):
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
Logic: Input first row, first column value is 1. r value is 1. So we should check 1 right element, 1 left element, 1 top element, top left, top right, bottom , bottom left and bottom right and sum all elements.
Should not use any 3rd party library.
I did it using a for loop with an inner for loop, checking each element. Is there a better approach?
Optimization: for each 1 in W, increment the count of every location whose neighborhood contains it.
For W of size n x n, the following algorithm still takes O(n^2) steps. However, if W is sparse, i.e. the number of 1s (say k) is much smaller than n x n, it takes about n x n + (2r+1) x (2r+1) x k steps instead of the (2r+1) x (2r+1) x n x n steps of the approach stated in the question, which is much lower when k << n x n.
Given r assigned and W stored as
[[1, 0, 0, 0, 1],
[1, 1, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 0, 0]]
then following
# output has the same shape as W
output = [[0 for _ in range(len(W[0]))] for _ in range(len(W))]
for i in range(len(W)):
    for j in range(len(W[0])):
        if W[i][j] == 1:
            # add this 1 to every cell whose neighborhood contains (i, j)
            for off_i in range(-r,r+1):
                for off_j in range(-r,r+1):
                    if (0 <= i+off_i < len(W)) and (0 <= j+off_j < len(W[0])):
                        output[i+off_i][j+off_j] += 1
stores required values in output
for r = 1, output is as required
[[3, 4, 2, 2, 1],
[4, 5, 2, 2, 1],
[3, 4, 3, 3, 2],
[2, 2, 2, 2, 2],
[1, 1, 2, 2, 2]]
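When W is not sparse, a different technique from the one in the answer above may help: a two-dimensional prefix sum (summed-area table), which makes every window lookup constant time regardless of r. The sketch below is one possible pure-Python version. The function name `density_map` is made up, and it assumes W is a non-empty rectangular list of lists.

```python
def density_map(W, r):
    """Count the 1s within radius r of each cell using 2D prefix sums."""
    n_rows, n_cols = len(W), len(W[0])

    # prefix[i][j] holds the sum of W[0..i-1][0..j-1]
    prefix = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            prefix[i + 1][j + 1] = (
                W[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )

    D = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # window rows [top, bottom), clipped to the array
        top, bottom = max(0, i - r), min(n_rows, i + r + 1)
        for j in range(n_cols):
            # window columns [left, right), clipped to the array
            left, right = max(0, j - r), min(n_cols, j + r + 1)
            D[i][j] = (
                prefix[bottom][right] - prefix[top][right]
                - prefix[bottom][left] + prefix[top][left]
            )
    return D

W = [[1, 0, 0, 0, 1],
     [1, 1, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 1, 0, 0, 0]]
for row in density_map(W, 1):
    print(row)
```

For the sample W and r = 1 this prints the same map as shown above. The cost is proportional to n x n for both passes, independent of r and of the number of 1s.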

Range() including its bounds for positive and negative steps

I want to do a simple program that prints all the numbers in a range A to B, including B.
For ranges having bounds in increasing order, I know that I have to add 1 to the upper bound, like:
range(A, B+1)
But adding 1 to B won't work when the bounds are in decreasing order, like range(17, 15, -1).
How can I code it to work for both increasing and decreasing ranges?
I see why you are facing this issue. It's because the larger value is the first argument and the smaller value is the second argument of range, so the step has to be negative and the end bound has to be shifted in the direction of the step.
For such cases the following code will work:
a = 5
b = -5
step = 1
if b < a:
    step = -1
range(a, b + step, step)
I think I don't understand the question properly. There are 3 cases:
A, B both positive
A negative, B positive
A, B both negative
Now if I do this (in Python 2, to avoid having to do list(range(...)): this makes the explanation cleaner):
>>> A = 10; B = 20 # case 1
>>> range(A,B+1)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> A = -10; B = 2 # case 2
>>> range(A,B+1)
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2]
>>> A = -10; B = -2 # case 3
>>> range(A,B+1)
[-10, -9, -8, -7, -6, -5, -4, -3, -2]
So your remark that the last number in the range won't be included doesn't seem to fit with what I can see.
If you are receiving input data where A > B, then the problem is not the negative number, but the fact that range() expects the values to be in ascending order.
To cover that:
>>> A = 2; B = -2 # case 4
>>> A,B = sorted((A,B))
>>> range(A,B+1)
[-2, -1, 0, 1, 2]
This also works for cases 1, 2, and 3.
If I have misunderstood the question please edit it to clarify.
Please check if this works. Thank you.
if A <= B:
    # increasing: push the end bound one step up so that B is included
    print(list(range(A, B + 1)))
else:
    # decreasing: push the end bound one step down so that B is included
    print(list(range(A, B - 1, -1)))
You could create a function that turns a normal range into one that includes both bounds, like:
def inclusive(r):
    return range(r.start, r.stop + r.step, r.step)
You should pass it a range, and it will return a range:
increasing_range = range(2, 5)
print(list(inclusive(increasing_range)))
# [2, 3, 4, 5]
decreasing_range = range(5, -5, -1)
print(list(inclusive(decreasing_range)))
# [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5]
even = range(2, 10, 2)
print(list(inclusive(even)))
# [2, 4, 6, 8, 10]
for odd in inclusive(range(1, 5)):
    print(odd)
# 1 2 3 4 5
The last number is never included in a Python range. You need to adjust the code accordingly.
e.g.
To print values from -5 to -1 (included) use,
>>> print(list(range(-5, 0)))
[-5, -4, -3, -2, -1]
In reverse order
>>> print(list(range(-1, -6, -1)))
[-1, -2, -3, -4, -5]
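For the case in the question (step of 1 in either direction), the direction can be derived from the bounds themselves. The following is a minimal sketch. The helper name `inclusive_range` is invented for illustration and is not part of Python's standard library.

```python
def inclusive_range(a, b):
    """Range from a to b including both bounds, counting up or down by 1."""
    step = 1 if a <= b else -1
    # Shift the end bound one step further so that b itself is produced
    return range(a, b + step, step)

print(list(inclusive_range(15, 17)))  # [15, 16, 17]
print(list(inclusive_range(17, 15)))  # [17, 16, 15]
print(list(inclusive_range(-2, -2)))  # [-2]
```

Because it returns a plain range object, it can be used directly in a for loop without building a list first.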

Setting increasing values in a numpy array around a defined diagonal

What is the best way to create a 2D list (or numpy array) in python, in which the diagonal is set to -1 and the remaining values are increasing from 0 by 1, for different values of n. For example, if n = 3 the array would look like:
[[-1,0,1]
[2,-1,3]
[4,5,-1]]
or for n = 4:
[[-1,0,1,2]
[3,-1,4,5]
[6,7,-1,8]
[9,10,11,-1]]
etc.
I know I can create an array with zeros and with the diagonal set to -1 with:
a = numpy.zeros((n,n))
numpy.fill_diagonal(a,-1)
And so if n = 3 this would give:
[[-1,0,0]
[0,-1,0]
[0,0,-1]]
But how would I then set the 0's to be increasing numbers, as shown in the example above? Would I need to iterate through and set the values through a loop? Or is there a better way to approach this?
Thanks in advance.
One approach -
def set_matrix(n):
    out = np.full((n,n),-1)
    off_diag_mask = ~np.eye(n,dtype=bool)
    out[off_diag_mask] = np.arange(n*n-n)
    return out
Sample runs -
In [23]: set_matrix(3)
Out[23]:
array([[-1, 0, 1],
[ 2, -1, 3],
[ 4, 5, -1]])
In [24]: set_matrix(4)
Out[24]:
array([[-1, 0, 1, 2],
[ 3, -1, 4, 5],
[ 6, 7, -1, 8],
[ 9, 10, 11, -1]])
Here is an arithmetic way:
m=np.arange(n*n).reshape(n,n)*n//(n+1)
m.flat[::n+1]=-1
for n=5 :
[[-1 0 1 2 3]
[ 4 -1 5 6 7]
[ 8 9 -1 10 11]
[12 13 14 -1 15]
[16 17 18 19 -1]]
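The arithmetic version works because, in flattened order, one diagonal element occurs every n+1 positions, so an off-diagonal element at flat index k has ceil(k/(n+1)) diagonal elements before it and its value is k - ceil(k/(n+1)), which equals k*n//(n+1). As a sanity check, both approaches can be wrapped in functions and compared. The function names below are made up for this sketch.

```python
import numpy as np

def diagonal_by_mask(n):
    """Fill the off-diagonal cells with 0, 1, 2, ... via a boolean mask."""
    out = np.full((n, n), -1)
    out[~np.eye(n, dtype=bool)] = np.arange(n * n - n)
    return out

def diagonal_by_arithmetic(n):
    """Same result from the flat index k, using k*n // (n+1)."""
    m = np.arange(n * n).reshape(n, n) * n // (n + 1)
    m.flat[::n + 1] = -1  # overwrite the diagonal
    return m

for n in range(2, 8):
    assert np.array_equal(diagonal_by_mask(n), diagonal_by_arithmetic(n))

print(diagonal_by_arithmetic(3))
```

For n = 3 this prints the matrix from the question, with rows [-1, 0, 1], [2, -1, 3] and [4, 5, -1].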
