Fast implementation of finding duplicate coordinates - python

I am trying to write a program which finds duplicate coordinates (x, y, z) in a 3D array. The script should mark one or more duplicate points within a given tolerance - one point could have more than one duplicate. I found lots of different approaches, some of which use sorting.
To try the code I created the following test data set:
21.9799629872016 57.4044376777929 0
22.7807110172432 57.6921361034533 0
28.660840151287 61.5676757599822 0
28.6608401512 61.56767575998 0
30.6654296288019 56.2221038199424 0
20.3752036442253 49.1392209993897 0
32.8036584048178 43.927288357851 0
35.8105426210901 51.9456462679106 0
40.8888359641279 58.6944308422108 0
40.88883596412 70.6944308422108 0
41.0892949118794 58.1598736482068 0
39.6860822776189 64.775018924006 0
39.1515250836149 64.8418385732565 0
8.21402748063493 63.5054455882466 0
8.2140275006 63.5074455882 0
8.21404548063493 63.5064455882466 0
8.2143214806 63.5084455882 0
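For reference, the test data can be read into the nodes array used below with something like this (assuming the data is saved to a file, here hypothetically named points.txt):
import numpy as np

# read the whitespace-separated test points into an (N, 3) array
nodes = np.loadtxt("points.txt")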
The code I came up with is:
import numpy as np

# nodes is an (N, 3) array of point coordinates
# given tolerance
tol = 0.01
# initialize empty list for the found duplicates
duplicates = []
# loop over all nodes
for i in range(len(nodes)):
    # current node
    curr_node = nodes[i]
    # create difference vector
    diff = nodes - curr_node
    # get all duplicate indices (the node itself is found as well)
    condition = np.where((abs(diff[:,0])<tol) & (abs(diff[:,1])<tol) & (abs(diff[:,2])<tol))
    # check if more than one entry is present. If larger than 1, duplicate points exist
    if len(condition[0]) > 1:
        # loop over all found duplicate points
        for j in range(len(condition[0])):
            # add duplicate if not already marked as duplicate
            if j > 0 and condition[0][j] not in duplicates:
                duplicates.append(condition[0][j])
This code returns what I am expecting:
duplicates = [3, 14, 15, 16]
However, the code is very slow. For 300,000 points it takes about 10 minutes. I am wondering if there is any faster way to implement this.
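One fast alternative worth noting: scipy's cKDTree can report all close pairs directly. A minimal sketch, assuming scipy is installed and nodes is the (N, 3) array from above:
import numpy as np
from scipy.spatial import cKDTree

tol = 0.01
tree = cKDTree(nodes)
# all index pairs (i, j) with i < j whose points lie within tol of each
# other; p=np.inf uses the Chebyshev (per-axis) distance, matching the
# per-axis tolerance check in the loop above
pairs = tree.query_pairs(tol, p=np.inf)
duplicates = sorted({j for i, j in pairs})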

You can place points in a grid of tolerance-sized cubes. Then, for each point, you only need to check the points from the same cube + 26 adjacent ones instead of all other points. This keeps the search close to linear in the number of points, since each point is compared only against the few points in its own and neighboring cells.
from collections import defaultdict
from itertools import product

# a runnable version of the sketch; assumes points (a sequence of
# (x, y, z) tuples) and tolerance are already defined
def cube_of(p):
    # the tolerance-sized cube this point falls into
    return (int(p[0] / tolerance),
            int(p[1] / tolerance),
            int(p[2] / tolerance))

# compute the grid
grid = defaultdict(list)
for i, p in enumerate(points):
    grid[cube_of(p)].append(i)

# check each point against the same cube + 26 adjacent ones
duplicates = set()
for i, p in enumerate(points):
    cx, cy, cz = cube_of(p)
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
            if j > i and all(abs(p[k] - points[j][k]) < tolerance for k in range(3)):
                duplicates.add(j)

You could sort the nodes upfront to reduce the number of comparisons needed:
import timeit
import random

nodes = [
    [21.9799629872016, 57.4044376777929, 0],
    [22.7807110172432, 57.6921361034533, 0],
    [28.660840151287, 61.5676757599822, 0],
    [28.6608401512, 61.56767575998, 0],
    [30.6654296288019, 56.2221038199424, 0],
    [20.3752036442253, 49.1392209993897, 0],
    [32.8036584048178, 43.927288357851, 0],
    [35.8105426210901, 51.9456462679106, 0],
    [40.8888359641279, 58.6944308422108, 0],
    [40.88883596412, 70.6944308422108, 0],
    [41.0892949118794, 58.1598736482068, 0],
    [39.6860822776189, 64.775018924006, 0],
    [39.1515250836149, 64.8418385732565, 0],
    [8.21402748063493, 63.5054455882466, 0],
    [8.2140275006, 63.5074455882, 0],
    [8.21404548063493, 63.5064455882466, 0],
    [8.2143214806, 63.5084455882, 0],
]
duplicates = [3, 14, 15, 16]
assertList = [n for i, n in enumerate(nodes) if i in duplicates]

def new(nodes, tol=0.01):
    print(f"Searching duplicates in {len(nodes)} nodes")
    coordinateLen = range(len(nodes[0]))
    nodes.sort()
    last = nodes[0]
    duplicates = []
    for i, node in enumerate(nodes[1:]):
        if not all(0 <= node[idx] - last[idx] < tol for idx in coordinateLen):
            last = node
        else:
            duplicates.append(node)
    print(f"Found: {len(duplicates)} duplicates")
    return duplicates

# generate random numbers!
randomNodes = [
    [random.uniform(0, 100),
     random.uniform(0, 100),
     random.uniform(0, 1)] for _ in range(300000)
]
# make sure there are at least the same 4 duplicates!
randomNodes += nodes

for i, lst in enumerate((nodes, randomNodes)):
    for func in ("new", ):
        t1 = timeit.Timer(f"{func}({lst})", f"from __main__ import {func}")
        # verify values of found duplicates are [3, 14, 15, 16] !!
        if i == 0:
            print(all(x for x in new(nodes) if x in assertList))
        print(f"{func} took: {t1.timeit(number=10)} seconds")
        print("")
print("")
Out:
Searching duplicates in 17 nodes
Found: 4 duplicates
True
....
new took: 0.00034904800000001845 seconds
Searching duplicates in 300017 nodes
Found: 4 duplicates
...
new took: 14.316181525000001 seconds

Related

Python loop getting stuck?

The brute force search I'm trying to implement should iterate over the range of values, but it seems to be getting stuck.
I'd ideally like to keep using for loops.
I've tried reordering the for loops and using [Angle, Max_Camber, Max_Camber_Position, Thickness] directly instead of assigning it.
Iterations = -1
Current_Max = 0
Airfoil = [0, 1, 1, 10]
for Angle in range(Airfoil[0], 90, 15):
    for Max_Camber in range(Airfoil[1], 11, 2):
        for Max_Camber_Position in range(Airfoil[2], 11, 2):
            for Thickness in range(Airfoil[3], 100, 20):
                Iterations += 1
                print("Iterations = ", Iterations)
                # Commenting this out stops the error
                # The loop should have 749 iterations rather than 17
                Airfoil = [Angle, Max_Camber, Max_Camber_Position, Thickness]
                print("Airfoil = ", Airfoil)
When you assign Airfoil with
Airfoil = [Angle, Max_Camber, Max_Camber_Position, Thickness]
you actually change the starting positions of the loops: each inner range() is re-evaluated from the updated Airfoil every time its enclosing loop advances, so the iteration space shrinks and the loops end early.
In order to get 749 iterations, assign the list you print to some other name, like this:
Iterations = -1
Current_Max = 0
Airfoil = [0, 1, 1, 10]
for Angle in range(Airfoil[0], 90, 15):
    for Max_Camber in range(Airfoil[1], 11, 2):
        for Max_Camber_Position in range(Airfoil[2], 11, 2):
            for Thickness in range(Airfoil[3], 100, 20):
                Iterations += 1
                print("Iterations = ", Iterations)
                Airfoil2 = [Angle, Max_Camber, Max_Camber_Position, Thickness]
                print("Airfoil = ", Airfoil2)
this will work
You should use a different variable name when creating your list before printing it. You could also make use of itertools.product(), which helps avoid deeply nested for loops when doing exactly what you are doing, for example:
from itertools import product

Iterations = -1
Current_Max = 0
Airfoil = [0, 1, 1, 10]
for Angle, Max_Camber, Max_Camber_Position, Thickness in product(
        range(Airfoil[0], 90, 15), range(Airfoil[1], 11, 2),
        range(Airfoil[2], 11, 2), range(Airfoil[3], 100, 20)):
    Iterations += 1
    print("Iterations = ", Iterations)
    Airfoil2 = [Angle, Max_Camber, Max_Camber_Position, Thickness]
    print("Airfoil = ", Airfoil2)

How to vectorize a loop through a matrix numpy

Suppose I have a matrix that is 100000 x 100
import numpy as np
mat = np.random.randint(2, size=(100000,100))
I wish to go through this matrix, and if a row consists entirely of 1s or entirely of 0s, I wish to change a state variable to that value. If the state is not changed (i.e. the row is mixed), I wish to set the entire row to the value of state. The initial value of state is 0.
Naively in a for loop this can be done as follows
state = 0
for row in mat:
    if set(row) == {1}:
        state = 1
    elif set(row) == {0}:
        state = 0
    else:
        row[:] = state
However, when the size of the matrix increases, this takes an impractical amount of time. Could someone point me in the direction of how to leverage numpy to vectorize this loop and speed it up?
So for a sample input
array([[0, 1, 0],
       [0, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [0, 0, 1]])
The expected output in this case would be
array([[0, 0, 0],
       [0, 0, 0],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
Approach #1: NumPy-Vectorized
Here's a vectorized one -
import numpy as np

def check_all(a, state):  # a is the input matrix/array
    # Get zeros-all and ones-all masks
    zm = (a==0).all(1)
    om = (a==1).all(1)

    # "Attach" boundaries with False values at the start of these masks.
    # These will be used to detect rising edges (as indices) on these masks.
    zma = np.r_[False,zm]
    oma = np.r_[False,om]
    omi = np.flatnonzero(oma[:-1] < oma[1:])
    zmi = np.flatnonzero(zma[:-1] < zma[1:])

    # Group the indices and the signatures (values as 1s and -1s)
    ai = np.r_[omi,zmi]
    av = np.r_[np.ones(len(omi),dtype=int),-np.ones(len(zmi),dtype=int)]

    # Sort the grouped indices, thus we would know the positions
    # of these group starts. Then index into the signatures/values
    # and indices with those, giving us the information on how these signatures
    # occur through the length of the input
    sidx = ai.argsort()
    val,aidx = av[sidx],ai[sidx]

    # The identical consecutive signatures are to be removed
    mask = np.r_[True,val[:-1]!=val[1:]]
    v,i = val[mask],aidx[mask]

    # Also, note that we are assigning all 1s as +1 signature and all 0s as -1.
    # So, in case the starting signature is a 0, assign a value of 0
    if v[0]==-1:
        v[0] = 0

    # Initialize 1D o/p array, which stores the signatures as +1s and -1s.
    # The bigger-level idea is that performing cumsum at the end would give us the
    # desired 1D output
    out1d = np.zeros(len(a),dtype=a.dtype)

    # Assign the values at i positions
    out1d[i] = v

    # Finally cumsum to get desired output
    out1dc = out1d.cumsum()

    # Correct the starting positions based on starting state value
    out1dc[:i[0]] = state

    # Convert to 2D view for mem. and perf. efficiency
    out = np.broadcast_to(out1dc[:,None],a.shape)
    return out
Approach #2: Numba-based
Here's another numba-based one for memory and hence perf. efficiency -
import numpy as np
from numba import njit

@njit(parallel=True)
def func1(zm, om, out, start_state, cur_state):
    # This outputs the 1D version of the required output.
    # Start off with the given starting state
    newval = start_state

    # Loop through zipped zeros-all and ones-all masks and in essence do:
    # switch between zeros and ones based on whether the other ones
    # are occurring through or not, prior to the current state
    for i,(z,o) in enumerate(zip(zm,om)):
        if z and cur_state:
            cur_state = ~cur_state
            newval = 0
        if o and ~cur_state:
            cur_state = ~cur_state
            newval = 1
        out[i] = newval
    return out

def check_all_numba(a, state):
    # Get zeros-all and ones-all masks
    zm = (a==0).all(1)
    om = (a==1).all(1)

    # Decide the starting state
    cur_state = zm.argmax() < om.argmax()

    # Initialize 1D o/p array with given state values
    out1d = np.full(len(a), fill_value=state)
    func1(zm, om, out1d, state, cur_state)

    # Broadcast into the 2D view for memory and perf. efficiency
    return np.broadcast_to(out1d[:,None],a.shape)
You can do this without any explicit loops by leveraging np.maximum.accumulate:
import numpy as np

R = 5      # 100000
C = 3      # 100
mat = np.random.randint(2, size=(R,C))
print(mat)  # original matrix

state = np.zeros((1,C))                     # or np.ones((1,C))
mat = np.concatenate([state,mat])           # insert state row
zRows = np.isin(np.sum(mat,1),[0,C])        # all zeroes or all ones
iRows = np.arange(R+1) * zRows.astype(int)  # base indexes
mat = mat[np.maximum.accumulate(iRows)][1:] # indirection, remove state row
print(mat)  # modified
# original
[[0 0 1]
 [1 1 1]
 [1 0 1]
 [0 0 0]
 [1 0 1]]
# modified
[[0 0 0]
 [1 1 1]
 [1 1 1]
 [0 0 0]
 [0 0 0]]
The way it works is by preparing an indirection array for the rows that need to be changed. This is done from an np.arange of row indexes in which we set to zero the indexes that will need replacement. Accumulating the maximum index then maps each replaced row to the nearest all-zero or all-one row before it.
For example:
[ 0, 1, 2, 3, 4, 5 ] # row indexes
[ 0, 1, 0, 0, 1, 0 ] # rows that are all zeroes or all ones (zRows)
[ 0, 1, 0, 0, 4, 0 ] # multiplied (iRows)
[ 0, 1, 1, 1, 4, 4 ] # np.maximum.accumulate
This gives us a list of indexes where row content should be taken from.
The state is represented by an extra row inserted at the beginning of the matrix before performing the operation and removed afterward.
This solution will be marginally slower for very small matrices (5x3) but it can give you a 20x speed boost for larger ones (100000x100: 0.7 second vs 14 seconds).
Here is a simple and fast numpy method:
import numpy as np

def pp():
    # note: reads the globals a and state defined in the timing setup below
    m, n = a.shape
    A = a.sum(axis=1)
    A = np.where((A==0)|(A==n))[0]
    if not A.size:
        return np.ones_like(a) if state else np.zeros_like(a)
    st = np.concatenate([np.arange(A[0]!=0), A, [m]])
    v = a[st[:-1],0]
    if A[0]:
        v[0] = state
    return np.broadcast_to(v.repeat(st[1:]-st[:-1])[:,None],(m,n))
I made some timings using this
state=0
a = (np.random.random((100000,100))<np.random.random((100000,1))).astype(int)
simple test case:
0.8655898020006134 # me
4.089095343002555 # Alain T.
2.2958932030014694 # Divakar 1
2.2178015549980046 # & 2

Finding singulars/sets of local maxima/minima in a 1D-NumPy array (once again)

I would like to have a function that can detect where the local maxima/minima are in an array (even if there is a set of local maxima/minima). Example:
Given the array
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
I would like to have an output like:
set of 2 local minima => array[0]:array[1]
set of 3 local minima => array[3]:array[5]
local minima, i = 9
set of 2 local minima => array[11]:array[12]
set of 2 local minima => array[15]:array[16]
As you can see from the example, not only are the singular values detected but, also, sets of local maxima/minima.
I know in this question there are a lot of good answers and ideas, but none of them do the job described: some of them simply ignore the extreme points of the array and all ignore the sets of local minima/maxima.
Before asking the question, I wrote a function by myself that does exactly what I described above (the function is at the end of this question: local_min(a). With the test I did, it works properly).
Question: However, I am also sure that is NOT the best way to work with Python. Are there builtin functions, APIs, libraries, etc. that I can use? Any other function suggestion? A one-line instruction? A full vectored solution?
def local_min(a):
    candidate_min = 0
    for i in range(len(a)):
        # Controlling the first left element
        if i == 0 and len(a) >= 1:
            # If the first element is a singular local minimum
            if a[0] < a[1]:
                print("local minima, i = 0")
            # If the element is a candidate to be part of a set of local minima
            elif a[0] == a[1]:
                candidate_min = 1
        # Controlling the last right element
        if i == (len(a)-1) and len(a) >= 1:
            if candidate_min > 0:
                if a[len(a)-1] == a[len(a)-2]:
                    print("set of " + str(candidate_min+1) + " local minima => array[" + str(i-candidate_min) + "]:array[" + str(i) + "]")
            if a[len(a)-1] < a[len(a)-2]:
                print("local minima, i = " + str(len(a)-1))
        # Controlling the other values in the middle of the array
        if i > 0 and i < len(a)-1 and len(a) > 2:
            # If a singular local minimum
            if a[i] < a[i-1] and a[i] < a[i+1]:
                print("local minima, i = " + str(i))
                # print(str(a[i-1]) + " > " + str(a[i]) + " < " + str(a[i+1]))  # debug
            # If a set of candidate local minima was found
            if candidate_min > 0:
                # The candidate set IS a set of local minima
                if a[i] < a[i+1]:
                    print("set of " + str(candidate_min+1) + " local minima => array[" + str(i-candidate_min) + "]:array[" + str(i) + "]")
                    candidate_min = 0
                # The candidate set IS NOT a set of local minima
                elif a[i] > a[i+1]:
                    candidate_min = 0
                # The set of local minima is growing
                elif a[i] == a[i+1]:
                    candidate_min = candidate_min + 1
                # It should never arrive in the last else
                else:
                    print("Something strange happened")
                    return -1
            # If there is a set of candidate local minima (first value found)
            if a[i] < a[i-1] and a[i] == a[i+1]:
                candidate_min = candidate_min + 1
Note: I tried to enrich the code with some comments to explain what I do. I know that the function I propose is not clean; it just prints the results, which could instead be stored and returned at the end. It was written to give an example. The algorithm I propose should be O(n).
UPDATE:
Somebody suggested importing argrelextrema from scipy.signal and using it like:
from scipy.signal import argrelextrema

def local_min_scipy(a):
    minima = argrelextrema(a, np.less_equal)[0]
    return minima

def local_max_scipy(a):
    maxima = argrelextrema(a, np.greater_equal)[0]
    return maxima
To have something like that is what I am really looking for. However, it doesn't work properly when the sets of local minima/maxima have more than two values. For example:
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
print(local_max_scipy(test03))
The output is:
[ 0 2 4 8 10 13 14 16]
Of course in test03[4] I have a minimum and not a maximum. How do I fix this behavior? (I don't know if this is another question or if this is the right place where to ask it.)
A fully vectorized solution:
import numpy as np

test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])  # size 17
extended = np.empty(len(test03)+2)           # room to manage edges, size 19
extended[1:-1] = test03
extended[0] = extended[-1] = np.inf
flag_left = extended[:-1] <= extended[1:]    # not greater than successor, size 18
flag_right = extended[1:] <= extended[:-1]   # not greater than predecessor, size 18
flagmini = flag_left[1:] & flag_right[:-1]   # local minimum, size 17
mini = np.where(flagmini)[0]                 # indices of the minima
spl = np.where(np.diff(mini)>1)[0]+1         # places to split
result = np.split(mini, spl)
result:
[0, 1] [3, 4, 5] [9] [11, 12] [15, 16]
EDIT
Unfortunately, this also detects maxima as soon as they are at least 3 items wide, since such plateaus are seen as flat local minima. A numpy patch for that would be ugly this way.
To solve this problem I propose two other solutions, with numpy, then with numba.
With numpy, using np.diff:
import numpy as np

test03 = np.array([12,13,12,4,4,4,5,6,7,2,6,5,5,7,7,17,17])
extended = np.full(len(test03)+2, np.inf)
extended[1:-1] = test03
slope = np.sign(np.diff(extended))  # 1 if ascending, 0 if flat, -1 if descending
not_flat, = slope.nonzero()         # indices where data is not flat
local_min_inds, = np.where(np.diff(slope[not_flat])==2)
# local_min_inds contains indices in not_flat of the beginnings of local mins.
# Indices of the ends of local mins are shifted by +1:
start = not_flat[local_min_inds]
stop = not_flat[local_min_inds+1]-1
print(*zip(start, stop))
# (0, 1) (3, 5) (9, 9) (11, 12) (15, 16)
#(0, 1) (3, 5) (9, 9) (11, 12) (15, 16)
A direct solution compatible with numba acceleration:
import numba

@numba.njit
def localmins(a):
    begin = np.empty(a.size//2+1, np.int32)
    end = np.empty(a.size//2+1, np.int32)
    i = k = 0
    begin[k] = 0
    search_end = True
    while i < a.size-1:
        if a[i] > a[i+1]:
            begin[k] = i+1
            search_end = True
        if search_end and a[i] < a[i+1]:
            end[k] = i
            k += 1
            search_end = False
        i += 1
    if search_end and i > 0:  # final plateau, if it exists
        end[k] = i
        k += 1
    return begin[:k], end[:k]

print(*zip(*localmins(test03)))
# (0, 1) (3, 5) (9, 9) (11, 12) (15, 16)
I think another function from scipy.signal would be interesting.
from scipy.signal import find_peaks
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
find_peaks(test03)
Out[]: (array([ 2, 8, 10, 13], dtype=int64), {})
find_peaks has lots of options and might be quite useful, especially for noisy signals.
Update
The function is really powerful and versatile. You can set several parameters for peak minimal width, height, distance from each other and so on. As example:
test04 = np.array([1,1,5,5,5,5,5,5,5,5,1,1,1,1,1,5,5,5,1,5,1,5,1])
find_peaks(test04, width=1)
Out[]:
(array([ 5, 16, 19, 21], dtype=int64),
{'prominences': array([4., 4., 4., 4.]),
'left_bases': array([ 1, 14, 18, 20], dtype=int64),
'right_bases': array([10, 18, 20, 22], dtype=int64),
'widths': array([8., 3., 1., 1.]),
'width_heights': array([3., 3., 3., 3.]),
'left_ips': array([ 1.5, 14.5, 18.5, 20.5]),
'right_ips': array([ 9.5, 17.5, 19.5, 21.5])})
See documentation for more examples.
There can be multiple ways to solve this; one approach is listed here.
You can create a custom function, and use the maximum to handle edge cases while finding minima.
import numpy as np

a = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])

def local_min(a):
    temp_list = list(a)
    maxval = max(a)                    # use max while finding minima
    temp_list = temp_list + [maxval]   # handles last value edge case
    prev = maxval                      # prev stores last value seen
    loc = 0                            # used to store starting index of minima
    count = 0                          # used to count repeated values
    matches = []
    for i in range(0, len(temp_list)):  # need to check all values including the padded value
        if prev == temp_list[i]:
            if count > 0:               # only increment for minima candidates
                count += 1
        elif prev > temp_list[i]:
            count = 1
            loc = i
        else:                           # prev < temp_list[i]
            if count > 0:
                matches.append((loc, count))
            count = 0
            loc = i
        prev = temp_list[i]
    return matches

result = local_min(a)
for match in result:
    print("{} minima found starting at location {} and ending at location {}".format(
        match[1],
        match[0],
        match[0] + match[1] - 1))
Let me know if this does the trick for you. The idea is simple, you want to iterate through the list once and keep storing minima as you see them. Handle the edges by padding with maximum values on either end. (or by padding the last end, and using the max value for initial comparison)
Here's an answer based on restriding the array into an iterable of windows:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowstride(a, window):
    return as_strided(a, shape=(a.size - window + 1, window), strides=2*a.strides)

def local_min(a, maxwindow=None, doends=True):
    if doends:
        a = np.pad(a.astype(float), 1, 'constant', constant_values=np.inf)
    if maxwindow is None:
        maxwindow = a.size - 1
    mins = []
    for i in range(3, maxwindow + 1):
        for j, w in enumerate(windowstride(a, i)):
            if (w[0] > w[1]) and (w[-2] < w[-1]):
                if (w[1:-1] == w[1]).all():
                    mins.append((j, j + i - 2))
    mins.sort()
    return mins
Testing it out:
test03=np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
local_min(test03)
Output:
[(0, 2), (3, 6), (9, 10), (11, 13), (15, 17)]
Not the most efficient algorithm, but at least it's short. I'm pretty sure it's O(n^2), since there's roughly 1/2*(n^2 + n) windows to iterate over. This is only partially vectorized, so there may be a way to improve it.
Edit
To clarify, the output is the indices of the slices that contain the runs of local minimum values. The fact that they go one past the end of the run is intentional (someone just tried to "fix" that in an edit). You can use the output to iterate over the slices of minimum values in your input array like this:
for s in local_min(test03):
    print(test03[slice(*s)])
Output:
[2 2]
[4 4 4]
[2]
[5 5]
[1 1]
A pure numpy solution (revised answer):
import numpy as np
y = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
x = np.r_[y[0]+1, y, y[-1]+1] # pad edges, gives possibility for minima
ups, = np.where(x[:-1] < x[1:])
downs, = np.where(x[:-1] > x[1:])
minend = ups[np.unique(np.searchsorted(ups, downs))]
minbeg = downs[::-1][np.unique(np.searchsorted(-downs[::-1], -ups[::-1]))][::-1]
minlen = minend - minbeg
for line in zip(minlen, minbeg, minend-1): print("set of %d minima %d - %d" % line)
This gives
set of 2 minima 0 - 1
set of 3 minima 3 - 5
set of 1 minima 9 - 9
set of 2 minima 11 - 12
set of 2 minima 15 - 16
np.searchsorted(ups, downs) finds the first ups after every down. This is the "true" end of a minimum.
For the start of the minima, we do it similar, but now in reverse order.
It is working for the example, yet not fully tested. But I would say a good starting point.
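To make the searchsorted step concrete, here is a tiny illustration with made-up ups/downs index arrays (hypothetical values, not derived from the example above):
import numpy as np

ups = np.array([2, 6, 9])    # indices where the padded array starts going up
downs = np.array([0, 4, 8])  # indices where it starts going down
# for each down, searchsorted gives the position of the first up at or after it
print(np.searchsorted(ups, downs))       # [0 1 2]
print(ups[np.searchsorted(ups, downs)])  # [2 6 9], the "true" ends of the minima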
You can use argrelmax, as long as there are no multiple consecutive equal elements, so first you need to run-length encode the array, then use argrelmax (or argrelmin):
import numpy as np
from scipy.signal import argrelmax
from itertools import groupby

def local_max_scipy(a):
    start = 0
    result = [[a[0] - 1, 0, 0]]  # this is to guarantee the left edge is included
    for k, g in groupby(a):
        length = sum(1 for _ in g)
        result.append([k, start, length])
        start += length
    result.append([a[-1] - 1, 0, 0])  # this is to guarantee the right edge is included
    arr = np.array(result)
    maxima, = argrelmax(arr[:, 0])
    return arr[maxima]

test03 = np.array([2, 2, 10, 4, 4, 4, 5, 6, 7, 2, 6, 5, 5, 7, 7, 1, 1])
output = local_max_scipy(test03)
for val, start, length in output:
    print(f'set of {length} maxima start:{start} end:{start + length}')
Output
set of 1 maxima start:2 end:3
set of 1 maxima start:8 end:9
set of 1 maxima start:10 end:11
set of 2 maxima start:13 end:15

Value in an array between two numbers in python

So, making a title that actually explains what I want was harder than I thought, so here goes my explanation.
I have an array filled with zeros that adds values every time a condition is met, so after one time step iteration I get something like this (minus the headers):
current_array =
bubble_size y_coord
14040 42
3943 71
6345 11
0 0
0 0
....
After this time step is complete, this current_array gets set as previous_array and is wiped with zeros, because there is no guaranteed number of entries each time.
NOW, the real question: I want to check all rows in the first column of the previous_array, see if the current bubble size is within 5% either side, and if so take the current y position away from the value associated with the matching bubble size in the previous_array's second column.
Currently I have something like:
if bubble_size in current_array[:, 0]:
    do_whatever
but I don't know how to pull out the associated y_coord without using a loop, which I am fine with doing (there are about 100 rows in the array and at least 1000 time steps, so I want to make it as efficient as possible) but would like to avoid.
I have included my thoughts on the for loop (note the current and previous_array are actually current and previous_frame):
for y in range(0, array_size):
    if previous_frame[y, 0] * 0.95 < bubble_size < previous_frame[y, 0] * 1.05:
        # the y coords live in the second column
        distance_travelled = current_y_coord - previous_frame[y, 1]
Any help is greatly appreciated :)
I may not have fully understood your issue here, but if you first want to check whether the bubble size is within 5% either side of the same row's element, you can use the following:
import numpy as np

def apply(p, c):  # for each element, check the bubble size growth
    if p*0.95 < c < p*1.05:
        return 1
    else:
        return 0

def dist(p, c):  # calculate the distance
    return c-p

def update(prev, cur):
    assert isinstance(
        cur, np.ndarray), 'Current array is not a valid numpy array'
    assert isinstance(
        prev, np.ndarray), 'Previous array is not a valid numpy array'
    assert prev.shape == cur.shape, 'Arrays size mismatch'
    applyvec = np.vectorize(apply)
    toapply = applyvec(prev[:, 0], cur[:, 0])
    print(toapply)
    distvec = np.vectorize(dist)
    distance = distvec(prev[:, 1], cur[:, 1])
    print(distance)

current = np.array([[14040, 42],
                    [3943, 71],
                    [6345, 11],
                    [0, 0],
                    [0, 0]])
previous = np.array([[14039, 32],
                     [3942, 61],
                     [6344, 1],
                     [0, 0],
                     [0, 0]])
update(previous, current)
PS: Could you please tell us what final array you are looking for, based on my example?
As I understand it (correct me if I'm wrong):
You have a current bubble size (integer) and a current y value (integer)
You have a 2D array (prev_array) that contains bubble sizes and y coords
You want to check whether your current bubble size is within 5% (either way) of each stored bubble size in prev_array
If they are within range, subtract your current y value from the stored y coord
This will result in a new array, containing only bubble sizes that are within range, and the newly subtracted y value
You want to do this without an explicit loop
You can do that using boolean indexing in numpy...
Setup the previous array:
prev_array = np.array([[14040, 42], [3943, 71], [6345, 11], [3945,0], [0,0]])
prev_array
array([[14040, 42],
[ 3943, 71],
[ 6345, 11],
[ 3945, 0],
[ 0, 0]])
You have your stored bubble size you want to use for comparison, and a current y coord value:
bubble_size = 3750
cur_y = 10
Next we can create a boolean mask where we only select rows of prev_array that meets the 5% criteria:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
# ind is a boolean array that looks like this: [False, True, False, True, False]
Then we use ind to index prev_array, and calculate the new (subtracted) y coords:
new_array = prev_array[ind]
new_array[:,1] = cur_y - new_array[:,1]
Giving your final output array:
array([[3943, -61],
[3945, 10]])
As its not clear what you want your output to actually look like, instead of creating a new array, you can also just update prev_array with the new y values:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
prev_array[ind,1] = cur_y - prev_array[ind,1]
Which gives:
array([[14040, 42],
[ 3943, -61],
[ 6345, 11],
[ 3945, 10],
[ 0, 0]])

what this python code trying to do

The following Python code traverses a 2D grid of (c, g) values in some special order, which is stored in "jobs" and "job_queue". After trying to understand the code, I am still not sure what kind of order it is. Can someone describe the order and explain the purpose of each function? Thanks and regards!
import Queue

c_begin, c_end, c_step = -5, 15, 2
g_begin, g_end, g_step = 3, -15, -2

def range_f(begin, end, step):
    # like range, but works on non-integer too
    seq = []
    while True:
        if step > 0 and begin > end: break
        if step < 0 and begin < end: break
        seq.append(begin)
        begin = begin + step
    return seq

def permute_sequence(seq):
    n = len(seq)
    if n <= 1: return seq
    mid = int(n/2)
    left = permute_sequence(seq[:mid])
    right = permute_sequence(seq[mid+1:])
    ret = [seq[mid]]
    while left or right:
        if left: ret.append(left.pop(0))
        if right: ret.append(right.pop(0))
    return ret

def calculate_jobs():
    c_seq = permute_sequence(range_f(c_begin, c_end, c_step))
    g_seq = permute_sequence(range_f(g_begin, g_end, g_step))
    nr_c = float(len(c_seq))
    nr_g = float(len(g_seq))
    i = 0
    j = 0
    jobs = []
    while i < nr_c or j < nr_g:
        if i/nr_c < j/nr_g:
            # increase C resolution
            line = []
            for k in range(0, j):
                line.append((c_seq[i], g_seq[k]))
            i = i + 1
            jobs.append(line)
        else:
            # increase g resolution
            line = []
            for k in range(0, i):
                line.append((c_seq[k], g_seq[j]))
            j = j + 1
            jobs.append(line)
    return jobs

def main():
    jobs = calculate_jobs()
    job_queue = Queue.Queue(0)
    for line in jobs:
        for (c, g) in line:
            job_queue.put((c, g))

main()
EDIT:
There is a value for each (c,g). The code actually searches the 2D grid of (c,g) to find the grid point where the value is smallest. I guess the code is using some kind of heuristic search algorithm? The original code is here http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/gridsvr/gridregression.py, which is a script that searches for the best values of the two SVM parameters c and g, i.e. those with minimum validation error.
permute_sequence reorders a list of values so that the middle value is first, then the midpoint of each half, then the midpoints of the four remaining quarters, and so on. So permute_sequence(range(1000)) starts out like this:
[500, 250, 750, 125, 625, 375, ...]
calculate_jobs alternately fills in rows and columns using the sequences of 1D coordinates provided by permute_sequence.
If you're going to search the entire 2D space eventually anyway, this does not help you finish sooner. You might as well just scan all the points in order. But I think the idea was to find a decent approximation of the minimum as early as possible in the search. I suspect you could do about as well by shuffling the list randomly.
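A sketch of that shuffle alternative, as a hypothetical drop-in replacement for permute_sequence (random is the only extra import):
import random

def shuffled_sequence(seq):
    # visit the 1D grid coordinates in a uniformly random order
    # instead of the midpoint-first order of permute_sequence
    seq = list(seq)
    random.shuffle(seq)
    return seq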
xkcd readers will note that the urinal protocol would give only slightly different (and probably better) results:
[0, 1000, 500, 250, 750, 125, 625, 375, ...]
Here is an example of permute_sequence in action:
print permute_sequence(range(8))
# prints [4, 2, 6, 1, 5, 3, 7, 0]
print permute_sequence(range(12))
# prints [6, 3, 9, 1, 8, 5, 11, 0, 7, 4, 10, 2]
I'm not sure why it uses this order, because it appears that all candidate pairs of (c,g) are still evaluated in main anyway.
