So I had to write code to find all the diagonal strings in a grid mystery:
mystery = [["r","a","w","b","i","t"],
           ["x","a","y","z","c","h"],
           ["p","q","b","e","i","e"],
           ["t","r","s","b","o","g"],
           ["u","w","x","v","i","t"],
           ["n","m","r","w","o","t"]]
And here is what I have so far, with the help of a few experts because I'm new to this. The expert who helped me is https://stackoverflow.com/users/5237560/alain-t
def diagsDownRight(M):
    diags, pad = [], []
    while any(M):
        edge = [*next(zip(*reversed(M))), *M[0][1:]]
        M = [r[1:] for r in M[1:]]
        diags.append(pad + edge + pad)
        pad.append("")
    return [*map("".join, zip(*diags))]
While this does work, I find it hard to grasp, and I do not want to just write down code that I do not understand. So, can anyone please help make the code as basic as possible?
By "as basic as possible", I mean: picture yourself as a person who has just learned coding a couple of months ago, and please simplify my code as much as possible.
The easiest I could think of: pad rows so that diagonals become columns. The code:
def diagsDownRight(M):
    n = len(M)
    m = [[''] * (n-i-1) + row + [''] * i for i, row in enumerate(M)]  # pad rows
    return [''.join(col) for col in zip(*m)]
The result is the same, and IMO the approach is more intuitive
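To see what the padding does, print m for a small 3x3 example (the letters here are purely illustrative):
M = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]
n = len(M)
m = [[''] * (n - i - 1) + row + [''] * i for i, row in enumerate(M)]
for row in m:
    print(row)
# ['', '', 'a', 'b', 'c']
# ['', 'd', 'e', 'f', '']
# ['g', 'h', 'i', '', '']
# joining the columns top to bottom yields the down-right diagonals:
# ['g', 'dh', 'aei', 'bf', 'c']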
consider a square matrix
[
[ 1, 2, 3 ],
[ 4, 5, 6 ],
[ 7, 8, 9 ]
]
the indexes into the diagonals are as follows
d1 = [[0,0],[1,1],[2,2]]
d2 = [[0,1],[1,2]]
d3 = [[1,0],[2,1]]
d4 = [[2,0]]
d5 = [[0,2]]
to get the middle diagonal you can simply start with the indexes
for i in range(3):
    index = [i, i]
for the next diagonal we simply do the same... but offset x by 1, until we go out of bounds
for i in range(3):
    if i + 1 > 2:
        break
    index = [i, i+1]
for the next diagonal its the same ... except we do it on the other axis
for i in range(3):
    if i + 1 > 2:
        break
    index = [i + 1, i]
for the toprightmost (in this case ...) it's the same but we add 2
for i in range(3):
    if i + 2 > 2:
        break
    index = [i, i+2]
same for the bottommost, but using the other index
for i in range(3):
    if i + 2 > 2:
        break
    index = [i + 2, i]
I will leave it to you to extrapolate this into a working solution
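If it helps, one hypothetical way to extrapolate that index pattern into code for a square matrix (diagonals come out in the d1, d2, d3, ... order of the explanation, not the order the other answers use) is:
def diags_down_right(M):
    n = len(M)
    diagonals = [[(i, i) for i in range(n)]]                      # middle diagonal (d1)
    for off in range(1, n):
        diagonals.append([(i, i + off) for i in range(n - off)])  # above the middle
        diagonals.append([(i + off, i) for i in range(n - off)])  # below the middle
    return ["".join(str(M[r][c]) for r, c in d) for d in diagonals]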
here's a simpler version:
def diagsDownRight(M):
    rows = len(M)                               # number of rows
    cols = len(M[0])                            # number of columns
    result = []                                 # result will be a list of strings
    leftSide = [(r, 0) for r in range(rows)]    # first column
    topSide = [(0, c) for c in range(1, cols)]  # first row
    for r, c in leftSide[::-1] + topSide:       # all start positions
        s = ""                                  # string on the diagonal
        while r < rows and c < cols:
            s += M[r][c]                        # accumulate characters
            r += 1                              # move down
            c += 1                              # and to the right
        result.append(s)                        # add diagonal string to result
    return result
print(diagsDownRight(mystery))
['n', 'um', 'twr', 'prxw', 'xqsvo',
'rabbit', 'ayeot', 'wzig', 'bce', 'ih', 't']
It works by starting at each coordinate on the left and top edges and accumulating characters, moving one place down and to the right at each step, until it goes outside the matrix.
I would suggest you go with Marat's solution though. It is simple and elegant. If you print the m matrix, I'm sure you'll understand what's going on.
I am trying to write a program which finds duplicate coordinates (x, y, z) in a 3D array. The script should mark one or multiple duplicate points within a given tolerance; one point could have more than one duplicate. I found lots of different approaches, some of which use sorting.
To try the code I created the following test data set:
21.9799629872016 57.4044376777929 0
22.7807110172432 57.6921361034533 0
28.660840151287 61.5676757599822 0
28.6608401512 61.56767575998 0
30.6654296288019 56.2221038199424 0
20.3752036442253 49.1392209993897 0
32.8036584048178 43.927288357851 0
35.8105426210901 51.9456462679106 0
40.8888359641279 58.6944308422108 0
40.88883596412 70.6944308422108 0
41.0892949118794 58.1598736482068 0
39.6860822776189 64.775018924006 0
39.1515250836149 64.8418385732565 0
8.21402748063493 63.5054455882466 0
8.2140275006 63.5074455882 0
8.21404548063493 63.5064455882466 0
8.2143214806 63.5084455882 0
The code I came up with is:
import numpy as np

# given tolerance
tol = 0.01
# initialize empty list for the found duplicates
duplicates = []
# loop over all nodes
for i in range(0, len(nodes)):
    # current node
    curr_node = nodes[i]
    # create difference vector
    diff = nodes - curr_node
    # get all duplicate indices (the node itself is found as well)
    condition = np.where((abs(diff[:,0]) < tol) & (abs(diff[:,1]) < tol) & (abs(diff[:,2]) < tol))
    # check if more than one entry is present. If larger than 1, duplicate points exist
    if len(condition[0]) > 1:
        # loop over all found duplicate points
        for j in range(0, len(condition[0])):
            # add duplicate if not already marked as duplicate
            if j > 0 and condition[0][j] not in duplicates:
                duplicates.append(condition[0][j])
This code returns what I am expecting:
duplicates = [3, 14, 15, 16]
However, the code is very slow. For 300,000 points it takes about 10 minutes. I am wondering if there is any faster way to implement this.
You can place points in a grid of tolerance-sized cubes. Then, for each point, you only need to check the points from the same cube + 26 adjacent ones instead of all other points.
from collections import defaultdict
from itertools import product

# compute the grid
grid = defaultdict(list)
for p in points:
    cube = (int(p[0] / tolerance),
            int(p[1] / tolerance),
            int(p[2] / tolerance))
    grid[cube].append(p)

# the cube itself plus its 26 neighbours
def adjacent_cubes(cube):
    x, y, z = cube
    return [(x + dx, y + dy, z + dz)
            for dx, dy, dz in product((-1, 0, 1), repeat=3)]

# check
for p in points:
    cube = (int(p[0] / tolerance),
            int(p[1] / tolerance),
            int(p[2] / tolerance))
    for adj in adjacent_cubes(cube):
        for p2 in grid[adj]:
            check_distance(p, p2)  # placeholder: compare the pair, skipping p itself
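To make this concrete, here is a minimal end-to-end sketch of the idea (find_duplicates and its details are my own illustrative naming, and the int() truncation assumes non-negative coordinates, which holds for the test data). On the question's nodes with tol = 0.01 it should return [3, 14, 15, 16]:
from collections import defaultdict
from itertools import product

def find_duplicates(points, tol):
    # hash every point index into a tolerance-sized cube
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(int(c / tol) for c in p)].append(idx)
    dup = set()
    for i, p in enumerate(points):
        cube = tuple(int(c / tol) for c in p)
        # look only at the point's own cube and the 26 neighbours
        for off in product((-1, 0, 1), repeat=3):
            adj = (cube[0] + off[0], cube[1] + off[1], cube[2] + off[2])
            for j in grid.get(adj, ()):
                # keep only the later index, matching the question's expected output
                if j > i and all(abs(points[j][k] - p[k]) < tol for k in range(3)):
                    dup.add(j)
    return sorted(dup)

print(find_duplicates(nodes, 0.01))  # expected: [3, 14, 15, 16]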
You could sort the nodes upfront to reduce the number of loops needed:
import timeit
import random

nodes = [
    [21.9799629872016, 57.4044376777929, 0],
    [22.7807110172432, 57.6921361034533, 0],
    [28.660840151287, 61.5676757599822, 0], [28.6608401512, 61.56767575998, 0],
    [30.6654296288019, 56.2221038199424, 0],
    [20.3752036442253, 49.1392209993897, 0],
    [32.8036584048178, 43.927288357851, 0],
    [35.8105426210901, 51.9456462679106, 0],
    [40.8888359641279, 58.6944308422108, 0],
    [40.88883596412, 70.6944308422108, 0],
    [41.0892949118794, 58.1598736482068, 0],
    [39.6860822776189, 64.775018924006, 0],
    [39.1515250836149, 64.8418385732565, 0],
    [8.21402748063493, 63.5054455882466, 0], [8.2140275006, 63.5074455882, 0],
    [8.21404548063493, 63.5064455882466, 0], [8.2143214806, 63.5084455882, 0]
]
duplicates = [3, 14, 15, 16]
assertList = [n for i, n in enumerate(nodes) if i in duplicates]

def new(nodes, tol=0.01):
    print(f"Searching duplicates in {len(nodes)} nodes")
    coordinateLen = range(len(nodes[0]))
    nodes.sort()
    last = nodes[0]
    duplicates = []
    for i, node in enumerate(nodes[1:]):
        if not all(0 <= node[idx] - last[idx] < tol for idx in coordinateLen):
            last = node
        else:
            duplicates.append(node)
    print(f"Found: {len(duplicates)} duplicates")
    return duplicates

# generate random numbers!
randomNodes = [
    [random.uniform(0, 100),
     random.uniform(0, 100),
     random.uniform(0, 1)] for _ in range(300000)
]
# make sure there are at least the same 4 duplicates!
randomNodes += nodes

for i, lst in enumerate((nodes, randomNodes)):
    for func in ("new", ):
        t1 = timeit.Timer(f"{func}({lst})", f"from __main__ import {func}")
        # verify values of found duplicates are [3, 14, 15, 16] !!
        if i == 0:
            print(all(x for x in new(nodes) if x in assertList))
        print(f"{func} took: {t1.timeit(number=10)} seconds")
        print("")
Out:
Searching duplicates in 17 nodes
Found: 4 duplicates
True
....
new took: 0.00034904800000001845 seconds
Searching duplicates in 300017 nodes
Found: 4 duplicates
...
new took: 14.316181525000001 seconds
I would like to have a function that can detect where the local maxima/minima are in an array (even if there is a set of local maxima/minima). Example:
Given the array
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
I would like to have an output like:
set of 2 local minima => array[0]:array[1]
set of 3 local minima => array[3]:array[5]
local minima, i = 9
set of 2 local minima => array[11]:array[12]
set of 2 local minima => array[15]:array[16]
As you can see from the example, not only are singular values detected but also sets of local maxima/minima.
I know in this question there are a lot of good answers and ideas, but none of them do the job described: some of them simply ignore the extreme points of the array and all ignore the sets of local minima/maxima.
Before asking the question, I wrote a function by myself that does exactly what I described above (the function is at the end of this question: local_min(a). With the test I did, it works properly).
Question: However, I am also sure that this is NOT the best way to work with Python. Are there built-in functions, APIs, libraries, etc. that I can use? Any other function suggestions? A one-line instruction? A fully vectorized solution?
def local_min(a):
    candidate_min = 0
    for i in range(len(a)):
        # Controlling the first left element
        if i == 0 and len(a) >= 1:
            # If the first element is a singular local minima
            if a[0] < a[1]:
                print("local minima, i = 0")
            # If the element is a candidate to be part of a set of local minima
            elif a[0] == a[1]:
                candidate_min = 1
        # Controlling the last right element
        if i == (len(a) - 1) and len(a) >= 1:
            if candidate_min > 0:
                if a[len(a) - 1] == a[len(a) - 2]:
                    print("set of " + str(candidate_min + 1) + " local minima => array[" + str(i - candidate_min) + "]:array[" + str(i) + "]")
            if a[len(a) - 1] < a[len(a) - 2]:
                print("local minima, i = " + str(len(a) - 1))
        # Controlling the other values in the middle of the array
        if i > 0 and i < len(a) - 1 and len(a) > 2:
            # If a singular local minima
            if a[i] < a[i - 1] and a[i] < a[i + 1]:
                print("local minima, i = " + str(i))
                # print(str(a[i-1]) + " > " + str(a[i]) + " < " + str(a[i+1]))  # debug
            # If a set of candidate local minima was found
            if candidate_min > 0:
                # The candidate set IS a set of local minima
                if a[i] < a[i + 1]:
                    print("set of " + str(candidate_min + 1) + " local minima => array[" + str(i - candidate_min) + "]:array[" + str(i) + "]")
                    candidate_min = 0
                # The candidate set IS NOT a set of local minima
                elif a[i] > a[i + 1]:
                    candidate_min = 0
                # The set of local minima is growing
                elif a[i] == a[i + 1]:
                    candidate_min = candidate_min + 1
                # It should never reach the last else
                else:
                    print("Something strange happened")
                    return -1
            # If there is a set of candidate local minima (first value found)
            if a[i] < a[i - 1] and a[i] == a[i + 1]:
                candidate_min = candidate_min + 1
Note: I tried to enrich the code with some comments to help you understand what I do. I know that the function I propose is not clean and just prints results that could instead be stored and returned at the end. It was written to give an example. The algorithm I propose should be O(n).
UPDATE:
Somebody suggested importing argrelextrema from scipy.signal and using it like:
import numpy as np
from scipy.signal import argrelextrema

def local_min_scipy(a):
    minima = argrelextrema(a, np.less_equal)[0]
    return minima

def local_max_scipy(a):
    maxima = argrelextrema(a, np.greater_equal)[0]
    return maxima
To have something like that is what I am really looking for. However, it doesn't work properly when the sets of local minima/maxima have more than two values. For example:
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
print(local_max_scipy(test03))
The output is:
[ 0 2 4 8 10 13 14 16]
Of course in test03[4] I have a minimum, not a maximum. How do I fix this behavior? (I don't know if this is another question or if this is the right place to ask it.)
A fully vectorized solution:
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1]) # Size 17
extended = np.empty(len(test03)+2) # Rooms to manage edges, size 19
extended[1:-1] = test03
extended[0] = extended[-1] = np.inf
flag_left = extended[:-1] <= extended[1:] # Less than successor, size 18
flag_right = extended[1:] <= extended[:-1] # Less than predecessor, size 18
flagmini = flag_left[1:] & flag_right[:-1] # Local minimum, size 17
mini = np.where(flagmini)[0] # Indices of minimums
spl = np.where(np.diff(mini)>1)[0]+1 # Places to split
result = np.split(mini, spl)
result:
[0, 1] [3, 4, 5] [9] [11, 12] [15, 16]
EDIT
Unfortunately, this also detects maxima as soon as they are at least 3 items wide, since their flat tops are seen as flat local minima, and a numpy patch for that would be ugly.
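A quick toy illustration of the flaw (my own example, not from the original): a 3-wide maximum plateau passes both <= tests in its middle, so it is reported as a minimum:
import numpy as np
bad = np.array([1, 5, 5, 5, 1])            # 3-wide MAXIMUM plateau in the middle
extended = np.empty(len(bad) + 2)
extended[1:-1] = bad
extended[0] = extended[-1] = np.inf
flag_left = extended[:-1] <= extended[1:]
flag_right = extended[1:] <= extended[:-1]
flagmini = flag_left[1:] & flag_right[:-1]
print(np.where(flagmini)[0])               # [0 2 4] -- index 2 sits on the maximum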
To solve this problem, I propose two other solutions, first with numpy, then with numba.
With numpy, using np.diff:
import numpy as np
test03=np.array([12,13,12,4,4,4,5,6,7,2,6,5,5,7,7,17,17])
extended=np.full(len(test03)+2,np.inf)
extended[1:-1]=test03
slope = np.sign(np.diff(extended)) # 1 if ascending,0 if flat, -1 if descending
not_flat,= slope.nonzero() # Indices where data is not flat.
local_min_inds, = np.where(np.diff(slope[not_flat])==2)
#local_min_inds contains indices in not_flat of beginning of local mins.
#Indices of End of local mins are shift by +1:
start = not_flat[local_min_inds]
stop = not_flat[local_min_inds+1]-1
print(*zip(start,stop))
#(0, 0) (3, 5) (9, 9) (11, 12)
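For reference (my own check, not part of the original answer), the slope array for this example prints as follows; a local minimum is exactly a -1 followed, after any zeros, by a 1:
print(slope.astype(int))
# [-1  1 -1 -1  0  0  1  1  1 -1  1 -1  0  1  0  1  0  1]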
A direct solution compatible with numba acceleration:
import numba
import numpy as np

@numba.njit
def localmins(a):
    begin = np.empty(a.size//2+1, np.int32)
    end = np.empty(a.size//2+1, np.int32)
    i = k = 0
    begin[k] = 0
    search_end = True
    while i < a.size-1:
        if a[i] > a[i+1]:
            begin[k] = i+1
            search_end = True
        if search_end and a[i] < a[i+1]:
            end[k] = i
            k += 1
            search_end = False
        i += 1
    if search_end and i > 0:  # final plateau, if it exists
        end[k] = i
        k += 1
    return begin[:k], end[:k]
print(*zip(*localmins(test03)))
#(0, 0) (3, 5) (9, 9) (11, 12)
I think another function from scipy.signal would be interesting.
from scipy.signal import find_peaks
test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
find_peaks(test03)
Out[]: (array([ 2, 8, 10, 13], dtype=int64), {})
find_peaks has lots of options and might be quite useful, especially for noisy signals.
Update
The function is really powerful and versatile. You can set several parameters for minimal peak width, height, distance from each other, and so on. As an example:
test04 = np.array([1,1,5,5,5,5,5,5,5,5,1,1,1,1,1,5,5,5,1,5,1,5,1])
find_peaks(test04, width=1)
Out[]:
(array([ 5, 16, 19, 21], dtype=int64),
{'prominences': array([4., 4., 4., 4.]),
'left_bases': array([ 1, 14, 18, 20], dtype=int64),
'right_bases': array([10, 18, 20, 22], dtype=int64),
'widths': array([8., 3., 1., 1.]),
'width_heights': array([3., 3., 3., 3.]),
'left_ips': array([ 1.5, 14.5, 18.5, 20.5]),
'right_ips': array([ 9.5, 17.5, 19.5, 21.5])})
See documentation for more examples.
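Since find_peaks looks for maxima, a sketch for getting the question's flat minima out of it (my own variant, assuming SciPy >= 1.2 for the plateau_size argument) is to negate the signal and pad the ends so edge minima count too:
import numpy as np
from scipy.signal import find_peaks

test03 = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
padded = np.r_[np.inf, test03, np.inf]   # pad so minima touching the edges count
peaks, props = find_peaks(-padded, plateau_size=1)
starts = props["left_edges"] - 1         # shift back to original indexing
ends = props["right_edges"] - 1
print(*zip(starts, ends))
# (0, 1) (3, 5) (9, 9) (11, 12) (15, 16)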
There can be multiple ways to solve this; one approach is listed here.
You can create a custom function, and use the maximums to handle edge cases while finding minima.
import numpy as np

a = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])

def local_min(a):
    temp_list = list(a)
    maxval = max(a)                     # use max while finding minima
    temp_list = temp_list + [maxval]    # handles last value edge case
    prev = maxval                       # prev stores last value seen
    loc = 0                             # used to store starting index of minima
    count = 0                           # used to count repeated values
    # match_start = False
    matches = []
    for i in range(0, len(temp_list)):  # need to check all values including the padded value
        if prev == temp_list[i]:
            if count > 0:               # only increment for minima candidates
                count += 1
        elif prev > temp_list[i]:
            count = 1
            loc = i
            # match_start = True
        else:                           # prev < temp_list[i]
            if count > 0:
                matches.append((loc, count))
            count = 0
            loc = i
        prev = temp_list[i]
    return matches

result = local_min(a)
for match in result:
    print("{} minima found starting at location {} and ending at location {}".format(
        match[1],
        match[0],
        match[0] + match[1] - 1))
Let me know if this does the trick for you. The idea is simple: you iterate through the list once and keep storing minima as you see them. Handle the edges by padding with maximum values on either end (or by padding only the last end, and using the max value for the initial comparison).
Here's an answer based on restriding the array into an iterable of windows:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowstride(a, window):
    return as_strided(a, shape=(a.size - window + 1, window), strides=2*a.strides)

def local_min(a, maxwindow=None, doends=True):
    if doends:
        a = np.pad(a.astype(float), 1, 'constant', constant_values=np.inf)
    if maxwindow is None:
        maxwindow = a.size - 1
    mins = []
    for i in range(3, maxwindow + 1):
        for j, w in enumerate(windowstride(a, i)):
            if (w[0] > w[1]) and (w[-2] < w[-1]):
                if (w[1:-1] == w[1]).all():
                    mins.append((j, j + i - 2))
    mins.sort()
    return mins
Testing it out:
test03=np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
local_min(test03)
Output:
[(0, 2), (3, 6), (9, 10), (11, 13), (15, 17)]
Not the most efficient algorithm, but at least it's short. I'm pretty sure it's O(n^2), since there are roughly (n^2 + n)/2 windows to iterate over. This is only partially vectorized, so there may be a way to improve it.
Edit
To clarify, the output is the indices of the slices that contain the runs of local minimum values. The fact that they go one past the end of the run is intentional (someone just tried to "fix" that in an edit). You can use the output to iterate over the slices of minimum values in your input array like this:
for s in local_min(test03):
    print(test03[slice(*s)])
Output:
[2 2]
[4 4 4]
[2]
[5 5]
[1 1]
A pure numpy solution (revised answer):
import numpy as np
y = np.array([2,2,10,4,4,4,5,6,7,2,6,5,5,7,7,1,1])
x = np.r_[y[0]+1, y, y[-1]+1] # pad edges, gives possibility for minima
ups, = np.where(x[:-1] < x[1:])
downs, = np.where(x[:-1] > x[1:])
minend = ups[np.unique(np.searchsorted(ups, downs))]
minbeg = downs[::-1][np.unique(np.searchsorted(-downs[::-1], -ups[::-1]))][::-1]
minlen = minend - minbeg
for line in zip(minlen, minbeg, minend-1): print("set of %d minima %d - %d" % line)
This gives
set of 2 minima 0 - 1
set of 3 minima 3 - 5
set of 1 minima 9 - 9
set of 2 minima 11 - 12
set of 2 minima 15 - 16
np.searchsorted(ups, downs) finds the first up after every down. This is the "true" end of a minimum.
For the start of the minima, we do the same, but in reverse order.
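Printing the intermediate arrays for the example makes the pairing visible (values from a hand trace of the snippet above, so treat them as illustrative):
print(ups)     # [ 2  6  7  8 10 13 17]  indices i where x[i] < x[i+1]
print(downs)   # [ 0  3  9 11 15]        indices i where x[i] > x[i+1]
print(minbeg)  # [ 0  3  9 11 15]
print(minend)  # [ 2  6 10 13 17]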
It works for the example, though it is not fully tested. But I would say it is a good starting point.
You can use argrelmax, as long as there are no multiple consecutive equal elements, so first you need to run-length encode the array, then use argrelmax (or argrelmin):
import numpy as np
from scipy.signal import argrelmax
from itertools import groupby

def local_max_scipy(a):
    start = 0
    result = [[a[0] - 1, 0, 0]]  # this is to guarantee the left edge is included
    for k, g in groupby(a):
        length = sum(1 for _ in g)
        result.append([k, start, length])
        start += length
    result.append([a[-1] - 1, 0, 0])  # this is to guarantee the right edge is included
    arr = np.array(result)
    maxima, = argrelmax(arr[:, 0])
    return arr[maxima]

test03 = np.array([2, 2, 10, 4, 4, 4, 5, 6, 7, 2, 6, 5, 5, 7, 7, 1, 1])
output = local_max_scipy(test03)
for val, start, length in output:
    print(f'set of {length} maxima start:{start} end:{start + length}')
Output
set of 1 maxima start:2 end:3
set of 1 maxima start:8 end:9
set of 1 maxima start:10 end:11
set of 2 maxima start:13 end:15
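For minima, a symmetric sketch of the same idea (my own variant, not part of the original answer) flips the sentinels so they sit above the edge values, and uses argrelmin:
import numpy as np
from scipy.signal import argrelmin
from itertools import groupby

def local_min_scipy(a):
    start = 0
    result = [[a[0] + 1, 0, 0]]  # sentinel above the left edge value
    for k, g in groupby(a):
        length = sum(1 for _ in g)
        result.append([k, start, length])
        start += length
    result.append([a[-1] + 1, 0, 0])  # sentinel above the right edge value
    arr = np.array(result)
    minima, = argrelmin(arr[:, 0])
    return arr[minima]

test03 = np.array([2, 2, 10, 4, 4, 4, 5, 6, 7, 2, 6, 5, 5, 7, 7, 1, 1])
for val, start, length in local_min_scipy(test03):
    print(f'set of {length} minima start:{start} end:{start + length}')
# set of 2 minima start:0 end:2
# set of 3 minima start:3 end:6
# set of 1 minima start:9 end:10
# set of 2 minima start:11 end:13
# set of 2 minima start:15 end:17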
So making a title that actually explains what I want is harder than I thought, so here goes me explaining it.
I have an array filled with zeros that adds values every time a condition is met, so after one time-step iteration I get something like this (minus the headers):
current_array =
bubble_size y_coord
14040 42
3943 71
6345 11
0 0
0 0
....
After this time step is complete, this current_array gets set as previous_array and is wiped with zeros, because there is not a guaranteed number of entries each time.
NOW the real question is: I want to be able to check all rows in the first column of the previous_array and see if the current bubble size is within, say, 5% either side, and if so, I want to take the current y position away from the value associated with the matching bubble size in the previous_array's second column.
Currently I have something like:
if bubble_size in current_array[:, 0]:
    do_whatever
but I don't know how to pull out the associated y_coord without using a loop. I am fine with using one, but would like to avoid it (there are about 100 rows in the array and at least 1000 time steps, so I want to make it as efficient as possible).
I have included my thoughts on the for loop below (note that current_array and previous_array are actually called current and previous_frame):
for y in range(0, array_size):
    if previous_frame[y, 0] * 0.95 < bubble_size < previous_frame[y, 0] * 1.05:
        distance_travelled = current_y_coord - previous_frame[y, 1]  # y coord is in the second column
Any help is greatly appreciated :)
I may not have fully understood your issue, but if you first want to check whether each bubble size is within 5% of the corresponding row's element, you can use the following:
import numpy as np

def apply(p, c):  # for each element, check the bubble-size growth
    if p * 0.95 < c < p * 1.05:
        return 1
    else:
        return 0

def dist(p, c):  # calculate the distance
    return c - p

def update(prev, cur):
    assert isinstance(
        cur, np.ndarray), 'Current array is not a valid numpy array'
    assert isinstance(
        prev, np.ndarray), 'Previous array is not a valid numpy array'
    assert prev.shape == cur.shape, 'Arrays size mismatch'
    applyvec = np.vectorize(apply)
    toapply = applyvec(prev[:, 0], cur[:, 0])
    print(toapply)
    distvec = np.vectorize(dist)
    distance = distvec(prev[:, 1], cur[:, 1])
    print(distance)

current = np.array([[14040, 42],
                    [3943, 71],
                    [6345, 11],
                    [0, 0],
                    [0, 0]])
previous = np.array([[14039, 32],
                     [3942, 61],
                     [6344, 1],
                     [0, 0],
                     [0, 0]])

update(previous, current)
PS: Could you please tell us what final array you are looking for, based on my example?
As I understand it (correct me if I'm wrong):
You have a current bubble size (integer) and a current y value (integer)
You have a 2D array (prev_array) that contains bubble sizes and y coords
You want to check whether your current bubble size is within 5% (either way) of each stored bubble size in prev_array
If they are within range, subtract your current y value from the stored y coord
This will result in a new array, containing only bubble sizes that are within range, and the newly subtracted y value
You want to do this without an explicit loop
You can do that using boolean indexing in numpy...
Setup the previous array:
prev_array = np.array([[14040, 42], [3943, 71], [6345, 11], [3945,0], [0,0]])
prev_array
array([[14040, 42],
[ 3943, 71],
[ 6345, 11],
[ 3945, 0],
[ 0, 0]])
You have your stored bubble size you want to use for comparison, and a current y coord value:
bubble_size = 3750
cur_y = 10
Next we can create a boolean mask where we only select rows of prev_array that meets the 5% criteria:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
# ind is a boolean array that looks like this: [False, True, False, True, False]
Then we use ind to index prev_array, and calculate the new (subtracted) y coords:
new_array = prev_array[ind]
new_array[:,1] = cur_y - new_array[:,1]
Giving your final output array:
array([[3943, -61],
[3945, 10]])
As its not clear what you want your output to actually look like, instead of creating a new array, you can also just update prev_array with the new y values:
ind = (bubble_size > prev_array[:,0]*.95) & (bubble_size < prev_array[:,0]*1.05)
prev_array[ind,1] = cur_y - prev_array[ind,1]
Which gives:
array([[14040, 42],
[ 3943, -61],
[ 6345, 11],
[ 3945, 10],
[ 0, 0]])
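Putting the pieces together, a hypothetical per-time-step helper (the name and signature are mine, purely illustrative) could look like:
import numpy as np

def y_distances(prev_frame, bubble_size, cur_y, tol=0.05):
    # rows whose stored bubble size is within tol (5%) of the current one
    sizes = prev_frame[:, 0]
    ind = (bubble_size > sizes * (1 - tol)) & (bubble_size < sizes * (1 + tol))
    # subtract the stored y coords from the current y position
    return sizes[ind], cur_y - prev_frame[ind, 1]

prev_array = np.array([[14040, 42], [3943, 71], [6345, 11], [3945, 0], [0, 0]])
print(y_distances(prev_array, 3750, 10))
# (array([3943, 3945]), array([-61,  10]))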
The following Python code traverses a 2D grid of (c, g) values in some special order, which is stored in "jobs" and "job_queue". But after trying to understand the code, I am still not sure what kind of order it is. Can someone identify the order and explain the purpose of each function? Thanks and regards!
import Queue

c_begin, c_end, c_step = -5, 15, 2
g_begin, g_end, g_step = 3, -15, -2

def range_f(begin, end, step):
    # like range, but works on non-integer too
    seq = []
    while True:
        if step > 0 and begin > end: break
        if step < 0 and begin < end: break
        seq.append(begin)
        begin = begin + step
    return seq

def permute_sequence(seq):
    n = len(seq)
    if n <= 1: return seq
    mid = int(n/2)
    left = permute_sequence(seq[:mid])
    right = permute_sequence(seq[mid+1:])
    ret = [seq[mid]]
    while left or right:
        if left: ret.append(left.pop(0))
        if right: ret.append(right.pop(0))
    return ret

def calculate_jobs():
    c_seq = permute_sequence(range_f(c_begin, c_end, c_step))
    g_seq = permute_sequence(range_f(g_begin, g_end, g_step))
    nr_c = float(len(c_seq))
    nr_g = float(len(g_seq))
    i = 0
    j = 0
    jobs = []
    while i < nr_c or j < nr_g:
        if i/nr_c < j/nr_g:
            # increase C resolution
            line = []
            for k in range(0, j):
                line.append((c_seq[i], g_seq[k]))
            i = i + 1
            jobs.append(line)
        else:
            # increase g resolution
            line = []
            for k in range(0, i):
                line.append((c_seq[k], g_seq[j]))
            j = j + 1
            jobs.append(line)
    return jobs

def main():
    jobs = calculate_jobs()
    job_queue = Queue.Queue(0)
    for line in jobs:
        for (c, g) in line:
            job_queue.put((c, g))

main()
EDIT:
There is a value for each (c, g). The code actually searches the 2D grid of (c, g) for a grid point where the value is smallest. I guess the code is using some kind of heuristic search algorithm? The original code is here: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/gridsvr/gridregression.py. It is a script that searches for the best values of the two SVM parameters c and g, i.e., those with minimum validation error.
permute_sequence reorders a list of values so that the middle value is first, then the midpoint of each half, then the midpoints of the four remaining quarters, and so on. So permute_sequence(range(1000)) starts out like this:
[500, 250, 750, 125, 625, 375, ...]
calculate_jobs alternately fills in rows and columns using the sequences of 1D coordinates provided by permute_sequence.
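To see the interleaving concretely, here is a small trace of calculate_jobs' loop using three symbolic values per axis (the c0/g0 names are mine, purely illustrative):
c_seq, g_seq = ["c0", "c1", "c2"], ["g0", "g1", "g2"]
nr_c, nr_g, i, j, jobs = float(len(c_seq)), float(len(g_seq)), 0, 0, []
while i < nr_c or j < nr_g:
    if i/nr_c < j/nr_g:
        jobs.append([(c_seq[i], g_seq[k]) for k in range(j)])  # fill in a new c row
        i = i + 1
    else:
        jobs.append([(c_seq[k], g_seq[j]) for k in range(i)])  # fill in a new g column
        j = j + 1
print(jobs)
# [[], [('c0', 'g0')], [('c0', 'g1')], [('c1', 'g0'), ('c1', 'g1')],
#  [('c0', 'g2'), ('c1', 'g2')], [('c2', 'g0'), ('c2', 'g1'), ('c2', 'g2')]]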
If you're going to search the entire 2D space eventually anyway, this does not help you finish sooner. You might as well just scan all the points in order. But I think the idea was to find a decent approximation of the minimum as early as possible in the search. I suspect you could do about as well by shuffling the list randomly.
xkcd readers will note that the urinal protocol would give only slightly different (and probably better) results:
[0, 1000, 500, 250, 750, 125, 625, 375, ...]
Here is an example of permute_sequence in action:
print permute_sequence(range(8))
# prints [4, 2, 6, 1, 5, 3, 7, 0]
print permute_sequence(range(12))
# prints [6, 3, 9, 1, 8, 5, 11, 0, 7, 4, 10, 2]
I'm not sure why it uses this order, because as far as I can tell, all candidate pairs of (c, g) end up being evaluated in main anyway.