I'm trying to implement Held-Karp in Python but it doesn't seem to be working. My problem differs from the classical TSP (as used in the descriptions of the H-K algorithm I found on the web) in two ways:
- I don't need to return to the original node. I think this makes it a shortest Hamiltonian path rather than a Hamiltonian cycle, but I'm not very familiar with graph algorithms so I'm not entirely sure. Each city still needs to be visited exactly once, as in TSP.
- Some edges in the graph are missing.
I used networkx and created this function:
```python
def hk(nodes, Graph, start_node, total_ops, min_weight=9999999, min_result=[]):
    nodes.remove(start_node)
    # removes the current node from the set of nodes
    for next_node in nodes:
        total_ops += 1
        current_weight = 0
        try:
            current_weight += Graph[start_node][next_node]["weight"]
            # checks if there's an edge between the current node and the next node
            # If there's an edge, adds the edge weight
        except KeyError:
            continue
        sub_result = []
        if len(nodes) > 1:
            new_nodes = set(nodes)
            sub_result, sub_weight, total_ops = hk(new_nodes, Graph, next_node, total_ops)
            # calculates the minimum weight of the remaining tree
            current_weight += sub_weight
        if current_weight < min_weight:
            # if the weight of the tree is below the minimum weight, update minimum weight
            min_weight = current_weight
            min_result = [next_node] + sub_result
    return min_result, min_weight, total_ops
```
But something is clearly wrong: I'm expecting O(n ** 2 * 2 ** n) complexity but am getting O(n!) instead, the same as for the brute-force method (trying all orderings one by one). Clearly, there's an error in my implementation.
Thank you.
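For comparison, here is a minimal sketch of the Held-Karp dynamic program adapted to the two differences above (no return to the start, missing edges). The essential ingredient is the table: the cheapest cost of a partial path is stored once per (set of visited nodes, last node) and reused, giving at most 2^n * n states, each relaxed over at most n neighbours. A recursive search that never stores these subproblem results re-explores them along every ordering, which is what yields O(n!). The function name `held_karp_path` and the plain dict-of-dicts input (`weights[u][v] = weight`, with both directions listed for an undirected graph) are assumptions for illustration, not networkx APIs.

```python
import math

def held_karp_path(weights, start):
    """Cheapest path that starts at `start` and visits every node exactly once.

    weights: dict mapping node -> {neighbour: edge weight}; a missing key means a
    missing edge. Every node must appear as a key. Returns (cost, path), or
    (math.inf, None) if no such path exists.
    """
    nodes = list(weights)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    full = (1 << n) - 1
    s = index[start]
    # best[(mask, j)] = (cost, previous node index) of the cheapest path from
    # start that visits exactly the nodes in `mask` and ends at node j.
    best = {(1 << s, s): (0, None)}
    # Adding a node always produces a larger mask, so increasing order is a valid DP order.
    for mask in range(1 << n):
        for j in range(n):
            entry = best.get((mask, j))
            if entry is None:
                continue
            for nxt, w in weights[nodes[j]].items():
                k = index[nxt]
                if mask & (1 << k):
                    continue  # already visited
                key = (mask | (1 << k), k)
                new_cost = entry[0] + w
                if key not in best or new_cost < best[key][0]:
                    best[key] = (new_cost, j)
    # No return to start: any end node of a full-mask state is acceptable.
    ends = [(best[(full, j)][0], j) for j in range(n) if (full, j) in best]
    if not ends:
        return math.inf, None
    cost, j = min(ends)
    path, mask = [], full
    while j is not None:  # walk the predecessor links back to start
        path.append(nodes[j])
        prev = best[(mask, j)][1]
        mask ^= 1 << j
        j = prev
    return cost, path[::-1]
```

With networkx, a graph `G` whose edges all carry a `"weight"` attribute could be converted first, e.g. `{u: {v: d["weight"] for v, d in nbrs.items()} for u, nbrs in G.adjacency()}`.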
The problem:
There is a list s of positive integers whose elements are unique and strictly increasing, for example s=[1,2,3,4,5,6,7,8,9,10].
The length of the list is n. The goal is to find the total number of ordered pairs (L, W), with L and W in s, that satisfy L*W <= a.
The solution seems straightforward, yet I cannot figure out what I did wrong. This was an online assessment problem and I have already failed it. If someone can tell me where I went wrong, I will be very happy.
The approach is straightforward: for each possible L, find an upper bound for W in s using binary search. My Python code:
```python
def configurationCount(n, s, a):
    # n is the length of s.
    num_ways = 0
    # Consider every L.
    for i in range(n):
        L = s[i]
        goal = a//L
        # Binary search for this upper bound in s.
        i = 0
        j = len(s) - 1
        mid = (i+j)//2
        exact_match = False
        valid = False
        cursor = -1
        while(i<=j):
            if goal > s[mid]:
                i = mid+1
                cursor = mid
                mid = (i+j)//2
            elif goal < s[mid]:
                j = mid - 1
                cursor = mid
                mid = (i+j)//2
            else:
                exact_match = True
                cursor = mid
                break
        if cursor in range(n) and L*s[cursor]<=a:
            valid = True
        if cursor in range(n) and valid:
            num_ways += (cursor + 1)
    return num_ways
```
This code gave the correct output for only one test case and failed the other 10. In one case where I could see that my output was wrong, the correct output was more than 2000, while mine was about 1200. So I am probably not counting some cases. Yet this approach seems so standard; what did I do wrong?
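Tracing the binary search shows one place where counts go missing: when the loop ends right after the `goal < s[mid]` branch, `cursor` points at an element larger than `goal`, the `L*s[cursor]<=a` check fails, and that L contributes nothing. For example, with s = [1, 3, 5] and a = 4, L = 1 gives goal = 4, the loop ends with cursor = 2 (value 5), and the two valid pairs (1, 1) and (1, 3) are dropped. A minimal sketch of the same per-L idea follows, using the standard-library `bisect.bisect_right` to get the number of elements <= a // L directly; the name `configuration_count` and dropping the `n` parameter are my own choices.

```python
from bisect import bisect_right

def configuration_count(s, a):
    """Count ordered pairs (L, W) with L and W taken from s and L * W <= a.

    Assumes s is sorted in increasing order and holds positive integers.
    """
    total = 0
    for L in s:
        # For positive integers, L * W <= a is equivalent to W <= a // L.
        # bisect_right returns how many elements of s are <= a // L.
        total += bisect_right(s, a // L)
    return total
```

For s = [1, ..., 10] and a = 10 this returns 27, the sum of 10 // L over L = 1..10.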
I am in the midst of a project and would like to find all solutions to the Android pattern unlock. If you have not seen it before, here it is, with a link to a Stack Overflow post discussing it in more detail.
The base rules are:
- Only visit a node 0 or 1 times
- No jumping over unvisited nodes
- No cyclic paths
My implementation solves the problem for an N by M grid, with a cap on the maximum length of a pattern. Here it is:
```python
import math

def get_all_sols(grid_size: (int, int), max_len: int) -> list:
    """
    Return all solutions to the android problem as a list
    :param grid_size: (x, y) size of the grid
    :param max_len: maximum number of nodes in the solution
    """
    sols = []

    def r_sols(current_sol):
        # The solution values are stored as ids, row by row, e.g. for a 3x3 grid:
        #   0, 1, 2
        #   3, 4, 5
        #   6, 7, 8
        # so each row holds grid_size[0] ids.
        current_y = current_sol[-1] // grid_size[0]  # Cache x and y of last visited node
        current_x = current_sol[-1] - current_y * grid_size[0]
        grid = {}  # Prepping a dict to store options for travelling
        grid_id = -1
        for y in range(grid_size[1]):
            for x in range(grid_size[0]):
                grid_id += 1
                if grid_id in current_sol:  # Avoid using the same node twice
                    continue
                dist = (x - current_x) ** 2 + (y - current_y) ** 2  # Find some kind of distance, no need to root since all values will be like this
                slope = math.atan2((y - current_y), (x - current_x))  # Likely very slow, but need to hold some kind of slope value,
                # so that jumping over a point can be detected
                # If the option table doesn't have the slope add a new entry with distance and id
                # if it does, check distances and pick the closer one
                grid[slope] = (dist, grid_id) if grid.get(slope) is None or grid[slope][0] > dist else grid[slope]
        # The code matches the android login criteria, since:
        # - Each node is visited either 0 or 1 time(s)
        # - The path cannot jump over unvisited nodes, but can over visited ones
        # - The path is not a cycle
        r_sol = [current_sol]
        if len(current_sol) == max_len:  # Stop if hit the max length and return
            return r_sol
        for _, opt in grid.values():  # Else recurse for each possible choice
            r_sol += r_sols(current_sol + [opt])
        return r_sol

    for start in range(grid_size[0] * grid_size[1]):
        sols += r_sols([start])
    return sols
```
My current issue is the runtime as the paths or grid get bigger. Could I get some help optimizing the function?
For verification, a 4x4 grid should have these path stats:
1 nodes: 16 paths
2 nodes: 172 paths
3 nodes: 1744 paths
4 nodes: 16880 paths
5 nodes: 154680 paths
6 nodes: 1331944 paths
7 nodes: 10690096 paths
Assuming the algorithm is correct, you can apply some small optimizations. The biggest one is to cut the recursion earlier by moving the len(current_sol) == max_len check to the top of r_sols, before the grid is scanned. Then, you can compute set(current_sol) once so that membership tests are faster than searching a list. Then, you can replace val**2 with val*val and store some temporary results so they are not recomputed. In fact, every basic operation is slow in CPython, which performs almost no optimization. Here is the resulting code:
```python
import math

def get_all_sols_faster(grid_size: (int, int), max_len: int) -> list:
    sols = []

    def r_sols(current_sol):
        r_sol = [current_sol]
        if len(current_sol) == max_len:
            return r_sol
        # ids are laid out row by row, so each row holds grid_size[0] ids
        current_y = current_sol[-1] // grid_size[0]
        current_x = current_sol[-1] - current_y * grid_size[0]
        grid = {}
        grid_id = -1
        current_sol_set = set(current_sol)
        for y in range(grid_size[1]):
            for x in range(grid_size[0]):
                grid_id += 1
                if grid_id in current_sol_set:
                    continue
                diff_x, diff_y = x - current_x, y - current_y
                dist = diff_x * diff_x + diff_y * diff_y
                slope = math.atan2(diff_y, diff_x)
                tmp = grid.get(slope)
                grid[slope] = (dist, grid_id) if tmp is None or tmp[0] > dist else tmp
        for _, opt in grid.values():
            r_sol += r_sols(current_sol + [opt])
        return r_sol

    for start in range(grid_size[0] * grid_size[1]):
        sols += r_sols([start])
    return sols
```
This code is about 3 times faster.
Honestly, for such a brute-force algorithm, CPython is a poor fit. I think you should use a natively compiled language to get much faster code (certainly at least an order of magnitude faster). Note that counting the results instead of producing all the solutions should also be faster.
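The counting idea can be taken further even in Python. The sketch below is my own, not part of the answer, and rests on two assumptions. First, a move from a to b is allowed when every grid point strictly inside the segment a to b is already visited, which is the same rule the slope/closest-point check above enforces. Second, ids are numbered y * width + x. Because the number of ways to extend a pattern depends only on which nodes are visited and where it ends, patterns are aggregated per (visited bitmask, last node) state instead of being enumerated one by one. The name `count_patterns` is hypothetical.

```python
from math import gcd

def count_patterns(width, height, max_len):
    """Return [number of patterns with 1 node, ..., with max_len nodes]."""
    n = width * height
    coords = [(i % width, i // width) for i in range(n)]  # id = y * width + x
    # between[a][b]: bitmask of grid points strictly inside the segment a -> b
    between = [[0] * n for _ in range(n)]
    for a, (xa, ya) in enumerate(coords):
        for b, (xb, yb) in enumerate(coords):
            if a == b:
                continue
            dx, dy = xb - xa, yb - ya
            g = gcd(abs(dx), abs(dy))  # g - 1 lattice points lie strictly between
            sx, sy = dx // g, dy // g
            for k in range(1, g):
                between[a][b] |= 1 << ((ya + k * sy) * width + (xa + k * sx))
    # ways[(visited, last)]: number of patterns visiting exactly `visited`, ending at `last`
    ways = {(1 << a, a): 1 for a in range(n)}
    counts = []
    for length in range(1, max_len + 1):
        counts.append(sum(ways.values()))
        if length == max_len:
            break
        nxt = {}
        for (visited, last), c in ways.items():
            for b in range(n):
                bit = 1 << b
                # b must be new, and every point jumped over must already be visited
                if visited & bit or between[last][b] & ~visited:
                    continue
                key = (visited | bit, b)
                nxt[key] = nxt.get(key, 0) + c
        ways = nxt
    return counts
```

`count_patterns(4, 4, 7)` returns one count per pattern length, which can be compared against the verification figures in the question. The number of (visited, last) states grows with the grid, but it is typically far smaller than the number of individual patterns.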
I'm trying to solve the one-max problem with a genetic algorithm, but it is not converging; instead, the maximum fitness is getting lower. I can't see why it's not working. I tried running the functions on their own and they worked, but I'm not sure about how I call them in the main part. The one-max problem: you have a population of N binary individuals (0/1) of length m, and you want to evolve the population until it contains at least one individual made only of 1s (in my case, 0s).
Here's my code:
```python
import random

def fitness(individual):
    i = 0
    for m in individual:
        if m == 0:
            i += 1
    return i

def selection(pop):
    chosen = []
    for i in range(len(pop)):
        aspirants = []
        macs = []
        for j in range(3):
            aspirants.append(random.choice(pop))
        if fitness(aspirants[0]) > fitness(aspirants[1]):
            if fitness(aspirants[0]) > fitness(aspirants[2]):
                macs = aspirants[0]
            else: macs = aspirants[2]
        else:
            if fitness(aspirants[1]) > fitness(aspirants[2]):
                macs = aspirants[1]
            else: macs = aspirants[2]
        chosen.append(macs)
    return chosen

def crossover(offspring):
    for child1, child2 in zip(offspring[::2], offspring[1::2]):
        if random.random() < 0.7:
            child1[50:100], child2[50:100] = child2[50:100], child1[50:100]

def mutate(offspring):
    for mut in offspring:
        if random.random() < 0.3:
            for i in range(len(mut)):
                if random.random() < 0.05:
                    mut[i] = type(mut[i])(not mut[i])

def gen_individ():
    ind = []
    for s in range(100):
        ind.append(random.randint(0, 1))
    return ind

def gen_pop():
    pop = []
    for s in range(300):
        pop.append(gen_individ())
    return pop

g = 0
popul = gen_pop()
print("length of pop = %i " % len(popul))
fits = []
for k in popul:
    fits.append(fitness(k))
print("best fitness before = %i" % max(fits))
while(max(fits) < 100 and g < 100):
    g += 1
    offspring = []
    offspring = selection(popul)
    crossover(offspring)
    mutate(offspring)
    popul.clear()
    popul[:] = offspring
    fits.clear()
    for k in popul:
        fits.append(fitness(k))
    print("length of pop = %i " % len(popul))
    print("best fitness after = %i" % max(fits))
    print("generation : %i" % g)
```
The problem seems to be that in all your functions, you always just modify the same individuals instead of creating copies. For instance, in the selection function you repeatedly select the best-out-of-three (in a rather convoluted way), and then insert multiple references to that same list into the chosen list. Later, when you mutate any of those, you mutate all the references. In the end you might even end up with just N references to the same list, at which point obviously no more actual selection can take place.
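A tiny illustration of that aliasing, independent of the GA code (the names are made up for the example): putting the same list into a population twice gives two references to one object, so "mutating" one child changes the other, while slicing with `[:]` makes independent copies.

```python
def demo_aliasing():
    parent = [0, 1, 1, 0]
    aliased = [parent, parent]         # two references to the SAME list
    copied = [parent[:], parent[:]]    # two independent copies, taken before any change
    aliased[0][0] = 1                  # "mutate" the first child of each population
    copied[0][0] = 1
    # The change leaks into aliased[1] (same object) but not into copied[1].
    return aliased[1][0], copied[1][0]
```

`demo_aliasing()` returns `(1, 0)`.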
Instead, you should create copies of the lists. This can happen in different places: in your main loop, in mutate and crossover, or in the selection for the next iteration. I'll put it into selection, mainly because this function can be improved in other ways, too:
```python
def selection(pop):
    chosen = []
    for i in range(len(pop)):
        # a lot shorter
        aspirants = random.sample(pop, 3)
        macs = max(aspirants, key=fitness)
        # create COPIES of the individual, not multiple references
        chosen.append(macs[:])
    return chosen
```
With this, you should get a quality of 100 each time.
I made a function that determines the height of a BST, but when the height of the tree is, e.g., 2, the result I get is 3, and so on. I don't know what I should change in my code. If you need the whole code to answer, tell me and I'll copy it.
```python
def maxDepth(self, node):
    if node is None:
        return 0
    else:
        # Compute the depth of each subtree
        lDepth = self.maxDepth(node.left)
        rDepth = self.maxDepth(node.right)
        # Use the larger one
        if (lDepth > rDepth):
            return lDepth + 1
        else:
            return rDepth + 1
```
Instead of return 0, just use return -1 and every result will be smaller by 1, which gives the desired height. Corrected code is below:
```python
def maxDepth(self, node):
    if node is None:
        return -1
    else:
        # Compute the depth of each subtree
        lDepth = self.maxDepth(node.left)
        rDepth = self.maxDepth(node.right)
        # Use the larger one
        if (lDepth > rDepth):
            return lDepth + 1
        else:
            return rDepth + 1
```
You can also use the built-in max() function to make your code shorter:
```python
def maxDepth(self, node):
    if node is None:
        return -1
    return max(self.maxDepth(node.left), self.maxDepth(node.right)) + 1
```
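For trying this outside the tree class, here is a self-contained sketch; the `Node` class and the free function `height` are hypothetical stand-ins for the question's own classes, which are not shown. It uses the same edge-based convention: an empty tree has height -1 and a single node has height 0.

```python
class Node:
    """Hypothetical minimal BST node."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Edge-based height: None -> -1, a lone node -> 0."""
    if node is None:
        return -1
    return max(height(node.left), height(node.right)) + 1
```

For the tree 5 -> (3 -> 1, 8), the longest root-to-leaf path 5-3-1 has two edges, so `height(Node(5, Node(3, Node(1)), Node(8)))` is 2.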
Note: the OP is correct; height should be edge-based, i.e. a tree with a single node 5 should have height 0, and an empty (None) tree has height -1. There are two sources for this:
The Wikipedia article on trees says that height is edge-based and that "Conventionally, an empty tree (tree with no nodes, if such are allowed) has height −1."
Another is the well-known book Cormen T.H. - Introduction to Algorithms:
Given n points, choose a point from the list such that the sum of distances to this point is minimal compared to all other points in the list.
Distance is measured in the following manner.
For a point (x,y), all 8 adjacent points have distance 1:
(x+1,y), (x+1,y+1), (x+1,y-1), (x,y+1), (x,y-1), (x-1,y), (x-1,y+1), (x-1,y-1)
EDIT
A clearer explanation:
A function foo is defined as
foo(point_a,point_b) = max(abs(point_a.x - point_b.x),abs(point_a.y - point_b.y))
Find a point x such that sum([foo(x,y) for y in list_of_points]) is minimum.
Example
Input:
12 -14
-3 3
-14 7
-14 -3
2 -12
-1 -6
Output
-1 -6
E.g.:
The distance between (4,5) and (6,7) is 2.
This can be done in O(n^2) time by computing the sum of distances for every point.
Is there any better algorithm to do it?
Update: it sometimes fails to find the optimum; I'll leave it here until I find the problem.
This is O(n): nth is O(n) (expected, not worst case), and iterating over the list is O(n). If you need a strict worst-case bound, pick the middle element by sorting instead, but then it becomes O(n*log(n)).
Note: it's easy to modify it to return all the optimal points.
```python
import math

def nth(sample, n):
    # Quickselect: return the n-th smallest element (0-based) of sample.
    pivot = sample[0]
    below = [s for s in sample if s < pivot]
    above = [s for s in sample if s > pivot]
    i, j = len(below), len(sample)-len(above)
    if n < i: return nth(below, n)
    elif n >= j: return nth(above, n-j)
    else: return pivot

def getbest(li):
    ''' li is a list of tuples (x,y) '''
    l = len(li)
    lix = [x[0] for x in li]
    liy = [x[1] for x in li]
    # integer division so the indices stay ints in Python 3
    mid_x1 = nth(lix, l//2) if l%2==1 else nth(lix, l//2-1)
    mid_x2 = nth(lix, l//2)
    mid_y1 = nth(liy, l//2) if l%2==1 else nth(liy, l//2-1)
    mid_y2 = nth(liy, l//2)
    mindist = math.inf
    minp = None
    for p in li:
        dist = 0 if mid_x1 <= p[0] <= mid_x2 else min(abs(p[0]-mid_x1), abs(p[0]-mid_x2))
        dist += 0 if mid_y1 <= p[1] <= mid_y2 else min(abs(p[1]-mid_y1), abs(p[1]-mid_y2))
        if dist < mindist:
            minp, mindist = p, dist
    return minp
```
It's based on the solution of the one-dimensional problem: for a list of numbers, find a number for which the sum of distances to all of them is minimal.
The solution is the middle element of the (sorted) list, or any number between the two middle elements (inclusive) if the list has an even number of elements.
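The 1D fact only carries over coordinate by coordinate for Manhattan distance, which may be why the x/y medians sometimes miss the optimum here. A standard trick, added here and not part of this answer, is to rotate by 45 degrees: with u = x + y and v = x - y, max(|dx|, |dy|) = (|du| + |dv|) / 2. The total Chebyshev distance from a point therefore splits into two independent 1D sums of absolute differences. Each of these can be evaluated for every point at once from a sorted copy and prefix sums, giving an exact O(n log n) search over the input points. The name `best_point` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def best_point(points):
    """Return the input point with the smallest sum of Chebyshev distances to all points."""
    n = len(points)
    # 45-degree rotation: max(|dx|, |dy|) == (|du| + |dv|) / 2 with u = x + y, v = x - y.
    us = [x + y for x, y in points]
    vs = [x - y for x, y in points]

    def abs_diff_sums(values):
        """For each value, the sum of |value - w| over all w in values (O(n log n))."""
        ordered = sorted(values)
        prefix = [0] + list(accumulate(ordered))  # prefix[k] = sum of the k smallest
        total = prefix[-1]
        sums = []
        for val in values:
            k = bisect_left(ordered, val)  # number of elements smaller than val
            below = val * k - prefix[k]
            above = (total - prefix[k]) - val * (n - k)
            sums.append(below + above)
        return sums

    su, sv = abs_diff_sums(us), abs_diff_sums(vs)
    # su[i] + sv[i] is twice the Chebyshev sum for point i; halving is unnecessary for argmin.
    best = min(range(n), key=lambda i: su[i] + sv[i])
    return points[best]
```

On the example from the question it returns (-1, -6), whose distance sum is 54.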
Update: my nth algorithm seems to be very slow; there is probably a better way to write it. Sorting outperforms it below 100000 elements, so if you do a speed comparison, just add lix.sort() and liy.sort() and replace nth with:
```python
def nth(sample, n):
    return sample[n]
```
For anyone who wants to test their solution, here is what I use: just run a loop, generate input, and compare your solution's output with the output of bruteforce.
```python
import math
import random

def example(length):
    l = []
    for x in range(length):
        l.append((random.randint(-100, 100), random.randint(-100, 100)))
    return l

def bruteforce(li):
    bestsum = math.inf
    bestp = None
    for p in li:
        total = 0  # avoid shadowing the built-in sum()
        for p1 in li:
            total += max(abs(p[0]-p1[0]), abs(p[1]-p1[1]))
        if total < bestsum:
            bestp, bestsum = p, total
    return bestp
```
I can imagine a scheme better than O(n^2), at least in the common case.
Build a quadtree out of your input points. For each node in the tree, compute the number and average position of the points within that node. Then for each point, you can use the quadtree to compute its distance to all other points in less than O(n) time. If you're computing the distance from a point p to a distant quadtree node v, and v doesn't overlap the 45 degree diagonals from p, then the total distance from p to all the points in v is easy to compute (for v which are more horizontally than vertically separated from p, it is just v.num_points * |p.x - v.average.x|, and similarly using y coordinates if v is predominantly vertically separated). If v overlaps one of the 45 degree diagonals, recurse on its components.
That should beat O(n^2), at least when you can find a balanced quadtree to represent your points.
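A sketch of this quadtree scheme, under choices of mine that the answer does not specify. Nodes split at the midpoint of their points' bounding box, and a leaf size of 4 is arbitrary. Each node stores coordinate sums instead of averages: v.num_points * |p.x - v.average.x| equals |sum_x - count * p.x|, and sums keep integer arithmetic exact. The "doesn't overlap the diagonals" test is done by checking that all four corners of the node's bounding box lie in the same 45-degree cone around p. All names are hypothetical, and no worst-case bound is claimed; as the answer says, the gain depends on how well the points spread out.

```python
def chebyshev(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

class QuadNode:
    """Hypothetical quadtree node: point count, coordinate sums and bounding box."""

    def __init__(self, pts, leaf_size=4):
        self.pts = pts
        self.count = len(pts)
        self.sum_x = sum(x for x, _ in pts)
        self.sum_y = sum(y for _, y in pts)
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        self.box = (min(xs), min(ys), max(xs), max(ys))
        self.children = []
        if self.count > leaf_size:
            cx = (self.box[0] + self.box[2]) / 2
            cy = (self.box[1] + self.box[3]) / 2
            quads = [[], [], [], []]
            for x, y in pts:
                quads[(x > cx) + 2 * (y > cy)].append((x, y))
            # Only split if that separates points (nodes of identical points stay leaves).
            if max(len(q) for q in quads) < self.count:
                self.children = [QuadNode(q, leaf_size) for q in quads if q]

def distance_sum(p, node):
    """Sum of Chebyshev distances from p to every point stored under node."""
    px, py = p
    x0, y0, x1, y1 = node.box
    corners = ((x0, y0), (x0, y1), (x1, y0), (x1, y1))
    # Box inside one 45-degree cone around p: the distance is one signed coordinate difference.
    if all(x - px >= abs(y - py) for x, y in corners):
        return node.sum_x - node.count * px
    if all(px - x >= abs(y - py) for x, y in corners):
        return node.count * px - node.sum_x
    if all(y - py >= abs(x - px) for x, y in corners):
        return node.sum_y - node.count * py
    if all(py - y >= abs(x - px) for x, y in corners):
        return node.count * py - node.sum_y
    if not node.children:
        return sum(chebyshev(p, q) for q in node.pts)
    return sum(distance_sum(p, child) for child in node.children)

def best_point_quadtree(points, leaf_size=4):
    root = QuadNode(points, leaf_size)
    return min(points, key=lambda p: distance_sum(p, root))
```

The cone check is exact because each cone is convex, so a box whose four corners lie in it lies in it entirely; every shortcut therefore returns the true sum, and only the amount of work depends on the point layout.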