Horton's algorithm I coded does not work well - python

I tried coding Horton's algorithm to derive a minimum cycle basis for an unweighted, undirected, 2-connected graph.
However, the resulting basis often covers all the edges of the graph.
I think the program builds the Horton set correctly.
So how can I fix my code so that it works correctly?
for v in G.nodes():
    T = BFS_Tree(G,v)
    for x,y in G.edges():
        path_vtox = nx.shortest_path(T,source=v,target=x)
        path_vtoy = nx.shortest_path(T,source=v,target=y)
        if set(path_vtox) & set(path_vtoy) == {v}:
            cycel = []
            for i in range(len(path_vtox)-1):
                cycle.append(path_vtox[i],path_vtox[i+1])
            for i in range(len(path_vtoy)-1):
                cycle.append(path_vtoy[i],path_vtoy[i+1])
            cycle.append((x,y))
            g = nx.Graph()
            g.add_edges_from(cycle)
            try:
                nx.find_cycle(g)
                Cycles.append(cycle)
            except:
                pass

Thanks for posting the question.
First, a few comments on how to ask questions and regarding your code:
You should write a self-contained example. This includes working code, a toy problem and the expected output. In your case I am missing a line like
import networkx as nx at the top. Also, you are referencing BFS_Tree and Cycles at lines 2 and 17 without having defined them before.
There is a typo: in line 7 it should say cycle instead of cycel.
Most importantly, I'd expect a short working example of how your graph G is defined and what you would expect as the output of your code.
Now, I'll try to say something about the algorithm, even though I am probably missing a few concepts.
Is the Horton set a cycle basis of shortest length?
In general I do not see a problem when the basis can cover all edges of a graph, assuming all edges are part of some cycle. Or do you mean that basis elements cover all edges of the graph but should be shorter?
I could not find a unique reference to Horton's algorithm and am assuming you are implementing it from p. 360 in the original paper. In this reference, Horton describes the algorithm as:
1) Find a minimum path P(x,y) between each pair of points x, y
2) For each vertex v and edge {x,y} in the graph, create the cycle C(v,x,y)=P(v,x)+P(v,y)+{x,y}, and calculate its length. Degenerate cases in which P(v,x) and P(v,y) have vertices other than v in common can be omitted.
3) Order the cycles by weight
4) Use the greedy algorithm to find the minimum cycle basis from this set of cycles
In your code I only see steps 1) and 2) implemented. 3) is trivial for the unweighted case, as every cycle is weighted by its length. But it seems step 4) is missing. Horton proposes a solution on page 362 of the reference.
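To illustrate what step 4 could look like: a common way to do the greedy selection is to represent every candidate cycle as an edge-incidence vector over GF(2) and keep it only if it is linearly independent of the cycles kept so far (Gaussian elimination). The following is only a minimal sketch of that idea, not Horton's exact procedure; it assumes the candidate cycles are given as edge lists already sorted by length, that G is connected, and the function and variable names are my own:
def greedy_cycle_basis(G, candidate_cycles):
    """Greedy step of a Horton-style algorithm (sketch).

    candidate_cycles: list of cycles, each a list of edges (u, v),
    sorted by increasing length/weight.
    """
    # Map each undirected edge to a bit position.
    edge_index = {frozenset(e): i for i, e in enumerate(G.edges())}
    # Dimension of the cycle space for a connected graph: |E| - |V| + 1.
    dim = G.number_of_edges() - G.number_of_nodes() + 1

    pivot_rows = {}   # highest set bit -> reduced GF(2) vector (stored as a Python int)
    basis = []
    for cycle in candidate_cycles:
        # Encode the cycle as a bit vector over its edges.
        vec = 0
        for e in cycle:
            vec ^= 1 << edge_index[frozenset(e)]
        # Gaussian elimination over GF(2): reduce against the pivots found so far.
        while vec:
            b = vec.bit_length() - 1
            if b not in pivot_rows:
                pivot_rows[b] = vec
                basis.append(cycle)   # independent -> keep it
                break
            vec ^= pivot_rows[b]      # dependent on this pivot -> eliminate bit b
        if len(basis) == dim:
            break                     # basis is complete
    return basis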
I hope this helps.


Finding crossing numbers: Drawing a bipartite graph using Python turtle

At first I must say I am from a mathematics background and have very little knowledge of programming in Python. I am working on drawing complete bipartite graphs with the minimum number of crossings. For example: K(4,4), the complete bipartite graph with 8 vertices (grouped 4 - 4), given in the following diagram.
The graph has crossing number = 4. I would like to draw graphs like this with a higher number of vertices, for example K(9,9), the complete bipartite graph with 18 vertices. I have searched for different code and theories. I found that Python has a turtle package that can help me in this matter. I have planned to use an algorithm: I will start the turtle's journey from one vertex and stop at another vertex, collecting the coordinates of the path it traveled in between, and then repeat the process under the condition that it must not coincide with any other edge; if that happens, take the path with the minimum number of intersections.
Currently I am working on turtle and its commands. Any idea on how to solve this problem, any recommendation on more suitable software (paid or open source), help with the algorithm, or recommendations of books or research papers would be highly appreciated.
I have made this diagram using yEd (freely available), but for a higher number of vertices the manual work is very laborious.
Thanks in advance!
Sorry I'm not exactly answering the question re turtle graphics. Since you are a mathematician, don't you use LaTeX? If so, here's a little program to get you started:
from string import Template

DOC = Template("""
\\documentclass{article}
\\usepackage{tikz}
\\begin{document}
\\begin{tikzpicture}
$drawing\\end{tikzpicture}
\\end{document}
""")


class GraphDrawing(object):

    def __init__(self, node_count: int):
        self.node_count = node_count

    def Coord(self, i: int):
        c = self.node_count
        offset = 1 - c if c % 2 == 0 else -c
        return offset + 2 * i

    def Draw(self):
        r = ''
        for i in range(self.node_count):
            for j in range(self.node_count):
                r += f'\\draw ({self.Coord(i)},0) -- (0,{self.Coord(j)});\n'
        for i in range(self.node_count):
            r += f'\\filldraw ({self.Coord(i)},0) circle (3pt);\n'
            r += f'\\filldraw (0,{self.Coord(i)}) circle (3pt);\n'
        print(DOC.substitute({'drawing': r}))


GraphDrawing(8).Draw()
Run through pdflatex, this produces:
If you really need turtle graphics, it should be pretty straightforward to replace the string construction in this program with turtle moves.
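For illustration, a rough turtle version of the same straight-line drawing might look like the sketch below (my own code, using the same node placement as Coord above and an arbitrary scale factor):
import turtle

def draw_complete_bipartite(n, scale=30):
    """Draw K(n, n): one part on the x-axis, the other on the y-axis (sketch)."""
    offset = 1 - n if n % 2 == 0 else -n          # same spacing idea as Coord() above
    coords = [scale * (offset + 2 * i) for i in range(n)]
    t = turtle.Turtle()
    t.speed(0)
    t.hideturtle()
    for x in coords:                              # edges: straight lines between the two parts
        for y in coords:
            t.penup()
            t.goto(x, 0)
            t.pendown()
            t.goto(0, y)
    for c in coords:                              # vertices drawn as dots
        for pos in ((c, 0), (0, c)):
            t.penup()
            t.goto(pos)
            t.dot(6)
    turtle.done()

draw_complete_bipartite(4)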
As http://garden.irmacs.sfu.ca/op/the_crossing_number_of_the_complete_bipartite_graph says, it is believed that the kind of diagram you provided is always optimal: put one set on a vertical axis and the other on the horizontal axis, split as evenly as possible top/bottom and left/right, then connect the vertices by straight lines. It is highly unlikely that you will find better than that.
See https://mathworld.wolfram.com/ZarankiewiczsConjecture.html for more, including various small-n cases where the conjecture has been confirmed.

Anyone knows a more efficient way to run a pairwise comparison of hundreds of trajectories?

So I have two different files containing multiple trajectories in a square map (512x512 pixels). Each file contains information about the spatial position of each particle within a track/trajectory (X and Y coordinates) and which track/trajectory that spot belongs to (TRACK_ID).
My goal is to find a way to cluster similar trajectories between both files. I found a nice way to do this (distance clustering comparison), but the code is too slow. I was just wondering if someone has suggestions to make it faster.
My files look something like this:
The approach that I implemented finds similar trajectories based on something called the Fréchet distance (maybe not too relevant here). Below you can find the function that I wrote, but briefly this is the rationale:
group all the spots by track using the pandas.groupby function for file1 (growth_xml) and file2 (shrinkage_xml)
for each trajectory in growth_xml (loop) I compare with each trajectory in shrinkage_xml
if they pass the Fréchet distance criterion that I defined (an if statement), I save both tracks in a new table. You can see an additional filter condition that I called delay, but I guess that is not important to explain here.
so really simple:
def distance_clustering(growth_xml, shrinkage_xml):
    coords_g = pd.DataFrame()  # empty dataframes to save filtered tracks
    coords_s = pd.DataFrame()
    counter = 0  # initialize counter to count number of filtered tracks
    for track_g, param_g in growth_xml.groupby('TRACK_ID'):
        # define growing track as multi-point line object
        traj1 = [(x, y) for x, y in zip(param_g.POSITION_X.values, param_g.POSITION_Y.values)]
        for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):
            # define shrinking track as a second multi-point line object
            traj2 = [(x, y) for x, y in zip(param_s.POSITION_X.values, param_s.POSITION_Y.values)]
            # compute delay between shrinkage and growing ends to use as an extra filter
            delay = (param_s.FRAME.iloc[0] - param_g.FRAME.iloc[0])
            # keep track only if the Frechet distance is lower than 0.2 microns
            if frechetDist(traj1, traj2) < 0.2 and delay > 0:
                counter += 1
                param_g = param_g.assign(NEW_ID=np.ones(param_g.shape[0]) * counter)
                coords_g = pd.concat([coords_g, param_g])
                param_s = param_s.assign(NEW_ID=np.ones(param_s.shape[0]) * counter)
                coords_s = pd.concat([coords_s, param_s])
    coords_g.reset_index(drop=True, inplace=True)
    coords_s.reset_index(drop=True, inplace=True)
    return coords_g, coords_s
The main problem is that most of the time I have more than two thousand tracks (!!), and this pairwise combination takes forever. I'm wondering if there's a simple and more efficient way to do this. Perhaps by doing the pairwise combination in multiple small areas instead of the whole map? Not sure...
Have you tried building a (DeltaX, DeltaY) lookup table (LUT) for the pairwise distances? It will take a long time to compute the LUT once, or you can write it to a file and load it when the algorithm starts.
Then you only have to look up the correct entry to get the result instead of computing it each time.
You could also fit a polynomial regression for the distance calculation; it will be less precise but definitely faster.
Maybe not an outright answer, but it's been a while. Could you not segment the lines and use a minimum bounding box around each segment to assess similarities? I might be thinking of your problem the wrong way around; I'm not sure. Right now I'm trying to work with polygons from two different data sets and want to optimize the processing by first identifying the polygons in both geometries that overlap.
In your case, I think segments would leave you with some edge artifacts. Maybe look at this paper: https://drops.dagstuhl.de/opus/volltexte/2021/14879/pdf/OASIcs-ATMOS-2021-10.pdf or this paper (with Python code): https://www.austriaca.at/0xc1aa5576_0x003aba2b.pdf
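To make the bounding-box idea concrete: the Fréchet distance between two curves can never be smaller than the gap between their bounding boxes, so pairs whose boxes are farther apart than the 0.2 µm threshold can be rejected without ever calling frechetDist. A rough sketch, assuming the same growth_xml/shrinkage_xml dataframes and frechetDist function as in the question (candidate_pairs and the helper names are made up):
import numpy as np

def track_bbox(df):
    """Bounding box (xmin, xmax, ymin, ymax) of one track."""
    x, y = df.POSITION_X.values, df.POSITION_Y.values
    return x.min(), x.max(), y.min(), y.max()

def boxes_far_apart(b1, b2, threshold):
    """True if the gap between two boxes already exceeds threshold;
    then the Frechet distance must exceed it as well."""
    dx = max(0.0, max(b1[0], b2[0]) - min(b1[1], b2[1]))
    dy = max(0.0, max(b1[2], b2[2]) - min(b1[3], b2[3]))
    return np.hypot(dx, dy) > threshold

def candidate_pairs(growth_xml, shrinkage_xml, threshold=0.2):
    """Yield only the (growth, shrinkage) track pairs worth testing with frechetDist."""
    growth_tracks = {tid: grp for tid, grp in growth_xml.groupby('TRACK_ID')}
    shrink_tracks = {tid: grp for tid, grp in shrinkage_xml.groupby('TRACK_ID')}
    # Pre-compute boxes once per track instead of inside the double loop.
    growth_boxes = {tid: track_bbox(grp) for tid, grp in growth_tracks.items()}
    shrink_boxes = {tid: track_bbox(grp) for tid, grp in shrink_tracks.items()}
    for gid, g in growth_tracks.items():
        for sid, s in shrink_tracks.items():
            if boxes_far_apart(growth_boxes[gid], shrink_boxes[sid], threshold):
                continue  # cheap rejection: skip the expensive Frechet computation
            yield g, s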

What is the time efficient way to calculate Global efficiency for large networks?

I have a network with 30,000 nodes and over 40,000 edges. I tried to calculate the global efficiency of my network with networkx, but it is not time efficient. I was wondering what the best library is for calculating global efficiency on large networks like mine?
edit 29 Sept - corrected a typo where I had an indent that shouldn't be there
I looked at the networkx implementation and found an inefficiency (it considers each possible path independently, while there are ways to find many of the shortest paths all at once). I've improved the method.
Try this code:
def my_global_efficiency(G):
    '''author Joel C Miller
    https://stackoverflow.com/a/57032282/2966723
    '''
    n = len(G)
    denom = n*(n-1)
    if denom > 0:
        efficiency = 0
        for path_collection in nx.all_pairs_shortest_path_length(G):
            source = path_collection[0]
            for target in path_collection[1]:
                if target != source:
                    efficiency += 1./path_collection[1][target]
        return efficiency/denom
    else:
        return 0
Sample use:
import networkx as nx
G = nx.fast_gnp_random_graph(500,0.04)
nx.global_efficiency(G)
#answers will vary based on G
> 0.44650033400070577
my_global_efficiency(G)
> 0.44650033400070543
The difference in the last 3 digits is a rounding issue. I think it is caused by some of the sums being done in a different order.
This will run significantly faster. However, it may not be enough of an improvement for your purposes.
An alternate improvement if your graph is undirected would be to go to the networkx code, replace denom by half of its value and change the permutations to combinations. Currently it looks at each pair of nodes and finds the distance in both directions. If it's undirected, you only need to do this once. So the change to combinations gives a factor of 2 improvement.
Depending on your graph it's not clear to me which change will be faster. And these may still be too slow for your purposes.
You can speed up the process a bit more by getting an approximate value. To do this, instead of using nx.all_pairs_shortest_path_length, sample a large number of randomly chosen sources and find the distances of each of those specific nodes from all of the other nodes in G using nx.single_source_shortest_path_length. So if you take N=100 sources then there will be denom=N*(n-1) paths considered where n is the total number of nodes in G. This should give over a factor of 300 speed up from the improved my_global_efficiency.
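A rough sketch of that sampling approximation (the function name and defaults are my own; it returns an estimate, not the exact value):
import random
import networkx as nx

def approx_global_efficiency(G, num_sources=100, seed=None):
    """Approximate global efficiency by sampling source nodes (sketch)."""
    rng = random.Random(seed)
    nodes = list(G)
    n = len(nodes)
    if n < 2:
        return 0
    sources = rng.sample(nodes, min(num_sources, n))
    efficiency = 0.0
    for source in sources:
        # BFS from one source gives distances to all reachable nodes at once.
        lengths = nx.single_source_shortest_path_length(G, source)
        for target, d in lengths.items():
            if target != source:
                efficiency += 1.0 / d
    denom = len(sources) * (n - 1)
    return efficiency / denom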

Find third coordinate of (right) triangle given 2 coordinates and ray to third

I'll start explaining my problem from quite far back, so you can suggest completely different approaches and understand my custom objects and functions.
Over the years I have recorded many bicycle GPS tracks (.gpx). I decided to merge these (mostly overlapping) tracks into a large graph and merge/remove most of the track points. So far, I have managed to simplify the tracks (a feature in the gpxpy module that removes about 90% of the track points while preserving the positions of corners) and load them into my current program.
The current Python 3 program consists of loading the gpx tracks and optimising the graph with four scans. Here are the planned steps in my program:
Import points from gpx (working)
Join points located close to each other (working)
Merge edges under small angles (the problem is with this step)
Remove points on straights (angle between both edges is over 170 degrees). Looks like it is working.
Clean up by resetting the unique indexing of points (working)
Final checking of all edges in the graph.
In my program I started counting the steps from 0, because the first one is simply opening and parsing the file. Stack Overflow doesn't let me start numbering from 0.
To store the graph, I have a dictionary punktid (points in Estonian), where a punkt (point) object is stored at key uid/ui (unique ID). The unique ID is also stored in the point itself. The weight attribute is used in the 2nd and 3rd steps to find the average of points while taking into account earlier merges.
class punkt:
    def __init__(self, lo, la, idd, edge=set(), ele=0, wei=1):
        self.lng = lo      # Longitude
        self.lat = la      # Latitude
        self.uid = idd     # Unique ID
        self.edges = edge  # Set of neighbour nodes
        self.att = ele     # Elevation
        self.weight = wei  # Used to get weighted average
>>> punktid
{1: <__main__.punkt object at 0x0000006E9A9F7FD0>,
2: <__main__.punkt object at 0x0000006E9AADC470>, 3: ...}
>>> punktid[1].__dict__
{'weight': 90, 'uid': 9000, 'att': 21.09333333333333, 'lat': 59.41757, 'lng': 24.73907, 'edges': {1613, 1218, 1530}}
As you can see, there is a minor bug in the clean-up, where uid was not updated. I have fixed it by now, but I left it in so you can see the scale of the graph. The largest index in punktid was 1699/11787.
Getting to the core problem
Let's say I have 3 points: A, B and C (i, lyhem[2] and lyhem[0] respectively in the following code slice). A has a common edge with B and with C, but B and C might not have a common edge. C is closer to A than B is. To reduce the size of the graph, I want to move C closer to the edge AB (while respecting the weights of B and C) and redirect AB through C.
The solution I came up with is to find a temporary point D on AB which is closest to C, then find the weighted average of D and C, save it as E, and redirect all of C's edges and AB to it. Simplified figure - note that E=(C+D)/2 is not completely accurate. I cannot add more than two links, but I have 2 additional images illustrating my problem.
The biggest problem was finding the coordinates of D. I found a possible solution on the Mathematica site, but it contains a ± sign, because there are two possible coordinates; however, I know the line the point lies on. Anyway, I don't know how to implement it correctly, and my code has become quite messy:
# 2-nd run: Merge edges under small angles
for i in set(punktid.keys()):
    try:
        naabrid1 = frozenset(punktid[i].edges)  # naabrid / neighbours
        for e in naabrid1:
            t = set(naabrid1)
            t.remove(e)
            for u in t:
                try:
                    a = nurk_3(punktid[i], punktid[e], punktid[u])  # Returns angle EIU in degrees. 0<=a<=180
                    if a < 10:
                        de = ((punktid[i].lat-punktid[e].lat)**2+
                              ((punktid[i].lng-punktid[u].lng))*2 **2)  # distance i-e
                        du = ((punktid[i].lat-punktid[u].lat)**2+
                              ((punktid[i].lng-punktid[u].lng)*2) **2)  # distance i-u
                        b = radians(a)
                        if du < de:
                            lyhem = [u, du, e]  # lühem in English is shorter
                        else:                   # but currently it should be lähem/closer
                            lyhem = [e, de, u]
                        if sin(b)*lyhem[1] < r:
                            lr = abs(sin(b)*lyhem[1])
                            ml = tan(nurk_coor(punktid[i], punktid[lyhem[0]]))  # Lühema tõus / slope of the closer point (C)
                            mp = tan(nurk_coor(punktid[i], punktid[lyhem[2]]))  # Pikema / ...of the farther one (B)
                            mr = -1/ml  # Ristsirge / perpendicular ...BD
                            p1 = (punktid[i].lng+lyhem[1]*(1/(1+ml**2)**0.5), punktid[i].lat+lyhem[1]*(ml/(1+ml**2)**0.5))
                            p2 = (punktid[i].lng-lyhem[1]*(1/(1+ml**2)**0.5), punktid[i].lat-lyhem[1]*(ml/(1+ml**2)**0.5))
                            d1 = ((punktid[lyhem[0]].lat-p1[1])**2+
                                  ((punktid[lyhem[0]].lng-p1[0])*2)**2)**0.5  # distance i-e
                            d2 = ((punktid[lyhem[0]].lat-p2[1])**2+
                                  ((punktid[lyhem[0]].lng-p2[0])*2)**2)**0.5  # distance i-u
                            if d1 < d2:    # I experimented with one idea,
                                x = p1[0]  # but it made things worse.
                                y = p1[1]  # Originally I simply used p1 coordinates
                            else:
                                x = p2[0]
                                y = p2[1]
                            lo = punktid[lyhem[2]].weight*p2[0]  # Finding weighted average
                            la = punktid[lyhem[2]].weight*p2[1]
                            la += punktid[lyhem[0]].weight*punktid[lyhem[0]].lat
                            lo += punktid[lyhem[0]].weight*punktid[lyhem[0]].lng
                            kaal = punktid[lyhem[2]].weight+punktid[lyhem[0]].weight  # kaal = weight
                            c = (la/kaal, lo/kaal)
                            punktid[ui] = punkt(c[1], c[0], ui, punktid[lyhem[0]].edges, punktid[lyhem[0]].att, kaal)
                            punktid[i].edges.remove(lyhem[2])
                            punktid[lyhem[2]].edges.remove(i)
                            try:
                                for n in punktid[ui].edges:   # In all neighbours
                                    try:                      # remove link to old point
                                        punktid[n].edges.remove(lyhem[0])
                                    except KeyError:
                                        pass                  # if it doesn't link to the current one
                                    punktid[n].edges.add(ui)  # and add the new point
                                    if log:
                                        printf(punktid[n].edges, 'naabri '+str(n)+' edges')
                            except KeyError:  # if a neighbour itself has been removed
                                pass          # (in the same merge), ignore
                            punktid[ui].edges.add(lyhem[2])
                            punktid[lyhem[2]].edges.add(ui)
                            punktid.pop(lyhem[0])
                            ui += 1
                except KeyError:  # u has been removed
                    pass
    except KeyError:  # i has been removed
        pass
This is a code segment and it is likely not to run after copy-pasting because of missing variables/functions. The new point is calculated on lines 22 to 43, in the 3rd if-statement from the beginning, from if sin(b)*lyhem[1]<r to punktid[ui]=... After that, the old edges are redirected to the new node.
Stating the question clearly: how do I find a point on the ray AB, if the two coordinates of line segment AC and the angles at these points are known (angle ACB should be 90 degrees)? And how do I implement it in Python 3.5?
PS. (Meta) If somebody needs the full source, how could I provide it (uploading a single text file without registration)? Pastebin, or pasting (spamming) it here? If I upload it to another site, how do I provide the link, given that new users are limited to two?
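One note on the geometric subproblem, in case it helps: if D is the point on AB closest to C (as described above), it is the foot of the perpendicular from C onto the line through A and B, and it can be computed with a plain vector projection instead of the ± formula. A minimal sketch, treating lng/lat as flat planar coordinates (as the code above already does); the function and argument names are made up for illustration:
def foot_of_perpendicular(a, b, c):
    """Project point c onto the line through a and b.

    a, b, c are (x, y) tuples; returns the point D on line AB closest to c,
    so that the angle at D in triangle A-D-C is 90 degrees.
    """
    ax, ay = a
    bx, by = b
    cx, cy = c
    abx, aby = bx - ax, by - ay
    denom = abx * abx + aby * aby
    if denom == 0:
        return a  # A and B coincide; degenerate case
    t = ((cx - ax) * abx + (cy - ay) * aby) / denom
    # Clamp t to [0, 1] here if D must lie on the segment AB rather than the full line.
    return (ax + t * abx, ay + t * aby)

# Example: A=(0, 0), B=(4, 0), C=(1, 2) -> D=(1.0, 0.0)
print(foot_of_perpendicular((0, 0), (4, 0), (1, 2)))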

Modeling a graph in Python

I'm trying to solve a problem related to graphs in Python. Since it's a competitive programming problem, I'm not using any 3rd-party packages.
The problem presents a graph in the form of a 5 x 5 square grid.
A bot is assumed to be at a user-supplied position on the grid. The grid is indexed at (0,0) on the top left and (4,4) on the bottom right. Each cell in the grid is represented by one of the following 3 characters: 'b' (ASCII value 98) indicates the bot's current position, 'd' (ASCII value 100) indicates a dirty cell, and '-' (ASCII value 45) indicates a clean cell in the grid.
For example below is a sample grid where the bot is at 0 0:
b---d
-d--d
--dd-
--d--
----d
The goal is to clean all the cells in the grid, in minimum number of steps.
A step is defined as a task, where either
i) The bot changes its position
ii) The bot changes the state of the cell (from d to -)
Assume that initially the position marked as b need not be cleaned. The bot is allowed to move UP, DOWN, LEFT and RIGHT.
My approach
I've read a couple of tutorials on graphs and decided to model the graph as a 25 x 25 adjacency matrix, with 0 representing no path and 1 representing a path (since we can move only in 4 directions). Next, I decided to apply the Floyd-Warshall all-pairs shortest path algorithm to it and then sum up the values of the paths.
But I have a feeling that it won't work.
I'm in a dilemma, as the problem seems to be either one of the following:
i) A Minimum Spanning Tree (which I'm unable to do, as I'm not able to model and store the grid as a graph).
ii) A* Search (again a wild guess, but with the same problem: I'm not able to model the grid as a graph properly).
I'd be thankful if you could suggest a good approach for problems like these. Also, some hints and pseudocode about various forms of graph-based problems (or links to those) would be helpful. Thanks.
I think you're asking two questions here.
1. How do I represent this problem as a graph in Python?
As the robot moves around, he'll be moving from one dirty square to another, sometimes passing through some clean spaces along the way. Your job is to figure out the order in which to visit the dirty squares.
# Code is untested and may contain typos. :-)

# A list of the (x, y) coordinates of all of the dirty squares.
dirty_squares = [(0, 4), (1, 1)]  # etc.
n = len(dirty_squares)

# Everywhere after here, refer to dirty squares by their index
# into dirty_squares.
def compute_distance(i, j):
    return (abs(dirty_squares[i][0] - dirty_squares[j][0])
            + abs(dirty_squares[i][1] - dirty_squares[j][1]))

# distances[i][j] is the cost to move from dirty square i to
# dirty square j.
distances = []
for i in range(n):
    distances.append([compute_distance(i, j) for j in range(n)])

# The x, y coordinates of where the robot starts.
start_node = (0, 0)

# first_move_distances[i] is the cost to move from the robot's
# start location to dirty square i.
first_move_distances = [
    abs(start_node[0] - dirty_squares[i][0])
    + abs(start_node[1] - dirty_squares[i][1])
    for i in range(n)]

# order is a list of the dirty squares.
def cost(order):
    if not order:
        return 0  # Cleaning 0 dirty squares is free.
    return (first_move_distances[order[0]]
            + sum(distances[order[i]][order[i+1]]
                  for i in range(len(order)-1)))
Your goal is to find a way to reorder list(range(n)) that minimizes the cost.
2. How do I find the minimum number of moves to solve this problem?
As others have pointed out, the generalized form of this problem is intractable (NP-Hard). You have two pieces of information that help constrain the problem to make it tractable:
The graph is a grid.
There are at most 24 dirty squares.
I like your instinct to use A* here. It's often good for solving find-the-minimum-number-of-moves problems. However, A* requires a fair amount of code. I think you'd be better off going with a Branch-and-Bound approach (sometimes called Branch-and-Prune), which should be almost as efficient but is much easier to implement.
The idea is to start enumerating all possible solutions using a depth-first-search, like so:
# Each list represents a sequence of dirty nodes.
[]
[1]
[1, 2]
[1, 2, 3]
[1, 3]
[1, 3, 2]
[2]
[2, 1]
[2, 1, 3]
Every time you're about to recurse into a branch, check to see if that branch is more expensive than the cheapest solution found so far. If so, you can skip the whole branch.
If that's not efficient enough, add a function to calculate a lower bound on the remaining cost. Then if cost([2]) + lower_bound(set([1, 3])) is more expensive than the cheapest solution found so far, you can skip the whole branch. The tighter lower_bound() is, the more branches you can skip.
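A bare-bones sketch of that branch-and-bound search, reusing the distances and first_move_distances structures defined above (the function name and the default lower bound of 0 are my own choices; with a lower bound of 0 the only pruning is against the best cost found so far):
import math

def branch_and_bound(n, first_move_distances, distances,
                     lower_bound=lambda remaining, last: 0):
    """Find the cheapest visiting order of dirty squares 0..n-1 (sketch).

    lower_bound(remaining, last) must never overestimate the true remaining cost;
    a tighter bound prunes more branches.
    """
    best_cost = math.inf
    best_order = []

    def recurse(order, cost_so_far, remaining):
        nonlocal best_cost, best_order
        if not remaining:
            if cost_so_far < best_cost:
                best_cost, best_order = cost_so_far, list(order)
            return
        last = order[-1] if order else None
        if cost_so_far + lower_bound(remaining, last) >= best_cost:
            return  # prune: this whole branch cannot beat the best solution so far
        for i in sorted(remaining):
            step = first_move_distances[i] if last is None else distances[last][i]
            order.append(i)
            remaining.remove(i)
            recurse(order, cost_so_far + step, remaining)
            remaining.add(i)
            order.pop()

    recurse([], 0, set(range(n)))
    return best_cost, best_order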
Let's say V = {v | v = b or v = d}, and build the complete graph G(V, E). You can calculate the cost of each edge in E with a time complexity of O(n^2). Afterwards the problem becomes exactly the same as: start at a specified vertex, and find a shortest path of G which covers V.
This has been known as the Traveling Salesman Problem (TSP) since 1832.
The problem can certainly be stored as a graph. The cost between nodes (dirty cells) is their Manhattan distance. Ignore the cost of cleaning cells, because that total cost will be the same no matter what path is taken.
This problem looks to me like the Minimum Rectilinear Steiner Tree problem. Unfortunately, that problem is NP-hard, so you'll need to come up with an approximation (a Minimum Spanning Tree based on Manhattan distance), if I am correct.
