Solving a graph issue with Python


I have a problem that I would like to solve with Python, but unfortunately I don't know much about graphs. I found a library that seems very suitable for this relatively simple task, networkx, but I am having trouble getting it to do exactly what I want, which should be fairly simple.
I have a list of nodes, which can have different types, and two "classes" of neighbors, upwards and downwards. The task is to find paths between two target nodes, with the following constraints:
only nodes of specific types can be traversed, i.e. if the starting nodes are of type x, every other node in the path has to be of another type, y or z
if a node has type y, it can be passed through only once
if a node has type z, it can be passed through twice
if a node of type z is visited, the exit has to be through the other class of neighbor, i.e. if it's entered from upwards, the exit has to be downwards
So, I tried some experimentation but, as I said, I have struggled. First, I am unsure what type of graph this actually represents. It's not directed, since it doesn't matter whether you go from node 1 to node 2 or from node 2 to node 1 (except in that last scenario, which complicates things a bit). This means I can't simply create a plain undirected graph, since I have to keep that constraint in mind. Second, I have to traverse those nodes while allowing only nodes of specific types on the path. Also, when the last scenario happens, I have to keep track of the entry and exit class/direction, which makes the graph partly directed.
Here is some sample mockup code:
```python
import networkx as nx

G = nx.DiGraph()
G.add_node(1, type=1)
G.add_node(2, type=2)
G.add_node(3, type=3)
G.add_edge(1, 2, side="up")
G.add_edge(1, 3, side="up")
G.add_edge(2, 1, side="down")
G.add_edge(2, 3, side="down")

for path in nx.all_simple_paths(G, 1, 3):
    print(path)
```
The output is fairly nice, but I need these constraints. So, do you have any suggestions for how I can implement them, some more guidance on understanding this type of problem, or a different approach or library for it? Maybe a simple dictionary-based algorithm would fit this need?
Thanks!

You might be able to use the all_simple_paths() function for your problem if you construct your graph differently. Simple paths are those with no repeated nodes. So, for each of your constraints, here is a suggestion for building the graph so that you can run that algorithm unmodified.
only nodes of specific types can be traversed, i.e. if the starting nodes are of type x, every other node in the path has to be of another type, y or z
Given a starting node n, remove all other nodes with that type before you find paths.
if a node has a type y, it can be passed through only once
This is the definition of simple paths so it is automatically satisfied.
if a node has type z, it can be passed through twice
For every node n of type z add a new node n2 with the same edges as those pointing to and from n.
if a node of type z is visited, the exit has to be through the other class of neighbor, i.e. if it's entered from upwards, the exit has to be downwards
If the edges are directed as you propose, this could be satisfied by making sure the edges at each z node all follow the same convention, e.g. incoming edges for up and outgoing edges for down.
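The graph rewrites suggested above can be sketched with networkx roughly as follows. This is a minimal sketch under assumptions that are not in the answer: G is an nx.DiGraph, every node has a "type" attribute, the edge directions already encode the up/down rule for type-z nodes, and the helper name constrained_paths and the ("twin", n) node labels are hypothetical. Because a path may use a z node through either copy, the sketch maps twins back to their originals and drops the resulting duplicates.

```python
import networkx as nx


def constrained_paths(G, source, target, twin_type="z"):
    """Rewrite G as suggested, then run all_simple_paths unmodified.

    Assumes G is an nx.DiGraph, every node has a "type" attribute, and edge
    directions already encode the up/down rule for twin_type nodes.
    """
    H = G.copy()
    start_type = G.nodes[source]["type"]
    # Rule 1: drop every other node that shares the endpoints' type.
    H.remove_nodes_from([n for n, t in G.nodes(data="type")
                         if t == start_type and n not in (source, target)])
    base = H.copy()  # snapshot, so twins copy only the pruned graph's edges
    original = {}    # twin label -> the node it duplicates
    # Rule 3: give each twin_type node a twin with the same in/out edges.
    for n, t in base.nodes(data="type"):
        if t == twin_type:
            twin = ("twin", n)  # hypothetical label for the copy
            original[twin] = n
            H.add_node(twin, type=t)
            H.add_edges_from((u, twin, d) for u, _, d in base.in_edges(n, data=True))
            H.add_edges_from((twin, v, d) for _, v, d in base.out_edges(n, data=True))
    # Rule 2 comes for free: simple paths never repeat a node. Map twins back
    # to the real node and drop paths that differ only in which copy was used.
    mapped = (tuple(original.get(v, v) for v in p)
              for p in nx.all_simple_paths(H, source, target))
    return list(dict.fromkeys(mapped))


# Tiny example: 1, 4 and 5 are type x, 2 is type y, 3 is type z.
G = nx.DiGraph()
G.add_nodes_from([(1, {"type": "x"}), (2, {"type": "y"}), (3, {"type": "z"}),
                  (4, {"type": "x"}), (5, {"type": "x"})])
G.add_edges_from([(1, 3), (3, 2), (2, 3), (3, 4), (1, 5), (5, 4)])
result = constrained_paths(G, 1, 4)
print(result)  # contains (1, 3, 4) and (1, 3, 2, 3, 4); the route via 5 is excluded
```

The twin trick is what lets the unmodified simple-path search pass through node 3 twice; the route through node 5 disappears because 5 shares the endpoints' type.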

The best way to do this, I think, is to calculate all valid paths of length at most k between the source S and every other node, then use that information to calculate all valid paths of length at most k+1. You then repeat this until you reach a fixed point where no paths are modified.
In practice, this means you should set up a list of paths at each node. At each step, you take each node U in turn and look at the paths that, in the previous step, terminated at some neighbour V of U. If any of those paths can be extended to be a new, distinct path to U, extend it and add it to U's list.
If when you execute a step you find no new paths, that's your termination state. You can then check the list of paths at the target node T.
Pseudocode (in a very loose C# formalism):
```csharp
var paths = graph.nodes.ToDictionary(node => node, node => new List<List<node>>());
paths[S].Add(new List<node> {S}); // The trivial path that'll start us off.

bool notAFixedPoint = true;
while (notAFixedPoint)
{
    notAFixedPoint = false; // Assume we're not gonna find any new paths.
    foreach (var node in graph)
    {
        var pathsToNode = paths[node];
        foreach (var neighbour in node.Neighbours)
        {
            var pathsToNeighbour = paths[neighbour];
            // ExtendPaths is where all the logic about how to recognise a valid path goes.
            var newPathsToNode = ExtendPaths(pathsToNeighbour, node);
            // The use of "Except" here is for expository purposes. It wouldn't actually work,
            // because collections in most languages are compared by reference rather than by value.
            if (newPathsToNode.Except(pathsToNode).IsNotEmpty())
            {
                // We've found some new paths, so we can't terminate yet.
                notAFixedPoint = true;
                pathsToNode.AddMany(newPathsToNode);
            }
        }
    }
}
return paths[T];
```
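For comparison, here is a minimal Python rendering of the fixed-point idea. It is a sketch, not the answerer's code: paths are tuples, which compare by value, so plain set difference replaces the Except call the pseudocode warns about; the can_extend predicate plays the role of ExtendPaths; and the visit limits and type names in the example rule are assumptions taken from the question. Unlike the pseudocode, it may reuse paths found earlier in the same sweep instead of strictly the previous step, which reaches the same fixed point. The example rule does not model the up/down entry/exit constraint.

```python
def fixed_point_paths(neighbours, source, target, can_extend):
    """Grow paths from source until a full sweep adds nothing new.

    neighbours: dict mapping every node to its adjacent nodes.
    can_extend(path, node): the validity rules (the ExtendPaths role); it
    must eventually reject every extension, or the loop never terminates.
    """
    paths = {node: set() for node in neighbours}
    paths[source].add((source,))  # the trivial path that starts us off
    changed = True
    while changed:
        changed = False  # assume this sweep finds nothing new
        for node in neighbours:
            for nb in neighbours[node]:
                # Tuples compare by value, so set difference finds new paths.
                new = {p + (node,) for p in paths[nb] if can_extend(p, node)}
                new -= paths[node]
                if new:
                    changed = True
                    paths[node] |= new
    return paths[target]


VISIT_LIMITS = {"y": 1, "z": 2}  # from the question; other types default to 1


def make_rule(types, target):
    """Hypothetical ExtendPaths rule for the question's type constraints."""
    def can_extend(path, node):
        if path[-1] == target:
            return False  # a path that reached the target is finished
        if types[node] == types[path[0]]:
            return node == target  # the endpoints' type is allowed only at the target
        return path.count(node) < VISIT_LIMITS.get(types[node], 1)
    return can_extend


neighbours = {1: [3], 2: [3], 3: [1, 2, 4], 4: [3]}
types = {1: "x", 2: "y", 3: "z", 4: "x"}
found = fixed_point_paths(neighbours, 1, 4, make_rule(types, 4))
print(sorted(found, key=len))  # [(1, 3, 4), (1, 3, 2, 3, 4)]
```

Termination relies entirely on the rule: because every node type has a bounded visit count, path lengths are bounded and a sweep eventually adds nothing.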

This looks like an optimization problem to me -- look up "Traveling Salesman" for a classic example that is somewhat close to what you want to do.
I've had good luck using "simulated annealing" for optimization problems, but you might also take a look at "genetic algorithms".

Related

Graphs - what is the functional difference between "pure" Dijkstra and my hybrid BFS-Dijkstra solution?

I was supposed to make a Dijkstra implementation (or at least that was the plan), but on reviewing what I've done, it looks more like a breadth-first search. I wonder, though, whether I have come across a way to do both things at the same time?
Essentially by using an OOP approach I can perform a BFS that also preserves knowledge of the shortest weighted path, thereby eliminating the need to determine during the search process whether some node has a lower cost than its alternatives like Dijkstra does.
I've searched for clues as to why a more "pure" implementation of Dijkstra should be faster than this, most prominently the answers in these two threads:
What is difference between BFS and Dijkstra's algorithms when looking for shortest path?
Why use Dijkstra's Algorithm if Breadth First Search (BFS) can do the same thing faster?
I haven't seen anything in them that answers what I'm wondering, though.
My approach is essentially this:
Start at whatever node we require, this becomes a "path"
Step out to each adjacent node, and each of these steps create a new path that we store in a path collection
Every path contains an ordered list of which nodes it has visited from the starting node all the way to wherever it is, as well as the associated weight/cost
Mark the "parent" path as closed (stop iterating on it)
Select the first open path in the path collection and repeat the above
When there are no more open paths, delete all paths that didn't make it to the destination node
Compare the lengths and return the path with the lowest weight
I struggle to see where the performance difference between this and pure Dijkstra would come from. Dijkstra would still have to iterate over all possible paths, right? So I perform exactly the same number of steps with this implementation as if I change returnNextOpenPath() (serving the function of a normal queue, just not implemented as one) to a more priority queue-looking returnShortestOpenPath().
There's presumably a marginal performance penalty at the end where I examine all the collected, non-destroyed paths before I can print a result instead of just popping from a queue - but aside from that, am I just not seeing where else my implementation would also be worse than "pure" Dijkstra?
I don't think it matters, but in case it does: The actual code I have for this is gigantic so my first instinct is to hold off on posting it for now, but here's a stripped down version of it.
```python
from copy import deepcopy


class DijkstraNode:
    def getNeighbors(self):
        # returns a dict of all adjacent nodes and their cost
        ...

    def weightedDistanceToNeighbor(self, neighbor):
        # returns the cost associated with traversing from the current node to the chosen neighbor node
        return int(self.getNeighbors()[neighbor])


class DijkstraEdge:
    def __init__(self, startingNode: DijkstraNode, destinationNode: DijkstraNode):
        self.start = startingNode
        self.goal = destinationNode

    def weightedLength(self):
        return self.start.weightedDistanceToNeighbor(self.goal)


class DijkstraPath:
    def __init__(self, startingNode: DijkstraNode, destinationNode: DijkstraNode):
        self.visitedNodes: list[DijkstraNode] = [startingNode]
        self.goal = destinationNode
        self.prevNode = startingNode
        self.edges: list[DijkstraEdge] = []
        self.closed = False
        self.valid = False

    def addNode(self, node: DijkstraNode):
        # if the node we're inspecting is new, add it to all the lists
        if node not in self.visitedNodes:
            self.edges.append(DijkstraEdge(self.prevNode, node))
            self.visitedNodes.append(node)
            self.prevNode = node
            # if the node we just added is our destination, stop iterating on this path
            if node == self.goal:
                self.closed = True
                self.valid = True


class DijkstraTree:
    def bruteforceAllPaths(self, startingNode: DijkstraNode, destinationNode: DijkstraNode):
        self.pathlist = []
        self.pathlist.append(DijkstraPath(startingNode, destinationNode))
        cn: DijkstraNode
        # iterate over all open paths
        while self.hasOpenPaths():
            currentPath = self.returnNextOpenPath()
            neighbors: dict = currentPath.lastNode().getNeighbors()
            for c in neighbors:
                cn = self.returnNode(c)
                # copy the current path
                tmpPath = deepcopy(currentPath)
                # add the child node onto the newly made path
                tmpPath.addNode(cn)
                # add the new path to pathlist
                if tmpPath.isOpen() or tmpPath.isValid():
                    self.pathlist.append(tmpPath)
            # then we close the parent path
            currentPath.close()
```
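For comparison, here is a minimal textbook Dijkstra sketch using Python's heapq. The practical difference from enumerating paths is that each node is settled at most once, and any partial path that reaches an already-settled node is discarded instead of being copied and extended, so the work grows with the number of edges rather than with the number of simple paths. The dict-of-dicts graph format and the function name dijkstra are assumptions for illustration, not part of the question's classes.

```python
import heapq
from itertools import count


def dijkstra(graph, start, goal):
    """graph: dict mapping node -> {neighbour: non-negative cost}."""
    dist = {start: 0}
    prev = {}
    tie = count()  # tie-breaker so nodes themselves never need comparing
    queue = [(0, next(tie), start)]
    settled = set()
    while queue:
        d, _, node = heapq.heappop(queue)
        if node in settled:
            continue  # stale entry: a cheaper route was already settled
        settled.add(node)
        if node == goal:
            break  # a popped node's distance is final, so stop early
        for nb, cost in graph[node].items():
            new_d = d + cost
            if new_d < dist.get(nb, float("inf")):
                dist[nb] = new_d
                prev[nb] = node
                heapq.heappush(queue, (new_d, next(tie), nb))
    if goal not in settled:
        return None  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(dijkstra(graph, "A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```

The early stop and the "settled" check are only valid because edge costs are non-negative; with negative costs a different algorithm is needed.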

While traversing an ancestor tree starting from two target nodes, can I mark nodes I've seen in recursive calls to find their lowest common ancestor?

I'm solving a problem where we're given a tree, its root and two target nodes (descendantOne and descendantTwo) within the tree.
I am asked to return the lowest common ancestor of the two target nodes.
However, we are also told that our tree is an instance of AncestralTree, which is given by:
class AncestralTree:
def __init__(self, name):
self.name = name
self.ancestor = None
i.e. for every node in the tree, we only have pointers going upwards to the parents (as opposed to a normal tree, which has pointers from parent to child).
My idea for solving this problem is to start from both target nodes and move upwards, marking each node that we visit. At some point we are bound to visit a node twice, and the first time we do, that node is our lowest common ancestor!
Here is my code:
```python
def getYoungestCommonAncestor(topAncestor, descendantOne, descendantTwo):
    lowestCommonAncestor = None

    def checkAncestors(topAncestor, descendantOne, descendantTwo, descendantOneSeen, descendantTwoSeen):
        if descendantOneSeen and descendantTwoSeen:
            return descendantOne
        else:
            return None

    while not lowestCommonAncestor:
        lowestCommonAncestor = checkAncestors(topAncestor, descendantOne.ancestor, descendantTwo, True, False)  # **
        if lowestCommonAncestor:
            break
        lowestCommonAncestor = checkAncestors(topAncestor, descendantOne, descendantTwo.ancestor, False, True)  # **
        if descendantOne.ancestor == topAncestor:
            pass
        else:
            descendantOne = descendantOne.ancestor
        if descendantTwo.ancestor == topAncestor:
            pass
        else:
            descendantTwo = descendantTwo.ancestor
    return lowestCommonAncestor
```
I have marked the two recursive calls in my code with ** because I believe this is where the issue is.
When I make these calls, e.g. say we have seen descendantOne, the call for descendantTwo automatically sets descendantOneSeen to False. So descendantOneSeen and descendantTwoSeen are never both true.
And when I run the above code, I do get an infinite loop, and I do see why.
Is there any way to amend my code to achieve what I want WITHOUT using global variables?

Indeed, it will not work like that, as descendantOneSeen and descendantTwoSeen are never both true. But even if you fixed that part of the logic, the two nodes may be at very different distances from their lowest common ancestor, so walking up in lockstep and comparing the current nodes will not find it; you need a different algorithm.
One way is to walk towards the top of the tree in tandem as you did, but when a reference reaches the top, continue with that reference from the other starting node. Once both references have made this jump back down, they will have visited exactly the same number of nodes at the moment they meet each other at the lowest common ancestor.
This leads to a very simple algorithm:
```python
def getYoungestCommonAncestor(topAncestor, descendantOne, descendantTwo):
    nodeOne = descendantOne
    nodeTwo = descendantTwo
    while nodeOne is not nodeTwo:
        nodeOne = descendantTwo if nodeOne is topAncestor else nodeOne.ancestor
        nodeTwo = descendantOne if nodeTwo is topAncestor else nodeTwo.ancestor
    return nodeOne
```
This may look dodgy, as it looks like a matter of luck that these node references will ever be equal. But both nodeOne and nodeTwo references will walk from both starting points (descendantOne and descendantTwo) -- it is just the order in which they do this that is inverted. But that still means they will visit the same number of nodes by the time they visit the common ancestor the second time.
Here is your example graph, where the two starting nodes are C and I. I have removed some of the nodes, as they are unreachable from these two nodes, so they don't play a role:
So the idea is that we start the traversal at nodes I and C. By applying the rule that when a traversal reaches the root, it continues from the other starting node, we see that from I we will first follow the red edges and then the green one, while the traversal that starts from C will first follow the green edge and then the red edges.
From this it is clear that these two traversals will take an equal number of steps to visit both the green and the red edges (just in a different order) and so they will reach node A at the same time when they each visit it for the second time.
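To check the tandem walk on a concrete tree, here is a small self-contained run. Since the original figure is not available, the tree and its node names are hypothetical: A is the root with children B and C, and I's chain of ancestors is H, D, B, A.

```python
class AncestralTree:
    def __init__(self, name):
        self.name = name
        self.ancestor = None


def getYoungestCommonAncestor(topAncestor, descendantOne, descendantTwo):
    nodeOne, nodeTwo = descendantOne, descendantTwo
    while nodeOne is not nodeTwo:
        # At the root, restart from the *other* starting node.
        nodeOne = descendantTwo if nodeOne is topAncestor else nodeOne.ancestor
        nodeTwo = descendantOne if nodeTwo is topAncestor else nodeTwo.ancestor
    return nodeOne


# Hypothetical tree built from (child, parent) pairs.
nodes = {name: AncestralTree(name) for name in "ABCDHI"}
for child, parent in [("B", "A"), ("C", "A"), ("D", "B"), ("H", "D"), ("I", "H")]:
    nodes[child].ancestor = nodes[parent]

print(getYoungestCommonAncestor(nodes["A"], nodes["I"], nodes["C"]).name)  # A
print(getYoungestCommonAncestor(nodes["A"], nodes["I"], nodes["D"]).name)  # D
```

The second call covers the case where one starting node is itself an ancestor of the other; the references still meet after the same total number of steps.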

KeyError on Generator

So close, and yet so far.
I'm not sure what happened but a generator script that was working for me has suddenly started throwing KeyErrors. I am assigning properties to networkx nodes according to a category I am giving each node.
Each node looks like this...
539943797.0: {'category': 'perimeter'}
and I define the sizes with a variable like this...
node_sizes = {'core':500, 'perimeter':50}
and the actual node draw code looks like this, with the generator in place...
```python
nx.draw_networkx_nodes(G, graph_pos,
                       node_size=[node_sizes[G.nodes[node]['category']] for node in G],
                       alpha=node_alpha,
                       node_color=[node_colors[G.nodes[node]['category']] for node in G])
```
The problem is, the above generator code (which was working not so long ago) gives me a KeyError:'category' error when I run it.
However, calling this...
node_sizes[G.nodes[539943797.0]['category']]
gets me the value of 50, as I would expect: it pulls the category 'perimeter' for 539943797.0, and the size for that is 50. So far so good. I'm not sure what I'm doing wrong here. I was hoping another few sets of eyes on this could give me a better idea.
I suspect I'm doing something wrong in how I'm reading the category, or setting it
(I set it here...)
```python
for node in graph[1]:
    G.add_node(node)
    G.nodes[node]['category'] = 'perimeter'
```
If I need to put up more of the code to be better understood I'll try and trim things up and put it out here. Hopefully I've supplied enough.
thanks,

In your comment you say this started happening after you started adding edges. I think that's where the problem is. You'll get this error if even one node doesn't have 'category' defined, and I think adding edges is adding a few nodes that don't have it. The first test is to run this right before the line where you get your error:
```python
for node in G.nodes():
    if 'category' not in G.nodes[node]:
        print(node)
```
I bet that you'll see that most of your nodes are okay, but a few aren't.
If I have the code
```python
import networkx as nx

G = nx.Graph()
G.add_node(1)
G.nodes[1]['category'] = 'perimeter'
for node in G.nodes():
    if 'category' not in G.nodes[node]:
        print(node)
```
I get no output. All (1) of the nodes has 'category' defined.
However, if I then try
```python
G.add_edge(1, 2)
for node in G.nodes():
    if 'category' not in G.nodes[node]:
        print(node)
```
It outputs
2
This is because when I added the edge, networkx saw a node that wasn't in G yet and assumed I wanted to add that node too. So now a new node 2 has been added, but nothing tells networkx that it should define 'category' for it, so it doesn't.
From what you've described this is almost certainly what is happening. To check this, before adding any edge you can check whether the two nodes are in the graph already. Or, if your code adds a huge number of edges at once, just check whether G.order() is the same before and after. Once you figure out why it's doing this, presumably you can decide what you want to do with those nodes.
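A minimal sketch of the check and one possible fix, assuming networkx 2.x. G.nodes(data="category") reports None for nodes that lack the attribute, which finds the implicitly created nodes; assigning them "perimeter" as a default is only an assumption, and you may prefer to remove those nodes or fix the code that adds the edges.

```python
import networkx as nx

G = nx.Graph()
G.add_node(1, category="perimeter")
G.add_edge(1, 2)  # node 2 is created implicitly, with no attributes

# nodes(data="category") yields None for nodes that lack the attribute
missing = [n for n, cat in G.nodes(data="category") if cat is None]
print(missing)  # [2]

# One option: give the implicitly created nodes a default category
# ("perimeter" is an assumption here; pick whatever fits your data).
nx.set_node_attributes(G, {n: "perimeter" for n in missing}, "category")

node_sizes = {"core": 500, "perimeter": 50}
sizes = [node_sizes[G.nodes[n]["category"]] for n in G]  # no KeyError now
```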
If this doesn't find something that's an issue, then you'll need to post more code, so that we have something that reproduces your error.

Generator comprehensions for look ahead algorithm in Python

Yesterday I asked for help on how to look ahead in Python. My problem was to iterate through all the possible edges to add to a network and, for each possible network with an edge added, look at all the possible edges to add, and so on (to depth n). At the end, compare all networks produced at depth n with the root network, and actually add the best first step (the first edge that leads to the best result at depth n). When that first edge is added, do the depth search again, and so on until a good network is found. It works like a moving window (see "lookahead algorithm in Python" for a more thorough explanation of the problem).
Unfortunately for the clarity of the question, the code requires igraph, which is available here: http://igraph.org/python/#downloads
Peter Gibson promptly answered, guiding me through the logic of generator comprehensions, and helped me produce this code:
```python
from igraph import *  # http://igraph.org/python/

def delta(g, gOld):  # evaluates the improvement of the graph from one generation to the next
    print "delta"
    return g.diameter() - gOld.diameter()

def possible_new_edges(G):
    print "Possible new edges"
    allPossibleNewEdges = []
    for n1 in range(50):
        for n2 in range(n1, 50):
            if G.are_connected(G.vs[n1], G.vs[n2]) == False and n1 != n2:
                allPossibleNewEdges.append(G.vs[n1], G.vs[n2])
    return allPossibleNewEdges

def add_optimal_edge(graph, n=3):
    print "Add optimal edge"
    paths = [[graph]]  # start off with just one graph path, which contains a single graph
    for generation in range(n):
        print "Generation:", generation
        # path[-1] is the latest graph for each generation
        paths = (path + path[-1].add_edge(e) for path in paths for e in path[-1].possible_new_edges())
    # select best path by comparison of final generation against original graph
    best = max(paths, lambda path: comp_delta(path[-1], graph))
    return best[1]  # returns the first generation graph

graph = Graph.Erdos_Renyi(50, .15, directed=False, loops=False)  # create a random root graph of density 0.15
add_optimal_edge(graph)
```
The generator is concise and elegant. Let's say a little too elegant for my unwieldy Python style, and there are a few things I need to understand to make it work. The code runs with this error:
return best[1] # returns the first generation graph
TypeError: 'generator' object has no attribute '__getitem__'
I think it's because of a wrong use of functions with the generator...
So, my question is: what's the proper way to use functions in such a generator? I need to call possible_new_edges() and delta(), what do I need to pass them (the graph?) and how to do so?
Thanks so much!

Trying the code from your gist, I found several fairly minor errors that were preventing the code from running. I've included fixed code below. However, this doesn't really solve the problem. That's because your algorithm needs to consider a truly vast number of potential graphs, which it cannot do in any sort of reasonable time.
In my testing, looking one step ahead works perfectly well, but looking two steps ahead takes a very long time (tens of minutes at least; I never waited for it to finish) and three steps will probably take days. This is because your possible_new_edges function returns more than a thousand possible edges. Each one is added to a copy of your initial graph. Then, for each succeeding step, the process repeats on each of the expanded graphs from the previous step. This results in an exponential explosion of graphs, as you have to evaluate something on the order of 1000**n graphs to see which is the best.
So, to get a practical result you'll still need to change things. I don't know graph theory or your problem domain well enough to suggest what.
Anyway, here are the changed parts of the "working" code (with the original comments removed so that my notes on what I've changed are more clear):
```python
def possible_new_edges(G):
    print("Possible new edges")
    allPossibleNewEdges = []
    for n1 in range(50):
        for n2 in range(n1, 50):
            if G.are_connected(G.vs[n1], G.vs[n2]) == False and n1 != n2:
                allPossibleNewEdges.append((G.vs[n1], G.vs[n2]))  # append a tuple
    return allPossibleNewEdges

def add_optimal_edge(graph, n=3):
    print("Add optimal edge")
    paths = [[graph]]
    for generation in range(n):
        print("Generation:", generation)
        paths = (path + [path[-1] + e]  # use + to add an edge, and to extend the path
                 for path in paths
                 for e in possible_new_edges(path[-1]))  # call this function properly
    best = max(paths, key=lambda path: comp_delta(path[-1], graph))
    return best[1]
```
If the generator expression in the loop confuses you, it might help to replace it with a list comprehension (by replacing the outermost parentheses with square brackets). You can then inspect the paths list inside the loop (and do things like print its len()). The logic of the code is the same either way, the generator expressions just put off computing the expanded results until the max function starts iterating over paths in order to find the best scoring one.
Using list comprehensions will certainly work for n=1, but you may start running out of memory with n=2 (and you certainly will for n=3 or more). The version above won't run you out of memory (as the generator expressions only expand O(n) graphs at a time), but that doesn't mean it runs fast enough to inspect billions of graphs in a sensible amount of time.
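The path-expansion logic itself can be tried without igraph. The sketch below is a hypothetical stand-in, not the gist's code: a graph is just a frozenset of (i, j) edge pairs, and score plays the role of delta/comp_delta. It uses the same generator-expression pattern, so no candidate graph is built until max() starts consuming paths.

```python
from itertools import combinations


def possible_new_edges(edges, n_nodes):
    # every pair of distinct nodes that is not yet connected
    return [e for e in combinations(range(n_nodes), 2) if e not in edges]


def add_optimal_edge(edges, n_nodes, score, depth=2):
    paths = [[edges]]
    for _ in range(depth):
        # Lazily extend every path by one more edge. The outermost `paths`
        # is evaluated immediately, so each generator wraps the previous one.
        paths = (path + [path[-1] | {e}]
                 for path in paths
                 for e in possible_new_edges(path[-1], n_nodes))
    best = max(paths, key=lambda path: score(path[-1]))
    return best[1]  # the graph after the best *first* edge


def touches_node_3(edges):  # hypothetical score: number of edges incident to node 3
    return sum(1 for e in edges if 3 in e)


start = frozenset({(0, 1)})
print(sorted(add_optimal_edge(start, 4, touches_node_3)))  # [(0, 1), (0, 3)]
```

Even on 4 nodes, depth 2 already yields 5 * 4 = 20 candidate paths; each extra step multiplies the count by roughly the number of missing edges, which is the explosion described above.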

Pygraph - path between two nodes with specific weight

I want to find a path in a graph that connects two nodes and does not use the same node twice. The sum of the weights of the edges must be within a certain range.
I need to implement this in pygraph. I'm not sure if there is already an algorithm that I can use for this purpose or not. What's the best way to achieve this?

EDIT: I misunderstood the question initially. I've corrected my answer. This functionality isn't built into the pygraphlib library, but you can implement it fairly easily. Consider something like this, which gets the shortest path, checks whether its cost is in a predefined range, then removes the edge with the smallest weight, computes the new shortest path, and repeats.
```python
from pygraphlib import pygraph, algo

edges = [(1,2),(2,3),(3,4),(4,6),(6,7),(3,5),(4,5),(7,1),(2,5),(5,7)]
graph = pygraph.from_list(edges)

def pathsInCostRange(graph, startNode, endNode, minCost, maxCost):  # hypothetical wrapper so `return` is valid
    pathList = []
    shortestPath = algo.shortest_path(graph, startNode, endNode)
    cost = shortestPath[len(shortestPath)-1][1]
    while cost <= maxCost:
        if cost >= minCost:
            pathList.append(shortestPath)
        minEdgeWt = float('inf')
        for i in range(len(shortestPath)-1):
            if shortestPath[i+1][1] - shortestPath[i][1] < minEdgeWt:
                minEdgeWt = shortestPath[i+1][1] - shortestPath[i][1]
                edgeNodes = (shortestPath[i][0], shortestPath[i+1][0])
        # Not sure of the syntax here, edgeNodes is a tuple, and hide_edge requires an edge.
        graph.hide_edge(edgeNodes)
        shortestPath = algo.shortest_path(graph, startNode, endNode)
        cost = shortestPath[len(shortestPath)-1][1]
    return pathList
```
Note that I couldn't find a copy of pygraphlib, since it is no longer under development, so I couldn't test the above code. It should work, modulo the syntax uncertainty. Also, if possible, I would recommend using networkx for any kind of graph manipulation in Python, as it is more complete, under active development, and better documented than pygraphlib. Just a suggestion.
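Following the networkx suggestion, the question can also be answered directly, if expensively, by enumerating simple paths and filtering them by total weight. This is a minimal sketch, assuming edge weights are stored under a "weight" attribute; the function name paths_in_weight_range is hypothetical. The number of simple paths can grow exponentially, so this is only practical on small graphs.

```python
import networkx as nx


def paths_in_weight_range(G, source, target, min_cost, max_cost, weight="weight"):
    """Yield (path, cost) for every simple source->target path whose total
    edge weight lies in [min_cost, max_cost]. Exhaustive: small graphs only."""
    for path in nx.all_simple_paths(G, source, target):
        cost = sum(G[u][v][weight] for u, v in zip(path, path[1:]))
        if min_cost <= cost <= max_cost:
            yield path, cost


G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 1), (2, 3, 1), (1, 3, 5), (3, 4, 1)])
print(list(paths_in_weight_range(G, 1, 4, 2, 4)))  # [([1, 2, 3, 4], 3)]
```

Unlike the hide-the-cheapest-edge loop, this considers every simple path, so it cannot miss a qualifying path, at the price of exponential worst-case running time.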
