I want to find a path in a graph that has connects two nodes and does not use the same node twice. The sum of the weights of the edges must be within a certain range.
I need to implement this in pygraph. I'm not sure if there is already an algorithm that I can use for this purpose or not. What's the best way to achieve this?
EDIT: I misunderstood the question initially. I've corrected my answer. This functionality isn't built into the pygraphlib library, but you can easily implement it. Consider something like this, which basically gets the shortest path, decides if it's in a predefined range, then removes the edge with the smallest weight, and computes the new shortest path, and repeats.
from pygraphlib import pygraph, algo
edges = [(1,2),(2,3),(3,4),(4,6),(6,7),(3,5),(4,5),(7,1),(2,5),(5,7)]
graph = pygraph.from_list(edges)
pathList = []
shortestPath = algo.shortest_path(graph, startNode, endNode)
cost = shortestPath[len(shortestPath)-1][1]
while cost <= maxCost:
if cost >= minCost:
pathList.append(shortestPath)
minEdgeWt = float('inf')
for i in range(len(shortestPath)-1):
if shortestPath[i+1][1] - shortestPath[i][1] < minEdgeWt:
minEdgeWt = shortestPath[i+1][1] - shortestPath[i][1]
edgeNodes = (shortestPath[i][0], shortestPath[i+1][0])
#Not sure of the syntax here, edgeNodes is a tuple, and hide_edge requires an edge.
graph.hide_edge(edgeNodes)
shortestPath = alog.shortest_path(graph, startNode, endNode)
cost = shortestPath[len(shortestPath)-1][1]
return pathList
Note that I couldn't find a copy of pygraphlib, seeing as it is no longer under development, so I couldn't test the above code. It should work, mod the syntax uncertainty. Also, if possible, I would recommend using networkx[link] for any kind of graph manipulation in python, as it is more complete, under active development, and more completely documented then pygraphlib. Just a suggestion.
Related
I am (or was supposed to, at least) make a Dijkstra implementation, but upon reviewing what I've done it looks more like a breadth-first search. But I wonder if I have come across a way to kind of do both things at the same time?
Essentially by using an OOP approach I can perform a BFS that also preserves knowledge of the shortest weighted path, thereby eliminating the need to determine during the search process whether some node has a lower cost than its alternatives like Dijkstra does.
I've searched for clues as to why a more "pure" implementation of Dijkstra should be faster than this, most prominently the answers in these two threads:
What is difference between BFS and Dijkstra's algorithms when looking for shortest path?
Why use Dijkstra's Algorithm if Breadth First Search (BFS) can do the same thing faster?
I haven't seen anything that made me understand the question to what I'm wondering, though.
My approach is essentially this:
Start at whatever node we require, this becomes a "path"
Step out to each adjacent node, and each of these steps create a new path that we store in a path collection
Every path contains an ordered list of which nodes it has visited from the starting node all the way to wherever it is, as well as the associated weight/cost
Mark the "parent" path as closed (stop iterating on it)
Select the first open path in the path collection and repeat the above
When there are no more open paths, delete all paths that didn't make it to the destination node
Compare the lengths and return the path with the lowest weight
I struggle to see where the performance difference between this and pure Dijkstra would come from. Dijkstra would still have to iterate over all possible paths, right? So I perform exactly the same number of steps with this implementation as if I change returnNextOpenPath() (serving the function of a normal queue, just not implemented as one) to a more priority queue-looking returnShortestOpenPath().
There's presumably a marginal performance penalty at the end where I examine all the collected, non-destroyed paths before I can print a result instead of just popping from a queue - but aside from that, am I just not seeing where else my implementation would also be worse than "pure" Dijkstra?
I don't think it matters, but in case it does: The actual code I have for this is gigantic so my first instinct is to hold off on posting it for now, but here's a stripped down version of it.
class DijkstraNode:
def getNeighbors(self):
# returns a Dict of all adjacent nodes and their cost
def weightedDistanceToNeighbor(self, neighbor):
# returns the cost associated with traversing from current node to the chosen neighbor node
return int(self.getNeighbors()[neighbor])
class DijkstraEdge:
def __init__(self, startingNode: DijkstraNode, destinationNode: DijkstraNode)
self.start = startingNode
self.goal = destinationNode
def weightedLength(self):
return self.start.weightedDistanceToNeighbor(self.goal)
class DijkstraPath:
def __init__(self, startingNode: DijkstraNode, destinationNode: DijkstraNode):
self.visitedNodes: list[DjikstraNode] = [startingNode]
self.previousNode = self.start
self.edges: list[DijkstraEdge] = []
def addNode(self, node: DijkstraNode)
# if the node we're inspecting is new, add it to all the lists
if not node in self.visitedNodes:
self.edges.append(DijkstraEdge(self.prevNode, node))
self.visitedNodes.append(node)
self.prevNode = node
# if the node we just added above is our destination, stop iterating on this path
if node = self.goal:
self.closed = True
self.valid = True
class DijkstraTree:
def bruteforceAllPaths(self, startingNode: DijkstraNode, destinationNode: DijkstraNode):
self.pathlist = []
self.pathlist.append(DjikstraPath(startingNode, destinationNode))
cn: DjikstraNode
# iterate over all open paths
while self.hasOpenPaths():
currentPath = self.returnNextOpenPath()
neighbors: Dict = currentPath.lastNode().getNeighbors()
for c in neighbors:
cn = self.returnNode(c)
# copy the current path
tmpPath = deepcopy(currentPath)
# add the child node onto the newly made path
tmpPath.addNode(cn)
# add the new path to pathlist
if tmpPath.isOpen() or tmpPath.isValid():
self.pathlist.append(tmpPath)
# then we close the parent path
currentPath.close()
Lately I've been working with some recursive problems in Python where I have to generate a list of possible configurations (i.e list of permutations of a given string, list of substrings, etc..) using recursion. I'm having a very hard time in finding the best practice and also in understanding how to manage this sort of variable in recursion.
I'll give the example of the generate binary trees problem. I more-or-less know what I have to implement in the recursion:
If n=1, return just one node.
If n=3, return the only possible binary tree.
For n>3, crate one node and then explore the possibilities: left node is childless, right node is childless, neither node is childless. Explore these possibilites recursively.
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive to the list of trees. Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list, but then the problem is in case 3 when calling the recursive function to explore the possibilites for the nodes it would be returning a list and not appending nodes to a tree that I am building. When I picture the recursion tree in my head I imagine a "tree" variable that is unique to each of the tree leaves, and these trees are added to a list which is returned by the "root" (i.e first) call. But I don't know if that is possible. I thought of a global list and the recursive function not returning anything (just appending to it) but the problem I believe is that at each call the function would receive a copy of the variable.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better. I would also like to know if there is a "best practice" when it comes to that.
Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list
This is not the purest way to attack a recursive problem. It would be better if you can make the recursive function such that it solves the sub problem without an extra parameter variable that it must use. So the recursive function should just return a result as if it was the only call that was ever made (by the testing framework). So in the example, that recursive call should return a list with trees.
Alternatively the recursive function could be a sub-function that doesn't return a list, but yields the individual values (in this case: trees). The caller can then decide whether to pack that into a list or not. This is more pythonic.
As to the example problem, it is also important to identify some invariants. For instance, it is clear that there are no solutions when n is even. As to recursive aspect: once you have decided to create a root, then both its left and right sided subtree will have an odd number of nodes. Of course, this is an observation that is specific to this problem, but it is important to look for such problem properties.
Finally, it is equally important to see if the same sub problems can reoccur multiple times. This surely is the case in the example problem: for instance, the left subtree may sometimes have the same number of nodes as the right subtree. In such cases memoization will improve efficiency (dynamic programming).
When the recursive function returns a list, the caller can then iterate that list to retrieve its elements (trees in the example), and use them to build an extended result that satisfies the caller's task. In the example case that means that the tree taken from the recursively retrieved list, is appended as a child to a new root. Then this new tree is appended to a new list (not related to the one returned from the recursive call). This new list will in many cases be longer, although this depends on the type of problem.
To further illustrate the way to tackle these problems, here is a solution for the example problem: one which uses the main function for the recursive calls, and using memoization:
class Solution:
memo = { 1: [TreeNode()] }
def allPossibleFBT(self, n: int) -> List[Optional[TreeNode]]:
# If we didn't solve this problem before...
if n not in self.memo:
# Create a list for storing the results (the trees)
results = []
# Before creating any root node,
# decide the size of the left subtree.
# It must be odd
for num_left in range(1, n, 2):
# Make the recursive call to get all shapes of the
# left subtree
left_shapes = self.allPossibleFBT(num_left)
# The remainder of the nodes must be in the right subtree
num_right = n - 1 - num_left # The root also counts as 1
right_shapes = self.allPossibleFBT(num_right)
# Now iterate the results we got from recursion and
# combine them in all possible ways to create new trees
for left in left_shapes:
for right in right_shapes:
# We have a combination. Now create a new tree from it
# by putting a root node on top of the two subtrees:
tree = TreeNode(0, left, right)
# Append this possible shape to our results
results.append(tree)
# All done. Save this for later re-use
self.memo[n] = results
return self.memo[n]
This code can be made more compact using list comprehension, but it may make the code less readable.
Don't pass information into the recursive calls, unless they need that information to compute their local result. It's much easier to reason about recursion when you write without side effects. So instead of having the recursive call put its own results into a list, write the code so that the results from the recursive calls are used to create the return value.
Let's take a trivial example, converting a simple loop to recursion, and using it to accumulate a sequence of increasing integers.
def recursive_range(n):
if n == 0:
return []
return recursive_range(n - 1) + [n]
We are using functions in the natural way: we put information in with the arguments, and get information out using the return value (rather than mutation of the parameters).
In your case:
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive to the list of trees.
So you know that you want to return a list of trees at the end of the process. So the natural way to proceed, is that you expect each recursive call to do that, too.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better.
The recursive calls return their lists of results for the sub-problems. You use those results to create the list of results for the current problem.
You don't need to think about how recursion is implemented in order to write recursive algorithms. You don't need to think about the call stack. You do need to think about two things:
What are the base cases?
How does the problem break down recursively? (Alternately: why is recursion a good fit for this problem?)
The thing is, recursion is not special. Making the recursive call is just like calling any other function that would happen to give you the correct answer for the sub-problem. So all you need to do is understand how solving the sub-problems helps you to solve the current one.
I called for help yesterday on how to look ahead in Python. My problem was to iterate through all the possible edges to add to a network, and for each possible network with an edge added, look at all the possible edges to add, and so on (n depth). At the end, compare all networks produced at depth n with the root network, and actually add the best first step (best first edge to add to accomplish the best result at depth n). When that first edge is added, do the depth search again, and so on until a good network is found. Like a moving window, I may say (see lookahead algorithm in Python for a more thorough explanation of the problem).
Unfortunately for the clarity of the question, the code requires igraph, which is available here: http://igraph.org/python/#downloads
#Peter Gibson promptly answered, guiding me through the logic of Generator comprehensions, and helped me produce this code:
from igraph import * # http://igraph.org/python/
def delta(g,gOld): # evaluates the improvement of the graph from one generation to the next
print "delta"
return g.diameter()-gOld.diameter()
def possible_new_edges(G):
print "Possible new edges"
allPossibleNewEdges = []
for n1 in range(50):
for n2 in range(n1,50):
if G.are_connected(G.vs[n1],G.vs[n2]) == False and n1 != n2:
allPossibleNewEdges.append(G.vs[n1],G.vs[n2])
return allPossibleNewEdges
def add_optimal_edge(graph, n=3):
print "Add optimal edge"
paths = [[graph]] # start off with just one graph path, which contains a single graph
for generation in range(n):
print "Generation:", generation
# path[-1] is the latest graph for each generation
paths = (path + path[-1].add_edge(e) for path in paths for e in path[-1].possible_new_edges())
# select best path by comparison of final generation against original graph
best = max(paths, lambda path: comp_delta(path[-1],graph))
return best[1] # returns the first generation graph
graph = Graph.Erdos_Renyi(50, .15, directed=False, loops=False) # create a random root graph of density 0.15
add_optimal_edge(graph)
The generator is concise and elegant. Let's say a little too elegant for my unwieldy Python style, and there are a few things I need to understand to make it work. The code runs with this error:
return best[1] # returns the first generation graph
TypeError: 'generator' object has no attribute '__getitem__'
I think it's because of a wrong use of functions with the generator...
So, my question is: what's the proper way to use functions in such a generator? I need to call possible_new_edges() and delta(), what do I need to pass them (the graph?) and how to do so?
Thanks so much!
Trying the code from your gist, I found several fairly minor errors that were preventing the code from running. I've included fixed code below. However, this doesn't really solve the problem. That's because your algorithm needs to consider a truly vast number of potential graphs, which it cannot do in any sort of reasonable time.
In my testing, looking one step ahead works perfectly well, but looking two steps takes a very long time (10s of minutes, at least, I've never waited for it to finish) and three steps will probably take days. This is because your possible_new_edges function returns more than a thousand possible edges. Each one will be added to a copy of your initial graph. Then for each each succeeding step, the process will repeat on each of the expanded graphs from the previous step. This results in an exponential explosion of graphs, as you have to evaluate something on the order of 1000**n graphs to see which is the best.
So, to get a practical result you'll still need to change things. I don't know graph theory or your problem domain well enough to suggest what.
Anyway, here are the changed parts of the "working" code (with the original comments removed so that my notes on what I've changed are more clear):
def possible_new_edges(G):
print("Possible new edges")
allPossibleNewEdges = []
for n1 in range(50):
for n2 in range(n1,50):
if G.are_connected(G.vs[n1],G.vs[n2]) == False and n1 != n2:
allPossibleNewEdges.append((G.vs[n1],G.vs[n2])) # append a tuple
return allPossibleNewEdges
def add_optimal_edge(graph, n=3):
print("Add optimal edge")
paths = [[graph]]
for generation in range(n):
print("Generation:", generation)
paths = (path + [path[-1] + e] # use + to add an edge, and to extend the path
for path in paths
for e in possible_new_edges(path[-1])) # call this function properly
best = max(paths, key=lambda path: comp_delta(path[-1],graph))
return best[1]
If the generator expression in the loop confuses you, it might help to replace it with a list comprehension (by replacing the outermost parentheses with square brackets). You can then inspect the paths list inside the loop (and do things like print its len()). The logic of the code is the same either way, the generator expressions just put off computing the expanded results until the max function starts iterating over paths in order to find the best scoring one.
Using list comprehensions will work for n=1 certainly, but you may start running out of memory as you try n=2 (and you certainly will for n=3 or more). The version above won't you run out of memory (as the generator expression only expands O(n) graphs at a time), but that doesn't mean it runs fast enough to inspect billions of graphs in sensible amount of time.
I have one situation and I would like to approach this problem with Python, but unfortunately I don't have enough knowledge about the graphs. I found one library which seems very suitable for this relatively simple task, networkx, but I am having issues doing exact things I want, which should be fairly simple.
I have a list of nodes, which can have different types, and two "classes" of neighbors, upwards and downwards. The task is to find paths between two target nodes, with some constraints in mind:
only nodes of specific type can be traversed, i.e. if starting nodes are of type x, any node in the path has to be from another set of paths, y or z
if a node has a type y, it can be passed through only once
if a node has type z, it can be passed through twice
in case a node of type z is visited, the exit has to be from the different class of neighbor, i.e. if its visited from upwards, the exit has to be from downwards
So, I tried some experimentation but I, as said, have struggled. First, I am unsure what type of graph this actually represents? Its not directional, since it doesn't matter if you go from node 1 to node 2, or from node 2 to node 1 (except in that last scenario, so that complicates things a bit...). This means I can't just create a graph which is simply multidirectional, since I have to have that constraint in mind. Second, I have to traverse through those nodes, but specify that only nodes of specific type have to be available for path. Also, in case the last scenario happens, I have to have in mind the entry and exit class/direction, which puts it in somewhat directed state.
Here is some sample mockup code:
import networkx as nx
G=nx.DiGraph()
G.add_node(1, type=1)
G.add_node(2, type=2)
G.add_node(3, type=3)
G.add_edge(1,2, side="up")
G.add_edge(1,3, side="up")
G.add_edge(2,1, side="down")
G.add_edge(2,3, side="down")
for path in nx.all_simple_paths(G,1,3):
print path
The output is fairly nice, but I need these constraints. So, do you have some suggestions how can I implement these, or give me some more guidance regarding understanding this type of problem, or suggest a different approach or library for this problem? Maybe a simple dictionary based algorithm would fit this need?
Thanks!
You might be able to use the all_simple_paths() function for your problem if you construct your graph differently. Simple paths are those with no repeated nodes. So for your constraints here are some suggestions to build the graph so you can run that algorithm unmodified.
only nodes of specific type can be traversed, i.e. if starting nodes are of type x, any node in the path has to be from another set of paths, y or z
Given a starting node n, remove all other nodes with that type before you find paths.
if a node has a type y, it can be passed through only once
This is the definition of simple paths so it is automatically satisfied.
if a node has type z, it can be passed through twice
For every node n of type z add a new node n2 with the same edges as those pointing to and from n.
in case a node of type z is visited, the exit has to be from the different class of neighbor, i.e. if its visited from upwards, the exit has to be from downwards
If the edges are directed as you propose then this could be satisfied if you make sure the edges to z are all the same direction - e.g. in for up and out for down...
The best way to do this I think is by calculating all valid paths of length at most k between the source S and every other node, then using that information to calculate all valid paths of length at most k+1. You then just repeat this until you get a fixed point where no paths are modified.
In practice, this means you should set up a list of paths at each node. At each step, you take each node U in turn and look at the paths that, in the previous step, terminated at some neighbour V of U. If any of those paths can be extended to be a new, distinct path to U, extend it and add it to U's list.
If when you execute a step you find no new paths, that's your termination state. You can then check the list of paths at the target node T.
Pseudocode (in a very loose C# formalism):
var paths = graph.nodes.ToDictionary(node => node, node => new List<List<node>>())
paths[S].Add(new List<node> {S}) // The trivial path that'll start us off.
bool notAFixedPoint = true;
while (notAFixedPoint)
{
notAFixedPoint = false // Assume we're not gonna find any new paths.
foreach (var node in graph)
{
var pathsToNode = paths[node]
foreach (var neighbour in node.Neighbours)
{
var pathsToNeighbour = paths[neighbour]
// ExtendPaths is where all the logic about how to recognise a valid path goes.
var newPathsToNode = ExtendPaths(pathsToNeighbour, node)
// The use of "Except" here is for expository purposes. It wouldn't actually work,
// because collections in most languages are compared by reference rather than by value.
if (newPathsToNode.Except(pathsToNode).IsNotEmpty())
{
// We've found some new paths, so we can't terminate yet.
notAFixedPoint = true
pathsToNode.AddMany(newPathsToNode)
}
}
}
}
return paths[T]
This looks like an optimization problem to me -- look up "Traveling Salesman" for a classic example that is somewhat close to what you want to do.
I've had good luck using "simulated annealing" for optimization problems, but you might also take a look at "genetic algorithms".
I apply GP in pyevolve to train and it gives me the best tree. I would like to use this tree to test different data. I want to keep min is root of the tree, so the function will return -1 if gp_add, gp_mul... is the root of tree.
This is example about my best tree and the raw score is 1.0143
gp_min(gp_add(gp_mul(gp_min(a, b), c), d))
And this is the code, I try to apply the best individual in testing data.
bestIndi = ga.bestIndividual()
comp_code = bestIndi.getCompiledCode()
score = eval(comp_code)
Is this code on the right direction? Why the score is always -1?
That's on the right path. One main property in GP is the sufficiency property, which says the elements in the domain should be enough to solve the problem.
Check the range of your variables.
If want to keep min as the root of your tree it's preferable to apply that in the fitness function, however, if GP finds an individual better that one having the gp_min as the root then it would be preferable to eliminate that restriction.