Check if path exists that contains a set of N nodes - python

Given a graph g and a set of N nodes my_nodes = [n1, n2, n3, ...], how can I check if there's a path that contains all N nodes?
Checking among all_simple_paths for paths that contain all nodes in my_nodes becomes computationally cumbersome as the graph grows.
The search above can be limited to paths between pairwise couples of nodes from my_nodes. This reduces the complexity only to a small degree, and it requires a lot of Python looping, which is quite slow.
Is there a faster solution to the problem?

You may try a greedy algorithm here, starting the path search from all the nodes you need to find, and exploring your graph step by step. I can't provide a real sample, but the pseudo-code should be something like this:
Start n path stubs from your n nodes to find
Extend all of these path stubs by the neighbours that weren't checked before
If two path stubs intersect, you've got a new one, which contains more of your needed nodes than before
If after merging the stub paths you have one which covers all needed nodes, you're done
If there are still some additional nodes to add to the path, you continue with the second step again
If there are no unvisited nodes left in the graph, the path doesn't exist
This algorithm has complexity O(E + N), because you visit the edges and nodes in a non-recursive fashion.
However, in the case of a directed graph the "merge" step will be a bit more complicated; it can still be done, but the worst case may take a lot of time.
Update:
As you say that the graph is directed, the above approach wouldn't work well. In this case you may simplify your task like this:
Find the strongly connected components in the graph (I suggest implementing this yourself, e.g. Kosaraju's algorithm). The complexity is O(E + N). You can use a NetworkX method for this if you want an out-of-the-box solution.
Create the condensation of the graph, based on the step 1 information, keeping track of which component is reachable from which. Again, there is a NetworkX method for this.
Now you can easily tell which nodes from your set are in the same component, so a path containing all of them definitely exists.
After that, all you need to check is the connectivity between the different components of your nodes. For example, you can take the topological sort of the condensation and do the check in linear time again.
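A sketch of the check described above, using NetworkX's condensation (the function name path_through_all_exists is my own; it implements the test from the last two steps):

```python
import networkx as nx

def path_through_all_exists(g, my_nodes):
    """Check, via the SCC condensation, whether the components holding
    my_nodes lie on a single reachable chain of the condensation DAG."""
    cond = nx.condensation(g)        # DAG of strongly connected components
    mapping = cond.graph["mapping"]  # original node -> component id
    comps = {mapping[n] for n in my_nodes}
    if len(comps) == 1:
        return True                  # all nodes share one SCC
    # components must be pairwise reachable along a topological chain
    order = [c for c in nx.topological_sort(cond) if c in comps]
    return all(nx.has_path(cond, a, b) for a, b in zip(order, order[1:]))
```

When all nodes fall into one strongly connected component the path certainly exists; across components this checks reachability between consecutive components in topological order, as described above.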


Add extra cost depending on length of path

I have a graph/network that consists of some nodes and some edges. Each edge has a weight attached to it, or in this case a cost. Each edge also has a distance attached to it AND a type. So basically the weight/cost is pre-calculated from the distance of the edge, along with some other metrics, for both types of edges.
However, in my case I would like some additional cost to be added for, let's say, every 100 distance or so, but only for one type of edge. But I'm not even certain whether it is possible to add additional cost depending on the sum of the previous steps in the path in algorithms such as Dijkstra's.
I know I could just spread the cost over each distance unit, and thus get a rough estimate. The problem there is the edge cases, where the cost would be almost double at distance 199 compared to adding the cost at exactly each 100 distance, i.e. adding the cost at 100 and 200.
But maybe there are other ways to get around this?
I think you cannot implement this using Dijkstra, because you would violate the invariant which is needed for correctness (see e.g. Wikipedia). In each step, Dijkstra builds on this invariant, which more or less states that all "already found paths" are optimal, i.e. shortest. To show that it does not hold in your case of "additional cost by edge type and covered distance", let's have a look at a counterexample:
Counterexample against Usage of Dijkstra
Assume we have two types of edges, a first type (->) and a second type (=>). The second type has an additional cost of 10 after a total distance of 10 covered on edges of that type. Now, we take the following graph, with the following edges:
start -1-> u_1
start -1-> u_2
start -1-> u_3
...
start -1-> u_7
u_7 -1-> v
start =7=> v
v =4=> end
When we play that through with Dijkstra (I skip all intermediate steps) with start as the start node and end as the target, we will first retrieve the path start=7=>v. This path has a length of 7, which is shorter than the "detour" start-1->u_1-1->...-1->u_7->v, which has a length of 8. However, in the next step, we have to take the edge v=4=>end, which brings the first path to a total of 21 (11 original + 10 penalty). But the detour path now becomes the shorter one, with a length of 12 = 8 + 4 (no penalty).
In short, Dijkstra is not applicable, even if you modify the algorithm to take the "already found path" into account when retrieving the cost of the next edges.
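The question leaves the exact penalty rule open; a minimal sketch that reproduces the numbers above, assuming the surcharge of 10 applies per full 10 units of distance covered on second-type edges:

```python
def path_cost(edges, threshold=10, surcharge=10):
    """Cost of a path given as (length, type) tuples, where '=' edges
    incur a surcharge for every `threshold` units of '=' distance."""
    base = sum(length for length, _ in edges)
    type2 = sum(length for length, kind in edges if kind == "=")
    return base + surcharge * (type2 // threshold)

greedy = [(7, "="), (4, "=")]         # start =7=> v =4=> end
detour = [(1, "-")] * 8 + [(4, "=")]  # start -1-> u_1 ... -1-> u_7 -1-> v =4=> end

print(path_cost(greedy))  # 21 = 11 + 10 penalty
print(path_cost(detour))  # 12, no penalty
```

The prefix Dijkstra commits to (cost 7 versus 8) ends up on the costlier full path, which is exactly the violated invariant.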
Alternative?
Maybe you can build your algorithm around a variant of Dijkstra which retrieves multiple (suboptimal) solutions. First, you would need to extend Dijkstra so that it takes the already found path into account (in this function, replace cost = weight(v, u, e) with cost = weight(v, u, e, paths[v]) and write a suitable function to calculate the penalty based on the previous path and the considered edge). Afterwards, remove edges from your original optimal solution and iterate the procedure to find a new alternative shortest path. However, I see no easy way of selecting which edge to remove from the graph (besides those of your penalty type), and the runtime complexity is probably awful.

Algorithm designing

Which one is more suitable for designing an algorithm that produces all the paths between two vertices in a directed graph?
Backtracking
Divide and conquer
Greedy approach
Dynamic programming
I was thinking of Backtracking due to the BFS and DFS, but I am not sure. Thank you.
Note that there can be an exponential number of paths in your output.
Indeed, in a directed graph of n vertices having an edge i -> j for every pair i < j, there are 2^(n-2) paths from 1 to n: each vertex except the endpoints can be either present in the path or omitted.
So, if we really want to output all paths (and not, e.g., build a clever lazy structure to list them one by one later), no advanced technique can help achieve polynomial complexity here.
The simplest way to find all the simple paths is recursively constructing a path, and adding the current path to the answer once we arrive at the end vertex.
To improve it, we can use backtracking.
Indeed, for each vertex, we can first compute whether the final vertex is reachable from it, and do so in polynomial time.
Later, we just use only the vertices for which the answer was positive.
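A sketch of this backtracking idea in Python with NetworkX (the reachability pre-computation uses nx.ancestors; the function name is mine):

```python
import networkx as nx

def all_paths_backtracking(g, source, target):
    """All simple paths from source to target in a DiGraph, skipping
    vertices from which the target is unreachable."""
    can_reach = nx.ancestors(g, target) | {target}  # polynomial pre-check
    paths, stack, on_path = [], [source], {source}

    def extend(v):
        if v == target:
            paths.append(list(stack))
            return
        for u in g.successors(v):
            if u in can_reach and u not in on_path:  # prune dead branches
                stack.append(u)
                on_path.add(u)
                extend(u)
                stack.pop()
                on_path.remove(u)

    extend(source)
    return paths
```

The pruning guarantees every recursive branch eventually emits at least one path, so the total work is polynomial per path produced.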

Computing many shortest paths in graph

I have a large (weighted, directed) graph (>100,000 nodes) and I want to compute a large number of random shortest paths in that graph. So I want to randomly select two nodes (let's say k times) and compute the shortest path. One way to do this is using either the networkx or the igraph module and doing a for loop as in
pairs = np.random.choice(np.arange(0, len(graph.nodes)), [k, 2])
for pair in pairs:
    graph.get_shortest_paths(pair[0], pair[1], weights='weight')
This works, but it takes a long time, especially compared to computing all paths for a particular source node. Essentially, every iteration starts the search from scratch. So is there a way to benefit from keeping the graph structure in memory and not redoing the work in each iteration, without computing all shortest paths (which would take too long, given that there would be n*(n-1) of them)?
Phrased differently, can I compute a random subset of all shortest paths in an efficient way?
AFAIK, the operations are independent of each other, so running them in parallel could work (pseudocode):
import dask

@dask.delayed
def short_path(graph, pair):
    return graph.get_shortest_paths(pair[0], pair[1], weights='weight')

pairs = np.random.choice(np.arange(0, len(graph.nodes)), [k, 2])
results = dask.compute(*[short_path(graph, pair) for pair in pairs])
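Another option, since the pairs are random: group them by source, so one single-source run serves every target drawn for that source. A sketch with NetworkX (with igraph, graph.get_shortest_paths(s, to=targets, weights='weight') similarly restricts one run to just those targets):

```python
from collections import defaultdict
import networkx as nx

def batched_shortest_paths(g, pairs):
    """Shortest paths for (source, target) pairs, with one Dijkstra run
    per distinct source instead of one per pair."""
    by_source = defaultdict(set)
    for s, t in pairs:
        by_source[s].add(t)
    results = {}
    for s, targets in by_source.items():
        # one run from s covers every target sampled for it
        paths = nx.single_source_dijkstra_path(g, s, weight="weight")
        for t in targets:
            results[(s, t)] = paths.get(t)  # None if t is unreachable
    return results
```

This pays off whenever several sampled pairs share a source; for k pairs over n nodes the expected number of distinct sources is below min(k, n).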

Finding the Path of all Edges on a Graph

I'm trying to get the path on a graph which covers all edges, and traverses them only once.
This means there will only be two "end" points, which have an odd number of incident edges. These end points either have a single connecting edge, or are part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks almost like what is needed, but the networkx implementation will only work for closed (looped) networks.
I also looked at a Hamiltonian path and tried the networkx algorithm, but my graph type was not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, check Hierholzer's algorithm, which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
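A minimal usage sketch, assuming a recent NetworkX release that ships has_eulerian_path and eulerian_path; the graph is the simple 1-2-3-4-5 chain from the first example:

```python
import networkx as nx

g = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 5)])  # the 1-2-3-4-5 chain

if nx.has_eulerian_path(g):
    edges = list(nx.eulerian_path(g))  # yields each edge exactly once
    # recover the node order from the edge sequence
    nodes = [edges[0][0]] + [v for _, v in edges]
```

For this chain, nodes comes out as 1-2-3-4-5 or its reverse, matching the expected traversal.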

How do I find the path with the biggest sum of weights in a weighted graph?

I have a bunch of objects with level, weight and 0 or more connections to objects of the next levels. I want to know how do I get the "heaviest" path (with the biggest sum of weights).
I'd also love to know of course, what books teach me how to deal with graphs in a practical way.
Your graph is acyclic, right? (I presume so, since a node always points to a node on the next level.) If your graph can have arbitrary cycles, the problem of finding the heaviest path becomes NP-complete and brute-force search becomes the only solution.
Back to the problem: you can solve this by finding, for each node, the heaviest path that leads up to it. Since you already have a topological sort of your DAG (the levels themselves), it is straightforward to find the paths:
For each node, store the cost of the heaviest path that leads to it and the last node before it on said path. Initially, this is always empty (but a sentinel value, like a negative number for the cost, might simplify the code later)
For nodes in the first level, you already know the cost of the heaviest path that ends in them: it is zero (and the parent node is None)
For each level, propagate the path info to the next level; this is similar to a normal algorithm for shortest distance, except that we maximise:
for level in range(nlevels):
    for node in nodes[level]:
        cost = the cost to this node
        for (neighbour_vertex, edge_cost) in (the node's edges):
            alt_cost = cost + edge_cost
            if alt_cost > cost_to_that_vertex:
                cost_to_that_vertex = alt_cost
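The pseudo-code above can be fleshed out into runnable Python; a sketch assuming the graph is given as levels (lists of nodes) plus an adjacency dict of (neighbour, weight) pairs (these names are mine):

```python
def heaviest_path(levels, edges):
    """Heaviest path in a levelled DAG. levels: list of lists of nodes;
    edges: dict node -> list of (neighbour, weight) into later levels.
    Returns (best cost, best path)."""
    NEG_INF = float("-inf")               # sentinel: node not reached yet
    cost = {n: NEG_INF for level in levels for n in level}
    parent = {n: None for level in levels for n in level}
    for n in levels[0]:
        cost[n] = 0                       # paths start in the first level
    for level in levels:
        for n in level:
            if cost[n] == NEG_INF:
                continue                  # unreachable from the first level
            for m, w in edges.get(n, []):
                if cost[n] + w > cost[m]:  # maximise instead of minimise
                    cost[m] = cost[n] + w
                    parent[m] = n
    end = max(cost, key=cost.get)         # best end node overall
    path = [end]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return cost[end], path[::-1]
```

The parent pointers implement the "last node before it" bookkeeping, so the path itself can be reconstructed, not just its cost.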
My book recommendation is Steve Skiena's "Algorithm Design Manual". There's a nice chapter on graphs.
I assume that you can only go down to a lower level in the graph.
Notice how the graph forms a tree. Then you can solve this using recursion:
heaviest_path(node n) = value[n] + max(heaviest_path(children[n][0]), heaviest_path(children[n][1]), etc)
This can easily be optimized by using dynamic programming instead.
Start with the children at the lowest level. Their heaviest_path is just their own value. Keep track of this in an array. Then calculate the heaviest_path for the next level up, then the next level up, etc.
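A sketch of the memoised version of the recursion above, assuming the graph as a children dict and a value per node (names are illustrative):

```python
from functools import lru_cache

def heaviest_path_value(children, value, root):
    """Value of the heaviest root-to-leaf path; children maps each node
    to the nodes it points to on lower levels."""
    @lru_cache(maxsize=None)
    def best(n):
        kids = children.get(n, [])
        # leaf: just the node's own value; else add the best continuation
        return value[n] + (max(best(c) for c in kids) if kids else 0)
    return best(root)
```

The cache means each node is evaluated once even if several parents share it, which is the dynamic-programming optimisation the answer describes.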
The method which I generally use to find the 'heaviest' path is to negate the weights and then find the shortest path. There are good algorithms (http://en.wikipedia.org/wiki/Shortest_path_problem) for finding the shortest path. This method holds good as long as you do not have a positive-weight cycle in your original graph.
For graphs having positive-weight cycles the problem of finding the 'heaviest' path is NP-complete and your algorithm to find the heaviest path will have non-polynomial time complexity.
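Since the negated weights are negative, Dijkstra no longer applies, but Bellman-Ford does. A sketch of the negation trick with NetworkX:

```python
import networkx as nx

def heaviest_path_by_negation(g, source, target):
    """Heaviest path as a shortest path over negated weights. Valid only
    if g has no positive-weight cycle (no negative cycle after negation)."""
    neg = nx.DiGraph()
    neg.add_weighted_edges_from(
        (u, v, -d["weight"]) for u, v, d in g.edges(data=True)
    )
    # Bellman-Ford tolerates the negative weights the negation introduces
    length, path = nx.single_source_bellman_ford(neg, source, target)
    return -length, path
```

On a DAG this costs O(V*E); a topological-order relaxation, as in the level-by-level answer above, brings it down to O(V + E).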
