Finding the shortest path in a cyclical graph using Dijkstra - python

I have a cyclical directed graph. Below is its representation as a Python dict:
graph = {
    'A': {'B': 5, 'D': 5, 'E': 7},
    'B': {'C': 4},
    'C': {'D': 8, 'E': 2},
    'D': {'C': 8, 'E': 6},
    'E': {'B': 3},
}
I have written a simple implementation of Dijkstra's shortest path algorithm, which seems to work for any two given points. Below is my implementation:
def shortestpath(self, start, end, visited=[], distances={}, predecessors={}):
    # initialize a big number
    maxint = 10000
    if start == end:
        path = []
        while end != None:
            path.append(end)
            end = predecessors.get(end, None)
        return distances[start], path[::-1]
    # detect if it's the first time through, set current distance to zero
    if not visited: distances[start] = 0
    # process neighbors as per algorithm, keep track of predecessors
    for neighbor in self.graph[start]:
        if neighbor not in visited:
            neighbordist = distances.get(neighbor, maxint)
            tentativedist = distances[start] + self.graph[start][neighbor]
            if tentativedist < neighbordist:
                distances[neighbor] = tentativedist
                predecessors[neighbor] = start
    # neighbors processed, now mark the current node as visited
    visited.append(start)
    # finds the closest unvisited node to the start
    unvisiteds = dict((k, distances.get(k, maxint)) for k in self.graph if k not in visited)
    closestnode = min(unvisiteds, key=unvisiteds.get)
    # now we can take the closest node and recurse, making it current
    return self.shortestpath(closestnode, end, visited, distances, predecessors)
Now, this simple implementation seems to work. For example, if I do something like this:
shortestpath('A', 'C')
it gives me the shortest weight and the path, in this case:
(9, ['A', 'B', 'C'])
However, whenever I call shortestpath('B', 'B'), the program breaks.
There is a shortest path from B to B, since the graph is cyclic: B-C-E-B. I just don't know how to check for that and modify Dijkstra's algorithm accordingly to handle cyclic cases like this one. Any suggestion is greatly appreciated. Thanks :)
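One way to handle start == end, sketched here under the assumption that you want the cheapest path that uses at least one edge and that all weights are non-negative: instead of seeding the search with start at distance 0 (which makes start == end "reached" at cost 0 before any edge is taken), seed the priority queue with start's outgoing edges, so start can only be reached again by going around a cycle. This is a hypothetical standalone function named shortest_path, not the method from the question; it uses heapq and carries the path inside each queue entry instead of a predecessors dict.

```python
import heapq

def shortest_path(graph, start, end):
    """Cheapest path from start to end using at least one edge (non-negative weights)."""
    # Seed the queue with start's outgoing edges instead of (0, start), so that
    # when start == end, end is only reached again by going around a cycle.
    queue = [(weight, neighbour, [start, neighbour])
             for neighbour, weight in graph[start].items()]
    heapq.heapify(queue)
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == end:
            return dist, path  # the first time end is popped, dist is minimal
        if node in settled:
            continue  # a cheaper entry for this node was already expanded
        settled.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in settled:
                heapq.heappush(queue, (dist + weight, neighbour, path + [neighbour]))
    return None  # end is not reachable from start with at least one edge

graph = {
    'A': {'B': 5, 'D': 5, 'E': 7},
    'B': {'C': 4},
    'C': {'D': 8, 'E': 2},
    'D': {'C': 8, 'E': 6},
    'E': {'B': 3},
}
print(shortest_path(graph, 'B', 'B'))  # (9, ['B', 'C', 'E', 'B'])
print(shortest_path(graph, 'A', 'C'))  # (9, ['A', 'B', 'C'])
```

The B-C-E-B cycle costs 4 + 2 + 3 = 9, and ordinary queries such as A to C give the same answer as before. Separately, the mutable default arguments in the question's method (visited=[], distances={}, predecessors={}) keep their contents between calls, so a second call starts from the first call's leftover state.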

Related

Optimizing python DFS (for loop is inefficient)

Given the following function, what would be the correct and Pythonic way to achieve the same result, faster?
My code is not efficient and I believe I'm missing something that is staring me in the face.
The idea is to find a pattern [[A,B],[A,C],[C,B]] without having to generate additional permutations (since that would increase the processing time of the comparisons).
In real life the dictionary fed into find_path has approximately 10,000 keys, so iterating over it with the three nested loops of the current version below is not efficient.
from time import perf_counter
from typing import List, Generator, Dict

def find_path(data: Dict) -> Generator:
    for first_pair in data:
        pair1: List[str] = first_pair.split("/")
        for second_pair in data:
            pair2: List[str] = second_pair.split("/")
            if pair2[0] == pair1[0] and pair2[1] != pair1[1]:
                for third_pair in data:
                    pair3: List[str] = third_pair.split("/")
                    if pair3[0] == pair2[1] and pair3[1] == pair1[1]:
                        amount_pair_1: int = data.get(first_pair)["amount"]
                        id_pair_1: int = data.get(first_pair)["id"]
                        amount_pair_2: int = data.get(second_pair)["amount"]
                        id_pair_2: int = data.get(second_pair)["id"]
                        amount_pair_3: int = data.get(third_pair)["amount"]
                        id_pair_3: int = data.get(third_pair)["id"]
                        yield (
                            pair1,
                            amount_pair_1,
                            id_pair_1,
                            pair2,
                            amount_pair_2,
                            id_pair_2,
                            pair3,
                            amount_pair_3,
                            id_pair_3,
                        )
raw_data = {
    "EZ/TC": {"id": 1, "amount": 9},
    "LM/TH": {"id": 2, "amount": 8},
    "CD/EH": {"id": 3, "amount": 7},
    "EH/TC": {"id": 4, "amount": 6},
    "LM/TC": {"id": 5, "amount": 5},
    "CD/TC": {"id": 6, "amount": 4},
    "BT/TH": {"id": 7, "amount": 3},
    "BT/TX": {"id": 8, "amount": 2},
    "TX/TH": {"id": 9, "amount": 1},
}

processed_data = list(find_path(raw_data))
for i in processed_data:
    print(("The path to traverse is:", i))

>> ('The path to traverse is:', (['CD', 'TC'], 4, 6, ['CD', 'EH'], 7, 3, ['EH', 'TC'], 6, 4))
>> ('The path to traverse is:', (['BT', 'TH'], 3, 7, ['BT', 'TX'], 2, 8, ['TX', 'TH'], 1, 9))
>> ('Time to complete', 5.748599869548343e-05)
# Timing for a simple reference; as mentioned above, the real raw_data is a dict with about 10,000 keys
You can't do this efficiently with this representation of the graph: this algorithm has O(|E|^3) time complexity. It is a better idea to store the graph as an adjacency list, where each vertex has a list of only its adjacent vertices. Then it is easy to do what you need. Fortunately, you can convert the graph to this representation in O(|E|) time.
How to do that
We will store the graph as an array indexed by vertex (but since the vertices here are strings, we use a dictionary instead). We want to access all neighbours of a vertex directly, so for each vertex we store the list of all its neighbours.
Now we just need to build this structure from the set of edges (i.e. raw_data).
How do we add an edge to the graph? Easy! We find the edge's "from" vertex in our dictionary and append the "to" vertex to its list of neighbours.
So the construct_graph function could look like this:
from collections import defaultdict

def construct_graph(raw_data):  # here we change the representation
    graph = defaultdict(list)  # our graph: vertex -> list of neighbours
    for pair in raw_data:  # go through every edge
        u, v = pair.split("/")  # get the "from" and "to" vertices
        graph[u].append(v)  # and add this edge to our structure
    return graph  # return the new graph to other functions
How to find a path of length 2
We will use DFS on our graph.
def dfs(g, u, dist):  # a simple DFS function
    if dist == 2:  # 'dist' is the number of edges taken from the start
        return [u]  # we already found an answer, return it
    for v in g.get(u, []):  # otherwise check all neighbours of the current vertex
        ans = dfs(g, v, dist + 1)  # run DFS from every neighbour with dist + 1
        if ans:  # if that DFS found something,
            ans.append(u)  # add the current vertex to the answer
            return ans  # and return it
    return []  # otherwise we found nothing
And then we just try it for every vertex.
def main():
    graph = construct_graph(raw_data)
    for v in graph.keys():  # try to find a path starting from each vertex
        ans = dfs(graph, v, 0)  # starting with distance 0
        if ans:  # if we found something,
            print(list(reversed(ans)))  # print it; the answer is built in reverse order
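Note that dfs above returns any path with two edges, whereas the question's pattern [[A,B],[A,C],[C,B]] also requires the direct edge A->B. A minimal sketch, assuming the raw_data format from the question (keys like "A/B"), that uses the same adjacency-list idea plus a per-vertex dict for the direct-edge check; find_triangles is a hypothetical name. It yields the three original keys, so the amount and id of each edge can still be read from data[key].

```python
from collections import defaultdict

def find_triangles(data):
    """Yield (A/B, A/C, C/B) key triples: a direct edge A->B plus a two-step path A->C->B."""
    # Adjacency list: vertex -> list of (neighbour, original key), built in O(|E|).
    graph = defaultdict(list)
    for key in data:
        u, v = key.split("/")
        graph[u].append((v, key))
    for a, out_edges in graph.items():
        # Direct targets of a, so "is there an edge a->b?" is a dict lookup.
        direct = {b: key for b, key in out_edges}
        for c, key_ac in out_edges:
            # graph.get does not insert into the defaultdict while we iterate it
            for b, key_cb in graph.get(c, []):
                if b != c and b in direct:  # C must differ from B, as in the question
                    yield direct[b], key_ac, key_cb

raw_data = {
    "EZ/TC": {"id": 1, "amount": 9},
    "LM/TH": {"id": 2, "amount": 8},
    "CD/EH": {"id": 3, "amount": 7},
    "EH/TC": {"id": 4, "amount": 6},
    "LM/TC": {"id": 5, "amount": 5},
    "CD/TC": {"id": 6, "amount": 4},
    "BT/TH": {"id": 7, "amount": 3},
    "BT/TX": {"id": 8, "amount": 2},
    "TX/TH": {"id": 9, "amount": 1},
}
for triple in find_triangles(raw_data):
    print(triple, [raw_data[k]["amount"] for k in triple])
```

The work is proportional to the sum, over every edge A->C, of the out-degree of C, instead of three nested passes over all edges.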

Updating priority queue in Python Dijkstra's algorithm

In the following WORKING AND FINISHED code, I would like to understand why pq_update is updated with pq_update[neighbour][1].
I wrote pq_update[neighbour] instead (that is how I did it), and it does not seem to change anything, so why is the [1] included?
Thank you
import heapq

def dijkstra(graph, start):
    distances = {vertex: float('inf') for vertex in graph}
    pq = []
    pq_update = {}
    distances[start] = 0
    for vertex, value in distances.items():
        entry = [vertex, value]
        heapq.heappush(pq, entry)
        pq_update[vertex] = entry
    while pq:
        getmin = heapq.heappop(pq)[0]
        for neighbour, distance_neigh in graph[getmin].items():
            dist = distances[getmin] + distance_neigh
            if dist < distances[neighbour]:
                distances[neighbour] = dist
                pq_update[neighbour][1] = dist  # THIS LINE !!!
    print(distances)
    return distances

if __name__ == '__main__':
    example_graph = {
        'U': {'V': 2, 'W': 5, 'X': 1},
        'V': {'U': 2, 'X': 2, 'W': 3},
        'W': {'V': 3, 'U': 5, 'X': 3, 'Y': 1, 'Z': 5},
        'X': {'U': 1, 'V': 2, 'W': 3, 'Y': 1},
        'Y': {'X': 1, 'W': 1, 'Z': 1},
        'Z': {'W': 5, 'Y': 1},
    }
    dijkstra(example_graph, 'X')
Note: the implementation you have is broken and doesn't correctly implement Dijkstra. More on that below.
The pq_update dictionary contains lists, each with two entries:
for vertex, value in distances.items():
    entry = [vertex, value]
    heapq.heappush(pq, entry)
    pq_update[vertex] = entry
So pq_update[neighbour] is a list with both the vertex and the distance. You want to update the distance, not replace the [vertex, value] list, so pq_update[neighbour][1] is used.
Note that the entry list is also shared with the heap. The pq heap holds a reference to the same list object, so changes to pq_update[neighbour][1] are also visible in entries still waiting to be processed on the heap!
When you assign directly to pq_update[neighbour], you remove that connection.
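A small, self-contained illustration of that shared reference (the two-vertex data is made up for the demo):

```python
import heapq

pq = []
pq_update = {}
for vertex, value in {"U": float("inf"), "X": 0}.items():
    entry = [vertex, value]
    heapq.heappush(pq, entry)
    pq_update[vertex] = entry  # the same list object is stored in pq and in pq_update

pq_update["U"][1] = 1  # mutate the shared list in place
print(pq)  # [['U', 1], ['X', 0]]: the heap sees the new distance

pq_update["U"] = ["U", 5]  # rebind the dict key to a brand-new list
print(pq)  # [['U', 1], ['X', 0]]: the heap still holds the old list
```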
The reason you don't see any difference is that the implementation of the algorithm is actually broken, because the heap is not used correctly. The heap is sorted first by the first value in the list items you pushed, which in your code is the node name, not the distance, and the heap order is never updated when the distances in the list items are altered. As a result, you always traverse the nodes in alphabetical order.
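A short demonstration of that ordering problem, using three made-up vertices: with [vertex, distance] lists the pop order follows the names, while (distance, vertex) tuples pop in distance order.

```python
import heapq

distances = {"U": float("inf"), "V": float("inf"), "X": 0}

# [vertex, distance] entries, as in the question: lists compare element by
# element, so the vertex name alone decides the pop order.
by_name = []
for vertex, value in distances.items():
    heapq.heappush(by_name, [vertex, value])
name_order = [heapq.heappop(by_name)[0] for _ in range(len(by_name))]
print(name_order)  # ['U', 'V', 'X']: X comes last despite distance 0

# (distance, vertex) tuples: the distance decides, ties are broken by name.
by_dist = [(value, vertex) for vertex, value in distances.items()]
heapq.heapify(by_dist)
dist_order = [heapq.heappop(by_dist)[1] for _ in range(len(by_dist))]
print(dist_order)  # ['X', 'U', 'V']
```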
To use the heapq correctly, you need to put the distance first, and you must not alter the values on the heap; if you use tuples, you can't do this accidentally. You only really need to push the nodes you have reached onto the heap; you'll end up with multiple entries for some nodes (reached by multiple paths), but the heap will still present the shortest path to each node first. Just keep a set of visited nodes so you know to skip any longer paths. The point is that you visit the shorter path to a given node before the longer one, and you don't need to alter the heap items in place to achieve that.
You could re-write your function (with better variable names) to:
import heapq

def dijkstra(graph, start):
    """Visit all nodes and calculate the shortest paths to each from start"""
    queue = [(0, start)]
    distances = {start: 0}
    visited = set()
    while queue:
        _, node = heapq.heappop(queue)  # (distance, node), ignore distance
        if node in visited:
            continue
        visited.add(node)
        dist = distances[node]
        for neighbour, neighbour_dist in graph[node].items():
            if neighbour in visited:
                continue
            neighbour_dist += dist
            if neighbour_dist < distances.get(neighbour, float('inf')):
                heapq.heappush(queue, (neighbour_dist, neighbour))
                distances[neighbour] = neighbour_dist
    return distances

Bellman-ford algorithm with capacity constraint - Python

I am implementing a Bellman-ford shortest path algorithm. Based on the source and destination node, it outputs the shortest distance, and the path through a network.
Now, I need to add a capacity component to the algorithm. So if the demand is 2 but the capacity is 1, that path is no longer usable.
My initial idea was to add a dictionary for the capacity and a variable for the demand. Then, if the demand exceeded the capacity of a node, the length of the path would be set arbitrarily large. I was thinking of something like:
if capacity[neighbour] < demand:
    distance[neighbour], predecessor[neighbour] = 999
This gives me the following error message:
TypeError: '<' not supported between instances of 'dict' and 'int'
Is there a workaround for this issue, or could I potentially add the demand-constraint in a smarter way?
Full code:
source = 'e'
destination = 'd'
demand = 2

def bellman_ford(graph, source, capacity):
    # Step 1: Prepare the distance and predecessor for each node
    distance, predecessor = dict(), dict()
    for node in capacity:
        for node in graph:
            distance[node], predecessor[node] = float('inf'), None
    distance[source] = 0

    # Step 2: Relax the edges
    for _ in range(len(graph) - 1):
        for node in graph:
            for neighbour in graph[node]:
                # If the distance between the node and the neighbour is lower than the current, store it
                if distance[neighbour] > distance[node] + graph[node][neighbour]:
                    distance[neighbour], predecessor[neighbour] = distance[node] + graph[node][neighbour], node
                if capacity[node] < demand:
                    distance[neighbour], predecessor[neighbour] = 100

    # Step 3: Check for negative weight cycles
    for node in graph:
        for neighbour in graph[node]:
            assert distance[neighbour] <= distance[node] + graph[node][neighbour], "Negative weight cycle."
    return distance, predecessor

# Initial graph
graph = {
    'a': {'b': 1, 'd': 1},
    'b': {'c': 1, 'd': 2},
    'c': {},
    'd': {'b': 1, 'c': 8, 'e': 1},
    'e': {'a': 2, 'd': 7}
}

capacity = {
    'a': {'b': 4, 'd': 1},
    'b': {'c': 5, 'd': 4},
    'c': {},
    'd': {'b': 1, 'c': 3, 'e': 3},
    'e': {'a': 5, 'd': 3}
}

distance, predecessor = bellman_ford(graph, source, capacity)
print("The cost of shipping from", source, "to", destination, "is", distance[destination])
for i in graph:
    print("node", i, "is reached through node", predecessor[i])
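The TypeError comes from capacity[neighbour] being a dict: capacity, like graph, is keyed per edge, so the value to compare with the demand is capacity[node][neighbour]. (The assignment distance[neighbour], predecessor[neighbour] = 999 would fail next anyway, because a single int cannot be unpacked into two targets.) Rather than giving an edge an arbitrarily large distance, a simpler option is to skip every edge whose capacity is below the demand. A minimal sketch, not from the original thread, that passes demand as a parameter and reuses the question's graph and capacity data:

```python
def bellman_ford(graph, source, capacity, demand):
    """Bellman-Ford that ignores every edge whose capacity is below the demand."""
    distance = {node: float("inf") for node in graph}
    predecessor = {node: None for node in graph}
    distance[source] = 0

    def usable_edges():
        # capacity is per edge, so look it up as capacity[node][neighbour]
        for node in graph:
            for neighbour, weight in graph[node].items():
                if capacity[node][neighbour] >= demand:
                    yield node, neighbour, weight

    # Relax all usable edges |V| - 1 times
    for _ in range(len(graph) - 1):
        for node, neighbour, weight in usable_edges():
            if distance[node] + weight < distance[neighbour]:
                distance[neighbour] = distance[node] + weight
                predecessor[neighbour] = node

    # One more pass: any further improvement means a negative weight cycle
    for node, neighbour, weight in usable_edges():
        assert distance[node] + weight >= distance[neighbour], "Negative weight cycle."
    return distance, predecessor

graph = {
    'a': {'b': 1, 'd': 1},
    'b': {'c': 1, 'd': 2},
    'c': {},
    'd': {'b': 1, 'c': 8, 'e': 1},
    'e': {'a': 2, 'd': 7}
}
capacity = {
    'a': {'b': 4, 'd': 1},
    'b': {'c': 5, 'd': 4},
    'c': {},
    'd': {'b': 1, 'c': 3, 'e': 3},
    'e': {'a': 5, 'd': 3}
}
distance, predecessor = bellman_ford(graph, 'e', capacity, demand=2)
print(distance)     # {'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 0}
print(predecessor)  # {'a': 'e', 'b': 'a', 'c': 'b', 'd': 'b', 'e': None}
```

With demand = 2, the edge a->d (capacity 1) is unusable, so d is reached through e->a->b->d at cost 5 instead of e->a->d at cost 3.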

Python Networkx : find all edges for a given path in a multiDiGraph

In my multi directed graph, I would like to find all the (simple) paths possible between 2 nodes. I manage to get all the paths, but I cannot distinguish which edge (given that it's a MultiDiGraph) the source node takes to reach the target node.
For example, I have A->B->C where there are multiple parallel edges between (A,B) and (B,C). If I have, let's say, 5 parallel edges for A->B and 2 parallel edges for B->C, all_simple_paths(graph, source='A', target='C') will return 7 paths in total, all of course A->B->C.
When using get_edge_data(), it returns ALL the parallel edges between each pair of nodes. But what I want is to be able to list all the combinations of edges taken between the nodes in the path.
Thank you !
Use all_simple_edge_paths. For a MultiDiGraph it yields each edge as a (u, v, key) triple, so you get the key of each parallel edge.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge(1, 2, **{'prop1': 'A', 'prop2': 'B'})
G.add_edge(1, 3, **{'prop1': 'A', 'prop2': 'C'})
G.add_edge(2, 3, **{'prop1': 'B', 'prop2': 'C'})
G.add_edge(2, 3, **{'prop1': 'B1', 'prop2': 'C1'})

# Our source and destination nodes
source = 1
destination = 3

paths = nx.all_simple_edge_paths(G, source, destination)
for path in paths:
    print(" Path :: ")
    for edge in path:
        src = edge[0]
        dst = edge[1]
        print(str(src) + " - " + str(dst) + " :: " + str(G.get_edge_data(edge[0], edge[1])[edge[2]]))
I think OP doesn't need this answer but it can be useful for others.
networkx has no built-in function to handle this, so we have to do everything manually. nx.all_simple_paths() returns node lists, so for a MultiDiGraph there will be many repetitions. So first we remove them by converting the nx.all_simple_paths() output to a set, and then iterate over it. For every path we extract the node pairs (for example: [1,2,3,4] -> [[1,2],[2,3],[3,4]]) and for each pair we get an AtlasView of all edges between them. Here is the code for this algorithm:
import networkx as nx
from pprint import pprint

# Create the graph with unique edges to check the algorithm correctness
G = nx.MultiDiGraph()
G.add_edges_from([
    [1, 2],
    [1, 2],
    [1, 2],
    [2, 3],
    [2, 3],
    [2, 3],
    [3, 4],
    [3, 4],
    [2, 4]
])
G.add_edge(1, 2, data='WAKA')
G.add_edge(2, 3, data='WAKKA')
G.add_edge(2, 4, data='WAKA-WAKA')

# Our source and destination nodes
source = 1
destination = 4

# All unique single paths, like in nx.DiGraph
unique_single_paths = set(
    tuple(path)  # Lists can't be put in a set because they are not hashable
    for path in nx.all_simple_paths(G, source, destination)
)

combined_single_paths = []
for path in unique_single_paths:
    # Get all node pairs in path:
    # (1, 2, 3, 4) -> [(1, 2), (2, 3), (3, 4)]
    pairs = [path[i: i + 2] for i in range(len(path) - 1)]
    # Construct the combined list for path
    combined_single_paths.append([
        (pair, G[pair[0]][pair[1]])  # Pair and all edges between these nodes
        for pair in pairs
    ])

pprint(combined_single_paths)
[[((1, 2), AtlasView({0: {}, 1: {}, 2: {}, 3: {'data': 'WAKA'}})),
  ((2, 3), AtlasView({0: {}, 1: {}, 2: {}, 3: {'data': 'WAKKA'}})),
  ((3, 4), AtlasView({0: {}, 1: {}}))],
 [((1, 2), AtlasView({0: {}, 1: {}, 2: {}, 3: {'data': 'WAKA'}})),
  ((2, 4), AtlasView({0: {}, 1: {'data': 'WAKA-WAKA'}}))]]
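To go from these per-pair AtlasViews to the individual combinations the question asks for: G[u][v] on a MultiDiGraph maps each edge key to its attribute dict (as the output above shows), so itertools.product over the keys of each hop enumerates every combination of parallel edges along one node path. A minimal sketch; edge_combinations is a hypothetical helper, and a plain nested dict stands in for the graph so the example runs without networkx:

```python
from itertools import product

def edge_combinations(adj, node_path):
    """Yield every choice of parallel edges along node_path as a list of (u, v, key)."""
    # adj[u][v] maps edge key -> attribute dict, like G[u][v] on a MultiDiGraph
    hops = list(zip(node_path, node_path[1:]))
    key_choices = [list(adj[u][v]) for u, v in hops]
    for keys in product(*key_choices):
        yield [(u, v, k) for (u, v), k in zip(hops, keys)]

# Stand-in for a MultiDiGraph with 5 parallel A->B edges and 2 parallel B->C edges
adj = {
    "A": {"B": {0: {}, 1: {}, 2: {}, 3: {}, 4: {}}},
    "B": {"C": {0: {}, 1: {}}},
}
combos = list(edge_combinations(adj, ["A", "B", "C"]))
print(len(combos))  # 10
print(combos[0])    # [('A', 'B', 0), ('B', 'C', 0)]
```

With a real graph you would call edge_combinations(G, path) for each path in unique_single_paths. Five parallel A->B edges and two parallel B->C edges give 5 × 2 = 10 combinations.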

Get parent node based on current node and edge value in dictionary in Python

I am making a uniform cost search algorithm (for fun, not an assignment or anything) and I have an array that keeps track of the nodes and edge values that it passes. However, I am trying to get it to add the edge values together based on the path it took.
How does one get the parent node based on the current node and edge value?
Trail of nodes it has visited (for example) [Started from A, visited C, and now on B (from A)]:
Starting point: B
Graph of current node: {'A': 3, 'D': 8}
Trail {'A': 0, 'C': 2, 'B': 3}
Nodes_to_expand {'C': 2, 'B': 3, 'E': 5, 'D': 6}
Graph (Python dictionary):
graph = {'A': {'B': 3, 'C': 2, 'D': 6},
         'B': {'A': 3, 'D': 8},
         'C': {'D': 7, 'E': 5},
         'D': {'E': -2},
         'E': {}}
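A common approach is not to recover the parent afterwards but to record it at the moment a node is pushed: keep a parent dict mapping each node to the node it was reached from on its cheapest known path, and walk it backwards once the goal is popped. A minimal sketch under that approach, using heapq as the frontier; uniform_cost_search and its (cost, path) return value are hypothetical. Uniform cost search, like Dijkstra, assumes non-negative edge costs, so the D -> E edge of -2 in this graph is outside what it guarantees.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) from start to goal, or None if goal is unreachable."""
    frontier = [(0, start)]
    parent = {start: None}  # node -> node it was reached from on its best known path
    cost = {start: 0}       # best known cost to each node so far
    explored = set()
    while frontier:
        node_cost, node = heapq.heappop(frontier)
        if node in explored:
            continue  # stale entry, a cheaper one was already expanded
        if node == goal:
            path = []
            while node is not None:  # follow parent links back to start
                path.append(node)
                node = parent[node]
            return node_cost, path[::-1]
        explored.add(node)
        for neighbour, edge_cost in graph[node].items():
            new_cost = node_cost + edge_cost
            if neighbour not in explored and new_cost < cost.get(neighbour, float('inf')):
                cost[neighbour] = new_cost
                parent[neighbour] = node  # remember where this cheaper path came from
                heapq.heappush(frontier, (new_cost, neighbour))
    return None

graph = {'A': {'B': 3, 'C': 2, 'D': 6},
         'B': {'A': 3, 'D': 8},
         'C': {'D': 7, 'E': 5},
         'D': {'E': -2},
         'E': {}}
print(uniform_cost_search(graph, 'A', 'B'))  # (3, ['A', 'B'])
print(uniform_cost_search(graph, 'A', 'E'))  # (4, ['A', 'D', 'E'])
```

For the trail in the question, B's parent is A, because B was pushed while expanding A; the accumulated edge value for any node is then just cost[node].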
