Python freezes while path-finding - python

I have a script that uses an adjacency graph and the BFS algorithm to find a path between two points. The graph has about 10,000 vertices and the script is set up like this:
graph = {...
'9660': ['9661', '9662', '9659'],
'9661': ['9654', '9660'],
'9662': ['9660', '9663'],
'9663': ['9664', '9662'],
'9664': ['9665', '9663'],
...}
def bfs(graph, start, end):
    # maintain a queue of paths
    queue = []
    # push the first path into the queue
    queue.append([start])
    while queue:
        # get the first path from the queue
        path = queue.pop(0)
        # get the last node from the path
        node = path[-1]
        # path found
        if node == end:
            return path
        # enumerate all adjacent nodes, construct a new path and push it into the queue
        for adjacent in graph.get(node, []):
            new_path = list(path)
            new_path.append(adjacent)
            queue.append(new_path)
print(bfs(graph, '50', '8659'))
Because this algorithm works on small adjacency graphs, I'm guessing Python just takes a really long time to process a graph this size. My goal is to find the shortest path, but that's currently out of the question if I can't even find one path. Is there a Python solution for path-finding on large adjacency graphs? If so, could I get an example?

You're not keeping track of visited nodes, which can lead to lots of wasted time if your graph is not a directed acyclic graph. For example, if your graph is
{'A': ['B', 'C', 'D', 'E'],
'B': ['A', 'C', 'D'],
'C': ['A', 'B', 'D'],
'D': ['A', 'B', 'C'],
'E': ['F'],
'F': ['G'],
'G': ['H'],
...
'W': ['X'],
'X': ['Y'],
'Y': ['Z']}
calling bfs(graph, 'A', 'Z') with your algorithm would cycle unnecessarily through 'A', 'B', 'C' and 'D' before finally reaching Z. Whereas if you keep track of visited nodes, you only add the neighbors of 'A', 'B', 'C' and 'D' to the queue once each.
def bfs(graph, start, end):
    # maintain a queue of paths
    queue = []
    # push the first path into the queue
    queue.append([start])
    # already visited nodes
    visited = set()
    while queue:
        # get the first path from the queue
        path = queue.pop(0)
        # get the last node from the path
        node = path[-1]
        # if node has already been visited
        if node in visited:
            continue
        # path found
        if node == end:
            return path
        # enumerate all adjacent nodes, construct a new path and push it into the queue
        else:
            for adjacent in graph.get(node, []):
                # add the path only if its end node hasn't already been visited
                if adjacent not in visited:
                    new_path = list(path)
                    new_path.append(adjacent)
                    queue.append(new_path)
            # add node to visited set
            visited.add(node)
Using this version of the algorithm and the alphabet graph, here's what the queue and visited set would look like at the top of the while loop through the whole algorithm:
queue = [ ['A'] ]
visited = {}
queue = [ ['A', 'B'], ['A', 'C'], ['A', 'D'], ['A', 'E'] ]
visited = {'A'}
queue = [ ['A', 'C'], ['A', 'D'], ['A', 'E'], ['A', 'B', 'C'],
['A', 'B', 'D'] ]
visited = {'A', 'B'}
queue = [ ['A', 'D'], ['A', 'E'], ['A', 'B', 'C'], ['A', 'B', 'D'],
['A', 'C', 'D'] ]
visited = {'A', 'B', 'C'}
queue = [ ['A', 'E'], ['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'C', 'D'] ]
visited = {'A', 'B', 'C', 'D'}
queue = [ ['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'C', 'D'], ['A', 'E', 'F'] ]
visited = {'A', 'B', 'C', 'D', 'E'}
queue = [ ['A', 'B', 'D'], ['A', 'C', 'D'], ['A', 'E', 'F'] ]
visited = {'A', 'B', 'C', 'D', 'E'}
queue = [ ['A', 'C', 'D'], ['A', 'E', 'F'] ]
visited = {'A', 'B', 'C', 'D', 'E'}
queue = [ ['A', 'E', 'F'] ]
visited = {'A', 'B', 'C', 'D', 'E'}
queue = [ ['A', 'E', 'F', 'G'] ]
visited = {'A', 'B', 'C', 'D', 'E', 'F'}
queue = [ ['A', 'E', 'F', 'G', 'H'] ]
visited = {'A', 'B', 'C', 'D', 'E', 'F', 'G'}
...
...
queue = [ ['A', 'E', 'F', 'G', 'H', ..., 'X', 'Y', 'Z'] ]
visited = {'A', 'B', 'C', 'D', 'E', 'F', 'G', ..., 'X', 'Y'}
# at this point the algorithm will pop off the path,
# see that it reaches the goal, and return
This is much less work than adding paths like ['A', 'B', 'A', 'B', 'A', 'B', ...].
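Two further costs remain in the version above: list.pop(0) is O(n) on a Python list, and every queue entry carries a full copy of its path. A minimal sketch of one way to avoid both, assuming the same dict-of-lists graph format, is to use collections.deque and record each node's parent instead of whole paths. The name bfs_shortest_path and the small demo graph are made up for illustration.

```python
from collections import deque

def bfs_shortest_path(graph, start, end):
    """Return one shortest path from start to end as a list of nodes, or None."""
    parent = {start: None}      # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        if node == end:
            # walk the parent links back to the start, then reverse
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for adjacent in graph.get(node, []):
            if adjacent not in parent:
                parent[adjacent] = node
                queue.append(adjacent)
    return None                 # end is not reachable from start

demo = {'A': ['B', 'C', 'D', 'E'],
        'B': ['A', 'C', 'D'],
        'C': ['A', 'B', 'D'],
        'D': ['A', 'B', 'C'],
        'E': ['F'],
        'F': ['G'],
        'G': []}
print(bfs_shortest_path(demo, 'A', 'G'))  # ['A', 'E', 'F', 'G']
```

Because each node is marked as soon as it is enqueued, it enters the queue at most once, so the work is proportional to the number of nodes plus edges.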

Related

Is it possible to add lists inside a list?

I created 2 lists in Python:
ls = []
a = ['a','b','c','d','e','f']
i = 0
while i < 5:
    x = a[-1]
    a.pop(-1)
    a.insert(0, x)
    ls.insert(0, a)
    i += 1
print(ls)
What I want to do is add successive rotations of the list of letters to the empty list, so that the result looks like this:
ls = [
['a','b','c','d','e','f'],
['f','a','b','c','d','e'],
['e','f','a','b','c','d'],
['d','e','f','a','b','c'],
['c','d','e','f','a','b'],
['b','c','d','e','f','a']
]
I would like to know where I made a mistake and what the solution is.
Lists are mutable objects in Python, so when you insert the list a into ls, you are only adding a reference to a, not a copy of its current contents. Every entry of ls therefore points to the same list and shows its final state.
A workaround is to insert a copy of a into ls. You can create a copy with list(a), with the a.copy() method, or with copy.copy(a) from the copy module. Using ls.insert(0, a.copy()) gives the same result as the code below:
ls = []
a = ['a','b','c','d','e','f']
i = 0
while i < 5:
    x = a[-1]
    a.pop(-1)
    a.insert(0, x)
    ls.insert(0, list(a)) # updated this
    i += 1
print(ls)
Output:
[['b', 'c', 'd', 'e', 'f', 'a'], ['c', 'd', 'e', 'f', 'a', 'b'], ['d', 'e', 'f', 'a', 'b', 'c'], ['e', 'f', 'a', 'b', 'c', 'd'], ['f', 'a', 'b', 'c', 'd', 'e']]
Another easy way to get your expected output would be to -
ls = []
a = ['a','b','c','d','e','f']
for i in range(6):
    ls.append(a.copy())
    a = [a[-1]] + a[:-1]
print(ls)
Output :
[['a', 'b', 'c', 'd', 'e', 'f'], ['f', 'a', 'b', 'c', 'd', 'e'], ['e', 'f', 'a', 'b', 'c', 'd'], ['d', 'e', 'f', 'a', 'b', 'c'], ['c', 'd', 'e', 'f', 'a', 'b'], ['b', 'c', 'd', 'e', 'f', 'a']]
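To make the reference-versus-copy point concrete, here is a small sketch (the variable names aliased and copied are only illustrative). It stores the same list twice, stores two copies, and then mutates the original:

```python
a = ['a', 'b', 'c']
aliased = [a, a]               # two references to the same list object
copied = [list(a), a.copy()]   # two independent shallow copies

a.append('d')                  # mutate the original in place

print(aliased)  # [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']]
print(copied)   # [['a', 'b', 'c'], ['a', 'b', 'c']]
print(aliased[0] is aliased[1])  # True: both entries are the very same object
```

Note that list(a) and a.copy() are shallow copies, which is enough here because the elements are immutable strings.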

How to flatten a list of sets to a list of lists? [duplicate]

Given the list of sets s, I want to create the flattened_s as follows:
s = [{'a', 'b'}, {'c', 'd', 'e'}]
flattened_s = [['a', 'c'], ['a', 'd'], ['a', 'e'], ['b', 'c'], ['b', 'd'], ['b', 'e']]
This code can do the job:
flattened_s = []
for m in s[0]:
    for n in s[1]:
        flattened_s.append([m, n])
print(flattened_s)
However, how can this be done if the list s is generalized to contain more than 2 sets? For example:
s = [{'a', 'b'}, {'c', 'd', 'e'}, {'f'}]
You could use itertools.product combined with the * operator:
from itertools import product
inp1 = [{'a', 'b'}, {'c', 'd', 'e'}]
inp2 = [{'a', 'b'}, {'c', 'd', 'e'}, {'f'}]
res1 = list(map(list, product(*inp1)))
# [['a', 'c'], ['a', 'e'], ['a', 'd'], ['b', 'c'], ['b', 'e'], ['b', 'd']]
# (the order can differ between runs because sets are unordered)
res2 = list(map(list, product(*inp2)))
# [['a', 'c', 'f'],
# ['a', 'e', 'f'],
# ['a', 'd', 'f'],
# ['b', 'c', 'f'],
# ['b', 'e', 'f'],
# ['b', 'd', 'f']]
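Since sets have no guaranteed iteration order, the order of the combinations above can change from run to run. A minimal sketch of a reproducible variant, assuming the elements are mutually comparable so they can be sorted first (the function name combinations_across is hypothetical):

```python
from itertools import product

def combinations_across(sets):
    """Cartesian product of any number of sets, returned as a list of lists."""
    # sort each set so that the output order is the same on every run
    return [list(combo) for combo in product(*(sorted(group) for group in sets))]

s = [{'a', 'b'}, {'c', 'd', 'e'}, {'f'}]
print(combinations_across(s))
# [['a', 'c', 'f'], ['a', 'd', 'f'], ['a', 'e', 'f'],
#  ['b', 'c', 'f'], ['b', 'd', 'f'], ['b', 'e', 'f']]
```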

Path finder for network link wiring - follow-up

See the initial post on Code Review.
Thanks to @Graipher, who proposed the Python library networkx in answer to my question. My code is now improved and cleaner:
# Path finder improved
import networkx as nx
class Edge:
    # An Edge is a link (physical, radio, logical) between two assets/nodes/vertices
    def __init__(self, sku, e1, e2, re1, re2):
        # The SKU is the unique ID of the edge
        # An edge joins two vertices and can be traversed in either direction (undirected edge)
        self.sku = sku
        self.sku_endpoint_1 = e1
        self.sku_endpoint_2 = e2
        self.reverse_sku_endpoint_1 = re1
        self.reverse_sku_endpoint_2 = re2
# We can instantiate edges like this
edge1 = Edge("Edge1", "A", "B", "B", "A")
edge2 = Edge("Edge2", "B", "C", "C", "B")
edge3 = Edge("Edge3", "A", "C", "C", "A")
edge4 = Edge("Edge4", "C", "D", "D", "C")
edge5 = Edge("Edge5", "B", "E", "E", "B")
edge6 = Edge("Edge6", "D", "E", "E", "D")
edges = [edge1, edge2, edge3, edge4, edge5, edge6]
# And then we can find all paths using @Graipher's method
def solve(edges, source, target):
    g = nx.Graph()  # bi-directional edges.
    for edge in edges:
        g.add_edge(edge.sku_endpoint_1, edge.sku_endpoint_2, sku=edge.sku)
    paths = nx.all_simple_paths(g, source=source, target=target)
    index = 0
    paths_dict = {}
    # Creating the dict of paths with only the edge SKUs
    for path in map(nx.utils.pairwise, paths):
        paths_dict[index] = []
        for edge in path:
            paths_dict[index].append(g.edges[edge]["sku"])
        index += 1
    return paths_dict
But now, what about finding all paths with repeated nodes, but without repeating the same edge? I now see that the networkx library is explicitly not repeating nodes while searching paths...
But consider the following graph:
g.add_edges_from([("A", "B", {"sku": "Edge1"}),
                  ("B", "C", {"sku": "Edge2"}),
                  ("A", "C", {"sku": "Edge3"}),
                  ("C", "D", {"sku": "Edge4"}),
                  ("B", "E", {"sku": "Edge5"}),
                  ("D", "E", {"sku": "Edge6"}),
                  ("C", "E", {"sku": "Edge7"})])
The graph we see looks like that:
When we want to find all paths from A to D, we also want to find paths that pass through an already discovered node (here it's C). The only rule is that a path must not use the same edge twice (to prevent an infinite loop).
In this example, one path from A to D that matches these rules is:
A->C : "Edge3"
C->E : "Edge7"
E->B : "Edge5"
B->C : "Edge2"
C->D : "Edge4"
Is there a way to do that with this library? Because with my code (see previous post on codereview) I was able to find these paths. But that's not very optimised because the program searches ALL paths and only then I remove duplicated and non meaningful paths.
Here is an attempt, but it's not so great since it doesn't track back all the way to 'a' and re-search all paths via a -> c etc...
If you swap the order of ['b', 'c'] you will get the example path you specified in your question here... Not ideal since it doesn't scale, but hopefully this might show you where I'm headed with this...
graph = {
'a': ['b','c'],
'b': ['a', 'c', 'e'],
'c': ['a','b','d','e'],
'd': ['c','e'],
'e': ['b','c','d']
}
def non_simple_paths(g, u, v):
    from collections import defaultdict
    paths = []
    path = []
    edges_used = defaultdict(bool)
    def dfs(g, u, v):
        for n in g[u]:
            e = (u, n)
            re = (n, u)
            if edges_used[re] or edges_used[e]:
                continue
            elif v in e:
                c = path[:]
                c.append(e)
                paths.append(c)
                edges_used[e] = True
                edges_used[re] = True
            else:
                path.append(e)
                edges_used[e] = True
                edges_used[re] = True
                dfs(g, n, v)
        if path:
            path.pop()  # going back to parent
        return
    dfs(g, u, v)
    return paths
# ================================
ps = non_simple_paths(graph, 'a', 'd')
print(ps)
I thought a while about this interesting problem, but unfortunately don’t have a great answer.
1. Approach
My first approach was based on the following observation (I’m calling the paths you are looking for edge-simple):
Each simple path is obviously edge-simple.
Each edge-simple path can be reduced to a simple path by removing the cycles (loops) between multiple nodes.
To illustrate the second point, look at the path you used as an example, A-C-E-B-C-D. It contains the node C twice and, after removing the corresponding cycle C-E-B-C, it is simple: A-C-D.
My idea was to use the simple paths between two nodes as a basis for the edge-simple ones
simple_paths = list(nx.all_simple_paths(G, 'A', 'D'))
and add cycles to the nodes it contains (here constructed via the (full) corresponding directed graph)
H = nx.DiGraph()
H.add_edges_from(list(G.edges) + [(edge[1], edge[0]) for edge in G.edges])
cycles_basis = [cycle for cycle in nx.simple_cycles(H) if len(cycle) > 2]
But I got lost on the way through the second part ...
2. Approach
I ended up with a second approach that resembles the one given by @JordanSimba:
import networkx as nx
def remove_edge(G, node1, node2):
    return G.edge_subgraph(list(set(G.edges)
                                .difference({(node1, node2), (node2, node1)})))
def all_edge_simple_paths(G, source, target):
    paths = []
    if source == target:
        paths.append([source])
    for node in G[source]:
        G_sub = remove_edge(G, source, node)
        if node in G_sub.nodes and target in G_sub.nodes:
            paths.extend([[source] + path
                          for path in all_edge_simple_paths(G_sub, node, target)])
        else:
            if node == target:
                paths.append([source, target])
    return paths
With your graph
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D'), ('B', 'E'),
('D', 'E'), ('C', 'E')])
the result (all_edge_simple_paths(G, 'A', 'D')) is
[['A', 'B', 'C', 'D'],
['A', 'B', 'C', 'E', 'D'],
['A', 'B', 'E', 'D'],
['A', 'B', 'E', 'C', 'D'],
['A', 'C', 'B', 'E', 'D'],
['A', 'C', 'B', 'E', 'C', 'D'],
['A', 'C', 'D'],
['A', 'C', 'E', 'B', 'C', 'D'],
['A', 'C', 'E', 'D']]
If a small cycle gets added onto node D
G.add_edges_from([('D', 'F'), ('F', 'G'), ('G', 'D')])
the result includes it (in both directions through the cycle)
[['A', 'B', 'C', 'D'],
['A', 'B', 'C', 'D', 'F', 'G', 'D'],
['A', 'B', 'C', 'D', 'G', 'F', 'D'],
['A', 'B', 'C', 'E', 'D'],
['A', 'B', 'C', 'E', 'D', 'F', 'G', 'D'],
['A', 'B', 'C', 'E', 'D', 'G', 'F', 'D'],
['A', 'B', 'E', 'D'],
['A', 'B', 'E', 'D', 'F', 'G', 'D'],
['A', 'B', 'E', 'D', 'G', 'F', 'D'],
['A', 'B', 'E', 'C', 'D'],
['A', 'B', 'E', 'C', 'D', 'F', 'G', 'D'],
['A', 'B', 'E', 'C', 'D', 'G', 'F', 'D'],
['A', 'C', 'B', 'E', 'D'],
['A', 'C', 'B', 'E', 'D', 'F', 'G', 'D'],
['A', 'C', 'B', 'E', 'D', 'G', 'F', 'D'],
['A', 'C', 'B', 'E', 'C', 'D'],
['A', 'C', 'B', 'E', 'C', 'D', 'F', 'G', 'D'],
['A', 'C', 'B', 'E', 'C', 'D', 'G', 'F', 'D'],
['A', 'C', 'D'],
['A', 'C', 'D', 'F', 'G', 'D'],
['A', 'C', 'D', 'G', 'F', 'D'],
['A', 'C', 'E', 'B', 'C', 'D'],
['A', 'C', 'E', 'B', 'C', 'D', 'F', 'G', 'D'],
['A', 'C', 'E', 'B', 'C', 'D', 'G', 'F', 'D'],
['A', 'C', 'E', 'D'],
['A', 'C', 'E', 'D', 'F', 'G', 'D'],
['A', 'C', 'E', 'D', 'G', 'F', 'D']]
To be honest: I’m not 100% sure it works correctly (for all situations). I just don't have the time for extensive testing. And the number of paths grows rapidly with increasing graph size, which makes it hard to keep track of what's going on.
And I have serious doubts regarding the efficiency. Someone pointed out to me recently that working with the subgraph view could slow things down. So maybe only a lower level implementation might produce the speed you’re looking for.
But maybe it helps.
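As a rough illustration of the lower-level idea mentioned above, here is a minimal sketch that avoids subgraph views altogether. It assumes the graph is given as a plain adjacency dict instead of a networkx graph, and it tracks the edges used on the current path in a set while backtracking. The function name edge_simple_paths is made up, and no claim is made about how it performs on large graphs.

```python
def edge_simple_paths(graph, source, target):
    """Yield every path from source to target that uses each undirected edge at most once."""
    used = set()      # undirected edges on the current path, stored as frozensets
    path = [source]

    def walk(node):
        if node == target:
            yield list(path)          # record the path, then keep exploring past the target
        for neighbor in graph[node]:
            edge = frozenset((node, neighbor))
            if edge in used:
                continue
            used.add(edge)
            path.append(neighbor)
            yield from walk(neighbor)
            path.pop()                # backtrack
            used.remove(edge)

    yield from walk(source)

graph = {'A': ['B', 'C'],
         'B': ['A', 'C', 'E'],
         'C': ['A', 'B', 'D', 'E'],
         'D': ['C', 'E'],
         'E': ['B', 'C', 'D']}
paths = list(edge_simple_paths(graph, 'A', 'D'))
print(len(paths))  # 9
```

On the seven-edge example graph this yields the same nine paths as listed above, including A-C-E-B-C-D. The number of such paths can still grow very quickly with graph size, so a generator is used to avoid building the whole list unless it is needed.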

Python DFS recursive function retaining values from previous call

I'm sorry if the title is misleading, but I could not put it in any other way.
I am trying to implement BFS and DFS in order to remember some concepts, but an odd behavior is occurring with the recursive versions of the code.
This is what is happening:
def rec_dfs(g, start_node, visited=[]):
    visited.append(start_node)
    for next_node in g[start_node]:
        if next_node not in visited:
            rec_dfs(g, next_node, visited)
    return visited
graph2={'A': ['B', 'C', 'D'],
'B': ['A', 'E', 'F'],
'C': ['A', 'F'],
'D': ['A'],
'E': ['B'],
'F': ['B', 'C']}
rec_dfs(graph2, "A") #['A', 'B', 'E', 'F', 'C', 'D'] OK
rec_dfs(graph2, "A") #['A', 'B', 'E', 'F', 'C', 'D', 'A'] NOK
rec_dfs(graph2, "A") #['A', 'B', 'E', 'F', 'C', 'D', 'A', 'A'] NOK
It should always return the first case, but when I investigated I could see that the second call already had "visited" populated.
If I call the function like:
rec_dfs(graph2, "A", []) #['A', 'B', 'E', 'F', 'C', 'D'] OK
rec_dfs(graph2, "A", []) #['A', 'B', 'E', 'F', 'C', 'D'] OK
rec_dfs(graph2, "A", []) #['A', 'B', 'E', 'F', 'C', 'D'] OK
it works just fine...
I would really appreciate if someone could explain why this behavior is happening, and if there is a way to avoid it.
Thanks!
You're using the visited list as a mutable default argument. Default argument values are evaluated only once, when the function is defined, so the same list object is shared by every call that does not pass visited explicitly (see http://code.activestate.com/recipes/577786-smarter-default-arguments/).
Unless visited is explicitly passed in as a fresh list, it keeps whatever state earlier calls left in it.
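A common way to avoid this is to use None as the default and create the list inside the function. A minimal sketch of the question's function rewritten that way, with everything else kept as in the original:

```python
def rec_dfs(g, start_node, visited=None):
    # create a fresh list on each top-level call instead of sharing one default object
    if visited is None:
        visited = []
    visited.append(start_node)
    for next_node in g[start_node]:
        if next_node not in visited:
            rec_dfs(g, next_node, visited)
    return visited

graph2 = {'A': ['B', 'C', 'D'],
          'B': ['A', 'E', 'F'],
          'C': ['A', 'F'],
          'D': ['A'],
          'E': ['B'],
          'F': ['B', 'C']}
print(rec_dfs(graph2, "A"))  # ['A', 'B', 'E', 'F', 'C', 'D']
print(rec_dfs(graph2, "A"))  # ['A', 'B', 'E', 'F', 'C', 'D']
```

The recursive calls still pass visited along explicitly, so the list is shared within one traversal but not between separate top-level calls.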

Python code to list dependencies, avoiding loops

Say you have a dictionary describing item dependencies, along the lines of:
deps = {
'A': ['B', 'C', 'D'],
'B': ['C', 'E'],
'C': ['D', 'F'],
'D': ['C', 'G'],
'E': ['A'],
'H': ['N'],
}
meaning that item 'A' depends on items 'B', 'C', and 'D', etc. Obviously, this could be of arbitrary complexity.
How do you write a function get_all_deps(item) that gives you a list of all the dependencies of item, without duplicates and without item itself? E.g.:
> get_all_deps('H')
['N']
> get_all_deps('A')
['B', 'C', 'D', 'E', 'F', 'G']
> get_all_deps('E')
['A', 'B', 'C', 'D', 'F', 'G']
I'm looking for concise code - ideally a single recursive function. Performance is not terribly important for my use case - we're talking about fairly small dependency graphs (e.g. a few dozen items)
You can use a stack/todo list to avoid a recursive implementation:
deps = {
'A': ['B', 'C', 'D'],
'B': ['C', 'E'],
'C': ['D', 'F'],
'D': ['C', 'G'],
'E': ['A'],
'H': ['N'],
}
def get_all_deps(item):
    todo = set(deps[item])
    rval = set()
    while todo:
        subitem = todo.pop()
        if subitem != item:  # don't add start item to the list
            rval.add(subitem)
            to_add = set(deps.get(subitem, []))
            todo.update(to_add.difference(rval))
    return sorted(rval)
print(get_all_deps('A'))
print(get_all_deps('E'))
print(get_all_deps('H'))
result:
['B', 'C', 'D', 'E', 'F', 'G']
['A', 'B', 'C', 'D', 'F', 'G']
['N']
The todo set contains the elements still to be processed.
Pop one element and add it to the return value set.
Loop until there are no more elements (okay, there's a loop in here).
Add new elements to todo only if they're not already in the return value.
Return the sorted list.
The set difference avoids the problem with cyclic dependencies, and the "max recursion depth" limit is avoided as well. The only limit is system memory.
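Since the question asked for a single recursive function, here is a minimal sketch of that variant for comparison. It assumes the graph is small enough that Python's recursion limit is not a concern, as the question states. The name all_deps_recursive and the seen parameter are made up for illustration, and deps is passed in as an argument instead of being read from a global.

```python
def all_deps_recursive(item, deps, seen=None):
    """Return the sorted list of everything item depends on, directly or indirectly."""
    top_level = seen is None
    if top_level:
        seen = set()
    for dep in deps.get(item, []):
        if dep not in seen:       # the seen set stops cycles from recursing forever
            seen.add(dep)
            all_deps_recursive(dep, deps, seen)
    # only the outermost call removes the start item and sorts
    return sorted(seen - {item}) if top_level else seen

deps = {
    'A': ['B', 'C', 'D'],
    'B': ['C', 'E'],
    'C': ['D', 'F'],
    'D': ['C', 'G'],
    'E': ['A'],
    'H': ['N'],
}
print(all_deps_recursive('A', deps))  # ['B', 'C', 'D', 'E', 'F', 'G']
print(all_deps_recursive('E', deps))  # ['A', 'B', 'C', 'D', 'F', 'G']
print(all_deps_recursive('H', deps))  # ['N']
```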
