I am trying to generate all paths with at most 6 nodes from every origin to every destination in a fairly large network (20,000+ arcs). I am using networkx and Python 2.7. For small networks it works well, but I need to run this for the whole network, so I was wondering if there is a more efficient way to do it in Python. My code uses a recursive function (see below). I am thinking about keeping some of the paths in memory so that I don't recreate them for other paths, but I am not sure how to accomplish that quickly. Right now it can't finish even within a few days; 3-4 hours would be fine for my project.
Here is a sample that I created. Feel free to ignore the print functions; I added them for illustration purposes. Also, here is the sample input file: input
import networkx as nx
import pandas as pd
import copy
import os
class ODPath(object):
    def __init__(self, pathID='', transittime=0, path=[], vol=0, OD='', air=False, sortstrat=[], arctransit=[]):
        self.pathID = pathID
        self.transittime = transittime
        self.path = path
        self.vol = vol
        self.OD = OD
        self.air = air
        self.sortstrat = sortstrat    # a list of sort strategies
        self.arctransit = arctransit  # keep the transit time of each arc as a list

    def setpath(self, newpath):
        self.path = newpath

    def setarctransitpath(self, newarctransit):
        self.arctransit = newarctransit

    def settranstime(self, newtranstime):
        self.transittime = newtranstime

    def setpathID(self, newID):
        self.pathID = newID

    def setvol(self, newvol):
        self.vol = newvol

    def setOD(self, newOD):
        self.OD = newOD

    def setAIR(self, newairTF):
        self.air = newairTF

    def setsortstrat(self, newsortstrat):
        self.sortstrat = newsortstrat

def find_allpaths(graph, start, end, pathx=ODPath(None, 0, [], 0, None, False)):
    path = copy.deepcopy(pathx)  # to make sure the original has not been revised
    newpath = path.path + [start]
    path.setpath(newpath)
    if len(path.path) > 6:
        return []
    if start == end:
        return [path]
    if start not in graph:  # check if node `start` exists in the graph
        return []
    paths = []
    for node in graph[start]:  # loop over all outbound nodes of the starting point
        if node not in path.path:  # makes sure there are no cycles
            newpaths = find_allpaths(graph, node, end, path)
            for newpath in newpaths:
                if len(newpath.path) < 7:  # generate only paths with at most 6 nodes
                    paths.append(newpath)
    return paths

def printallpaths(path_temp):
    map(printapath, path_temp)

def printapath(path):
    print path.path

filename = 'transit_sample1.csv'
df_t = pd.read_csv(filename, delimiter=',')
df_t = df_t.reset_index()
G = nx.from_pandas_dataframe(df_t, 'node1', 'node2', ['Transit Time'], nx.DiGraph())

allpaths = find_allpaths(G, 'A', 'S')
printallpaths(allpaths)
I would really appreciate any help.
I actually asked this question previously, about optimizing an algorithm I had written using networkx. Essentially, what you'll want to do is move away from a recursive function and towards a solution that uses memoization, like I did.
From there you can do further optimizations, like using multiple cores, or picking the next node to traverse based on criteria such as degree.
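To make the "move away from recursion" idea concrete, here is a minimal sketch of an iterative version that expands partial paths from an explicit queue. It works on a plain adjacency dict rather than the ODPath objects from the question, and the name bounded_paths is my own:

```python
from collections import deque

def bounded_paths(graph, start, end, max_nodes=6):
    # Each queue entry is a partial path stored as a tuple; a path is
    # only extended while it has fewer than max_nodes nodes, which
    # enforces the "at most 6 nodes" limit without any recursion.
    found = []
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == end:
            found.append(list(path))
            continue
        if len(path) == max_nodes:
            continue
        for succ in graph[last]:
            if succ not in path:  # makes sure there are no cycles
                queue.append(path + (succ,))
    return found
```

Because partial paths are plain tuples, this also makes it easy to bolt on caching or to split the frontier across worker processes later.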
Related
I'm trying to code a pathfinding algorithm for this problem: you're given a grid of values, and your goal is to find the longest path from the highest point to the lowest point (the least steep one).
I experimented with code I found on here [https://stackoverflow.com/questions/68464767/find-a-path-from-grid-with-rules], but for large grids (e.g. 140x140) it takes a really long time, since the code checks every single possible route. Could I implement some stopping condition that abandons a path as soon as it can no longer be optimal?
As I said earlier, the code I found on here works, but it is just way too slow for me to use. Here's my code with the grid; it works perfectly for smaller grids.
The code below isn't complete, because the grid is too large to be posted here. The full code can be found here - https://pastebin.com/q4nKS4rS
def find_paths_recursive(grid, current_path=None, solutions=None):
    # Use None defaults instead of mutable ones: a default list is
    # shared across calls and would keep accumulating solutions.
    if current_path is None:
        current_path = [(136, 136)]
    if solutions is None:
        solutions = []
    n = len(grid)
    dirs = [(-1, 0), (1, 0), (0, 1), (0, -1)]
    last_cell = current_path[-1]
    for x, y in dirs:
        new_i = last_cell[0] + x
        new_j = last_cell[1] + y
        # Check if the new cell is in the grid
        if new_i < 0 or new_i >= n or new_j < 0 or new_j >= n:
            continue
        # Check if the new cell has a bigger value than the last one
        if grid[new_i][new_j] > grid[last_cell[0]][last_cell[1]]:
            continue
        # Check if the new cell is already in the path
        if (new_i, new_j) in current_path:
            continue
        # Create a new current_path list for every direction
        current_path_copy = current_path.copy()
        current_path_copy.append((new_i, new_j))
        if new_i == 0 and new_j == 0:
            solutions.append(current_path_copy)
            print(current_path_copy)
        find_paths_recursive(grid, current_path_copy, solutions)
    return solutions
def compute_cell_values(grid1, solutions):
    path_values = []
    for solution in solutions:
        solution_values = []
        for cell in solution:
            solution_values.append(grid1[cell[0]][cell[1]])
        path_values.append(solution_values)
    return path_values

grid1 = [...]
solutions = find_paths_recursive(grid1)
path_values = compute_cell_values(grid1, solutions)
print('Solutions:')
print(solutions)
print('Values:')
print(path_values)
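One observation that can collapse the exponential search: if moves are restricted to strictly smaller values (i.e. ties between equal cells are disallowed), then no cell can ever repeat, the move graph is acyclic, and the longest path from each cell can be memoized so that each cell is solved only once. A sketch under that assumption (the function name and the strict < rule are mine, not from the linked answer):

```python
from functools import lru_cache

def longest_strictly_decreasing_path(grid, start):
    # Memoized DP: since every move goes to a strictly smaller value,
    # each cell's best continuation is independent of how the cell was
    # reached, so it can safely be computed once and cached.
    n = len(grid)

    @lru_cache(maxsize=None)
    def best_from(i, j):
        best = ((i, j),)
        for di, dj in ((-1, 0), (1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n and grid[ni][nj] < grid[i][j]:
                cand = ((i, j),) + best_from(ni, nj)
                if len(cand) > len(best):
                    best = cand
        return best

    return list(best_from(*start))
```

This runs in roughly O(n^2) subproblems for an n x n grid instead of enumerating every route; it only applies if your rules really do forbid moving between equal-valued cells.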
I am trying to generate a D-ary balanced tree in Python using the networkx package.

import networkx as nx

g = nx.Graph()
D = int(input("Enter the number of children of a node: "))
L = int(input("Enter the number of levels: "))

# variable to store the total number of nodes in the tree
tot_node = 0
for i in range(0, L + 1):
    tot_node = tot_node + D**i

for N in range(1, tot_node):
    for j in range(N, N + D):
        g.add_edge(N, j)

nx.draw(g)

For D=2 and L=3, this does not produce the tree I expect.
Can someone please point out the error in this code? I want to construct a balanced tree for any general D (the number of branches of a node).
I have updated the code again to make sure the general cases work. I hope I have not made this more complicated than necessary; I feel like there must be some simpler implementation, maybe one that relies on recursion.
Anyway, I have produced what I think is an acceptable result. Although it is not your code directly, I believe I have implemented something along the lines of the rudimentary solution you want:
import matplotlib.pyplot as plt
import networkx as nx

# We make a Node class to track which node to modify (modify here means add children to).
class Node:
    def __init__(self, node_id, has_children, not_connected):
        self.node_id = node_id
        self.has_children = has_children
        self.not_connected = not_connected

def get_min_not_connected(nodes_tracker):
    smallest = float('inf')
    for node in nodes_tracker:
        #print(f"Is the node {node.node_id} not connected: {node.not_connected}")
        if node.node_id < smallest and node.not_connected:
            smallest = node.node_id
    return smallest - 1

def construction_step(G, node_id, num_children, nodes_tracker):
    #print(f"The range is {len(nodes_tracker)+1} to {len(nodes_tracker)+num_children+1}")
    # Create new Node objects to track which connections have been made.
    # Note how the third parameter (not_connected) is True.
    nodes_tracker = nodes_tracker + [Node(i, False, True) for i in range(len(nodes_tracker) + 1, len(nodes_tracker) + num_children + 1)]
    for i in range(1, num_children + 1):
        print(f'adding edge relation ({node_id}, {get_min_not_connected(nodes_tracker)+i})')
        # Here I am adding the child nodes to the parent ones.
        G.add_edge(node_id, get_min_not_connected(nodes_tracker) + i)
    for i in range(1, num_children + 1):
        #print(get_min_not_connected(nodes_tracker))
        nodes_tracker[get_min_not_connected(nodes_tracker)].not_connected = False
    return nodes_tracker

# Hardcode inputs for your specific example.
# I am using num_children in place of your D variable.
num_children = 3
L = 2

G = nx.Graph()

# Create the central (initial) node and setup.
total_nodes = 0
# the node count is D^0 + D^1 + ... + D^(L-1)
for i in range(0, L):
    total_nodes += num_children**i
print(total_nodes)
nodes_tracker = [Node(1, False, False)]

# Create the actual d-ary graph here.
for i in range(1, total_nodes + 1):
    nodes_tracker = construction_step(G, i, num_children, nodes_tracker)
#print(len(nodes_tracker))

nx.draw(G)
plt.show()
For the output with your parameters D=2, L=3, I got:
To test a more general case, I used D=4, L=2 and I got:
And for fun D=5, L=3:
It works with bigger D and L as well, but the charts naturally look very ugly.
Thanks for your patience with this answer and I hope this helps.
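As an aside, if the goal is simply to obtain a balanced D-ary tree rather than to practice the construction, networkx already ships a generator for exactly this, nx.balanced_tree(r, h), where r is the branching factor and h the height:

```python
import networkx as nx

# A balanced binary tree of height 3: 2^0 + 2^1 + 2^2 + 2^3 = 15 nodes.
G = nx.balanced_tree(2, 3)
print(G.number_of_nodes())  # 15
print(G.number_of_edges())  # 14

# In general a balanced r-ary tree of height h has
# (r**(h + 1) - 1) // (r - 1) nodes.
H = nx.balanced_tree(3, 2)
print(H.number_of_nodes())  # 13
```

Note that its height parameter counts edge levels, so it may be off by one from the L used in the question depending on how you count levels.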
I'm totally surprised that Python's networkx does not support finding the heaviest path between 2 specific nodes.
I have a very big graph (a DAG, ~70K nodes) with a weight attribute on each edge (weight >= 0).
I want to create a function that takes a source and a target and returns the heaviest path between those 2 specific nodes.
I have tried using all_simple_paths together with a get_weight function that takes a path and returns its total weight, as suggested in some solutions.
However, all_simple_paths never finishes on this graph, even though the graph definitely has no cycles (I ran networkx's find_cycle function); this approach worked for very small graphs.
All the suggested solutions I found here and elsewhere return the heaviest path in the whole graph (start to end); DAGs do have such a function (dag_longest_path), but that's not what I need.
Is there any networkx function, or another Python graph library, I can use to get the heaviest path between 2 nodes?
Or any direction to achieve the requirement?
Thanks in advance!
It's just a matter of summing up the weights (or any other numeric edge attribute) of the edges in each path, iterating over all_simple_paths, and taking the path with the maximum total at the end:

import networkx as nx
import random

G = nx.complete_graph(10)

# add a random weight between 0 and 1 to each edge
for src, target, _ in G.edges.data():
    G[src][target]['weight'] = round(random.random(), 2)

def aggregate_weights(G, path):
    """
    Calculate the sum of the edge weights in a path.
    """
    # path[i] and path[i + 1] are the endpoints of the i-th edge;
    # a path of k nodes has k - 1 edges.
    return sum(G[path[i]][path[i + 1]]['weight'] for i in range(len(path) - 1))

def find_heaviest_path(G, source, target):
    """
    Find the heaviest path between the source and target nodes.
    """
    return max(nx.all_simple_paths(G, source, target),
               key=lambda path: aggregate_weights(G, path))
Note: the above algorithm has a high time complexity.
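Since the graph in the question is a DAG, the exponential enumeration can be avoided entirely: process the nodes in topological order and keep, for each node, the heaviest known distance from the source. This yields the heaviest source-to-target path in O(V + E), which should be fine at ~70K nodes. A sketch (the function name heaviest_path_dag is my own):

```python
import networkx as nx

def heaviest_path_dag(G, source, target, weight="weight"):
    # Dynamic programming over a topological order: by the time a node
    # is processed, its best distance from `source` is already final.
    dist = {source: 0}
    parent = {}
    for u in nx.topological_sort(G):
        if u not in dist:
            continue  # u is not reachable from source
        for v, data in G[u].items():
            w = data.get(weight, 1)
            if dist[u] + w > dist.get(v, float("-inf")):
                dist[v] = dist[u] + w
                parent[v] = u
    if target not in dist:
        return None, None  # target is unreachable from source
    # Reconstruct the path by walking predecessors back from target.
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return dist[target], path[::-1]
```

This only works because the graph is acyclic; on a general graph, the heaviest simple path problem is NP-hard, which is why networkx does not offer it directly.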
I am working with a number of directed graphs with no cycles in them, and I need to find all simple paths between any two nodes. In general I wouldn't worry about the execution time, but I have to do this for very many nodes during very many timesteps - I am dealing with a time-based simulation.
I had tried in the past the facilities offered by NetworkX, but in general I found them slower than my approach. Not sure if anything has changed lately.
I have implemented this recursive function:
import timeit

def all_simple_paths(adjlist, start, end, path):
    path = path + [start]
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        if child not in path:
            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)
    return paths

fid = open('digraph.txt', 'rt')
adjlist = eval(fid.read().strip())

number = 1000
stmnt = 'all_simple_paths(adjlist, 166, 180, [])'
setup = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print 'Elapsed: %0.2f ms' % (1000*elapsed)
On my computer, I get an average of 1.5 ms per iteration. I know this is a small number, but I have to do this operation very many times.
In case you're interested, I have uploaded a small file containing the adjacency list here:
adjlist
I am using adjacency lists as inputs, coming from a NetworkX DiGraph representation.
Any suggestion for improvements of the algorithm (i.e., does it have to be recursive?) or other approaches I may try are more than welcome.
Thank you.
Andrea.
You can save time, without changing the algorithm's logic, by caching the results of shared sub-problems.
For example, calling all_simple_paths(adjlist, 'A', 'D', []) in the following graph will compute all_simple_paths(adjlist, 'D', 'E', []) multiple times:
Python has a built-in decorator, lru_cache, for this task. It hashes the parameters to memoize results, so you will need to change adjlist and path to tuples, since lists are not hashable.
import timeit
import functools

@functools.lru_cache(maxsize=None)
def all_simple_paths(adjlist, start, end, path):
    path = path + (start,)
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        if child not in path:
            child_paths = all_simple_paths(tuple(adjlist), child, end, path)
            paths.extend(child_paths)
    return paths

fid = open('digraph.txt', 'rt')
adjlist = eval(fid.read().strip())
# you can also change the data format in the txt file instead
adjlist = tuple(tuple(pair) for pair in adjlist)

number = 1000
stmnt = 'all_simple_paths(adjlist, 166, 180, ())'
setup = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms' % (1000*elapsed))
Running times on my machine:
- original: 0.86 ms
- with cache: 0.01 ms
Note that this method only helps when there are a lot of shared sub-problems.
I am trying to simulate a random traversal through a directed networkx graph. The pseudo code is as follows
Create graph G with nodes holding the value true or false.
// true -> visited, false -> not visited
pick random node N from G
save N.successors as templist
while true
    nooptions = false
    pick random node N from templist
    while N from templist has been visited
        remove N from templist
        pick random node N from templist
        if templist is empty
            nooptions = true
            break
    if nooptions = true
        break
    save N.successors as templist
Is there a more efficient way of marking a path as traveled than creating a temporary list and removing the elements that are marked as visited?
EDIT
The goal of the algorithm is to pick a node at random in the graph. Pick a random successor/child of that node. If it is unvisited, go there and mark it as visited. Repeat until there are either no successors/children or there are no unvisited successors/children
Depending on the size of your graph, you could use the built-in all_pairs_shortest_path function. Your function would then be basically:
import random
import networkx as nx

G = nx.DiGraph()
# <add some stuff to G>

# Get all shortest paths from the graph
# (on newer networkx versions this returns an iterator, so wrap it in dict())
all_paths = dict(nx.all_pairs_shortest_path(G))
# Choose a random source
source = random.choice(list(all_paths))
# Choose a random target that source can access
target = random.choice(list(all_paths[source]))
# The random path is at
random_path = all_paths[source][target]
There doesn't appear to be a way to just generate the random paths starting at source that I saw, but the python code is accessible, and adding that feature would be straightforward I think.
Two other possibilities, which might be faster but a little more complicated/manual, would be to use bfs_successors, which does a breadth-first search, and should only include any target node once in the list. Not 100% sure on the format, so it might not be convenient.
You could also generate bfs_tree, which generates a subgraph with no cycles to all nodes that it can reach. That might actually be simpler, and probably shorter?
# Get a random source from G's nodes
source = random.choice(list(G.nodes))
min_tree = nx.bfs_tree(G, source)
# Accessible nodes are any node in this tree; remove the source itself.
all_accessible = list(min_tree.nodes)
all_accessible.remove(source)
target = random.choice(all_accessible)
random_path = nx.shortest_path(G, source, target)
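The marking question from the original post can also be handled directly with a set of visited nodes, which avoids the temporary-list bookkeeping altogether: membership tests are O(1), and filtering the successors each step replaces the remove-and-retry loop. A minimal sketch (the function name random_traversal is my own):

```python
import random
import networkx as nx

def random_traversal(G, source=None):
    # Walk to a random unvisited successor at each step; a set gives
    # O(1) membership checks, so no temporary list has to be rebuilt.
    if source is None:
        source = random.choice(list(G.nodes))
    visited = {source}
    path = [source]
    current = source
    while True:
        options = [n for n in G.successors(current) if n not in visited]
        if not options:  # no successors, or all already visited
            return path
        current = random.choice(options)
        visited.add(current)
        path.append(current)
```

Unlike the all_pairs_shortest_path approach, this produces a genuinely random walk rather than a shortest path, which matches the pseudocode in the question more closely.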