I'm trying to code a pathfinding algorithm for this problem: you're given a grid of values, and the goal is to find the longest path from the highest point to the lowest point (the least steep one).
I experimented with code I found here [https://stackoverflow.com/questions/68464767/find-a-path-from-grid-with-rules], but it struggles on large grids (e.g. 140x140): since the code checks every single possible route, it takes a really long time. Could I implement some stopping condition that simply abandons a path as soon as it can't be optimal?
As I said earlier, I found code on here that works, but it is just way too slow for me to use. Here's my code with the grid. The code works perfectly for smaller grids.
The code below isn't complete, because the grid is too large to post here. The full code can be found here - https://pastebin.com/q4nKS4rS
def find_paths_recursive(grid, current_path=[(136,136)], solutions=[]):
    n = len(grid)
    dirs = [(-1,0), (1,0), (0,1), (0,-1)]
    last_cell = current_path[-1]
    for x, y in dirs:
        new_i = last_cell[0] + x
        new_j = last_cell[1] + y
        # Check if new cell is in grid
        if new_i < 0 or new_i >= n or new_j < 0 or new_j >= n:
            continue
        # Check if new cell has bigger value than last
        if grid[new_i][new_j] > grid[last_cell[0]][last_cell[1]]:
            continue
        # Check if new cell is already in path
        if (new_i, new_j) in current_path:
            continue
        # Add cell to current path
        current_path_copy = current_path.copy()
        current_path_copy.append((new_i, new_j))
        if new_i == 0 and new_j == 0:
            solutions.append(current_path_copy)
            print(current_path_copy)
        # Create new current_path array for every direction
        find_paths_recursive(grid, current_path_copy, solutions)
    return solutions

def compute_cell_values(grid1, solutions):
    path_values = []
    for solution in solutions:
        solution_values = []
        for cell in solution:
            solution_values.append(grid1[cell[0]][cell[1]])
        path_values.append(solution_values)
    return path_values
grid1 = [...]
solutions = find_paths_recursive(grid1)
path_values = compute_cell_values(grid1, solutions)
print('Solutions:')
print(solutions)
print('Values:')
print(path_values)
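For reference, one way to avoid re-exploring the same subgrids is to memoize the best continuation from each cell instead of enumerating every route. The sketch below is not the original code: it assumes the goal is (0,0), that every step must go to a strictly smaller value (the code above also allows equal values, which a plain cell-keyed cache cannot handle together with the "already in path" check), and that the recursion limit is raised for large grids.

# Minimal memoization sketch (assumptions noted above): longest strictly
# decreasing path from `start` to (0, 0), or None if no such path exists.
import sys
from functools import lru_cache

def longest_decreasing_path(grid, start=(136, 136)):
    n = len(grid)
    sys.setrecursionlimit(max(10000, n * n + 100))  # deep chains are possible on big grids
    dirs = [(-1, 0), (1, 0), (0, 1), (0, -1)]

    @lru_cache(maxsize=None)
    def best_from(i, j):
        if (i, j) == (0, 0):
            return ((i, j),)
        best = None
        for dx, dy in dirs:
            ni, nj = i + dx, j + dy
            # Only strictly downhill moves, so each cell's answer is path-independent
            if 0 <= ni < n and 0 <= nj < n and grid[ni][nj] < grid[i][j]:
                sub = best_from(ni, nj)
                if sub is not None and (best is None or len(sub) + 1 > len(best)):
                    best = ((i, j),) + sub
        return best

    return best_from(*start)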
Problem link: 2096. Step-By-Step Directions From a Binary Tree Node to Another.
I was solving a problem that asks us to output the shortest step-by-step path between a start node and a destination node in a binary tree, where traveling to left and right children are denoted by ‘L’ and ‘R’, and traveling to a parent node is denoted by ‘U’. We are given the reference to the root node, and the number of nodes n can be as large as 100,000.
My solution is as follows:
Run a BFS to find both start and destination nodes, while generating the paths along the way using arrays. Once we find both nodes, return the path arrays.
Now that we have paths to both start and destination nodes, find the lowest common ancestor of the two nodes by throwing out any initial common letters (e.g. if startPath = [‘L’, ‘R’, ‘L’] and destPath = [‘L’, ‘L’, ‘R’], the lowest common ancestor is root.left, and the remaining startPath from this lowest common ancestor is [‘R’, ‘L’].)
Finally, to get the path between startNode and destNode, we can convert all remaining startPath letters to ‘U’s, and then add on the remaining destPath.
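As a small sketch of steps 2 and 3 (the helper name is illustrative, not part of the solution code below): strip the prefix shared with the lowest common ancestor, turn the leftover start-side moves into 'U's, and append the destination side.

def combine(start_path, dest_path):
    # Skip the moves shared with the lowest common ancestor
    i = 0
    while i < len(start_path) and i < len(dest_path) and start_path[i] == dest_path[i]:
        i += 1
    return 'U' * (len(start_path) - i) + ''.join(dest_path[i:])

# Example from above: combine(['L', 'R', 'L'], ['L', 'L', 'R']) == 'UULR'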
The relevant code pertaining to the question is as follows:
def bfs_path2(root, target1, target2):
    q = deque([(root, [])])
    while q and not (found1 and found2):
        node, path = q.popleft()
        if node.left:
            q.append((node.left, path + ['L']))
        if node.right:
            q.append((node.right, path + ['R']))
    return path1, path2
s_path, t_path = bfs_path2(root, startValue, destValue)
# i, j legal indices of s_path and t_path
return "".join(['U']*len(s_path[i:]) + t_path[j:])
The runtime of this against a large test case was >10s. However, if I change the implementation of the BFS queue elements to strings, it runs in ~3s:
def bfs_path2(root, target1, target2):
    q = deque([(root, '')])
    while q and not (found1 and found2):
        node, path = q.popleft()
        if node.left:
            q.append((node.left, path + 'L'))
        if node.right:
            q.append((node.right, path + 'R'))
    return path1, path2
s_path, t_path = bfs_path2(root, startValue, destValue)
# i, j legal indices of s_path and t_path
return 'U'*len(s_path[i:]) + t_path[j:]
There are many links on StackOverflow showing that string concatenation is far slower than the join() method in Python, so I’m confused as to why the first code runtime is much slower than the second code runtime. Am I missing something here?
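For a rough illustration of the constant-factor difference (the sizes here are arbitrary): both path + ['L'] and path + 'L' copy the whole path on every append, but the string copy is a compact character buffer, while the list copy allocates a new list object and copies one object pointer per element.

# Minimal timing sketch comparing the per-step copy cost of the two path
# representations; the path length and repetition count are arbitrary.
import timeit

path_list = ['L'] * 1000
path_str = 'L' * 1000

t_list = timeit.timeit(lambda: path_list + ['R'], number=100000)
t_str = timeit.timeit(lambda: path_str + 'R', number=100000)
print('list copies: %.3fs  string copies: %.3fs' % (t_list, t_str))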
I have a question about the following code, but I guess it applies to other functions as well.
This function computes the maximum path and its length for a DAG, given the graph, a source node, and an end node.
To keep track of already computed distances across recursions I use the "max_distances_and_paths" variable, and update it on each recursion.
1. Is it better to keep it as a function parameter (passed in and returned across recursions), or to use a global variable initialized outside the function?
2. How can I avoid having this parameter returned when calling the function externally (i.e. it has to be passed along across recursions, but I don't care about its value externally)? Is there a better way than doing LongestPath(G, source, end)[0:2]?
Thanks
# for a DAG computes maximum distance and maximum path nodes sequence (ordered in reverse).
# Recursively computes the paths and distances to edges which are adjacent to the end node
# and selects the maximum one
# It will return a single maximum path (and its distance) even if there are different paths
# with same max distance
# Input {Node 1: adj nodes directed to Node 1 ... Node N: adj nodes directed to Node N}
# Example: {'g': ['r'], 'k': ['g', 'r']})
def LongestPath(G, source, end, max_distances_and_paths=None):
    if max_distances_and_paths is None:
        max_distances_and_paths = {}
    max_path = [end]
    distances_list = []
    paths_list = []
    # return max_distance and max_path from source to the current "end" if already computed
    # (i.e. present in the dictionary tracking maximum distances and corresponding paths)
    if end in max_distances_and_paths:
        return max_distances_and_paths[end][0], max_distances_and_paths[end][1], max_distances_and_paths
    # base case, when end node equals source node
    if source == end:
        max_distance = 0
        return max_distance, max_path, max_distances_and_paths
    # if there are no adjacent nodes directed to the end node (and it is not the source node,
    # previous case), the path is disconnected
    if len(G[end]) == 0:
        return 0, [0], {"": []}
    # for each adjacent node pointing to the end node, recursively compute its max distance to
    # the source node and add one to get the distance to the end node. Recursively add the
    # nodes included in the path
    for t in G[end]:
        sub_distance, sub_path, max_distances_and_paths = LongestPath(G, source, t, max_distances_and_paths)
        paths_list += [[end] + sub_path]
        distances_list += [1 + sub_distance]
    # compute max distance
    max_distance = max(distances_list)
    # access the same index where max_distance is, in the list of paths, to retrieve the path
    # corresponding to the max distance
    index = [i for i, x in enumerate(distances_list) if x == max_distance][0]
    max_path = paths_list[index]
    # update the dictionary tracking maximum distances and corresponding paths from the source
    # node to the current end node
    max_distances_and_paths.update({end: [max_distance, max_path]})
    # return the computed max distance, corresponding path, and tracker
    return max_distance, max_path, max_distances_and_paths
Global variables are generally avoided for several reasons (see Why are global variables evil?). I would recommend passing the parameter in this case. However, you could define a larger function housing your recursive function. Here's a quick example I wrote for a factorial:
def a(m):
    def b(m):
        if m < 1:
            return 1
        return m * b(m - 1)
    n = b(m)
    m = m + 2
    return n, m

print(a(6))
This will give (720, 8). It shows that even if you use the same variable name inside your recursive function, the one you passed in to the larger function will not change. In your case, you want to return just n, as per my example. I only returned an edited m value to show that even though both the a and b functions take m as their input, Python keeps them separate.
In general I would say avoid the use of global variables. They make your code harder to read and often more difficult to debug once your codebase gets a bit more complex, so avoiding them is good practice.
I would use a helper function to initialise your recursion.
def longest_path_helper(G, source, end):
    # Kick off the recursion with a fresh tracker and hand back only the results the caller cares about.
    max_distance, max_path, _ = LongestPath(G, source, end)
    return max_distance, max_path
On a side note, the Python convention is to write function names in lowercase with underscores, while CapitalizedWords without underscores are used for classes. So it would be more Pythonic to use def longest_path():
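Putting both suggestions together, here is a sketch (function and variable names are illustrative, and the disconnected case is simplified) of the nested-function pattern applied to the DAG problem, assuming the same input format as above (a dict mapping each node to the nodes with edges directed into it). The tracker lives in the enclosing scope, so it is neither a global nor part of the public signature or return value.

def longest_path(G, source, end):
    # Tracker shared by every recursive call, but invisible to the caller
    memo = {}

    def _solve(node):
        if node in memo:
            return memo[node]
        if node == source:
            return 0, [node]
        best_dist, best_path = float('-inf'), []   # stays -inf if node is unreachable
        for t in G.get(node, []):
            d, p = _solve(t)
            if 1 + d > best_dist:
                best_dist, best_path = 1 + d, [node] + p
        memo[node] = (best_dist, best_path)
        return memo[node]

    return _solve(end)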
I am trying to generate all paths with at most 6 nodes from every origin to every destination in a fairly large network (20,000+ arcs). I am using networkx and Python 2.7. For small networks it works well, but I need to run this for the whole network. I was wondering if there is a more efficient way to do this in Python. My code contains a recursive function (see below). I am thinking about keeping some of the paths in memory so that I don't create them again for other paths, but I am not sure how I can accomplish that quickly. Right now it can't finish even within a few days; 3-4 hours would be fine for my project.
Here is a sample that I created. Feel free to ignore the print functions, as I added them for illustration purposes. Also, here is the sample input file.
import networkx as nx
import pandas as pd
import copy
import os
class ODPath(object):
    def __init__(self, pathID='', transittime=0, path=[], vol=0, OD='', air=False, sortstrat=[], arctransit=[]):
        self.pathID = pathID
        self.transittime = transittime
        self.path = path
        self.vol = vol
        self.OD = OD
        self.air = air
        self.sortstrat = sortstrat  # a list of sort strategies
        self.arctransit = arctransit  # keep the transit time of each arc as a list
    def setpath(self, newpath):
        self.path = newpath
    def setarctransitpath(self, newarctransit):
        self.arctransit = newarctransit
    def settranstime(self, newtranstime):
        self.transittime = newtranstime
    def setpathID(self, newID):
        self.pathID = newID
    def setvol(self, newvol):
        self.vol = newvol
    def setOD(self, newOD):
        self.OD = newOD
    def setAIR(self, newairTF):
        self.air = newairTF
    def setsortstrat(self, newsortstrat):
        self.sortstrat = newsortstrat

def find_allpaths(graph, start, end, pathx=ODPath(None, 0, [], 0, None, False)):
    path = copy.deepcopy(pathx)  # to make sure the original has not been revised
    newpath = path.path + [start]
    path.setpath(newpath)
    if len(path.path) > 6:
        return []
    if start == end:
        return [path]
    if start not in graph:  # check if node:start exists in the graph
        return []
    paths = []
    for node in graph[start]:  # loop over all outbound nodes of starting point
        if node not in path.path:  # makes sure there are no cycles
            newpaths = find_allpaths(graph, node, end, path)
            for newpath in newpaths:
                if len(newpath.path) < 7:  # generate only the paths that are with at most 6 hops
                    paths.append(newpath)
    return paths

def printallpaths(path_temp):
    map(printapath, path_temp)

def printapath(path):
    print path.path
filename='transit_sample1.csv'
df_t= pd.read_csv(filename,delimiter=',')
df_t = df_t.reset_index()
G=nx.from_pandas_dataframe(df_t, 'node1', 'node2', ['Transit Time'],nx.DiGraph())
allpaths=find_allpaths(G,'A','S')
printallpaths(allpaths)
I would really appreciate any help.
I actually asked this question previously, about optimizing an algorithm I had written using networkx. Essentially, what you'll want to do is move away from a recursive function and towards a solution that uses memoization, like I did.
From here you can do further optimizations like using multiple cores, or picking the next node to traverse based on criteria such as degree.
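If moving off the custom recursion is an option, here is a sketch using networkx's built-in all_simple_paths generator (cutoff caps the path length in edges, so "at most 6 nodes" means cutoff=5). It streams paths lazily instead of deep-copying ODPath objects, though be aware that enumerating every origin-destination pair on a 20,000+ arc network can still be prohibitively large no matter how the enumeration is written.

# Hedged sketch: lazily enumerate simple paths of at most 6 nodes (5 edges).
import networkx as nx

def paths_up_to_six_nodes(G, source, target):
    # all_simple_paths yields each path as a list of nodes, without cycles.
    return nx.all_simple_paths(G, source, target, cutoff=5)

# Usage (node names taken from the sample above):
# for p in paths_up_to_six_nodes(G, 'A', 'S'):
#     print(p)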
I'm trying to write an optimization process based on Dijkstra's algorithm to find the optimal path, with a slight variation: items from the same group/family may not be chosen more than once along a path.
Brute-force traversal of all edges to find the solution blows up combinatorially, which is why I am attempting to (hopefully) use Dijkstra's algorithm, but I'm struggling to add the no-repeat-groups logic.
Think of it like a traveling-salesman-style problem, except I want to travel from New York to Los Angeles, have an interesting route (never visiting 2 similar cities from the same group), and minimize my fuel costs. There are approximately 15 days and 40 cities, but to define my program I've pared it down to 4 cities and 3 days.
Valid paths don't have to visit every group; they just can't visit 2 cities in the same group. {XL,L,S} is a valid solution, but {XL,L,XL} is not, because it visits the XL group twice. All valid solutions will be the same length (15 days, or edges) but can use any combination of groups (without duplicating groups) and need not use them all (since there are 15 days but 40 different city groups).
Here's a picture I put together to illustrate a valid & invalid route: (FYI - groups are horizontal rows in the matrix)
**Day 1**
G1->G2 # $10
G3->G4 # $30
etc...
**Day 2**
G1->G3 # $50
G2->G4 # $10
etc...
**Day 3**
G1->G4 # $30
G2->G3 # $50
etc...
The optimal path would be G1->G2->G3; however, a standard Dijkstra solution returns G1-
I found and tweaked this example code online, and I name my nodes with the syntax D[day#][group#] so I can quickly check which day and group a node belongs to by slicing out the third character.
## Based on code found here: https://raw.githubusercontent.com/nvictus/priority-queue-dictionary/0eea25fa0b0981558aa780ec5b74649af83f441a/examples/dijkstra.py
import pqdict

def dijkstra(graph, source, target=None):
    """
    Computes the shortest paths from a source vertex to every other vertex in
    a graph
    """
    # The entire main loop is O( (m+n) log n ), where n is the number of
    # vertices and m is the number of edges. If the graph is connected
    # (i.e. the graph is in one piece), m normally dominates over n, making the
    # algorithm O(m log n) overall.
    dist = {}
    pred = {}
    predGroups = {}
    # Store distance scores in a priority queue dictionary
    pq = pqdict.PQDict()
    for node in graph:
        if node == source:
            pq[node] = 0
        else:
            pq[node] = float('inf')
    # Remove the head node of the "frontier" edge from pqdict: O(log n).
    for node, min_dist in pq.iteritems():
        # Each node in the graph gets processed just once.
        # Overall this is O(n log n).
        dist[node] = min_dist
        if node == target:
            break
        # Updating the score of any edge's node is O(log n) using pqdict.
        # There is _at most_ one score update for each _edge_ in the graph.
        # Overall this is O(m log n).
        for neighbor in graph[node]:
            if neighbor in pq:
                new_score = dist[node] + graph[node][neighbor]
                # This is my attempt at tracking whether we've already used a node in this group/family.
                # The group designator is stored as the third character (index 2) of the node name for quick access.
                try:
                    groupToAdd = node[2]
                    alreadyVisited = predGroups.get(groupToAdd, False)
                except:
                    alreadyVisited = False
                    groupToAdd = 'S'
                # Solves OK with this line
                # Errors out with this version instead:
                # if new_score < pq[neighbor] and not(alreadyVisited):
                if new_score < pq[neighbor]:
                    pq[neighbor] = new_score
                    pred[neighbor] = node
                    # Store this node's group in the "visited" dict to prevent future duplication
                    predGroups[groupToAdd] = groupToAdd
                    print predGroups
                    # print node[2]
    return dist, pred
def shortest_path(graph, source, target):
    dist, pred = dijkstra(graph, source, target)
    end = target
    path = [end]
    while end != source:
        end = pred[end]
        path.append(end)
    path.reverse()
    return path

if __name__ == '__main__':
    # A simple edge-labeled graph using a dict of dicts
    graph = {'START': {'D11': 1, 'D12': 50, 'D13': 3, 'D14': 50},
             'D11': {'D21': 5},
             'D12': {'D22': 1},
             'D13': {'D23': 50},
             'D14': {'D24': 50},
             'D21': {'D31': 3},
             'D22': {'D32': 5},
             'D23': {'D33': 50},
             'D24': {'D34': 50},
             'D31': {'END': 3},
             'D32': {'END': 5},
             'D33': {'END': 50},
             'D34': {'END': 50},
             'END': {'END': 0}}

    dist, path = dijkstra(graph, source='START')
    print dist
    print path
    print shortest_path(graph, 'START', 'END')
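One standard way to express the no-repeat-groups rule is to run Dijkstra over augmented states (node, set of groups used so far), so the relaxation step never mixes histories the way a single predGroups dict does. The following is a sketch, not a drop-in fix for the code above: it assumes the same dict-of-dicts graph and the D[day][group] naming, uses only the standard-library heapq, and the number of states can still grow quickly on the full 15-day / 40-group instance.

# Hedged sketch: Dijkstra over (node, frozenset-of-groups) states.
import heapq
import itertools

def dijkstra_no_repeat_groups(graph, source, target):
    def group_of(node):
        # Nodes are named D<day><group>; START/END carry no group.
        return node[2] if node.startswith('D') else None

    counter = itertools.count()          # tie-breaker so the heap never compares sets or paths
    heap = [(0, next(counter), source, frozenset(), [source])]
    best = {(source, frozenset()): 0}
    while heap:
        cost, _, node, groups, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if cost > best.get((node, groups), float('inf')):
            continue                     # stale queue entry
        for nbr, weight in graph[node].items():
            g = group_of(nbr)
            if g is not None and g in groups:
                continue                 # would visit the same group twice
            new_groups = groups | {g} if g is not None else groups
            new_cost = cost + weight
            state = (nbr, new_groups)
            if new_cost < best.get(state, float('inf')):
                best[state] = new_cost
                heapq.heappush(heap, (new_cost, next(counter), nbr, new_groups, path + [nbr]))
    return float('inf'), []

# Usage with the graph above: dijkstra_no_repeat_groups(graph, 'START', 'END')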
I am trying to simulate a random traversal through a directed networkx graph. The pseudo code is as follows
Create graph G with nodes holding the value true or false.
// true -> visited, false -> not visited
pick random node N from G
save N.successors as templist
while true
    nooptions = false
    pick random node N from templist
    while N from templist has been visited
        remove N from templist
        pick random node N from templist
        if templist is empty
            nooptions = true
            break
    if nooptions = true
        break
    save N.successors as templist
Is there a more efficient way of marking a path as traveled other than creating a temporary list and removing the elements if they are marked as visited?
EDIT
The goal of the algorithm is to pick a node at random in the graph. Pick a random successor/child of that node. If it is unvisited, go there and mark it as visited. Repeat until there are either no successors/children or there are no unvisited successors/children
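Here is a sketch of that walk using a single visited set (it assumes a networkx DiGraph named G, and filters successors on the fly, so nothing ever has to be removed from a temporary list):

import random
import networkx as nx

def random_traversal(G):
    current = random.choice(list(G.nodes()))
    visited = {current}
    path = [current]
    while True:
        options = [n for n in G.successors(current) if n not in visited]
        if not options:              # no unvisited successors: stop
            return path
        current = random.choice(options)
        visited.add(current)
        path.append(current)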
Depending on the size of your graph, you could use the built-in all_pairs_shortest_path function. Your function would then be basically:
import random
import networkx as nx

G = nx.DiGraph()
# <add some stuff to G>
# Get a random path from the graph
all_paths = nx.all_pairs_shortest_path(G)
# Choose a random source
source = random.choice(all_paths.keys())
# Choose a random target that source can access
target = random.choice(all_paths[source].keys())
# Random path is at
random_path = all_paths[source][target]
There doesn't appear to be a way to just generate the random paths starting at source that I saw, but the python code is accessible, and adding that feature would be straightforward I think.
Two other possibilities, which might be faster but a little more complicated/manual, would be to use bfs_successors, which does a breadth-first search, and should only include any target node once in the list. Not 100% sure on the format, so it might not be convenient.
You could also generate bfs_tree, which generates a subgraph with no cycles to all nodes that it can reach. That might actually be simpler, and probably shorter?
# Get a random source from G's nodes
source = random.choice(G.nodes())
min_tree = nx.bfs_tree(G, source)
# Accessible nodes are any node in this tree, except the source itself
all_accessible = min_tree.nodes()
all_accessible.remove(source)
target = random.choice(all_accessible)
random_path = nx.shortest_path(G, source, target)