Maximum recursion depth error - Python

I've been scouring the internet and haven't been able to find anything that would help. I'm running a basic DFS algorithm in Python, and the error is in the explore subroutine of dfs:
def dfs(graph):
    for node in graph:
        if not node in visited:
            explore(graph, node)

def explore(graph, v):
    visited.append(v)
    adjNode = graph[v]
    for i in range(0, len(adjNode)):
        if not adjNode[i] in visited:
            explore(graph, adjNode[i])
visited is a list I'm using to keep track of visited nodes, and graph is a dictionary that holds the graph.
With the standard recursion limit of 1000, I get this error:
File "2breakDistance.py", line 45, in explore
explore(graph, adjNode[i], cycles)
File "2breakDistance.py", line 45, in explore
explore(graph, adjNode[i], cycles)
File "2breakDistance.py", line 45, in explore
explore(graph, adjNode[i], cycles)
File "2breakDistance.py", line 45, in explore
explore(graph, adjNode[i], cycles)
File "2breakDistance.py", line 41, in explore
adjNode = graph[v]
RuntimeError: maximum recursion depth exceeded in cmp
First of all, I'm not quite sure why the error is occurring at adjNode = graph[v], since explore is the recursive call and adjNode is just a list assignment.
But to deal with the recursion error, I increased the recursion limit with sys.setrecursionlimit(5000). I no longer get the error, but the program quits right before the adjNode = graph[v] line and exits with no error. It never even reaches the end of dfs, so I'm not quite sure what's going on. Thanks for reading all this and for any help!

Python is not very good at recursion. It doesn't do any tail-call optimization, and it runs out of frame space pretty quickly unless you manually raise the recursion limit. It's also slower to look up and call a function repeatedly than to keep the code in a loop.
Try rewriting this without recursion instead. The error can be raised anywhere a new frame is created, not just at your recursive call, which is why the traceback ends at the adjNode = graph[v] line.
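
For example, here is a minimal iterative sketch of the same traversal, using an explicit stack in place of the call stack (it keeps your global visited list, though a set would make the membership tests much faster):

def explore(graph, v):
    stack = [v]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # Pushing unvisited neighbours replaces the recursive calls.
        for neighbor in graph[node]:
            if neighbor not in visited:
                stack.append(neighbor)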

def explore(graph, v):
    visited.append(v)
    adjNode = graph[v]
    for i in range(0, len(adjNode)):
        if not adjNode[i] in visited:
            explore(graph, adjNode[i])
This isn't making sense to me: what is in a node object? Why are you assigning adjNode to the value of the node that you pass in? Is adjNode meant to be calling something like a "GetConnections()" function instead?
The logic currently feels like it should be this:
1. For each Node in Nodes:
2.     Add Node to visited
3.     Get Node's Connections
4.     Explore SubNodes: go to 2


Maximum recursion depth exceeded. Multiprocessing and bs4

I'm trying to build a parser using BeautifulSoup and multiprocessing. I get this error:
RecursionError: maximum recursion depth exceeded
My code is:
import bs4, requests, time
from multiprocessing.pool import Pool

html = requests.get('https://www.avito.ru/moskva/avtomobili/bmw/x6?sgtd=5&radius=0')
soup = bs4.BeautifulSoup(html.text, "html.parser")
divList = soup.find_all("div", {'class': 'item_table-header'})

def new_check():
    with Pool() as pool:
        pool.map(get_info, divList)

def get_info(each):
    pass

if __name__ == '__main__':
    new_check()
Why do I get this error, and how can I fix it?
UPDATE:
The full text of the error is:
Traceback (most recent call last):
  File "C:/Users/eugen/PycharmProjects/avito/main.py", line 73, in <module>
    new_check()
  File "C:/Users/eugen/PycharmProjects/avito/main.py", line 67, in new_check
    pool.map(get_info, divList)
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 424, in _handle_tasks
    put(task)
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
RecursionError: maximum recursion depth exceeded
When you use multiprocessing, everything you pass to a worker has to be pickled.
Unfortunately, many BeautifulSoup trees can't be pickled.
There are a few different reasons for this. Some of them are bugs that have since been fixed, so you could try making sure you have the latest bs4 version, and some are specific to different parsers or tree builders… but there's a good chance nothing like this will help.
But the fundamental problem is that many elements in the tree contain references to the rest of the tree.
Occasionally, this leads to an actual infinite loop, because the circular references are too indirect for pickle's circular-reference detection. But that's usually a bug that gets fixed.
But, even more importantly, even when the loop isn't infinite, it can still drag in more than 1000 elements from all over the rest of the tree, and that's already enough to cause a RecursionError.
And I think the latter is what's happening here. If I take your code and try to pickle divList[0], it fails. (If I bump the recursion limit way up and count the frames, it needs a depth of 23080, which is way, way past the default of 1000.) But if I take that exact same div and parse it separately, it succeeds with no problem.
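(If you want to reproduce that experiment, a quick sketch along these lines does it, using the divList from your code:)

import pickle

# Pickling a div straight out of the full tree blows past the recursion limit...
try:
    pickle.dumps(divList[0])
except RecursionError as e:
    print('pickling from the full tree failed:', e)

# ...but the same markup, re-parsed on its own, pickles without trouble.
standalone = bs4.BeautifulSoup(str(divList[0]), 'html.parser')
print(len(pickle.dumps(standalone)), 'bytes')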
So, one possibility is to just do sys.setrecursionlimit(25000). That will solve the problem for this exact page, but a slightly different page might need even more than that. (Plus, it's usually not a great idea to set the recursion limit that high—not so much because of the wasted memory, but because it means actual infinite recursion takes 25x as long, and 25x as much wasted resources, to detect.)
Another trick is to write code that "prunes the tree", eliminating any upward links from the div before/as you pickle it. This is a great solution, except that it might be a lot of work, and requires diving into the internals of how BeautifulSoup works, which I doubt you want to do.
The easiest workaround is a bit clunky, but… you can convert the soup to a string, pass that to the child, and have the child re-parse it:
def new_check():
    divTexts = [str(div) for div in divList]
    with Pool() as pool:
        pool.map(get_info, divTexts)

def get_info(each):
    div = bs4.BeautifulSoup(each, 'html.parser')

if __name__ == '__main__':
    new_check()
The performance cost for doing this is probably not going to matter; the bigger worry is that if you had imperfect HTML, converting to a string and re-parsing it might not be a perfect round trip. So, I'd suggest that you do some tests without multiprocessing first to make sure this doesn't affect the results.
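
For example, a quick sanity check like this (a sketch, run without the Pool) would flag any div whose markup does not survive the round trip:

for div in divList:
    reparsed = bs4.BeautifulSoup(str(div), 'html.parser')
    if str(reparsed) != str(div):
        print('round trip changed this div:', str(div)[:60])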

Python recursion stopped unexpectedly

Briefly speaking, the recursion stops halfway through, even though everything else is fine.
The recursive function is shown below (the entire code can be found here):
def DFS(graph, startNode = 0):
    global nodesProcessed; global explored; global finishingTime
    explored[startNode] = True
    currentLeader = startNode
    if startNode in graph:
        for neighbor in graph[startNode]:
            if not explored[neighbor]:
                # checkpoint 1
                DFS(graph, neighbor)
                # checkpoint 2
    else:
        return currentLeader
    nodesProcessed += 1
    finishingTime[startNode] = nodesProcessed
    return currentLeader
The problem is that after a while of recursing, it just stops. The things that confuse me are:

1. The input is fixed, but where it stops is not; however, it always stops at around 7,000 invocations.
2. Every failed call reaches checkpoint 1 but never reaches checkpoint 2; the recursive call simply does not execute.
3. It never hits the maximum recursion depth at all; I've raised the limit with sys.setrecursionlimit(10**6).
4. It runs fine on relatively small inputs (hundreds or thousands of nodes) but gets stuck on a large graph with more than 800,000 nodes.

All of which is driving me crazy. I can't see a reason why it doesn't work: no error, no stack overflow, it just stops, saying Press any key to continue ... as if it had finished. Does anyone have a clue about what could possibly go wrong?
As the documentation specifies:
The highest possible limit is platform-dependent. A user may need to
set the limit higher when they have a program that requires deep
recursion and a platform that supports a higher limit. This should be
done with care, because a too-high limit can lead to a crash.
There is a script to check that limit.
You will have to implement a non-recursive DFS.
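
A sketch of what that could look like for the DFS above, assuming the same globals: an explicit stack with a done flag per node stands in for the call stack, so each node's finishing time is still assigned only after all of its neighbours have been processed:

def DFS(graph, startNode=0):
    global nodesProcessed, explored, finishingTime
    stack = [(startNode, False)]
    while stack:
        node, childrenDone = stack.pop()
        if childrenDone:
            # All neighbours handled: assign the finishing time now.
            nodesProcessed += 1
            finishingTime[node] = nodesProcessed
            continue
        if explored[node]:
            continue
        explored[node] = True
        stack.append((node, True))  # revisit this node after its children
        for neighbor in graph.get(node, []):
            if not explored[neighbor]:
                stack.append((neighbor, False))
    return startNode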

Making code that is both iterative and recursive purely iterative

I have huge lists (>1,000,000 elements in most cases) that are partitioned into n x n grids. These lists contain coordinates. I define a neighbouring relation between points: if they are within range of each other, they are 'neighbours' and are put into a 'cluster'. Clusters grow transitively: say cell A is a neighbour of cell B, and B a neighbour of C, but C is not a neighbour of A; A, B, and C would still be in the same cluster.
Given that background, I have the following code that tries to assign points to clusters in Python 3.6:
for i in range(n):
    for j in range(n):
        tile = image[i][j]
        while tile:
            cell = tile.pop()
            cluster = create_and_return_cluster(cell, image_clusters[i][j], (i, j))
            clusterise(cell, cluster, (i, j), image, n, windows, image_clusters)

def clusterise(cell, cluster, tile_index, image, n, windows, image_clusters):
    neighbouring_windows, neighbouring_indices = get_neighbouring_windows(tile_index[0], tile_index[1], n, windows)
    neighbours = get_and_remove_all_neighbours(cell, image, tile_index, neighbouring_windows, neighbouring_indices)
    if neighbours:
        for neighbour, (n_i, n_j) in neighbours:
            add_to_current_cluster(cluster, image_clusters, neighbour, (n_j, n_j))
            clusterise(neighbour, cluster, (n_i, n_j), image, n, windows, image_clusters)
Because of the massive size of the lists, I've had issues with RecursionError and have been scouring the internet for suggestions on tail-recursion. The problem is that this algorithm needs to branch from neighbours to pick up neighbours of those neighbours, and so on. As you can imagine, this gets pretty big pretty quickly in terms of stack frames.
My question is: is it possible to make this algorithm use tail-recursion, or how would one go about making it tail-recursive? I know the cluster argument is essentially an accumulator in this case, but given how the lists shrink and the nasty for-loop in clusterise() I am not sure how to successfully convert to tail-recursion. Does anyone have any ideas? Ideally supported with an explanation.
NB: I am well aware that Python does not optimise tail-recursion by default, yes I am aware that other languages optimise tail-recursion. My question is whether it can be done in this case using Python. I don't want to change if I don't absolutely have to and much of my other code is already in Python.
Just use a queue or stack to track which neighbours to process next; the following function does exactly the same work as your recursive function, iteratively:
from collections import deque

def clusterise(cell, cluster, tile_index, image, n, windows, image_clusters):
    to_process = deque([(cell, tile_index)])
    while to_process:
        cell, tile_index = to_process.pop()
        neighbouring_windows, neighbouring_indices = get_neighbouring_windows(tile_index[0], tile_index[1], n, windows)
        neighbours = get_and_remove_all_neighbours(cell, image, tile_index, neighbouring_windows, neighbouring_indices)
        if neighbours:
            for neighbour, (n_i, n_j) in neighbours:
                add_to_current_cluster(cluster, image_clusters, neighbour, (n_j, n_j))
                to_process.append((neighbour, (n_i, n_j)))
So instead of using the Python stack to track what still needs to be processed, we move the varying arguments (cell and tile_index) to a deque stack managed by the function, which isn't bounded like the Python stack is. You can also use it as a queue (pop from the beginning instead of the end with to_process.popleft()) for a breadth-first processing order. Note that your recursive solution processes cells depth-first.
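For example, switching the processing order to breadth-first is a one-line change inside the loop:

cell, tile_index = to_process.popleft()  # FIFO: take the oldest entry first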
As a side note: yes, you can use a regular Python list as a stack too, but due to the way a list object is grown and shrunk dynamically, a deque is more efficient for this kind of use, and it's easier to switch between a stack and a queue this way. See Raymond Hettinger's remarks on deque performance.

queue.get() doesn't remove item from queue

I recently started programming in Python (3.5) and I'm trying to solve a simple breadth-first search problem in Python (see code):
import queue
import networkx as nx

def bfs(graph, start, target):
    frontier = queue.Queue()
    frontier.put(start)
    explored = list()
    while not frontier.empty():
        state = frontier.get()
        explored.append(state)
        print(explored)
        if state == target:
            return 'success'
        print(graph.neighbors(state))
        for neighbor in graph.neighbors(state):
            if neighbor not in explored:
                frontier.put(state)
    return 'Failure to find path'
The code ends up in an infinite loop; it seems that frontier.get() does not delete the item from the queue. This makes the while loop infinite, as the first value in the queue is always the start node defined in the function input, and the variable state is the same (the start node) on every pass through the loop.
What am I doing wrong? From what I understood, the queue should move from the start node to the neighbours of the start node, so a loop should not occur.
Two things. First, I assume everything from the while on down ought to be indented by one level.
If I'm reading your algorithm correctly, I believe the error is on the last line before the return. You have:
frontier.put(state)
which just inserts the node you were already looking at. I think what you should be doing instead is:
frontier.put(neighbor)
so that you explore all the immediate neighbors of state. Otherwise you just keep looking at the start node over and over.
Because you're putting the state value in the queue again. Change this:
for neighbor in graph.neighbors(state):
    if neighbor not in explored:
        frontier.put(state) # Here you put the 'state' back!
to this:
for neighbor in graph.neighbors(state):
    if neighbor not in explored:
        frontier.put(neighbor) # Put in the neighbours instead.
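
Putting both answers together, the fixed function would look something like this sketch:

def bfs(graph, start, target):
    frontier = queue.Queue()
    frontier.put(start)
    explored = list()
    while not frontier.empty():
        state = frontier.get()
        explored.append(state)
        if state == target:
            return 'success'
        for neighbor in graph.neighbors(state):
            if neighbor not in explored:
                frontier.put(neighbor)  # enqueue the neighbour, not state
    return 'Failure to find path'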

How to try all possible paths?

I need to try all possible paths, branching every time I hit a certain point. There are <128 possible paths for this problem, so no need to worry about exponential scaling.
I have a player that can take steps through a field. The player takes a step, and on a step there could be an encounter.
There are two options when an encounter is found: i) Input 'B' or ii) Input 'G'.
I would like to try both and continue repeating this until the end of the field is reached. The end goal is to have tried all possibilities.
Here is the template, in Python, for what I am talking about (the Step object returns the next step using next()):
from row_maker_inlined import Step

def main():
    initial_stats = {'n':1,'step':250,'o':13,'i':113,'dng':0,'inp':'Empty'}
    player = Step(initial_stats)
    end_of_field = 128
    # Walk until reaching an encounter:
    while player.step['n'] < end_of_field:
        player.next()
        if player.step['enc']:
            print 'An encounter has been reached.'
            # Perform an input on an encounter step:
            player.input = 'B'
            # Make a branch of player?
            # Perform this on the branch:
            # player.input = 'G'
            # Keep doing this, and branching on each encounter, until the end is reached.
As you can see, the problem is rather simple; I just have no idea, as a beginner programmer, how to solve it.
I believe I may need to use recursion in order to keep branching, but I really don't understand how one 'makes a branch' using recursion, or anything else.
What kind of solution should I be looking at?
You should be looking at search algorithms like breadth-first search (BFS) and depth-first search (DFS).
Wikipedia has this pseudo-code implementation of BFS:
procedure BFS(G, v) is
    let Q be a queue
    Q.enqueue(v)
    label v as discovered
    while Q is not empty
        v ← Q.dequeue()
        for all edges from v to w in G.adjacentEdges(v) do
            if w is not labeled as discovered
                Q.enqueue(w)
                label w as discovered
Essentially, when you reach an "encounter" you want to add this point to your queue at the end. Then you pick your FIRST element off of the queue and explore it, putting all its children into the queue, and so on. It's a non-recursive solution that is simple enough to do what you want.
DFS is similar, but instead of picking the FIRST element from the queue, you pick the LAST. This makes it so that you explore a single path all the way to a dead end before coming back to explore another.
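In Python, a stack of copied player states gives you the same branching without recursion. Here is a rough sketch under two assumptions that go beyond your template: explore_all is a made-up helper name, and your Step object must support copy.deepcopy so each branch gets its own independent state:

import copy

def explore_all(initial_stats, end_of_field=128):
    # Each entry is a player state that still needs to be walked to the end.
    to_explore = [Step(initial_stats)]
    finished = []
    while to_explore:
        player = to_explore.pop()
        while player.step['n'] < end_of_field:
            player.next()
            if player.step['enc']:
                branch = copy.deepcopy(player)  # assumes Step is deep-copyable
                branch.input = 'G'              # the branch tries 'G'...
                to_explore.append(branch)
                player.input = 'B'              # ...this path continues with 'B'
        finished.append(player)
    return finished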
Good luck!
