I'm pretty bad with recursion as it is, but this algorithm naturally seems like it's best done recursively. Basically, I have a list of all the function calls made in a C program, across multiple files. This list is unordered. My recursive algorithm attempts to build a tree of all the functions called, starting from the main function.
This works perfectly fine for smaller programs, but when I tried it out with larger ones I get a recursion error. I read that the issue might be that I'm exceeding the C stack limit, since I already tried raising the recursion limit in Python.
Would appreciate some help here, thanks.
functions = a set containing all the function calls and their info, each of type Function. The data in a node is of type Function.
@dataclass
class Function:
    name: str
    file: str
    id: int
    calls: set
    ....
Here's the algorithm.
def order_functions(node, functions, defines):
    calls = set()
    # Checking if the called function is user-defined
    for call in node.data.calls:
        if call in defines:
            calls.add(call)
    node.data.calls = calls
    if len(calls) == 0:
        return node
    for call in node.data.calls:
        child = Node(next((f for f in functions if f.name == call), None))
        node.add_child(child)
        Parser.order_functions(child, functions, defines)
    return node
If you exceed the predefined limit on the call-stack size, the best option is probably to rewrite your program iteratively. If you have no idea how deep your recursion will go, don't use recursion.
More information here, and maybe if you need to implement an iterative version you can get inspiration from this post.
The main point here is that Python doesn't perform any tail-call elimination, so recursive functions will eventually fail on inputs with an unknown/unbounded hierarchical depth.
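For this particular question, an iterative rewrite could look roughly like the following sketch. The Node and Function stand-ins below are simplified assumptions based on the snippets in the question, not the asker's real classes:

```python
from dataclasses import dataclass

@dataclass
class Function:
    name: str
    calls: set

class Node:
    def __init__(self, data):
        self.data = data
        self.children = []

    def add_child(self, child):
        self.children.append(child)

def order_functions_iterative(root, functions, defines):
    # An explicit stack replaces the call stack, so the depth is
    # limited only by heap memory, not by sys.setrecursionlimit().
    stack = [root]
    while stack:
        node = stack.pop()
        # Keep only calls to user-defined functions
        node.data.calls = {c for c in node.data.calls if c in defines}
        for call in node.data.calls:
            child = Node(next((f for f in functions if f.name == call), None))
            node.add_child(child)
            stack.append(child)
    return root
```

Like the original, this would still loop forever on recursive call chains in the analyzed C code, so a visited-set may be needed for real inputs.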
Related
Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions it may call need to operate on a sub-array of the DataFrame. I have written multiple selector functions. However, at present each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important since the end code is aimed at non-specialists (possibly with no Python/programming knowledge), so it must be "batteries-included". This entire project is written in a functional style (that's what's always worked for me).
Sample Code
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform3(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform4(arg1, arg2),
)
Expected API
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),
    transform3(arg1, arg2),
    transform4(arg1, arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it would also become available (something similar to a closure) to all the functions it calls. This seems to happen with some toy examples, but it won't work in this case (it raises an UnboundLocalError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable states in the past, and functional programming has worked quite well. Hence I've set a goal for myself to NOT use classes (to the extent that Python enables me to anyways). So no classes.
I should probably have clarified this initially: in the pipeline, the output of every function is a DataFrame, and the input of every function (except load_data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial, because we want the user to supply the args to each function but not execute it. The actual execution is done by a for-loop in make_pipeline.
Each function accepts df: pandas.DataFrame plus all arguments that are specific to that function. The statement seen above, transform1(arg1, arg2, ...), actually calls the decorated transform1, which returns functools.partial(transform1, arg1, arg2, ...), which now has a signature like transform1(df: pandas.DataFrame).
load_dataframe is just a convenience function to load the initial DataFrame so that all other functions can begin operating on it. It just felt more intuitive to users to have it as part of the chain rather than a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument. If you're wondering why, it's because I feel end users will find it unintuitive and arbitrary that some functions need it and some don't. I'm also pretty certain they will make all kinds of errors, like passing different filters or forgetting it sometimes.
(Update) I've also tried inspect.signature() in make_pipeline to check whether each function accepts a data_filter argument and pass it on. However, this raises an incorrect-function-signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function signature, this would solve the issue, but I couldn't find much info in the docs.
Turns out it was pretty easy. The solution is inspect.signature.
def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d
Leaving this here mostly for reference, because I think this is a mini design pattern. I've also seen function.__closure__ mentioned on an unrelated subject; that may also work, but would likely be more complicated.
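For reference, inspect.signature() does resolve through functools.partial objects, which is why the parameters check above can work even though the pipeline functions are partially applied. A toy sketch (the transform names are made up, not from the real pipeline):

```python
import functools
import inspect

def transform_plain(df):
    return df

def transform_filtered(df, data_filter=None):
    return data_filter(df) if data_filter else df

# Partially applied, as the decorated pipeline functions would be
step_plain = functools.partial(transform_plain)
step_filtered = functools.partial(transform_filtered)

# signature() resolves through the partial to the wrapped function,
# so the .parameters check from the answer works on both:
print("data_filter" in inspect.signature(step_plain).parameters)     # False
print("data_filter" in inspect.signature(step_filtered).parameters)  # True
```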
Lately I've been working on some recursive problems in Python where I have to generate a list of possible configurations (e.g. a list of permutations of a given string, a list of substrings, etc.) using recursion. I'm having a very hard time finding the best practice, and also in understanding how to manage this sort of variable in recursion.
I'll give the example of the generate binary trees problem. I more-or-less know what I have to implement in the recursion:
If n=1, return just one node.
If n=3, return the only possible binary tree.
For n>3, create one node and then explore the possibilities: left node is childless, right node is childless, neither node is childless. Explore these possibilities recursively.
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive at the list of trees. Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list, but then the problem is in case 3: when calling the recursive function to explore the possibilities for the nodes, it would be returning a list rather than appending nodes to the tree that I am building. When I picture the recursion tree in my head, I imagine a "tree" variable that is unique to each of the tree's leaves, and these trees are added to a list which is returned by the "root" (i.e. first) call. But I don't know if that is possible. I thought of a global list with the recursive function not returning anything (just appending to it), but I believe the problem is that at each call the function would receive a copy of the variable.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better. I would also like to know if there is a "best practice" when it comes to that.
Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list
This is not the purest way to attack a recursive problem. It would be better if you can make the recursive function such that it solves the sub problem without an extra parameter variable that it must use. So the recursive function should just return a result as if it was the only call that was ever made (by the testing framework). So in the example, that recursive call should return a list with trees.
Alternatively the recursive function could be a sub-function that doesn't return a list, but yields the individual values (in this case: trees). The caller can then decide whether to pack that into a list or not. This is more pythonic.
As to the example problem, it is also important to identify some invariants. For instance, it is clear that there are no solutions when n is even. As to the recursive aspect: once you have decided to create a root, both its left and right subtrees will have an odd number of nodes. Of course, this is an observation specific to this problem, but it is important to look for such problem properties.
Finally, it is equally important to see if the same sub problems can reoccur multiple times. This surely is the case in the example problem: for instance, the left subtree may sometimes have the same number of nodes as the right subtree. In such cases memoization will improve efficiency (dynamic programming).
When the recursive function returns a list, the caller can then iterate that list to retrieve its elements (trees in the example), and use them to build an extended result that satisfies the caller's task. In the example case that means that the tree taken from the recursively retrieved list, is appended as a child to a new root. Then this new tree is appended to a new list (not related to the one returned from the recursive call). This new list will in many cases be longer, although this depends on the type of problem.
To further illustrate the way to tackle these problems, here is a solution for the example problem: one that uses the main function for the recursive calls and applies memoization:
class Solution:
    memo = { 1: [TreeNode()] }

    def allPossibleFBT(self, n: int) -> List[Optional[TreeNode]]:
        # If we didn't solve this problem before...
        if n not in self.memo:
            # Create a list for storing the results (the trees)
            results = []
            # Before creating any root node,
            # decide the size of the left subtree.
            # It must be odd.
            for num_left in range(1, n, 2):
                # Make the recursive call to get all shapes of the
                # left subtree
                left_shapes = self.allPossibleFBT(num_left)
                # The remainder of the nodes must be in the right subtree
                num_right = n - 1 - num_left  # The root also counts as 1
                right_shapes = self.allPossibleFBT(num_right)
                # Now iterate the results we got from recursion and
                # combine them in all possible ways to create new trees
                for left in left_shapes:
                    for right in right_shapes:
                        # We have a combination. Now create a new tree from it
                        # by putting a root node on top of the two subtrees:
                        tree = TreeNode(0, left, right)
                        # Append this possible shape to our results
                        results.append(tree)
            # All done. Save this for later re-use
            self.memo[n] = results
        return self.memo[n]
This code can be made more compact using list comprehension, but it may make the code less readable.
Don't pass information into the recursive calls, unless they need that information to compute their local result. It's much easier to reason about recursion when you write without side effects. So instead of having the recursive call put its own results into a list, write the code so that the results from the recursive calls are used to create the return value.
Let's take a trivial example, converting a simple loop to recursion, and using it to accumulate a sequence of increasing integers.
def recursive_range(n):
    if n == 0:
        return []
    return recursive_range(n - 1) + [n]
We are using functions in the natural way: we put information in with the arguments, and get information out using the return value (rather than mutation of the parameters).
In your case:
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive to the list of trees.
So you know that you want to return a list of trees at the end of the process. So the natural way to proceed, is that you expect each recursive call to do that, too.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better.
The recursive calls return their lists of results for the sub-problems. You use those results to create the list of results for the current problem.
You don't need to think about how recursion is implemented in order to write recursive algorithms. You don't need to think about the call stack. You do need to think about two things:
What are the base cases?
How does the problem break down recursively? (Alternately: why is recursion a good fit for this problem?)
The thing is, recursion is not special. Making the recursive call is just like calling any other function that would happen to give you the correct answer for the sub-problem. So all you need to do is understand how solving the sub-problems helps you to solve the current one.
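To make this concrete with the string-permutations example the question itself mentions, here is a sketch where each recursive call simply returns its own complete list of results, and the caller combines them:

```python
def permutations(s):
    """Return a list of all permutations of the string s."""
    # Base case: the empty (or single-character) string has
    # exactly one permutation - itself
    if len(s) <= 1:
        return [s]
    results = []
    for i, ch in enumerate(s):
        # The recursive call returns the full list of sub-results;
        # we extend each of them to build this level's results.
        for rest in permutations(s[:i] + s[i + 1:]):
            results.append(ch + rest)
    return results

print(permutations("abc"))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```

No list is threaded through the calls and nothing is mutated from inside the recursion; each call behaves as if it were the only call ever made.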
The code I already have is for a bot that receives a mathematical expression and calculates it. Right now I have it doing multiplication, division, subtraction and addition. The problem, though, is that I want to build support for parentheses, and parentheses inside parentheses. For that to happen, I need to run the code I wrote for parentheses-free expressions on the expression inside the parentheses first. I was going to check for "(" and append the expression inside it to a list until it reaches a ")", unless it reaches another "(" first, in which case I would create a list inside a list. I would subtract, multiply and divide, and then just add together the numbers that are left.
So is it possible to call a definition/function from within itself?
Yes, this is a fundamental programming technique called recursion, and it is often used in exactly the kind of parsing scenarios you describe.
Just be sure you have a base case, so that the recursion ends when you reach the bottom layer and you don't end up calling yourself infinitely.
(Also note the easter egg when you Google recursion: "Did you mean recursion?")
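A minimal sketch of the recursive idea for the parentheses case follows. The flat evaluator below only handles sums of integers; it is a stand-in for the question's existing multiply/divide/subtract/add logic:

```python
def evaluate_flat(expr):
    # Stand-in for the question's existing no-parentheses evaluator;
    # for this sketch it only handles sums of integers.
    return sum(int(term) for term in expr.split("+"))

def evaluate(expr):
    # Base case: no parentheses left -> hand off to the flat evaluator
    i = expr.find("(")
    if i == -1:
        return evaluate_flat(expr)
    # Find the ")" matching this "(" by tracking nesting depth
    depth = 0
    for j, ch in enumerate(expr[i:], start=i):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break
    # Recursive case: evaluate the inside, splice the result back in,
    # then evaluate the now-simpler expression
    inner = evaluate(expr[i + 1:j])
    return evaluate(expr[:i] + str(inner) + expr[j + 1:])

print(evaluate("1+(2+(3+4))+5"))  # 15
```

The base case (no "(" found) is exactly what guarantees the recursion terminates: every recursive call removes at least one pair of parentheses.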
Yes, as @Daniel Roseman said, this is a fundamental programming technique called recursion.
Recursion should be used instead of iteration when it produces a cleaner solution than an iterative version. However, recursion is generally more expensive than iteration because it requires pushing a new stack frame onto the call stack each time the recursive function is invoked - these operations take time and stack space, which can lead to a stack overflow if the stack frames consume all of the memory allocated for the call stack.
Here is an example of it in Python
def recur_factorial(n):
    """Function to return the factorial of a number using recursion"""
    if n == 1:
        return n
    else:
        return n * recur_factorial(n - 1)
For more detail, visit the github gist that was used for this answer
Yes, it's possible - in Python this is called recursion.
A good physical-world description: "place two parallel mirrors facing each other. Any object in between them would be reflected recursively."
A few lines of code that demonstrates what I'm asking are:
>>> x = ()
>>> for i in range(1000000):
... x = (x,)
>>> x.__hash__()
=============================== RESTART: Shell ===============================
The 1000000 may be excessive, but it demonstrates that there is some form of limit when hashing nested tuples (and, I assume, other objects). Just to clarify: I didn't restart the shell; it did that automatically when I attempted the hash.
What I want to know is what is this limit, why does it happen (and why did it not raise an error), and is there a way around it (so that I can put tuples like this into sets or dictionaries).
The __hash__ method of tuple calculates the hash of each item in the tuple - in your case like a recursive function. So if you have a deeply nested tuple, it ends up in a very deep recursion. At some point there is probably not enough memory on the stack to go "one level deeper". That's also why the shell restarts without a Python exception - the recursion happens in C (at least for CPython). You could use e.g. gdb to get more information about the crash, or to debug it.
There is no global hard limit; the limit depends on your system (e.g. how much stack space there is), how many (internal) function calls are involved, and how much stack each call requires.
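To see that this is a depth limit rather than a fixed cap on tuple size, a shallower nesting hashes without trouble; the depth at which it fails varies with the platform's stack size:

```python
x = ()
for _ in range(100):   # depth 100: hashes fine on typical stacks
    x = (x,)
print(isinstance(hash(x), int))  # True

# With range(1000000), the same loop builds a tuple whose C-level
# recursive hash exhausts the C stack and kills the interpreter;
# sys.setrecursionlimit() has no effect on that C recursion.
```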
However, that could qualify as a bug in the implementation, so it would be a good idea to post it on the CPython issue tracker.
So I have a time-critical section of code within a Python script, and I decided to write a Cython module (with one function -- all I need) to replace it. Unfortunately, the execution speed of the function I'm calling from the Cython module (which I'm calling within my Python script) isn't nearly as fast as I tested it to be in a variety of other scenarios. Note that I CANNOT share the code itself because of contract law! See the following cases, and take them as an initial description of my issue:
(1) Execute Cython function by using the Python interpreter to import the module and run the function. Runs relatively quickly (~0.04 sec on ~100 separate tests, versus original ~0.24 secs).
(2) Call Cython function within Python script at 'global' level (i.e. not inside any function). Same speed as case (1).
(3) Call Cython function within Python script, with Cython function inside my Python script's main function; tested with the Cython function in global and local namespaces, all with the same speed as case (1).
(4) Same as (3), but inside a simple for-loop within said Python function. Same speed as case (1).
(5) Problem! Same as (4), but inside yet another for-loop: the Cython function's execution time (whether called globally or locally) balloons to ~10 times that of the other cases, and this is where I need the function to get called. There is nothing odd to report about this loop, and I tested all of its components (adjusting/removing what I could). I also tried using a 'while' loop for giggles, to no avail.
"One thing I've yet to try is making this inner-most loop a function and going from there." EDIT: Just tried this - no luck.
Thanks for any suggestions you have - I deeply regret not being able to share my code... it hurts my soul a little, but my client just can't have this code floating around. Let me know if there is any other information I can provide!
-The Real Problem and an Initial (ugly) Solution-
It turns out that the best hint in this scenario was the obvious one (as usual): it wasn't the for-loop that was causing the problem; why would it? After a few more tests, it became obvious that something about the way I was calling my Cython function was wrong, because I could call it elsewhere (using an input variable different from the one going to the 'real' Cython function) without the performance loss issue.
The underlying issue: data types. I wrote my Cython function to expect a list full of standard floats. Unfortunately, my code did this:
function_input = list(numpy_array_containing_npfloat64_data)  # yuck.
# type(function_input[0]) is numpy.float64
output = Cython_Function(function_input)
inside the Cython function:
def Cython_Function(list function_input):
    cdef many_vars
    """process lots of vars expecting C floats"""  # Slowness from converting numpy.float64's --> floats???
    # type(output) is list
    return output
I'm aware that I can play around more with types in the Cython function, which I very well may do to prevent having to 'list' an existing numpy array. Anyway, here is my current solution:
function_input = [float(x) for x in function_input]
I welcome any feedback and suggestions for improvement. The function_input numpy array doesn't really need the precision of numpy.float64, but it does get used a few times before getting passed to my Cython function.
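As an aside (an assumption on my part, since the real code can't be shown): numpy can do this conversion in one pass, and ndarray.tolist() yields genuine Python floats rather than numpy.float64 scalars:

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)

# list(arr) keeps numpy.float64 scalars - the type the
# Cython function apparently choked on:
assert type(list(arr)[0]) is np.float64

# arr.tolist() converts to native Python floats in a single
# C-level pass, avoiding the per-element Python loop:
converted = arr.tolist()
assert type(converted[0]) is float
```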
It could be that, while individually, each function call with the Cython implementation is faster than its corresponding Python function, there is more overhead in the Cython function call because it has to look up the name in the module namespace. You can try assigning the function to a local callable first, for example:
from module import function

def main():
    my_func = function
    for i in sequence:
        my_func()
If possible, you should try to include the loops within the Cython function, which would reduce the overhead of a Python loop to the (very minimal) overhead of a compiled C loop. I understand that it might not be possible (i.e. need references from a global/larger scope), but it's worth some investigation on your part. Good luck!
function_input = list(numpy_array_containing_npfloat64_data)

def Cython_Function(list function_input):
    cdef many_vars
I think the problem is in using the numpy array as a list... can't you use the np.ndarray directly as input to the Cython function? With a typed buffer argument, Cython can read the float64 data without any per-element conversion:
cimport numpy as np

def Cython_Function(np.ndarray[np.float64_t, ndim=1] function_input):
    ....