Python: structuring 2 functions with the same dependencies

Issue: I have 2 functions that both require the same nested functions to operate so they're currently copy-pasted into each function. These functions cannot be combined as the second function relies on calling the first function twice. Unnesting the functions would result in the addition of too many parameters.
Question: Is it better to run the nested functions in the first function and append their values to an object to be fed into the 2nd function, or is it better to copy and paste the nested functions?
Example:
def func_A(thing):
    def sub_func_A(thing):
        thing += 1
        return thing
    return sub_func_A(thing)

def func_B(thing):
    def sub_func_B(thing):
        thing += 1
        return thing
    val_A, val_B = func_A(5), func_A(5)
    return sub_func_B(val_A), sub_func_B(val_B)
Imagine these functions couldn't be combined, and the nested function relied on so many parameters that moving it outside and calling it would be too cluttered.

The "better option" depends on a few factors -:
The type of optimization you want to achieve.
The time taken by the functions to execute.
If the optimization you want here is the time taken to execute the second function, then it depends on how long the nested function takes to run: if that is less than the time needed to store its output when the first function calls it, then copy-pasting the nested functions is the better choice.
Conversely, if the nested function takes longer to run than storing its output does, then it is better to execute it once and store its output for future use.
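For illustration, here is a minimal sketch of the "execute once and store the output" option; the cache and the helper body are invented for the example, and it sidesteps the question's parameter-count concern for brevity:
_cache = {}

def shared_helper(thing):
    # stand-in for the expensive nested logic; the result is stored for reuse
    if thing not in _cache:
        _cache[thing] = thing + 1
    return _cache[thing]

def func_A(thing):
    return shared_helper(thing)

def func_B(thing):
    val_A, val_B = func_A(5), func_A(5)   # the helper body only runs once for 5
    return shared_helper(val_A), shared_helper(val_B)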
Further, as mentioned by @DarylG in the comments, a class-based approach can also be used, in which the nested function (subfunction) becomes a private method (only accessible from the class's own code), while the two functions (func_A and func_B) are public methods that remain usable from the outside. Implemented in code it might look something like this:
class MyClass:
    def __init__(self):
        ...

    def __subfunc(self, thing):
        # PRIVATE SUBFUNC
        thing += 1
        return thing

    def func_A(self, thing):
        # PUBLIC FUNC A
        return self.__subfunc(thing)

    def func_B(self, thing):
        # PUBLIC FUNC B
        val_A, val_B = self.func_A(5), self.func_A(5)
        return self.__subfunc(val_A), self.__subfunc(val_B)
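Hypothetical usage of the class-based version (the argument values are just examples):
obj = MyClass()
print(obj.func_A(5))   # 6
print(obj.func_B(0))   # (7, 7): func_A(5) is called twice, then __subfunc is applied again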

Related

Best practice to match callable function

I'd like to have a set of functions which can be called on specific input types. To give a brief example, let's say I have the following description in JSON:
{
    "type": "location",
    "precision": 100
}
and I have 2 functions such as
fun1(type,param) # Here param is intended as the precision
fun2(type,param) # Here param is intended as another variable
However, I want the description to be matched only with fun1, which expects the correct type and param. The Python type of param can be the same for both functions, but with a different meaning, and there can be multiple params to check.
Does Python have something handy to handle this?
Let's suppose you have already loaded your JSON description into a dict in Python.
There are many ways to do this job, so I will only write a few of them down here and demonstrate a couple of them.
Function decorators that verify whether the dictionary contains the right values before calling the function. -- In my opinion this is best for short scripts.
An if/else chain over your types. -- I think this is best for long-term maintenance.
A check at the beginning of each function deciding whether it should run. -- If you don't care about anything else and want short code written in the shortest possible time.
A map from type to the correct function. -- This approach is good for performance.
Demonstration of the first approach
First, we have to make a function that generates decorators.
def dec_gen(the_type: str):
    def dec(func):
        def inner(d: dict):
            if d.get('type') == the_type:
                func(d)
        return inner
    return dec
Let's change fun1 a little bit.
@dec_gen('location')
def fun1(d: dict):
    ...your code....
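A possible way to call the decorated function, assuming the description has been parsed with json.loads (the variable name is made up):
import json

description = json.loads('{"type": "location", "precision": 100}')

fun1(description)   # runs, because the decorator sees type == 'location'
# a function decorated with dec_gen('something_else') would silently do nothing here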
Demonstration of the third approach
Let's change fun1 a little bit (again):
def fun1(d: dict):
    if d.get('type') == 'location':
        ...your code...
If you write such a header for all of fun1, fun2, ..., funn, you can just pass the dictionary to each of them and only the matching ones will actually do anything, as sketched below.
Of course, this can get terribly slow for many different types and a large N, but there is no requirement on speed in your question.
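A sketch of that dispatch loop, assuming fun1, fun2, ... each start with their own type check as shown above:
description = {"type": "location", "precision": 100}

for fun in (fun1, fun2):
    # every function receives the dict; only those whose check matches do any work
    fun(description)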
Demonstration of the fourth approach
See the other answer.
The easiest way is probably to use a dictionary for the mapping and (optionally) associate every function with an appropriate attribute to keep track:
# untested
def func1(data, param):
    # do something
    pass

func1.type = "location"

def func2(data, param):
    # do something
    pass

func2.type = "something_else"

funcs = [func1, func2]
type_func_map = {func.type: func for func in funcs}

# apply the function to data:
def apply_matching_func(data, param):
    func = type_func_map.get(data["type"])
    if func:
        return func(data, param)
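Hypothetical usage with the description from the question:
data = {"type": "location", "precision": 100}
result = apply_matching_func(data, data["precision"])
# dispatches to func1, because func1.type == "location"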

when do we initialise a function call within the function vs as an argument?

I have a question about arguments in functions, in particular initialising an array or other data structure within the function call, like the following:
def helper(root, result=[]):
    ...
My question is, what is the difference between the above vs. doing:
def helper(root):
    result = []
I can see why this would be necessary if we were to run recursions, i.e. we would need to use the first case in some instances.
But are there any other instances, and am I right in saying it is necessary in some cases for recursion, or can we always use the latter instead?
Thanks
Default argument values are created only once, when the function is defined, so initializing a list or any other mutable object in the function definition is a bad idea.
The best way of doing it is like this:
def helper(root, result=None):
    if isinstance(result, type(None)):
        result = []
Now if you only pass one argument to the function, the "result" will be an empty list.
If you initialize the list within the function definition and call the function multiple times, "result" won't reset; it will keep the values from previous calls.
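A short demonstration of both behaviours (this is standard Python behaviour; the bodies are invented for the example):
def helper_bad(root, result=[]):
    result.append(root)
    return result

def helper_good(root, result=None):
    if result is None:
        result = []
    result.append(root)
    return result

print(helper_bad(1))    # [1]
print(helper_bad(2))    # [1, 2]  <- the same list object survives between calls
print(helper_good(1))   # [1]
print(helper_good(2))   # [2]     <- a fresh list on every call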

Python: Redefine function so that it references its own self

Say I have got some function fun, the actual code body of which is out of my control. I can create a new function which does some preprocessing before calling fun, i.e.
def process(x):
    x += 1
    return fun(x)
If I now want process to take the place of fun for all future calls to fun, I need to do something like
# Does not work
fun = process
This does not work, however, as it creates a cyclic reference problem: fun is now called from within the body of fun. One solution I have found is to reference a copy of fun inside process, like so:
# Works
import copy

fun_cp = copy.copy(fun)

def process(x):
    x += 1
    return fun_cp(x)

fun = process
but this solution bothers me as I don't really know how Python constructs a copy of a function. I guess my problem is identical to that of extending a class method using inheritance and the super function, but here I have no class.
How can I do this properly? I would think that this is a common enough task that some more or less idiomatic solution should exist, but I have had no luck finding it.
Python is not constructing a copy of your function. copy.copy(fun) just returns fun; the difference is that you saved that to the fun_cp variable, a different variable from the one you saved process to, so it's still in fun_cp when process tries to look for it.
I'd do something similar to what you did, saving the original function to a different variable, just without the "copy":
original_fun = fun

def fun(x):
    x += 1
    return original_fun(x)
If you want to apply the same wrapping to multiple functions, defining a decorator and doing fun = decorate(fun) is more reusable, but for a one-off, it's more work than necessary and an extra level of indentation.
This looks like a use case for Python's closures. Have a function return your function.
def getprocess(f):
    def process(x):
        x += 1
        return f(x)  # f is referenced from the enclosing scope.
    return process

myprocess = getprocess(fun)
myprocess = getprocess(myprocess)
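A toy example of how the wrapping composes; fun here is made up just for the demonstration:
def fun(x):
    return x * 10

myprocess = getprocess(fun)
print(myprocess(1))      # 20, i.e. fun(1 + 1)

myprocess = getprocess(myprocess)
print(myprocess(1))      # 30, i.e. fun((1 + 1) + 1)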
Credit to coldspeed for the idea of using a closure. A fully working and polished solution is
import functools

def getprocess(f):
    @functools.wraps(f)
    def process(x):
        x += 1
        return f(x)
    return process

fun = getprocess(fun)
Note that this is 100% equivalent to applying a decorator (getprocess) to fun. I couldn't come up with this solution at first because the dedicated decorator syntax @getprocess can only be used at the place where the function (here fun) is defined. To apply it to an existing function, though, just do fun = getprocess(fun).

Is it good to use inner function in a python function to make the logic clear?

Below is the basic logic for function foo:
def foo(item_lst):
    val_in_foo_scope = 1
    for item in item_lst:
        # some logic to deal with item
        # val_in_foo_scope used
        pass
    return 0
The logic in the loop can be very complex; to make the code clearer, I want to split the logic out into a separate function.
With an inner function:
def foo(item_lst):
    val_in_foo_scope = 1

    def some_logic(item):
        # val_in_foo_scope used
        pass

    for item in item_lst:
        some_logic(item)
    return 0
With an outer function:
def some_logic(item, val):
    # val used
    pass

def foo(item_lst):
    val_in_foo_scope = 1
    for item in item_lst:
        some_logic(item, val_in_foo_scope)
    return 0
The inner function version
val_in_foo_scope can be used directly -- good
we can easily see that some_logic is related to foo and is in fact only used inside foo -- good
each time foo is called, a new inner function object is created -- not so good
The outer function version
val_in_foo_scope cannot be used directly -- not so good
we cannot see the relationship between some_logic and foo directly -- not so good
some_logic is only created once -- good
there will be many more functions in the global namespace -- not so good
So, which solution is better, or are there other solutions?
The factors below, or any other factors you come up with, can be considered:
whether val_in_foo_scope is used
whether the cost of creating the inner function each time can be ignored
Use lambda if it's a simple function.
Use an inner function if it is complex and you don't want to make it "public".
Use a "private" method if you want to mark it hidden and uses members of the instance.
Use a method if you want to make it "public" and uses members of the instance.
Use a class method if it uses class members.
And lastly use a global function if it's general enough to be used by other classes/functions.
You forgot one point in your pro/cons list: testability. Keeping some_logic out of foo makes it testable in isolation, which is important if it's indeed a "complex" (hence very probably critical) function.
As a general rule, only use inner functions when you have both of those conditions: it's trivial stuff and passing the required context (the 'outer' function's context) would be a pain.
(nb: I'm of course not talking about using inner functions for closures - like in a decorator - here).
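To illustrate the testability point: with the outer-function version, some_logic can be exercised directly. The body below is purely hypothetical, just to make the test concrete:
def some_logic(item, val):
    # hypothetical implementation, only to illustrate testing in isolation
    return item * val

def test_some_logic():
    assert some_logic(3, 2) == 6
    assert some_logic(0, 5) == 0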

Structuring a program. Classes and functions in Python

I'm writing a program that uses genetic techniques to evolve equations.
I want to be able to submit the function 'mainfunc' to the Parallel Python 'submit' function.
The function 'mainfunc' calls two or three methods defined in the Utility class.
They instantiate other classes and call various methods.
I think what I want is all of it in one NAMESPACE.
So I've instantiated some (maybe it should be all) of the classes inside the function 'mainfunc'.
I call the Utility method 'generate()'. If we were to follow its chain of execution, it would involve all of the classes and methods in the code.
Now, the equations are stored in a tree. Each time a tree is generated, mutated or crossbred, the nodes need to be given a new key so they can be accessed from a dictionary attribute of the tree. The class 'KeySeq' generates these keys.
In Parallel Python, I'm going to send multiple instances of 'mainfunc' to the 'submit' function of PP. Each has to be able to access 'KeySeq'. It would be nice if they all accessed the same instance of KeySeq so that none of the nodes on the returned trees had the same key, but I could get around that if necessary.
So: my question is about stuffing EVERYTHING into mainfunc.
Thanks
(Edit) If I don't include everything in mainfunc, I have to try to tell PP about dependent functions, etc. by passing various arguments in various places. I'm trying to avoid that.
(late Edit) If ks.next() is called inside the 'generate()' function, it raises the error 'NameError: global name 'ks' is not defined'.
class KeySeq:
    "Iterator to produce sequential integers for keys in dict"
    def __init__(self, data=0):
        self.data = data
    def __iter__(self):
        return self
    def next(self):
        self.data = self.data + 1
        return self.data

class One:
    'some code'

class Two:
    'some code'

class Three:
    'some code'

class Utilities:
    def generate(x):
        '___________'
    def obfiscate(y):
        '___________'
    def ruminate(z):
        '__________'

def mainfunc(z):
    ks = KeySeq()
    one = One()
    two = Two()
    three = Three()
    utilities = Utilities()
    list_of_interest = utilities.generate(5)
    return list_of_interest

result = mainfunc(params)
result = mainfunc(params)
It's fine to structure your program that way. A lot of command line utilities follow the same pattern:
# imports, utilities, other functions

def main(arg):
    # ...

if __name__ == '__main__':
    import sys
    main(sys.argv[1])
That way you can call the main function from another module by importing it, or you can run it from the command line.
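For example, assuming the file is saved as mytool.py (a made-up name), the same main can be reused from another module:
from mytool import main

main("some_argument")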
If you want all of the instances of mainfunc to use the same KeySeq object, you can use the default parameter value trick:
def mainfunc(ks=KeySeq()):
    key = ks.next()
As long as you don't actually pass in a value of ks, all calls to mainfunc will use the instance of KeySeq that was created when the function was defined.
Here's why, in case you don't know: A function is an object. It has attributes. One of its attributes is named func_defaults; it's a tuple containing the default values of all of the arguments in its signature that have defaults. When you call a function and don't provide a value for an argument that has a default, the function retrieves the value from func_defaults. So when you call mainfunc without providing a value for ks, it gets the KeySeq() instance out of the func_defaults tuple. Which, for that instance of mainfunc, is always the same KeySeq instance.
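A minimal sketch of that shared-default behaviour, using the KeySeq class from the question:
def mainfunc(ks=KeySeq()):
    return ks.next()

print(mainfunc())   # 1
print(mainfunc())   # 2 -- the same KeySeq instance is pulled from the defaults tuple each time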
Now, you say that you're going to send "multiple instances of mainfunc to the submit function of PP." Do you really mean multiple instances? If so, the mechanism I'm describing won't work.
But it's tricky to create multiple instances of a function (and the code you've posted doesn't). For example, this function does return a new instance of g every time it's called:
>>> def f():
...     def g(x=[]):
...         return x
...     return g
...
>>> g1 = f()
>>> g2 = f()
>>> g1().append('a')
>>> g2().append('b')
>>> g1()
['a']
>>> g2()
['b']
If I call g() with no argument, it returns the default value (initially an empty list) from its func_defaults tuple. Since g1 and g2 are different instances of the g function, their default value for the x argument is also a different instance, which the above demonstrates.
If you'd like to make this more explicit than using a tricky side-effect of default values, here's another way to do it:
def mainfunc():
    if not hasattr(mainfunc, "ks"):
        setattr(mainfunc, "ks", KeySeq())
    key = mainfunc.ks.next()
Finally, a super important point that the code you've posted overlooks: If you're going to be doing parallel processing on shared data, the code that touches that data needs to implement locking. Look at the callback.py example in the Parallel Python documentation and see how locking is used in the Sum class, and why.
Your concept of classes in Python is not sound, I think. Perhaps it would be a good idea to review the basics. This link will help:
Python Basics - Classes
