If you are incrementally designing a function that could have a variable number of outputs, what is the best way to design it? E.g.

def function(input):
    return output1, output2, ...

or

def function(input):
    return dict(output1=...)
In both cases you need a bunch of if statements to sort through and use the outputs; the difference is where those if statements live (inside the function, or outside it, over the dictionary). I am not sure what principle to use to decide between them.
If you need to return multiple things, it means the function is either complex and you should break it down, or that you need an object with attributes that gets "processed" in the function. A dict is a standard object, but you can also create your own class; it depends on whether you want to go more the OOP way or the functional/procedural way.
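As a minimal sketch of the "object with attributes" option, a dataclass gives the caller named fields instead of string keys; the field names here are purely illustrative:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    output1: int
    output2: Optional[str] = None  # outputs that may be absent default to None

def function(value: int) -> Result:
    return Result(output1=value * 2, output2="extra" if value > 0 else None)

result = function(3)
if result.output2 is not None:  # the "if statements" become explicit attribute checks
    print(result.output1, result.output2)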
Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions it may call need to operate on a sub-array of the DataFrame. I have written multiple selector functions. However, at present each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important since the end code is aimed at non-specialists (possibly with no Python/programming knowledge), so it must be "batteries-included". This entire project is written in a functional style (that's what has always worked for me).
Sample Code
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform3(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform4(arg1, arg2),
)
Expected API
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),
    transform3(arg1, arg2),
    transform4(arg1, arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it would also become available (something similar to a closure) to all the functions it calls. This seems to happen with some toy examples, but it won't work in this case (it raises UnboundLocalError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable states in the past, and functional programming has worked quite well. Hence I've set a goal for myself to NOT use classes (to the extent that Python enables me to anyways). So no classes.
I should have probably clarified this initially: in the pipeline, the output of every function is a DataFrame, and the input of every function (except load_data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial, because we want the user to supply the args to each function but not execute it. The actual execution is done by a for loop in make_pipeline.
Each function accepts df: pandas.DataFrame plus all the arguments that are specific to that function. The statement seen above, transform1(arg1, arg2, ...), actually calls the decorated transform1, which returns functools.partial(transform1, arg1, arg2, ...), which now has a signature like transform1(df: pandas.DataFrame).
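Roughly, the decorator looks like this sketch (the names and the exact binding are illustrative, not the project's actual code):

import functools
import pandas as pd

def pipeline_step(fn):
    """Calling the decorated step with its own args returns a
    one-argument callable that still expects the DataFrame."""
    @functools.wraps(fn)
    def binder(*args, **kwargs):
        # Everything except df is bound now; make_pipeline supplies df later.
        def step(df: pd.DataFrame) -> pd.DataFrame:
            return fn(df, *args, **kwargs)
        return step
    return binder

@pipeline_step
def transform1(df: pd.DataFrame, col: str, factor: float) -> pd.DataFrame:
    out = df.copy()
    out[col] = out[col] * factor
    return out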
load_dataframe is just a convenience function to load the initial DataFrame so that all the other functions can begin operating on it. It just felt more intuitive to users to have it as part of the chain rather than as a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument to said function. If you're wondering why, it's because I feel end users will find it unintuitive and arbitrary that some functions need it and some don't. I'm also pretty certain they will make all kinds of errors, like passing different filters, forgetting it sometimes, etc.
(Update) I've also tried inspect.signature() in make_pipeline to check whether each function accepts a data_filter argument and pass it on. However, this raises an incorrect function signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function signature, this would solve the issue, but I couldn't find much info in the docs.
Turns out it was pretty easy. The solution is inspect.signature.

import inspect
from typing import Any, Callable, Optional

def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        # A Signature is not a container; membership tests go through .parameters
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d
Leaving this here mostly for reference, because I think this is a mini design pattern. I've also seen function.__closure__ mentioned on an unrelated subject. That may also work, but it would likely be more complicated.
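On the (Update) note above: at least in a simple case, inspect.signature does resolve a functools.partial object to the parameters still left to supply, which is exactly what the check needs. A quick sketch (names invented):

import functools
import inspect

def transform(df, threshold, data_filter=None):
    return df

step = functools.partial(transform, threshold=0.5)

# data_filter still shows up among the remaining parameters
print("data_filter" in inspect.signature(step).parameters)  # True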
I need to create an algorithm which is executed as a sequence of functions chosen at run-time, a bit like the strategy pattern. I want to avoid creating many single-method classes (100+) and instead use simple functions, assembling them with a decorator.
actions = []

def act(specific_fn):
    actions.append(specific_fn)
    return specific_fn

@act
def specific_fn1():
    print("specific_fn1")

@act
def specific_fn2():
    print("specific_fn2")

def execute_strategy():
    for f in actions:
        f()
I have a couple of questions:
How can I modify the decorator function act to take the list actions as a parameter, so that it adds the decorated function into the list?
How do I use specific_fnX defined in another file? Currently, I just import the file inside the calling function - but that seems odd. Any other options?
Also, any other ideas on implementing this pattern?
This is a pretty good tutorial on using decorators, both with and without arguments. Others may know of other good ones.
The trick is to remember that a decorator is a function call, but when applied to a function definition, it magically replaces the function with the returned result. If you want @my_decorator(my_list) to act as a decorator, then my_decorator(my_list) must return either a function or an object with a __call__ method, and that function is then called on specific_fn1.
So yes, you need a function that returns a function that returns a function. Looking at some examples in a tutorial will make this clearer.
As to your second question, you can just call my_decorator(my_list)(specific_fn1) without using decorator syntax, ignoring the result.
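Putting that together for the act example, a sketch of a registering decorator that takes the list as a parameter:

def act(action_list):
    # act(action_list) returns the real decorator...
    def decorator(specific_fn):
        action_list.append(specific_fn)
        return specific_fn  # ...which hands the function back unchanged
    return decorator

actions = []

@act(actions)
def specific_fn1():
    print("specific_fn1")

# Registration without decorator syntax, e.g. for a function
# imported from another module:
def specific_fn2():
    print("specific_fn2")

act(actions)(specific_fn2)

def execute_strategy():
    for f in actions:
        f()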
I've only recently learned about decorators, and despite reading nearly every search result I can find on this question, I cannot figure this out. All I want to do is define some function calc(x, y) and wrap its result with a series of external functions, without changing anything inside my function or any of its calls in the script, such as:
@tan
@sqrt
def calc(x, y):
    return (x + y)

### calc(x,y) == tan(sqrt(calc(x,y)))
### Goal is to have every call of calc in the script automatically nest like that.
After reading about decorators for almost 10 hours yesterday, I got the strong impression this is what they are used for. I do understand that there are various ways to modify how the functions are passed to one another, but I can't find any obvious guide on how to achieve this. I read that maybe functools.wraps can be used for this purpose, but I cannot figure that out either.
Most of the desire here is to be able to quickly and easily test how different functions modify the results of others, without having to tediously wrap calls in parentheses... That is, to avoid having to mess with parentheses at all, keeping my modifier test functions defined on their own lines.
A decorator is simply a function that takes a function and returns another function.
import math

def tan(f):
    def g(x, y):
        return math.tan(f(x, y))
    return g
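A matching sqrt decorator, plus the stacked usage from the question, as a sketch:

import math

def sqrt(f):
    def g(x, y):
        return math.sqrt(f(x, y))
    return g

@tan
@sqrt
def calc(x, y):
    return x + y

# Decorators apply bottom-up: calc(1, 2) == math.tan(math.sqrt(1 + 2))
print(calc(1, 2))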
I need to assign a numeric value, which is returned from a function, to a variable name by using the exec() command:

def func1(x, y):
    return x + y

def main(x, y, n):
    x = 3; y = 5; n = 1
    t = exec("func%s(%s,%s)" % (n, x, y))
    return t**2

main(3, 5, 1)
I have many functions like func1, func2, and so on... I try to compute t = func1(x, y) with the exec("func%s(%s,%s)" % (n,x,y)) statement. However, I cannot assign the value returned from exec().
There is a partly similar question on SE, but it is written for Python 3 and is also not applicable to my case.
How can we resolve this problem, or is there a more elegant way to perform such an operation, maybe without the use of exec()?
By the way, I used exec because "func%s(%s,%s)" % (n,x,y) is a statement. Or would it be better to use eval?
It is almost always a really bad idea to get at functions and variables using their names as strings (or bits of their names as integers, as in the example code in the question). One reason why is that eval or exec can do literally anything and you generally want to avoid using code constructs whose behaviour is so hard to predict and reason about.
So here are two ways to get similar results with less pain.
First: Instead of passing around magic index numbers, like the 1 in the code above, pass the actual function which is, itself, a perfectly reasonable value for a variable in Python.
def main(x, y, f):
    return f(x, y)**2

main(3, 5, func1)
(Incidentally, the definition of main in the question throws away the values of x,y,n that are passed in to it. That's probably a mistake.)
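If the choice really must arrive as a number or string (say, from user input), a plain dict lookup gives the same dispatch without eval; a sketch:

def func1(x, y):
    return x + y

def func2(x, y):
    return x * y

# Explicit registry: only these entries can ever be called.
FUNCS = {1: func1, 2: func2}

def main(x, y, n):
    return FUNCS[n](x, y)**2

print(main(3, 5, 1))  # (3 + 5)**2 == 64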
Second: Instead of making these mere functions, make them methods on classes, and pass around not the functions but either the classes themselves or instances of the classes.
class Solver:
    def solve(self, x, y): return "eeeek, not implemented"

class Solver1(Solver):
    def solve(self, x, y): return x + y

def main(x, y, obj):
    return obj.solve(x, y)**2

main(3, 5, Solver1())
Which of these is the better approach depends on the details of your application. For instance, if you actually have multiple "parallel" sets of functions -- as well as your func1, func2, etc., there are also otherfunc1, otherfunc2 etc. -- then this is crying out for an object-oriented solution (the second approach above).
In general I would argue for the second approach; it tends to lead to cleaner code as your requirements grow.
I am pretty bad at stating my problem clearly. Sorry.
Basically, I have many view functions whose functionality is very similar. Part of it is using reverse. However, each of those view functions executes a different reverse, so I cannot write them one by one in my new "generic view". That's insane.
At the moment, I am trying to reduce the amount of duplicated code I am writing (that's over 500 lines of duplication!!!!!)
To solve this problem, I have a few helper functions, one of which evaluates reverse on whatever view function is given, with whatever args are passed to the helper.
def render_reverse(f, args):
    return eval(...)
But eval is evil, and is slow. Any substitute for eval? A better approach to solve this in Django?
Thanks.
Why do you need eval at all in the first place? Just call reverse() normally:

return reverse(f, args=args)

Django's reverse() takes the view name (or the callable view itself) plus the positional URL arguments via its args keyword.
That said, why do you need this helper at all? Why not just put return reverse(...) directly in your view?
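For instance, a minimal view using reverse directly (the URL name "article-detail" is made up for the example):

from django.shortcuts import redirect
from django.urls import reverse

def my_view(request, pk):
    # Build the URL from the pattern name and its positional args.
    url = reverse("article-detail", args=[pk])
    return redirect(url)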