What are the conventions for ordering parameters in Python? - python

What are the conventions for ordering parameters in Python? For instance,
def plot_graph(G, filename, ...)
# OR
def plot_graph(filename, G, ...)
PEP 8 (Style Guide for Python Code) does not discuss this.
An answer to "Conventions for order of parameters in a function" suggests:
If a language allows passing a hash/map/associative array as a single parameter, try to opt for passing that. This is especially useful for methods with >=3 parameters, ESPECIALLY when those same parameters will be passed to nested function calls.
Is it extreme to convert each parameter into a key-value pair, like def plot_graph(graph=None, filename=None, ...)?

There's really no convention for ordering function parameters, except a limitation that positional non-default parameters must go before parameters with defaults and only then keyword parameters, i.e. def func(pos_1, pos_n, pos_1_w_default='default_val', pos_n_w_default='default_val', *args, kw_1, kw_n, kw_1_w_default='default_val', kw_n_w_default='default_val', **kwargs).
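As a minimal sketch of that ordering rule, the function below uses shortened, purely illustrative parameter names and returns what each parameter received. The `compile` call is only there to show that putting a non-default parameter after a default one is rejected as a syntax error.

```python
def func(pos_1, pos_2, pos_3_w_default="default_val", *args,
         kw_1, kw_2_w_default="default_val", **kwargs):
    """Return the received values so the binding is easy to inspect."""
    return {
        "pos_1": pos_1,
        "pos_2": pos_2,
        "pos_3_w_default": pos_3_w_default,
        "args": args,
        "kw_1": kw_1,
        "kw_2_w_default": kw_2_w_default,
        "kwargs": kwargs,
    }


# Extra positionals land in *args, unknown keywords land in **kwargs.
bound = func(1, 2, 3, 4, 5, kw_1="k", extra=True)

# A non-default parameter after a default one does not even compile.
try:
    compile("def bad(a=1, b): pass", "<sketch>", "exec")
    ordering_error = None
except SyntaxError as exc:
    ordering_error = exc
```

Here `kw_1` can only be passed by keyword because it follows `*args`.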
Usually you order parameters logically, based on their meaning for the function. For example, in a function that performs subtraction, it is logical for the minuend to be the first parameter and the subtrahend the second. The reverse order is possible, but it is not logical.
Also, if you expect your function to be partially applied (for example with functools.partial), that might affect your decision on parameter ordering.
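A small sketch of why partial application interacts with ordering, reusing the subtraction example (the function name `subtract` is an assumption for illustration). `functools.partial` fills positional parameters from the left, so the leading parameter is the cheapest one to fix; later ones have to be fixed by keyword.

```python
from functools import partial


def subtract(minuend, subtrahend):
    return minuend - subtrahend


# Fixing the first parameter positionally is straightforward.
from_ten = partial(subtract, 10)

# Fixing a later parameter requires naming it.
minus_one = partial(subtract, subtrahend=1)

results = (from_ten(3), minus_one(5))
```

If the parameter you expect callers to pre-fill most often comes first, the positional form is enough.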
Most things you need to know about function parameters are in the official tutorial.
P.S. Regarding your particular example with graph function... Considering your function name, it is used for displaying a graph, so a graph must be provided as argument, otherwise there's nothing to display, so making graph=None by default doesn't make much sense.

It is not extreme to use only keyword arguments. I have seen that in many codebases. This allows you to extend functionalities (by adding new keyword arguments to your functions) without breaking your previous code. It can be slightly more tedious to use, but definitely easier to maintain and to extend.
Also have a look at PEP 3102 -- Keyword-Only Arguments, which is a way to force the use of keyword arguments in python 3.
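As a rough illustration of the maintainability point, here is a sketch of a keyword-only variant of `plot_graph`. The body is a placeholder that only formats a string, and the `dpi` option is a hypothetical later addition; neither comes from the question.

```python
def plot_graph(graph, *, filename="graph.png", dpi=100):
    """Placeholder body: describe the plot instead of drawing it."""
    return f"{len(graph)} nodes -> {filename} @ {dpi} dpi"


# Callers name every option, so adding `dpi` later (or reordering the
# options) cannot silently change what an existing call means.
description = plot_graph({"a": ["b"], "b": []}, filename="out.png")
```

The graph stays a required positional parameter, while everything after the bare `*` must be named.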


Is there a way for a caller of multiple functions to forward a function ref to selected functions in a purely functional way?

Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions it may call need to operate on a sub-array of the DataFrame. I have written multiple selector functions. However, at present each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important because the end code is addressed to non-specialists (possibly with no Python/programming knowledge), so it must be "batteries-included". This entire project is written in a functional style (that's what's always worked for me).
Sample Code
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform3(arg1, arg2, data_filter=filter_func),  # This function needs access to the user-defined filter function
    transform4(arg1, arg2),
)
Expected API
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header=[1, 0]),
    transform1(arg1, arg2),
    transform2(arg1, arg2),
    transform3(arg1, arg2),
    transform4(arg1, arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it also becomes available (something similar to a closure) to all functions it calls. This seems to happen with some toy examples but won't work in my case (UnboundError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable states in the past, and functional programming has worked quite well. Hence I've set a goal for myself to NOT use classes (to the extent that Python enables me to anyways). So no classes.
I should have probably clarified this initially: in the pipeline, the output of all functions is a DataFrame and the input of all functions (except load_data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial because we want the user to supply the args to each function but not execute it. The actual execution is done by a for loop in make_pipeline.
Each function accepts df:pandas.DataFrame plus all arguments that are specific to that function. The statement seen above, transform1(arg1,arg2,...), actually calls the decorated transform1, which returns functools.partial(transform, arg1,arg2,...), which now has a signature like transform(df:pandas.DataFrame).
load_data is just a convenience function to load the initial dataframe so that all other functions can begin operating on it. It just felt more intuitive to users to have it as part of the chain rather than as a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument to said function. If you're wondering why this is the case, it's because I feel that end users will find it unintuitive and arbitrary. Some functions need it, some don't. I'm also pretty certain that they will make all kinds of errors, like passing different filters, forgetting it sometimes, etc.
(Update) I've also tried inspect.signature() in make_pipeline to check whether each function accepts a data_filter argument and pass it on. However, this raises an incorrect function signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function signature, this would solve the issue, but I couldn't find much info in the docs.
Turns out it was pretty easy. The solution is inspect.signature.
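import inspect
from typing import Any, Callable, Optional

def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        # Forward the filter only to steps whose signature accepts it.
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d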
Leaving this here mostly for reference because I think this is a mini design pattern. I've also seen function.__closure__ mentioned on an unrelated subject. That may also work, but will likely be more complicated.
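A self-contained sketch of this pattern, using plain lists instead of a DataFrame so it runs without pandas. The step functions `add_offset` and `scale_selected` are hypothetical, and the initial data is passed as the first argument rather than produced by a loader. `inspect.signature` understands `functools.partial` objects and reports the parameters that are still unbound.

```python
import inspect
from functools import partial


def add_offset(values, offset):
    """A step that does not care about the filter."""
    return [v + offset for v in values]


def scale_selected(values, factor, data_filter=None):
    """A step that only transforms the values the filter selects."""
    keep = data_filter if data_filter is not None else (lambda v: True)
    return [v * factor if keep(v) else v for v in values]


def make_pipeline(data, *steps, data_filter=None):
    for step in steps:
        # Pass the filter only to steps that declare a data_filter parameter.
        if "data_filter" in inspect.signature(step).parameters:
            data = step(data, data_filter=data_filter)
        else:
            data = step(data)
    return data


result = make_pipeline(
    [1, 2, 3, 4],
    partial(add_offset, offset=1),
    partial(scale_selected, factor=10),
    data_filter=lambda v: v % 2 == 0,
)
```

The filter is supplied once, in the `make_pipeline` call, and only `scale_selected` ever sees it.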

Input arguments in Python functions

I'm new to the Python language and I'm a little bit frustrated.
Till today, I thought that passing parameter names in a function call was not mandatory. For example, if you have the following function:
def computeRectangleArea(width=7, height=8):
    return width * height
I thought that you can call like this computeRectangleArea(width=7,height=8) only to make clearer the meaning of the parameters, but actually keywords of input arguments were not needed, so you can call the same function in this way also: computeRectangleArea(7, 8)
Today, while using openpyxl.styles.PatternFill(), I realized that the fill_type keyword is necessary when calling this function.
Suppose that you call the function in this way: openpyxl.styles.PatternFill('FFFFFF','FFFFFF','solid'), then the interpretation of the input parameter will be wrong.
I have some experience with OOP languages (Java, C#), and this kind of thing doesn't exist there.
It seems inconsistent to me that some parameter names (like start_color and end_color in the example above) are optional, while others (like fill_type) must be specified before their values.
Can someone explain the reason for this apparently strange policy? In addition, I would be glad if someone could point me to a useful resource for understanding how it is implemented.
Positional and keyword parameters work just as they do in the languages you know better. You need to go to the documentation of the method you're using and look at the signature. For creating a PatternFill object, go to the class's __init__ method.
class PatternFill(Fill):
    def __init__(self, patternType=None, fgColor=Color(), bgColor=Color(),
                 fill_type=None, start_color=None, end_color=None):
You may specify arguments without the keyword as long as you supply them all in order, without skipping any. For instance, your failing call can be legally given as:
PatternFill(None, 'FFFFFF', 'FFFFFF', 'solid')
These will match the first four parameters. Any time you supply an argument out of order, you must supply the keyword for that argument and for all later arguments in that call. For instance, with the above call, if you want to let the pattern type default to None, then you must supply the keywords for the three arguments you do supply. If you simply omit the None, then the parser still tries to match them up sequentially from the front:
patternType <= 'FFFFFF'
fgColor <= 'FFFFFF'
bgColor <= 'solid'
... and your call fails to pass parsing.
Does that clear things up a little?
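To check how arguments are matched without running openpyxl, here is a sketch with a hypothetical stand-in function, `pattern_fill`, that only copies the parameter names from the signature above (the defaults are simplified to None and the body is empty). `inspect.signature(...).bind` reports which parameter each argument lands on.

```python
import inspect


def pattern_fill(patternType=None, fgColor=None, bgColor=None,
                 fill_type=None, start_color=None, end_color=None):
    """Hypothetical stand-in; not the openpyxl implementation."""


signature = inspect.signature(pattern_fill)

# Positional arguments are matched from the front, one by one.
positional = dict(signature.bind("FFFFFF", "FFFFFF", "solid").arguments)

# Keywords let the caller skip the leading parameters entirely.
by_keyword = dict(
    signature.bind(
        start_color="FFFFFF", end_color="FFFFFF", fill_type="solid"
    ).arguments
)
```

In the positional case, 'solid' ends up on `bgColor`, which matches the table above.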
Can someone explain me why we need this "headache"…
For your specific example, it doesn't appear that there are any keyword-only parameters. Rather, you're trying to pass arguments for the first, second, and fourth parameters, without having to pass an argument for the one in between that you don't care about.
In other words, it's not a headache at all. It's a convenience (and sanity check) you could quite easily ignore—but probably don't want to.
Instead of this:
PatternFill('FFFFFF', 'FFFFFF', fill_type='solid')
… you could write this:
PatternFill('FFFFFF', 'FFFFFF', Color(), 'solid')
… but in order to know that's what you'd need to send, you need to read the source or docs to see the whole parameter list, and see what the default values are for the parameters you want to skip over, and explicitly add them to your call.
I doubt anyone would find that better.
Also, as multiple people pointed out in comments, this is pretty much exactly how named arguments work in C#.
And this class is, accidentally, a great example of why Python actually does allow keyword-only parameters, even though they aren't being used here.
The fact that you can write PatternFill('FFFFFF', 'FFFFFF', 'solid') and not get a TypeError for bad arguments to PatternFill, but instead a mysterious error about 'solid' not working as a color, is hardly a good thing. And (at least without type hinting annotations, which this type doesn't have) there's no way your IDE or any other tool could catch that mistake.
And, in fact, by not using keywords, you've even gotten the initial arguments wrong, without realizing it. You almost certainly wanted to do this:
PatternFill(None, 'FFFFFF', 'FFFFFF')
… but you got away with this without a visible error:
PatternFill('FFFFFF', 'FFFFFF')
… which means you're passing your foreground color as a pattern type and your background color as a foreground color and leaving the default background color.
That could be solved by making all or most parameters keyword-only. But without keyword-only params, the only option would be **kwargs, and that tradeoff is usually not worth it.
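A sketch of what keyword-only parameters would buy here, using a hypothetical `make_fill` function with made-up parameter names (this is not how openpyxl defines PatternFill). With a bare `*` in the signature, the all-positional call fails immediately with a TypeError instead of quietly assigning a colour to the pattern type.

```python
def make_fill(*, pattern_type=None, fg_color=None, bg_color=None):
    """Hypothetical function whose parameters are all keyword-only."""
    return {
        "pattern_type": pattern_type,
        "fg_color": fg_color,
        "bg_color": bg_color,
    }


# The positional call is rejected up front.
try:
    make_fill("FFFFFF", "FFFFFF", "solid")
    error = None
except TypeError as exc:
    error = exc

# The keyword call is unambiguous, in any order.
fill = make_fill(fg_color="FFFFFF", bg_color="FFFFFF", pattern_type="solid")
```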
Quoting from the Rationale of PEP 3102, the proposal that added keyword-only parameters to the language:
There are often cases where it is desirable for a function to take a variable number of arguments. The Python language supports this using the 'varargs' syntax (*name), which specifies that any 'left over' arguments be passed into the varargs parameter as a tuple.
One limitation on this is that currently, all of the regular argument slots must be filled before the vararg slot can be.
This is not always desirable. One can easily envision a function which takes a variable number of arguments, but also takes one or more 'options' in the form of keyword arguments. Currently, the only way to do this is to define both a varargs argument, and a 'keywords' argument (**kwargs), and then manually extract the desired keywords from the dictionary.
If it isn't obvious why using *args and **kwargs isn't good enough:
The actual signature of the function is not visible when looking at the function definition in the source, or the inline help, or auto-generated docs.
The signature is also not available to dynamic reflective code using the inspect module or similar.
The signature is also not available to static reflective code—like that used by many IDEs to do completion and suggestions.
The implementation of the function is less clear, because at best it's half boilerplate for extracting and testing the parameters, and at worst the args and kwargs access are scattered throughout the body of the function.
For an example of what this feature allows, consider the builtin print function, which you can call like this:
print(x, y, z, sep=', ')
This works because print is defined like this:
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False):
If it weren't for keyword arguments, there'd be no way to pass that sep as something different from the actual values to print.
You could force the user to pass all of the objects in a tuple instead of as separate arguments, but that would be a lot less friendly—and even if you did that, there'd be no way to pass flush without passing values for all of sep, end, and file.
And, even with keyword arguments, if it weren't for keyword-only parameters, the function signature would have to look like this:
print(*objects, **kwargs):
… which would make it a lot harder to figure out what keyword arguments you could pass.
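A minimal sketch of the same shape of signature on a user-defined function. `format_line` is a hypothetical helper that returns the text instead of printing it, so the result is easy to check.

```python
def format_line(*objects, sep=" ", end="\n"):
    """Join any number of objects; sep and end can only be given by keyword."""
    return sep.join(str(obj) for obj in objects) + end


default_line = format_line("x", "y", "z")
custom_line = format_line("x", "y", "z", sep=", ", end="")

# Without the keyword, the separator is just one more object to join.
not_a_separator = format_line("x", "y", ", ")
```

The signature itself documents which options exist, which a bare `**kwargs` would not.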

What is the pythonic way to pass arguments to functions?

I'm working on a project in which arguments are passed by keyword almost everywhere. There are functions with positional params only, with keyword (default value) params, or a mix of both. For example, the following function:
def complete_task(activity_task, message=None, data=None):
    pass
This function in the current code would be called like this:
complete_task(activity_task=activity_task, message="My message", data=task_data)
For me there is no point to name arguments whose name is obvious by the context of the function execution / by the variable names. I would call it like this:
complete_task(activity_task, "My message", task_data)
In certain cases where it's not clear what a call argument is from the context, or from the variable names, I might do:
complete_task(activity_task, message="success", data=json_dump)
So this got me wondering if there is a convention or "pythonic" way to call functions with positional/keyword params, when there is no need to rearrange method arguments or use default values for some of the keyword params.
The usual rules of thumb I follow are:
Booleans, particularly boolean literals, should always be passed by keyword unless it is really obvious what they mean. This is important enough that I will often make booleans keyword-only when writing my own functions. If you have a boolean parameter, your function may want to be split into two smaller functions, particularly if it takes the overall structure of if boolean_parameter: do_something(); else: do_something_entirely_different().
If a function takes a lot of optional parameters (more than ~3 including required parameters), then the optionals should usually be passed by keyword. But if you have a lot of parameters, your function may want to be refactored into multiple smaller functions.
If a function takes multiple parameters of the same type, they probably want to be passed as keyword arguments unless order is completely obvious from context (e.g. src comes before dest).
Most of the time, keyword arguments are not wrong. If you have a case where positional arguments are confusing, you should use keyword arguments without a second thought. With the possible exception of simple one parameter functions, keyword arguments will not make your code any harder to read.
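To illustrate the first rule of thumb, here is a sketch of a function with a keyword-only boolean. The function `render_report` and its fixed header are invented for the example.

```python
def render_report(rows, *, include_header=True):
    """Render rows as comma-separated lines, optionally with a header."""
    lines = [", ".join(str(cell) for cell in row) for row in rows]
    if include_header:
        lines.insert(0, "id, name")
    return "\n".join(lines)


rows = [(1, "ada"), (2, "grace")]

# The call site says what the boolean means.
without_header = render_report(rows, include_header=False)
with_header = render_report(rows)
```

A call like `render_report(rows, False)` would not tell the reader anything, and with this signature it is rejected with a TypeError.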
Python has 2 types of arguments [1]: positional and keyword (aka default). The waters get a little muddy because positional arguments can be called by keyword and keyword arguments can be called by position...
def foo(a, b=1):
    print(a, b)
foo(1, 2)
foo(a=1, b=2)
With that said, I think that the names of the types of arguments should indicate how you should (typically) use them. Most of the time, I see positional arguments called by position and keyword arguments called by keyword. So, if you're looking for a general rule of thumb, I'd advise that you make the function call mimic the signature. In the case of our above foo function, I'd call it like this:
foo(1, b=2)
I think that one reason to follow this advice is because (most of the time), people expect keyword arguments to be passed via keyword. So it isn't uncommon for someone to later add a keyword:
def foo(a, aa='1', b=2):
    print(a, aa, b)
If you were calling the function using only positional arguments, you'd now be passing a value to a different parameter than you were before. However, keyword arguments don't care what order you pass them, so you should still be all set.
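The effect can be sketched by keeping both versions side by side. The names `foo_v1` and `foo_v2` are just labels for the before and after signatures, and the functions return their arguments instead of printing them.

```python
def foo_v1(a, b=1):
    return (a, b)


def foo_v2(a, aa="1", b=2):
    return (a, aa, b)


# The all-positional call still runs, but 2 now lands on `aa`, not `b`.
before = foo_v1(1, 2)
after_positional = foo_v2(1, 2)

# The call that mimics the signature keeps its meaning.
after_keyword = foo_v2(1, b=2)
```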
So far, so good. But what rules should you use when you're creating a function? How do you know whether to make an argument a default argument or a positional argument? That's a reasonable question -- And it's hard to find a good rule of thumb. The rules of thumb I use are as follows:
Be consistent with the rest of the project -- It's hard to get it right if you're doing something different than the rest of the surrounding code.
Make an argument a default argument if (and only if) it is possible to supply a reasonable default. If the function will fail if the user doesn't supply a particular argument (because there is no good default), then it should be positional.
[1] Python 3.x also has keyword-only arguments. Those don't give you a choice, so I don't know that they add too much to the discussion here :-) -- Though I don't know that I've seen their use out in the wild too much.

Why Python’s function call semantics pass-in keyword arguments are not ordered?

Using the double-star syntax in a function definition, we obtain a regular dictionary. The problem is that it loses the user's input order. Sometimes we may want to know the order in which keyword arguments were passed to the function.
Since a function call usually does not involve many arguments, I don't think it is a performance problem, so I wonder why the default is not to maintain the order.
I know we can use:
from collections import OrderedDict
def my_func(kwargs):
    print(kwargs)
my_func(OrderedDict(a=1, b=42))
But it is less concise than:
def my_func(**kwargs):
    print(kwargs)
my_func(a=1, b=42)
[EDIT 1]:
1) I thought there were 2 cases:
I need to know the order, and this behaviour is made known to the user through the documentation.
I do not need the order, so I do not care whether it is ordered or not.
I did not consider that, even if the user knows the function uses the order, they could write:
a = dict(a=1, b=42)
my_func(**a)
because they did not know that a dict is not ordered (even if they should know).
2) I thought that the overhead would not be huge in case of a few arguments, so the benefits of having a new possibility to manage arguments would be superior to this downside.
But it seems (from Joe's answer) that the overhead is not negligible.
[EDIT 2]:
It seems that the PEP 0468 -- Preserving the order of **kwargs in a function is going in this direction.
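For reference, PEP 468 was implemented in Python 3.6, so on that version or later the concise form keeps the call order. A minimal sketch, assuming Python 3.6+:

```python
def my_func(**kwargs):
    # On Python 3.6+, kwargs preserves the order used at the call site.
    return list(kwargs.items())


call_order = my_func(b=42, a=1)
other_order = my_func(a=1, b=42)
```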
Because dictionaries are not ordered by definition. I think it really is that simple. The point of kwargs is to take care of exactly those formal parameters which are not ordered. If you did know the order then you could receive them as 'normal' parameters or *args.
Here is a dictionary definition.
CPython implementation detail: Keys and values are listed in an
arbitrary order which is non-random, varies across Python
implementations, and depends on the dictionary’s history of insertions
and deletions.
http://docs.python.org/2/library/stdtypes.html#dict
Python's dictionaries are central to the way the whole language works, so they are highly optimised. Adding ordering would impact performance and require more storage and processing overhead.
You may have a case where that's not true, but I think that's more exceptional than common. Adding a feature 'just in case' for a very hot code path is not a sensible design decision.
EDIT:
Just FYI
>>> timeit.timeit(stmt="z = dict(x)", setup='x = ((("one", "two"), ("three", "four"), ("five", "six")))', number=1000000)
1.6569631099700928
>>> timeit.timeit(stmt="z = OrderedDict(x)", setup='from collections import OrderedDict; x = ((("one", "two"), ("three", "four"), ("five", "six")))', number=1000000)
31.618864059448242
That's roughly a 19x speed difference (31.6 s versus 1.66 s) in constructing a smallish 'normal' size dictionary. OrderedDict is part of the standard library, so I don't imagine there's much more performance that can be squeezed out of it.
As a counter-argument, here is an example of the complicated semantics this would cause. There are a couple of cases here:
The function always gets an unordered dictionary.
The function always gets an ordered dictionary - given this, we don't know if the order has any meaning, as if the user passes in an unordered data structure, the order will be arbitrary, while the data type implies order.
The function gets whatever is passed in - this seems ideal, but it's not that simple.
What about the case of some_func(a=1, b=2, **unordered_dict)? There is implicit ordering in the original keyword arguments, but then the dict is unordered. There is no clear choice here between ordered or not.
Given this, I'd say that ordering the keyword arguments wouldn't be useful, as it would be impossible to tell if the order is just an arbitrary one. This would cloud the semantics of function calling.
Given that, any benefit gained by making this a part of calling is lost - instead, just expect an OrderedDict as an argument.
If your function's arguments are so correlated that both name and order matter, consider using a specific data structure or define a class to hold them. Chances are, you'll want them together in other places in your code, and possibly define other functions/methods that use them.
Retrieving the order of key-word arguments passed via **kwargs would be extremely useful in the particular project I am working on. It is about making a kind of n-d numpy array with meaningful dimensions (right now called dimarray), particularly useful for geophysical data handling.
I have posted a developed question with examples here:
How to retrieve the original order of key-word arguments passed to a function call?

When should I use varargs in designing a Python API?

Is there a good rule of thumb as to when you should prefer varargs function signatures in your API over passing an iterable to a function? ("varargs" being short for "variadic" or "variable-number-of-arguments"; i.e. *args)
For example, os.path.join has a vararg signature:
os.path.join(first_component, *rest) -> str
Whereas min allows either:
min(iterable[, key=func]) -> val
min(a, b, c, ...[, key=func]) -> val
Whereas any/all only permit an iterable:
any(iterable) -> bool
Consider using varargs when you expect your users to specify the list of arguments as code at the callsite or having a single value is the common case. When you expect your users to get the arguments from somewhere else, don't use varargs. When in doubt, err on the side of not using varargs.
Using your examples, the most common usecase for os.path.join is to have a path prefix and append a filename/relative path onto it, so the call usually looks like os.path.join(prefix, some_file). On the other hand, any() is usually used to process a list of data, when you know all the elements you don't use any([a,b,c]), you use a or b or c.
My rule of thumb is to use it when you might often switch between passing one and multiple parameters. Instead of having two functions (some GUI code for example):
def enable_tab(tab_name)
def enable_tabs(tabs_list)
or even worse, having just one function
def enable_tabs(tabs_list)
and using it as enable_tabs(['tab1']), I tend to use just: def enable_tabs(*tabs). Although seeing something like enable_tabs('tab1') looks kind of wrong (because of the plural), I prefer it over the alternatives.
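A short sketch of the `*tabs` variant. The body is a placeholder that only returns messages, since the real GUI code is not shown.

```python
def enable_tabs(*tabs):
    """Placeholder body: report which tabs would be enabled."""
    return [f"enabled {tab}" for tab in tabs]


one = enable_tabs("tab1")
several = enable_tabs("tab1", "tab2")

# A list obtained elsewhere can still be passed by unpacking it.
tab_names = ["tab3", "tab4"]
from_list = enable_tabs(*tab_names)
```

The same function covers the single-tab and multi-tab cases, at the cost of an explicit `*` when the caller already holds a list.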
You should use it when your parameter list is variable.
Yeah, I know the answer is kinda daft, but it's true. Maybe your question was a bit diffuse. :-)
Default arguments, as in min() above, are more useful when you either want different behaviours (like min() above) or simply don't want to force the caller to send in all parameters.
The *arg is for when you have a variable list of arguments of the same type. Joining is a typical example. You can replace it with an argument that takes a list as well.
**kw is for when you have many arguments of different types, where each argument also is connected to a name. A typical example is when you want a generic function for handling form submission or similar.
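The contrast between the two can be sketched with two hypothetical helpers: `join_parts` takes any number of values of the same kind, while `handle_form` takes named fields of mixed types. Neither is a real library function.

```python
def join_parts(*parts):
    """Variable number of same-typed arguments: a natural fit for *args."""
    return "/".join(parts)


def handle_form(**fields):
    """Named arguments of different types: a natural fit for **kw."""
    return {name: str(value).strip() for name, value in fields.items()}


path = join_parts("usr", "local", "bin")
form = handle_form(name=" Ada ", age=36)
```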
They are completely different interfaces.
In one case, you have one parameter, in the other you have many.
>>> any(1, 2, 3)
TypeError: any() takes exactly one argument (3 given)
>>> os.path.join("1", "2", "3")
'1\\2\\3'
It really depends on what you want to emphasize: any works over a list (well, sort of), while os.path.join works over a set of strings.
Therefore, in the first case you request a list; in the second, you request directly the strings.
In other terms, the expressiveness of the interface should be the main guideline for choosing the way parameters should be passed.
