Input arguments in Python functions - python

I'm new to Python and I'm a little bit frustrated.
Till today, I thought that passing parameter names in a function call was not mandatory. For example, if you have the following function:
def computeRectangleArea(width=7, height=8):
    return width * height
I thought that you could call it like this, computeRectangleArea(width=7, height=8), only to make the meaning of the parameters clearer, but that the keywords were not actually needed, so you could also call the same function this way: computeRectangleArea(7, 8)
Today, while using openpyxl.styles.PatternFill(), I realized that the fill_type keyword is necessary when calling this function.
Suppose that you call the function this way: openpyxl.styles.PatternFill('FFFFFF','FFFFFF','solid'). Then the input parameters will be interpreted wrongly.
I have some experience with OOP languages (Java, C#) and this doesn't exist there.
It seems inconsistent to me that some parameter names (like start_color and end_color in the example above) are optional, while others (like fill_type) must be specified before their values.
Can someone explain the reason for this apparently strange policy? In addition, I would be glad if someone could point me to a useful resource for understanding how it is implemented.

Positional and keyword parameters work just as they do in the languages you know better. You need to go to the documentation of the method you're using and look at the signature. For creating a PatternFill object, go to the class's __init__ method.
class PatternFill(Fill):
    def __init__(self, patternType=None, fgColor=Color(), bgColor=Color(),
                 fill_type=None, start_color=None, end_color=None):
You may specify arguments without the keyword as long as you supply them all in order, without skipping any. For instance, your failing call can be legally given as:
PatternFill(None, 'FFFFFF', 'FFFFFF', 'solid')
These will match the first four parameters. Any time you supply an argument out of order, you must supply the keyword for that argument and all later arguments in that invocation. For instance, with the above call, if you want to let the style default to None, then you must supply the keywords for the three arguments you do supply. If you simply omit the None, then the interpreter still tries to match them up sequentially from the front:
patternType <= 'FFFFFF'
fgColor <= 'FFFFFF'
bgColor <= 'solid'
... and your call fails, because 'solid' is not a valid color.
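The left-to-right matching described above can be checked with a small stand-in function. This is only a sketch: pattern_fill is a hypothetical function that copies the parameter names and order from the signature quoted above, swaps the Color() defaults for None, and does none of openpyxl's validation.

```python
# Hypothetical stand-in: same parameter names and order as the signature
# quoted above, but the Color() defaults are replaced by None and nothing
# is validated.
def pattern_fill(patternType=None, fgColor=None, bgColor=None,
                 fill_type=None, start_color=None, end_color=None):
    # Report which parameter each argument was bound to.
    return {
        "patternType": patternType,
        "fgColor": fgColor,
        "bgColor": bgColor,
        "fill_type": fill_type,
    }

# Positional arguments fill the parameters from the front, in order.
positional = pattern_fill('FFFFFF', 'FFFFFF', 'solid')
print(positional)
# {'patternType': 'FFFFFF', 'fgColor': 'FFFFFF', 'bgColor': 'solid', 'fill_type': None}

# Keywords let you skip patternType and still reach fill_type.
by_keyword = pattern_fill(fgColor='FFFFFF', bgColor='FFFFFF', fill_type='solid')
print(by_keyword)
# {'patternType': None, 'fgColor': 'FFFFFF', 'bgColor': 'FFFFFF', 'fill_type': 'solid'}
```

In the real class the first call goes on to fail because of what ends up in the color slot; the stand-in only shows where each value lands.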
Does that clear things up a little?

Can someone explain me why we need this "headache"…
For your specific example, it doesn't appear that there are any keyword-only parameters. Rather, you're trying to pass arguments for the first, second, and fourth parameters, without having to pass an argument for the one in between that you don't care about.
In other words, it's not a headache at all. It's a convenience (and sanity check) you could quite easily ignore—but probably don't want to.
Instead of this:
PatternFill('FFFFFF', 'FFFFFF', fill_type='solid')
… you could write this:
PatternFill('FFFFFF', 'FFFFFF', Color(), 'solid')
… but in order to know that's what you'd need to send, you need to read the source or docs to see the whole parameter list, and see what the default values are for the parameters you want to skip over, and explicitly add them to your call.
I doubt anyone would find that better.
Also, as multiple people pointed out in comments, this is pretty much exactly how named arguments work in C#.
And this class is, accidentally, a great example of why Python actually does allow keyword-only parameters, even though they aren't being used here.
The fact that you can write PatternFill('FFFFFF', 'FFFFFF', 'solid') and not get a TypeError for bad arguments to PatternFill, but instead a mysterious error about 'solid' not working as a color, is hardly a good thing. And (at least without type hinting annotations, which this type doesn't have) there's no way your IDE or any other tool could catch that mistake.
And, in fact, by not using keywords, you've even gotten the initial arguments wrong, without realizing it. You almost certainly wanted to do this:
PatternFill(None, 'FFFFFF', 'FFFFFF')
… but you got away with this without a visible error:
PatternFill('FFFFFF', 'FFFFFF')
… which means you're passing your foreground color as a pattern type and your background color as a foreground color and leaving the default background color.
That could be solved by making all or most parameters keyword-only. But without keyword-only params, the only option would be **kwargs, and that tradeoff is usually not worth it.
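As a rough illustration of what keyword-only parameters buy you, here is a sketch with a hypothetical make_fill function (the name and its parameters are invented for this example and are not part of openpyxl). A bare * in the parameter list makes everything after it keyword-only, so the mistaken positional call is rejected immediately with a TypeError instead of silently binding values to the wrong slots.

```python
# Hypothetical function, not part of openpyxl: the bare * makes every
# parameter after it keyword-only.
def make_fill(*, pattern_type=None, fg_color=None, bg_color=None):
    return {"pattern_type": pattern_type, "fg_color": fg_color, "bg_color": bg_color}

try:
    make_fill('FFFFFF', 'FFFFFF', 'solid')  # positional call is rejected
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# With keywords, the order no longer matters and nothing can be misplaced.
fill = make_fill(fg_color='FFFFFF', bg_color='FFFFFF', pattern_type='solid')
print(fill)
# {'pattern_type': 'solid', 'fg_color': 'FFFFFF', 'bg_color': 'FFFFFF'}
```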
Quoting from the Rationale of PEP 3102, the proposal that added keyword-only parameters to the language:
There are often cases where it is desirable for a function to take a variable number of arguments. The Python language supports this using the 'varargs' syntax (*name), which specifies that any 'left over' arguments be passed into the varargs parameter as a tuple.
One limitation on this is that currently, all of the regular argument slots must be filled before the vararg slot can be.
This is not always desirable. One can easily envision a function which takes a variable number of arguments, but also takes one or more 'options' in the form of keyword arguments. Currently, the only way to do this is to define both a varargs argument, and a 'keywords' argument (**kwargs), and then manually extract the desired keywords from the dictionary.
If it isn't obvious why using *args and **kwargs isn't good enough:
The actual signature of the function is not visible when looking at the function definition in the source, or the inline help, or auto-generated docs.
The signature is also not available to dynamic reflective code using the inspect module or similar.
The signature is also not available to static reflective code—like that used by many IDEs to do completion and suggestions.
The implementation of the function is less clear, because at best it's half boilerplate for extracting and testing the parameters, and at worst the args and kwargs access are scattered throughout the body of the function.
For an example of what this feature allows, consider the builtin print function, which you can call like this:
print(x, y, z, sep=', ')
This works because print is defined like this:
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False):
If it weren't for keyword arguments, there'd be no way to pass that sep as something different from the actual values to print.
You could force the user to pass all of the objects in a tuple instead of as separate arguments, but that would be a lot less friendly—and even if you did that, there'd be no way to pass flush without passing values for all of sep, end, and file.
And, even with keyword arguments, if it weren't for keyword-only parameters, the function signature would have to look like this:
print(*objects, **kwargs):
… which would make it a lot harder to figure out what keyword arguments you could pass.
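To make the difference concrete, here is a small sketch comparing the two styles. Both functions are hypothetical simplifications of print that return the joined string instead of writing it; only the sep option is modelled.

```python
import inspect

# Keyword-only parameter after *objects: the option is part of the signature.
def join_explicit(*objects, sep=" "):
    return sep.join(str(o) for o in objects)

# The **kwargs alternative: the option has to be dug out by hand.
def join_kwargs(*objects, **kwargs):
    sep = kwargs.pop("sep", " ")
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return sep.join(str(o) for o in objects)

print(join_explicit(1, 2, 3, sep=", "))  # 1, 2, 3
print(join_kwargs(1, 2, 3, sep=", "))    # 1, 2, 3

# Only the first signature tells the reader (and tools) that sep exists.
print(inspect.signature(join_explicit))  # (*objects, sep=' ')
print(inspect.signature(join_kwargs))    # (*objects, **kwargs)
```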

Related

Is there a way for a caller of multiple functions to forward a function ref to selected functions in a purely functional way?

Problem
I have a function make_pipeline that accepts an arbitrary number of functions, which it then calls to perform sequential data transformation. The resulting call chain performs transformations on a pandas.DataFrame. Some, but not all, of the functions that it may call need to operate on a sub-array of the DataFrame. I have written multiple selector functions. However, at present each member function of the chain has to be explicitly given the user-selected selector/filter function. This is VERY error-prone, and accessibility is very important because the end code is aimed at non-specialists (possibly with no Python/programming knowledge), so it must be "batteries-included". This entire project is written in a functional style (that's what's always worked for me).
Sample Code
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header = [1,0]),
    transform1(arg1,arg2),
    transform2(arg1,arg2, data_filter = filter_func),  # This function needs access to user-defined filter function
    transform3(arg1,arg2, data_filter = filter_func),  # This function needs access to user-defined filter function
    transform4(arg1,arg2),
)
Expected API
filter_func = simple_filter()
# The API looks like this
make_pipeline(
    load_data("somepath", header = [1,0]),
    transform1(arg1,arg2),
    transform2(arg1,arg2),
    transform3(arg1,arg2),
    transform4(arg1,arg2),
)
Attempted
I thought that if the data_filter alias is available in the caller's namespace, it also becomes available (something similar to a closure) to all functions it calls. This seems to happen with some toy examples but won't work in this case (UnboundError).
What's a good way to make a function defined in one place available to certain interested functions in the call chain? I'm trying to avoid global.
Notes/Clarification
I've had problems with OOP and mutable states in the past, and functional programming has worked quite well. Hence I've set a goal for myself to NOT use classes (to the extent that Python enables me to anyways). So no classes.
I should probably have clarified this initially: in the pipeline, the output of every function is a DataFrame and the input of every function (except load_data, obviously) is a DataFrame. The functions are decorated with a wrapper that calls functools.partial because we want the user to supply the args to each function but not execute it. The actual execution is done by a for loop in make_pipeline.
Each function accepts df: pandas.DataFrame plus all arguments that are specific to that function. The statement seen above, transform1(arg1,arg2,...), actually calls the decorated transform1, which returns functools.partial(transform, arg1,arg2,...), which now has a signature like transform(df:pandas.DataFrame).
load_data is just a convenience function to load the initial dataframe so that all other functions can begin operating on it. It just felt more intuitive to users to have it be part of the chain rather than a separate call.
The problem is this: I need a way for a filter function to be initialized (called) in only one place, such that every function in the call chain that needs access to the filter function gets it without it being explicitly passed as an argument to said function. If you're wondering why this is the case, it's because I feel that end users will find it unintuitive and arbitrary. Some functions need it, some don't. I'm also pretty certain that they will make all kinds of errors, like passing different filters, forgetting it sometimes, etc.
(Update) I've also tried inspect.signature() in make_pipeline to check whether each function accepts a data_filter argument and pass it on. However, this raises an incorrect function signature error for some unclear reason (likely because of the decorators/partial calls). If signature could return the non-partial function signature, this would solve the issue, but I couldn't find much info in the docs.
Turns out it was pretty easy. The solution is inspect.signature.
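import inspect
from typing import Any, Callable, Optional

def make_pipeline(*args, data_filter: Optional[Callable[..., Any]] = None):
    d = args[0]
    for arg in args[1:]:
        # Forward the filter only to steps whose signature accepts it.
        if "data_filter" in inspect.signature(arg).parameters:
            d = arg(d, data_filter=data_filter)
        else:
            d = arg(d)
    return d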
Leaving this here mostly for reference because I think this is a mini design pattern. I've also seen a function's __closure__ attribute mentioned on an unrelated subject. That may also work, but will likely be more complicated.
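For reference, here is a self-contained sketch of the same idea that runs without pandas. It is an illustration under assumptions, not the original project code: plain lists stand in for the DataFrame, and add_offset, double_selected and run_pipeline are hypothetical names. It relies on inspect.signature understanding functools.partial objects, which is what lets the check work on pre-configured steps.

```python
import functools
import inspect

# Hypothetical step that does not need the filter.
def add_offset(data, offset):
    return [x + offset for x in data]

# Hypothetical step that accepts the filter as an optional keyword.
def double_selected(data, data_filter=None):
    keep = data_filter or (lambda x: True)
    return [x * 2 if keep(x) else x for x in data]

def run_pipeline(data, *steps, data_filter=None):
    for step in steps:
        # inspect.signature also works on functools.partial objects.
        if "data_filter" in inspect.signature(step).parameters:
            data = step(data, data_filter=data_filter)
        else:
            data = step(data)
    return data

result = run_pipeline(
    [1, 2, 3, 4],
    functools.partial(add_offset, offset=10),  # pre-configured, not yet executed
    double_selected,
    data_filter=lambda x: x % 2 == 0,          # supplied once, in one place
)
print(result)  # [11, 24, 13, 28]
```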

variable argument vs list in Python as function arguments

I want to understand when I should use varargs vs a list type as the function parameter in Python 2.7.
Suppose I write a function that processes a list of URLs. I could define the function in two different ways:
Option 1:
def process_urls(urls):
    if not isinstance(urls, (list, tuple)):
        raise TypeError("urls should be a list or tuple type")
Option 2:
def process_urls(*urls):
    # urls is guaranteed to be a tuple
    pass
Option 2 guarantees urls to be a tuple, but it can take in an arbitrary number of positional arguments, which could be garbage, such as process_urls(['url1', 'url2'], "this is not a url")
From a programming standpoint, which option is preferred?
The first, but without the type checking. Type checks kill duck typing. What if the caller wants to pass in a generator, or a set, or other iterable? Don't limit them to just lists and tuples.
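A minimal sketch of that advice, assuming the function only needs to loop over its argument. The strip() call is just a placeholder for whatever the real processing would be.

```python
def process_urls(urls):
    # No isinstance check: anything that can be iterated over is accepted.
    # The strip() call is only a placeholder for real processing.
    return [url.strip() for url in urls]

from_list = process_urls([" http://example.com "])
from_tuple = process_urls(("http://example.com",))
from_generator = process_urls(line for line in ["http://example.com\n"])

print(from_list, from_tuple, from_generator)
# ['http://example.com'] ['http://example.com'] ['http://example.com']
```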
Neither is unequivocally best. Each style has benefits in different situations.
Using a single iterable argument is going to be better most of the time, especially if the caller already has the URLs packed up into a list. If they have a list and need to use the varargs style, they'd have to call process_urls(*existing_list_of_URLs), which would needlessly unpack and then repack the arguments. As John Kugelman suggests in his answer, you should probably not use explicit type checking to enforce the type of the argument; just assume it's an iterable and work from there.
Using a variable argument list might be nicer than requiring a list if your function is mostly going to be called with separate URLs. For instance, maybe the URLs are hard coded like this: process_urls("http://example.com", "https://stackoverflow.com"). Or maybe they're in separate variables, but the specific variables to be used are directly coded in: process_urls(primary_url, backup_url).
A final option: Support both approaches! You can specify that your function accepts one or more arguments. If it gets only one, it expects an iterable containing URLs. If it gets more than one argument, it expects each to be a separate URL. Here's what that might look like:
def process_urls(*args):
    if len(args) == 1:
        args = args[0]
    # do stuff with args, which is an iterable of URLs
There's one downside to this: a single URL string passed by itself will be incorrectly identified as a sequence of URLs, each consisting of a single character from the original string. That's such an awkward failure case that you might want to explicitly check for it. You could choose to raise an exception, or just accept a single string as an argument as if it were in a container.
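One possible way to handle the single-string case is sketched below. It takes the second route (treating a lone string as one URL) and is written for Python 3; under Python 2.7 the check would use basestring instead of str.

```python
def process_urls(*args):
    # One non-string argument is treated as an iterable of URLs.
    if len(args) == 1 and not isinstance(args[0], str):
        urls = list(args[0])
    else:
        # Separate arguments, including a lone URL string, are used as given.
        urls = list(args)
    return urls

print(process_urls(["http://a", "http://b"]))  # ['http://a', 'http://b']
print(process_urls("http://a", "http://b"))    # ['http://a', 'http://b']
print(process_urls("http://a"))                # ['http://a']
```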

What is the pythonic way to pass arguments to functions?

I'm working on a project where arguments are passed by keyword almost everywhere. There are functions with positional params only, with keyword (default value) params, or with a mix of both. For example, the following function:
def complete_task(activity_task, message=None, data=None):
    pass
This function in the current code would be called like this:
complete_task(activity_task=activity_task, message="My message", data=task_data)
For me there is no point to name arguments whose name is obvious by the context of the function execution / by the variable names. I would call it like this:
complete_task(activity_task, "My message", task_data)
In certain cases where it's not clear from the context, or from the variable names, what a call argument is, I might do:
complete_task(activity_task, message="success", data=json_dump)
So this got me wondering if there is a convention or "pythonic" way to call functions with positional/keyword params, when there is no need to rearrange method arguments or use default values for some of the keyword params.
The usual rules of thumb I follow are:
Booleans, particularly boolean literals, should always be passed by keyword unless it is really obvious what they mean. This is important enough that I will often make booleans keyword-only when writing my own functions. If you have a boolean parameter, your function may want to be split into two smaller functions, particularly if it takes the overall structure of if boolean_parameter: do_something(); else: do_something_entirely_different().
If a function takes a lot of optional parameters (more than ~3 including required parameters), then the optionals should usually be passed by keyword. But if you have a lot of parameters, your function may want to be refactored into multiple smaller functions.
If a function takes multiple parameters of the same type, they probably want to be passed as keyword arguments unless order is completely obvious from context (e.g. src comes before dest).
Most of the time, keyword arguments are not wrong. If you have a case where positional arguments are confusing, you should use keyword arguments without a second thought. With the possible exception of simple one parameter functions, keyword arguments will not make your code any harder to read.
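As a small illustration of the first rule of thumb, here is a sketch of a function with a keyword-only boolean. The function remove_items and its ignore_case flag are invented for this example.

```python
# Hypothetical function: the bare * forces ignore_case to be passed by keyword.
def remove_items(items, target, *, ignore_case=False):
    if ignore_case:
        return [i for i in items if i.lower() != target.lower()]
    return [i for i in items if i != target]

# The call site documents itself.
print(remove_items(["A", "b"], "a", ignore_case=True))  # ['b']

# A bare True, whose meaning a reader would have to look up, is rejected.
try:
    remove_items(["A", "b"], "a", True)
except TypeError as exc:
    bare_bool_error = str(exc)
    print("TypeError:", bare_bool_error)
```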
Python has 2 types of arguments[1]: positional and keyword (aka default). The waters get a little muddy because positional arguments can be called by keyword and keyword arguments can be called by position...
def foo(a, b=1):
    print(a, b)

foo(1, 2)
foo(a=1, b=2)
With that said, I think that the names of the types of arguments should indicate how you should (typically) use them. Most of the time, I see positional arguments called by position and keyword arguments called by keyword. So, if you're looking for a general rule of thumb, I'd advise that you make the function call mimic the signature. In the case of our above foo function, I'd call it like this:
foo(1, b=2)
I think that one reason to follow this advice is because (most of the time), people expect keyword arguments to be passed via keyword. So it isn't uncommon for someone to later add a keyword:
def foo(a, aa='1', b=2):
    print(a, aa, b)
If you were calling the function using only positional arguments, you'd now be passing a value to a different parameter than you were before. However, keyword arguments don't care what order you pass them, so you should still be all set.
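The effect is easy to see side by side. In this sketch the two versions of foo are given separate names (foo_v1 and foo_v2, chosen here only so both can exist at once) and return their arguments instead of printing them.

```python
# Original signature.
def foo_v1(a, b=1):
    return (a, b)

# Later revision: a new defaulted parameter is inserted before b.
def foo_v2(a, aa="1", b=2):
    return (a, aa, b)

# A purely positional call silently changes meaning after the revision.
print(foo_v1(1, 5))    # (1, 5)       5 is bound to b
print(foo_v2(1, 5))    # (1, 5, 2)    5 is now bound to aa

# A call that mimics the signature keeps working as intended.
print(foo_v1(1, b=5))  # (1, 5)
print(foo_v2(1, b=5))  # (1, '1', 5)
```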
So far, so good. But what rules should you use when you're creating a function? How do you know whether to make an argument a default argument or a positional argument? That's a reasonable question -- And it's hard to find a good rule of thumb. The rules of thumb I use are as follows:
Be consistent with the rest of the project -- It's hard to get it right if you're doing something different than the rest of the surrounding code.
Make an argument a default argument if (and only if) it is possible to supply a reasonable default. If the function will fail if the user doesn't supply a particular argument (because there is no good default), then it should be positional.
[1] Python 3.x also has keyword-only arguments. Those don't give you a choice, so I don't know that they add too much to the discussion here :-) -- Though I don't know that I've seen their use out in the wild too much.

What are the conventions for ordering parameters in Python?

What are the conventions for ordering parameters in Python? For instance,
def plot_graph(G, filename, ...)
# OR
def plot_graph(filename, G, ...)
There is no discussion of this in PEP 8 (Style Guide for Python Code).
Excerpt from the answer of Conventions for order of parameters in a function,
If a language allows passing a hash/map/associative array as a single parameter, try to opt for passing that. This is especially useful for methods with >=3 parameters, ESPECIALLY when those same parameters will be passed to nested function calls.
Is it extreme to convert each parameter into a key-value pair, like def plot_graph(graph=None, filename=None, ...)?
There's really no convention for ordering function parameters, except a limitation that positional non-default parameters must go before parameters with defaults and only then keyword parameters, i.e. def func(pos_1, pos_n, pos_1_w_default='default_val', pos_n_w_default='default_val', *args, kw_1, kw_n, kw_1_w_default='default_val', kw_n_w_default='default_val', **kwargs).
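The ordering constraint can be inspected directly. The sketch below uses a shortened version of the signature above (one parameter of each sort, with invented names) and asks the inspect module how Python classifies each one.

```python
import inspect

# Shortened, hypothetical version of the signature described above.
def func(pos, pos_default="x", *args, kw, kw_default="y", **kwargs):
    pass

kinds = {name: p.kind.name for name, p in inspect.signature(func).parameters.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
# pos: POSITIONAL_OR_KEYWORD
# pos_default: POSITIONAL_OR_KEYWORD
# args: VAR_POSITIONAL
# kw: KEYWORD_ONLY
# kw_default: KEYWORD_ONLY
# kwargs: VAR_KEYWORD
```

Note that keyword-only parameters placed after *args may mix defaulted and non-defaulted entries freely; the "defaults last" rule only applies to the positional ones.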
Usually you define the parameter order logically, based on what the parameters mean for the function. For example, if you define a function that does subtraction, it's logical for the minuend to be the first parameter and the subtrahend the second. The reverse order is possible, but it's not logical.
Also, if you consider that your function might be used partially, that might affect your decision on parameter ordering.
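Here is a sketch of how partial application interacts with parameter order, using the subtraction example from above (the function and variable names are made up for illustration). functools.partial fills positional parameters from the left, so whichever parameter comes first is the easiest one to fix in advance.

```python
from functools import partial

def subtract(minuend, subtrahend):
    return minuend - subtrahend

# Positional partial application binds the first parameter.
from_hundred = partial(subtract, 100)
print(from_hundred(1))  # 99

# Fixing a later parameter is still possible, but only by keyword.
minus_one = partial(subtract, subtrahend=1)
print(minus_one(100))   # 99
```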
Most things you need to know about function parameters are in the official tutorial.
P.S. Regarding your particular example with graph function... Considering your function name, it is used for displaying a graph, so a graph must be provided as argument, otherwise there's nothing to display, so making graph=None by default doesn't make much sense.
It is not extreme to use only keyword arguments. I have seen that in many codebases. This allows you to extend functionalities (by adding new keyword arguments to your functions) without breaking your previous code. It can be slightly more tedious to use, but definitely easier to maintain and to extend.
Also have a look at PEP 3102 -- Keyword-Only Arguments, which is a way to force the use of keyword arguments in python 3.

When should I use varargs in designing a Python API?

Is there a good rule of thumb as to when you should prefer varargs function signatures in your API over passing an iterable to a function? ("varargs" being short for "variadic" or "variable-number-of-arguments"; i.e. *args)
For example, os.path.join has a vararg signature:
os.path.join(first_component, *rest) -> str
Whereas min allows either:
min(iterable[, key=func]) -> val
min(a, b, c, ...[, key=func]) -> val
Whereas any/all only permit an iterable:
any(iterable) -> bool
Consider using varargs when you expect your users to specify the list of arguments as code at the callsite or having a single value is the common case. When you expect your users to get the arguments from somewhere else, don't use varargs. When in doubt, err on the side of not using varargs.
Using your examples, the most common use case for os.path.join is to have a path prefix and append a filename/relative path onto it, so the call usually looks like os.path.join(prefix, some_file). On the other hand, any() is usually used to process a list of data. When you know all the elements, you don't use any([a,b,c]); you use a or b or c.
My rule of thumb is to use it when you might often switch between passing one and multiple parameters. Instead of having two functions (some GUI code for example):
def enable_tab(tab_name)
def enable_tabs(tabs_list)
or even worse, having just one function
def enable_tabs(tabs_list)
and using it as enable_tabs(['tab1']), I tend to use just: def enable_tabs(*tabs). Although seeing something like enable_tabs('tab1') looks kind of wrong (because of the plural), I prefer it over the alternatives.
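A minimal sketch of that single varargs function. The body is a placeholder that just returns the tab names it was given; real GUI code would enable each tab instead.

```python
def enable_tabs(*tabs):
    # Placeholder body: just report which tabs would be enabled.
    return list(tabs)

print(enable_tabs('tab1'))          # ['tab1']
print(enable_tabs('tab1', 'tab2'))  # ['tab1', 'tab2']

# Callers that already hold a list unpack it at the call site.
tab_names = ['tab3', 'tab4']
print(enable_tabs(*tab_names))      # ['tab3', 'tab4']
```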
You should use it when your parameter list is variable.
Yeah, I know the answer is kinda daft, but it's true. Maybe your question was a bit diffuse. :-)
Default arguments, as in min() above, are more useful when you either want different behaviours (like min() above) or simply don't want to force the caller to send in all parameters.
The *arg is for when you have a variable list of arguments of the same type. Joining is a typical example. You can replace it with an argument that takes a list as well.
**kw is for when you have many arguments of different types, where each argument also is connected to a name. A typical example is when you want a generic function for handling form submission or similar.
They are completely different interfaces.
In one case, you have one parameter, in the other you have many.
any(1, 2, 3)
TypeError: any() takes exactly one argument (3 given)
os.path.join("1", "2", "3")
'1\\2\\3'
It really depends on what you want to emphasize: any works over a list (well, sort of), while os.path.join works over a set of strings.
Therefore, in the first case you request a list; in the second, you request directly the strings.
In other terms, the expressiveness of the interface should be the main guideline for choosing the way parameters should be passed.
