Variable argument vs list in Python as function arguments

I want to understand when I should use varargs vs. a list-type parameter for a function in Python 2.7.
Suppose I write a function that processes a list of URLs. I could define the function in two different ways:
Option 1:
def process_urls(urls):
    if not isinstance(urls, (list, tuple)):
        raise TypeError("urls should be a list or tuple type")
Option 2:
def process_urls(*urls):
    # urls is guaranteed to be a tuple
Option 2 guarantees that urls is a tuple, but it can take an arbitrary number of positional arguments, which could be garbage, such as process_urls(['url1', 'url2'], "this is not a url")
From a programming standpoint, which option is preferred?

The first, but without the type checking. Type checks kill duck typing. What if the caller wants to pass in a generator, or a set, or other iterable? Don't limit them to just lists and tuples.
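For illustration, a duck-typed version just iterates; this is a minimal sketch where the print call stands in for whatever per-URL processing you actually do:
def process_urls(urls):
    # Accepts a list, tuple, set, generator, or any other iterable.
    for url in urls:
        print(url)  # placeholder for real per-URL processing

process_urls(['http://example.com'])                  # a list
process_urls(('http://a.com', 'http://b.com'))        # a tuple
process_urls(u for u in ['http://a.com'] if u)        # a generator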

Neither is unequivocally best. Each style has benefits in different situations.
Using a single iterable argument is going to be better most of the time, especially if the caller already has the URLs packed up into a list. If they have a list and had to use the varargs style, they'd need to call process_urls(*existing_list_of_URLs), which would needlessly unpack and then repack the arguments. As John Kugelman suggests in his answer, you should probably not use explicit type checking to enforce the type of the argument; just assume it's an iterable and work from there.
Using a variable argument list might be nicer than requiring a list if your function is mostly going to be called with separate URLs. For instance, maybe the URLs are hard-coded, like this: process_urls("http://example.com", "https://stackoverflow.com"). Or maybe they're in separate variables, but the specific variables to be used are coded directly at the call site: process_urls(primary_url, backup_url).
A final option: Support both approaches! You can specify that your function accepts one or more arguments. If it gets only one, it expects an iterable containing URLs. If it gets more than one argument, it expects each to be a separate URL. Here's what that might look like:
def process_urls(*args):
    if len(args) == 1:
        args = args[0]
    # do stuff with args, which is an iterable of URLs
There's one downside to this: a single URL string passed by itself will be incorrectly treated as a sequence of URLs, each consisting of a single character from the original string. That's an awkward failure case, so you might want to check for it explicitly. You could choose to raise an exception, or just accept a single string argument as if it had been passed in a container.
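Here's a minimal sketch of that explicit check. The isinstance test against str is an assumption for Python 3; on Python 2.7 you'd likely test against basestring instead:
def process_urls(*args):
    if len(args) == 1 and not isinstance(args[0], str):
        # A single non-string argument is taken as an iterable of URLs.
        args = args[0]
    for url in args:
        print(url)  # placeholder for real per-URL processing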

Related

Input arguments in Python functions

I'm new to the Python language and I'm a little bit frustrated.
Until today, I thought that passing parameter names in a function call was never mandatory. For example, if you have the following function:
def computeRectangleArea(width=7, height=8):
    return width * height
I thought that you could call it as computeRectangleArea(width=7, height=8) only to make the meaning of the parameters clearer, and that the keywords were not actually needed, so that you could also call the same function this way: computeRectangleArea(7, 8)
Today, while using openpyxl.styles.PatternFill(), I realized that the fill_type keyword is necessary when calling this function.
Suppose you call the function this way: openpyxl.styles.PatternFill('FFFFFF', 'FFFFFF', 'solid'); then the interpretation of the input parameters will be wrong.
I have some experience with OOP languages (Java, C#), and this kind of thing doesn't exist there.
It seems inconsistent to me that some parameter names (like start_color and end_color in the example above) are optional, while others (like fill_type) must be specified before their values.
Can someone explain this apparently strange policy to me? In addition, I'd be glad if someone could point me to some useful resources for understanding how it is implemented.
Positional and keyword parameters work just as they do in the languages you know better. You need to go to the documentation of the method you're using and look at the signature. For creating a PatternFill object, go to the class's __init__ method.
class PatternFill(Fill):
    def __init__(self, patternType=None, fgColor=Color(), bgColor=Color(),
                 fill_type=None, start_color=None, end_color=None):
You may specify arguments without the keyword as long as you supply them all in order, without skipping any. For instance, your failing call can be legally given as:
PatternFill(None, 'FFFFFF', 'FFFFFF', 'solid')
These will match the first four parameters. Any time you supply an argument out of order, you must supply the keyword for that argument and for all later arguments in that call. For instance, with the above call, if you want to let the pattern type default to None, then you must supply the keywords for the three arguments you do supply. If you simply omit the None, the parser still tries to match them up sequentially from the front:
patternType <= 'FFFFFF'
fgColor <= 'FFFFFF'
bgColor <= 'solid'
... and your call binds the wrong values, failing later rather than at the call site.
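For comparison, spelling out the keywords sidesteps the positional matching entirely. A typical keyword call (using the start_color/end_color names from the signature above) might look like:
from openpyxl.styles import PatternFill

fill = PatternFill(start_color='FFFFFF', end_color='FFFFFF',
                   fill_type='solid')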
Does that clear things up a little?
Can someone explain to me why we need this "headache"…
For your specific example, it doesn't appear that there are any keyword-only parameters. Rather, you're trying to pass arguments for the first, second, and fourth parameters, without having to pass an argument for the one in between that you don't care about.
In other words, it's not a headache at all. It's a convenience (and sanity check) you could quite easily ignore—but probably don't want to.
Instead of this:
PatternFill('FFFFFF', 'FFFFFF', fill_type='solid')
… you could write this:
PatternFill('FFFFFF', 'FFFFFF', Color(), 'solid')
… but in order to know that's what you'd need to send, you have to read the source or docs to see the whole parameter list, find the default values for the parameters you want to skip over, and explicitly add them to your call.
I doubt anyone would find that better.
Also, as multiple people pointed out in comments, this is pretty much exactly how named arguments work in C#.
And this class is, incidentally, a great example of why Python actually does allow keyword-only parameters, even though they aren't being used here.
The fact that you can write PatternFill('FFFFFF', 'FFFFFF', 'solid') and not get a TypeError for bad arguments to PatternFill, but instead a mysterious error about 'solid' not working as a color, is hardly a good thing. And (at least without type hinting annotations, which this type doesn't have) there's no way your IDE or any other tool could catch that mistake.
And, in fact, by not using keywords, you've even gotten the initial arguments wrong without realizing it. You almost certainly wanted to do this:
PatternFill(None, 'FFFFFF', 'FFFFFF')
… but you got away with this without a visible error:
PatternFill('FFFFFF', 'FFFFFF')
… which means you're passing your foreground color as a pattern type and your background color as a foreground color and leaving the default background color.
That could be solved by making all or most of the parameters keyword-only. But without keyword-only parameters, the only option would be **kwargs, and that tradeoff is usually not worth it.
Quoting from the Rationale of PEP 3102, the proposal that added keyword-only parameters to the language:
There are often cases where it is desirable for a function to take a variable number of arguments. The Python language supports this using the 'varargs' syntax (*name), which specifies that any 'left over' arguments be passed into the varargs parameter as a tuple.
One limitation on this is that currently, all of the regular argument slots must be filled before the vararg slot can be.
This is not always desirable. One can easily envision a function which takes a variable number of arguments, but also takes one or more 'options' in the form of keyword arguments. Currently, the only way to do this is to define both a varargs argument, and a 'keywords' argument (**kwargs), and then manually extract the desired keywords from the dictionary.
If it isn't obvious why using *args and **kwargs isn't good enough:
The actual signature of the function is not visible when looking at the function definition in the source, or the inline help, or auto-generated docs.
The signature is also not available to dynamic reflective code using the inspect module or similar.
The signature is also not available to static reflective code—like that used by many IDEs to do completion and suggestions.
The implementation of the function is less clear, because at best it's half boilerplate for extracting and testing the parameters, and at worst the args and kwargs access are scattered throughout the body of the function.
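To make that concrete, here's a minimal sketch of the **kwargs workaround, using a hypothetical join_items function that emulates a keyword-only sep option by hand:
def join_items(*items, **kwargs):
    # Manually pull out the one supported option and reject the rest.
    sep = kwargs.pop('sep', ', ')
    if kwargs:
        raise TypeError('unexpected keyword arguments: %r' % list(kwargs))
    return sep.join(str(item) for item in items)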
For an example of what this feature allows, consider the builtin print function, which you can call like this:
print(x, y, z, sep=', ')
This works because print is defined like this:
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
If it weren't for keyword arguments, there'd be no way to pass that sep as something different from the actual values to print.
You could force the user to pass all of the objects in a tuple instead of as separate arguments, but that would be a lot less friendly—and even if you did that, there'd be no way to pass flush without passing values for all of sep, end, and file.
And, even with keyword arguments, if it weren't for keyword-only parameters, the function signature would have to look like this:
print(*objects, **kwargs)
… which would make it a lot harder to figure out what keyword arguments you could pass.
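With keyword-only parameters (Python 3 syntax), the same hypothetical join_items can declare sep right in its signature, where tools and readers can see it:
def join_items(*items, sep=', '):
    # 'sep' is keyword-only: it can never be swallowed by *items.
    return sep.join(str(item) for item in items)

print(join_items(1, 2, 3))             # 1, 2, 3
print(join_items(1, 2, 3, sep=' | '))  # 1 | 2 | 3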

Methods for passing large numbers of arguments to a Python function

I've encountered a problem in a project where it may be useful to pass a large number (in the tens, not the hundreds) of arguments to a single "write once, use many times" function in Python. The issue is, I'm not really sure what the best way is to handle a large block of arguments like that. Should I just pass them all in as a single dictionary and unpack it inside the function, or is there a more efficient/Pythonic way of achieving the same effect?
Depending on exactly what you are doing, you can pass arbitrary parameters to Python functions in one of two standard ways.
The first is to pass them positionally, as a tuple (i.e. based on their location in the function call). Python will wrap the arguments in a tuple automatically if you declare the parameter with the * prefix. This parameter is usually called args by convention.
The second is to pass them as key-value pairs, collected into a dictionary in the function definition. If you want to be able to differentiate the arguments by key, call the function with arguments of the form key=value and retrieve them from a dict parameter (declared with the ** prefix). This parameter is normally called kwargs by convention.
You can of course use both of these in combination, along with other named arguments, as desired.
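A minimal sketch showing both conventions together (the function name is illustrative):
def report(*args, **kwargs):
    # args collects the positional arguments as a tuple,
    # kwargs collects the keyword arguments as a dict.
    for value in args:
        print(value)
    for key in sorted(kwargs):
        print('%s = %r' % (key, kwargs[key]))

report(1, 2, 3, colour='red', width=10)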

Python: why str.join(iterable) instead of str.join(*strings)

I'm constantly wrapping my str.join() arguments in a list, e.g.
'.'.join([str_one, str_two])
The extra list wrapper always seems superfluous to me. I'd like to do...
'.'.join(str_one, str_two, str_three, ...)
... or if I have a list ...
'.'.join(*list_of_strings)
Yes I'm a minimalist, yes I'm picky, but mostly I'm just curious about the history here, or whether I'm missing something. Maybe there was a time before splats?
Edit:
I'd just like to note that max() handles both versions:
max(iterable[, key])
max(arg1, arg2, *args[, key])
For short lists this won't matter, and it costs you exactly two extra characters to type. But the most common use case (I think) for str.join() is the following:
''.join(process(x) for x in some_input)
# or
result = []
for x in some_input:
    result.append(process(x))
''.join(result)
where some_input can have thousands of entries and you just want to generate the output string efficiently.
If join accepted variable arguments instead of an iterable, this would have to be spelled as:
''.join(*(process(x) for x in some_input))
# or
''.join(*result)
which would create a (possibly long) tuple, just to pass it as *args.
So that's two extra characters in the short case vs. being wasteful in the large-data case.
History note
(Second edit: based on the HISTORY file, which covers the releases missing from the changelogs. Thanks, Don.)
*args in function definitions was added to Python a long time ago:
==> Release 0.9.8 (9 Jan 1993) <==
Case (a) was needed to accommodate variable-length argument lists;
there is now an explicit "varargs" feature (precede the last argument
with a '*'). Case (b) was needed for compatibility with old class
definitions: up to release 0.9.4 a method with more than one argument
had to be declared as "def meth(self, (arg1, arg2, ...)): ...".
The proper way to pass a list to such functions was the built-in function apply(callable, sequence). (Note that this doesn't mention **kwargs, which first appears in the docs for version 1.4.)
The ability to call a function with * syntax is first mentioned in release notes for 1.6:
There's now special syntax that you can use instead of the apply()
function. f(*args, **kwds) is equivalent to apply(f, args, kwds). You
can also use variations f(a1, a2, *args, **kwds) and you can leave one
or the other out: f(*args), f(**kwds).
But it's missing from the grammar docs until version 2.2.
Before 2.0, str.join() did not even exist and you had to do from string import join.
You'd have to write your own function to do that.
>>> def my_join(separator, *args):
...     return separator.join(args)
...
>>> my_join('.', '1', '2', '3')
'1.2.3'
Note that this doesn't avoid the creation of an extra object; it just hides that an extra object is being created. If you inspect the type of args, you'll see that it's a tuple.
If you don't want to create a function and you have a fixed list of strings then it would be possible to use format instead of join:
'{}.{}.{}.{}'.format(str_one, str_two, str_three, str_four)
It's better to just stick with '.'.join((a, b, c)).
Argh, now this is a hard question! Try arguing about which style is more minimalist... It's hard to give a good answer without being too subjective, since it's all about convention.
The problem is: We have a function that accepts an ordered collection; should it accept it as a single argument or as a variable-length argument list?
Python usually answers: Single argument; VLAL if you really have a reason to. Let's see how Python libs reflect this:
The standard library has a couple of examples of the VLAL style, most notably:
when the function can be called with an arbitrary number of separate sequences - like zip or map or itertools.chain,
when there's one sequence to pass, but you don't really expect the caller to have the whole of it as a single variable. This seems to fit str.format.
And the common case for using a single argument:
When you want to do some generic data processing on a single sequence. This fits the functional trio (map*, reduce, filter) and specialized descendants thereof, like sum or str.join. Also stateful transforms like enumerate.
The pattern is "consume an iterable, produce another iterable" or "consume an iterable, produce a result".
Hope this answers your question.
Note: map is technically var-arg, but the common use case is just map(func, sequence) -> sequence which falls into one bucket with reduce and filter.
*The obscure case, map(func, *sequences), is conceptually like map(func, izip_longest(*sequences)) - and the reason for zips to follow the var-arg convention was explained above.
I hope you follow my thinking here; after all, it's all a matter of programming style. I'm just pointing at some patterns in Python's library functions.

Python: Return tuple or list?

I have a method that returns either a list or a tuple. What is the most pythonic way of denoting the return type in the argument?
def names(self, section, as_type=()):
    return type(as_type)([m[0] for m in self.items(section)])
The pythonic way would be not to care about the type at all. Return a tuple, and if the calling function needs a list, then let it call list() on the result. Or vice versa, whichever makes more sense as a default type.
Even better, have it return a generator expression:
def names(self, section):
    return (m[0] for m in self.items(section))
Now the caller gets an iterable that is evaluated lazily. He can then decide to iterate over it:
for name in obj.names(section):
...
or create a list or tuple from it - he never has to change an existing list into a tuple or vice versa, so this is efficient in all cases:
mylist = list(obj.names(section))
mytuple = tuple(obj.names(section))
Return whatever the caller will want most of the time. If they will want to sort it or add and remove items, use a list. If they will want to use it as a dictionary key, use a tuple. If the primary use will be iteration, return an iterator. If it doesn't matter to the caller (which it won't, more often than you might think), return whatever makes the code most straightforward. Usually this will be a list or an iterator.
Don't provide your own way to convert the output to a given type. Python has a perfectly simple way to do this already and any programmer using your function will be familiar with it. Look at the standard Python library. Do any of those routines do this? No, because there's no reason to.
Exception: sometimes an API offers both a way to get an iterator and a way to get a list, even though converting an iterator to a list is easy. Usually this capability is provided as two separate functions or methods. You might want to follow suit sometimes, especially if you can implement the two alternatives with different algorithms that give some clear benefit to callers who want one or the other.
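For instance, a sketch of offering both variants, assuming the same self.items(section) as in the question (the Config class and method names here are hypothetical):
class Config(object):
    def iter_names(self, section):
        """Lazily yield one name at a time."""
        return (m[0] for m in self.items(section))

    def names(self, section):
        """Return all the names at once, as a list."""
        return list(self.iter_names(section))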
Keep it simple:
def names(self, section):
    """Returns a list of names."""
    return [m[0] for m in self.items(section)]
If the caller wants a tuple instead of a list, he does this:
names = tuple(obj.names(section))

When should I use varargs in designing a Python API?

Is there a good rule of thumb as to when you should prefer varargs function signatures in your API over passing an iterable to a function? ("varargs" being short for "variadic" or "variable-number-of-arguments"; i.e. *args)
For example, os.path.join has a vararg signature:
os.path.join(first_component, *rest) -> str
Whereas min allows either:
min(iterable[, key=func]) -> val
min(a, b, c, ...[, key=func]) -> val
Whereas any/all only permit an iterable:
any(iterable) -> bool
Consider using varargs when you expect your users to specify the list of arguments as code at the call site, or when having a single value is the common case. When you expect your users to get the arguments from somewhere else, don't use varargs. When in doubt, err on the side of not using varargs.
Using your examples: the most common use case for os.path.join is to have a path prefix and append a filename/relative path onto it, so the call usually looks like os.path.join(prefix, some_file). On the other hand, any() is usually used to process a collection of data; when you already know all the elements, you don't use any([a, b, c]), you use a or b or c.
My rule of thumb is to use it when you might often switch between passing one and multiple parameters. Instead of having two functions (some GUI code for example):
def enable_tab(tab_name)
def enable_tabs(tabs_list)
or even worse, having just one function
def enable_tabs(tabs_list)
and using it as enable_tabs(['tab1']), I tend to use just def enable_tabs(*tabs). Although something like enable_tabs('tab1') looks kind of wrong (because of the plural), I prefer it over the alternatives.
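A minimal sketch of that single-function style, with a print standing in for the real GUI call:
def enable_tabs(*tabs):
    for tab in tabs:
        print('enabling %s' % tab)  # placeholder for the real GUI call

enable_tabs('tab1')            # one tab
enable_tabs('tab1', 'tab2')    # several tabs
tab_list = ['tab1', 'tab3']
enable_tabs(*tab_list)         # an existing list, unpacked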
You should use it when your parameter list is variable.
Yeah, I know the answer is kinda daft, but it's true. Maybe your question was a bit diffuse. :-)
Default arguments, as in min() above, are more useful when you want to allow different behaviours, or when you simply don't want to force the caller to send in all the parameters.
*args is for when you have a variable-length list of arguments of the same type. Joining is a typical example. You can replace it with a parameter that takes a list instead.
**kwargs is for when you have many arguments of different types, where each argument is also connected to a name. A typical example is when you want a generic function for handling form submission or similar.
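A minimal sketch of that kind of generic handler (the function and field names are illustrative):
def handle_form(**fields):
    # Each submitted field arrives as a named keyword argument.
    for name in sorted(fields):
        print('%s = %r' % (name, fields[name]))

handle_form(username='alice', age=30, subscribed=True)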
They are completely different interfaces.
In one case, you have one parameter, in the other you have many.
>>> any(1, 2, 3)
TypeError: any() takes exactly one argument (3 given)
>>> os.path.join("1", "2", "3")
'1\\2\\3'
It really depends on what you want to emphasize: any works over a list (well, sort of), while os.path.join works over a set of strings.
Therefore, in the first case you request a list; in the second, you request the strings directly.
In other words, the expressiveness of the interface should be the main guideline for choosing how parameters should be passed.
