Proper name for Python * operator?

What is the correct name for operator *, as in function(*args)? unpack, unzip, something else?

In Ruby and Perl 6 this has been called "splat", and I think most people from
those communities will figure out what you mean if you call it that.
The Python tutorial uses the phrase "unpacking argument lists", which is
long and descriptive.
It is also referred to as iterable unpacking, or in the case of **,
dictionary unpacking.
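A minimal sketch of what the tutorial means by "unpacking argument lists". The function describe and the sample values are made up purely for illustration:

```python
# Hypothetical function, used only to show * and ** at a call site.
def describe(name, age, city="unknown"):
    return f"{name} is {age} and lives in {city}"

positional = ("Ada", 36)        # unpacked with * into name and age
keywords = {"city": "London"}   # unpacked with ** into city=...

# Equivalent to describe("Ada", 36, city="London")
result = describe(*positional, **keywords)
print(result)
```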

I call it "positional expansion", as opposed to ** which I call "keyword expansion".

The Python Tutorial simply calls it 'the *-operator'. It performs unpacking of arbitrary argument lists.

I say "star-args" and Python people seem to know what I mean.
** is trickier - I think just "qargs" since it is usually used as **kw or **kwargs

One can also call * a gather parameter (when used in function arguments definition) or a scatter operator (when used at function invocation).
As seen here: Think Python/Tuples/Variable-length argument tuples.
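To make the gather/scatter distinction concrete, here is a small sketch. The function name gather is hypothetical and not taken from Think Python:

```python
def gather(*items):
    # In a definition, * gathers the positional arguments into one tuple.
    return items

values = [1, 2, 3]

gathered = gather(1, 2, 3)    # three arguments gathered into (1, 2, 3)
scattered = gather(*values)   # the list is scattered into three arguments
as_single = gather(values)    # no star: the whole list is a single argument
```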

I believe it's most commonly called the "splat operator." Unpacking arguments is what it does.

The technical term for this is a Variadic function. So in a sense, that's the correct term without regard to programming language.
That said, different languages do have their own established names for it. As others have mentioned, it is called "splat" in Ruby, Julia, and several other languages, and is noted by that name in official documentation. In JavaScript it is called the "spread" syntax. It has many other names in many other languages, as mentioned in other answers. Whatever you call it, it's quite useful!

For a colloquial name there is "splatting".
For positional arguments (list type) you use a single *, and for keyword arguments (dictionary type) you use a double **.
Both * and ** are sometimes referred to as "splatting".
See for reference of this name being used:
https://stackoverflow.com/a/47875892/14305096

I call *args "star args" or "varargs" and **kwargs "keyword args".

Related

Was Python's splat operator ... at one point?

Today, I saw a presentation from pyData 2017 where the presenter used python's splat operator *. Imagine my surprise as I saw it as a pointer until he used the method. I thought Python's splat operator was something like an ellipsis ... No? A google search turned up nothing for me. Did they change it at some point or was it always *? If they did change it, why? Is there an implementation difference and/or speed difference if they changed it?
Edit: "unpacking argument lists" for the angry commenters.
No, Python's unpacking operator (sometimes called "splat" or "spread") never used the ... ellipsis symbol. Python has an .../Ellipsis literal value, but it is just a singleton constant, used mainly for expressing multidimensional slices in libraries like NumPy. It has no intrinsic behavior and cannot be used as a prefix operator the way * is: in Python 3, f(...) simply passes the Ellipsis object as an ordinary argument, and f(...args) is a syntax error.
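A quick way to see the difference for yourself in Python 3. The helper count_args is hypothetical and exists only for this demonstration:

```python
def count_args(*args):
    # Report how many positional arguments actually arrived.
    return len(args)

items = [1, 2, 3]

unpacked_count = count_args(*items)   # items is spread into three arguments
ellipsis_count = count_args(...)      # one argument: the Ellipsis object itself
same_object = ... is Ellipsis         # the literal and the name are the same singleton
```

So ... is an ordinary value here, and nothing gets unpacked.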
We can see that the change log for Python 2.0 (released in 2000) describes the new functionality of being able to use the * unpacking operator to call a function, but using the * asterisk character to define a variadic function (sometimes called using "rest parameters") is older than that.
A new syntax makes it more convenient to call a given function with a tuple of arguments and/or a dictionary of keyword arguments. In Python 1.5 and earlier, you’d use the apply() built-in function: apply(f, args, kw) calls the function f() with the argument tuple args and the keyword arguments in the dictionary kw. apply() is the same in 2.0, but thanks to a patch from Greg Ewing, f(*args, **kw) is a shorter and clearer way to achieve the same effect. This syntax is symmetrical with the syntax for defining functions.
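Since apply() no longer exists in Python 3, the equivalence described in that changelog entry can only be sketched with a stand-in. The name apply_compat below is hypothetical, not a real built-in:

```python
def apply_compat(func, args=(), kwargs=None):
    # Hypothetical stand-in for the old apply(f, args, kw) built-in,
    # written with the call syntax that replaced it.
    return func(*args, **(kwargs or {}))

quotient_and_remainder = apply_compat(divmod, (7, 2))   # same as divmod(7, 2)
parsed = apply_compat(int, ("100",), {"base": 2})       # same as int("100", base=2)
```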
The source code for Python 1.0.1 (released in 1994) is still available from the Python website, and we can look at some of their examples to confirm that the use of the * asterisk character for variadic function definitions existed even then. From Demo/sockets/gopher.py:
# Browser main command, has default arguments
def browser(*args):
    selector = DEF_SELECTOR
    host = DEF_HOST
    port = DEF_PORT
    n = len(args)
    if n > 0 and args[0]:
        selector = args[0]

TypeError: split() takes no keyword arguments in Python 2.x

I am trying to separate a section of a document into its different components which are separated by ampersands. This is what I have:
name,function,range,w,h,k,frac,constraint = str.split(str="&", num=8)
Error:
TypeError: split() takes no keyword arguments
Can someone explain the error to me and also provide an alternate method for me to make this work?
The parameters of str.split are called sep and maxsplit:
str.split(sep="&", maxsplit=8)
But you can only use the parameter names like this in Python 3.x. In Python 2.x, you need to do:
str.split("&", 8)
which in my opinion is the best for both versions since using the names is really just redundant. str.split is a very well known tool in Python, so I doubt any Python programmers will have trouble understanding what the arguments to the method mean.
Also, you should avoid giving your own variables the same name as a built-in. Doing so shadows the built-in and makes it inaccessible by that name in the current scope. So, I'd pick a different name for your string besides str.
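Putting both suggestions together, a sketch might look like this. The sample string and the names record and value_range are invented here so that str and range are not shadowed:

```python
# Hypothetical sample data: eight fields separated by ampersands.
record = "fit&gauss&0-10&1.5&2&3&0.25&none"

# Positional arguments work on both Python 2.x and 3.x.
name, function, value_range, w, h, k, frac, constraint = record.split("&", 8)

# Python 3.x only: the same call spelled with keyword arguments.
same_fields = record.split(sep="&", maxsplit=8)
```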
The error states that you can't provide named arguments to split. You have to call split with just the arguments - without the names of the arguments:
name,function,range,w,h,k,frac,constraint = str.split("&", 8)
split doesn't accept the keyword arguments str or num. Do this instead:
name,function,range,w,h,k,frac,constraint = str.split('&', 8)

Python: why str.join(iterable) instead of str.join(*strings)

I'm constantly wrapping my str.join() arguments in a list, e.g.
'.'.join([str_one, str_two])
The extra list wrapper always seems superfluous to me. I'd like to do...
'.'.join(str_one, str_two, str_three, ...)
... or if I have a list ...
'.'.join(*list_of_strings)
Yes I'm a minimalist, yes I'm picky, but mostly I'm just curious about the history here, or whether I'm missing something. Maybe there was a time before splats?
Edit:
I'd just like to note that max() handles both versions:
max(iterable[, key])
max(arg1, arg2, *args[, key])
For short lists this won't matter and it costs you exactly 2 characters to type. But the most common use case (I think) for str.join() is the following:
''.join(process(x) for x in some_input)
# or
result = []
for x in some_input:
    result.append(process(x))
''.join(result)
where some_input can have thousands of entries and you just want to generate the output string efficiently.
If join accepted variable arguments instead of an iterable, this would have to be spelled as:
''.join(*(process(x) for x in some_input))
# or
''.join(*result)
which would create a (possibly long) tuple, just to pass it as *args.
So that's 2 characters in the short case vs. being wasteful in the large-data case.
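For reference, here is a runnable version of the common case. The process function and the input are placeholders I made up:

```python
def process(x):
    # Hypothetical per-item transformation.
    return str(x) * 2

some_input = range(5)

# join consumes the generator directly; no *args tuple has to be built first.
joined = "".join(process(x) for x in some_input)
print(joined)
```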
History note
(Second Edit: based on HISTORY file which contains missing release from all releases. Thanks Don.)
The *args syntax in function definitions was added to Python a long time ago:
==> Release 0.9.8 (9 Jan 1993) <==
Case (a) was needed to accommodate variable-length argument lists;
there is now an explicit "varargs" feature (precede the last argument
with a '*'). Case (b) was needed for compatibility with old class
definitions: up to release 0.9.4 a method with more than one argument
had to be declared as "def meth(self, (arg1, arg2, ...)): ...".
The proper way to pass a list to such functions was the built-in function apply(callable, sequence). (Note that this doesn't mention **kwargs, which first appears in the docs for version 1.4.)
The ability to call a function with * syntax is first mentioned in release notes for 1.6:
There's now special syntax that you can use instead of the apply()
function. f(*args, **kwds) is equivalent to apply(f, args, kwds). You
can also use variations f(a1, a2, *args, **kwds) and you can leave one
or the other out: f(*args), f(**kwds).
But it's missing from grammar docs until version 2.2.
Before 2.0, str.join() did not even exist and you had to do from string import join.
You'd have to write your own function to do that.
>>> def my_join(separator, *args):
...     return separator.join(args)
...
>>> my_join('.', '1', '2', '3')
'1.2.3'
Note that this doesn't avoid the creation of an extra object, it just hides that an extra object is being created. If you inspect the type of args, you'll see that it's a tuple.
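A small check of that claim, using a throwaway helper (received_type is a hypothetical name):

```python
def received_type(*args):
    # Whatever the caller unpacks, the function body sees a tuple.
    return type(args)

from_list = received_type(*["a", "b"])
from_generator = received_type(*(c for c in "ab"))
```

Both calls report tuple, so the star only moves the extra object creation to the call boundary.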
If you don't want to create a function and you have a fixed list of strings then it would be possible to use format instead of join:
'{}.{}.{}.{}'.format(str_one, str_two, str_three, str_four)
It's better to just stick with '.'.join((a, b, c)).
Argh, now this is a hard question! Try arguing which style is more minimalist... Hard to give a good answer without being too subjective, since it's all about convention.
The problem is: We have a function that accepts an ordered collection; should it accept it as a single argument or as a variable-length argument list?
Python usually answers: Single argument; VLAL if you really have a reason to. Let's see how Python libs reflect this:
The standard library has a couple examples for VLAL, most notably:
when the function can be called with an arbitrary number of separate sequences - like zip or map or itertools.chain,
when there's one sequence to pass, but you don't really expect the caller to have the whole of it as a single variable. This seems to fit str.format.
And the common case for using a single argument:
When you want to do some generic data processing on a single sequence. This fits the functional trio (map*, reduce, filter), and specialized offshoots thereof, like sum or str.join. Also stateful transforms like enumerate.
The pattern is "consume an iterable, give another iterable" or "consume an iterable, give a result".
Hope this answers your question.
Note: map is technically var-arg, but the common use case is just map(func, sequence) -> sequence which falls into one bucket with reduce and filter.
*The obscure case, map(func, *sequences), is conceptually like map(func, izip_longest(*sequences)), and the reason for zips to follow the var-arg convention was explained before.
I hope you follow my thinking here; after all it's all a matter of programming style, I'm just pointing at some patterns in Python's library functions.
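The two buckets can be seen side by side in a short Python 3 sketch. The sample lists are made up:

```python
from itertools import chain

letters = ["a", "b", "c"]
numbers = [1, 2, 3]

# Variable-length argument lists: several separate sequences at once.
pairs = list(zip(letters, numbers))
flattened = list(chain(letters, numbers))

# Single argument: one iterable in, a result or another iterable out.
total = sum(numbers)
joined = "-".join(letters)
indexed = list(enumerate(letters))
```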

Please explain why these two builtin functions behave different when passed in keyword arguments

Consider these different behaviours:
>>> def minus(a, b):
...     return a - b
...
>>> minus(**dict(b=2, a=1))
-1
>>> int(**dict(base=2, x='100'))
4
>>> import operator
>>> operator.sub.__doc__
'sub(a, b) -- Same as a - b.'
>>> operator.sub(**dict(b=2, a=1))
TypeError: sub() takes no keyword arguments
Why does operator.sub behave differently from int(x, [base])?
It is an implementation detail. The Python C API for retrieving arguments distinguishes between positional and keyword arguments. Positional arguments do not even have a name internally.
The code used to retrieve the arguments of the operator.add functions (and similar ones like sub) is this:
PyArg_UnpackTuple(a,#OP,2,2,&a1,&a2)
As you can see, it does not contain any argument name. The whole code related to operator.add is:
#define spam2(OP,AOP) static PyObject *OP(PyObject *s, PyObject *a) { \
    PyObject *a1, *a2; \
    if(! PyArg_UnpackTuple(a,#OP,2,2,&a1,&a2)) return NULL; \
    return AOP(a1,a2); }
spam2(op_add , PyNumber_Add)
#define spam2(OP,ALTOP,DOC) {#OP, op_##OP, METH_VARARGS, PyDoc_STR(DOC)}, \
                            {#ALTOP, op_##OP, METH_VARARGS, PyDoc_STR(DOC)},
spam2(add,__add__, "add(a, b) -- Same as a + b.")
As you can see, the only place where a and b are used is in the docstring. The method definition also does not use the METH_KEYWORDS flag which would be necessary for the method to accept keyword arguments.
Generally speaking, you can safely assume that a function written in Python, where you know an argument name, will always accept keyword arguments (of course someone could do nasty stuff with *args unpacking while writing a docstring in which the arguments look normal), while C functions may or may not accept keyword arguments. Chances are good that functions with more than a few arguments, or with optional arguments, accept keyword arguments for the later/optional ones. But you pretty much have to test it.
You can find a discussion about supporting keyword arguments everywhere on the python-ideas mailing list. There is also a statement from Guido van Rossum (the Benevolent Dictator For Life aka the creator of Python) on it:
Hm. I think for many (most?) 1-arg and selected 2-arg functions (and
rarely 3+-arg functions) this would reduce readability, as the example
of ord(char=x) showed.
I would actually like to see a syntactic feature to state that an
argument cannot be given as a keyword argument (just as we already
added syntax to state that it must be a keyword).
One area where I think adding keyword args is outright wrong: Methods
of built-in types or ABCs and that are overridable. E.g. consider the
pop() method on dict. Since the argument name is currently
undocumented, if someone subclasses dict and overrides this method, or
if they create another mutable mapping class that tries to emulate
dict using duck typing, it doesn't matter what the argument name is --
all the callers (expecting a dict, a dict subclass, or a dict-like
duck) will be using positional arguments in the call. But if we were
to document the argument names for pop(), and users started to use
these, then most dict subclasses and ducks would suddenly be broken
(except if by luck they happened to pick the same name).
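For what it's worth, the syntactic feature wished for in that quote did later arrive: Python 3.8 added positional-only parameters, marked with a / in the signature. A minimal sketch, assuming Python 3.8 or newer; pop_like is a made-up function, not the real dict.pop:

```python
def pop_like(key, default=None, /):
    # Everything before the / can only be passed positionally,
    # so callers cannot come to depend on the parameter names.
    return (key, default)

positional_ok = pop_like("a", 0)

try:
    pop_like(key="a")
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
```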
operator is a C module, which defines functions differently. Unless the function declaration in the module initialization includes METH_KEYWORDS, the function will not accept keyword arguments under any conditions and you get the error given in the question.
minus(**dict(b=2, a=1)) expands to minus(b=2, a=1). This works because your definition has argument names a and b.
operator.sub(**dict(b=2, a=1)) expands to operator.sub(b=2, a=1). This doesn't work because sub does not accept keyword arguments.
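Since "you pretty much have to test it", here is one way to do that test. This is a sketch that assumes CPython, where operator.sub comes from the C implementation; other interpreters may behave differently:

```python
import operator

def minus(a, b):
    return a - b

kwargs = dict(b=2, a=1)

# A plain Python function accepts its parameters by keyword.
python_result = minus(**kwargs)

# The C-implemented function rejects them (on CPython).
try:
    operator.sub(**kwargs)
    c_accepts_keywords = True
except TypeError:
    c_accepts_keywords = False

# Positional unpacking works for both.
c_positional_result = operator.sub(*(1, 2))
```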

When should I use varargs in designing a Python API?

Is there a good rule of thumb as to when you should prefer varargs function signatures in your API over passing an iterable to a function? ("varargs" being short for "variadic" or "variable-number-of-arguments"; i.e. *args)
For example, os.path.join has a vararg signature:
os.path.join(first_component, *rest) -> str
Whereas min allows either:
min(iterable[, key=func]) -> val
min(a, b, c, ...[, key=func]) -> val
Whereas any/all only permit an iterable:
any(iterable) -> bool
Consider using varargs when you expect your users to specify the list of arguments as code at the callsite or having a single value is the common case. When you expect your users to get the arguments from somewhere else, don't use varargs. When in doubt, err on the side of not using varargs.
Using your examples, the most common use case for os.path.join is to have a path prefix and append a filename or relative path onto it, so the call usually looks like os.path.join(prefix, some_file). On the other hand, any() is usually used to process a list of data; when you know all the elements up front, you don't write any([a,b,c]), you write a or b or c.
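A sketch of those two typical call shapes. The paths and readings are invented, and posixpath (the POSIX flavour of os.path) is used only so the result is the same on every platform:

```python
import posixpath

# Arguments written out at the call site: varargs reads naturally.
prefix = "/var/log"
some_file = "app.log"
full_path = posixpath.join(prefix, some_file)

# Arguments that come from somewhere else: pass the iterable as-is.
readings = [3, 8, 1]
has_large = any(r > 5 for r in readings)
```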
My rule of thumb is to use it when you might often switch between passing one and multiple parameters. Instead of having two functions (some GUI code for example):
def enable_tab(tab_name)
def enable_tabs(tabs_list)
or even worse, having just one function
def enable_tabs(tabs_list)
and using it as enable_tabs(['tab1']), I tend to use just: def enable_tabs(*tabs). Although seeing something like enable_tabs('tab1') looks kind of wrong (because of the plural), I prefer it over the alternatives.
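Roughly what that looks like in practice. The body is a placeholder, since the real GUI code isn't shown; here it just returns the tab names it would enable:

```python
def enable_tabs(*tabs):
    # Hypothetical stand-in for the GUI call: collect the tab names
    # that would be enabled.
    return list(tabs)

one = enable_tabs("tab1")              # single tab, no list wrapper needed
several = enable_tabs("tab1", "tab2")  # multiple tabs, same function
from_list = enable_tabs(*["a", "b"])   # an existing list still works with *
```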
You should use it when your parameter list is variable.
Yeah, I know the answer is kinda daft, but it's true. Maybe your question was a bit diffuse. :-)
Default arguments, as in min() above, are more useful when you either want different behaviours or simply don't want to force the caller to send in all parameters.
The *arg is for when you have a variable list of arguments of the same type. Joining is a typical example. You can replace it with an argument that takes a list as well.
**kw is for when you have many arguments of different types, where each argument also is connected to a name. A typical example is when you want a generic function for handling form submission or similar.
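As an illustration of the **kw case, here is a toy form handler. The function handle_form and its field names are hypothetical:

```python
def handle_form(**fields):
    # Each submitted value arrives attached to its field name.
    # Here every value is normalised to a stripped string.
    return {name: str(value).strip() for name, value in fields.items()}

cleaned = handle_form(username=" ada ", age=36)

# An existing dict of submitted data can be passed with **.
submitted = {"email": "ada@example.com "}
cleaned_from_dict = handle_form(**submitted)
```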
They are completely different interfaces.
In one case, you have one parameter, in the other you have many.
any(1, 2, 3)
TypeError: any() takes exactly one argument (3 given)
os.path.join("1", "2", "3")
'1\\2\\3'
It really depends on what you want to emphasize: any works over a list (well, sort of), while os.path.join works over a set of strings.
Therefore, in the first case you request a list; in the second, you request directly the strings.
In other terms, the expressiveness of the interface should be the main guideline for choosing the way parameters should be passed.
