Semantics of tuple unpacking in python - python

Why does python only allow named arguments to follow a tuple unpacking expression in a function call?
>>> def f(a,b,c):
... print a, b, c
...
>>> f(*(1,2),3)
File "<stdin>", line 1
SyntaxError: only named arguments may follow *expression
Is it simply an aesthetic choice, or are there cases where allowing this would lead to some ambiguities?

i am pretty sure that the reason people "naturally" don't like this is because it makes the meaning of later arguments ambiguous, depending on the length of the interpolated series:
def dangerbaby(a, b, *c):
hug(a)
kill(b)
>>> dangerbaby('puppy', 'bug')
killed bug
>>> cuddles = ['puppy']
>>> dangerbaby(*cuddles, 'bug')
killed bug
>>> cuddles.append('kitten')
>>> dangerbaby(*cuddles, 'bug')
killed kitten
you cannot tell from just looking at the last two calls to dangerbaby which one works as expected and which one kills little kitten fluffykins.
of course, some of this uncertainty is also present when interpolating at the end. but the confusion is constrained to the interpolated sequence - it doesn't affect other arguments, like bug.
[i made a quick search to see if i could find anything official. it seems that the * prefix for varags was introduced in python 0.9.8. the previous syntax is discussed here and the rules for how it worked were rather complex. since the addition of extra arguments "had to" happen at the end when there was no * marker it seems like that simply carried over. finally there's a mention here of a long discussion on argument lists that was not by email.]

I suspect that it's for consistency with the star notation in function definitions, which is after all the model for the star notation in function calls.
In the following definition, the parameter *c will slurp all subsequent non-keyword arguments, so obviously when f is called, the only way to pass a value for d will be as a keyword argument.
def f(a, b, *c, d=1):
print "slurped", len(c)
(Such "keyword-only parameters" are only supported in Python 3. In Python 2 there is no way to assign values after a starred argument, so the above is illegal.)
So, in a function definition the starred argument must follow all ordinary positional arguments. What you observed is that the same rule has been extended to function calls. This way, the star syntax is consistent for function declarations and function calls.
Another parallelism is that you can only have one (single-)starred argument in a function call. The following is illegal, though one could easily imagine it being allowed.
f(*(1,2), *(3,4))

First of all, it is simple to provide a very similar interface yourself using a wrapper function:
def applylast(func, arglist, *literalargs):
return func(*(literalargs + arglist))
applylast(f, (1, 2), 3) # equivalent to f(3, 1, 2)
Secondly, enhancing the interpreter to support your syntax natively might add overhead to the very performance-critical activity of function application. Even if it only requires a few extra instructions in compiled code, due to the high usage of those routines, that might constitute an unacceptable performance penalty in exchange for a feature that is not called for all that often and easily accommodated in a user library.

Some observations:
Python processes positional arguments before keyword arguments (f(c=3, *(1, 2)) in your example still prints 1 2 3). This makes sense as (i) most arguments in function calls are positional and (ii) the semantics of a programming language need to be unambiguous (i.e., a choice needs to be made either way on the order in which to process positional and keyword arguments).
If we did have a positional argument to the right in a function call, it would be difficult to define what that means. If we call f(*(1, 2), 3), should that be f(1, 2, 3) or f(3, 1, 2) and why would either choice make more sense than the other?
For an official explanation, PEP 3102 provides a lot of insight on how function definitions work. The star (*) in a function definition indicates the end of position arguments (section Specification). To see why, consider: def g(a, b, *c, d). There's no way to provide a value for d other than as a keyword argument (positional arguments would be 'grabbed' by c).
It's important to realize what this means: as the star marks the end of positional arguments, that means all positional arguments must be in that position or to the left of it.

change the order:
def f(c,a,b):
print(a,b,c)
f(3,*(1,2))

If you have a Python 3 keyword-only parameter, like
def f(*a, b=1):
...
then you might expect something like f(*(1, 2), 3) to set a to (1 , 2) and b to 3, but of course, even if the syntax you want were allowed, it would not, because keyword-only parameters must be keyword-only, like f(*(1, 2), b=3). If it were allowed, I suppose it would have to set a to (1, 2, 3) and leave b as the default 1. So it's perhaps not syntactic ambiguity so much as ambiguity in what is expected, which is something Python greatly tries to avoid.

Related

Was Python's splat operator ... at one point?

Today, I saw a presentation from pyData 2017 where the presenter used python's splat operator *. Imagine my surprise as I saw it as a pointer until he used the method. I thought Python's splat operator was something like an ellipsis ... No? A google search turned up nothing for me. Did they change it at some point or was it always *? If they did change it, why? Is there an implementation difference and/or speed difference if they changed it?
Edit: "unpacking argument lists" for the angry commenters.
No, Python's unpacking operator (sometimes called "splat" or "spread") never used the ... ellipsis symbol. Python has an .../Ellipsis literal value, but it's only used as a singleton constant for expressing multidimensional ranges in libraries like NumPy. It has no intrinsic behavior and is not syntactically valid in locations where you would use the * unpacking operator.
We can see that the change log for Python 2.0 (released in 2000) describes the new functionality of being able to use the * unpacking operator to call a function, but using the * asterisk character to define a variadic function (sometimes called using "rest parameters") is older than that.
A new syntax makes it more convenient to call a given function with a tuple of arguments and/or a dictionary of keyword arguments. In Python 1.5 and earlier, you’d use the apply() built-in function: apply(f, args, kw) calls the function f() with the argument tuple args and the keyword arguments in the dictionary kw. apply() is the same in 2.0, but thanks to a patch from Greg Ewing, f(*args, **kw)is a shorter and clearer way to achieve the same effect. This syntax is symmetrical with the syntax for defining functions.
The source code for Python 1.0.1 (released in 1994) is still available from the Python website, and we can look at some of their examples to confirm that the use of the * asterisk character for variadic function definitions existed even then. From Demo/sockets/gopher.py:
# Browser main command, has default arguments
def browser(*args):
selector = DEF_SELECTOR
host = DEF_HOST
port = DEF_PORT
n = len(args)
if n > 0 and args[0]:
selector = args[0]

Passing new shape to `np.reshape`

Within numpy.ndarray.reshape, the shape parameter is an int or tuple of ints, and
The new shape should be compatible with the original shape. If an
integer, then the result will be a 1-D array of that length.
The documentation signature is just:
# Note this question doesn't apply to the function version, `np.reshape`
np.ndarray.reshape(shape, order='C')
In practice the specification doesn't seem to be this strict. From the description above I would expect to need to use:
import numpy as np
a = np.arange(12)
b = a.reshape((4,3)) # (4,3) is passed to `newshape`
But instead I can get away with just:
c = a.reshape(4,3) # Seems like just 4 would be passed to `newshape`
# and 3 would be passed to next parameter, `order`
print(np.array_equal(b,c))
# True
How is it that I can do this? I know that if I just simply enter 2, 3 into a Python shell, it is technically a tuple whether I use parentheses or not. But the comparison above seems to violate basic laws of how positional parameters are passed to the dict of keyword args. I.e.:
def f(a, b=1, order='c'):
print(a)
print(b)
f((4,3))
print()
f(4,3)
# (4, 3)
# 1
#
# 4
# 3
...and there are no star operators in reshape. (Something akin to def f(*a, order='c') above.)
With the way that parameters are bound with normal Python methods, it should not work, but the method is not a Python method at all. Numpy is an extension module for CPython, and numpy.ndarray.reshape is actually implemented in C.
If you look at the implementation, the order parameter is only ever read as a keyword argument. A positional argument will never be bound to it, unlike with a normal Python method where the second positional argument would be bound to order. The C code tries to build the value for newshape from all of the positional arguments.
There's nothing magic going on. The function's signature just doesn't match the documentation. It's documented as
ndarray.reshape(shape, order='C')
but it's written in C, and instead of doing the C-api equivalent of
def reshape(self, shape, order='C'):
it does the C-api equivalent of manual *args and **kwargs handling. You can take a look in numpy/core/src/multiarray/methods.c. (Note that the C-api equivalent of def reshape(self, shape, order='C'): would have the same C-level signature as what the current code is doing, but it would immediately use something like PyArg_ParseTupleAndKeywords to parse the arguments instead of doing manual handling.)

What is the pythonic way to pass arguments to functions?

I'm working on a project that almost everywhere arguments are passed by key. There are functions with positional params only, with keyword (default value) params or mix of both. For example the following function:
def complete_task(activity_task, message=None, data=None):
pass
This function in the current code would be called like this:
complete_task(activity_task=activity_task, message="My massage", data=task_data)
For me there is no point to name arguments whose name is obvious by the context of the function execution / by the variable names. I would call it like this:
complete_task(activity_task, "My message", task_data)
In certain cases where it's not clear what the a call argument is from the context, or inferred from the variable names, I might do:
complete_task(activity_task, message="success", task_data=json_dump)
So this got me wondering if there is a convention or "pythonic" way to call functions with positional/keyword params, when there is no need to rearrange method arguments or use default values for some of the keyword params.
The usual rules of thumb I follow are:
Booleans, particularly boolean literals, should always be passed by keyword unless it is really obvious what they mean. This is important enough that I will often make booleans keyword-only when writing my own functions. If you have a boolean parameter, your function may want to be split into two smaller functions, particularly if it takes the overall structure of if boolean_parameter: do_something(); else: do_something_entirely_different().
If a function takes a lot of optional parameters (more than ~3 including required parameters), then the optionals should usually be passed by keyword. But if you have a lot of parameters, your function may want to be refactored into multiple smaller functions.
If a function takes multiple parameters of the same type, they probably want to be passed as keyword arguments unless order is completely obvious from context (e.g. src comes before dest).
Most of the time, keyword arguments are not wrong. If you have a case where positional arguments are confusing, you should use keyword arguments without a second thought. With the possible exception of simple one parameter functions, keyword arguments will not make your code any harder to read.
Python has 2 types of arguments1. positional and keyword (aka default). The waters get a little muddy because positional arguments can be called by keyword and keyword arguments can be called by position...
def foo(a, b=1):
print(a, b)
foo(1, 2)
foo(a=1, b=2)
With that said, I think that the names of the types of arguments should indicate how you should (typically) use them. Most of the time, I see positional arguments called by position and keyword arguments called by keyword. So, if you're looking for a general rule of thumb, I'd advise that you make the function call mimic the signature. In the case of our above foo function, I'd call it like this:
foo(1, b=2)
I think that one reason to follow this advice is because (most of the time), people expect keyword arguments to be passed via keyword. So it isn't uncommon for someone to later add a keyword:
def foo(a, aa='1', b=2):
print(a, aa, b)
If you were calling the function using only positional arguments, you'd now be passing a value to a different parameter than you were before. However, keyword arguments don't care what order you pass them, so you should still be all set.
So far, so good. But what rules should you use when you're creating a function? How do you know whether to make an argument a default argument or a positional argument? That's a reasonable question -- And it's hard to find a good rule of thumb. The rules of thumb I use are as follows:
Be consistent with the rest of the project -- It's hard to get it right if you're doing something different than the rest of the surrounding code.
Make an argument a default argument if (and only if) it is possible to supply a reasonable default. If the function will fail if the user doesn't supply a particular argument (because there is no good default), then it should be positional.
1Python3.x also has keyword only arguments. Those don't give you a choice, so I don't know that they add too much to the discussion here :-) -- Though I don't know that I've seen their use out in the wild too much.

Unpacking arguments: only named arguments may follow *expression

The following works beautifully in Python:
def f(x,y,z): return [x,y,z]
a=[1,2]
f(3,*a)
The elements of a get unpacked as if you had called it like f(3,1,2) and it returns [3,1,2]. Wonderful!
But I can't unpack the elements of a into the first two arguments:
f(*a,3)
Instead of calling that like f(1,2,3), I get "SyntaxError: only named arguments may follow *expression".
I'm just wondering why it has to be that way and if there's any clever trick I might not be aware of for unpacking arrays into arbitrary parts of argument lists without resorting to temporary variables.
As Raymond Hettinger's answer points out, this may change has changed in Python 3 and here is a related proposal, which has been accepted.
Especially related to the current question, here's one of the possible changes to that proposal that was discussed:
Only allow a starred expression as the last item in the exprlist. This would simplify the
unpacking code a bit and allow for the starred expression to be assigned an iterator. This
behavior was rejected because it would be too surprising.
So there are implementation reasons for the restriction with unpacking function arguments but it is indeed a little surprising!
In the meantime, here's the workaround I was looking for, kind of obvious in retrospect:
f(*(a+[3]))
It doesn't have to be that way. It was just rule that Guido found to be sensible.
In Python 3, the rules for unpacking have been liberalized somewhat:
>>> a, *b, c = range(10)
>>> a
0
>>> b
[1, 2, 3, 4, 5, 6, 7, 8]
>>> c
9
Depending on whether Guido feels it would improve the language, that liberalization could also be extended to function arguments.
See the discussion on extended iterable unpacking for some thoughts on why Python 3 changed the rules.
Thanks to the PEP 448 - Additional Unpacking Generalizations,
f(*a, 3)
is now accepted syntax starting from Python 3.5. Likewise you can use the double-star ** for keyword argument unpacking anywhere and either one can be used multiple times.
f is expecting 3 arguments (x, y, z, in that order).
Suppose L = [1,2]. When you call f(3, *L), what python does behind the scenes, is to call f(3, 1, 2), without really knowing the length of L.
So what happens if L was instead [1,2,3]?
Then, when you call f(3, *L), you'll end up calling f(3,1,2,3), which will be an error because f is expecting exactly 3 arguments and you gave it 4.
Now, suppose L=[1,2]1. Look at what happens when you callf`:
>>> f(3,*L) # works fine
>>> f(*L) # will give you an error when f(1,2) is called; insufficient arguments
Now, you implicitly know when you call f(*L, 3) that 3 will be assigned to z, but python doesn't know that. It only knows that the last j many elements of the input to f will be defined by the contents of L. But since it doesn't know the value of len(L), it can't make assumptions about whether f(*L,3) would have the correct number of arguments.
This however, is not the case with f(3,*L). In this case, python knows that all the arguments EXCEPT the first one will be defined by the contents of L.
But if you have named arguments f(x=1, y=2, z=3), then the arguments being assigned to by name will be bound first. Only then are the positional arguments bound. So you do f(*L, z=3). In that case, z is bound to 3 first, and then, the other values get bound.
Now interestingly, if you did f(*L, y=3), that would give you an error for trying to assign to y twice (once with the keyword, once again with the positional)
Hope this helps
You can use f(*a, z=3) if you use f(*a, 3), it do not know how to unpack the parameter for you provided 2 parameters and 2 is the second.
Nice. This also works for tuples. Don't forget the comma:
a = (1,2)
f(*(a+(3,)))

Please explain why these two builtin functions behave different when passed in keyword arguments

Consider these different behaviour::
>> def minus(a, b):
>> return a - b
>> minus(**dict(b=2, a=1))
-1
>> int(**dict(base=2, x='100'))
4
>> import operator
>> operator.sub.__doc__
'sub(a, b) -- Same as a - b.'
>> operator.sub(**dict(b=2, a=1))
TypeError: sub() takes no keyword arguments
Why does operator.sub behave differently from int(x, [base]) ?
It is an implementation detail. The Python C API to retrieve arguments separates between positional and keyword arguments. Positional arguments do not even have a name internally.
The code used to retrieve the arguments of the operator.add functions (and similar ones like sub) is this:
PyArg_UnpackTuple(a,#OP,2,2,&a1,&a2)
As you can see, it does not contain any argument name. The whole code related to operator.add is:
#define spam2(OP,AOP) static PyObject *OP(PyObject *s, PyObject *a) { \
PyObject *a1, *a2; \
if(! PyArg_UnpackTuple(a,#OP,2,2,&a1,&a2)) return NULL; \
return AOP(a1,a2); }
spam2(op_add , PyNumber_Add)
#define spam2(OP,ALTOP,DOC) {#OP, op_##OP, METH_VARARGS, PyDoc_STR(DOC)}, \
{#ALTOP, op_##OP, METH_VARARGS, PyDoc_STR(DOC)},
spam2(add,__add__, "add(a, b) -- Same as a + b.")
As you can see, the only place where a and b are used is in the docstring. The method definition also does not use the METH_KEYWORDS flag which would be necessary for the method to accept keyword arguments.
Generally spoken, you can safely assume that a python-based function where you know an argument name will always accept keyword arguments (of course someone could do nasty stuff with *args unpacking but creating a function doc where the arguments look normal) while C functions may or may not accept keyword arguments. Chances are good that functions with more than a few arguments or optional arguments accept keyword arguments for the later/optional ones. But you pretty much have to test it.
You can find a discussion about supporting keyword arguments everywhere on the python-ideas mailinglist. There is also a statement from Guido van Rossum (the Benevolent Dictator For Life aka the creator of Python) on it:
Hm. I think for many (most?) 1-arg and selected 2-arg functions (and
rarely 3+-arg functions) this would reduce readability, as the example
of ord(char=x) showed.
I would actually like to see a syntactic feature to state that an
argument cannot be given as a keyword argument (just as we already
added syntax to state that it must be a keyword).
One area where I think adding keyword args is outright wrong: Methods
of built-in types or ABCs and that are overridable. E.g. consider the
pop() method on dict. Since the argument name is currently
undocumented, if someone subclasses dict and overrides this method, or
if they create another mutable mapping class that tries to emulate
dict using duck typing, it doesn't matter what the argument name is --
all the callers (expecting a dict, a dict subclass, or a dict-like
duck) will be using positional arguments in the call. But if we were
to document the argument names for pop(), and users started to use
these, then most dict sublcasses and ducks would suddenly be broken
(except if by luck they happened to pick the same name).
operator is a C module, which defines functions differently. Unless the function declaration in the module initialization includes METH_KEYWORDS, the function will not accept keyword arguments under any conditions and you get the error given in the question.
minus(**dict(b=2, a=1)) expands to minus(b=2, a=1). This works because your definition has argument names a and b.
operator.sub(**dict(b=2, a=1)) expands to operator.sub(b=2, a=1). This doesn't work because sub does not accept keyword arguments.

Categories