In Python, I have often seen the yield function used to create a generator. Both it and the print function technically perform the action of methods because they return a value. However, during the change from Python 2 to Python 3, the print function gained parentheses like a normal method call, but yield stayed the same. Also, yield gets the yellowish color of a reserved keyword, while print gets the purple of a reserved method. Why is yield not considered a method, and why is it colored this way and written without parentheses?
(In a similar vein, why does return also lack parentheses?)
Let me add some more: yield and continue are not given parentheses in many other languages either. I just wanted to know what makes yield different, other than being reserved. There are many other reserved methods out there that do get parentheses.
So I went digging for an answer, and it turns out there is one. From PEP 255, the PEP that gave us the yield keyword:
Q. Why a new keyword for "yield"? Why not a builtin function instead?
A. Control flow is much better expressed via keyword in Python, and
yield is a control construct. It's also believed that efficient
implementation in Jython requires that the compiler be able to
determine potential suspension points at compile-time, and a new
keyword makes that easy. The CPython reference implementation also
exploits it heavily, to detect which functions are generator-
functions (although a new keyword in place of "def" would solve that
for CPython -- but people asking the "why a new keyword?" question
don't want any new keyword).
Q: Then why not some other special syntax without a new keyword? For
example, one of these instead of "yield 3":
return 3 and continue
return and continue 3
return generating 3
continue return 3
return >> , 3
from generator return 3
return >> 3
return << 3
>> 3
<< 3
* 3
A: Did I miss one ? Out of hundreds of messages, I counted three
suggesting such an alternative, and extracted the above from them.
It would be nice not to need a new keyword, but nicer to make yield
very clear -- I don't want to have to deduce that a yield is
occurring from making sense of a previously senseless sequence of
keywords or operators. Still, if this attracts enough interest,
proponents should settle on a single consensus suggestion, and Guido
will Pronounce on it.
print wasn't a function that gained parentheses: it went from being a statement to being a function. yield is still a keyword that forms a statement, like return (since Python 2.5 it can also be used as an expression). Syntax highlighting is specific to your development environment.
You can find more information about the difference between expressions and statements here, and more about the difference between functions and statements here. Also see the documentation on simple statements and compound statements.
yield is not a function; it's a keyword, and its grammar does not require parentheses:
yield_atom ::= "(" yield_expression ")"
yield_expression ::= "yield" [expression_list]
print used to be a statement in Python 2, but it was changed to a built-in function in Python 3 by PEP 3105.
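As a rough illustration of the grammar above, here is a minimal sketch (the generator name `echo` is made up for this example). A yield needs no parentheses when it stands alone or is the whole right-hand side of an assignment; parentheses are only needed when the yield expression is embedded in a larger expression.

```python
def echo():
    # Whole right-hand side of an assignment: no parentheses needed.
    received = yield "ready"
    # Embedded in a larger expression: parentheses are required.
    total = (yield received) + 1
    yield total


gen = echo()
first = next(gen)        # runs up to the first yield
second = gen.send("hi")  # "hi" becomes the value of the first yield expression
third = gen.send(41)     # 41 becomes the value of the parenthesized yield
print(first, second, third)  # ready hi 42
```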
print was a keyword defined by the language specification in Python 2, and became a builtin function (defined by the standard library specification) in Python 3. yield was, and still is, a keyword.
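The keyword/builtin distinction can be checked from Python itself. The following sketch assumes Python 3; `countdown` is a hypothetical generator function used only for illustration.

```python
import builtins
import inspect
import keyword


def countdown(n):
    # The presence of the yield keyword is what makes this a generator function.
    while n > 0:
        yield n
        n -= 1


print(keyword.iskeyword("yield"))              # True: reserved by the grammar
print(keyword.iskeyword("print"))              # False in Python 3
print(hasattr(builtins, "print"))              # True: an ordinary builtin function
print(inspect.isgeneratorfunction(countdown))  # True
print(list(countdown(3)))                      # [3, 2, 1]
```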
I'm constantly wrapping my str.join() arguments in a list, e.g.
'.'.join([str_one, str_two])
The extra list wrapper always seems superfluous to me. I'd like to do...
'.'.join(str_one, str_two, str_three, ...)
... or if I have a list ...
'.'.join(*list_of_strings)
Yes I'm a minimalist, yes I'm picky, but mostly I'm just curious about the history here, or whether I'm missing something. Maybe there was a time before splats?
Edit:
I'd just like to note that max() handles both versions:
max(iterable[, key])
max(arg1, arg2, *args[, key])
For short lists this won't matter and it costs you exactly 2 characters to type. But the most common use-case (I think) for str.join() is the following:
''.join(process(x) for x in some_input)
# or
result = []
for x in some_input:
    result.append(process(x))
''.join(result)
where some_input can have thousands of entries and you just want to generate the output string efficiently.
If join accepted variable arguments instead of an iterable, this would have to be spelled as:
''.join(*(process(x) for x in some_input))
# or
''.join(*result)
which would create a (possibly long) tuple, just to pass it as *args.
So that's 2 characters in a short case vs. being wasteful in large data case.
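A small sketch of the point above: because str.join() takes a single iterable, the same call shape works for a list, a tuple, or a generator expression. The variable names are made up for illustration.

```python
words = ["a", "b", "c"]

from_list = ".".join(words)
from_tuple = ".".join(("a", "b", "c"))
# A generator expression is passed as-is; the caller builds no intermediate list.
from_generator = ".".join(w.upper() for w in words)

print(from_list, from_tuple, from_generator)  # a.b.c a.b.c A.B.C
```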
History note
(Second Edit: based on HISTORY file which contains missing release from all releases. Thanks Don.)
The *args syntax in function definitions was added to Python a long time ago:
==> Release 0.9.8 (9 Jan 1993) <==
Case (a) was needed to accommodate variable-length argument lists;
there is now an explicit "varargs" feature (precede the last argument
with a '*'). Case (b) was needed for compatibility with old class
definitions: up to release 0.9.4 a method with more than one argument
had to be declared as "def meth(self, (arg1, arg2, ...)): ...".
The proper way to pass a list to such functions was the built-in function apply(callable, sequence). (Note that this doesn't mention **kwargs, which first appears in the docs for version 1.4.)
The ability to call a function with * syntax is first mentioned in release notes for 1.6:
There's now special syntax that you can use instead of the apply()
function. f(*args, **kwds) is equivalent to apply(f, args, kwds). You
can also use variations f(a1, a2, *args, **kwds) and you can leave one
or the other out: f(*args), f(**kwds).
But it's missing from the grammar docs until version 2.2.
Before 2.0, str.join() did not even exist and you had to do from string import join.
You'd have to write your own function to do that.
>>> def my_join(separator, *args):
...     return separator.join(args)
...
>>> my_join('.', '1', '2', '3')
'1.2.3'
Note that this doesn't avoid the creation of an extra object, it just hides that an extra object is being created. If you inspect the type of args, you'll see that it's a tuple.
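To make the hidden object visible, here is a variant of the wrapper that also reports the type of args (the name `my_join_verbose` is hypothetical). Even when the caller unpacks a generator, Python materializes a tuple before the function body runs.

```python
def my_join_verbose(separator, *args):
    # args is always a tuple, however the caller supplied the arguments.
    return type(args).__name__, separator.join(args)


direct = my_join_verbose(".", "1", "2", "3")
unpacked = my_join_verbose(".", *(str(n) for n in range(3)))

print(direct)    # ('tuple', '1.2.3')
print(unpacked)  # ('tuple', '0.1.2')
```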
If you don't want to create a function and you have a fixed list of strings then it would be possible to use format instead of join:
'{}.{}.{}.{}'.format(str_one, str_two, str_three, str_four)
It's better to just stick with '.'.join((a, b, c)).
Argh, now this is a hard question! Try arguing which style is more minimalist... Hard to give a good answer without being too subjective, since it's all about convention.
The problem is: we have a function that accepts an ordered collection; should it accept it as a single argument or as a variable-length argument list (VLAL)?
Python usually answers: single argument; VLAL only if you really have a reason to. Let's see how Python's libraries reflect this:
The standard library has a couple of examples for VLAL, most notably:
when the function can be called with an arbitrary number of separate sequences - like zip or map or itertools.chain,
when there's one sequence to pass, but you don't really expect the caller to have the whole of it as a single variable. This seems to fit str.format.
And the common case for using a single argument:
When you want to do some generic data processing on a single sequence. This fits the functional trio (map*, reduce, filter), and specialized spawns thereof, like sum or str.join. Also stateful transforms like enumerate.
The pattern is "consume an iterable, give another iterable" or "consume an iterable, give a result".
Hope this answers your question.
Note: map is technically var-arg, but the common use case is just map(func, sequence) -> sequence which falls into one bucket with reduce and filter.
*The obscure case, map(func, *sequences), is conceptually like map(func, izip_longest(*sequences)), and the reason for zips to follow the var-arg convention was explained before.
I hope you follow my thinking here; after all it's all a matter of programming style, I'm just pointing at some patterns in Python's library functions.
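The two conventions described in this answer can be seen side by side in a short sketch (Python 3; the variable names are illustrative only).

```python
from itertools import chain

# Variable-length argument lists: several separate sequences, or loose values.
pairs = list(zip([1, 2, 3], "abc"))
merged = list(chain([1, 2], [3], [4, 5]))
formatted = "{}-{}".format("x", "y")

# Single iterable argument: generic processing of one sequence.
total = sum([1, 2, 3])
joined = ",".join(["a", "b"])
indexed = list(enumerate("ab"))

print(pairs)      # [(1, 'a'), (2, 'b'), (3, 'c')]
print(merged)     # [1, 2, 3, 4, 5]
print(formatted)  # x-y
print(total, joined, indexed)  # 6 a,b [(0, 'a'), (1, 'b')]
```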
Consider these different behaviours:
>>> def minus(a, b):
...     return a - b
...
>>> minus(**dict(b=2, a=1))
-1
>>> int(**dict(base=2, x='100'))
4
>>> import operator
>>> operator.sub.__doc__
'sub(a, b) -- Same as a - b.'
>>> operator.sub(**dict(b=2, a=1))
TypeError: sub() takes no keyword arguments
Why does operator.sub behave differently from int(x, [base]) ?
It is an implementation detail. The Python C API for retrieving arguments distinguishes between positional and keyword arguments. Positional arguments do not even have a name internally.
The code used to retrieve the arguments of the operator.add function (and similar ones like sub) is this:
PyArg_UnpackTuple(a,#OP,2,2,&a1,&a2)
As you can see, it does not contain any argument name. The whole code related to operator.add is:
#define spam2(OP,AOP) static PyObject *OP(PyObject *s, PyObject *a) { \
PyObject *a1, *a2; \
if(! PyArg_UnpackTuple(a,#OP,2,2,&a1,&a2)) return NULL; \
return AOP(a1,a2); }
spam2(op_add , PyNumber_Add)
#define spam2(OP,ALTOP,DOC) {#OP, op_##OP, METH_VARARGS, PyDoc_STR(DOC)}, \
{#ALTOP, op_##OP, METH_VARARGS, PyDoc_STR(DOC)},
spam2(add,__add__, "add(a, b) -- Same as a + b.")
As you can see, the only place where a and b are used is in the docstring. The method definition also does not use the METH_KEYWORDS flag which would be necessary for the method to accept keyword arguments.
Generally speaking, you can safely assume that a Python-based function whose argument names you know will always accept keyword arguments (of course someone could do nasty stuff with *args unpacking while writing a docstring in which the arguments look normal), while C functions may or may not accept keyword arguments. Chances are good that functions with more than a few arguments, or with optional arguments, accept keyword arguments for the later/optional ones. But you pretty much have to test it.
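"Testing it" can be as simple as trying the call. The helper below is a hypothetical sketch, not a library function, and the result for operator.sub assumes CPython, where the operator module is implemented in C. It is only a rough probe: a TypeError raised for an unrelated reason inside the function would also be reported as False.

```python
import operator


def minus(a, b):
    return a - b


def accepts_keywords(func, **kwargs):
    """Hypothetical helper: True if func(**kwargs) does not raise TypeError."""
    try:
        func(**kwargs)
    except TypeError:
        return False
    return True


print(accepts_keywords(minus, a=1, b=2))         # True: plain Python function
print(accepts_keywords(operator.sub, a=1, b=2))  # False on CPython
print(operator.sub(1, 2))                        # -1: positional call works
```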
You can find a discussion about supporting keyword arguments everywhere on the python-ideas mailing list. There is also a statement from Guido van Rossum (the Benevolent Dictator For Life, aka the creator of Python) on it:
Hm. I think for many (most?) 1-arg and selected 2-arg functions (and
rarely 3+-arg functions) this would reduce readability, as the example
of ord(char=x) showed.
I would actually like to see a syntactic feature to state that an
argument cannot be given as a keyword argument (just as we already
added syntax to state that it must be a keyword).
One area where I think adding keyword args is outright wrong: Methods
of built-in types or ABCs and that are overridable. E.g. consider the
pop() method on dict. Since the argument name is currently
undocumented, if someone subclasses dict and overrides this method, or
if they create another mutable mapping class that tries to emulate
dict using duck typing, it doesn't matter what the argument name is --
all the callers (expecting a dict, a dict subclass, or a dict-like
duck) will be using positional arguments in the call. But if we were
to document the argument names for pop(), and users started to use
these, then most dict subclasses and ducks would suddenly be broken
(except if by luck they happened to pick the same name).
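The syntactic feature wished for in the quote above did eventually arrive: Python 3.8 added positional-only parameters (PEP 570), marked with a / in the parameter list. A minimal sketch, assuming Python 3.8 or later; this `minus` is an illustrative function, not the one from the question.

```python
def minus(a, b, /):
    # Parameters before the slash can only be passed positionally.
    return a - b


print(minus(3, 1))  # 2

error = None
try:
    minus(a=3, b=1)
except TypeError as exc:
    # The names a and b are not part of the calling interface.
    error = str(exc)

print(error is not None)  # True
```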
operator is a C module, which defines functions differently. Unless the function declaration in the module initialization includes METH_KEYWORDS, the function will not accept keyword arguments under any conditions and you get the error given in the question.
minus(**dict(b=2, a=1)) expands to minus(b=2, a=1). This works because your definition has argument names a and b.
operator.sub(**dict(b=2, a=1)) expands to operator.sub(b=2, a=1). This doesn't work because sub does not accept keyword arguments.
Why does Python only allow named arguments to follow a tuple unpacking expression in a function call?
>>> def f(a,b,c):
...     print a, b, c
...
>>> f(*(1,2),3)
File "<stdin>", line 1
SyntaxError: only named arguments may follow *expression
Is it simply an aesthetic choice, or are there cases where allowing this would lead to some ambiguities?
i am pretty sure that the reason people "naturally" don't like this is because it makes the meaning of later arguments ambiguous, depending on the length of the interpolated series:
def dangerbaby(a, b, *c):
    hug(a)
    kill(b)
>>> dangerbaby('puppy', 'bug')
killed bug
>>> cuddles = ['puppy']
>>> dangerbaby(*cuddles, 'bug')
killed bug
>>> cuddles.append('kitten')
>>> dangerbaby(*cuddles, 'bug')
killed kitten
you cannot tell from just looking at the last two calls to dangerbaby which one works as expected and which one kills little kitten fluffykins.
of course, some of this uncertainty is also present when interpolating at the end. but the confusion is constrained to the interpolated sequence - it doesn't affect other arguments, like bug.
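Python 3.5 and later do accept a positional argument after *expression (PEP 448), so the scenario above can be tried directly. In this sketch dangerbaby returns a description instead of calling the hug and kill helpers, which is a change made here purely so the result can be inspected.

```python
def dangerbaby(a, b, *c):
    # Return a description rather than acting, so the binding is visible.
    return "hugged {}, killed {}".format(a, b)


cuddles = ["puppy"]
before = dangerbaby(*cuddles, "bug")

cuddles.append("kitten")
# Same call text, but "bug" has now slid into *c and b is bound to "kitten".
after = dangerbaby(*cuddles, "bug")

print(before)  # hugged puppy, killed bug
print(after)   # hugged puppy, killed kitten
```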
[i made a quick search to see if i could find anything official. it seems that the * prefix for varargs was introduced in python 0.9.8. the previous syntax is discussed here and the rules for how it worked were rather complex. since the addition of extra arguments "had to" happen at the end when there was no * marker it seems like that simply carried over. finally there's a mention here of a long discussion on argument lists that was not by email.]
I suspect that it's for consistency with the star notation in function definitions, which is after all the model for the star notation in function calls.
In the following definition, the parameter *c will slurp all subsequent non-keyword arguments, so obviously when f is called, the only way to pass a value for d will be as a keyword argument.
def f(a, b, *c, d=1):
    print("slurped", len(c))
(Such "keyword-only parameters" are only supported in Python 3. In Python 2 there is no way to assign values after a starred argument, so the above is illegal.)
So, in a function definition the starred argument must follow all ordinary positional arguments. What you observed is that the same rule has been extended to function calls. This way, the star syntax is consistent for function declarations and function calls.
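A short sketch of the slurping behaviour described above, assuming Python 3. The function returns its findings instead of printing them so they are easy to check.

```python
def f(a, b, *c, d=1):
    # c collects every extra positional argument; d can only be set by keyword.
    return len(c), d


print(f(1, 2, 3, 4))    # (2, 1): both 3 and 4 were slurped into c
print(f(1, 2, 3, d=4))  # (1, 4): only the keyword form reaches d
```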
Another parallelism is that you can only have one (single-)starred argument in a function call. The following is illegal (before Python 3.5, where PEP 448 lifted this restriction), though one could easily imagine it being allowed.
f(*(1,2), *(3,4))
First of all, it is simple to provide a very similar interface yourself using a wrapper function:
def applylast(func, arglist, *literalargs):
    # literalargs is a tuple; convert arglist so that lists work too
    return func(*(literalargs + tuple(arglist)))

applylast(f, (1, 2), 3)  # equivalent to f(3, 1, 2)
Secondly, enhancing the interpreter to support your syntax natively might add overhead to the very performance-critical activity of function application. Even if it only requires a few extra instructions in compiled code, due to the high usage of those routines, that might constitute an unacceptable performance penalty in exchange for a feature that is not called for all that often and easily accommodated in a user library.
Some observations:
Python processes positional arguments before keyword arguments (f(c=3, *(1, 2)) in your example still prints 1 2 3). This makes sense as (i) most arguments in function calls are positional and (ii) the semantics of a programming language need to be unambiguous (i.e., a choice needs to be made either way on the order in which to process positional and keyword arguments).
If we did have a positional argument to the right in a function call, it would be difficult to define what that means. If we call f(*(1, 2), 3), should that be f(1, 2, 3) or f(3, 1, 2) and why would either choice make more sense than the other?
For an official explanation, PEP 3102 provides a lot of insight on how function definitions work. The star (*) in a function definition indicates the end of positional arguments (section Specification). To see why, consider: def g(a, b, *c, d). There's no way to provide a value for d other than as a keyword argument (positional arguments would be 'grabbed' by c).
It's important to realize what this means: as the star marks the end of positional arguments, that means all positional arguments must be in that position or to the left of it.
Change the order:
def f(c,a,b):
    print(a,b,c)
f(3,*(1,2))
If you have a Python 3 keyword-only parameter, like
def f(*a, b=1):
...
then you might expect something like f(*(1, 2), 3) to set a to (1, 2) and b to 3. But even if the syntax you want were allowed, it would not do that, because keyword-only parameters must be passed by keyword, as in f(*(1, 2), b=3). If it were allowed, I suppose it would have to set a to (1, 2, 3) and leave b as the default 1. So it's perhaps not syntactic ambiguity so much as ambiguity in what is expected, which is something Python tries hard to avoid.
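Since Python 3.5 (PEP 448) the call in question is in fact accepted, and it behaves as supposed above. A minimal sketch, assuming Python 3.5 or later, with f returning its parameters so the binding is visible:

```python
def f(*a, b=1):
    return a, b


print(f(*(1, 2), 3))    # ((1, 2, 3), 1): the 3 joins a, b keeps its default
print(f(*(1, 2), b=3))  # ((1, 2), 3): b must be given by keyword
```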
What is the correct name for operator *, as in function(*args)? unpack, unzip, something else?
In Ruby and Perl 6 this has been called "splat", and I think most people from
those communities will figure out what you mean if you call it that.
The Python tutorial uses the phrase "unpacking argument lists", which is
long and descriptive.
It is also referred to as iterable unpacking, or in the case of **,
dictionary unpacking.
I call it "positional expansion", as opposed to ** which I call "keyword expansion".
The Python Tutorial simply calls it 'the *-operator'. It performs unpacking of arbitrary argument lists.
I say "star-args" and Python people seem to know what I mean.
** is trickier - I think just "qargs" since it is usually used as **kw or **kwargs
One can also call * a gather parameter (when used in function arguments definition) or a scatter operator (when used at function invocation).
As seen here: Think Python/Tuples/Variable-length argument tuples.
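The gather/scatter terminology can be illustrated in a few lines. The function name `gather` is made up for this sketch; divmod is the standard builtin.

```python
def gather(*args):
    # In a definition, * gathers the positional arguments into a tuple.
    return args


t = (7, 3)

print(gather(1, 2, 3))  # (1, 2, 3)
# In a call, * scatters the tuple into separate positional arguments.
print(divmod(*t))       # (2, 1)
```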
I believe it's most commonly called the "splat operator." Unpacking arguments is what it does.
The technical term for this is a variadic function. So in a sense, that's the correct term without regard to programming language.
That said, in different languages the feature does have legitimate names. As others have mentioned, it is called "splat" in Ruby, Julia, and several other languages and is noted by that name in official documentation. In JavaScript it is called the "spread" syntax. It has many other names in many other languages, as mentioned in other answers. Whatever you call it, it's quite useful!
For a colloquial name there is "splatting".
For positional arguments (list type) you use a single * and for keyword arguments (dictionary type) you use a double **.
Both * and ** are sometimes referred to as "splatting".
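A quick sketch of both forms in one call; `describe` and its parameters are hypothetical names chosen for illustration.

```python
def describe(name, greeting="Hello", punctuation="!"):
    return "{}, {}{}".format(greeting, name, punctuation)


args = ["World"]                                # splatted with a single *
kwargs = {"greeting": "Hi", "punctuation": "?"}  # splatted with a double **

result = describe(*args, **kwargs)
print(result)  # Hi, World?
```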
For a reference to this name being used, see:
https://stackoverflow.com/a/47875892/14305096
I call *args "star args" or "varargs" and **kwargs "keyword args".