A friend (a fellow low-skill recreational Python scripter) asked me to look over some code. I noticed that he had 7 separate statements that basically said:

if (a and b and c):
    do something

The statements a, b, and c all tested for equality or inequality against set values. As I looked at it, I found that, because of the nature of the tests, I could rewrite the whole logic block into 2 branches that never went more than 3 deep and rarely got past the first level (putting the test for the rarest case first):
if a:
    if b:
        if c:
            ...
    else:
        if c:
            ...
else:
    if b:
        if c:
            ...
    else:
        if c:
            ...
To me, it logically seems like it should be faster if you are making fewer, simpler tests that fail faster and move on.
My real questions are:
1) When I write if and else, if the if is true, does the else get completely ignored?
2) In theory would
if (a and b and c)
take as much time as the three separate if statements would?
I would say the single test is as fast as the separate tests. Python also makes use of so-called short-circuit evaluation.
That means that for (a and b and c), b and c are not tested at all if a is false.
Similarly, if you have an OR expression (a or b) and a is true, b is never evaluated.
So, to sum up, the clauses don't fail faster with separation.
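A tiny illustration of both cases (just a sketch; noisy() is a made-up helper):

def noisy():
    print("noisy() was evaluated")
    return True

False and noisy()   # noisy() is never called: the 'and' is already decided
True or noisy()     # noisy() is never called: the 'or' is already decided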
An if statement will skip everything in the else branch if the condition evaluates to true. It should be noted that worrying about this sort of problem, unless it's done millions of times per program execution, is called "premature optimization" and should be avoided. If your code is clearer with the separate if (a and b and c) statements, they should be left in.
Code:
import dis

def foo():
    if (a and b and c):
        pass
    else:
        pass

def bar():
    if a:
        if b:
            if c:
                pass

print 'foo():'
dis.dis(foo)
print 'bar():'
dis.dis(bar)
Output:
foo():
4 0 LOAD_GLOBAL 0 (a)
3 JUMP_IF_FALSE 18 (to 24)
6 POP_TOP
7 LOAD_GLOBAL 1 (b)
10 JUMP_IF_FALSE 11 (to 24)
13 POP_TOP
14 LOAD_GLOBAL 2 (c)
17 JUMP_IF_FALSE 4 (to 24)
20 POP_TOP
5 21 JUMP_FORWARD 1 (to 25)
>> 24 POP_TOP
7 >> 25 LOAD_CONST 0 (None)
28 RETURN_VALUE
bar():
10 0 LOAD_GLOBAL 0 (a)
3 JUMP_IF_FALSE 26 (to 32)
6 POP_TOP
11 7 LOAD_GLOBAL 1 (b)
10 JUMP_IF_FALSE 15 (to 28)
13 POP_TOP
12 14 LOAD_GLOBAL 2 (c)
17 JUMP_IF_FALSE 4 (to 24)
20 POP_TOP
13 21 JUMP_ABSOLUTE 29
>> 24 POP_TOP
25 JUMP_ABSOLUTE 33
>> 28 POP_TOP
>> 29 JUMP_FORWARD 1 (to 33)
>> 32 POP_TOP
>> 33 LOAD_CONST 0 (None)
36 RETURN_VALUE
So, although the setup is the same, the cleanup for the combined expression is faster since it leaves only a single value on the stack.
At least in Python, efficiency takes second place to readability, and "Flat is better than nested".
See The Zen of Python
If you are worried about b or c being functions that are called instead of just variables that are evaluated, then this code shows that short-circuiting is your friend:
a = False

def b():
    print "b was called"
    return True

if a and b():
    print "this shouldn't happen"
else:
    print "if b was not called, then short-circuiting works"
prints
if b was not called, then short-circuiting works
But if you have code that does this:
a = call_to_expensive_function_A()
b = call_to_expensive_function_B()
c = call_to_expensive_function_C()
if a and b and c:
    do something...
then your code is still calling all 3 expensive functions. Better to let Python be Python:
if (call_to_expensive_function_A() and
        call_to_expensive_function_B() and
        call_to_expensive_function_C()):
    do something...
which will only call as many expensive functions as necessary to determine the overall condition.
Edit
You can generalize this using the all built-in:
# note, this is a list of the functions themselves
# the functions are *not* called when creating this list
funcs = [function_A, function_B, function_C]
if all(fn() for fn in funcs):
    do something
Now if you have to add other functions, or want to reorder them (maybe function_A is very time-consuming and you would benefit by filtering out cases that fail function_B or function_C first), you just update the funcs list. all() does short-circuiting just as if you had spelled the if out as if a and b and c. (If the functions are 'or'ed together, use the any builtin instead.)
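For completeness, a small sketch of that 'or' variant (do_something() is just a placeholder name here): any() also short-circuits, so once one call returns a truthy value the remaining functions are never called.

funcs = [function_A, function_B, function_C]   # the functions themselves, not their results

if any(fn() for fn in funcs):
    do_something()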
I doubt you'd see a measurable difference so I'd recommend doing whatever makes the code most readable.
if (a and b and c) will fail if a is falsy, and not bother checking b or c.
That said, I personally feel that nested conditionals are easier to read than 2^n combinations of conditionals.
In general, if you want to determine which way of doing something is fastest, you can write a simple benchmark using timeit.
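For example, a minimal benchmark sketch (the values of a, b and c and the iteration count are arbitrary, and absolute numbers will vary by machine and Python version):

import timeit

setup = "a, b, c = True, True, False"
combined = "if a and b and c:\n    pass"
nested = "if a:\n    if b:\n        if c:\n            pass"

print(timeit.timeit(combined, setup=setup, number=1000000))
print(timeit.timeit(nested, setup=setup, number=1000000))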
if (a and b and c) is faster and better, both for the sake of "Real Programmer" code optimization and for code readability.
Only to say that the construct
if a:
    if b:
        if c:
            ...
    else:        # note A
        if c:
            ...
else:
    if b:
        if c:
            ...
    else:        # note A
        if c:
            ...
is not well constructed. In the (a and b and c) construct, the interpreter (and most compiled languages) simply ignores the conditions that follow a false value, so the "# note A" else branches shouldn't exist. Writing (a and b and c) does this implicitly.
Related
Given the following example
try:
    a, b = 0, 0
    for _ in range(100):
        a, b = (a+1, b+1)
except KeyboardInterrupt:
    assert a == b
could an AssertionError be thrown? If so, is there a way to prevent it, i.e. to ensure that either both of a and b or none is updated on every iteration?
Possibly related to Is Python unpacking thread safe?
Within that question, the following example and corresponding opcodes are given:
>>> def t(self): a,b=20,20
...
>>> dis.dis(t)
1 0 LOAD_CONST 2 ((20, 20))
3 UNPACK_SEQUENCE 2
6 STORE_FAST 1 (a)
9 STORE_FAST 2 (b)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
Since there are two separate instructions for storing a and b, I would expect that no guarantees can be given whether both or none of the two instructions is executed prior to a KeyboardInterrupt.
Your intuition is correct: while Python will handle the interrupt internally and re-expose it separately (so interrupts are not quite as fraught as they are in C), as noted in e.g. PEP 343:
Even if you write bug-free code, a KeyboardInterrupt exception can still cause it to exit between any two virtual machine opcodes.
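One possible mitigation (a sketch of my own, not from the PEP): keep the two counters in a single tuple, so that each iteration ends with one name rebinding instead of two separate stores. An interrupt then arrives either before the rebinding (both values are still the old, consistent pair) or after it (both are updated).

pair = (0, 0)
try:
    for _ in range(100):
        # build the new tuple first, then rebind the single name 'pair'
        pair = (pair[0] + 1, pair[1] + 1)
except KeyboardInterrupt:
    assert pair[0] == pair[1]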
Sometimes, some values/strings are hard-coded in functions. For example, in the following function I define a "constant" comparison string and check against it.
def foo(s):
    c_string = "hello"
    if s == c_string:
        return True
    return False
Without discussing too much about why it's bad to do this, and how it should be defined in the outer scope, I'm wondering what happens behind the scenes when it is defined this way.
Does the string get created each call?
If instead of the string "hello" it were the list [1, 2, 3] (or a list with mutable contents, if that matters), would the same happen?
Because the string is immutable (as a tuple would be), it is stored with the bytecode object for the function. It is loaded by a very simple and fast index lookup. This is actually faster than a global lookup.
You can see this in a disassembly of the bytecode, using the dis.dis() function:
>>> import dis
>>> def foo(s):
...     c_string = "hello"
...     if s == c_string:
...         return True
...     return False
...
>>> dis.dis(foo)
2 0 LOAD_CONST 1 ('hello')
3 STORE_FAST 1 (c_string)
3 6 LOAD_FAST 0 (s)
9 LOAD_FAST 1 (c_string)
12 COMPARE_OP 2 (==)
15 POP_JUMP_IF_FALSE 22
4 18 LOAD_GLOBAL 0 (True)
21 RETURN_VALUE
5 >> 22 LOAD_GLOBAL 1 (False)
25 RETURN_VALUE
>>> foo.__code__.co_consts
(None, 'hello')
The LOAD_CONST opcode loads the string object from the co_consts array that is part of the code object for the function; the reference is pushed to the top of the stack. The STORE_FAST opcode takes the reference from the top of the stack and stores it in the locals array, again a very simple and fast operation.
For mutable literals ({..}, [..]) special opcodes build the object, with the contents still treated as constants as much as possible (more complex structures just follow the same building blocks):
>>> def bar(): return ['spam', 'eggs']
...
>>> dis.dis(bar)
1 0 LOAD_CONST 1 ('spam')
3 LOAD_CONST 2 ('eggs')
6 BUILD_LIST 2
9 RETURN_VALUE
The BUILD_LIST call creates the new list object, using two constant string objects.
Interesting fact: if you use a list object for a membership test (something in ['option1', 'option2', 'option3']), Python knows the list object will never be mutated and converts it to a tuple for you at compile time (a so-called peephole optimisation). The same applies to a set literal, which is converted to a frozenset() object, but only in Python 3.2 and newer. See Tuple or list when using 'in' in an 'if' clause?
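You can verify the folding by looking at the constants stored on the code object (a quick sketch; the exact contents of co_consts vary between CPython versions, but the folded tuple shows up among them):

>>> def check(x):
...     return x in ['option1', 'option2', 'option3']
...
>>> ('option1', 'option2', 'option3') in check.__code__.co_consts
True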
Note that your sample function is using booleans rather verbosely; you could just have used:
def foo(s):
    c_string = "hello"
    return s == c_string
for the exact same result, avoiding the LOAD_GLOBAL calls in Python 2 (Python 3 made True and False keywords so the values can also be stored as constants).
This came up in a recent PyCon talk.
The statement
[] = []
does nothing meaningful, but it does not throw an exception either. I have the feeling this must be due to unpacking rules. You can do tuple unpacking with lists too, e.g.,
[a, b] = [1, 2]
does what you would expect. As a logical consequence, this should also work when the number of elements to unpack is 0, which would explain why assigning to an empty list is valid. This theory is further supported by what happens when you try to assign a non-empty list to an empty list:
>>> [] = [1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
I would be happy with this explanation, if the same would also be true for tuples. If we can unpack to a list with 0 elements, we should also be able to unpack to a tuple with 0 elements, no? However:
>>> () = ()
File "<stdin>", line 1
SyntaxError: can't assign to ()
It seems like unpacking rules are not applied for tuples as they are for lists. I cannot think of any explanation for this inconsistency. Is there a reason for this behavior?
The comment by #user2357112 that this seems to be coincidence appears to be correct. The relevant part of the Python source code is in Python/ast.c:
switch (e->kind) {
    /* several cases snipped */
    case List_kind:
        e->v.List.ctx = ctx;
        s = e->v.List.elts;
        break;
    case Tuple_kind:
        if (asdl_seq_LEN(e->v.Tuple.elts)) {
            e->v.Tuple.ctx = ctx;
            s = e->v.Tuple.elts;
        }
        else {
            expr_name = "()";
        }
        break;
    /* several more cases snipped */
}
/* Check for error string set by switch */
if (expr_name) {
    char buf[300];
    PyOS_snprintf(buf, sizeof(buf),
                  "can't %s %s",
                  ctx == Store ? "assign to" : "delete",
                  expr_name);
    return ast_error(c, n, buf);
}
Tuples have an explicit check that the length is not zero, and raise an error when it is; lists have no such check, so no exception is raised.
I don't see any particular reason for allowing assignment to an empty list when it is an error to assign to an empty tuple, but perhaps there's some special case that I'm not considering. I'd suggest that this is probably a (trivial) bug and that the behaviors should be the same for both types.
I decided to try to use dis to figure out what's going on here, when I tripped over something curious:
>>> def foo():
...     [] = []
...
>>> dis.dis(foo)
2 0 BUILD_LIST 0
3 UNPACK_SEQUENCE 0
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
>>> def bar():
...     () = ()
...
File "<stdin>", line 2
SyntaxError: can't assign to ()
Somehow the Python compiler special-cases an empty tuple on the LHS. This behavior deviates from the specification, which states:
Assignment of an object to a single target is recursively defined as follows.
...
If the target is a target list enclosed in parentheses or in square brackets: The object must be an iterable with the same number of items as there are targets in the target list, and its items are assigned, from left to right, to the corresponding targets.
So it looks like you've found a legitimate, although ultimately inconsequential, bug in CPython (2.7.8 and 3.4.1 tested).
IronPython 2.6.1 exhibits the same difference, but Jython 2.7b3+ has a stranger behavior, with () = () starting a statement with seemingly no way to end it.
It's a bug.
http://bugs.python.org/issue23275
However, it seems to be harmless so I doubt it would get fixed for fear of breaking working code.
“Assigning to a list” is the wrong way to think about it.
In all cases you are unpacking: the Python interpreter creates an unpacking instruction for all three ways of writing it, and there are no lists or tuples involved on the left-hand side (code courtesy of /u/old-man-prismo):
>>> def f():
...     iterable = [1, 2]
...     a, b = iterable
...     (c, d) = iterable
...     [e, f] = iterable
...
>>> from dis import dis
>>> dis(f)
2 0 LOAD_CONST 1 (1)
3 LOAD_CONST 2 (2)
6 BUILD_LIST 2
9 STORE_FAST 0 (iterable)
3 12 LOAD_FAST 0 (iterable)
15 UNPACK_SEQUENCE 2
18 STORE_FAST 1 (a)
21 STORE_FAST 2 (b)
4 24 LOAD_FAST 0 (iterable)
27 UNPACK_SEQUENCE 2
30 STORE_FAST 3 (c)
33 STORE_FAST 4 (d)
5 36 LOAD_FAST 0 (iterable)
39 UNPACK_SEQUENCE 2
42 STORE_FAST 5 (e)
45 STORE_FAST 6 (f)
48 LOAD_CONST 0 (None)
51 RETURN_VALUE
As you can see, all three statements are exactly the same.
What unpacking does now is basically:
_iterator = iter(some_iterable)
a = next(_iterator)
b = next(_iterator)
for superfluous_element in _iterator:
    # this only happens if there’s something left
    raise ValueError('too many values to unpack')
Analogously for more or fewer names on the left side.
Now as #blckknght said: The compiler for some reason checks if the left hand side is an empty tuple and disallows that, but not if it’s an empty list.
It’s only consistent and logical to allow assigning to 0 names: why not? You basically just assert that the iterable on the right-hand side is empty. That opinion also seems to have emerged as the consensus in the bug report #gecko mentioned: let’s allow () = iterable.
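A couple of quick examples of that “assert the right-hand side is empty” reading (a sketch; the failing case was already shown above):

>>> [] = ()          # fine: the right-hand side yields zero items
>>> [] = iter([])    # also fine: any empty iterable will do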
This is more of a conceptual question. I recently saw a piece of code in Python (it worked in 2.7, and it might have run in 2.5 as well) in which a for loop used the same name for both the list being iterated over and the item in the list, which strikes me as both bad practice and something that shouldn't work at all.
For example:
x = [1,2,3,4,5]
for x in x:
    print x
print x
Yields:
1
2
3
4
5
5
Now, it makes sense to me that the last value printed would be the last value assigned to x from the loop, but I fail to understand why you'd be able to use the same variable name for both your parts of the for loop and have it function as intended. Are they in different scopes? What's going on under the hood that allows something like this to work?
What does dis tell us:
Python 3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dis import dis
>>> dis("""x = [1,2,3,4,5]
... for x in x:
...     print(x)
... print(x)""")
1 0 LOAD_CONST 0 (1)
3 LOAD_CONST 1 (2)
6 LOAD_CONST 2 (3)
9 LOAD_CONST 3 (4)
12 LOAD_CONST 4 (5)
15 BUILD_LIST 5
18 STORE_NAME 0 (x)
2 21 SETUP_LOOP 24 (to 48)
24 LOAD_NAME 0 (x)
27 GET_ITER
>> 28 FOR_ITER 16 (to 47)
31 STORE_NAME 0 (x)
3 34 LOAD_NAME 1 (print)
37 LOAD_NAME 0 (x)
40 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
43 POP_TOP
44 JUMP_ABSOLUTE 28
>> 47 POP_BLOCK
4 >> 48 LOAD_NAME 1 (print)
51 LOAD_NAME 0 (x)
54 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
57 POP_TOP
58 LOAD_CONST 5 (None)
61 RETURN_VALUE
The key bits are sections 2 and 3 - we load the value out of x (24 LOAD_NAME 0 (x)) and then we get its iterator (27 GET_ITER) and start iterating over it (28 FOR_ITER). Python never goes back to load the iterator again.
(Aside: it wouldn't make any sense to do so, since it already has the iterator, and, as Abhijit points out in his answer, section 7.3 of Python's specification actually requires this behavior.)
When the name x gets overwritten to point at each value inside the list formerly known as x, Python doesn't have any problem finding the iterator, because it never needs to look at the name x again to finish the iteration protocol.
Using your example code as the core reference
x = [1,2,3,4,5]
for x in x:
    print x
print x
I would like you to refer to section 7.3, "The for statement", in the manual.
Excerpt 1
The expression list is evaluated once; it should yield an iterable
object. An iterator is created for the result of the expression_list.
What this means is that your variable x, which is a symbolic name for the list object [1,2,3,4,5], is evaluated to an iterable object. Even if the variable (the symbolic reference) changes its allegiance, since the expression-list is not evaluated again, there is no impact on the iterable object that has already been evaluated and generated.
Note
Everything in Python is an object and has an identity, attributes, and methods.
A variable is a symbolic name, a reference to one and only one object at any given instant.
A variable can change its allegiance at run time, i.e. it can be made to refer to some other object.
Excerpt 2
The suite is then executed once for each item provided by the
iterator, in the order of ascending indices.
Here the suite is executed against the iterator, not against the expression-list. So, for each iteration, the iterator is asked to yield the next item, instead of the original expression-list being consulted again.
It is necessary for it to work this way, if you think about it. The expression for the sequence of a for loop could be anything:
binaryfile = open("file", "rb")
for byte in binaryfile.read(5):
    ...
We can't re-evaluate that expression on each pass through the loop; otherwise, here we'd end up reading from the next batch of 5 bytes the second time. Naturally, Python must in some way store the result of the expression privately before the loop begins.
Are they in different scopes?
No. To confirm this you could keep a reference to the original scope dictionary (locals()) and notice that you are in fact using the same variables inside the loop:
x = [1,2,3,4,5]
loc = locals()
for x in x:
    print locals() is loc  # True
    print loc["x"]  # 1
    break
What's going on under the hood that allows something like this to work?
Sean Vieira showed exactly what is going on under the hood, but to describe it in more readable python code, your for loop is essentially equivalent to this while loop:
it = iter(x)
while True:
    try:
        x = it.next()
    except StopIteration:
        break
    print x
This is different from the traditional indexing approach to iteration you would see in older versions of Java, for example:
for (int index = 0; index < x.length; index++) {
    x = x[index];
    ...
}
This approach would fail when the item variable and the sequence variable are the same, because the sequence x would no longer be available to look up the next index after the first time x was reassigned to the first item.
With the former approach, however, the first line (it = iter(x)) requests an iterator object which is what is actually responsible for providing the next item from then on. The sequence that x originally pointed to no longer needs to be accessed directly.
It's the difference between a variable (x) and the object it points to (the list). When the for loop starts, Python grabs an internal reference to the object pointed to by x. It uses the object and not what x happens to reference at any given time.
If you reassign x, the for loop doesn't change. If x points to a mutable object (e.g., a list) and you change that object (e.g., delete an element) results can be unpredictable.
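As a quick illustration of that last point (a sketch, not from the original answer): removing items from the list while looping over it makes the iterator skip elements, because it keeps advancing its internal index over the now-shorter list.

items = [1, 2, 3, 4, 5]
for item in items:
    if item == 2:
        items.remove(item)   # mutate the list being iterated
    print(item)
# prints 1, 2, 4, 5 -- the value 3 is skipped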
Basically, the for loop takes in the list x and, storing that as a temporary value, reassigns x to each item in that temporary value. Thus, x ends up as the last value in the list.
>>> x = [1, 2, 3]
>>> [x for x in x]
[1, 2, 3]
>>> x
3
>>>
Just like in this:
>>> def foo(bar):
...     return bar
...
>>> x = [1, 2, 3]
>>> for x in foo(x):
...     print x
...
1
2
3
>>>
In this example, x is stored in foo() as bar, so although x is being reassigned, it still exist(ed) in foo() so that we could use it to trigger our for loop.
x no longer refers to the original x list, so there's no confusion. Basically, Python remembers that it's iterating over the original x list, but as soon as you start assigning the iteration values (0, 1, 2, etc.) to the name x, that name no longer refers to the original list. The name gets reassigned to the iteration value.
In [1]: x = range(5)
In [2]: x
Out[2]: [0, 1, 2, 3, 4]
In [3]: id(x)
Out[3]: 4371091680
In [4]: for x in x:
...: print id(x), x
...:
140470424504688 0
140470424504664 1
140470424504640 2
140470424504616 3
140470424504592 4
In [5]: id(x)
Out[5]: 140470424504592
A generator function can be defined by putting the yield keyword in the function’s body:
def gen():
    for i in range(10):
        yield i
How to define an empty generator function?
The following code doesn’t work, since Python cannot know that it is supposed to be a generator function instead of a normal function:
def empty():
    pass
I could do something like this:
def empty():
    if False:
        yield
But that would be very ugly. Is there a nicer way?
You can use return once in a generator; it stops iteration without yielding anything, and thus provides an explicit alternative to letting the function run out of scope. So use yield to turn the function into a generator, but precede it with return to terminate the generator before yielding anything.
>>> def f():
...     return
...     yield
...
>>> list(f())
[]
I'm not sure it's that much better than what you have -- it just replaces a no-op if statement with a no-op yield statement. But it is more idiomatic. Note that just using yield doesn't work.
>>> def f():
...     yield
...
>>> list(f())
[None]
Why not just use iter(())?
This question asks specifically about an empty generator function. For that reason, I take it to be a question about the internal consistency of Python's syntax, rather than a question about the best way to create an empty iterator in general.
If question is actually about the best way to create an empty iterator, then you might agree with Zectbumo about using iter(()) instead. However, it's important to observe that iter(()) doesn't return a function! It directly returns an empty iterable. Suppose you're working with an API that expects a callable that returns an iterable each time it's called, just like an ordinary generator function. You'll have to do something like this:
def empty():
    return iter(())
(Credit should go to Unutbu for giving the first correct version of this answer.)
Now, you may find the above clearer, but I can imagine situations in which it would be less clear. Consider this example of a long list of (contrived) generator function definitions:
def zeros():
    while True:
        yield 0

def ones():
    while True:
        yield 1

...
At the end of that long list, I'd rather see something with a yield in it, like this:
def empty():
    return
    yield
or, in Python 3.3 and above (as suggested by DSM), this:
def empty():
    yield from ()
The presence of the yield keyword makes it clear at the briefest glance that this is just another generator function, exactly like all the others. It takes a bit more time to see that the iter(()) version is doing the same thing.
It's a subtle difference, but I honestly think the yield-based functions are more readable and maintainable.
See also this great answer from user3840170 that uses dis to show another reason why this approach is preferable: it emits the fewest instructions when compiled.
iter(())
You don't require a generator. C'mon guys!
Python 3.3 (because I'm on a yield from kick, and because #senderle stole my first thought):
>>> def f():
...     yield from ()
...
>>> list(f())
[]
But I have to admit, I'm having a hard time coming up with a use case for this for which iter([]) or (x)range(0) wouldn't work equally well.
Another option is:
(_ for _ in ())
Like #senderle said, use this:
def empty():
    return
    yield
I’m writing this answer mostly to share another justification for it.
One reason for choosing this solution above the others is that it is optimal as far as the interpreter is concerned.
>>> import dis
>>> def empty_yield_from():
...     yield from ()
...
>>> def empty_iter():
...     return iter(())
...
>>> def empty_return():
...     return
...     yield
...
>>> def noop():
...     pass
...
>>> dis.dis(empty_yield_from)
2 0 LOAD_CONST 1 (())
2 GET_YIELD_FROM_ITER
4 LOAD_CONST 0 (None)
6 YIELD_FROM
8 POP_TOP
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
>>> dis.dis(empty_iter)
2 0 LOAD_GLOBAL 0 (iter)
2 LOAD_CONST 1 (())
4 CALL_FUNCTION 1
6 RETURN_VALUE
>>> dis.dis(empty_return)
2 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
>>> dis.dis(noop)
2 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
As we can see, the empty_return has exactly the same bytecode as a regular empty function; the rest perform a number of other operations that don’t change the behaviour anyway. The only difference between empty_return and noop is that the former has the generator flag set:
>>> dis.show_code(noop)
Name: noop
Filename: <stdin>
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
>>> dis.show_code(empty_return)
Name: empty_return
Filename: <stdin>
Argument count: 0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: OPTIMIZED, NEWLOCALS, GENERATOR, NOFREE
Constants:
0: None
The above disassembly is outdated as of CPython 3.11, but empty_return still comes out on top, with only two more opcodes (four bytes) than a no-op function:
>>> dis.dis(empty_yield_from)
1 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
2 6 LOAD_CONST 1 (())
8 GET_YIELD_FROM_ITER
10 LOAD_CONST 0 (None)
>> 12 SEND 3 (to 20)
14 YIELD_VALUE
16 RESUME 2
18 JUMP_BACKWARD_NO_INTERRUPT 4 (to 12)
>> 20 POP_TOP
22 LOAD_CONST 0 (None)
24 RETURN_VALUE
>>> dis.dis(empty_iter)
1 0 RESUME 0
2 2 LOAD_GLOBAL 1 (NULL + iter)
14 LOAD_CONST 1 (())
16 PRECALL 1
20 CALL 1
30 RETURN_VALUE
>>> dis.dis(empty_return)
1 0 RETURN_GENERATOR
2 POP_TOP
4 RESUME 0
2 6 LOAD_CONST 0 (None)
8 RETURN_VALUE
>>> dis.dis(noop)
1 0 RESUME 0
2 2 LOAD_CONST 0 (None)
4 RETURN_VALUE
Of course, the strength of this argument is very dependent on the particular implementation of Python in use; a sufficiently smart alternative interpreter may notice that the other operations amount to nothing useful and optimise them out. However, even if such optimisations are present, they require the interpreter to spend time performing them and to safeguard against optimisation assumptions being broken, like the iter identifier at global scope being rebound to something else (even though that would most likely indicate a bug if it actually happened). In the case of empty_return there is simply nothing to optimise, as bytecode generation stops after a return statement, so even the relatively naïve CPython will not waste time on any spurious operations.
Must it be a generator function? If not, how about
def f():
    return iter(())
The "standard" way to make an empty iterator appears to be iter([]).
I suggested to make [] the default argument to iter(); this was rejected with good arguments, see http://bugs.python.org/issue25215
- Jurjen
I want to give a class based example since we haven't had any suggested yet. This is a callable iterator that generates no items. I believe this is a straightforward and descriptive way to solve the issue.
class EmptyGenerator:
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration
>>> list(EmptyGenerator())
[]
generator = (item for item in [])
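For what it's worth, a quick check (a sketch) that this generator expression indeed yields nothing:

>>> generator = (item for item in [])
>>> list(generator)
[]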
Nobody has mentioned it yet, but calling the built-in function zip with no arguments returns an empty iterator:
>>> it = zip()
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration