I read here the following example:
>>> def double_inputs():
...     while True:        # Line 1
...         x = yield      # Line 2
...         yield x * 2    # Line 3
...
>>> gen = double_inputs()
>>> next(gen) # Run up to the first yield
>>> gen.send(10) # goes into 'x' variable
If I understand the above correctly, it seems to imply that Python actually waits until next(gen) to "run up to" Line 2 in the body of the function. Put another way, the interpreter would not start executing the body of the function until we call next.
Is that actually correct?
To my knowledge, Python does not do AOT compilation, and it doesn't "look ahead" much except for parsing the code and making sure it's valid Python. Is this correct?
If the above are true, how would Python know when I invoke double_inputs() that it needs to wait until I call next(gen) before it even enters the loop while True?
Correct. Calling double_inputs never executes any of the code; it simply returns a generator object. The presence of the yield expression in the body, discovered when the def statement is compiled, changes the semantics of the def statement: it creates a generator function rather than an ordinary function, and calling that function returns a generator object instead of running the body.
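A quick interactive check (the function name is mine) makes this visible; nothing in the body runs until the generator is first advanced:
>>> def gen():
...     print("body entered")
...     yield 1
...
>>> g = gen()    # no output: the body has not started
>>> next(g)      # now the body runs up to the first yield
body entered
1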
A function that contains yield is a generator function.
When you call gen = double_inputs(), you get a generator instance as the result. You need to consume this generator by calling next.
So for your first question, it is true: the body runs only when you first call next, which executes Line 1 and pauses at the yield on Line 2.
For your second question, I don't exactly get your point, but: when you define the function, Python already knows what you are defining; it doesn't need to look ahead when running it.
For your third question, the key is the yield keyword: its mere presence in the body is what marks the function as a generator function.
A generator function is de jure a function, but de facto it is an iterator, i.e. a class (with __next__(), __iter__(), and some other methods implemented).
In other words, it is a class disguised as a function.
This means that "calling" this function in reality creates an instance of this class, which explains why the "called function" initially does nothing. This is the answer to your 3rd question.
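A hedged sketch of that idea (names are mine): a generator function next to a roughly equivalent hand-written iterator class.
def count_up(n):            # generator function
    i = 0
    while i < n:
        yield i
        i += 1

class CountUp:              # the same thing spelled out as a class
    def __init__(self, n):
        self.i, self.n = 0, n
    def __iter__(self):
        return self
    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i - 1

print(list(count_up(3)))    # [0, 1, 2]
print(list(CountUp(3)))     # [0, 1, 2]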
The answer to your 1st question is surprisingly no.
Instances always wait for their methods to be called, and the __next__() method (launched indirectly by the next() built-in function) is not the only method of generators. Another such method is .send(), and you may use gen.send(None) instead of your next(gen).
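For example, with the double_inputs generator from the question, a fresh generator can be primed with send(None) exactly as with next():
>>> gen = double_inputs()
>>> gen.send(None)   # legal on a just-started generator; same effect as next(gen)
>>> gen.send(10)
20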
The answer to your 2nd question is no. The Python interpreter by no means "looks ahead", and there are no exceptions, including your
... except for parsing the code and making sure it's valid Python.
Or the answer to this question is yes, if you mean “parsing only up to the next command”. ;-)
I am creating a generator in Python 3 which may yield one value or more.
The condition I want is this: I loop over the iterator starting at the second value and onward, running an API-request function with each value. If the generator yields only a single value, the for loop and its corresponding code should not be executed. If the generator yields more than one value, the function inside the for loop should be executed starting from the second value of the generator and onward.
The reason I want to start at the second value is that the first value has already been used for the API request and its result has been stored.
My question relates to a generator that produces a single value.
I give a code example below (I simplified the API request to a print() call):
def iterexample(): # creating a simple generator that yields a single value
    yield 0

print(0) # the API request with the first value, already made

iter = iterexample()
next(iter) # generator is consumed once here
for i in iter: #1 generator is exhausted
    print(i, ' inside loop') #2 this is skipped because the generator is exhausted
#3 rest of the code outside the loop will be executed
It returns what I expected: only 0 is printed, not "0 inside loop"
0
My questions are:
1. Is this the safest and most pythonic way to do it? Will it raise any error?
2. Will it produce an infinite loop? I am very afraid that it will result in an infinite loop of API requests.
3. Please review my #1 ~ #3 comments in the code above: is my understanding correct?
Thanks for the response and the help. Cheers!
1. Is this the safest and most pythonic way to do it? Will it raise any error?
Once a generator is exhausted, it raises a StopIteration exception every time it is asked for a new value. A for loop handles this by terminating when the exception is raised, which makes it safe to pass an exhausted generator to a for loop.
However, your code calls next directly, and is therefore safe only if it also handles StopIteration. In that case you would need to document that the generator provided must produce one or more values, or make the code tolerant of the empty case. If the generator produced no values, you would get an error, e.g.:
def iterexample():
    while False:
        yield 0

print(next(iterexample()))
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(next(iterexample()))
StopIteration
To guard against empty generators you can use the optional second (default) argument to next:
print(next(iterexample(), "default"))
default
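Putting both pieces together, here is a hedged sketch of the asker's pattern (function and variable names are mine) that also survives an empty generator:
def values():               # stand-in for the real generator: may yield 0, 1, or more items
    yield 0

it = values()
first = next(it, None)      # None signals "the generator was empty"
if first is not None:
    print(first, 'used for the first API request')
for i in it:                # runs only if more values remain
    print(i, ' inside loop')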
2. Will it produce an infinite loop? I am very afraid that it will result in an infinite loop of API requests.
Again this depends on the generator. Generators do not need to have an end value. You can easily define non-ending generators like this:
def iterexample():
    i = 0
    while True:
        yield i
        i += 1

for i in iterexample(): # This never ends.
    print(i)
If this is a concern for you, one way to guard against never-ending output is islice, which cuts off your generator after a given number of values has been consumed:
from itertools import islice

for i in islice(iterexample(), 5):
    print(i)
0
1
2
3
4
If I understand your issue correctly: you have a first value that you need for one case, and the rest for another case.
I would recommend building a structure that fits your needs, something like this:
class MyStructure:
    def __init__(self, initial_data):
        if not initial_data:
            # Make sure your data structure is valid before using it
            raise ValueError("initial_data is empty")
        self.initial_data = initial_data

    @property
    def cached_value(self):
        return self.initial_data[0]

    @property
    def all_but_first(self):
        return self.initial_data[1:]
In this case, you make sure your data is valid, and you can give your accessors names that reflect what those values represent. In this example I gave them dummy names, but you should try to choose names that are relevant to your business.
Such a class could be used this way (names changed just to illustrate how method naming can document your code):
tasks = TaskQueue(get_input_in_some_way())
advance_task_status(tasks.current_task)
for pending_task in tasks.pending_tasks:
    log_remaining_time(pending_task)
You should first try to understand what your data structure represents, then build a useful API that hides the implementation and reflects your business.
I had some trouble finding my error: I had written
myfile.close
instead of
myfile.close()
I am surprised and somewhat unhappy that Python did not object; how come? By the way, the file was NOT closed.
(Python 2.7 on Ubuntu)
In Python, methods are first-class objects, so you can write something like this:
my_close = myfile.close
my_close()
Since expressions don't have to be assigned to some variable
2 + 3
a simple
myfile.close
is valid, too.
That is because myfile.close evaluates to a (bound) method object.
If you do print(myfile.close) you will get an output like:
<built-in method close of file object at 0x7fb672e0f540>
Python file objects have a built-in close() method. In your case, myfile.close evaluates to a reference to that method object, but the method is not being called.
This example may be helpful for understanding this:
>>> def test():
...     print 'test print'
...
>>> a = test
>>> print(a)
<function test at 0x7f6ff1f6ab90>
>>>
myfile.close is a member function of a file-like object. Your code just 'mentions' this function but doesn't call it (which is why the file wasn't closed).
It's the same as doing this:
a = 5
a # OK but does nothing

def test():
    return 256

test # also not an error, but useless in this case
Now, you may say, why even allow this if it's totally useless? Well, not quite. With this you can pass functions as arguments to other functions, for example.
def MyMap(function, iterable):
    for x in iterable:
        yield function(x)

print(list(MyMap(str, range(5))))
fp.close is a method of the file object.
>>> fp.close
<built-in method close of file object at 0xb749c1d8>
fp.close() is a call of that method, using ():
>>> fp.close()
You can understand this in more detail with the following demo. We define a test function and look at both test and test() to see the difference.
>>> def test():
...     return 1
...
>>> test
<function test at 0xb742bb54>
>>> test()
1
Your code does not throw an error because you don't have to assign a name to the result of an expression (edit: or statement). Python allows this because it could be that you only need the expression/statement for some sort of side effect but not the actual return value.
This is often seen when advancing an iterator without assigning a name to the yielded object, i.e. just having a line
next(myiter, None)
in the code.
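A small sketch of that idiom (names are mine): the bare call is made purely for its side effect of advancing the iterator.
def lines():
    yield "header"      # to be skipped
    yield "data"

myiter = lines()
next(myiter, None)      # discard the header; the result is deliberately unused
print(next(myiter))     # prints "data"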
If you wanted expressions with no side effects, like
1 + 1
or
myfile.close
to throw an error when no name is assigned to the result (and the expression is therefore useless), Python would have to inspect each expression statement for possible side effects. Compared to the current behavior of simply allowing these expressions, that seems needlessly expensive (and would also be very complex, if not impossible, for arbitrary expressions).
Consider how dynamic Python is. You could override what the + operator does to certain types at runtime.
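A contrived sketch (the class name is mine): even a bare a + b can hide a side effect, so Python cannot safely reject "useless-looking" expressions.
class Noisy:
    def __add__(self, other):
        print("side effect!")   # arbitrary work hidden inside +
        return 0

Noisy() + Noisy()               # a bare expression statement, yet it prints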
In the below simplified code, I would like to reuse a loop to do a preparation first and yield the result.
However, the preparation (bar()) function is never executed.
Is the yield statement changing the flow of the function?
def bar(*args, **kwargs):
    print("ENTER bar")
    pass

def foo(prepare=False):
    print("ENTER foo")
    for x in range(1, 10):
        if prepare:
            bar(x)
        else:
            yield x

foo(prepare=True)
r = foo(prepare=False)
for x in r:
    pass
Because the foo definition contains a yield, it won't run like a normal function even if you call it like one (e.g. foo(prepare=True) ).
Running foo() with whatever arguments will return a generator object, suitable for iterating through. The body of the definition won't run until you try to iterate that generator object.
The new coroutine syntax puts a keyword at the start of the definition, so that the change in nature isn't hidden inside the body of the function.
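For contrast, a minimal sketch (names are mine; requires Python 3.7+) of the newer syntax, where the marker sits at the def itself rather than somewhere in the body:
import asyncio

async def fetch():            # "async" announces the special nature up front,
    await asyncio.sleep(0)    # whereas a generator is only revealed by a yield
    return 42                 # buried somewhere in the body

print(asyncio.run(fetch()))   # prints 42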
The problem is that the presence of a yield statement turns foo into a generator function and alters its behavior. Basically this means that each call to the generator's __next__ method (e.g. via next()) executes the body up to the next yield or to the end of the function (in which case a StopIteration exception is raised).
Consequently, you should make sure you iterate over the result even when the yield statement won't be reached, like this:
r = foo(prepare=True)
for x in r:
    pass
In this case the loop will terminate immediately, as no yield statement is reached.
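Alternatively, if the preparation pass never needs to be lazy, a hedged sketch of one way to avoid the surprise entirely is to split the two behaviors, so that only the yielding path is a generator (reusing bar and the range from the question; function names are mine):
def prepare_all():          # ordinary function: runs eagerly when called
    for x in range(1, 10):
        bar(x)

def numbers():              # generator function: runs lazily when iterated
    for x in range(1, 10):
        yield x

prepare_all()               # prints "ENTER bar" nine times immediately
for x in numbers():
    pass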
In my opinion, the actual explanation here is that:
Python executes generator bodies lazily!
And I'll explain:
When you call
foo(prepare=True)
just like that, nothing happens, although you might have expected bar(x) to be executed 9 times. What really happens is that nothing consumes the generator returned by foo(prepare=True), so the body, including the if, is never evaluated; it will be once you iterate over the return value of foo.
When you iterate over the return value r, Python has to execute the body, and it does, as I'll show:
Case 1
r = foo(prepare=True)
for x in r:
    pass
The output here is 'ENTER bar' 9 times. This means that bar is executed 9 times.
Case 2
r = foo(prepare=False)
for x in r:
    pass
In this case no 'ENTER bar' is printed, as expected.
To sum everything up, I'll say that:
There are some cases where Python performs lazy evaluation; generator bodies are one of them.
Not everything is evaluated lazily in Python,
for example:
# builds a big list and immediately discards it
sum([x*x for x in range(2000000)])

vs.

# only keeps one value at a time in memory
sum(x*x for x in range(2000000))
For more about lazy and eager evaluation in Python, continue reading here.
From PEP342:
Because generator-iterators begin execution at the top of the generator's function body, there is no yield expression to receive a value when the generator has just been created. Therefore, calling send() with a non-None argument is prohibited when the generator iterator has just started, ...
For example,
>>> def a():
...     for i in range(5):
...         print((yield i))
...
>>> g = a()
>>> g.send("Illegal")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started generator
Why is this illegal? The way I understood the use of yield here, it pauses execution of the function, and returns to that spot the next time that next() (or send()) is called. But it seems like it should be legal to print the first result of (yield i)?
Asked a different way: in what state is the generator g directly after g = a()? I assumed that it had run a() up until the first yield, and since there was a yield it returned a generator, instead of a standard synchronous object return.
So why exactly is calling send with non-None argument on a new generator illegal?
Note: I've read the answer to this question, but it doesn't really get to the heart of why it's illegal to call send (with non-None) on a new generator.
Asked a different way: in what state is the generator g directly after g = a()? I assumed that it had run a() up until the first yield, and since there was a yield it returned a generator, instead of a standard synchronous object return.
No. Right after g = a() it is right at the beginning of the function. It does not run up to the first yield until after you advance the generator once (by calling next(g)).
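Once you advance it with next(g), the generator is paused at a yield and send() becomes legal, using the a() from the question:
>>> g = a()
>>> next(g)              # run to the first yield; it yields 0
0
>>> g.send("legal now")  # resumes the paused (yield i) expression
legal now
1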
This is what it says in the quote you included in your question: "Because generator-iterators begin execution at the top of the generator's function body..." It also says it in PEP 255, which introduced generators:
When a generator function is called, the actual arguments are bound to function-local formal argument names in the usual way, but no code in the body of the function is executed.
Note that it does not matter whether the yield statement is actually executed. The mere occurrence of yield inside the function body makes the function a generator, as documented:
Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
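A quick check with the standard inspect module (the function f is mine) shows that even an unreachable yield changes what def produces:
>>> import inspect
>>> def f():
...     if False:
...         yield    # never executed, but its mere presence matters
...
>>> inspect.isgeneratorfunction(f)
True
>>> f()
<generator object f at 0x...>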
I was working with generator functions and private functions of a class, and I am wondering:
Why, when yielding (which in my case happened by accident) in __someFunc, does that function just appear not to be called from within __someGenerator? Also, what is the terminology to use when referring to these aspects of the language?
Can the Python interpreter warn about such instances?
Below is an example snippet of my scenario.
class someClass():
    def __init__(self):
        pass

    # Copy and paste mistake where yield ended up in a regular function
    def __someFunc(self):
        print "hello"
        #yield True #if yielding in this function it isn't called

    def __someGenerator(self):
        for i in range(0, 10):
            self.__someFunc()
            yield True
        yield False

    def someMethod(self):
        func = self.__someGenerator()
        while func.next():
            print "next"

sc = someClass()
sc.someMethod()
I got burned on this and spent some time trying to figure out why a function just wasn't getting called. I finally discovered I was yielding in a function I didn't mean to.
A "generator" isn't so much a language feature, as a name for functions that "yield." Yielding is pretty much always legal. There's not really any way for Python to know that you didn't "mean" to yield from some function.
This PEP http://www.python.org/dev/peps/pep-0255/ talks about generators, and may help you understand the background better.
I sympathize with your experience, but compilers can't figure out what you "meant for them to do", only what you actually told them to do.
I'll try to answer the first of your questions.
A regular function, when called like this:
val = func()
executes its inside statements until it ends or a return statement is reached. Then the return value of the function is assigned to val.
If the compiler recognizes the function to actually be a generator and not a regular function (it does that by looking for yield statements inside the function: if there's at least one, it's a generator), calling it the same way as above has different consequences. Upon calling func(), no code inside the function is executed, and a special <generator> value is assigned to val. Then, the first time you call val.next(), the actual statements of func are executed until a yield or return is encountered, at which point execution stops, the yielded value is returned, and the generator waits for another call to val.next().
That's why, in your example, function __someFunc didn't print "hello" -- its statements were not executed, because you haven't called self.__someFunc().next(), but only self.__someFunc().
Unfortunately, I'm pretty sure there's no built-in warning mechanism for programming errors like yours.
Python doesn't know whether you want to create a generator object for later iteration or call a function. But python isn't your only tool for seeing what's going on with your code. If you're using an editor or IDE that allows customized syntax highlighting, you can tell it to give the yield keyword a different color, or even a bright background, which will help you find your errors more quickly, at least. In vim, for example, you might do:
:syntax keyword Yield yield
:highlight Yield ctermbg=yellow guibg=yellow ctermfg=blue guifg=blue
Those are horrendous colors, by the way. I recommend picking something better. Another option, if your editor or IDE won't cooperate, is to set up a custom rule in a code checker like pylint. An example from pylint's source tarball:
from pylint.interfaces import IRawChecker
from pylint.checkers import BaseChecker

class MyRawChecker(BaseChecker):
    """check for line continuations with '\' instead of using triple
    quoted string or parenthesis
    """

    __implements__ = IRawChecker

    name = 'custom_raw'
    msgs = {'W9901': ('use \\ for line continuation',
                      ('Used when a \\ is used for a line continuation instead'
                       ' of using triple quoted string or parenthesis.')),
            }
    options = ()

    def process_module(self, stream):
        """process a module

        the module's content is accessible via the stream object
        """
        for (lineno, line) in enumerate(stream):
            if line.rstrip().endswith('\\'):
                self.add_message('W9901', line=lineno)

def register(linter):
    """required method to auto register this checker"""
    linter.register_checker(MyRawChecker(linter))
The pylint manual is available here: http://www.logilab.org/card/pylint_manual
And vim's syntax documentation is here: http://www.vim.org/htmldoc/syntax.html
Because the return keyword is applicable in both generator functions and regular functions, there's nothing you could possibly check (as @Christopher mentions). The return keyword in a generator indicates that a StopIteration exception should be raised.
If you try to return with a value from within a generator (which doesn't make sense under that model, since return just means "stop iteration"), the compiler will complain at compile time; this may catch some copy-and-paste mistakes. (Note this is Python 2 behavior: since Python 3.3, return with a value is allowed in a generator and becomes the value attached to the StopIteration.)
>>> def foo():
...     yield 12
...     return 15
...
  File "<stdin>", line 3
SyntaxError: 'return' with argument inside generator
I personally just advise against copy and paste programming. :-)
From the PEP:
Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.
We do this.
Generators have "generate" or "gen" in their names, and they have a yield statement in the body. That's pretty easy to check visually, since no method is much over 20 lines of code.
Other methods don't have "gen" in their name.
Also, we never use __ (double underscore) names under any circumstances: 32,000 lines of code, no __ names.
The "generator vs. non-generator" method function is entirely a design question. What did the programmer "intend" to happen. The compiler can't easily validate your intent, it can only validate what you actually typed.