Understanding Python 2.7 email.feedparser FeedParser __init__ function

(So I'm trying to learn Python. I figured it would be good to read code by people better than me. I decided to read through the email module...)
The __init__ method of the FeedParser class in the email.feedparser module is defined as:
def __init__(self, _factory=message.Message):
    """_factory is called with no arguments to create a new message obj"""
    self._factory = _factory
    self._input = BufferedSubFile()
    self._msgstack = []
    self._parse = self._parsegen().next
    self._cur = None
    self._last = None
    self._headersonly = False
The line I'm having trouble with is:
self._parse = self._parsegen().next
which I think should mean: set the attribute self._parse to the next attribute of the return value of the method self._parsegen().
As far as I can tell, self._parsegen(), when called during __init__(), will first call self._new_message(), which sets/adds values on self._cur, self._last, and self._msgstack. It will then assign an empty list to the local variable headers and start iterating over the self._input object. I think the first value for line will be a NeedMoreData object. Since the NeedMoreData class just extends object, it should have no attribute or method named next. So does next just refer back to the iterator (self._input)?
Is there any way to have a look at this in the interpreter so that I can step through each line of the script?

So does next just refer back to the iterator (self._input)?
next does refer to the generator. Since the _parsegen() method uses yield, calling it returns a generator object. Consider the following simple example (from IPython):
In [1]: def a():
   ...:     yield 1
   ...:     yield 2
   ...:

In [2]: a()
Out[2]: <generator object a at 0x1a56550>

In [3]: a().next
Out[3]: <method-wrapper 'next' of generator object at 0x1a567d0>

In [4]: a().next()
Out[4]: 1
So, yes, you are mostly right: next is a method of the generator object returned by self._parsegen(), and each call to it produces the generator's next value.
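The FeedParser line is just saving that bound method so it can be called later with no arguments. A rough sketch of the same pattern (a hypothetical generator, Python 2 syntax to match the question):

def parsegen():
    yield 'first'
    yield 'second'

parse = parsegen().next  # bind the generator's next method; nothing runs yet
print parse()            # the body starts executing now: prints 'first'
print parse()            # resumes after the first yield: prints 'second'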
Is there any way to have a look at this in the interpreter so that I can step through each line of the script?
You can use pdb for that.
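For example, a minimal session (assuming Python 2.7's email package; the message text is just a stand-in):

import pdb
import email.feedparser

pdb.set_trace()                    # execution pauses here in the debugger;
f = email.feedparser.FeedParser()  #   use 's' to step into calls,
f.feed('Subject: hi\r\n\r\nbody')  #   'n' for the next line, 'l' to list source
msg = f.close()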

The next method is how you produce the next value from a Python iterator or generator. The easiest way to think about this is to rewrite a for loop.
You have a really easy syntax for looping over a list:
for element in list:
    print element
which will produce an element on each iteration. But under the hood, Python is actually doing something akin to this:
iterator = iter(list)
while True:
    element = iterator.next()
    # do something with element (e.g. print it)
    print element
When the iterator is exhausted (has no more items), it raises the StopIteration exception, which is how for loops and other constructs that consume iterators know when to stop. (So the previous code snippet should really be wrapped in a try/except block, but I figured it would be clearer to read without it.)
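For completeness, the explicit version looks like this:

iterator = iter(list)
try:
    while True:
        element = iterator.next()
        print element
except StopIteration:
    pass  # the iterator is exhausted, so we simply stop looping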
You can read about the iterator protocol in the Python docs. Basically, anything can be iterable if it defines __iter__ and returns an iterator: an object that defines both __iter__ and next.
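To make the protocol concrete, a hand-rolled iterator might look like this (an illustrative sketch of my own, in Python 2 style where the method is spelled next rather than __next__):

class Countdown(object):
    def __init__(self, start):
        self.current = start

    def __iter__(self):  # returning self makes the object its own iterator
        return self

    def next(self):      # spelled __next__ in Python 3
        if self.current <= 0:
            raise StopIteration  # signals 'no more items' to for loops
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print n              # prints 3, then 2, then 1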

Related

What happens if we loop over an exhausted generator in Python?

I am creating a generator in Python 3 which may yield a single value or more.
The condition I want is this: I want to loop over the iterator starting at the second value, running an API request function with each value. If the generator yields only a single value, the for loop and its body should not execute. If the generator yields more than one value, the function inside the for loop should be executed starting from the second value of the generator onward.
The reason I want to start at the second value is that the first value has already been used for an API request, and its result has been stored.
My question concerns a generator that produces a single value.
I give a code example below (the API request is simplified to a print() call):
def iterexample():  # creating a simple generator that returns a single value
    yield 0

print(0)
iter = iterexample()
next(iter)  # generator is consumed once here
for i in iter:  # 1 generator is exhausted
    print(i, ' inside loop')  # 2 this is skipped because the generator is exhausted
# 3 rest of the code outside the loop will be executed
It returns what I expected: only 0 is printed, not "0 inside loop"
0
My questions are:
1. Is it the safest and most Pythonic way to do that? Will it raise any error?
2. Will it produce an infinite loop? I am very afraid that it will result in an infinite loop of API requests.
3. Please review my #1 ~ #3 comments in the code above; is my understanding correct?
Thanks for the response and the help. Cheers!
1. Is it the safest and most Pythonic way to do that? Will it raise any error?
Once a generator is exhausted, it raises StopIteration every time it is asked for a new value. For loops handle this by terminating when that exception is raised, which makes it safe to pass an exhausted generator to a for loop.
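You can see this with a quick sketch (my own illustration):

gen = (n for n in range(2))
print(list(gen))  # [0, 1] -- this consumes the generator completely
for n in gen:     # the exhausted generator raises StopIteration at once,
    print(n)      # which the for loop swallows, so the body never runs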
However, your code calls next directly, and is therefore safe only if it also handles the StopIteration exception. In that case you would need to document that the generator provided must produce one or more values, or make the code tolerant of the empty case. If the generator yields no values, you get an error, e.g.
def iterexample():
    while False:
        yield 0

print(next(iterexample()))

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(next(iterexample()))
StopIteration
To guard against empty generators you can use the optional second argument to next, a default value:
print(next(iterexample(), "default"))
default
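Applied to your case, the whole pattern might look like this sketch (store_first_result and request_api are hypothetical placeholders for your stored first request and the call made inside the loop):

gen = iterexample()
first = next(gen, None)        # never raises, even if the generator yields nothing
if first is not None:
    store_first_result(first)  # hypothetical: the already-fetched first value
for value in gen:              # starts at the second value; runs zero times
    request_api(value)         #   if nothing is left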
2. Will it produce an infinite loop? I am very afraid that it will result in an infinite loop of API requests.
Again, this depends on the generator. Generators do not need to have a final value; you can easily define a never-ending generator like this:
def iterexample():
    i = 0
    while True:
        yield i
        i += 1

for i in iterexample():  # This never ends.
    print(i)
If this is a concern for you, one way to guard against never-ending output is itertools.islice, which cuts off your generator after a given number of values has been consumed:
from itertools import islice

for i in islice(iterexample(), 5):
    print(i)
0
1
2
3
4
If I understand your issue correctly: you have a first value that you need for one case, and the rest for another case.
I would recommend building a structure that fits your needs, something like this:
class MyStructure:
    def __init__(self, initial_data):
        if not initial_data:
            # Make sure your data structure is valid before using it
            raise ValueError("initial_data is empty")
        self.initial_data = initial_data

    @property
    def cached_value(self):
        return self.initial_data[0]

    @property
    def all_but_first(self):
        return self.initial_data[1:]
This way you make sure your data is valid, and you can give your accessors names that reflect what those values represent. In this example I gave them dummy names, but you should try to choose names that are relevant to your business.
Such a class could be used this way (names changed just to illustrate how method naming can document your code):

tasks = TaskQueue(get_input_in_some_way())
advance_task_status(tasks.current_task)
for pending_task in tasks.pending_tasks:
    log_remaining_time(pending_task)
You should first try to understand what your data structure represents, then build a useful API that hides the implementation and reflects your business.

What happens when you invoke a function that contains yield?

I read here the following example:
>>> def double_inputs():
...     while True:      # Line 1
...         x = yield    # Line 2
...         yield x * 2  # Line 3
...
>>> gen = double_inputs()
>>> next(gen)     # Run up to the first yield
>>> gen.send(10)  # goes into 'x' variable
If I understand the above correctly, it seems to imply that Python actually waits until next(gen) to "run up to" Line 2 in the body of the function. Put another way, the interpreter does not start executing the body of the function until we call next.
Is that actually correct?
To my knowledge, Python does not do AOT compilation, and it doesn't "look ahead" much except for parsing the code and making sure it's valid Python. Is this correct?
If the above are true, how would Python know when I invoke double_inputs() that it needs to wait until I call next(gen) before it even enters the loop while True?
Correct. Calling double_inputs never executes any of the code; it simply returns a generator object. The presence of the yield expression in the body, discovered when the def statement is compiled, changes the semantics of the def statement so that calling the resulting function creates a generator object rather than executing the body.
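A minimal sketch (my own example, not from the question) makes this visible: a print at the top of a generator body runs on the first next(), not when the function is called:

>>> def gen():
...     print("body started")  # runs on the first next(), not on the call below
...     yield 1
...
>>> g = gen()  # nothing is printed: the body has not started executing
>>> next(g)    # now the body runs up to the yield
body started
1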
A function that contains yield is a generator function.
When you call gen = double_inputs(), you get a generator instance as the result. You need to consume this generator by calling next.
So for your first question, it is true: nothing in the body runs until you first call next, which executes the body up to the first yield.
For your second question, I don't exactly get your point. When you define the function, Python knows what you are defining; it doesn't need to look ahead when running it.
For your third question, the key is the yield keyword.
A generator function is de iure a function, but de facto it is an iterator, i.e. a class (with __next__(), __iter__(), and some other methods implemented). In other words, it is a class disguised as a function.
It means that “calling” this function is in reality making an instance of this class, which explains why the “called function” initially does nothing. This is the answer to your 3rd question.
The answer to your 1st question is, surprisingly, no.
Instances always wait for their methods to be called, and the __next__() method (launched indirectly by the next() built-in function) is not the only method of generators. Another is .send(), and you may use gen.send(None) instead of your next(gen).
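A small sketch of that equivalence, reusing the question's generator:

>>> gen = double_inputs()
>>> gen.send(None)  # equivalent to next(gen): runs up to the first yield
>>> gen.send(10)    # resumes at 'x = yield', so x becomes 10
20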
The answer to your 2nd question is no. The Python interpreter by no means "looks ahead", and there are no exceptions, including your
... except for parsing the code and making sure it's valid Python.
Or the answer to this question is yes, if you mean "parsing only up to the next command". ;-)

Questions about "yield from" and "next" behaviour

So I am making a generator from a list but would like to call next on it, which should just return the next item in the list. However, it returns the same object, i.e. the whole piece of code is run again instead of just returning the next yielded value. The example below shows the expected behaviour when looping through the list, but then next returns 1 twice, whereas I would like the second call of next to return 2.
class demo:
    @property
    def mygen(self):
        a = [1, 2, 3, 4, 5]
        b = [6, 7, 8, 9, 10]
        yield from a
        yield from b

if __name__ == '__main__':
    demo1 = demo()
    print([_ for _ in demo1.mygen])
    demo2 = demo()
    print(next(demo2.mygen))
    print(next(demo2.mygen))
There's a reason I am turning a list into a generator: it is the response from an API call, and I would like to dynamically return the next item in the list, making another API call when I come to the end of that list.
Every call to the property creates a new generator. Store the generator returned by the property in a variable; then you will be able to call next on it multiple times. Change
print(next(demo2.mygen))
print(next(demo2.mygen)) # calls next on a fresh generator
to
gen = demo2.mygen
print(next(gen))
print(next(gen)) # calls next on the SAME generator
As others have pointed out, this behaviour should have you reconsider making this a property in the first place. Seeing
demo2.mygen()
makes it much more obvious that there is some dynamic stuff going on, while
demo2.mygen
gives the impression of a more static attribute producing the same object every time. You can find some more elaboration on that here.
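If you do drop the property, one possible rewrite (a sketch, not the only option) makes the generator creation explicit at the call site and then keeps advancing that one generator:

class demo:
    def mygen(self):  # a plain method: calling it visibly creates a generator
        yield from [1, 2, 3, 4, 5]
        yield from [6, 7, 8, 9, 10]

demo2 = demo()
gen = demo2.mygen()  # create the generator once...
print(next(gen))     # 1
print(next(gen))     # ...and keep advancing the SAME one: 2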

Why does Python allow mentioning a method without calling it?

I had some trouble finding my error: I had written
myfile.close
instead of
myfile.close()
I am surprised and somewhat unhappy that Python did not object; how come? By the way, the file was NOT closed.
(Python 2.7 on Ubuntu)
In Python, methods are first-class objects; you can write something like this:
my_close = myfile.close
my_close()
Since expressions don't have to be assigned to some variable
2 + 3
a simple
myfile.close
is valid, too.
That is because myfile.close evaluates to a bound method object.
If you do print(myfile.close) you will get an output like:
<built-in method close of file object at 0x7fb672e0f540>
A Python file object has a built-in method close() for closing the file. In your case, myfile.close evaluates to a reference to that method object, but the method is never called.
This example may be helpful to understand this:
>>> def test():
...     print 'test print'
...
>>> a = test
>>> print(a)
<function test at 0x7f6ff1f6ab90>
>>>
myfile.close is a member function of a file-like object. Your code just 'mentions' this function, but doesn't call it (this is why the file wasn't closed).
It's the same as doing this:
a = 5
a  # OK but does nothing

def test():
    return 256

test  # also not an error, but useless in this case
Now, you may ask: why even allow this if it's totally useless? Well, it's not quite useless. With this you can pass functions as arguments to other functions, for example:
def MyMap(function, iterable):
    for x in iterable:
        yield function(x)

print(list(MyMap(str, range(5))))
fp.close is a method of the file object.

>>> fp.close
<built-in method close of file object at 0xb749c1d8>

fp.close() calls that method, using ():

>>> fp.close()

You can see the difference in more detail with the following demo, where we define a test function and compare test with test().
>>> def test():
...     return 1
...
>>> test
<function test at 0xb742bb54>
>>> test()
1
Your code does not throw an error because you don't have to assign a name to the result of an expression (edit: or statement). Python allows this because you might only need the expression for some side effect, not for its actual return value.
This is often seen when advancing an iterator without assigning a name to the yielded object, i.e. just having a line
next(myiter, None)
in the code.
If you wanted expressions with no side effects, like
1 + 1
or
myfile.close
to throw an error when no name is assigned to the result (and the expression is therefore useless), Python would have to inspect each right-hand side of an expression for possible side effects. Compared to the current behavior of just allowing these expressions, this seems needlessly expensive (and would also be very complex, if not impossible, for some expressions). Consider how dynamic Python is: you could override what the + operator does for certain types at runtime.

TypeError: 'str' object is not callable for append()

#!/usr/bin/python

class List:
    list = []

    def append(self, append):
        print(append())
        #self.append = append

    def displayList(self, displayList):
        print(displayList())
        #print(self.append)

def main():
    list = List()
    list.append('abc')
    list.append('def')
    list.append('ghi')
    list.displayList()

if __name__ == '__main__':
    main()
You have a method append (referenced with self.append) with a parameter also named append, and the method calls that passed argument. But in your main you call the object's append method and pass it a string; since inside the method you're calling the passed argument, and that argument is a string, you can't call it.
The other method in your class, displayList, does the exact same thing as append, except you call it with no argument at all, which will also raise an error.
Don't attempt to fix these issues by prepending self (print(self.append()) or print(self.displayList(0))), as that will simply exceed the maximum recursion depth.
Your List class's list is also a class variable, not an instance variable. That will probably result in more problems later on.
I recommend taking a step back and thinking again about what you're trying to do and why. If you're creating this class for fun/education, there are probably better ways to learn. If you're doing it as part of a practical program (i.e., using it as a solution to a particular challenge), you may have an XY Problem.
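For the fun/education route, one corrected sketch (my reconstruction of the apparent intent: a thin wrapper around a plain list) might look like:

class List:
    def __init__(self):
        self.items = []  # an instance variable, so each List gets its own storage

    def append(self, item):
        self.items.append(item)  # delegate to the built-in list's append

    def displayList(self):
        print(self.items)

def main():
    my_list = List()
    my_list.append('abc')
    my_list.append('def')
    my_list.append('ghi')
    my_list.displayList()  # prints ['abc', 'def', 'ghi']

if __name__ == '__main__':
    main()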
Guess what? It is because a string object is not callable.
