Only __iter__ implemented as generator-func, want to use next() - python

I have a class with __iter__ defined like below:
class MyIterator:
    def __iter__(self):
        for value in self._iterator:
            if is_iterator(value):
                yield from value
            else:
                yield value
I want to call next(my_iterator), but that requires implementing __next__. Doing so would turn this simple implementation into a fairly complicated one; in fact, I don't know how to implement it at all without defining __iter__ as a generator function.
Generally speaking, if __iter__ is implemented as a generator function that would be hard to write without a generator, what should I do if I also want to support __next__?
Note: Apparently, next(iter(my_iterator)) works, but I don't want to do it.

If your class is supposed to be an iterator, it should not have its __iter__ method implemented as a generator function. That makes the class iterable, but not an iterator. An iterator's __iter__ method is supposed to return itself.
If you really want your class to be an iterator, try something like this:
class MyIterator:
    def __init__(self, iterator):
        self._iterator = iterator
        self._subiterator = None

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._subiterator is None:
                value = next(self._iterator)  # may raise StopIteration
                try:  # could test is_iterator(value) here for LBYL-style code
                    self._subiterator = iter(value)
                except TypeError:
                    return value
            try:
                return next(self._subiterator)
            except StopIteration:
                self._subiterator = None
The next(self._iterator) call may raise StopIteration, which I deliberately do not catch. That exception is the signal we're finished iterating, so if we caught it we'd only have to raise it again.
This code uses an "Easier to Ask Forgiveness than Permission" (EAFP) approach to detecting iterable items within the iterator it's been given. It simply tries calling iter on each one and catches the TypeError that is raised if the item isn't iterable. If you prefer to stick with the "Look Before You Leap" (LBYL) style and explicitly test with is_iterator (which is badly named, since it checks for any kind of iterable, not only iterators), you could replace the inner try with:
if is_iterator(value):
    self._subiterator = iter(value)
else:
    return value
I usually prefer EAFP style to LBYL style in my Python code, but there are situations where either one can be better. Other times it's just a matter of style.
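For completeness, here is a minimal usage sketch of the class above (the sample data is made up; note that with the EAFP version strings would also be flattened, so the example sticks to numbers and lists):

data = iter([1, [2, 3], 4])   # an iterator over a mix of plain and nested values
flat = MyIterator(data)
print(next(flat))   # 1
print(next(flat))   # 2  (first element of the nested list)
print(next(flat))   # 3
print(next(flat))   # 4
# one more next(flat) would raise StopIteration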

As @BrenBarm commented, the practical answer was to call next(iter(self)), or to keep self._iter_self = iter(self) and call next(self._iter_self). I just couldn't come up with it myself.
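A minimal sketch of that idea, keeping the generator __iter__ from the question and caching one generator for next() calls (the _iter_self name and the lazy creation are just one way to do it; is_iterator is the helper from the question):

class MyIterator:
    def __init__(self, iterator):
        self._iterator = iterator
        self._iter_self = None     # generator produced by __iter__, created on demand

    def __iter__(self):
        for value in self._iterator:
            if is_iterator(value):
                yield from value
            else:
                yield value

    def __next__(self):
        if self._iter_self is None:
            self._iter_self = iter(self)   # create the generator once
        return next(self._iter_self)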

An iterator is an object with a __next__ method that returns values until it finally raises StopIteration.
An iterable is an object with an __iter__ method that returns an iterator.
A generator is a special kind of iterator that Python creates when you call a function or method containing a yield statement.
In your example, __iter__ contains a yield, so it is a generator function: every call returns a brand-new generator rather than making the object itself an iterator. That's why you have to do that strange next(iter(my_iterator)) thing, and why it doesn't really work: it restarts the enumeration each time.
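To see the restart problem in isolation, here is a tiny stand-in class (not the poster's code) whose __iter__ is a generator function over a list:

class Numbers:
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        for v in self.values:
            yield v

n = Numbers([1, 2, 3])
print(next(iter(n)))   # 1
print(next(iter(n)))   # 1 again: each iter(n) call builds a brand-new generator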
How best to solve this problem depends on how you are using this class. You could create a generator function instead and then use iter to make iterators as needed:
import collections.abc

def is_iterator(i):
    return isinstance(i, collections.abc.Iterable)

def MyIterator(iterable):
    for value in iterable:
        if is_iterator(value):
            yield from value
        else:
            yield value
test_this = [1, 2, [3, 4, 5], [6], [], 7, 'foo']
my_iterator = iter(MyIterator(test_this))
try:
    while True:
        print(next(my_iterator))
except StopIteration:
    pass
Or you can implement __next__ instead of __iter__. But you can't use yield and have to return a value on each call until the outer iterator completes and raises StopIteration.
import collections.abc

class MyIterator:
    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._iterating = None

    def __next__(self):
        while True:
            if self._iterating is not None:
                try:
                    return next(self._iterating)
                except StopIteration:
                    self._iterating = None
            value = next(self._iterator)
            if isinstance(value, collections.abc.Iterable):
                self._iterating = iter(value)
            else:
                return value
test_this = [1, 2, [3, 4, 5], [6], [], 7, 'foo']
my_iterator = MyIterator(test_this)
try:
    while True:
        print(next(my_iterator))
except StopIteration:
    pass

Related

Randomly sample between multiple generators?

I'm trying to iterate over multiple generators randomly, and skip those that are exhausted by removing them from the list of available generators. However, the CombinedGenerator doesn't call itself like it should to switch generator. Instead it throws a StopIteration when the smaller iterator is exhausted. What am I missing?
The following works:
gen1 = (i for i in range(0, 5, 1))
gen2 = (i for i in range(100, 200, 1))
list_of_gen = [gen1, gen2]
print(list_of_gen)
list_of_gen.remove(gen1)
print(list_of_gen)
list_of_gen.remove(gen2)
print(list_of_gen)
where each generator is removed by its reference.
But here it doesn't:
import random

gen1 = (i for i in range(0, 5, 1))
gen2 = (i for i in range(100, 200, 1))
total = 105

class CombinedGenerator:
    def __init__(self, generators):
        self.generators = generators

    def __call__(self):
        generator = random.choice(self.generators)
        try:
            yield next(generator)
        except StopIteration:
            self.generators.remove(generator)
            if len(self.generators) != 0:
                self.__call__()
            else:
                raise StopIteration

c = CombinedGenerator([gen1, gen2])

for i in range(total):
    print(f"iter {i}")
    print(f"yielded {next(c())}")
As @Tomerikoo mentioned, you are basically creating your own generator, and it is better to implement __next__, which is the cleaner and more Pythonic way.
The above code can be fixed with the lines below.
def __call__(self):
    generator = random.choice(self.generators)
    try:
        yield next(generator)
    except StopIteration:
        self.generators.remove(generator)
        if len(self.generators) != 0:
            # yield your self.__call__() result as well
            yield next(self.__call__())
        else:
            raise StopIteration
First of all, in order to fix your current code, you just need to match the pattern you created by changing the line:
self.__call__()
to:
yield next(self.__call__())
Then, I would make a few small changes to your original code:
Instead of implementing __call__ and calling the object, it seems more reasonable to implement __next__ and simply call next on the object.
Instead of choosing the generator, I would choose an index. This mainly avoids the use of remove, which is less efficient than deleting by index when you already know where the object is.
Personally I prefer to avoid recursion where possible, so I replaced the recursive call with a loop that keeps going while there are still generators left:
class CombinedGenerator:
    def __init__(self, generators):
        self.generators = generators

    def __next__(self):
        while self.generators:
            i = random.choice(range(len(self.generators)))
            try:
                return next(self.generators[i])
            except StopIteration:
                del self.generators[i]
        raise StopIteration
c = CombinedGenerator([gen1, gen2])

for i in range(total):
    print(f"iter {i}")
    print(f"yielded {next(c)}")
A nice bonus can be to add this to your class:
def __iter__(self):
    return self
Which then allows you to directly iterate on the object itself and you don't need the total variable:
for i, num in enumerate(c):
    print(f"iter {i}")
    print(f"yielded {num}")

Iterable class in python3

I am trying to implement an iterable proxy for a web resource (lazily fetched images).
Firstly, I did (returning ids, in production those will be image buffers)
def iter(ids=[1, 2, 3]):
    for id in ids:
        yield id
and that worked nicely, but now I need to keep state.
I read about the four ways to define iterators and judged that the iterator protocol is the way to go. Below is my attempt, which fails.
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return self

    def __next__(self):
        for id in self.ids:
            yield id
        raise StopIteration

test = Test([1, 2, 3])

for t in test:
    print('new value', t)
Output:
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
forever.
What's wrong?
Thanks to absolutely everyone! It's all new to me, but I'm learning new cool stuff.
Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.
But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).
Because you wanted to create an iterable, you can just make __iter__ the generator here:
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        for id in self.ids:
            yield id
Note that a generator function should not use raise StopIteration, just returning from the function does that for you.
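For example (first_n is a made-up helper): a bare return ends the generator cleanly, and since PEP 479 (Python 3.7) an explicit raise StopIteration inside a generator would surface as a RuntimeError instead:

def first_n(ids, n):
    # a tiny illustrative generator: 'return' ends it; callers see StopIteration
    for count, id in enumerate(ids):
        if count >= n:
            return
        yield id

print(list(first_n([1, 2, 3, 4], 2)))   # [1, 2]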
The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:
Iterable -> (call __iter__) -> Iterator
In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:
>>> test = Test([1,2,3])
>>> test.__iter__() # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>
A generator object is a specific kind of iterator, one created by calling a generator function, or by using a generator expression. Note that the hex values in the representations differ, two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:
>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2
Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.
Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).
So this is an iterator:
class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value
This has to do a bit more work; it has to keep track of what the next value to produce would be, and if we have raised StopIteration yet. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.
You don't usually see anything calling __iter__ or __next__ in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API then the hook names are slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.
The for loop is such syntax. When you use a for loop, Python uses the (moral equivalent) of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:
>>> from dis import dis
>>> dis("for t in test: pass")
  1           0 LOAD_NAME                0 (test)
              2 GET_ITER
        >>    4 FOR_ITER                 4 (to 10)
              6 STORE_NAME               1 (t)
              8 JUMP_ABSOLUTE            4
        >>   10 LOAD_CONST               0 (None)
             12 RETURN_VALUE
The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
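In plain Python, that loop is roughly equivalent to the following sketch (the real bytecode does the same work without any visible temporary names):

iterator = iter(test)        # what GET_ITER does
while True:
    try:
        t = next(iterator)   # what FOR_ITER does
    except StopIteration:
        break                # leave the loop once the iterator is exhausted
    pass                     # the loop body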
If you want to play more with the difference between iterators and iterables, take a look at the Python standard types and see what happens when you use iter() and next() on them. Like lists or tuples:
>>> foo = (42, 81, 17, 111)
>>> next(foo) # foo is a tuple, not an iterator
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo) # so use iter() to create one from the tuple
>>> t_it # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it # really, it returns itself, not a new object
True
>>> next(t_it) # we can get values from it, one by one
42
>>> next(t_it) # another one
81
>>> next(t_it) # yet another one
17
>>> next(t_it) # this is getting boring..
111
>>> next(t_it) # and now we are done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it) # and *stay* done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> foo # but foo itself is still there
(42, 81, 17, 111)
You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having a generator function create the iterator for us):
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return TestIterator(self)

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value
That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, so the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.
(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does, it is an iterator producing (object, group_iterator) tuples, but I digress).
It is unclear to me exactly what you are trying to achieve, but if you really want to use your instance attributes like this, you can convert the input to an iterator and then delegate to it. But, as I said, this feels odd and I don't think you'd actually want a setup like this.
class Test:
    def __init__(self, ids):
        self.ids = iter(ids)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.ids)

test = Test([1, 2, 3])

for t in test:
    print('new value', t)
The simplest solution is to use __iter__ and return an iterator to the main list:
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return iter(self.ids)

test = Test([1, 2, 3])

for t in test:
    print('new value', t)
As for the update about lazy loading, you can return an iterator over a generator expression:
def __iter__(self):
    return iter(load_file(id) for id in self.ids)
The __next__ function is supposed to return the next value provided by an iterator. Since you have used yield in your implementation, the function returns a generator, which is what you get.
You need to make clear whether you want Test to be an iterable or an iterator. If it is an iterable, it will have the ability to provide an iterator with __iter__. If it is an iterator, it will have the ability to provide new elements with __next__. Iterators can typically work as iterables by returning themselves in __iter__. Martijn's answer shows what you probably want. However, if you want an example of how you could specifically implement __next__ (by making Test explicitly an iterator), it could be something like this:
class Test:
    def __init__(self, ids):
        self.ids = ids
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx >= len(self.ids):
            raise StopIteration
        else:
            self.idx += 1
            return self.ids[self.idx - 1]

test = Test([1, 2, 3])

for t in test:
    print('new value', t)

How to implement "next" for a dictionary object to be iterable?

I've got the following wrapper for a dictionary:
class MyDict:
    def __init__(self):
        self.container = {}

    def __setitem__(self, key, value):
        self.container[key] = value

    def __getitem__(self, key):
        return self.container[key]

    def __iter__(self):
        return self

    def next(self):
        pass

dic = MyDict()
dic['a'] = 1
dic['b'] = 2

for key in dic:
    print key
My problem is that I don't know how to implement the next method to make MyDict iterable. Any advice would be appreciated.
Dictionaries themselves are not iterators (which can only be iterated over once); they are iterables, objects from which you can produce multiple independent iterators.
Drop the next method altogether, and have __iter__ return an iterator object each time it is called. That can be as simple as just returning an iterator for self.container:
def __iter__(self):
    return iter(self.container)
If you must make your class an iterator, you'll have to somehow track a current iteration position and raise StopIteration once you reach the 'end'. A naive implementation could be to store the iter(self.container) object on self the first time __iter__ is called:
def __iter__(self):
    return self

def next(self):
    if not hasattr(self, '_iter'):
        self._iter = iter(self.container)
    return next(self._iter)
at which point the iter(self.container) object takes care of tracking iteration position for you, and will raise StopIteration when the end is reached. It'll also raise an exception if the underlying dictionary was altered (had keys added or deleted) and iteration order has been broken.
Another way to do this would be to just store an integer position and index into list(self.container) each time, and simply ignore the fact that insertion or deletion can alter the iteration order of a dictionary:
_iter_index = 0

def __iter__(self):
    return self

def next(self):
    idx = self._iter_index
    if idx is None or idx >= len(self.container):
        # once we reach the end, all iteration is done, end of.
        self._iter_index = None
        raise StopIteration()
    value = list(self.container)[idx]
    self._iter_index = idx + 1
    return value
In both cases your object is then an iterator that can only be iterated over once. Once you reach the end, you can't restart it again.
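A small sketch of that one-shot behavior with the first (stored-iterator) variant, assuming Python 2 as in the question:

dic = MyDict()
dic['a'] = 1
dic['b'] = 2

print(list(dic))    # ['a', 'b'] on the first pass
print(list(dic))    # [] afterwards: the object is exhausted and stays exhausted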
If you want to be able to use your dict-like object inside nested loops, for example, or any other application that requires multiple iterations over the same object, then you need to implement an __iter__ method that returns a newly-created iterator object.
Python's iterable objects all do this:
>>> [1, 2, 3].__iter__()
<listiterator object at 0x7f67146e53d0>
>>> iter([1, 2, 3]) # A simpler equivalent
<listiterator object at 0x7f67146e5390>
The simplest thing for your objects' __iter__ method to do would be to return an iterator on the underlying dict, like this:
def __iter__(self):
    return iter(self.container)
For more detail than you probably will ever require, see this Github repository.

Converting "yield from" statement to Python 2.7 code

I had the code below in Python 3.2 and I wanted to run it in Python 2.7. I did convert it (the code for missing_elements is included in both versions), but I am not sure it is the most efficient way to do it. Basically, what happens when there are two yield from calls, as below, in the upper half and lower half of the missing_elements function? Are the entries from the two halves (upper and lower) appended to each other into one list, so that the parent recursive call with the yield from can use both halves together?
def missing_elements(L, start, end):  # Python 3.2
    if end - start <= 1:
        if L[end] - L[start] > 1:
            yield from range(L[start] + 1, L[end])
        return

    index = start + (end - start) // 2

    # is the lower half consecutive?
    consecutive_low = L[index] == L[start] + (index - start)
    if not consecutive_low:
        yield from missing_elements(L, start, index)

    # is the upper part consecutive?
    consecutive_high = L[index] == L[end] - (end - index)
    if not consecutive_high:
        yield from missing_elements(L, index, end)

def main():
    L = [10, 11, 13, 14, 15, 16, 17, 18, 20]
    print(list(missing_elements(L, 0, len(L) - 1)))
    L = range(10, 21)
    print(list(missing_elements(L, 0, len(L) - 1)))
def missing_elements(L, start, end):  # Python 2.7
    return_list = []
    if end - start <= 1:
        if L[end] - L[start] > 1:
            return range(L[start] + 1, L[end])

    index = start + (end - start) // 2

    # is the lower half consecutive?
    consecutive_low = L[index] == L[start] + (index - start)
    if not consecutive_low:
        return_list.append(missing_elements(L, start, index))

    # is the upper part consecutive?
    consecutive_high = L[index] == L[end] - (end - index)
    if not consecutive_high:
        return_list.append(missing_elements(L, index, end))
    return return_list
If you don't use the results of your yields,* you can always turn this:
yield from foo
… into this:
for bar in foo:
    yield bar
There might be a performance cost,** but there is never a semantic difference.
Are the entries from the two halves (upper and lower) appended to each other in one list so that the parent recursion function with the yield from call and use both the halves together?
No! The whole point of iterators and generators is that you don't build actual lists and append them together.
But the effect is similar: you just yield from one, then yield from another.
If you think of the upper half and the lower half as "lazy lists", then yes, you can think of this as a "lazy append" that creates a larger "lazy list". And if you call list on the result of the parent function, you of course will get an actual list that's equivalent to appending together the two lists you would have gotten if you'd done yield list(…) instead of yield from ….
But I think it's easier to think of it the other way around: what it does is exactly what the for loops do.
If you saved the two iterators into variables, and looped over itertools.chain(upper, lower), that would be the same as looping over the first and then looping over the second, right? No difference here. In fact, you could implement chain as just:
def chain(*args):
    for arg in args:
        yield from arg
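A quick sanity check of that equivalence (not part of the original answer; the yield from inside chain requires Python 3):

import itertools

print(list(chain(iter([1, 2]), iter([3, 4]))))            # [1, 2, 3, 4]
print(list(itertools.chain(iter([1, 2]), iter([3, 4]))))  # [1, 2, 3, 4]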
* Not the values the generator yields to its caller, but the values of the yield expressions themselves, within the generator (which come from the caller using the send method), as described in PEP 342. You're not using these in your examples, and I'm willing to bet you're not in your real code. But coroutine-style code often uses the value of a yield from expression; see PEP 3156 for examples. Such code usually depends on other features of Python 3.3 generators, in particular the new StopIteration.value from the same PEP 380 that introduced yield from, so it will have to be rewritten. If you do need it, the PEP also shows you the complete (horrid, messy) equivalent, and you can of course pare down the parts you don't care about. And if you don't use the value of the expression, it pares down to the two lines above.
** Not a huge one, and there's nothing you can do about it short of using Python 3.3 or completely restructuring your code. It's exactly the same case as translating list comprehensions to Python 1.5 loops, or any other case when there's a new optimization in version X.Y and you need to use an older version.
Replace them with for-loops:
yield from range(L[start] + 1, L[end])
==>
for i in range(L[start] + 1, L[end]):
    yield i
The same goes for missing_elements:
yield from missing_elements(L, index, end)
==>
for el in missing_elements(L, index, end):
    yield el
I just came across this issue and my usage was a bit more difficult since I needed the return value of yield from:
result = yield from other_gen()
This cannot be represented as a simple for loop but can be reproduced with this:
_iter = iter(other_gen())
try:
    while True:  # broken by StopIteration
        yield next(_iter)
except StopIteration as e:
    if e.args:
        result = e.args[0]
    else:
        result = None
Hopefully this will help people who come across the same problem. :)
What about using the definition from pep-380 in order to construct a Python 2 syntax version:
The statement:
RESULT = yield from EXPR
is semantically equivalent to:
_i = iter(EXPR)
try:
    _y = next(_i)
except StopIteration as _e:
    _r = _e.value
else:
    while 1:
        try:
            _s = yield _y
        except GeneratorExit as _e:
            try:
                _m = _i.close
            except AttributeError:
                pass
            else:
                _m()
            raise _e
        except BaseException as _e:
            _x = sys.exc_info()
            try:
                _m = _i.throw
            except AttributeError:
                raise _e
            else:
                try:
                    _y = _m(*_x)
                except StopIteration as _e:
                    _r = _e.value
                    break
        else:
            try:
                if _s is None:
                    _y = next(_i)
                else:
                    _y = _i.send(_s)
            except StopIteration as _e:
                _r = _e.value
                break
RESULT = _r
In a generator, the statement:
return value
is semantically equivalent to
raise StopIteration(value)
except that, as currently, the exception cannot be caught by except clauses within the returning generator.
The StopIteration exception behaves as though defined thusly:
class StopIteration(Exception):
    def __init__(self, *args):
        if len(args) > 0:
            self.value = args[0]
        else:
            self.value = None
        Exception.__init__(self, *args)
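As an illustration of that convention, here is a small Python 2-style sketch (all names made up) where the subgenerator "returns" a value by raising StopIteration(value) and a hand-expanded, simplified caller picks it up:

def inner():
    yield 1
    yield 2
    raise StopIteration('done')   # stands in for "return 'done'" in Python 3.3+

def outer():
    # simplified hand-expansion of: result = yield from inner()
    _i = iter(inner())
    try:
        while True:
            yield next(_i)
    except StopIteration as _e:
        result = _e.args[0] if _e.args else None
    yield 'inner returned: %s' % result

print(list(outer()))   # [1, 2, "inner returned: done"]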
I think I found a way to emulate Python 3.x yield from construct in Python 2.x. It's not efficient and it is a little hacky, but here it is:
import types

def inline_generators(fn):
    def inline(value):
        if isinstance(value, InlineGenerator):
            for x in value.wrapped:
                for y in inline(x):
                    yield y
        else:
            yield value

    def wrapped(*args, **kwargs):
        result = fn(*args, **kwargs)
        if isinstance(result, types.GeneratorType):
            result = inline(_from(result))
        return result
    return wrapped

class InlineGenerator(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

def _from(value):
    assert isinstance(value, types.GeneratorType)
    return InlineGenerator(value)
Usage:
@inline_generators
def outer(x):
    def inner_inner(x):
        for x in range(1, x + 1):
            yield x

    def inner(x):
        for x in range(1, x + 1):
            yield _from(inner_inner(x))

    for x in range(1, x + 1):
        yield _from(inner(x))

for x in outer(3):
    print x,
Produces output:
1 1 1 2 1 1 2 1 2 3
Maybe someone finds this helpful.
Known issues: Lacks support for send() and various corner cases described in PEP 380. These could be added and I will edit my entry once I get it working.
I've found using resource contexts (using the python-resources module) to be an elegant mechanism for implementing subgenerators in Python 2.7. Conveniently I'd already been using the resource contexts anyway.
If in Python 3.3 you would have:
@resources.register_func
def get_a_thing(type_of_thing):
    if type_of_thing == "A":
        yield from complicated_logic_for_handling_a()
    else:
        yield from complicated_logic_for_handling_b()

def complicated_logic_for_handling_a():
    a = expensive_setup_for_a()
    yield a
    expensive_tear_down_for_a()

def complicated_logic_for_handling_b():
    b = expensive_setup_for_b()
    yield b
    expensive_tear_down_for_b()
In Python 2.7 you would have:
@resources.register_func
def get_a_thing(type_of_thing):
    if type_of_thing == "A":
        with resources.complicated_logic_for_handling_a_ctx() as a:
            yield a
    else:
        with resources.complicated_logic_for_handling_b_ctx() as b:
            yield b

@resources.register_func
def complicated_logic_for_handling_a():
    a = expensive_setup_for_a()
    yield a
    expensive_tear_down_for_a()

@resources.register_func
def complicated_logic_for_handling_b():
    b = expensive_setup_for_b()
    yield b
    expensive_tear_down_for_b()
Note how the complicated-logic operations only require the registration as a resource.
Another solution: by using my yield-from-as-an-iterator library, you can turn any yield from foo into
for value, handle_send, handle_throw in yield_from(foo):
    try:
        handle_send((yield value))
    except:
        if not handle_throw(*sys.exc_info()):
            raise
To make sure this answer stands alone even if the PyPI package is ever lost, here is an entire copy of that library's yieldfrom.py from the 1.0.0 release:
# SPDX-License-Identifier: 0BSD
# Copyright 2022 Alexander Kozhevnikov <mentalisttraceur@gmail.com>

"""A robust implementation of ``yield from`` behavior.

Allows transpilers, backpilers, and code that needs
to be portable to minimal or old Pythons to replace

    yield from ...

with

    for value, handle_send, handle_throw in yield_from(...):
        try:
            handle_send(yield value)
        except:
            if not handle_throw(*sys.exc_info()):
                raise
"""

__version__ = '1.0.0'
__all__ = ('yield_from',)


class yield_from(object):
    """Implementation of the logic that ``yield from`` adds around ``yield``."""

    __slots__ = ('_iterator', '_next', '_default_next')

    def __init__(self, iterable):
        """Initializes the yield_from instance.

        Arguments:
            iterable: The iterable to yield from and forward to.
        """
        # Mutates:
        #     self._next: Prepares to use built-in function next in __next__
        #         for the first iteration on the iterator.
        #     self._default_next: Saves initial self._next tuple for reuse.
        self._iterator = iter(iterable)
        self._next = self._default_next = next, (self._iterator,)

    def __repr__(self):
        """Represent the yield_from instance as a string."""
        return type(self).__name__ + '(' + repr(self._iterator) + ')'

    def __iter__(self):
        """Return the yield_from instance, which is itself an iterator."""
        return self

    def __next__(self):
        """Execute the next iteration of ``yield from`` on the iterator.

        Returns:
            Any: The next value from the iterator.

        Raises:
            StopIteration: If the iterator is exhausted.
            Any: If the iterator raises an error.
        """
        # Mutates:
        #     self._next: Resets to default, in case handle_send or
        #         handle_throw changed it for this iteration.
        next_, arguments = self._next
        self._next = self._default_next
        value = next_(*arguments)
        return value, self.handle_send, self.handle_throw

    next = __next__  # Python 2 used `next` instead of ``__next__``

    def handle_send(self, value):
        """Handle a send method call for a yield.

        Arguments:
            value: The value sent through the yield.

        Raises:
            AttributeError: If the iterator has no send method.
        """
        # Mutates:
        #     self._next: If value is not None, prepares to use the
        #         iterator's send attribute instead of the built-in
        #         function next in the next iteration of __next__.
        if value is not None:
            self._next = self._iterator.send, (value,)

    def handle_throw(self, type, exception, traceback):
        """Handle a throw method call for a yield.

        Arguments:
            type: The type of the exception thrown through the yield.
                If this is GeneratorExit, the iterator will be closed
                by calling its close attribute if it has one.
            exception: The exception thrown through the yield.
            traceback: The traceback of the exception thrown through the yield.

        Returns:
            bool: Whether the exception will be forwarded to the iterator.
                If this is false, you should bubble up the exception.
                If this is true, the exception will be thrown into the
                iterator at the start of the next iteration, and will
                either be handled or bubble up at that time.

        Raises:
            TypeError: If type is not a class.
            GeneratorExit: Re-raised after successfully closing the iterator.
            Any: If raised by the close function on the iterator.
        """
        # Mutates:
        #     self._next: If type was not GeneratorExit and the iterator
        #         has a throw attribute, prepares to use that attribute
        #         instead of the built-in function next in the next
        #         iteration of __next__.
        iterator = self._iterator
        if issubclass(type, GeneratorExit):
            try:
                close = iterator.close
            except AttributeError:
                return False
            close()
            return False
        try:
            throw = iterator.throw
        except AttributeError:
            return False
        self._next = throw, (type, exception, traceback)
        return True
What I really like about this way is that:
The implementation is much easier to fully think through and verify for correctness than the alternatives*.
The usage is still simple, and doesn't require decorators or any other code changes anywhere other than just replacing the yield from ... line.
It still has robust forwarding of .send and .throw and handling of errors, StopIteration, and GeneratorExit.
This yield_from implementation will work on any Python 3 and on Python 2 all the way back to Python 2.5**.
* The formal specification ends up entangling all the logic into one big loop, even with some duplication thrown in. All the fully-featured backport implementations I've seen further add complication on top of that. But we can do better by embracing manually implementing the iterator protocol:
We get StopIteration handling for free from Python itself around our __next__ method.
The logic can be split into separate pieces which are entirely decoupled except for the state-saving between them, which frees you from having to de-tangle the logic by yourself - fundamentally yield from is just three simple ideas:
call a method on the iterator to get the next element,
how to handle .send (which may change the method called in step 1), and
how to handle .throw (which may change the method called in step 1).
By asking for modest boilerplate at each yield from replacement, we can avoid needing any hidden magic with special wrapper types, decorators, and so on.
** Python 2.5 is when PEP-342 made yield an expression and added GeneratorExit. Though if you are ever unfortunate enough to need to backport or "backpile" (transpile to an older version of the language) this yield_from would still do all the hard parts of building yield from on top of yield for you.
Also, this idea leaves a lot of freedom for how the usage boilerplate looks. For example,
handle_throw could be trivially refactored into a context manager, enabling usage like this:
for value, handle_send, handle_throw in yield_from(foo):
    with handle_throw:
        handle_send(yield value)
and
you could make value, handle_send, handle_throw something like a named tuple if you find this usage nicer:
for step in yield_from(foo):
    with step.handle_throw:
        step.handle_send(yield step.value)
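For instance, a minimal sketch of that named-tuple variant, built on top of the yield_from class above (Step and yield_from_steps are illustrative names, not part of the library):

from collections import namedtuple

Step = namedtuple('Step', ['value', 'handle_send', 'handle_throw'])

class yield_from_steps(yield_from):
    # same behavior, but each iteration yields a Step named tuple
    def __next__(self):
        return Step(*super(yield_from_steps, self).__next__())
    next = __next__   # keep the Python 2 spelling working too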

Python: the mechanism behind list comprehension

When using a list comprehension or the in keyword in a for loop context, i.e.:
for o in X:
    do_something_with(o)
or
l=[o for o in X]
How does the mechanism behind in work?
Which functions\methods within X does it call?
If X can comply to more than one method, what's the precedence?
How to write an efficient X, so that list comprehension will be quick?
The, afaik, complete and correct answer.
for, both in for loops and list comprehensions, calls iter() on X. iter() will return an iterator if X has either an __iter__ method or a __getitem__ method. If it implements both, __iter__ is used. If it has neither, you get TypeError: 'Nothing' object is not iterable.
This implements a __getitem__:
class GetItem(object):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, x):
        return self.data[x]
Usage:
>>> data = range(10)
>>> print [x*x for x in GetItem(data)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
This is an example of implementing __iter__:
class TheIterator(object):
    def __init__(self, data):
        self.data = data
        self.index = -1

    # Note: In Python 3 this is called __next__
    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration

    def __iter__(self):
        return self

class Iter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return TheIterator(self.data)
Usage:
>>> data = range(10)
>>> print [x*x for x in Iter(data)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
As you see you need both to implement an iterator, and __iter__ that returns the iterator.
You can combine them:
class CombinedIter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.index = -1
        return self

    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration
Usage:
>>> well, you get it, it's all the same...
But then you can only have one iterator going at once.
OK, in this case you could just do this:
class CheatIter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)
But that's cheating because you are just reusing the __iter__ method of list.
An easier way is to use yield, and make __iter__ into a generator:
class Generator(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for x in self.data:
            yield x
This last is the way I would recommend. Easy and efficient.
X must be iterable. It must implement __iter__(), which returns an iterator object; the iterator object must implement next(), which returns the next item every time it is called or raises StopIteration when there is no next item.
Lists, tuples and generators are all iterable.
Note that the plain for operator uses the same mechanism.
Answering the question's comments, I can say that reading the source is not the best idea in this case. The code responsible for executing compiled code (ceval.c) is not easy to follow for someone seeing the Python sources for the first time. Here is the snippet that represents iteration in for loops:
TARGET(FOR_ITER)
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    /*
       Here tp_iternext corresponds to next() in Python
    */
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(
                        PyExc_StopIteration))
            break;
        PyErr_Clear();
    }
    /* iterator ended normally */
    x = v = POP();
    Py_DECREF(v);
    JUMPBY(oparg);
    DISPATCH();
To find out what actually happens here you need to dive into a bunch of other files whose readability is not much better. Thus I think that in such cases documentation and sites like SO are the first place to go, while the source should be checked only for uncovered implementation details.
X must be an iterable object, meaning it needs to have an __iter__() method.
So, to start a for..in loop, or a list comprehension, first X's __iter__() method is called to obtain an iterator object; then that object's next() method is called for each iteration until StopIteration is raised, at which point the iteration stops.
I'm not sure what your third question means, and how to provide a meaningful answer to your fourth question except that your iterator should not construct the entire list in memory at once.
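To make that last point concrete, here is a minimal sketch (LazyX and expensive are made-up names) where __iter__ is a generator, so items are produced one at a time instead of the whole list being built up front:

def expensive(i):
    # stand-in for a per-item computation (e.g. fetching an image)
    return i * i

class LazyX(object):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # items are produced on demand; nothing is materialized in advance
        for i in range(self.n):
            yield expensive(i)

print([item for item in LazyX(5)])   # [0, 1, 4, 9, 16]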
Maybe this helps (tutorial http://docs.python.org/tutorial/classes.html Section 9.9):
Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next(), which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the for loop to terminate.
To answer your questions:
How does the mechanism behind in work?
It is the exact same mechanism as used for ordinary for loops, as others have already noted.
Which functions\methods within X does it call?
As noted in a comment below, it calls iter(X) to get an iterator. If X has an __iter__() method defined, that is called to return an iterator; otherwise, if X defines __getitem__(), that is called repeatedly to iterate over X. See the Python documentation for iter() here: http://docs.python.org/library/functions.html#iter
If X can comply to more than one method, what's the precedence?
I'm not sure what your question is here, exactly, but Python has standard rules for how it resolves method names, and they are followed here. Here is a discussion of this:
Method Resolution Order (MRO) in new style Python classes
How to write an efficient X, so that list comprehension will be quick?
I suggest you read up more on iterators and generators in Python. One easy way to make any class support iteration is to implement its __iter__() as a generator function. Here is a discussion of generators:
http://linuxgazette.net/100/pramode.html
