Python how to iterate two infinite generators at the same time? - python

I'd like to iterate over a number of infinite generators:
def x(y):
    while True:
        for i in xrange(y):
            yield i

for i, j in zip(x(5), x(3)):
    print i, j
The code above produces nothing. What am I doing wrong?

That's because in Python 2, zip tries to build a list by collecting every element its arguments will ever produce, which never finishes with infinite generators. What you want is an iterator, i.e. itertools.izip.
In Python 3 zip works like izip.
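As a quick sketch of why the lazy version works (Python 3 names; the even/odd counters are made-up stand-ins, not the asker's exact code), zip stays lazy, so islice can pull a finite number of pairs from two endless generators:

```python
from itertools import count, islice

# Python 3's zip is lazy, so pairing two infinite generators is safe;
# islice pulls just the first few pairs without looping forever.
evens = (2 * n for n in count())     # 0, 2, 4, ...
odds = (2 * n + 1 for n in count())  # 1, 3, 5, ...

pairs = list(islice(zip(evens, odds), 4))
print(pairs)  # [(0, 1), (2, 3), (4, 5), (6, 7)]
```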

zip is not the right tool for generators. Try itertools.izip instead!
(Or even better, use Python 3, where your code works fine - once you add parentheses to the print)

You just need to use a variant of zip that returns an iterator instead of a list. Fortunately, there's one of them in the itertools module.
import itertools

def x(y):
    while True:
        for i in xrange(y):
            yield i

for i, j in itertools.izip(x(5), x(3)):
    print i, j
Note that in Python 3, itertools.izip doesn't exist because the vanilla zip is already an iterator.
Also in itertools there's a function called cycle which infinitely cycles over an iterable.
Make an iterator returning elements from the iterable and saving a
copy of each. When the iterable is exhausted, return elements from the
saved copy. Repeats indefinitely.
So itertools.cycle(range(5)) does the same thing as your x(5); you can also pass xrange(5) to cycle, it's not fussy. ;)
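A minimal sketch of that equivalence (Python 3 names; islice is used only to take a finite prefix of the infinite cycle):

```python
from itertools import cycle, islice

# cycle(range(5)) repeats 0..4 forever, just like the x(5) generator above.
first_twelve = list(islice(cycle(range(5)), 12))
print(first_twelve)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1]
```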

From what I understood, you are trying to iterate over two iterators simultaneously. You can always use a while loop if nothing else works.
gen1 = x(5)
gen2 = x(3)
while True:
    try:
        print(next(gen1), next(gen2))
    except StopIteration:
        break
If you are using Python 3.3 or above, your function x can also be refactored with yield from (note that xrange is gone in Python 3; use range):
def x(y):
    while True:
        yield from range(y)

Related

Python get the last element from generator items

I'm amazed by generators as a replacement for lists, but I can't find any solution for this question:
What is the efficient way to get the first and last element from generator items?
With a list we can just do lst[0] and lst[-1].
Thanks for the help. I can't provide any code, since that's exactly what I want to know :)
You have to iterate through the whole thing. Say you have this generator:
def foo():
    yield 0
    yield 1
    yield 2
    yield 3
The easiest way to get the first and last value would be to convert the generator into a list. Then access the values using list lookups.
data = list(foo())
print(data[0], data[-1])
If you want to avoid creating a container, you could use a for-loop to exhaust the generator.
gen = foo()
first = last = next(gen)
for last in gen:
    pass
print(first, last)
Note: You'll want to special case this when there are no values produced by the generator.
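One way to handle that special case is to give next() a default value; a sketch (the helper name first_and_last is made up for illustration):

```python
def first_and_last(iterable, default=None):
    gen = iter(iterable)
    sentinel = object()
    # next() with a default avoids StopIteration on an empty iterable.
    first = last = next(gen, sentinel)
    if first is sentinel:
        return default, default
    for last in gen:
        pass
    return first, last

print(first_and_last(x for x in range(4)))  # (0, 3)
print(first_and_last([]))                   # (None, None)
```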

Python consume an iterator pair-wise

I am trying to understand Python's iterators in the context of the pysam module. By using the fetch method on a so-called AlignmentFile class, one gets a proper iterator iter consisting of records from the file file. I can then use various methods to access each record (iterable), for instance the name with query_name:
import pysam
iter = pysam.AlignmentFile(file, "rb", check_sq=False).fetch(until_eof=True)
for record in iter:
    print(record.query_name)
It happens that records come in pairs so that one would like something like:
while True:
    r1 = iter.__next__()
    r2 = iter.__next__()
    print(r1.query_name)
    print(r2.query_name)
Calling next() is probably not the right way for millions of records, but how can one use a for loop to consume the same iterator in pairs? I looked at the grouper recipe from itertools and the SOs Iterate an iterator by chunks (of n) in Python? [duplicate] (even a duplicate!) and What is the most “pythonic” way to iterate over a list in chunks? but cannot get it to work.
First of all, don't use the variable name iter, because that's already the name of a builtin function.
To answer your question, simply use itertools.izip (Python 2) or zip (Python 3) on the iterator.
Your code may look as simple as
for next_1, next_2 in zip(iterator, iterator):
    # stuff
edit: whoops, my original answer was the correct one all along, don't mind the itertools recipe.
edit 2: Consider itertools.izip_longest if you deal with iterators that could yield an odd number of objects:
>>> from itertools import izip_longest
>>> iterator = (x for x in (1,2,3))
>>>
>>> for next_1, next_2 in izip_longest(iterator, iterator):
...     next_1, next_2
...
(1, 2)
(3, None)
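In Python 3 the same trick reads as below (zip_longest replaces izip_longest; the record strings are made-up stand-ins for pysam records):

```python
from itertools import zip_longest

records = iter(["r1", "r2", "r3", "r4", "r5"])

# zip(it, it) pulls two items per loop pass from the SAME iterator;
# zip_longest additionally pads the final pair when the count is odd.
pairs = list(zip_longest(records, records))
print(pairs)  # [('r1', 'r2'), ('r3', 'r4'), ('r5', None)]
```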

Inconsistent behavior of python generators

The following python code produces [(0, 0), (0, 7)...(0, 693)] instead of the expected list of tuples combining all of the multiples of 3 and multiples of 7:
multiples_of_3 = (i*3 for i in range(100))
multiples_of_7 = (i*7 for i in range(100))
list((i,j) for i in multiples_of_3 for j in multiples_of_7)
This code fixes the problem:
list((i,j) for i in (i*3 for i in range(100)) for j in (i*7 for i in range(100)))
Questions:
The generator object seems to play the role of an iterator instead of providing an iterator object each time the generated list is to be enumerated. The latter strategy seems to be adopted by .NET LINQ query objects. Is there an elegant way to get around this?
How come the second piece of code works? Shall I understand that the generator's iterator is not reset after looping through all multiples of 7?
Don't you think that this behavior is counter intuitive if not inconsistent?
A generator object is an iterator, and therefore one-shot. It's not an iterable which can produce any number of independent iterators. This behavior is not something you can change with a switch somewhere, so any workaround amounts to either using an iterable (e.g. a list) instead of a generator or repeatedly constructing generators.
The second snippet does the latter. It is by definition equivalent to the loops
for i in (i*3 for i in range(100)):
    for j in (i*7 for i in range(100)):
        ...
Hopefully it isn't surprising that here, the latter generator expression is evaluated anew on each iteration of the outer loop.
As you discovered, the object created by a generator expression is an iterator (more precisely a generator-iterator), designed to be consumed only once. If you need a resettable generator, simply create a real generator and use it in the loops:
def multiples_of_3():  # generator
    for i in range(100):
        yield i * 3

def multiples_of_7():  # generator
    for i in range(100):
        yield i * 7

list((i,j) for i in multiples_of_3() for j in multiples_of_7())
Your second code works because the expression list of the inner loop ((i*7 ...)) is evaluated on each pass of the outer loop. This results in creating a new generator-iterator each time around, which gives you the behavior you want, but at the expense of code clarity.
To understand what is going on, remember that there is no "resetting" of an iterator when the for loop iterates over it. (This is a feature; such a reset would break iterating over a large iterator in pieces, and it would be impossible for generators.) For example:
multiples_of_2 = iter(xrange(0, 100, 2))  # iterator
for i in multiples_of_2:
    print i

# prints nothing because the iterator is spent
for i in multiples_of_2:
    print i
...as opposed to this:
multiples_of_2 = xrange(0, 100, 2)  # iterable sequence, converted to iterator
for i in multiples_of_2:
    print i

# prints again because a new iterator gets created
for i in multiples_of_2:
    print i
A generator expression is equivalent to an invoked generator and can therefore only be iterated over once.
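A two-line demonstration of that one-shot behavior:

```python
gen = (i * 3 for i in range(3))  # generator expression: an iterator
first_pass = list(gen)           # consumes the generator
second_pass = list(gen)          # already exhausted, nothing left
print(first_pass, second_pass)   # [0, 3, 6] []
```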
The real issue, as I found out, is about single- versus multiple-pass iterables and the fact that there is currently no standard mechanism to determine whether an iterable is single- or multi-pass: see Single- vs. Multi-pass iterability
If you want to convert a generator expression to a multipass iterable, then it can be done in a fairly routine fashion. For example:
class MultiPass(object):
    def __init__(self, initfunc):
        self.initfunc = initfunc
    def __iter__(self):
        return self.initfunc()

multiples_of_3 = MultiPass(lambda: (i*3 for i in range(20)))
multiples_of_7 = MultiPass(lambda: (i*7 for i in range(20)))
print list((i,j) for i in multiples_of_3 for j in multiples_of_7)
From the point of view of defining the thing it's a similar amount of work to typing:
def multiples_of_3():
    return (i*3 for i in range(20))
but from the point of view of the user, they write multiples_of_3 rather than multiples_of_3(), which means the object multiples_of_3 is polymorphic with any other iterable, such as a tuple or list.
The need to type lambda: is a bit inelegant, true. I don't suppose there would be any harm in introducing "iterable comprehensions" to the language, to give you what you want while maintaining backward compatibility. But there are only so many punctuation characters, and I doubt this would be considered worth one.

zip and groupby curiosity in python 2.7

Can someone explain why these output different things in Python 2.7.4? They output the same thing in python 3.3.1. I'm just wondering if this is a bug in 2.7 that was fixed in 3, or if it is due to some change in the language.
>>> for (i,j),k in zip(groupby([1,1,2,2,3,3]), [4,5,6]):
...     print list(j)
...
[]
[]
[3]
>>> for i,j in groupby([1,1,2,2,3,3]):
...     print list(j)
...
[1, 1]
[2, 2]
[3, 3]
This isn't a mistake. It has to do with when the groupby iterable gets consumed. Try the following with python3 and you'll see the same behavior:
from itertools import groupby

for (i,j),k in list(zip(groupby([1,1,2,2,3,3]), [4,5,6])):
    print (i,list(j),k)
Note that if you remove the outer list, then you get the result you expect. The "problem" here is that the grouper object (returned in j) is an iterable which yields elements as long as they are the same. It doesn't know ahead of time what it will yield or how many elements there are. It just receives an iterable as input and then yields from that iterable. If you move on to the next "group", then the iterable ends up being consumed before you ever get a chance to look at the elements. This is a design decision to allow groupby to operate on iterables which yield arbitrary (even infinite) numbers of elements.
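That consumption is easy to observe directly; a small sketch:

```python
from itertools import groupby

groups = groupby([1, 1, 2, 2, 3, 3])
key1, grouper1 = next(groups)
key2, grouper2 = next(groups)  # advancing skips over group 1's elements

stale = list(grouper1)    # the old grouper is now empty
current = list(grouper2)  # the current group is still intact
print(stale, current)     # [] [2, 2]
```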
In python2.x, zip will create a list, effectively moving past each "group" before the loop even starts. In doing so, it ends up consuming each of the "group" objects returned by groupby. This is why you only have the last element in the list reported. The fix for python2.x is to use itertools.izip rather than zip. In python3.x, izip became the builtin zip. As I see it, the only way to support both in this script is via something like:
from __future__ import print_function
from itertools import groupby
try:
    from itertools import izip
except ImportError:  # python3.x
    izip = zip

for (i,j),k in izip(groupby([1,1,2,2,3,3]), [4,5,6]):
    print (i,list(j),k)

How do I reverse an itertools.chain object?

My function creates a chain of generators:
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num))
    some_other_sequence = (x*2.6 for x in range(num))
    chained = itertools.chain(some_sequence, some_other_sequence)
    return chained
My function sometimes needs to return chained in reversed order. Conceptually, the following is what I would like to be able to do:
if num < 0:
    return reversed(chained)
return chained
Unfortunately:
>>> reversed(chained)
TypeError: argument to reversed() must be a sequence
What are my options?
This is in some realtime graphic rendering code so I don't want to make it too complicated/slow.
EDIT:
When I first posed this question I hadn't thought about the reversibility of generators. As many have pointed out, generators can't be reversed.
I do in fact want to reverse the flattened contents of the chain; not just the order of the generators.
Based on the responses, there is no single call I can use to reverse an itertools.chain, so I think the only solution here is to use a list, at least for the reverse case, and perhaps for both.
if num < 0:
    lst = list(chained)
    lst.reverse()
    return lst
else:
    return chained
reversed() needs an actual sequence, because it iterates it backwards by index, and that wouldn't work for a generator (which only has the notion of "next" item).
Since you will need to unroll the whole generator anyway for reversing, the most efficient way is to read it to a list and reverse the list in-place with the .reverse() method.
You cannot reverse generators by definition. The interface of a generator is the iterator, which supports only forward iteration. When you want to reverse an iterator, you have to collect all its items first and reverse them after that.
Use lists instead or generate the sequences backwards from the start.
itertools.chain would need to implement __reversed__() (this would be best) or __len__() and __getitem__(). Since it doesn't, and there's not even a way to access the internal sequences, you'll need to expand the entire sequence to be able to reverse it.
reversed(list(CHAIN_INSTANCE))
It would be nice if chain made __reversed__() available when all the sequences are reversible, but currently it does not. Perhaps you can write your own version of chain that does:
def reversed2(iterable):
    return reversed(list(iterable))
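A sketch of such a chain variant, assuming every input supports reversed() (ReversibleChain is a made-up name, not part of itertools):

```python
class ReversibleChain:
    """Chain-like iterable that also supports reversed(),
    provided every underlying sequence is itself reversible."""

    def __init__(self, *sequences):
        self.sequences = sequences

    def __iter__(self):
        for seq in self.sequences:
            yield from seq

    def __reversed__(self):
        # Walk the sequences last-to-first, each one backwards.
        for seq in reversed(self.sequences):
            yield from reversed(seq)

rc = ReversibleChain([1, 2], [3, 4])
print(list(rc))            # [1, 2, 3, 4]
print(list(reversed(rc)))  # [4, 3, 2, 1]
```

Because __iter__ and __reversed__ each return a fresh generator, the object can be iterated any number of times, unlike a plain itertools.chain.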
reversed only works on objects that support len and indexing. You have to first generate all results of a generator before wrapping reversed around them.
However, you could easily do this:
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num, -1, -1))
    some_other_sequence = (x*2.6 for x in range(num, -1, -1))
    chained = itertools.chain(some_other_sequence, some_sequence)
    return chained
Does this work in your real app?
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num))
    some_other_sequence = (x*2.6 for x in range(num))
    list_of_chains = [some_sequence, some_other_sequence]
    if num < 0:
        list_of_chains.reverse()
    chained = itertools.chain(*list_of_chains)
    return chained
In theory you can't because chained objects may even contain infinite sequences such as itertools.count(...).
You should try to reverse your generators/sequences or use reversed(iterable) for each sequence if applicable and then chain them together last-to-first. Of course this highly depends on your use case.
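A sketch of that last-to-first approach, with the sequences materialized as lists so reversed() works (the multipliers mirror the question's bar, but the variable names are made up):

```python
from itertools import chain

seq_a = [x * 1.5 for x in range(3)]  # [0.0, 1.5, 3.0]
seq_b = [x * 2.6 for x in range(3)]  # [0.0, 2.6, 5.2]

# Reverse each sequence and chain them in the opposite order.
backwards = list(chain(reversed(seq_b), reversed(seq_a)))
print(backwards)  # [5.2, 2.6, 0.0, 3.0, 1.5, 0.0]
```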
