Is it okay to use the yield statement in an instance method of a class? For example,
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi
Python doesn't complain about this, and simple cases seem to work. However, I've only seen examples that use yield in plain functions.
I start having problems when I try to use it with itertools functions. For example, suppose I have two large data streams X and Y that are stored across multiple files, and I want to compute their sum and difference with only one loop through the data. I could use itertools.tee and itertools.izip: tee each of X and Y, feed one copy of each to an add step and the other copy to a sub step, take every Nth value of each result, and zip the two output streams back together.
In code it would be something like this (sorry, it's long)
from itertools import izip_longest, izip, tee
import random

def add(x,y):
    for xi,yi in izip(x,y):
        yield xi + yi

def sub(x,y):
    for xi,yi in izip(x,y):
        yield xi - yi

class NthSumDiff(object):
    def __init__(self, n):
        self.nthsum = Nth(n)
        self.nthdiff = Nth(n)

    def itervalues(self, x, y):
        xadd, xsub = tee(x)
        yadd, ysub = tee(y)
        gen_sum = self.nthsum.itervalues(add(xadd, yadd))
        gen_diff = self.nthdiff.itervalues(sub(xsub, ysub))
        # Have to use izip_longest here, but why?
        #for (i,nthsum), (j,nthdiff) in izip_longest(gen_sum, gen_diff):
        for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
            assert i==j, "sum row %d != diff row %d" % (i,j)
            yield nthsum, nthdiff

nskip = 12
ns = Nth(nskip)
nd = Nth(nskip)
nsd = NthSumDiff(nskip)
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    # If the block length is a multiple of nskip there's no problem.
    #n = random.randint(5000, 10000) * nskip
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100,n+100)
    # Independent processing is no problem but requires two loops.
    for i, nthsum in ns.itervalues(add(x,y)):
        pass
    for j, nthdiff in nd.itervalues(sub(x,y)):
        pass
    assert i==j
    # Trying to do both with one loop causes problems.
    for nthsum, nthdiff in nsd.itervalues(x,y):
        # If izip_longest is necessary, why don't I ever get a fillvalue?
        assert nthsum is not None
        assert nthdiff is not None
    # After each block of data the two iterators should have the same state.
    assert nsd.nthsum.nout == nsd.nthdiff.nout, \
        "sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
But this fails unless I swap itertools.izip out for itertools.izip_longest even though the iterators have the same length. It's the last assert that gets hit, with output like
file 0 n=58581
file 1 n=87978
Traceback (most recent call last):
File "test.py", line 71, in <module>
"sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
AssertionError: sum nout 12213 != diff nout 12212
Edit: I guess it's not obvious from the example I wrote, but the input data X and Y are only available in blocks (in my real problem they're chunked in files). This is important because I need to maintain state between blocks. In the toy example above, this means Nth needs to yield the equivalent of
>>> x1 = range(0,10)
>>> x2 = range(10,20)
>>> (x1 + x2)[::3]
[0, 3, 6, 9, 12, 15, 18]
NOT the equivalent of
>>> x1[::3] + x2[::3]
[0, 3, 6, 9, 10, 13, 16, 19]
I could use itertools.chain to join the blocks ahead of time and then make one call to Nth.itervalues, but I'd like to understand what's wrong with maintaining state in the Nth class between calls (my real app is image processing involving more saved state, not simple Nth/add/subtract).
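As a sanity check on that requirement, here is a small Python 3 sketch (a port of the Nth class above, with two range blocks standing in for the files) showing that calling a stateful Nth.itervalues once per block gives the same result as slicing the chained stream, provided every per-block generator is run to exhaustion:

```python
from itertools import chain, islice

class Nth:
    """Python 3 port of the question's Nth: yields every n-th item and keeps
    its position (self.i) and output count (self.nout) across calls."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi

blocks = [range(0, 10), range(10, 20)]

# Stateful version: one Nth instance, one fully consumed itervalues() per block.
nth = Nth(3)
per_block = [xi for block in blocks for _, xi in nth.itervalues(block)]

# Stateless version: join the blocks first, then slice once.
joined = list(islice(chain.from_iterable(blocks), 2, None, 3))

print(per_block)  # [2, 5, 8, 11, 14, 17]
print(joined)     # [2, 5, 8, 11, 14, 17]
```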
I don't understand how my Nth instances end up in different states when their lengths are the same. For example, if I give izip two strings of equal length
>>> [''.join(x) for x in izip('ABCD','abcd')]
['Aa', 'Bb', 'Cc', 'Dd']
I get a result of the same length; how come my Nth.itervalues generators seem to be getting unequal numbers of next() calls even though each one yields the same number of results?
Quick answer
You never reset self.i and self.nout in class Nth. Also, you should have used something like this:
from itertools import islice

# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n

    def itervalues(self, x):
        for a,b in enumerate(islice(x, self.n - 1, None, self.n)):
            self.nout = a
            yield a,b
but since you don't even need nout, you should use this:
import itertools

def Nth(iterable, step):
    return enumerate(itertools.islice(iterable, step - 1, None, step))
Long answer
Your code had an off-by-one smell that led me to this line in NthSumDiff.itervalues():
for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
If you swap gen_sum and gen_diff, you'll see that gen_diff will always be the one with nout greater by one. This is because izip() pulls from gen_sum before pulling from gen_diff. gen_sum raises a StopIteration exception before gen_diff is even tried in the last iteration.
For example, say you pick N samples where N % step == 7. At the end of each iteration, self.i for the Nth instances should equal 0. But on the very last iteration, self.i in gen_sum will increment up to 7 and then there will be no more elements in x. It will raise StopIteration. gen_diff is still sitting at self.i equal to 0, though.
If you add self.i = 0 and self.nout = 0 to the beginning of Nth.itervalues(), the problem goes away.
Lesson
You only had this problem because your code is too complicated and not Pythonic. If you find yourself using lots of counters and indexes in loops, that's a good sign (in Python) to take a step back and see if you can simplify your code. I have a long history of C programming, and consequently, I still catch myself doing the same thing from time to time in Python.
Simpler implementation
Putting my money where my mouth is...
from itertools import izip, islice
import random

def sumdiff(x,y,step):
    # filter for the Nth values of x and y now
    x = islice(x, step-1, None, step)
    y = islice(y, step-1, None, step)
    return ((xi + yi, xi - yi) for xi, yi in izip(x,y))

nskip = 12
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100,n+100)
    for nthsum, nthdiff in sumdiff(x,y,nskip):
        assert nthsum is not None
        assert nthdiff is not None
    assert len(list(sumdiff(x,y,nskip))) == n//nskip
More explanation of the problem
In response to Brian's comment:
This doesn't do the same thing. Not resetting i and nout is intentional. I've basically got a continuous data stream X that's split across several files. Slicing the blocks gives a different result than slicing the concatenated stream (I commented earlier about possibly using itertools.chain). Also my actual program is more complicated than mere slicing; it's just a working example. I don't understand the explanation about the order of StopIteration. If izip('ABCD','abcd') --> Aa Bb Cc Dd then it seems like equal-length generators should get an equal number of next calls, no? – Brian Hawkins
Your problem was so long that I missed the part about the stream coming from multiple files. Let's just look at the code itself. First, we need to be really clear about how itervalues(x) actually works.
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            # Each next() call on this generator consumes elements
            # of x until self.i reaches self.n (then wraps to 0),
            # i.e. up to self.n elements per call. If fewer than
            # that remain in x, self.i is incremented once per
            # remaining element before the for loop exits normally.
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                # We're yielding, so we're a generator
                yield self.nout, xi
        # Python helpfully raises StopIteration to fulfill the
        # contract of an iterator. That's how for loops and
        # others know when to stop.
In itervalues(x) above, each next() call either steps through up to self.n elements of x (self.i counts up to self.n and wraps to 0) and then yields, OR steps through the fewer-than-self.n elements remaining in x, incrementing self.i once per element, and then exits the for loop and the generator (itervalues() is a generator because it yields). When the itervalues() generator exits, Python raises a StopIteration exception.
So, for every instance of class Nth initialized with N, the value of self.i after exhausting all elements in itervalues(X) will be:
self.i = (value_of_self_i_before_itervalues(X) + len(X)) % N
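The formula can be checked numerically. This Python 3 sketch ports the class and feeds one instance three blocks of arbitrary lengths (13, 25 and 7 are just example values), exhausting each generator and asserting the formula after every block:

```python
class Nth:
    """Python 3 port of the question's Nth class (state kept across calls)."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi

N = 10
nth = Nth(N)
for block_len in (13, 25, 7):
    i_before = nth.i
    for _ in nth.itervalues(range(block_len)):
        pass  # exhaust the generator completely
    # The modulo covers the case where i_before + len(X) wraps past N.
    assert nth.i == (i_before + block_len) % N
print(nth.i)  # 5
```

The formula only holds when the generator is run to the end; a consumer that stops early (like izip) leaves self.i wherever the last yield left it.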
Now when you iterate over izip(Nth_1, Nth_2), it will do something like this:
def izip(A, B):
    try:
        while True:
            a = A.next()
            b = B.next()
            yield a,b
    except StopIteration:
        pass
So, imagine N=10 and len(X)=13. On the very last next() call to izip(), both A and B have self.i==0 as their state. A.next() is called, increments self.i three times (to 3), runs out of elements in X, exits the for loop, returns, and then Python raises StopIteration. Now, inside izip() we go directly to the exception block, skipping B.next() entirely. So, A.i==3 and B.i==0 at the end.
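The same short-circuit can be observed directly with the 'ABCD'/'abcd' example from the comment. In this Python 3 sketch the built-in zip plays the role of izip, and Counting is a hypothetical helper (not part of itertools) that counts next() calls. Both inputs have four items, yet the first receives five next() calls and the second only four: the fifth call on the first input raises StopIteration, so zip never makes the fifth call on the second.

```python
class Counting:
    """Hypothetical helper: an iterator that counts its own next() calls."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        return next(self._it)  # StopIteration propagates when exhausted

a = Counting("ABCD")
b = Counting("abcd")
pairs = ["".join(p) for p in zip(a, b)]
print(pairs)             # ['Aa', 'Bb', 'Cc', 'Dd']
print(a.calls, b.calls)  # 5 4
```

With strings the missing call is harmless, but a generator like Nth.itervalues does real work (consuming input, updating self.i) during the call that raises StopIteration.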
Second try at simplification (with correct requirements)
Here's another simplified version that treats all file data as one continuous stream. It uses chained, small, re-usable generators. I would highly, highly recommend watching this PyCon '14 talk about generators by David Beazley. Guessing from your problem description, it should be 100% applicable.
from itertools import izip, islice
import random

def sumdiff(data):
    return ((x + y, x - y) for x, y in data)

def combined_file_data(files):
    for i,n in files:
        # Generate some data.
        x = range(n)
        y = range(100,n+100)
        for data in izip(x,y):
            yield data

def filelist(nfiles):
    for i in range(nfiles):
        # Generate some data.
        n = random.randint(50000, 100000)
        print 'file %d n=%d' % (i, n)
        yield i, n

def Nth(iterable, step):
    return islice(iterable, step-1, None, step)

nskip = 12
nfiles = 10
filedata = combined_file_data(filelist(nfiles))
nth_data = Nth(filedata, nskip)
for nthsum, nthdiff in sumdiff(nth_data):
    assert nthsum is not None
    assert nthdiff is not None
Condensing the discussion, there's nothing wrong with using yield in an instance method per se. You get into trouble with izip if the instance state changes after the last yield because izip stops calling next() on its arguments once any of them stops yielding results. A clearer example might be
from itertools import izip

class Three(object):
    def __init__(self):
        self.status = 'init'

    def run(self):
        self.status = 'running'
        yield 1
        yield 2
        yield 3
        self.status = 'done'
        # A bare return ends the generator; Python raises StopIteration for us.
        # (Raising StopIteration explicitly here is a RuntimeError in Python 3.7+, PEP 479.)
        return

it = Three()
for x in it.run():
    assert it.status == 'running'
assert it.status == 'done'

it1, it2 = Three(), Three()
for x, y in izip(it1.run(), it2.run()):
    pass
assert it1.status == 'done'
assert it2.status == 'done', "Expected status=done, got status=%s." % it2.status
which hits the last assertion,
AssertionError: Expected status=done, got status=running.
In the original question, the Nth class can consume input data after its last yield, so the sum and difference streams can get out of sync with izip. Using izip_longest would work since it will try to exhaust each iterator. A clearer solution might be to refactor to avoid changing state after the last yield.
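Both fixes can be sketched in Python 3 with the Three example (zip_longest is the Python 3 name for izip_longest; ThreeEager is a hypothetical refactor, not from the discussion). zip_longest keeps calling next() until every input is exhausted, so each generator reaches the code after its last yield; moving the state change before the last yield makes plain zip sufficient.

```python
from itertools import zip_longest

class Three:
    def __init__(self):
        self.status = 'init'

    def run(self):
        self.status = 'running'
        yield 1
        yield 2
        yield 3
        self.status = 'done'  # runs only when a 4th next() is attempted

class ThreeEager(Three):
    """Hypothetical refactor: finish updating state before the last yield."""
    def run(self):
        self.status = 'running'
        yield 1
        yield 2
        self.status = 'done'
        yield 3

# Fix 1: zip_longest drives every input to exhaustion.
it1, it2 = Three(), Three()
for x, y in zip_longest(it1.run(), it2.run()):
    pass
print(it1.status, it2.status)  # done done

# Fix 2: no state change after the last yield, so plain zip is enough.
e1, e2 = ThreeEager(), ThreeEager()
for x, y in zip(e1.run(), e2.run()):
    pass
print(e1.status, e2.status)  # done done
```

The trade-off with the second fix is that status already reads 'done' while the final value is being processed.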
Related
I'm aware yield generates a value on the fly; my understanding is that this means it doesn't keep the value in memory, and therefore the current value shouldn't be able to interact with the previous values.
But I just want to be sure that's the case. Could someone confirm whether it's possible or not?
I'm going to use 5 as the value in number.
Example without generator:
def factorial(number):
    result = number
    if number <= 1:
        return 1
    else:
        for x in reversed(range(1, number)): # 4, 3, 2, 1
            result *= x # 5*4*3*2*1
        return result # returns 120
Is it possible to do the same thing using the yield statement? How?
Thank you
Generators can be stateful:
def fibs():
    a, b = 1, 1
    while True:
        yield b
        a, b = b, a + b

g = fibs()
for i in range(10):
    print next(g)
Here the state is in the local variables. They are kept alive while the iterator generated by the generator is alive.
EDIT: I'm blind, it was a factorial.
def factorials():
    i = 1
    a = 1
    while True:
        yield a
        i+=1
        a*=i
or if you need a function, not a stream of them, then here's a one-liner:
from functools import reduce
print reduce(lambda a, b: a*b, range(1, 10+1))
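To answer the original question with the generator itself, the stream can be turned back into a single-value function by skipping to the number-th element. A Python 3 sketch (the factorial wrapper and the use of itertools.islice are my own choices, not from the answers above):

```python
from itertools import islice

def factorials():
    """Infinite stream 1!, 2!, 3!, ... (same generator as above)."""
    i = 1
    a = 1
    while True:
        yield a
        i += 1
        a *= i

def factorial(number):
    """number! taken as the number-th value of the stream (number >= 1)."""
    return next(islice(factorials(), number - 1, None))

print(list(islice(factorials(), 5)))  # [1, 2, 6, 24, 120]
print(factorial(5))                   # 120
```

Each value is built from the previous one kept in the local variable a, which is exactly the "interaction with previous values" the question asks about.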
Is it possible to create an iterator/generator which will decide on the next value based on some result of the previous iteration?
i.e.
y = None
for x in some_iterator(ll, y):
    y = some_calculation_on(x)
I would like the logic of choosing the next x to depend on the calculation result allowing different logic for different results, much like in a search problem.
I also want to keep how the next x is chosen and the calculation on x as separate as possible.
Did you know that you can send values into a generator using generator.send? So yes, you can have a generator change its behaviour based on feedback from the outside world. From the doc:
generator.send(value)
Resumes the execution and “sends” a value into the generator function.
The value argument becomes the result of the current yield expression.
The send() method returns the next value yielded by the generator
[...]
Example
Here is a counter that will increment only if told to do so.
def conditionalCounter(start=0):
    while True:
        should_increment = yield start
        if should_increment:
            start += 1
Usage
Since a for-loop only calls next() and gives you no way to pass a value back in, you have to use a while-loop and call generator.send yourself.
import random

def some_calculation_on(value):
    return random.choice([True, False])

g = conditionalCounter()
last_value = next(g)
while last_value < 5:
    last_value = g.send(some_calculation_on(last_value))
    print(last_value)
Output
0
0
1
2
3
3
4
4
5
Make it work in a for-loop
You can make the above work in a for-loop by crafting a YieldReceive class.
class YieldReceive:
    stop_iteration = object()

    def __init__(self, gen):
        self.gen = gen
        self.next = next(gen, self.stop_iteration)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next is self.stop_iteration:
            raise StopIteration
        else:
            return self.next

    def send(self, value):
        try:
            self.next = self.gen.send(value)
        except StopIteration:
            self.next = self.stop_iteration
Usage
it = YieldReceive(...)
for x in it:
    # Do stuff
    it.send(some_result)
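A concrete, deterministic run of YieldReceive may help. In this Python 3 sketch, bounded_counter is a hypothetical finite variant of conditionalCounter (so the loop can end on its own), and alternating False/True stands in for the random some_calculation_on. Note that __next__ never advances the generator; only send() does, so the loop body must call send() exactly once per iteration or it will see the same value forever.

```python
class YieldReceive:
    stop_iteration = object()  # sentinel meaning "generator finished"

    def __init__(self, gen):
        self.gen = gen
        self.next = next(gen, self.stop_iteration)  # prime the generator

    def __iter__(self):
        return self

    def __next__(self):
        # Does not advance the generator: only send() does that.
        if self.next is self.stop_iteration:
            raise StopIteration
        return self.next

    def send(self, value):
        try:
            self.next = self.gen.send(value)
        except StopIteration:
            self.next = self.stop_iteration

def bounded_counter(start, stop):
    """Hypothetical finite variant of conditionalCounter: ends at stop."""
    while start < stop:
        should_increment = yield start
        if should_increment:
            start += 1

seen = []
it = YieldReceive(bounded_counter(0, 3))
for x in it:
    seen.append(x)
    # Deterministic stand-in for some_calculation_on: False, True, False, ...
    it.send(len(seen) % 2 == 0)
print(seen)  # [0, 0, 1, 1, 2, 2]
```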
It's possible but confusing. If you want to keep the sequence of x values and the calculations on x separate, you should do this explicitly by not involving x with an iterator.
def next_value(x):
    """Compute the next x from the previous one (a plain function, not a generator)"""
    # Bunch of code defining a new x
    return new_x

x = None
while True:
    x = next_value(x)
    x = some_calculation_on(x)
    # Break when you're done
    if finished and done:
        break
If you want the loop to execute exactly i times, then use a for loop:
for step in range(i):
    x = next_value(x)
    x = some_calculation_on(x)
    # No break
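As a runnable illustration of that explicit style, here is bisection for the square root of 2, a small search problem where the next x depends on the previous result. This is a Python 3 sketch under my own assumptions: next_value takes the current bracket (lo, hi) rather than x, and the loop, not an iterator, carries the state between the "choose x" and "calculate on x" steps.

```python
import math

def next_value(lo, hi):
    """Choosing the next x: here, the midpoint of the current bracket."""
    return (lo + hi) / 2

def some_calculation_on(x):
    """Calculation on x: f(x) = x**2 - 2; its sign says which half holds the root."""
    return x * x - 2

lo, hi = 0.0, 2.0
for step in range(60):
    x = next_value(lo, hi)
    if some_calculation_on(x) > 0:
        hi = x  # root is to the left of x
    else:
        lo = x  # root is at or to the right of x
print(x, math.sqrt(2))
```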
def conditional_iterator(y):
    # stuff to create new values
    yield x if (expression involving y) else another_x

for x in conditional_iterator(y):
    y = some_computation(x)
I have a linked list where I iterate within a range and return all of the square numbers that can be represented as integers within this range. Instead of returning just the perfect squares, it returns None in between, for example 9, None, None..., 16, None, None..., 25. I want it to return just 9, 16, 25, etc.
import math

class Squares:
    def __init__(self, start, end):
        self.__start = start - 1
        self.__end = end -1

    def __iter__(self):
        return SquareIterator(self.__start, self.__end)

class SquareIterator:
    def __init__(self, start, end):
        self.__current = start
        self.__step = 1
        self.__end = end

    def __next__(self):
        if self.__current > self.__end:
            raise StopIteration
        else:
            self.__current += self.__step
            x = self.__current - self.__step + 1
            self.__current - self.__step + 1
            if str(x).isdigit() and math.sqrt(x) % 1 == 0:
                return x
You need to make your __next__ function continue to loop until it gets to the target value:
def __next__(self):
    # We're just going to keep looping. Loop breaking logic is below.
    while True:
        # out of bounds
        if self.__current > self.__end:
            raise StopIteration
        # We need to get the current value
        x = self.__current
        # increase the state *after* grabbing it for test
        self.__current += self.__step
        # Test the value stored above
        if math.sqrt(x) % 1 == 0:
            return x
The reason you should be storing x, then incrementing is that you have to increment no matter what, even if you don't have a perfect square.
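Putting the fixed __next__ together with the two classes gives a complete, runnable Python 3 sketch. A few assumptions differ from the question's code: Squares passes start and end through unchanged (no "- 1"), end is treated as inclusive, SquareIterator gains the __iter__ method iterators are expected to have, and math.isqrt (Python 3.8+) replaces math.sqrt(x) % 1 so the square test stays exact for large integers.

```python
import math

class Squares:
    """Iterable over the perfect squares in [start, end] (end inclusive)."""
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        # A fresh iterator each time, so the range can be traversed repeatedly.
        return SquareIterator(self.start, self.end)

class SquareIterator:
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        # Iterators should return themselves from __iter__.
        return self

    def __next__(self):
        # Keep scanning until a perfect square is found or the range runs out.
        while self.current <= self.end:
            x = self.current
            self.current += 1  # advance unconditionally, square or not
            if math.isqrt(x) ** 2 == x:
                return x
        raise StopIteration

print(list(Squares(1, 30)))  # [1, 4, 9, 16, 25]
```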
It is unclear why you are complicating things; there is a simple way:
import math

class Squares:
    def __init__(self, start, end):
        self.__start = start
        self.__end = end
        self.__step = 1

    def __iter__(self):
        for x in range(self.__start, self.__end, self.__step):
            if math.sqrt(x) % 1 == 0:
                yield x

s = Squares(0, 100)
for sq in s:
    print(sq, end=' ')
output:
0 1 4 9 16 25 36 49 64 81
from the comments:
Mind you, it would likely be much easier to avoid the dedicated iterator class, and just implement __iter__ for Squares as a generator function. Explicit __next__ involves all sorts of inefficient state management that Python does poorly, and isn't all that easy to follow; __iter__ as a generator function is usually very straightforward; every time you hit a yield it's like the return from __next__, but all your state is function local, no special objects involved (generators take care of saving and restoring said local state). – ShadowRanger
it probably doesn't even need a Squares class. A generator function named squares would do what's needed; pass it start, stop and step and use them as local variables, rather than attributes of some unnecessary self. Only real advantage to the class is that it could be iterated repeatedly without reconstructing it, a likely uncommon use case
import math

def squares_from(start, stop, step=1):
    """Generator function yielding the perfect squares
    in range(start, stop, step).
    """
    for x in range(start, stop, step):
        if math.sqrt(x) % 1 == 0:
            yield x

for sq in squares_from(0, 100):
    print(sq, end=' ')
I have a generator and would like to find the first value it generates that is larger than X. One way to do this is as follows, but it seems rather long-winded (it reads like it repeats itself).
def long_winded(gen,X):
    n = next(gen)
    while n < X: n=next(gen)
    return n
What I wanted to write was something simpler:
def short_broken(gen,X):
    while next(gen)<X: pass
    return next(gen) # returns the SECOND value larger than X, as gen is called again

def short_broken2(gen,X):
    while n = next(gen)<X: pass # Not python syntax!
    return n
Is there a pythonically-concise way to return the same result?
from itertools import dropwhile

def first_result_larger_than_x(gen, X):
    return next(dropwhile(lambda n: n <= X, gen))
Note that your code examples from the OP are actually returning the first result greater than or equal to X. I've corrected that in this code example, but if that was what you actually wanted, change the <= to a <.
def short2(gen,X):
    for x in gen:
        if x > X:
            return x
or as a 1-liner (which I prefer to the itertools variant):
def short3(gen,X):
    return next(x for x in gen if x > X)
my original answer -- left only for the sake of posterity
I'm not necessarily asserting that this method is better, but you can use a recursive function:
def short(gen,X):
    n = next(gen)
    return n if n>X else short(gen,X)
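On Python 3.8 and later, the assignment expression (`:=`, PEP 572) makes the asker's short_broken2 idea legal syntax, provided the assignment is parenthesized. A minimal sketch (the name first_larger_than is hypothetical), using `<=` so it returns the first value strictly greater than X like the answers above:

```python
def first_larger_than(gen, X):
    """Return the first value from gen that is strictly greater than X (Python 3.8+)."""
    while (n := next(gen)) <= X:
        pass
    return n

print(first_larger_than(iter([1, 5, 3, 8, 10]), 4))        # 5
print(first_larger_than((k * k for k in range(100)), 50))  # 64
```

Like the other versions, it raises StopIteration if no value exceeds X, and it leaves the generator positioned just after the returned value.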
I'm trying to represent an array of evenly spaced floats, an arithmetic progression, starting at a0 and with elements a0, a0 + a1, a0 + 2a1, a0 + 3a1, ...
This is what numpy's arange() method does, but it seems to allocate memory for the whole array object and I'd like to do it using an iterator class which just stores a0, a1 and n (the total number of elements, which might be large).
Does anything that does this already exist in the standard Python packages?
I couldn't find it, so I ploughed ahead with:
class mylist():
    def __init__(self, n, a0, a1):
        self._n = n
        self._a0 = a0
        self._a1 = a1

    def __getitem__(self, i):
        if i < 0 or i >= self._n:
            raise IndexError
        return self._a0 + i * self._a1

    def __iter__(self):
        self._i = 0
        return self

    def next(self):
        if self._i >= self._n:
            raise StopIteration
        value = self.__getitem__(self._i)
        self._i += 1
        return value
Is this a sensible approach or am I reinventing the wheel?
Well, one thing that you are doing wrong is that it should be for i, x in enumerate(a): print i, x.
Also, I'd probably use a generator method instead of the hassle with the __iter__ and next() methods, especially because your solution wouldn't allow you to iterate over the same mylist twice at the same time with two different iterators (self._i is stored on the instance, so every iterator shares it).
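The shared-position problem is easy to reproduce. This Python 3 port of the question's class (next renamed to __next__) is iterated by two nested loops; the inner loop resets and then exhausts the one self._i, so the outer loop stops after its first element:

```python
class mylist:
    """Python 3 port of the question's class (next() renamed __next__)."""
    def __init__(self, n, a0, a1):
        self._n = n
        self._a0 = a0
        self._a1 = a1

    def __getitem__(self, i):
        if i < 0 or i >= self._n:
            raise IndexError
        return self._a0 + i * self._a1

    def __iter__(self):
        self._i = 0  # the position lives on the instance itself
        return self

    def __next__(self):
        if self._i >= self._n:
            raise StopIteration
        value = self.__getitem__(self._i)
        self._i += 1
        return value

a = mylist(3, 0.0, 0.5)
# Nested loops over the same object share self._i.
pairs = [(x, y) for x in a for y in a]
print(len(pairs))  # 3, not the 9 that three independent iterators would give
```

A generator-based __iter__ avoids this because each call creates a new generator with its own local position.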
This is probably a better solution, which gives you random access as well as an efficient iterator. Support for the in and len operators is thrown in as a bonus :)
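class mylist(object):
    def __init__(self, n, a0, a1, eps=1e-8):
        self._n = n
        self._a0 = a0
        self._a1 = a1
        self._eps = eps

    def __contains__(self, x):
        y = float(x - self._a0) / self._a1
        # Round to the nearest index: float division can land just
        # below a whole number (0.3 / 0.1 == 2.9999999999999996),
        # which int() alone would truncate to the wrong index.
        k = int(round(y))
        return 0 <= k < self._n and abs(y - k) < self._eps

    def __getitem__(self, i):
        if 0 <= i < self._n:
            return self._a0 + i * self._a1
        raise IndexError

    def __iter__(self):
        # Compute each element directly (as __getitem__ does) rather
        # than by repeated addition, which accumulates rounding error.
        for i in xrange(self._n):
            yield self._a0 + i * self._a1

    def __len__(self):
        return self._n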
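Why the index test in __contains__ should round rather than truncate for float steps: dividing by a step like 0.1 can land just below the true index, and int() then drops a whole step. A small Python 3 check (the two helper functions are hypothetical, extracted from a __contains__-style test for comparison):

```python
def contains_trunc(x, n, a0, a1, eps=1e-8):
    """Membership test that truncates the computed index with int()."""
    y = float(x - a0) / a1
    return 0 <= int(y) < n and abs(y - int(y)) < eps

def contains_round(x, n, a0, a1, eps=1e-8):
    """Membership test that rounds to the nearest index instead."""
    y = float(x - a0) / a1
    k = round(y)
    return 0 <= k < n and abs(y - k) < eps

print((0.3 - 0.0) / 0.1)                  # slightly below 3.0
print(contains_trunc(0.3, 10, 0.0, 0.1))  # False: index truncated to 2
print(contains_round(0.3, 10, 0.0, 0.1))  # True
```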
Other answers answer the immediate problem. Note that if all you want is an iterator and you don't need random access, there's no need to write a whole iterator class.
def mylist(n, a0, a1):
    for i in xrange(n):
        yield a0 + i*a1
Only for reasons that are probably obvious to someone out there, iterating over mylist: for i,x in enumerate(a): print i,a doesn't return the values I expect but just a whole lot of references to the mylist instance. What am I doing wrong?
The culprit is print i,a. You are printing a, which is the mylist instance itself. You ought to print x instead. Change this line to:
print i,x
Also a couple of things:
Change your class name to TitleCase, e.g. class MyList or class Mylist.
It is a good idea to inherit from object if you are using Python 2.x. So class MyList(object): ...
You're enumerating over a, so printing it will print a lot of references to it. Print x instead.
Yes, there's a built-in generator for this (Python 2.7 and up):
import itertools
mygen = itertools.count(a0,a1)
If you don't have Python 2.7 yet (Python 2.4 and up):
import itertools
mygen = (a0 + a1*i for i in itertools.count())
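Both forms are infinite; to get the n-element progression from the question, cap them with itertools.islice. The itertools documentation also notes that, for floating-point steps, the multiplicative form can be more accurate than count's repeated addition. A Python 3 sketch (values chosen to be exact in binary floating point):

```python
from itertools import count, islice

a0, a1, n = 1.0, 0.5, 4

# count() never stops, so islice() caps it at n elements.
by_addition = list(islice(count(a0, a1), n))

# Multiplicative form: each element is computed from its index.
by_multiplication = list(islice((a0 + a1 * i for i in count()), n))

print(by_addition)        # [1.0, 1.5, 2.0, 2.5]
print(by_multiplication)  # [1.0, 1.5, 2.0, 2.5]
```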