I have the following code:
class MyClass:
    def __init__(self):
        self.some_variable = None

    def func1(self):
        i = 1
        while i < 10:
            yield i * i
            self.some_variable = len(str((i * i)))
            i += 1

    def func2(self):
        # Use self rather than the global instance name
        *_, last = self.func1()
        print(self.some_variable)

my_class = MyClass()
my_class.func2()
As you can see, some_variable is the length of the last element in the generator. Is this the most Pythonic way of getting this value? If not, how should it be done?
Probably the simplest approach is to use a for loop to consume the generator, doing nothing in the loop body. The loop variable will hold the last value from the generator after the loop ends, which is exactly what you want.
for x in some_generator():
    pass
print(x)  # print the last value yielded by the generator
This may be a little more efficient than other options because it discards all the values before the last one, rather than storing them in a list or some other data structure.
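One caveat with the bare for loop is that the loop variable is never bound if the generator yields nothing. A minimal sketch of an alternative, assuming a stand-in generator named squares and a hypothetical helper last_item (neither is from the question), uses collections.deque with maxlen=1, which also keeps only one item at a time and makes the empty case explicit:

```python
from collections import deque


def squares():
    """Stand-in for the generator in the question: yields 1, 4, ..., 81."""
    for i in range(1, 10):
        yield i * i


def last_item(iterable):
    """Return the final item of an iterable, holding one item at a time."""
    tail = deque(iterable, maxlen=1)  # older items are discarded as new ones arrive
    if not tail:
        raise ValueError("iterable was empty")
    return tail[0]


last = last_item(squares())
last_len = len(str(last))  # derived value computed after the generator is consumed
print(last, last_len)
```

Whether this reads better than the plain loop is a matter of taste; the main difference is the explicit error for an empty generator.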
I think that one pythonic way would be to yield both the element and the length:
def func1():
    i = 1
    while i < 10:
        yield i * i, len(str((i * i)))
        i += 1

def func2():
    *_, (_, last_len) = func1()
    print(last_len)

func2()
or even to extract the calculation of the derived value to another function and call it after consuming the generator:
def func1():
    i = 1
    while i < 10:
        yield i * i
        i += 1

def func2(i):
    return len(str(i))

def func3():
    *_, last = func1()
    print(func2(last))

func3()
I think you have simplified your example too much for us to find the solution that best fits your real use case.
I am wondering how I can define a function that allows the range to be set to whatever the starting index is for a certain list. For example:
def pTrend(stock):
    pTrend = []
    for x in range(0, len(stock)):
        if x > 0:
            print('This')
        if x < 0:
            print('That')
I have lists of stock data: the first covers range(0, 250), the next covers range(250, 500), and so on. How could I create a function that sets the start of the range to the starting index of each stock I plug in?
I don't consider it good form to have functions maintain state like this. If possible, you should try and have the caller code maintain the state for you. For example, have the caller code maintain a cumsum variable that you pass to pTrend as the offset.
def pTrend(stock, offset):
    pTrend = []
    for i in range(len(stock)):
        i += offset
        ...
This is, in effect, what enumerate (as mentioned in this comment) does:
def pTrend(stock, offset):
    for i, s in enumerate(stock, offset):
        ...
Now, in your caller code, you have:
cumsum = 0
for stock in stocks:
    ... = pTrend(stock, cumsum)
    cumsum += len(stock)
Alternatively, define a class and have it keep track of the cumsum variable for you.
class Foo:
    def __init__(self):
        self.cumsum = 0

    def pTrend(self, stock):
        pTrend = []
        for i, s in enumerate(stock, self.cumsum):
            ...
        self.cumsum += len(stock)
Initialise an object:
f = Foo()
And call pTrend as f.pTrend(stock) as you would usually.
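To make the caller-maintained offset concrete, here is a small runnable sketch. The helper name indexed and the sample data are made up for illustration; the real pTrend body is elided in the answer above, so this only shows how the running offset lines up with global indices:

```python
def indexed(stock, offset):
    """Pair each item with its global index, starting at offset."""
    return list(enumerate(stock, offset))


# Hypothetical data: three chunks of one continuous series
stocks = [[10, 11, 12], [20, 21], [30]]

cumsum = 0      # running offset maintained by the caller
result = []
for stock in stocks:
    result.extend(indexed(stock, cumsum))
    cumsum += len(stock)   # next chunk starts where this one ended

print(result)
```

Each chunk is numbered from where the previous one stopped, so the indices run continuously across chunks.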
I have an iterable where I iterate within a range and return all of the perfect squares within that range. Instead of returning just those numbers, it returns None in between, for example 9, None, None, ..., 16, None, None, ..., 25. I want it to return just 9, 16, 25, and so on.
class Squares:
    def __init__(self, start, end):
        self.__start = start - 1
        self.__end = end -1

    def __iter__(self):
        return SquareIterator(self.__start, self.__end)

class SquareIterator:
    def __init__(self, start, end):
        self.__current = start
        self.__step = 1
        self.__end = end

    def __next__(self):
        if self.__current > self.__end:
            raise StopIteration
        else:
            self.__current += self.__step
            x = self.__current - self.__step + 1
            self.__current - self.__step + 1
            if str(x).isdigit() and math.sqrt(x) % 1 == 0:
                return x
You need to make your __next__ function continue to loop until it gets to the target value:
def __next__(self):
    # We're just going to keep looping. Loop breaking logic is below.
    while True:
        # out of bounds
        if self.__current > self.__end:
            raise StopIteration
        # We need to get the current value
        x = self.__current
        # increase the state *after* grabbing it for test
        self.__current += self.__step
        # Test the value stored above
        if math.sqrt(x) % 1 == 0:
            return x
The reason you should be storing x, then incrementing is that you have to increment no matter what, even if you don't have a perfect square.
It is unclear why you are complicating things; there is a simple way:
import math

class Squares:
    def __init__(self, start, end):
        self.__start = start
        self.__end = end
        self.__step = 1

    def __iter__(self):
        for x in range(self.__start, self.__end, self.__step):
            if math.sqrt(x) % 1 == 0:
                yield x

s = Squares(0, 100)
for sq in s:
    print(sq, end=' ')
output:
0 1 4 9 16 25 36 49 64 81
from the comments:
Mind you, it would likely be much easier to avoid the dedicated
iterator class, and just implement __iter__ for Squares as a generator
function. Explicit __next__ involves all sorts of inefficient state
management that Python does poorly, and isn't all that easy to follow;
__iter__ as a generator function is usually very straightforward; every time you hit a yield it's like the return from __next__, but all
your state is function local, no special objects involved (generators
take care of saving and restoring said local state). – ShadowRanger
It probably doesn't even need a Squares class. A generator function
named squares would do what's needed; pass it start, stop and step and
use them as local variables, rather than attributes of some
unnecessary self. Only real advantage to the class is that it could be
iterated repeatedly without reconstructing it, a likely uncommon use
case
def squares_from(start, stop, step=1):
    """Generator function that yields the perfect squares
    in the range between start and stop, iterated over using step=step
    """
    for x in range(start, stop, step):
        if math.sqrt(x) % 1 == 0:
            yield x

for sq in squares_from(0, 100):
    print(sq, end=' ')
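The math.sqrt(x) % 1 == 0 test relies on floating point, which can lose precision for very large integers, and it checks every number in the range. A possible integer-only variant, sketched here under the assumption of Python 3.8+ (for math.isqrt) and with a hypothetical name squares_between, walks the roots directly instead:

```python
import math


def squares_between(start, stop):
    """Yield perfect squares n with start <= n < stop, using integers only."""
    root = math.isqrt(max(start, 0))   # largest root whose square is <= start
    if root * root < start:
        root += 1                      # first root whose square is >= start
    while root * root < stop:
        yield root * root
        root += 1


print(list(squares_between(0, 100)))
```

This drops the step parameter of squares_from, so it is not a drop-in replacement, only an illustration of generating the squares rather than filtering for them.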
I know that you can use .send(value) to send values to a generator. I also know that you can iterate over a generator in a for loop. Is it possible to pass values to a generator while iterating over it in a for loop?
What I'm trying to do is
def example():
    previous = yield
    for i in range(0,10):
        previous = yield previous*i

t = example()
for value in example"...pass in a value?...":
    "...do something with the result..."
You technically could, but the results would be confusing, e.g.:
def example():
    previous = (yield)
    for i in range(1,10):
        received = (yield previous)
        if received is not None:
            previous = received*i

t = example()
for i, value in enumerate(t):
    t.send(i)
    print(value)
Outputs:
None
0
2
8
18
Dave Beazley wrote an amazing article on coroutines (tldr; don't mix generators and coroutines in the same function)
Ok, so I figured it out. The trick is to create an additional generator that wraps t.send(value) in a for loop (t.send(value) for value in [...]).
def example():
    previous = yield
    for i in range(0,10):
        previous = yield previous * i

t = example()
t.send(None)  # prime the generator so it is paused at the first yield
for i in (t.send(i) for i in ["list of objects to pass in"]):
    print(i)
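The wrapping idea can also be packaged as a small helper. The following is only a sketch for Python 3: the names running_product and feed are illustrative, not from the thread. It primes the generator, sends each value in turn, and stops cleanly if the inner generator finishes first (in Python 3.7+ a StopIteration escaping a generator body is turned into a RuntimeError, so it is caught explicitly here):

```python
def running_product():
    """Multiply each value sent in by a counter 0, 1, 2, ... and yield the result."""
    previous = yield                  # first yield only receives the initial value
    for i in range(10):
        previous = yield previous * i


def feed(gen, values):
    """Prime gen, then send each value and yield whatever gen yields back."""
    next(gen)                         # advance to the first yield
    for value in values:
        try:
            yield gen.send(value)
        except StopIteration:         # inner generator is exhausted
            return


results = list(feed(running_product(), [5, 6, 7]))
print(results)
```

With this helper the caller writes an ordinary for loop over feed(...) and never calls send directly.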
Is it okay to use the yield statement in an instance method of a class? For example,
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi
Python doesn't complain about this, and simple cases seem to work. However, I've only seen examples that use yield in regular functions.
I start having problems when I try to use it with itertools functions. For example, suppose I have two large data streams X and Y that are stored across multiple files, and I want to compute their sum and difference with only one loop through the data. I could use itertools.tee and itertools.izip like in the following diagram
In code it would be something like this (sorry, it's long)
from itertools import izip_longest, izip, tee
import random

def add(x,y):
    for xi,yi in izip(x,y):
        yield xi + yi

def sub(x,y):
    for xi,yi in izip(x,y):
        yield xi - yi

class NthSumDiff(object):
    def __init__(self, n):
        self.nthsum = Nth(n)
        self.nthdiff = Nth(n)

    def itervalues(self, x, y):
        xadd, xsub = tee(x)
        yadd, ysub = tee(y)
        gen_sum = self.nthsum.itervalues(add(xadd, yadd))
        gen_diff = self.nthdiff.itervalues(sub(xsub, ysub))
        # Have to use izip_longest here, but why?
        #for (i,nthsum), (j,nthdiff) in izip_longest(gen_sum, gen_diff):
        for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
            assert i==j, "sum row %d != diff row %d" % (i,j)
            yield nthsum, nthdiff

nskip = 12
ns = Nth(nskip)
nd = Nth(nskip)
nsd = NthSumDiff(nskip)
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    # If the block length is a multiple of nskip there's no problem.
    #n = random.randint(5000, 10000) * nskip
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100,n+100)
    # Independent processing is no problem but requires two loops.
    for i, nthsum in ns.itervalues(add(x,y)):
        pass
    for j, nthdiff in nd.itervalues(sub(x,y)):
        pass
    assert i==j
    # Trying to do both with one loop causes problems.
    for nthsum, nthdiff in nsd.itervalues(x,y):
        # If izip_longest is necessary, why don't I ever get a fillvalue?
        assert nthsum is not None
        assert nthdiff is not None
    # After each block of data the two iterators should have the same state.
    assert nsd.nthsum.nout == nsd.nthdiff.nout, \
        "sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
But this fails unless I swap itertools.izip out for itertools.izip_longest even though the iterators have the same length. It's the last assert that gets hit, with output like
file 0 n=58581
file 1 n=87978
Traceback (most recent call last):
File "test.py", line 71, in <module>
"sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
AssertionError: sum nout 12213 != diff nout 12212
Edit: I guess it's not obvious from the example I wrote, but the input data X and Y are only available in blocks (in my real problem they're chunked in files). This is important because I need to maintain state between blocks. In the toy example above, this means Nth needs to yield the equivalent of
>>> x1 = range(0,10)
>>> x2 = range(10,20)
>>> (x1 + x2)[::3]
[0, 3, 6, 9, 12, 15, 18]
NOT the equivalent of
>>> x1[::3] + x2[::3]
[0, 3, 6, 9, 10, 13, 16, 19]
I could use itertools.chain to join the blocks ahead of time and then make one call to Nth.itervalues, but I'd like to understand what's wrong with maintaining state in the Nth class between calls (my real app is image processing involving more saved state, not simple Nth/add/subtract).
I don't understand how my Nth instances end up in different states when their lengths are the same. For example, if I give izip two strings of equal length
>>> [''.join(x) for x in izip('ABCD','abcd')]
['Aa', 'Bb', 'Cc', 'Dd']
I get a result of the same length; how come my Nth.itervalues generators seem to be getting unequal numbers of next() calls even though each one yields the same number of results?
Quick answer
You never reset self.i and self.nout in class Nth. Also, you should have used something like this:
from itertools import islice

# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n

    def itervalues(self, x):
        for a,b in enumerate(islice(x, self.n - 1, None, self.n)):
            self.nout = a
            yield a,b
but since you don't even need nout, you should use this:
import itertools

def Nth(iterable, step):
    return enumerate(itertools.islice(iterable, step - 1, None, step))
Long answer
Your code had an off-by-one smell that led me to this line in NthSumDiff.itervalues():
for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
If you swap gen_sum and gen_diff, you'll see that gen_diff will always be the one with nout greater by one. This is because izip() pulls from gen_sum before pulling from gen_diff. gen_sum raises a StopIteration exception before gen_diff is even tried in the last iteration.
For example, say you pick N samples where N % step == 7. At the end of each iteration, self.i for the Nth instances should equal 0. But on the very last iteration, self.i in gen_sum will increment up to 7 and then there will be no more elements in x. It will raise StopIteration. gen_diff is still sitting at self.i equal to 0, though.
If you add self.i = 0 and self.nout = 0 to the beginning of Nth.itervalues(), the problem goes away.
Lesson
You only had this problem because your code is too complicated and not Pythonic. If you find yourself using lots of counters and indexes in loops, that's a good sign (in Python) to take a step back and see if you can simplify your code. I have a long history of C programming, and consequently, I still catch myself doing the same thing from time to time in Python.
Simpler implementation
Putting my money where my mouth is...
from itertools import izip, islice
import random

def sumdiff(x,y,step):
    # filter for the Nth values of x and y now
    x = islice(x, step-1, None, step)
    y = islice(y, step-1, None, step)
    return ((xi + yi, xi - yi) for xi, yi in izip(x,y))

nskip = 12
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100,n+100)
    for nthsum, nthdiff in sumdiff(x,y,nskip):
        assert nthsum is not None
        assert nthdiff is not None
    assert len(list(sumdiff(x,y,nskip))) == n/nskip
More explanation of the problem
In response to Brian's comment:
This doesn't do the same thing. Not resetting i and nout is
intentional. I've basically got a continuous data stream X that's
split across several files. Slicing the blocks gives a different
result than slicing the concatenated stream (I commented earlier about
possibly using itertools.chain). Also my actual program is more
complicated than mere slicing; it's just a working example. I don't
understand the explanation about the order of StopIteration. If
izip('ABCD','abcd') --> Aa Bb Cc Dd then it seems like equal-length
generators should get an equal number of next calls, no? – Brian
Hawkins 6 hours ago
Your problem was so long that I missed the part about the stream coming from multiple files. Let's just look at the code itself. First, we need to be really clear about how itervalues(x) actually works.
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            # We increment self.i by self.n on every next()
            # call to this generator method unless the
            # number of objects remaining in x is less than
            # self.n. In that case, we increment by that amount
            # before the for loop exits normally.
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                # We're yielding, so we're a generator
                yield self.nout, xi
        # Python helpfully raises StopIteration to fulfill the
        # contract of an iterable. That's how for loops and
        # others know when to stop.
In itervalues(x) above, for every next() call, it internally increments self.i by self.n and then yields OR it increments self.i by the number of objects remaining in x and then exits the for loop and then exits the generator (itervalues() is a generator because it yields). When the itervalues() generator exits, Python raises a StopIteration exception.
So, for every instance of class Nth initialized with N, the value of self.i after exhausting all elements in itervalues(X) will be:
self.i = (value_of_self_i_before_itervalues(X) + len(X)) % N
Now when you iterate over izip(Nth_1, Nth_2), it will do something like this:
def izip(A, B):
    try:
        while True:
            a = A.next()
            b = B.next()
            yield a,b
    except StopIteration:
        pass
So, imagine N=10 and len(X)=13. On the very last next() call to izip(), both A and B have self.i==0 as their state. A.next() is called, increments self.i += 3, runs out of elements in X, exits the for loop, returns, and then Python raises StopIteration. Now, inside izip() we go directly to the exception block, skipping B.next() entirely. So, A.i==3 and B.i==0 at the end.
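The uneven number of next() calls can be observed directly. This is a small Python 3 sketch (zip is the Python 3 counterpart of izip); the CountingIter wrapper is a made-up helper for illustration that counts how often it is advanced:

```python
class CountingIter:
    """Wrap an iterable and count how many times next() is called on it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1          # counted even when the call ends in StopIteration
        return next(self._it)


a = CountingIter('ABCD')
b = CountingIter('abcd')
pairs = list(zip(a, b))

print(pairs)
print(a.calls, b.calls)   # the first argument is asked once more than the second
```

Both inputs have four items and four pairs come out, yet the first iterator receives a fifth call (the one that raises StopIteration) and the second does not. Any state change that happens during that final call only occurs in the first iterator.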
Second try at simplification (with correct requirements)
Here's another simplified version that treats all file data as one continuous stream. It uses chained, small, re-usable generators. I would highly, highly recommend watching this PyCon '14 talk about generators by David Beazley. Guessing from your problem description, it should be 100% applicable.
from itertools import izip, islice
import random

def sumdiff(data):
    return ((x + y, x - y) for x, y in data)

def combined_file_data(files):
    for i,n in files:
        # Generate some data.
        x = range(n)
        y = range(100,n+100)
        for data in izip(x,y):
            yield data

def filelist(nfiles):
    for i in range(nfiles):
        # Generate some data.
        n = random.randint(50000, 100000)
        print 'file %d n=%d' % (i, n)
        yield i, n

def Nth(iterable, step):
    return islice(iterable, step-1, None, step)

nskip = 12
nfiles = 10
filedata = combined_file_data(filelist(nfiles))
nth_data = Nth(filedata, nskip)
for nthsum, nthdiff in sumdiff(nth_data):
    assert nthsum is not None
    assert nthdiff is not None
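The difference between slicing each block separately and slicing the concatenated stream can be seen on a small scale. The sketch below is for Python 3 and uses two made-up blocks; nth is a lowercase copy of the Nth helper above, and chain.from_iterable stands in for reading the files one after another:

```python
from itertools import chain, islice


def nth(iterable, step):
    """Every step-th item, starting with the item at index step - 1."""
    return islice(iterable, step - 1, None, step)


# Two hypothetical blocks of one continuous stream 0..19
blocks = [range(0, 10), range(10, 20)]

# Slicing each block on its own restarts the count at every block boundary
per_block = [value for block in blocks for value in nth(block, 3)]

# Chaining first keeps the count running across the boundary
continuous = list(nth(chain.from_iterable(blocks), 3))

print(per_block)
print(continuous)
```

The two results diverge as soon as the first block length is not a multiple of the step, which is why the stream is joined before it is sliced.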
Condensing the discussion, there's nothing wrong with using yield in an instance method per se. You get into trouble with izip if the instance state changes after the last yield because izip stops calling next() on its arguments once any of them stops yielding results. A clearer example might be
from itertools import izip

class Three(object):
    def __init__(self):
        self.status = 'init'

    def run(self):
        self.status = 'running'
        yield 1
        yield 2
        yield 3
        self.status = 'done'
        raise StopIteration()

it = Three()
for x in it.run():
    assert it.status == 'running'
assert it.status == 'done'

it1, it2 = Three(), Three()
for x, y in izip(it1.run(), it2.run()):
    pass
assert it1.status == 'done'
assert it2.status == 'done', "Expected status=done, got status=%s." % it2.status
which hits the last assertion,
AssertionError: Expected status=done, got status=running.
In the original question, the Nth class can consume input data after its last yield, so the sum and difference streams can get out of sync with izip. Using izip_longest would work since it will try to exhaust each iterator. A clearer solution might be to refactor to avoid changing state after the last yield.
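For comparison, here is a Python 3 sketch of the same Three example run through both zip and zip_longest (the Python 3 names for izip and izip_longest). The explicit raise StopIteration() is left out because, in Python 3.7+, raising it inside a generator body becomes a RuntimeError; simply reaching the end of the function has the same effect:

```python
from itertools import zip_longest


class Three:
    def __init__(self):
        self.status = 'init'

    def run(self):
        self.status = 'running'
        yield 1
        yield 2
        yield 3
        self.status = 'done'   # only runs if next() is called after the last yield


# zip stops as soon as the first argument is exhausted
a, b = Three(), Three()
for _ in zip(a.run(), b.run()):
    pass
statuses_zip = (a.status, b.status)

# zip_longest keeps going until every argument is exhausted
c, d = Three(), Three()
for _ in zip_longest(c.run(), d.run()):
    pass
statuses_longest = (c.status, d.status)

print(statuses_zip)
print(statuses_longest)
```

With zip the second instance is left in the 'running' state, while zip_longest drives both generators to completion so both end up 'done'.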