I have a long-running task that must be guided by external code, but the external code needs some information about the task. This is my homebrew example:
def longtask(self):
    yield self.get_step_length(1)
    for x in self.perform_step(1):
        ...
        yield x.id
    yield self.get_step_length(2)
    for x in self.perform_step(2):
        ...
        yield x.value
    ...
# call site
generator = self.longtask()
step1len = next(generator)
step1pb = ProgressBar('Step 1', step1len)
# pull only step 1 items
for index, id in itertools.izip(xrange(step1len), generator):
    step1pb.update(index)
    # ...do something with id
step2len = next(generator)
step2pb = ProgressBar('Step 2', step2len)
# pull only step 2 items
for index, value in itertools.izip(xrange(step2len), generator):
    step2pb.update(index)
    # ...do something else with value
Is it right to use such a complex generator protocol in Python, or should I refactor this code?
I'd refactor this to return separate generators; you can use nested functions:
def longtask(self):
    def step_generator(step):
        for x in self.perform_step(step):
            ...
            yield x.id
    yield self.get_step_length(1), step_generator(1)
    yield self.get_step_length(2), step_generator(2)
generators = self.longtask()
for counter, (steplength, stepgen) in enumerate(generators, 1):
    progress = ProgressBar('Step %d' % counter, steplength)
    for index, value in enumerate(stepgen):
        progress.update(index)
        # ....
Now you can also use the enumerate() function to add numbers to the items; that is much more readable than zipping together an xrange() and the generator.
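For a concrete side-by-side (a sketch in Python 2 syntax to match the question; steplength and stepgen are assumed to come from the refactored code above), these two loops do the same thing when the generator yields steplength items:

import itertools

# zipping an xrange() against the generator, as in the question:
for index, item in itertools.izip(xrange(steplength), stepgen):
    pass  # handle item

# the same loop with enumerate(), which reads much more naturally:
for index, item in enumerate(stepgen):
    pass  # handle item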
Related
In an interview, the interviewer asked me for some examples of generators being used in Python. I know a generator is like a function which yields values instead of returning them.
So can anyone tell me whether a for/while loop is an example of a generator?
Short answer: No, but there are other forms of generators.
A for/while loop is a loop structure: it does not emit values and thus is not a generator.
Nevertheless, there are other ways to construct generators.
Your example with yield is for instance a generator:
def some_generator(xs):
    for x in xs:
        if x:
            yield x
But there are also generator expressions, like:
(x for x in xs if x)
Furthermore, in python-3.x the map(..) and filter(..) constructs return lazy iterators, and range(..) returns a lazy sequence, so all three behave much like generators.
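A quick way to see this laziness in action (a small sketch; the exact repr text varies by version):

squares = map(lambda x: x * x, range(4))
print(squares)        # a map object, not a list: nothing is computed yet
print(next(squares))  # 0 -- the first value is computed on demand
print(list(squares))  # [1, 4, 9] -- consuming the iterator exhausts it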
And of course you can make your own iterator by implementing the iterator protocol:
class some_generator(object):
    def __init__(self, xs):
        self.xs = xs
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        return self.next()

    def next(self):
        # skip over falsy elements, mirroring the yield-based version above
        while self.idx < len(self.xs) and not self.xs[self.idx]:
            self.idx += 1
        if self.idx < len(self.xs):
            res = self.xs[self.idx]
            self.idx += 1
            return res
        else:
            raise StopIteration()
Neither while nor for are themselves generators or iterators. They are control constructs that perform iteration. Certainly, you can use for or while to iterate over the items yielded by a generator, and you can use for or while to perform iteration inside the code of a generator. But neither of those facts make for or while generators.
The first line in the Python wiki on generators:
Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.
So in the context of your interview I believe they were looking for you to answer about the creation of an iterable.
The Wikipedia article on the for loop says:
In Python this is controlled instead by generating the appropriate sequence.
So you could get pedantic, but generally, no, a for loop isn't a generator.
for and while are loop structures, and you can use them to iterate over generators. You can take certain elements of a generator by converting it to a list.
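For instance (a minimal sketch), you can materialize a generator with list() and slice that, or take a slice lazily with itertools.islice():

from itertools import islice

gen = (x * x for x in range(10))
first_three = list(islice(gen, 3))  # [0, 1, 4], without building the whole list
rest = list(gen)                    # [9, 16, 25, ..., 81] -- resumes where islice stopped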
I'm trying to write a function that returns the next element of a generator and if it is at the end of the generator it resets it and returns the next result. The expected output of the code below would be:
1
2
3
1
2
However, that is obviously not what I get. What am I doing that is incorrect?
a = '123'

def convert_to_generator(iterable):
    return (x for x in iterable)

ag = convert_to_generator(a)

def get_next_item(gen, original):
    try:
        return next(gen)
    except StopIteration:
        gen = convert_to_generator(original)
        get_next_item(gen, original)

for n in range(5):
    print(get_next_item(ag, a))
The actual output is:
1
2
3
None
None
Is itertools.cycle(iterable) a possible alternative?
You need to return the result of your recursive call:
return get_next_item(gen, original)
which still does not make this a working approach.
The generator ag used in your for-loop is not changed by the rebinding of the local variable gen in your function. It will stay exhausted...
As has been mentioned in the comments, check out itertools.cycle.
The easy way is to just use itertools.cycle. Otherwise you would need to remember the elements of the iterable: if said iterable is an iterator (such as a generator), it can't be reset; if it's not an iterator, you can reuse it many times.
The documentation includes an example implementation:
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
Or, for example, a version that reuses the iterable directly when it can:
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    if iter(iterable) is iter(iterable):  # it is an iterator
        saved = []
        for element in iterable:
            yield element
            saved.append(element)
    else:
        saved = iterable
    while saved:
        for element in saved:
            yield element
Example use:
test = cycle("123")
for i in range(5):
    print(next(test))
Now, about your code: the problem is simple, it doesn't remember its state.
def get_next_item(gen, original):
    try:
        return next(gen)
    except StopIteration:
        gen = convert_to_generator(original)  # <-- the problem is here
        get_next_item(gen, original)  # and you should return something here
In the marked line a new generator is built, but you would need to update your ag variable outside this function to get the desired behavior. There are ways to do that, for example changing your function to return both the element and the generator; other ways exist but are either not recommended or more complicated, like building a class so it remembers its state.
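For illustration, a hedged sketch of that class-based option (the class name is made up; it reuses the question's convert_to_generator and assumes original is re-iterable, like the question's string): the generator lives in an attribute, so rebuilding it on exhaustion actually sticks:

class CyclingIterator(object):
    """Iterate over iterable forever, rebuilding the generator on exhaustion."""
    def __init__(self, iterable):
        self.original = iterable
        self.gen = convert_to_generator(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.gen)
        except StopIteration:
            # rebind the attribute, not a local -- this is what the
            # original function could not do for the caller's ag
            self.gen = convert_to_generator(self.original)
            return next(self.gen)

    next = __next__  # Python 2 compatibility

ag = CyclingIterator('123')
for n in range(5):
    print(next(ag))  # 1 2 3 1 2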
If get_next_item is a generator function, calling it only builds an iterator that hands out the yielded values through its __next__ method; none of the function body runs at call time. For that reason, a bare call statement doesn't do anything.
What you want to do is this:
def get_next_item(gen, original):
    try:
        yield next(gen)
    except StopIteration:
        gen = convert_to_generator(original)  # start over with a fresh generator
    for i in get_next_item(gen, original):
        yield i
or shorter, and completely equivalent (as long as gen has a __iter__ method, which it probably has):
def get_next_item(gen, original):
    for i in gen:
        yield i
    for i in get_next_item(convert_to_generator(original), original):
        yield i
Or without recursion (recursion is a big problem in Python, as it is 1. limited in depth and 2. slow):
def get_next_item(gen, original):
    for i in gen:
        yield i
    while True:
        for i in convert_to_generator(original):
            yield i
If convert_to_generator is just a call to iter, it is even shorter:
def get_next_item(gen, original):
    for i in gen:
        yield i
    while True:
        for i in original:
            yield i
or, with itertools:
import itertools

def get_next_item(gen, original):
    return itertools.chain(gen, itertools.cycle(original))
and get_next_item is equivalent to itertools.cycle if gen is guaranteed to be an iterator for original.
Side note: You can exchange for i in x: yield i for yield from x (where x is some expression) with Python 3.3 or higher.
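With that substitution, the non-recursive version above collapses to (a sketch, Python 3.3+):

def get_next_item(gen, original):
    yield from gen
    while True:
        yield from convert_to_generator(original)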
What is the Pythonic way to make a generator that also produces aggregate results? In meta code, something like this (but not for real, as my Python version does not support mixing yield and return):
def produce():
    total = 0
    for item in find_all():
        total += 1
        yield item
    return total
As I see it, I could:
Not make produce() a generator, but pass it a callback function to call on every item.
With every yield, also yield the aggregate results up until now. I'd rather not calculate the intermediate results with every yield, only when finishing.
Send a dict as argument to produce() that will be populated with the aggregate results.
Use a global to store aggregate results.
None of them seems very attractive...
NB. total is a simple example, my actual code requires complex aggregations. And I need intermediate results before produce() finishes, hence a generator.
Maybe you shouldn't use a generator but an iterator.
def findall():  # no idea what your "find_all" does, so I use this instead. :-)
    yield 1
    yield 2
    yield 3

class Produce(object):
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.total = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)  # propagate StopIteration without counting it
        self.total += 1
        return value

    next = __next__  # only necessary for python2 compatibility
Maybe better to see this with an example:
>>> it = Produce(findall())
>>> it.total
0
>>> next(it)
1
>>> next(it)
2
>>> it.total
2
You can use enumerate to count stuff, for example:
i = 0
for i, v in enumerate(range(10), 1):
    print(v)
print("total", i)
(notice the start value of the enumerate)
For more complex stuff you can use the same principle: make produce a generator that yields both the value and the aggregate, ignore the aggregate in the iteration, and use it when finished, as sketched below.
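A hedged sketch of that idea (range(10) stands in for find_all()): yield (value, aggregate) pairs, ignore the aggregate during the loop, and read it from the last pair afterwards:

def produce():
    total = 0
    for item in range(10):  # stand-in for find_all()
        total += 1          # your aggregation can be arbitrarily complex here
        yield item, total

last_total = 0
for item, last_total in produce():
    print(item)             # use the item; the aggregate is ignored here
print("total", last_total)  # the final aggregate survives the loop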
Another alternative is passing in a mutable object, for example:
def produce(mem):
    t = 0
    for x in range(10):
        t += 1
        yield x
    mem.append(t)

aggregate = []
for x in produce(aggregate):
    print(x)
print("total", aggregate[0])
In either case the result is the same for this example:
0
1
2
3
4
5
6
7
8
9
total 10
Am I missing something? Why not:
def produce():
    total = 0
    for item in find_all():
        total += 1
        yield item
    yield total
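The consumer then just has to peel the total off the end; for example, with Python 3 unpacking (a sketch, assuming the final yielded value is the total):

gen = produce()
*items, total = gen  # every yielded value but the last is an item
print(items, total)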
I know that you can use .send(value) to send values to a generator. I also know that you can iterate over a generator in a for loop. Is it possible to pass values to a generator while iterating over it in a for loop?
What I'm trying to do is
def example():
    previous = yield
    for i in range(0, 10):
        previous = yield previous * i

t = example()
for value in t:  # ...pass in a value?...
    # ...do something with the result...
You technically could, but the results would be confusing, e.g.:
def example():
    previous = (yield)
    for i in range(1, 10):
        received = (yield previous)
        if received is not None:
            previous = received * i

t = example()
for i, value in enumerate(t):
    t.send(i)
    print value
Outputs:
None
0
2
8
18
Dave Beazley wrote an amazing article on coroutines (tldr; don't mix generators and coroutines in the same function)
OK, so I figured it out. The trick is to create an additional generator expression that wraps t.send(value): (t.send(value) for value in [...]).
def example():
    previous = yield
    for i in range(0, 10):
        previous = yield previous * i

t = example()
t.send(None)  # prime the generator up to the first bare yield
for i in (t.send(i) for i in ["list of objects to pass in"]):
    print i
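A concrete run of the same trick with numbers instead of the placeholder list (a sketch; the comments trace the generator above by hand):

t = example()
t.send(None)  # prime up to the first bare yield
for result in (t.send(x) for x in [1, 2, 3, 4]):
    print(result)  # 0, 2, 6, 12 -- each sent value becomes `previous`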
What is lazy evaluation in Python?
One website said :
In Python 3.x the range() function returns a special range object which computes elements of the list on demand (lazy or deferred evaluation):
>>> r = range(10)
>>> print(r)
range(0, 10)
>>> print(r[3])
3
What is meant by this?
The object returned by range() (or xrange() in Python 2.x) is known as a lazy iterable.
Instead of storing the entire range [0, 1, 2, ..., 9] in memory, it stores a definition equivalent to (i = 0; i < 10; i += 1) and computes each value only when needed (a.k.a. lazy evaluation).
Essentially, a generator allows you to return a list like structure, but here are some differences:
A list stores all elements when it is created. A generator generates the next element when it is needed.
A list can be iterated over as much as you need; a generator can only be iterated over exactly once (demonstrated in the sketch below).
A list can get elements by index; a generator cannot -- it only generates values once, from start to end.
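To make the exactly-once point concrete (a small sketch):

nums_list = [x for x in range(3)]
nums_gen = (x for x in range(3))

print(list(nums_list), list(nums_list))  # [0, 1, 2] [0, 1, 2] -- a list can be re-read
print(list(nums_gen), list(nums_gen))    # [0, 1, 2] [] -- the generator is already exhausted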
A generator can be created in two ways:
(1) Very similar to a list comprehension:
# this is a list: it creates all 5000000 x/2 values immediately, uses []
lis = [x/2 for x in range(5000000)]

# this is a generator: it creates each x/2 value only when it is needed, uses ()
gen = (x/2 for x in range(5000000))
(2) As a function, using yield to return the next value:
# this is also a generator: it runs until a yield occurs and returns that result;
# on the next call it picks up where it left off and continues until a yield occurs...
def divby2(n):
    num = 0
    while num < n:
        yield num / 2
        num += 1

# same as (x/2 for x in range(5000000))
gen = divby2(5000000)
Note: even though range(5000000) is lazy in Python 3.x, [x/2 for x in range(5000000)] is still a list. range(...) does its job and generates x one at a time, but the entire list of x/2 values will be computed when this list is created.
In a nutshell, lazy evaluation means that the object is evaluated when it is needed, not when it is created.
In Python 2, range will return a list - this means that if you give it a large number, it will calculate the whole range and build the entire list at the time of creation:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
Only when you consume it will it actually be evaluated - in other words, it will only produce the numbers in the range when you actually need them.
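One way to observe this (a sketch; exact byte counts vary by platform): the range object stays the same tiny size no matter how many numbers it covers, because it never stores them:

import sys

small, big = range(10), range(10**8)
print(sys.getsizeof(small) == sys.getsizeof(big))  # True: no elements are stored
print(big[12345678])  # 12345678, computed on demand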
A GitHub repo named python-patterns and Wikipedia tell us what lazy evaluation is:
Delays the evaluation of an expression until its value is needed and avoids repeated evaluations.
range in Python 3 is not complete lazy evaluation, because it doesn't avoid repeated evaluation.
A more classic example of lazy evaluation is cached_property:
import functools

class cached_property(object):
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __get__(self, obj, type_):
        if obj is None:
            return self
        # compute once, then cache the value on the instance; the instance
        # attribute shadows this non-data descriptor on every later access
        val = self.function(obj)
        obj.__dict__[self.function.__name__] = val
        return val
cached_property (a.k.a. lazy_property) is a decorator which converts a function into a lazily evaluated property: the first time the property is accessed, the function is called to compute the result, and that cached value is used on every later access.
E.g.:
class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def load_log_file(self):
        with open(self.file_path) as f:
            # the file is so big that reading it all costs about 2s
            return f.read()

log_handler = LogHandler('./sys.log')
# only the first access will cost 2s:
print(log_handler.load_log_file)
# the return value is cached on the log_handler object:
print(log_handler.load_log_file)
To use the proper term, a Python object like range is designed more along the lines of the call-by-need pattern than true lazy evaluation.