Structured programming and Python generators? - python

Update: What I really wanted all along were greenlets.
Note: This question mutated a bit as people answered and forced me to "raise the stakes", as my trivial examples had trivial simplifications; rather than continue to mutate it here, I will repose the question when I have it clearer in my head, as per Alex's suggestion.
Python generators are a thing of beauty, but how can I easily break one down into modules (structured programming)? I effectively want PEP 380, or at least something comparable in syntax burden, but in existing Python (e.g. 2.6)
As an (admittedly stupid) example, take the following:
def sillyGenerator():
for i in xrange(10):
yield i*i
for i in xrange(12):
yield i*i
for i in xrange(8):
yield i*i
Being an ardent believer in DRY, I spot the repeated pattern here and factor it out into a method:
def quadraticRange(n):
for i in xrange(n)
yield i*i
def sillyGenerator():
quadraticRange(10)
quadraticRange(12)
quadraticRange(8)
...which of course doesn't work. The parent must call the new function in a loop, yielding the results:
def sillyGenerator():
for i in quadraticRange(10):
yield i
for i in quadraticRange(12):
yield i
for i in quadraticRange(8):
yield i
...which is even longer than before!
If I want to push part of a generator into a function, I always need this rather verbose, two-line wrapper to call it. It gets worse if I want to support send():
def sillyGeneratorRevisited():
g = subgenerator()
v = None
try:
while True:
v = yield g.send(v)
catch StopIteration:
pass
if v < 4:
# ...
else:
# ...
And that's not taking into account passing in exceptions. The same boilerplate every time! Yet one cannot apply DRY and factor this identical code into a function, because...you'd need the boilerplate to call it! What I want is something like:
def sillyGenerator():
yield from quadraticRange(10)
yield from quadraticRange(12)
yield from quadraticRange(8)
def sillyGeneratorRevisited():
v = yield from subgenerator()
if v < 4:
# ...
else:
# ...
Does anyone have a solution to this problem? I have a first-pass attempt, but I'd like to know what others have come up with. Ultimately, any solution will have to tackle examples where the main generator performs complex logic based on the result of data sent into the generator, and potentially makes a very large number of calls to sub-generators: my use-case is generators used to implement long-running, complex state machines.

However, I'd like to make my
reusability criteria one notch harder:
what if I need a control structure
around my repeated generation?
itertools often helps even there - you need to provide concrete examples where you think it doesn't.
For
instance, I might want to call a
subgenerator forever with different
parameters.
itertools.chain.from_iterable.
Or my subgenerators might
be very costly, and I only want to
start them up as and when they are
reached.
Both chain and chain_from_iterable do that -- no sub-iterator is "started up" until the very instant the first item from it is needed.
Or (and this is a real
desire) I might want to vary what I do
next based on what my controller
passes me using send().
A specific example would be greatly appreciated. Anyway, worst case, you'll be coding for x in blargh: yield x where the suspended Pep3080 would let you code yield from blargh -- about 4 extra characters (not a tragedy;-).
And if some sophisticated coroutine-version of some itertools functionality (itertools mostly supports iterators - there's no equivalent coroutools module yet) becomes warranted, because a certain pattern of coroutine composition is often repeated in your code, then it's not too hard to code it yourself.
For example, suppose we often find ourselves doing something like: first yield a certain value; then, repeatedly, if we're sent 'foo', yield the next item from fooiter, if 'bla', from blaiter, if 'zop', from zopiter, anything else, from defiter. As soon as we spot the second occurrence of this compositional pattern, we can code:
def corou_chaiters(initsend, defiter, val2itermap):
currentiter = iter([initsend])
while True:
val = yield next(currentiter)
currentiter = val2itermap(val, defiter)
and call this simple compositional function as and when needed. If we need to compose other coroutines, rather than general iterators, we'll have a slightly different composer using the send method instead of the next built-in function; and so forth.
If you can offer an example that's not easily tamed by such techniques, I suggest you do so in a separate question (specifically targeted to coroutine-like generators), as there's already a lot of material on this one that will have little to do with your other, much more complex/sophisticated, example.

You want to chain several iterators together:
from itertools import chain
def sillyGenerator(a,b,c):
return chain(quadraticRange(a),quadraticRange(b),quadraticRange(c))

Impractical (unfortunately) answer:
from __future__ import PEP0380
def sillyGenerator():
yield from quadraticRange(10)
yield from quadraticRange(12)
yield from quadraticRange(8)
Potentially practical reference: Syntax for delegating to a subgenerator
Unfortunately making this impractical: Python language moratorium
UPDATE Feb 2011:
The moratorium has been lifted, and PEP 380 is on the TODO list for Python 3.3. Hopefully this answer will be practical soon.
Read Guido's remarks on comp.python.devel

import itertools
def quadraticRange(n):
for i in xrange(n)
yield i*i
def sillyGenerator():
return itertools.chain(
quadraticRange(10),
quadraticRange(12),
quadraticRange(8),
)
def sillyGenerator2():
return itertools.chain.from_iterable(
quadraticRange(n) for n in [10, 12, 8])
The last is useful if you want to make sure one iterator is exhausted before another starts (including its initialization code).

There is a Python Enhancement Proposal for providing a yield from statement for "delegating generation". Your example would be written as:
def sillyGenerator():
sq = lambda i: i * i
yield from map(sq, xrange(10))
yield from map(sq, xrange(12))
yield from map(sq, xrange(8))
Or better, in the spirit of DRY:
def sillyGenerator():
for i in [10, 12, 8]:
yield from quadraticRange(i)
The proposal is in draft status and its eventual inclusion is not certain, but it shows that other developers share your thoughts about generators.

For an arbitrary number of calls to quadraticRange:
from itertools import chain
def sillyGenerator(*args):
return chain(*map(quadraticRange, args))
This code uses map and itertools.chain. It takes an arbitrary number of arguments and passes them in order to quadraticRange. The resulting iterators are then chained.

There is a pattern which I call "generator kernel" where generators don't yield directly to the user but to some "kernel" loop that treats (some of) their yields as "system calls" with special meaning.
You can apply it here by an intermediate function that accepts yielded generators and unrolls them automatically. To make it easy to use, we'll create that intermediate function in a decorator:
import functools, types
def flatten(values_or_generators):
for x in values_or_generators:
if isinstance(x, GeneratorType):
for y in x:
yield y
else:
yield x
# Better name anyone?
def subgenerator(g):
"""Decorator making ``yield <gen>`` mean ``yield from <gen>``."""
#functools.wraps(g)
def flat_g(*args, **kw):
return flatten(g(*args, **kw))
return flat_g
and then you can just write:
def quadraticRange(n):
for i in xrange(n)
yield i*i
#subgenerator
def sillyGenerator():
yield quadraticRange(10)
yield quadraticRange(12)
yield quadraticRange(8)
Note that subgenerator() is unrolling exactly one level of the hierarchy. You could easily make it multi-level (by managing a manual stack, or just replacing the inner loop with for y in flatten(x): - but I think it's better as it is, so that every generator that wants to use this non-standard syntax has to be explicitly wrapped with #subgenerator.
Note also that detection of generators is imperfect! It will detect things written as generators, but that's an implementation detail. As a caller of a generator, all you care about is that it returns an iterator. It could be a function returning some itertools object, and then this decorator would fail.
Checking whether the object has a .next() method is too broad - you won't be able to yield strings without them being taken apart. So the most reliable way would be to check for some explicit marker, so you'd write e.g.:
#subgenerator
def sillyGenerator():
yield 'from', quadraticRange(10)
yield 'from', quadraticRange(12)
yield 'from', quadraticRange(8)
Hey, that's almost like the PEP!
[credits: this answer gives a similar function - but it's deep (which I consider wrong) and is not a framed as a decorator]

class Communicator:
def __init__(self, inflow):
self.outflow = None
self.inflow = inflow
Then you do:
c = Communicator(something)
yield c
response = c.outflow
And instead of the boilerplate code you can simply do:
for i in run():
something = i.inflow
# ...
i.outflow = value_to_return_back
It's simple enough code that works without much brain.

Related

Mixing functions and generators

So I was writing a function where a lot of processing happens in the body of a loop, and occasionally it may be of interest to the caller to have the answer to some of the computations.
Normally I would just put the results in a list and return the list, but in this case the results are too large (a few hundred MB on each loop).
I wrote this without really thinking about it, expecting Python's dynamic typing to figure things out, but the following is always created as a generator.
def mixed(is_generator=False):
for i in range(5):
# process some stuff including writing to a file
if is_generator:
yield i
return
From this I have two questions:
1) Does the presence of the yield keyword in a scope immediately turn the object its in into a generator?
2) Is there a sensible way to obtain the behaviour I intended?
2.1) If no, what is the reasoning behind it not being possible? (In terms of how functions and generators work in Python.)
Lets go step by step:
1) Does the presence of the yield keyword in a scope immediately turn the object its in into a generator? Yes
2) Is there a sensible way to obtain the behaviour I intended? Yes, see example below
The thing is to wrap the computation and either return a generator or a list with the data of that generator:
def mixed(is_generator=False):
# create a generator object
gen = (compute_stuff(i) for i in range(5))
# if we want just the generator
if is_generator:
return gen
# if not we consume it with a list and return that list
return list(gen)
Anyway, I would say this is a bad practice. You should have it separated, usually just have the generator function and then use some logic outside:
def computation():
for i in range(5):
# process some stuff including writing to a file
yield i
gen = computation()
if lazy:
for data in gen:
use_data(data)
else:
data = list(gen)
use_all_data(data)

Simpler way to run a generator function without caring about items

I have some use cases in which I need to run generator functions without caring about the yielded items.
I cannot make them non-generaor functions because in other use cases I certainly need the yielded values.
I am currently using a trivial self-made function to exhaust the generators.
def exhaust(generator):
for _ in generator:
pass
I wondered, whether there is a simpler way to do that, which I'm missing?
Edit
Following a use case:
def create_tables(fail_silently=True):
"""Create the respective tables."""
for model in MODELS:
try:
model.create_table(fail_silently=fail_silently)
except Exception:
yield (False, model)
else:
yield (True, model)
In some context, I care about the error and success values…
for success, table in create_tables():
if success:
print('Creation of table {} succeeded.'.format(table))
else:
print('Creation of table {} failed.'.format(table), file=stderr)
… and in some I just want to run the function "blindly":
exhaust(create_tables())
Setting up a for loop for this could be relatively expensive, keeping in mind that a for loop in Python is fundamentally successive execution of simple assignment statements; you'll be executing n (number of items in generator) assignments, only to discard the assignment targets afterwards.
You can instead feed the generator to a zero length deque; consumes at C-speed and does not use up memory as with list and other callables that materialise iterators/generators:
from collections import deque
def exhaust(generator):
deque(generator, maxlen=0)
Taken from the consume itertools recipe.
One very simple and possibly efficient solution could be
def exhaust(generator): all(generator)
if we can assume that generator will always return True (as in your case where a tuple of 2 elements (success,table) is true even if success and table both are False), or: any(generator) if it will always return False, and in the "worst case", all(x or True for x in generator).
Being that short & simple, you might not even need a function for it!
Regarding the "why?" comment (I dislike these...): There are many cases where one may want to exhaust a generator. To cite just one, it's a way of doing a for loop as an expression, e.g., any(print(i,x) for i,x in enumerate(S)) - of course there are less trivial examples.
Based on your use case it's hard to imagine that there would be sufficiently many tables to create that you would need to consider performance.
Additionally, table creation is going to be much more expensive than iteration.
So the for loop that you already have would seem the simplest and most Pythonic solution - in this case.
You could just have two functions that each do one thing and call the appropriate one at the appropriate time?
def create_table(model, fail_silently=True):
"""Create the table."""
try:
model.create_table(fail_silently=fail_silently)
except Exception:
return (False, model)
else:
return (True, model)
def create_tables(MODELS)
for model in MODELS:
create_table(model)
def iter_create_tables(MODELS)
for model in MODELS:
yield create_table(model)
When you care about the returned values do:
for success, table in iter_create_tables(MODELS):
if success:
print('Creation of table {} succeeded.'.format(table))
else:
print('Creation of table {} failed.'.format(table), file=stderr)
when you don't just do
create_tables(MODELS)

How do Python Recursive Generators work?

In a Python tutorial, I've learned that
Like functions, generators can be recursively programmed. The following
example is a generator to create all the permutations of a given list of items.
def permutations(items):
n = len(items)
if n==0: yield []
else:
for i in range(len(items)):
for cc in permutations(items[:i]+items[i+1:]):
yield [items[i]]+cc
for p in permutations(['r','e','d']): print(''.join(p))
for p in permutations(list("game")): print(''.join(p) + ", ", end="")
I cannot figure out how it generates the results. The recursive things and 'yield' really confused me. Could someone explain the whole process clearly?
There are 2 parts to this --- recursion and generator. Here's the non-generator version that just uses recursion:
def permutations2(items):
n = len(items)
if n==0: return [[]]
else:
l = []
for i in range(len(items)):
for cc in permutations2(items[:i]+items[i+1:]):
l.append([items[i]]+cc)
return l
l.append([item[i]]+cc) roughly translates to the permutation of these items include an entry where item[i] is the first item, and permutation of the rest of the items.
The generator part yield one of the permutations instead of return the entire list of permutations.
When you call a function that returns, it disappears after having produced its result.
When you ask a generator for its next element, it produces it (yields it), and pauses -- yields (the control back) to you. When asked again for the next element, it will resume its operations, and run normally until hitting a yield statement. Then it will again produce a value and pause.
Thus calling a generator with some argument causes creation of actual memory entity, an object, capable of running, remembering its state and arguments, and producing values when asked.
Different calls to the same generator produce different actual objects in memory. The definition is a recipe for the creation of that object. After the recipe is defined, when it is called it can call any other recipe it needs -- or the same one -- to create new memory objects it needs, to produce the values for it.
This is a general answer, not Python-specific.
Thanks for the answers. It really helps me to clear my mind and now I want to share some useful resources about recursion and generator I found on the internet, which is also very friendly to the beginners.
To understand generator in python. The link below is really readable and easy to understand.
What does the "yield" keyword do in Python?
To understand recursion, "https://www.youtube.com/watch?v=MyzFdthuUcA". This youtube video gives a "patented" 4 steps method to writing any recursive method/function. That is very clear and practicable. The channel also has several videos to show people how does the recursion works and how to trace it.
I hope it can help someone like me.

Pythonic equivalent of this function?

I have a function to port from another language, could you please help me make it "pythonic"?
Here the function ported in a "non-pythonic" way (this is a bit of an artificial example - every task is associated with a project or "None", we need a list of distinct projects, distinct meaning no duplication of the .identifier property, starting from a list of tasks):
#staticmethod
def get_projects_of_tasks(task_list):
projects = []
project_identifiers_seen = {}
for task in task_list:
project = task.project
if project is None:
continue
project_identifier = project.identifier
if project_identifiers_seen.has_key(project_identifier):
continue
project_identifiers_seen[project_identifier] = True
projects.append(project)
return projects
I have specifically not even started to make it "pythonic" not to start off on the wrong foot (e.g. list comprehension with "if project.identifier is not None, filter() based on predicate that looks up the dictionary-based registry of identifiers, using set() to strip duplicates, etc.)
EDIT:
Based on the feedback, I have this:
#staticmethod
def get_projects_of_tasks(task_list):
projects = []
project_identifiers_seen = set()
for task in task_list:
project = task.project
if project is None:
continue
project_identifier = project.identifier
if project_identifier in project_identifiers_seen:
continue
project_identifiers_seen.add(project_identifier)
projects.append(project)
return projects
There's nothing massively unPythonic about this code. A couple of possible improvements:
project_identifiers_seen could be a set, rather than a dictionary.
foo.has_key(bar) is better spelled bar in foo
I'm suspicious that this is a staticmethod of a class. Usually there's no need for a class in Python unless you're actually doing data encapsulation. If this is just a normal function, make it a module-level one.
What about:
project_list = {task.project.identifier:task.project for task in task_list if task.project is not None}
return project_list.values()
For 2.6- use dict constructor instead:
return dict((x.project.id, x.project) for x in task_list if x.project).values()
def get_projects_of_tasks(task_list):
seen = set()
return [seen.add(task.project.identifier) or task.project #add is always None
for task in task_list if
task.project is not None and task.project.identifier not in seen]
This works because (a) add returns None (and or returns the value of the last expression evaluated) and (b) the mapping clause (the first clause) is only executed if the if clause is True.
There is no reason that it has to be in a list comprehension - you could just as well set it out as a loop, and indeed you may prefer to. This way has the advantage that it is clear that you are just building a list, and what is supposed to be in it.
I've not used staticmethod because there is rarely a need for it. Either have this as a module-level function, or a classmethod.
An alternative is a generator (thanks to #delnan for point this out):
def get_projects_of_tasks(task_list):
seen = set()
for task in task_list:
if task.project is not None and task.project.identifier not in seen:
identifier = task.project.identifier
seen.add(identifier)
yield task.project
This eliminates the need for a side-effect in comprehension (which is controversial), but keeps clear what is being collected.
For the sake of avoiding another if/continue construction, I have left in two accesses to task.project.identifier. This could be conveniently eliminated with the use of a promise library.
This version uses promises to avoid repeated access to task.project.identifier without the need to include an if/continue:
from peak.util.proxies import LazyProxy, get_cache # pip install ProxyTypes
def get_projects_of_tasks(task_list):
seen = set()
for task in task_list:
identifier = LazyProxy(lambda:task.project.identifier) # a transparent promise
if task.project is not None and identifier not in seen:
seen.add(identifier)
yield task.project
This is safe from AttributeErrors because task.project.identifier is never accessed before task.project is checked.
Some say EAFP is pythonic, so:
#staticmethod
def get_projects_of_tasks(task_list):
projects = {}
for task in task_list:
try:
if not task.project.identifier in projects:
projects[task.project.identifier] = task.project
except AttributeError:
pass
return projects.values()
of cours an explicit check wouldn't be wrong either, and would of course be better if many tasks have not project.
And just one dict to keep track of seen identifiers and projects would be enough, if the order of the projects matters, then a OrderedDict (python2.7+) could come in handy.
There are already a lot of good answers, and, indeed, you have accepted one! But I thought I would add one more option. A number of people have seen that your code could be made more compact with generator expressions or list comprehensions. I'm going to suggest a hybrid style that uses generator expressions to do the initial filtering, while maintaining your for loop in the final filter.
The advantage of this style over the style of your original code is that it simplifies the flow of control by eliminating continue statements. The advantage of this style over a single list comprehension is that it avoids multiple accesses to task.project.identifier in a natural way. It also handles mutable state (the seen set) transparently, which I think is important.
def get_projects_of_tasks(task_list):
projects = (task.project for task in task_list)
ids_projects = ((p.identifier, p) for p in projects if p is not None)
seen = set()
unique_projects = []
for id, p in ids_projects:
if id not in seen:
seen.add(id)
unique_projects.append(p)
return unique_projects
Because these are generator expressions (enclosed in parenthesis instead of brackets), they don't build temporary lists. The first generator expression creates an iterable of projects; you could think of it as performing the project = task.project line from your original code on all the projects at once. The second generator expression creates an iterable of (project_id, project) tuples. The if clause at the end filters out the None values; (p.identifier, p) is only evaluated if p passes through the filter. Together, these two generator expressions do away with your first two if blocks. The remaining code is essentially the same as yours.
Note also the excellent suggestion from Marcin/delnan that you create a generator using yield. This cuts down further on the verbosity of your code, boiling it down to its essentials:
def get_projects_of_tasks(task_list):
projects = (task.project for task in task_list)
ids_projects = ((p.identifier, p) for p in projects if p is not None)
seen = set()
for id, p in ids_projects:
if id not in seen:
seen.add(id)
yield p
The only disadvantage -- in case this isn't obvious -- is that if you want to permanently store the projects, you have to pass the resulting iterable to list.
projects_of_tasks = list(get_projects_of_tasks(task_list))

Global variable messes up my recursive function

I've just run into a tricky issue. The following code is supposed to split words into chunks of length numOfChar. The function calls itself, which makes it impossible to have the resulting list (res) inside the function. But if I keep it outside as a global variable, then every subsequent call of the function with different input values leads to a wrong result because res doesn't get cleared.
Can anyone help me out?
Here's the code
(in case you are interested, this is problem 7-23 from PySchools.com):
res = []
def splitWord(word, numOfChar):
if len(word) > 0:
res.append(word[:numOfChar])
splitWord(word[numOfChar:], numOfChar)
return res
print splitWord('google', 2)
print splitWord('google', 3)
print splitWord('apple', 1)
print splitWord('apple', 4)
A pure recursive function should not modify the global state, this counts as a side effect.
Instead of appending-and-recursion, try this:
def splitWord(word, numOfChar):
if len(word) > 0:
return [word[:numOfChar]] + splitWord(word[numOfChar:], numOfChar)
else:
return []
Here, you chop the word into pieces one piece at a time, on every call while going down, and then rebuild the pieces into a list while going up.
This is a common pattern called tail recursion.
P.S. As #e-satis notes, recursion is not an efficient way to do this in Python. See also #e-satis's answer for a more elaborate example of tail recursion, along with a more Pythonic way to solve the problem using generators.
Recursion is completely unnecessary here:
def splitWord(word, numOfChar):
return [word[i:i+numOfChar] for i in xrange(0, len(word), numOfChar)]
If you insist on a recursive solution, it is a good idea to avoid global variables (they make it really tricky to reason about what's going on). Here is one way to do it:
def splitWord(word, numOfChar):
if len(word) > 0:
return [word[:numOfChar]] + splitWord(word[numOfChar:], numOfChar)
else:
return []
To elaborate on #Helgi answer, here is a more performant recursive implémentation. It updates the list instead of summing two lists (which results in the creation of a new object every time).
This pattern forces you to pass a list object as third parameter.
def split_word(word, num_of_chars, tail):
if len(word) > 0:
tail.append(word[:num_of_chars])
return split_word(word[num_of_chars:], num_of_chars, tail)
return tail
res = split_word('fdjskqmfjqdsklmfjm', 3, [])
Another advantage of this form, is that it allows tail recursion optimisation. It's useless in Python because it's not a language that performs such optimisation, but if you translate this code into Erlang or Lisp, you will get it for free.
Remember, in Python you are limited by the recursion stack, and there is no way out of it. This is why recursion is not the preferred method.
You would most likely use generators, using yield and itertools (a module to manipulate generators). Here is a very good example of a function that can split any iterable in chunks:
from itertools import chain, islice
def chunk(seq, chunksize, process=iter):
it = iter(seq)
while True:
yield process(chain([it.next()], islice(it, chunksize - 1)))
Now it's a bit complicated if you start learning Python, so I'm not expecting you to fully get it now, but it's good that you can see this and know it exists. You'll come back to it later (we all did, Python iteration tools are overwhelming at first).
The benefits of this approach are:
It can chunk ANY iterable, not just strings, but also lists, dictionaries, tuples, streams, files, sets, queryset, you name it...
It accepts iterables of any length, and even one with an unknown length (think bytes stream here).
It eats very few memory, as the best thing with generators is that they generate the values on the fly, one by one, and they don't store the previous results before computing the next.
It returns chunks of any nature, meaning you can have a chunks of x letters, lists of x items, or even generators spitting out x items (which is the default).
It returns a generator, and therefor can be use in a flow of other generators. Piping data from one generator to the other, bash style, is a wonderful Python ability.
To get the same result that with your function, you would do:
In [17]: list(chunk('fdjskqmfjqdsklmfjm', 3, ''.join))
Out[17]: ['fdj', 'skq', 'mfj', 'qds', 'klm', 'fjm']

Categories