For reasons that are not relevant here, I am combining some data structures in a certain way, while also replacing Python 2.7's default dict with OrderedDict. The data structures use tuples as keys in dictionaries. Please ignore those details (the replacement of the dict type is not useful below, but it is in the real code).
import __builtin__
import collections
import contextlib
import itertools

def combine(config_a, config_b):
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))

@contextlib.contextmanager
def dict_as_ordereddict():
    dict_orig = __builtin__.dict
    try:
        __builtin__.dict = collections.OrderedDict
        yield
    finally:
        __builtin__.dict = dict_orig
This works as expected initially (dict can take non-string keyword arguments as a special case):
print 'one level nesting'
with dict_as_ordereddict():
    result = combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    )
print list(result)
print
Output:
one level nesting
[{(0, 1): 'a', (4, 5): 'c', (2, 3): 'b', (6, 7): 'd'}]
However, when nesting calls to the combine generator expression, it can be seen that the dict reference is treated as OrderedDict, lacking the special behaviour of dict to use tuples as keyword arguments:
print 'two level nesting'
with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print
Output:
two level nesting
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    [{(8, 9): 'e', (10, 11): 'f'}]
  File "test.py", line 8, in combine
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
  File "test.py", line 8, in <genexpr>
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
TypeError: __init__() keywords must be strings
Furthermore, implementing via yield instead of a generator expression fixes the problem:
def combine_yield(config_a, config_b):
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)

print 'two level nesting, yield'
with dict_as_ordereddict():
    result = combine_yield(combine_yield(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print
Output:
two level nesting, yield
[{(0, 1): 'a', (8, 9): 'e', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd', (10, 11): 'f'}]
Questions:
Why does some item (only the first?) from the generator expression get evaluated before it is required in the second example, and what is it required for?
Why is it not evaluated in the first example? I actually expected this behaviour in both.
Why does the yield-based version work?
Before going into the details note the following: itertools.product evaluates the iterator arguments in order to compute the product. This can be seen from the equivalent Python implementation in the docs (the first line is relevant):
def product(*args, **kwds):
    pools = map(tuple, args) * kwds.get('repeat', 1)
    ...
You can also try this with a custom class and a short test script:
import itertools

class Test:
    def __init__(self):
        self.x = 0

    def __iter__(self):
        return self

    def next(self):
        print('next item requested')
        if self.x < 5:
            self.x += 1
            return self.x
        raise StopIteration()

t = Test()
itertools.product(t, t)
Creating the itertools.product object will show in the output that all the iterators' items are requested immediately.
This means that as soon as you call itertools.product, the iterator arguments are evaluated. This is important because in the first case the arguments are just two lists, so there's no problem. You then evaluate the final result via list(result) after the context manager dict_as_ordereddict has exited, so all calls to dict resolve to the normal builtin dict.
Now for the second example: the inner call to combine still works fine, returning a generator expression which is then used as one of the arguments to the second combine's call to itertools.product. As we've seen above, these arguments are evaluated immediately, so the generator object is asked to generate its values. To do so, it needs to resolve dict. However, at this point we're still inside the context manager dict_as_ordereddict, and for that reason dict resolves to OrderedDict, which doesn't accept non-string keys as keyword arguments.
It is important to notice here that the first version, which uses return, needs to create the generator object in order to return it. That involves creating the itertools.product object. This means the return version is only as lazy as itertools.product itself.
Now to the question why the yield version works. By using yield, invoking the function returns a generator. This is a truly lazy version in the sense that execution of the function body doesn't start until items are requested. This means neither the inner nor the outer call to combine starts executing the function body, and thus invoking itertools.product, until the items are requested via list(result). You can check that by putting an additional print statement inside that function and right after the context manager:
def combine(config_a, config_b):
    print 'start'
    # return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)

with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print 'end of context manager'
print list(result)
print
With the yield version we'll notice that it prints the following:
end of context manager
start
start
I.e. the generators are started only when the results are requested via list(result). This is different from the return version (uncomment the return line in the code above). There you'll see
start
start
and before the end of the context manager is reached the error is already raised.
On a side note: for your code to work, the replacement of dict needs to be ineffective (and it is for the first version), so I don't see why you would use that context manager at all. Secondly, dict literals are not ordered in Python 2, and neither are keyword arguments, so that also defeats the purpose of using OrderedDict. Also note that in Python 3 the non-string keyword argument behaviour of dict has been removed, and the clean way to update dictionaries with any kind of keys is dict.update.
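A minimal Python 3 sketch of that approach, using the sample dicts from the question:

```python
from collections import OrderedDict

first = {(0, 1): 'a', (2, 3): 'b'}
second = {(4, 5): 'c', (6, 7): 'd'}

# dict.update accepts any hashable keys, so it works for OrderedDict too
merged = OrderedDict(first)
merged.update(second)

# In Python 3.5+, dict unpacking also handles non-string keys
merged_plain = {**first, **second}
print(merged_plain)  # {(0, 1): 'a', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd'}
```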
Related
I am just learning python and was trying to define a function using a for loop.
The code is as follows -
def chk(hilist):
    """ The function returns the output of the enumerate function as (x1,y1) (x2,y2)...
    """
    for item in enumerate(hilist):
        return item
I ran the above function for the input 'string' as below -
abc = chk('string')
abc
The output came out as (0, 's').
If I run a regular for loop and print each item instead, the output is as follows -
(0, 's')
(1, 't')
(2, 'r')
(3, 'i')
(4, 'n')
(5, 'g')
Can someone please help me understand what I am doing wrong ?
Thanks in advance.
return exits the function immediately, so only the first enumerated item is produced.
You have to save the results in a list and return that instead:
def chk(hilist):
    """ The function returns the output of the enumerate function as (x1,y1) (x2,y2)...
    """
    ret_list = list()
    for item in enumerate(hilist):
        ret_list.append(item)
    return ret_list
In Python (as in most programming languages), the return keyword exits the function immediately, so I propose two solutions:
solution 1: store your tuples in a list and then return the list itself
solution 2: replace return with yield (if you want to print the returned items, convert the generator to a list, e.g. list(chk(some_arguments)))
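Solution 2 can be sketched like this:

```python
def chk(hilist):
    # yield suspends the function instead of exiting it,
    # so every enumerated pair is produced
    for item in enumerate(hilist):
        yield item

print(list(chk('string')))  # [(0, 's'), (1, 't'), (2, 'r'), (3, 'i'), (4, 'n'), (5, 'g')]
```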
I think the simplest solution would be to use print(item) if you just want to see all the enumerated values from 'string':
def chk(hilist):
    for item in enumerate(hilist):
        print(item)
This worked smoothly for me.
I naively tried to create a recursive generator. Didn't work. This is what I did:
def recursive_generator(lis):
    yield lis[0]
    recursive_generator(lis[1:])

for k in recursive_generator([6,3,9,1]):
    print(k)
All I got was the first item 6.
Is there a way to make such code work? Essentially transferring the yield command to the level above in a recursion scheme?
Try this:
def recursive_generator(lis):
    yield lis[0]
    yield from recursive_generator(lis[1:])

for k in recursive_generator([6,3,9,1]):
    print(k)
I should point out that this still raises an IndexError once the recursion reaches the empty slice at the end of the list, because lis[0] is then evaluated on an empty list. The function should include a check that lis isn't empty, as shown below:
def recursive_generator(lis):
    if lis:
        yield lis[0]
        yield from recursive_generator(lis[1:])
In case you are on Python 2.7 and don't have yield from, check this question out.
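On Python 2.7, where yield from is unavailable, the same fix can be sketched by re-yielding each item in a loop (this also runs on Python 3):

```python
def recursive_generator(lis):
    if lis:
        yield lis[0]
        # Python 2 has no 'yield from', so re-yield each recursive item manually
        for item in recursive_generator(lis[1:]):
            yield item

print(list(recursive_generator([6, 3, 9, 1])))  # [6, 3, 9, 1]
```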
Why your code didn't do the job
In your code, the generator function:
returns (yields) the first value of the list
then it creates a new iterator object calling the same generator function, passing a slice of the list to it
and then stops
The second instance of the iterator, the one recursively created, is never being iterated over. That's why you only got the first item of the list.
A generator function is useful for automatically creating an iterator object (an object that implements the iterator protocol), but then you need to iterate over it: either by calling next() on it manually, or by means of a loop statement that uses the iterator protocol automatically.
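Both ways of driving an iterator can be sketched with a toy generator (gen is a made-up name):

```python
def gen():
    yield 'a'
    yield 'b'

# manual iteration: call next() on the iterator object yourself
g = gen()
print(next(g))  # a
print(next(g))  # b

# a for loop uses the iterator protocol automatically
for item in gen():
    print(item)
```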
So, can we recursively call a generator?
The answer is yes. Now back to your code, if you really want to do this with a generator function, I guess you could try:
def recursive_generator(some_list):
    """
    Return some_list items, one at a time, recursively iterating over a slice of it...
    """
    if len(some_list) > 1:
        # some_list has more than one item, so iterate over it
        for i in recursive_generator(some_list[1:]):
            # recursively call this generator function to iterate over a slice of some_list.
            # return one item from the list.
            yield i
        else:
            # the iterator returned StopIteration, so the for loop is done.
            # to finish, return the only value not included in the slice we just iterated on.
            yield some_list[0]
    else:
        # some_list has only one item, no need to iterate on it.
        # just return the item.
        yield some_list[0]

some_list = [6,3,9,1]
for k in recursive_generator(some_list):
    print(k)
Note: the items are returned in reversed order, so you might want to use some_list.reverse() before calling the generator the first time.
The important thing to note in this example is: the generator function recursively calls itself in a for loop, which sees an iterator and automatically uses the iteration protocol on it, so it actually gets values from it.
This works, but I think this is really not useful. We are using a generator function to iterate over a list and just get the items out, one at a time, but... a list is an iterable itself, so no need for generators!
Of course I get it, this is just an example, maybe there are useful applications of this idea.
Another example
Let's recycle the previous example (for laziness). Let's say we need to print the items of a list, adding to every item the count of previous items (just a random example, not necessarily useful).
The code would be:
def recursive_generator(some_list):
    """
    Return some_list items, one at a time, recursively iterating over a slice of it...
    and adding to every item the count of previous items in the list
    """
    if len(some_list) > 1:
        # some_list has more than one item, so iterate over it
        for i in recursive_generator(some_list[1:]):
            # recursively call this generator function to iterate over a slice of some_list.
            # return one item from the list, but add 1 first.
            # Every recursive iteration will add 1, so we basically add the count of iterations.
            yield i + 1
        else:
            # the iterator returned StopIteration, so the for loop is done.
            # to finish, return the only value not included in the slice we just iterated on.
            yield some_list[0]
    else:
        # some_list has only one item, no need to iterate on it.
        # just return the item.
        yield some_list[0]

some_list = [6,3,9,1]
for k in recursive_generator(some_list):
    print(k)
Now, as you can see, the generator function is actually doing something before returning list items AND the use of recursion starts to make sense. Still, just a stupid example, but you get the idea.
Note: of course, in this stupid example the list is expected to contain only numbers. If you really want to try and break it, just put a string in some_list and have fun. Again, this is only an example, not production code!
Recursive generators are useful for traversing non-linear structures. For example, let a binary tree be either None or a tuple of value, left tree, right tree. A recursive generator is the easiest way to visit all nodes. Example:
tree = (0, (1, None, (2, (3, None, None), (4, (5, None, None), None))),
        (6, None, (7, (8, (9, None, None), None), None)))

def visit(tree):
    if tree is not None:
        try:
            value, left, right = tree
        except ValueError:  # wrong number to unpack
            print("Bad tree:", tree)
        else:  # The following is one of 3 possible orders.
            yield from visit(left)
            yield value  # Put this first or last for different orders.
            yield from visit(right)

print(list(visit(tree)))
# prints nodes in the correct order for 'yield value' in the middle.
# [1, 3, 2, 5, 4, 0, 6, 9, 8, 7]
Edit: replace if tree with if tree is not None to catch other false values as errors.
Edit 2: about putting the recursive calls in the try: clause (comment by @jpmc26).
For bad nodes, the code above just logs the ValueError and continues. If, for instance, (9,None,None) is replaced by (9,None), the output is
Bad tree: (9, None)
[1, 3, 2, 5, 4, 0, 6, 8, 7]
More typical would be to reraise after logging, making the output be
Bad tree: (9, None)
Traceback (most recent call last):
  File "F:\Python\a\tem4.py", line 16, in <module>
    print(list(visit(tree)))
  File "F:\Python\a\tem4.py", line 14, in visit
    yield from visit(right)
  File "F:\Python\a\tem4.py", line 14, in visit
    yield from visit(right)
  File "F:\Python\a\tem4.py", line 12, in visit
    yield from visit(left)
  File "F:\Python\a\tem4.py", line 12, in visit
    yield from visit(left)
  File "F:\Python\a\tem4.py", line 7, in visit
    value, left, right = tree
ValueError: not enough values to unpack (expected 3, got 2)
The traceback gives the path from the root to the bad node. One could wrap the original visit(tree) call to reduce the traceback to the path: (root, right, right, left, left).
If the recursive calls are included in the try: clause, the error is recaught, relogged, and reraised at each level of the tree.
Bad tree: (9, None)
Bad tree: (8, (9, None), None)
Bad tree: (7, (8, (9, None), None), None)
Bad tree: (6, None, (7, (8, (9, None), None), None))
Bad tree: (0, (1, None, (2, (3, None, None), (4, (5, None, None), None))), (6, None, (7, (8, (9, None), None), None)))
Traceback (most recent call last):
... # same as before
The multiple logging reports are likely more noise than help. If one wants the path to the bad node, it might be easiest to wrap each recursive call in its own try: clause and raise a new ValueError at each level, with the constructed path so far.
Conclusion: if one is not using an exception for flow control (as may be done with IndexError, for instance), the presence and placement of try: statements depend on the error reporting one wants.
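As one illustration of that last suggestion (my own variant, not from the answer above): instead of re-raising at each level, the path can be threaded down as an argument, so a single ValueError names the full path to the bad node:

```python
def visit_with_path(tree, path=()):
    # Like visit(), but raises one ValueError carrying the root-to-node
    # path instead of relogging the error at every level.
    if tree is not None:
        try:
            value, left, right = tree
        except ValueError:
            raise ValueError('bad node at path %r: %r' % (path, tree))
        yield from visit_with_path(left, path + ('left',))
        yield value
        yield from visit_with_path(right, path + ('right',))

bad_tree = (0, None, (6, (9, None), None))
try:
    list(visit_with_path(bad_tree))
except ValueError as exc:
    print(exc)  # bad node at path ('right', 'left'): (9, None)
```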
The reason your recursive call only executes once is that you are essentially creating nested generators. That is, you are creating a new generator inside a generator each time you call the function recursive_generator recursively.
Try the following and you will see.
def recursive_generator(lis):
    yield lis[0]
    yield recursive_generator(lis[1:])

for k in recursive_generator([6,3,9,1]):
    print(type(k))
One simple solution, like others mention, is to use yield from.
Up to Python 3.4, a generator function could signal exhaustion by raising StopIteration; since PEP 479 (the default behaviour from Python 3.7), raising StopIteration inside a generator is converted to a RuntimeError, so a bare return should be used instead.
For the recursive case, other exceptions (e.g. IndexError) would be raised before the generator runs out naturally, therefore we stop on the empty list explicitly.
def recursive_generator(lis):
    if not lis:
        return  # ends the generator; raising StopIteration here is a RuntimeError under PEP 479
    yield lis[0]
    yield from recursive_generator(lis[1:])

for k in recursive_generator([6, 3, 9, 1]):
    print(k)
def recursive_generator(lis):
    if not lis:
        return
    yield lis.pop(0)
    yield from recursive_generator(lis)

for k in recursive_generator([6, 3, 9, 1]):
    print(k)
Note that the for loop catches the StopIteration that is raised when the generator finishes.
Yes you can have recursive generators. However, they suffer from the same recursion depth limit as other recursive functions.
def recurse(x):
    yield x
    yield from recurse(x)

for (i, x) in enumerate(recurse(5)):
    print(i, x)
This loop gets to about 3000 (for me) before crashing.
However, with some trickery, you can create a function that feeds a generator to itself. This allows you to write generators like they are recursive but are not: https://gist.github.com/3noch/7969f416d403ba3a54a788b113c204ce
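As a sketch of the simplest alternative (distinct from the gist's trampoline, and assuming the recursion carries no per-level state): rewrite the recursion as a loop, which uses no call stack at all:

```python
import itertools

def recurse_iter(x):
    # iterative equivalent of the infinitely recursive generator above;
    # a while loop grows no call stack, so there is no recursion depth limit
    while True:
        yield x

# consume far more items than the ~3000 the recursive version survives
last = None
for i, value in zip(range(100000), recurse_iter(5)):
    last = value
print(last)  # 5
print(list(itertools.islice(recurse_iter(5), 3)))  # [5, 5, 5]
```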
I have the following recursive function to generate a list of valid configurations for a (named) list of positions, where each position can be used only once:
def generate_configurations(configurations, named_positions, current):
    if len(current) == len(named_positions):
        configurations.append(current)
        return configurations
    name, positions = named_positions[len(current)]
    for x in positions:
        if x not in current:
            generate_configurations(configurations, named_positions, current + (x,))
    return configurations
Here is an example of how I call it:
named_positions = [('a', [0,1,2]),
                   ('b', [1,3]),
                   ('c', [1,2])]

for comb in generate_configurations([], named_positions, ()):
    print comb
Which gives the following output:
(0, 1, 2)
(0, 3, 1)
(0, 3, 2)
(1, 3, 2)
(2, 3, 1)
Also, it is possible there are no valid combinations, e.g. for named_positions = [('a', [3]), ('b', [3])].
Now depending on the input named_positions, the configurations list can quickly become huge, resulting in a MemoryError. I believe this function could be re-written as a generator, so I tried the following:
def generate_configurations(named_positions, current):
    if len(current) == len(named_positions):
        yield current
    name, positions = named_positions[len(current)]
    for x in positions:
        if x not in current:
            generate_configurations(named_positions, current + (x,))

named_positions = [('a', [0,1,2]),
                   ('b', [1,3]),
                   ('c', [1,2])]

for comb in generate_configurations(named_positions, ()):
    print comb
but this doesn't generate any results at all. What am I doing wrong?
You need to yield up the recursive call stack or the inner yields never happen and get discarded. Since this is tagged Python 2.7, the recursive calls would be handled by changing:
if x not in current:
    # Creates the generator, but doesn't run it out to get and yield values
    generate_configurations(named_positions, current + (x,))
to:
if x not in current:
    # Actually runs the generator and yields values up the call stack
    for y in generate_configurations(named_positions, current + (x,)):
        yield y
In Python 3.3 and higher, you can delegate directly with:
if x not in current:
    yield from generate_configurations(named_positions, current + (x,))
When you are using generators, you need to make sure that your sub-generator recursive calls pass back up to the calling method.
def recur_generator(n):
    yield my_thing
    yield my_other_thing
    if n > 0:
        yield from recur_generator(n-1)
Notice here that the yield from is what passes the yield calls back up to the parent call.
You should change the recursive call line to
yield from generate_configurations(named_positions, current + (x,))
Otherwise, your generator is fine.
EDIT: Didn't notice that this was python2. You can use
for x in recur_generator(n-1):
    yield x
instead of yield from.
records = [
    ('foo', 1, 2),
    ('bar', 'hello'),
    ('foo', 3, 4),
]

def do_foo(x, y):
    print('foo', x, y)

def do_bar(s):
    print('bar', s)

for tag, *args in records:
    if tag == 'foo':
        do_foo(*args)
    elif tag == 'bar':
        do_bar(*args)
I know you can use syntax like for i, val in enumerate(a). To me, it looks like tag, *args is being used here to create a tuple, such that the code is effectively for tuple in records. But that is just an uneducated guess.
records is a list of tuples, which the for statement iterates over. On each iteration, tag is assigned the first element of the tuple (the strings 'foo' and 'bar'), and *args sets args to a list of the remaining elements (e.g. [1, 2]). These are then spread as arguments when calling do_foo and do_bar.
for tag, *args in records:
means: take each element in the iterable records. That element will itself be iterable.
Put the first item of that element in tag, and put the rest in a list named args.
do_foo(*args)
means pass of the members of args to do_foo as arguments.
So the list
records = [
    ('foo', 1, 2),
    ('bar', 'hello'),
    ('foo', 3, 4),
]
causes the calls
do_foo(1, 2)
do_bar('hello')
do_foo(3, 4)
This is a feature available in Python 3+.
a, *b = [1, 2, 3, 4]
print(a)   # 1
print(b)   # [2, 3, 4]
Similarly,
a, *b, c = ('foo', 1, 2, 3, 4)
print(a)   # foo
print(b)   # [1, 2, 3]
print(c)   # 4
The Python interpreter creates a list of the appropriate size for the variable beginning with *. I hope the purpose of *args in the loop above is now clear.
I have a tuple and would like to reverse it in Python.
The tuple looks like this : (2, (4, (1, (10, None)))).
I tried reversing in Python by:
a = (2, (4, (1, (10, None))))
b = reversed(a)
It returns me this:
<reversed object at 0x02C73270>
How do I get the reverse of a? Or must I write a function to do this?
The result should look like this:
((((None, 10), 1), 4), 2)
def my_reverser(x):
    try:
        x_ = x[::-1]
    except TypeError:
        return x
    else:
        return x if len(x) == 1 else tuple(my_reverser(e) for e in x_)
Try this deep-reverse function:
def deep_reverse(t):
    return tuple(deep_reverse(x) if isinstance(x, tuple) else x
                 for x in reversed(t))
This will handle arbitrarily nested tuples, not just two-tuples.
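For example, applied to the tuple from the question (function repeated here for completeness):

```python
def deep_reverse(t):
    # reverse the top level, recursing into any nested tuples
    return tuple(deep_reverse(x) if isinstance(x, tuple) else x
                 for x in reversed(t))

a = (2, (4, (1, (10, None))))
print(deep_reverse(a))  # ((((None, 10), 1), 4), 2)
```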
As explained in the documentation, the reversed function returns an iterator (hence the <reversed at ...>). If you want to get a list or a tuple out of it, just use list(reversed(...)) or tuple(reversed(...)).
However, that's only part of your problem: you'd be reversing the initial object (2, (...)) into (..., 2), while the inner ... stays the same. You have to implement a recursive reverse: if an element of your input tuple is itself a tuple, you need to reverse it too.
It does not make sense to do this with reversed, sorry. But a simple recursive function would return what you want:
def reversedLinkedTuple(t):
    if t is None:
        return t
    a, b = t
    return reversedLinkedTuple(b), a
reversed is usable only on reversible iterable objects like lists, tuples and the like. What you are using (a linked list) isn't iterable in the sense of the Python built-in iter.
You could write a wrapping class for your linked list which implements this and then offers a reverse iterator, but I think that would be overkill and would not really suit your needs.
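Applied to the question's input, the function above unrolls the linked structure from the inside out (function repeated here for completeness):

```python
def reversedLinkedTuple(t):
    # treat each (value, rest) pair as a linked-list node
    if t is None:
        return t
    a, b = t
    return reversedLinkedTuple(b), a

a = (2, (4, (1, (10, None))))
print(reversedLinkedTuple(a))  # ((((None, 10), 1), 4), 2)
```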
def reverse(x):
    while x >= 0:
        print(x)
        x = x - 1

reverse(5)