Can generators be recursive? - python

I naively tried to create a recursive generator. Didn't work. This is what I did:
def recursive_generator(lis):
    yield lis[0]
    recursive_generator(lis[1:])

for k in recursive_generator([6, 3, 9, 1]):
    print(k)
All I got was the first item 6.
Is there a way to make such code work? Essentially transferring the yield command to the level above in a recursion scheme?

Try this:
def recursive_generator(lis):
    yield lis[0]
    yield from recursive_generator(lis[1:])

for k in recursive_generator([6, 3, 9, 1]):
    print(k)
I should point out that this still doesn't work, because of a bug in your function: once the recursion reaches the empty slice, lis[0] raises an IndexError. It should include a check that lis isn't empty, as shown below:
def recursive_generator(lis):
    if lis:
        yield lis[0]
        yield from recursive_generator(lis[1:])
In case you are on Python 2.7 and don't have yield from, check this question out.
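For reference, a minimal sketch of the Python 2.7 equivalent, looping manually instead of delegating with yield from:

def recursive_generator(lis):
    # Python 2.7: re-yield each item from the recursive call by hand
    if lis:
        yield lis[0]
        for item in recursive_generator(lis[1:]):
            yield item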

Why your code didn't do the job
In your code, the generator function:
returns (yields) the first value of the list
then it creates a new iterator object calling the same generator function, passing a slice of the list to it
and then stops
The second instance of the generator, the one created recursively, is never iterated over. That's why you only got the first item of the list.
A generator function is useful to automatically create an iterator object (an object that implements the iterator protocol), but then you need to iterate over it: either by calling next() on the object manually or by means of a loop statement that will automatically use the iterator protocol.
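To see the protocol at work, here is a small sketch (reusing the question's original function) that drives the generator by hand with next():

def recursive_generator(lis):
    yield lis[0]
    recursive_generator(lis[1:])  # creates a new generator object, then discards it

gen = recursive_generator([6, 3, 9, 1])
print(next(gen))  # 6 -- the only value this outer generator ever yields
# a second next(gen) raises StopIteration: the inner generator was never consumed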
So, can we recursively call a generator?
The answer is yes. Now back to your code, if you really want to do this with a generator function, I guess you could try:
def recursive_generator(some_list):
    """
    Return some_list items, one at a time, recursively iterating over a slice of it...
    """
    if len(some_list) > 1:
        # some_list has more than one item, so iterate over it
        for i in recursive_generator(some_list[1:]):
            # recursively call this generator function to iterate over a slice of some_list.
            # return one item from the list.
            yield i
        else:
            # the iterator raised StopIteration, so the for loop is done.
            # to finish, return the only value not included in the slice we just iterated on.
            yield some_list[0]
    else:
        # some_list has only one item, no need to iterate on it.
        # just return the item.
        yield some_list[0]

some_list = [6, 3, 9, 1]
for k in recursive_generator(some_list):
    print(k)
Note: the items are returned in reverse order, so you might want to call some_list.reverse() before passing the list to the generator the first time.
The important thing to note in this example is: the generator function recursively calls itself in a for loop, which sees an iterator and automatically uses the iteration protocol on it, so it actually pulls values from it.
This works, but I think it's really not useful here. We are using a generator function just to iterate over a list and get the items out, one at a time, but a list is an iterable itself, so there's no need for generators!
Of course I get it, this is just an example, maybe there are useful applications of this idea.
Another example
Let's recycle the previous example (for laziness). Let's say we need to print the items of a list, adding to every item the count of previous items (just a random example, not necessarily useful).
The code would be:
def recursive_generator(some_list):
    """
    Return some_list items, one at a time, recursively iterating over a slice of it...
    and adding to every item the count of previous items in the list
    """
    if len(some_list) > 1:
        # some_list has more than one item, so iterate over it
        for i in recursive_generator(some_list[1:]):
            # recursively call this generator function to iterate over a slice of some_list.
            # return one item from the list, but add 1 first.
            # Every recursive iteration will add 1, so we basically add the count of iterations.
            yield i + 1
        else:
            # the iterator raised StopIteration, so the for loop is done.
            # to finish, return the only value not included in the slice we just iterated on.
            yield some_list[0]
    else:
        # some_list has only one item, no need to iterate on it.
        # just return the item.
        yield some_list[0]

some_list = [6, 3, 9, 1]
for k in recursive_generator(some_list):
    print(k)
Now, as you can see, the generator function is actually doing something to the items before returning them AND the use of recursion starts to make sense. Still a stupid example, but you get the idea.
Note: of course, in this stupid example the list is expected to contain only numbers. If you really want to go and break it, just put a string in some_list and have fun. Again, this is only an example, not production code!

Recursive generators are useful for traversing non-linear structures. For example, let a binary tree be either None or a tuple of value, left tree, right tree. A recursive generator is the easiest way to visit all nodes. Example:
tree = (0, (1, None, (2, (3, None, None), (4, (5, None, None), None))),
        (6, None, (7, (8, (9, None, None), None), None)))

def visit(tree):
    if tree is not None:
        try:
            value, left, right = tree
        except ValueError:  # wrong number of values to unpack
            print("Bad tree:", tree)
        else:  # The following is one of 3 possible orders.
            yield from visit(left)
            yield value  # Put this first or last for different orders.
            yield from visit(right)

print(list(visit(tree)))
# prints nodes in the correct order for 'yield value' in the middle.
# [1, 3, 2, 5, 4, 0, 6, 9, 8, 7]
Edit: replace if tree with if tree is not None to catch other false values as errors.
Edit 2: about putting the recursive calls in the try: clause (comment by @jpmc26).
For bad nodes, the code above just logs the ValueError and continues. If, for instance, (9,None,None) is replaced by (9,None), the output is
Bad tree: (9, None)
[1, 3, 2, 5, 4, 0, 6, 8, 7]
More typical would be to reraise after logging, making the output be
Bad tree: (9, None)
Traceback (most recent call last):
File "F:\Python\a\tem4.py", line 16, in <module>
print(list(visit(tree)))
File "F:\Python\a\tem4.py", line 14, in visit
yield from visit(right)
File "F:\Python\a\tem4.py", line 14, in visit
yield from visit(right)
File "F:\Python\a\tem4.py", line 12, in visit
yield from visit(left)
File "F:\Python\a\tem4.py", line 12, in visit
yield from visit(left)
File "F:\Python\a\tem4.py", line 7, in visit
value, left, right = tree
ValueError: not enough values to unpack (expected 3, got 2)
The traceback gives the path from the root to the bad node. One could wrap the original visit(tree) call to reduce the traceback to the path: (root, right, right, left, left).
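As a sketch of that idea (not part of the original answer), one can thread the path down through the recursion and attach it to the error, so a bad node reports its location directly:

def visit_checked(tree, path=()):
    # hypothetical variant of visit() that reports the root-to-node path
    if tree is not None:
        try:
            value, left, right = tree
        except ValueError:
            raise ValueError("Bad tree at path {}: {!r}".format(path, tree))
        yield from visit_checked(left, path + ('left',))
        yield value
        yield from visit_checked(right, path + ('right',))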
If the recursive calls are included in the try: clause, the error is recaught, relogged, and reraised at each level of the tree.
Bad tree: (9, None)
Bad tree: (8, (9, None), None)
Bad tree: (7, (8, (9, None), None), None)
Bad tree: (6, None, (7, (8, (9, None), None), None))
Bad tree: (0, (1, None, (2, (3, None, None), (4, (5, None, None), None))), (6, None, (7, (8, (9, None), None), None)))
Traceback (most recent call last):
... # same as before
The multiple logging reports are likely more noise than help. If one wants the path to the bad node, it might be easiest to wrap each recursive call in its own try: clause and raise a new ValueError at each level, with the constructed path so far.
Conclusion: if one is not using an exception for flow control (as may be done with IndexError, for instance), the presence and placement of try: statements depend on the error reporting one wants.

The reason your recursive call only executes once is that you are essentially creating nested generators. That is, you are creating a new generator inside a generator each time you call the function recursive_generator recursively.
Try the following and you will see.
def recursive_generator(lis):
    yield lis[0]
    yield recursive_generator(lis[1:])

for k in recursive_generator([6, 3, 9, 1]):
    print(type(k))
One simple solution, like others mention, is to use yield from.

Before PEP 479 (introduced in Python 3.5 and mandatory since 3.7), raising StopIteration inside a generator silently ended it, so answers from that era sometimes raised it manually. In the recursive case, other exceptions (e.g. IndexError) would be raised before the natural StopIteration, so the empty-list case has to be handled explicitly. On modern Python, end the generator with a plain return instead; an explicit raise StopIteration is now converted into a RuntimeError.
def recursive_generator(lis):
    if not lis:
        return  # ends the generator; `raise StopIteration` here breaks under PEP 479
    yield lis[0]
    yield from recursive_generator(lis[1:])

for k in recursive_generator([6, 3, 9, 1]):
    print(k)

def recursive_generator(lis):
    if not lis:
        return
    yield lis.pop(0)
    yield from recursive_generator(lis)

for k in recursive_generator([6, 3, 9, 1]):
    print(k)
Note that the for loop catches the StopIteration raised when the generator is exhausted.
More about this here

Yes you can have recursive generators. However, they suffer from the same recursion depth limit as other recursive functions.
def recurse(x):
    yield x
    yield from recurse(x)

for i, x in enumerate(recurse(5)):
    print(i, x)
This loop gets to about 3000 (for me) before crashing.
However, with some trickery, you can create a function that feeds a generator to itself. This allows you to write generators like they are recursive but are not: https://gist.github.com/3noch/7969f416d403ba3a54a788b113c204ce
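The linked gist is more general, but the core idea of writing a recursive-looking generator without actual recursion can be sketched with an explicit stack of iterators (assuming plain nested lists):

def flatten_iterative(nested):
    # no recursion: a stack of live iterators replaces the call stack
    stack = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()  # this level is exhausted, resume the one below
            continue
        if isinstance(item, list):
            stack.append(iter(item))  # descend without a recursive call
        else:
            yield item

print(list(flatten_iterative([1, [2, [3, 4]], 5])))  # 1 2 3 4 5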

Related

How to ignore the first yield value?

The following code is for illustrative purposes only.
def get_messages_from_redis():
    for item in self.pubsub.listen():
        yield (item['channel'], item['data'])  # how to ignore the first yield?
I know the following way can ignore the first yield value:
g = get_messages_from_redis()
next(g)
But how can I ignore it inside get_messages_from_redis() itself?
(A counter could be used to control whether to yield, but is there a better way?)
Iterate inside your function before yielding. I'm not sure what your iterable is exactly, but here's a generic example assuming a list.
def get_messages_from_redis():
    for item in self.pubsub.listen()[1:]:
        yield item['channel'], item['data']
For a more universal solution, you could create an iterator from your iterable, advance it past the first item, then loop and yield from there. Note: this is mostly for broader coverage; I'm not sure what negative consequences this might have with certain iterables.
def iter_skip_first(i):
    iterable = iter(i)
    next(iterable, None)  # the default avoids StopIteration leaking out (an error under PEP 479)
    for i in iterable:
        yield i

li = [1, 2, 3, 4]
d = {"one": 1, "two": 2, "three": 3, "four": 4}

print(*iter_skip_first(li))
print(*iter_skip_first(d))
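Another option is a sketch using itertools.islice, which lazily skips the first item of any iterable (including generators) without materializing anything:

from itertools import islice

def iter_skip_first(i):
    # islice(iterable, 1, None) yields everything after the first item
    return islice(i, 1, None)

print(*iter_skip_first([1, 2, 3, 4]))  # 2 3 4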

What is the equivalent of recursion in a function with yield?

So I've been learning about generators and came across this code:
def flatten(a):
    for i in a:
        try:
            yield from flatten(i)
        except TypeError:
            yield i

a = list(flatten([1, 2, 2, 3, [1, 2, 68, [[98, 85, 97], 67]]]))
From what I understand right now, the equivalent of calling a function back inside of itself is done with yield from flatten(i) in this case. I tried changing the code to yield flatten(i), but then a lot of weird things happen: instead of flattening my list, the code returns a list of generator objects. Is there a reason for that? Is there something I am missing about generator functions?
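The difference is easy to see in isolation; a minimal sketch:

def delegated():
    yield from (x for x in [1, 2])  # hands each value through, one at a time

def wrapped():
    yield (x for x in [1, 2])  # yields the generator object itself, unconsumed

print(list(delegated()))  # [1, 2]
print(list(wrapped()))    # [<generator object ...>]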

Generator expressions vs generator functions and surprisingly eager evaluation

For reasons which are not relevant I am combining some data structures in a certain way, whilst also replacing the Python 2.7's default dict with OrderedDict. The data structures use tuples as keys in dictionaries. Please ignore those details (the replacement of the dict type is not useful below, but it is in the real code).
import __builtin__
import collections
import contextlib
import itertools

def combine(config_a, config_b):
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))

@contextlib.contextmanager
def dict_as_ordereddict():
    dict_orig = __builtin__.dict
    try:
        __builtin__.dict = collections.OrderedDict
        yield
    finally:
        __builtin__.dict = dict_orig
This works as expected initially (dict can take non-string keyword arguments as a special case):
print 'one level nesting'
with dict_as_ordereddict():
    result = combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    )

print list(result)
print
Output:
one level nesting
[{(0, 1): 'a', (4, 5): 'c', (2, 3): 'b', (6, 7): 'd'}]
However, when nesting calls to the combine generator expression, it can be seen that the dict reference is treated as OrderedDict, lacking the special behaviour of dict to use tuples as keyword arguments:
print 'two level nesting'
with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )

print list(result)
print
Output:
two level nesting
Traceback (most recent call last):
File "test.py", line 36, in <module>
[{(8, 9): 'e', (10, 11): 'f'}]
File "test.py", line 8, in combine
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
File "test.py", line 8, in <genexpr>
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
TypeError: __init__() keywords must be strings
Furthermore, implementing via yield instead of a generator expression fixes the problem:
def combine_yield(config_a, config_b):
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)
print 'two level nesting, yield'
with dict_as_ordereddict():
    result = combine_yield(combine_yield(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )

print list(result)
print
Output:
two level nesting, yield
[{(0, 1): 'a', (8, 9): 'e', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd', (10, 11): 'f'}]
Questions:
Why does some item (only the first?) from the generator expression get evaluated before required in the second example, or what is it required for?
Why is it not evaluated in the first example? I actually expected this behaviour in both.
Why does the yield-based version work?
Before going into the details note the following: itertools.product evaluates the iterator arguments in order to compute the product. This can be seen from the equivalent Python implementation in the docs (the first line is relevant):
def product(*args, **kwds):
    pools = map(tuple, args) * kwds.get('repeat', 1)
    ...
You can also try this with a custom class and a short test script:
import itertools

class Test:
    def __init__(self):
        self.x = 0

    def __iter__(self):
        return self

    def next(self):
        print('next item requested')
        if self.x < 5:
            self.x += 1
            return self.x
        raise StopIteration()

t = Test()
itertools.product(t, t)
Creating the itertools.product object will show in the output that all the iterators' items are immediately requested.
This means that as soon as you call itertools.product, the iterator arguments are evaluated. This is important because in the first case the arguments are just two lists, so there's no problem. You then evaluate the final result via list(result) after the context manager dict_as_ordereddict has exited, and so all calls to dict resolve to the normal builtin dict.
Now for the second example: the inner call to combine still works fine, returning a generator expression which is then used as one of the arguments to the second combine's call to itertools.product. As we've seen above, those arguments are evaluated immediately, and so the generator object is asked to generate its values. In order to do so, it needs to resolve dict. However at this point we're still inside the context manager dict_as_ordereddict, and for that reason dict resolves to OrderedDict, which doesn't accept non-string keys as keyword arguments.
It is important to notice here that the first version, which uses return, needs to create the generator object in order to return it. That involves creating the itertools.product object. So that version is only as lazy as itertools.product.
Now to the question of why the yield version works. By using yield, invoking the function returns a generator. This is a truly lazy version in the sense that execution of the function body doesn't start until items are requested. This means neither the inner nor the outer call to combine_yield will start executing the function body, and thus invoking itertools.product, until the items are requested via list(result). You can check that by putting an additional print statement inside that function and another right after the context manager:
def combine(config_a, config_b):
    print 'start'
    # return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)

with dict_as_ordereddict():
    result = combine(combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )

print 'end of context manager'
print list(result)
print
With the yield version we'll notice that it prints the following:
end of context manager
start
start
I.e. the generators are started only when the results are requested via list(result). This is different from the return version (uncomment in the above code). Now you'll see
start
start
and before the end of the context manager is reached the error is already raised.
On a side note: for your code to work, the replacement of dict needs to be ineffective (and it is, in the first version), so I don't see why you would use that context manager at all. Secondly, dict literals are not ordered in Python 2, and neither are keyword arguments, so that also defeats the purpose of using OrderedDict. Also note that in Python 3 the non-string-keyword-argument behavior of dict has been removed, and the clean way to update a dictionary with keys of any kind is to use dict.update.
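For example, in Python 3 the merge step could use copy-and-update instead, since dict(first, **second) only accepts string keyword arguments there (a sketch):

def merge(first, second):
    # works for arbitrary hashable keys, unlike dict(first, **second)
    merged = dict(first)
    merged.update(second)
    return merged

print(merge({(0, 1): 'a'}, {(2, 3): 'b'}))  # {(0, 1): 'a', (2, 3): 'b'}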

Lambda function error while using python3

I have a piece of code that works in Python 2.7 but not in Python 3.7. Here I am sorting values using a lambda function as the key.
def get_nearest_available_slot(self):
    """Method to find nearest availability slot in parking
    """
    available_slots = filter(lambda x: x.availability, self.slots.values())
    if not available_slots:
        return None
    return sorted(available_slots, key=lambda x: x.slotNum)[0]
The error I get is:
File "/home/xyz/Desktop/parking-lot/parking-lot-1.4.2/parking_lot/bin/source/parking.py", line 45, in get_nearest_available_slot
return sorted(available_slots, key=lambda x: x.slotNum)[0]
IndexError: list index out of range
What am I doing wrong here?
The answer is simple: it's because of how filter works.
In Python 2, filter is eagerly evaluated, which means that once you call it, it returns a list:
filter(lambda x: x % 2 == 0, [1, 2, 3])
Output:
[2]
Conversely, in Python 3, filter is lazily evaluated: it returns an iterator, an object you can iterate over only once:
<filter at 0x110848f98>
In Python 2, the line if not available_slots stops execution if the result of filter is empty, since an empty list evaluates to False.
However, in Python 3, filter returns an iterator, which always evaluates to True, since you cannot tell if an iterator has been exhausted without trying to get the next element, and an iterator has no length. See this for more information.
Because of this, when no slots are available an exhausted (empty) iterator slips past the check and gets passed to sorted, which produces an empty list. You cannot access the element at position 0 of an empty list, so you get an IndexError.
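You can verify the truthiness difference directly; a quick sketch:

evens = filter(lambda x: x % 2 == 0, [1, 3, 5])
print(bool(evens))  # True in Python 3: the filter object is always truthy...
print(list(evens))  # ...even though it produces nothing: []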
To fix this, I suggest evaluating the filter eagerly into a list. You could do something like the following, also replacing sorted with min, since we only need one value:
def get_nearest_available_slot(self):
    """Method to find nearest availability slot in parking
    """
    available_slots = [slot for slot in self.slots.values() if slot.availability]
    if available_slots:
        return min(available_slots, key=lambda x: x.slotNum)
    else:
        return None
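Alternatively, since Python 3.4 min() accepts a default argument, so a sketch that stays lazy end to end:

def get_nearest_available_slot(self):
    # default=None is returned when the generator yields nothing
    return min(
        (slot for slot in self.slots.values() if slot.availability),
        key=lambda x: x.slotNum,
        default=None,
    )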

How do I determine whether a container is infinitely recursive and find its smallest unique container?

I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:
def flat(iterable):
    try:
        iter(iterable)
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flat(item)
This works fine for basic structures like nested lists containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:
def flatter(iterable):
    try:
        iter(iterable)
        if isinstance(iterable, str):
            raise TypeError
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatter(item)
Now it works for strings as well. However, I then recalled that a list can contain references to itself.
>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True
So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.
The following flattener.py ensued. flattish() is a version that just checks for strings. flatten_notype() checks whether an object's first item's first item is equal to itself to determine recursion. flatten() does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form should be `desired_result` [> `undesired_actual_result`]. As you can see, each fails in various ways on Fake wrapped around a string, Fake wrapped around a list of integers, single-character strings, and multiple-character strings.
def flattish(*i):
    for item in i:
        try: iter(item)
        except: yield item
        else:
            if isinstance(item, str): yield item
            else: yield from flattish(*item)
class Fake:
    def __init__(self, l):
        self.l = l
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.l):
            raise StopIteration
        else:
            self.index += 1
            return self.l[self.index - 1]

    def __str__(self):
        return str(self.l)
def flatten_notype(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
def flatten(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
f = Fake('abc')
print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake([1, 2, 3])
print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
I've also tried the following with the recursive lst defined above and flatten():
>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0
As you can see, it fails similarly to 1 ('a',) bc as well as in its own special way.
I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2]).
Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.
I don't think there's a reliable way to find out if an arbitrary iterable is infinite. The best we can do is to yield primitives infinitely from such an iterable without exhausting the stack, for example:
from collections import deque

def flat(iterable):
    d = deque([iterable])

    def _primitive(x):
        return type(x) in (int, float, bool, str, unicode)

    def _next():
        x = d.popleft()
        if _primitive(x):
            return True, x
        d.extend(x)
        return False, None

    while d:
        ok, x = _next()
        if ok:
            yield x

xs = [1, [2], 'abc']
xs.insert(0, xs)

for p in flat(xs):
    print p
The above definition of "primitive" is, well, primitive, but that surely can be improved.
The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...]
The key phrase here is "any classes [defined] with an __iter__() or __getitem__() method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString() object, and adjust the comparison methods accordingly.
Accessing a substring by index location is a way of creating a new string, so a timestampedString() of len() == 1 could legitimately return a timestampedString() of len() == 1 with the same character but a new timestamp when you access timestampedString()[0:1]. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.
To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.
However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.
If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.
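Under the static-objects constraint described above, the pickle idea might be sketched like this (the names are made up, and comparing pickled byte strings is a simplification of the raw-byte interpretation mentioned earlier):

import pickle

def flatten_static(obj, seen=()):
    # detect recursion by comparing byte-string representations along the current path
    key = pickle.dumps(obj)
    if key in seen:
        yield obj  # this object already encloses us: treat it as a leaf
        return
    try:
        it = iter(obj)
    except TypeError:
        yield obj
        return
    for item in it:
        yield from flatten_static(item, seen + (key,))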
There's an issue with your test code that's unrelated to the recursive container issue you're trying to solve. The issue is that your Fake class is an iterator and can only be used once. After you iterate over all its values, it will always raise StopIteration when you try to iterate over it again.
So if you do multiple operations on the same Fake instance, you shouldn't expect anything but empty output after the first operation has consumed the iterator. If you recreate the iterator before each operation, you won't have that problem (and you can actually start addressing the recursion issue).
So on to that issue. One way to avoid infinite recursion is to maintain a stack with the objects that you're currently nested in. If the next value you see is already on the stack somewhere, you know it's recursive and can skip it. Here's an implementation of this using a list as the stack:
def flatten(obj, stack=None):
    if stack is None:
        stack = []

    if obj in stack:
        yield obj
        return  # already nested inside this object; don't descend into it again

    try:
        it = iter(obj)
    except TypeError:
        yield obj
    else:
        stack.append(obj)
        for item in it:
            yield from flatten(item, stack)
        stack.pop()
Note that this can still yield values from the same container more than once, as long as it's not nested within itself (e.g. for x=[1, 2]; y=[x, 3, x]; print(*flatten(y)) will print 1 2 3 1 2).
It also does recurse into strings, but it will only do so for only one level, so flatten("foo") will yield the letters 'f', 'o' and 'o' in turn. If you want to avoid that, you probably do need the function to be type aware, since from the iteration protocol's perspective, a string is not any different than an iterable container of its letters. It's only single character strings that recursively contain themselves.
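A quick usage sketch with the self-referencing list from the question:

lst = []
lst.append(lst)  # lst is nested within itself
lst.append(0)
print(*flatten(lst))  # yields the recursive list as a single item, then 0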
What about something like this:
def flat(obj, used=[], old=None):
    # This is to catch infinite recurrences
    if obj == old:
        if obj not in used:
            used.append(obj)
            yield obj
        return  # end this branch (raising StopIteration here fails under PEP 479)
    try:
        # Get strings
        if isinstance(obj, str):
            raise TypeError
        # Try to iterate the obj
        for item in obj:
            yield from flat(item, used, obj)
    except TypeError:
        # Get non-iterable items
        if obj not in used:
            used.append(obj)
            yield obj
After a finite number of (recursion) steps a list will contain at most itself as an iterable element (since we have to generate it in finitely many steps). That's what we test for with obj == old, where obj is an element of old.
The list used keeps track of all elements, since we want each element only once. We could remove it, but we'd get an ugly (and, more importantly, not well-defined) behaviour regarding which elements get yielded how often.
The drawback is that by the end we store the entire list of elements in used...
Testing this with some lists seems to work:
>> lst = [1]
>> lst.append(lst)
>> print('\nList1: ', lst)
>> print('Elements: ', [x for x in flat(lst)])

List1:  [1, [...]]
Elements:  [1, [1, [...]]]

# We'd need to reset the used list here (it keeps state between calls)!
>> lst2 = []
>> lst2.append(lst2)
>> lst2.append((1, 'ab'))
>> lst2.append(lst)
>> lst2.append(3)
>> print('\nList2: ', lst2)
>> print('Elements: ', [x for x in flat(lst2)])

List2:  [[...], (1, 'ab'), [1, [...]], 3]
Elements:  [[[...], (1, 'ab'), [1, [...]], 3], 1, 'ab', [1, [...]], 3]
Note: It actually makes sense that the infinite lists [[...], (1, 'ab'), [1, [...]], 3] and [1, [...]] are considered as elements since these actually contain themselves but if that's not desired one can comment out the first yield in the code above.
Just avoid flattening recurring containers. In the example below keepobj keeps track of them and keepcls ignores containers of a certain type. I believe this works down to python 2.3.
def flatten(item, keepcls=(), keepobj=()):
    if not hasattr(item, '__iter__') or isinstance(item, keepcls) or item in keepobj:
        yield item
    else:
        for i in item:
            for j in flatten(i, keepcls, keepobj + (item,)):
                yield j
It can flatten circular lists like lst = [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]] and keep some containers like strings and dicts un-flattened.
>>> list(flatten(lst, keepcls=(dict, str)))
[1, 2, 5, 6, {'a': 1, 'b': 2}, 7, 'string', [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]]]
It also works with the following case:
>>> list(flatten([[1,2],[1,[1,2]],[1,2]]))
[1, 2, 1, 1, 2, 1, 2]
You may want to keep some default classes in keepcls to make calling the function more terse.
