Views in Python 3.1? - python

What exactly are views in Python 3.1? They seem to behave similarly to iterators, and they can be materialized into lists too. How are iterators and views different?

From what I can tell, a view is still attached to the object it was created from. Modifications to the original object affect the view.
From the docs (for dictionary views):
>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.keys()
>>> values = dishes.values()
>>> # iteration
>>> n = 0
>>> for val in values:
...     n += val
>>> print(n)
504
>>> # keys and values are iterated over in the same order
>>> list(keys)
['eggs', 'bacon', 'sausage', 'spam']
>>> list(values)
[2, 1, 1, 500]
>>> # view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> del dishes['sausage']
>>> list(keys)
['spam', 'bacon']
>>> # set operations
>>> keys & {'eggs', 'bacon', 'salad'}
{'bacon'}

I would recommend that you read this. It seems to do the best job of explaining.
As far as I can tell, views seem to be associated mostly with dicts and can be forced into lists. You can also make an iterator out of them, which you can then iterate over (in a for loop or by calling next).
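A quick sketch of that distinction: a view is iterable and reusable, but it is not itself an iterator until you wrap it with iter():

```python
d = {'a': 1, 'b': 2}
keys = d.keys()            # a view object, not an iterator
it = iter(keys)            # wrap the view in an iterator
first = next(it)           # the iterator is consumed step by step
print(first in d)          # True
print(sorted(keys))        # the view itself stays reusable: ['a', 'b']
```

Note that the view has no __next__ of its own; only the iterator produced from it does.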
Update: replaced the dead link with a Wayback Machine copy.

How are iterators and views different?
I'll rephrase the question as "what's the difference between an iterable object and an iterator?"
An iterable is an object that can be iterated over (e.g. used in a for loop).
An iterator is an object that can be passed to the next() function; that is, it implements the .next() method in Python 2 and .__next__() in Python 3. An iterator is often used to wrap an iterable and return each item of interest. All iterators are iterable, but the reverse is not necessarily true: not all iterables are iterators.
Views are iterable objects, not iterators.
Let's look at some code to see the distinction (Python 3):
The "What's new in Python 3" document is very specific about which functions return iterators. map(), filter(), and zip() definitely return iterators, whereas dict.items(), dict.values(), and dict.keys() are said to return view objects. As for range(), although the description of exactly what it returns lacks precision, we know it's not an iterator.
Using map() to double all numbers in a list
m = map(lambda x: x*2, [0,1,2])
hasattr(m, '__next__')
# True
next(m)
# 0
next(m)
# 2
next(m)
# 4
next(m)
# StopIteration ...
Using filter() to extract all odd numbers
f = filter(lambda x: x%2==1, [0,1,2,3,4,5,6])
hasattr(f, '__next__')
# True
next(f)
# 1
next(f)
# 3
next(f)
# 5
next(f)
# StopIteration ...
Trying to use range() in the same manner to produce a sequence of numbers
r = range(3)
hasattr(r, '__next__')
# False
next(r)
# TypeError: 'range' object is not an iterator
But it's an iterable, so we should be able to wrap it with an iterator
it = iter(r)
next(it)
# 0
next(it)
# 1
next(it)
# 2
next(it)
# StopIteration ...
dict.items() as well as dict.keys() and dict.values() also do not return iterators in Python 3
d = {'a': 0, 'b': 1, 'c': 2}
items = d.items()
hasattr(items, '__next__')
# False
it = iter(items)
next(it)
# ('b', 1)
next(it)
# ('c', 2)
next(it)
# ('a', 0)
An iterator can only be used in a single for loop, whereas an iterable can be used repeatedly in subsequent for loops. Each time an iterable is used in this context it implicitly returns a new iterator (from its __iter__() method). The following custom class demonstrates this by printing the memory id of both the list object and the returned iterator object:
class mylist(list):
    def __iter__(self, *a, **kw):
        print('id of iterable is still:', id(self))
        rv = super().__iter__(*a, **kw)
        print('id of iterator is now:', id(rv))
        return rv

l = mylist('abc')
A for loop can use the iterable object and will implicitly get an iterator
for c in l:
    print(c)
# id of iterable is still: 139696242511768
# id of iterator is now: 139696242308880
# a
# b
# c
A subsequent for loop can use the same iterable object, but will get another iterator
for c in l:
    print(c)
# id of iterable is still: 139696242511768
# id of iterator is now: 139696242445616
# a
# b
# c
We can also obtain an iterator explicitly
it = iter(l)
# id of iterable is still: 139696242511768
# id of iterator is now: 139696242463688
but it can then only be used once
for c in it:
    print(c)
# a
# b
# c
for c in it:
    print(c)
# (nothing is printed -- the iterator is exhausted)
for c in it:
    print(c)
# (still nothing)

Related

Is an iterable object a copy of the original object?

nums = [1,2,3,4,5]
it = iter(nums)
print(next(it))
print(next(it))
for i in nums:
    print(i)
Here the result is:
1
2
1
2
3
4
5
So my question is: when we apply the iter method to an object, does it create a copy of the object on which it then runs the next method?
iter(object) returns an iterator for the object, provided the object implements __iter__. It does not create a copy of the object.
>>> l = [[1,2],[4,5]]
>>> it = iter(l)
>>> next(it).append(3)  # appending to the output of next() mutates the list l
>>> l
[[1, 2, 3], [4, 5]]
>>> next(it).append(6)
>>> l
[[1, 2, 3], [4, 5, 6]]
>>> it = iter(l)
>>> l.pop()  # mutating the list l also affects the iterator it
[4, 5, 6]
>>> list(it)
[[1, 2, 3]]
Here is one way to figure it out:
lst = ['Hi', 'I am a copy!']
itr = iter(lst)
print(next(itr))  # Hi
lst[1] = 'I am _not_ a copy!'
print(next(itr))  # I am _not_ a copy!
(iter(lst) does not create a copy of lst)
No, it doesn't. Some Python types, e.g. all the built-in collections, simply support being iterated over multiple times. Multiple iterator objects can hold references to the very same list; they all just maintain their own position within the list.
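For instance, two iterators over the same list advance independently:

```python
lst = ['a', 'b', 'c']
i1 = iter(lst)             # two iterators over the very same list...
i2 = iter(lst)
print(next(i1))            # 'a'
print(next(i1))            # 'b'
print(next(i2))            # 'a'  -- i2 kept its own position
```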
Notice some effects:
lst = [1,2,3,4,5]
it = iter(lst)
lst.pop() # modify the original list
list(it) # the iterator is affected
# [1,2,3,4]
Even more obvious is the case of exhaustible iterators and calling iter on them:
it1 = iter(range(10))
it2 = iter(it1)
next(it1)
# 0
next(it2)
# 1
next(it1)
# 2
next(it2)
# 3
Clearly the iterators share state.
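If you need independent iteration over an exhaustible source, one option (a sketch, not from the answer above) is to materialize it into a list first and then take fresh iterators from the list:

```python
src = iter(range(10))      # an exhaustible iterator
data = list(src)           # materialize it once
it1 = iter(data)           # now each iter() call gives a fresh,
it2 = iter(data)           # independent iterator over the list
print(next(it1))           # 0
print(next(it1))           # 1
print(next(it2))           # 0  -- unaffected by it1
```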
The = operator assigns the value of the right-hand side to the left-hand side, i.e. c = a + b assigns the value of a + b to c.
You're not altering any variable on the right-hand side of the assignment; a function is applied to the value there, and the result is bound to the new name it.

Creating iterators from a generator returns the same object

Let's say I have a large list of data that I want to perform some operation on, and I would like to have multiple iterators performing this operation independently.
data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1 = iter(generator)
it2 = iter(generator)
I would expect these iterators to be different code objects, but it1 is it2 returns True... More confusingly, this is true for the following generators as well:
import copy

# copied data
gen = ((e, 2*e) for e in copy.deepcopy(data))
# temp object
gen = ((e, 2*e) for e in [1,2,3,4,5])
This means in practice that when I call next(it1), it2 is incremented as well, which is not the behavior I want.
What is going on here, and is there any way to do what I'm trying to do? I am using python 2.7 on Ubuntu 14.04.
Edit:
I just tried out the following as well:
gen = (e for e in [1,2,3,4,5])
it = iter(gen)
next(it)
next(it)
for e in gen:
    print e
Which prints 3 4 5... Apparently generators are just a more constrained concept than I had imagined.
Generators are iterators. All well-behaved iterators have an __iter__ method that should simply
return self
From the docs
The iterator objects themselves are required to support the following
two methods, which together form the iterator protocol:
iterator.__iter__() Return the iterator object itself. This is
required to allow both containers and iterators to be used with the
for and in statements. This method corresponds to the tp_iter slot of
the type structure for Python objects in the Python/C API.
iterator.__next__() Return the next item from the container. If there
are no further items, raise the StopIteration exception. This method
corresponds to the tp_iternext slot of the type structure for Python
objects in the Python/C API.
So, consider another example of an iterator:
>>> x = [1, 2, 3, 4, 5]
>>> it = iter(x)
>>> it2 = iter(it)
>>> next(it)
1
>>> next(it2)
2
>>> it is it2
True
So, again, a list is iterable because it has an __iter__ method that returns an iterator. This iterator also has an __iter__ method, which should always return itself, but it also has a __next__ method.
So, consider:
>>> x = [1, 2, 3, 4, 5]
>>> it = iter(x)
>>> hasattr(x, '__iter__')
True
>>> hasattr(x, '__next__')
False
>>> hasattr(it, '__iter__')
True
>>> hasattr(it, '__next__')
True
>>> next(it)
1
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
And for a generator:
>>> g = (x**2 for x in range(10))
>>> g
<generator object <genexpr> at 0x104104390>
>>> hasattr(g, '__iter__')
True
>>> hasattr(g, '__next__')
True
>>> next(g)
0
Now, you are using generator expressions. But you can just use a generator function. The most straightforward way to accomplish what you are doing is just to use:
def paired(data):
    for e in data:
        yield (e, 2*e)
Then use:
it1 = paired(data)
it2 = paired(data)
In this case, it1 and it2 will be two separate iterator objects.
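A quick check (a sketch based on the code above) that the two generators really are independent:

```python
def paired(data):
    # each call to the generator function returns a fresh generator object
    for e in data:
        yield (e, 2 * e)

data = [1, 2, 3, 4, 5]
it1 = paired(data)
it2 = paired(data)
print(it1 is it2)          # False -- two distinct generators
print(next(it1))           # (1, 2)
print(next(it1))           # (2, 4)
print(next(it2))           # (1, 2) -- it2 was not advanced by it1
```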
You are using the same generator for both iterators. Calling iter(thing) returns thing itself if it is already an iterator, so iter(generator) returns the same object both times you call it. https://docs.python.org/3/library/stdtypes.html#generator-types
data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1 = iter(generator)
it2 = iter(generator)
type(it1)
generator
Here are two ways of getting unique generators:
import itertools
data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1, it2 = itertools.tee(generator)
type(it1)
itertools._tee
or:
data = [1,2,3,4,5]
it1 = ((e, 2*e) for e in data)
it2 = ((e, 2*e) for e in data)
type(it1)
generator
both solutions produce this:
next(it1)
(1, 2)
next(it2)
(1, 2)

Why have an __iter__ method in Python?

Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself. If it is not an iterator but is instead an iterable, i.e. something with an __iter__ and __getitem__ method, then why would one ever want to define something which returns an iterator but is not an iterator itself? In Python, when would one want to define an iterable that is not itself an iterator? Or, what is an example of something that is an iterable but not an iterator?
Trying to answer your questions one at a time:
Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself.
It's not pointless. The iterator protocol demands an __iter__ and __next__ (or next in Python 2) method. All sane iterators I have ever seen just return self in their __iter__ method, but it is still crucial to have that method. Not having it would lead to all kinds of weirdness, for example:
somelist = [1, 2, 3]
it = iter(somelist)
now
iter(it)
or
for x in it: pass
would throw a TypeError and complain that it is not iterable, because when iter(x) is called (which implicitly happens when you employ a for loop) it expects the argument object x to be able to produce an iterator (it just tries to call __iter__ on that object). Concrete example (Python 3):
>>> class A:
... def __iter__(self):
... return B()
...
>>> class B:
... def __next__(self):
... pass
...
>>> iter(iter(A()))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'B' object is not iterable
Consider any function, especially from itertools, that expects an iterable, for example dropwhile. Calling it with any object that has an __iter__ method will be fine, regardless of whether it's an iterable that is not an iterator, or an iterator - because you can expect the same result when calling iter with that object as an argument. Making a weird distinction between two kinds of iterables here would go against the principle of duck typing, which Python strongly embraces.
Neat tricks like
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(zip(*[iter(a)]*3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
would just stop working if you could not pass iterators to zip (the trick relies on passing the same iterator object three times, so zip pulls three consecutive items per output tuple).
why would one want to ever define something which returns an iterator but is not an iterator itself
Let's consider this simple list iterator:
>>> class MyList(list):
... def __iter__(self):
... return MyListIterator(self)
>>>
>>> class MyListIterator:
... def __init__(self, lst):
... self._lst = lst
... self.index = 0
... def __iter__(self):
... return self
... def __next__(self):
... try:
... n = self._lst[self.index]
... self.index += 1
... return n
... except IndexError:
... raise StopIteration
>>>
>>> a = MyList([1,2,3])
>>> for x in a:
... for x in a:
... x
...
1
2
3
1
2
3
1
2
3
Remember that iter is called with the iterable in question for both for loops, expecting a fresh iterator each time from the object's __iter__ method.
Now, without an iterator being produced each time a for loop is employed, how would you be able to keep track of the current state of any iteration when a MyList object is iterated over an arbitrary number of times at the same time? Oh, that's right, you can't. :)
edit: Bonus and sort of a reply to Tadhg McDonald-Jensen's comment
A reusable iterator is not unthinkable, but of course a bit weird, because it would rely on being initialized with a "non-consumable" iterable (i.e. not a classic iterator):
>>> class riter(object):
... def __init__(self, iterable):
... self.iterable = iterable
... self.it = iter(iterable)
... def __next__(self): # python 2: next
... try:
... return next(self.it)
... except StopIteration:
... self.it = iter(self.iterable)
... raise
... def __iter__(self):
... return self
...
>>>
>>> a = [1, 2, 3]
>>> it = riter(a)
>>> for x in it:
... x
...
1
2
3
>>> for x in it:
... x
...
1
2
3
An iterable is something that can be iterated (looped) over, whereas an iterator is something that is consumed.
what is an example of something that is an iterable but not an iterator?
Simple: a list. Or any sequence, since you can iterate over a list as many times as you want without destroying the list:
>>> a = [1,2,3]
>>> for i in a:
print(i,end=" ")
1 2 3
>>> for i in a:
print(i,end=" ")
1 2 3
Whereas an iterator (like a generator) can only be used once:
>>> b = (i for i in range(3))
>>> for i in b:
print(i,end=" ")
0 1 2
>>> for i in b:
print(i,end=" ")
>>> #iterator has already been used up, nothing gets printed
For a list to be consumed like an iterator, you would need to use something like self.pop(0) to remove the first element of the list on each iteration step:
class IteratorList(list):
    def __iter__(self):
        return self  # since the current mechanics require this
    def __next__(self):
        try:
            return self.pop(0)
        except IndexError:  # we need to raise the expected kind of error
            raise StopIteration
    next = __next__  # for compatibility with python 2
a = IteratorList([1,2,3,4,5])
for i in a:
    print(i)
    if i==3:   # let's stop at three and
        break  # see what the list is after
print(a)
which gives this output:
1
2
3
[4, 5]
You see? This is what iterators do: once a value is returned from __next__, it has no reason to hang around in the iterator or in memory, so it is removed. That's why we need __iter__: to define iterators that let us iterate over sequences without destroying them in the process.
In response to timgeb's comment, I suppose if you added items to an IteratorList and then iterated over it again, that would make sense:
a = IteratorList([1,2,3,4,5])
for i in a:
    print(i)
a.extend([6,7,8,9])
for i in a:
    print(i)
But iterators generally only make sense if they are either consumed or never end (like itertools.repeat).
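itertools.repeat is exactly such a never-ending iterator; itertools.islice can take a finite slice of it:

```python
import itertools

r = itertools.repeat('x')             # an iterator that never ends
chunk = list(itertools.islice(r, 3))  # consume just the first three items
print(chunk)                          # ['x', 'x', 'x']
```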
You are thinking in the wrong direction. The reason why an iterator has to implement __iter__ is that this way, both containers and iterators can be used in for and in statements.
> # list is a container
> list = [1,2,3]
> dir(list)
[...,
'__iter__',
'__getitem__',
...]
> # let's get its iterator
> it = iter(list)
> dir(it)
[...,
'__iter__',
'__next__',
...]
> # you can use the container directly:
> for i in list:
>     print(i)
1
2
3
> # you can also use the iterator directly:
> for i in it:
>     print(i)
1
2
3
> # the above will fail if it does not implement '__iter__'
And that is also why you simply return self in almost all implementations of an iterator. It is not meant for anything fancy, just a little syntactic convenience.
Ref: https://docs.python.org/dev/library/stdtypes.html#iterator-types

How do I determine whether a container is infinitely recursive and find its smallest unique container?

I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:
def flat(iterable):
    try:
        iter(iterable)
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flat(item)
This works fine for basic structures like nested lists containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:
def flatter(iterable):
    try:
        iter(iterable)
        if isinstance(iterable, str):
            raise TypeError
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatter(item)
Now it works for strings as well. However, I then recalled that a list can contain references to itself.
>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True
So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.
The following flattener.py ensued. flattish() is a version that just checks for strings. flatten_notype() checks whether an object's first item's first item is equal to itself to determine recursion. flatten() does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form `should be desired_result [> undesired_actual_result]`. As you can see, each fails in various ways on Fake wrapped around a string, Fake wrapped around a list of integers, single-character strings, and multiple-character strings.
def flattish(*i):
    for item in i:
        try: iter(item)
        except: yield item
        else:
            if isinstance(item, str): yield item
            else: yield from flattish(*item)
class Fake:
    def __init__(self, l):
        self.l = l
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.l):
            raise StopIteration
        else:
            self.index += 1
            return self.l[self.index-1]
    def __str__(self):
        return str(self.l)
def flatten_notype(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
def flatten(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item
f = Fake('abc')
print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake([1, 2, 3])
print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
I've also tried the following with the recursive lst defined above and flatten():
>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0
As you can see, it fails similarly to 1 ('a',) bc as well as in its own special way.
I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2]).
Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.
I don't think there's a reliable way to find out if an arbitrary iterable is infinite. The best we can do is to yield primitives infinitely from such an iterable without exhausting the stack, for example:
from collections import deque

def flat(iterable):
    d = deque([iterable])
    def _primitive(x):
        return type(x) in (int, float, bool, str, unicode)
    def _next():
        x = d.popleft()
        if _primitive(x):
            return True, x
        d.extend(x)
        return False, None
    while d:
        ok, x = _next()
        if ok:
            yield x

xs = [1, [2], 'abc']
xs.insert(0, xs)
for p in flat(xs):
    print p
The above definition of "primitive" is, well, primitive, but that surely can be improved.
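The snippet above is Python 2 (`print p`, `unicode`). For reference, here is a rough Python 3 port of the same idea; the name flat3 and the particular primitives tuple are my choices, not the answer's:

```python
from collections import deque

def flat3(iterable):
    # Breadth-first flattening with an explicit deque instead of the call
    # stack; str/bytes stand in for Python 2's str/unicode pair.  Like the
    # original, it assumes everything non-primitive is iterable.
    d = deque([iterable])
    primitives = (int, float, bool, str, bytes)
    while d:
        x = d.popleft()
        if isinstance(x, primitives):
            yield x
        else:
            d.extend(x)

xs = [1, [2], 'abc']
print(list(flat3(xs)))  # [1, 'abc', 2] -- breadth-first order
```

On a self-referential list this still yields primitives forever, matching the original's behaviour; on a finite nest it terminates.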
The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...]
The key phrase here is "any classes [defined] with an __iter__() or __getitem__() method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString() object, and adjust the comparison methods accordingly.
Accessing a substring by index location is a way of creating a new string, so a timestampedString() of len() == 1 could legitimately return a timestampedString() of len() == 1 with the same character but a new timestamp when you access timestampedString()[0:1]. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.
To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.
However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.
If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.
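As a rough illustration only (not a complete solution), the pickle-based bookkeeping described above might look like the sketch below; flatten_static and all its details are hypothetical:

```python
import pickle

def flatten_static(obj, open_stack=None):
    # Hypothetical sketch: each container we descend into is pickled, and
    # its byte-string is kept on a stack of "currently open" containers so
    # that a cycle shows up as a repeated byte-string.
    if open_stack is None:
        open_stack = []
    if isinstance(obj, (str, bytes)) or not hasattr(obj, '__iter__'):
        yield obj                      # leaf: a string or a non-iterable
        return
    try:
        key = pickle.dumps(obj)
    except Exception:
        yield obj                      # not static/picklable: treat as a leaf
        return
    if key in open_stack:
        yield obj                      # same bytes already open above us
        return
    open_stack.append(key)
    for item in obj:
        yield from flatten_static(item, open_stack)
    open_stack.pop()
```

This terminates on a self-referential list because pickling the same object twice produces the same bytes, but it inherits all the caveats discussed above about what counts as "static".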
There's an issue with your test code that's unrelated to the recursive container issue you're trying to solve. The issue is that your Fake class is an iterator and can only be used once. After you iterate over all its values, it will always raise StopIteration when you try to iterate on it again.
So if you do multiple operations on the same Fake instance, you shouldn't expect to get anything but empty output after the first operation has consumed the iterator. If you recreate the iterator before each operation, you won't have that problem (and you can actually try addressing the recursion issue).
So on to that issue. One way to avoid infinite recursion is to maintain a stack with the objects that you're currently nested in. If the next value you see is already on the stack somewhere, you know it's recursive and can skip it. Here's an implementation of this using a list as the stack:
def flatten(obj, stack=None):
    if stack is None:
        stack = []
    if obj in stack:
        yield obj
        return  # without this, a recursive object would be descended into again
    try:
        it = iter(obj)
    except TypeError:
        yield obj
    else:
        stack.append(obj)
        for item in it:
            yield from flatten(item, stack)
        stack.pop()
Note that this can still yield values from the same container more than once, as long as it's not nested within itself (e.g. for x=[1, 2]; y=[x, 3, x]; print(*flatten(y)) will print 1 2 3 1 2).
It also recurses into strings, but only for one level, so flatten("foo") will yield the letters 'f', 'o' and 'o' in turn. If you want to avoid that, you probably do need the function to be type-aware, since from the iteration protocol's perspective a string is no different from an iterable container of its letters. It's only single-character strings that recursively contain themselves.
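One possible type-aware variant (a sketch with my own naming, not the answer's code): strings are treated as atoms up front, and the stack check uses identity rather than equality, so equal-but-distinct containers are not confused:

```python
def flatten_atoms(obj, stack=None):
    if stack is None:
        stack = []
    if isinstance(obj, (str, bytes)):
        yield obj                      # strings/bytes are leaves, never descended
        return
    if any(obj is seen for seen in stack):
        yield obj                      # nested inside itself: stop here
        return
    try:
        it = iter(obj)
    except TypeError:
        yield obj
        return
    stack.append(obj)
    for item in it:
        yield from flatten_atoms(item, stack)
    stack.pop()
```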
What about something like this:
def flat(obj, used=[], old=None):
    # This is to catch infinite recurrences
    if obj == old:
        if obj not in used:
            used.append(obj)
            yield obj
        return  # raising StopIteration inside a generator is an error since PEP 479
    try:
        # Get strings
        if isinstance(obj, str):
            raise TypeError
        # Try to iterate the obj
        for item in obj:
            yield from flat(item, used, obj)
    except TypeError:
        # Get non-iterable items
        if obj not in used:
            used.append(obj)
            yield obj
After a finite number of (recursion) steps, a list will contain at most itself as an iterable element (since it was generated in finitely many steps). That's what we test for with obj == old, where obj is an element of old.
The list used keeps track of all elements, since we want each element only once. We could remove it, but we'd get an ugly (and, more importantly, not well-defined) behaviour regarding which elements get yielded how often.
The drawback is that, by the end, we store the entire list of elements in used...
Testing this with some lists seems to work:
>>> lst = [1]
>>> lst.append(lst)
>>> print('\nList1: ', lst)
>>> print([x for x in flat(lst)])
List1: [1, [...]]
Elements: [1, [1, [...]]]
# We'd need to reset the iterator here!
>>> lst2 = []
>>> lst2.append(lst2)
>>> lst2.append((1,'ab'))
>>> lst2.append(lst)
>>> lst2.append(3)
>>> print('\nList2: ', lst2)
>>> print([x for x in flat(lst2)])
List2: [[...], (1, 'ab'), [1, [...]], 3]
Elements: [[[...], (1, 'ab'), [1, [...]], 3], 1, 'ab', [1, [...]], 3]
Note: It actually makes sense that the infinite lists [[...], (1, 'ab'), [1, [...]], 3] and [1, [...]] are considered as elements since these actually contain themselves but if that's not desired one can comment out the first yield in the code above.
Just avoid flattening recurring containers. In the example below keepobj keeps track of them and keepcls ignores containers of a certain type. I believe this works down to python 2.3.
def flatten(item, keepcls=(), keepobj=()):
    if not hasattr(item, '__iter__') or isinstance(item, keepcls) or item in keepobj:
        yield item
    else:
        for i in item:
            for j in flatten(i, keepcls, keepobj + (item,)):
                yield j
It can flatten circular lists like lst = [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]] and keep some containers like strings and dicts un-flattened.
>>> list(flatten(lst, keepcls=(dict, str)))
[1, 2, 5, 6, {'a': 1, 'b': 2}, 7, 'string', [1, 2, [5, 6, {'a': 1, 'b': 2}, 7, 'string'], [...]]]
It also works with the following case:
>>> list(flatten([[1,2],[1,[1,2]],[1,2]]))
[1, 2, 1, 1, 2, 1, 2]
You may want to keep some default classes in keepcls to make calling
the function more terse.

What is the purpose of __iter__ returning the iterator object itself?

I don't understand exactly why the __iter__ special method just returns the object it's called on (if it's called on an iterator). Is it essentially just a flag indicating that the object is an iterator?
EDIT: Actually, I discovered that "This is required to allow both containers and iterators to be used with the for and in statements." https://docs.python.org/3/library/stdtypes.html#iterator.iter
Alright, here's how I understand it: when writing a for loop, you're allowed to specify either an iterable or an iterator to loop over. But Python ultimately needs an iterator for the loop, so it calls the __iter__ method on whatever it's given. If it's given an iterable, __iter__ produces an iterator; if it's given an iterator, __iter__ likewise produces an iterator (the original object itself).
When you loop over something using for x in something, then the loop actually calls iter(something) first, so it has something to work with. In general, the for loop is approximately equivalent to something like this:
something_iterator = iter(something)
while True:
    try:
        x = next(something_iterator)
        # loop body
    except StopIteration:
        break
So as you already figured out yourself, in order to be able to loop over an iterator, i.e. when something is already an iterator, iterators should always return themselves when calling iter() on them. So this basically makes sure that iterators are also iterable.
This depends on what object you call iter on. If an object is already an iterator, then no operation is required to convert it to an iterator, because it already is one. But if the object is not an iterator, but is iterable, then an iterator is constructed from the object.
A good example of this is the list object:
>>> x = [1, 2, 3]
>>> iter(x) == x
False
>>> iter(x)
<list_iterator object at 0x7fccadc5feb8>
>>> x
[1, 2, 3]
Lists are iterable, but they are not themselves iterators. The result of list.__iter__ is not the original list.
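The mirror-image check shows an iterator's __iter__ returning the very same object:

```python
x = [1, 2, 3]
it = iter(x)
print(iter(x) is x)    # False: a list builds a fresh list_iterator
print(iter(it) is it)  # True: an iterator's __iter__ returns itself
```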
In Python, whenever you use a loop, you iterate over some object, as below. Let's see how this works for a list object.
>>> l = [1, 2, 3] # Defined list l
If we iterate over the above list..
>>> for i in l:
...     print i
...
1
2
3
When you iterate over the list l, the Python for loop checks for l.__iter__(), which in turn returns an iterator object.
>>> for i in l.__iter__():
...     print i
...
1
2
3
To understand this better, let's customize list and create a new list class:
>>> class ListOverride(list):
...     def __iter__(self):
...         raise TypeError('Not iterable')
...
Here I've created a ListOverride class, which inherits from list and overrides the list.__iter__ method to raise TypeError.
>>> ll = ListOverride([1, 2, 3])
>>> ll
[1, 2, 3]
And I've created a new list using the ListOverride class; since it's a list object, it should iterate the same way a list does.
>>> for i in ll:
...     print i
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in __iter__
TypeError: Not iterable
If we try to iterate over the ListOverride object ll, we end up getting the TypeError raised in __iter__.
