Related
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
for i in range(start, stop):
yield i * i
generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
def __init__(self, start, stop):
self.start = start
self.stop = stop
def __iter__(self): return self
def __next__(self): # next in Python 2
if self.start >= self.stop:
raise StopIteration
current = self.start * self.start
self.start += 1
return current
iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
"when called, returns generator object"
yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
"just a function definition with yield in it"
yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.Iterator):
def __init__(self, stop):
self.x = 0
self.stop = stop
def __iter__(self):
return self
def next(self):
if self.x < self.stop:
self.x += 1
return 'yes'
else:
# Iterators must raise when done, else considered broken
raise StopIteration
__next__ = next # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
for _ in range(stop):
yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
('yes' for _ in range(stop))):
... print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() method call on a generator object(for ex: f below) returned by a generator function (for ex: foo() below), generates the next value in the sequence.
When a generator function is called, it returns an generator object without even beginning the execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution. And secondly, the next() call continues from the previous value.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
... print("begin")
... for i in range(3):
... print("before yield", i)
... yield i
... print("after yield", i)
... print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved - you have
to create a class and at least implement the iter and the next methods. But what if you don't want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return "something" which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator
Hope that clarifies a bit.
Examples from Ned Batchelder highly recommended for iterators and generators
A method without generators that do something to even numbers
def evens(stream):
them = []
for n in stream:
if n % 2 == 0:
them.append(n)
return them
while by using a generator
def evens(stream):
for n in stream:
if n % 2 == 0:
yield n
We don't need any list nor a return statement
Efficient for large/ infinite length stream ... it just walks and yield the value
Calling the evens method (generator) is as usual
num = [...]
for n in evens(num):
do_smth(n)
Generator also used to Break double loop
Iterator
A book full of pages is an iterable, A bookmark is an
iterator
and this bookmark has nothing to do except to move next
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration (Exception) as we got end of the iterator
To use Generator ... we need a function
To use Iterator ... we need next and iter
As been said:
A Generator function returns an iterator object
The Whole benefit of Iterator:
Store one element a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator, which may be caught in a finally clause in that iterator, to get a chance to run some clean‑up. This abstraction makes it most usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.
That said, my personal answer to the first question would be: iteratable has an __iter__ method only, typical iterators have a __next__ method only, generators has both an __iter__ and a __next__ and an additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.
For control flow matters, generators are an as much important concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that iterator has __next__ method, iterable does not.
Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator
What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here).
How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common in build-in iterators in python); in documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again confusing usage of terminology;
we also have Sequence class in keras and that's custom implementation of a generator functionality (regular generators are not suitable for multithreading) but it doesn't implement __next__ and __iter__, rather it's a wrapper around generators (it uses yield statement);
Generator Function, Generator Object, Generator:
A Generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions is a great tool to create Iterator objects as easy as possible. The Iterator object returend by generator function is also called Generator object or Generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
a, b = 0, 1
for i in range(max):
yield a
a, b = b, a + b
print(fib(10)) #<generator object fib at 0x01342480>
for i in fib(10):
print(i) # 0 1 1 2 3 5 8 13 21 34
print(next(myfib)) #0
print(next(myfib)) #1
print(next(myfib)) #1
print(next(myfib)) #2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use generators function to create iterators because they simplify their creation, but a custom Iterator gives you more freedom and you can also implement other methods according to your requirements as shown in the below example.
class Fib:
def __init__(self,max):
self.current=0
self.next=1
self.max=max
self.count=0
def __iter__(self):
return self
def __next__(self):
if self.count>self.max:
raise StopIteration
else:
self.current,self.next=self.next,(self.current+self.next)
self.count+=1
return self.next-self.current
def __str__(self):
return "Generator object"
itobj=Fib(4)
print(itobj) #Generator object
for i in Fib(4):
print(i) #0 1 1 2
print(next(itobj)) #0
print(next(itobj)) #1
print(next(itobj)) #1
This thread covers in many details all the differences between the two, but wanted to add something on the conceptual difference between the two:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
for i in range(n):
yield i
def myIterableList(n):
ll = n*[None]
for i in range(n):
ll[i] = i
return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
return [1,2,3]
o/p of print(l): [1,2,3] &
o/p of print(f()) : [1,2,3]
Let’s make list l iterable: In python list is always iterable that means you can apply iterator whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
yield 1
yield 2
yield 3
iter_f = f() # which is iter(f) as iterator is already applied implicitly
So if you have observed, as soon as you made function f a generator, it is already iter(f)
Now,
l is the list, after applying iterator method "iter" it becomes,
iter(l)
f is already iter(f), after applying iterator method "iter" it
becomes, iter(iter(f)), which is again iter(f)
It's kinda you are casting int to int(x) which is already int and it will remain int(x).
For example o/p of :
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from above explanation is:
list l ~= iter(l)
generator function f == iter(f)
All generators are iterators but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator
class IT:
def __init__(self):
self.n = 0
def __iter__(self):
return self
def __next__(self):
if self.n == 4:
raise StopIteration
try:
return self.n
finally:
self.n += 1
def g():
for i in range(4):
yield i
def test(it):
print(f'type(it) = {type(it)}')
print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
print(next(it))
print(next(it))
print(next(it))
print(next(it))
try:
print(next(it))
except StopIteration:
print('boom\n')
print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()
test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
I was looking into how the order in which you declare classes to inherit from affects Method Resolution Order (Detailed Here By Raymond Hettinger). I personally was using this to elegantly create an Ordered Counter via this code:
class OrderedCounter(Counter, OrderedDict):
pass
counts = OrderedCounter([1, 2, 3, 1])
print(*counts.items())
>>> (1, 2) (2, 1) (3, 1)
I was trying to understand why the following didn't work similarly:
class OrderedCounter(OrderedDict, Counter):
pass
counts = OrderedCounter([1, 2, 3, 1])
print(*counts.items())
>>> TypeError: 'int' object is not iterable
While I understand that on a fundamental level this is because the OrderedCounter object is using the OrderedDict.__init__() function in the second example which according to the documentation only accepts "[items]". In the first example however the Counter.__init__() function is used which according to the documentation accepts "[iterable-or-mapping]" thus it can take the list as an input.
I wanted to further understand this interaction specifically though so I went to look at the actual source. When I looked at the OrderedDict.__init__() function I noticed that after some error handling it made a call to self.update(*args, **kwds). However, the code simply has the line update = MutableMapping.update which I can't find much documentation on.
I guess I would just like a more concrete answer as to why the second code block doesn't work.
Note: For context, I have a decent amount of programming experience but I'm new to python and OOP in Python
TLDR: How/Why does the Method Resolution Order interfere with the second code block?
In your second example, class OrderedCounter(OrderedDict, Counter): the object looks in OrderedDict first which uses the update method from MutableMapping.
MutableMapping is an Abstract Base Class in collections._abc. Its update method source is here. You can see that if the other argument is not a mapping it will try to iterate over other unpacking a key and value on each iteration.
for key, value in other:
self[key] = value
If other is a sequence of tuples it would work.
>>> other = ((1,2),(3,4))
>>> for key,value in other:
print(key,value)
1 2
3 4
>>>
But if other is a sequence of single items it will throw the error when it tries to unpack a single value into two names/variables.
>>> other = (1,2,3,4)
>>> for key,value in other:
print(key,value)
Traceback (most recent call last):
File "<pyshell#50>", line 1, in <module>
for key,value in other:
TypeError: cannot unpack non-iterable int object
>>>
Whearas collections.Counter's update method calls a different function if other is not a Mapping.
else:
_count_elements(self, iterable)
_count_elements adds keys for new items (with a count of zero) or adds one to the count of existing keys.
As you probably discovered if a class inherits from two classes it will look in the first class to find an attribute, if it isn't there it will look in the second class.
>>> class A:
def __init__(self):
pass
def f(self):
print('class A')
>>> class B:
def __init__(self):
pass
def f(self):
print('class B')
>>> class C(A,B):
pass
>>> c = C()
>>> c.f()
class A
>>> class D(B,A):
pass
>>> d = D()
>>> d.f()
class B
In mro, children precede their parents and the order of appearance in __bases__ is respected.
In the first example, Counter is a subclass of dict. When OrderedDict is provided along with Counter, the parent dict of Counter is replaced by OrderedDict and the code works seamlessly.
In the second example, OrderedDict is again a subclass of dict. When Counter is provided along with OrderedDict, it tries to replace the parent dict of OrderedDict with Counter, which is counter intuitive (pun intended). Hence the error!!
I hope this layman explaination helps you. Just think about that for a moment.
What is the logic for picking some methods to be prefixed with the items they are used with, but some are functions that need items as the arguments?
For example:
L=[1,4,3]
print len(L) #function(item)
L.sort() #item.method()
I thought maybe the functions that modify the item need to be prefixed while the ones that return information about the item use it as an argument, but I'm not too sure.
Edit:
What I'm trying to ask is why does python not have L.len()? What is the difference between the nature of the two kinds of functions? Or was it randomly chosen that some operations will be methods while some will be functions?
One of the principles behind Python is There is Only One Way to Do It. In particular, to get the length of a sequence (array / tuple / xrange...), you always use len, regardless of the sequence type.
However, sorting is not supporting on all of those sequence types. This makes it more suitable to being a method.
a = [0,1,2]
b = (0,1,2)
c = xrange(3)
d = "abc"
print len(a), len(b), len(c), len(d) # Ok
a.sort() # Ok
b.sort() # AttributeError: 'tuple' object has no attribute 'sort'
c.sort() # AttributeError: 'xrange' object has no attribute 'sort'
d.sort() # AttributeError: 'str' object has no attribute 'sort'
Something that may help you understand a bit better: http://www.tutorialspoint.com/python/python_classes_objects.htm
What you describe as item.function() is actually a method, which is defined by the class that said item belongs to. You need to form a comprehensive understanding of function, class, object, method and maybe more in Python.
Just conceptually speaking, when you call L.sort(), the sort() method of type/class list actually accepts an argument usually by convention called self that represents the object/instance of the type/class list, in this case L. And sort just like a standalone sorted function but just applies the sorting logic to L itself. Comparatively, sorted function would require an iterable (a list, for example), to be its required argument in order to function.
Code example:
my_list = [2, 1, 3]
# .sort() is a list method that applies the sorting logic to a
# specific instance of list, in this case, my_list
my_list.sort()
# sorted is a built-in function that's more generic which can be
# used on any iterable of Python, including list type
sorted(my_list)
There's a bigger difference between methods and functions than just their syntax.
def foo():
print "function!"
class ExampleClass(object):
def foo(self):
print "method!"
In this example, i defined a function foo and a class ExampleClass with 1 method, foo.
Let's try to use them:
>>> foo()
function!
>>> e = ExampleClass()
>>> e.foo()
method!
>>> l = [3,4,5]
>>> l.foo()
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
l.foo()
AttributeError: 'list' object has no attribute 'foo'
>>>
Even though both have the same name, Python knows that if you do foo(), your calling a function, so it'll check if there's any function defined with that name.
And if you do a.foo(), it knows you're calling a method, so it'll check if there's a method foo defined for objects of the type a has, and if there is, it will call it. In the last example, we try that with a list and it gives us an error because lists don't have a foo method defined.
I need to create a class that returns one character at a time from a stream of data using the iterator protocol, but that also has a method analogous to a file object's tell(), returning what position it is currently at in the stream.
Basically, this is what I need:
>>> x = MyIterator('abcdefghijklmnopqrstuvwxyz')
>>> x.tell()
0
>>> next(x)
'a'
>>> next(x)
'b'
>>> next(x)
'c'
>>> x.tell()
3
>>> next(x)
'd'
>>> x.tell()
4
What is the most pythonic way to do this? Can I do it using a generator somehow, or do I have to create a class that implements the iterator protocol "by hand"? (i.e. a class with a __next__() method that manually raises StopIteration when it runs out of data)
Please note, this question is a bit of a simplification of what my actual needs are. I actually need the tell()-analogue to return a function of the stream position and not simply the position, so please don't tell me "just make the caller of my iterator use enumerate()" or something like that.
Can I do it using a generator somehow
Not really. (You could theoretically do it by storing the counter as a function attribute on the generator function, but trying to make a function refer to itself is a risky proposition.)
or do I have to create a class that implements the iterator protocol "by hand"? (i.e. a class with a __next__() method that manually raises StopIteration when it runs out of data)
Yes and no. Yes, you have to create an iterator class, but that doesn't mean you have to manually raise StopIteration. Assuming you want your iterator to wrap some source iterable, then you can just rely on that source to raise StopIteration. Here's a simple example:
class MyIter(object):
def __init__(self, source):
self.source = iter(source)
self.counter = 0
def next(self):
self.counter += 1
return next(self.source)
def __iter__(self):
return self
def tell(self):
return self.counter
(This uses next a la Python 2; for Python 3 you'd have to change def next to def __next__.)
Then:
>>> x = MyIter('abc')
>>> for item in x:
... print(item, x.tell())
a 1
b 2
c 3
If it's not too expensive to calculate it, have the iterator yield both the value and the position. Alternatively, provide two iteration methods, one that yields just the values and one that yields the values and the positions.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
for i in range(start, stop):
yield i * i
generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
def __init__(self, start, stop):
self.start = start
self.stop = stop
def __iter__(self): return self
def __next__(self): # next in Python 2
if self.start >= self.stop:
raise StopIteration
current = self.start * self.start
self.start += 1
return current
iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
"when called, returns generator object"
yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
"just a function definition with yield in it"
yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.Iterator):
def __init__(self, stop):
self.x = 0
self.stop = stop
def __iter__(self):
return self
def next(self):
if self.x < self.stop:
self.x += 1
return 'yes'
else:
# Iterators must raise when done, else considered broken
raise StopIteration
__next__ = next # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
for _ in range(stop):
yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
('yes' for _ in range(stop))):
... print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() method call on a generator object(for ex: f below) returned by a generator function (for ex: foo() below), generates the next value in the sequence.
When a generator function is called, it returns an generator object without even beginning the execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution. And secondly, the next() call continues from the previous value.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
... print("begin")
... for i in range(3):
... print("before yield", i)
... yield i
... print("after yield", i)
... print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved - you have
to create a class and at least implement the iter and the next methods. But what if you don't want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return "something" which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator
Hope that clarifies a bit.
Examples from Ned Batchelder highly recommended for iterators and generators
A method without generators that do something to even numbers
def evens(stream):
them = []
for n in stream:
if n % 2 == 0:
them.append(n)
return them
while by using a generator
def evens(stream):
for n in stream:
if n % 2 == 0:
yield n
We don't need any list nor a return statement
Efficient for large/ infinite length stream ... it just walks and yield the value
Calling the evens method (generator) is as usual
num = [...]
for n in evens(num):
do_smth(n)
Generator also used to Break double loop
Iterator
A book full of pages is an iterable, A bookmark is an
iterator
and this bookmark has nothing to do except to move next
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration (Exception) as we got end of the iterator
To use Generator ... we need a function
To use Iterator ... we need next and iter
As been said:
A Generator function returns an iterator object
The Whole benefit of Iterator:
Store one element a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator, which may be caught in a finally clause in that iterator, to get a chance to run some clean‑up. This abstraction makes it most usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.
That said, my personal answer to the first question would be: iteratable has an __iter__ method only, typical iterators have a __next__ method only, generators has both an __iter__ and a __next__ and an additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.
For control flow matters, generators are an as much important concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that iterator has __next__ method, iterable does not.
Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator
What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here).
How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common in build-in iterators in python); in documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again confusing usage of terminology;
we also have Sequence class in keras and that's custom implementation of a generator functionality (regular generators are not suitable for multithreading) but it doesn't implement __next__ and __iter__, rather it's a wrapper around generators (it uses yield statement);
Generator Function, Generator Object, Generator:
A Generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions is a great tool to create Iterator objects as easy as possible. The Iterator object returend by generator function is also called Generator object or Generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
a, b = 0, 1
for i in range(max):
yield a
a, b = b, a + b
print(fib(10)) #<generator object fib at 0x01342480>
for i in fib(10):
print(i) # 0 1 1 2 3 5 8 13 21 34
print(next(myfib)) #0
print(next(myfib)) #1
print(next(myfib)) #1
print(next(myfib)) #2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use generators function to create iterators because they simplify their creation, but a custom Iterator gives you more freedom and you can also implement other methods according to your requirements as shown in the below example.
class Fib:
def __init__(self,max):
self.current=0
self.next=1
self.max=max
self.count=0
def __iter__(self):
return self
def __next__(self):
if self.count>self.max:
raise StopIteration
else:
self.current,self.next=self.next,(self.current+self.next)
self.count+=1
return self.next-self.current
def __str__(self):
return "Generator object"
itobj=Fib(4)
print(itobj) #Generator object
for i in Fib(4):
print(i) #0 1 1 2
print(next(itobj)) #0
print(next(itobj)) #1
print(next(itobj)) #1
This thread covers in many details all the differences between the two, but wanted to add something on the conceptual difference between the two:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
for i in range(n):
yield i
def myIterableList(n):
ll = n*[None]
for i in range(n):
ll[i] = i
return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
return [1,2,3]
o/p of print(l): [1,2,3] &
o/p of print(f()) : [1,2,3]
Let’s make list l iterable: In python list is always iterable that means you can apply iterator whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
yield 1
yield 2
yield 3
iter_f = f() # which is iter(f) as iterator is already applied implicitly
So if you have observed, as soon as you made function f a generator, it is already iter(f)
Now,
l is the list, after applying iterator method "iter" it becomes,
iter(l)
f is already iter(f), after applying iterator method "iter" it
becomes, iter(iter(f)), which is again iter(f)
It's kinda you are casting int to int(x) which is already int and it will remain int(x).
For example o/p of :
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from above explanation is:
list l ~= iter(l)
generator function f == iter(f)
All generators are iterators but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator
class IT:
def __init__(self):
self.n = 0
def __iter__(self):
return self
def __next__(self):
if self.n == 4:
raise StopIteration
try:
return self.n
finally:
self.n += 1
def g():
for i in range(4):
yield i
def test(it):
print(f'type(it) = {type(it)}')
print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
print(next(it))
print(next(it))
print(next(it))
print(next(it))
try:
print(next(it))
except StopIteration:
print('boom\n')
print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()
test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom