The same function for iterable objects and non-iterable objects [duplicate] - python

Is there a method like isiterable? The only solution I have found so far is to call
hasattr(myObj, '__iter__')
But I am not sure how fool-proof this is.

Checking for __iter__ works on sequence types, but it would fail on e.g. strings in Python 2. I would like to know the right answer too; until then, here is one possibility (which would work on strings, too):
try:
    some_object_iterator = iter(some_object)
except TypeError as te:
    print(some_object, 'is not iterable')
The iter built-in checks for the __iter__ method or, in the case of strings, the __getitem__ method.
Another general pythonic approach is to assume the object is iterable, then fail gracefully if that does not work on the given object. The Python glossary:
Pythonic programming style that determines an object's type by inspection of its method or attribute signature rather than by explicit relationship to some type object ("If it looks like a duck and quacks like a duck, it must be a duck.") By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using type() or isinstance(). Instead, it typically employs the EAFP (Easier to Ask Forgiveness than Permission) style of programming.
...
try:
    _ = (e for e in my_object)
except TypeError:
    print my_object, 'is not iterable'
The collections module provides some abstract base classes, which allow you to ask classes or instances whether they provide particular functionality, for example:
from collections.abc import Iterable

if isinstance(e, Iterable):
    pass  # e is iterable
However, this does not check for classes that are iterable through __getitem__.
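A quick illustration of that gap (a hedged sketch; the class name is made up):

from collections.abc import Iterable

class OldStyleSeq:
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError
        return index

s = OldStyleSeq()
print(isinstance(s, Iterable))  # False: no __iter__ and not registered
print(list(iter(s)))            # [0, 1, 2]: iter() falls back to __getitem__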

Duck typing
try:
    iterator = iter(the_element)
except TypeError:
    pass  # not iterable
else:
    pass  # iterable
    # for obj in iterator:
    #     pass
Type checking
Use the Abstract Base Classes. They need at least Python 2.6 and work only for new-style classes.
from collections.abc import Iterable  # import directly from collections for Python < 3.3

if isinstance(the_element, Iterable):
    pass  # iterable
else:
    pass  # not iterable
However, iter() is a bit more reliable as described by the documentation:
Checking isinstance(obj, Iterable) detects classes that are
registered as Iterable or that have an __iter__() method, but
it does not detect classes that iterate with the __getitem__()
method. The only reliable way to determine whether an object
is iterable is to call iter(obj).

I'd like to shed a little bit more light on the interplay of iter, __iter__ and __getitem__ and what happens behind the curtains. Armed with that knowledge, you will be able to understand why the best you can do is
try:
    iter(maybe_iterable)
    print('iteration will probably work')
except TypeError:
    print('not iterable')
I will list the facts first and then follow up with a quick reminder of what happens when you employ a for loop in Python, followed by a discussion to illustrate the facts.
Facts
You can get an iterator from any object o by calling iter(o) if at least one of the following conditions holds true: a) o has an __iter__ method which returns an iterator object. An iterator is any object with an __iter__ and a __next__ (Python 2: next) method. b) o has a __getitem__ method.
Checking for an instance of Iterable or Sequence, or checking for the
attribute __iter__ is not enough.
If an object o implements only __getitem__, but not __iter__, iter(o) will construct
an iterator that tries to fetch items from o by integer index, starting at index 0. The iterator will catch any IndexError (but no other errors) that is raised and then raises StopIteration itself.
In the most general sense, there's no way to check whether the iterator returned by iter is sane other than to try it out.
If an object o implements __iter__, the iter function will make sure
that the object returned by __iter__ is an iterator. There is no sanity check
if an object only implements __getitem__.
__iter__ wins. If an object o implements both __iter__ and __getitem__, iter(o) will call __iter__.
If you want to make your own objects iterable, always implement the __iter__ method.
for loops
In order to follow along, you need an understanding of what happens when you employ a for loop in Python. Feel free to skip right to the next section if you already know.
When you use for item in o for some iterable object o, Python calls iter(o) and expects an iterator object as the return value. An iterator is any object which implements a __next__ (or next in Python 2) method and an __iter__ method.
By convention, the __iter__ method of an iterator should return the object itself (i.e. return self). Python then calls next on the iterator until StopIteration is raised. All of this happens implicitly, but the following demonstration makes it visible:
import random

class DemoIterable(object):
    def __iter__(self):
        print('__iter__ called')
        return DemoIterator()

class DemoIterator(object):
    def __iter__(self):
        return self

    def __next__(self):
        print('__next__ called')
        r = random.randint(1, 10)
        if r == 5:
            print('raising StopIteration')
            raise StopIteration
        return r
Iteration over a DemoIterable:
>>> di = DemoIterable()
>>> for x in di:
...     print(x)
...
__iter__ called
__next__ called
9
__next__ called
8
__next__ called
10
__next__ called
3
__next__ called
10
__next__ called
raising StopIteration
Discussion and illustrations
On point 1 and 2: getting an iterator and unreliable checks
Consider the following class:
class BasicIterable(object):
    def __getitem__(self, item):
        if item == 3:
            raise IndexError
        return item
Calling iter with an instance of BasicIterable will return an iterator without any problems because BasicIterable implements __getitem__.
>>> b = BasicIterable()
>>> iter(b)
<iterator object at 0x7f1ab216e320>
However, it is important to note that b does not have the __iter__ attribute and is not considered an instance of Iterable or Sequence:
>>> from collections import Iterable, Sequence
>>> hasattr(b, '__iter__')
False
>>> isinstance(b, Iterable)
False
>>> isinstance(b, Sequence)
False
This is why Fluent Python by Luciano Ramalho recommends calling iter and handling the potential TypeError as the most accurate way to check whether an object is iterable. Quoting directly from the book:
As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle a TypeError exception if it isn’t. This is more accurate than using isinstance(x, abc.Iterable) , because iter(x) also considers the legacy __getitem__ method, while the Iterable ABC does not.
On point 3: Iterating over objects which only provide __getitem__, but not __iter__
Iterating over an instance of BasicIterable works as expected: Python
constructs an iterator that tries to fetch items by index, starting at zero, until an IndexError is raised. The demo object's __getitem__ method simply returns the item which was supplied as the argument to __getitem__(self, item) by the iterator returned by iter.
>>> b = BasicIterable()
>>> it = iter(b)
>>> next(it)
0
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Note that the iterator raises StopIteration when it cannot return the next item and that the IndexError which is raised for item == 3 is handled internally. This is why looping over a BasicIterable with a for loop works as expected:
>>> for x in b:
...     print(x)
...
0
1
2
Here's another example in order to drive home the concept of how the iterator returned by iter tries to access items by index. WrappedDict does not inherit from dict, which means instances won't have an __iter__ method.
class WrappedDict(object):  # note: no inheritance from dict!
    def __init__(self, dic):
        self._dict = dic

    def __getitem__(self, item):
        try:
            return self._dict[item]  # delegate to dict.__getitem__
        except KeyError:
            raise IndexError
Note that calls to __getitem__ are delegated to dict.__getitem__ for which the square bracket notation is simply a shorthand.
>>> w = WrappedDict({-1: 'not printed',
... 0: 'hi', 1: 'StackOverflow', 2: '!',
... 4: 'not printed',
... 'x': 'not printed'})
>>> for x in w:
...     print(x)
...
hi
StackOverflow
!
On point 4 and 5: iter checks for an iterator when it calls __iter__:
When iter(o) is called for an object o, iter will make sure that the return value of __iter__, if the method is present, is an iterator. This means that the returned object
must implement __next__ (or next in Python 2) and __iter__. iter cannot perform any sanity checks for objects which only
provide __getitem__, because it has no way to check whether the items of the object are accessible by integer index.
class FailIterIterable(object):
    def __iter__(self):
        return object()  # not an iterator

class FailGetitemIterable(object):
    def __getitem__(self, item):
        raise Exception
Note that constructing an iterator from FailIterIterable instances fails immediately, while constructing an iterator from FailGetitemIterable succeeds, but will throw an Exception on the first call to __next__.
>>> fii = FailIterIterable()
>>> iter(fii)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'object'
>>>
>>> fgi = FailGetitemIterable()
>>> it = iter(fgi)
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/path/iterdemo.py", line 42, in __getitem__
raise Exception
Exception
On point 6: __iter__ wins
This one is straightforward. If an object implements __iter__ and __getitem__, iter will call __iter__. Consider the following class
class IterWinsDemo(object):
    def __iter__(self):
        return iter(['__iter__', 'wins'])

    def __getitem__(self, item):
        return ['__getitem__', 'wins'][item]
and the output when looping over an instance:
>>> iwd = IterWinsDemo()
>>> for x in iwd:
...     print(x)
...
__iter__
wins
On point 7: your iterable classes should implement __iter__
You might ask yourself why most builtin sequences like list implement an __iter__ method when __getitem__ would be sufficient.
class WrappedList(object):  # note: no inheritance from list!
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]
After all, iteration over instances of the class above, which delegates calls to __getitem__ to list.__getitem__ (using the square bracket notation), will work fine:
>>> wl = WrappedList(['A', 'B', 'C'])
>>> for x in wl:
...     print(x)
...
A
B
C
The reasons your custom iterables should implement __iter__ are as follows:
If you implement __iter__, instances will be considered iterables, and isinstance(o, collections.abc.Iterable) will return True.
If the object returned by __iter__ is not an iterator, iter will fail immediately and raise a TypeError.
The special handling of __getitem__ exists for backwards compatibility reasons. Quoting again from Fluent Python:
That is why any Python sequence is iterable: they all implement __getitem__ . In fact,
the standard sequences also implement __iter__, and yours should too, because the
special handling of __getitem__ exists for backward compatibility reasons and may be
gone in the future (although it is not deprecated as I write this).
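Following that advice, a minimal sketch (mine, not from the book) gives the wrapper a proper __iter__ by delegating to the wrapped list:

class IterableWrappedList(object):
    def __init__(self, lst):
        self._list = lst

    def __getitem__(self, item):
        return self._list[item]

    def __iter__(self):
        # delegate to list.__iter__; instances now count as Iterable
        return iter(self._list)

With this change, isinstance(IterableWrappedList([]), collections.abc.Iterable) returns True, and iteration no longer depends on the legacy __getitem__ fallback.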

I've been studying this problem quite a bit lately. Based on that my conclusion is that nowadays this is the best approach:
from collections.abc import Iterable  # drop `.abc` with Python 2.7 or lower

def iterable(obj):
    return isinstance(obj, Iterable)
The above has been recommended already earlier, but the general consensus has been that using iter() would be better:
def iterable(obj):
    try:
        iter(obj)
    except Exception:
        return False
    else:
        return True
We've used iter() in our code as well for this purpose, but I've lately started to get more and more annoyed by objects which only have __getitem__ being considered iterable. There are valid reasons to have __getitem__ in a non-iterable object and with them the above code doesn't work well. As a real life example we can use Faker. The above code reports it being iterable but actually trying to iterate it causes an AttributeError (tested with Faker 4.0.2):
>>> from faker import Faker
>>> fake = Faker()
>>> iter(fake) # No exception, must be iterable
<iterator object at 0x7f1c71db58d0>
>>> list(fake) # Ooops
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/.../site-packages/faker/proxy.py", line 59, in __getitem__
return self._factory_map[locale.replace('-', '_')]
AttributeError: 'int' object has no attribute 'replace'
If we'd use isinstance(), we wouldn't accidentally consider Faker instances (or any other objects having only __getitem__) to be iterable:
>>> from collections.abc import Iterable
>>> from faker import Faker
>>> isinstance(Faker(), Iterable)
False
Earlier answers commented that using iter() is safer as the old way to implement iteration in Python was based on __getitem__ and the isinstance() approach wouldn't detect that. This may have been true with old Python versions, but based on my pretty exhaustive testing isinstance() works great nowadays. The only case where isinstance() didn't work but iter() did was with UserDict when using Python 2. If that's relevant, it's possible to use isinstance(item, (Iterable, UserDict)) to get that covered.
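For reference, a hedged sketch of that combined check for Python 2, where UserDict lives in its own module and is not detected by the ABC (on Python 3, UserDict subclasses MutableMapping, so the plain ABC check already covers it):

from collections import Iterable
from UserDict import UserDict  # Python 2 location of UserDict

def iterable(obj):
    return isinstance(obj, (Iterable, UserDict))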

Since Python 3.5 you can use the typing module from the standard library for type related things:
from typing import Iterable
...
if isinstance(my_item, Iterable):
    print(True)

This isn't sufficient: the object returned by __iter__ must implement the iteration protocol (i.e. next method). See the relevant section in the documentation.
In Python, a good practice is to "try and see" instead of "checking".

In Python <= 2.5, you can't and shouldn't - iterable was an "informal" interface.
But since Python 2.6 and 3.0 you can leverage the new ABC (abstract base class) infrastructure along with some builtin ABCs which are available in the collections module:
from collections import Iterable

class MyObject(object):
    pass

mo = MyObject()
print isinstance(mo, Iterable)
Iterable.register(MyObject)
print isinstance(mo, Iterable)
print isinstance("abc", Iterable)
Now, whether this is desirable or actually works, is just a matter of conventions. As you can see, you can register a non-iterable object as Iterable - and it will raise an exception at runtime. Hence, isinstance acquires a "new" meaning - it just checks for "declared" type compatibility, which is a good way to go in Python.
On the other hand, if your object does not satisfy the interface you need, what are you going to do? Take the following example:
from collections import Iterable
from traceback import print_exc

def check_and_raise(x):
    if not isinstance(x, Iterable):
        raise TypeError, "%s is not iterable" % x
    else:
        for i in x:
            print i

def just_iter(x):
    for i in x:
        print i

class NotIterable(object):
    pass

if __name__ == "__main__":
    try:
        check_and_raise(5)
    except:
        print_exc()
    print

    try:
        just_iter(5)
    except:
        print_exc()
    print

    try:
        Iterable.register(NotIterable)
        ni = NotIterable()
        check_and_raise(ni)
    except:
        print_exc()
    print
If the object doesn't satisfy what you expect, you just throw a TypeError, but if the proper ABC has been registered, your check is useless. On the contrary, if the __iter__ method is available, Python will automatically recognize objects of that class as being Iterable.
So, if you just expect an iterable, iterate over it and forget it. On the other hand, if you need to do different things depending on input type, you might find the ABC infrastructure pretty useful.
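As a sketch of that kind of type-dependent handling (my example, matching the Python 2 style of the code above; the function name is made up):

from collections import Iterable  # collections.abc on Python 3

def as_list(x):
    # iterables are expanded, scalars are wrapped
    if isinstance(x, Iterable):
        return list(x)
    return [x]

print as_list([1, 2])  # [1, 2]
print as_list(5)       # [5]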

try:
    pass  # treat object as iterable
except TypeError, e:
    pass  # object is not actually iterable
Don't run checks to see if your duck really is a duck; treat it as if it was iterable and complain if it wasn't.

You could try this:
def iterable(a):
    try:
        (x for x in a)
        return True
    except TypeError:
        return False
If we can make a generator that iterates over it (but never use the generator so it doesn't take up space), it's iterable. Seems like a "duh" kind of thing. Why do you need to determine if a variable is iterable in the first place?

The best solution I've found so far:
hasattr(obj, '__contains__')
which basically checks if the object implements the in operator.
Advantages (none of the other solutions has all three):
it is an expression (works as a lambda, as opposed to the try...except variant)
it is (should be) implemented by all iterables, including strings (as opposed to __iter__)
works on any Python >= 2.5
Notes:
the Python philosophy of "ask for forgiveness, not permission" doesn't work well when e.g. in a list you have both iterables and non-iterables and you need to treat each element differently according to its type (treating iterables on try and non-iterables on except would work, but it would look butt-ugly and misleading)
solutions to this problem which attempt to actually iterate over the object (e.g. [x for x in obj]) to check if it's iterable may induce significant performance penalties for large iterables (especially if you just need the first few elements of the iterable, for example) and should be avoided
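As an illustration of the first note, a sketch of the mixed-list case (my example):

is_container = lambda x: hasattr(x, '__contains__')
mixed = [[1, 2], 'abc', 42, {'a': 1}]
containers = [x for x in mixed if is_container(x)]          # [[1, 2], 'abc', {'a': 1}]
non_containers = [x for x in mixed if not is_container(x)]  # [42]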

I found a nice solution here:
isiterable = lambda obj: isinstance(obj, basestring) \
                         or getattr(obj, '__iter__', False)

According to the Python 2 Glossary, iterables are
all sequence types (such as list, str, and tuple) and some non-sequence types like dict and file and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object.
Of course, given the general coding style for Python based on the fact that it's “Easier to ask for forgiveness than permission.”, the general expectation is to use
try:
    for i in object_in_question:
        do_something
except TypeError:
    do_something_for_non_iterable
But if you need to check it explicitly, you can test for an iterable by hasattr(object_in_question, "__iter__") or hasattr(object_in_question, "__getitem__"). You need to check for both, because strs don't have an __iter__ method (at least not in Python 2, in Python 3 they do) and because generator objects don't have a __getitem__ method.
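A quick demonstration of why both attributes matter (Python 3 shown; in Python 2 the string line would report False for __iter__):

gen = (c for c in 'abc')
print(hasattr('abc', '__iter__'), hasattr('abc', '__getitem__'))  # True True
print(hasattr(gen, '__iter__'), hasattr(gen, '__getitem__'))      # True False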

I often find it convenient, inside my scripts, to define an iterable function.
(Now incorporates Alfe's suggested simplification):
import collections

def iterable(obj):
    return isinstance(obj, collections.Iterable)
so you can test if any object is iterable in the very readable form
if iterable(obj):
    pass  # act on iterable
else:
    pass  # not iterable
as you would do with the callable function
EDIT: if you have numpy installed, you can simply do: from numpy import iterable,
which is simply something like
def iterable(obj):
    try:
        iter(obj)
    except:
        return False
    return True
If you do not have numpy, you can simply implement this code, or the one above.

pandas has a built-in function like that:
from pandas.util.testing import isiterable

It has always eluded me why Python has callable(obj) -> bool but not iterable(obj) -> bool...
surely it's easier to do hasattr(obj,'__call__') even if it is slower.
Since just about every other answer recommends using try/except TypeError, where testing for exceptions is generally considered bad practice among any language, here's an implementation of iterable(obj) -> bool I've grown more fond of and use often:
For python 2's sake, I'll use a lambda just for that extra performance boost...
(in python 3 it doesn't matter what you use for defining the function, def has roughly the same speed as lambda)
iterable = lambda obj: hasattr(obj,'__iter__') or hasattr(obj,'__getitem__')
Note that this function executes faster for objects with __iter__ since it doesn't test for __getitem__.
Most iterable objects should rely on __iter__, with special-case objects falling back to __getitem__; one or the other is required for an object to be iterable.
(and since this is standard, it affects C objects as well)

def is_iterable(x):
    try:
        0 in x
    except TypeError:
        return False
    else:
        return True
This will say yes to all manner of iterable objects, but it will say no to strings in Python 2. (That's what I want for example when a recursive function could take a string or a container of strings. In that situation, asking forgiveness may lead to obfuscode, and it's better to ask permission first.)
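For instance, a recursive flattener along those lines might look like this (a sketch built on the is_iterable above, not part of the original answer):

def flatten(x):
    # strings fail the `0 in x` probe, so they are treated as atoms
    if is_iterable(x):
        for item in x:
            for sub in flatten(item):
                yield sub
    else:
        yield x

print(list(flatten(['ab', ['cd', ['ef']], 42])))  # ['ab', 'cd', 'ef', 42]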
import numpy

class Yes:
    def __iter__(self):
        yield 1
        yield 2
        yield 3

class No:
    pass

class Nope:
    def __iter__(self):
        return 'nonsense'
assert is_iterable(Yes())
assert is_iterable(range(3))
assert is_iterable((1,2,3)) # tuple
assert is_iterable([1,2,3]) # list
assert is_iterable({1,2,3}) # set
assert is_iterable({1:'one', 2:'two', 3:'three'}) # dictionary
assert is_iterable(numpy.array([1,2,3]))
assert is_iterable(bytearray("not really a string", 'utf-8'))
assert not is_iterable(No())
assert not is_iterable(Nope())
assert not is_iterable("string")
assert not is_iterable(42)
assert not is_iterable(True)
assert not is_iterable(None)
Many other strategies here will say yes to strings. Use them if that's what you want.
import collections
import numpy
assert isinstance("string", collections.Iterable)
assert isinstance("string", collections.Sequence)
assert numpy.iterable("string")
assert iter("string")
assert hasattr("string", '__getitem__')
Note: is_iterable() will say yes to strings of type bytes and bytearray.
bytes objects in Python 3 are iterable: True == is_iterable(b"string") == is_iterable("string".encode('utf-8')). There is no separate bytes type in Python 2 (bytes is an alias of str there).
bytearray objects in Python 2 and 3 are iterable: True == is_iterable(bytearray(b"abc"))
The O.P.'s hasattr(x, '__iter__') approach will say yes to strings in Python 3 and no in Python 2 (no matter whether '' or b'' or u''). Thanks to @LuisMasuelli for noticing it will also let you down on a buggy __iter__.

There are a lot of ways to check if an object is iterable:
from collections.abc import Iterable

myobject = 'Roster'
if isinstance(myobject, Iterable):
    print(f"{myobject} is iterable")
else:
    print(f"{myobject} is not iterable")

The easiest way, respecting Python's duck typing, is to catch the error (Python knows perfectly well what it expects from an object in order to turn it into an iterator):
class A(object):
    def __getitem__(self, item):
        return something

class B(object):
    def __iter__(self):
        # Return a compliant iterator. Just an example
        return iter([])

class C(object):
    def __iter__(self):
        # Return crap
        return 1

class D(object):
    pass

def iterable(obj):
    try:
        iter(obj)
        return True
    except:
        return False

assert iterable(A())
assert iterable(B())
assert not iterable(C())
assert not iterable(D())
Notes:
The distinction between an object that is not iterable and one with a buggy __iter__ is irrelevant if the exception type is the same: either way you will not be able to iterate the object.
I think I understand your concern: How does callable exists as a check if I could also rely on duck typing to raise an AttributeError if __call__ is not defined for my object, but that's not the case for iterable checking?
I don't know the answer, but you can either implement the function I (and other users) gave, or just catch the exception in your code (your implementation in that part will be like the function I wrote - just ensure you isolate the iterator creation from the rest of the code so you can capture the exception and distinguish it from another TypeError).

The isiterable function in the following code returns True if the object is iterable; if it's not iterable, it returns False.
def isiterable(object_):
    return hasattr(type(object_), "__iter__")
example
fruits = ("apple", "banana", "peach")
isiterable(fruits)  # returns True

num = 345
isiterable(num)  # returns False

isiterable(str)  # returns False because the str type itself is not iterable

hello = "hello dude !"
isiterable(hello)  # returns True because, as you know, string objects are iterable

Instead of checking for the __iter__ attribute, you could check for the __len__ attribute, which is implemented by every built-in Python container, including strings.
>>> hasattr(1, "__len__")
False
>>> hasattr(1.3, "__len__")
False
>>> hasattr("a", "__len__")
True
>>> hasattr([1,2,3], "__len__")
True
>>> hasattr({1,2}, "__len__")
True
>>> hasattr({"a":1}, "__len__")
True
>>> hasattr(("a", 1), "__len__")
True
Non-iterable objects would not implement this for obvious reasons. However, it does not catch user-defined iterables that do not implement it, nor generator expressions, which iter can deal with. However, this can be done in a line, and adding a simple or expression checking for generators would fix this problem. (Note that writing type(my_generator_expression) == generator would throw a NameError; refer to the answer quoted below instead.)
You can use GeneratorType from types:
>>> import types
>>> types.GeneratorType
<class 'generator'>
>>> gen = (i for i in range(10))
>>> isinstance(gen, types.GeneratorType)
True
(accepted answer by utdemir)
(This makes it useful for checking if you can call len on the object though.)
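Combining the two checks, as the paragraph above suggests (a sketch; the function name is made up):

import types

def len_or_gen(obj):
    # __len__ covers the built-in containers; GeneratorType covers genexps
    return hasattr(obj, '__len__') or isinstance(obj, types.GeneratorType)

print(len_or_gen([1, 2, 3]))            # True
print(len_or_gen(i for i in range(3)))  # True
print(len_or_gen(42))                   # False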

Not really "correct" but can serve as quick check of most common types like strings, tuples, floats, etc...
>>> '__iter__' in dir('sds')
True
>>> '__iter__' in dir(56)
False
>>> '__iter__' in dir([5,6,9,8])
True
>>> '__iter__' in dir({'jh':'ff'})
True
>>> '__iter__' in dir({'jh'})
True
>>> '__iter__' in dir(56.9865)
False

In my code I used to check for non-iterable objects with:
hasattr(myobject, '__trunc__')
This is quite quick and can be used to check for iterables too (use not).
I'm not 100% sure if this solution works for all objects; maybe others can give some more background on it. The __trunc__ method seems to be related to numerical types (all objects that can be rounded to integers need it). But I haven't found any object that contains __trunc__ together with __iter__ or __getitem__.
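A quick spot check of that observation on CPython (the presence of __trunc__ is an implementation detail that has shifted across versions, so treat this as a heuristic only):

print(hasattr(42, '__trunc__'))      # True  -> not iterable
print(hasattr(3.14, '__trunc__'))    # True  -> not iterable
print(hasattr([1, 2], '__trunc__'))  # False -> iterable
print(hasattr('ab', '__trunc__'))    # False -> iterable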

Kinda late to the party, but I asked myself this question and saw this, then thought of an answer. I don't know if someone already posted this. But essentially, I've noticed that many common iterable types have __getitem__() in their dict. This is how you would check if an object is iterable without even trying. (Pun intended)
def is_attr(arg):
    return '__getitem__' in dir(arg)

Related

When is it better to use an iterator than a generator? [duplicate]

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):  # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
    return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
...
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create an object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.Iterator):
    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next  # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
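As a small taste of that coroutine usage (a hedged sketch, not from the linked answer):

def running_total():
    total = 0
    while True:
        value = yield total  # receives values passed in via send()
        total += value

acc = running_total()
next(acc)            # prime the coroutine; yields the initial 0
print(acc.send(10))  # 10
print(acc.send(5))   # 15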
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() method call on a generator object (for example, f below) returned by a generator function (for example, foo() below) generates the next value in the sequence.
When a generator function is called, it returns a generator object without even beginning execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement, which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution, and the next next() call continues from the previous value.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
...     print("begin")
...     for i in range(3):
...         print("before yield", i)
...         yield i
...         print("after yield", i)
...     print("end")
...
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved - you have
to create a class and at least implement the __iter__ and __next__ methods. But what if you don't want to go through this hassle and want to quickly create an iterator? Fortunately, Python provides a shortcut for defining an iterator. All you need to do is define a function with at least one call to yield, and now when you call that function it will return "something" which will act like an iterator (you can call the next method and use it in a for loop). This something has a name in Python: Generator.
Hope that clarifies a bit.
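A minimal sketch of that shortcut (hypothetical names):

def count_up_to(n):  # calling this returns a generator, i.e. an iterator
    i = 1
    while i <= n:
        yield i
        i += 1

counter = count_up_to(3)
print(next(counter))  # 1
print(next(counter))  # 2
for remaining in counter:
    print(remaining)  # 3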
Examples from Ned Batchelder highly recommended for iterators and generators
A method without generators that does something to even numbers:
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while by using a generator
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
We don't need a list nor a return statement.
It is efficient for large / infinite-length streams ... it just walks and yields the values one at a time.
Calling the evens method (generator) is as usual
num = [...]
for n in evens(num):
    do_smth(n)
Generators can also be used to break out of a double loop, as sketched below.
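A minimal sketch (my example, not from the original answer): the generator flattens the nesting, so a single break suffices:

def cells(matrix):
    for row in matrix:
        for value in row:
            yield value

for value in cells([[1, 2], [3, 4]]):
    if value == 3:
        break  # a single break exits what used to be two nested loops
    print(value)  # prints 1, 2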
Iterator
A book full of pages is an iterable; a bookmark is an iterator,
and this bookmark has nothing to do except move to the next page.
litr = iter([1, 2, 3])
next(litr)  # 1
next(litr)  # 2
next(litr)  # 3
next(litr)  # StopIteration (exception), as we reached the end of the iterator
To use a generator ... we need a function.
To use an iterator ... we need next and iter.
As has been said: a generator function returns an iterator object.
The whole benefit of an iterator: it stores one element at a time in memory.
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
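In code, the four lines of the cheat sheet look like this (a sketch):

import collections.abc

def gen_fn():                 # 1. generator function: has yield in it
    yield 1

gen_expr = (x for x in [1])   # 2. generator expression: like a listcomp, with ()
gen_obj = gen_fn()            # 3. both produce a generator object

print(isinstance(gen_obj, collections.abc.Iterator))   # 4. True: a generator is an iterator
print(isinstance(gen_expr, collections.abc.Iterator))  # True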
Previous answers missed this addition: a generator has a close method, while typical iterators don't. The close method raises a GeneratorExit exception inside the generator, which gives a try/finally clause in the generator a chance to run some clean-up. This abstraction makes generators more usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what's underneath.
That said, my personal answer to the first question would be: an iterable has an __iter__ method only, typical iterators have a __next__ method only, while generators have both an __iter__ and a __next__, plus the additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they are more resilient: the close method and the greater composability of yield from. Locally, I may use iterators, but only if it's a flat and simple structure (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
For control-flow matters, generators are as important a concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over an iterable with the help of the corresponding iterator. A few differences can help to distinguish an iterator from an iterable in practice:
One difference is that an iterator has a __next__ method, while an iterable does not.
Another difference - both of them contain an __iter__ method. In the case of an iterable, it returns the corresponding iterator. In the case of an iterator, it returns itself.
For example:
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator
What are iterables in Python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we had only one type. Is there any difference between range and zip? One of the reasons for the split - range has a lot of additional functionality: we may index it or check if it contains some number, etc. (see details here).
How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods, so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get DataFrameIterator, which has those methods; but it doesn't implement StopIteration (which is not common among built-in iterators in Python); in the documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again, confusing use of terminology;
we also have a Sequence class in keras, which is a custom implementation of generator functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather, it's a wrapper around generators (it uses the yield statement);
Generator Function, Generator Object, Generator:
A generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions are a great tool to create iterator objects as easily as possible. The iterator object returned by a generator function is also called a generator object or generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b

print(fib(10))  # <generator object fib at 0x01342480>

for i in fib(10):
    print(i)  # 0 1 1 2 3 5 8 13 21 34

myfib = fib(10)
print(next(myfib))  # 0
print(next(myfib))  # 1
print(next(myfib))  # 1
print(next(myfib))  # 2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements the __iter__ and __next__ methods (also called the iterator protocol).
However, it is much easier to use generator functions to create iterators because they simplify their creation; but a custom iterator gives you more freedom, and you can also implement other methods according to your requirements, as shown in the example below.
class Fib:
    def __init__(self, max):
        self.current = 0
        self.next = 1
        self.max = max
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count > self.max:
            raise StopIteration
        else:
            self.current, self.next = self.next, self.current + self.next
            self.count += 1
            return self.next - self.current

    def __str__(self):
        return "Generator object"

itobj = Fib(4)
print(itobj)  # Generator object

for i in Fib(4):
    print(i)  # 0 1 1 2 3

print(next(itobj))  # 0
print(next(itobj))  # 1
print(next(itobj))  # 1
This thread covers in many details all the differences between the two, but wanted to add something on the conceptual difference between the two:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n * [None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into an iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
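A rough way to see the memory difference on CPython (exact numbers vary by version; sys.getsizeof does not count the elements themselves):

import sys

gen = (i for i in range(1000000))
lst = list(range(1000000))
print(sys.getsizeof(gen))  # on the order of a hundred bytes, independent of n
print(sys.getsizeof(lst))  # on the order of 8 MB for a million pointers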
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object; and yes, the terminology may be confusing. Iterable objects include an __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
    print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
    print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]
Output of print(l): [1,2,3], and output of print(f()): [1,2,3].
Let's make list l iterable: in Python a list is always iterable, which means you can apply an iterator whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
    yield 1
    yield 2
    yield 3

iter_f = f()  # which is iter(f) as the iterator is already applied implicitly
So if you have observed, as soon as you made function f a generator, it is already iter(f)
Now,
l is the list; after applying the iterator method iter it becomes iter(l).
f is already iter(f); after applying iter it becomes iter(iter(f)), which is again iter(f).
It's kind of like casting an int with int(x): x is already an int, and it will remain int(x).
For example, the output of:
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from above explanation is:
list l ~= iter(l)
generator function f == iter(f)
All generators are iterators but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator

class IT:
    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 4:
            raise StopIteration
        try:
            return self.n
        finally:
            self.n += 1

def g():
    for i in range(4):
        yield i

def test(it):
    print(f'type(it) = {type(it)}')
    print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
    print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
    print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
    print(next(it))
    print(next(it))
    print(next(it))
    print(next(it))
    try:
        print(next(it))
    except StopIteration:
        print('boom\n')

print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()

test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

Wrapping big list in Python 2.7 to make it immutable

In case I have a really big list (>100k elements) that can be retrieved from some object through a function call, is there a way to wrap that list to make it immutable to the caller without copying it to a tuple?
In the following example I have only one list field, but the solution should work for any number of list fields.
class NumieHolder(object):
    def __init__(self):
        self._numies = []

    def add(self, new_numie):
        self._numies.append(new_numie)

    @property
    def numies(self):
        # return numies embedded in immutable wrapper/view
        return ??? numies ???

if __name__ == '__main__':
    nh = NumieHolder()
    for numie in xrange(100001): # >100k holds
        nh.add(numie)

    # messing with numies should result in exception
    nh.numies[3] = 4
    # but I still want to use index operator
    print '100th numie:', nh.numies[99]
I would know how to write adapter that behaves that way, but I'm interested if there is already some standard solution (i.e. in standard library or widely known library) I'm not aware of.
Unfortunately, there is no such wrapper in the standard library (or other prominent libraries). The main reason is that list is supposed to be a mutable sequence type with index access. The immutable sequence type would be a tuple as you already said yourself. So usually, the standard approach to make a list immutable would be to make it into a tuple by calling tuple(lst).
This is obviously not what you want, as you want to avoid to copy all the elements. So instead, you can create a custom type that wraps the list, and offers all non-modifying methods list also supports:
class ImmutableList:
    def __init__(self, actualList):
        self.__lst = actualList

    def __len__(self):
        return self.__lst.__len__()

    def __getitem__(self, key):
        return self.__lst.__getitem__(key)

    def __iter__(self):
        return self.__lst.__iter__()

    def __reversed__(self):
        return self.__lst.__reversed__()

    def __contains__(self, item):
        return self.__lst.__contains__(item)

    def __repr__(self):
        return self.__lst.__repr__()

    def __str__(self):
        return self.__lst.__str__()
>>> original = [1, 2, 3, 4]
>>> immutable = ImmutableList(original)
>>> immutable
[1, 2, 3, 4]
>>> immutable[2]
3
>>> for i in immutable:
...     print(i, end='; ')
1; 2; 3; 4;
>>> list(reversed(immutable))
[4, 3, 2, 1]
>>> immutable[1] = 4
Traceback (most recent call last):
File "<pyshell#39>", line 1, in <module>
immutable[1] = 4
TypeError: 'ImmutableList' object does not support item assignment
The alternative would be to subtype list and override __setitem__ and __delitem__ to raise an exception instead, but I would suggest against that, as a subtype of list would be expected to have the same interface as list itself. ImmutableList above on the other hand is just some indexable sequence type which happens to wrap a real list itself. Apart from that, having it as a subtype of list would actually require you to copy the contents once, so wrapping is definitely better if you don’t want to recreate all those items (which seems to be your point—otherwise you could just use tuple).
See Emulating Container Types for the "special" methods you'd want to implement or override. Namely, you'd want to implement __setitem__ and __delitem__ methods to raise an exception, so the list cannot be modified.
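For completeness, a hedged sketch of that subclassing alternative (discouraged above, since constructing it still copies the data):

class FrozenList(list):
    def _blocked(self, *args, **kwargs):
        raise TypeError("'FrozenList' object does not support modification")

    # block the obvious mutating entry points; a complete version would
    # also cover sort, reverse, __iadd__, __imul__, etc.
    __setitem__ = _blocked
    __delitem__ = _blocked
    append = extend = insert = pop = remove = _blocked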

How do I get an iterator over the keys/indexes of an arbitrary Python object?

I'm working on a project where I need a method that takes an arbitrary Python object, and if that object acts like a dict, list, or tuple -- meaning that it supports the idea of accessing members of a collection via a key or index -- my method should return an iterator that can traverse the object's key-value or index-value pairs. It would also be fine for my purposes if the iterator simply traversed the objects keys or indexes. Here's the code I've got so far:
from collections import Mapping, Sequence
# Tuple used to identify string-like objects in Python 2 or 3.
STRINGS = (str, unicode) if str is bytes else (str, bytes)
def get_keyval_iter(obj):
    if isinstance(obj, STRINGS): return None
    elif isinstance(obj, Sequence): return enumerate(obj)
    elif isinstance(obj, Mapping): return getattr(obj, 'iteritems', obj.items)()
    else: return None
# For example:
print list(get_keyval_iter([0, 11, 22])) # [(0, 0), (1, 11), (2, 22)]
print list(get_keyval_iter(dict(a = 1, b = 2))) # [('a', 1), ('b', 2)]
print [ get_keyval_iter("foobar") ] # [None]
print [ get_keyval_iter(1234) ] # [None]
I don't like this solution for two reasons: (1) on general principle, I'd rather query the object's interface than check its type; (2) my code will return None for user-defined classes whose objects fail the isinstance tests but that nonetheless support the __getitem__ protocol and could, in theory, give me an iterator over the relevant keys or indexes.
Here's the code I'd like to write: return obj.__getitemiter__() -- or something like that.
Am I overlooking an obvious way to get what I need -- namely, an iterator over an arbitrary object's keys or indexes (or over its key-value or index-value pairs)?
You want to use the ABCs defined in the collections module only to detect a mapping (because you want to iterate over key-value pairs instead of keys), and use the standard iter() function for everything else:
import collections
def get_keyval_iter(obj):
    if isinstance(obj, collections.Mapping):
        return obj.iteritems()
    try:
        return enumerate(iter(obj))
    except TypeError:
        # not iterable
        return None
Note the iter() call; it takes any iterable sequence object and returns an iterator object that will operate on it. It supports both objects that implement the iterator protocol and objects that support a .__getitem__() method:
[...] o must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
So where collections.Sequence looks for both a __getitem__ and a __len__ method, iter() only looks for __getitem__.
Note that you should not go overboard with accepting and handling too many different types; there should not be an exception for strings here, for example. Rethink your code to perhaps be more strict in what you promise to handle.

Difference between Python's Generators and Iterators

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
An iterator is the more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that returns self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
    def __iter__(self):
        return self
    def __next__(self):  # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
    def current(self):
        return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, a generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is still a function that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
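That protocol can be spelled out by hand; the following sketch is roughly what a for loop does behind the scenes:

# Roughly what `for item in [10, 20, 30]: print(item)` expands to:
iterable = [10, 20, 30]
iterator = iter(iterable)      # calls iterable.__iter__()
while True:
    try:
        item = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break                  # iterator exhausted: leave the loop
    print(item)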
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
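As a small illustration of the quoted passage (the Pair class is made up for this example), a container whose __iter__ is written as a generator automatically hands back a generator object supplying __iter__() and __next__():

class Pair:
    def __init__(self, first, second):
        self.first, self.second = first, second
    def __iter__(self):
        # implemented as a generator, so calling it returns a generator object
        yield self.first
        yield self.second

p = Pair(1, 2)
print(list(p))   # [1, 2]
print(iter(p))   # <generator object Pair.__iter__ at 0x...> (address varies)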
Example Iterator Objects
You can create an object that implements the iterator protocol by writing your own class, or by extending an existing one.
class Yes(collections.Iterator):
    def __init__(self, stop):
        self.x = 0
        self.stop = stop
    def __iter__(self):
        return self
    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration
    __next__ = next  # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
Implement the iterator protocol directly when you need to extend a Python object into one that can be iterated over.
However, in the vast majority of cases you are best served by using yield to define a function that returns a generator iterator, or by a generator expression.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects that use the next() method to get successive values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Each next() call on a generator object (for example, f below) returned by a generator function (for example, foo() below) generates the next value in the sequence.
When a generator function is called, it returns a generator object without even beginning execution of the function body. When next() is called for the first time, the function starts executing until it reaches a yield statement, which hands back the yielded value. yield suspends the function and remembers where execution stopped; the following next() call resumes from that point.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
...     print("begin")
...     for i in range(3):
...         print("before yield", i)
...         yield i
...         print("after yield", i)
...     print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of the generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has given really nice and verbose answers with examples, and I appreciate them. I just want to give a short answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved: you have to create a class and implement at least the __iter__ and __next__ methods. But what if you don't want to go through this hassle and just want to create an iterator quickly? Fortunately, Python provides a shortcut. All you need to do is define a function with at least one call to yield; when you call that function, it returns "something" that acts like an iterator (you can call next on it and use it in a for loop). This something is called a generator in Python.
Hope that clarifies a bit.
Examples from Ned Batchelder highly recommended for iterators and generators
A function that collects the even numbers from a stream, without generators:
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while with a generator:
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
We need neither a list nor a return statement.
This is efficient for large or infinite streams: it just walks the stream and yields each value.
Calling the evens function (now a generator) works as usual:
num = [...]
for n in evens(num):
    do_smth(n)
Generators can also be used to break out of a double loop, as sketched below.
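A quick sketch of that idea (the pairs helper is illustrative): move the nested loops into a generator, so a single break stops both.

def pairs(rows, cols):
    for r in range(rows):
        for c in range(cols):
            yield r, c

for r, c in pairs(3, 3):
    if r * c > 2:
        break  # one break exits both underlying loops at once
    print(r, c)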
Iterator
A book full of pages is an iterable; a bookmark is an iterator, and the only thing that bookmark can do is move to the next page:
litr = iter([1,2,3])
next(litr)  ## 1
next(litr)  ## 2
next(litr)  ## 3
next(litr)  ## StopIteration (exception): we reached the end of the iterator
To use a generator, we need a function.
To use an iterator, we need next and iter.
As has been said: a generator function returns an iterator object.
The whole benefit of an iterator: it stores only one element at a time in memory.
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators do not. The close method raises a GeneratorExit exception inside the generator at the point where it is suspended, so clean-up code in a finally clause (or an except GeneratorExit handler) inside the generator gets a chance to run. This abstraction makes generators more usable in the large than plain iterators: one can close a generator as one would close a file, without having to bother about what's underneath.
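A minimal sketch of that clean-up behaviour, assuming the common pattern of a generator holding a resource (the reader name and the printed message are illustrative):

def reader():
    try:
        while True:
            yield "line"
    finally:
        # runs when the generator is closed or garbage-collected
        print("clean-up: resource released")

g = reader()
next(g)    # start the generator; it is now suspended at the yield
g.close()  # raises GeneratorExit inside reader, so the finally runs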
That said, my personal answer to the first question would be: an iterable has an __iter__ method only; a typical iterator adds a __next__ method; a generator has both, plus an additional close (as well as send and throw).
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they are more resilient: the close method and greater composability with yield from. Locally, I may use plain iterators, but only if the structure is flat and simple (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
For control-flow matters, generators are as important a concept as promises: both are abstract and composable.
It's difficult to answer the question without two other concepts: iterable and the iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that an iterator has a __next__ method, while an iterable does not.
Another difference: both of them have an __iter__ method. For an iterable it returns the corresponding iterator; for an iterator it returns the iterator itself.
This can help to distinguish iterators and iterables in practice:
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
<class 'list_iterator'>
What are iterables in Python? list, string, range, etc. What are iterators? enumerate, zip, reversed, etc. We can check this using the approach above. It's kind of confusing; it would arguably be easier if there were only one type. Is there any difference between range and zip? One reason for the split: range has a lot of additional functionality, for example we can index it or check whether it contains some number (see details here).
How can we create an iterator ourselves? In theory we can implement the iterator protocol (see here): we write __next__ and __iter__ methods, raise the StopIteration exception, and so on (see Alex Martelli's answer for an example and possible motivation; see also here). But in practice we use generators; that seems to be by far the main way to create iterators in Python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in Keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods, so despite its name it's not an iterator (or a generator);
if you call its flow_from_dataframe() method you get a DataFrameIterator, which does have those methods, but it doesn't raise StopIteration (which is unusual for iterators in Python); in the documentation we may read that "A DataFrameIterator yielding tuples of (x, y)", which is again a confusing use of terminology;
we also have a Sequence class in Keras, which is a custom implementation of generator functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather, it's a wrapper around generators (it uses the yield statement);
Generator Function, Generator Object, Generator:
A generator function is just like a regular function in Python, but it contains one or more yield statements. Generator functions are a great tool for creating iterator objects as easily as possible. The iterator object returned by a generator function is also called a generator object, or simply a generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b

print(fib(10))  # <generator object fib at 0x01342480>

for i in fib(10):
    print(i)  # 0 1 1 2 3 5 8 13 21 34

myfib = fib(10)  # a fresh generator object for use with next()
print(next(myfib))  # 0
print(next(myfib))  # 1
print(next(myfib))  # 1
print(next(myfib))  # 2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator, but not vice versa. A custom iterator object can be created by implementing the __iter__ and __next__ methods (together called the iterator protocol) in its class.
However, it is much easier to use generator functions to create iterators, because they simplify their creation; a custom iterator gives you more freedom, though, and you can also implement other methods according to your requirements, as shown in the example below.
class Fib:
    def __init__(self, max):
        self.current = 0
        self.next = 1
        self.max = max
        self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.count >= self.max:
            raise StopIteration
        else:
            self.current, self.next = self.next, (self.current + self.next)
            self.count += 1
            return self.next - self.current
    def __str__(self):
        return "Generator object"

itobj = Fib(4)
print(itobj)  # Generator object

for i in Fib(4):
    print(i)  # 0 1 1 2

print(next(itobj))  # 0
print(next(itobj))  # 1
print(next(itobj))  # 1
This thread covers in great detail the differences between the two, but I wanted to add something on the conceptual difference between them:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects, but I think it gives a good notion of when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n * [None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# A generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))  # 10 10
print("{} {}".format(len(list(ll1)), len(ll2)))  # 0 10: the generator is exhausted

# The generator's values can be read several times if stored in a list
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory, since it doesn't need to store all the values in memory at the same time.
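A rough way to see that footprint difference, using the two functions defined above (exact byte counts vary with platform and Python version):

import sys

n = 1_000_000
gen = myGeneratorList(n)
lst = myIterableList(n)
print(sys.getsizeof(gen))  # a couple of hundred bytes, independent of n
print(sys.getsizeof(lst))  # around 8 MB for a million entries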
An iterable object is something that can be iterated over (naturally). To do that, however, you need something like an iterator object; yes, the terminology may be confusing. Iterable objects include an __iter__ method which returns the iterator object for the iterable.
An iterator object is an object that implements the iterator protocol, a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method supplies the next value; the __iter__ method returns the iterator object. In a more complex design there may be a separate iterator class, but in the simple case __iter__ returns the object itself (typically return self).
One iterable object is the list. It's not an iterator, but it has an __iter__ method which returns one. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
    print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
    print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing this specifically for Python newbies, in a very simple way, even though deep down Python does many more things.
Let's start with the very basics:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]
The output of print(l) is [1,2,3], and the output of print(f()) is also [1,2,3].
Let's make the list l iterable. In Python a list is always iterable, which means you can obtain an iterator from it whenever you want.
Let's apply an iterator to the list:
iter_l = iter(l)  # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
yield 1
yield 2
yield 3
iter_f = f() # which is iter(f) as iterator is already applied implicitly
So if you have observed, as soon as you made function f a generator, it is already iter(f)
Now,
l is the list; after applying the iterator method iter, it becomes iter(l).
The generator object returned by f is already an iterator; applying iter to it gives iter(iter(f())), which is the same iterator again.
It's a bit like casting an int with int(x): x is already an int, and it remains int(x).
For example, the output of:
print(type(iter(iter(l))))
is:
<class 'list_iterator'>
Never forget this is Python and not C or C++.
Hence the conclusion from the above explanation is:
a list l is an iterable, and iter(l) is its iterator
a generator object g is its own iterator: iter(g) is g
All generators are iterators, but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator

class IT:
    def __init__(self):
        self.n = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.n == 4:
            raise StopIteration
        try:
            return self.n
        finally:
            self.n += 1

def g():
    for i in range(4):
        yield i

def test(it):
    print(f'type(it) = {type(it)}')
    print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
    print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
    print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
    print(next(it))
    print(next(it))
    print(next(it))
    print(next(it))
    try:
        print(next(it))
    except StopIteration:
        print('boom\n')

print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()

test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

Determine the type of an object? [duplicate]

Is there a simple way to determine if a variable is a list, dictionary, or something else?
There are two built-in functions that help you identify the type of an object. You can use type() if you need the exact type of an object, and isinstance() to check an object's type against something. Usually you want isinstance(), since it is very robust and also supports type inheritance.
To get the actual type of an object, you use the built-in type() function. Passing an object as the only parameter will return the type object of that object:
>>> type([]) is list
True
>>> type({}) is dict
True
>>> type('') is str
True
>>> type(0) is int
True
This of course also works for custom types:
>>> class Test1(object):
...     pass
...
>>> class Test2(Test1):
...     pass
...
>>> a = Test1()
>>> b = Test2()
>>> type(a) is Test1
True
>>> type(b) is Test2
True
Note that type() will only return the immediate type of the object, but won’t be able to tell you about type inheritance.
>>> type(b) is Test1
False
To cover that, you should use the isinstance function. This of course also works for built-in types:
>>> isinstance(b, Test1)
True
>>> isinstance(b, Test2)
True
>>> isinstance(a, Test1)
True
>>> isinstance(a, Test2)
False
>>> isinstance([], list)
True
>>> isinstance({}, dict)
True
isinstance() is usually the preferred way to ensure the type of an object because it will also accept derived types. So unless you actually need the type object (for whatever reason), using isinstance() is preferred over type().
The second parameter of isinstance() also accepts a tuple of types, so it's possible to check for multiple types at once. isinstance() will then return True if the object is of any of those types:
>>> isinstance([], (tuple, list, set))
True
Use type():
>>> a = []
>>> type(a)
<type 'list'>
>>> f = ()
>>> type(f)
<type 'tuple'>
It might be more Pythonic to use a try...except block. That way, if you have a class which quacks like a list, or quacks like a dict, it will behave properly regardless of what its type really is.
To clarify, the preferred method of "telling the difference" between variable types is with something called duck typing: as long as the methods (and return types) that a variable responds to are what your subroutine expects, treat it like what you expect it to be. For example, if you have a class that overloads the bracket operators with __getitem__ and __setitem__, but uses some funny internal scheme, it would be appropriate for it to behave as a dictionary if that's what it's trying to emulate.
The other problem with type(A) is type(B) checking is that if A is an instance of a subclass of B, it evaluates to False when, programmatically, you would hope it would be True. If an object is an instance of a subclass of list, it should work like a list: checking the type as presented in the other answer will prevent this. (isinstance will work, however.)
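A quick demonstration of that pitfall (MyList is an illustrative name):

class MyList(list):
    pass

ml = MyList([1, 2, 3])
print(type(ml) is list)      # False: type() sees only the exact class
print(isinstance(ml, list))  # True: isinstance() honours inheritance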
On instances of object you also have the:
__class__
attribute. Here is a sample taken from Python 3.3 console
>>> str = "str"
>>> str.__class__
<class 'str'>
>>> i = 2
>>> i.__class__
<class 'int'>
>>> class Test():
...     pass
...
>>> a = Test()
>>> a.__class__
<class '__main__.Test'>
Beware that in Python 3.x, and with new-style classes (available optionally from Python 2.2), class and type have been merged, and this can sometimes lead to unexpected results. Mainly for this reason, my favorite way of testing types/classes is the isinstance built-in function.
Determine the type of a Python object
Determine the type of an object with type
>>> obj = object()
>>> type(obj)
<class 'object'>
Although it works, avoid double underscore attributes like __class__ - they're not semantically public, and, while perhaps not in this case, the builtin functions usually have better behavior.
>>> obj.__class__ # avoid this!
<class 'object'>
type checking
Is there a simple way to determine if a variable is a list, dictionary, or something else? I am getting an object back that may be either type and I need to be able to tell the difference.
Well that's a different question, don't use type - use isinstance:
def foo(obj):
    """given a string with items separated by spaces,
    or a list or tuple,
    do something sensible
    """
    if isinstance(obj, str):
        obj = obj.split()
    return _foo_handles_only_lists_or_tuples(obj)
This covers the case where your user might be doing something clever or sensible by subclassing str - according to the principle of Liskov Substitution, you want to be able to use subclass instances without breaking your code - and isinstance supports this.
Use Abstractions
Even better, you might look for a specific Abstract Base Class from collections or numbers:
from collections.abc import Iterable  # import directly from collections for Python < 3.3
from numbers import Number

def bar(obj):
    """does something sensible with an iterable of numbers,
    or just one number
    """
    if isinstance(obj, Number):  # make it a 1-tuple
        obj = (obj,)
    if not isinstance(obj, Iterable):
        raise TypeError('obj must be either a number or iterable of numbers')
    return _bar_sensible_with_iterable(obj)
Or Just Don't explicitly Type-check
Or, perhaps best of all, use duck-typing, and don't explicitly type-check your code. Duck-typing supports Liskov Substitution with more elegance and less verbosity.
def baz(obj):
    """given an obj, a dict (or anything with an .items method)
    do something sensible with each key-value pair
    """
    for key, value in obj.items():
        _baz_something_sensible(key, value)
Conclusion
Use type to actually get an instance's class.
Use isinstance to explicitly check for actual subclasses or registered abstractions.
And just avoid type-checking where it makes sense.
You can use type() or isinstance().
>>> type([]) is list
True
Be warned that you can clobber list or any other type by assigning a variable in the current scope of the same name.
>>> the_d = {}
>>> t = lambda x: "aight" if type(x) is dict else "NOPE"
>>> t(the_d)
'aight'
>>> dict = "dude."
>>> t(the_d)
'NOPE'
Above we see that dict gets reassigned to a string, therefore the test:
type({}) is dict
...fails.
To get around this and use type() more cautiously:
>>> import __builtin__
>>> the_d = {}
>>> type({}) is dict
True
>>> dict =""
>>> type({}) is dict
False
>>> type({}) is __builtin__.dict
True
Be careful using isinstance:
>>> isinstance(True, bool)
True
>>> isinstance(True, int)
True
but type:
>>> type(True) == bool
True
>>> type(True) == int
False
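If you specifically need "an int but not a bool", two common idioms follow from the behaviour above (the helper names here are illustrative):

def is_strict_int(x):
    return type(x) is int

def is_int_not_bool(x):
    return isinstance(x, int) and not isinstance(x, bool)

print(is_strict_int(True), is_int_not_bool(True))  # False False
print(is_strict_int(5), is_int_not_bool(5))        # True True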
While the question is pretty old, I stumbled across it while finding out a proper way myself, and I think it still needs clarifying, at least for Python 2.x (I did not check on Python 3, but since the issue arises with classic classes, which are gone in that version, it probably doesn't matter).
Here I'm trying to answer the title's question: how can I determine the type of an arbitrary object? Other suggestions about using or not using isinstance are fine in many comments and answers, but I'm not addressing those concerns.
The main issue with the type() approach is that it doesn't work properly with old-style instances:
class One:
    pass

class Two:
    pass

o = One()
t = Two()
o_type = type(o)
t_type = type(t)
print "Are o and t instances of the same class?", o_type is t_type
Executing this snippet would yield:
Are o and t instances of the same class? True
Which, I argue, is not what most people would expect.
The __class__ approach is the closest to correct, but it won't work in one crucial case: when the passed-in object is an old-style class (not an instance!), since such objects lack that attribute.
This is the smallest snippet of code I could think of that satisfies such legitimate question in a consistent fashion:
#!/usr/bin/env python
from types import ClassType

# we adopt the null object pattern in the (unlikely) case
# that __class__ is None for some strange reason
_NO_CLASS = object()

def get_object_type(obj):
    obj_type = getattr(obj, "__class__", _NO_CLASS)
    if obj_type is not _NO_CLASS:
        return obj_type
    # AFAIK the only situation where this happens is an old-style class
    obj_type = type(obj)
    if obj_type is not ClassType:
        raise ValueError("Could not determine object '{}' type.".format(obj_type))
    return obj_type
using type():
x = 'hello this is a string'
print(type(x))
output:
<class 'str'>
To extract only str, use this:
x = 'this is a string'
print(type(x).__name__)  # you can use __name__ to get the class name
output:
str
If you use type(variable).__name__, the result is a plain class name that is easy to read.
In many practical cases, instead of using type or isinstance you can also use functools.singledispatch, which is used to define generic functions (a function composed of multiple functions implementing the same operation for different types).
In other words, you would want to use it when you have code like the following:
def do_something(arg):
    if isinstance(arg, int):
        ...  # some code specific to processing integers
    if isinstance(arg, str):
        ...  # some code specific to processing strings
    if isinstance(arg, list):
        ...  # some code specific to processing lists
    ...  # etc
Here is a small example of how it works:
from functools import singledispatch

@singledispatch
def say_type(arg):
    raise NotImplementedError(f"I don't work with {type(arg)}")

@say_type.register
def _(arg: int):
    print(f"{arg} is an integer")

@say_type.register
def _(arg: bool):
    print(f"{arg} is a boolean")

>>> say_type(0)
0 is an integer
>>> say_type(False)
False is a boolean
>>> say_type(dict())
# long error traceback ending with:
NotImplementedError: I don't work with <class 'dict'>
Additionally, we can use abstract classes to cover several types at once:
from collections.abc import Sequence

@say_type.register
def _(arg: Sequence):
    print(f"{arg} is a sequence!")
>>> say_type([0, 1, 2])
[0, 1, 2] is a sequence!
>>> say_type((1, 2, 3))
(1, 2, 3) is a sequence!
As an aside to the previous answers, it's worth mentioning the existence of collections.abc which contains several abstract base classes (ABCs) that complement duck-typing.
For example, instead of explicitly checking if something is a list with:
isinstance(my_obj, list)
you could, if you're only interested in seeing if the object you have allows getting items, use collections.abc.Sequence:
from collections.abc import Sequence
isinstance(my_obj, Sequence)
If you're strictly interested in objects that allow getting, setting, and deleting items (i.e. mutable sequences), you'd opt for collections.abc.MutableSequence.
Many other ABCs are defined there, Mapping for objects that can be used as maps, Iterable, Callable, et cetera. A full list of all these can be seen in the documentation for collections.abc.
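A short demonstration of a few of the ABCs mentioned above:

from collections.abc import Mapping, MutableSequence, Sequence

print(isinstance({}, Mapping))              # True: dict behaves like a map
print(isinstance((1, 2), Sequence))         # True: tuples support item access
print(isinstance((1, 2), MutableSequence))  # False: tuples cannot be modified
print(isinstance([1, 2], MutableSequence))  # True: lists can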
value = 12
print(type(value))  # will print <class 'int'>
or you can do something like this:
value = 12
print(type(value) == int)  # will print True
type() is a better solution than isinstance() here, particularly for booleans:
True and False are just built-in constants that behave like 1 and 0 in Python (bool is a subclass of int). Thus,
isinstance(True, int)
and
isinstance(False, int)
both return True; both booleans are instances of int. type(), however, is stricter:
type(True) == int
returns False.
In general, you can extract the class name as a string from an object:
str_class = obj.__class__.__name__
and use it for comparison:
if str_class == 'dict':
    # blablabla..
elif str_class == 'customclass':
    # blebleble..
For the sake of completeness: isinstance will not work when you are type-checking a class itself (a subtype) rather than an instance of it. While that makes perfect sense, none of the answers (including the accepted one) covers it. Use issubclass for that.
>>> class a(list):
...     pass
...
>>> isinstance(a, list)
False
>>> issubclass(a, list)
True
