Accessing iterator in 'for in' loop - python

From my understanding, when code like the following is run:
for i in MyObject:
    print(i)
MyObject's __iter__ function is run, and the for loop uses the iterator it returns to run the loop.
is it possible to access this iterator object mid-loop? Is it a hidden local variable, or something like that?
I would like to do the following:
for i in MyObject:
    blah = forloopiterator()
    modify_blah(blah)
    print(i)
I want to do this because I am building a debugger, and I need to modify the iterator after it has been instantiated (adding an object to be iterated during this loop, mid-execution). I am aware that this is a hack and should not be done conventionally. Modifying MyObject.items (which is what the iterator is iterating over) directly doesn't work, since the iterator only evaluates it once. So I need to modify the iterator directly.

It is possible to do what you want to do, as long as you're willing to rely on multiple undocumented internals of your Python interpreter (in my case, CPython 3.7)—but it isn't going to do you any good.
The iterator is not exposed to locals, or anywhere else (not even to a debugger). But as pointed out by Patrick Haugh, you can get at it indirectly, via get_referrers. For example:
import collections.abc
import gc

# seq is the iterable the loop is consuming
for ref in gc.get_referrers(seq):
    if isinstance(ref, collections.abc.Iterator):
        break
else:
    raise RuntimeError('Oops')
Of course if you have two different iterators to the same list, I don't know if there's any way you can decide between them, but let's ignore that problem.
Now, what do you do with this? You've got an iterator over seq, and… now what? You can't replace it with something useful, like an itertools.chain(seq, [1, 2, 3]). There's no public API for mutating list, set, etc. iterators, much less arbitrary iterators.
If you happen to know it's a list iterator… well, the CPython 3.x listiterator does happen to be mutable. The way they're pickled is by calling iter on the list and then __setstate__ with the index:
>>> print(ref.__reduce__())
(<function iter>, ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],), 7)
>>> ref.__setstate__(3) # resets the iterator to index 3 instead of 7
>>> ref.__reduce__()[1][0].append(10) # adds another value
But this is all kind of silly, because you could get the same effect by just mutating the original list. In fact:
>>> ref.__reduce__()[1][0] is seq
True
So:
lst = list(range(10))
for elem in lst:
    print(elem, end=' ')
    if elem % 2:
        lst.append(elem * 2)
print()
… will print out:
0 1 2 3 4 5 6 7 8 9 2 6 10 14 18
… without having to monkey with the iterator at all.
You can't do the same thing with a set.
Mutating a set while you're in the middle of iterating it will affect the iterator, just as mutating a list will—but what it does is indeterminate. After all, sets have arbitrary order, which is only guaranteed to be consistent as long as you don't add or delete. What happens if you add or delete mid-iteration? You may get a whole different order, meaning you may end up repeating elements you already iterated, and missing ones you never saw. The documentation implies that this should be illegal in any implementation, and CPython does actually check for it:
s = set(range(10))
for elem in s:
    print(elem, end=' ')
    if elem % 2:
        s.add(elem * 2)
print()
This will just immediately raise:
RuntimeError: Set changed size during iteration
So, what happens if we use the same trick to go behind Python's back, find the set_iterator, and try to change it?
s = {1, 2, 3}
for elem in s:
    print(elem)
    for ref in gc.get_referrers(s):
        if isinstance(ref, collections.abc.Iterator):
            break
    else:
        raise RuntimeError('Oops')
    print(ref.__reduce__())
What you'll see in this case will be something like:
2
(<function iter>, ([1, 3],))
1
(<function iter>, ([3],))
3
(<function iter>, ([],))
In other words, when you pickle a set_iterator, it creates a list of the remaining elements, and gives you back instructions to build a new listiterator out of that list. Mutating that temporary list obviously has no useful effect.
What about a tuple? Obviously you can't just mutate the tuple itself, because tuples are immutable. But what about the iterator?
Under the covers, in CPython, tuple_iterator shares the same structure and code as listiterator (as does the iterator type that you get from calling iter on an "old-style sequence" type that defines __len__ and __getitem__ but not __iter__). So, you can do the exact same trick to get at the iterator and to __reduce__ it.
But once you do, ref.__reduce__()[1][0] is seq is going to be true again—in other words, it's a tuple, the same tuple you already had, and still immutable.

No, it is not possible to access this iterator (unless maybe with the Python C API, but that is just a guess). If you need it, assign it to a variable before the loop.
it = iter(MyObject)
for i in it:
    print(i)
    # do something with it
Keep in mind that manually advancing the iterator can raise a StopIteration exception.
for i in it:
    if check_skip_next_element(i):
        try:
            next(it)
        except StopIteration:
            break
The use of break here is debatable. In this position it has the same semantics as continue, but you may just use pass if you want to keep going until the end of the for-block.

If you want to insert an additional object into a loop mid-iteration in a debugger, you don't need to do it by modifying the iterator. Instead, after the end of the loop, jump to the first line of the loop body, then set the loop variable to the object you want. Here's a PDB example. With the following file:
import pdb

def f():
    pdb.set_trace()
    for i in range(5):
        print(i)

f()
I've recorded a debugging session that inserts a 15 into the loop:
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) n
> /tmp/asdf.py(6)f()
-> print(i)
(Pdb) n
0
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) j 6
> /tmp/asdf.py(6)f()
-> print(i)
(Pdb) i = 15
(Pdb) n
15
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) n
> /tmp/asdf.py(6)f()
-> print(i)
(Pdb) n
1
> /tmp/asdf.py(5)f()
-> for i in range(5):
(Pdb) c
2
3
4
(Due to a PDB bug, you have to jump, then set the loop variable. PDB will lose the change to the loop variable if you jump immediately after setting it.)

If you are not aware of the pdb debugger in Python, please give it a try; it's one of the most interactive debuggers I have come across (see the Python pdb documentation).
I am sure we can control the loop iterations manually with pdb. But altering the list midway, I am not sure. Give it a try.

To access the iterator of a given object, you can use the iter() built-in function.
>>> it = iter(MyObject)
>>> next(it)

What are the lesser-known but useful features of the Python programming language?
Try to limit answers to Python core.
One feature per answer.
Give an example and short description of the feature, not just a link to documentation.
Label the feature using a title as the first line.
Quick links to answers:
Argument Unpacking
Braces
Chaining Comparison Operators
Decorators
Default Argument Gotchas / Dangers of Mutable Default arguments
Descriptors
Dictionary default .get value
Docstring Tests
Ellipsis Slicing Syntax
Enumeration
For/else
Function as iter() argument
Generator expressions
import this
In Place Value Swapping
List stepping
__missing__ items
Multi-line Regex
Named string formatting
Nested list/generator comprehensions
New types at runtime
.pth files
ROT13 Encoding
Regex Debugging
Sending to Generators
Tab Completion in Interactive Interpreter
Ternary Expression
try/except/else
Unpacking+print() function
with statement
Chaining comparison operators:
>>> x = 5
>>> 1 < x < 10
True
>>> 10 < x < 20
False
>>> x < 10 < x*10 < 100
True
>>> 10 > x <= 9
True
>>> 5 == x > 4
True
In case you're thinking it's doing 1 < x, which comes out as True, and then comparing True < 10, which is also True, then no, that's really not what happens (see the last example.) It's really translating into 1 < x and x < 10, and x < 10 and 10 < x * 10 and x*10 < 100, but with less typing and each term is only evaluated once.
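One way to convince yourself that each term is evaluated only once is to use a term with a side effect (a throwaway example written just for this demo):
>>> calls = []
>>> def f():
...     calls.append(1)
...     return 5
...
>>> 1 < f() < 10
True
>>> len(calls)  # f() was evaluated exactly once
1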
Get the python regex parse tree to debug your regex.
Regular expressions are a great feature of python, but debugging them can be a pain, and it's all too easy to get a regex wrong.
Fortunately, python can print the regex parse tree, by passing the undocumented, experimental, hidden flag re.DEBUG (actually, 128) to re.compile.
>>> re.compile("^\[font(?:=(?P<size>[-+][0-9]{1,2}))?\](.*?)[/font]",
...            re.DEBUG)
at at_beginning
literal 91
literal 102
literal 111
literal 110
literal 116
max_repeat 0 1
  subpattern None
    literal 61
    subpattern 1
      in
        literal 45
        literal 43
      max_repeat 1 2
        in
          range (48, 57)
literal 93
subpattern 2
  min_repeat 0 65535
    any None
in
  literal 47
  literal 102
  literal 111
  literal 110
  literal 116
Once you understand the syntax, you can spot your errors. There we can see that I forgot to escape the [] in [/font].
Of course you can combine it with whatever flags you want, like commented regexes:
>>> re.compile("""
^ # start of a line
\[font # the font tag
(?:=(?P<size> # optional [font=+size]
[-+][0-9]{1,2} # size specification
))?
\] # end of tag
(.*?) # text between the tags
\[/font\] # end of the tag
""", re.DEBUG|re.VERBOSE|re.DOTALL)
enumerate
Wrap an iterable with enumerate and it will yield the item along with its index.
For example:
>>> a = ['a', 'b', 'c', 'd', 'e']
>>> for index, item in enumerate(a): print index, item
...
0 a
1 b
2 c
3 d
4 e
>>>
References:
Python tutorial—looping techniques
Python docs—built-in functions—enumerate
PEP 279
Creating generator objects
If you write
x=(n for n in foo if bar(n))
you can get out the generator and assign it to x. Now it means you can do
for n in x:
The advantage of this is that you don't need intermediate storage, which you would need if you did
x = [n for n in foo if bar(n)]
In some cases this can lead to significant speed up.
You can append many if statements to the end of the generator, basically replicating nested for loops:
>>> n = ((a,b) for a in range(0,2) for b in range(4,6))
>>> for i in n:
... print i
(0, 4)
(0, 5)
(1, 4)
(1, 5)
iter() can take a callable argument
For instance:
def seek_next_line(f):
    for c in iter(lambda: f.read(1), '\n'):
        pass
The iter(callable, until_value) function repeatedly calls callable and yields its result until until_value is returned; the sentinel value itself is not yielded.
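A common use, assuming a file object f opened beforehand, is reading fixed-size chunks until read() returns an empty string at EOF (the chunk size here is arbitrary):
def read_chunks(f, size=1024):
    # iter() calls f.read(size) repeatedly until it returns '' (EOF)
    for chunk in iter(lambda: f.read(size), ''):
        yield chunk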
Be careful with mutable default arguments
>>> def foo(x=[]):
... x.append(1)
... print x
...
>>> foo()
[1]
>>> foo()
[1, 1]
>>> foo()
[1, 1, 1]
Instead, you should use a sentinel value denoting "not given" and replace with the mutable you'd like as default:
>>> def foo(x=None):
... if x is None:
... x = []
... x.append(1)
... print x
>>> foo()
[1]
>>> foo()
[1]
Sending values into generator functions. For example having this function:
def mygen():
    """Yield 5 until something else is passed back via send()"""
    a = 5
    while True:
        f = (yield a)  # yield a and possibly get f in return
        if f is not None:
            a = f  # store the new value
You can:
>>> g = mygen()
>>> g.next()
5
>>> g.next()
5
>>> g.send(7) #we send this back to the generator
7
>>> g.next() #now it will yield 7 until we send something else
7
If you don't like using whitespace to denote scopes, you can use the C-style {} by issuing:
from __future__ import braces
The step argument in slice operators. For example:
>>> a = [1, 2, 3, 4, 5]
>>> a[::2]  # iterate over the whole list in 2-increments
[1, 3, 5]
The special case x[::-1] is a useful idiom for 'x reversed'.
>>> a[::-1]
[5, 4, 3, 2, 1]
Decorators
Decorators allow you to wrap a function or method in another function that can add functionality, modify arguments or results, etc. You write decorators one line above the function definition, beginning with an "at" sign (@).
Example shows a print_args decorator that prints the decorated function's arguments before calling it:
>>> def print_args(function):
...     def wrapper(*args, **kwargs):
...         print 'Arguments:', args, kwargs
...         return function(*args, **kwargs)
...     return wrapper
...
>>> @print_args
... def write(text):
...     print text
...
>>> write('foo')
Arguments: ('foo',) {}
foo
The for...else syntax (see http://docs.python.org/ref/for.html )
for i in foo:
    if i == 0:
        break
else:
    print("i was never 0")
The "else" block will be normally executed at the end of the for loop, unless the break is called.
The above code could be emulated as follows:
found = False
for i in foo:
    if i == 0:
        found = True
        break
if not found:
    print("i was never 0")
From 2.5 onwards dicts have a special method __missing__ that is invoked for missing items:
>>> class MyDict(dict):
... def __missing__(self, key):
... self[key] = rv = []
... return rv
...
>>> m = MyDict()
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}
There is also a dict subclass in collections called defaultdict that does pretty much the same but calls a function without arguments for not existing items:
>>> from collections import defaultdict
>>> m = defaultdict(list)
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}
I recommend converting such dicts to regular dicts before passing them to functions that don't expect such subclasses. A lot of code uses d[a_key] and catches KeyError to check whether an item exists; with these subclasses, that lookup would silently add a new item to the dict.
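To see the pitfall that last paragraph warns about, note that with defaultdict a mere lookup is enough to insert a key:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['typo']  # just looking up a missing key...
[]
>>> dict(d)  # ...has inserted it
{'typo': []}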
In-place value swapping
>>> a = 10
>>> b = 5
>>> a, b
(10, 5)
>>> a, b = b, a
>>> a, b
(5, 10)
The right-hand side of the assignment is an expression that creates a new tuple. The left-hand side of the assignment immediately unpacks that (unreferenced) tuple to the names a and b.
After the assignment, the new tuple is unreferenced and marked for garbage collection, and the values bound to a and b have been swapped.
As noted in the Python tutorial section on data structures,
Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.
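The same packing and unpacking works beyond swapping; for example:
>>> t = 1, 2, 3  # packing into a tuple
>>> a, b, c = t  # unpacking into names
>>> b
2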
Readable regular expressions
In Python you can split a regular expression over multiple lines, name your matches and insert comments.
Example verbose syntax (from Dive into Python):
>>> pattern = """
... ^ # beginning of string
... M{0,4} # thousands - 0 to 4 M's
... (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
... # or 500-800 (D, followed by 0 to 3 C's)
... (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
... # or 50-80 (L, followed by 0 to 3 X's)
... (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
... # or 5-8 (V, followed by 0 to 3 I's)
... $ # end of string
... """
>>> re.search(pattern, 'M', re.VERBOSE)
Example naming matches (from Regular Expression HOWTO)
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
You can also verbosely write a regex without using re.VERBOSE thanks to string literal concatenation.
>>> pattern = (
... "^" # beginning of string
... "M{0,4}" # thousands - 0 to 4 M's
... "(CM|CD|D?C{0,3})" # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
... # or 500-800 (D, followed by 0 to 3 C's)
... "(XC|XL|L?X{0,3})" # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
... # or 50-80 (L, followed by 0 to 3 X's)
... "(IX|IV|V?I{0,3})" # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
... # or 5-8 (V, followed by 0 to 3 I's)
... "$" # end of string
... )
>>> print pattern
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
Function argument unpacking
You can unpack a list or a dictionary as function arguments using * and **.
For example:
def draw_point(x, y):
    # do some magic
    pass

point_foo = (3, 4)
point_bar = {'y': 3, 'x': 2}

draw_point(*point_foo)
draw_point(**point_bar)
Very useful shortcut since lists, tuples and dicts are widely used as containers.
ROT13 is a valid encoding for source code, when you use the right coding declaration at the top of the code file:
#!/usr/bin/env python
# -*- coding: rot13 -*-
cevag "Uryyb fgnpxbiresybj!".rapbqr("rot13")
Creating new types in a fully dynamic manner
>>> NewType = type("NewType", (object,), {"x": "hello"})
>>> n = NewType()
>>> n.x
'hello'
which is exactly the same as
>>> class NewType(object):
...     x = "hello"
...
>>> n = NewType()
>>> n.x
'hello'
Probably not the most useful thing, but nice to know.
Context managers and the "with" Statement
Introduced in PEP 343, a context manager is an object that acts as a run-time context for a suite of statements.
Since the feature makes use of new keywords, it is introduced gradually: it is available in Python 2.5 via the __future__ directive. Python 2.6 and above (including Python 3) has it available by default.
I have used the "with" statement a lot because I think it's a very useful construct, here is a quick demo:
from __future__ import with_statement
with open('foo.txt', 'w') as f:
    f.write('hello!')
What's happening here behind the scenes, is that the "with" statement calls the special __enter__ and __exit__ methods on the file object. Exception details are also passed to __exit__ if any exception was raised from the with statement body, allowing for exception handling to happen there.
What this does for you in this particular case is that it guarantees that the file is closed when execution falls out of scope of the with suite, regardless if that occurs normally or whether an exception was thrown. It is basically a way of abstracting away common exception-handling code.
Other common use cases for this include locking with threads and database transactions.
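For example, a lock can be used the same way; threading.Lock supports the context manager protocol, so it is acquired on entry and released on exit (a minimal sketch; do_something() is a hypothetical critical-section function):
import threading

lock = threading.Lock()

with lock:
    # the lock is held here
    do_something()  # hypothetical work inside the critical section
# the lock is released here, even if do_something() raised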
Dictionaries have a get() method
Dictionaries have a 'get()' method. If you do d['key'] and key isn't there, you get an exception. If you do d.get('key'), you get back None if 'key' isn't there. You can add a second argument to get that item back instead of None, eg: d.get('key', 0).
It's great for things like adding up numbers:
sum[value] = sum.get(value, 0) + 1
Descriptors
They're the magic behind a whole bunch of core Python features.
When you use dotted access to look up a member (eg, x.y), Python first looks for the member in the instance dictionary. If it's not found, it looks for it in the class dictionary. If it finds it in the class dictionary, and the object implements the descriptor protocol, then instead of just returning it, Python executes it. A descriptor is any object whose class implements the __get__, __set__, or __delete__ methods.
Here's how you'd implement your own (read-only) version of property using descriptors:
class Property(object):
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, type):
        if obj is None:
            return self
        return self.fget(obj)
and you'd use it just like the built-in property():
class MyClass(object):
    @Property
    def foo(self):
        return "Foo!"
Descriptors are used in Python to implement properties, bound methods, static methods, class methods and slots, amongst other things. Understanding them makes it easy to see why a lot of things that previously looked like Python 'quirks' are the way they are.
Raymond Hettinger has an excellent tutorial that does a much better job of describing them than I do.
Conditional Assignment
x = 3 if (y == 1) else 2
It does exactly what it sounds like: "assign 3 to x if y is 1, otherwise assign 2 to x". Note that the parens are not necessary, but I like them for readability. You can also chain it if you have something more complicated:
x = 3 if (y == 1) else 2 if (y == -1) else 1
Though at a certain point, it goes a little too far.
Note that you can use if ... else in any expression. For example:
(func1 if y == 1 else func2)(arg1, arg2)
Here func1 will be called if y is 1 and func2, otherwise. In both cases the corresponding function will be called with arguments arg1 and arg2.
Analogously, the following is also valid:
x = (class1 if y == 1 else class2)(arg1, arg2)
where class1 and class2 are two classes.
Doctest: documentation and unit-testing at the same time.
Example extracted from the Python documentation:
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    """
    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()
Named formatting
% -formatting takes a dictionary (also applies %i/%s etc. validation).
>>> print "The %(foo)s is %(bar)i." % {'foo': 'answer', 'bar':42}
The answer is 42.
>>> foo, bar = 'question', 123
>>> print "The %(foo)s is %(bar)i." % locals()
The question is 123.
And since locals() is also a dictionary, you can simply pass it and have %-substitutions from your local variables. I think this is frowned upon, but it simplifies things.
New Style Formatting
>>> print("The {foo} is {bar}".format(foo='answer', bar=42))
To add more Python modules (especially 3rd party ones), most people seem to use PYTHONPATH environment variables or they add symlinks or directories in their site-packages directories. Another way is to use *.pth files. Here's the official Python docs' explanation:
"The most convenient way [to modify
python's search path] is to add a path
configuration file to a directory
that's already on Python's path,
usually to the .../site-packages/
directory. Path configuration files
have an extension of .pth, and each
line must contain a single path that
will be appended to sys.path. (Because
the new paths are appended to
sys.path, modules in the added
directories will not override standard
modules. This means you can't use this
mechanism for installing fixed
versions of standard modules.)"
Exception else clause:
try:
    put_4000000000_volts_through_it(parrot)
except Voom:
    print "'E's pining!"
else:
    print "This parrot is no more!"
finally:
    end_sketch()
The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try ... except statement.
See http://docs.python.org/tut/node10.html
Re-raising exceptions:
# Python 2 syntax
try:
    some_operation()
except SomeError, e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)

# Python 3 syntax
try:
    some_operation()
except SomeError as e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)
The 'raise' statement with no arguments inside an error handler tells Python to re-raise the exception with the original traceback intact, allowing you to say "oh, sorry, sorry, I didn't mean to catch that, sorry, sorry."
If you wish to print, store or fiddle with the original traceback, you can get it with sys.exc_info(), and printing it like Python would is done with the 'traceback' module.
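A minimal sketch of that last point, showing the two standard tools side by side (some_operation() is the same hypothetical function as above):
import sys
import traceback

try:
    some_operation()  # hypothetical function that may raise
except Exception:
    exc_type, exc_value, exc_tb = sys.exc_info()  # the original exception info
    traceback.print_exc()  # prints the traceback just as Python would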
Main messages :)
import this
# btw look at this module's source :)
Deciphered:
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Interactive Interpreter Tab Completion
try:
    import readline
except ImportError:
    print "Unable to load readline module."
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")
>>> class myclass:
... def function(self):
... print "my function"
...
>>> class_instance = myclass()
>>> class_instance.<TAB>
class_instance.__class__ class_instance.__module__
class_instance.__doc__ class_instance.function
>>> class_instance.f<TAB>unction()
You will also have to set a PYTHONSTARTUP environment variable.
Nested list comprehensions and generator expressions:
[(i,j) for i in range(3) for j in range(i) ]
((i,j) for i in range(4) for j in range(i) )
These can replace huge chunks of nested-loop code.
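For instance, the first expression above is equivalent to this nested loop, written out to make the iteration order explicit:
result = []
for i in range(3):
    for j in range(i):
        result.append((i, j))
# result == [(1, 0), (2, 0), (2, 1)]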
Operator overloading for the set builtin:
>>> a = set([1,2,3,4])
>>> b = set([3,4,5,6])
>>> a | b # Union
{1, 2, 3, 4, 5, 6}
>>> a & b # Intersection
{3, 4}
>>> a < b # Subset
False
>>> a - b # Difference
{1, 2}
>>> a ^ b # Symmetric Difference
{1, 2, 5, 6}
More detail from the standard library reference: Set Types

Is calling next(iter) inside for i in iter supported in python?

Question: Python list iterator behavior and next(iterator) documents the fact that calling
for i in iter:
    ...
    next(iter)
has the effect of skipping ahead in the for loop. Is this defined behavior that I can rely on (e.g. in some PEP), or merely an accident of implementation that could change without warning?
It depends on what iter is. If it’s an iterator created with a call of iter() on an iterable, then yes, calling next(iter) will also advance the iterator:
>>> it = iter(range(4))
>>> for x in it:
...     print(x)
...     _ = next(it)
...
0
2
It’s important that it is an iterator (instead of just any iterable) since the for loop will internally also call iter() on that object. Since iterators (usually) return themselves when iter(some_iterator) is called, this works fine.
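You can check that property directly; for an iterator, iter() hands back the very same object, while for a plain iterable it builds a new iterator:
>>> it = iter(range(4))
>>> iter(it) is it  # iterators return themselves
True
>>> lst = [1, 2, 3]
>>> iter(lst) is lst  # a list is iterable but not an iterator
False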
Is this a good idea though? Generally not since it can easily break:
>>> it = iter(range(3))
>>> for x in it:
...     print(x)
...     _ = next(it)
...
0
2
Traceback (most recent call last):
  File "<pyshell#21>", line 3, in <module>
    _ = next(it)
StopIteration
You would have to add exception handling for the StopIteration exception manually inside the loop; and that gets messy very easily. Of course you could also use a default value, e.g. next(it, None) instead, but that becomes confusing quickly, especially when you’re not actually using the value for anything.
This whole concept also breaks as soon as someone later decides to not use an iterator but some other iterable in the for loop (for example because they are refactoring the code to use a list there, and then everything breaks).
You should attempt to let the for loop be the only one that consumes the iterator. Change your logic so that you can easily determine whether or not an iteration should be skipped: If you can, filter the iterable first, before starting the loop. Otherwise, use conditionals to skip iterations within the loop (using continue):
>>> it = filter(lambda x: x % 2 == 0, range(3))
>>> for x in it:
...     print(x)
...
0
2
>>> it = range(3)  # no need for an iterator here
>>> for x in it:
...     if x % 2 == 1:
...         continue
...     print(x)
...
0
2
A for loop always calls next on the iterator it is looping over.
If you call next() yourself before the for loop does, it has the effect of skipping the (next) item.
So, that being said, you can rely on this in your program, as long as the for loop is consuming the same iterator.
As of 3.6 it is still implemented the same way, so there is no need to worry.

Python for loop and iterator behavior

I wanted to understand a bit more about iterators, so please correct me if I'm wrong.
An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by references instead of using indexing.
However I still don't understand why is the following behavior happening:
In [1]: iter = (i for i in range(5))
In [2]: for _ in iter:
....: print _
....:
0
1
2
3
4
In [3]: for _ in iter:
....: print _
....:
In [4]:
After a first loop through the iterator (In [2]) it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.
However I never assigned a new value to the iter variable.
What is really happening under the hood of the for loop?
Your suspicion is correct: the iterator has been consumed.
In actuality, your iterator is a generator, which is an object which has the ability to be iterated through only once.
type((i for i in range(5))) # says it's type generator
def another_generator():
    yield 1  # the yield expression makes it a generator, not a function

type(another_generator())  # also a generator
The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; all of the items are not generated at once. In fact, you can have an infinite generator:
def my_gen():
    while True:
        yield 1  # again: yield means it is a generator, not a function

for _ in my_gen():
    print(_)  # hit ctrl+c to stop this infinite loop!
Some other corrections to help improve your understanding:
The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
The keyword combination for in accepts an iterable object as its second argument.
The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it shadows the built-in function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
If the call to __iter__ is successful, the function next() is applied to the iterable object over and over again, in a loop, and the first variable supplied to for in is assigned the result of the next() function. (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
The for loop ends when next() raises the StopIteration exception (which usually happens when the iterable does not have another object to yield when next() is called).
You can "manually" implement a for loop in python this way (probably not perfect, but close enough):
try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue
There is pretty much no difference between the above and your example code.
Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.
When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:
1 in [1, 2, 3] # True
'He' in 'Hello' # True
3 in range(10) # True
'eH' in 'Hello'[::-1] # True
If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in then tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on1. A generator is just one type of iterator.
If the call to __iter__ is successful, the in keyword applies the function next() to the iterable object over and over again. (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method).
If the object doesn't have a __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method2.
If all of the above attempts fail, you'll get a TypeError exception.
If you wish to create your own object type to iterate over (i.e, you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).
class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m:
    print(_)  # 1
1 in m  # True
The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
If you wish to create your own container object type (i.e, you can use in on it by itself, but NOT for in), you just need the __contains__ method.
class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m  # True
'Foo' in m  # True
TypeError in m  # True
None in m  # True
1 Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually next (no underscores) in Python 2.
2 See this answer for the different ways to create iterable classes.
A for loop basically calls the next method of the object it is applied to (__next__ in Python 3).
You can simulate this simply by doing:
it = (i for i in range(5))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
# this prints 0 1 2 3 4
At this point there is no next element in the input object, so doing this:
print(next(it))
will raise a StopIteration exception. At this point for will stop. An iterator can be any object which responds to the next() function and throws the exception when there are no more elements. It does not have to be a pointer or reference (there are no such things in Python in the C/C++ sense anyway), linked list, etc.
There is an iterator protocol in python that defines how the for statement will behave with lists and dicts, and other things that can be looped over.
It's in the python docs here and here.
The way the iterator protocol typically works is in the form of a Python generator. We yield a value as long as we have one; when the generator returns, StopIteration is raised for us automatically (raising it explicitly inside a generator is unnecessary, and since Python 3.7 it is converted into a RuntimeError per PEP 479).
So let's write our own iterator:
def my_iter():
    yield 1
    yield 2
    yield 3

for i in my_iter():
    print i
The result is:
1
2
3
A couple of things to note about that. The my_iter is a function. my_iter() returns an iterator.
If I had written using iterator like this instead:
j = my_iter() #j is the iterator that my_iter() returns
for i in j:
print i #this loop runs until the iterator is exhausted
for i in j:
print i #the iterator is exhausted so we never reach this line
And the result is the same as above. The iterator is exhausted by the time we enter the second for loop.
But that's rather simplistic. What about something more complicated, say yielding inside a loop?
def capital_iter(name):
    for x in name:
        yield x.upper()

for y in capital_iter('bobert'):
    print y
And when it runs, we use the iterator on the string type (iterating over strings is built in). This in turn allows us to run a for loop on it, and yield the results until we are done.
B
O
B
E
R
T
So now this begs the question: what happens between yields in the iterator?
j = capital_iter("bobert")
print i.next()
print i.next()
print i.next()
print("Hey there!")
print i.next()
print i.next()
print i.next()
print i.next() #Raises StopIteration
The answer is the function is paused at the yield waiting for the next call to next().
B
O
B
Hey there!
E
R
T
Traceback (most recent call last):
File "", line 13, in
StopIteration
Some additional details about the behaviour of iter() with __getitem__ classes that lack their own __iter__ method.
Before __iter__ there was __getitem__. If the __getitem__ works with ints from 0 to len(obj)-1, then iter() supports these objects. It will construct a new iterator that repeatedly calls __getitem__ with 0, 1, 2, ... until it gets an IndexError, which it converts to a StopIteration.
See this answer for more details of the different ways to create an iterator.
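A minimal sketch of such an old-style sequence (an invented example): for works on it even though it defines no __iter__, because iter() falls back to calling __getitem__ with 0, 1, 2, ...:
class Squares:
    def __init__(self, n):
        self.n = n
    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError  # iter() converts this to StopIteration
        return index * index

for x in Squares(4):
    print(x)  # prints 0, 1, 4, 9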
Excerpt from the Python Practice book:
5. Iterators & Generators
5.1. Iterators
We use the for statement for looping over a list.
>>> for i in [1, 2, 3, 4]:
... print i,
...
1
2
3
4
If we use it with a string, it loops over its characters.
>>> for c in "python":
... print c
...
p
y
t
h
o
n
If we use it with a dictionary, it loops over its keys.
>>> for k in {"x": 1, "y": 2}:
... print k
...
y
x
If we use it with a file, it loops over lines of the file.
>>> for line in open("a.txt"):
... print line,
...
first line
second line
So there are many types of objects which can be used with a for loop. These are called iterable objects.
There are many functions which consume these iterables.
>>> ",".join(["a", "b", "c"])
'a,b,c'
>>> ",".join({"x": 1, "y": 2})
'y,x'
>>> list("python")
['p', 'y', 't', 'h', 'o', 'n']
>>> list({"x": 1, "y": 2})
['y', 'x']
5.1.1. The Iteration Protocol
The built-in function iter takes an iterable object and returns an iterator.
>>> x = iter([1, 2, 3])
>>> x
<listiterator object at 0x1004ca850>
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Each call to the next method on the iterator gives us the next element. If there are no more elements, it raises a StopIteration.
Iterators are implemented as classes. Here is an iterator that works like the built-in xrange function.
class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()
The __iter__ method is what makes an object iterable. Behind the scenes, the iter function calls the __iter__ method on the given object.
The return value of __iter__ is an iterator. It should have a next method and raise StopIteration when there are no more elements.
Let's try it out:
>>> y = yrange(3)
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 14, in next
StopIteration
Many built-in functions accept iterators as arguments.
>>> list(yrange(5))
[0, 1, 2, 3, 4]
>>> sum(yrange(5))
10
In the above case, both the iterable and iterator are the same object. Notice that the __iter__ method returned self. That need not always be the case:
class zrange:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return zrange_iter(self.n)

class zrange_iter:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        # Iterators are iterables too.
        # Adding this function makes them so.
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()
If both the iterable and iterator are the same object, it is consumed in a single iteration.
>>> y = yrange(5)
>>> list(y)
[0, 1, 2, 3, 4]
>>> list(y)
[]
>>> z = zrange(5)
>>> list(z)
[0, 1, 2, 3, 4]
>>> list(z)
[0, 1, 2, 3, 4]
5.2. Generators
Generators simplify the creation of iterators. A generator is a function that produces a sequence of results instead of a single value.
def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1
Each time the yield statement is executed the function generates a new value.
>>> y = yrange(3)
>>> y
<generator object yrange at 0x401f30>
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
So a generator is also an iterator. You don’t have to worry about the iterator protocol.
The word “generator” is confusingly used to mean both the function that generates and what it generates. In this chapter, I’ll use the word “generator” to mean the generated object and “generator function” to mean the function that generates it.
Can you think about how it is working internally?
When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is called for the first time, the function starts executing until it reaches a yield statement. The yielded value is returned by that call to next.
The following example demonstrates the interplay between yield and calls to the next method on a generator object.
>>> def foo():
... print "begin"
... for i in range(3):
... print "before yield", i
... yield i
... print "after yield", i
... print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0
0
>>> f.next()
after yield 0
before yield 1
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Let's see an example:
def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1

def squares():
    for i in integers():
        yield i * i

def take(n, seq):
    """Returns first n values from the given sequence."""
    seq = iter(seq)
    result = []
    try:
        for i in range(n):
            result.append(seq.next())
    except StopIteration:
        pass
    return result

print take(5, squares())  # prints [1, 4, 9, 16, 25]
Concept 1
All generators are iterators, but not all iterators are generators.
Concept 2
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
Concept 3
Quoting from the wiki:
Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.
In your case
>>> it = (i for i in range(5))
>>> type(it)
<type 'generator'>
>>> callable(getattr(it, 'iter', None))
False
>>> callable(getattr(it, 'next', None))
True

Lazy evaluation in Python

What is lazy evaluation in Python?
One website said :
In Python 3.x the range() function returns a special range object which computes elements of the list on demand (lazy or deferred evaluation):
>>> r = range(10)
>>> print(r)
range(0, 10)
>>> print(r[3])
3
What is meant by this?
The object returned by range() (or xrange() in Python2.x) is known as a lazy iterable.
Instead of storing the entire range, [0,1,2,..,9], in memory, the range object stores a definition for (i=0; i<10; i+=1) and computes the next value only when needed (AKA lazy evaluation).
Essentially, a generator allows you to return a list like structure, but here are some differences:
A list stores all elements when it is created. A generator generates the next element when it is needed.
A list can be iterated over as much as you need, a generator can only be iterated over exactly once (see the demo right after this list).
A list can get elements by index, a generator cannot -- it only generates values once, from start to end.
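A quick demonstration of the once-only point from the list above:
>>> gen = (x * x for x in range(3))
>>> list(gen)  # the first pass consumes the generator
[0, 1, 4]
>>> list(gen)  # the second pass finds it exhausted
[]
>>> lst = [x * x for x in range(3)]
>>> list(lst), list(lst)  # a list can be traversed repeatedly
([0, 1, 4], [0, 1, 4])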
A generator can be created in two ways:
(1) Very similar to a list comprehension:
# this is a list, create all 5000000 x/2 values immediately, uses []
lis = [x/2 for x in range(5000000)]
# this is a generator, creates each x/2 value only when it is needed, uses ()
gen = (x/2 for x in range(5000000))
(2) As a function, using yield to return the next value:
# this is also a generator, it will run until a yield occurs, and return that result.
# on the next call it picks up where it left off and continues until a yield occurs...
def divby2(n):
    num = 0
    while num < n:
        yield num / 2
        num += 1

# same as (x/2 for x in range(5000000))
print divby2(5000000)
Note: Even though range(5000000) is lazy in Python 3.x, [x/2 for x in range(5000000)] is still a list. range(...) does its job and generates x one at a time, but the entire list of x/2 values will be computed when this list is created.
In a nutshell, lazy evaluation means that the object is evaluated when it is needed, not when it is created.
In Python 2, range will return a list. This means that if you give it a large number, the entire range will be calculated and returned at creation time:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
Only when you consume it, will it actually be evaluated - in other words, it will only return the numbers in the range when you actually need them.
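One way to make the laziness visible is memory use: the range object stays tiny no matter how large the range, while the materialized list grows with it (the exact byte counts below are indicative only; they vary by Python version and platform):
>>> import sys
>>> sys.getsizeof(range(1000000))  # constant-size lazy object
48
>>> sys.getsizeof(list(range(1000000)))  # fully materialized list
8448728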
A GitHub repo named python-patterns and Wikipedia tell us what lazy evaluation is: it delays the evaluation of an expression until its value is needed and avoids repeated evaluations.
range in python3 is not a complete lazy evaluation, because it doesn't avoid repeated eval.
A more classic example for lazy evaluation is cached_property:
import functools
class cached_property(object):
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __get__(self, obj, type_):
        if obj is None:
            return self
        val = self.function(obj)
        obj.__dict__[self.function.__name__] = val
        return val
cached_property (a.k.a. lazy_property) is a decorator which converts a function into a lazily evaluated property. The first time the property is accessed, the function is called to get the result, and then the cached value is used on every later access.
e.g.:
class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def load_log_file(self):
        with open(self.file_path) as f:
            # the file is so big that it takes 2s to read it all
            return f.read()

log_handler = LogHandler('./sys.log')
# only the first access will cost 2s.
print(log_handler.load_log_file)
# the return value is cached on the log_handler obj.
print(log_handler.load_log_file)
To use a proper term, a Python object like range is designed around the call-by-need pattern, rather than full lazy evaluation.

Please explain Python for loops to a C++/C#/Java programmer

I have had experience in Java/C#/C++, and for loops are pretty much, if not exactly, the same in those languages. Now I'm learning Python through Codecademy, and I find the way it tries to explain for loops poor. The code they give you is
my_list = [1,9,3,8,5,7]
for number in my_list:
# Your code here
print 2 * number
Is this saying for every number in my_list ... print 2 * number.
That somewhat makes sense to me if that's true, but I don't get number and how that works. It's not even a variable declared earlier. Are you declaring a variable within the for loop? And how does Python know that number is accessing the values within my_list and multiplying them by 2? Also, how do for loops work with things other than lists? I've looked at other Python code that contains for loops and they make no sense. Could you please find some way to explain how these are similar to something like C# for loops, or just explain Python for loops in general?
Yes, number is a newly defined variable. Python does not require variables to be declared before using them. And the understanding of the loop iteration is correct.
This is the same syntax Bourne-style shells (such as bash) use.
The logic of the for loop is this: assign the named variable the next value in the list, iterate, repeat.
Correction: as for other non-list values, they translate into a sequence in Python. Try this:
val = "1 2 3"
for number in val:
    print number
Note this prints "1", " ", "2", " ", "3".
Here's a useful reference: http://www.tutorialspoint.com/python/python_for_loop.htm.
The quick answer, to relate to C#, is that a Python for loop is roughly equivalent to a C# foreach loop. C++ sort of has similar facilities (BOOST_FOREACH for example, or the for syntax in C++11), but C does not have an equivalent.
There is no equivalent in Python of the C-style for (initial; condition; increment) style loop.
Python for loops can iterate over more than just lists; they can iterate over anything that is iterable. See for example What makes something iterable in python.
Python doesn't need variables to be declared; a variable is created when you first assign to it.
while loops are similar to those in other languages, but the for loop is quite different in Python.
You can use it on a list, similar to a for-each loop,
but for another purpose, like counting over a range of numbers, you can use:
for number in range(10):
    print number
Python for loops should be pretty similar to C# foreach loops. It steps through my_list, and at each step you can use number to refer to the current element of the list.
If you want to access list indices as well as list elements while you are iterating, the usual idiom is to use the enumerate function:
for (i, x) in enumerate(my_list):
    print "the", i, "number in the list is", x
The foreach loop should be similar to the following desugared code:
my_iterator = iter(my_list)
while True:
    try:
        number = my_iterator.next()
        # Your code here
        print 2 * number
    except StopIteration:
        break
Pretty similar to this Java loop: Java for loop syntax: "for (T obj : objects)"
In Python there's no need to declare variable types; that's why number has no declared type.
In Python you don't need to declare variables. In this case the number variable is defined by using it in the loop.
As for the loop construct itself, it's similar to the C++11 range-based for loop:
std::vector<int> my_list = { 1, 9, 3, 8, 5, 7 };
for (auto& number : my_list)
    std::cout << 2 * number << '\n';
This can of course be implemented pre-C++11 using std::for_each with a suitable functor object (which may of course be a C++11 lambda expression).
Python does not have an equivalent to the normal C-style for loop.
I will try to explain the Python for loop to you in as basic a way as possible:
Let's say we have a list:
a = [1, 2, 3, 4, 5]
Before we jump into the for loop, let me tell you that we don't have to declare a variable's type in Python when creating it.
int a or str a is not required.
Let's go to for loop now.
for i in a:
    print 2 * i
Now, what does it do?
The loop will start from the first element, so
i is replaced by 1, and it is multiplied by 2 and displayed. After it's done with 1, it will jump to 2.
Regarding your other question:
Python determines a variable's type during execution:
>>> a = ['a', 'b', 'c']
>>> for i in a:
... print 2*i
...
aa
bb
cc
>>>
Python uses protocols (duck-typing with specially named methods, pre and post-fixed with double underscores). The equivalent in Java would be an interface or an abstract base class.
In this case, anything in Python which implements the iterator protocol can be used in a for loop:
class TheStandardProtocol(object):
    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.i += 1
        if self.i > 15:
            raise StopIteration()
        return self.i

    # In Python 2 `next` is the only protocol method without double underscores
    next = __next__

class TheListProtocol(object):
    """A less common option, but still valid"""
    def __getitem__(self, index):
        if index > 15:
            raise IndexError()
        return index
We can then use instances of either class in a for loop and everything will work correctly:
standard = TheStandardProtocol()
for i in standard:
    # `__iter__` invoked to get the iterator
    # `__next__` invoked and its return value bound to `i`
    # until the underlying iterator returned by `__iter__`
    # raises a StopIteration exception
    print i
    # prints 1 to 15

list_protocol = TheListProtocol()
for x in list_protocol:
    # `__getitem__` is invoked with ascending integers
    # and the return value bound to `x`
    # until the instance raises an IndexError
    print x
    # prints 0 to 15
The equivalent in Java is the Iterable and Iterator interface:
class MyIterator implements Iterable<Integer>, Iterator<Integer> {
    private Integer i = 0;

    public Iterator<Integer> iterator() {
        return this;
    }

    public boolean hasNext() {
        return i < 16;
    }

    public Integer next() {
        return i++;
    }
}

// Elsewhere
MyIterator anIterator = new MyIterator();
for (Integer x : anIterator) {
    System.out.println(x.toString());
}
