How to prevent iterator getting exhausted? [duplicate] - python

If I create two lists and zip them
a=[1,2,3]
b=[7,8,9]
z=zip(a,b)
Then I typecast z into two lists
l1=list(z)
l2=list(z)
Then the contents of l1 turn out to be fine, [(1,7),(2,8),(3,9)], but the contents of l2 are just [].
I guess this is the general behavior of python with regards to iterables. But as a novice programmer migrating from the C family, this doesn't make sense to me. Why does it behave in such a way? And is there a way to get past this problem?
I mean, yeah in this particular example, I can just copy l1 into l2, but in general is there a way to 'reset' whatever Python uses to iterate 'z' after I iterate it once?

There's no way to "reset" a generator. However, you can use itertools.tee to "copy" an iterator.
>>> import itertools
>>> z = zip(a, b)
>>> zip1, zip2 = itertools.tee(z)
>>> list(zip1)
[(1, 7), (2, 8), (3, 9)]
>>> list(zip2)
[(1, 7), (2, 8), (3, 9)]
This involves caching values, so it only makes sense if you're iterating through both iterables at about the same rate. (In other words, don't use it the way I have here!)
Another approach is to pass around the generator function, and call it whenever you want to iterate it.
def gen(x):
    for i in range(x):
        yield i ** 2

def make_two_lists(gen):
    return list(gen()), list(gen())
But now you have to bind the arguments to the generator function when you pass it. You can use lambda for that, but a lot of people find lambda ugly. (Not me though! YMMV.)
>>> make_two_lists(lambda: gen(10))
([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
I hope it goes without saying that under most circumstances, it's better just to make a list and copy it.
Also, as a more general way of explaining this behavior, consider this. The point of a generator is to produce a series of values, while maintaining some state between iterations. Now, at times, instead of simply iterating over a generator, you might want to do something like this:
z = zip(a, b)
while some_condition():
    fst = next(z, None)
    snd = next(z, None)
    do_some_things(fst, snd)
    if fst is None and snd is None:
        do_some_other_things()
Let's say this loop may or may not exhaust z. Now we have a generator in an indeterminate state! So it's important, at this point, that the behavior of a generator is restrained in a well-defined way. Although we don't know where the generator is in its output, we know that a) all subsequent accesses will produce later values in the series, and b) once it's "empty", we've gotten all the items in the series exactly once. The more ability we have to manipulate the state of z, the harder it is to reason about it, so it's best that we avoid situations that break those two promises.
Of course, as Joel Cornett points out below, it is possible to write a generator that accepts messages via the send method; and it would be possible to write a generator that could be reset using send. But note that in that case, all we can do is send a message. We can't directly manipulate the generator's state, and so all changes to the state of the generator are well-defined (by the generator itself -- assuming it was written correctly!). send is really for implementing coroutines, so I wouldn't use it for this purpose. Everyday generators almost never do anything with values sent to them -- I think for the very reasons I give above.
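To make the send-based idea concrete, here is a minimal sketch of a generator that restarts when it receives a message. The name resettable_squares and the "reset" message are invented for this illustration; as noted above, everyday generators are not written this way.

```python
def resettable_squares(n):
    """Yield 0**2, 1**2, ... (n-1)**2; restart if "reset" is sent in.

    Hypothetical example: the "reset" protocol is an assumption made
    for this sketch, not a convention of the language.
    """
    i = 0
    while i < n:
        command = yield i ** 2   # plain next() makes this evaluate to None
        if command == "reset":
            i = 0                # the send() call itself returns the first value again
        else:
            i += 1


g = resettable_squares(3)
first = next(g)              # first value of the series
second = next(g)             # second value
restarted = g.send("reset")  # the generator decides what "reset" means
remaining = list(g)          # continues from the restarted position
print(first, second, restarted, remaining)
```

The caller can only send a message; how the state changes is still decided entirely inside the generator.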

If you need two copies of the list, which you do if you need to modify them, then I suggest you make the list once, and then copy it:
a=[1,2,3]
b=[7,8,9]
l1 = list(zip(a,b))
l2 = l1[:]

Just create a list out of your iterator using list() once, and use it afterwards.
It just happens that zip (in Python 3) returns an iterator, which you can only iterate over once.
You can iterate a list as many times as you want.

No, there is no way to "reset them".
Generators generate their output once, one by one, on demand, and then are done when the output is exhausted.
Think of them like reading a file, once you are through, you'll have to restart if you want to have another go at the data.
If you need to keep the generator's output around, then consider storing it, for instance in a list, and then re-use it as often as you need. (This is somewhat similar to the trade-off in Python 2 between xrange(), which produces its values lazily, and range(), which created a whole list of items in memory.)
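A small sketch of that behaviour, using the lists from the question (the variable names first_pass, second_pass and leftover are only for illustration):

```python
a = [1, 2, 3]
b = [7, 8, 9]

z = zip(a, b)
first_pass = list(z)     # consumes every pair
second_pass = list(z)    # nothing is left, so this is empty

# Asking an exhausted iterator for another item raises StopIteration,
# unless a default value is passed to next().
leftover = next(z, "exhausted")

# Storing the output once makes it reusable as often as needed.
pairs = list(zip(a, b))
copy_of_pairs = pairs[:]

print(first_pass, second_pass, leftover)
```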

Yet another explanation. As a programmer, you probably understand the difference between classes and instances (i.e. objects). The official documentation lists zip() among the built-in functions. Actually, in Python 3 it is a built-in class. You can check this in interactive mode:
>>> zip
<class 'zip'>
Classes are types, so the following should also be clear:
>>> type(zip)
<class 'type'>
Your z is an instance of the class, and you can think of calling zip() as calling the class constructor:
>>> a = [1, 2, 3]
>>> b = [7, 8, 9]
>>> z = zip(a, b)
>>> z
<zip object at 0x0000000002342AC8>
>>> type(z)
<class 'zip'>
z is an iterator (object) that keeps inside it the iterators for a and b. Because of its generic implementation, z (or the zip class) has no means to reset the iterators over a or b or whatever sequences were passed in. Because of that, there is no way to reset z. The cleanest way to solve your concrete problem is to copy the list (as you mentioned in the question and as Lennart Regebro suggests). Another understandable way is to call zip(a, b) twice, thus constructing two z-like iterators that behave the same way from the start:
>>> lst1 = list(zip(a, b))
>>> lst2 = list(zip(a, b))
However, this cannot be used generally with the identical result. Think about a or b being unique sequences generated based on some current conditions (say temperatures read from several thermometers).
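If the inputs are ordinary sequences, the "call zip twice" idea can be wrapped in a small re-iterable object. This is only a sketch: the class name Rezip is made up here, and it inherits the same limitation, since it gives identical results only when the underlying inputs can themselves be iterated repeatedly.

```python
class Rezip:
    """Hypothetical helper: an iterable that builds a fresh zip on every loop."""

    def __init__(self, *iterables):
        # Keep references to the inputs, not to a zip iterator.
        self._iterables = iterables

    def __iter__(self):
        # Each call to iter() constructs a new zip object from the start.
        return zip(*self._iterables)


a = [1, 2, 3]
b = [7, 8, 9]
z = Rezip(a, b)
l1 = list(z)
l2 = list(z)   # not empty this time: a new zip was created
print(l1, l2)
```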

Related

Pythonic way to unpack an iterator inside of a list

I'm trying to figure out what is the pythonic way to unpack an iterator inside of a list.
For example:
my_iterator = zip([1, 2, 3, 4], [1, 2, 3, 4])
I have come up with the following ways to unpack my iterator inside of a list:
1)
my_list = [*my_iterator]
2)
my_list = [e for e in my_iterator]
3)
my_list = list(my_iterator)
Option 1) is my favorite since it is the least code, but I'm wondering whether it is also the pythonic way. Or is there another way besides these 3 that is the pythonic one?
This might be a duplicate of "Fastest way to convert an iterator to a list", but your question is a bit different since you ask which is the most Pythonic. The accepted answer there prefers list(my_iterator) over [e for e in my_iterator] because the former runs in C under the hood. One commenter suggests [*my_iterator] is faster than list(my_iterator), so you might want to test that. My general vote is that they are all equally Pythonic, so I'd go with the fastest for your use case. It's also possible that the older answer is out of date.
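One way to test the speed claim on your own machine is a quick timeit comparison. This is a rough sketch, not a benchmark: make_iterator is a helper invented here, each measurement includes the cost of building the zip, and the numbers will differ between Python versions and input sizes.

```python
import timeit


def make_iterator():
    # A fresh iterator is needed for every run, since each run exhausts it.
    return zip([1, 2, 3, 4], [1, 2, 3, 4])


candidates = {
    "[*it]": lambda: [*make_iterator()],
    "[e for e in it]": lambda: [e for e in make_iterator()],
    "list(it)": lambda: list(make_iterator()),
}

# All three spellings build the same list.
results = {name: fn() for name, fn in candidates.items()}

# Time each spelling; timings are machine-dependent.
timings = {name: timeit.timeit(fn, number=10_000) for name, fn in candidates.items()}
for name, seconds in timings.items():
    print(f"{name:>16}: {seconds:.4f} s")
```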
After exploring the subject further, I've come to some conclusions.
There should be one-- and preferably only one --obvious way to do it
(zen of python)
Deciding which option is the "pythonic" one should take some criteria into consideration:
how explicit,
simple,
and readable it is.
And the obvious "pythonic" option, winning on all criteria, is option number 3):
my_list = list(my_iterator)
Here is why it is "obvious" that option 3) is the pythonic one:
Option 3) is close to natural language, so you 'instantly' know what the output is.
Option 2) (using a list comprehension) takes a little more reading and attention if you are seeing that line of code for the first time. For example, I use a list comprehension when I want to add some extra steps (calling a function on the iterated elements, or filtering with an if clause), so when I see a list comprehension I check for any function call or if clause inside it.
Option 1) (unpacking using the * asterisk operator) can be a bit confusing if you don't use it regularly. There are 4 cases for using the asterisk in Python:
For multiplication and power operations.
For repeatedly extending the list-type containers.
For using the variadic arguments. (so-called “packing”)
For unpacking the containers.
Another good argument is the Python docs themselves. I gathered some statistics on which option the docs choose: I picked 4 built-in iterators and everything from the itertools module (those used like: itertools.) to see how they are unpacked into a list:
map
range
filter
enumerate
itertools.
After exploring the docs I found: 0 iterators unpacked in a list using option 1) and 2) and 35 using option 3).
Conclusion :
The pythonic way to unpack an iterator inside of a list is: my_list = list(my_iterator)
While the unpacking operator * is not often used for unpacking a single iterable into a list (therefore [*it] is a bit less readable than list(it)), it is handy and more Pythonic in several other cases:
1. Unpacking an iterable into a single list / tuple / set, adding other values:
mixed_list = [a, *it, b]
This is more concise and efficient than
mixed_list = [a]
mixed_list.extend(it)
mixed_list.append(b)
2. Unpacking multiple iterables + values into a list / tuple / set
mixed_list = [*it1, *it2, a, b, ... ]
This is similar to the first case.
3. Unpacking an iterable into a list, excluding elements
first, *rest = it
This extracts the first element of it into first and unpacks the rest into a list. One can even do
_, *mid, last = it
This dumps the first element of it into a don't-care variable _, saves last element into last, and unpacks the rest into a list mid.
4. Nested unpacking of multiple levels of an iterable in one statement
it = (0, range(5), 3)
a1, (*a2,), a3 = it # Unpack the second element of it into a list a2
e1, (first, *rest), e3 = it # Separate the first element from the rest while unpacking it[1]
This can also be used in for statements:
from itertools import groupby
s = "Axyz123Bcba345D"
for k, (first, *rest) in groupby(s, key=str.isalpha):
    ...
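To see what the nested unpacking in case 4 actually binds, here is a runnable sketch of both snippets. The names groups and is_alpha are introduced only for this illustration.

```python
from itertools import groupby

it = (0, range(5), 3)
a1, (*a2,), a3 = it            # a2 becomes a list built from range(5)
e1, (first, *rest), e3 = it    # first is range(5)[0], rest is the remainder

s = "Axyz123Bcba345D"
groups = []
for is_alpha, (head, *tail) in groupby(s, key=str.isalpha):
    # Each group is unpacked on the fly: its first character and the others.
    groups.append((is_alpha, head, tail))

print(a1, a2, a3)
print(first, rest)
print(groups)
```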
If you're interested in the least amount of typing possible, you can actually do one character better than my_list = [*my_iterator] with iterable unpacking:
*my_list, = my_iterator
or (although this only equals my_list = [*my_iterator] in the number of characters):
[*my_list] = my_iterator
(Funny how it has the same effect as my_list = [*my_iterator].)
For the most Pythonic solution, however, my_list = list(my_iterator) is clearly the clearest and the most readable of all, and should therefore be considered the most Pythonic.
I tend to use zip if I need to convert a list to a dictionary or use it as a key-value pair in a loop or list comprehension.
However, if zip is used here only as an illustration of creating an iterator, I would definitely vote for #3 for clarity.

Python syntax for a map(max()) call

I came across this particular piece of code in one of "beginner" tutorials for Python. It doesn't make logical sense, if someone can explain it to me I'd appreciate it.
print(list(map(max, [4,3,7], [1,9,2])))
I thought it would print [4,9] (by running max() on each of the provided lists and then printing max value in each list). Instead it prints [4,9,7]. Why three numbers?
You're thinking of
print(list(map(max, [[4,3,7], [1,9,2]])))
#                   ^                ^
providing one sequence to map, whose elements are [4,3,7] and [1,9,2].
The code you've posted:
print(list(map(max, [4,3,7], [1,9,2])))
provides [4,3,7] and [1,9,2] as separate arguments to map. When map receives multiple sequences, it iterates over those sequences in parallel and passes corresponding elements as separate arguments to the mapped function, which is max.
Instead of calling
max([4, 3, 7])
max([1, 9, 2])
it calls
max(4, 1)
max(3, 9)
max(7, 2)
map() takes each element in turn from all sequences passed as the second and subsequent arguments. Therefore the code is equivalent to:
print([max(4, 1), max(3, 9), max(7, 2)])
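The difference between the two calls can be checked side by side. In this sketch the variable names are made up, and the zip-based comprehension is just one equivalent spelling of the multi-argument map call.

```python
a = [4, 3, 7]
b = [1, 9, 2]

# Two separate iterables: max is called with one element from each.
pairwise = list(map(max, a, b))

# The same thing written with zip and a comprehension.
pairwise_zip = [max(x, y) for x, y in zip(a, b)]

# One iterable containing two lists: max is called once per list.
per_list = list(map(max, [a, b]))

print(pairwise, pairwise_zip, per_list)
```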
It looks like this question has been answered already, but I'd like to note that map() is often regarded as less idiomatic in modern Python, with list comprehensions preferred instead; whether they are also faster depends on the case. Your code would be equivalent to print([max(x) for x in [(4,1),(3,9),(7,2)]]).
Also, here is an interesting article from Guido on the subject.
Most answers have addressed the OP's question as to why.
Here's how to get the output you expected using max:
a = [4,3,7]
b = [1,9,2]
print(list(map(max, [a, b])))
gives
[7, 9]

Python - Count Elements in Iterator Without Consuming

Given an iterator it, I would like a function it_count that returns the count of elements that iterator produces, without destroying the iterator. For example:
ita = iter([1, 2, 3])
print(it_count(ita))
print(it_count(ita))
should print
3
3
It has been pointed out that this may not be a well-defined question for all iterators, so I am not looking for a completely general solution, but it should function as anticipated on the example given.
Okay, let me clarify further to my specific case. Given the following code:
ita = iter([1, 2, 3])
itb, itc = itertools.tee(ita)
print(sum(1 for _ in itb))
print(sum(1 for _ in itc))
...can we write the it_count function described above, so that it will function in this manner? Even if the answer to the question is "That cannot be done," that's still a perfectly valid answer. It doesn't make the question bad. And the proof that it is impossible would be far from trivial...
Not possible. Until the iterator has been completely consumed, it doesn't have a concrete element count.
The only way to get the length of an arbitrary iterator is by iterating over it, so the basic question here is ill-defined.
Also, the iterator itself may change its contents while being iterated over, so the count may not be constant anyway.
But there are possibilities that might do what you ask. Be warned: none of them is foolproof or really efficient.
When using python 3.4 or later you can use operator.length_hint and hope the iterator supports it (be warned: not many iterators do! And it's only meant as a hint, the actual length might be different!):
>>> from operator import length_hint
>>> it_count = length_hint
>>> ita = iter([1, 2, 3])
>>> print(it_count(ita))
3
>>> print(it_count(ita))
3
As alternative: You can use itertools.tee but read the documentation of that carefully before using it. It may solve your issue but it won't really solve the underlying problem.
import itertools

def it_count(iterator):
    return sum(1 for _ in iterator)

ita = iter([1, 2, 3])
it1, it2 = itertools.tee(ita, 2)
print(it_count(it1))  # 3
print(it_count(it2))  # 3
But this is less efficient (memory and speed) than casting it to a list and using len on it.
I have not been able to come up with an exact solution (because iterators may be immutable types), but here are my best attempts. I believe the second should be faster, according to the documentation (final paragraph of itertools.tee).
Option 1
import itertools

def it_count(it):
    tmp_it, new_it = itertools.tee(it)
    return sum(1 for _ in tmp_it), new_it
Option 2
def it_count2(it):
    lst = list(it)
    return len(lst), lst
It functions well, but has the slight annoyance of returning the pair rather than simply the count.
ita = iter([1, 2, 3])
count, ita = it_count(ita)
print(count)
Output: 3
count, ita = it_count2(ita)
print(count)
Output: 3
count, ita = it_count(ita)
print(count)
Output: 3
print(list(ita))
Output: [1, 2, 3]
There's no generic way to do what you want. An iterator may not have a well defined length (e.g. itertools.count which iterates forever). Or it might have a length that's expensive to calculate up front, so it won't let you know how far you have to go until you've reached the end (e.g. a file object, which can be iterated yielding lines, which are not easy to count without reading the whole file's contents).
Some kinds of iterators might implement a __length_hint__ method that returns an estimated length, but that length may not be accurate. And not all iterators will implement that method at all, so you probably can't rely upon it (it does work for list iterators, but not for many others).
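A short sketch of how the hint behaves, assuming Python 3.4 or later for operator.length_hint. The variable names are illustrative only.

```python
from operator import length_hint

list_it = iter([1, 2, 3])
before = length_hint(list_it)   # list iterators report how many items remain
next(list_it)                   # consume one item
after = length_hint(list_it)    # the hint shrinks as the iterator advances

gen = (x * x for x in range(3))
# Generators provide no hint, so the supplied default comes back instead.
unknown = length_hint(gen, -1)

print(before, after, unknown)
```

Asking for the hint does not consume anything, but it only reports what remains at that moment, which is why it is unsuitable as a general-purpose count.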
Often the best way to deal with the whole contents of an iterator is to dump it into a list or other container. After you're done doing whatever operation you need (like calling len on it), you can iterate over the list again. Obviously this requires the iterator to be finite (and for all of its contents to fit into memory), but that's the limitation you have to deal with.
If you only need to peek ahead by a few elements, you might be able to use itertools.tee, but it's no better than dumping into a list if you need to consume the whole contents (since it keeps the values seen by one of its returned iterators but not yet by another in a data structure similar to a deque). It wouldn't be any use for finding the length of the iterator.

PYTHON How to continue code formatted by input result? [duplicate]

If I write
for i in range(5):
    print i
Then it gives 0, 1, 2, 3, 4
Does that mean Python assigned 0, 1, 2, 3, 4 to i at the same time?
However if I wrote:
for i in range(5):
    a=i+1
Then I call a, it only gives 5
But if I add ''print a'' it gives 1, 2, 3, 4, 5
So my question is what is the difference here?
Is i a string or a list or something else?
Or maybe can anyone help me to sort out:
for l in range(5):
    # vs, fs, rs are all m*n matrices with known initial values, i.e. vs[0], fs[0], rs[0]
    # I want to use this for loop to update them
    vs[l+1]=vs[l]+fs[l]
    fs[l+1]=(rs[l]-re[l])
    rs[l+1]=rs[l]+vs[l]
# then this code gives vs, fs, rs
If I run this kind of code, then I will get the answer only when l=5
How can I make them start looping?
i.e l=0 got values for vs[1],fs[1],rs[1],
then l=1 got values for vs[2],rs[2],fs[2]......and so on.
But python gives different arrays of fs,vs,rs, correspond to different value of l
How can I make them one piece?
A "for loop" in most, if not all, programming languages is a mechanism to run a piece of code more than once.
This code:
for i in range(5):
    print i
can be thought of working like this:
i = 0
print i
i = 1
print i
i = 2
print i
i = 3
print i
i = 4
print i
So you see, what happens is not that i gets the value 0, 1, 2, 3, 4 at the same time, but rather sequentially.
I assume that when you say "call a, it gives only 5", you mean like this:
for i in range(5):
    a=i+1
print a
this will print the last value that a was given. Every time the loop iterates, the statement a=i+1 will overwrite the last value a had with the new value.
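If the goal is to keep every value rather than only the last one, the usual approach is to collect them in a list inside the loop. A minimal sketch, written in Python 3 syntax (print as a function), with values as a made-up name:

```python
values = []
for i in range(5):
    a = i + 1
    values.append(a)   # remember this iteration's value before it is overwritten

print(a)        # only the value from the last iteration
print(values)   # every value, in order
```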
Code basically runs sequentially, from top to bottom, and a for loop is a way to make the code go back and do something again, with a different value for one of the variables.
I hope this answered your question.
When I'm teaching someone programming (just about any language) I introduce for loops with terminology similar to this code example:
for eachItem in someList:
    doSomething(eachItem)
... which, conveniently enough, is syntactically valid Python code.
The Python range() function simply returns or generates a list of integers from some lower bound (zero, by default) up to (but not including) some upper bound, possibly in increments (steps) of some other number (one, by default).
So range(5) returns (or possibly generates) a sequence: 0, 1, 2, 3, 4 (up to but not including the upper bound).
A call to range(2,10) would return: 2, 3, 4, 5, 6, 7, 8, 9
A call to range(2,12,3) would return: 2, 5, 8, 11
Notice that I said, a couple of times, that Python's range() function returns or generates a sequence. This is a relatively advanced distinction which usually won't be an issue for a novice. In older versions of Python range() built a list (allocated memory for it and populated it with values) and returned a reference to that list. This could be inefficient for large ranges which might consume quite a bit of memory and for some situations where you might want to iterate over some potentially large range of numbers but were likely to "break" out of the loop early (after finding some particular item in which you were interested, for example).
Python supports more efficient ways of implementing the same semantics (of doing the same thing) through a programming construct called a generator. Instead of allocating and populating the entire list and return it as a static data structure, Python can instantiate an object with the requisite information (upper and lower bounds and step/increment value) ... and return a reference to that.
The (code) object then keeps track of which number it returned most recently and computes the new values until it hits the upper bound (at which point it signals the end of the sequence to the caller using an exception called "StopIteration"). This technique (computing values dynamically rather than all at once, up-front) is referred to as "lazy evaluation."
Other constructs in the language (such as those underlying the for loop) can then work with that object (iterate through it) as though it were a list.
For most cases you don't have to know whether your version of Python is using the old implementation of range() or the newer one based on generators. You can just use it and be happy.
If you're working with ranges of millions of items, or creating thousands of different ranges of thousands each, then you might notice a performance penalty for using range() on an old version of Python. In such cases you could re-think your design and use while loops, or create objects which implement the "lazy evaluation" semantics of a generator, or use the xrange() version of range() if your version of Python includes it, or the range() function from a version of Python that uses the generators implicitly.
Concepts such as generators, and more general forms of lazy evaluation, permeate Python programming as you go beyond the basics. They are usually things you don't have to know for simple programming tasks but which become significant as you try to work with larger data sets or within tighter constraints (time/performance or memory bounds, for example).
[Update: for Python 3 (the currently maintained versions of Python) the range() function always returns a dynamic, "lazy evaluation" range object; the older versions of Python (2.x), which returned a statically allocated list of integers, are now officially obsolete (after years of having been deprecated)].
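A quick sketch of what that means in Python 3: the range object computes its values on demand, yet it can be looped over repeatedly, while an iterator obtained from it is used up after one pass. The variable names are illustrative only.

```python
r = range(2, 12, 3)       # no list of numbers is built here

first = list(r)           # values are computed as they are requested
second = list(r)          # a range object can be traversed again
size = len(r)             # the length is computed from the bounds and step
has_eight = 8 in r        # membership test, no list needed

it = iter(r)              # an iterator over the range, by contrast...
consumed = list(it)
again = list(it)          # ...is exhausted after a single pass

print(first, second, size, has_eight, consumed, again)
```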
for i in range(5):
is the same as
for i in [0,1,2,3,4]:
In Python 2, range(x) returns a list of numbers from 0 to x - 1.
>>> range(1)
[0]
>>> range(2)
[0, 1]
>>> range(3)
[0, 1, 2]
>>> range(4)
[0, 1, 2, 3]
for i in range(x): executes the body (which is print i in your first example) once for each element in the list returned by range().
i is used inside the body to refer to the “current” item of the list.
In that case, i refers to an integer, but it could be of any type, depending on the object over which you loop.
The range function will give you a list of numbers, while the for loop will iterate through the list and execute the given code for each of its items.
for i in range(5):
    print i
This simply executes print i five times, for i ranging from 0 to 4.
for i in range(5):
    a=i+1
This will execute a=i+1 five times. Since you are overwriting the value of a on each iteration, at the end you will only get the value for the last iteration, that is 4+1.
Useful links:
http://www.network-theory.co.uk/docs/pytut/rangeFunction.html
http://www.ibiblio.org/swaroopch/byteofpython/read/for-loop.html
It is looping; the problem is probably in the printing part.
If you can't find where in the logic the output is printed, just add the following where you want the content printed:
for i in range(len(vs)):
    print vs[i]
    print fs[i]
    print rs[i]

What does the slice() function do in Python?

First of all, I'd like to clarify the question: it's about the slice() function, not slices of lists or strings like a[5:4:3].
The docs mention that this function is used in NumPy and give no examples of usage (it's said how to use it but it's not said when to use it). Moreover, I've never seen this function used in any Python program.
When should one use the slice() function when programming in plain Python (without NumPy or SciPy)? Any examples will be appreciated.
a[x:y:z] gives the same result as a[slice(x, y, z)]. One of the advantages of a slice object is that it can be stored and retrieved later as a single object instead of storing x, y and z.
It is often used to let the user define their own slice that can later be applied on data, without the need of dealing with many different cases.
(Using function semantics) Calling the slice class instantiates a slice object (start,stop,step), which you can use as a slice specifier later in your program:
>>> myname='Rufus'
>>> myname[::-1] # reversing idiom
'sufuR'
>>> reversing_slice=slice(None,None,-1) # reversing idiom as slice object
>>> myname[reversing_slice]
'sufuR'
>>> odds=slice(0,None,2) # another example
>>> myname[odds]
'Rfs'
If you have a slice you use often, this is preferable to using constants in multiple places in the program, and saves the pain of keeping 2 or 3 values that have to be typed in each time.
Of course, it does make it look like an index, but after using Python for a while you learn that not everything is what it looks like at first glance, so I recommend naming your variables well (as I did with reversing_slice, versus odds, which isn't so clear).
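As a sketch of the "named slice" idea, here is a fixed-width record parsed with slice objects. The record layout and the names DATE, LEVEL and MESSAGE are invented for this illustration.

```python
# Hypothetical fixed-width log line: date, level, then free text.
record = "2024-01-15 ERROR disk full"

DATE = slice(0, 10)
LEVEL = slice(11, 16)
MESSAGE = slice(17, None)

date = record[DATE]
level = record[LEVEL]
message = record[MESSAGE]

# A slice object can also report the concrete bounds it would use
# for a sequence of a given length.
bounds = slice(0, None, 2).indices(5)

print(date, level, message, bounds)
```

If the layout changes, only the three slice definitions need updating, not every place that reads a field.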
No, it's not all!
Since objects have already been mentioned, the first thing to know is that slice is a class, not a function returning an object.
A second use of a slice instance is passing arguments to the __getitem__() and __getslice__() methods when you're making your own object that behaves like a string, list, or other object supporting slicing.
When you do:
print "blahblah"[3:5]
That automatically translates to:
print "blahblah".__getitem__(slice(3, 5, None))
So when you program your own indexing and slicing object:
class example:
    def __getitem__ (self, item):
        if isinstance(item, slice):
            print "You are slicing me!"
            print "From", item.start, "to", item.stop, "with step", item.step
            return self
        if isinstance(item, tuple):
            print "You are multi-slicing me!"
            for x, y in enumerate(item):
                print "Slice #", x
                self[y]
            return self
        print "You are indexing me!\nIndex:", repr(item)
        return self
Try it:
>>> example()[9:20]
>>> example()[2:3,9:19:2]
>>> example()[50]
>>> example()["String index i.e. the key!"]
>>> # You may wish to create an object that can be sliced with strings:
>>> example()["start of slice":"end of slice"]
Older Python versions supported the method __getslice__(), which would be used instead of __getitem__(). It is good practice to check in __getitem__() whether we got a slice and, if we did, redirect it to the __getslice__() method. This way you will have complete backward compatibility. (Python 3 removed __getslice__() and always calls __getitem__() with a slice object.)
This is how numpy uses slice() object for matrix manipulations, and it is obvious that it is constantly used everywhere indirectly.
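For readers on Python 3, the same dispatch can be sketched with return values instead of print statements. The class name Describe and the tuple format it returns are made up for this example; it is not how NumPy itself is implemented.

```python
class Describe:
    """Hypothetical class that reports what kind of subscript it received."""

    def __getitem__(self, item):
        if isinstance(item, slice):
            # obj[a:b:c] arrives as a single slice object.
            return ("slice", item.start, item.stop, item.step)
        if isinstance(item, tuple):
            # obj[a:b, c:d] arrives as a tuple of subscripts.
            return ("multi", [self[part] for part in item])
        # Anything else is a plain index or key.
        return ("index", item)


d = Describe()
print(d[9:20])
print(d[2:3, 9:19:2])
print(d[50])
print(d["start of slice":"end of slice"])
```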
From your question I believe you are looking for an example. So here is what I have when I try to slice a list from range(1, 20) with a step of 3
>>> x = list(range(1, 20))
>>> x[1:20:3]
[2, 5, 8, 11, 14, 17]
>>> x[slice(1, 20, 3)]
[2, 5, 8, 11, 14, 17]
