Indexing from the end of a generator - Python

Say I have a generator in Python and I want to iterate over everything in it except the first 10 iterations and the last 10 iterations. itertools.islice supports the first part of this slicing operation, but not the second. Is there a simple way to accomplish this?

Something like this might do the job. EDIT: Added use of deque as per comments.
from collections import deque

def generator():
    for i in ['ignore'] * 10 + ['yield this'] * 10 + ['ignore'] * 10:
        yield i

def func(mygenerator):
    cache = deque()
    for i, item in enumerate(mygenerator()):
        if i < 10:
            continue
        cache.appendleft(item)
        if len(cache) > 10:
            yield cache.pop()

for i in func(generator):
    print i

Not only is there not a simple way, there is not a way at all, if you want to allow any generator (or any iterable). In general, there is no way to know when you are 10 items from the end of a generator, or even whether the generator has an end. Generators only give you one item at a time, and tell you nothing about how many items are "left". You would have to iterate through the entire generator, keeping a temporary cache of the most recent 10 items, and then yield those when (or if!) the generator terminates.
Note the "or if". A generator need not be finite. For an infinite generator, there is no such thing as the "last" 10 elements.

Related

What is a nice, Pythonic way to yield ranged subsets of a string/bytes/list several items at a time?

I want to loop over bytes data, and I'd hope the same principle would apply to strings and lists, where I don't go item by item, but a few items at a time. I know I can do mystr[0:5] to get the first five characters, and I'd like to do that in a loop.
I can do it the C style way, looping over ranges and then returning the remaining elements, if any:
import math

def chunkify(listorstr, chunksize: int):
    # Loop until the last chunk that is still chunksize long
    end_index = int(math.floor(len(listorstr) / chunksize))
    for i in range(0, end_index):
        print(f"yield ")
        yield listorstr[i*chunksize:(i+1)*chunksize]
    # If anything remains at the end, yield the rest
    remainder = len(listorstr) % chunksize
    if remainder != 0:
        yield listorstr[end_index*chunksize:len(listorstr)]

[i for i in chunkify("123456789", 2)]
This works just fine, but I strongly suspect python language features could make this a lot more compact.
You can condense your code using range's step parameter. Instead of the for loop, a generator expression for this is
listorstr[i:i+chunksize] for i in range(0,len(listorstr), chunksize)
Your function could yield from this generator to make for a more tidy call.
def chunkify(listorstr, chunksize: int):
    yield from (listorstr[i:i+chunksize]
                for i in range(0, len(listorstr), chunksize))
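For example, both versions produce the same chunks, including the short final one:

>>> list(chunkify("123456789", 2))
['12', '34', '56', '78', '9']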

Python get the last element from generator items

I'm really impressed by using a generator instead of a list.
But I can't find any solution for this question.
What is an efficient way to get the first and last element from generator items?
Because with a list we can just do lst[0] and lst[-1].
Thanks for the help. I can't provide any code since that's clearly all I want to know :)
You have to iterate through the whole thing. Say you have this generator:
def foo():
    yield 0
    yield 1
    yield 2
    yield 3
The easiest way to get the first and last value would be to convert the generator into a list. Then access the values using list lookups.
data = list(foo())
print(data[0], data[-1])
If you want to avoid creating a container, you could use a for-loop to exhaust the generator.
gen = foo()
first = last = next(gen)
for last in gen: pass
print(first, last)
Note: You'll want to special case this when there are no values produced by the generator.
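One way to handle that special case, assuming a None sentinel is acceptable when the generator yields nothing:

gen = foo()
try:
    first = last = next(gen)
except StopIteration:
    first = last = None   # the generator produced no values at all
else:
    for last in gen:
        pass
print(first, last)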

Inconsistent behavior of python generators

The following python code produces [(0, 0), (0, 7)...(0, 693)] instead of the expected list of tuples combining all of the multiples of 3 and multiples of 7:
multiples_of_3 = (i*3 for i in range(100))
multiples_of_7 = (i*7 for i in range(100))
list((i,j) for i in multiples_of_3 for j in multiples_of_7)
This code fixes the problem:
list((i,j) for i in (i*3 for i in range(100)) for j in (i*7 for i in range(100)))
Questions:
The generator object seems to play the role of an iterator instead of providing an iterator object each time the generated list is to be enumerated. The latter strategy seems to be the one adopted by .NET LINQ query objects. Is there an elegant way to get around this?
How come the second piece of code works? Shall I understand that the generator's iterator is not reset after looping through all multiples of 7?
Don't you think that this behavior is counter intuitive if not inconsistent?
A generator object is an iterator, and therefore one-shot. It's not an iterable which can produce any number of independent iterators. This behavior is not something you can change with a switch somewhere, so any workaround amounts to either using an iterable (e.g. a list) instead of a generator or repeatedly constructing generators.
The second snippet does the latter. It is by definition equivalent to the loops
for i in (i*3 for i in range(100)):
    for j in (i*7 for i in range(100)):
        ...
Hopefully it isn't surprising that here, the latter generator expression is evaluated anew on each iteration of the outer loop.
As you discovered, the object created by a generator expression is an iterator (more precisely a generator-iterator), designed to be consumed only once. If you need a resettable generator, simply create a real generator and use it in the loops:
def multiples_of_3():  # generator
    for i in range(100):
        yield i * 3

def multiples_of_7():  # generator
    for i in range(100):
        yield i * 7

list((i,j) for i in multiples_of_3() for j in multiples_of_7())
Your second code works because the expression list of the inner loop ((i*7 ...)) is evaluated on each pass of the outer loop. This results in creating a new generator-iterator each time around, which gives you the behavior you want, but at the expense of code clarity.
To understand what is going on, remember that there is no "resetting" of an iterator when the for loop iterates over it. (This is a feature; such a reset would break iterating over a large iterator in pieces, and it would be impossible for generators.) For example:
multiples_of_2 = iter(xrange(0, 100, 2))  # iterator
for i in multiples_of_2:
    print i
# prints nothing because the iterator is spent
for i in multiples_of_2:
    print i
...as opposed to this:
multiples_of_2 = xrange(0, 100, 2)  # iterable sequence, converted to iterator
for i in multiples_of_2:
    print i
# prints again because a new iterator gets created
for i in multiples_of_2:
    print i
A generator expression is equivalent to an invoked generator and can therefore only be iterated over once.
The real issue, as I found out, is about single- versus multi-pass iterables and the fact that there is currently no standard mechanism to determine whether an iterable is single- or multi-pass: see Single- vs. Multi-pass iterability
If you want to convert a generator expression to a multipass iterable, then it can be done in a fairly routine fashion. For example:
class MultiPass(object):
    def __init__(self, initfunc):
        self.initfunc = initfunc
    def __iter__(self):
        return self.initfunc()

multiples_of_3 = MultiPass(lambda: (i*3 for i in range(20)))
multiples_of_7 = MultiPass(lambda: (i*7 for i in range(20)))
print list((i,j) for i in multiples_of_3 for j in multiples_of_7)
From the point of view of defining the thing it's a similar amount of work to typing:
def multiples_of_3():
    return (i*3 for i in range(20))
but from the point of view of the user, they write multiples_of_3 rather than multiples_of_3(), which means the object multiples_of_3 is polymorphic with any other iterable, such as a tuple or list.
The need to type lambda: is a bit inelegant, true. I don't suppose there would be any harm in introducing "iterable comprehensions" to the language, to give you what you want while maintaining backward compatibility. But there are only so many punctuation characters, and I doubt this would be considered worth one.
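A quick check (not part of the original answer) that the wrapper really is multi-pass: each iteration calls __iter__, which builds a fresh generator from the stored lambda:

multiples_of_3 = MultiPass(lambda: (i*3 for i in range(5)))
print list(multiples_of_3)   # [0, 3, 6, 9, 12]
print list(multiples_of_3)   # [0, 3, 6, 9, 12] again, unlike a bare generator expression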

In Python, when does yield cost more than returning a list?

In many cases, people always say "use yield to lazily create elements."
But I think everything has a cost, including yield and its iterator.
From an efficiency point of view, I think it's a nice question.
So, for example, say I have a function:
def list_gen(n):
    if n > MAGIC_NUM:
        return xrange(n)
    else:
        return range(n)
How big should MAGIC_NUM be?
UPDATE: Sorry for this mistake; my original meaning was to compare the iterator's cost with the list's cost.
UPDATE AGAIN: Please imagine a case: is there a condition where memory is so limited that it can't even create an iterator?
Ha, this question is more fun now.
UPDATE AGAIN: Why does creating an iterator and saving the yield context cost less than creating a list? Or, how much does an iterator cost (sorry for asking again)? How many bytes?
You're mixing several things up.
def list_gen(n):
    i = 0
    while i < n:
        yield i
        i += 1
This function is a generator. Calling it returns a generator object, which is an iterator.
An iterator is a thing that has next(), i.e. it can be traversed over once. An iterator is created over something using iter() whenever you do for i in something.
def list_gen(n):
    return range(n)

def list_gen(n):
    return xrange(n)
These functions are regular functions. One returns a list and the other returns an xrange object. Both lists and xranges are iterable, i.e. multiple independent iterators can be created for them.
So back to your question: You're asking whether to return a list or an xrange object.
That depends, obviously! It depends on what you want to do with the result.
If you want to mutate it somehow, then you need a real list. Use range directly.
If you only want to iterate over it, then it doesn't make a difference semantically: both an xrange object and a list returned by range will produce an iterator which iterates over the same sequence.
However, if you use xrange, you'll never create the whole list in memory. Why create a full-fledged list object in memory if all you want to do is a simple iteration? You don't need to allocate a temporary large memory buffer whenever you want a for loop, right?
Hence: It's safe to stick with xrange, since the caller can always make a list out of it.
Let's confirm that with a benchmark. We want to know whether it's faster to iterate over xranges than over lists constructed by range (including the cost of the range call, of course).
Code:
import timeit

ns = [1, 2, 3, 5, 10, 50, 100]
print 'n', '\t', 'range', '\t', 'xrange'
for n in ns:
    t1 = timeit.timeit("for i in range({}): pass".format(n))
    t2 = timeit.timeit("for i in xrange({}): pass".format(n))
    print n, '\t', t1, '\t', t2
Result:
n       range           xrange
1       0.566222990493  0.418698436395
2       0.594136874362  0.477882061758
3       0.630704800817  0.488603362929
5       0.725149288913  0.540597548519
10      0.90297752809   0.687031507818
50      2.44493085566   1.89102105759
100     4.31189321914   3.33713522433
It has nothing to do with the length of the iterator you are generating, but with how you need to use it afterwards. If you only need to use it once, then you should definitely go for yield; if you'll go on and use it multiple times, you can skip yield and just build a regular list. Keep in mind that generators you get using yield can only be iterated over once.
Although your question and its title are still kind of mixed up, I'll try to answer it the way I understand it.
If you only want to iterate over the result of (x)range(), an xrange() (special object) is better than a range() (list) for short as well as long ranges:
$ python -m timeit 'a=range(3)' 'for i in a: pass'
1000000 loops, best of 3: 0.608 usec per loop
$ python -m timeit 'a=xrange(3)' 'for i in a: pass'
1000000 loops, best of 3: 0.466 usec per loop
$ python -m timeit 'a=xrange(30000)' 'for i in a: pass'
1000 loops, best of 3: 1.01 msec per loop
$ python -m timeit 'a=range(30000)' 'for i in a: pass'
1000 loops, best of 3: 1.49 msec per loop
So it will be better to use xrange() always.
If you look at the general case, it might be slightly different: you are comparing "pre-producing" values/objects (storing them in a list and processing them afterwards) with consuming them directly as they are produced:
def gen(num):
    import random
    i = 0
    while i < num:
        value = random.random()
        yield value
        i += 1

def process(value): pass

def test1(num):
    data = list(gen(num))
    for i in data: process(i)

def test2(num):
    for i in gen(num): process(i)
Here it depends how production and consumption can interact, and how big the overhead is.
If you want them to act independently, you can do 'both at once' with threading:
def list_eater(l):
    while l:
        yield l.pop(0)

def test3(num):
    data = []
    def producer():
        for i in gen(num): data.append(i)
    import threading
    producerthread = threading.Thread(target=producer)
    producerthread.start()
    while data or producerthread.isAlive():
        for item in list_eater(data): process(item)
    # Optimizable. Does idle waiting; a threading.Condition might be quite useful here...
This runs the production in the background and consumes the items as they arrive, no matter how long they take to be produced or consumed.
Please note that it is not possible to both yield and return a value from the same function: a function can be either a generator function or a normal function, but not both.
Usually yield avoids having to create an intermediate list, but instead yields elements one by one. This can be useful for instance when you are recursively walking a tree. See this link for an example: http://code.activestate.com/recipes/105873-walk-a-directory-tree-using-a-generator/
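A rough sketch of that idea (my own simplified version, not the linked recipe): a recursive generator that yields file paths one at a time instead of building a list of the whole tree first:

import os

def walk_files(top):
    # Yield every file path under `top`, one at a time.
    for name in sorted(os.listdir(top)):
        path = os.path.join(top, name)
        if os.path.isdir(path):
            # Recurse and re-yield whatever the subdirectory produces.
            for child in walk_files(path):
                yield child
        else:
            yield path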
Another use of a generator would be when you want to return numerous elements, but your user is probably interested in the first few only (e.g. for search results).
Avoiding the intermediate list will save memory, but only if the caller does not need to create a list out of the results. In general, the advantage is that it will allow you to code your generator function more concisely.
Whether to use yield or a generator is mostly unrelated to the list size. For example:
if you don't need to process the whole list and might break out early, it's more efficient to use a generator;
a generator can simulate a stream of infinite size, for example a prime number generator (see the sketch after this answer).
If, however, you have limited memory (an embedded system, for example) and can't create the whole list at once, then it becomes necessary to use a generator.
As for the cost: there is additional overhead to using a generator if you count the cost of resuming it each time a value is requested, but a list takes more memory. So you can't say in general that a generator is better than a list; it is a trade-off between memory and performance, and whether to use a generator depends on your needs and the situation.
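A small sketch of the infinite-stream case mentioned above (a naive trial-division primes() generator of my own): a list of "all primes" is impossible, but a generator plus itertools.islice lets the caller take as many as needed:

from itertools import count, islice

def primes():
    # Infinite generator of primes by trial division (slow, but fine as a sketch).
    for n in count(2):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n

print(list(islice(primes(), 10)))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]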

Python: If an iterator is an expression, is it calculated every time?

Take the following example:
>>> for item in [i * 2 for i in range(1, 10)]:
...     print item
2
4
6
8
10
12
14
16
18
Is [i * 2 for i in range(1, 10)] computed every time through the loop, or just once and stored? (Also, what is the proper name for that part of the expression?)
One reason I would want to do this is that I only want the results of that list comprehension to be available in the loop.
A good translation of for i in <whatever>: <loopbody>, showing exactly what it does for any <whatever> and any <loopbody>:
_aux = iter(<whatever>)
while True:
    try: i = next(_aux)
    except StopIteration: break
    <loopbody>
except that the pseudo-variable I have here named _aux actually remains unnamed.
So, <whatever> always gets evaluated just once (to get an iter() from it) and the resulting iterator is nexted until it runs out (unless there's some break in the <loopbody>).
With a listcomp, as you've used, the evaluation produces a list object (which in your code sample remains unnamed). In the very similar code:
for item in (i * 2 for i in range(1, 10)): ...
using a genexp rather than the listcomp (syntactically, round parentheses instead of the listcomp's square brackets), it's the next() that actually does most of the work (advancing i and doubling it), instead of lumping all the work at construction time. This takes up less temporary memory, and may save time if the loop body is reasonably likely to break out early; but except in such special conditions (very tight memory or likely early loop termination), a listcomp may typically be (by a wee little bit) faster.
All members of the expression list are calculated once, and then iterated over.
In Python 2.x, the variable used in a LC does leak out into the parent scope, but since the LC has already been evaluated, the only value available is the one used to generate the final element in the resultant list.
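A short Python 2 session illustrating that leak: after the list comprehension has run, the loop variable is still bound to the value used for the final element:

>>> [i * 2 for i in range(1, 10)]
[2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> i
9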
In that case, you are constructing a list in memory and then iterating over the contents of the list, so yes, it is computed once and stored. It is no different from doing:
for i in [2, 4, 6, 8]:
    print(i)
If you do iter(i * 2 for i in xrange(1,10)), you get an iterator which evaluates on each iteration.
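A minimal interactive sketch (using a hypothetical noisy() helper) showing the difference: the generator expression only evaluates its body when a value is requested, while the list comprehension does all the work up front:

>>> def noisy(x):
...     print("computing %d" % x)
...     return x * 2
...
>>> gen = (noisy(i) for i in range(1, 4))   # nothing is computed yet
>>> next(gen)
computing 1
2
>>> [noisy(i) for i in range(1, 4)]         # everything is computed immediately
computing 1
computing 2
computing 3
[2, 4, 6]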
