Given,
import itertools as it
def foo():
    idx = 0
    while True:
        yield idx
        idx += 1
k = foo()
When I use zip() as in the following,
>>> list(zip(k,[11,12,13]))
[(0, 11), (1, 12), (2, 13)]
and then immediately after,
>>> list(zip(k,[11,12,13]))
[(4, 11), (5, 12), (6, 13)]
Notice that the second zip should have started with (3,11) but it jumped to (4,11)
instead. It's as if there is another hidden next(k) somewhere. This does not happen when I use it.islice
>>> k = foo()
>>> list(it.islice(k,6))
[0, 1, 2, 3, 4, 5]
Notice it.islice is not missing the 3 term.
I am using Python 3.8.
zip basically (and necessarily, given the design of the iterator protocol) works like this:
# zip is actually a class, but we'll pretend it's a generator
# function for simplicity.
def zip(xs, ys):
    # zip doesn't require its arguments to be iterators, just iterable
    xs = iter(xs)
    ys = iter(ys)
    while True:
        try:
            x = next(xs)
            y = next(ys)
        except StopIteration:
            return  # one of the inputs is exhausted
        yield x, y
There is no way to tell if ys is exhausted before an element of xs is consumed, and the iterator protocol doesn't provide a way for zip to put x "back" in xs if next(ys) raises a StopIteration exception.
For the special case where one of the input iterables is sized, you can do a little better than zip:
import itertools as it
from collections.abc import Sized
def smarter_zip(*iterables):
    sized = [i for i in iterables if isinstance(i, Sized)]
    try:
        min_length = min(len(s) for s in sized)
    except ValueError:
        # can't determine a min length... fall back to regular zip
        return zip(*iterables)
    return zip(*[it.islice(i, min_length) for i in iterables])
It uses islice to prevent zip from consuming more from each iterator than we know is strictly necessary. This smarter_zip will solve the problem for the case posed in the original question.
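For instance, applied to the generator from the question (a quick check, assuming the foo definition above), no value is silently dropped between calls:

k = foo()
list(smarter_zip(k, [11, 12, 13]))  # [(0, 11), (1, 12), (2, 13)]
list(smarter_zip(k, [11, 12, 13]))  # [(3, 11), (4, 12), (5, 13)] -- the 3 is not skipped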
However, in the general case, there is no way to tell beforehand whether an iterator is exhausted or not (consider a generator yielding bytes arriving on a socket). If the shortest of the iterables is not sized, the original problem still remains. For solving the general case, you may want to wrap iterators in a class which remembers the last-yielded item, so that it can be recalled from memory if necessary.
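A minimal sketch of such a wrapper, with a hypothetical class name; it simply remembers the most recent value so the caller can recover the element that zip consumed but never paired:

class RememberingIterator:
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.last = None  # most recently yielded item

    def __iter__(self):
        return self

    def __next__(self):
        self.last = next(self._it)
        return self.last

k = RememberingIterator(foo())
list(zip(k, [11, 12, 13]))  # [(0, 11), (1, 12), (2, 13)]
k.last                      # 3 -- the value zip consumed but never paired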
Related
I have an iterator that consists of several lists of the same size. For my purpose I need to know the length of at least one of these lists. But as it is with iterators they can't be accessed the same way as ordinary arrays. So my idea was to get this length by saying:
for i in iter:
    list_len = len(i)
    break
And this works; however, when I use the iterator later on and loop over it again, it skips the first item and basically continues from where the previous loop (the one above) left off.
Is there some way to fix this? Or, what is the pythonic way of doing it?
I was thinking/reading about doing it like:
from itertools import tee
iter_tmp, iter = tee(iter)
for i in iter_tmp:
    list_len = len(i)
    break
And yeah, that works too, since I can now use the original iter later on, but it just hurts my eyes that I have to make a loop, import itertools and such just to get the length of a list in an iterator. But maybe that is just the way to go about it?
UPDATE
Just trying to further explain what I'm doing.
As such, an iterator is not a list or an array, but in my case, if I were to loop through my iterator I would get something like this (in the case of my iterator having four "lists" in it):
>>> for i in iter_list:
...     print(i)
[1, 2, 5, 3]
[3, 2, 5, 8]
[6, 8, 3, 7]
[1, 4, 6, 1]
Now, all "lists" in the iterator has the same length, but since the lists themselves are calculated through many steps, I really don't know the length in any way before it enters the iterator. If I don't use an iterator I run out of memory - so it is a pro/con solution. But yeah, it is the length of just one of the lists I need as a constant I can use throughout the rest of my code.
That is how iterators work. But you have a few options apart from tee.
You can extract the first element and reuse it when iterating the second time:
import itertools

first_elem = next(my_iter)
list_len = len(first_elem)
for l in itertools.chain([first_elem], my_iter):
    pass
Or if you are going to iterate over the iterator more times, you could perhaps listify it (if it's feasible to fit in memory).
my_list = list(my_iter)
first_len = len(my_list[0])
for l in my_list:
    pass
And certainly not least, as Palivek said, keep/get the information about the length of the lists from somewhere else.
In general, iterators are not re-iterable, so you'll probably need to store something additional anyway.
class peek_iterator(object):
    def __init__(self, source):
        self._source = iter(source)
        self._first = None
        self._sent = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._first is None:
            self._first = next(self._source)
        if self._sent:
            return next(self._source)
        self._sent = True
        return self._first

    def get_isotropic(self, getter):
        if self._first is None:
            self._first = next(self._source)
        return getter(self._first)

lists = [[1, 2, 3], [4, 5, 6]]
i = peek_iterator(lists)
print(i.get_isotropic(len))  # 3
for j in i:
    print(j)  # [1, 2, 3]; [4, 5, 6]
You can do a little trick and wrap the original iterator in a generator. This way, you can obtain the first element and "re-yield" it with the generator without consuming the entire iterator. The head() function below returns the first element and a generator that iterates over the original sequence.
def head(seq):
    seq_iter = iter(seq)
    first = next(seq_iter)

    def gen():
        yield first
        yield from seq_iter

    return first, gen()
seq = range(100, 300, 50)
first, seq2 = head(seq)
print('first item: {}'.format(first))
for item in seq2:
    print(item)
Output:
first item: 100
100
150
200
250
This is conceptually equivalent to Moberg's answer, but uses a generator to "re-assemble" the original sequence instead of itertools.chain().
Suppose I have a nested tuple as follows:
a = (((1, 2), 2), 3)
I know we could use a[0][0][1] to acquire the second element 2. However, this method might be inefficient with a long tuple. Is there any more efficient way to acquire the tuple element in this case?
You could write a function to access values in a nested tuple:
a = (((1, 2), 2), 3)
def access(obj, indexes):
    a = obj
    for i in indexes:
        try:
            a = a[i]
        except IndexError:
            return None
        # except TypeError:
        #     when you try to index deeper than the object supports
    # a is not constrained to be a scalar, it may still be dimensional
    # if insufficient indexes were passed.
    return a
print(access(a,(0,0,0)))
print(access(a,(0,0)))
Output
1
(1, 2)
If you want to be able to do this dynamically and know that your indices are valid you can use functools.reduce to write this compactly:
from functools import reduce
reduce(lambda it, idx: it[idx], [0, 0, 1], a) # returns 2
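Equivalently, operator.getitem can stand in for the lambda; this is just a stylistic variant of the same reduce call:

from functools import reduce
from operator import getitem

reduce(getitem, [0, 0, 1], a)  # returns 2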
I would like to loop through a list checking each item against the one following it.
Is there a way I can loop through all but the last item using for x in y? I would prefer to do it without using indexes if I can.
Note
freespace answered my actual question, which is why I accepted the answer, but SilentGhost answered the question I should have asked.
Apologies for the confusion.
for x in y[:-1]
If y is a generator, then the above will not work.
The easiest way to compare each item of the sequence with the following one:
for i, j in zip(a, a[1:]):
    # compare i (the current) to j (the following)
    pass
If you want to get all the elements in the sequence pairwise, use this approach (the pairwise function is from the examples in the itertools module).
from itertools import tee, chain

def pairwise(seq):
    a, b = tee(seq)
    next(b, None)
    return zip(a, b)

for current_item, next_item in pairwise(y):
    if compare(current_item, next_item):
        # do what you have to do
        pass
If you need to compare the last value to some special value, chain that value to the end
for current, next_item in pairwise(chain(y, [None])):
If you meant comparing the nth item with the (n+1)th item in the list, you could also do it with:
>>> for i in range(len(list) - 1):
...     print(list[i] > list[i + 1])
Note there is no hard coding going on there. This should be OK unless you feel otherwise.
To compare each item with the next one in an iterator without instantiating a list:
import itertools
it = (x for x in range(10))
data1, data2 = itertools.tee(it)
next(data2)
for a, b in zip(data1, data2):
    print(a, b)
This answers what the OP should have asked, i.e. traversing a list while comparing consecutive elements (excellent SilentGhost answer), but generalized for any group size (n-gram): 2, 3, ... n:
zip(*(l[start:] for start in range(0, n)))
Examples:
l = list(range(0, 4))  # [0, 1, 2, 3]
list(zip(*(l[start:] for start in range(0, 2)))) # == [(0, 1), (1, 2), (2, 3)]
list(zip(*(l[start:] for start in range(0, 3)))) # == [(0, 1, 2), (1, 2, 3)]
list(zip(*(l[start:] for start in range(0, 4)))) # == [(0, 1, 2, 3)]
list(zip(*(l[start:] for start in range(0, 5)))) # == []
Explanations:
l[start:] generates a list/generator starting from index start
*list or *generator passes all elements to the enclosing function zip, as if it were written zip(elem1, elem2, ...)
Note:
AFAIK, this code is as lazy as it can be. Not tested.
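For arbitrary iterators (not just sliceable sequences), here is a hedged sketch of the same idea built on tee and islice; the helper name nwise is made up:

from itertools import islice, tee

def nwise(iterable, n):
    # make n independent copies and advance the k-th copy by k positions
    copies = tee(iterable, n)
    shifted = (islice(it, k, None) for k, it in enumerate(copies))
    return zip(*shifted)

list(nwise(range(4), 2))  # [(0, 1), (1, 2), (2, 3)]
list(nwise(range(4), 3))  # [(0, 1, 2), (1, 2, 3)]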
Is there an equivalent of cons in Python? (any version above 2.5)
If so, is it built in? Or do I need easy_install to get a module?
WARNING AHEAD: The material below may not be practical!
Actually, cons need not be primitive in Lisp; you can build it with λ.
See Use of lambda for cons/car/cdr definition in SICP for details. In Python, it is translated to:
def cons(x, y):
    return lambda pair: pair(x, y)

def car(pair):
    return pair(lambda p, q: p)

def cdr(pair):
    return pair(lambda p, q: q)
Now, car(cons("a", "b")) should give you 'a'.
How is that? Prefix Scheme :)
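A quick interactive check, using only the three definitions above:

>>> car(cons("a", "b"))
'a'
>>> cdr(cons("a", "b"))
'b'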
Obviously, you can start building lists using cdr recursion. You can define nil to be the empty pair in Python:
def nil(): return ()
Note that you would otherwise have to bind a variable using = in Python. Am I right? Since such a variable could be rebound later, I'd rather define a constant function.
Of course, this is not Pythonic but Lispy; not so practical, yet elegant.
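As a small sketch of the list-building idea above (to_pylist is a hypothetical helper; it assumes the cons/car/cdr and nil definitions given here, with nil() marking the end of the list):

lst = cons(1, cons(2, cons(3, nil())))

def to_pylist(pair):
    # walk the chain of pairs until the nil() sentinel is reached
    items = []
    while pair != nil():
        items.append(car(pair))
        pair = cdr(pair)
    return items

to_pylist(lst)  # [1, 2, 3]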
Exercise: Implement the List Library http://srfi.schemers.org/srfi-1/srfi-1.html of Scheme in Python. Just kidding :)
In Python, it's more typical to use the array-based list class than Lisp-style linked lists. But it's not too hard to convert between them:
def cons(seq):
    result = None
    for item in reversed(seq):
        result = (item, result)
    return result

def iter_cons(seq):
    while seq is not None:
        car, cdr = seq
        yield car
        seq = cdr
>>> cons([1, 2, 3, 4, 5, 6])
(1, (2, (3, (4, (5, (6, None))))))
>>> iter_cons(_)
<generator object iter_cons at 0x00000000024D7090>
>>> list(_)
[1, 2, 3, 4, 5, 6]
Note that Python's lists are implemented as vectors, not as linked lists. You could do lst.insert(0, val), but that operation is O(n).
If you want a data structure that behaves more like a linked list, try using a deque (collections.deque).
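A tiny sketch of that suggestion; appendleft gives O(1) insertion at the front, unlike list.insert(0, ...):

from collections import deque

d = deque([2, 3, 4])
d.appendleft(1)  # rough analogue of cons(1, ...), O(1)
d                # deque([1, 2, 3, 4])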
In Python 3, you can use the splat operator * to do this concisely by writing [x, *xs]. For example:
>>> x = 1
>>> xs = [1, 2, 3]
>>> [x, *xs]
[1, 1, 2, 3]
If you prefer to define it as a function, that is easy too:
def cons(x, xs):
    return [x, *xs]
You can quite trivially define a class that behaves much like cons:
class Cons(object):
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr
However, this will be a very 'heavyweight' way to build basic data structures, which Python is not optimised for, so I would expect the results to be much more CPU- and memory-intensive than doing something similar in Lisp.
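For illustration, one possible way of using such a class, with None as a hypothetical end-of-list marker:

lst = Cons(1, Cons(2, Cons(3, None)))
node = lst
while node is not None:
    print(node.car)  # prints 1, 2, 3
    node = node.cdr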
No. cons is an implementation detail of Lisp-like languages; it doesn't exist in any meaningful sense in Python.
I have a generator function and want to get the first ten items from it; my first attempt was:
my_generator()[:10]
This doesn't work because generators aren't subscriptable, as the error tells me. Right now I have worked around that with:
list(my_generator())[:10]
This works since it converts the generator to a list; however, it's inefficient and defeats the point of having a generator. Is there some built-in, Pythonic equivalent of [:10] for generators?
import itertools
itertools.islice(my_generator(), 10)
itertools has a number of utilities for working with iterators. islice takes start, stop, and step arguments to slice an iterator just as you would slice a list.
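Note that islice returns an iterator, so wrap it in list() if you actually need the first ten items as a list (my_generator being the function from the question):

first_ten = list(itertools.islice(my_generator(), 10))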
To clarify the above comments:
from itertools import islice
def fib_gen():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b
assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))