I am trying to use iterators more for looping since I heard it is faster than index looping. One thing I am not sure about is how to treat the end of the sequence nicely. The only way I can think of is try/except StopIteration, which looks ugly to me.
To be more concrete, suppose we are asked to print the merged, sorted output of two sorted lists a and b. I would write the following:
aNull = False
I = iter(a)
try:
    tmp = I.next()
except StopIteration:
    aNull = True
for x in b:
    if aNull:
        print x
    else:
        if x < tmp:
            print x
        else:
            print tmp, x
            try:
                tmp = I.next()
            except StopIteration:
                aNull = True
while not aNull:
    print tmp
    try:
        tmp = I.next()
    except StopIteration:
        aNull = True
How would you code it to make it neater?
I think handling a and b more symmetrically would make it easier to read. Also, using the built-in next function in Python 2.6 with a default value avoids the need to handle StopIteration:
def merge(a, b):
    """Merges two iterators a and b, returning a single iterator that yields
    the elements of a and b in non-decreasing order. a and b are assumed to
    each yield their elements in non-decreasing order."""
    done = object()
    aNext = next(a, done)
    bNext = next(b, done)
    while (aNext is not done) or (bNext is not done):
        if (bNext is done) or ((aNext is not done) and (aNext < bNext)):
            yield aNext
            aNext = next(a, done)
        else:
            yield bNext
            bNext = next(b, done)
for i in merge(iter(a), iter(b)):
    print i
The following function generalizes the approach to work for arbitrarily many iterators.
def merge(*iterators):
    """Merges a collection of iterators, returning a single iterator that
    yields the elements of the original iterators in non-decreasing order.
    Each of the original iterators is assumed to yield its elements in
    non-decreasing order."""
    done = object()
    n = [next(it, done) for it in iterators]
    while any(v is not done for v in n):
        v, i = min((v, i) for (i, v) in enumerate(n) if v is not done)
        yield v
        n[i] = next(iterators[i], done)
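For example, a quick demonstration (my own, with made-up inputs), assuming each input iterator is already sorted:

for v in merge(iter([1, 4, 7]), iter([2, 5]), iter([3, 6])):
    print v
# prints 1, 2, 3, 4, 5, 6, 7, one per line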
You're missing the whole point of iterators. You don't manually call I.next(); you just iterate over I:

for tmp in I:
    print tmp
Edited
To merge two iterators, use the very handy functions in the itertools module. The one you want is probably izip:

import itertools

merged = []
for x, y in itertools.izip(a, b):
    if x < y:
        merged.append(x)
        merged.append(y)
    else:
        merged.append(y)
        merged.append(x)
Edit again
As pointed out in the comments, this won't actually work, because there could be multiple items from list a smaller than the next item in list b. However, I realised that there is another built-in function that deals with this: heapq.merge.
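For example, a minimal sketch (heapq.merge has been in the standard library since Python 2.6; it lazily yields the elements of its already-sorted inputs in sorted order):

import heapq

a = [1, 3, 5, 7]
b = [2, 4, 6]
for x in heapq.merge(a, b):
    print x
# prints 1 through 7 in order, one per line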
The function sorted works with lists and iterators. Maybe it is not what you desire, but the following code works:

a.extend(b)
print sorted(a)
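If mutating a is undesirable, sorted also accepts any iterable, so chaining the two inputs works as well (a variant sketch, not part of the original answer; it still materializes the whole result):

from itertools import chain

print sorted(chain(a, b))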
~ from This Edabit Challenge ~
I need to get all the elements of nested lists and put them all in one list using recursion. My code below prints out each element, but how do I save them all to one list and return them?
It has to be kept in the scope of the function. I can't add a global list and append everything to it. That works technically, but it doesn't pass the challenge I'm trying to solve.
I printed the values out (the variable x in the code) because that shows me I'm getting close (I think). I just need a way to return the values back to my function and have it append them to the list that I will eventually return.
Examples:
flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]]) ➞ ["direction", 372, "one", "Era", "Sruth", 3337, "First"]
flatten([[4666], [5394], [466], [[["Saskia", [[[[["DXTD"]], "Lexi"]]]]]]]) ➞ [4666, 5394, 466, "Saskia", "DXTD", "Lexi"]
Code:
def flatten(arr):
    res = []
    if isinstance(arr, list):
        for i in arr:
            res.append(flatten(i))
    else:
        return arr
    if isinstance(res, list):
        for i in res:
            x = flatten(i)
            if x:
                print(x)

x = flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]])
print(main)
outputs:
direction
372
one
Era
Sruth
3337
First
[]
The output above shows that my code goes through every non-list value.
Variations of Hai Vu's solutions...
Their first solution uses nested generators, meaning every value gets yielded through that stack of generators. If the structure is deeply nested, this can make the solution take quadratic instead of linear time overall. An alternative is to create a local list in the main function and have the helper function fill it. I prefer using a nested function for that, so I don't have to pass the list around and don't expose the helper function to the outside.
def flatten(nested):
    flat = []
    def helper(nested):
        for e in nested:
            if isinstance(e, list):
                helper(e)
            else:
                flat.append(e)
    helper(nested)
    return flat
Benchmark with 800 integers at depth 800:
26.03 ms Hai_Vu
0.25 ms Kelly
25.62 ms Hai_Vu
0.24 ms Kelly
26.07 ms Hai_Vu
0.24 ms Kelly
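The exact benchmark harness was not shown; a sketch of how an input with 800 integers at depth 800 might be built (an assumption on my part):

# Build [0, [1, [2, [...]]]]: each integer sits one level
# deeper than the previous one.
nested = []
node = nested
for i in range(800):
    node.append(i)
    node.append([])
    node = node[-1]

assert flatten(nested) == list(range(800))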
Their second solution uses a "queue" (but really treats it like a "reversed" stack, extending/popping only on the left). I think an ordinary stack (using a list) is more natural and simpler:
def flatten(nested):
    stack = [nested]
    out = []
    while stack:
        e = stack.pop()
        if isinstance(e, list):
            stack += reversed(e)
        else:
            out.append(e)
    return out
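For example, with the first test case from the challenge:

print(flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]]))
# ['direction', 372, 'one', 'Era', 'Sruth', 3337, 'First']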
Pass a list to flatten, and append to it at each step:
def flatten(arr, list_):
    if isinstance(arr, list):
        for i in arr:
            flatten(i, list_)
    else:
        list_.append(arr)

>>> test = [['a'], 'b']
>>> output = []
>>> flatten(test, output)
>>> output
['a', 'b']
EDIT: If you specifically want to return the list, use

def flatten(arr, list_=None):
    if list_ is None:
        list_ = []
    if isinstance(arr, list):
        for i in arr:
            flatten(i, list_)
    else:
        list_.append(arr)
    return list_
I would like to offer two solutions: the first uses recursion and the second uses a queue.
First solution
def flatten_helper(nested):
    for e in nested:
        if isinstance(e, list):
            yield from flatten_helper(e)
        else:
            yield e

def flatten(nested):
    return list(flatten_helper(nested))
The flatten_helper function is a generator that yields the elements that are not lists. If an element is a list, we call flatten_helper recursively until we reach non-list elements.
Second solution
import collections

def flatten(nested):
    queue = collections.deque(nested)
    out = []
    while queue:
        e = queue.popleft()
        if isinstance(e, list):
            queue.extendleft(reversed(e))
        else:
            out.append(e)
    return out
In this solution, we loop through the nested list. If an element is a list, we put its items back at the front of the queue for later processing. If it is not a list, we append it to out.
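Either version can be checked against the challenge's examples (a small demo, not part of the original answer):

print(flatten([[4666], [5394], [466], [[["Saskia", [[[[["DXTD"]], "Lexi"]]]]]]]))
# [4666, 5394, 466, 'Saskia', 'DXTD', 'Lexi']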
Another possibility, along the same lines as Hai Vu's first solution:

def flatter(lst):
    output = []
    for i in lst:
        if isinstance(i, list):
            output.extend(flatter(i))
        else:
            output.append(i)
    return output
In Python 3, zip(*iterables), according to the documentation,
Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted.
As an example, I am running
for x in zip(a, b):
    f(x)
Is there a way to find out which of the iterables, a or b, led to the stopping of the zip iterator?
Assume that len() is not reliable and iterating over both a and b to check their lengths is not feasible.
I found the following solution which replaces zip with a for loop over only the first iterable and iterates over the second one inside the loop.
ib = iter(b)
for r in a:
    try:
        s = next(ib)
    except StopIteration:
        print('Only b exhausted.')
        break
    print((r, s))
else:
    try:
        s = next(ib)
        print('Only a exhausted.')
    except StopIteration:
        print('a and b exhausted.')
Here ib = iter(b) makes sure that it also works if b is a sequence or generator object. print((r,s)) would be replaced by f(x) from the question.
I think Jan has the best answer. Basically, you want to handle the last iteration from zip separately.
import itertools as it

a = (x for x in range(5))
b = (x for x in range(3))

iterables = (it.chain(g, [f"generator {i} was exhausted"]) for i, g in enumerate([a, b]))

for i, j in zip(*iterables):
    print(i, j)
# 0 0
# 1 1
# 2 2
# 3 generator 1 was exhausted
If you have only two iterables, you can use the code below. exhausted[0] will hold the indicator for which iterator was exhausted; a return value of None means both were exhausted.
However, I must say that I do not agree that len() is unreliable. In fact, you should depend on the len() call to determine the answer (unless you tell us why you cannot).
def f(val):
    print(val)

def manual_iter(a, b, exhausted):
    iters = [iter(it) for it in [a, b]]
    iter_map = {}
    iter_map[iters[0]] = 'first'
    iter_map[iters[1]] = 'second'
    while 1:
        values = []
        for i, it in enumerate(iters):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    try:
                        next(iters[1])
                    except StopIteration:
                        return None
                exhausted.append(iter_map[it])
                return iter_map[it]
            values.append(value)
        yield tuple(values)

if __name__ == '__main__':
    exhausted = []
    a = [1, 2, 3]
    b = [10, 20, 30]
    for x in manual_iter(a, b, exhausted):
        f(x)
    print(exhausted)

    exhausted = []
    a = [1, 2, 3, 4]
    b = [10, 20, 30]
    for x in manual_iter(a, b, exhausted):
        f(x)
    print(exhausted)

    exhausted = []
    a = [1, 2, 3]
    b = [10, 20, 30, 40]
    for x in manual_iter(a, b, exhausted):
        f(x)
    print(exhausted)
See below for my function zzip(), which does what you want to achieve. It uses zip_longest from the itertools module and returns a tuple with what zip would return, plus a list of indices which, if not empty, shows at which 0-based position(s) the iterable(s) became exhausted before the other ones:
def zzip(*args):
    """ Returns a tuple with the result of zip(*args) as a list and a list
    with ZERO-based indices of iterables passed to zzip which got
    exhausted before other ones. """
    from itertools import zip_longest
    nanNANaN = 'nanNANaN'
    Zipped = list(zip_longest(*args, fillvalue=nanNANaN))
    ZippedT = list(zip(*Zipped))
    Indx_exhausted = []
    indx_nanNANaN = None
    for i in range(len(args)):
        try:  # raises ValueError if nanNANaN is not in the column
            indx_nanNANaN = ZippedT[i].index(nanNANaN)
            Indx_exhausted += [(indx_nanNANaN, i)]
        except ValueError:
            pass
    if Indx_exhausted:  # list not empty, iterables were not the same length
        Indx_exhausted.sort()
        min_indx_nanNANaN = Indx_exhausted[0][0]
        Indx_exhausted = [
            i for n, i in Indx_exhausted if n == min_indx_nanNANaN]
        return (Zipped[:min_indx_nanNANaN], Indx_exhausted)
    else:
        return (Zipped, Indx_exhausted)
assert zzip(iter([1,2,3]),[4,5],iter([6])) ==([(1,4,6)],[2])
assert zzip(iter([1,2]),[3,4,5],iter([6,7]))==([(1,3,6),(2,4,7)],[0,2])
assert zzip([1,2],[3,4],[5,6]) ==([(1,3,5),(2,4,6)],[])
The code above runs without raising an assertion error on the used test cases.
Notice that the 'for loop' in the function loops over the items of the passed parameter list and not over the elements of the passed iterables.
I have a simple for loop iterating over a list of items. At some point, I know it will break. How can I then return the remaining items?
for i in [a, b, c, d, e, f, g]:
    try:
        some_func(i)
    except:
        return remaining_items  # if some_func fails, e.g. for c, I want to return [c, d, e, f, g]
I know I could just take my initial list and delete the items from the beginning one by one at every iteration. But is there maybe some native Python function for this, or something more elegant?
You can make use of enumerate, which yields both the element and its index in the list:

myList = [a, b, c, d, e, f, g]
for index, item in enumerate(myList):
    try:
        some_func(item)
    except:
        return myList[index:]
You could use itertools.dropwhile with a test function that applies some_func: while some_func succeeds, the element is dropped; when it raises an error, the test returns False, so dropwhile stops and yields the remaining items:

from itertools import dropwhile

def my_fct():
    def test(v):
        try:
            some_func(v)
            return True   # some_func succeeded: drop this element
        except:
            return False  # some_func failed: keep this and the rest
    return list(dropwhile(test, [a, b, c, d, e, f, g]))
def test():
    myList = [1, 2, 3, 4, 0, 7, 8, 9]
    for index, item in enumerate(myList):
        try:
            print(index, item)
            1 / item
        except:
            return myList[index:]
Many functions of a library I use return generators instead of lists when the result is some collection.
Sometimes I just want to check whether the result is empty or not, and of course I can't write something like:
result = return_generator()
if result:
    print 'Yes, the generator did generate something!'
Now I came up with a one-liner that solves this problem without consuming the generator:
result = return_generator()
if zip("_", result):
    print 'Yes, the generator did generate something!'
I wonder if there are cleaner ways to solve this problem in one line?
Here is one way, using the itertools tee function, which duplicates the generator:

from itertools import tee

a, b = tee(xrange(4))
try:
    next(a)
    print list(b)
except StopIteration:
    print "f1 was empty"

a, b = tee(xrange(0))
try:
    next(a)
    print list(b)
except StopIteration:
    print "f2 was empty"

>>>
[0, 1, 2, 3]
f2 was empty
This zip trick eats the first item yielded, so it is not a good idea either.
You can only detect whether a generator has an item by getting one and keeping it until needed. The following class helps you do that:
If needed, it gets an item from the iterator and keeps it.
If asked for emptiness (if myiterwatch: ...), it tries to get an item and reports whether it could.
If asked for the next item, it returns the retrieved one or a new one.
class IterWatch(object):
    def __init__(self, it):
        self.iter = iter(it)
        self._pending = []

    @property
    def pending(self):
        try:
            if not self._pending:
                # will raise StopIteration if exhausted
                self._pending.append(next(self.iter))
        except StopIteration:
            pass  # swallow this
        return self._pending

    def next(self):
        try:
            return self.pending.pop(0)
        except IndexError:
            raise StopIteration

    __next__ = next  # for Py3

    def __iter__(self):
        return self

    def __nonzero__(self):
        # returns True if we have data.
        return not not self.pending
        # or maybe: bool(self.pending)

    __bool__ = __nonzero__  # for Py3
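A quick usage sketch (my own demo, not part of the original answer):

watched = IterWatch(x * x for x in xrange(3))
if watched:
    print list(watched)  # prints [0, 1, 4]

watched = IterWatch(iter([]))
if not watched:
    print 'the generator was empty'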
This solves the problem in a very generic way. If you have an iterator that you just want to test, you can use:
guard = object()
result = return_generator()
if next(result, guard) is not guard:
    print 'Yes, the generator did generate something!'
next(a, b) returns b if iterator a is exhausted. So if it returns the guard in our case, it didn't generate something, otherwise it did.
But your zip() approach is perfectly valid as well...
Instead of returning the generator, why not just use it? Here's an example of a generator that may or may not yield results; if there are results, action is taken, and if there are none, no action is taken.
#!/usr/bin/python
import time

def do_work():
    if int(time.time()) % 2:
        for a in xrange(0, 100):
            yield a

for thing in do_work():
    print thing
Here are the two simplest solutions for your situation that I can think of:
from itertools import chain

def f1():
    return (i for i in range(10))

def f2():
    return (i for i in range(0))

def has_next(g):
    try:
        return chain([g.next()], g)
    except StopIteration:
        return False

g = has_next(f1())
if g:
    print list(g)

g = has_next(f2())
if g:
    print list(g)
def has_next(g):
    for i in g:
        return chain([i], g)
    return False

g = has_next(f1())
if g:
    print list(g)

g = has_next(f2())
if g:
    print list(g)
Is there a better way to merge/collate a bunch of sorted iterators into one so that it yields the items in sorted order? I think the code below works but I feel like there is a cleaner, more concise way of doing it that I'm missing.
def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x: x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass  # iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]
The use case for this is that I have a bunch of csv files that I need to merge according to some sorted field. They are big enough that I don't want to just read them all into a list and call sort(). I'm using python2.6, but if there's a solution for python3 I'd still be interested in seeing it.
Yes, you want heapq.merge(), which does exactly one thing: iterate over sorted iterators in order.
import csv
import heapq
from itertools import imap

def sortkey(row):
    return (row[5], row)

def unwrap(key):
    k, row = key
    return row

FILE_LIST = map(file, ['foo.csv', 'bar.csv'])
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))
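To actually drain the merged stream, for example writing the rows back out (a sketch; 'merged.csv' is a made-up filename):

with open('merged.csv', 'wb') as out:  # 'wb' mode for the csv module on Python 2
    csv.writer(out).writerows(output_iter)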