I'm working on a problem from stuy's coding problems and came across this one.
So given two generators that each output numbers in increasing order, merge the two generators into one generator that outputs the numbers in increasing order. If duplicates occur, output the number as many times as it occurs.
My attempt: Since I'm more familiar with working with lists, tuples, dictionaries, etc, I thought I'd just make a helper to create a list of items in the generators. Then I'd merge the two lists and sort them
def list_maker(gener):
    l1 = []
    for item in gener:
        l1.append(item)
    return l1

def merge_gens(first_gen, second_gen):
    first_list = list_maker(first_gen)
    second_list = list_maker(second_gen)
    first_list.extend(second_list)
    final_list = first_list
    final_list.sort()
    yield from final_list
Although this approach seems to work on finite generators, it does not work on infinite generators (which I forgot to account for). I obviously can't have a list of infinite items. Could I get help on how to do this without importing Python libraries?
You can try:
def merge(first, second):
    a = next(first)
    b = next(second)
    while True:
        # yield the smaller one
        yield a if a < b else b
        # get the next number from the
        # generator that yielded the smaller one
        if a < b:
            a = next(first)
        elif a == b:
            # when the numbers are equal,
            # yield the number a second time
            yield a
            # get the next numbers from both generators
            a = next(first)
            b = next(second)
        else:
            b = next(second)
Sorry for the lack of comments and explanation. I haven't tested edge cases. I hope you get the general gist of the approach and that it gives you enough pointers to work on your task further.
Assumption
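- StopIteration exceptions will be handled by the caller. Note that since Python 3.7 a StopIteration raised inside a generator is converted to a RuntimeError, so finite inputs need explicit handling.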
This was a bit tricky to handle the edge cases, I had fun with this. Haven't tested it fully, and it's in a pretty verbose state right now, a couple helper functions could add clarity:
def merge(first, second):
    first = iter(first)
    second = iter(second)
    exhausted = object()
    f = next(first, exhausted)
    if f is exhausted:
        yield from second
        return  # first was empty, so there is nothing left to merge
    s = next(second, exhausted)
    if s is exhausted:
        yield f
        yield from first
        return
    while True:
        if f is exhausted:
            if s is not exhausted:
                yield s
            yield from second
            return
        elif s is exhausted:
            if f is not exhausted:
                yield f
            yield from first
            return
        elif f < s:
            yield f
            f = next(first, exhausted)
        elif f == s:
            yield f
            yield s
            f = next(first, exhausted)
            s = next(second, exhausted)
        else:
            yield s
            s = next(second, exhausted)
I think the following makes it more readable by removing some of the deeper nesting and re-using logic:
def merge(first, second):
    first = iter(first)
    second = iter(second)
    exhausted = object()  # just a unique sentinel value

    def _cleanup(item, iterator):
        if item is not exhausted:
            yield item
        yield from iterator

    f = next(first, exhausted)
    if f is exhausted:
        yield from second
        return  # first was empty, so there is nothing left to merge
    s = next(second, exhausted)
    if s is exhausted:
        yield from _cleanup(f, first)
        return
    while True:
        if f is exhausted:
            yield from _cleanup(s, second)
            return
        elif s is exhausted:
            yield from _cleanup(f, first)
            return
        elif f < s:
            yield f
            f = next(first, exhausted)
        elif f == s:
            yield f
            yield s
            f = next(first, exhausted)
            s = next(second, exhausted)
        else:
            yield s
            s = next(second, exhausted)
The key idea is to hold one pending value from each iterator and yield the smaller one (or both, if they are equal). You then draw a new value only from the iterator whose value you just yielded (or from both, if they were equal). Once one iterator is exhausted, you clean up by delegating to the other.
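To see the idea working on infinite generators, here is a compact Python 3 sketch. It is not the code from the answers above, just a condensed variant of the same sentinel approach. The names `merge_sorted`, `evens`, and `multiples_of_three` are made up for this illustration, and `itertools.islice` is used only to cut off the infinite output for the demo.

```python
from itertools import islice  # only needed to truncate the demo output

def merge_sorted(first, second):
    """Lazily merge two iterables that are each in increasing order."""
    first, second = iter(first), iter(second)
    done = object()  # unique sentinel: marks an exhausted iterator
    a, b = next(first, done), next(second, done)
    while a is not done and b is not done:
        if a <= b:  # on a tie, take from `first`; the equal item from `second` follows
            yield a
            a = next(first, done)
        else:
            yield b
            b = next(second, done)
    # At most one side still has items; flush it.
    if a is not done:
        yield a
        yield from first
    if b is not done:
        yield b
        yield from second

def evens():  # hypothetical infinite generator: 0, 2, 4, ...
    n = 0
    while True:
        yield n
        n += 2

def multiples_of_three():  # hypothetical infinite generator: 0, 3, 6, ...
    n = 0
    while True:
        yield n
        n += 3

first_eight = list(islice(merge_sorted(evens(), multiples_of_three()), 8))
print(first_eight)  # [0, 0, 2, 3, 4, 6, 6, 8]
```

Because nothing is ever collected into a list, the merge only pulls as many items from each input as the consumer asks for.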
~ from This Edabit Challenge ~
I need to get all the elements of nested lists and put them all in one list using recursion. My code below prints out each element, but how do I save them all to one list and return them?
It has to be kept in the scope of the function. I can't add a global list and append all of them. It works technically, but it doesn't work for the challenge I'm trying to pass.
I printed the values out (which is var x in the code) because that shows me that I'm getting close (I think). I just need a way to return the values back to my function and have it append it to the list that I will eventually return.
Examples:
flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]]) ➞ ["direction", 372, "one", "Era", "Sruth", 3337, "First"]
flatten([[4666], [5394], [466], [[["Saskia", [[[[["DXTD"]], "Lexi"]]]]]]]) ➞ [4666, 5394, 466, "Saskia", "DXTD", "Lexi"]
Code:
def flatten(arr):
    res = []
    if isinstance(arr, list):
        for i in arr:
            res.append(flatten(i))
    else:
        return arr
    if isinstance(res, list):
        for i in res:
            x = flatten(i)
            if x:
                print(x)

x = flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]])
print(main)
outputs :
direction
372
one
Era
Sruth
3337
First
[]
The output above shows that my code goes through every non-list value.
Variations of Hai Vu's solutions...
Their first solution uses nested generators, meaning every value gets yielded through that stack of generators. If the structure is deeply nested, this can make the solution take quadratic instead of linear time overall. An alternative is to create a local list in the main function and have the helper function fill it. I prefer using a nested function for that, so I don't have to pass the list around and don't expose the helper function to the outside.
def flatten(nested):
    flat = []
    def helper(nested):
        for e in nested:
            if isinstance(e, list):
                helper(e)
            else:
                flat.append(e)
    helper(nested)
    return flat
Benchmark with 800 integers at depth 800:
26.03 ms Hai_Vu
0.25 ms Kelly
25.62 ms Hai_Vu
0.24 ms Kelly
26.07 ms Hai_Vu
0.24 ms Kelly
Their second solution uses a "queue" (but really treats it like a "reversed" stack, extending/popping only on the left). I think an ordinary stack (using a list) is more natural and simpler:
def flatten(nested):
    stack = [nested]
    out = []
    while stack:
        e = stack.pop()
        if isinstance(e, list):
            stack += reversed(e)
        else:
            out.append(e)
    return out
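One practical consequence of the explicit stack is that it does not depend on Python's recursion limit. The sketch below (Python 3) repeats the stack version under the name `flatten_stack` and feeds it a list nested 5000 levels deep. The helper `nest` and the depth are made up for this illustration and are not the benchmark input used above.

```python
def flatten_stack(nested):
    """Flatten arbitrarily nested lists using an explicit stack (no recursion)."""
    stack = [nested]
    out = []
    while stack:
        e = stack.pop()
        if isinstance(e, list):
            stack += reversed(e)  # reversed, so items are popped in original order
        else:
            out.append(e)
    return out

def nest(value, depth):
    """Hypothetical helper: wrap `value` in `depth` levels of lists."""
    nested = value
    for _ in range(depth):
        nested = [nested]
    return nested

deep = nest("x", 5000)  # deeper than CPython's default recursion limit of 1000
result = flatten_stack(deep)
print(result)  # ['x']
```

A recursive version would be expected to raise RecursionError on the same input unless the limit is raised with sys.setrecursionlimit.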
Pass a list to flatten, and append to it at each step:
def flatten(arr, list_):
    if isinstance(arr, list):
        for i in arr:
            flatten(i, list_)
    else:
        list_.append(arr)

test = [['a'], 'b']
output = []
flatten(test, output)
output
['a', 'b']
EDIT: If you want specifically to return the list, use
def flatten(arr, list_=None):
    if list_ is None:
        list_ = []
    if isinstance(arr, list):
        for i in arr:
            flatten(i, list_)
    else:
        list_.append(arr)
    return list_
I would like to offer two solutions: the first uses recursion and the second uses a queue.
First solution
def flatten_helper(nested):
    for e in nested:
        if isinstance(e, list):
            yield from flatten_helper(e)
        else:
            yield e

def flatten(nested):
    return list(flatten_helper(nested))
The flatten_helper function is a generator that yields every element that is not a list. If an element is a list, we call flatten_helper on it again until we reach non-list elements.
Second solution
import collections

def flatten(nested):
    queue = collections.deque(nested)
    out = []
    while queue:
        e = queue.popleft()
        if isinstance(e, list):
            queue.extendleft(reversed(e))
        else:
            out.append(e)
    return out
In this solution, we loop through the nested list. If the element is a list, we place each sub element into a queue for later processing. If the element is not a list, we append it to out.
Another possibility, along the same lines as Hai Vu's first solution:
def flatter(lst):
    output = []
    for i in lst:
        if isinstance(i, list):
            output.extend(flatter(i))
        else:
            output.append(i)
    return output
In Python 3, zip(*iterables), according to the documentation:
Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted.
As an example, I am running
for x in zip(a,b):
    f(x)
Is there a way to find out which of the iterables, a or b, led to the stopping of the zip iterator?
Assume that len() is not reliable and iterating over both a and b to check their lengths is not feasible.
I found the following solution which replaces zip with a for loop over only the first iterable and iterates over the second one inside the loop.
ib = iter(b)
for r in a:
    try:
        s = next(ib)
    except StopIteration:
        print('Only b exhausted.')
        break
    print((r,s))
else:
    try:
        s = next(ib)
        print('Only a exhausted.')
    except StopIteration:
        print('a and b exhausted.')
Here ib = iter(b) makes sure that it also works if b is a sequence or generator object. print((r,s)) would be replaced by f(x) from the question.
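The same loop can be packaged as a generator that reports the outcome through its return value. This is only a sketch in Python 3: the names `zip_report` and `run`, and the convention of returning 'a', 'b', or 'both', are made up for this illustration.

```python
def zip_report(a, b):
    """Yield pairs like zip(a, b); the generator's return value says
    which input ran out: 'a', 'b', or 'both' (illustrative convention)."""
    ia, ib = iter(a), iter(b)
    done = object()  # unique sentinel for an exhausted iterator
    for r in ia:
        s = next(ib, done)
        if s is done:
            return 'b'  # note: r has already been consumed from a
        yield (r, s)
    # a is exhausted; look at b to tell the two remaining cases apart
    return 'both' if next(ib, done) is done else 'a'

def run(a, b):
    """Collect all pairs plus the generator's return value."""
    pairs = []
    gen = zip_report(a, b)
    while True:
        try:
            pairs.append(next(gen))
        except StopIteration as stop:
            return pairs, stop.value  # value carries the 'return' of the generator

print(run([1, 2, 3], [10, 20]))  # ([(1, 10), (2, 20)], 'b')
```

As with the loop above, telling the cases apart costs one extra item: either the last item drawn from a, or the item peeked from b.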
I think Jan has the best answer. Basically, you want to handle the last iteration from zip separately.
import itertools as it
a = (x for x in range(5))
b = (x for x in range(3))
iterables = ((it.chain(g,[f"generator {i} was exhausted"]) for i,g in enumerate([a,b])))
for i, j in zip(*iterables):
    print(i, j)
# 0 0
# 1 1
# 2 2
# 3 generator 1 was exhausted
If you have only two iterables, you can use the code below. The exhausted list will hold the indicator ('first' or 'second') for the iterator that was exhausted first; it stays empty if both were exhausted at the same time.
However I must say that I do not agree with len() not being reliable. In fact, you should depend on the len() call to determine the answer. (unless you tell us the reason why you can not.)
def f(val):
    print(val)

def manual_iter(a,b, exhausted):
    iters = [iter(it) for it in [a,b]]
    iter_map = {}
    iter_map[iters[0]] = 'first'
    iter_map[iters[1]] = 'second'
    while 1:
        values = []
        for i, it in enumerate(iters):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    try:
                        next(iters[1])
                    except StopIteration:
                        return None
                exhausted.append(iter_map[it])
                return iter_map[it]
            values.append(value)
        yield tuple(values)

if __name__ == '__main__':
    exhausted = []
    a = [1,2,3]
    b = [10,20,30]
    for x in manual_iter(a,b, exhausted):
        f(x)
    print(exhausted)

    exhausted = []
    a = [1,2,3,4]
    b = [10,20,30]
    for x in manual_iter(a,b, exhausted):
        f(x)
    print(exhausted)

    exhausted = []
    a = [1,2,3]
    b = [10,20,30,40]
    for x in manual_iter(a,b, exhausted):
        f(x)
    print(exhausted)
Below is a function zzip() that I wrote to do what you want. It uses zip_longest from the itertools module and returns a tuple containing what zip would return (as a list) plus a list of 0-based indices. If that list is not empty, it shows which iterable(s) became exhausted before the others:
def zzip(*args):
    """ Returns a tuple with the result of zip(*args) as list and a list
        with ZERO-based indices of iterables passed to zzip which got
        exhausted before other ones. """
    from itertools import zip_longest
    nanNANaN = 'nanNANaN'
    Zipped = list(zip_longest(*args, fillvalue=nanNANaN))
    ZippedT = list(zip(*Zipped))
    Indx_exhausted = []
    indx_nanNANaN = None
    for i in range(len(args)):
        try: # gives ValueError if nanNANaN is not in the column
            indx_nanNANaN = ZippedT[i].index(nanNANaN)
            Indx_exhausted += [(indx_nanNANaN, i)]
        except ValueError:
            pass
    if Indx_exhausted: # list not empty, iterables were not same length
        Indx_exhausted.sort()
        min_indx_nanNANaN = Indx_exhausted[0][0]
        Indx_exhausted = [
            i for n, i in Indx_exhausted if n == min_indx_nanNANaN ]
        return (Zipped[:min_indx_nanNANaN], Indx_exhausted)
    else:
        return (Zipped, Indx_exhausted)

assert zzip(iter([1,2,3]),[4,5],iter([6])) ==([(1,4,6)],[2])
assert zzip(iter([1,2]),[3,4,5],iter([6,7]))==([(1,3,6),(2,4,7)],[0,2])
assert zzip([1,2],[3,4],[5,6]) ==([(1,3,5),(2,4,6)],[])
The code above runs without raising an assertion error on the used test cases.
Notice that the 'for loop' in the function loops over the items of the passed parameter list and not over the elements of the passed iterables.
I was wondering whether there's a way to refactor the following code:
first_run = True
for i in gen:
    if first_run:
        last_head = i[1]
        last_tail = i[2]
        last_chrom = i[0]
        first_run = False
    else:
        func(i[1], last_head)
        func(i[1], last_tail)
        last_head = i[1]
        last_tail = i[2]
        last_chrom = i[0]
The essential point of your loop seems to be performing some operation on pairs of consecutive elements of the iterable. So I would look to the function pairwise whose definition is given in the itertools module documentation:
from itertools import tee, izip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)
Note that in Python 2 this is a recipe rather than an actual itertools function, so you will have to copy the implementation into your code (Python 3.10 and later ship it as itertools.pairwise). With this function, your loop can be implemented like so:
for a, b in pairwise(gen):
    func(b[1], a[1])
    func(b[1], a[2])
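For readers on Python 3, where izip no longer exists, the recipe can be sketched with the built-in zip. The rows and the `calls` list below are made-up stand-ins for the question's gen and func, used only so the result can be inspected.

```python
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)     # advance the second copy by one item
    return zip(a, b)  # Python 3's zip is lazy, like Python 2's izip

# Hypothetical (chrom, head, tail) rows standing in for the question's gen.
rows = [("chr1", 10, 20), ("chr1", 30, 40), ("chr2", 5, 15)]

calls = []  # records the arguments func would have received
for prev, cur in pairwise(iter(rows)):
    calls.append((cur[1], prev[1]))  # stands in for func(i[1], last_head)
    calls.append((cur[1], prev[2]))  # stands in for func(i[1], last_tail)

print(calls)  # [(30, 10), (30, 20), (5, 30), (5, 40)]
```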
I would remove if/ else and assign by slicing list - unless arguments of func are objects that are updated by it:
If gen is generator:
my_gen = gen
values = my_gen.next()
last_chrom, last_head, last_tail = values[:3]
for values in my_gen:
    func(last_head, last_head)
    func(last_head, last_tail)
    last_chrom, last_head, last_tail = values[:3]
EDIT:
Just noticed my mistake
this should simplify the loop
first_run = True
for i in gen:
    if not first_run:
        func(i[1], last_head)
        func(i[1], last_tail)
    last_head, last_tail, last_chrom = i[1], i[2], i[0]
    first_run = False
updated the answer...
If you don't need the variables last_head, last_tail and last_chrom after the loop, and gen is a sequence such as a list (a generator cannot be sliced or indexed), you could take this solution:
for index, val in enumerate(gen[1:], 1):
    func(val[1], gen[index-1][1])
    func(val[1], gen[index-1][2])
it = iter(gen) # make sure we have an iterator
_, last_head, last_tail = next(it, [None]*3) # assume iterator returns 3 values
for _, head, tail in it:
    func(head, last_head)
    func(head, last_tail)
    last_head, last_tail = head, tail
If you can't assume that the iterator returns 3 values at a time then:
it = iter(gen)
last = next(it, None)
for x in it:
    func(x[1], last[1]) # head, last_head
    func(x[1], last[2]) # head, last_tail
    last = x
You could also use itertools' pairwise() recipe suggested by @David:
for last, x in pairwise(gen):
    func(x[1], last[1]) # head, last_head
    func(x[1], last[2]) # head, last_tail
My favorite way to process the "first item" in a special way is a one-time loop with break:
def gen():
    for x in range(5):
        yield x

def first_special(g):
    for item in g:
        print 'first', item
        break
    for item in g:
        print item

first_special(gen())
# prints "first 0", then 1, 2, 3, 4
Note that this works fine with one-element or empty iterators. To make first_special work with arbitrary iterables as well, I usually add a safety iter() call to it:
def first_special(g):
    g = iter(g)
    for item in g:
        print 'first', item
        break
    for item in g:
        print item
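A Python 3 sketch of the same one-time-loop pattern follows. Instead of printing, it collects labelled items in a list so the behaviour on empty and one-element inputs is easy to check; the 'first' and 'rest' labels are made up for this illustration.

```python
def first_special(g):
    """Return (label, item) pairs showing which item was handled as the first."""
    g = iter(g)  # safe for any iterable, not only iterators
    seen = []
    for item in g:
        seen.append(('first', item))
        break  # leave after exactly one item
    for item in g:  # continues where the first loop stopped
        seen.append(('rest', item))
    return seen

print(first_special(range(3)))  # [('first', 0), ('rest', 1), ('rest', 2)]
print(first_special([]))        # []
```

The iter() call matters here: without it, passing a list would make the second loop start over from the first element.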
Is there a better way to merge/collate a bunch of sorted iterators into one so that it yields the items in sorted order? I think the code below works but I feel like there is a cleaner, more concise way of doing it that I'm missing.
def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x : x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass #iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]
The use case for this is that I have a bunch of CSV files that I need to merge according to some sorted field. They are big enough that I don't want to just read them all into a list and call sort(). I'm using Python 2.6, but if there's a solution for Python 3 I'd still be interested in seeing it.
Yes, you want heapq.merge(), which does exactly one thing: iterate over sorted iterators in order.
import csv
import heapq
from itertools import imap

def sortkey(row):
    # decorate: put the sort field first so tuples compare by it
    return (row[5], row)

def unwrap(key):
    # undecorate: drop the sort field again
    sortkey, row = key
    return row

FILE_LIST = map(file, ['foo.csv', 'bar.csv'])
# one lazy iterator of decorated rows per file
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))
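Since the question also asks about Python 3: from Python 3.5 on, heapq.merge accepts a key argument, so the decorate/undecorate step is not needed. The sketch below uses two small in-memory CSV "files" made up for this illustration (io.StringIO stands in for real files), each already sorted on the second column.

```python
import csv
import heapq
import io
from operator import itemgetter

# Two made-up CSV inputs, each already sorted by the second column.
file_a = io.StringIO("x,apple\ny,cherry\n")
file_b = io.StringIO("z,banana\nw,date\n")

readers = [csv.reader(f) for f in (file_a, file_b)]

# key= (Python 3.5+) tells merge which field the inputs are sorted on.
merged = list(heapq.merge(*readers, key=itemgetter(1)))
print(merged)
# [['x', 'apple'], ['z', 'banana'], ['y', 'cherry'], ['w', 'date']]
```

With real files you would iterate over heapq.merge(...) directly instead of calling list(), so that only one pending row per file is held in memory.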
I am trying to use iterators more for looping since I heard it is faster than index looping. One thing I am not sure is about how to treat the end of the sequence nicely. The way I can think of is to use try and except StopIteration, which looks ugly to me.
To be more concrete, suppose we are asked to print the merged sorted list of two sorted lists a and b. I would write the following
aNull = False
I = iter(a)
try:
    tmp = I.next()
except StopIteration:
    aNull = True

for x in b:
    if aNull:
        print x
    else:
        if x < tmp:
            print x
        else:
            print tmp,x
            try:
                tmp = I.next()
            except StopIteration:
                aNull = True

while not aNull:
    print tmp
    try:
        tmp = I.next()
    except StopIteration:
        aNull = True
How would you code it to make it neater?
I think handling a and b more symmetrically would make it easier to read. Also, using the built-in next function in Python 2.6 with a default value avoids the need to handle StopIteration:
def merge(a, b):
    """Merges two iterators a and b, returning a single iterator that yields
    the elements of a and b in non-decreasing order. a and b are assumed to each
    yield their elements in non-decreasing order."""
    done = object()
    aNext = next(a, done)
    bNext = next(b, done)
    while (aNext is not done) or (bNext is not done):
        if (bNext is done) or ((aNext is not done) and (aNext < bNext)):
            yield aNext
            aNext = next(a, done)
        else:
            yield bNext
            bNext = next(b, done)

for i in merge(iter(a), iter(b)):
    print i
The following function generalizes the approach to work for arbitrarily many iterators.
def merge(*iterators):
    """Merges a collection of iterators, returning a single iterator that yields
    the elements of the original iterators in non-decreasing order. Each of
    the original iterators is assumed to yield its elements in non-decreasing
    order."""
    done = object()
    n = [next(it, done) for it in iterators]
    while any(v is not done for v in n):
        v, i = min((v, i) for (i, v) in enumerate(n) if v is not done)
        yield v
        n[i] = next(iterators[i], done)
You're missing the whole point of iterators. You don't manually call I.next(), you just iterate through I.
for tmp in I:
    print tmp
Edited
To merge two iterators, use the very handy functions in the itertools module. The one you want is probably izip:
merged = []
for x, y in itertools.izip(a, b):
    if x < y:
        merged.append(x)
        merged.append(y)
    else:
        merged.append(y)
        merged.append(x)
Edit again
As pointed out in the comments, this won't actually work, because there could be multiple items from list a smaller than the next item in list b. However, I realised that there is another built-in function that deals with this: heapq.merge.
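A short sketch of heapq.merge in Python 3, with made-up inputs. The first call covers the case where several items of one list are smaller than the next item of the other; the second shows that the merge is lazy, using two infinite counters from itertools.count and islice to cut off the output.

```python
import heapq
from itertools import count, islice

a = [1, 2, 3, 10]
b = [4, 5, 6]
merged_lists = list(heapq.merge(a, b))
print(merged_lists)  # [1, 2, 3, 4, 5, 6, 10]

odds = count(1, 2)   # 1, 3, 5, ... (infinite)
evens = count(0, 2)  # 0, 2, 4, ... (infinite)
first_six = list(islice(heapq.merge(odds, evens), 6))
print(first_six)     # [0, 1, 2, 3, 4, 5]
```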
The function sorted works with lists and iterators. Maybe it is not what you desire, but the following code works (note that it loads everything into memory).
a.extend(b)
print sorted(iter(a))