Refactor the first iteration for a generator in Python

I was wondering whether there's a way to refactor the following code:
first_run = True
for i in gen:
    if first_run:
        last_head = i[1]
        last_tail = i[2]
        last_chrom = i[0]
        first_run = False
    else:
        func(i[1], last_head)
        func(i[1], last_tail)
        last_head = i[1]
        last_tail = i[2]
        last_chrom = i[0]

The essential point of your loop seems to be performing some operation on pairs of consecutive elements of the iterable. So I would look to the pairwise function, whose definition is given in the itertools module documentation:
from itertools import tee, izip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)
Note that this is not an actual itertools function; you will have to copy and paste the implementation into your code. Anyway, with this function, your loop can be implemented like so:
for a, b in pairwise(gen):
    func(b[1], a[1])
    func(b[1], a[2])
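For example, a quick run with a toy generator of (chrom, head, tail) tuples; the sample data and the stand-in func below are only an illustration, not part of the question (on Python 3, izip in the recipe is simply the built-in zip):

from itertools import tee

def pairwise(iterable):
    # same recipe as above, written for Python 3 where izip is the built-in zip
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def func(head, prev):
    # hypothetical stand-in for the real func from the question
    print('%s vs %s' % (head, prev))

gen = iter([('chr1', 10, 20), ('chr1', 30, 40), ('chr2', 5, 15)])
for a, b in pairwise(gen):
    func(b[1], a[1])  # current head vs previous head
    func(b[1], a[2])  # current head vs previous tail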

I would remove the if/else and assign by slicing the list, unless the arguments of func are objects that it updates in place. If gen is a generator:
my_gen = gen
values = my_gen.next()
last_chrom, last_head, last_tail = values[:3]
for values in my_gen:
    func(values[1], last_head)
    func(values[1], last_tail)
    last_chrom, last_head, last_tail = values[:3]
EDIT: Just noticed my mistake. This should simplify the loop:
first_run = True
for i in gen:
    if not first_run:
        func(i[1], last_head)
        func(i[1], last_tail)
    last_head, last_tail, last_chrom = i[1], i[2], i[0]
    first_run = False
updated the answer...

If you don't need the variables last_head, last_tail and last_chrom after the loop, you could use this solution (note that it indexes gen, so it needs a sequence such as a list rather than a generator):
for index, val in enumerate(gen[1:], start=1):
    func(val[1], gen[index - 1][1])
    func(val[1], gen[index - 1][2])

it = iter(gen)                                # make sure we have an iterator
_, last_head, last_tail = next(it, [None]*3)  # assume iterator returns 3 values
for _, head, tail in it:
    func(head, last_head)
    func(head, last_tail)
    last_head, last_tail = head, tail
If you can't assume that the iterator returns 3 values at a time then:
it = iter(gen)
last = next(it, None)
for x in it:
    func(x[1], last[1])  # head, last_head
    func(x[1], last[2])  # head, last_tail
    last = x
You could also use the itertools pairwise() recipe suggested by @David:
for last, x in pairwise(gen):
    func(x[1], last[1])  # head, last_head
    func(x[1], last[2])  # head, last_tail

My favorite way to process the "first item" in a special way is a one-time loop with break:
def gen():
    for x in range(5):
        yield x

def first_special(g):
    for item in g:
        print 'first', item
        break
    for item in g:
        print item

first_special(gen())
# prints "first 0", then 1, 2, 3, 4
Note that this works fine with one-element or empty iterators. To make first_special work with arbitrary iterables as well, I usually add a safety iter() call to it:
def first_special(g):
    g = iter(g)
    for item in g:
        print 'first', item
        break
    for item in g:
        print item
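For example, with the extra iter() call the same function accepts a plain list or an empty one (a small illustrative check, assuming the first_special above is in scope; Python 2 prints, as in the rest of this answer):

first_special(['a', 'b', 'c'])
# prints "first a", then b, then c

first_special([])
# prints nothing and raises no error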

Get all elements of nested lists with recursion

~ from This Edabit Challenge ~
I need to get all the elements of nested lists and put them all in one list using recursion. My code below prints out each element, but how do I save them all to one list and return them?
Everything has to be kept in the scope of the function; I can't add a global list and append to it. That works technically, but it doesn't pass the challenge I'm attempting.
I printed the values out (the variable x in the code) because that shows me I'm getting close (I think). I just need a way to return the values to my function and have it append them to the list that I will eventually return.
Examples:
flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]]) ➞ ["direction", 372, "one", "Era", "Sruth", 3337, "First"]
flatten([[4666], [5394], [466], [[["Saskia", [[[[["DXTD"]], "Lexi"]]]]]]]) ➞ [4666, 5394, 466, "Saskia", "DXTD", "Lexi"]
Code:
def flatten(arr):
    res = []
    if isinstance(arr, list):
        for i in arr:
            res.append(flatten(i))
    else:
        return arr
    if isinstance(res, list):
        for i in res:
            x = flatten(i)
            if x:
                print(x)
x = flatten([[[[[["direction"], [372], ["one"], [[[[[["Era"]]]], "Sruth", 3337]]], "First"]]]])
print(main)
outputs:
direction
372
one
Era
Sruth
3337
First
[]
The output above shows that my code goes through every non-list value.
Variations of Hai Vu's solutions...
Their first solution uses nested generators, meaning every value gets yielded through that stack of generators. If the structure is deeply nested, this can make the solution take quadratic instead of linear time overall. An alternative is to create a local list in the main function and have the helper function fill it. I prefer using a nested function for that, so I don't have to pass the list around and don't expose the helper function to the outside.
def flatten(nested):
    flat = []
    def helper(nested):
        for e in nested:
            if isinstance(e, list):
                helper(e)
            else:
                flat.append(e)
    helper(nested)
    return flat
Benchmark with 800 integers at depth 800:
26.03 ms Hai_Vu
0.25 ms Kelly
25.62 ms Hai_Vu
0.24 ms Kelly
26.07 ms Hai_Vu
0.24 ms Kelly
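The benchmark script itself isn't shown; a minimal, self-contained sketch of how such a deep-nesting comparison could be set up (the names flatten_hai_vu and flatten_kelly, the timing harness, and the input builder are my assumptions, not the original benchmark code):

import sys
from timeit import timeit

sys.setrecursionlimit(10000)  # give the recursive solutions extra headroom at depth 800

def flatten_hai_vu(nested):
    # the recursive-generator approach (Hai Vu's first solution), reproduced for timing
    def helper(nested):
        for e in nested:
            if isinstance(e, list):
                yield from helper(e)
            else:
                yield e
    return list(helper(nested))

def flatten_kelly(nested):
    # the nested-function version shown above
    flat = []
    def helper(nested):
        for e in nested:
            if isinstance(e, list):
                helper(e)
            else:
                flat.append(e)
    helper(nested)
    return flat

def make_deep(depth):
    # build [0, [1, [2, [...]]]]: one integer per nesting level
    nested = [depth - 1]
    for i in range(depth - 2, -1, -1):
        nested = [i, nested]
    return nested

data = make_deep(800)  # 800 integers at depth 800
for name, fn in [('Hai_Vu', flatten_hai_vu), ('Kelly', flatten_kelly)]:
    ms = timeit(lambda: fn(data), number=10) / 10 * 1000
    print('%8.2f ms  %s' % (ms, name))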
Their second solution uses a "queue" (but really treats it like a "reversed" stack, extending/popping only on the left). I think an ordinary stack (using a list) is more natural and simpler:
def flatten(nested):
    stack = [nested]
    out = []
    while stack:
        e = stack.pop()
        if isinstance(e, list):
            stack += reversed(e)
        else:
            out.append(e)
    return out
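A quick check with the second example from the question (illustrative only):

print(flatten([[4666], [5394], [466], [[["Saskia", [[[[["DXTD"]], "Lexi"]]]]]]]))
# [4666, 5394, 466, 'Saskia', 'DXTD', 'Lexi']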
Pass a list to flatten, and append to it at each step:
def flatten(arr, list_):
    if isinstance(arr, list):
        for i in arr:
            flatten(i, list_)
    else:
        list_.append(arr)

test = [['a'], 'b']
output = []
flatten(test, output)
output
# ['a', 'b']
EDIT: If you specifically want to return the list, use:
def flatten(arr, list_=None):
    if list_ is None:
        list_ = []
    if isinstance(arr, list):
        for i in arr:
            flatten(i, list_)
    else:
        list_.append(arr)
    return list_
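With that default argument the earlier example becomes a single call (illustrative):

print(flatten([['a'], 'b']))
# ['a', 'b']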
I would like to offer two solutions: the first uses recursion and the second uses a queue.
First solution
def flatten_helper(nested):
    for e in nested:
        if isinstance(e, list):
            yield from flatten_helper(e)
        else:
            yield e

def flatten(nested):
    return list(flatten_helper(nested))
The flatten_helper function is a generator that yields the elements which are not lists. If an element is a list, we call flatten_helper on it recursively until we reach non-list elements.
Second solution
import collections

def flatten(nested):
    queue = collections.deque(nested)
    out = []
    while queue:
        e = queue.popleft()
        if isinstance(e, list):
            queue.extendleft(reversed(e))
        else:
            out.append(e)
    return out
In this solution, we loop through the nested list. If an element is a list, we place each of its sub-elements into the queue for later processing. If the element is not a list, we append it to out.
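A quick check that extendleft(reversed(e)) keeps the original left-to-right order (toy input, assuming the flatten above is in scope):

print(flatten([1, [2, [3, 4]], 5]))
# [1, 2, 3, 4, 5]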
Another possibility, on the same wavelength as Hai Vu's first solution:
def flatter(lst):
    output = []
    for i in lst:
        if isinstance(i, list):
            output.extend(flatter(i))
        else:
            output.append(i)
    return output

Python zip(): Check which iterable got exhausted

In Python 3, zip(*iterables), according to the documentation,
Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted.
As an example, I am running
for x in zip(a, b):
    f(x)
Is there a way to find out which of the iterables, a or b, led to the stopping of the zip iterator?
Assume that len() is not reliable and iterating over both a and b to check their lengths is not feasible.
I found the following solution, which replaces zip with a for loop over only the first iterable and advances the second one manually inside the loop.
ib = iter(b)
for r in a:
    try:
        s = next(ib)
    except StopIteration:
        print('Only b exhausted.')
        break
    print((r, s))
else:
    try:
        s = next(ib)
        print('Only a exhausted.')
    except StopIteration:
        print('a and b exhausted.')
Here ib = iter(b) makes sure that it also works if b is a sequence or generator object. print((r,s)) would be replaced by f(x) from the question.
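To see all three outcomes side by side, the same loop can be wrapped in a small helper (illustrative only; print((r, s)) stands in for f(x)):

def report(a, b):
    ib = iter(b)
    for r in a:
        try:
            s = next(ib)
        except StopIteration:
            print('Only b exhausted.')
            break
        print((r, s))
    else:
        try:
            s = next(ib)
            print('Only a exhausted.')
        except StopIteration:
            print('a and b exhausted.')

report([1, 2, 3], [10, 20])   # ... Only b exhausted.
report([1, 2], [10, 20, 30])  # ... Only a exhausted.
report([1, 2], [10, 20])      # ... a and b exhausted.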
I think Jan has the best answer. Basically, you want to handle the last iteration from zip separately.
import itertools as it

a = (x for x in range(5))
b = (x for x in range(3))
iterables = (it.chain(g, [f"generator {i} was exhausted"]) for i, g in enumerate([a, b]))
for i, j in zip(*iterables):
    print(i, j)
# 0 0
# 1 1
# 2 2
# 3 generator 1 was exhausted
If you have only two iterables, you can use the code below. exhausted[0] will hold your indicator of which iterator was exhausted; a value of None means both were exhausted.
However, I must say that I do not agree with len() not being reliable. In fact, you should depend on the len() call to determine the answer (unless you tell us why you cannot).
def f(val):
    print(val)

def manual_iter(a, b, exhausted):
    iters = [iter(it) for it in [a, b]]
    iter_map = {}
    iter_map[iters[0]] = 'first'
    iter_map[iters[1]] = 'second'
    while 1:
        values = []
        for i, it in enumerate(iters):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    try:
                        next(iters[1])
                    except StopIteration:
                        return None
                exhausted.append(iter_map[it])
                return iter_map[it]
            values.append(value)
        yield tuple(values)
if __name__ == '__main__':
    exhausted = []
    a = [1, 2, 3]
    b = [10, 20, 30]
    for x in manual_iter(a, b, exhausted):
        f(x)
    print(exhausted)

    exhausted = []
    a = [1, 2, 3, 4]
    b = [10, 20, 30]
    for x in manual_iter(a, b, exhausted):
        f(x)
    print(exhausted)

    exhausted = []
    a = [1, 2, 3]
    b = [10, 20, 30, 40]
    for x in manual_iter(a, b, exhausted):
        f(x)
    print(exhausted)
See below for my function zzip(), which does what you want to achieve. It uses zip_longest from the itertools module and returns a tuple containing what zip would return plus a list of the zero-based positions of the iterable(s) that became exhausted before the others (an empty list means all iterables had the same length):
def zzip(*args):
    """ Returns a tuple with the result of zip(*args) as a list and a list
        with ZERO-based indices of iterables passed to zzip which got
        exhausted before other ones. """
    from itertools import zip_longest
    nanNANaN = 'nanNANaN'
    Zipped = list(zip_longest(*args, fillvalue=nanNANaN))
    ZippedT = list(zip(*Zipped))
    Indx_exhausted = []
    indx_nanNANaN = None
    for i in range(len(args)):
        try:  # gives ValueError if nanNANaN is not in the column
            indx_nanNANaN = ZippedT[i].index(nanNANaN)
            Indx_exhausted += [(indx_nanNANaN, i)]
        except ValueError:
            pass
    if Indx_exhausted:  # list not empty, iterables were not same length
        Indx_exhausted.sort()
        min_indx_nanNANaN = Indx_exhausted[0][0]
        Indx_exhausted = [
            i for n, i in Indx_exhausted if n == min_indx_nanNANaN]
        return (Zipped[:min_indx_nanNANaN], Indx_exhausted)
    else:
        return (Zipped, Indx_exhausted)
assert zzip(iter([1,2,3]),[4,5],iter([6])) ==([(1,4,6)],[2])
assert zzip(iter([1,2]),[3,4,5],iter([6,7]))==([(1,3,6),(2,4,7)],[0,2])
assert zzip([1,2],[3,4],[5,6]) ==([(1,3,5),(2,4,6)],[])
The code above runs without raising an assertion error on the used test cases.
Notice that the 'for loop' in the function loops over the items of the passed parameter list and not over the elements of the passed iterables.

Python: Functions and Lists?

I have a function that, given a list and a specific string in that list, removes any duplicates of that specific string from the list. (find_start and find_end are separate functions that determine the first and last position of a given string.)
def remove_duplicates(sorted_list, item):
    i = 0
    real_list = []
    for x in range(len(sorted_list) - 1):
        if sorted_list[i] == item:
            a = find_start(sorted_list, item)
            b = find_end(sorted_list, item)
            real_list = real_list + [item]
            i = i + (b - a)
        else:
            real_list = real_list + [sorted_list[i]]
            i += 1
    return real_list
So for example, remove_duplicates(['a','a','b','b','c','c'], 'a') would return ['a','b','b','c','c']
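For context, find_start and find_end are not shown in the question. Minimal implementations consistent with the example output above (these are my guesses so that remove_duplicates can actually be run, not the asker's code) could be:

def find_start(sorted_list, item):
    # index of the first occurrence of item (assumed to be present)
    return sorted_list.index(item)

def find_end(sorted_list, item):
    # index just past the last occurrence, so that b - a is the length of the run
    return len(sorted_list) - sorted_list[::-1].index(item)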
I'm trying to define another function that calls this function on each iteration, like so:
def remove_all_duplicates(sorted_list):
    i = 0
    list_tru = []
    for x in range(len(sorted_list)):
        list_tru = remove_duplicates(sorted_list, sorted_list[i])
        i += 1
    return list_tru
but if I input remove_all_duplicates(['a','a','b','b','c','c']), it outputs ['a','a','b','b','c']. What am I doing wrong?
def remove_all_duplicates(L):
    # NOTE: this modifies L IN-PLACE. Tread carefully
    i = 1
    while i < len(L):
        if L[i] == L[i-1]:
            del L[i]
            continue
        i += 1
Usage:
In [88]: L = ['a','a','b','b','c','c']
In [89]: remove_all_duplicates(L)
In [90]: L
Out[90]: ['a', 'b', 'c']
With every iteration, you just keep going back to the original sorted_list. I would recommend copying it and then operating on that copy:
def remove_all_duplicates(sorted_list):
    list_tru = sorted_list[:]  # copy it
    for x in set(sorted_list):  # just use a set
        list_tru = remove_duplicates(list_tru, x)  # remove this character from your list
    return list_tru
I've also turned the sorted list into a set so that you don't try to remove duplicates of the same letter multiple times, and removed the unnecessary i counter.
Of course, if all you really want to do is remove the duplicates from a sorted list of strings and you're not attached to the algorithm you're developing, that's particularly simple:
new_list = sorted(set(old_list))
def remove_duplicates(sorted_list):
    for item in sorted_list:
        hits = sorted_list.count(item)
        while hits > 1:
            sorted_list.remove(item)
            hits = sorted_list.count(item)
    return sorted_list

print(remove_duplicates(["a", "a", "b", "b"]))
This is the simplest method I could come up with on the spot; it uses .count to tell whether there are duplicates, and it returns ["a", "b"].
You can use this too:
def remove_duplicates(A):
    B = []  # empty list
    for item in A:
        if item in B:
            pass
        else:
            B.append(item)  # append to the list B
    return B

A = ['a','a','b','c','c']     # example of input list with duplicates
value = remove_duplicates(A)  # pass the list to the function
print value                   # prints ['a','b','c']
Hope that this helps. Have a nice day.

Yielding from sorted iterators in sorted order in Python?

Is there a better way to merge/collate a bunch of sorted iterators into one so that it yields the items in sorted order? I think the code below works but I feel like there is a cleaner, more concise way of doing it that I'm missing.
def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x: x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass  # iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]
The use case for this is that I have a bunch of csv files that I need to merge according to some sorted field. They are big enough that I don't want to just read them all into a list and call sort(). I'm using python2.6, but if there's a solution for python3 I'd still be interested in seeing it.
Yes, you want heapq.merge(), which does exactly one thing: iterate over sorted iterators in order.
import csv
import heapq
from itertools import imap

def sortkey(row):
    return (row[5], row)

def unwrap(key):
    sortkey, row = key
    return row

FILE_LIST = map(open, ['foo.csv', 'bar.csv'])
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))
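For a quick feel of what heapq.merge itself does, independent of the CSV plumbing above (toy in-memory data):

import heapq

a = iter([1, 4, 9])
b = iter([2, 3, 10])
c = iter([5, 6, 7])
print(list(heapq.merge(a, b, c)))
# [1, 2, 3, 4, 5, 6, 7, 9, 10]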

How to use Python iterators elegantly

I am trying to use iterators more for looping since I heard it is faster than index-based looping. One thing I am not sure about is how to treat the end of the sequence nicely. The only way I can think of is to use try and except StopIteration, which looks ugly to me.
To be more concrete, suppose we are asked to print the merged sorted list of two sorted lists a and b. I would write the following:
aNull = False
I = iter(a)
try:
    tmp = I.next()
except StopIteration:
    aNull = True
for x in b:
    if aNull:
        print x
    else:
        if x < tmp:
            print x
        else:
            print tmp, x
            try:
                tmp = I.next()
            except StopIteration:
                aNull = True
while not aNull:
    print tmp
    try:
        tmp = I.next()
    except StopIteration:
        aNull = True
How would you code it to make it neater?
I think handling a and b more symmetrically would make it easier to read. Also, using the built-in next function in Python 2.6 with a default value avoids the need to handle StopIteration:
def merge(a, b):
    """Merges two iterators a and b, returning a single iterator that yields
    the elements of a and b in non-decreasing order. a and b are assumed to each
    yield their elements in non-decreasing order."""
    done = object()
    aNext = next(a, done)
    bNext = next(b, done)
    while (aNext is not done) or (bNext is not done):
        if (bNext is done) or ((aNext is not done) and (aNext < bNext)):
            yield aNext
            aNext = next(a, done)
        else:
            yield bNext
            bNext = next(b, done)

for i in merge(iter(a), iter(b)):
    print i
The following function generalizes the approach to work for arbitrarily many iterators.
def merge(*iterators):
    """Merges a collection of iterators, returning a single iterator that yields
    the elements of the original iterators in non-decreasing order. Each of
    the original iterators is assumed to yield its elements in non-decreasing
    order."""
    done = object()
    n = [next(it, done) for it in iterators]
    while any(v is not done for v in n):
        v, i = min((v, i) for (i, v) in enumerate(n) if v is not done)
        yield v
        n[i] = next(iterators[i], done)
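Usage is the same as in the two-iterator case, for example (illustrative):

a, b, c = [1, 4, 7], [2, 5, 8], [3, 6, 9]
for i in merge(iter(a), iter(b), iter(c)):
    print(i)
# prints 1 through 9 in order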
You're missing the whole point of iterators. You don't manually call I.next(); you just iterate through I:
for tmp in I:
    print tmp
Edited
To merge two iterators, use the very handy functions in the itertools module. The one you want is probably izip:
import itertools

merged = []
for x, y in itertools.izip(a, b):
    if x < y:
        merged.append(x)
        merged.append(y)
    else:
        merged.append(y)
        merged.append(x)
Edit again
As pointed out in the comments, this won't actually work, because there could be multiple items from list a smaller than the next item in list b. However, I realised that there is another built-in function that deals with this: heapq.merge.
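heapq.merge handles exactly the case where several consecutive items of a are smaller than the next item of b; a small sketch:

import heapq

a = [1, 2, 3, 10]
b = [4, 5, 6]
for x in heapq.merge(a, b):
    print(x)
# 1 2 3 4 5 6 10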
The function sorted works with lists and iterators. Maybe it is not what you desire, but the following code works:
a.extend(b)
print sorted(iter(a))
