Using iter() with sentinel to replace while loops - python

Often the case arises where one needs to loop indefinitely until a certain condition is met. For example, suppose I want to keep collecting random integers until I find a number equal to n, at which point I break. I'd do this:
import random

rlist = []
n = ...
low, high = ..., ...
while True:
    num = random.randint(low, high)
    if num == n:
        break
    rlist.append(num)
And this works, but it's quite clunky. There is a much more Pythonic alternative using iter:
iter(o[, sentinel])
Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. [...] If the second argument, sentinel, is given, then o must be a callable object. The iterator created in this case will call o with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
The loop above can be replaced with
import random
from functools import partial

f = partial(random.randint, low, high)
rlist = list(iter(f, n))  # collects values until randint returns the sentinel n
To extend this principle to lists that have already been created, a slight change is needed. I'll need to define a partial function like this:
f = partial(next, iter(x))  # x is some list to keep taking items from until I hit a sentinel
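For instance (a small sketch with a hypothetical list and sentinel value):
from functools import partial

x = [3, 1, 4, 1, 5, 9]  # hypothetical data
f = partial(next, iter(x))
print(list(iter(f, 5)))  # [3, 1, 4, 1] -- stops once the sentinel 5 is produced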
The rest remains the same, but the main caveat of this approach versus the while loop is that I cannot apply generic boolean conditions.
For example, I cannot express "generate numbers until the first even number greater than 1000 is encountered".
The bottom line is this: is there another alternative to the while loop and iter that supports a callback sentinel?

If you want generic boolean conditions, then iter(object, sentinel) is insufficiently expressive for your needs. itertools.takewhile(), in contrast, seems to be more or less what you want: it takes an iterator and cuts it off once a given predicate stops being true.
import itertools
rlist = list(itertools.takewhile(lambda x: x >= 20, inputlist))
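For example, an endless stream built with iter() and a callable can be cut off by takewhile() to express the question's "first even number greater than 1000" condition (a sketch; low and high are assumed bounds):
import itertools
import random
from functools import partial

low, high = 0, 2000  # assumed bounds for illustration
nums = iter(partial(random.randint, low, high), None)  # randint never returns None, so this stream is endless
rlist = list(itertools.takewhile(lambda x: x % 2 or x <= 1000, nums))  # stops at the first even number > 1000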
Incidentally, partial is not very Pythonic, and neither is itertools. GvR is on record as disliking higher-order functional-style programming (note the downgrading of reduce from built-in to a module member in 3.0). Attributes like "elegant" and "readable" are in the eye of the beholder, but if you're looking for Pythonic in the purest sense, you want the while loop.

Related

Efficient functional list iteration in Python

So suppose I have an array of some elements, and each element has a number of properties.
I need to filter this list against several subsets of values determined by predicates. These subsets can of course intersect.
I also need to determine the number of values in each such subset.
So using an imperative approach I could write code like the following, and it would have a running time of 2*n: one pass to copy the array and another to filter it and count the subset sizes.
from itertools import groupby

a = [{'some_number': i, 'some_time': str(i) + '0:00:00'} for i in range(10)]

def predicate1(x):
    return x['some_number'] < 3

def predicate2(x):
    return x['some_time'] < '50:00:00'

def do_something_with_filtered(a, c1, c2):
    print('filtered a {}'.format(a))
    print('{} items had wrong number'.format(c1))
    print('{} items had wrong time'.format(c2))

# imperative style; delete_original() and update_some_data() are helpers elided here
wrong_number_count = 0
wrong_time_count = 0
for item in a[:]:
    if predicate1(item):
        delete_original(item, a)
        wrong_number_count += 1
    if predicate2(item):
        delete_original(item, a)
        wrong_time_count += 1
    update_some_data(item)
do_something_with_filtered(a, wrong_number_count, wrong_time_count)
Somehow I can't think of a way to do that in Python in a functional style with the same running time.
In functional style I could probably use groupby multiple times, or write a comprehension for each predicate, but that would obviously be slower than the imperative approach.
I think such a thing is possible in Haskell using stream fusion (am I right?).
But how do I do that in Python?
Python has strong support for "stream processing" in the form of its iterators - and what you ask seems trivial to do. You just need a way to group your predicates and counters together - it could be a dictionary where the predicate itself is the key.
That said, a simple iterator function that takes in your predicate data structure, along with the data to be processed, can do what you want. The iterator has the side effect of updating your data structure with the predicate counts. If you want "pure functions" you'd have to duplicate the predicate information beforehand, and perhaps pass and retrieve all predicate and counter values to the iterator (through the send method) for each element - I don't think it would be worth that level of purism.
That said, your code could look something like this:
from collections import OrderedDict

def predicate1(...):
    ...

...

def predicateN(...):
    ...

def do_something_with_filtered(item):
    ...

def multifilter(data, predicates):
    for item in data:
        for predicate in predicates:
            if predicate(item):
                predicates[predicate] += 1
                break
        else:
            yield item

def do_it(data):
    predicates = OrderedDict([(predicate1, 0), ..., (predicateN, 0)])
    for item in multifilter(data, predicates):
        do_something_with_filtered(item)
    for predicate, value in predicates.items():
        print("{} filtered out {} items".format(predicate.__name__, value))

a = ...
do_it(a)
(If you have to count an item for all predicates that it fails, then an obvious change from the "break" statement to a state flag variable is enough)
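A concrete, runnable version of the sketch above, filled in with the question's data and predicates (the printed wording is illustrative):
from collections import OrderedDict

def predicate1(x):
    return x['some_number'] < 3

def predicate2(x):
    return x['some_time'] < '50:00:00'

def multifilter(data, predicates):
    # Single pass: count the first predicate each item matches, yield items matching none.
    for item in data:
        for predicate in predicates:
            if predicate(item):
                predicates[predicate] += 1
                break
        else:
            yield item

a = [{'some_number': i, 'some_time': str(i) + '0:00:00'} for i in range(10)]
predicates = OrderedDict([(predicate1, 0), (predicate2, 0)])
filtered = list(multifilter(a, predicates))
print('filtered a {}'.format(filtered))
for predicate, count in predicates.items():
    print('{} filtered out {} items'.format(predicate.__name__, count))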
Yes, fusion in Haskell will often turn something written as two passes into a single pass. Though in the case of lists, it's actually foldr/build fusion rather than stream fusion.
That's not generally possible in languages that don't enforce purity, though. When side effects are involved, it's no longer correct to fuse multiple passes into one. What if each pass performed output? Unfused, you get all the output from each pass separately. Fused, you get the output from both passes interleaved.
It's possible to write a fusion-style framework in Python that will work correctly if you promise to only ever use it with pure functions. But I'm doubtful such a thing exists at the moment. (I'd love to be proven wrong, though.)
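A hand-fused illustration of the idea (not a framework): two pure passes collapsed into one, which is only safe because neither step has side effects:
xs = range(10)
ys = [x * 2 for x in xs]   # first pass
zs = [y + 1 for y in ys]   # second pass

zs_fused = [x * 2 + 1 for x in range(10)]  # fused single pass
assert zs == zs_fused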

python 3 median-of-3 Quicksort implementation which switches to heapsort after a recursion depth limit is met

Functions called: (regardless of class)
def partition(pivot, lst):
    less, same, more = list(), list(), list()
    for val in lst:
        if val < pivot:
            less.append(val)
        elif val > pivot:
            more.append(val)
        else:
            same.append(val)
    return less, same, more

def medianOf3(lst):
    """
    From a lst of unordered data, find and return the median value from
    the first, middle and last values.
    """
    finder = []
    start = lst[0]
    mid = lst[len(lst)//2]
    end = lst[len(lst)-1]
    finder.append(start)
    finder.append(mid)
    finder.append(end)
    finder.sort()
    pivot_val = finder[1]
    return pivot_val
main code
import math

import heapSort        # separate module containing heapSort()
import qsPivotMedian3  # separate module containing partition() and medianOf3() above

def quicheSortRec(lst, limit):
    """
    A non in-place, depth limited quickSort, using median-of-3 pivot.
    Once the limit drops to 0, it uses heapSort instead.
    """
    if limit == 0:
        return heapSort.heapSort(lst)
    else:
        pivot = qsPivotMedian3.medianOf3(lst)  # select the median-of-3 pivot
        less, same, more = qsPivotMedian3.partition(pivot, lst)
        return quicheSortRec(less, limit-1) + same + quicheSortRec(more, limit-1)

def quicheSort(lst):
    """
    The main routine called to do the sort. It should call the
    recursive routine with the correct values in order to perform
    the sort
    """
    N = len(lst)
    end = int(math.log(N, 2))
    if len(lst) == 0:
        return list()
    else:
        return quicheSortRec(lst, end)
So this code is supposed to take a list and sort it using a median-of-3 implementation of quicksort, up until it reaches a recursion depth limit, at which point the function will instead use heapsort. The results of this code are then fed into another program designed to test the algorithm with lists of length 1, 10, 100, 1000, 10000, 100000.
However, when I run the code, it tells me that lengths 1 and 10 work fine, but after that, it gives a list index out of range error for
start = lst[0]
in the medianOf3 function. I can't figure out why.
Without seeing your test data, we're flying blind here. In general,
less, same, more = qsPivotMedian3.partition(pivot, lst)
is fine on its own, but you never check less or more to see whether they're empty. Even if they are empty, they're passed on via:
return quicheSortRec(less,limit-1) + same + quicheSortRec(more,limit-1)
and quicheSortRec() doesn't check to see whether they're empty either: it unconditionally calls:
pivot=qsPivotMedian3.medianOf3(lst)
and that function will raise the error you're seeing if it's passed an empty list. Note that your quicheSort() does check for an empty input. The recursive worker function should too.
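A minimal sketch of that fix, keeping the question's module names and adding the empty-list guard to the recursive worker:
def quicheSortRec(lst, limit):
    if not lst:  # guard: partition() can produce empty sublists
        return []
    if limit == 0:
        return heapSort.heapSort(lst)
    pivot = qsPivotMedian3.medianOf3(lst)
    less, same, more = qsPivotMedian3.partition(pivot, lst)
    return quicheSortRec(less, limit - 1) + same + quicheSortRec(more, limit - 1)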
BTW, consider this rewrite of medianOf3():
def medianOf3(lst):
    return sorted([lst[0], lst[len(lst)//2], lst[-1]])[1]
This is one case where brevity is clearer ;-)

Python list slicing efficiency

In the following code:
def listSum(alist):
    """Get sum of numbers in a list recursively."""
    if len(alist) == 1:
        return alist[0]
    else:
        return alist[0] + listSum(alist[1:])
is a new list created every time I do listSum(alist[1:])?
If yes, is this the recommended way, or can I do something more efficient? (Not for this specific function, which only serves as an example, but in general, when I want to process a specific part of a list.)
Edit:
Sorry if I confused anyone; I am not interested in an efficient sum implementation, this served as an example of using slicing this way.
Yes, it creates a new list every time. If you can get away with using an iterable, you can use itertools.islice, or juggle iter(list) (if you only need to skip some items at the start). But this gets messy when you need to determine whether the argument is empty or only has one element - you have to use try and catch StopIteration. Edit: You could also add an extra argument that determines where to start. Unlike @marcadian's version, you should make it a default argument so as not to burden the caller and to avoid bugs from a wrong index being passed in from the outside.
It's often better not to get into that sort of situation in the first place - either write your code so that you can let for deal with the iteration (read: don't use recursion like that), or, if the slice is reasonably small (possibly because the list as a whole is small), bite the bullet and slice anyway - it's easier, and while slicing takes linear time, the constant factor is tiny.
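A sketch of the iterator-juggling variant described above, using try/except StopIteration to handle the empty case (list_sum is a hypothetical name):
def list_sum(iterable):
    it = iter(iterable)  # iter() on an iterator returns it unchanged
    try:
        head = next(it)
    except StopIteration:  # empty input
        return 0
    return head + list_sum(it)  # recurse on the already-advanced iterator; no copies

print(list_sum([1, 2, 3, 4]))  # 10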
I can think of some options:
use the builtin function sum for this particular case
if you really need recursion (for some reason), pass the same list into the call together with the index of the current element, so you don't need to slice the list (which creates a new list)
Option 2 works like this:
def f_sum(alist, idx=0):
    if idx >= len(alist):
        return 0
    return alist[idx] + f_sum(alist, idx + 1)

f_sum([1, 2, 3, 4])
a = range(5)
f_sum(a)
This is another way, if you must use recursion: a tail-recursive version. In many other languages tail recursion is more efficient than regular recursion in terms of space complexity. Unfortunately that isn't the case in Python, because it has no built-in support for tail-call optimization. It's nonetheless a good pattern to take note of if you are learning recursion.
def helper(l, s, i):
    if len(l) == 0:
        return 0
    elif i < len(l) - 1:
        return helper(l, s + l[i], i + 1)
    else:
        return s + l[i]

def listSum(l):
    return helper(l, 0, 0)

Inconsistent behavior of python generators

The following python code produces [(0, 0), (0, 7)...(0, 693)] instead of the expected list of tuples combining all of the multiples of 3 and multiples of 7:
multiples_of_3 = (i*3 for i in range(100))
multiples_of_7 = (i*7 for i in range(100))
list((i,j) for i in multiples_of_3 for j in multiples_of_7)
This code fixes the problem:
list((i,j) for i in (i*3 for i in range(100)) for j in (i*7 for i in range(100)))
Questions:
The generator object seems to play the role of an iterator instead of providing an iterator object each time the generated list is to be enumerated. The latter strategy seems to be adopted by .NET LINQ query objects. Is there an elegant way to get around this?
How come the second piece of code works? Should I understand that the generator's iterator is not reset after looping through all the multiples of 7?
Don't you think that this behavior is counterintuitive, if not inconsistent?
A generator object is an iterator, and therefore one-shot. It's not an iterable which can produce any number of independent iterators. This behavior is not something you can change with a switch somewhere, so any workaround amounts to either using an iterable (e.g. a list) instead of a generator or repeatedly constructing generators.
The second snippet does the latter. It is by definition equivalent to the loops
for i in (i*3 for i in range(100)):
    for j in (i*7 for i in range(100)):
        ...
Hopefully it isn't surprising that here, the latter generator expression is evaluated anew on each iteration of the outer loop.
As you discovered, the object created by a generator expression is an iterator (more precisely a generator-iterator), designed to be consumed only once. If you need a resettable generator, simply create a real generator and use it in the loops:
def multiples_of_3():  # generator
    for i in range(100):
        yield i * 3

def multiples_of_7():  # generator
    for i in range(100):
        yield i * 7

list((i,j) for i in multiples_of_3() for j in multiples_of_7())
Your second code works because the expression list of the inner loop ((i*7 ...)) is evaluated on each pass of the outer loop. This results in creating a new generator-iterator each time around, which gives you the behavior you want, but at the expense of code clarity.
To understand what is going on, remember that there is no "resetting" of an iterator when the for loop iterates over it. (This is a feature; such a reset would break iterating over a large iterator in pieces, and it would be impossible for generators.) For example:
multiples_of_2 = iter(xrange(0, 100, 2))  # iterator
for i in multiples_of_2:
    print i

# prints nothing because the iterator is spent
for i in multiples_of_2:
    print i
...as opposed to this:
multiples_of_2 = xrange(0, 100, 2)  # iterable sequence, converted to iterator
for i in multiples_of_2:
    print i

# prints again because a new iterator gets created
for i in multiples_of_2:
    print i
A generator expression is equivalent to an invoked generator and can therefore only be iterated over once.
The real issue, as I found out, is about single- versus multi-pass iterables and the fact that there is currently no standard mechanism to determine whether an iterable is single- or multi-pass: see Single- vs. Multi-pass iterability
If you want to convert a generator expression to a multipass iterable, then it can be done in a fairly routine fashion. For example:
class MultiPass(object):
    def __init__(self, initfunc):
        self.initfunc = initfunc
    def __iter__(self):
        return self.initfunc()

multiples_of_3 = MultiPass(lambda: (i*3 for i in range(20)))
multiples_of_7 = MultiPass(lambda: (i*7 for i in range(20)))
print list((i,j) for i in multiples_of_3 for j in multiples_of_7)
From the point of view of defining the thing it's a similar amount of work to typing:
def multiples_of_3():
    return (i*3 for i in range(20))
but from the point of view of the user, they write multiples_of_3 rather than multiples_of_3(), which means the object multiples_of_3 is polymorphic with any other iterable, such as a tuple or list.
The need to type lambda: is a bit inelegant, true. I don't suppose there would be any harm in introducing "iterable comprehensions" to the language, to give you what you want while maintaining backward compatibility. But there are only so many punctuation characters, and I doubt this would be considered worth one.

How do I reverse an itertools.chain object?

My function creates a chain of generators:
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num))
    some_other_sequence = (x*2.6 for x in range(num))
    chained = itertools.chain(some_sequence, some_other_sequence)
    return chained
My function sometimes needs to return chained in reversed order. Conceptually, the following is what I would like to be able to do:
if num < 0:
    return reversed(chained)
return chained
Unfortunately:
>>> reversed(chained)
TypeError: argument to reversed() must be a sequence
What are my options?
This is in some realtime graphic rendering code so I don't want to make it too complicated/slow.
EDIT:
When I first posed this question I hadn't thought about the reversibility of generators. As many have pointed out, generators can't be reversed.
I do in fact want to reverse the flattened contents of the chain; not just the order of the generators.
Based on the responses, there is no single call I can use to reverse an itertools.chain, so I think the only solution here is to use a list, at least for the reverse case, and perhaps for both.
if num < 0:
    lst = list(chained)
    lst.reverse()
    return lst
else:
    return chained
reversed() needs an actual sequence, because it iterates backwards by index, and that wouldn't work for a generator (which only has the notion of the "next" item).
Since you will need to unroll the whole generator anyway for reversing, the most efficient way is to read it into a list and reverse the list in place with the .reverse() method.
You cannot reverse generators by definition. The interface of a generator is the iterator, which supports only forward iteration. When you want to reverse an iterator, you have to collect all its items first and reverse them after that.
Use lists instead, or generate the sequences backwards from the start.
itertools.chain would need to implement __reversed__() (this would be best) or __len__() and __getitem__().
Since it doesn't, and there's not even a way to access the internal sequences, you'll need to expand the entire sequence to be able to reverse it:
reversed(list(CHAIN_INSTANCE))
It would be nice if chain made __reversed__() available when all the underlying sequences are reversible, but currently it does not. Perhaps you can write your own version of chain that does:
def reversed2(iterable):
    return reversed(list(iterable))
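For instance, a minimal sketch of such a reversible chain (ReversibleChain is a hypothetical name), valid only when every input supports reversed():
import itertools

class ReversibleChain(object):
    def __init__(self, *sequences):
        self.sequences = sequences
    def __iter__(self):
        return itertools.chain(*self.sequences)
    def __reversed__(self):
        # Reverse the order of the sequences and each sequence itself.
        return itertools.chain(*map(reversed, reversed(self.sequences)))

c = ReversibleChain([1, 2], range(3))
print(list(c))            # [1, 2, 0, 1, 2]
print(list(reversed(c)))  # [2, 1, 0, 2, 1]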
reversed only works on objects that support len and indexing. You have to first generate all results of a generator before wrapping reversed around them.
However, you could easily do this:
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num, -1, -1))
    some_other_sequence = (x*2.6 for x in range(num, -1, -1))
    chained = itertools.chain(some_other_sequence, some_sequence)
    return chained
Does this work in your real app?
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num))
    some_other_sequence = (x*2.6 for x in range(num))
    list_of_chains = [some_sequence, some_other_sequence]
    if num < 0:
        list_of_chains.reverse()
    chained = itertools.chain(*list_of_chains)
    return chained
In theory you can't because chained objects may even contain infinite sequences such as itertools.count(...).
You should try to reverse your generators/sequences or use reversed(iterable) for each sequence if applicable and then chain them together last-to-first. Of course this highly depends on your use case.
