I've been learning a lot of Haskell lately, and wanted to try out some of the neat tricks it has in Python. From what I can understand, Python's reduce automatically sets the iterative variable and the accumulator in the function passed to the first two values of the list given in reduce. In Haskell, when I use its equivalent, fold, I can specify what I want the accumulator to be. Is there a way I can do this with Python's reduce?
Quoting the reduce docs (functools.reduce in Python 3), the interface is:
reduce(function, iterable[, initializer])
If the optional initializer is present, it is placed before the items
of the iterable in the calculation, and serves as a default when the
iterable is empty. If initializer is not given and iterable contains
only one item, the first item is returned.
So, an (academic) example using the initializer might be:
from functools import reduce

seq = ['s1', 's22', 's333']
len_sum_count = reduce(lambda accumulator, s: accumulator + len(s), seq, 0)
assert len_sum_count == 9
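As in Haskell's foldl, the initializer also lets the accumulator have a different type than the elements. A sketch using the same seq:

```python
from functools import reduce

seq = ['s1', 's22', 's333']

# The accumulator is a (total_length, count) tuple, a different type
# than the string elements, just as with Haskell's foldl.
total, count = reduce(lambda acc, s: (acc[0] + len(s), acc[1] + 1),
                      seq, (0, 0))
# total == 9, count == 3
```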
Oftentimes the case arises where one needs to loop indefinitely until a certain condition is met. For example, if I want to keep collecting random integers until I find a number equal to n, at which point I break, I'd do this:
import random

rlist = []
n = ...
low, high = ..., ...

while True:
    num = random.randint(low, high)
    if num == n:
        break
    rlist.append(num)
And this works, but is quite clunky. There is a much more pythonic alternative using iter:
iter(o[, sentinel])
Return an iterator object. The first argument is
interpreted very differently depending on the presence of the second
argument. [...] If the second argument,
sentinel, is given, then o must be a callable object. The iterator
created in this case will call o with no arguments for each call to
its next() method; if the value returned is equal to sentinel,
StopIteration will be raised, otherwise the value will be returned.
The loop above can be replaced with
import random
from functools import partial

f = partial(random.randint, low, high)
rlist = list(iter(f, n))
To extend this principle to lists that have already been created, a slight change is needed. I'll need to define a partial function like this:
f = partial(next, iter(x)) # where x is some list I want to keep taking items from until I hit a sentinel
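Put together, a minimal sketch with a hypothetical list and the value 10 as the sentinel:

```python
from functools import partial

x = [1, 2, 10, 3]           # hypothetical input list
f = partial(next, iter(x))  # each call returns the next item of x

# iter(f, 10) keeps calling f() until it returns the sentinel 10
collected = list(iter(f, 10))
# collected == [1, 2]
```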
The rest remains the same, but the main caveat with this approach versus the while loop is that I cannot apply generic boolean conditions.
For example, I cannot apply a "generate numbers until the first even number greater than 1000 is encountered".
The bottom line is this: Is there another alternative to the while loop and iter that supports a callback sentinel?
If you want generic boolean conditions, then iter(object, sentinel) is insufficiently expressive for your needs. itertools.takewhile(), in contrast, seems to be more or less what you want: It takes an iterator, and cuts it off once a given predicate stops being true.
rlist = list(itertools.takewhile(lambda x: x >= 20, inputlist))
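The same tool covers the question's "generate numbers until the first even number greater than 1000" case, assuming a hypothetical infinite generator of random integers:

```python
import random
from itertools import takewhile

def randoms(low, high):
    """Hypothetical infinite stream of random integers."""
    while True:
        yield random.randint(low, high)

# Keep taking numbers while they are NOT an even number greater than 1000;
# takewhile stops at (and drops) the first item that fails the predicate.
rlist = list(takewhile(lambda x: x % 2 != 0 or x <= 1000,
                       randoms(1, 2000)))
```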
Incidentally, partial is not very Pythonic, and neither is itertools. GvR is on record as disliking higher-order functional-style programming (note the downgrading of reduce from built-in to a module member in 3.0). Attributes like "elegant" and "readable" are in the eye of the beholder, but if you're looking for Pythonic in the purest sense, you want the while loop.
I have a list of elements to sort and a comparison function cmp(x,y) which decides if x should appear before y or after y. The catch is that some elements do not have a defined order. The cmp function returns "don't care".
Example: input [A, B, C, D], with C > D and B > D. Output: many correct answers, e.g. [D, C, B, A] or [A, D, B, C]. All I need is one of the possible outputs.
I was not able to use Python's sort for this; my solution so far is an old-fashioned insertion approach: start with an empty list and insert one element at a time into the right place, keeping the list sorted at all times.
Is it possible to use the built-in sort/sorted function for this purpose? What would be the key?
It's not possible to use the built-in sort for this. Instead, you need to implement a Topological Sort.
The built-in sort method requires that cmp impose a total ordering. It doesn't work if the comparisons are inconsistent: if it returns that A < B once, it must always return that, and it must return that B > A when the arguments are reversed.
You can make your cmp implementation work if you introduce an arbitrary tiebreaker. If two elements don't have a defined order, make one up. You could return cmp(id(a), id(b)) for instance -- compare the objects by their arbitrary ID numbers.
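A minimal sketch of the topological-sort approach using the standard library's graphlib (Python 3.9+), with the example's constraints C > D and B > D encoded as "D precedes C" and "D precedes B":

```python
from graphlib import TopologicalSorter

ts = TopologicalSorter()
ts.add('C', 'D')  # D must come before C
ts.add('B', 'D')  # D must come before B
ts.add('A')       # A is unconstrained

order = list(ts.static_order())
# one of the many valid orders; D is guaranteed to precede B and C
```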
So suppose I have an array of elements. Each element has some number of properties.
I need to filter this list by some subsets of values determined by predicates. These subsets can, of course, intersect.
I also need to determine the number of values in each such subset.
Using an imperative approach I could write code like the following, and it would have a running time of 2*n: one iteration to copy the array and another to filter it and count the subset sizes.
from itertools import groupby

a = [{'some_number': i, 'some_time': str(i) + '0:00:00'} for i in range(10)]

def predicate1(x):
    return x['some_number'] < 3

def predicate2(x):
    return x['some_time'] < '50:00:00'

def do_something_with_filtered(a, c1, c2):
    print('filtered a {}'.format(a))
    print('{} items had wrong number'.format(c1))
    print('{} items had wrong time'.format(c2))

# imperative style
wrong_number_count = 0
wrong_time_count = 0
for item in a[:]:
    if predicate1(item):
        delete_original(item, a)
        wrong_number_count += 1
    if predicate2(item):
        delete_original(item, a)
        wrong_time_count += 1
    update_some_data(item)

do_something_with_filtered(a, wrong_number_count, wrong_time_count)
Somehow I can't think of a way to do that in Python in a functional style with the same running time.
In functional style I could probably use groupby multiple times, or write a comprehension for each predicate, but that would obviously be slower than the imperative approach.
I think such a thing is possible in Haskell using stream fusion (am I right?).
But how do I do that in Python?
Python has strong support for "stream processing" in the form of its iterators, and what you ask seems fairly trivial to do. You just need a way to group your predicates and their counters: it could be a dictionary where the predicate itself is the key.
That said, a simple iterator function that takes your predicate data structure, along with the data to be processed, can do what you want. The iterator would have the side effect of updating your data structure with the predicate information. If you want "pure functions" you'd have to duplicate the predicate information beforehand, and perhaps pass and retrieve all predicate and counter values to the iterator (through the send method) for each element; I don't think it would be worth that level of purism.
Your code could look something like this:
from collections import OrderedDict

def predicate1(...):
    ...

...

def predicateN(...):
    ...

def do_something_with_filtered(item):
    ...

def multifilter(data, predicates):
    for item in data:
        for predicate in predicates:
            if predicate(item):
                predicates[predicate] += 1
                break
        else:
            yield item

def do_it(data):
    predicates = OrderedDict([(predicate1, 0), ..., (predicateN, 0)])
    for item in multifilter(data, predicates):
        do_something_with_filtered(item)
    for predicate, value in predicates.items():
        print("{} filtered out {} items".format(predicate.__name__, value))

a = ...
do_it(a)
(If you have to count an item for all predicates that it fails, then an obvious change from the "break" statement to a state flag variable is enough)
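That change might look like this (a sketch; failed acts as the state flag):

```python
def multifilter_count_all(data, predicates):
    """Like multifilter above, but every matching predicate is counted."""
    for item in data:
        failed = False
        for predicate in predicates:
            if predicate(item):
                predicates[predicate] += 1
                failed = True   # state flag instead of break
        if not failed:
            yield item
```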
Yes, fusion in Haskell will often turn something written as two passes into a single pass. Though in the case of lists, it's actually foldr/build fusion rather than stream fusion.
That's not generally possible in languages that don't enforce purity, though. When side effects are involved, it's no longer correct to fuse multiple passes into one. What if each pass performed output? Unfused, you get all the output from each pass separately. Fused, you get the output from both passes interleaved.
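A small Python illustration of the interleaving problem, with hypothetical passes that print:

```python
def pass1(x):
    print('pass1 saw', x)
    return x * 2

def pass2(x):
    print('pass2 saw', x)
    return x + 1

data = [1, 2]

# Two separate passes: all pass1 output comes before any pass2 output
unfused = [pass2(y) for y in [pass1(x) for x in data]]

# One fused pass: pass1/pass2 output is interleaved per element
fused = [pass2(pass1(x)) for x in data]

# Same results, but different side-effect ordering
assert unfused == fused == [3, 5]
```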
It's possible to write a fusion-style framework in Python that will work correctly if you promise to only ever use it with pure functions. But I'm doubtful such a thing exists at the moment. (I'd love to be proven wrong, though.)
I've written a for-loop using map, with a function that has a side-effect. Here's a minimal working example of what I mean:
def someFunc(t):
    n, d = t
    d[n] = str(n)

def main():
    d = {}
    map(someFunc, ((i, d) for i in range(10**3)))
    print(len(d))
So it's clear that someFunc, which is mapped onto the non-negative numbers under 1000, has the side-effect of populating a dictionary, which is later used for something else.
Now, given the way the above code is structured, the expected output of print(len(d)) is 0, since map returns an iterator, not a list (unlike Python 2.x). So if I really want to see the changes applied to d, I have to iterate over that map object to completion. One way to do so is:
d = {}
for i in map(someFunc, ((i, d) for i in range(10**3))):
    pass
But that doesn't seem very elegant. I could call list on the map object, but that would require O(n) memory, which is inefficient. Is there a way to force a full iteration over the map object?
You don't want to do this (run a map() just for its side effects), but there is an itertools consume recipe that applies here:
from collections import deque
deque(map(someFunc, ((i, d) for i in range(10**3))), maxlen=0)
The collections.deque() object, configured to a maximum size of 0, consumes the map() iterable with no additional memory use. The deque object is specifically optimized for this use-case.
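For reference, the full consume recipe from the itertools documentation generalizes this to advancing n steps:

```python
from collections import deque
from itertools import islice

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        # Feed the entire iterator into a zero-length deque
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
```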
My function creates a chain of generators:
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num))
    some_other_sequence = (x*2.6 for x in range(num))
    chained = itertools.chain(some_sequence, some_other_sequence)
    return chained
My function sometimes needs to return chained in reversed order. Conceptually, the following is what I would like to be able to do:
if num < 0:
    return reversed(chained)
return chained
Unfortunately:
>>> reversed(chained)
TypeError: argument to reversed() must be a sequence
What are my options?
This is in some realtime graphic rendering code so I don't want to make it too complicated/slow.
EDIT:
When I first posed this question I hadn't thought about the reversibility of generators. As many have pointed out, generators can't be reversed.
I do in fact want to reverse the flattened contents of the chain; not just the order of the generators.
Based on the responses, there is no single call I can use to reverse an itertools.chain, so I think the only solution here is to use a list, at least for the reverse case, and perhaps for both.
if num < 0:
    lst = list(chained)
    lst.reverse()
    return lst
else:
    return chained
reversed() needs an actual sequence because it iterates backwards by index, and that doesn't work for a generator (which only has the notion of a "next" item).
Since you will need to unroll the whole generator anyway to reverse it, the most efficient way is to read it into a list and reverse the list in-place with the .reverse() method.
You cannot reverse generators, by definition. The interface of a generator is the iterator, which supports only forward iteration. To reverse an iterator, you have to collect all its items first and reverse them afterwards.
Use lists instead or generate the sequences backwards from the start.
itertools.chain would need to implement __reversed__() (this would be best) or __len__() and __getitem__()
Since it doesn't, and there's not even a way to access the internal sequences, you'll need to expand the entire sequence to be able to reverse it.
reversed(list(CHAIN_INSTANCE))
It would be nice if chain made __reversed__() available when all the sequences are reversible, but currently it does not. Perhaps you can write your own version of chain that does:
def reversed2(iterable):
    return reversed(list(iterable))
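A minimal sketch of such a chain, assuming every input supports reversed() (i.e. is a sequence or defines __reversed__):

```python
class ReversibleChain:
    """Chain iterables; reversible when every input is reversible."""
    def __init__(self, *iterables):
        self.iterables = iterables

    def __iter__(self):
        # Forward: yield each iterable's items in order
        for it in self.iterables:
            yield from it

    def __reversed__(self):
        # Backward: last iterable first, each one reversed
        for it in reversed(self.iterables):
            yield from reversed(it)
```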
reversed only works on objects that support len and indexing. You have to first generate all results of a generator before wrapping reversed around them.
However, you could easily do this:
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num - 1, -1, -1))
    some_other_sequence = (x*2.6 for x in range(num - 1, -1, -1))
    chained = itertools.chain(some_other_sequence, some_sequence)
    return chained
Does this work in your real app?
def bar(num):
    import itertools
    some_sequence = (x*1.5 for x in range(num))
    some_other_sequence = (x*2.6 for x in range(num))
    list_of_chains = [some_sequence, some_other_sequence]
    if num < 0:
        list_of_chains.reverse()
    chained = itertools.chain(*list_of_chains)
    return chained
In theory you can't, because chained objects may contain infinite sequences such as itertools.count(...).
You should try to reverse your generators/sequences, or use reversed(iterable) for each sequence where applicable, and then chain them together last-to-first. Of course, this depends heavily on your use case.