itertools.accumulate() versus functools.reduce() - python

In Python 3.3, itertools.accumulate(), which normally repeatedly applies an addition operation to the supplied iterable, gained the ability to take a function argument as a parameter; this means it now overlaps with functools.reduce(). At a cursory glance, the main differences between the two would now seem to be:
accumulate() defaults to summing but doesn't let you supply an initial value explicitly, while reduce() doesn't default to any operation but does let you supply an initial value for use with one-/zero-element sequences, and
accumulate() takes the iterable first, while reduce() takes the function first.
Are there any other differences between the two? Or is this just a matter of behavior of two functions with initially distinct uses beginning to converge over time?

It seems that accumulate keeps the previous results, whereas reduce (which is known as fold in other languages) does not necessarily.
e.g. list(accumulate([1,2,3], operator.add)) would return [1, 3, 6], whereas a plain fold would return 6.
Also (just for fun, don't do this) you can define accumulate in terms of reduce:
from functools import reduce

def accumulate(xs, f):
    return reduce(lambda a, x: a + [f(a[-1], x)], xs[1:], [xs[0]])
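For instance, with the sketch above (note it returns a list rather than an iterator, and fails on an empty input):
import operator

print(accumulate([1, 2, 3], operator.add))  # [1, 3, 6] -- matches itertools' result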

You can see in the documentation what the difference is. reduce returns a single result, the sum, product, etc., of the sequence. accumulate returns an iterator over all the intermediate results. Basically, accumulate returns an iterator over the results of each step of the reduce operation.
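For example, comparing the two on the same input:
from functools import reduce
from itertools import accumulate
import operator

xs = [1, 2, 3, 4, 5]
print(reduce(operator.add, xs))            # 15 -- just the final value
print(list(accumulate(xs, operator.add)))  # [1, 3, 6, 10, 15] -- every intermediate step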

itertools.accumulate
is like reduce but returns an iterator* instead of a single value. This iterator can give you all the intermediate step values. So basically reduce gives you the last element of what accumulate will give you.
*Like a generator, the iterator it returns can be consumed only once.

Related

Different behavior between a generator expression and first constructing a list, then using a list-converted generator

Can these two pieces of code produce different behavior for downstream applications? In other words, are the returned objects any different?
return (func(i) for i in a_list)
and
b_list = []
for i in a_list:
    b_list.append(func(i))
return (i for i in b_list)
PS: the second way to build the generator is very questionable.
First of all, this code:
b_list = []
for i in a_list:
    b_list.append(func(i))
return (i for i in b_list)
Is entirely the same as this code:
return iter([func(i) for i in a_list])
The difference between these and this:
return (func(i) for i in a_list)
Is that the latter is lazy and the other two are eager - i.e. the latter runs func every time its __next__() method is called, while the other two run func for every item in a_list immediately.
So, the answer depends on whether func is a pure function - that is, whether it behaves the same regardless of when it is run and does not depend on external/global state.
It also depends on whether the items of the list are mutable and whether they or the list itself are changed by other code.
If the answers are respectively "yes" and "no", the two are the same save for their memory footprint.
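A small demonstration of that difference, using a made-up func and mutating the source list after both objects are created:
def func(i):
    return i * 10

a_list = [1, 2, 3]

lazy = (func(i) for i in a_list)          # func hasn't run yet
eager = iter([func(i) for i in a_list])   # func already ran for every item

a_list.append(4)  # other code changes the list

print(list(lazy))   # [10, 20, 30, 40] -- the lazy version sees the appended item
print(list(eager))  # [10, 20, 30]     -- the eager version took a snapshot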
Finally, as long as we're discussing functional concepts, please consider using:
return map(func, a_list)
Instead of:
return (func(i) for i in a_list)
But only if you're using Python 3's lovely lazy map() and not Python 2's evil eager map().

Sorting with a custom method with additional parameter?

So I have a fitness function (returning only true or false for a given pair of arguments) which I would like to use as a key for sorting my list of possible arguments. Normally, I'd be able to do something like:
sorted(possibleArguments, key = fitnessFunction)
The problem here is that my fitness function looks like this:
def fitnessFunction(arg1, arg2, f):
    return f(*arg1) < f(*arg2)
Of course, in the method where I want to use the sorting, the function with which the fitness is to be calculated is known and doesn't change during the sorting - but can I somehow tell Python that's the case? Can I do something like:
sorted(possibleArguments, key = fitnessFunction(one element to be compared, the other one, function I'm currently interested in))
If so, how?
key does not take a comparison function; it converts an element of the list into a comparable item.
BTW, it's no longer possible to pass a comparison function to sort in Python 3 (and the __cmp__ method is gone from objects too), so you'd better get used to it. (It was cumbersome: you had to return 0 if equal, negative if lesser, positive if bigger, a bit like strcmp does - archaic. You could create complex comparison functions, but they could prove unstable. I surely don't miss them.)
Fortunately you have the f() function, which is enough.
You just have to do this in your case:
sorted(possibleArguments, key=lambda x: f(*x))
The comparisons are done by the sort itself; there is no need for fitnessFunction.
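A minimal, hypothetical example of the pattern (f and possibleArguments are made up here):
def f(x, y):
    return x * x + y * y  # any known scoring function

possibleArguments = [(3, 4), (0, 1), (2, 2)]

print(sorted(possibleArguments, key=lambda args: f(*args)))
# [(0, 1), (2, 2), (3, 4)]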

map of function list and arguments: unpacking difficulty

I have an assignment in a MOOC where I have to code a function that returns the cumulative sum, cumulative product, max, and min of an input list.
This part of the course was about functional programming, so I wanted to go all out on this, even though I can use other ways.
So I tried this:
from operator import mul
from itertools import repeat
from functools import reduce

def reduce2(l):
    print(l)
    return reduce(*l)

def numbers(l):
    return tuple(map(reduce2, zip([sum, mul, min, max], repeat(l, 4))))

l = [1, 2, 3, 4, 5]
numbers(l)
My problem is that it doesn't work. Used inside map, zip passes only one object to reduce, and unpacking the zip yields the four tuples of (function, argument list l); that is why I defined reduce2, intending to unpack the zip inside it, but it did not work.
Python returns TypeError: 'int' object is not iterable.
I thought that I could use return reduce(l[0], l[1]) in reduce2, but that gives the same error.
I don't understand Python's behavior here.
If I merely use return reduce(l), it instead returns TypeError: reduce expected at least 2 arguments, got 1.
What's happening here? How could I make it work?
Thanks for your help.
Effectively, you are trying to execute code like this:
xs = [1, 2, 3, 4, 5]
reduce(sum, xs)
But sum takes an iterable and isn't really compatible with direct use via reduce. Instead, you need a function that takes 2 arguments and returns their sum -- a function analogous to mul. You can get that from operator:
from operator import mul, add
Then just change sum to add in your program.
BTW, functional programming has a variable naming convention that is really cool: x for one thing, and xs for a list of them. It's much better than the hard-to-read l variable name. Also it uses singular/plural to tell you whether you are dealing with a scalar value or a collection.
FMc's answer correctly diagnoses the error in your code. I just want to add a couple of alternatives to your map + zip approach.
For one, instead of defining a special version of reduce, you can use itertools.starmap instead of map, which is designed specifically for this purpose:
from itertools import starmap

def numbers(xs):
    return tuple(starmap(reduce, zip([add, mul, min, max], repeat(xs))))
However, even better would be to use the often ignored variadic version of map instead of manually zipping the arguments:
def numbers(xs):
    return tuple(map(reduce, [add, mul, min, max], repeat(xs)))
It essentially does the zip + starmap for you. In terms of functional programming, this version of map is analogous to Haskell's zipWith function.
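Either version then behaves the same; for example:
xs = [1, 2, 3, 4, 5]
print(numbers(xs))  # (15, 120, 1, 5)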

Optimizing Itertools Results in Python

I am calling itertools in Python (see below). In this code, snp_dic is a dictionary with integer keys and sets as values. The goal here is to find the minimum list of keys whose union of values is equivalent to set_union. (This is equivalent to finding a global optimum for the popular NP-hard graph-theory problem set cover, for those of you interested!)
The most obvious optimization I see has to do with itertools. Let's say that for some length r there exists a combination of r sets in snp_dic whose union = set_union. Basic probability dictates that if this combination exists and is distributed somewhere uniformly at random over the combinations, then on average you would only have to iterate over half the combinations to find this set-covering combination. itertools, however, will return all the possible combinations, taking twice as long as the expected time of checking set unions at each iteration.
A logical solution would seem to be simply to implement itertools.combinations() locally. Based on the "equivalent" Python implementation of itertools.combinations() in the Python docs, however, that is approximately twice as slow, because itertools.combinations calls a C-level implementation rather than a Python-native one.
The question (finally) is then: how can I stream the results of itertools.combinations() one by one, so I can check set unions as I go along, while still running in near-equivalent time to itertools.combinations() itself? In an answer, I would appreciate it if you could include timings of your new method to show that it runs in similar time. Any other optimizations are also appreciated.
import itertools
from functools import reduce

def min_informative_helper(snp_dic, min, set_union):
    union = lambda set_iterable: reduce(lambda a, b: a | b, set_iterable)  # takes the union of sets
    for i in range(min, len(snp_dic)):
        combinations = itertools.combinations(snp_dic, i)
        combinations = [{i: snp_dic[i] for i in combination} for combination in combinations]
        for combination in combinations:
            comb_union = union(combination.values())
            if comb_union == set_union:
                return combination.keys()
itertools returns lazy iterators for the things it produces. To stream them, simply use:
for combo in itertools.combinations(snp_dic, i):
    ...  # remainder of your logic
The combinations iterator yields one new element each time you access it: one per loop iteration.
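A sketch of the question's helper rewritten this way, streaming combinations and building each union lazily (same names as the question's code, with min renamed to min_size to avoid shadowing the builtin):
import itertools
from functools import reduce

def min_informative_helper(snp_dic, min_size, set_union):
    union = lambda sets: reduce(lambda a, b: a | b, sets)
    for r in range(min_size, len(snp_dic)):
        for combination in itertools.combinations(snp_dic, r):
            # stop as soon as a covering combination is found
            if union(snp_dic[key] for key in combination) == set_union:
                return list(combination)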

making python code block with loop faster

Is there a way I can implement the code block below using map, a list comprehension, or anything else faster, keeping it functionally the same?
def name_check(names, n_val):
    lower_names = names.lower()
    for item in set(n_val):
        if item in lower_names:
            return True
    return False
Any help here is appreciated
A simple implementation would be
return any(character in names_lower for character in n_val)
A naive guess at the complexity would be O(K + 2N), where K is the number of characters in names and N is the number of characters in n_val: we need one "loop" for the call to lower*, one for the generator expression, and one for any. Since any is a built-in function and we're using a generator expression, I would expect it to be faster than your manual loop, but as always, profile to be sure.
To be clear, any short-circuits, so that behaviour is preserved.
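Putting it together, a sketch that keeps the original signature and hoists the lower() call out of the generator expression (see the note below):
def name_check(names, n_val):
    names_lower = names.lower()  # computed once, not per item
    return any(character in names_lower for character in n_val)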
Notes on Your Implementation
On using a set: your intuition to use a set to reduce the number of checks is a good one (you could add it to my form above, too), but it's a trade-off. In the case where the first element short-circuits, the extra call to set is an additional N steps to produce the set. In the case where you wind up checking each item, it will save you some time. It depends on your expected inputs. If n_val was originally a lazy iterable, you've lost that benefit and allocated all the memory up front. If you control the input to the function, why not just recommend it be called with lists that don't have duplicates (i.e., call set() on its input), and leave the function general?
* @Neopolitan pointed out that names_lower = names.lower() should be called outside the loop, as your original implementation did; otherwise it may (will?) be called repeatedly in the generator expression.
