List comprehension as substitute for reduce() in Python

List comprehension as substitute for reduce() in Python - python

The following python tutorial says that:
List comprehension is a complete substitute for the lambda function as well as the functions map(), filter() and reduce().
http://python-course.eu/python3_list_comprehension.php
However, it does not mention an example how a list comprehension can substitute a reduce() and I can't think of an example how it should be possible.
Can please someone explain how to achieve a reduce-like functionality with list comprehension or confirm that it isn't possible?

Ideally, list comprehension is to create a new list. Quoting official documentation,
List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
whereas reduce is used to reduce an iterable to a single value. Quoting functools.reduce,
Apply function of two arguments cumulatively to the items of sequence, from left to right, so as to reduce the sequence to a single value.
So, list comprehension cannot be used as a drop-in replacement for reduce.

I was surprised at first to find that Guido van Rossum, creator of Python, was against reduce. His reasoning was that beyond summing, multiplying, and-ing, and or-ing, using reduce yields an unreadable solution that is better suited by a function which iterates through and updates an accumulator. His article on the matter is here. So no, there isn't a list comprehension alternative to reduce, instead the "pythonic" way is to implement an accumulating function the old fashioned way:
Instead of:
out = reduce((lambda x,y: x*y),[1,2,3])
Use:
def prod(myList):
out = 1
for el in myList:
out *= el
return out
Of course nothing stops you from continuing to use reduce (python 2) or functools.reduce (python 3)

List comprehensions are supposed to return lists. If your reduce is supposed to return a list, then yes, you can replace it with a list comprehension.
But this is no obstacle to providing "reduce-like functionality". Python lists can contain any object. If you'll accept your result contained in a single-item list, then there is a [...][0] list comprehension form that can replace any reduce() whatsoever.
This should be obvious, but that form is
[x for x in [reduce(function, sequence, initial)]][0]
for some binary function and and some iterable sequence and some initial value. Or, if you want the initial from the first of the iterable,
[x for x in [reduce(function, sequence)]][0]
Arguably, the above is cheating, and also pointless, since you could just use reduce without the comprehension. So let's try it without reduce.
[stack.append(function(stack.pop(), e)) or stack[0]
for stack in ([initial],)
for e in sequence][-1]
This produces a list of all the intermediate values, and we want the last one. [-1] is just as easy as [0]. We need an accumulator to reduce, but can't use assignment statements in a comprehension, hence the stack (which is just a list), but we could have used many other data structures here. The .append() always returns None, so we use or stack[0] to put the value so far in the resulting list.
It's a little more difficult without initial,
[stack.append(function(stack.pop(), e)) or stack[0]
for it in [iter(sequence)]
for stack in [[next(it)]]
for e in it][-1]
Really, you might as well use a for statement at this point.
But this takes up memory for the list of intermediate values. For a very long sequence, that might be a problem. But we can avoid that too by using generator expressions.
Doing this is tricky, so let's start with an easier example and work up to it.
stack = [initial]
[stack.append(function(stack.pop(), e)) for e in sequence]
stack.pop() # returns the answer
It computes the answer, but also creates a useless list of Nones. We can avoid that by converting it to a generator expression inside a list comprehension.
stack = [initial]
[_ for _s in (stack.append(function(stack.pop(), e)) or ()
for e in sequence)
for _ in _s]
stack.pop()
The list comprehension exhausts the generator that updates the stack, but returns an empty list itself. This is possible because the inner loop always has zero iterations, because _s is always an empty tuple.
We can move the stack.pop() inside if the last _s has one element. It doesn't matter what that element is though. So we chain on a [None] as the final _s.
from itertools import chain
stack = [initial]
[stack.pop()
for _s in chain((stack.append(function(stack.pop(), e)) or ()
for e in sequence),
[[None]])
for _ in _s][0]
Again, we have a single-item list comprehension. We can also implement chain as a generator expression. And you've already seen how to move the stack variable inside using a single-item list.
[stack.pop()
for stack in [[initial]]
for _s in (
x
for xs in [
(stack.append(function(stack.pop(), e)) or ()
for e in sequence),
[[None]],
]
for x in xs)
for _ in _s][0]
And we can also get the initial from the sequence for the two-argument reduce.
[stack.pop()
for it in [iter(sequence)]
for stack in [[next(it)]]
for _s in (
x
for xs in [
(stack.append(function(stack.pop(), e)) or ()
for e in it),
[[None]],
]
for x in xs)
for _ in _s][0]
This is insane. But it works. So yes, it's possible to get "reduce-like functionality" with comprehensions. That doesn't mean you should. Seven fors is too hard!

You could accomplish something like a reduce with a comprehension by using a couple of helper functions that I've named last and cofold:
>>> last(r(a+b) for a, b, r in cofold(range(10)))
45
This is functionally equivalent to
>>> reduce(lambda a, b: a+b, range(10))
45
Note that unlike reduce() the comprehension didn't use a lambda.
The trick is to use a generator with a callback to "return" the result of the operator. cofold is the corecursive dual of the reduce (or fold) function.
_sentinel = object()
def cofold(it, initial=_sentinel):
if initial is _sentinel:
it = iter(it)
accumulator = next(it)
else:
accumulator = initial
def callback(result):
nonlocal accumulator
accumulator = result
return result
for element in it:
yield accumulator, element, callback
Here's cofold in a list comprehension.
>>> [r(a+b) for a, b, r in cofold(range(10))]
[1, 3, 6, 10, 15, 21, 28, 36, 45]
The elements represent each step in the dual reduction. The last one is our answer. The last function is trivial.
def last(it):
for e in it:
pass
return e
Unlike reduce, cofold is a lazy generator, so it can safely act on infinite iterables when used in a generator expression.
>>> from itertools import islice, count
>>> lazy_results = (r(a+b) for a, b, r in cofold(count()))
>>> [*islice(lazy_results, 0, 9)]
[1, 3, 6, 10, 15, 21, 28, 36, 45]
>>> next(lazy_results)
55
>>> next(lazy_results)
66

Related

What is a better pythonic version of this conditional deleting?

i am refreshing my python (2.7) and i am discovering iterators and generators.
As i understood, they are an efficient way of navigating over values without consuming too much memory.
So the following code do some kind of logical indexing on a list:
removing the values of a list L that triggers a False conditional statement represented here by the function f.
I am not satisfied with my code because I feel this code is not optimal for three reasons:
I read somewhere that it is better to use a for loop than a while loop.
However, in the usual for i in range(10), i can't modify the value of 'i' because it seems that the iteration doesn't care.
Logical indexing is pretty strong in matrix-oriented languages, and there should be a way to do the same in python (by hand granted, but maybe better than my code).
Third reason is just that i want to use generator/iterator on this example to help me understand.
Third reason is just that i want to use generator/iterator on this example to help me understand.
TL;DR : Is this code a good pythonic way to do logical indexing ?
#f string -> bool
def f(s):
return 'c' in s
L=['','a','ab','abc','abcd','abcde','abde'] #example
length=len(L)
i=0
while i < length:
if not f(L[i]): #f is a conditional statement (input string output bool)
del L[i]
length-=1 #cut and push leftwise
else:
i+=1
print 'Updated list is :', L
print length

This code has a few problems, but the main one is that you must never modify a list you're iterating over. Rather, you create a new list from the elements that match your condition. This can be done simply in a for loop:
newlist = []
for item in L:
if f(item):
newlist.append(item)
which can be shortened to a simple list comprehension:
newlist = [item for item in L if f(item)]

It looks like filter() is what you're after:
newlist = filter(lambda x: not f(x), L)
filter() filters (...) an iterable and only keeps the items for which a predicate returns True. In your case f(..) is not quite the predicate but not f(...).
Simpler:
def f(s):
return 'c' not in s
newlist = filter(f, L)
See: https://docs.python.org/2/library/functions.html#filter

Never modify a list with del, pop or other methods that mutate the length of the list while iterating over it. Read this for more information.
The "pythonic" way to filter a list is to use reassignment and either a list comprehension or the built-in filter function:
List comprehension:
>>> [item for item in L if f(item)]
['abc', 'abcd', 'abcde']
i want to use generator/iterator on this example to help me understand
The for item in L part is implicitly making use of the iterator protocol. Python lists are iterable, and iter(somelist) returns an iterator .
>>> from collections import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> isinstance(iter([]), Iterator)
True
__iter__ is not only being called when using a traditional for-loop, but also when you use a list comprehension:
>>> class mylist(list):
... def __iter__(self):
... print('iter has been called')
... return super(mylist, self).__iter__()
...
>>> m = mylist([1,2,3])
>>> [x for x in m]
iter has been called
[1, 2, 3]
Filtering:
>>> filter(f, L)
['abc', 'abcd', 'abcde']
In Python3, use list(filter(f, L)) to get a list.
Of course, to filter a list, Python needs to iterate over it, too:
>>> filter(None, mylist())
iter has been called
[]

"The python way" to do it would be to use a generator expression:
# list comprehension
L = [l for l in L if f(l)]
# alternative generator comprehension
L = (l for l in L if f(l))
It depends on your context if a list or a generator is "better" (see e.g. this so question). Because your source data is coming from a list, there is no real benefit of using a generator here.

For simply deleting elements, especially if the original list is no longer needed, just iterate backwards:
Python 2.x:
for i in xrange(len(L) - 1, -1, -1):
if not f(L[i]):
del L[i]
Python 3.x:
for i in range(len(L) - 1, -1, -1):
if not f(L[i]):
del L[i]
By iterating from the end, the "next" index does not change after deletion and a for loop is possible. Note that you should use the xrange generator in Python 2, or the range generator in Python 3, to save memory*.
In cases where you must iterate forward, use your given solution above.
*Note that Python 2's xrange will break if there are >= 2 ** 32 - 1 elements. Python 3's range, as well as the less efficient Python 2's range do not have this limitation.

Does python interpreter reuse results for efficiency?

#L is a very large list
A = [x/sum(L) for x in L]
When the interpreter evaluates this, how many times will sum(L) be calculated? Just once, or once for each element?

A list comprehension executes the expression for each iteration.
sum(L) is executed for each x in L. Calculate it once outside the list comprehension:
s = sum(L)
A = [x/s for x in L]
Python has no way of knowing that the outcome of sum(L) is stable, and cannot optimize the call away for you.
sum() could be rebound to a different function that returns random values. The elements in L could implement __add__ methods that produce side effects; the built-in sum() would be calling these. L itself could implement a custom __iter__ method that alters the list in-place as you iterate, affecting both the list comprehension and the sum() call. Any of those hooks could rebind sum or give x elements a __div__ method that alters sum, etc.
In other words, Python is too dynamic to accurately predict expression outcomes.

I would opt for Martijn's approach, but thought I'd point out that you can (ab)use a lambda with a default argument and a map if you wanted to retain a "one-liner", eg:
L = range(1, 10)
A = map(lambda el, total=sum(L, 0.0): el / total, L)

Is there a 'foreach' function in Python 3?

When I meet the situation I can do it in javascript, I always think if there's an foreach function it would be convenience. By foreach I mean the function which is described below:
def foreach(fn,iterable):
for x in iterable:
fn(x)
they just do it on every element and didn't yield or return something,i think it should be a built-in function and should be more faster than writing it with pure Python, but I didn't found it on the list,or it just called another name?or I just miss some points here?
Maybe I got wrong, cause calling an function in Python cost high, definitely not a good practice for the example. Rather than an out loop, the function should do the loop in side its body looks like this below which already mentioned in many python's code suggestions:
def fn(*args):
for x in args:
dosomething
but I thought foreach is still welcome base on the two facts:
In normal cases, people just don't care about the performance
Sometime the API didn't accept iterable object and you can't rewrite its source.

Every occurence of "foreach" I've seen (PHP, C#, ...) does basically the same as pythons "for" statement.
These are more or less equivalent:
// PHP:
foreach ($array as $val) {
print($val);
}
// C#
foreach (String val in array) {
console.writeline(val);
}
// Python
for val in array:
print(val)
So, yes, there is a "foreach" in python. It's called "for".
What you're describing is an "array map" function. This could be done with list comprehensions in python:
names = ['tom', 'john', 'simon']
namesCapitalized = [capitalize(n) for n in names]

Python doesn't have a foreach statement per se. It has for loops built into the language.
for element in iterable:
operate(element)
If you really wanted to, you could define your own foreach function:
def foreach(function, iterable):
for element in iterable:
function(element)
As a side note the for element in iterable syntax comes from the ABC programming language, one of Python's influences.

Other examples:
Python Foreach Loop:
array = ['a', 'b']
for value in array:
print(value)
# a
# b
Python For Loop:
array = ['a', 'b']
for index in range(len(array)):
print("index: %s | value: %s" % (index, array[index]))
# index: 0 | value: a
# index: 1 | value: b

map can be used for the situation mentioned in the question.
E.g.
map(len, ['abcd','abc', 'a']) # 4 3 1
For functions that take multiple arguments, more arguments can be given to map:
map(pow, [2, 3], [4,2]) # 16 9
It returns a list in python 2.x and an iterator in python 3
In case your function takes multiple arguments and the arguments are already in the form of tuples (or any iterable since python 2.6) you can use itertools.starmap. (which has a very similar syntax to what you were looking for). It returns an iterator.
E.g.
for num in starmap(pow, [(2,3), (3,2)]):
print(num)
gives us 8 and 9

The correct answer is "python collections do not have a foreach". In native python we need to resort to the external for _element_ in _collection_ syntax which is not what the OP is after.
Python is in general quite weak for functionals programming. There are a few libraries to mitigate a bit. I helped author one of these infixpy
pip install infixpy https://pypi.org/project/infixpy/
from infixpy import Seq
(Seq([1,2,3]).foreach(lambda x: print(x)))
1
2
3
Also see: Left to right application of operations on a list in Python 3

Here is the example of the "foreach" construction with simultaneous access to the element indexes in Python:
for idx, val in enumerate([3, 4, 5]):
print (idx, val)

Yes, although it uses the same syntax as a for loop.
for x in ['a', 'b']: print(x)

This does the foreach in python 3
test = [0,1,2,3,4,5,6,7,8,"test"]
for fetch in test:
print(fetch)

Look at this article. The iterator object nditer from numpy package, introduced in NumPy 1.6, provides many flexible ways to visit all the elements of one or more arrays in a systematic fashion.
Example:
import random
import numpy as np
ptrs = np.int32([[0, 0], [400, 0], [0, 400], [400, 400]])
for ptr in np.nditer(ptrs, op_flags=['readwrite']):
# apply random shift on 1 for each element of the matrix
ptr += random.choice([-1, 1])
print(ptrs)
d:\>python nditer.py
[[ -1 1]
[399 -1]
[ 1 399]
[399 401]]

If I understood you right, you mean that if you have a function 'func', you want to check for each item in list if func(item) returns true; if you get true for all, then do something.
You can use 'all'.
For example: I want to get all prime numbers in range 0-10 in a list:
from math import sqrt
primes = [x for x in range(10) if x > 2 and all(x % i !=0 for i in range(2, int(sqrt(x)) + 1))]

If you really want you can do this:
[fn(x) for x in iterable]
But the point of the list comprehension is to create a list - using it for the side effect alone is poor style. The for loop is also less typing
for x in iterable: fn(x)

I know this is an old thread but I had a similar question when trying to do a codewars exercise.
I came up with a solution which nests loops, I believe this solution applies to the question, it replicates a working "for each (x) doThing" statement in most scenarios:
for elements in array:
while elements in array:
array.func()

If you're just looking for a more concise syntax you can put the for loop on one line:
array = ['a', 'b']
for value in array: print(value)
Just separate additional statements with a semicolon.
array = ['a', 'b']
for value in array: print(value); print('hello')
This may not conform to your local style guide, but it could make sense to do it like this when you're playing around in the console.

In short, the functional programming way to do this is:
def do_and_return_fn(og_fn: Callable[[T], None]):
def do_and_return(item: T) -> T:
og_fn(item)
return item
return do_and_return
# where og_fn is the fn referred to by the question.
# i.e. a function that does something on each element, but returns nothing.
iterable = map(do_and_return_fn(og_fn), iterable)
All of the answers that say "for" loops are the same as "foreach" functions are neglecting the point that other similar functions that operate on iters in python such as map, filter, and others in itertools are lazily evaluated.
Suppose, I have an iterable of dictionaries coming from my database and I want to pop an item off of each dictionary element when the iterator is iterated over. I can't use map because pop returns the item popped, not the original dictionary.
The approach I gave above would allow me to achieve this if I pass lambda x: x.pop() as my og_fn,
What would be nice is if python had a built-in lazy function with an interface like I constructed:
foreach(do_fn: Callable[[T], None], iterable: Iterable)
Implemented with the function given before, it would look like:
def foreach(do_fn: Callable[[T], None], iterable: Iterable[T]) -> Iterable[T]:
return map(do_and_return_fn(do_fn), iterable)
# being called with my db code.
# Lazily removes the INSERTED_ON_SEC_FIELD on every element:
doc_iter = foreach(lambda x: x.pop(INSERTED_ON_SEC_FIELD, None), doc_iter)

No there is no from functools import foreach support in python. However, you can just implement in the same number of lines as the import takes, anyway:
foreach = lambda f, iterable: (*map(f, iterable),)
Bonus:
variadic support: foreach = lambda f, iterable, *args: (*map(f, iterable, *args),) and you can be more efficient by avoiding constructing the tuple of Nones

When to drop list Comprehension and the Pythonic way?

I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
... c = a+b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give a pylint W1060 Expression is assigned to nothing, but since I am already using the foo list to append the values I don't need to assing the List Comprehension line to something.
My questions is more of a matter of programming correctness
Should I drop list comprehension and just use a simple for expression?
>>> for i, x in enumerate(bar_list):
... foo.append(sum(i,x))
or is there a correct way to use both list comprehension an assign to nothing?
Answer
Thank you #user2387370, #kindall and #Martijn Pieters. For the rest of the comments I use append because I'm not using a list(), I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr

Yes, this is bad style. A list comprehension is to build a list. You're building a list full of Nones and then throwing it away. Your actual desired result is a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is actually fairly readable, but I believe it's not actually any faster than any() and requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.

I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?

You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]

Python `for` syntax: block code vs single line generator expressions

I'm familiar with the for loop in a block-code context. eg:
for c in "word":
print c
I just came across some examples that use for differently. Rather than beginning with the for statement, they tag it at the end of an expression (and don't involve an indented code-block). eg:
sum(x*x for x in range(10))
Can anyone point me to some documentation that outlines this use of for? I've been able to find examples, but not explanations. All the for documentation I've been able to find describes the previous use (block-code example). I'm not even sure what to call this use, so I apologize if my question's title is unclear.

What you are pointing to is Generator in Python. Take a look at: -
http://wiki.python.org/moin/Generators
http://www.python.org/dev/peps/pep-0255/
http://docs.python.org/whatsnew/2.5.html#pep-342-new-generator-features
See the documentation: - Generator Expression which contains exactly the same example you have posted
From the documentation: -
Generators are a simple and powerful tool for creating iterators. They
are written like regular functions but use the yield statement
whenever they want to return data. Each time next() is called, the
generator resumes where it left-off (it remembers all the data values
and which statement was last executed)
Generators are similar to List Comprehension that you use with square brackets instead of brackets, but they are more memory efficient. They don't return the complete list of result at the same time, but they return generator object. Whenever you invoke next() on the generator object, the generator uses yield to return the next value.
List Comprehension for the above code would look like: -
[x * x for x in range(10)]
You can also add conditions to filter out results at the end of the for.
[x * x for x in range(10) if x % 2 != 0]
This will return a list of numbers multiplied by 2 in the range 1 to 5, if the number is not divisible by 2.
An example of Generators depicting the use of yield can be: -
def city_generator():
yield("Konstanz")
yield("Zurich")
yield("Schaffhausen")
yield("Stuttgart")
>>> x = city_generator()
>>> x.next()
Konstanz
>>> x.next()
Zurich
>>> x.next()
Schaffhausen
>>> x.next()
Stuttgart
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
So, you see that, every call to next() executes the next yield() in generator. and at the end it throws StopIteration.

Those are generator expressions and they are related to list comprehensions
List comprehensions allow for the easy creation of lists. For example, if you wanted to create a list of perfect squares you could do this:
>>> squares = []
>>> for x in range(10):
... squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
But instead you could use a list comprehension:
squares = [x**2 for x in range(10)]
Generator expressions are like list comprehensions, except they return a generator object instead of a list. You can iterate over this generator object in a similar manner to list comprehensions, but you don't have to store the whole list in memory at once, as you would if you created the list in a list comprehension.

Documentation for generator expressions is here https://www.python.org/dev/peps/pep-0289/
Following is the code using generator expression .
list(x**2 for x in range(0,10))

Your specific example is called a generator expression. List comprehensions, dictionary comprehensions, and set comprehensions are similar in meaning (different result types, and generator expressions are lazy) and have the same syntax, modulo being inside other kinds of brackets, and in the case of a dict comprehension having expr1: expr2 instead of a single expression (x*x in your example).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

List comprehension as substitute for reduce() in Python - python

Related

What is a better pythonic version of this conditional deleting?

Does python interpreter reuse results for efficiency?

Is there a 'foreach' function in Python 3?

When to drop list Comprehension and the Pythonic way?

Python `for` syntax: block code vs single line generator expressions

Categories

Resources