What is the most pythonic way to execute a full generator comprehension where you don't care about the return values and instead the operations are purely side-effect-based?
An example would be splitting a list based on a predicate value as discussed here. It's natural to think of writing a generator comprehension
split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)
In this case the best solution I can come up with is to use any
any(gen_comp)
However, it's not immediately obvious what's happening to someone who hasn't seen this pattern. Is there a better way to cycle through that full comprehension without holding all the return values in memory?
You do so by not using a generator expression.
Just write a proper loop:
for v in split_me:
    if v:
        a.append(v)
    else:
        b.append(v)
or perhaps:
for v in split_me:
    target = a if v else b
    target.append(v)
Using a generator expression here is pointless if you are going to execute the generator immediately anyway. Why produce an object plus a sequence of None return values when all you wanted was to append values to two other lists?
Using an explicit loop is both more comprehensible for future maintainers of the code (including you) and more efficient.
The itertools documentation has this consume recipe:
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
In your case n is None, so it reduces to:
collections.deque(iterator, maxlen=0)
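Applied to the question's example, a minimal sketch would look like this (reusing the generator from the question):

import collections

split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)
collections.deque(gen_comp, maxlen=0)  # drains the generator, stores nothing
# a is now [1, 2, 3] and b is [0, None, '']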
That's interesting, but it's also a lot of machinery for a simple task; most people would just use a for loop.
As others have said, don't use comprehensions just for side-effects.
Here's a nice way to do what you're actually trying to do using the partition() recipe from itertools:
try:  # Python 3
    from itertools import filterfalse
except ImportError:  # Python 2
    from itertools import ifilterfalse as filterfalse
    from itertools import ifilter as filter
from itertools import tee
def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # From the itertools recipes:
    # https://docs.python.org/3/library/itertools.html#itertools-recipes
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
split_me = [0, 1, 2, None, 3, '']
falseish, trueish = partition(lambda x: x, split_me)
# You can iterate directly over trueish and falseish,
# or you can put them into lists
trueish_list = list(trueish)
falseish_list = list(falseish)
print(trueish_list)
print(falseish_list)
Output:
[1, 2, 3]
[0, None, '']
There's nothing non-pythonic in writing things on many lines and making use of if-statements:
for v in split_me:
    if v:
        a.append(v)
    else:
        b.append(v)
If you want a one-liner you could do so by putting the loop on one line anyway:
for v in split_me: a.append(v) if v else b.append(v)
If you want it in an expression (though it beats me why you would, unless you have a value you want to get out of it), you can use a list comprehension to force looping:
[x for x in (a.append(v) if v else b.append(v) for v in split_me) if False]
Which solution do you think best shows what you're doing? I'd say the first one. To be pythonic you should probably consider the Zen of Python, especially:
Readability counts.
If the implementation is hard to explain, it's a bad idea.
Just to throw in another reason why using any() to consume a generator is a horrible idea: remember that any() and all() are guaranteed to do short-circuit evaluation, which means that if the generator ever yields a true value, any() will stop early and leave your generator incompletely consumed.
This is adding an extra conditional test / stop condition that you A) probably don't want, and B) may be far away from where the generator is created.
Many standard library functions return None, so you could get away with any() for a while until suddenly it's not doing what you expect, and you might stare at that code for a long time before it occurs to you, if you've gotten into the habit of consuming generators this way.
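To see the pitfall concretely, here is a small sketch (my example, not from the original post) where the side-effecting expression sometimes yields a true value:

seen = []
gen = (seen.append(v) or v for v in [0, 0, 3, 7, 0])
any(gen)     # short-circuits as soon as a true value (3) comes out
print(seen)  # [0, 0, 3] -- 7 and the final 0 were never processed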
If you must do something like this, then the consume() recipe from the itertools documentation is really the only reasonable way to do it, I think.
any is short, but it is not a general solution. Something which works for any generator is the straightforward
for _ in gen_comp: pass
which is also shorter and more efficient than a generally working any method,
any(None for _ in gen_comp)
so the for loop is really the clearest and best. Its only downside is that it cannot be used in expressions.
I am refreshing my Python (2.7) and I am discovering iterators and generators.
As I understand it, they are an efficient way of navigating over values without consuming too much memory.
So the following code does some kind of logical indexing on a list:
removing the values of a list L that trigger a False conditional statement, represented here by the function f.
I am not satisfied with my code because I feel it is not optimal, for three reasons:
I read somewhere that it is better to use a for loop than a while loop.
However, in the usual for i in range(10), I can't modify the value of i, because the iteration doesn't seem to care.
Logical indexing is pretty strong in matrix-oriented languages, and there should be a way to do the same in Python (by hand, granted, but maybe better than my code).
The third reason is just that I want to use a generator/iterator on this example to help me understand.
TL;DR: Is this code a good pythonic way to do logical indexing?
# f: string -> bool
def f(s):
    return 'c' in s

L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']  # example
length = len(L)
i = 0
while i < length:
    if not f(L[i]):  # f is a conditional statement (input string, output bool)
        del L[i]
        length -= 1  # cut and push leftwise
    else:
        i += 1
print 'Updated list is :', L
print length
This code has a few problems, but the main one is that you must never modify a list you're iterating over. Rather, you create a new list from the elements that match your condition. This can be done simply in a for loop:
newlist = []
for item in L:
    if f(item):
        newlist.append(item)
which can be shortened to a simple list comprehension:
newlist = [item for item in L if f(item)]
It looks like filter() is what you're after:
newlist = filter(f, L)
filter() filters (...) an iterable and only keeps the items for which a predicate returns True. In your case the predicate is f itself, since your loop deletes exactly the items for which f returns False.
If you wanted the opposite split, you would negate the predicate:
newlist = filter(lambda x: not f(x), L)
See: https://docs.python.org/2/library/functions.html#filter
Never modify a list with del, pop or other methods that mutate the length of the list while iterating over it. Read this for more information.
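As a quick sketch of what goes wrong (my example): deleting shifts the remaining elements left, so the iteration silently skips the item that moves into the freed slot:

L = [1, 2, 3, 4]
for x in L:
    if x < 3:
        L.remove(x)
print(L)  # [2, 3, 4] -- 2 was skipped and never removed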
The "pythonic" way to filter a list is to use reassignment and either a list comprehension or the built-in filter function:
List comprehension:
>>> [item for item in L if f(item)]
['abc', 'abcd', 'abcde']
I want to use a generator/iterator on this example to help me understand
The for item in L part implicitly makes use of the iterator protocol. Python lists are iterable, and iter(somelist) returns an iterator.
>>> from collections import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> isinstance(iter([]), Iterator)
True
__iter__ is not only being called when using a traditional for-loop, but also when you use a list comprehension:
>>> class mylist(list):
...     def __iter__(self):
...         print('iter has been called')
...         return super(mylist, self).__iter__()
...
>>> m = mylist([1,2,3])
>>> [x for x in m]
iter has been called
[1, 2, 3]
Filtering:
>>> filter(f, L)
['abc', 'abcd', 'abcde']
In Python3, use list(filter(f, L)) to get a list.
Of course, to filter a list, Python needs to iterate over it, too:
>>> filter(None, mylist())
iter has been called
[]
"The python way" to do it would be to use a generator expression:
# list comprehension
L = [l for l in L if f(l)]
# alternative generator comprehension
L = (l for l in L if f(l))
It depends on your context whether a list or a generator is "better" (see e.g. this SO question). Because your source data is coming from a list, there is no real benefit to using a generator here.
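As a small illustration of the difference (my sketch), the generator version does no work until something iterates over it:

def f(s):
    return 'c' in s

L = ['', 'a', 'ab', 'abc', 'abcd', 'abcde', 'abde']
gen = (l for l in L if f(l))  # nothing has been filtered yet
print(list(gen))              # ['abc', 'abcd', 'abcde'] -- the work happens here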
For simply deleting elements, especially if the original list is no longer needed, just iterate backwards:
Python 2.x:
for i in xrange(len(L) - 1, -1, -1):
    if not f(L[i]):
        del L[i]
Python 3.x:
for i in range(len(L) - 1, -1, -1):
    if not f(L[i]):
        del L[i]
By iterating from the end, the "next" index does not change after a deletion, so a for loop is possible. Note that you should use the lazy xrange in Python 2 (plain range in Python 3) to save memory*.
In cases where you must iterate forward, use your given solution above.
*Note that Python 2's xrange will break if the length exceeds sys.maxint (2**31 - 1 on 32-bit builds). Python 3's range does not have this limitation.
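If you prefer, the same backwards loop can be spelled with reversed(), which is equally lazy (my suggestion, not part of the original answer):

for i in reversed(range(len(L))):  # xrange in Python 2
    if not f(L[i]):
        del L[i]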
The following python tutorial says that:
List comprehension is a complete substitute for the lambda function as well as the functions map(), filter() and reduce().
http://python-course.eu/python3_list_comprehension.php
However, it does not give an example of how a list comprehension can substitute for reduce(), and I can't think of one.
Can someone please explain how to achieve reduce-like functionality with a list comprehension, or confirm that it isn't possible?
Ideally, list comprehension is to create a new list. Quoting official documentation,
List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
whereas reduce is used to reduce an iterable to a single value. Quoting functools.reduce,
Apply function of two arguments cumulatively to the items of sequence, from left to right, so as to reduce the sequence to a single value.
So, list comprehension cannot be used as a drop-in replacement for reduce.
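To make the contrast concrete, here is a minimal sketch of the two shapes (my example):

from functools import reduce  # a built-in in Python 2

squares = [x * x for x in range(5)]                # builds a new list: [0, 1, 4, 9, 16]
product = reduce(lambda a, b: a * b, range(1, 5))  # collapses to one value: 24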
I was surprised at first to find that Guido van Rossum, creator of Python, was against reduce. His reasoning was that beyond summing, multiplying, and-ing, and or-ing, reduce yields unreadable solutions that are better written as a function which iterates through and updates an accumulator. His article on the matter is here. So no, there isn't a list comprehension alternative to reduce; instead, the "pythonic" way is to implement an accumulating function the old-fashioned way:
Instead of:
out = reduce((lambda x,y: x*y),[1,2,3])
Use:
def prod(myList):
    out = 1
    for el in myList:
        out *= el
    return out
Of course, nothing stops you from continuing to use reduce (Python 2) or functools.reduce (Python 3).
List comprehensions are supposed to return lists. If your reduce is supposed to return a list, then yes, you can replace it with a list comprehension.
But this is no obstacle to providing "reduce-like functionality". Python lists can contain any object. If you'll accept your result contained in a single-item list, then there is a [...][0] list comprehension form that can replace any reduce() whatsoever.
This should be obvious, but that form is
[x for x in [reduce(function, sequence, initial)]][0]
for some binary function, some iterable sequence, and some initial value. Or, if you want the initial value taken from the first element of the iterable,
[x for x in [reduce(function, sequence)]][0]
Arguably, the above is cheating, and also pointless, since you could just use reduce without the comprehension. So let's try it without reduce.
[stack.append(function(stack.pop(), e)) or stack[0]
 for stack in ([initial],)
 for e in sequence][-1]
This produces a list of all the intermediate values, and we want the last one. [-1] is just as easy as [0]. We need an accumulator to reduce, but can't use assignment statements in a comprehension, hence the stack (which is just a list), but we could have used many other data structures here. The .append() always returns None, so we use or stack[0] to put the value so far in the resulting list.
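To make the trick concrete, here is a quick run with addition as the function (my example):

from operator import add

sequence, initial = [1, 2, 3, 4], 0
result = [stack.append(add(stack.pop(), e)) or stack[0]
          for stack in ([initial],)
          for e in sequence][-1]
print(result)  # 10; the full list of intermediates was [1, 3, 6, 10]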
It's a little more difficult without initial,
[stack.append(function(stack.pop(), e)) or stack[0]
 for it in [iter(sequence)]
 for stack in [[next(it)]]
 for e in it][-1]
Really, you might as well use a for statement at this point.
But this takes up memory for the list of intermediate values. For a very long sequence, that might be a problem. But we can avoid that too by using generator expressions.
Doing this is tricky, so let's start with an easier example and work up to it.
stack = [initial]
[stack.append(function(stack.pop(), e)) for e in sequence]
stack.pop()  # returns the answer
It computes the answer, but also creates a useless list of Nones. We can avoid that by converting it to a generator expression inside a list comprehension.
stack = [initial]
[_ for _s in (stack.append(function(stack.pop(), e)) or ()
              for e in sequence)
 for _ in _s]
stack.pop()
The list comprehension exhausts the generator that updates the stack, but returns an empty list itself. This is possible because the inner loop always has zero iterations, because _s is always an empty tuple.
We can move the stack.pop() inside if the last _s has one element. It doesn't matter what that element is though. So we chain on a [None] as the final _s.
from itertools import chain

stack = [initial]
[stack.pop()
 for _s in chain((stack.append(function(stack.pop(), e)) or ()
                  for e in sequence),
                 [[None]])
 for _ in _s][0]
Again, we have a single-item list comprehension. We can also implement chain as a generator expression. And you've already seen how to move the stack variable inside using a single-item list.
[stack.pop()
 for stack in [[initial]]
 for _s in (
     x
     for xs in [
         (stack.append(function(stack.pop(), e)) or ()
          for e in sequence),
         [[None]],
     ]
     for x in xs)
 for _ in _s][0]
And we can also get the initial from the sequence for the two-argument reduce.
[stack.pop()
 for it in [iter(sequence)]
 for stack in [[next(it)]]
 for _s in (
     x
     for xs in [
         (stack.append(function(stack.pop(), e)) or ()
          for e in it),
         [[None]],
     ]
     for x in xs)
 for _ in _s][0]
This is insane. But it works. So yes, it's possible to get "reduce-like functionality" with comprehensions. That doesn't mean you should. Seven fors is too hard!
You could accomplish something like a reduce with a comprehension by using a couple of helper functions that I've named last and cofold:
>>> last(r(a+b) for a, b, r in cofold(range(10)))
45
This is functionally equivalent to
>>> reduce(lambda a, b: a+b, range(10))
45
Note that unlike reduce() the comprehension didn't use a lambda.
The trick is to use a generator with a callback to "return" the result of the operator. cofold is the corecursive dual of the reduce (or fold) function.
_sentinel = object()

def cofold(it, initial=_sentinel):
    if initial is _sentinel:
        it = iter(it)
        accumulator = next(it)
    else:
        accumulator = initial

    def callback(result):
        nonlocal accumulator
        accumulator = result
        return result

    for element in it:
        yield accumulator, element, callback
Here's cofold in a list comprehension.
>>> [r(a+b) for a, b, r in cofold(range(10))]
[1, 3, 6, 10, 15, 21, 28, 36, 45]
The elements represent each step in the dual reduction. The last one is our answer. The last function is trivial.
def last(it):
    for e in it:
        pass
    return e
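An equivalent last() that consumes the iterator at C speed, in the spirit of the consume recipe above (my variant, not from the original answer):

from collections import deque

def last(it):
    # a deque with maxlen=1 keeps only the final element it sees
    return deque(it, maxlen=1).pop()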
Unlike reduce, cofold is a lazy generator, so it can safely act on infinite iterables when used in a generator expression.
>>> from itertools import islice, count
>>> lazy_results = (r(a+b) for a, b, r in cofold(count()))
>>> [*islice(lazy_results, 0, 9)]
[1, 3, 6, 10, 15, 21, 28, 36, 45]
>>> next(lazy_results)
55
>>> next(lazy_results)
66
I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
...     c = a + b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give pylint warning W1060 (Expression is assigned to nothing), but since I am already using the foo list to append the values, I don't need to assign the list comprehension's result to anything.
My question is more a matter of programming correctness:
Should I drop list comprehension and just use a simple for expression?
>>> for i, x in enumerate(bar_list):
...     foo.append(sum(i, x))
or is there a correct way to use both list comprehension an assign to nothing?
Answer
Thank you @user2387370, @kindall and @Martijn Pieters. To address the rest of the comments: I use append because I'm not using a list(), and I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
    histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr
Yes, this is bad style. A list comprehension is to build a list. You're building a list full of Nones and then throwing it away. Your actual desired result is a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is fairly readable, but I believe it's not actually any faster than any(), and it requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.
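If the difference matters to you, it's easy to measure rather than guess (a sketch; the numbers vary by machine and Python version):

import timeit

setup = "from collections import deque; data = list(range(1000))"
print(timeit.timeit("any(None for x in data)", setup, number=10000))
print(timeit.timeit("deque((None for x in data), maxlen=0)", setup, number=10000))
print(timeit.timeit("for x in data: pass", setup, number=10000))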
I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?
You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]
I am using generators to perform searches in lists like this simple example:
>>> a = [1,2,3,4]
>>> (i for i, v in enumerate(a) if v == 4).next()
3
(Just to frame the example a bit: I am using much longer lists than the one above, and the entries are a bit more complicated than int. I do it this way so the entire list won't be traversed each time I search it.)
Now if I instead change that to v == 666, it raises StopIteration because there is no entry 666 in a.
How can I make it return None instead? I could of course wrap it in a try ... except clause, but is there a more pythonic way to do it?
If you are using Python 2.6+ you should use the next built-in function, not the next method (which was replaced with __next__ in 3.x). The next built-in takes an optional default argument to return if the iterator is exhausted, instead of raising StopIteration:
next((i for i, v in enumerate(a) if v == 666), None)
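For example, with the list from the question (a sketch of the behavior):

a = [1, 2, 3, 4]
next((i for i, v in enumerate(a) if v == 4), None)    # 3
next((i for i, v in enumerate(a) if v == 666), None)  # None instead of StopIteration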
You can chain the generator with (None,):
from itertools import chain
a = [1,2,3,4]
print chain((i for i, v in enumerate(a) if v == 6), (None,)).next()
But note that a.index(2) will not traverse the full list either; once 2 is found, the search is finished. You can test this:
>>> timeit.timeit("a.index(0)", "a=range(10)")
0.19335955439601094
>>> timeit.timeit("a.index(99)", "a=range(100)")
2.1938486138533335
I'm wondering if there's a reason that there's no first(iterable) in the Python built-in functions, somewhat similar to any(iterable) and all(iterable) (it may be tucked in a stdlib module somewhere, but I don't see it in itertools). first would perform a short-circuit generator evaluation so that unnecessary (and a potentially infinite number of) operations can be avoided; i.e.
def identity(item):
    return item

def first(iterable, predicate=identity):
    for item in iterable:
        if predicate(item):
            return item
    raise ValueError('No satisfactory value found')
This way you can express things like:
denominators = (2, 3, 4, 5)
lcd = first(i for i in itertools.count(1)
            if all(i % denominator == 0 for denominator in denominators))
Clearly you can't do list(generator)[0] in that case, since the generator doesn't terminate.
Or if you have a bunch of regexes to match against (useful when they all have the same groupdict interface):
match = first(regex.match(big_text) for regex in regexes)
You save a lot of unnecessary processing by avoiding list(generator)[0] and short-circuiting on a positive match.
In Python 2, if you have an iterator, you can just call its next method. Something like:
>>> (5*x for x in xrange(2,4)).next()
10
In Python 3, you can use the next built-in with an iterator:
>>> next(5*x for x in range(2,4))
10
There's a PyPI package called “first” that does this:
>>> from first import first
>>> first([0, None, False, [], (), 42])
42
Here's how you would use it to return the first odd number, for example:
>>> first([2, 14, 7, 41, 53], key=lambda x: x % 2 == 1)
7
If you just want to return the first element from the iterator regardless of whether it is true or not, do this:
>>> first([0, None, False, [], (), 42], key=lambda x: True)
0
It's a very small package: it only contains this function, it has no dependencies, and it works on Python 2 and 3. It's a single file, so you don't even have to install it to use it.
In fact, here's almost the entire source code (from version 2.0.1, by Hynek Schlawack, released under the MIT licence):
def first(iterable, default=None, key=None):
    if key is None:
        for el in iterable:
            if el:
                return el
    else:
        for el in iterable:
            if key(el):
                return el
    return default
I asked a similar question recently (it got marked as a duplicate of this question by now). My concern also was that I'd liked to use built-ins only to solve the problem of finding the first true value of a generator. My own solution then was this:
x = next((v for v in (f(x) for x in a) if v), False)
For the example of finding the first regexp match (not the first matching pattern!) this would look like this:
import re

patterns = [r'\d+', r'\s+', r'\w+', r'.*']
text = 'abc'
firstMatch = next(
    (match for match in
        (re.match(pattern, text) for pattern in patterns)
     if match),
    False)
It does not evaluate the predicate twice (as you would have to do if just the pattern was returned) and it does not use hacks like locals in comprehensions.
But it has two generators nested where the logic would dictate to use just one. So a better solution would be nice.
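One way to collapse the two nested generators into one is to let a lazy filter handle the if match part (my sketch; in Python 3, filter is lazy):

import re

patterns = [r'\d+', r'\s+', r'\w+', r'.*']
text = 'abc'
firstMatch = next(filter(None, (re.match(p, text) for p in patterns)), False)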
There is a "slice" iterator in itertools. It emulates the slice operations that we're familiar with in python. What you're looking for is something similar to this:
myList = [0,1,2,3,4,5]
firstValue = myList[:1]
The equivalent using itertools for iterators:
from itertools import islice

def MyGenFunc():
    for i in range(5):
        yield i

mygen = MyGenFunc()
firstValue = list(islice(mygen, 0, 1))  # islice returns an iterator, so realize it
print firstValue
There's some ambiguity in your question. Your definition of first and the regex example imply that there is a boolean test. But the denominators example explicitly has an if clause; so it's only a coincidence that each integer happens to be true.
It looks like the combination of next and itertools.ifilter will give you what you want.
match = next(itertools.ifilter(None, (regex.match(big_text) for regex in regexes)))
Haskell makes use of what you just described, as the function take (or, technically, the partial application take 1). The Python Cookbook has generator wrappers that provide the same functionality as take, takeWhile, and drop in Haskell.
But as to why that's not a built-in, your guess is as good as mine.
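For reference, a take in Python is a one-liner over itertools.islice, in the style of the itertools recipes (a sketch, assuming you want a list back):

from itertools import islice

def take(n, iterable):
    "Return the first n items of the iterable as a list."
    return list(islice(iterable, n))

take(1, (5 * x for x in range(2, 4)))  # [10]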