What is a better pythonic version of this conditional deleting? - python

i am refreshing my python (2.7) and i am discovering iterators and generators.
As i understood, they are an efficient way of navigating over values without consuming too much memory.
So the following code do some kind of logical indexing on a list:
removing the values of a list L that triggers a False conditional statement represented here by the function f.
I am not satisfied with my code because I feel this code is not optimal for three reasons:
I read somewhere that it is better to use a for loop than a while loop.
However, in the usual for i in range(10), i can't modify the value of 'i' because it seems that the iteration doesn't care.
Logical indexing is pretty strong in matrix-oriented languages, and there should be a way to do the same in python (by hand granted, but maybe better than my code).
Third reason is just that i want to use generator/iterator on this example to help me understand.
Third reason is just that i want to use generator/iterator on this example to help me understand.
TL;DR : Is this code a good pythonic way to do logical indexing ?
#f string -> bool
def f(s):
return 'c' in s
L=['','a','ab','abc','abcd','abcde','abde'] #example
length=len(L)
i=0
while i < length:
if not f(L[i]): #f is a conditional statement (input string output bool)
del L[i]
length-=1 #cut and push leftwise
else:
i+=1
print 'Updated list is :', L
print length

This code has a few problems, but the main one is that you must never modify a list you're iterating over. Rather, you create a new list from the elements that match your condition. This can be done simply in a for loop:
newlist = []
for item in L:
if f(item):
newlist.append(item)
which can be shortened to a simple list comprehension:
newlist = [item for item in L if f(item)]

It looks like filter() is what you're after:
newlist = filter(lambda x: not f(x), L)
filter() filters (...) an iterable and only keeps the items for which a predicate returns True. In your case f(..) is not quite the predicate but not f(...).
Simpler:
def f(s):
return 'c' not in s
newlist = filter(f, L)
See: https://docs.python.org/2/library/functions.html#filter

Never modify a list with del, pop or other methods that mutate the length of the list while iterating over it. Read this for more information.
The "pythonic" way to filter a list is to use reassignment and either a list comprehension or the built-in filter function:
List comprehension:
>>> [item for item in L if f(item)]
['abc', 'abcd', 'abcde']
i want to use generator/iterator on this example to help me understand
The for item in L part is implicitly making use of the iterator protocol. Python lists are iterable, and iter(somelist) returns an iterator .
>>> from collections import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> isinstance(iter([]), Iterator)
True
__iter__ is not only being called when using a traditional for-loop, but also when you use a list comprehension:
>>> class mylist(list):
... def __iter__(self):
... print('iter has been called')
... return super(mylist, self).__iter__()
...
>>> m = mylist([1,2,3])
>>> [x for x in m]
iter has been called
[1, 2, 3]
Filtering:
>>> filter(f, L)
['abc', 'abcd', 'abcde']
In Python3, use list(filter(f, L)) to get a list.
Of course, to filter a list, Python needs to iterate over it, too:
>>> filter(None, mylist())
iter has been called
[]

"The python way" to do it would be to use a generator expression:
# list comprehension
L = [l for l in L if f(l)]
# alternative generator comprehension
L = (l for l in L if f(l))
It depends on your context if a list or a generator is "better" (see e.g. this so question). Because your source data is coming from a list, there is no real benefit of using a generator here.

For simply deleting elements, especially if the original list is no longer needed, just iterate backwards:
Python 2.x:
for i in xrange(len(L) - 1, -1, -1):
if not f(L[i]):
del L[i]
Python 3.x:
for i in range(len(L) - 1, -1, -1):
if not f(L[i]):
del L[i]
By iterating from the end, the "next" index does not change after deletion and a for loop is possible. Note that you should use the xrange generator in Python 2, or the range generator in Python 3, to save memory*.
In cases where you must iterate forward, use your given solution above.
*Note that Python 2's xrange will break if there are >= 2 ** 32 - 1 elements. Python 3's range, as well as the less efficient Python 2's range do not have this limitation.

Related

List comprehension as substitute for reduce() in Python

The following python tutorial says that:
List comprehension is a complete substitute for the lambda function as well as the functions map(), filter() and reduce().
http://python-course.eu/python3_list_comprehension.php
However, it does not mention an example how a list comprehension can substitute a reduce() and I can't think of an example how it should be possible.
Can please someone explain how to achieve a reduce-like functionality with list comprehension or confirm that it isn't possible?
Ideally, list comprehension is to create a new list. Quoting official documentation,
List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
whereas reduce is used to reduce an iterable to a single value. Quoting functools.reduce,
Apply function of two arguments cumulatively to the items of sequence, from left to right, so as to reduce the sequence to a single value.
So, list comprehension cannot be used as a drop-in replacement for reduce.
I was surprised at first to find that Guido van Rossum, creator of Python, was against reduce. His reasoning was that beyond summing, multiplying, and-ing, and or-ing, using reduce yields an unreadable solution that is better suited by a function which iterates through and updates an accumulator. His article on the matter is here. So no, there isn't a list comprehension alternative to reduce, instead the "pythonic" way is to implement an accumulating function the old fashioned way:
Instead of:
out = reduce((lambda x,y: x*y),[1,2,3])
Use:
def prod(myList):
out = 1
for el in myList:
out *= el
return out
Of course nothing stops you from continuing to use reduce (python 2) or functools.reduce (python 3)
List comprehensions are supposed to return lists. If your reduce is supposed to return a list, then yes, you can replace it with a list comprehension.
But this is no obstacle to providing "reduce-like functionality". Python lists can contain any object. If you'll accept your result contained in a single-item list, then there is a [...][0] list comprehension form that can replace any reduce() whatsoever.
This should be obvious, but that form is
[x for x in [reduce(function, sequence, initial)]][0]
for some binary function and and some iterable sequence and some initial value. Or, if you want the initial from the first of the iterable,
[x for x in [reduce(function, sequence)]][0]
Arguably, the above is cheating, and also pointless, since you could just use reduce without the comprehension. So let's try it without reduce.
[stack.append(function(stack.pop(), e)) or stack[0]
for stack in ([initial],)
for e in sequence][-1]
This produces a list of all the intermediate values, and we want the last one. [-1] is just as easy as [0]. We need an accumulator to reduce, but can't use assignment statements in a comprehension, hence the stack (which is just a list), but we could have used many other data structures here. The .append() always returns None, so we use or stack[0] to put the value so far in the resulting list.
It's a little more difficult without initial,
[stack.append(function(stack.pop(), e)) or stack[0]
for it in [iter(sequence)]
for stack in [[next(it)]]
for e in it][-1]
Really, you might as well use a for statement at this point.
But this takes up memory for the list of intermediate values. For a very long sequence, that might be a problem. But we can avoid that too by using generator expressions.
Doing this is tricky, so let's start with an easier example and work up to it.
stack = [initial]
[stack.append(function(stack.pop(), e)) for e in sequence]
stack.pop() # returns the answer
It computes the answer, but also creates a useless list of Nones. We can avoid that by converting it to a generator expression inside a list comprehension.
stack = [initial]
[_ for _s in (stack.append(function(stack.pop(), e)) or ()
for e in sequence)
for _ in _s]
stack.pop()
The list comprehension exhausts the generator that updates the stack, but returns an empty list itself. This is possible because the inner loop always has zero iterations, because _s is always an empty tuple.
We can move the stack.pop() inside if the last _s has one element. It doesn't matter what that element is though. So we chain on a [None] as the final _s.
from itertools import chain
stack = [initial]
[stack.pop()
for _s in chain((stack.append(function(stack.pop(), e)) or ()
for e in sequence),
[[None]])
for _ in _s][0]
Again, we have a single-item list comprehension. We can also implement chain as a generator expression. And you've already seen how to move the stack variable inside using a single-item list.
[stack.pop()
for stack in [[initial]]
for _s in (
x
for xs in [
(stack.append(function(stack.pop(), e)) or ()
for e in sequence),
[[None]],
]
for x in xs)
for _ in _s][0]
And we can also get the initial from the sequence for the two-argument reduce.
[stack.pop()
for it in [iter(sequence)]
for stack in [[next(it)]]
for _s in (
x
for xs in [
(stack.append(function(stack.pop(), e)) or ()
for e in it),
[[None]],
]
for x in xs)
for _ in _s][0]
This is insane. But it works. So yes, it's possible to get "reduce-like functionality" with comprehensions. That doesn't mean you should. Seven fors is too hard!
You could accomplish something like a reduce with a comprehension by using a couple of helper functions that I've named last and cofold:
>>> last(r(a+b) for a, b, r in cofold(range(10)))
45
This is functionally equivalent to
>>> reduce(lambda a, b: a+b, range(10))
45
Note that unlike reduce() the comprehension didn't use a lambda.
The trick is to use a generator with a callback to "return" the result of the operator. cofold is the corecursive dual of the reduce (or fold) function.
_sentinel = object()
def cofold(it, initial=_sentinel):
if initial is _sentinel:
it = iter(it)
accumulator = next(it)
else:
accumulator = initial
def callback(result):
nonlocal accumulator
accumulator = result
return result
for element in it:
yield accumulator, element, callback
Here's cofold in a list comprehension.
>>> [r(a+b) for a, b, r in cofold(range(10))]
[1, 3, 6, 10, 15, 21, 28, 36, 45]
The elements represent each step in the dual reduction. The last one is our answer. The last function is trivial.
def last(it):
for e in it:
pass
return e
Unlike reduce, cofold is a lazy generator, so it can safely act on infinite iterables when used in a generator expression.
>>> from itertools import islice, count
>>> lazy_results = (r(a+b) for a, b, r in cofold(count()))
>>> [*islice(lazy_results, 0, 9)]
[1, 3, 6, 10, 15, 21, 28, 36, 45]
>>> next(lazy_results)
55
>>> next(lazy_results)
66

Pythonic way to cycle through purely side-effect-based comprehension

What is the most pythonic way to execute a full generator comprehension where you don't care about the return values and instead the operations are purely side-effect-based?
An example would be splitting a list based on a predicate value as discussed here. It's natural to think of writing a generator comprehension
split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)
In this case the best solution I can come up with is to use any
any(gen_comp)
However that's not immediately obvious what's happening for someone who hasn't seen this pattern. Is there a better way to cycle through that full comprehension without holding all the return values in memory?
You do so by not using a generator expression.
Just write a proper loop:
for v in split_me:
if v:
a.append(v)
else:
b.append(v)
or perhaps:
for v in split_me:
target = a if v else b
target.append(v)
Using a generator expression here is pointless if you are going to execute the generator immediately anyway. Why produce an object plus a sequence of None return values when all you wanted was to append values to two other lists?
Using an explicit loop is both more comprehensible for future maintainers of the code (including you) and more efficient.
itertools has this consume recipe
def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
in your case n is None, so:
collections.deque(iterator, maxlen=0)
Which is interesting, but also a lot of machinery for a simple task
Most people would just use a for loop
As others have said, don't use comprehensions just for side-effects.
Here's a nice way to do what you're actually trying to do using the partition() recipe from itertools:
try: # Python 3
from itertools import filterfalse
except ImportError: # Python 2
from itertools import ifilterfalse as filterfalse
from itertools import ifilter as filter
from itertools import tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# From itertools recipes:
# https://docs.python.org/3/library/itertools.html#itertools-recipes
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
split_me = [0, 1, 2, None, 3, '']
trueish, falseish = partition(lambda x: x, split_me)
# You can iterate directly over trueish and falseish,
# or you can put them into lists
trueish_list = list(trueish)
falseish_list = list(falseish)
print(trueish_list)
print(falseish_list)
Output:
[0, None, '']
[1, 2, 3]
There's nothing non-pythonic in writing things on many lines and make use of if-statements:
for v in split_me:
if v:
a.append(v)
else:
b.append(v)
If you want a one-liner you could do so by putting the loop on one line anyway:
for v in split_me: a.append(v) if v else b.append(v)
If you want it in an expression (which still beats me why you want unless you have a value you want to get out of it) you could use list comprehension to force looping:
[x for x in (a.append(v) if v else b.append(v) for v in split_me) if False]
Which solution do you think best shows what you're doing? I'd say the first solution. To be pythonic you should probably consider the zen of python, especially:
Readability counts.
If the implementation is hard to explain, it's a bad idea.
Just to throw in another reason why using any() to consume a generator is a horrible idea, you need to remember that any() and all() are guaranteed to do short-circuit evaluation which means that if the generator ever returns a True value then all() will early-out on you and leave your generator incompletely consumed.
This is adding an extra conditional test / stop condition that you A) probably don't want, and B) may be far away from where the generator is created.
Many standard library functions return None so you could get away with all() for a while until suddenly it's not doing what you expect, and you might stare at that code for a long time before it occurs to you if you've gotten into the habit of using all() in this way.
If you must do something like this, then itertools.consume() is really the only reasonable way to do it I think.
any is short, but is not a general solution. Something which works for any generator is the straightforward
for _ in gen_comp: pass
which is also shorter and more efficient than a generally working any method,
any(None for _ in gen_comp)
so the for loop is really the clearest and best. Its only downside is that it cannot be used in expressions.

Is there a 'foreach' function in Python 3?

When I meet the situation I can do it in javascript, I always think if there's an foreach function it would be convenience. By foreach I mean the function which is described below:
def foreach(fn,iterable):
for x in iterable:
fn(x)
they just do it on every element and didn't yield or return something,i think it should be a built-in function and should be more faster than writing it with pure Python, but I didn't found it on the list,or it just called another name?or I just miss some points here?
Maybe I got wrong, cause calling an function in Python cost high, definitely not a good practice for the example. Rather than an out loop, the function should do the loop in side its body looks like this below which already mentioned in many python's code suggestions:
def fn(*args):
for x in args:
dosomething
but I thought foreach is still welcome base on the two facts:
In normal cases, people just don't care about the performance
Sometime the API didn't accept iterable object and you can't rewrite its source.
Every occurence of "foreach" I've seen (PHP, C#, ...) does basically the same as pythons "for" statement.
These are more or less equivalent:
// PHP:
foreach ($array as $val) {
print($val);
}
// C#
foreach (String val in array) {
console.writeline(val);
}
// Python
for val in array:
print(val)
So, yes, there is a "foreach" in python. It's called "for".
What you're describing is an "array map" function. This could be done with list comprehensions in python:
names = ['tom', 'john', 'simon']
namesCapitalized = [capitalize(n) for n in names]
Python doesn't have a foreach statement per se. It has for loops built into the language.
for element in iterable:
operate(element)
If you really wanted to, you could define your own foreach function:
def foreach(function, iterable):
for element in iterable:
function(element)
As a side note the for element in iterable syntax comes from the ABC programming language, one of Python's influences.
Other examples:
Python Foreach Loop:
array = ['a', 'b']
for value in array:
print(value)
# a
# b
Python For Loop:
array = ['a', 'b']
for index in range(len(array)):
print("index: %s | value: %s" % (index, array[index]))
# index: 0 | value: a
# index: 1 | value: b
map can be used for the situation mentioned in the question.
E.g.
map(len, ['abcd','abc', 'a']) # 4 3 1
For functions that take multiple arguments, more arguments can be given to map:
map(pow, [2, 3], [4,2]) # 16 9
It returns a list in python 2.x and an iterator in python 3
In case your function takes multiple arguments and the arguments are already in the form of tuples (or any iterable since python 2.6) you can use itertools.starmap. (which has a very similar syntax to what you were looking for). It returns an iterator.
E.g.
for num in starmap(pow, [(2,3), (3,2)]):
print(num)
gives us 8 and 9
The correct answer is "python collections do not have a foreach". In native python we need to resort to the external for _element_ in _collection_ syntax which is not what the OP is after.
Python is in general quite weak for functionals programming. There are a few libraries to mitigate a bit. I helped author one of these infixpy
pip install infixpy https://pypi.org/project/infixpy/
from infixpy import Seq
(Seq([1,2,3]).foreach(lambda x: print(x)))
1
2
3
Also see: Left to right application of operations on a list in Python 3
Here is the example of the "foreach" construction with simultaneous access to the element indexes in Python:
for idx, val in enumerate([3, 4, 5]):
print (idx, val)
Yes, although it uses the same syntax as a for loop.
for x in ['a', 'b']: print(x)
This does the foreach in python 3
test = [0,1,2,3,4,5,6,7,8,"test"]
for fetch in test:
print(fetch)
Look at this article. The iterator object nditer from numpy package, introduced in NumPy 1.6, provides many flexible ways to visit all the elements of one or more arrays in a systematic fashion.
Example:
import random
import numpy as np
ptrs = np.int32([[0, 0], [400, 0], [0, 400], [400, 400]])
for ptr in np.nditer(ptrs, op_flags=['readwrite']):
# apply random shift on 1 for each element of the matrix
ptr += random.choice([-1, 1])
print(ptrs)
d:\>python nditer.py
[[ -1 1]
[399 -1]
[ 1 399]
[399 401]]
If I understood you right, you mean that if you have a function 'func', you want to check for each item in list if func(item) returns true; if you get true for all, then do something.
You can use 'all'.
For example: I want to get all prime numbers in range 0-10 in a list:
from math import sqrt
primes = [x for x in range(10) if x > 2 and all(x % i !=0 for i in range(2, int(sqrt(x)) + 1))]
If you really want you can do this:
[fn(x) for x in iterable]
But the point of the list comprehension is to create a list - using it for the side effect alone is poor style. The for loop is also less typing
for x in iterable: fn(x)
I know this is an old thread but I had a similar question when trying to do a codewars exercise.
I came up with a solution which nests loops, I believe this solution applies to the question, it replicates a working "for each (x) doThing" statement in most scenarios:
for elements in array:
while elements in array:
array.func()
If you're just looking for a more concise syntax you can put the for loop on one line:
array = ['a', 'b']
for value in array: print(value)
Just separate additional statements with a semicolon.
array = ['a', 'b']
for value in array: print(value); print('hello')
This may not conform to your local style guide, but it could make sense to do it like this when you're playing around in the console.
In short, the functional programming way to do this is:
def do_and_return_fn(og_fn: Callable[[T], None]):
def do_and_return(item: T) -> T:
og_fn(item)
return item
return do_and_return
# where og_fn is the fn referred to by the question.
# i.e. a function that does something on each element, but returns nothing.
iterable = map(do_and_return_fn(og_fn), iterable)
All of the answers that say "for" loops are the same as "foreach" functions are neglecting the point that other similar functions that operate on iters in python such as map, filter, and others in itertools are lazily evaluated.
Suppose, I have an iterable of dictionaries coming from my database and I want to pop an item off of each dictionary element when the iterator is iterated over. I can't use map because pop returns the item popped, not the original dictionary.
The approach I gave above would allow me to achieve this if I pass lambda x: x.pop() as my og_fn,
What would be nice is if python had a built-in lazy function with an interface like I constructed:
foreach(do_fn: Callable[[T], None], iterable: Iterable)
Implemented with the function given before, it would look like:
def foreach(do_fn: Callable[[T], None], iterable: Iterable[T]) -> Iterable[T]:
return map(do_and_return_fn(do_fn), iterable)
# being called with my db code.
# Lazily removes the INSERTED_ON_SEC_FIELD on every element:
doc_iter = foreach(lambda x: x.pop(INSERTED_ON_SEC_FIELD, None), doc_iter)
No there is no from functools import foreach support in python. However, you can just implement in the same number of lines as the import takes, anyway:
foreach = lambda f, iterable: (*map(f, iterable),)
Bonus:
variadic support: foreach = lambda f, iterable, *args: (*map(f, iterable, *args),) and you can be more efficient by avoiding constructing the tuple of Nones

When to drop list Comprehension and the Pythonic way?

I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
... c = a+b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give a pylint W1060 Expression is assigned to nothing, but since I am already using the foo list to append the values I don't need to assing the List Comprehension line to something.
My questions is more of a matter of programming correctness
Should I drop list comprehension and just use a simple for expression?
>>> for i, x in enumerate(bar_list):
... foo.append(sum(i,x))
or is there a correct way to use both list comprehension an assign to nothing?
Answer
Thank you #user2387370, #kindall and #Martijn Pieters. For the rest of the comments I use append because I'm not using a list(), I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr
Yes, this is bad style. A list comprehension is to build a list. You're building a list full of Nones and then throwing it away. Your actual desired result is a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is actually fairly readable, but I believe it's not actually any faster than any() and requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.
I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?
You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]

How can I get a Python generator to return None rather than StopIteration?

I am using generators to perform searches in lists like this simple example:
>>> a = [1,2,3,4]
>>> (i for i, v in enumerate(a) if v == 4).next()
3
(Just to frame the example a bit, I am using very much longer lists compared to the one above, and the entries are a little bit more complicated than int. I do it this way so the entire lists won't be traversed each time I search them)
Now if I would instead change that to i == 666, it would return a StopIteration because it can't find any 666 entry in a.
How can I make it return None instead? I could of course wrap it in a try ... except clause, but is there a more pythonic way to do it?
If you are using Python 2.6+ you should use the next built-in function, not the next method (which was replaced with __next__ in 3.x). The next built-in takes an optional default argument to return if the iterator is exhausted, instead of raising StopIteration:
next((i for i, v in enumerate(a) if i == 666), None)
You can chain the generator with (None,):
from itertools import chain
a = [1,2,3,4]
print chain((i for i, v in enumerate(a) if v == 6), (None,)).next()
but I think a.index(2) will not traverse the full list, when 2 is found, the search is finished. you can test this:
>>> timeit.timeit("a.index(0)", "a=range(10)")
0.19335955439601094
>>> timeit.timeit("a.index(99)", "a=range(100)")
2.1938486138533335

Categories