Why is there no first(iterable) built-in function in Python? - python

I'm wondering if there's a reason that there's no first(iterable) in the Python built-in functions, somewhat similar to any(iterable) and all(iterable) (it may be tucked in a stdlib module somewhere, but I don't see it in itertools). first would perform a short-circuit generator evaluation so that unnecessary (and a potentially infinite number of) operations can be avoided; i.e.
def identity(item):
return item
def first(iterable, predicate=identity):
for item in iterable:
if predicate(item):
return item
raise ValueError('No satisfactory value found')
This way you can express things like:
denominators = (2, 3, 4, 5)
lcd = first(i for i in itertools.count(1)
if all(i % denominators == 0 for denominator in denominators))
Clearly you can't do list(generator)[0] in that case, since the generator doesn't terminate.
Or if you have a bunch of regexes to match against (useful when they all have the same groupdict interface):
match = first(regex.match(big_text) for regex in regexes)
You save a lot of unnecessary processing by avoiding list(generator)[0] and short-circuiting on a positive match.

In Python 2, if you have an iterator, you can just call its next method. Something like:
>>> (5*x for x in xrange(2,4)).next()
10
In Python 3, you can use the next built-in with an iterator:
>>> next(5*x for x in range(2,4))
10

There's a Pypi package called “first” that does this:
>>> from first import first
>>> first([0, None, False, [], (), 42])
42
Here's how you would use to return the first odd number, for example:
>> first([2, 14, 7, 41, 53], key=lambda x: x % 2 == 1)
7
If you just want to return the first element from the iterator regardless of whether is true or not, do this:
>>> first([0, None, False, [], (), 42], key=lambda x: True)
0
It's a very small package: it only contains this function, it has no dependencies, and it works on Python 2 and 3. It's a single file, so you don't even have to install it to use it.
In fact, here's almost the entire source code (from version 2.0.1, by Hynek Schlawack, released under the MIT licence):
def first(iterable, default=None, key=None):
if key is None:
for el in iterable:
if el:
return el
else:
for el in iterable:
if key(el):
return el
return default

I asked a similar question recently (it got marked as a duplicate of this question by now). My concern also was that I'd liked to use built-ins only to solve the problem of finding the first true value of a generator. My own solution then was this:
x = next((v for v in (f(x) for x in a) if v), False)
For the example of finding the first regexp match (not the first matching pattern!) this would look like this:
patterns = [ r'\d+', r'\s+', r'\w+', r'.*' ]
text = 'abc'
firstMatch = next(
(match for match in
(re.match(pattern, text) for pattern in patterns)
if match),
False)
It does not evaluate the predicate twice (as you would have to do if just the pattern was returned) and it does not use hacks like locals in comprehensions.
But it has two generators nested where the logic would dictate to use just one. So a better solution would be nice.

There is a "slice" iterator in itertools. It emulates the slice operations that we're familiar with in python. What you're looking for is something similar to this:
myList = [0,1,2,3,4,5]
firstValue = myList[:1]
The equivalent using itertools for iterators:
from itertools import islice
def MyGenFunc():
for i in range(5):
yield i
mygen = MyGenFunc()
firstValue = islice(mygen, 0, 1)
print firstValue

There's some ambiguity in your question. Your definition of first and the regex example imply that there is a boolean test. But the denominators example explicitly has an if clause; so it's only a coincidence that each integer happens to be true.
It looks like the combination of next and itertools.ifilter will give you what you want.
match = next(itertools.ifilter(None, (regex.match(big_text) for regex in regexes)))

Haskell makes use of what you just described, as the function take (or as the partial function take 1, technically). Python Cookbook has generator-wrappers written that perform the same functionality as take, takeWhile, and drop in Haskell.
But as to why that's not a built-in, your guess is as good as mine.

Related

Pythonic way to cycle through purely side-effect-based comprehension

What is the most pythonic way to execute a full generator comprehension where you don't care about the return values and instead the operations are purely side-effect-based?
An example would be splitting a list based on a predicate value as discussed here. It's natural to think of writing a generator comprehension
split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)
In this case the best solution I can come up with is to use any
any(gen_comp)
However that's not immediately obvious what's happening for someone who hasn't seen this pattern. Is there a better way to cycle through that full comprehension without holding all the return values in memory?
You do so by not using a generator expression.
Just write a proper loop:
for v in split_me:
if v:
a.append(v)
else:
b.append(v)
or perhaps:
for v in split_me:
target = a if v else b
target.append(v)
Using a generator expression here is pointless if you are going to execute the generator immediately anyway. Why produce an object plus a sequence of None return values when all you wanted was to append values to two other lists?
Using an explicit loop is both more comprehensible for future maintainers of the code (including you) and more efficient.
itertools has this consume recipe
def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
in your case n is None, so:
collections.deque(iterator, maxlen=0)
Which is interesting, but also a lot of machinery for a simple task
Most people would just use a for loop
As others have said, don't use comprehensions just for side-effects.
Here's a nice way to do what you're actually trying to do using the partition() recipe from itertools:
try: # Python 3
from itertools import filterfalse
except ImportError: # Python 2
from itertools import ifilterfalse as filterfalse
from itertools import ifilter as filter
from itertools import tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# From itertools recipes:
# https://docs.python.org/3/library/itertools.html#itertools-recipes
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
split_me = [0, 1, 2, None, 3, '']
trueish, falseish = partition(lambda x: x, split_me)
# You can iterate directly over trueish and falseish,
# or you can put them into lists
trueish_list = list(trueish)
falseish_list = list(falseish)
print(trueish_list)
print(falseish_list)
Output:
[0, None, '']
[1, 2, 3]
There's nothing non-pythonic in writing things on many lines and make use of if-statements:
for v in split_me:
if v:
a.append(v)
else:
b.append(v)
If you want a one-liner you could do so by putting the loop on one line anyway:
for v in split_me: a.append(v) if v else b.append(v)
If you want it in an expression (which still beats me why you want unless you have a value you want to get out of it) you could use list comprehension to force looping:
[x for x in (a.append(v) if v else b.append(v) for v in split_me) if False]
Which solution do you think best shows what you're doing? I'd say the first solution. To be pythonic you should probably consider the zen of python, especially:
Readability counts.
If the implementation is hard to explain, it's a bad idea.
Just to throw in another reason why using any() to consume a generator is a horrible idea, you need to remember that any() and all() are guaranteed to do short-circuit evaluation which means that if the generator ever returns a True value then all() will early-out on you and leave your generator incompletely consumed.
This is adding an extra conditional test / stop condition that you A) probably don't want, and B) may be far away from where the generator is created.
Many standard library functions return None so you could get away with all() for a while until suddenly it's not doing what you expect, and you might stare at that code for a long time before it occurs to you if you've gotten into the habit of using all() in this way.
If you must do something like this, then itertools.consume() is really the only reasonable way to do it I think.
any is short, but is not a general solution. Something which works for any generator is the straightforward
for _ in gen_comp: pass
which is also shorter and more efficient than a generally working any method,
any(None for _ in gen_comp)
so the for loop is really the clearest and best. Its only downside is that it cannot be used in expressions.

Is there a 'foreach' function in Python 3?

When I meet the situation I can do it in javascript, I always think if there's an foreach function it would be convenience. By foreach I mean the function which is described below:
def foreach(fn,iterable):
for x in iterable:
fn(x)
they just do it on every element and didn't yield or return something,i think it should be a built-in function and should be more faster than writing it with pure Python, but I didn't found it on the list,or it just called another name?or I just miss some points here?
Maybe I got wrong, cause calling an function in Python cost high, definitely not a good practice for the example. Rather than an out loop, the function should do the loop in side its body looks like this below which already mentioned in many python's code suggestions:
def fn(*args):
for x in args:
dosomething
but I thought foreach is still welcome base on the two facts:
In normal cases, people just don't care about the performance
Sometime the API didn't accept iterable object and you can't rewrite its source.
Every occurence of "foreach" I've seen (PHP, C#, ...) does basically the same as pythons "for" statement.
These are more or less equivalent:
// PHP:
foreach ($array as $val) {
print($val);
}
// C#
foreach (String val in array) {
console.writeline(val);
}
// Python
for val in array:
print(val)
So, yes, there is a "foreach" in python. It's called "for".
What you're describing is an "array map" function. This could be done with list comprehensions in python:
names = ['tom', 'john', 'simon']
namesCapitalized = [capitalize(n) for n in names]
Python doesn't have a foreach statement per se. It has for loops built into the language.
for element in iterable:
operate(element)
If you really wanted to, you could define your own foreach function:
def foreach(function, iterable):
for element in iterable:
function(element)
As a side note the for element in iterable syntax comes from the ABC programming language, one of Python's influences.
Other examples:
Python Foreach Loop:
array = ['a', 'b']
for value in array:
print(value)
# a
# b
Python For Loop:
array = ['a', 'b']
for index in range(len(array)):
print("index: %s | value: %s" % (index, array[index]))
# index: 0 | value: a
# index: 1 | value: b
map can be used for the situation mentioned in the question.
E.g.
map(len, ['abcd','abc', 'a']) # 4 3 1
For functions that take multiple arguments, more arguments can be given to map:
map(pow, [2, 3], [4,2]) # 16 9
It returns a list in python 2.x and an iterator in python 3
In case your function takes multiple arguments and the arguments are already in the form of tuples (or any iterable since python 2.6) you can use itertools.starmap. (which has a very similar syntax to what you were looking for). It returns an iterator.
E.g.
for num in starmap(pow, [(2,3), (3,2)]):
print(num)
gives us 8 and 9
The correct answer is "python collections do not have a foreach". In native python we need to resort to the external for _element_ in _collection_ syntax which is not what the OP is after.
Python is in general quite weak for functionals programming. There are a few libraries to mitigate a bit. I helped author one of these infixpy
pip install infixpy https://pypi.org/project/infixpy/
from infixpy import Seq
(Seq([1,2,3]).foreach(lambda x: print(x)))
1
2
3
Also see: Left to right application of operations on a list in Python 3
Here is the example of the "foreach" construction with simultaneous access to the element indexes in Python:
for idx, val in enumerate([3, 4, 5]):
print (idx, val)
Yes, although it uses the same syntax as a for loop.
for x in ['a', 'b']: print(x)
This does the foreach in python 3
test = [0,1,2,3,4,5,6,7,8,"test"]
for fetch in test:
print(fetch)
Look at this article. The iterator object nditer from numpy package, introduced in NumPy 1.6, provides many flexible ways to visit all the elements of one or more arrays in a systematic fashion.
Example:
import random
import numpy as np
ptrs = np.int32([[0, 0], [400, 0], [0, 400], [400, 400]])
for ptr in np.nditer(ptrs, op_flags=['readwrite']):
# apply random shift on 1 for each element of the matrix
ptr += random.choice([-1, 1])
print(ptrs)
d:\>python nditer.py
[[ -1 1]
[399 -1]
[ 1 399]
[399 401]]
If I understood you right, you mean that if you have a function 'func', you want to check for each item in list if func(item) returns true; if you get true for all, then do something.
You can use 'all'.
For example: I want to get all prime numbers in range 0-10 in a list:
from math import sqrt
primes = [x for x in range(10) if x > 2 and all(x % i !=0 for i in range(2, int(sqrt(x)) + 1))]
If you really want you can do this:
[fn(x) for x in iterable]
But the point of the list comprehension is to create a list - using it for the side effect alone is poor style. The for loop is also less typing
for x in iterable: fn(x)
I know this is an old thread but I had a similar question when trying to do a codewars exercise.
I came up with a solution which nests loops, I believe this solution applies to the question, it replicates a working "for each (x) doThing" statement in most scenarios:
for elements in array:
while elements in array:
array.func()
If you're just looking for a more concise syntax you can put the for loop on one line:
array = ['a', 'b']
for value in array: print(value)
Just separate additional statements with a semicolon.
array = ['a', 'b']
for value in array: print(value); print('hello')
This may not conform to your local style guide, but it could make sense to do it like this when you're playing around in the console.
In short, the functional programming way to do this is:
def do_and_return_fn(og_fn: Callable[[T], None]):
def do_and_return(item: T) -> T:
og_fn(item)
return item
return do_and_return
# where og_fn is the fn referred to by the question.
# i.e. a function that does something on each element, but returns nothing.
iterable = map(do_and_return_fn(og_fn), iterable)
All of the answers that say "for" loops are the same as "foreach" functions are neglecting the point that other similar functions that operate on iters in python such as map, filter, and others in itertools are lazily evaluated.
Suppose, I have an iterable of dictionaries coming from my database and I want to pop an item off of each dictionary element when the iterator is iterated over. I can't use map because pop returns the item popped, not the original dictionary.
The approach I gave above would allow me to achieve this if I pass lambda x: x.pop() as my og_fn,
What would be nice is if python had a built-in lazy function with an interface like I constructed:
foreach(do_fn: Callable[[T], None], iterable: Iterable)
Implemented with the function given before, it would look like:
def foreach(do_fn: Callable[[T], None], iterable: Iterable[T]) -> Iterable[T]:
return map(do_and_return_fn(do_fn), iterable)
# being called with my db code.
# Lazily removes the INSERTED_ON_SEC_FIELD on every element:
doc_iter = foreach(lambda x: x.pop(INSERTED_ON_SEC_FIELD, None), doc_iter)
No there is no from functools import foreach support in python. However, you can just implement in the same number of lines as the import takes, anyway:
foreach = lambda f, iterable: (*map(f, iterable),)
Bonus:
variadic support: foreach = lambda f, iterable, *args: (*map(f, iterable, *args),) and you can be more efficient by avoiding constructing the tuple of Nones

When to drop list Comprehension and the Pythonic way?

I created a line that appends an object to a list in the following manner
>>> foo = list()
>>> def sum(a, b):
... c = a+b; return c
...
>>> bar_list = [9,8,7,6,5,4,3,2,1,0]
>>> [foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
[None, None, None, None, None, None, None, None, None, None]
>>> foo
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
>>>
The line
[foo.append(sum(i,x)) for i, x in enumerate(bar_list)]
would give a pylint W1060 Expression is assigned to nothing, but since I am already using the foo list to append the values I don't need to assing the List Comprehension line to something.
My questions is more of a matter of programming correctness
Should I drop list comprehension and just use a simple for expression?
>>> for i, x in enumerate(bar_list):
... foo.append(sum(i,x))
or is there a correct way to use both list comprehension an assign to nothing?
Answer
Thank you #user2387370, #kindall and #Martijn Pieters. For the rest of the comments I use append because I'm not using a list(), I'm not using i+x because this is just a simplified example.
I left it as the following:
histogramsCtr = hist_impl.HistogramsContainer()
for index, tupl in enumerate(local_ranges_per_histogram_list):
histogramsCtr.append(doSubHistogramData(index, tupl))
return histogramsCtr
Yes, this is bad style. A list comprehension is to build a list. You're building a list full of Nones and then throwing it away. Your actual desired result is a side effect of this effort.
Why not define foo using the list comprehension in the first place?
foo = [sum(i,x) for i, x in enumerate(bar_list)]
If it is not to be a list but some other container class, as you mentioned in a comment on another answer, write that class to accept an iterable in its constructor (or, if it's not your code, subclass it to do so), then pass it a generator expression:
foo = MyContainer(sum(i, x) for i, x in enumerate(bar_list))
If foo already has some value and you wish to append new items:
foo.extend(sum(i,x) for i, x in enumerate(bar_list))
If you really want to use append() and don't want to use a for loop for some reason then you can use this construction; the generator expression will at least avoid wasting memory and CPU cycles on a list you don't want:
any(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
But this is a good deal less clear than a regular for loop, and there's still some extra work being done: any is testing the return value of foo.append() on each iteration. You can write a function to consume the iterator and eliminate that check; the fastest way uses a zero-length collections.deque:
from collections import deque
do = deque([], maxlen=0).extend
do(foo.append(sum(i, x)) for i, x in enumerate(bar_list))
This is actually fairly readable, but I believe it's not actually any faster than any() and requires an extra import. However, either do() or any() is a little faster than a for loop, if that is a concern.
I think it's generally frowned upon to use list comprehensions just for side-effects, so I would say a for loop is better in this case.
But in any case, couldn't you just do foo = [sum(i,x) for i, x in enumerate(bar_list)]?
You should definitely drop the list comprehension. End of.
You are confusing anyone reading your code. You are building a list for the side-effects.
You are paying CPU cycles and memory for building a list you are discarding again.
In your simplified case, you are overlooking the fact you could have used a list comprehension directly:
[sum(i,x) for i, x in enumerate(bar_list)]

What is the 'pythonic' equivalent to the 'fold' function from functional programming?

What is the most idiomatic way to achieve something like the following, in Haskell:
foldl (+) 0 [1,2,3,4,5]
--> 15
Or its equivalent in Ruby:
[1,2,3,4,5].inject(0) {|m,x| m + x}
#> 15
Obviously, Python provides the reduce function, which is an implementation of fold, exactly as above, however, I was told that the 'pythonic' way of programming was to avoid lambda terms and higher-order functions, preferring list-comprehensions where possible. Therefore, is there a preferred way of folding a list, or list-like structure in Python that isn't the reduce function, or is reduce the idiomatic way of achieving this?
The Pythonic way of summing an array is using sum. For other purposes, you can sometimes use some combination of reduce (from the functools module) and the operator module, e.g.:
def product(xs):
return reduce(operator.mul, xs, 1)
Be aware that reduce is actually a foldl, in Haskell terms. There is no special syntax to perform folds, there's no builtin foldr, and actually using reduce with non-associative operators is considered bad style.
Using higher-order functions is quite pythonic; it makes good use of Python's principle that everything is an object, including functions and classes. You are right that lambdas are frowned upon by some Pythonistas, but mostly because they tend not to be very readable when they get complex.
Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), which gives the possibility to name the result of an expression, we can use a list comprehension to replicate what other languages call fold/foldleft/reduce operations:
Given a list, a reducing function and an accumulator:
items = [1, 2, 3, 4, 5]
f = lambda acc, x: acc * x
accumulator = 1
we can fold items with f in order to obtain the resulting accumulation:
[accumulator := f(accumulator, x) for x in items]
# accumulator = 120
or in a condensed formed:
acc = 1; [acc := acc * x for x in [1, 2, 3, 4, 5]]
# acc = 120
Note that this is actually also a "scanleft" operation as the result of the list comprehension represents the state of the accumulation at each step:
acc = 1
scanned = [acc := acc * x for x in [1, 2, 3, 4, 5]]
# scanned = [1, 2, 6, 24, 120]
# acc = 120
Haskell
foldl (+) 0 [1,2,3,4,5]
Python
reduce(lambda a,b: a+b, [1,2,3,4,5], 0)
Obviously, that is a trivial example to illustrate a point. In Python you would just do sum([1,2,3,4,5]) and even Haskell purists would generally prefer sum [1,2,3,4,5].
For non-trivial scenarios when there is no obvious convenience function, the idiomatic pythonic approach is to explicitly write out the for loop and use mutable variable assignment instead of using reduce or a fold.
That is not at all the functional style, but that is the "pythonic" way. Python is not designed for functional purists. See how Python favors exceptions for flow control to see how non-functional idiomatic python is.
In Python 3, the reduce has been removed: Release notes. Nevertheless you can use the functools module
import operator, functools
def product(xs):
return functools.reduce(operator.mul, xs, 1)
On the other hand, the documentation expresses preference towards for-loop instead of reduce, hence:
def product(xs):
result = 1
for i in xs:
result *= i
return result
Not really answer to the question, but one-liners for foldl and foldr:
a = [8,3,4]
## Foldl
reduce(lambda x,y: x**y, a)
#68719476736
## Foldr
reduce(lambda x,y: y**x, a[::-1])
#14134776518227074636666380005943348126619871175004951664972849610340958208L
You can reinvent the wheel as well:
def fold(f, l, a):
"""
f: the function to apply
l: the list to fold
a: the accumulator, who is also the 'zero' on the first call
"""
return a if(len(l) == 0) else fold(f, l[1:], f(a, l[0]))
print "Sum:", fold(lambda x, y : x+y, [1,2,3,4,5], 0)
print "Any:", fold(lambda x, y : x or y, [False, True, False], False)
print "All:", fold(lambda x, y : x and y, [False, True, False], True)
# Prove that result can be of a different type of the list's elements
print "Count(x==True):",
print fold(lambda x, y : x+1 if(y) else x, [False, True, True], 0)
The actual answer to this (reduce) problem is: Just use a loop!
initial_value = 0
for x in the_list:
initial_value += x #or any function.
This will be faster than a reduce and things like PyPy can optimize loops like that.
BTW, the sum case should be solved with the sum function
I believe some of the respondents of this question have missed the broader implication of the fold function as an abstract tool. Yes, sum can do the same thing for a list of integers, but this is a trivial case. fold is more generic. It is useful when you have a sequence of data structures of varying shape and want to cleanly express an aggregation. So instead of having to build up a for loop with an aggregate variable and manually recompute it each time, a fold function (or the Python version, which reduce appears to correspond to) allows the programmer to express the intent of the aggregation much more plainly by simply providing two things:
A default starting or "seed" value for the aggregation.
A function that takes the current value of the aggregation (starting with the "seed") and the next element in the list, and returns the next aggregation value.

How can I get a Python generator to return None rather than StopIteration?

I am using generators to perform searches in lists like this simple example:
>>> a = [1,2,3,4]
>>> (i for i, v in enumerate(a) if v == 4).next()
3
(Just to frame the example a bit, I am using very much longer lists compared to the one above, and the entries are a little bit more complicated than int. I do it this way so the entire lists won't be traversed each time I search them)
Now if I would instead change that to i == 666, it would return a StopIteration because it can't find any 666 entry in a.
How can I make it return None instead? I could of course wrap it in a try ... except clause, but is there a more pythonic way to do it?
If you are using Python 2.6+ you should use the next built-in function, not the next method (which was replaced with __next__ in 3.x). The next built-in takes an optional default argument to return if the iterator is exhausted, instead of raising StopIteration:
next((i for i, v in enumerate(a) if i == 666), None)
You can chain the generator with (None,):
from itertools import chain
a = [1,2,3,4]
print chain((i for i, v in enumerate(a) if v == 6), (None,)).next()
but I think a.index(2) will not traverse the full list, when 2 is found, the search is finished. you can test this:
>>> timeit.timeit("a.index(0)", "a=range(10)")
0.19335955439601094
>>> timeit.timeit("a.index(99)", "a=range(100)")
2.1938486138533335

Categories