There is a more general question here: In what situation should the built-in operator module be used in python?
The top answer claims that operator.itemgetter(x) is "neater" than, presumably, lambda a: a[x]. I feel the opposite is true.
Are there any other benefits, like performance?
You shouldn't worry about performance unless your code is in a tight inner loop, and is actually a performance problem. Instead, use code that best expresses your intent. Some people like lambdas, some like itemgetter. Sometimes it's just a matter of taste.
itemgetter is more powerful if, for example, you need to get a number of elements at once:
operator.itemgetter(1,3,5)
is the same as:
lambda s: (s[1], s[3], s[5])
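A quick interactive check of that equivalence:
>>> from operator import itemgetter
>>> s = ['a', 'b', 'c', 'd', 'e', 'f']
>>> itemgetter(1, 3, 5)(s)
('b', 'd', 'f')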
There are benefits in some situations; here is a good example.
>>> data = [('a',3),('b',2),('c',1)]
>>> from operator import itemgetter
>>> sorted(data, key=itemgetter(1))
[('c', 1), ('b', 2), ('a', 3)]
This use of itemgetter is great because it makes everything clear while also being faster as all operations are kept on the C side.
>>> sorted(data, key=lambda x:x[1])
[('c', 1), ('b', 2), ('a', 3)]
Using a lambda is not as clear; it is also slower, and it is generally preferred not to use lambda unless you have to. E.g., list comprehensions are preferred over map with a lambda.
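For instance, [x + 3 for x in xs] is usually considered more readable than list(map(lambda x: x + 3, xs)).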
Performance. It can make a big difference. In the right circumstances, you can get a bunch of stuff done at the C level by using itemgetter.
I think the claim about which is clearer really depends on which one you use most often, and is quite subjective.
When using this in the key parameter of sorted() or min(), given the choice between, say, operator.itemgetter(1) and lambda x: x[1], the former is typically significantly faster in both cases:
Using sorted()
The compared functions are defined as follows:
import operator

def sort_key_itemgetter(items, key=1):
    return sorted(items, key=operator.itemgetter(key))

def sort_key_lambda(items, key=1):
    return sorted(items, key=lambda x: x[key])
Result: sort_key_itemgetter() is faster by ~10% to ~15%.
(Full analysis here)
Using min()
The compared functions are defined as follows:
import operator

def min_key_itemgetter(items, key=1):
    return min(items, key=operator.itemgetter(key))

def min_key_lambda(items, key=1):
    return min(items, key=lambda x: x[key])
Result: min_key_itemgetter() is faster by ~20% to ~60%.
(Full analysis here)
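If you want to reproduce this kind of measurement yourself, here is a minimal timeit sketch; the test data is made up for illustration, and the numbers will vary by machine and Python version:
import timeit

# setup code is executed once before the timed statement
setup = "import operator; items = [(i, i % 10) for i in range(1000)]"
print(timeit.timeit("sorted(items, key=operator.itemgetter(1))", setup=setup, number=10000))
print(timeit.timeit("sorted(items, key=lambda x: x[1])", setup=setup, number=10000))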
As performance was mentioned, I've compared both approaches, operator.itemgetter and lambda, and for a small list it turns out that operator.itemgetter outperforms lambda by about 10%. I personally like the itemgetter approach, as I mostly use it while sorting and it has become like a keyword for me.
import operator
import timeit

x = [[12, 'tall', 'blue', 1],
     [2, 'short', 'red', 9],
     [4, 'tall', 'blue', 13]]

def sortOperator():
    x.sort(key=operator.itemgetter(1, 2))

def sortLambda():
    x.sort(key=lambda x: (x[1], x[2]))

if __name__ == "__main__":
    print(timeit.timeit(stmt="sortOperator()", setup="from __main__ import sortOperator", number=10**7))
    print(timeit.timeit(stmt="sortLambda()", setup="from __main__ import sortLambda", number=10**7))
Output (sortOperator first, then sortLambda):
Tuple: 9.79s, Single: 8.835s
Tuple: 11.12s, Single: 9.26s
Run on Python 3.6
Leaving aside performance and code style, itemgetter is picklable, while lambda is not. This is important if the function needs to be saved, or passed between processes (typically as part of a larger object). In the following example, replacing itemgetter with lambda will result in a PicklingError.
from operator import itemgetter

def sort_by_key(sequence, key):
    return sorted(sequence, key=key)

if __name__ == "__main__":
    from multiprocessing import Pool

    items = [([(1, 2), (4, 1)], itemgetter(1)),
             ([(5, 3), (2, 7)], itemgetter(0))]
    with Pool(5) as p:
        result = p.starmap(sort_by_key, items)
    print(result)
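You can also see the difference directly with pickle, without involving multiprocessing (the exact error text varies by Python version):
>>> import pickle
>>> from operator import itemgetter
>>> pickle.loads(pickle.dumps(itemgetter(1)))([10, 20, 30])  # round-trips fine
20
>>> pickle.dumps(lambda x: x[1])
Traceback (most recent call last):
  ...
_pickle.PicklingError: Can't pickle <function <lambda> at 0x...>: ...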
Some programmers understand and use lambdas, but there is a population of programmers who perhaps didn't take computer science and aren't clear on the concept. For those programmers itemgetter() can make your intention clearer. (I don't write lambdas and any time I see one in code it takes me a little extra time to process what's going on and understand the code).
If you're coding for other computer science professionals, go ahead and use lambdas if they are more comfortable with them. However, if you're coding for a wider audience, I suggest using itemgetter().
Related
I have a list of Python objects that I want to sort by a specific attribute of each object:
>>> ut
[Tag(name="toe", count=10), Tag(name="leg", count=2), ...]
How do I sort the list by .count in descending order?
# To sort the list in place...
ut.sort(key=lambda x: x.count, reverse=True)
# To return a new list, use the sorted() built-in function...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)
More on sorting by keys.
A way that can be fastest, especially if your list has a lot of records, is to use operator.attrgetter("count"). However, this might need to run on a pre-operator version of Python, so it would be nice to have a fallback mechanism. You might want to do the following, then:
try:
    import operator
except ImportError:
    keyfun = lambda x: x.count  # use a lambda if no operator module
else:
    keyfun = operator.attrgetter("count")  # use operator since it's faster than lambda

ut.sort(key=keyfun, reverse=True)  # sort in place
Readers should notice that the key= method:
ut.sort(key=lambda x: x.count, reverse=True)
is many times faster than adding rich comparison operators to the objects. I was surprised to read this (page 485 of "Python in a Nutshell"). You can confirm this by running tests on this little program:
#!/usr/bin/env python
# Note: this is Python 2 code; __cmp__, cmp, and xrange are gone in Python 3.
import random

class C:
    def __init__(self, count):
        self.count = count
    def __cmp__(self, other):
        return cmp(self.count, other.count)

longList = [C(random.random()) for i in xrange(1000000)]  # about 6.1 secs
longList2 = longList[:]
longList.sort()                        # about 52 - 6.1 = 46 secs
longList2.sort(key=lambda c: c.count)  # about 9 - 6.1 = 3 secs
My very minimal tests show the first sort is more than 10 times slower, but the book says it is only about 5 times slower in general. The reason, they say, is the highly optimized sort algorithm used in Python (Timsort).
Still, it's very odd that .sort(key=lambda c: c.count) is faster than plain old .sort(). I hope they fix that.
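For reference, a rough Python 3 sketch of the same comparison, using __lt__ in place of the removed __cmp__ (timings will of course differ by machine):
import random

class C:
    def __init__(self, count):
        self.count = count
    def __lt__(self, other):  # Python 3 sorts via __lt__
        return self.count < other.count

longList = [C(random.random()) for i in range(1000000)]
longList2 = longList[:]
longList.sort()                        # one __lt__ call per comparison
longList2.sort(key=lambda c: c.count)  # one attribute fetch per element, rest done in C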
Object-oriented approach
It's good practice to make object-sorting logic, if applicable, a property of the class rather than something reimplemented at every place ordering is required.
This ensures consistency and removes the need for boilerplate code.
At a minimum, you should define __eq__ and __lt__ for this to work. Then just use sorted(list_of_objects).
class Card(object):
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank
hand = [Card(10, 'H'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
hand_order = [c.rank for c in hand] # [10, 2, 12, 13, 14]
hand_sorted = sorted(hand)
hand_sorted_order = [c.rank for c in hand_sorted] # [2, 10, 12, 13, 14]
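As a side note, functools.total_ordering can derive the remaining comparison methods from __eq__ and __lt__, so the class supports <=, >, and >= as well; a minimal sketch:
from functools import total_ordering

@total_ordering
class Card(object):
    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank

# min(hand), max(hand), and sorted(hand) now all behave consistently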
from operator import attrgetter
ut.sort(key=attrgetter('count'), reverse=True)
It looks much like a list of Django ORM model instances.
Why not sort them in the query, like this:
ut = Tag.objects.order_by('-count')
Add rich comparison operators to the object's class, then use the list's sort() method.
See rich comparison in Python.
Update: Although this method works, I think the solution from Triptych is better suited to your case, because it's far simpler.
If the attribute you want to sort by is a property, then you can avoid importing operator.attrgetter and use the property's fget method instead.
For example, for a class Circle with a property radius we could sort a list of circles by radii as follows:
result = sorted(circles, key=Circle.radius.fget)
This is not the most well-known feature but often saves me a line with the import.
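A self-contained sketch of the idea (the Circle class here is just for illustration):
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

circles = [Circle(3), Circle(1), Circle(2)]
# Circle.radius is the property object; .fget is its underlying getter function
result = sorted(circles, key=Circle.radius.fget)
print([c.radius for c in result])  # [1, 2, 3]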
Also, if someone wants to sort a list that contains strings with embedded numbers, e.g.
eglist=[
"some0thing3",
"some0thing2",
"some1thing2",
"some1thing0",
"some3thing10",
"some3thing2",
"some1thing1",
"some0thing1"]
Then here is the code for that:
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split "some3thing10" into ['some', 3, 'thing', 10, ''] so that
    # runs of digits compare as integers rather than character by character
    return [atoi(c) for c in re.split(r'(\d+)', text)]
eglist.sort(key=natural_keys)
print(eglist)
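This prints the list in natural order, with the digit runs compared as integers:
['some0thing1', 'some0thing2', 'some0thing3', 'some1thing0', 'some1thing1', 'some1thing2', 'some3thing2', 'some3thing10']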
What is the most pythonic way to execute a full generator comprehension where you don't care about the return values and instead the operations are purely side-effect-based?
An example would be splitting a list based on a predicate value as discussed here. It's natural to think of writing a generator comprehension
split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)
In this case the best solution I can come up with is to use any
any(gen_comp)
However, it's not immediately obvious what's happening to someone who hasn't seen this pattern. Is there a better way to cycle through that full comprehension without holding all the return values in memory?
You do so by not using a generator expression.
Just write a proper loop:
for v in split_me:
    if v:
        a.append(v)
    else:
        b.append(v)
or perhaps:
for v in split_me:
    target = a if v else b
    target.append(v)
Using a generator expression here is pointless if you are going to execute the generator immediately anyway. Why produce an object plus a sequence of None return values when all you wanted was to append values to two other lists?
Using an explicit loop is both more comprehensible for future maintainers of the code (including you) and more efficient.
The itertools documentation has this consume recipe:
import collections
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
in your case n is None, so:
collections.deque(iterator, maxlen=0)
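Applied to the generator expression from the question, that looks like:
import collections

split_me = [0, 1, 2, None, 3, '']
a, b = [], []
# maxlen=0 means nothing is stored; the deque just drains the iterator
collections.deque((a.append(v) if v else b.append(v) for v in split_me), maxlen=0)
# a == [1, 2, 3], b == [0, None, '']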
Which is interesting, but also a lot of machinery for a simple task.
Most people would just use a for loop.
As others have said, don't use comprehensions just for side-effects.
Here's a nice way to do what you're actually trying to do using the partition() recipe from itertools:
try:  # Python 3
    from itertools import filterfalse
except ImportError:  # Python 2
    from itertools import ifilterfalse as filterfalse
    from itertools import ifilter as filter
from itertools import tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # From the itertools recipes:
    # https://docs.python.org/3/library/itertools.html#itertools-recipes
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
split_me = [0, 1, 2, None, 3, '']
# partition() yields the false entries first, then the true entries
falseish, trueish = partition(lambda x: x, split_me)

# You can iterate directly over trueish and falseish,
# or you can put them into lists
trueish_list = list(trueish)
falseish_list = list(falseish)

print(trueish_list)
print(falseish_list)
Output:
[1, 2, 3]
[0, None, '']
There's nothing non-pythonic in writing things on many lines and making use of if-statements:
for v in split_me:
    if v:
        a.append(v)
    else:
        b.append(v)
If you want a one-liner you could do so by putting the loop on one line anyway:
for v in split_me: a.append(v) if v else b.append(v)
If you want it in an expression (though it beats me why you would, unless you need a value out of it), you could use a list comprehension to force the looping:
[x for x in (a.append(v) if v else b.append(v) for v in split_me) if False]
Which solution do you think best shows what you're doing? I'd say the first one. To be pythonic, you should probably consider the Zen of Python, especially:
Readability counts.
If the implementation is hard to explain, it's a bad idea.
Just to throw in another reason why using any() to consume a generator is a horrible idea: remember that any() and all() are guaranteed to short-circuit, which means that if the generator ever yields a truthy value, any() will early-out on you and leave your generator incompletely consumed.
This adds an extra conditional test / stop condition that you A) probably don't want, and B) may be far away from where the generator is created.
Many standard library functions return None, so you could get away with any() for a while, until suddenly it's not doing what you expect, and you might stare at that code for a long time before it occurs to you, if you've gotten into the habit of using any() this way.
If you must do something like this, then the consume() recipe shown above is really the only reasonable way to do it, I think.
any is short, but it is not a general solution. Something which works for any generator is the straightforward
for _ in gen_comp: pass
which is also shorter and more efficient than a generally working any-based method,
any(None for _ in gen_comp)
so the for loop is really the clearest and best. Its only downside is that it cannot be used in an expression.
This is a subtle question about notation.
I want to call a function with specific arguments, but without having to redefine it.
For example, min() with a key function extracting the second element, key=itemgetter(1), would look like:
min_arg2 = lambda p,q = min(p,q, key = itemgetter(1))
I'm hoping to just call it as something like min( *itemgetter(1) )...
Does anyone know how to do this? Thank you.
You want to use functools.partial():
min_arg2 = functools.partial(min, key=itemgetter(1))
See http://docs.python.org/library/functools.html for the docs.
Example:
>>> import functools
>>> from operator import itemgetter
>>> vals = [('a', 3), ('b', 0), ('c', 1)]
>>> min_arg2 = functools.partial(min, key=itemgetter(1))
>>> min_arg2(vals)
('b', 0)
Using functools (as in Duncan's answer) is a better approach, however you can use a lambda expression, you just didn't get the syntax correct:
min_arg2 = lambda p,q: min(p,q, key=itemgetter(1))
I was wondering whether for most examples it is more 'pythonic' to use lambda or the partial function?
For example, I might want to apply imap on some list, like add 3 to every element using:
imap(lambda x : x + 3, my_list)
Or to use partial:
imap(partial(operator.add, 3), my_list)
I realize in this example a loop could probably accomplish it easier, but I'm thinking about more non-trivial examples.
In Haskell, I would easily choose partial application in the above example, but I'm not sure for Python. To me, the lambda seems the better choice, but I don't know what the prevailing choice is for most Python programmers.
To be truly equivalent to imap, use a generator expression:
(x + 3 for x in mylist)
Like imap, this doesn't immediately construct an entire new list, but instead computes elements of the resulting sequence on-demand (and is thus much more efficient than a list comprehension if you're chaining the result into another iteration).
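For example, chaining generator expressions never materializes an intermediate list:
my_list = [1, 2, 3, 4, 5]
shifted = (x + 3 for x in my_list)   # nothing computed yet
total = sum(x * 2 for x in shifted)  # elements flow through one at a time
print(total)  # 60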
If you're curious about where partial would be a better option than lambda in the real world, it tends to be when you're dealing with variable numbers of arguments:
>>> from functools import partial
>>> def a(*args):
... return sum(args)
...
>>> b = partial(a, 2, 3)
>>> b(6, 7, 8)
26
The equivalent version using lambda would be...
>>> b = lambda *args: a(2, 3, *args)
>>> b(6, 7, 8)
26
which is slightly less concise - but lambda does give you the option of out-of-order application, which partial does not:
>>> def a(x, y, z):
... return x + y - z
...
>>> b = lambda m, n: a(m, 1, n)
>>> b(2, 5)
-2
In the given example, lambda seems most appropriate. It's also easier on the eyes.
I have never seen the use of partial functions in the wild.
lambda is certainly many times more common. Unless you're doing functional programming in an academic setting, you should probably steer away from functools.
This is pythonic. No library needed, or even builtins, just a simple generator expression.
( x + 3 for x in my_list )
This creates a generator, similar to imap.
If you're going to make a list out of it anyway, use a list comprehension instead:
[ x + 3 for x in my_list ]