Python shortcuts - python

Python is filled with little neat shortcuts.
For example:
self.data = map(lambda x: list(x), data)
and (although not so pretty)
tuple(t[0] for t in self.result if t[0] != 'mysql' and t[0] != 'information_schema')
among countless others.
In the IRC channel, they said there are "too many to know them all".
I think we should list some here, as I love using these shortcuts to shorten and refactor my code. I'm sure this would benefit many.

self.data = map(lambda x: list(x), data)
is dreck -- use
self.data = map(list, data)
if you're a map fanatic (list comprehensions are generally preferred these days). More generally, lambda x: somecallable(x) can always be productively changed to just somecallable, in every context, with nothing but good effect.
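(To make that concrete, here is a small illustration with a made-up words list; key=str.lower does exactly what key=lambda s: s.lower() does:)
words = ["Pear", "apple", "Banana"]
sorted(words, key=lambda s: s.lower())   # ['apple', 'Banana', 'Pear'], but the lambda adds nothing
sorted(words, key=str.lower)             # same result, one indirection less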
As for shortcuts in general, my wife and I did our best to list the most important and useful ones in the early part of the Python Cookbook's second edition -- could be a start.

Alex Martelli provided an even shorter version of your first example. I shall provide a (slightly) shorter version of your second:
tuple(t[0] for t in self.result if t[0] not in ('mysql', 'information_schema'))
Obviously the in operator becomes more advantageous the more values you're testing for.
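(A hedged aside: if the set of excluded names keeps growing, a set literal is a common next step, since membership tests on sets are constant-time. The results list and the extra schema name below are only illustrative:)
results = [('mysql',), ('blog',), ('information_schema',), ('shop',)]
system_schemas = {'mysql', 'information_schema', 'performance_schema'}
tuple(t[0] for t in results if t[0] not in system_schemas)   # ('blog', 'shop')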
I would also like to stress that shortening and refactoring is good only to the extent that it improves clarity and readability. (Unless you are code-golfing. ;)

I'm not sure if this is a shortcut, but I love it:
>>> class Enum(object):
...     def __init__(self, *keys):
...         self.keys = keys
...         self.__dict__.update(zip(keys, range(len(keys))))
...     def value(self, key):
...         return self.keys.index(key)
>>> colors = Enum("Red", "Blue", "Green", "Yellow", "Purple")
>>> colors.keys
('Red', 'Blue', 'Green', 'Yellow', 'Purple')
>>> colors.Green
2
(I don't know who came up with this, but it wasn't me.)
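(For what it's worth, since Python 3.4 the standard library ships an enum module that covers the same ground; a minimal sketch:)
from enum import Enum   # standard library since Python 3.4

class Colors(Enum):
    Red = 0
    Blue = 1
    Green = 2

Colors.Green.value   # 2
Colors['Green']      # <Colors.Green: 2>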

I always liked the "unzip" idiom:
>>> zipped = [('a', 1), ('b', 2), ('c', 3)]
>>> zip(*zipped)
[('a', 'b', 'c'), (1, 2, 3)]
>>>
>>> l,n = zip(*zipped)
>>> l
('a', 'b', 'c')
>>> n
(1, 2, 3)
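(The transcript above is Python 2; on Python 3, zip() returns a lazy iterator, so wrap it in list() to see the pairs. Unpacking works unchanged:)
>>> zipped = [('a', 1), ('b', 2), ('c', 3)]
>>> list(zip(*zipped))
[('a', 'b', 'c'), (1, 2, 3)]
>>> letters, numbers = zip(*zipped)
>>> letters
('a', 'b', 'c')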

Related

Importing module in a function when returning

I've been searching for how to do this, but I could not find whether there is a solution. I thought of __import__, but I still couldn't manage to figure it out.
For example:
>>> def combs(s = []):
...     from itertools import combinations
...     return [list(combinations(s, 2))]
...
>>> lst = ["A","B",'C']
>>> print(combs(lst))
[[('A', 'B'), ('A', 'C'), ('B', 'C')]]
>>>
I'm curious whether something like this could be done:
def combs(s = []):
    return [list(combinations(s, 2))]__import__(itertools, list)
Here is how to achieve a dynamic import in your example:
def combs(s = []):
    return list(__import__('itertools').combinations(s, 2))
NB: the Python docs for __import__ state that:
This is an advanced function that is not needed in everyday Python programming
Many Pythonistas would prefer an explicit import (as in your original example), and would probably consider excessive use of __import__ to be a bit of a code smell.
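(When the module name really is only known at runtime, the docs point to importlib.import_module rather than __import__; a sketch of the same example using it:)
def combs(s=[]):
    from importlib import import_module      # standard library
    itertools = import_module('itertools')   # dynamic, but still reasonably explicit
    return list(itertools.combinations(s, 2))

combs(["A", "B", "C"])   # [('A', 'B'), ('A', 'C'), ('B', 'C')]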

Python deep zip

I am trying to write a function like zip. I am not good at explaining what I mean, so I will just show 'code' of what I'm trying to do.
a = [1,2,3,[4,5]]
b = a[:]
zip(a, b) == [(1,1), (2,2), (3,3), ([4,5],[4,5])]
myzip(a, b) == [(1,1), (2,2), (3,3), [(4,4), (5,5)]]
I am so stuck on this it's not even funny. I am trying to write it in a simple functional way with recursive lambdas, to make my code prettier. I want myzip like this because I want to use its output with another function I wrote, which maps a function to a tree:
def tree_map(func, tree):
    return map(lambda x: func(x) if not isinstance(x, list) else tree_map(func, x),
               tree)
I have been trying to do something similar to this with zip, but I can't seem to wrap my head around it. Does anyone have any ideas on how I could write myzip?
Edit: Look at tree_map! Isn't that pretty? I think so at least, but my mother tongue is Scheme :P
And also, I want myzip to go as deep as it needs to. Basically, I want myzip to retain the structure of the trees I pass it. Also, myzip will only handle trees that are the same shape.
I think the following should work:
import collections

def myzip(*args):
    if all(isinstance(arg, collections.Iterable) for arg in args):
        return [myzip(*vals) for vals in zip(*args)]
    return args
Result:
>>> a = [1,2,3,[4,[5,6]]]
>>> b = [1,2,3,[4,[5,6]]]
>>> myzip(a, b)
[(1, 1), (2, 2), (3, 3), [(4, 4), [(5, 5), (6, 6)]]]
Note that I use collections.Iterable instead of list in the type checking so that the behavior is more like zip() with tuples and other iterables.
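(On Python 3 the abstract base classes live in collections.abc, and the bare collections.Iterable alias was removed in 3.10, so the same function would read:)
from collections.abc import Iterable

def myzip(*args):
    # Recurse while every argument is still iterable, otherwise pair the leaves up.
    if all(isinstance(arg, Iterable) for arg in args):
        return [myzip(*vals) for vals in zip(*args)]
    return args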

Why should I use operator.itemgetter(x) instead of [x]?

There is a more general question here: In what situation should the built-in operator module be used in python?
The top answer claims that operator.itemgetter(x) is "neater" than, presumably, lambda a: a[x]. I feel the opposite is true.
Are there any other benefits, like performance?
You shouldn't worry about performance unless your code is in a tight inner loop, and is actually a performance problem. Instead, use code that best expresses your intent. Some people like lambdas, some like itemgetter. Sometimes it's just a matter of taste.
itemgetter is more powerful; for example, if you need to get a number of elements at once:
operator.itemgetter(1,3,5)
is the same as:
lambda s: (s[1], s[3], s[5])
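(A quick demonstration with a made-up record tuple:)
>>> from operator import itemgetter
>>> record = ('id-42', 'Alice', 'alice@example.org', 30, 'Oslo', 97)
>>> pick = itemgetter(1, 3, 5)
>>> pick(record)
('Alice', 30, 97)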
There are benefits in some situations, here is a good example.
>>> data = [('a',3),('b',2),('c',1)]
>>> from operator import itemgetter
>>> sorted(data, key=itemgetter(1))
[('c', 1), ('b', 2), ('a', 3)]
This use of itemgetter is great because it makes everything clear while also being faster as all operations are kept on the C side.
>>> sorted(data, key=lambda x:x[1])
[('c', 1), ('b', 2), ('a', 3)]
Using a lambda is not as clear; it is also slower, and it is generally preferred not to use lambda unless you have to. E.g., list comprehensions are preferred over using map with a lambda.
Performance. It can make a big difference. In the right circumstances, you can get a bunch of stuff done at the C level by using itemgetter.
I think the claim about which is clearer really depends on which you use most often, and is very subjective.
When using this in the key parameter of sorted() or min(), given the choice between say operator.itemgetter(1) and lambda x: x[1], the former is typically significantly faster in both cases:
Using sorted()
The compared functions are defined as follows:
import operator
def sort_key_itemgetter(items, key=1):
    return sorted(items, key=operator.itemgetter(key))

def sort_key_lambda(items, key=1):
    return sorted(items, key=lambda x: x[key])
Result: sort_key_itemgetter() is faster by ~10% to ~15%.
(Full analysis here)
Using min()
The compared functions are defined as follows:
import operator
def min_key_itemgetter(items, key=1):
    return min(items, key=operator.itemgetter(key))

def min_key_lambda(items, key=1):
    return min(items, key=lambda x: x[key])
Result: min_key_itemgetter() is faster by ~20% to ~60%.
(Full analysis here)
As performance was mentioned, I've compared both methods, operator.itemgetter and lambda, and for a small list it turns out that operator.itemgetter outperforms lambda by about 10%. I personally like the itemgetter method, as I mostly use it during sorts and it has become like a keyword for me.
import operator
import timeit
x = [[12, 'tall', 'blue', 1],
     [2, 'short', 'red', 9],
     [4, 'tall', 'blue', 13]]

def sortOperator():
    x.sort(key=operator.itemgetter(1, 2))

def sortLambda():
    x.sort(key=lambda x: (x[1], x[2]))

if __name__ == "__main__":
    print(timeit.timeit(stmt="sortOperator()", setup="from __main__ import sortOperator", number=10**7))
    print(timeit.timeit(stmt="sortLambda()", setup="from __main__ import sortLambda", number=10**7))
>>Tuple: 9.79s, Single: 8.835s
>>Tuple: 11.12s, Single: 9.26s
Run on Python 3.6
Leaving aside performance and code style, itemgetter is picklable, while lambda is not. This is important if the function needs to be saved, or passed between processes (typically as part of a larger object). In the following example, replacing itemgetter with lambda will result in a PicklingError.
from operator import itemgetter

def sort_by_key(sequence, key):
    return sorted(sequence, key=key)

if __name__ == "__main__":
    from multiprocessing import Pool
    items = [([(1, 2), (4, 1)], itemgetter(1)),
             ([(5, 3), (2, 7)], itemgetter(0))]
    with Pool(5) as p:
        result = p.starmap(sort_by_key, items)
        print(result)
Some programmers understand and use lambdas, but there is a population of programmers who perhaps didn't take computer science and aren't clear on the concept. For those programmers itemgetter() can make your intention clearer. (I don't write lambdas and any time I see one in code it takes me a little extra time to process what's going on and understand the code).
If you're coding for other computer science professionals, go ahead and use lambdas if they are more comfortable with them. However, if you're coding for a wider audience, I suggest using itemgetter().

Python sorting multiple attributes

I have a dictionary like the following, with key-value pairs of username: name.
d = {"user2":"Tom Cruise", "user1": "Tom Cruise"}
My problem is that I need to sort these by the name, but if multiple users have the same name, as above, I then need to sort those by their username. I looked up the sorted function but I don't really understand the cmp parameter and the lambda. If someone could explain those and help me with this, that would be great! Thanks :)
cmp is obsolescent. lambda just makes a function.
sorted(d.iteritems(), key=operator.itemgetter(1, 0))
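(On Python 3, where dicts no longer have iteritems(), the same one-liner with the question's data would be:)
>>> import operator
>>> d = {"user2": "Tom Cruise", "user1": "Tom Cruise"}
>>> sorted(d.items(), key=operator.itemgetter(1, 0))
[('user1', 'Tom Cruise'), ('user2', 'Tom Cruise')]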
I'm just going to elaborate on Ignacio Vazquez-Abrams's answer. cmp is deprecated. Don't use it. Use the key parameter instead.
lambda makes a function. It's an expression and so can go places that a normal def statement can't, but its body is limited to a single expression.
my_func = lambda x: x + 1
This defines a function that takes a single argument, x and returns x + 1. lambda x, y=1: x + y defines a function that takes an x argument, an optional y argument with a default value of 1 and returns x + y. As you can see, it's really just like a def statement except that it's an expression and limited to a single expression for the body.
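(Those two lambdas in a quick interactive session:)
>>> my_func = lambda x: x + 1
>>> my_func(3)
4
>>> add = lambda x, y=1: x + y
>>> add(3)        # y falls back to its default of 1
4
>>> add(3, 4)
7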
The purpose of the key parameter is that sorted will call it for each element of the sequence to be sorted and use the value that it returns for comparison.
list_ = ['a', 'b', 'c']
sorted(list_, key=lambda x: 1)
Just read the rest for a hypothetical example. I didn't look at problem closely enough before writing this. It will still be educational though so I'll leave it up.
We can't really say much more because
You can't sort dicts. Do you have a list of dicts? We could sort that.
You haven't shown a username key.
I'll assume that it's something like
users = [{'name': 'Tom Cruise', 'username': 'user234234234', 'reputation': 1},
         {'name': 'Aaron Sterling', 'username': 'aaronasterling', 'reputation': 11725}]
If you wanted to confirm that I'm more awesome than Tom Cruise, you could do:
sorted(users, key=lambda x: x['reputation'])
This just passes a function that returns the 'reputation' value for each dictionary in the list. But lambdas can be slower. Most of the time operator.itemgetter is what you want.
operator.itemgetter takes a series of keys and returns a function that takes an object and returns a tuple of the corresponding values from its argument.
so f = operator.itemgetter('name', 'username') will return essentially the same function as
lambda d: (d['name'], d['username'])
The difference is that it should, in principle, run much faster, and you don't have to look at ugly lambda expressions.
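(To see that equivalence with one of the dicts above:)
>>> from operator import itemgetter
>>> f = itemgetter('name', 'username')
>>> f({'name': 'Tom Cruise', 'username': 'user234234234', 'reputation': 1})
('Tom Cruise', 'user234234234')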
So to sort a list of dicts by name and then username, just do
sorted(list_of_dicts, key=operator.itemgetter('name', 'username'))
which is exactly what Ignacio Vazquez-Abrams suggested.
You should know that a dict can't be sorted. But Python 2.7 & 3.1 have the class collections.OrderedDict.
So,
>>> from collections import OrderedDict
>>> d=OrderedDict({'D':'X','B':'Z','C':'X','A':'Y'})
>>> d
OrderedDict([('A', 'Y'), ('C', 'X'), ('B', 'Z'), ('D', 'X')])
>>> OrderedDict(sorted((d.items()), key=lambda t:(t[1],t[0])))
OrderedDict([('C', 'X'), ('D', 'X'), ('A', 'Y'), ('B', 'Z')])
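(As of Python 3.7, plain dicts preserve insertion order as a language guarantee, so the same trick works without OrderedDict:)
>>> d = {'D': 'X', 'B': 'Z', 'C': 'X', 'A': 'Y'}
>>> dict(sorted(d.items(), key=lambda t: (t[1], t[0])))
{'C': 'X', 'D': 'X', 'A': 'Y', 'B': 'Z'}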

Python: Analyzing complex statements during execution

I am wondering if there is any way to get some meta information about the interpretation of a python statement during execution.
Let's assume this is a complex statement of some single statements joined with or (A, B, ... are boolean functions)
if A or B and ((C or D and E) or F) or G and H:
and I want to know which part of the statement is causing the statement to evaluate to True so I can do something with this knowledge. In the example, there would be 3 possible candidates:
A
B and ((C or D and E) or F)
G and H
And in the second case, I would like to know if it was (C or D and E) or F that evaluated to True and so on...
Is there any way without parsing the statement? Can I hook up to the interpreter in some way or utilize the inspect module in a way that I haven't found yet? I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.
Edit - further information: The type of application that I want to use this in is a categorizing algorithm that inputs an object and outputs a certain category for this object, based on its attributes. I need to know which attributes were decisive for the category.
As you might guess, the complex statement from above comes from the categorization algorithm. The code for this algorithm is generated from a formal pseudo-code and contains about 3,000 nested if-elif-statements that determine the category in a hierarchical way like
if obj.attr1 < 23 and (is_something(obj.attr10) or eats_spam_for_breakfast(obj)):
return 'Category1'
elif obj.attr3 == 'Welcome Home' or count_something(obj) >= 2:
return 'Category2a'
elif ...
So aside from the category itself, I need to flag the attributes that were decisive for that category, so if I'd delete all other attributes, the object would still be assigned to the same category (due to the ors within the statements). The statements can be really long, up to 1,000 chars, and deeply nested. Every object can have up to 200 attributes.
Thanks a lot for your help!
Edit 2: Haven't found time in the last two weeks. Thanks for providing this solution, it works!
Could you recode your original code:
if A or B and ((C or D and E) or F) or G and H:
as, say:
e = Evaluator()
if e('A or B and ((C or D and E) or F) or G and H'):
...? If so, there's hope!-). The Evaluator class, upon __call__, would compile its string argument, then eval the result with (an empty real dict for globals, and) a pseudo-dict for locals that actually delegates the value lookups to the locals and globals of its caller (just takes a little black magic, but, not too bad;-) and also takes note of what names it's looked up. Given Python's and and or's short-circuiting behavior, you can infer from the actual set of names that were actually looked up, which one determined the truth value of the expression (or each subexpression) -- in an X or Y or Z, the first true value (if any) will be the last one looked up, and in a X and Y and Z, the first false one will.
Would this help? If yes, and if you need help with the coding, I'll be happy to expand on this, but first I'd like some confirmation that getting the code for Evaluator would indeed be solving whatever problem it is that you're trying to address!-)
Edit: so here's code implementing Evaluator and exemplifying its use:
import inspect
import random

class TracingDict(object):
    def __init__(self, loc, glob):
        self.loc = loc
        self.glob = glob
        self.vars = []
    def __getitem__(self, name):
        try: v = self.loc[name]
        except KeyError: v = self.glob[name]
        self.vars.append((name, v))
        return v

class Evaluator(object):
    def __init__(self):
        f = inspect.currentframe()
        f = inspect.getouterframes(f)[1][0]
        self.d = TracingDict(f.f_locals, f.f_globals)
    def __call__(self, expr):
        return eval(expr, {}, self.d)

def f(A, B, C, D, E):
    e = Evaluator()
    res = e('A or B and ((C or D and E) or F) or G and H')
    print 'R=%r from %s' % (res, e.d.vars)

for x in range(20):
    A, B, C, D, E, F, G, H = [random.randrange(2) for x in range(8)]
    f(A, B, C, D, E)
and here's output from a sample run:
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 0), ('G', 1), ('H', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
You can see that often (about 50% of the time) A is true, which short-circuits everything. When A is false, B evaluates -- when B is also false, then G is next, when B is true, then C.
As far as I remember, Python does not return True or False per se:
Important exception: the Boolean operations or and and always return one of their operands.
The Python Standard Library - Truth Value Testing
Therefore, the following is valid:
A = 1
B = 0
result = B or A # result == 1
The Python interpreter doesn't give you a way to introspect the evaluation of an expression at runtime. The sys.settrace() function lets you register a callback that is invoked for every line of source code, but that's too coarse-grained for what you want to do.
That said, I've experimented with a crazy hack to have the function invoked for every bytecode executed: Python bytecode tracing.
But even then, I don't know how to find the execution state, for example, the values on the interpreter stack.
I think the only way to get at what you want is to modify the code algorithmically. You could either transform your source (though you said you didn't want to parse the code), or you could transform the compiled bytecode. Neither is a simple undertaking, and I'm sure there are a dozen difficult hurdles to overcome if you try it.
Sorry to be discouraging...
BTW: What application do you have for this sort of technology?
I would just put something like this before the big statement (assuming the statement is in a class):
for i in ("A","B","C","D","E","F","G","H"):
print i,self.__dict__[i]
"""I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.""": you might need to explain what is the difference between "debug" and "knowing which part".
Do you mean that you the observer need to be told at runtime what is going on (why??) so that you can do something different, or do you mean that the code needs to "know" so that it can do something different?
In any case, assuming that your A, B, C etc don't have side effects, why can't you simply split up your or-chain and test the components:
part1 = A
part2 = B and ((C or D and E) or F)
part3 = G and H
whodunit = "1" if part1 else "2" if part2 else "3" if part3 else "nobody"
print "Perp is", whodunit
if part1 or part2 or part3:
    do_something()
??
Update:
"""The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime)"""
So you are saying that given the condition "A or B", if A is True and B is True, A gets all the glory (or all the blame)? I'm finding it very hard to believe that categorisation software such as you describe is based on "or" having short-circuit evaluation. Are you sure that there's an intent behind the code being "A or B" and not "B or A"? Could the order be random, or influenced by the order in which the variables were originally input?
In any case, generating Python code automatically and then reverse-engineering it appears to be a long way around the problem. Why not just generate code with the part1 = yadda; part2 = blah; etc nature?
