Python: change namespace to access attributes of a class

I have a class with a lot of attributes to be set. In order to do so, I do:
opts.a = 1
opts.b = 2
# ...
opts.xyz = 0
After repeatedly writing opts. at the beginning of each line, I am wondering: is it possible to wrap this in a function or context so that the namespace is set to the class attributes and I don't have to write opts all the time? I.e. so that I only have to do something like:
with opts:
    a = 1
    b = 2
    # ...
    xyz = 3
I thought of moving the code inside a method of the class, but that doesn't make things easier to read or write, since then I'd need to write self instead of opts every time.
One side condition for my case: the class's __setattr__ must still be called, since I have overridden it with a custom method that stores the order in which attributes are set.

Taking your question and the comments together: you want the class-as-enum approach but with mutable values, you want a quick way to update multiple attributes, and you want to store the order in which the attributes are set.
Here is a partial implementation that does what I think you want:
import collections

class MyClass:
    def __init__(self):
        # write through __dict__ so our own __setattr__ isn't triggered
        self.__dict__['values'] = collections.OrderedDict()
    def __setattr__(self, a, v):
        self.__dict__['values'][a] = v
    def __getattr__(self, a):
        # only called when normal lookup fails; 'values' itself is
        # found directly in the instance __dict__, so m.values works
        return self.__dict__['values'][a]
    def __repr__(self):
        return repr(self.__dict__['values'])
Given this definition you can do:
>>> m = MyClass()
>>> m.z = 56
>>> m.y = 22
>>> m.b = 34
>>> m.c = 12
>>> m
OrderedDict([('z', 56), ('y', 22), ('b', 34), ('c', 12)])
Quick update:
>>> m.values.update(f=2, g=3, h=5)
>>> m
OrderedDict([('z', 56), ('y', 22), ('b', 34), ('c', 12), ('h', 5), ('g', 3), ('f', 2)])
It is true that the relative order of f, g and h is a bit surprising, but since the update happens in a single statement there is an argument for saying the updates are simultaneous, so the relative order is arbitrary (the keyword arguments arrive in update as a plain, unordered dict in Python 2).
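If the arbitrary keyword order matters, one order-preserving alternative (a small sketch of mine, not from the original answer; m2 is a hypothetical fresh instance) is to pass update a sequence of key/value pairs, which OrderedDict applies in order:
>>> m2 = MyClass()
>>> m2.values.update([('f', 2), ('g', 3), ('h', 5)])
>>> m2
OrderedDict([('f', 2), ('g', 3), ('h', 5)])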

Generator expressions vs generator functions and surprisingly eager evaluation

For reasons which are not relevant here, I am combining some data structures in a certain way, while also replacing Python 2.7's default dict with OrderedDict. The data structures use tuples as keys in dictionaries. Please ignore those details (the replacement of the dict type is not useful below, but it is in the real code).
import __builtin__
import collections
import contextlib
import itertools

def combine(config_a, config_b):
    return (dict(first, **second) for first, second in itertools.product(config_a, config_b))

@contextlib.contextmanager
def dict_as_ordereddict():
    dict_orig = __builtin__.dict
    try:
        __builtin__.dict = collections.OrderedDict
        yield
    finally:
        __builtin__.dict = dict_orig
This works as expected initially (dict can take non-string keyword arguments as a special case):
print 'one level nesting'
with dict_as_ordereddict():
    result = combine(
        [{(0, 1): 'a', (2, 3): 'b'}],
        [{(4, 5): 'c', (6, 7): 'd'}]
    )
print list(result)
print
Output:
one level nesting
[{(0, 1): 'a', (4, 5): 'c', (2, 3): 'b', (6, 7): 'd'}]
However, when nesting calls to the combine generator expression, you can see that the dict name is resolved to OrderedDict, which lacks dict's special-case acceptance of tuples as keyword-argument keys:
print 'two level nesting'
with dict_as_ordereddict():
    result = combine(
        combine(
            [{(0, 1): 'a', (2, 3): 'b'}],
            [{(4, 5): 'c', (6, 7): 'd'}]
        ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print
Output:
two level nesting
Traceback (most recent call last):
File "test.py", line 36, in <module>
[{(8, 9): 'e', (10, 11): 'f'}]
File "test.py", line 8, in combine
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
File "test.py", line 8, in <genexpr>
return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
TypeError: __init__() keywords must be strings
Furthermore, implementing via yield instead of a generator expression fixes the problem:
def combine_yield(config_a, config_b):
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)

print 'two level nesting, yield'
with dict_as_ordereddict():
    result = combine_yield(
        combine_yield(
            [{(0, 1): 'a', (2, 3): 'b'}],
            [{(4, 5): 'c', (6, 7): 'd'}]
        ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print list(result)
print
Output:
two level nesting, yield
[{(0, 1): 'a', (8, 9): 'e', (2, 3): 'b', (4, 5): 'c', (6, 7): 'd', (10, 11): 'f'}]
Questions:
Why does some item (only the first?) from the generator expression get evaluated earlier than required in the second example, and what is it required for?
Why is it not evaluated in the first example? I actually expected this behaviour in both.
Why does the yield-based version work?
Before going into the details, note the following: itertools.product evaluates its iterator arguments up front in order to compute the product. This can be seen from the equivalent Python implementation in the docs (the first line of the body is the relevant one):
def product(*args, **kwds):
    pools = map(tuple, args) * kwds.get('repeat', 1)
    ...
You can also try this with a custom class and a short test script:
import itertools
class Test:
def __init__(self):
self.x = 0
def __iter__(self):
return self
def next(self):
print('next item requested')
if self.x < 5:
self.x += 1
return self.x
raise StopIteration()
t = Test()
itertools.product(t, t)
Creating the itertools.product object shows in the output that all the iterators' items are requested immediately.
This means that as soon as you call itertools.product, the iterator arguments are evaluated. This is important because in the first case the arguments are just two lists, so there's no problem. You then evaluate the final result via list(result) after the context manager dict_as_ordereddict has exited, so all calls to dict resolve to the normal builtin dict.
In the second example, the inner call to combine still works fine, returning a generator expression which is then used as one of the arguments to the outer combine's call to itertools.product. As we've seen above, these arguments are evaluated immediately, so the generator object is asked to produce its values. To do so, it needs to resolve dict. But at this point we're still inside the context manager dict_as_ordereddict, so dict resolves to OrderedDict, which doesn't accept non-string keys as keyword arguments.
It is important to notice here that the first version, which uses return, needs to create the generator object in order to return it, and that involves creating the itertools.product object. So that version is only as lazy as itertools.product itself.
Now to the question of why the yield version works. By using yield, invoking the function returns a generator. This is a truly lazy version in the sense that execution of the function body doesn't start until items are requested. This means neither the inner nor the outer call to combine will start executing the function body, and thus invoking itertools.product, until the items are requested via list(result). You can check that by putting an additional print statement inside the function and another right after the context manager:
def combine(config_a, config_b):
    print 'start'
    # return (dict(first, **second) for first, second in itertools.product(config_a, config_b))
    for first, second in itertools.product(config_a, config_b):
        yield dict(first, **second)

with dict_as_ordereddict():
    result = combine(
        combine(
            [{(0, 1): 'a', (2, 3): 'b'}],
            [{(4, 5): 'c', (6, 7): 'd'}]
        ),
        [{(8, 9): 'e', (10, 11): 'f'}]
    )
print 'end of context manager'
print list(result)
print
With the yield version we'll notice that it prints the following:
end of context manager
start
start
I.e. the generators are started only when the results are requested via list(result). This is different from the return version (uncomment the return line in the code above and comment out the loop). Now you'll see
start
start
and the error is raised before the end of the context manager is even reached.
On a side note: in order for your code to work at all, the replacement of dict needs to be ineffective (and it is, in the first version), so I don't see why you would use that context manager at all. Secondly, dict literals are not ordered in Python 2, and neither are keyword arguments, which also defeats the purpose of using OrderedDict. Also note that in Python 3 the non-string keyword argument behavior of dict has been removed; the clean way to update a dictionary with keys of any type is dict.update.
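For illustration, here is a minimal sketch (mine, not from the thread) of the dict.update route, which sidesteps the keyword-argument restriction because update accepts a mapping with keys of any hashable type:
first = {(0, 1): 'a', (2, 3): 'b'}
second = {(4, 5): 'c', (6, 7): 'd'}

# merge without **kwargs, so tuple keys work in Python 2 and 3 alike
merged = dict(first)
merged.update(second)
print(merged)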

Importing module in a function when returning

I've been searching for how to do this, but I could not find whether there is a solution. I thought of __import__, but I still couldn't manage to figure it out.
For example:
>>> def combs(s = []):
...     from itertools import combinations
...     return [list(combinations(s, 2))]
...
>>> lst = ["A","B",'C']
>>> print(combs(lst))
[[('A', 'B'), ('A', 'C'), ('B', 'C')]]
>>>
I'm curious whether something like this could be done:
def combs(s = []):
    return [list(combinations(s, 2))]__import__(itertools, list)
Here is how to achieve a dynamic import in your example:
def combs(s = []):
    return list(__import__('itertools').combinations(s, 2))
N.B.: the Python docs for __import__ state that:
This is an advanced function that is not needed in everyday Python programming
Many Pythonistas would prefer an explicit import (as in your original example), and would probably consider excessive use of __import__ to be a bit of a code smell.
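If a dynamic import is genuinely needed, here is a sketch using importlib.import_module, which the docs recommend over raw __import__; the combs_dyn name and the avoidance of the mutable default argument are my own choices, not from the question:
import importlib

def combs_dyn(s=None):
    # resolve itertools at call time via importlib instead of __import__
    combinations = importlib.import_module('itertools').combinations
    return [list(combinations(s or [], 2))]

print(combs_dyn(['A', 'B', 'C']))  # [[('A', 'B'), ('A', 'C'), ('B', 'C')]]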

Strange Python set and hash behaviour - how does this work?

I have a class called GraphEdge which I would like to be uniquely defined within a set (the built-in set type) by its tail and head members, which are set via __init__.
If I do not define __hash__, I see the following behaviour:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
139731804758160
>>> hash(H)
139731804760784
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('A', 'B')])
The set has no way to know that E and H are the same by my definition, since they have differing hashes (which is what the set type uses to determine uniqueness, to my knowledge), so it adds both as distinct elements. So I define a rather naive hash function for GraphEdge like so:
def __hash__(self):
    return hash(self.tail) ^ hash(self.head)
Now the above works as expected:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('A', 'B')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B')])
But clearly, ('A', 'B') and ('B', 'A') in this case will return the same hash, so I would expect that I could not add ('B', 'A') to a set already containing ('A', 'B'). But this is not what happens:
>>> E = GraphEdge('A', 'B')
>>> H = GraphEdge('B', 'A')
>>> hash(E)
409150083
>>> hash(H)
409150083
>>> S = set()
>>> S.add(E)
>>> S.add(H)
>>> S
set([('A', 'B'), ('B', 'A')])
So is the set type using the hash or not? If so, how is the last scenario possible? If not, why doesn't the first scenario (no __hash__ defined) work? Am I missing something?
Edit: For reference for future readers, I already had __eq__ defined (also based on tail and head).
You have a hash collision. On a hash collision, the set uses the == operator to check whether the two elements are truly equal to each other.
It's important to understand how hash and == work together, because both are used by sets. For two values x and y, the important rule is that:
x == y ==> hash(x) == hash(y)
(x equals y implies that x's and y's hashes are equal). But the converse is not true: two unequal values can have the same hash.
Sets (and dicts) will use the hash to get an approximation to equality, but will use the real equality operation to figure out if two values are the same or not.
You should always define both __eq__() and __hash__() if you need at least one of them. If the hashes of two objects are equal, an extra __eq__() check is done to verify uniqueness.
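To make this concrete, here is a minimal sketch of a GraphEdge with both methods defined; the question doesn't show the original class, so this is a hypothetical reconstruction:
class GraphEdge(object):
    def __init__(self, tail, head):
        self.tail = tail
        self.head = head
    def __eq__(self, other):
        # equality, not the hash, is what finally decides set membership
        return (self.tail, self.head) == (other.tail, other.head)
    def __ne__(self, other):
        # Python 2 does not derive != from ==
        return not self == other
    def __hash__(self):
        # XOR is symmetric, so ('A', 'B') and ('B', 'A') collide,
        # but the __eq__ check keeps them distinct within a set
        return hash(self.tail) ^ hash(self.head)
    def __repr__(self):
        return repr((self.tail, self.head))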

Python: Analyzing complex statements during execution

I am wondering if there is any way to get some meta information about the interpretation of a python statement during execution.
Let's assume this is a complex statement of some single statements joined with or (A, B, ... are boolean functions)
if A or B and ((C or D and E) or F) or G and H:
and I want to know which part of the statement caused it to evaluate to True, so I can do something with that knowledge. In the example, there would be three possible candidates:
A
B and ((C or D and E) or F)
G and H
And in the second case, I would like to know if it was (C or D and E) or F that evaluated to True and so on...
Is there any way without parsing the statement? Can I hook up to the interpreter in some way or utilize the inspect module in a way that I haven't found yet? I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.
Edit - further information: The type of application that I want to use this in is a categorizing algorithm that inputs an object and outputs a certain category for this object, based on its attributes. I need to know which attributes were decisive for the category.
As you might guess, the complex statement from above comes from the categorization algorithm. The code for this algorithm is generated from formal pseudo-code and contains about 3,000 nested if/elif statements that determine the category hierarchically, like:
if obj.attr1 < 23 and (is_something(obj.attr10) or eats_spam_for_breakfast(obj)):
    return 'Category1'
elif obj.attr3 == 'Welcome Home' or count_something(obj) >= 2:
    return 'Category2a'
elif ...
So aside from the category itself, I need to flag the attributes that were decisive for that category, so that if I deleted all the other attributes, the object would still be assigned the same category (due to the ors within the statements). The statements can be really long, up to 1,000 characters, and deeply nested. Every object can have up to 200 attributes.
Thanks a lot for your help!
Edit 2: Haven't found time in the last two weeks. Thanks for providing this solution, it works!
Could you recode your original code:
if A or B and ((C or D and E) or F) or G and H:
as, say:
e = Evaluator()
if e('A or B and ((C or D and E) or F) or G and H'):
...? If so, there's hope!-). The Evaluator class, upon __call__, would compile its string argument, then eval the result with (an empty real dict for globals, and) a pseudo-dict for locals that actually delegates the value lookups to the locals and globals of its caller (just takes a little black magic, but, not too bad;-) and also takes note of what names it's looked up. Given Python's and and or's short-circuiting behavior, you can infer from the actual set of names that were actually looked up, which one determined the truth value of the expression (or each subexpression) -- in an X or Y or Z, the first true value (if any) will be the last one looked up, and in a X and Y and Z, the first false one will.
Would this help? If yes, and if you need help with the coding, I'll be happy to expand on this, but first I'd like some confirmation that getting the code for Evaluator would indeed be solving whatever problem it is that you're trying to address!-)
Edit: so here's code implementing Evaluator and exemplifying its use:
import inspect
import random

class TracingDict(object):
    def __init__(self, loc, glob):
        self.loc = loc
        self.glob = glob
        self.vars = []
    def __getitem__(self, name):
        try: v = self.loc[name]
        except KeyError: v = self.glob[name]
        self.vars.append((name, v))
        return v

class Evaluator(object):
    def __init__(self):
        f = inspect.currentframe()
        f = inspect.getouterframes(f)[1][0]
        self.d = TracingDict(f.f_locals, f.f_globals)
    def __call__(self, expr):
        return eval(expr, {}, self.d)

def f(A, B, C, D, E):
    e = Evaluator()
    res = e('A or B and ((C or D and E) or F) or G and H')
    print 'R=%r from %s' % (res, e.d.vars)

for x in range(20):
    A, B, C, D, E, F, G, H = [random.randrange(2) for x in range(8)]
    f(A, B, C, D, E)
and here's output from a sample run:
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 0), ('G', 1), ('H', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
You can see that often (about 50% of the time) A is true, which short-circuits everything. When A is false, B is evaluated next; when B is also false, G is next; when B is true, C is.
As far as I remember, Python does not return True or False per se:
Important exception: the Boolean operations or and and always return one of their operands.
The Python Standard Library - Truth Value Testing
Therefore, the following is valid:
A = 1
B = 0
result = B or A # result == 1
The Python interpreter doesn't give you a way to introspect the evaluation of an expression at runtime. The sys.settrace() function lets you register a callback that is invoked for every line of source code, but that's too coarse-grained for what you want to do.
That said, I've experimented with a crazy hack to have the function invoked for every bytecode executed: Python bytecode tracing.
But even then, I don't know how to find the execution state, for example, the values on the interpreter stack.
I think the only way to get at what you want is to modify the code algorithmically. You could either transform your source (though you said you didn't want to parse the code), or you could transform the compiled bytecode. Neither is a simple undertaking, and I'm sure there are a dozen difficult hurdles to overcome if you try it.
Sorry to be discouraging...
BTW: What application do you have for this sort of technology?
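To illustrate the line-level granularity mentioned above, here is a minimal sketch (mine, not from the original answer; the check function is a hypothetical stand-in): a trace function registered with sys.settrace fires once per line, so an entire or-chain on one line produces a single 'line' event, with no visibility into which operand decided it.
import sys

def tracer(frame, event, arg):
    # one 'line' event per source line: a whole or-chain on a
    # single line shows up as just one event
    if event == 'line':
        print('line %d executed' % frame.f_lineno)
    return tracer

def check(A, B, G, H):
    return A or B or G and H

sys.settrace(tracer)
check(False, True, False, False)
sys.settrace(None)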
I would just put something like this before the big statement (assuming the statement is in a class):
for i in ("A", "B", "C", "D", "E", "F", "G", "H"):
    print i, self.__dict__[i]
"""I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.""": you might need to explain what is the difference between "debug" and "knowing which part".
Do you mean that you the observer need to be told at runtime what is going on (why??) so that you can do something different, or do you mean that the code needs to "know" so that it can do something different?
In any case, assuming that your A, B, C etc don't have side effects, why can't you simply split up your or-chain and test the components:
part1 = A
part2 = B and ((C or D and E) or F)
part3 = G and H
whodunit = "1" if part1 else "2" if part2 else "3" if part3 else "nobody"
print "Perp is", whodunit
if part1 or part2 or part3:
    do_something()
??
Update:
"""The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime)"""
So you are saying that, given the condition "A or B", if A is True and B is True, A gets all the glory (or all the blame)? I find it very hard to believe that categorisation software such as you describe relies on "or" having short-circuit evaluation. Are you sure there's intent behind the code being "A or B" and not "B or A"? Could the order be random, or influenced by the order in which the variables were originally input?
In any case, generating Python code automatically and then reverse-engineering it appears to be a long way around the problem. Why not just generate code of the part1 = yadda; part2 = blah form in the first place?

Python shortcuts

Python is filled with little neat shortcuts.
For example:
self.data = map(lambda x: list(x), data)
and (although not so pretty)
tuple(t[0] for t in self.result if t[0] != 'mysql' and t[0] != 'information_schema')
among countless others.
In the IRC channel, they said there are "too many to know them all".
I think we should list some here, as I love using these shortcuts to shorten and refactor my code. I'm sure this would benefit many.
self.data = map(lambda x: list(x), data)
is dreck -- use
self.data = map(list, data)
if you're a map fanatic (list comprehensions are generally preferred these days). More generally, lambda x: somecallable(x) can always be productively changed to just somecallable, in every context, with nothing but good effect.
As for shortcuts in general, my wife and I did our best to list the most important and useful ones in the early part of the Python Cookbook's second edition -- that could be a start.
Alex Martelli provided an even shorter version of your first example. I shall provide a (slightly) shorter version of your second:
tuple(t[0] for t in self.result if t[0] not in ('mysql', 'information_schema'))
Obviously the in operator becomes more advantageous the more values you're testing for.
I would also like to stress that shortening and refactoring is good only to the extent that it improves clarity and readability. (Unless you are code-golfing. ;)
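As a small hedged refinement (mine, not from the original answer): for larger collections of candidate values, testing membership against a set is O(1) per lookup, whereas a tuple is scanned linearly. The names list below is a hypothetical stand-in for self.result:
EXCLUDED = {'mysql', 'information_schema'}  # set literal, Python 2.7+

names = [('mysql',), ('mydb',), ('information_schema',), ('sales',)]
result = tuple(t[0] for t in names if t[0] not in EXCLUDED)
print(result)  # ('mydb', 'sales')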
I'm not sure if this is a shortcut, but I love it:
>>> class Enum(object):
...     def __init__(self, *keys):
...         self.keys = keys
...         self.__dict__.update(zip(keys, range(len(keys))))
...     def value(self, key):
...         return self.keys.index(key)
...
>>> colors = Enum("Red", "Blue", "Green", "Yellow", "Purple")
>>> colors.keys
('Red', 'Blue', 'Green', 'Yellow', 'Purple')
>>> colors.Green
2
(I don't know who came up with this, but it wasn't me.)
I always liked the "unzip" idiom:
>>> zipped = [('a', 1), ('b', 2), ('c', 3)]
>>> zip(*zipped)
[('a', 'b', 'c'), (1, 2, 3)]
>>>
>>> l,n = zip(*zipped)
>>> l
('a', 'b', 'c')
>>> n
(1, 2, 3)
