Can I "hook" variable reassignment? - python

Python is a language in which the following is possible:
>>> class A(object):
...     def __eq__(self, _):
...         return True
...
>>> a = A()
>>> a == "slfjghsjdfhgklsfghksjd"
True
>>> a == -87346957234576293847658734659834628572384657346573465
True
>>> a == None
True
>>> a is None
False
Other things, like __gt__ and __lt__, are also "overloadable"1 in this way, and this is a great feature, in my opinion.
I'm curious whether the = assignment operator is also "overloadable"1 in a similar fashion, or if I'd have to recompile Python into NotPython to do this.
(As far as I know, classes and objects don't implement some __assign__ method; that's implemented with the C bytecode runner/compiler.)
Specifically, I want to implement an LL(k) parser by iterating on a token list, without using an iterator such that I can change the iterator's index arbitrarily to jump to a given token.
The catch is that on a given cycle of the loop, if I've already arbitrarily modified the index in preparation for the next cycle (something I'll surely do a lot) the last thing I want to do is mess up that variable by adding one to it as if it hadn't been changed.
Probably the easiest, simplest solution to this is to have a flag that gets set on jumps and which is reset to False every cycle, but this can and will introduce tiny hiding bugs I'd like to avoid ahead of time. (See here for what I mean -- this is the pathetic iterative LL(1) parser I'm rewriting, and I want its reincarnation to be somewhat maintainable and/or readable.)
The flag solution:
idx = 0
while True:
    jmpd = False
    # maybe we jumped, maybe we didn't; how am I to know!?!?
    if jmpd == False:
        idx += 1
Great! One big drawback: at each possible branch resulting in an arbitrary change in the index, I have to set that flag. Trivial, perhaps (obviously in this example), but to me it seems like a harbinger of unreadable, bug-friendly code.
What's the best way, without using a second variable, to test if a value's changed over time?
If your answer is, "there isn't one, just be quiet and use a flag variable", then I congratulate you for promoting bad code design. /s
1Yeah, I know it's not technically operator overloading; have you a better, more quickly understandable term?

Rebinding a plain variable (name = value) cannot be hooked, but you can use another magic method to intercept assignment to an attribute of an object. It's called __setattr__(). Here is an example of how to use it:
In [2]: class Test(object):
   ...:     def __setattr__(self, name, value):
   ...:         print(name, "changed to", value)
   ...:         super().__setattr__(name, value)
   ...:
In [3]: t = Test()
In [4]: t.name = 4
name changed to 4
In [5]: t.name = 5
name changed to 5
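Applied to the parser loop from the question, one option is to keep the index as an attribute of a small object so that assignments to it can be observed. The following is only a minimal sketch under that assumption, not a definitive design; the Cursor class, its jumped attribute, and the advance method are hypothetical names introduced for illustration.

```python
class Cursor:
    """Hypothetical token cursor that remembers whether idx was assigned this cycle."""

    def __init__(self):
        # Use object.__setattr__ directly so the initial setup is not counted as a jump.
        object.__setattr__(self, "idx", 0)
        object.__setattr__(self, "jumped", False)

    def __setattr__(self, name, value):
        # Any ordinary assignment to idx is recorded as a jump.
        if name == "idx":
            object.__setattr__(self, "jumped", True)
        object.__setattr__(self, name, value)

    def advance(self):
        """Step to the next token unless the loop body already assigned idx."""
        if not self.jumped:
            object.__setattr__(self, "idx", self.idx + 1)
        object.__setattr__(self, "jumped", False)


tokens = ["a", "b", "c", "d", "e"]
cur = Cursor()
visited = []
while cur.idx < len(tokens):
    visited.append(tokens[cur.idx])
    if tokens[cur.idx] == "b":
        cur.idx = 3  # arbitrary jump; no flag has to be set by hand
    cur.advance()

print(visited)  # ['a', 'b', 'd', 'e']
```

The flag still exists, but it lives in one place instead of being set at every branch that changes the index.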

Related

Why do function attributes (setattr ones) only become available after assigning it as a property to a class and instantiating it?

I apologize if I'm butchering the terminology. I'm trying to understand the code in this example on how to chain a custom function onto a PySpark dataframe. I'd really want to understand exactly what it's doing, and if it is not awful practice before I implement anything.
From the way I'm understanding the code, it:
defines a function g with sub-functions inside of it, that returns a copy of itself
assigns the sub-functions to g as attributes
assigns g as a property of the DataFrame class
I don't think at any step in the process do any of them become a method (when I do getattr, it always says "function")
When I run a (as best as I can do) simplified version of the code (below), it seems like only when I assign the function as a property to a class, and then instantiate at least one copy of the class, do the attributes on the function become available (even outside of the class). I want to understand what and why that is happening.
An answer here (https://stackoverflow.com/a/17007966/19871699) indicates that this is a behavior, but doesn't really explain what/why it is. I've read this too but I'm having trouble seeing the connection to the code above.
I read here about the setattr part of the code. He doesn't mention exactly the use case above. this post has some use cases where people do it, but I'm not understanding how it directly applies to the above, unless I've missed something.
The confusing part is when the inner attributes become available.
class SampleClass():
    def __init__(self):
        pass

def my_custom_attribute(self):
    def inner_function_one():
        pass
    setattr(my_custom_attribute,"inner_function",inner_function_one)
    return my_custom_attribute
[x for x in dir(my_custom_attribute) if x[0] != "_"]
returns []
then when I do:
SampleClass.custom_attribute = property(my_custom_attribute)
[x for x in dir(my_custom_attribute) if x[0] != "_"]
it returns []
but when I do:
class_instance = SampleClass()
class_instance.custom_attribute
[x for x in dir(my_custom_attribute) if x[0] != "_"]
it returns ['inner_function']
In the code above though, if I do SampleClass.custom_attribute = my_custom_attribute instead of =property(...) the [x for x... code still returns [].
edit: I'm not intending to access the function itself outside of the class. I just don't understand the behavior, and don't like implementing something I don't understand.
So, setattr is not relevant here. This would all work exactly the same without it, say, by just doing my_custom_attribute.inner_function = inner_function_one etc. What is relevant is that the approach in the link you showed (whose purpose your example doesn't exactly make clear) relies on using a property, which is a descriptor. The function won't get called unless you access the attribute corresponding to the property on an instance. This comes down to how property works. For any property, given a class Foo:
Foo.attribute_name = property(some_function)
Then some_function won't get called until you do Foo().attribute_name. That is the whole point of property.
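A small sketch of that behavior, using a hypothetical calls list to record when the getter actually runs (the names Foo, some_function and attribute_name mirror the lines above; everything else is illustrative):

```python
calls = []  # records every time the getter actually runs

def some_function(self):
    calls.append("called")
    return 42

class Foo:
    pass

Foo.attribute_name = property(some_function)
print(calls)                     # [] : assigning the property runs nothing
print(type(Foo.attribute_name))  # access on the class returns the property object itself
print(calls)                     # still []

value = Foo().attribute_name     # access on an instance runs the getter
print(value, calls)              # 42 ['called']
```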
But this whole solution is very confusingly engineered. It relies on the above behavior, and it sets attributes on the function object.
Note, if all you want to do is add some method to your DataFrame class, you don't need any of this. Consider the following example (using pandas for simplicity):
>>> import pandas as pd
>>> def foobar(self):
...     print("in foobar with instance", self)
...
>>> pd.DataFrame.baz = foobar
>>> df = pd.DataFrame(dict(x=[1,2,3], y=['a','b','c']))
>>> df
   x  y
0  1  a
1  2  b
2  3  c
>>> df.baz()
in foobar with instance    x  y
0  1  a
1  2  b
2  3  c
That's it. You don't need all that rigamarole. Of course, if you wanted to add a nested accessor, df.custom.whatever, you would need something a bit more complicated. You could use the approach in the OP, but I would prefer something more explicit:
import pandas as pd

class AccessorDelegator:
    def __init__(self, accessor_type):
        self.accessor_type = accessor_type
    def __get__(self, instance, cls=None):
        # called on every access to df.custom; wraps the dataframe in the accessor
        return self.accessor_type(instance)

class CustomMethods:
    def __init__(self, instance):
        self.instance = instance
    def foo(self):
        # do something with self.instance as if this were your `self` on the dataframe being augmented
        print(self.instance.value_counts())

pd.DataFrame.custom = AccessorDelegator(CustomMethods)
df = pd.DataFrame(dict(a=[1,2,3], b=['a','b','c']))
df.custom.foo()
The above will print:
a  b
1  a    1
2  b    1
3  c    1
Because when you call a function, the attributes set within that function aren't returned; only the return value is passed back.
In other words, the additional attributes are only available on the returned function and not on 'g' itself.
Try moving setattr() outside of the function.
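The timing can be checked without any class or property. The sketch below is a simplified, assumption-laden variant of the question's function (no self parameter; other_function and helper are hypothetical names): an attribute assigned inside the body appears only after the first call, while one assigned outside the body exists as soon as the module has run.

```python
def my_custom_attribute():
    def inner_function_one():
        pass
    # This line is part of the body, so it runs only when the function is called.
    my_custom_attribute.inner_function = inner_function_one
    return my_custom_attribute

before = hasattr(my_custom_attribute, "inner_function")
my_custom_attribute()
after = hasattr(my_custom_attribute, "inner_function")
print(before, after)  # False True


def other_function():
    return other_function

def helper():
    pass

# Assigned outside the body: runs once, at definition time, with no call needed.
other_function.inner_function = helper
print(hasattr(other_function, "inner_function"))  # True
```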

Assign results of function call in one line in python

How can I assign the results of a function call to multiple variables when the results are stored by name (not index-able), in python.
For example (tested in Python 3),
import random
# foo, as defined somewhere else where we can't or don't want to change it
def foo():
    t = random.randint(1,100)
    # put in a dummy class instead of just "return t,t+1"
    # because otherwise we could subscript or just A,B = foo()
    class Cat(object):
        x = t
        y = t + 1
    return Cat()
# METHOD 1
# clearly wrong; B should be 1 more than A, but the two values come from fields of different objects
A,B = foo().x, foo().y
print(A,B)
# METHOD 2
# correct, but requires two lines and an implicit variable
t = foo()
A,B = t.x, t.y
del t # don't really want t lying around
print(A,B)
# METHOD 3
# correct and one line, but an obfuscated mess
A,B = [ (t.x,t.y) for t in (foo(),) ][0]
print(A,B)
print(t) # this will raise an exception, but unless you know your python cold it might not be obvious before running
# METHOD 4
# Conforms to the suggestions in the links below without modifying the initial function foo or class Cat.
# All subsequent calls are pretty, but we have to use an otherwise meaningless shell function
def get_foo():
    t = foo()
    return t.x, t.y
A,B = get_foo()
What we don't want to do
If the results were indexable (Cat extended tuple/list, we had used a namedtuple, etc.), we could simply write A,B = foo() as indicated in the comment above the Cat class. That's what's recommended here, for example.
Let's assume we have a good reason not to allow that. Maybe we like the clarity of assigning from the variable names (if they're more meaningful than x and y) or maybe the object is not primarily a container. Maybe the fields are properties, so access actually involves a method call. We don't have to assume any of those to answer this question though; the Cat class can be taken at face value.
This question already deals with how to design functions/classes the best way possible; if the function's expected return values are already well defined and do not involve tuple-like access, what is the best way to accept multiple values when returning?
I would strongly recommend either using multiple statements, or just keeping the result object without unpacking its attributes. That said, you can use operator.attrgetter for this:
from operator import attrgetter
a, b, c = attrgetter('a', 'b', 'c')(foo())
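Applied to the foo and Cat from the question, whose fields are named x and y, a sketch of the same idea looks like this. attrgetter called with several names returns a callable that produces a tuple, so foo() is called only once:

```python
import random
from operator import attrgetter

# foo as in the question: returns an object whose fields are reached by name
def foo():
    t = random.randint(1, 100)
    class Cat(object):
        x = t
        y = t + 1
    return Cat()

# One call to foo(); both fields come from the same object.
A, B = attrgetter("x", "y")(foo())
print(A, B)
```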

How to test if an Enum member with a certain name exists?

Using Python 3.4 I want to test whether an Enum class contains a member with a certain name.
Example:
from enum import Enum

class Constants(Enum):
    One = 1
    Two = 2
    Three = 3

print(Constants['One'])
print(Constants['Four'])
gives:
Constants.One
  File "C:\Python34\lib\enum.py", line 258, in __getitem__
    return cls._member_map_[name]
KeyError: 'Four'
I could catch the KeyError and take the exception as an indication that the member does not exist, but maybe there is a more elegant way?
You could use Enum.__members__ - an ordered dictionary mapping names to members:
In [12]: 'One' in Constants.__members__
Out[12]: True
In [13]: 'Four' in Constants.__members__
Out[13]: False
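One detail worth knowing when using this check: __members__ also contains aliases (extra names bound to an existing value), whereas iterating over the class skips them. A short sketch, with a hypothetical alias Uno added to the class from the question:

```python
from enum import Enum

class Constants(Enum):
    One = 1
    Two = 2
    Three = 3
    Uno = 1  # hypothetical alias for One, added only for this illustration

print('One' in Constants.__members__)   # True
print('Uno' in Constants.__members__)   # True: aliases are listed in __members__
print('Four' in Constants.__members__)  # False
print([m.name for m in Constants])      # ['One', 'Two', 'Three']: iteration skips aliases
```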
I would say this falls under EAFP (Easier to ask for forgiveness than permission), a coding style that is especially common in Python. The Python glossary describes it like this:
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
This contrasts with LBYL (Look before you leap), which is what I think you want when you say you are looking for "a more elegant way."
Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many if statements.
In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping”. For example, the code, if key in mapping: return mapping[key] can fail if another thread removes key from mapping after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.
Therefore based on the documentation, it is actually better to use try/except blocks for your problem.
TL;DR
Use try/except blocks to catch the KeyError exception.
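A minimal sketch of that EAFP lookup, wrapped in a hypothetical helper named member_or_none (the name and the choice of None as the fallback are assumptions, not part of the enum module):

```python
from enum import Enum

class Constants(Enum):
    One = 1
    Two = 2
    Three = 3

def member_or_none(enum_type, name):
    """Return the member called `name`, or None if there is no such member."""
    try:
        return enum_type[name]
    except KeyError:
        return None

print(member_or_none(Constants, 'One'))   # Constants.One
print(member_or_none(Constants, 'Four'))  # None
```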
You could use the following to test if the name exists:
if any(x for x in Constants if x.name == "One"):
    pass  # Exists
else:
    pass  # Doesn't Exist
Or use x.value to test for the enum value:
if any(x for x in Constants if x.value == 1):
    pass  # Exists
else:
    pass  # Doesn't Exist
In order to improve legibility, you can wrap the suggestions above in a class method.
For instance:
class Constants(Enum):
    One = 1
    Two = 2
    Three = 3

    @classmethod
    def has_key(cls, name):
        return name in cls.__members__ # solution above 1
        # return any(x for x in cls if x.name == name) # or solution above 2
In order to use:
In [6]: Constants.has_key('One')
Out[6]: True
In [7]: Constants.has_key('Four')
Out[7]: False
Reading the source code for the Enum class:
def __call__(cls, value, names=None, *, module=None, qualname=None, type=None, start=1):
    """Either returns an existing member, or creates a new enum class.
So, based on the docstring notes, a Pythonic way of checking membership by value (rather than by name) would be:
from enum import Enum

class TestEnum(Enum):
    TEST = 'test'

def enum_contains(enum_type, value):
    try:
        enum_type(value)
    except ValueError:
        return False
    return True
>>> enum_contains(TestEnum, 'value_doesnt_exist')
False
>>> enum_contains(TestEnum, 'test')
True

Python idiom for applying a function only when value is not None

A function is receiving a number of values that are all strings but need to be parsed in various ways, e.g.
vote_count = int(input_1)
score = float(input_2)
person = Person(input_3)
This is all fine except the inputs can also be None and in this case, instead of parsing the values I would like to end up with None assigned to the left hand side. This can be done with
vote_count = int(input_1) if input_1 is not None else None
...
but this seems much less readable especially with many repeated lines like this one. I'm considering defining a function that simplifies this, something like
def whendefined(func, value):
    return func(value) if value is not None else None
which could be used like
vote_count = whendefined(int, input_1)
...
My question is, is there a common idiom for this? Possibly using built-in Python functions? Even if not, is there a commonly used name for a function like this?
In other languages there's Option typing, which is a bit different (it solves the problem with a type system) but has the same motivation (what to do about nulls).
In Python there's more of a focus on runtime detection of this kind of thing, so you can wrap the function with a None-detecting guard (rather than the data, which is what Option typing does).
You could write a decorator that only executes a function if the argument is not None:
def option(function):
    def wrapper(*args, **kwargs):
        if len(args) > 0 and args[0] is not None:
            return function(*args, **kwargs)
    return wrapper
You should probably adapt that third line to be more suitable to the kind of data you're working with.
In use:
@option
def optionprint(inp):
    return inp + "!!"
>>> optionprint(None)
# Nothing
>>> optionprint("hello")
'hello!!'
and with a return value
@option
def optioninc(input):
    return input + 1
>>> optioninc(None)
>>> # Nothing
>>> optioninc(100)
101
or wrap a type-constructing function
>>> int_or_none = option(int)
>>> int_or_none(None)
# Nothing
>>> int_or_none(12)
12
If you can safely treat falsy values (such as 0 and the empty string) as None, you can use boolean and:
vote_count = input_1 and int(input_1)
Since it looks like you're taking strings for input, this might work; you can't turn an empty string to an int or float (or person) anyway. It's not overly readable for some, though the idiom is commonly used in Lua.
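To make the difference concrete, here is a small sketch comparing the question's whendefined helper with the and idiom; and_idiom is a hypothetical wrapper used only so the two can be called side by side. Note that the and version returns the falsy input itself, so an empty string comes back as '' rather than None:

```python
def whendefined(func, value):
    # Helper from the question: apply func only when value is not None.
    return func(value) if value is not None else None

def and_idiom(raw):
    # Short-circuit version: returns raw itself whenever raw is falsy.
    return raw and int(raw)

print(whendefined(int, "12"), and_idiom("12"))  # 12 12
print(whendefined(int, None), and_idiom(None))  # None None
print(and_idiom("0"))                           # 0 ("0" is a non-empty string, so it is truthy)
print(repr(and_idiom("")))                      # '' (the empty string is returned unchanged)
```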

How should I best emulate and/or avoid enum's in Python? [duplicate]

This question already has answers here:
How can I represent an 'Enum' in Python?
(43 answers)
Closed 4 years ago.
I've been using a small class to emulate Enums in some Python projects. Is there a better way or does this make the most sense for some situations?
Class code here:
class Enum(object):
    '''Simple Enum Class
    Example Usage:
    >>> codes = Enum('FOO BAR BAZ') # codes.BAZ will be 2 and so on ...'''
    def __init__(self, names):
        for number, name in enumerate(names.split()):
            setattr(self, name, number)
Enums have been proposed for inclusion into the language before, but were rejected (see http://www.python.org/dev/peps/pep-0354/), though there are existing packages you could use instead of writing your own implementation:
enum: http://pypi.python.org/pypi/enum
SymbolType (not quite the same as enums, but still useful): http://pypi.python.org/pypi/SymbolType
Or just do a search
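On Python 3.4 and later, the standard library's own enum module is another option. As a rough sketch, its functional API accepts the same whitespace-separated string as the class in the question; Codes is a hypothetical name, and start=0 (available from Python 3.5) is chosen here only to match that class's numbering:

```python
from enum import Enum

# Functional API: member names come from a whitespace-separated string,
# and start=0 makes the numbering begin at 0 instead of the default 1.
Codes = Enum("Codes", "FOO BAR BAZ", start=0)

print(Codes.BAZ)                  # Codes.BAZ
print(Codes.BAZ.value)            # 2
print(Codes["FOO"] is Codes.FOO)  # True
```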
The most common enum case is enumerated values that are part of a State or Strategy design pattern. The enums are specific states or specific optional strategies to be used. In this case, they're almost always part and parcel of some class definition:
class DoTheNeedful( object ):
    ONE_CHOICE = 1
    ANOTHER_CHOICE = 2
    YET_ANOTHER = 99
    def __init__( self, aSelection ):
        assert aSelection in ( self.ONE_CHOICE, self.ANOTHER_CHOICE, self.YET_ANOTHER )
        self.selection= aSelection
Then, in a client of this class:
dtn = DoTheNeedful( DoTheNeedful.ONE_CHOICE )
There's a lot of good discussion here.
What I see more often is this, in top-level module context:
FOO_BAR = 'FOO_BAR'
FOO_BAZ = 'FOO_BAZ'
FOO_QUX = 'FOO_QUX'
...and later...
if something is FOO_BAR: pass # do something here
elif something is FOO_BAZ: pass # do something else
elif something is FOO_QUX: pass # do something else
else: raise Exception('Invalid value for something')
Note that the use of is rather than == is taking a risk here -- it assumes that folks are using your_module.FOO_BAR rather than the string 'FOO_BAR' (which will normally be interned such that is will match, but that certainly can't be counted on), and so may not be appropriate depending on context.
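A short sketch of that risk: below, an equal string is built at runtime as a hypothetical stand-in for a value arriving from a file or user input. Equality holds, but identity is an implementation detail and should not be relied on:

```python
FOO_BAR = 'FOO_BAR'

# Build an equal string at runtime, the way a value read from a file or
# user input might arrive (hypothetical stand-in for such input).
something = ''.join(['FOO', '_', 'BAR'])

print(something == FOO_BAR)  # True: the contents are equal
print(something is FOO_BAR)  # implementation-dependent; typically False on CPython
```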
One advantage of doing it this way is that by looking anywhere a reference to that string is being stored, it's immediately obvious where it came from; FOO_BAZ is much less ambiguous than 2.
Besides that, the other thing that offends my Pythonic sensibilities re the class you propose is the use of split(). Why not just pass in a tuple, list or other enumerable to start with?
The builtin way to do enums is:
(FOO, BAR, BAZ) = range(3)
which works fine for small sets, but has some drawbacks:
you need to count the number of elements by hand
you can't skip values
if you add one name, you also need to update the range number
For a complete enum implementation in python, see:
http://code.activestate.com/recipes/67107/
I started with something that looks a lot like S.Lott's answer but I only overloaded __str__ and __eq__ (instead of the whole object class) so I could print and compare the enum's value.
class enumSeason():
    Spring = 0
    Summer = 1
    Fall = 2
    Winter = 3
    def __init__(self, Type):
        self.value = Type
    def __str__(self):
        if self.value == enumSeason.Spring:
            return 'Spring'
        if self.value == enumSeason.Summer:
            return 'Summer'
        if self.value == enumSeason.Fall:
            return 'Fall'
        if self.value == enumSeason.Winter:
            return 'Winter'
    def __eq__(self,y):
        return self.value==y.value
print(x) will yield the name instead of the value, and two values holding Spring will be equal to one another.
>>> x = enumSeason(enumSeason.Spring)
>>> print(x)
Spring
>>> y = enumSeason(enumSeason.Spring)
>>> x == y
True
