Python creates a redundant reference to a class

I observe a strange behavior with Python 2.7.12 and Python 3.5.2:
import sys

class Foo:
    def __init__(self):
        self.b = self.bar
    def bar(self):
        pass

f = Foo()
print(sys.getrefcount(f) - 1)  # subtract the extra reference created by
                               # passing a reference to sys.getrefcount.
When I run the code I get 2, which means there are two references to the Foo object. The only way to get 1 is to remove self.b = self.bar; that assignment seems to create a reference cycle.
Can anyone explain this behavior?
UPDATE:
Here are the run results:
class Foo:
    def __init__(self):
        self.b = self.bar
    def bar(self):
        pass
    def __del__(self):
        print "deleted"

def test_function():
    f = Foo()
    print f

if __name__ == "__main__":
    for i in xrange(5):
        test_function()
    print "finished"
python ./test.py
<__main__.Foo instance at 0x104797e60>
<__main__.Foo instance at 0x104797ef0>
<__main__.Foo instance at 0x104797f38>
<__main__.Foo instance at 0x104797f80>
<__main__.Foo instance at 0x104797fc8>
finished
As you can see, on every iteration Python creates a new instance of the class Foo but never releases them ("deleted" is never printed)!
UPDATE 2 AND THE ROOT CAUSE
Objects that have __del__() methods and are part of a reference cycle
cause the entire reference cycle to be uncollectable, including
objects not necessarily in the cycle but reachable only from it.
https://docs.python.org/2/library/gc.html#gc.garbage
A Foo instance has both a __del__ method and the bound method stored by self.b = self.bar, which means it is definitely part of a reference cycle.
Thus, according to the Python docs, the instance is uncollectable!
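For completeness, here is a minimal sketch (not from the original post) that makes the leak observable: on Python 2, the uncollectable instances pile up in gc.garbage after a collection pass, while Python 3.4+ (PEP 442) can finalize and collect such cycles.
import gc

class Foo(object):
    def __init__(self):
        self.b = self.bar      # bound method -> reference cycle f -> f.b -> f
    def bar(self):
        pass
    def __del__(self):
        print("deleted")

for i in range(5):
    Foo()                      # each instance becomes unreachable right away

gc.collect()                   # force a cycle-collection pass
print(gc.garbage)              # Python 2: the five Foo instances sit here, never freed
                               # Python 3.4+: [] and "deleted" was printed five times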

The extra reference is hidden in the bound method stored in f.b.
print(f.b.__self__) # <__main__.Foo object at 0x000001CD147FEF98>
When storing self.b = self.bar, you create a bound method. A bound method keeps track of the instance to which it is bound. This is what allows it to pass self implicitly.
The same would happen if you were to manually create the bound method.
import sys

class Foo:
    def bar(self):
        pass

f = Foo()
print(sys.getrefcount(f) - 1)  # 1
bound_method = f.bar
print(sys.getrefcount(f) - 1)  # 2
So you are correct: storing the bound method creates the reference cycle f -> f.b -> f. This is not an issue by itself, though, since Python has handled cyclic references with its garbage collector since version 2.0.
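A small sketch (assuming Python 3 and no __del__ method) showing that the cycle is reclaimed as soon as the collector runs, plus weakref.WeakMethod as an option if you prefer not to keep a strong reference at all:
import gc
import weakref

class Foo:
    def __init__(self):
        self.b = self.bar               # strong cycle: f -> f.b -> f
    def bar(self):
        pass

f = Foo()
watcher = weakref.ref(f)                # lets us observe when f is actually freed
del f
gc.collect()                            # the cycle collector breaks f <-> f.b
print(watcher())                        # None: the instance was reclaimed

class Bar:
    def __init__(self):
        self.b = weakref.WeakMethod(self.bar)   # no strong reference, no cycle
    def bar(self):
        return "called"

b = Bar()
method = b.b()                          # dereference: live bound method, or None
print(method())                         # "called"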

Related

Super class appears to wrongly reference properties of derived class

In the following Python code (I'm using 3.8), an object of class B derived from class A calls methods bar and foo that access members of the parent class through the super() function. Hence, I would expect the same result as calling bar and foo directly on A. Oddly, what is returned is affected by how B overrides p, which should not happen because A should be shielded from its children, shouldn't it?! Here is the code to reproduce:
class A(object):
    @property
    def p(self):
        return 3
    def bar(self):
        return self.p
    def foo(self):
        return self.bar()

class B(A):
    @property
    def p(self):
        return 6
    def bar(self):
        return super().p
    def foo(self):
        return super().bar()
a, b = A(), B()
print(a.p) # prints 3, OK
print(b.p) # prints 6, OK
print(a.bar()) # prints 3, OK
print(b.bar()) # prints 3, OK, since we're accessing super().p
print(a.foo()) # prints 3, OK
print(b.foo()) # prints 6, NOT OK, because we are accessing super().bar() and expect 3
I'm out of my wits here, so if someone could illuminate the rationale of this behavior and show a way to avoid it, that would be most helpful. Thanks a lot.
Welcome to the intricacies of super!
super() here is a shortcut for super(B, self). It returns a proxy that looks in the class MRO for the class coming after B, i.e. A, so super().bar() will actually call:
A.bar(self)
without changing the original b object passed as self...
And inside A.bar, self.p is still b.p, so it gives 6.
If you are used to other object-oriented languages like C++, everything happens as if all methods in Python were virtual (non-final, in Java terms).
super().attr means looking up the attr attribute in the parent. If attr is a method, the lookup finds the method's code (the instructions to execute), but it does not modify the passed arguments in any way; it just selects the instructions.
In Python, self is an argument behind the scenes. If c = C(), then c.meth(...) means C.meth(c, ...), i.e. a call of the method meth defined in the class C with first argument c (other args follow, if any). The first argument becomes the self parameter in the method's implementation. (The name self is just a convention, not a special keyword.)
Back to the question. Here is a simplified program without properties; it behaves the same:
class A:
    P = 3
    def bar(self):
        return self.P
    def foo(self):
        return self.bar()

class B(A):
    P = 6
    def bar(self):
        return super().P
    def foo(self):
        return super().bar()
b.foo() invokes super().bar(), i.e. the bar() in the parent class A. That method contains code that simply returns self.P. But self is b, so the lookup returns 6. (In your original program, p is a property that returns 6.)
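If you actually need A's methods to be shielded from overrides in subclasses (which, as explained above, is not how Python dispatches by default), one option is name mangling. This is a hedged sketch of that approach, not part of the original answer:
class A:
    def __init__(self):
        self.__p = 3                 # name-mangled to _A__p
    def bar(self):
        return self.__p              # always reads _A__p, even on a B instance
    def foo(self):
        return self.bar()

class B(A):
    def __init__(self):
        super().__init__()
        self.__p = 6                 # name-mangled to _B__p; _A__p is untouched
    def bar(self):
        return super().bar()
    def foo(self):
        return super().foo()

b = B()
print(b.bar())   # 3
print(b.foo())   # 3: A.bar still sees its own private _A__p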

Self Attributes Live in Function Pointer?

Suppose I have a simple python3 class like so:
class Test():
    def __init__(self):
        self.a = 'a'
    def checkIsA(self, checkA):
        return self.a == checkA
And some further code such as:
def tester(func, item):
    return func(item)

testObject = Test()
print(tester(testObject.checkIsA, 'a')) # prints True
How is the function pointer(?) checkIsA still aware of its class member variables (defined by self) when used independently by another function?
I want to use this functionality in a program I'm writing but I'm worried I'm not understanding these semantics correctly.
testObject.checkIsA is what's called a bound method: it remembers the instance it was taken from, so that its self parameter is automatically filled in with that instance.
You can easily check the difference between a class function (an unbound function) and an instance method (a bound method); Python will quite happily tell you all the details:
testObject = Test()
print(Test.checkIsA) # <function Test.checkIsA at 0x00000000041236A8>
print(testObject.checkIsA) # <bound method Test.checkIsA of
# <__main__.Test object at 0x0000000004109390>>
You can simulate a call to testObject.checkIsA() directly through the unbound Test.checkIsA function by passing your instance, too, e.g.:
def tester(func, instance, item):
    return func(instance, item)

testObject = Test()
print(tester(Test.checkIsA, testObject, 'a')) # prints True
Or, with functools.partial:
import functools

def tester(func, item):
    return func(item)

testObject = Test()
print(tester(functools.partial(Test.checkIsA, testObject), 'a')) # prints True
And that's exactly what a bound instance method does for you behind the scenes: it supplies the first self argument from its stored __self__ value. You can check that, too:
testObject = Test()
print(testObject.checkIsA.__self__ is testObject) # True
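A short follow-up sketch (my own illustration, not from the answer): a bound method is essentially the plain function paired with the remembered instance, and types.MethodType lets you reassemble one yourself.
import types

class Test():
    def __init__(self):
        self.a = 'a'
    def checkIsA(self, checkA):
        return self.a == checkA

testObject = Test()
bound = testObject.checkIsA

print(bound.__func__ is Test.checkIsA)          # True: the underlying function
print(bound.__self__ is testObject)             # True: the remembered instance

rebuilt = types.MethodType(Test.checkIsA, testObject)
print(rebuilt('a'))                             # True, same result as bound('a')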

Keep object reference when class is deleted?

I have a class Foo which is instantiated an indefinite number of times during my program sequence. Like so:
def main():
    f = Foo()
    while f.run():
        del f
        f = Foo()
with run() being a method that evaluates a decisive condition for keeping the program alive.
Now, my Foo class creates two objects, a and b, in its __init__ method:
Foo class
class Foo:
    def __init__(self):
        a = A()
        b = B(a.var)
I'm looking for a way for a to be created only on the first Foo instantiation, and for that same first-created a to be reused by every subsequent Foo instantiation.
The problem is that b depends on a. I thought about a couple of solutions, from playing with __new__ and __init__ to overriding __del__ and using a global variable as a cache, but none of them worked.
Note: A needs to be in the same module as Foo.
Maybe using a class variable?
class Foo:
    a = None
    def __init__(self):
        if not Foo.a:
            Foo.a = A()
        b = B(Foo.a.var)
And function B needs to check whether a is None.
If I understand you correctly, you should be able to just make a a class variable.
class Foo:
    a = A()
    def __init__(self):
        b = B(Foo.a.var)
I'm afraid some of your requirements will make Foo extremely difficult to test. Instead, I would suggest that you move some of the dependencies from your constructor to a start class method that would be responsible for creating the initial A instance (at the same module as Foo) and then reusing that instance in a refresh method.
class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def start(cls):
        a = A()
        b = B(a.var)
        return cls(a, b)

    def refresh(self):
        b = B(self.a.var)
        return self.__class__(self.a, b)
Then, your main function would look something like:
def main():
    f = Foo.start()
    while f.run():
        f = f.refresh()
By overwriting the f variable, you are effectively deleting the reference to the old instance, which will eventually be garbage collected.
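As a quick sanity check of this design, here is a hedged sketch with stand-in A and B classes (hypothetical, just to show that refresh() keeps carrying the same A instance forward):
class A:
    def __init__(self):
        self.var = object()        # stand-in for whatever A really holds

class B:
    def __init__(self, var):
        self.var = var

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def start(cls):
        a = A()
        return cls(a, B(a.var))

    def refresh(self):
        return self.__class__(self.a, B(self.a.var))

f = Foo.start()
g = f.refresh()
print(f.a is g.a)      # True: the original A instance is reused
print(f.b is g.b)      # False: b is rebuilt on every refresh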

Monkey patching a @property

Is it at all possible to monkey patch the value of a @property of an instance of a class that I do not control?
class Foo:
    @property
    def bar(self):
        return here().be['dragons']
f = Foo()
print(f.bar) # baz
f.bar = 42 # MAGIC!
print(f.bar) # 42
Obviously the above would produce an error when trying to assign to f.bar. Is # MAGIC! possible in any way? The implementation details of the @property are a black box and not indirectly monkey-patchable. The entire method call needs to be replaced. It needs to affect a single instance only (class-level patching is okay if inevitable, but the changed behaviour must only selectively affect a given instance, not all instances of that class).
Subclass the base class (Foo) and change the single instance's class to the new subclass using the __class__ attribute:
>>> class Foo:
...     @property
...     def bar(self):
...         return 'Foo.bar'
...
>>> f = Foo()
>>> f.bar
'Foo.bar'
>>> class _SubFoo(Foo):
...     bar = 0
...
>>> f.__class__ = _SubFoo
>>> f.bar
0
>>> f.bar = 42
>>> f.bar
42
from module import ClassToPatch

def get_foo(self):
    return 'foo'

setattr(ClassToPatch, 'foo', property(get_foo))
To monkey patch a property, there is an even simpler way:
from module import ClassToPatch

def get_foo(self):
    return 'foo'

ClassToPatch.foo = property(get_foo)
Idea: replace the property descriptor with one that allows setting a value on particular objects. Unless a value has been explicitly set this way, the original property getter is called.
The problem is how to store the explicitly set values. We cannot use a dict keyed by the patched objects, since 1) they are not necessarily comparable by identity and 2) this prevents the patched objects from being garbage collected. For 1) we could write a Handle that wraps objects and overrides comparison semantics with identity, and for 2) we could use weakref.WeakKeyDictionary. However, I couldn't make these two work together.
Therefore we use a different approach of storing the explicitly set values on the object itself, using a "very unlikely attribute name". It is of course still possible that this name would collide with something, but that's pretty much inherent to languages such as Python.
This won't work on objects that lack a __dict__ slot. A similar problem would arise with weakrefs, though.
class Foo:
    @property
    def bar (self):
        return 'original'

class Handle:
    def __init__(self, obj):
        self._obj = obj
    def __eq__(self, other):
        return self._obj is other._obj
    def __hash__(self):
        return id (self._obj)

_monkey_patch_index = 0
_not_set = object ()

def monkey_patch (prop):
    global _monkey_patch_index, _not_set
    special_attr = '$_prop_monkey_patch_{}'.format (_monkey_patch_index)
    _monkey_patch_index += 1

    def getter (self):
        value = getattr (self, special_attr, _not_set)
        return prop.fget (self) if value is _not_set else value

    def setter (self, value):
        setattr (self, special_attr, value)

    return property (getter, setter)

Foo.bar = monkey_patch (Foo.bar)
f = Foo()
print (Foo.bar.fset)
print(f.bar) # original
f.bar = 42 # MAGIC!
print(f.bar) # 42
It looks like you need to move on from properties to the realms of data descriptors and non-data descriptors. Properties are just a specialised version of data descriptors. Functions are an example of non-data descriptors -- when you retrieve them from an instance they return a method rather than the function itself.
A non-data descriptor is just an instance of a class that has a __get__ method. The only difference with a data descriptor is that it has a __set__ method as well. Properties initially have a __set__ method that throws an error unless you provide a setter function.
You can achieve what you want really easily just by writing your own trivial non-data descriptor.
class nondatadescriptor:
    """Generic non-data descriptor decorator to replace @property with."""
    def __init__(self, func):
        self.func = func
    def __get__(self, obj, objclass):
        if obj is not None:
            # instance based access
            return self.func(obj)
        else:
            # class based access
            return self

class Foo:
    @nondatadescriptor
    def bar(self):
        return "baz"

foo = Foo()
another_foo = Foo()

assert foo.bar == "baz"
foo.bar = 42
assert foo.bar == 42
assert another_foo.bar == "baz"

del foo.bar
assert foo.bar == "baz"
print(Foo.bar)
What makes all this work is the logic under the hood of __getattribute__. I can't find the appropriate documentation at the moment, but the order of retrieval is (a short demo follows the list):
1. Data descriptors defined on the class are given the highest priority (objects with both __get__ and __set__), and their __get__ method is invoked.
2. Any attribute of the object itself (its instance __dict__).
3. Non-data descriptors defined on the class (objects with only a __get__ method).
4. All other attributes defined on the class.
5. Finally, the __getattr__ method of the object is invoked as a last resort (if defined).
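Here is a hedged little demo of that order (my own example): a data descriptor on the class wins over an entry in the instance __dict__, while a non-data descriptor loses to it.
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"

class Demo:
    d = DataDesc()
    n = NonDataDesc()

obj = Demo()
obj.__dict__['d'] = "instance value"   # plant instance entries directly,
obj.__dict__['n'] = "instance value"   # bypassing DataDesc.__set__

print(obj.d)   # "data descriptor": rule 1 beats the instance __dict__
print(obj.n)   # "instance value": the instance __dict__ beats a non-data descriptor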
You can also patch property setters. Building on @fralau's answer:
from module import ClassToPatch

def foo(self, new_foo):
    self._foo = new_foo

ClassToPatch.foo = ClassToPatch.foo.setter(foo)
In case someone needs to patch a property while being able to call the original implementation, here is an example:
@property
def _cursor_args(self, __orig=mongoengine.queryset.base.BaseQuerySet._cursor_args):
    # TODO: remove this hack when we upgrade MongoEngine
    # https://github.com/MongoEngine/mongoengine/pull/2160
    cursor_args = __orig.__get__(self)
    if self._timeout:
        cursor_args.pop("no_cursor_timeout", None)
    return cursor_args

mongoengine.queryset.base.BaseQuerySet._cursor_args = _cursor_args
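The same pattern with a made-up class, in case the MongoEngine-specific names get in the way (a sketch under the assumption that only read access is needed; Original and value are hypothetical names):
class Original:
    @property
    def value(self):
        return 41

_orig = Original.value                    # keep the original property descriptor

@property
def value(self, __orig=_orig):
    return __orig.__get__(self) + 1       # delegate to the original getter, then adjust

Original.value = value                    # replace the property on the class

print(Original().value)                   # 42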

In a Python class, what is the difference between creating a variable with the self syntax, and creating one without?

What is the difference between creating a variable using the self.variable syntax and creating one without?
I was testing it out and I can still access both from an instance:
class TestClass(object):
    j = 10
    def __init__(self):
        self.i = 20

if __name__ == '__main__':
    testInstance = TestClass()
    print testInstance.i
    print testInstance.j
However, if I swap the location of the self, it results in an error.
class TestClass(object):
    self.j = 10
    def __init__(self):
        i = 20

if __name__ == '__main__':
    testInstance = TestClass()
    print testInstance.i
    print testInstance.j
>>NameError: name 'self' is not defined
So I gather that self has a special role in initialization, but I just don't quite get what it is.
self refers to the current instance of the class. If you declare a variable outside of a function body, you're referring to the class itself, not an instance, and thus all instances of the class will share the same value for that attribute.
In addition, variables declared as part of the class (rather than part of an instance) can be accessed as part of the class itself:
class Foo(object):
    a = 1

one = Foo()
two = Foo()
Foo.a = 3
Since this value is class-wide, not only can you read it directly from the class:
print Foo.a # prints 3
But it will also change the value for every instance of the class:
print one.a # prints 3
print two.a # prints 3
Note, however, that this is only the case if you don't override a class variable with an instance variable. For instance, if you created the following:
class Bar(object):
    a = 1
    def __init__(self):
        self.a = 2
and then did the following:
one = Bar()
two = Bar()
two.a = 3
Then you'd get the following results:
print Bar.a # prints "1"
print one.a # prints "2"
print two.a # prints "3"
As noted in the comments, assigning to two.a creates an instance-local entry on that instance, which overrides the a from Bar; that is why Bar.a is still 1 but two.a is 3.
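A tiny sketch of what "instance-local entry" means here: the assignment lands in the instance __dict__, while the class __dict__ keeps its own value.
class Bar(object):
    a = 1
    def __init__(self):
        self.a = 2

two = Bar()
two.a = 3

print(Bar.__dict__['a'])   # 1: the class attribute is untouched
print(two.__dict__['a'])   # 3: the shadowing instance attribute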
j is a class variable, as pointed out by Amber. Now, if you come from a C++ background, self is akin to the this pointer. While Python doesn't deal with pointers, self plays a similar role of referring to the current instance of the class.
In the Python way, explicit is better than implicit. In C++, the availability of this is conventionally assumed inside each class. Python, on the other hand, explicitly passes self as the first argument to each of your instance methods.
Hence self is available only inside the scope of your instance methods, making it undefined at the place where you tried to use it.
Since you're made to explicitly pass self to instance methods, you could also call it something else if you want to:
>>> class Foo:
...     b = 20
...     def __init__(them):
...         them.beep = "weee"
...
>>> f = Foo()
>>> f.beep
'weee'
