Can't override __init__ of class from Cython extension - python

I am trying to subclass pysam's Tabixfile class and add additional attributes on instantiation.
class MyTabixfile(pysam.Tabixfile):
    def __init__(self, filename, mode='r', *args, **kwargs):
        super().__init__(filename, mode=mode, *args, **kwargs)
        self.x = 'foo'
When I try to instantiate my MyTabixfile subclass, I get a TypeError: object.__init__() takes no parameters:
>>> mt = MyTabixfile('actn2-oligos-forward.tsv.gz')
Traceback (most recent call last):
  File "<ipython-input-11-553015ac7d43>", line 1, in <module>
    mt = MyTabixfile('actn2-oligos-forward.tsv.gz')
  File "mytabix.py", line 4, in __init__
    super().__init__(filename, mode=mode, *args, **kwargs)
TypeError: object.__init__() takes no parameters
I also tried calling the Tabixfile constructor explicitly:
class MyTabixfile(pysam.Tabixfile):
    def __init__(self, filename, mode='r', *args, **kwargs):
        pysam.Tabixfile.__init__(self, filename, mode=mode, *args, **kwargs)
        self.x = 'foo'
but this still raises TypeError: object.__init__() takes no parameters.
This class is actually implemented in Cython; the constructor code is below:
cdef class Tabixfile:
    '''*(filename, mode='r')*

    opens a :term:`tabix file` for reading. A missing
    index (*filename* + ".tbi") will raise an exception.
    '''
    def __cinit__(self, filename, mode='r', *args, **kwargs):
        self.tabixfile = NULL
        self._open(filename, mode, *args, **kwargs)
I read through the Cython documentation on __cinit__ and __init__ which says
Any arguments passed to the constructor will be passed to both the
__cinit__() method and the __init__() method. If you anticipate
subclassing your extension type in Python, you may find it useful to
give the __cinit__() method * and ** arguments so that it can
accept and ignore extra arguments. Otherwise, any Python subclass
which has an __init__() with a different signature will have to
override __new__() as well as __init__(), which the writer of
a Python class wouldn’t expect to have to do.
The pysam developers did take care to add *args and **kwargs to the Tabixfile.__cinit__ method, and my subclass __init__ matches the signature of __cinit__, so I do not understand why I'm unable to override the initialization of Tabixfile.
I'm developing with Python 3.3.1, Cython v.0.19.1, and pysam v.0.7.5.

The documentation is a little confusing here, in that it assumes that you're familiar with using __new__ and __init__.
The __cinit__ method is roughly equivalent to a __new__ method in Python.*
Like __new__, __cinit__ is not called by your super().__init__; it's called before Python even gets to your subclass's __init__ method. The reason __cinit__ needs to handle the signature of your subclass __init__ methods is the exact same reason __new__ does.
If your subclass does explicitly call super().__init__, that looks for an __init__ method in a superclass—again, like __new__, a __cinit__ is not an __init__. So, unless you've also defined an __init__, it will pass through to object.
You can see the sequence with the following code.
cinit.pyx:
cdef class Foo:
    def __cinit__(self, a, b, *args, **kw):
        print('Foo.cinit', a, b, args, kw)

    def __init__(self, *args, **kw):
        print('Foo.init', args, kw)
init.py:
import pyximport; pyximport.install()
import cinit

class Bar(cinit.Foo):
    def __new__(cls, *args, **kw):
        print('Bar.new', args, kw)
        return super().__new__(cls, *args, **kw)

    def __init__(self, a, b, c, d):
        print('Bar.init', a, b, c, d)
        super().__init__(a, b, c, d)

b = Bar(1, 2, 3, 4)
When run, you'll see something like:
Bar.new (1, 2, 3, 4) {}
Foo.cinit 1 2 (3, 4) {}
Bar.init 1 2 3 4
Foo.init (1, 2, 3, 4) {}
So, the right fix here depends on what you're trying to do, but it's one of these:
1. Add an __init__ method to the Cython base class.
2. Remove the super().__init__ call entirely.
3. Change the super().__init__ to not pass any params.
4. Add an appropriate __new__ method to the Python subclass.
I suspect in this case it's #2 you want.
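For example, option #2 applied to the original subclass would be just this (a sketch; by the time __init__ runs, __cinit__ has already opened the file):

import pysam

class MyTabixfile(pysam.Tabixfile):
    def __init__(self, filename, mode='r', *args, **kwargs):
        # No super().__init__ call: Tabixfile defines no __init__, so the call
        # would fall through to object.__init__, which takes no parameters.
        self.x = 'foo'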
* It's worth noting that __cinit__ definitely isn't identical to __new__. Instead of getting a cls parameter, you get a partially-constructed self object (where you can trust __class__ and C attributes but not Python attributes or methods); the __new__ methods of all classes in the MRO have already been called before any __cinit__; the __cinit__ of your bases gets called automatically rather than manually; you don't get to return a different object from the one that's been requested; and so on. It's just that it's called before __init__, and expected to take pass-through parameters, in the same way that __new__ is.

I would have commented rather than posting an answer but I don't have enough StackOverflow foo as yet.
@abarnert's post is excellent and very helpful. I would just add a few pysam specifics here, as I have just done subclassing on pysam.AlignmentFile in a very similar way.
Option #4 was the cleanest/easiest choice which meant only changes in my own subclass __new__ to filter out the unknown params:
def __new__(cls, file_path, mode, label=None, identifier=None, *args, **kwargs):
    # Suck up label and identifier unknown to pysam.AlignmentFile.__cinit__
    return super().__new__(cls, file_path, mode, *args, **kwargs)
It should also be noted that the pysam file classes don't seem to have explicit __init__ methods, so you also need to omit the param pass-through, as it goes straight to object.__init__, which does not accept parameters:
def __init__(self, label=None, identifier=None, *args, **kwargs):
    # Handle subclass params/attrs here
    # pysam.AlignmentFile doesn't have an __init__ so passes straight through to
    # object which doesn't take params. __cinit__ via __new__ takes care of params
    super(pysam.AlignmentFile, self).__init__()
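Putting the two pieces together, a complete subclass sketch might look like this (MyAlignmentFile, label, and identifier are hypothetical names from my use case; I've kept the __init__ signature parallel to __new__ so the positional arguments bind the same way in both):

import pysam

class MyAlignmentFile(pysam.AlignmentFile):
    def __new__(cls, file_path, mode, label=None, identifier=None, *args, **kwargs):
        # Swallow label and identifier so AlignmentFile.__cinit__ never sees them
        return super().__new__(cls, file_path, mode, *args, **kwargs)

    def __init__(self, file_path, mode, label=None, identifier=None, *args, **kwargs):
        self.label = label
        self.identifier = identifier
        # No pass-through: the call lands on object.__init__, which takes no params
        super().__init__()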

Related

Python patching __new__ method

I am trying to patch the __new__ method of a class, and it is not working as I expect.
from contextlib import contextmanager

class A:
    def __init__(self, arg):
        print('A init', arg)

@contextmanager
def patch_a():
    new = A.__new__

    def fake_new(cls, *args, **kwargs):
        print('call fake_new')
        return new(cls, *args, **kwargs)

    # here I get error: TypeError: object.__new__() takes exactly one argument (the type to instantiate)
    A.__new__ = fake_new
    try:
        yield
    finally:
        A.__new__ = new

if __name__ == '__main__':
    A('foo')
    with patch_a():
        A('bar')
    A('baz')
I expect the following output:
A init foo
call fake_new
A init bar
A init baz
But after call fake_new is printed I get an error (see the comment in the code).
To me it seems like I just decorate the __new__ method and propagate all args unchanged.
It doesn't work, and the reason is obscure to me.
Also, if I write return new(cls) instead, then A('bar') works fine, but A('baz') breaks.
Can someone explain what is going on?
Python version is 3.8
You've run into a complicated part of Python object instantiation - the language opted for a design that allows one to create a custom __init__ method with parameters without having to touch __new__.
However, at the base of the class hierarchy, object, both __new__ and __init__ take exactly one parameter each.
IIRC, it goes this way: if your class has a custom __init__, you did not touch __new__, and extra parameters are passed on instantiation (parameters which would normally go to both __init__ and __new__), then those parameters are stripped from the call to __new__, so you don't have to customize it just to swallow the parameters consumed by __init__. The converse is also true: if your class has a custom __new__ with extra parameters and no custom __init__, those parameters are not passed to object.__init__.
With your design, Python sees a custom __new__ and passes it the same extra arguments that are passed to __init__ - and by using *args, **kwargs, you forward those to object.__new__, which accepts a single parameter - and you get the error you presented us.
The fix is to not pass those extra parameters to the original __new__ method - unless they are needed there - so you have to make the same check Python's type does when instantiating an object.
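A minimal illustration of that asymmetry (OnlyInit and Both are names I made up for this demo):

class OnlyInit:
    def __init__(self, arg):  # custom __init__, untouched __new__
        self.arg = arg

OnlyInit('ok')  # fine: the extra argument is stripped from the object.__new__ call

class Both(OnlyInit):
    def __new__(cls, *args, **kwargs):
        # forwards 'ok' to object.__new__, which takes only the type
        return super().__new__(cls, *args, **kwargs)

Both('ok')  # TypeError: object.__new__() takes exactly one argument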
And an interesting surprise to top it off: while making the example work, I found out that even if A.__new__ is deleted when restoring the patch, the class is still considered "touched" by CPython's type instantiation, and the arguments are passed through. In order to get your code working I needed to leave a permanent stub A.__new__ that forwards only the cls argument:
from contextlib import contextmanager

class A:
    def __init__(self, arg):
        print('A init', arg)

@contextmanager
def patch_a():
    new = A.__new__

    def fake_new(cls, *args, **kwargs):
        print('call fake_new')
        # object.__new__ must be called with the type only
        if new is object.__new__:
            return new(cls)
        return new(cls, *args, **kwargs)

    A.__new__ = fake_new
    try:
        yield
    finally:
        del A.__new__
        if new is not object.__new__:
            A.__new__ = new
        else:
            # leave a permanent stub that swallows the extra arguments
            A.__new__ = lambda cls, *args, **kw: object.__new__(cls)
        print(A.__new__)

if __name__ == '__main__':
    A('foo')
    with patch_a():
        A('bar')
    A('baz')
(I tried inspecting the original __new__ signature instead of the new is object.__new__ comparison - to no avail: object.__new__ signature is *args, **kwargs - possibly made so that it will never fail on static checking)
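For reference, the fixed version above should print something along these lines (the exact function repr will vary):

A init foo
call fake_new
A init bar
<function patch_a.<locals>.<lambda> at 0x...>
A init baz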

self lost when using partial inside a decorator

I'm trying to code a method of a class that uses a decorator from another class. The problem is that I need information stored in the class that contains the decorator (ClassWithDecorator.decorator_param). To achieve that I'm using partial, injecting self as the first argument, but when I do that the self from the class that uses the decorator "gets lost" somehow and I end up getting an error. Note that this does not happen if I remove partial() from my_decorator(), and "self" will be correctly stored inside *args.
See the code sample:
from functools import partial

class ClassWithDecorator:
    def __init__(self):
        self.decorator_param = "PARAM"

    def my_decorator(self, decorated_func):
        def my_callable(ClassWithDecorator_instance, *args, **kwargs):
            # Do something with decorator_param
            print(ClassWithDecorator_instance.decorator_param)
            return decorated_func(*args, **kwargs)

        return partial(my_callable, self)

decorator_instance = ClassWithDecorator()

class WillCallDecorator:
    def __init__(self):
        self.other_param = "WillCallDecorator variable"

    @decorator_instance.my_decorator
    def decorated_method(self):
        pass

WillCallDecorator().decorated_method()
I get
PARAM
Traceback (most recent call last):
  File "****/decorator.py", line 32, in <module>
    WillCallDecorator().decorated_method()
  File "****/decorator.py", line 12, in my_callable
    return decorated_func(*args, **kwargs)
TypeError: decorated_method() missing 1 required positional argument: 'self'
How can I pass the self corresponding to WillCallDecorator() into decorated_method() but at the same time pass information from its own class to my_callable()?
It seems that you may want to use partialmethod instead of partial:
From the docs:
class functools.partialmethod(func, /, *args, **keywords)
When func is a non-descriptor callable, an appropriate bound method is created dynamically. This behaves like a normal Python function when used as a method: the self argument will be inserted as the first positional argument, even before the args and keywords supplied to the partialmethod constructor.
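Applied to the question's code, that could look like this (a sketch; the parameter names willcall_self and decorator_self are mine, and my_callable's signature changes because the instance the method is called on now arrives first):

from functools import partialmethod

class ClassWithDecorator:
    def __init__(self):
        self.decorator_param = "PARAM"

    def my_decorator(self, decorated_func):
        def my_callable(willcall_self, decorator_self, *args, **kwargs):
            # willcall_self is inserted first by the method binding;
            # decorator_self is the argument given to partialmethod below
            print(decorator_self.decorator_param)
            return decorated_func(willcall_self, *args, **kwargs)

        return partialmethod(my_callable, self)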
It's so much simpler, though, just to use the self variable you already have. There is absolutely no reason to be using partial or partialmethod here at all:
class ClassWithDecorator:
    def __init__(self):
        self.decorator_param = "PARAM"

    def my_decorator(self, decorated_func):
        def my_callable(*args, **kwargs):
            # Do something with decorator_param
            print(self.decorator_param)
            return decorated_func(*args, **kwargs)

        return my_callable

decorator_instance = ClassWithDecorator()

class WillCallDecorator:
    def __init__(self):
        self.other_param = "WillCallDecorator variable"

    @decorator_instance.my_decorator
    def decorated_method(self):
        pass

WillCallDecorator().decorated_method()
Also, to answer your question about why your code didn't work: when you access something.decorated_method(), Python checks whether decorated_method is a function and, if so, turns the access internally into the call WillCallDecorator.decorated_method(something). But the value returned from partial is a functools.partial object, not a function, so that class lookup binding doesn't happen here.
In more detail, something.method(arg) is roughly equivalent to SomethingClass.method.__get__(something, SomethingClass)(arg) when something doesn't have an instance attribute method, its type SomethingClass does have the attribute, and the attribute has a __get__ method; the full set of steps for attribute lookup is quite complicated.
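A minimal demonstration of that difference (names are mine):

from functools import partial

def f(self):
    return "bound"

class C:
    meth = f           # plain functions are descriptors, so attribute access binds
    part = partial(f)  # partial objects are not, so no binding happens

c = C()
print(c.meth())  # works: f receives c as self
c.part()         # TypeError: f() missing 1 required positional argument: 'self'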

Using classes as decorators to decorate a method from another class [duplicate]

Consider this small example:
import datetime as dt

class Timed(object):
    def __init__(self, f):
        self.func = f

    def __call__(self, *args, **kwargs):
        start = dt.datetime.now()
        ret = self.func(*args, **kwargs)
        time = dt.datetime.now() - start
        ret["time"] = time
        return ret

class Test(object):
    def __init__(self):
        super(Test, self).__init__()

    @Timed
    def decorated(self, *args, **kwargs):
        print(self)
        print(args)
        print(kwargs)
        return dict()

    def call_deco(self):
        self.decorated("Hello", world="World")

if __name__ == "__main__":
    t = Test()
    ret = t.call_deco()
which prints
Hello
()
{'world': 'World'}
Why is the self parameter (which should be the Test obj instance) not passed as first argument to the decorated function decorated?
If I do it manually, like:
def call_deco(self):
    self.decorated(self, "Hello", world="World")
it works as expected. But then I'd have to know in advance whether a function is decorated or not, which defeats the whole purpose of decorators. What is the pattern to use here, or did I misunderstand something?
tl;dr
You can fix this problem by making the Timed class a descriptor and returning a partially applied function from __get__ which applies the Test object as one of the arguments, like this
class Timed(object):
    def __init__(self, f):
        self.func = f

    def __call__(self, *args, **kwargs):
        print(self)
        start = dt.datetime.now()
        ret = self.func(*args, **kwargs)
        time = dt.datetime.now() - start
        ret["time"] = time
        return ret

    def __get__(self, instance, owner):
        from functools import partial
        return partial(self.__call__, instance)
The actual problem
Quoting Python documentation for decorator,
The decorator syntax is merely syntactic sugar, the following two function definitions are semantically equivalent:
def f(...):
    ...
f = staticmethod(f)

@staticmethod
def f(...):
    ...
So, when you say,
@Timed
def decorated(self, *args, **kwargs):
it is actually
decorated = Timed(decorated)
only the function object is passed to Timed; the object to which it will eventually be bound is not passed along with it. So, when you invoke it like this
ret = self.func(*args, **kwargs)
self.func refers to the plain, unbound function object, and it is invoked with Hello as the first argument. That is why self prints as Hello.
How can I fix this?
Since you have no reference to the Test instance in Timed, the only way to do this is to make Timed a descriptor class. Quoting the documentation, Invoking descriptors section,
In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol: __get__(), __set__(), and __delete__(). If any of those methods are defined for an object, it is said to be a descriptor.
The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary. For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a) excluding metaclasses.
However, if the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead.
We can make Timed a descriptor, by simply defining a method like this
def __get__(self, instance, owner):
...
Here, self refers to the Timed object itself, instance refers to the actual object on which the attribute lookup is happening and owner refers to the class corresponding to the instance.
Now, when decorated is looked up on a Test instance, Timed's __get__ method will be invoked. Now, somehow, we need to pass the Test instance as the first argument (even before Hello). So, we create another partially applied function, whose first parameter will be the Test instance, like this
def __get__(self, instance, owner):
    from functools import partial
    return partial(self.__call__, instance)
Now, self.__call__ is a bound method (bound to Timed instance) and the second parameter to partial is the first argument to the self.__call__ call.
So, all these effectively translate like this
t.call_deco()
self.decorated("Hello", world="World")
Now self.decorated is actually a Timed(decorated) object (this will be referred to as TimedObject from now on). Whenever we access it, the __get__ method defined in it will be invoked, and it returns a partial function. You can confirm that like this
def call_deco(self):
    print(self.decorated)
    self.decorated("Hello", world="World")
would print
<functools.partial object at 0x7fecbc59ad60>
...
So,
self.decorated("Hello", world="World")
gets translated to
Timed.__get__(TimedObject, <Test obj>, Test)("Hello", world="World")
Since we return a partial function,
partial(TimedObject.__call__, <Test obj>)("Hello", world="World")
which is actually
TimedObject.__call__(<Test obj>, 'Hello', world="World")
So, <Test obj> also becomes a part of *args, and when self.func is invoked, the first argument will be the <Test obj>.
You first have to understand how functions become methods and how self is "automagically" injected.
Once you know that, the "problem" is obvious: you are decorating the decorated function with a Timed instance - IOW, Test.decorated is a Timed instance, not a function instance - and your Timed class does not mimic the function type's implementation of the descriptor protocol. What you want looks like this:
import types
import datetime as dt

class Timed(object):
    def __init__(self, f):
        self.func = f

    def __call__(self, *args, **kwargs):
        start = dt.datetime.now()
        ret = self.func(*args, **kwargs)
        time = dt.datetime.now() - start
        ret["time"] = time
        return ret

    def __get__(self, instance, cls):
        # Python 2 signature; on Python 3 use types.MethodType(self, instance)
        return types.MethodType(self, instance, cls)
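With that __get__ in place, the question's example should work unchanged (on Python 2, given the MethodType signature above); for instance:

t = Test()
t.call_deco()  # decorated() now prints the Test instance, then ('Hello',) and {'world': 'World'}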

Implementation of `Exception.__str__()` in Python

I've never fully understood exception handling in Python (or any language to be honest). I was experimenting with custom exceptions, and found the following behaviour.
class MyError(Exception):
    def __init__(self, anything):
        pass

me = MyError("iiiiii")
print(me)
Output:
iiiiii
I assume that print() calls Exception.__str__().
How does the base class Exception know to print iiiiii? The string "iiiiii" was passed to the constructor of MyError via the argument anything, but anything isn't stored anywhere in MyError at all!
Furthermore, the constructor of MyError does not call its superclass's (Exception's) constructor. So, how did print(me) print iiiiii?
In Python 3, the BaseException class has a __new__ method that stores the arguments in self.args:
>>> me.args
('iiiiii',)
You didn't override the __new__ method, only __init__. You'd need to override both to completely prevent self.args from being set, as both implementations happily set that attribute:
>>> class MyError(Exception):
...     def __new__(cls, *args, **kw):
...         return super().__new__(cls)  # ignoring args and kwargs!
...     def __init__(self, *args, **kw):
...         super().__init__()  # again ignoring args and kwargs
...
>>> me = MyError("iiiiii")
>>> me
MyError()
>>> print(me)

>>> me.args
()
In Python 2, exceptions do not implement __new__ and your sample would not print anything. See issue #1692335 as to why the __new__ method was added; basically to avoid issues like yours where the __init__ method does not also call super().__init__().
Note that __init__ is not a constructor; the instance is already constructed by that time, by __new__. __init__ is merely the initialiser.
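The stored args are also what the default __str__ renders, which is why iiiiii was printed. Roughly, as observed behavior (not the actual C implementation): no args gives an empty string, one arg gives str of that arg, and several args give str of the whole tuple:

print(str(Exception()))          # ''
print(str(Exception('a')))       # a
print(str(Exception('a', 'b')))  # ('a', 'b')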

"TypeError: object() takes no parameters" With python2 metaclass converted to python3

I'm converting some code from python2 to python3 and I'm hitting an error with a metaclass.
This is the working python2 code (simplified):
#!/usr/bin/env python2
# test2.py

class Meta(type):
    def __new__(mcs, name, bases, clsdict):
        new_class = type.__new__(mcs, name, bases, clsdict)
        return new_class

class Root(object):
    __metaclass__ = Meta

    def __init__(self, value=None):
        self.value = value
        super(Root, self).__init__()

class Sub(Root):
    def __init__(self, value=None):
        super(Sub, self).__init__(value=value)

    def __new__(cls, value=None):
        super(Sub, cls).__new__(cls, value)

if __name__ == '__main__':
    sub = Sub(1)
And here's the converted python3 code:
#!/usr/bin/env python3
# test3.py

class Meta(type):
    def __new__(mcs, name, bases, clsdict):
        new_class = type.__new__(mcs, name, bases, clsdict)
        return new_class

class Root(object, metaclass=Meta):
    def __init__(self, value=None):
        self.value = value
        super(Root, self).__init__()

class Sub(Root):
    def __init__(self, value=None):
        super(Sub, self).__init__(value=value)

    def __new__(cls, value=None):
        super(Sub, cls).__new__(cls, value)

if __name__ == '__main__':
    sub = Sub(1)
If I run python2 test2.py, it runs. If I do python3 test3.py, I get
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    sub = Sub(1)
  File "test.py", line 18, in __new__
    super(Sub, cls).__new__(cls, value)
TypeError: object() takes no parameters
This isn't a duplicate of the linked question, because in that one the asker wasn't invoking a simple class correctly. In this one I have code which worked in Python 2 and doesn't work after running 2to3 on it.
As described in depth by a comment in the Python 2 source code (as linked by user2357112 in a comment), Python considers it an error if you pass arguments to either object.__new__ or object.__init__ when both __init__ and __new__ have been overridden. If you override just one of those functions, the other one will ignore excess arguments, but if you override them both you're supposed to make sure you only pass on arguments that are appropriate.
In this case, your Root class overrides __init__ but not __new__, so the extra argument that gets passed to the inherited object.__new__ when you create an instance is ignored.
However, in Sub, you're overriding both functions, and Sub.__new__ passes the parameter value on to object.__new__ in its super call. This is where you get an exception.
It's technically an error in Python 2 as well as Python 3, but the Python developers decided that raising an exception in that situation would cause too much old code to break, so Python 2 only issues a warning (which is suppressed by default). Python 3 breaks backwards compatibility in several other ways, so breaking old code for this issue as well is not as big a deal.
Anyway, the proper way to fix your code is either to add a __new__ method to Root that accepts and suppresses the value argument (e.g. it doesn't pass it on to object.__new__), or to change Sub so that it doesn't pass the value to its parent at all (e.g. it just calls super(Sub, cls).__new__(cls)). You might want to think a bit about whether you actually need both __new__ and __init__ methods in Sub, since most classes only need to override one of them.
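For example, the second fix applied to test3.py might look like this (a sketch; note that the original Sub.__new__ also never returned the instance it created, which this version corrects):

class Sub(Root):
    def __init__(self, value=None):
        super(Sub, self).__init__(value=value)

    def __new__(cls, value=None):
        # value is consumed by __init__; don't forward it to object.__new__
        return super(Sub, cls).__new__(cls)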
