How to handle the pylint message: Warning: Method could be a function - python

I have a python class and ran pylint against it. One message it gave was:
Warning: Method could be a function
Is this telling me that it would be better to move this method out of the class because it doesn't use any instance variables?
In C# I would make this a static method. What's the most pythonic thing to do here?

Moving it to a function is common, if it doesn't touch the class at all.
If it manipulates class attributes, use the classmethod decorator:
@classmethod
def spam(cls, ...):
    # cls is the class, you can use it to get class attributes
classmethod and staticmethod (which is the same as the former, except that the method doesn't get a reference to the class from its first parameter) were added in Python 2.2, with the decorator syntax following in Python 2.4, so they arrived relatively late in the language's history.
As a result, some Python programmers are used to avoiding static and class methods.
Some hardcore Python programmers will tell you that these decorators just complicate things; some other people (usually former C# or Java programmers) will tell you that using a function isn't object-oriented enough.
I think it's just a matter of preference.
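As a minimal sketch of the three options discussed above (the Thermometer class and to_celsius function are hypothetical names, chosen only for illustration), the same small piece of logic can live in a module-level function, a class method or a static method:

```python
def to_celsius(fahrenheit):
    """Module-level function: touches neither the class nor an instance."""
    return (fahrenheit - 32) * 5 / 9


class Thermometer:
    unit = "C"  # class attribute

    def __init__(self, fahrenheit):
        self.fahrenheit = fahrenheit

    def reading(self):
        """Instance method: uses per-instance state through self."""
        return to_celsius(self.fahrenheit)

    @classmethod
    def describe_unit(cls):
        """Class method: receives the class and can read class attributes."""
        return f"degrees {cls.unit}"

    @staticmethod
    def is_freezing(celsius):
        """Static method: receives neither the class nor an instance."""
        return celsius <= 0
```

Pylint's message applies to methods like is_freezing before the decorator is added: nothing in the body uses self, so either the decorator or a plain function makes that explicit.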

Related

What is "static method" in python

I'm quite new to Python and can't understand what a static method is in Python (for example __new__()) and what it does. Can anyone explain it? Thanks a million.
Have you already read this?
https://en.wikipedia.org/wiki/Method_(computer_programming)
Especially this?
https://en.wikipedia.org/wiki/Method_(computer_programming)#Static_methods
Explanation
In OOP you define classes that you later on instantiate. A class is nothing more than a blueprint: Once you instantiate objects from a class your object will follow exactly the blueprint of your class. That means: If you define a field named "abc" in your class you will later on have a field "abc" in your object. If you define a method "foo()" in your class, you will later on have a method "foo()" to be invoked on your object.
Please note that this "on your object" is essential: You always instantiate a class and then you can invoke the method. This is the "normal" way.
A static method is different. While a normal method always requires an instance on which to invoke it, a static method does not. A static method exists independently of your instances (that's why it is named "static"). It is associated with the class definition itself, is therefore always available, and is invoked on the class rather than on an instance. It is completely independent of all instances.
That's a static method.
Python's implementation is a bit ... well ... simple. In the details there are deviations from the description above (for example, Python also lets you call a static method through an instance). But that does not make any difference: to be in line with OOP concepts you should always use methods exactly as described above.
Example
Let's give you an example:
class FooBar:
    def someMethod(self):
        print("abc")
This is a regular (instance) method. You use it like this:
myObj = FooBar()
myObj.someMethod()
If you have ...
myObjB = FooBar()
myObjB.someMethod()
... you have an additional instance and therefore invoking someMethod() on this second instance will be the invocation of a second someMethod() method - defined at the second object. This is because you instantiate objects before use so all instances follow the blueprint FooBar defined. All instances therefore receive some kind of copy of someMethod().
(In practice Python will use optimizations internally, so there actually is only one piece of code that implements your someMethod() in memory, but forget about this for now. To a programmer it appears as that every instance of a class will have a copy of the method someMethod(). And that's the level of abstraction that is relevant to us as this is the "surface" we work on. Deep within the implementation of a programming or script language things might be a bit different but this is not very relevant.)
Let's have a look at a static method:
class FooBar:
    @staticmethod
    def someStaticMethod():
        print("abc")
Such static methods can be invoked like this:
FooBar.someStaticMethod()
As you can see: no instance. You directly invoke this method in the context of the class itself. While regular methods work on a particular instance (they typically modify data within that instance), a static method does not. It could modify static (!) data, but typically it does not anyway.
Consider a static method a special case. It is rarely needed. What you typically want if you write code is not to implement a static method. Only in very specific situations it makes sense to implement a static method.
The self parameter
Please note that a standard "instance" method must always have self as its first argument. This is specific to Python. In the real world Python will (of course!) store your method only once in memory, even if you instantiate thousands of objects of your class. Consider this an optimization. If you then invoke your method on one of your thousands of instances, the same single piece of code is always called. But so that it can tell which particular object the method should work on, your instance is passed to this internally stored piece of code as the very first argument. This is the self argument. It is a kind of implicit argument and is always needed for regular (instance) methods. (Not for static methods: there you don't need an instance to invoke them.)
As this argument is implicit and always needed most programming languages hide it to the programmer (and handle this internally - under the hood - in the correct way). It does not really make much sense to expose this special argument anyway.
Unfortunately Python does not follow this principle. Python does not hide this argument which is implicitly required. (Python's incorporation of OOP concepts is a bit ... simple.) Therefore you see self almost everywhere in methods. In your mind you can ignore it, but you need to write it explicitly if you define your own classes. (Which is something you should do in order to structure your programs in a good way.)
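To make the role of self concrete, here is a small sketch (the Counter class is a hypothetical example, not taken from the question). It shows that calling a method on an instance is the same as calling the function stored on the class with the instance passed explicitly, and that all instances share that one function:

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


a = Counter()
b = Counter()

a.increment()          # Python passes `a` as self
Counter.increment(a)   # the same call, with self written out explicitly

# Both instances share one function object stored on the class ...
same_function = a.increment.__func__ is b.increment.__func__ is Counter.increment
# ... but each instance keeps its own state.
counts = (a.count, b.count)
```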
The static method __new__()
Python is quite special. While regular programming languages follow a strict and immutable concept of how to create instances of particular classes, Python is a bit different here. This behavior can be changed. This behavior is implemented in __new__(). So if you do this ...
myObj = FooBar()
... Python implicitly invokes FooBar.__new__() to create the instance, then invokes a constructor-like (instance) method named __init__() that you could (!) define in your class, and finally returns the fully initialized instance. This instance is then what is stored in myObj in this example here.
You could modify this behavior if you want. But this would require a very, very unusual use case. You will likely never have anything to do with __new__() itself in your entire work with Python. My advice: if you're new to Python, just ignore it.
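For the curious, the order of events during instantiation can be observed with a short sketch. The calls list is a hypothetical helper added only to record what happens, and the class body is an assumption for illustration:

```python
calls = []  # hypothetical log, used only to record the call order


class FooBar:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # Create the bare instance; object.__new__ takes only the class.
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


obj = FooBar(42)

# Python wraps __new__ in staticmethod automatically when the class is created.
new_is_static = isinstance(FooBar.__dict__["__new__"], staticmethod)
```

The first argument of __new__ is the class, passed explicitly by the caller, which is why it counts as a static method even though no decorator is written.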

How to use implementation inheritance?

How to use implementation inheritance in Python, that is to say public attributes x and protected attributes _x of the implementation inherited base classes becoming private attributes __x of the derived class?
In other words, in the derived class:
accessing the public attribute x or protected attribute _x should look up x or _x respectively like usual, except it should skip the implementation inherited base classes;
accessing the private attribute __x should look up __x like usual, except it should look up x and _x instead of __x for the implementation inherited base classes.
In C++, implementation inheritance is achieved by using the private access specifier in the base class declarations of a derived class, while the more common interface inheritance is achieved by using the public access specifier:
class A: public B, private C, private D, public E { /* class body */ };
For instance, implementation inheritance is needed to implement the class Adapter design pattern which relies on class inheritance (not to be confused with the object Adapter design pattern which relies on object composition) and consists in converting the interface of an Adaptee class into the interface of a Target abstract class by using an Adapter class that inherits both the interface of the Target abstract class and the implementation of the Adaptee class (cf. the Design Patterns book by Erich Gamma et al.):
Here is a Python program specifying what is intended, based on the above class diagram:
import abc

class Target(abc.ABC):
    @abc.abstractmethod
    def request(self):
        raise NotImplementedError

class Adaptee:
    def __init__(self):
        self.state = "foo"
    def specific_request(self):
        return "bar"

class Adapter(Target, private(Adaptee)):
    def request(self):
        # Should access self.__state and Adaptee.specific_request(self)
        return self.__state + self.__specific_request()

a = Adapter()

# Test 1: the implementation of Adaptee should be inherited
try:
    assert a.request() == "foobar"
except AttributeError:
    assert False

# Test 2: the interface of Adaptee should NOT be inherited
try:
    a.specific_request()
except AttributeError:
    pass
else:
    assert False
You don't want to do this. Python is not C++, nor is C++ Python. How classes are implemented is completely different and so will lead to different design patterns. You do not need to use the class adapter pattern in Python, nor do you want to.
The only practical way to implement the adapter pattern in Python is either by using composition, or by subclassing the Adaptee without hiding that you did so.
I say practical here because there are ways to sort of make it work, but this path would take a lot of work to implement and is likely to introduce hard to track down bugs, and would make debugging and code maintenance much, much harder. Forget about 'is it possible', you need to worry about 'why would anyone ever want to do this'.
I'll try to explain why.
I'll also tell you how the impractical approaches might work. I'm not actually going to implement these, because that's way too much work for no gain, and I simply don't want to spend any time on that.
But first we have to clear up several misconceptions here. There are some very fundamental gaps in your understanding of Python and how its model differs from the C++ model: how privacy is handled, and the compilation and execution philosophies. So let's start with those:
Privacy models
First of all, you can't apply C++'s privacy model to Python, because Python has no encapsulation privacy. At all. You need to let go of this idea, entirely.
Names starting with a single underscore are not actually private, not in the way C++ privacy works. Nor are they 'protected'. Using an underscore is just a convention; Python does not enforce access control. Any code can access any attribute on instances or classes, whatever naming convention was used. Instead, when you see a name that starts with an underscore you can assume that the name is not part of the public interface, that is, that these names can be changed without notice or consideration for backwards compatibility.
Quoting from the Python tutorial section on the subject:
“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.
It's a good convention, but not even something you can rely on, consistently. E.g. the collections.namedtuple() class generator generates a class with 5 different methods and attributes that all start with an underscore but are all meant to be public, because the alternative would be to place arbitrary restrictions on what attribute names you can give the contained elements, and making it incredibly hard to add additional methods in future Python versions without breaking a lot of code.
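As a brief illustration of the namedtuple exception (the Point type is a hypothetical example), the underscore-prefixed members below are documented, public API:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(1, 2)

fields = Point._fields        # names of the contained elements
moved = p._replace(x=10)      # new tuple with one field changed
as_dict = p._asdict()         # mapping of field names to values
rebuilt = Point._make([3, 4])  # build a Point from any iterable
```

The underscore here only keeps these helpers out of the way of user-chosen field names such as x or y; it does not signal an implementation detail.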
Names starting with two underscores (and none at the end), are not private either, not in a class encapsulation sense such as the C++ model. They are class-private names, these names are re-written at compile time to produce a per-class namespace, to avoid collisions.
In other words, they are used to avoid a problem very similar to the namedtuple issue described above: to remove limits on what names a subclass can use. If you ever need to design base classes for use in a framework, where subclasses should have the freedom to name methods and attributes without limit, that's where you use __name class-private names. The Python compiler will rewrite __attribute_name to _ClassName__attribute_name when used inside a class statement as well as in any functions that are being defined inside a class statement.
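A minimal sketch of that rewriting, using hypothetical Base and Child classes, shows that each class gets its own slot for the same __token name and that the mangled names remain reachable from outside:

```python
class Base:
    def __init__(self):
        self.__token = "base"     # stored as _Base__token

    def base_token(self):
        return self.__token       # compiled as self._Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"    # stored as _Child__token, no collision

    def child_token(self):
        return self.__token       # compiled as self._Child__token


c = Child()
names = sorted(vars(c))
# Nothing stops outside code from reading the mangled name directly.
outside_access = c._Base__token
```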
Note that C++ doesn't use names to indicate privacy. Instead, privacy is a property of each identifier, within a given namespace, as processed by the compiler. The compiler enforces access control; private names are not accessible and will lead to compilation errors.
Without a privacy model, your requirement where "public attributes x and protected attributes _x of the implementation inherited base classes becoming private attributes __x of the derived class" are not attainable.
Compilation and execution models
C++
C++ compilation produces binary machine code aimed at execution directly by your CPU. If you want to extend a class from another project, you can only do so if you have access to additional information, in the form of header files, to describe what API is available. The compiler combines information in the header files with tables stored with the machine code and your source code to build more machine code; e.g. inheritance across library boundaries is handled through virtualisation tables.
Effectively, there is very little left of the objects used to construct the program with. You generally don't create references to class or method or function objects, the compiler has taken those abstract ideas as inputs but the output produced is machine code that doesn't need most of those concepts to exist any more. Variables (state, local variables in methods, etc.) are stored either on the heap or on the stack, and the machine code accesses these locations directly.
Privacy is used to direct compiler optimisations, because the compiler can, at all times, know exactly what code can change what state. Privacy also makes virtualisation tables and inheritance from 3rd-party libraries practical, as only the public interface needs to be exposed. Privacy is an efficiency measure, primarily.
Python
Python, on the other hand, runs Python code using a dedicated interpreter runtime, itself a piece of machine code compiled from C code, which has a central evaluation loop that takes Python-specific op-codes to execute your code. Python source code is compiled into bytecode roughly at the module and function levels, stored as a nested tree of objects.
These objects are fully introspectable, using a common model of attributes, sequences and mappings. You can subclass classes without having to have access to additional header files.
In this model, a class is an object with references to base classes, as well as a mapping of attributes (which includes any functions which become bound methods through access on instances). Any code to be executed when a method is called on an instance is encapsulated in code objects attached to function objects stored in the class attribute mapping. The code objects are already compiled to bytecode, and interaction with other objects in the Python object model is through runtime lookups of references, with the attribute names used for those lookups stored as constants within the compiled bytecode if the source code used fixed names.
From the point of view of executing Python code, variables (state and local variables) live in dictionaries (the Python kind, ignoring the internal implementation as hash maps) or, for local variables in functions, in an array attached to the stack frame object. The Python interpreter translates access to these to access to values stored on the heap.
This makes Python slow, but also much more flexible when executing. You can not only introspect the object tree, most of the tree is writeable letting you replace objects at will and so change how the program behaves in nearly limitless ways. And again, there are no privacy controls enforced.
Why use class adapters in C++, and not in Python
My understanding is that experienced C++ coders will use a class adapter (using subclassing) over an object adapter (using composition), because they need to pass compiler-enforced type checks (they need to pass the instances to something that requires the Target class or a subclass thereof), and they need to have fine control over object lifetimes and memory footprints. So, rather than have to worry about the lifetime or memory footprint of an encapsulated instance when using composition, subclassing gives you more complete control over the instance lifetime of your adapter.
This is especially helpful when it might not be practical or even possible to alter the implementation of how the adaptee class would control instance lifetime. At the same time, you wouldn't want to deprive the compiler from optimisation opportunities offered by private and protected attribute access. A class that exposes both the Target and Adaptee interfaces offers fewer options for optimisation.
In Python you almost never have to deal with such issues. Python's object lifetime handling is straightforward, predictable and works the same for every object anyway. If lifetime management or memory footprints were to become an issue you'd probably already be moving the implementation to an extension language like C++ or C.
Next, most Python APIs do not require a specific class or subclass. They only care about the right protocols, that is, if the right methods and attributes are implemented. As long as your Adapter has the right methods and attributes, it'll do fine. See Duck Typing; if your adapter walks like a duck, and talks like a duck, it surely must be a duck. It doesn't matter if that same duck can also bark like a dog.
The practical reasons why you don't do this in Python
Let's move to practicalities. We'll need to update your example Adaptee class to make it a bit more realistic:
class Adaptee:
    def __init__(self, arg_foo=42):
        self.state = "foo"
        self._bar = arg_foo % 17 + 2 * arg_foo

    def _ham_spam(self):
        if self._bar % 2 == 0:
            return f"ham: {self._bar:06d}"
        return f"spam: {self._bar:06d}"

    def specific_request(self):
        return self._ham_spam()
This object not only has a state attribute, it also has a _bar attribute and a private method _ham_spam.
Now, from here on out I'm going to ignore the fact that your basic premise is flawed because there is no privacy model in Python, and instead re-interpret your question as a request to rename the attributes.
For the above example that would become:
state -> __state
_bar -> __bar
_ham_spam -> __ham_spam
specific_request -> __specific_request
You now have a problem, because the code in _ham_spam and specific_request has already been compiled. The implementation for these methods expects to find _bar and _ham_spam attributes on the self object passed in when called. Those names are constants in their compiled bytecode:
>>> import dis
>>> dis.dis(Adaptee._ham_spam)
  8           0 LOAD_FAST                0 (self)
              2 LOAD_ATTR                0 (_bar)
              4 LOAD_CONST               1 (2)
              6 BINARY_MODULO
  # .. etc. remainder elided ..
The LOAD_ATTR opcode in the above Python bytecode disassembly excerpt will only work correctly if the local variable self has an attribute named _bar.
Note that self can be bound to an instance of Adaptee as well as of Adapter, something you'd have to take into account if you wanted to change how this code operates.
So, it is not enough to simply rename method and attribute names.
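The same point can be checked without reading a disassembly. In this sketch the Adaptee class is repeated from above so the block runs on its own; the final rename is a deliberately naive, hypothetical attempt that only touches the instance dictionary:

```python
class Adaptee:
    def __init__(self, arg_foo=42):
        self.state = "foo"
        self._bar = arg_foo % 17 + 2 * arg_foo

    def _ham_spam(self):
        if self._bar % 2 == 0:
            return f"ham: {self._bar:06d}"
        return f"spam: {self._bar:06d}"

    def specific_request(self):
        return self._ham_spam()


# Attribute names used by each method are baked into its code object.
ham_spam_names = Adaptee._ham_spam.__code__.co_names
request_names = Adaptee.specific_request.__code__.co_names

a = Adaptee()
# Rename the attribute on the instance only; the compiled method still
# looks for the old name.
a.__dict__["__bar"] = a.__dict__.pop("_bar")
try:
    a.specific_request()
    broken = False
except AttributeError:
    broken = True
```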
Overcoming this problem would require one of two approaches:
intercepting all attribute access on both the class and instance levels to translate between the two models
rewriting the implementations of all methods
Neither of these is a good idea. Certainly neither of them is going to be more efficient or practical than creating a composition adapter.
Impractical approach #1: rewrite all attribute access
Python is dynamic, and you could intercept all attribute access on both the class and the instance levels. You need both, because you have a mix of class attributes (_ham_spam and specific_request), and instance attributes (state and _bar).
You can intercept instance-level attribute access by implementing all methods in the Customizing attribute access section (you don't need __getattr__ for this case). You'll have to be very careful, because you'll need access to various attributes of your instances while controlling access to those very attributes. You'll need to handle setting and deleting as well as getting. This lets you control most attribute access on instances of Adapter().
You would do the same at the class level by creating a metaclass for whatever class your private() adapter would return, and implementing the exact same hook methods for attribute access there. You'll have to take into account that your class can have multiple base classes, so you'd need to handle these as layered namespaces, using their MRO ordering. Attribute interactions with the Adapter class (such as Adapter._special_request to introspect the inherited method from Adaptee) will be handled at this level.
Sounds easy enough, right? Except that the Python interpreter has many optimisations to ensure it isn't far too slow for practical work. If you start intercepting every attribute access on instances, you will kill a lot of these optimisations (such as the method call optimisations introduced in Python 3.7). Worse, Python ignores the attribute access hooks for special method lookups.
And you have now injected a translation layer, implemented in Python, invoked multiple times for every interaction with the object. This will be a performance bottleneck.
Last but not least, to do this in a generic way, where you can expect private(Adaptee) to work in most circumstances is hard. Adaptee could have other reasons to implement the same hooks. Adapter or a sibling class in the hierarchy could also be implementing the same hooks, and implement them in a way that means the private(...) version is simply bypassed.
Invasive all-out attribute interception is fragile and hard to get right.
Impractical approach #2: rewriting the bytecode
This goes down the rabbit hole quite a bit further. If attribute rewriting isn't practical, how about rewriting the code of Adaptee?
Yes, you could, in principle, do this. There are tools available to directly rewrite bytecode, such as codetransformer. Or you could use the inspect.getsource() function to read the on-disk Python source code for a given function, then use the ast module to rewrite all attribute and method access, then compile the resulting updated AST to bytecode. You'd have to do so for all methods in the Adaptee MRO, and produce a replacement class dynamically that'll achieve what you want.
This, again, is not easy. The pytest project does something like this, they rewrite test assertions to provide much more detailed failure information than otherwise possible. This simple feature requires a 1000+ line module to achieve, paired with a 1600-line test suite to ensure that it does this correctly.
And what you've then achieved is bytecode that doesn't match the original source code, so anyone having to debug this code will have to deal with the fact that the source code the debugger sees doesn't match up with what Python is executing.
You'll also lose the dynamic connection with the original base class. Direct inheritance without code rewriting lets you dynamically update the Adaptee class, rewriting the code forces a disconnect.
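To give a feel for the ast route, here is a deliberately tiny sketch, not a workable implementation. The SOURCE string stands in for what inspect.getsource() would return, and the RENAMES table and the _hidden_ prefix are assumptions made for this example (a single underscore is used so that compile-time mangling of __names does not get in the way):

```python
import ast

# Hypothetical source text; in practice it would come from inspect.getsource().
SOURCE = '''
class Adaptee:
    def __init__(self):
        self.state = "foo"

    def specific_request(self):
        return self.state + "bar"
'''

# Hypothetical rename table for this sketch.
RENAMES = {
    "state": "_hidden_state",
    "specific_request": "_hidden_specific_request",
}


class RenameMembers(ast.NodeTransformer):
    """Rename attribute accesses and method definitions listed in RENAMES."""

    def visit_Attribute(self, node):
        self.generic_visit(node)
        node.attr = RENAMES.get(node.attr, node.attr)
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = RENAMES.get(node.name, node.name)
        return node


tree = RenameMembers().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, filename="<rewritten Adaptee>", mode="exec"), namespace)
Rewritten = namespace["Adaptee"]

obj = Rewritten()
```

Even this toy ignores getattr() calls with string names, class-level access, base classes in the MRO and any outside code that touches the attributes, which is where the real effort would go.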
Other reasons these approaches can't work
I've ignored a further issue that neither of the above approaches can solve. Because Python doesn't have a privacy model, there are plenty of projects out there where code interacts with class state directly.
E.g., what if your Adaptee() implementation relies on a utility function that will try to access state or _bar directly? It's part of the same library, the author of that library would be well within their rights to assume that accessing Adaptee()._bar is safe and normal. Neither attribute intercepting nor code rewriting will fix this issue.
I also ignored the fact that isinstance(a, Adaptee) will still return True, but if you have hidden its public API by renaming, you have broken that contract. For better or worse, Adapter is a subclass of Adaptee.
TLDR
So, in summary:
Python has no privacy model. There is no point in trying to enforce one here.
The practical reasons that necessitate the class adapter pattern in C++ don't exist in Python.
Neither dynamic attribute proxying nor code transformation is going to be practical in this case, and both introduce more problems than they solve.
You should instead use composition, or just accept that your adapter is both a Target and an Adaptee and so use subclassing to implement the methods required by the new interface without hiding the adaptee interface:
class CompositionAdapter(Target):
    def __init__(self, adaptee):
        self._adaptee = adaptee

    def request(self):
        return self._adaptee.state + self._adaptee.specific_request()


class SubclassingAdapter(Target, Adaptee):
    def request(self):
        return self.state + self.specific_request()
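For completeness, a self-contained sketch of how the two adapters behave, reusing the Target and Adaptee definitions from the question. Both satisfy the Target interface; they differ only in whether the Adaptee interface stays visible:

```python
import abc


class Target(abc.ABC):
    @abc.abstractmethod
    def request(self):
        raise NotImplementedError


class Adaptee:
    def __init__(self):
        self.state = "foo"

    def specific_request(self):
        return "bar"


class CompositionAdapter(Target):
    def __init__(self, adaptee):
        self._adaptee = adaptee

    def request(self):
        return self._adaptee.state + self._adaptee.specific_request()


class SubclassingAdapter(Target, Adaptee):
    def request(self):
        return self.state + self.specific_request()


composed = CompositionAdapter(Adaptee())
subclassed = SubclassingAdapter()
```

With composition the adaptee's methods are simply not attributes of the adapter, which is as close to the requested hiding as Python gets without extra machinery.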
Python doesn't have a way of defining private members like you've described (docs).
You could use encapsulation instead of inheritance and call the method directly, as you noted in your comment. This would be my preferred approach, and it feels the most "pythonic".
class Adapter(Target):
    def request(self):
        return Adaptee.specific_request(self)
In general, Python's approach to classes is much more relaxed than what is found in C++. Python supports duck-typing, so there is no requirement to subclass Adaptee, as long as the interface of Target is satisfied.
If you really want to use inheritance, you could override interfaces you don't want exposed to raise an AttributeError, and use the underscore convention to denote private members.
class Adaptee:
    def specific_request(self):
        return "foobar"

    # make "private" copy
    _specific_request = specific_request


class Adapter(Target, Adaptee):
    def request(self):
        # call "private" implementation
        return self._specific_request()

    def specific_request(self):
        raise AttributeError()
This question has more suggestions if you want alternatives for faking private methods.
If you really wanted true private methods, you could probably implement a metaclass that overrides object.__getattribute__. But I wouldn't recommend it.

Python: is there a use case for changing an instance's class?

Related: Python object conversion
I recently learned that Python allows you to change an instance's class like so:
class Robe:
    pass

class Dress:
    pass

r = Robe()
r.__class__ = Dress
I'm trying to figure out whether there is a case where 'transmuting' an object like this can be useful. I've messed around with this in IDLE, and one thing I've noticed is that assigning a different class doesn't call the new class's __init__ method, though this can be done explicitly if needed.
Virtually every use case I can think of would be better served by composition, but I'm a coding newb so what do I know. ;)
There is rarely a good reason to do this for unrelated classes, like Robe and Dress in your example. Without a bit of work, it's hard to ensure that the object you get in the end is in a sane state.
However, it can be useful when inheriting from a base class, if you want to use a non-standard factory function or constructor to build the base object. Here's an example:
class Base(object):
    pass

def base_factory():
    return Base()  # in real code, this would probably be something opaque

class Derived(Base):
    def __new__(cls):
        self = base_factory()     # get an instance of Base
        self.__class__ = Derived  # and turn it into an instance of Derived
        return self
In this example, the Derived class's __new__ method wants to construct its object using the base_factory method which returns an instance of the Base class. Often this sort of factory is in a library somewhere, and you can't know for certain how it's making the object (you can't just call Base() or super(Derived, cls).__new__(cls) yourself to get the same result).
The instance's __class__ attribute is rewritten so that the result of calling Derived.__new__ will be an instance of the Derived class, which ensures that it will have the Derived.__init__ method called on it (if such a method exists).
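A runnable variant of this idea is sketched below. The origin and initialised attributes are hypothetical additions that make the effect observable, and cls is assigned instead of the hard-coded Derived, which is a small deviation from the example above:

```python
class Base:
    def __init__(self):
        self.origin = "factory"   # state set up by the opaque factory


def base_factory():
    return Base()  # stands in for a library call we cannot replicate


class Derived(Base):
    def __new__(cls):
        self = base_factory()   # get an instance of Base
        self.__class__ = cls    # and turn it into an instance of Derived
        return self

    def __init__(self):
        # Runs because __new__ returned an instance of Derived.
        self.initialised = True


d = Derived()
```

The object keeps the state the factory gave it and then receives the usual __init__ call for its new class.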
I remember using this technique ages ago to “upgrade” existing objects after recognizing what kind of data they hold. It was a part of an experimental XMPP client. XMPP uses many short XML messages (“stanzas”) for communication.
When the application received a stanza, it was parsed into a DOM tree. Then the application needed to recognize what kind of stanza it is (a presence stanza, message, automated query etc.). If, for example, it was recognized as a message stanza, the DOM object was “upgraded” to a subclass that provided methods like “get_author”, “get_body” etc.
I could of course just make a new class to represent a parsed message, make new object of that class and copy the relevant data from the original XML DOM object. There were two benefits of changing object's class in-place, though. Firstly, XMPP is a very extensible standard, and it was useful to still have an easy access to the original DOM object in case some other part of the code found something useful there, or while debugging. Secondly, profiling the code told me that creating a new object and explicitly copying data is much slower than just reusing the object that would be quickly destroyed anyway—the difference was enough to matter in XMPP, which uses many short messages.
I don't think any of these reasons justifies the use of this technique in production code, unless maybe you really need the (not that big) speedup in CPython. It's just a hack which I found useful to make code a bit shorter and faster in the experimental application. Note also that this technique will easily break JIT engines in non-CPython implementations, making the code much slower!

Is this normal behaviour for an OO language?

I've defined this class:
class RequiredFormSet(BaseFormSet):
    def __init__(self, *args, **kwargs):
        super(RequiredFormSet, self).__init__(*args, **kwargs)
And overridden this method:
    def total_form_count(self):
        return self._total_form_count
It so happens that super(...).__init__ uses total_form_count() somewhere in that function. It's calling my function rather than the one defined in the base class.
I guess I thought because I called super() it would use its own stuff, but apparently in Python that's not true. Is this the way it works in other languages, like C#? If I call the base constructor, it will still call all the derived functions from there?
Yes, this is typical OOP behavior (polymorphism) to have subclass methods invoked by dynamic dispatch. This is part of the reason why C# requires the programmer to mark an overridable method as virtual. I'm sure you're familiar with this notion in general, and the surprise mainly comes from the fact that this is happening in a constructor.
As you have observed, this can be very dangerous in constructors because a superclass's constructor can invoke a subclass's method that may rely on properties initialized in the subclass's constructor. This problem is explicitly noted in Effective Java, and you can read more about it here: What's wrong with overridable method calls in constructors?
This is normal behavior. Note the first argument: self. That's a reference to the object the methods are being called on, so even when you call a superclass method, any overridden methods that method calls will be the subclass methods.
The only way I know of to force it to use a superclass method is with an unbound reference, i.e. SuperClass.overriddenMethod(self, param1, param2)...
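As a rough illustration of the difference between the two call styles (Parent, Child, and greet are made-up names for this sketch):

```python
class Parent:
    def greet(self):
        return "parent"

    def via_self(self):
        # Looked up on type(self), so an override wins.
        return self.greet()

    def via_class(self):
        # Names the class explicitly, so overrides are bypassed.
        return Parent.greet(self)


class Child(Parent):
    def greet(self):
        return "child"


c = Child()
print(c.via_self())
print(c.via_class())
```

Here via_self() picks up Child.greet, while via_class() always runs Parent.greet.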
This is normal behaviour in Python. C# behaves the same way (for virtual functions); C++ does not (some people consider that a design flaw of C++). In C++, it makes a difference whether you call an overridden virtual function from the constructor or from another member function. The reason is that while the constructor of a superclass runs, the object's V-table is still that of the superclass and not yet that of the derived class.
Have you heard of polymorphism? If not, you have no idea what OOP is about and should look it up.
The self the base class constructor uses is of course an instance of the derived class (the same self), so when it calls self.m(), the implementation of m is dispatched dynamically. Some OO languages require explicit annotation of methods that are dispatched dynamically (virtual keyword) - although (as @Doc Brown pointed out) it doesn't work in constructors specifically in C++ - while others make it the default. Anyway, polymorphism is an essential part of OOP and although it's possible in some languages to get static dispatch, polymorphism is the only option in many languages and the generally preferred way in all others. So yes, this is the normal behaviour.
In C++, you can choose between the two behaviors based on whether total_form_count is declared virtual or not. But in Python, all methods behave like virtual: the object will always use the methods from the actual type of the object.
In C++ the constructor is a special case: calls to virtual methods made from it invoke the methods of the class being constructed (not the one at the leaf of the virtual chain).
Yes, it's a behaviour called polymorphism. For dynamically dispatched methods, lookup starts from the class of the object instance, not from the class where the calling code resides.

Which is more pythonic, factory as a function in a module, or as a method on the class it creates?

I have some Python code that creates a Calendar object based on parsed VEvent objects from an iCalendar file.
The calendar object just has a method that adds events as they get parsed.
Now I want to create a factory function that creates a calendar from a file object, path, or URL.
I've been using the iCalendar python module, which implements a factory function as a class method directly on the Class that it returns an instance of:
cal = icalendar.Calendar.from_string(data)
From what little I know about Java, this is a common pattern in Java code, though I seem to find more references to a factory method being on a different class than the class you actually want to instantiate instances from.
The question is, is this also considered Pythonic? Or is it considered more Pythonic to just create a module-level function as the factory?
[Note. Be very cautious about separating "Calendar" a collection of events, and "Event" - a single event on a calendar. In your question, it seems like there could be some confusion.]
There are many variations on the Factory design pattern.
A stand-alone convenience function (e.g., calendarMaker(data))
A separate class (e.g., CalendarParser) which builds your target class (Calendar).
A class-level method (e.g., Calendar.from_string).
These have different purposes. All are Pythonic, the questions are "what do you mean?" and "what's likely to change?" Meaning is everything; change is important.
Convenience functions are Pythonic. Languages like Java can't have free-floating functions; you must wrap a lonely function in a class. Python allows you to have a lonely function without the overhead of a class. A function is relevant when your constructor has no state changes or alternate strategies or any memory of previous actions.
Sometimes folks will define a class and then provide a convenience function that makes an instance of the class, sets the usual parameters for state and strategy and any other configuration, and then calls the single relevant method of the class. This gives you both the statefulness of class plus the flexibility of a stand-alone function.
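A small sketch of that class-plus-convenience-function arrangement. SummaryExtractor and extract_summaries are hypothetical names, and the line matching is a toy, not real iCalendar parsing:

```python
class SummaryExtractor:
    """Hypothetical stateful class: configuration plus one main method."""

    def __init__(self, prefix="SUMMARY:"):
        self.prefix = prefix
        self.summaries = []

    def feed(self, text):
        # Collect the text after the prefix on every matching line.
        for line in text.splitlines():
            if line.startswith(self.prefix):
                self.summaries.append(line[len(self.prefix):].strip())
        return self.summaries


def extract_summaries(text):
    """Convenience function: build with the usual settings, call once."""
    return SummaryExtractor().feed(text)


result = extract_summaries("BEGIN:VEVENT\nSUMMARY: Lunch\nEND:VEVENT\n")
print(result)
```

Callers who need a different prefix or want to feed several chunks can still use the class directly.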
The class-level method pattern is used, but it has limitations. One, it's forced to rely on class-level variables. Since these can be confusing, a complex constructor as a static method runs into problems when you need to add features (like statefulness or alternative strategies.) Be sure you're never going to expand the static method.
Two, it's more-or-less irrelevant to the rest of the class methods and attributes. This kind of from_string is just one of many alternative encodings for your Calendar objects. You might have a from_xml, from_JSON, from_YAML and on and on. None of this has the least relevance to what a Calendar IS or what it DOES. These methods are all about how a Calendar is encoded for transmission.
What you'll see in the mature Python libraries is that factories are separate from the things they create. Encoding (as strings, XML, JSON, YAML) is subject to a great deal of more-or-less random change. The essential thing, however, rarely changes.
Separate the two concerns. Keep encoding and representation as far away from state and behavior as you can.
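One way this separation might look, as a sketch only. The Calendar class here is a hypothetical minimal version and calendar_from_json is an invented helper; only the standard json module is real:

```python
import json


class Calendar:
    """State and behavior only: knows nothing about encodings."""

    def __init__(self):
        self.events = []

    def add_event(self, event):
        self.events.append(event)


def calendar_from_json(text):
    """Decoding lives outside the class, so formats can change freely."""
    cal = Calendar()
    for event in json.loads(text):
        cal.add_event(event)
    return cal


cal = calendar_from_json('["lunch", "dentist"]')
print(cal.events)
```

Adding another encoding then means adding another function, without touching Calendar.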
It's pythonic not to think about esoteric difference in some pattern you read somewhere and now want to use everywhere, like the factory pattern.
Most of the time you would think of a @staticmethod as a solution, it's probably better to use a module function. The exception is when you put multiple classes in one module and each has a different implementation of the same interface; then it's better to use a @staticmethod.
Ultimately, whether you create your instances with a @staticmethod or with a module function makes little difference.
I'd probably use the initializer ( __init__ ) of a class because one of the more accepted "patterns" in python is that the factory for a class is the class initialization.
IMHO a module-level function is a cleaner solution. It sits behind the Python module system, which gives it a unique namespace prefix, something the "factory pattern" is commonly used for.
The factory pattern has its own strengths and weaknesses. However, choosing one way to create instances usually has little pragmatic effect on your code.
A staticmethod rarely has value, but a classmethod may be useful. It depends on what you want the class and the factory function to actually do.
A factory function in a module would always make an instance of the 'right' type (where 'right' in your case is the 'Calendar' class always, but you might also make it dependent on the contents of what it is creating the instance out of.)
Use a classmethod if you wish to make it dependent not on the data, but on the class you call it on. A classmethod is like a staticmethod in that you can call it on the class, without an instance, but it receives the class it was called on as first argument. This allows you to actually create an instance of that class, which may be a subclass of the original class. An example of a classmethod is dict.fromkeys(), which creates a dict from a list of keys and a single value (defaulting to None.) Because it's a classmethod, when you subclass dict you get the 'fromkeys' method entirely for free. Here's an example of how one could write dict.fromkeys() oneself:
class dict_with_fromkeys(dict):
    @classmethod
    def fromkeys(cls, keys, value=None):
        # cls is the class the method was called on (possibly a subclass)
        self = cls()
        for key in keys:
            self[key] = value
        return self
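To see the "for free" part in action, here is a short sketch. The Calendar, WorkCalendar, and MyDict classes and the from_lines method are hypothetical; dict.fromkeys is the real built-in:

```python
class Calendar:
    def __init__(self):
        self.events = []

    @classmethod
    def from_lines(cls, lines):
        cal = cls()            # cls is whichever class the call was made on
        cal.events.extend(lines)
        return cal


class WorkCalendar(Calendar):
    pass                       # inherits from_lines unchanged


class MyDict(dict):
    pass                       # inherits the built-in fromkeys


work = WorkCalendar.from_lines(["standup", "review"])
d = MyDict.fromkeys(["a", "b"])

print(type(work).__name__, type(d).__name__)
```

Both factories return an instance of the subclass they were called on, which a staticmethod or a module-level function hard-coded to Calendar would not do.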
