I have a question about what I see as a potential bad habit when using inheritance in Python.
Suppose I have a base class:
class FourLeggedAnimal():
    def __init__(self, name):
        self.name = name
        self.number_of_legs = 4
and two daughter classes
class Cat(FourLeggedAnimal):
    def __init__(self, name):
        super().__init__(name)

    def claw_the_furniture(self):
        for leg in range(self.number_of_legs):
            print("scratch")

class Dog(FourLeggedAnimal):
    def __init__(self, name):
        super().__init__(name)

    def run_in_sleep(self):
        for leg in range(self.number_of_legs):
            self.move_leg(leg)

    def move_leg(self, i):
        pass
For the purposes of this example, I intend to keep FourLeggedAnimal in a different file than Cat. For someone reading the code for the Cat or Dog class, the number_of_legs attribute is used but not defined in the file. My understanding is that it is best not to have variables whose definitions are opaque (which is why it's best to avoid from x import *).
I see the alternative to be repeating the definition self.number_of_legs in both daughter classes, but that defeats the purpose of inheritance.
Is there a best-practice to deal with this kind of situation?
Normally, class variables are used for this purpose.
class FourLeggedAnimal():
    number_of_legs = 4  # class variable

    def __init__(self, name):
        self.name = name

class Cat(FourLeggedAnimal):
    def __init__(self, name):
        super().__init__(name)

    def claw_the_furniture(self):
        for leg in range(self.number_of_legs):
            print("scratch")

class Dog(FourLeggedAnimal):
    def __init__(self, name):
        super().__init__(name)

    def run_in_sleep(self):
        for leg in range(self.number_of_legs):
            self.move_leg(leg)

    def move_leg(self, i):
        pass
Note that even if these classes are in different files, the attribute is part of the parent's public API and is knowable by the subclasses. Also, the class name, "FourLeggedAnimal" does a great job of communicating what the number of legs would be.
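As a rough illustration of how that lookup behaves, here is a minimal sketch. The ThreeLeggedCat subclass is hypothetical and exists only to show an override; it is not part of the question.

```python
class FourLeggedAnimal:
    number_of_legs = 4  # class variable, shared by every instance

    def __init__(self, name):
        self.name = name


class Cat(FourLeggedAnimal):
    def claw_the_furniture(self):
        # Looked up on the instance first, then on Cat, then on FourLeggedAnimal.
        return ["scratch"] * self.number_of_legs


class ThreeLeggedCat(Cat):  # hypothetical subclass, for illustration only
    number_of_legs = 3  # overrides the inherited class variable


tom = Cat("Tom")
tripod = ThreeLeggedCat("Tripod")
print(Cat.number_of_legs, len(tom.claw_the_furniture()), len(tripod.claw_the_furniture()))
```

The value is readable from the class itself, without creating an instance, and nothing is stored in the instance's own __dict__ unless someone assigns to it.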
My understanding is that it is best not to have variables whose definitions are opaque (which is why it's best to avoid from x import *).
I think perhaps you are misunderstanding the source of this advice. It may even be a mix of different pieces of advice. I'll try to explain what I think might have been the underlying ideas people were trying to convey.
Firstly, it's pretty widely agreed that from x import * is best avoided in Python. This is because it makes it hard for readers to find out where a name comes from, or indeed whether it's defined at all. It also confuses some code analysis tools. It's the only way that a (non-builtin) name will normally get into a top-level namespace without appearing in the source code and being easy to search for. As far as this advice goes, it's only for this case. You could barely write Python code at all if you couldn't use fields and methods on objects, and you generally have a clear breadcrumb trail to follow. (More so if you're using type annotations.)
However, you may also be thinking of the principle of encapsulation. In object-oriented programming it's considered preferable to separate the interface from the implementation of your objects. You make the interface as small, simple and clear as you can and hide away the implementation from the code using the objects. In this way you can reason about and change the implementation in isolation, with confidence that doing so won't affect other code. This principle is applied even between base classes and sub-classes - the sub-class shouldn't "know" anything about the base class that it doesn't need to. Now, modifying variables, and to a lesser extent reading modifiable variables requires knowing an awful lot about what expectations the base class has for their values, their relationship with other state and when it's possible/permissible for them to change. Depending on them can make it much harder to safely change the base class.
Now, Python does have more flexibility than some other languages in this respect. In Python you can seamlessly replace a variable with a property, and thus make "reading" and "setting" a field into methods that you can implement however you want. In other languages once a sub-class starts using a field exposed by a base class it is impossible to refactor the base class to remove the field or add any extra behaviour when it is accessed, unless you also update all the sub-classes. So it's a bit less of a concern. Or rather, there's no particular reason to treat fields differently from methods.
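To make that concrete, here is a small sketch of swapping the plain attribute for a property without touching the subclass. The _legs_lost field is an invented piece of internal state, used only so the property has something to compute.

```python
class FourLeggedAnimal:
    def __init__(self, name):
        self.name = name
        self._legs_lost = 0  # hypothetical internal state, not in the question

    @property
    def number_of_legs(self):
        # Subclasses still write self.number_of_legs exactly as before;
        # the value is now computed on every access.
        return 4 - self._legs_lost


class Dog(FourLeggedAnimal):
    def legs_to_move(self):
        return list(range(self.number_of_legs))


rex = Dog("Rex")
print(rex.legs_to_move())
```

Because no setter is defined, assigning to rex.number_of_legs raises AttributeError, so the subclass gets a read-only view of the value.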
With all this in mind, the question becomes - what interface is your base class presenting to its sub-classes? Does it support them setting as well as reading this field? Can you reduce the size and complexity of the interface between the two classes without making your code more complex? An interface is simpler and easier to reason about if it is read-only, and more so if it does not involve mutable state at all. Where possible the base class should not give the sub-class any unnecessary opportunities to break its invariants. (I.e. its expectations about its own state.) In Python these things are more often achieved through convention (e.g. fields and methods beginning with an underscore are considered not to be part of the public interface unless documented otherwise) and documentation than through language features.
I want to create a configuration class with a cascading feature. What do I mean by this? Let's say we have a configuration class like this:
class BaseConfig(metaclass=ConfigMeta, ...):
    def getattr():
        return 'default values provided by the metaclass'

class Config(BaseConfig):
    class Embedding(BaseConfig, size=200):
        class WordEmbedding(Embedding):
            size = 300
When I use this in code, I will access the configuration as follows:
def function(Config, blah, blah):
    word_embedding_size = Config.Embedding.Word.size
    char_embedding_size = Config.Embedding.Char.size
The last line accesses an attribute of a class 'Char' that does not exist in Embedding. That should invoke getattr(), which should return 200 in this case. I am not familiar enough with metaclasses to make a good judgement, but I guess I need to define the __new__() of the metaclass.
Does this approach make sense, or is there a better way to do it?
EDIT:
class Config(BaseConfig):
    class Embedding(BaseConfig, size=200):
        class WordEmbedding(Embedding):
            size = 300

    class Log(BaseConfig, level=logging.DEBUG):
        class PREPROCESS(Log):
            level = logging.INFO

# When I use
log = logging.getLogger(level=Config.Log.Model.level)  # level should be INFO
This is a bit confusing. I am not sure this would be the best notation for declaring configurations with default parameters - it seems verbose. But yes, given the flexibility of metaclasses and magic methods in Python, it is possible for something like this to hold all the flexibility you need.
Just for the sake of it, I'd like to say that using nested classes as namespaces, as you are doing, is probably the only useful thing for them (nested classes). It is common to see people who misunderstand Python OO trying to make use of nested classes.
So, for your problem, you need a __getattr__ method in the final class that can fetch default values for attributes. These attributes in turn are declared as keywords to nested classes, which can also have the same metaclass. Otherwise, the hierarchy of nested classes just works for you to fetch nested attributes, using the dot notation in Python.
Moreover, for each class in a nested set, one can pass in keyword parameters to be used as defaults if the next level of nested classes is not defined. In the given example, trying to access Config.Embedding.Char.size with a non-existing Char should return the default "size". Note that a __getattr__ in "Embedding" can return you a fake "Char" object, but that object is the one that has to yield a size attribute. So our __getattr__ has yet to yield an object that itself has a proper __getattr__.
However, I will suggest a change to your requirements: instead of passing in the default values as keyword parameters, have a reserved name, like _default, inside which you can put your default attributes. That way, you can provide deeply nested default subtrees instead of just scalar values, and the implementation can possibly be simpler.
Actually, a lot simpler. By using keywords to the class as you propose, you'd need a metaclass to set those default parameters in a data structure (that would be possible in either __new__ or __init__). But by just using nested classes all the way, with a reserved name, a custom __getattr__ on the metaclass will work. It is called for class attributes that do not exist on the configuration classes themselves, and all one has to do, if a requested attribute does not exist, is try to retrieve the _default class I mentioned.
Thus, you can work with something like:
class ConfigMeta(type):
    def __getattr__(cls, attr):
        return cls._default

class Base(metaclass=ConfigMeta):
    pass

class Config(Base):
    class Embed(Base):
        class _default(Base):
            size = 200

        class Word(Base):
            size = 300

assert Config.Embed.Char.size == 200
assert Config.Embed.Word.size == 300
Btw - just last year I was working on a project to have configurations like this, with default values, but using a dictionary syntax - that is why I mentioned I am not sure the nested class would be a nice design. But since all the functionality can be provided by a metaclass with 3 LoC I guess this beats anything in the way.
Also, that is why I think being able to nest whole default subtrees can be useful for what you want - I've been there.
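One caveat with the three-line metaclass: on a class that has no _default, the lookup of cls._default itself falls back to __getattr__, which then recurses. A possible guarded variant is sketched below. The error messages and the explicit check are my own additions, not part of the answer above.

```python
class ConfigMeta(type):
    def __getattr__(cls, attr):
        # Called only after normal class-attribute lookup has failed.
        if attr == "_default":
            # This class has no _default: stop here instead of recursing.
            raise AttributeError(f"{cls.__name__} has no _default")
        try:
            return cls._default
        except AttributeError:
            raise AttributeError(
                f"{cls.__name__} has no attribute {attr!r} and no _default"
            ) from None


class Base(metaclass=ConfigMeta):
    pass


class Config(Base):
    class Embed(Base):
        class _default(Base):
            size = 200

        class Word(Base):
            size = 300


print(Config.Embed.Char.size, Config.Embed.Word.size)
```

With this guard, hasattr(Config, "Missing") returns False rather than hitting the recursion limit, while the cascading lookup through Embed._default works as before.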
You can use a metaclass to set the attribute:
class ConfigMeta(type):
    def __new__(mt, clsn, bases, attrs):
        try:
            _ = attrs['size']
        except KeyError:
            attrs['size'] = 300
        return super().__new__(mt, clsn, bases, attrs)
Now if the class does not have the size attribute, it would be set to 300 (change this to meet your need).
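If the default should come from class keywords, as in the question's size=200 syntax, rather than being hard-coded, the same idea could be sketched as follows. This is an assumption about what the asker wants; CharEmbedding is a hypothetical class added for the demonstration, and the cascade here relies on ordinary inheritance rather than on nested-class lookup.

```python
class ConfigMeta(type):
    def __new__(mcls, name, bases, attrs, **defaults):
        # Copy each class keyword into the namespace unless the body set it.
        for key, value in defaults.items():
            attrs.setdefault(key, value)
        return super().__new__(mcls, name, bases, attrs)

    def __init__(cls, name, bases, attrs, **defaults):
        # Accept the same keywords; type.__init__ is called without them.
        super().__init__(name, bases, attrs)


class BaseConfig(metaclass=ConfigMeta):
    pass


class Embedding(BaseConfig, size=200):
    pass


class WordEmbedding(Embedding):
    size = 300


class CharEmbedding(Embedding):  # hypothetical: defines no size of its own
    pass


print(Embedding.size, WordEmbedding.size, CharEmbedding.size)
```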
I'm trying to figure out how to organize App Engine code with transactions. Currently I have a separate Python file with all my transaction functions. For transactions that are closely related to entities, I was wondering if it made sense to use a @staticmethod for the transaction.
Here is a simple example:
class MyEntity(ndb.Model):
    n = ndb.IntegerProperty(default=0)

    @staticmethod
    @ndb.transactional  # does the order of the decorators matter?
    def increment_n(my_entity_key):
        my_entity = my_entity_key.get()
        my_entity.n += 1
        my_entity.put()

    def do_something(self):
        MyEntity.increment_n(self.key)
It would be nice to have increment_n associated with the entity definition, but I have never seen anyone do this so I was wondering if this would be a bad idea.
MY SOLUTION:
Following Brent's answer, I've implemented this:
class MyEntity(ndb.Model):
    n = ndb.IntegerProperty(default=0)

    @staticmethod
    @ndb.transactional
    def increment_n_transaction(my_entity_key):
        my_entity = my_entity_key.get()
        my_entity.increment_n()

    def increment_n(self):
        self.n += 1
        self.put()
This way I can keep entity related code all in one place and I can easily use the transactional version or not as needed.
Yes, it makes sense to use a @staticmethod in this case, since the function doesn't use a class or an instance (self).
And yes, the order of decorators is important, as noted in @Kekito's later answer.
I later came across the accepted answer to this question. Here is a quote:
A decorator would wrap the function it is decorating. Here, your
add_cost function is wrapped by ndb.transactional so everything thing
within the function happens in the context of a transaction and then
the method returned by that is wrapped by classmethod which returns a
descriptor object.
So, when you apply multiple decorators in a class, then decorators
such as classmethod or staticmethod should be the top ones. If you
change the order you would receive an TypeError: unbound method ....
type of error if the other decorator doesn't accept descriptors.
So it looks like the order of decorators is very important. By luck, I had put mine in the right order, but updating this for others who come across this question.
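The stacking order can be tried out without App Engine. In the sketch below, transactional is a hypothetical stand-in that only records when the wrapped call starts and ends; it is not ndb.transactional and makes no claim about how ndb is implemented.

```python
import functools

calls = []


def transactional(func):
    """Hypothetical stand-in for a transaction decorator: logs begin/commit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("begin")
        try:
            return func(*args, **kwargs)
        finally:
            calls.append("commit")
    return wrapper


class Counter:
    @staticmethod   # outermost: applied last, to the already-wrapped function
    @transactional  # innermost: applied first, to the plain function
    def increment(state):
        state["n"] += 1
        return state["n"]


state = {"n": 0}
print(Counter.increment(state), calls)
```

Decorators are applied bottom-up, so here the plain function is wrapped first and staticmethod receives an ordinary callable, which matches the ordering recommended in the quote.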
Say I have a class, which has a number of subclasses.
I can instantiate the class. I can then set its __class__ attribute to one of the subclasses. I have effectively changed the class type to the type of its subclass, on a live object. I can call methods on it which invoke the subclass's version of those methods.
So, how dangerous is doing this? It seems weird, but is it wrong to do such a thing? Despite the ability to change type at run-time, is this a feature of the language that should completely be avoided? Why or why not?
(Depending on responses, I'll post a more-specific question about what I would like to do, and if there are better alternatives).
Here's a list of things I can think of that make this dangerous, in rough order from worst to least bad:
It's likely to be confusing to someone reading or debugging your code.
You won't have gotten the right __init__ method, so you probably won't have all of the instance variables initialized properly (or even at all).
The differences between 2.x and 3.x are significant enough that it may be painful to port.
There are some edge cases with classmethods, hand-coded descriptors, hooks to the method resolution order, etc., and they're different between classic and new-style classes (and, again, between 2.x and 3.x).
If you use __slots__, all of the classes must have identical slots. (And if you have the compatible but different slots, it may appear to work at first but do horrible things…)
Special method definitions in new-style classes may not change. (In fact, this will work in practice with all current Python implementations, but it's not documented to work, so…)
If you use __new__, things will not work the way you naively expected.
If the classes have different metaclasses, things will get even more confusing.
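The __init__ point in the list above is easy to reproduce. A minimal sketch, with hypothetical Base and Derived classes:

```python
class Base:
    def __init__(self):
        self.name = "base"


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.items = []  # only ever set by Derived.__init__

    def add(self, item):
        self.items.append(item)


obj = Base()
obj.__class__ = Derived  # Derived.__init__ is never run

print(isinstance(obj, Derived), hasattr(obj, "items"))
```

The object now reports itself as a Derived, but calling obj.add(...) fails with AttributeError because items was never created.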
Meanwhile, in many cases where you'd think this is necessary, there are better options:
Use a factory to create an instance of the appropriate class dynamically, instead of creating a base instance and then munging it into a derived one.
Use __new__ or other mechanisms to hook the construction.
Redesign things so you have a single class with some data-driven behavior, instead of abusing inheritance.
As a very common specific case of the last one, just put all of the "variable methods" into classes whose instances are kept as a data member of the "parent", rather than into subclasses. Instead of changing self.__class__ = OtherSubclass, just do self.member = OtherSubclass(self). If you really need methods to magically change, automatic forwarding (e.g., via __getattr__) is a much more common and pythonic idiom than changing classes on the fly.
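The factory option from the list above might look like the following sketch. Shape, Circle, Square and make_shape are hypothetical names chosen for illustration.

```python
class Shape:
    def __init__(self, size):
        self.size = size

    def area(self):
        raise NotImplementedError


class Circle(Shape):
    def area(self):
        return 3.14159 * self.size ** 2


class Square(Shape):
    def area(self):
        return self.size ** 2


_SHAPES = {"circle": Circle, "square": Square}


def make_shape(kind, size):
    """Pick the concrete class up front instead of mutating __class__ later."""
    try:
        cls = _SHAPES[kind]
    except KeyError:
        raise ValueError(f"unknown shape kind: {kind!r}") from None
    return cls(size)


print(type(make_shape("square", 3)).__name__, make_shape("square", 3).area())
```

Each object is created by its own class's __init__, so none of the initialization problems listed earlier arise.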
Assigning the __class__ attribute is useful if you have a long-running application and you need to replace an old version of some object with a newer version of the same class without loss of data, e.g. after some reload(mymodule) and without reloading unchanged modules. Another example is if you implement persistence, something similar to pickle.load.
All other usage is discouraged, especially if you can write the complete code before starting the application.
On arbitrary classes, this is extremely unlikely to work, and is very fragile even if it does. It's basically the same thing as pulling the underlying function objects out of the methods of one class, and calling them on objects which are not instances of the original class. Whether or not that will work depends on internal implementation details, and is a form of very tight coupling.
That said, changing the __class__ of objects amongst a set of classes that were particularly designed to be used this way could be perfectly fine. I've been aware that you can do this for a long time, but I've never yet found a use for this technique where a better solution didn't spring to mind at the same time. So if you think you have a use case, go for it. Just be clear in your comments/documentation what is going on. In particular it means that the implementation of all the classes involved have to respect all of their invariants/assumptions/etc, rather than being able to consider each class in isolation, so you'd want to make sure that anyone who works on any of the code involved is aware of this!
Well, not discounting the problems cautioned about at the start. But it can be useful in certain cases.
First of all, the reason I am looking this post up is because I did just this and __slots__ doesn't like it. (yes, my code is a valid use case for slots, this is pure memory optimization) and I was trying to get around a slots issue.
I first saw this in Alex Martelli's Python Cookbook (1st ed). In the 3rd ed, it's recipe 8.19 "Implementing Stateful Objects or State Machine Problems". A fairly knowledgeable source, Python-wise.
Suppose you have an ActiveEnemy object that has different behavior from an InactiveEnemy and you need to switch back and forth quickly between them. Maybe even a DeadEnemy.
If InactiveEnemy was a subclass or a sibling, you could switch class attributes. More exactly, the exact ancestry matters less than the methods and attributes being consistent to code calling it. Think Java interface or, as several people have mentioned, your classes need to be designed with this use in mind.
Now, you still have to manage state transition rules and all sorts of other things. And, yes, if your client code is not expecting this behavior and your instances switch behavior, things will hit the fan.
But I've used this quite successfully on Python 2.x and never had any unusual problems with it. Best done with a common parent and small behavioral differences on subclasses with the same method signatures.
No problems, until my __slots__ issue that's blocking it just now. But slots are a pain in the neck in general.
I would not do this to patch live code. I would also privilege using a factory method to create instances.
But to manage very specific conditions known in advance? Like a state machine that the clients are expected to understand thoroughly? Then it is pretty darn close to magic, with all the risk that comes with it. It's quite elegant.
Python 3 concerns? Test it to see if it works but the Cookbook uses Python 3 print(x) syntax in its example, FWIW.
The other answers have done a good job of discussing the question of why just changing __class__ is likely not an optimal decision.
Below is one example of a way to avoid changing __class__ after instance creation, using __new__. I'm not recommending it, just showing how it could be done, for the sake of completeness. However it is probably best to do this using a boring old factory rather than shoe-horning inheritance into a job for which it was not intended.
class ChildDispatcher:
    _subclasses = dict()

    def __new__(cls, *args, dispatch_arg, **kwargs):
        # dispatch to a registered child class
        subcls = cls.getsubcls(dispatch_arg)
        return super(ChildDispatcher, subcls).__new__(subcls)

    def __init_subclass__(subcls, **kwargs):
        super(ChildDispatcher, subcls).__init_subclass__(**kwargs)
        # add __new__ constructor to child class based on default first dispatch argument
        def __new__(cls, *args, dispatch_arg = subcls.__qualname__, **kwargs):
            return super(ChildDispatcher,cls).__new__(cls, *args, **kwargs)
        subcls.__new__ = __new__
        ChildDispatcher.register_subclass(subcls)

    @classmethod
    def getsubcls(cls, key):
        name = cls.__qualname__
        if cls is not ChildDispatcher:
            raise AttributeError(f"type object {name!r} has no attribute 'getsubcls'")
        try:
            return ChildDispatcher._subclasses[key]
        except KeyError:
            raise KeyError(f"No child class key {key!r} in the "
                           f"{cls.__qualname__} subclasses registry")

    @classmethod
    def register_subclass(cls, subcls):
        name = subcls.__qualname__
        if cls is not ChildDispatcher:
            raise AttributeError(f"type object {name!r} has no attribute "
                                 f"'register_subclass'")
        if name not in ChildDispatcher._subclasses:
            ChildDispatcher._subclasses[name] = subcls
        else:
            raise KeyError(f"{name} subclass already exists")

class Child(ChildDispatcher): pass

c1 = ChildDispatcher(dispatch_arg = "Child")
assert isinstance(c1, Child)
c2 = Child()
assert isinstance(c2, Child)
How "dangerous" it is depends primarily on what the subclass would have done when initializing the object. It's entirely possible that it would not be properly initialized, having only run the base class's __init__(), and something would fail later because of, say, an uninitialized instance attribute.
Even without that, it seems like bad practice for most use cases. Easier to just instantiate the desired class in the first place.
Here's an example of one way you could do the same thing without changing __class__. Quoting @unutbu in the comments to the question:
Suppose you were modeling cellular automata. Suppose each cell could be in one of say 5 Stages. You could define 5 classes Stage1, Stage2, etc. Suppose each Stage class has multiple methods.
class Stage1(object):
    …

class Stage2(object):
    …

…

class Cell(object):
    def __init__(self):
        self.current_stage = Stage1()

    def goToStage2(self):
        self.current_stage = Stage2()

    def __getattr__(self, attr):
        return getattr(self.current_stage, attr)
If you allow changing __class__ you could instantly give a cell all the methods of a new stage (same names, but different behavior).
Same for changing current_stage, but this is a perfectly normal and pythonic thing to do, that won't confuse anyone.
Plus, it allows you to not change certain special methods you don't want changed, just by overriding them in Cell.
Plus, it works for data members, class methods, static methods, etc., in ways every intermediate Python programmer already understands.
If you refuse to change __class__, then you might have to include a stage attribute, and use a lot of if statements, or reassign a lot of attributes pointing to different stage's functions
Yes, I've used a stage attribute, but that's not a downside—it's the obvious visible way to keep track of what the current stage is, better for debugging and for readability.
And there's not a single if statement or any attribute reassignment except for the stage attribute.
And this is just one of multiple different ways of doing this without changing __class__.
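For a version of the delegation approach that can actually be run, here is a small sketch with two made-up stages. Sleeping, Awake, describe and wake are hypothetical names, not taken from the code above.

```python
class Sleeping:
    def describe(self):
        return "sleeping"


class Awake:
    def describe(self):
        return "awake"


class Cell:
    def __init__(self):
        self.current_stage = Sleeping()

    def wake(self):
        self.current_stage = Awake()

    def __getattr__(self, attr):
        # Reached only when normal lookup on the Cell fails, so wake()
        # and current_stage are never forwarded to the stage object.
        return getattr(self.current_stage, attr)

    def __repr__(self):
        # Defined on Cell itself, so it stays the same in every stage.
        return f"<Cell stage={type(self.current_stage).__name__}>"


cell = Cell()
before = cell.describe()
cell.wake()
print(before, cell.describe(), cell)
```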
In the comments I proposed modeling cellular automata as a possible use case for dynamic __class__s. Let's try to flesh out the idea a bit:
Using dynamic __class__:
class Stage(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Stage1(Stage):
    def step(self):
        if ...:
            self.__class__ = Stage2

class Stage2(Stage):
    def step(self):
        if ...:
            self.__class__ = Stage3

cells = [Stage1(x,y) for x in range(rows) for y in range(cols)]

def step(cells):
    for cell in cells:
        cell.step()
    yield cells
For lack of a better term, I'm going to call this the traditional way (mainly abarnert's code):
class Stage1(object):
    def step(self, cell):
        ...
        if ...:
            cell.goToStage2()

class Stage2(object):
    def step(self, cell):
        ...
        if ...:
            cell.goToStage3()

class Cell(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.current_stage = Stage1()

    def goToStage2(self):
        self.current_stage = Stage2()

    def __getattr__(self, attr):
        return getattr(self.current_stage, attr)

cells = [Cell(x,y) for x in range(rows) for y in range(cols)]

def step(cells):
    for cell in cells:
        cell.step(cell)
    yield cells
Comparison:
The traditional way creates a list of Cell instances each with a
current stage attribute.
The dynamic __class__ way creates a list of instances which are
subclasses of Stage. There is no need for a current stage
attribute since __class__ already serves this purpose.
The traditional way uses goToStage2, goToStage3, ... methods to
switch stages.
The dynamic __class__ way requires no such methods. You just
reassign __class__.
The traditional way uses the special method __getattr__ to delegate
some method calls to the appropriate stage instance held in the
self.current_stage attribute.
The dynamic __class__ way does not require any such delegation. The
instances in cells are already the objects you want.
The traditional way needs to pass the cell as an argument to
Stage.step. This is so cell.goToStageN can be called.
The dynamic __class__ way does not need to pass anything. The
object we are dealing with has everything we need.
Conclusion:
Both ways can be made to work. To the extent that I can envision how these two implementations would pan-out, it seems to me the dynamic __class__ implementation will be
simpler (no Cell class),
more elegant (no ugly goToStage2 methods, no brain-teasers like why
you need to write cell.step(cell) instead of cell.step()),
and easier to understand (no __getattr__, no additional level of
indirection)
I have the following problem. I have a model class in MVC with a special purpose: in certain cases it should be able to override itself. Is this kind of behavior possible?
class Text(Document):
    a = StringField()
    b = StringField()

    def save(self):
        if 1 == Text.object(a=self.a).count():  # if a similar object exists in the db,
            self = Text.object(a=self.a).first()  # get the instance from the db and
            # override the original class.
        else:  # use the superclass's save function
            return super(Text, self).save()
There's no trivial way for an object to become another object in Python. Assigning to self won't do this; self is a local variable in the method definition, and assigning to it won't change the existing instance in any way; it only makes the instance inaccessible for the rest of the method.
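A quick way to convince yourself of this is the sketch below. It uses a plain stand-in class with a hypothetical try_to_replace method rather than a real ORM model.

```python
class Text:
    def __init__(self, a):
        self.a = a

    def try_to_replace(self, other):
        self = other  # rebinds the local name only
        return self   # this is now `other`, not the original instance


first = Text("first")
second = Text("second")
returned = first.try_to_replace(second)

print(returned is second, first.a)
```

The caller's first still refers to the original, unchanged object; only the name self inside the method was pointed somewhere else.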
There are a few ways to approach this problem. The preferred way is to have a method that returns the correct instance.
class Foo(...):
    def get_or_save(self):
        existing = load_from_database(self.bar)
        if existing is not None:
            return existing
        else:
            save_to_database(self)
            return self

new_inst = Foo()
new_inst.bar = "baz"
inst = new_inst.get_or_save()
# stop using new_inst
There is also a hackish way to get a similar effect to your original example. Ordinary Python classes store most of their attributes in a special __dict__ attribute. You can copy that and it will be as though one instance is replaced by the other. Of course, that only works for perfectly plain Python classes, and may or may not work for classes defined in an ORM, or which retain state in more clever ways.
class Foo(...):
    def save(self):
        existing = load_from_database(self.bar)
        if existing is not None:
            self.__dict__ = existing.__dict__
        else:
            save_to_database(self)
Yes it is possible :-)
Seriously, using a conditional call to super as in your example will achieve the result.
However, the style of your example is a little confusing, and changing it may allow you to achieve your overall objectives more easily. (But neither of these directly affects your question.)
I would not recommend putting a method in your class called object unless I had no other choice.
The fact that you are passing self.a to Text.object, within method Text.save, doesn't seem right. It would be cleaner to simply call self.object() and have method object use self.a directly in its code.