I'm writing a small Python application that contains a few nested classes, like the example below:
class SuperBar(object):
    pass

class Foo(object):
    NAME = 'this is foo'

    class Bar(SuperBar):
        MSG = 'this is how Bar handle stuff'

    class AnotherBar(SuperBar):
        MSG = 'this is how Another Bar handle stuff'
I'm using nested classes to create some sort of hierarchy and to provide a clean way to implement features for a parser.
At some point, I want to create a list of the inner classes. I'd like to have the following output:
[<class '__main__.Bar'>, <class '__main__.AnotherBar'>]
The question is: What is the recommended method to get a list of inner classes in a pythonic way?
I managed to get a list of inner class objects with the method below:
import inspect

def inner_classes_list(cls):
    return [cls_attribute for cls_attribute in cls.__dict__.values()
            if inspect.isclass(cls_attribute)
            and issubclass(cls_attribute, SuperBar)]
It works, but I'm not sure if using __dict__ directly is a good thing to do. I'm using it because it contains the actual class objects that I need and seems to be portable across Python 2 and 3.
First: I can't see how nested classes can be of any use to you. Once you have an instance f of Foo, do you realize that f.Bar and f.AnotherBar will be the same object for all instances? That is, you can't record any attribute specific to f on f.Bar, like f.Bar.speed, or it will collide with the same attribute seen from another instance, g.Bar.speed.
To overcome this (and it is actually the only thing that makes sense), you'd need to have instances of Bar and AnotherBar attached to the instance f. These instances usually can't be declared in the class body; you have to create them in your Foo's __init__ method.
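A quick way to see both points, using the question's class names and a hypothetical speed attribute: the nested class is one shared object, while instances created in __init__ belong to each Foo separately.

```python
class SuperBar(object):
    pass

class Foo(object):
    class Bar(SuperBar):
        pass

    def __init__(self):
        # One Bar *instance* per Foo instance, so its state is not shared
        self.bar = Foo.Bar()

f, g = Foo(), Foo()

# The nested class itself is the same object for every instance...
print(f.Bar is g.Bar)            # True
f.Bar.speed = 10                 # sets a class attribute on Foo.Bar
print(g.Bar.speed)               # 10: visible through g as well

# ...whereas the per-instance objects are independent
f.bar.speed = 1
g.bar.speed = 2
print(f.bar.speed, g.bar.speed)  # 1 2
```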
The only things that Bar and AnotherBar can usefully do there are: (1) hold a lot of class and static methods, in which case they work only as namespaces;
or (2) take part in the descriptor protocol, if a metaclass for SuperBar or for the nested classes themselves implements it - https://docs.python.org/3/reference/datamodel.html#implementing-descriptors. But then you'd be much better off if SuperBar itself implemented the descriptor protocol (by having either __get__ or __set__ methods), and Foo's body held instances of these classes, not the classes themselves.
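A minimal sketch of that alternative, assuming Python 3.6+ (for __set_name__). Storing each value on the owning instance under a leading-underscore name is just one hypothetical choice; the point is that Foo's body holds a Bar instance, and each Foo instance gets its own value.

```python
class SuperBar(object):
    """Descriptor base: every Foo instance gets its own value."""

    def __set_name__(self, owner, name):
        # Called when the class body containing the descriptor is created
        self.storage_name = '_' + name

    def __get__(self, instance, owner):
        if instance is None:          # accessed on the class: return the descriptor
            return self
        return getattr(instance, self.storage_name, None)

    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)

class Bar(SuperBar):
    MSG = 'this is how Bar handles stuff'

class Foo(object):
    bar = Bar()                       # an *instance* of Bar, not the class

f, g = Foo(), Foo()
f.bar = 10
g.bar = 20
print(f.bar, g.bar)                   # 10 20
print(Foo.bar.MSG)                    # class-level data is still reachable
```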
That said, you came up with the solution of using __dict__ to get the inner classes: that won't work if Foo itself inherits from other classes that also have nested classes, because the superclasses of Foo are never searched. You can write a method that either looks at all the classes in Foo's __mro__, or simply uses dir and issubclass:
class Foo:
    @classmethod
    def inner_classes_list(cls):
        results = []
        for attrname in dir(cls):
            obj = getattr(cls, attrname)
            if isinstance(obj, type) and issubclass(obj, SuperBar):
                results.append(obj)
        return results
(If you want this to work for all classes like Foo that do not share a common base, the same code will of course work if it is not declared as a class method; and SuperBar can also be a parameter to this function, if you have more than one nested-class hierarchy.)
Now that you have this, I urge you to ask another question saying what you actually want to do, and to read about "descriptors" and even "properties". Really: there is very little use one can think of for nested classes.
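A sketch of that standalone version, with the base class as a parameter, next to the question's __dict__ approach. BaseFoo and Inherited are hypothetical names used only to show that dir() also finds nested classes inherited from a superclass, while __dict__ (here via vars()) does not.

```python
class SuperBar(object):
    pass

class BaseFoo(object):
    class Inherited(SuperBar):
        pass

class Foo(BaseFoo):
    class Bar(SuperBar):
        pass

    class AnotherBar(SuperBar):
        pass

def inner_classes_list(cls, base):
    """Nested subclasses of `base` visible on `cls`, including inherited ones."""
    results = []
    for attrname in dir(cls):          # dir() covers the whole MRO, sorted by name
        obj = getattr(cls, attrname)
        if isinstance(obj, type) and issubclass(obj, base):
            results.append(obj)
    return results

def own_inner_classes(cls, base):
    """The question's __dict__ approach: only classes defined in cls's own body."""
    return [obj for obj in vars(cls).values()
            if isinstance(obj, type) and issubclass(obj, base)]

print([c.__name__ for c in inner_classes_list(Foo, SuperBar)])  # ['AnotherBar', 'Bar', 'Inherited']
print([c.__name__ for c in own_inner_classes(Foo, SuperBar)])   # ['Bar', 'AnotherBar']
```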
Related
I want to create a configuration class with a cascading feature. What do I mean by this? Let's say we have a configuration class like this:
class BaseConfig(metaclass=ConfigMeta, ...):
    def getattr():
        return 'default values provided by the metaclass'

class Config(BaseConfig):
    class Embedding(BaseConfig, size=200):
        class WordEmbedding(Embedding):
            size = 300
When I use this in code, I will access the configuration as follows:
def function(Config, blah, blah):
    word_embedding_size = Config.Embedding.Word.size
    char_embedding_size = Config.Embedding.Char.size
The last line accesses 'Char', which does not exist in the Embedding class. That should invoke getattr(), which should return 200 in this case. I am not familiar enough with metaclasses to make a good judgement, but I guess I need to define the metaclass's __new__().
Does this approach make sense, or is there a better way to do it?
EDIT:
class Config(BaseConfig):
    class Embedding(BaseConfig, size=200):
        class WordEmbedding(Embedding):
            size = 300

    class Log(BaseConfig, level=logging.DEBUG):
        class PREPROCESS(Log):
            level = logging.INFO

# When I use
log = logging.getLogger(level=Config.Log.Model.level)  # level should be INFO
This is a bit confusing. I am not sure this would be the best notation to declare configurations with default parameters; it seems verbose. But yes, given the flexibility of metaclasses and magic methods in Python, it is possible for something like this to hold all the flexibility you need.
Just for the record, using nested classes as namespaces, as you are doing, is probably the only useful thing to do with them. It is common to see people who misunderstand Python OO altogether trying to make use of nested classes.
So, for your problem, you need a __getattr__ method in the final class that can fetch default values for attributes. These attributes in turn are declared as keywords to nested classes, which can also have the same metaclass. Otherwise, the hierarchy of nested classes just works for you to fetch nested attributes using Python's dot notation.
Moreover, for each class in a nested set, one can pass in keyword parameters that are to be used as defaults if the next level of nested classes is not defined. In the given example, trying to access Config.Embedding.Char.size with a non-existing Char should return the default "size". Note that a __getattr__ in "Embedding" can return a fake "Char" object, but that object is the one that has to yield a size attribute. So our __getattr__ has to yield an object that itself has a proper __getattr__.
However, I will suggest a change to your requirements: instead of passing in the default values as keyword parameters, have a reserved name, like _default, inside which you can put your default attributes. That way, you can provide deeply nested default subtrees instead of just scalar values, and the implementation can possibly be simpler.
Actually, a lot simpler. By using keywords to the class as you propose, you'd need a metaclass to store those default parameters in a data structure (which would be possible in either __new__ or __init__). But by just using nested classes all the way, with a reserved name, a custom __getattr__ on the metaclass will work. It is called for non-existing class attributes on the configuration classes themselves, and all it has to do when a requested attribute does not exist is retrieve the _default class I mentioned.
Thus, you can work with something like:
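class ConfigMeta(type):
    def __getattr__(cls, attr):
        # Only called when normal lookup fails. Guard against infinite
        # recursion when a class has no _default of its own.
        if attr == '_default' or attr.startswith('__'):
            raise AttributeError(attr)
        return cls._default

class Base(metaclass=ConfigMeta):
    pass

class Config(Base):
    class Embed(Base):
        class _default(Base):
            size = 200

        class Word(Base):
            size = 300

assert Config.Embed.Char.size == 200
assert Config.Embed.Word.size == 300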
By the way, just last year I was working on a project to have configurations like this, with default values, but using a dictionary syntax; that is why I mentioned I am not sure nested classes would be a nice design. But since all the functionality can be provided by a metaclass with a few lines of code, I guess this beats anything else.
Also, that is why I think being able to nest whole default subtrees can be useful for what you want - I've been there.
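Applied to the EDIT in the question, the _default approach could look like the sketch below. It assumes the recursion guard in __getattr__ (an addition, not part of the three-line original) and uses logger.setLevel(), since logging.getLogger() takes a logger name rather than a level. The Subword subtree and the 'preprocess' logger name are hypothetical. Note that with this design Config.Log.Model falls back to Log._default (DEBUG here), not to INFO.

```python
import logging

class ConfigMeta(type):
    def __getattr__(cls, attr):
        # Only reached when normal lookup fails; don't recurse on '_default'
        # itself and leave dunder probes (copy, pickle, ...) alone.
        if attr == '_default' or attr.startswith('__'):
            raise AttributeError(attr)
        return cls._default

class Base(metaclass=ConfigMeta):
    pass

class Config(Base):
    class Embed(Base):
        class _default(Base):
            size = 200
            class Subword(Base):       # a whole default subtree, not just a scalar
                size = 50
        class Word(Base):
            size = 300

    class Log(Base):
        class _default(Base):
            level = logging.DEBUG
        class PREPROCESS(Base):
            level = logging.INFO

print(Config.Embed.Word.size)              # 300
print(Config.Embed.Char.size)              # 200, from Embed._default
print(Config.Embed.Char.Subword.size)      # 50, from the default subtree
print(Config.Log.Model.level == logging.DEBUG)  # True, falls back to Log._default

logger = logging.getLogger('preprocess')
logger.setLevel(Config.Log.PREPROCESS.level)
```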
You can use a metaclass to set the attribute:
class ConfigMeta(type):
    def __new__(mcs, name, bases, attrs):
        # Set a default size only if the class body did not define one
        if 'size' not in attrs:
            attrs['size'] = 300
        return super().__new__(mcs, name, bases, attrs)
Now if the class does not have the size attribute, it would be set to 300 (change this to meet your need).
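One thing worth checking with this approach, shown in the sketch below (class names are hypothetical): the default is injected into every class created by the metaclass, including subclasses. A subclass that does not define size therefore gets 300 in its own namespace, which shadows a value inherited from its parent. That may be the opposite of the cascading behaviour you want.

```python
class ConfigMeta(type):
    def __new__(mcs, name, bases, attrs):
        # Inject a default only when the class body did not define `size`
        attrs.setdefault('size', 300)
        return super().__new__(mcs, name, bases, attrs)

class BaseConfig(metaclass=ConfigMeta):
    pass

class Embedding(BaseConfig):
    size = 200

class Word(BaseConfig):
    pass

class Char(Embedding):      # does NOT inherit Embedding's 200
    pass

print(Embedding.size, Word.size, Char.size)  # 200 300 300
```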
What is the difference between class and instance variables in Python?
class Complex:
    a = 1
and
class Complex:
    def __init__(self):
        self.a = 1
Using the call x = Complex().a assigns 1 to x in both cases.
A more in-depth answer about __init__() and self will be appreciated.
When you write a class block, you create class attributes (or class variables). All the names you assign in the class block, including methods you define with def become class attributes.
After a class instance is created, anything with a reference to the instance can create instance attributes on it. Inside methods, the "current" instance is almost always bound to the name self, which is why you are thinking of these as "self variables". Usually in object-oriented design, the code attached to a class is supposed to have control over the attributes of instances of that class, so almost all instance attribute assignment is done inside methods, using the reference to the instance received in the self parameter of the method.
Class attributes are often compared to static variables (or methods) as found in languages like Java, C#, or C++. However, if you want to aim for deeper understanding I would avoid thinking of class attributes as "the same" as static variables. While they are often used for the same purposes, the underlying concept is quite different. More on this in the "advanced" section below the line.
An example!
class SomeClass:
    def __init__(self):
        self.foo = 'I am an instance attribute called foo'
        self.foo_list = []

    bar = 'I am a class attribute called bar'
    bar_list = []
After executing this block, there is a class SomeClass, with 3 class attributes: __init__, bar, and bar_list.
Then we'll create an instance:
instance = SomeClass()
When this happens, SomeClass's __init__ method is executed, receiving the new instance in its self parameter. This method creates two instance attributes: foo and foo_list. Then this instance is assigned into the instance variable, so it's bound to a thing with those two instance attributes: foo and foo_list.
But:
print(instance.bar)
gives:
I am a class attribute called bar
How did this happen? When we try to retrieve an attribute through the dot syntax, and the attribute doesn't exist, Python goes through a bunch of steps to try and fulfill your request anyway. The next thing it will try is to look at the class attributes of the class of your instance. In this case, it found an attribute bar in SomeClass, so it returned that.
That's also how method calls work by the way. When you call mylist.append(5), for example, mylist doesn't have an attribute named append. But the class of mylist does, and it's bound to a method object. That method object is returned by the mylist.append bit, and then the (5) bit calls the method with the argument 5.
The way this is useful is that all instances of SomeClass will have access to the same bar attribute. We could create a million instances, but we only need to store that one string in memory, because they can all find it.
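A small sketch of that lookup in action: the instance's own namespace (vars()) holds only foo, bar is found on the class and is literally the same object for every instance, and a method such as list.append lives on the class and comes back as a bound method.

```python
class SomeClass:
    bar = 'I am a class attribute called bar'

    def __init__(self):
        self.foo = 'I am an instance attribute called foo'

a, b = SomeClass(), SomeClass()

print(vars(a))                           # {'foo': 'I am an instance attribute called foo'}
print('bar' in vars(a))                  # False: bar lives on the class...
print(a.bar is b.bar is SomeClass.bar)   # True: ...and every instance finds the same object

mylist = []
print('append' in vars(type(mylist)))    # True: append is defined on list, not on mylist
method = mylist.append                   # a bound method object
method(5)
print(mylist, method.__self__ is mylist) # [5] True
```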
But you have to be a bit careful. Have a look at the following operations:
sc1 = SomeClass()
sc1.foo_list.append(1)
sc1.bar_list.append(2)

sc2 = SomeClass()
sc2.foo_list.append(10)
sc2.bar_list.append(20)

print(sc1.foo_list)
print(sc1.bar_list)

print(sc2.foo_list)
print(sc2.bar_list)
What do you think this prints?
[1]
[2, 20]
[10]
[2, 20]
This is because each instance has its own copy of foo_list, so they were appended to separately. But all instances share access to the same bar_list. So when we did sc1.bar_list.append(2) it affected sc2, even though sc2 didn't exist yet! And likewise sc2.bar_list.append(20) affected the bar_list retrieved through sc1. This is often not what you want.
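The trap is mutating a class attribute through an instance. Assigning through an instance behaves differently, as this sketch shows: it creates a new instance attribute that shadows the class attribute for that one instance only.

```python
class SomeClass:
    bar = 'I am a class attribute called bar'
    bar_list = []

sc1, sc2 = SomeClass(), SomeClass()

sc1.bar_list.append(1)        # mutates the single shared list
sc1.bar = 'only on sc1'       # assignment creates a new instance attribute on sc1

print(sc2.bar_list)           # [1]
print(sc2.bar)                # I am a class attribute called bar
print(sorted(vars(sc1)))      # ['bar']

del sc1.bar                   # remove the instance attribute...
print(sc1.bar)                # ...and the class attribute is visible again
```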
Advanced study follows. :)
To really grok Python, coming from traditional statically typed OO-languages like Java and C#, you have to learn to rethink classes a little bit.
In Java, a class isn't really a thing in its own right. When you write a class you're more declaring a bunch of things that all instances of that class have in common. At runtime, there's only instances (and static methods/variables, but those are really just global variables and functions in a namespace associated with a class, nothing to do with OO really). Classes are the way you write down in your source code what the instances will be like at runtime; they only "exist" in your source code, not in the running program.
In Python, a class is nothing special. It's an object just like anything else. So "class attributes" are in fact exactly the same thing as "instance attributes"; in reality there's just "attributes". The only reason for drawing a distinction is that we tend to use objects which are classes differently from objects which are not classes. The underlying machinery is all the same. This is why I say it would be a mistake to think of class attributes as static variables from other languages.
But the thing that really makes Python classes different from Java-style classes is that just like any other object each class is an instance of some class!
In Python, most classes are instances of a builtin class called type. It is this class that controls the common behaviour of classes, and makes all the OO stuff work the way it does. The default OO way of having instances of classes that have their own attributes, and have common methods/attributes defined by their class, is just a protocol in Python. You can change most aspects of it if you want. If you've ever heard of using a metaclass, all that means is defining a class that is an instance of a class other than type.
The only really "special" thing about classes (aside from all the builtin machinery to make them work the way they do by default) is the class block syntax, which makes it easier for you to create instances of type. This:
class Foo(BaseFoo):
    def __init__(self, foo):
        self.foo = foo

    z = 28
is roughly equivalent to the following:
def __init__(self, foo):
    self.foo = foo

classdict = {'__init__': __init__, 'z': 28}
Foo = type('Foo', (BaseFoo,), classdict)
And it will arrange for all the contents of classdict to become attributes of the object that gets created.
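A runnable check of that equivalence. BaseFoo is a placeholder base class, and the type() version is named Foo2 only so both can coexist; both produce a class whose metaclass is type and whose instances behave the same.

```python
class BaseFoo(object):
    pass

# The class statement...
class Foo(BaseFoo):
    def __init__(self, foo):
        self.foo = foo

    z = 28

# ...and the explicit call type(name, bases, namespace)
def __init__(self, foo):
    self.foo = foo

Foo2 = type('Foo2', (BaseFoo,), {'__init__': __init__, 'z': 28})

for cls in (Foo, Foo2):
    obj = cls(5)
    print(cls.__name__, type(cls) is type, obj.foo, obj.z, isinstance(obj, BaseFoo))
# Foo True 5 28 True
# Foo2 True 5 28 True
```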
So then it becomes almost trivial to see that you can access a class attribute by Class.attribute just as easily as i = Class(); i.attribute. Both i and Class are objects, and objects have attributes. This also makes it easy to understand how you can modify a class after it's been created; just assign its attributes the same way you would with any other object!
In fact, instances have no particular special relationship with the class used to create them. The way Python knows which class to search for attributes that aren't found in the instance is the hidden __class__ attribute. You can read it to find out what class this is an instance of, just as with any other attribute: c = some_instance.__class__. Now you have a variable c bound to a class, even though it probably doesn't have the same name as the class. You can use this to access class attributes, or even call it to create more instances of it (even though you don't know what class it is!).
And you can even assign to i.__class__ to change what class it is an instance of! If you do this, nothing in particular happens immediately. It's not earth-shattering. All that it means is that when you look up attributes that don't exist in the instance, Python will go look at the new contents of __class__. Since that includes most methods, and methods usually expect the instance they're operating on to be in certain states, this usually results in errors if you do it at random, and it's very confusing, but it can be done. If you're very careful, the thing you store in __class__ doesn't even have to be a class object; all Python's going to do with it is look up attributes under certain circumstances, so all you need is an object that has the right kind of attributes (some caveats aside where Python does get picky about things being classes or instances of a particular class).
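A small sketch of both tricks, with hypothetical Cat and Dog classes: reading __class__ to make another instance, and reassigning it so later attribute lookups go to a different class. Reassignment works here because both are plain user-defined classes with the same instance layout; Python refuses it for incompatible layouts.

```python
class Cat(object):
    def speak(self):
        return 'meow'

class Dog(object):
    def speak(self):
        return 'woof'

pet = Cat()
c = pet.__class__            # the class, reached through the instance
other = c()                  # ...and called to make another instance
print(type(other).__name__)  # Cat

pet.__class__ = Dog          # only changes where attribute lookup goes next
print(pet.speak())           # woof
print(isinstance(pet, Dog))  # True
```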
That's probably enough for now. Hopefully (if you've even read this far) I haven't confused you too much. Python is neat when you learn how it works. :)
What you're calling an "instance" variable isn't actually an instance variable; it's a class variable. See the language reference about classes.
In your example, the a appears to be an instance variable because it holds an immutable value, so you never see it being shared. Its nature as a class variable can be seen when you assign a mutable object:
>>> class Complex:
...     a = []
...
>>> b = Complex()
>>> c = Complex()
>>>
>>> # What do they look like?
>>> b.a
[]
>>> c.a
[]
>>>
>>> # Change b...
>>> b.a.append('Hello')
>>> b.a
['Hello']
>>> # What does c look like?
>>> c.a
['Hello']
If you used self, then it would be a true instance variable, and thus each instance would have its own unique a. An object's __init__ function is called when a new instance is created, and self is a reference to that instance.
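For comparison, the same experiment with a created in __init__ through self: each instance gets its own list, so appending to b.a leaves c.a untouched.

```python
class Complex:
    def __init__(self):
        self.a = []          # a fresh list for every new instance

b = Complex()
c = Complex()
b.a.append('Hello')
print(b.a)   # ['Hello']
print(c.a)   # []
```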
Concretely, I have a user-defined class of this form:
class Foo(object):
    def __init__(self, bar):
        self.bar = bar

    def bind(self):
        val = self.bar
        do_something(val)
I need to:
1) be able to call on the class (not an instance of the class) to recover all the self.xxx attributes defined within the class.
For an instance of a class, this can be done with f = Foo('') and then f.__dict__. Is there a way of doing it for a class, and not an instance? If yes, how? I would expect Foo.__dict__ to return {'bar': None}, but it doesn't work this way.
2) be able to access all the self.xxx parameters used in a particular function of a class. For instance, I would like to do Foo.bind.__selfparams__ and receive ['bar'] in return. Is there a way of doing this?
This is something that is quite hard to do in a dynamic language, assuming I understand correctly what you're trying to do. Essentially this means going over all the instances in existence for the class and then collecting all the set attributes on those instances. While not infeasible, I would question the practicality of such approach both from a design as well as performance points of view.
More specifically, you're talking of "all the self.xxx attributes defined within the class", but these things are not defined at all, at least not in a single place; they rather "evolve" as more and more instances of the class are brought to life. Now, I'm not saying all your instances are setting different attributes, but they might, and in order to have a reliable generic solution, you'd literally have to keep track of anything the instances might have done to themselves. So unless you have a static analysis approach in mind, I don't see a clean and efficient way of achieving it (and actually, generally speaking, even static analysis is of no help in a dynamic language).
A trivial example to prove my point:
import random

SOME_CONSTANT = 200  # hypothetical value, just so the example runs

class Foo(object):
    def __init__(self):
        # statically analysable
        self.bla = 3
        # still, but more difficult
        if SOME_CONSTANT > 123:
            self.x = 123
        else:
            self.y = 321

    def do_something(self):
        setattr(self, "attr%s" % random.randint(1, 100), "hello, world of dynamic languages!")

foo = Foo()
foo2 = Foo()
# only `bla`, plus either `x` or `y`, are in existence so far
foo2.do_something()
# now there's an attribute with a random name out there;
# in order to detect it, we'd have to get all instances of Foo in existence
# at the moment, and individually inspect every attribute on them.
And, even if you were to iterate all instances in existence, you'd only be getting a snapshot of what you're interested, not all possible attributes.
This is not possible. The class doesn't have those attributes, just functions that set them. Ergo, there is nothing to retrieve and this is impossible.
This is only possible with deep AST inspection. Foo.bind.func_code (Foo.bind.__code__ in Python 3) would normally have the attributes you want under co_freevars, but you're looking up the attributes on self, so they are not free variables. You would have to decompile the bytecode from func_code.co_code to an AST and then walk said AST.
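A rough sketch of the static approach, using the standard ast module on a source string rather than decompiling bytecode (in real code the source could come from inspect.getsource(), which only works when the source file is available). It assumes the first positional parameter of each method is the instance, so it gets static methods wrong, and it cannot see setattr() calls or attributes set from outside the class. The function name self_attributes is hypothetical; treat the result as an approximation.

```python
import ast
import textwrap

SOURCE = '''
class Foo(object):
    def __init__(self, bar):
        self.bar = bar

    def bind(self):
        val = self.bar
        do_something(val)
'''

def self_attributes(source, class_name, method_name=None):
    """Names used as <first-arg>.<name> inside the methods of `class_name`."""
    tree = ast.parse(textwrap.dedent(source))
    found = set()
    for cls in ast.walk(tree):
        if not (isinstance(cls, ast.ClassDef) and cls.name == class_name):
            continue
        for fn in cls.body:
            if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            if method_name is not None and fn.name != method_name:
                continue
            if not fn.args.args:
                continue                  # no positional parameter to treat as self
            self_name = fn.args.args[0].arg
            for node in ast.walk(fn):
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.value, ast.Name)
                        and node.value.id == self_name):
                    found.add(node.attr)
    return sorted(found)

print(self_attributes(SOURCE, 'Foo'))          # ['bar']
print(self_attributes(SOURCE, 'Foo', 'bind'))  # ['bar']
```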
This is a bad idea. Whatever you're doing, find a different way of doing it.
To do this, you need some way to find all the instances of your class. One way to do this is just to have the class itself keep track of its instances. Unfortunately, keeping a reference to every instance in the class means that those instances can never be garbage-collected. Fortunately, Python has weakref, which keeps a reference to an object without keeping it alive as far as Python's memory management is concerned, so the instances can be garbage-collected as usual.
A good place to update the list of instances is in your __init__() method. You could also do it in __new__() if you find the separation of concerns a little cleaner.
import weakref

class Foo(object):
    _instances = []

    def __init__(self, value):
        self.value = value
        cls = type(self)
        # Store only a weak reference; the callback removes it from the list
        # once this instance has been garbage-collected.
        cls._instances.append(weakref.ref(self, cls._instances.remove))

    @classmethod
    def iterinstances(cls):
        "Returns an iterator over all instances of the class."
        return (ref() for ref in cls._instances)

    @classmethod
    def iterattrs(cls, attr, default=None):
        "Returns an iterator over a named attribute of all instances of the class."
        return (getattr(ref(), attr, default) for ref in cls._instances)
Now you can do this:
f1, f2, f3 = Foo(1), Foo(2), Foo(3)
for v in Foo.iterattrs("value"):
    print(v, end=" ")  # prints 1 2 3
I am, for the record, with those who think this is generally a bad idea and/or not really what you want. In particular, instances may live longer than you expect depending on where you pass them and what that code does with them, so you may not always have the instances you think you have. (Some of this may even happen implicitly.) It is generally better to be explicit about this: rather than having the various instances of your class be stored in random variables all over your code (and libraries), have their primary repository be a list or other container, and access them from there. Then you can easily iterate over them and get whatever attributes you want. However, there may be use cases for something like this and it's possible to code it up, so I did.
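A variant of the same idea, swapping the hand-rolled list of weakref.ref objects for the standard weakref.WeakSet, and using vars() to collect the attribute names seen on the live instances (question 1). The method name seen_attributes is hypothetical, and, as noted above, the result is only a snapshot of the instances alive at that moment.

```python
import gc
import weakref

class Foo(object):
    # Entries disappear automatically once an instance is garbage-collected
    _instances = weakref.WeakSet()

    def __init__(self, bar):
        self.bar = bar
        Foo._instances.add(self)

    @classmethod
    def seen_attributes(cls):
        """Union of instance attribute names over the instances alive right now."""
        names = set()
        for inst in list(cls._instances):
            names.update(vars(inst))
        return sorted(names)

f1 = Foo(1)
f2 = Foo(2)
f2.extra = 'set from outside the class'
print(Foo.seen_attributes())   # ['bar', 'extra']

del f2
gc.collect()                   # CPython frees f2 right away; this helps on other implementations
print(Foo.seen_attributes())   # ['bar']
```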
I have a method on one of my objects that returns a new instance of that same class. I'm trying to figure out the most idiomatic way to write this method such that it generates a new object of the same type without duplicating code.
Since this method uses data from the instance, my first pass is:
class Foo(object):
    def get_new(self):
        data = ...  # Do interesting things
        return Foo(data)
However, if I subclass Foo and don't override get_new, calling get_new on SubFoo would return a Foo! So, I could write a classmethod:
class Foo(object):
    @classmethod
    def get_new(cls, obj):
        data = ...  # Munge about in obj's internals
        return cls(data)
However, the data I'm accessing is specific to the object, so it seems to break encapsulation for this not to be a "normal" (undecorated) method. Additionally, you then have to call it like SubFoo.get_new(sub_foo_inst), which seems redundant. I'd like the object to just "know" which type to return -- the same type as itself!
I suppose it's also possible to add a factory method to the class, and override the return type everywhere, without duplicating the logic, but that seems to put a lot of work on the subclasses.
So, my question is, what's the best way to write a method that gives flexibility in type of class without having to annotate the type all over the place?
If you want to make it more flexible for subclassing, you can simply use the self.__class__ special attribute:
class Foo(object):
    def __init__(self, data):
        self.data = data

    def get_new(self):
        data = ...  # Do interesting things
        return self.__class__(data)
Note that using the @classmethod approach will prevent you from accessing data within any one instance, which rules it out in cases where "Do interesting things" relies on data stored within an instance.
For Python 2, I do not recommend using type(self), as this will return an inappropriate value for classic classes (i.e., those not subclassed from the base object):
>>> class Foo:
... pass
...
>>> f = Foo()
>>> type(f)
<type 'instance'>
>>> f.__class__ # Note that the __class__ attribute still works
<class '__main__.Foo'>
For Python 3, this is not as much of an issue, as all classes are derived from object; however, I believe self.__class__ is considered the more Pythonic idiom.
You can use the builtin 'type'.
type(instance)
is that instance's class.
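A quick check that both suggestions return the subclass when called on a subclass instance (for new-style classes). Doubling the data is a hypothetical stand-in for "Do interesting things", and get_new_via_type is a hypothetical name for the type(self) variant.

```python
class Foo(object):
    def __init__(self, data):
        self.data = data

    def get_new(self):
        data = self.data * 2          # hypothetical "interesting thing"
        return self.__class__(data)   # same class as self, subclasses included

    def get_new_via_type(self):
        return type(self)(self.data * 2)

class SubFoo(Foo):
    pass

sub = SubFoo(21)
print(type(sub.get_new()).__name__, sub.get_new().data)  # SubFoo 42
print(type(sub.get_new_via_type()).__name__)             # SubFoo
```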
I'd like to modify all classes in Python. For example str and int and others like Person(object).
I'd like to add an attribute to them and to change the way their methods work.
Which is the best approach for this? Metaclasses?
While you can do this for classes defined in Python code (it will not work for built-in ones) by reassigning their attributes, please do not actually do so. Just subclass and use the subclass, or write functions that take an instance of the class as an argument instead of adding your own methods. Doing what you have in mind leads to awkward, fragile code, especially if you end up using multiple libraries simultaneously that try to do this to the same classes.
Is there an actual problem you're trying to solve this way?
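To illustrate the limitation (not to encourage the practice), with a hypothetical Person class and greet function: assigning a new method to a class defined in Python works, while the same assignment on a built-in type raises TypeError. The exact error message differs between Python versions, so the sketch only prints it.

```python
class Person(object):
    def __init__(self, name):
        self.name = name

def greet(self):
    return 'Hi, ' + self.name

Person.greet = greet                 # works for classes defined in Python code
print(Person('Ann').greet())         # Hi, Ann

try:
    int.greet = greet                # built-in types reject new attributes
    builtin_patch_ok = True
except TypeError as exc:
    builtin_patch_ok = False
    print('TypeError:', exc)
```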
Built-in classes can't be modified, but you can "hide" a built-in class (or any other of course) by one of the same name.
For example, suppose that the change is to add to a bunch of classes a new attribute "foobar" whose initial value is 23, and to every instance of those classes a new attribute "murf" whose initial value is 45. Here's one way:
def changedclass(cls):
    def __init__(self, *a, **k):
        # In Python 3, int and str inherit object.__init__, which raises
        # TypeError when given extra arguments here, so only chain to a
        # class's own __init__.
        if cls.__init__ is not object.__init__:
            cls.__init__(self, *a, **k)
        self.murf = 45
    return type(cls.__name__, (cls,), {'foobar': 23, '__init__': __init__})

def changemany(changed, classes_by_module):
    for module, classnames in classes_by_module.items():
        for name in classnames:
            cls = getattr(module, name)
            subcls = changed(cls)
            setattr(module, name, subcls)

import builtins  # this module was called __builtin__ in Python 2
import mymod

changemany(changedclass, {builtins: ('int', 'str'), mymod: ('Person',)})
Note that bare literals like 'ciao' and 23 will still belong to the real classes -- there's no way to change that; you'll need to use str('ciao') and int(23) to use the "fake" classes.
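A quick check of the changedclass idea on its own, without rebinding anything in builtins (FakeInt, FakePerson and Person are hypothetical names). It guards the chained __init__ call because, in Python 3, calling int.__init__ (which is object.__init__) with an argument on such a subclass raises TypeError; the last line confirms that a literal keeps the real class.

```python
class Person(object):
    def __init__(self, name):
        self.name = name

def changedclass(cls):
    def __init__(self, *a, **k):
        # int and str inherit object.__init__, which rejects extra arguments
        # here in Python 3, so only chain to a class's own __init__.
        if cls.__init__ is not object.__init__:
            cls.__init__(self, *a, **k)
        self.murf = 45
    return type(cls.__name__, (cls,), {'foobar': 23, '__init__': __init__})

FakeInt = changedclass(int)
FakePerson = changedclass(Person)

n = FakeInt(23)
p = FakePerson('Ann')
print(n + 1, n.foobar, n.murf, isinstance(n, int))  # 24 23 45 True
print(p.name, p.foobar, p.murf)                      # Ann 23 45
print(type(23) is int, type(23) is FakeInt)          # True False
```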
You can't edit built-in classes directly like you might with JavaScript's prototype attribute; it's better to subclass them. This lets you add the functionality you want without forcing it to be used everywhere.
subclass:
class int(int):
    def foo(self):
        print("foo")

int(2).foo()
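One caveat worth knowing with this subclass, shown in the sketch below: arithmetic goes through the built-in int methods, which return plain built-in ints, so the extra method is lost on the result. Here foo returns its string instead of printing it, purely to make the check easy.

```python
import builtins

class int(int):                    # shadows the built-in name in this module only
    def foo(self):
        return 'foo'

x = int(2)
print(x.foo())                     # foo

y = x + int(3)                     # arithmetic goes through builtins.int.__add__
print(type(y) is builtins.int)     # True: the result is a plain built-in int
print(hasattr(y, 'foo'))           # False
```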