I have a base class and several subclasses that inherit from it. I am trying to detect dynamically which subclasses inherit from the base class. I am currently doing it by dynamically importing all the subclasses in the base class's __init__() and then calling the __subclasses__() method.
I have the following file structure:
proj/
|-- __init__.py
|-- base.py
`-- sub
    |-- __init__.py
    |-- sub1.py
    |-- sub2.py
    `-- sub3.py
base.py:
import importlib

class Base(object):
    def __init__(self):
        importlib.import_module('sub.sub1')
        importlib.import_module('sub.sub2')
        importlib.import_module('sub.sub3')

    @classmethod
    def inheritors(cls):
        print(cls.__subclasses__())

b = Base()
b.inheritors()
sub1.py:
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from base import Base

class Sub1(Base):
    pass
sub2.py:
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from base import Base

class Sub2(Base):
    pass
and finally sub3.py:
import sys
import os

class Sub3(object):
    pass
You will notice that sub.sub1.Sub1 and sub.sub2.Sub2 both inherit from base.Base while sub.sub3.Sub3 does not.
When I open IPython3, and run import base I get the following output:
In [1]: import base
[<class 'sub.sub1.Sub1'>, <class 'sub.sub2.Sub2'>]
The output above is exactly what I would expect. It gets weird when I run base.py from the command line:
python3 base.py
[<class 'sub.sub2.Sub2'>]
[]
Now, I think I understand why there are two prints in the second case: the Python importer initially does not find base in the sys.modules global variable, so when a subclass module imports base, base.py is imported again and its code is executed a second time. But this does not explain why the first print shows [<class 'sub.sub2.Sub2'>] rather than [<class 'sub.sub1.Sub1'>], even though sub.sub1.Sub1 is imported first, nor why only sub.sub2.Sub2 appears in __subclasses__() while sub.sub1.Sub1 does not.
Any explanation that would help me understand how Python works in this regard will be greatly appreciated!
EDIT: I would like to run the module using python base.py, so can someone point me in the right direction for that?
You made a knot.
A complicated, unneeded knot. I could figure it out, but I am not sure I can keep it all in mind well enough to explain clearly what is going on :-)
But one thing first: this has less to do with "inheritance detection" and everything to do with the import system, which you tied into a complicated knot.
So, you get the unexpected result because when you do python base.py, the contents of base are recorded as the module named __main__ in sys.modules.
Ordinarily, Python never imports a module and runs the same code again: when it finds an import statement for a module that already exists, it just creates a new variable pointing to the existing module. If that module has not finished executing its body yet, not all of its classes or variables will be visible at the place where the second import statement is. Calls to importlib do no better; they just don't automate the variable-binding part. But when you do circular imports, change the import path, and import a module named base from another file, Python does not know that this is the same base that is running as __main__. So base gets a fresh import, and a second entry in sys.modules, under the name base.
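The "same file, two modules" effect can be reproduced in isolation with a script that imports itself by its own file name. This is a minimal sketch: the file name selfimport.py and the temporary directory are hypothetical, and the script is run in a subprocess with the current interpreter so that it really starts as __main__.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A hypothetical script that imports itself by its own file name.
SCRIPT = """\
import sys
import selfimport  # runs this same file again, this time as module 'selfimport'
print(__name__, sys.modules['selfimport'] is sys.modules['__main__'])
"""


def run_self_importing_script():
    """Run SCRIPT as a real script and return the lines it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "selfimport.py"
        path.write_text(SCRIPT)
        # Running a file puts its directory first on sys.path,
        # so "import selfimport" finds the very same file.
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_self_importing_script():
        print(line)
```

The line from the inner copy (named selfimport) is printed before the line from __main__, just as the inner base.Base printed first in the question, and the identity check is False in both copies because they are two distinct module objects.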
If you print __class__ in your inheritors method, it becomes clear (in Python 3, __class__ inside a method refers to the class in which that method was defined):
@classmethod
def inheritors(cls):
    print("At class {}. Subclasses: {}".format(__class__, cls.__subclasses__()))
Then you will see that "base.Base" has the "sub2" subclass and __main__.Base has no subclasses.
Now, let me try to lay out the timeline:
1. base.py is run as __main__ up to the line b = Base(). At this point the __init__ method of Base starts importing the submodules.
2. Submodule sub1 is run, changes sys.path, and re-imports base.py as the module named base.
3. The contents of the base module are run up to its own b = Base() line, so base.Base.__init__ runs; therein, it imports sub.sub1, and Python finds that this module has already been imported and is in sys.modules. Its code has not completed, though, and the Sub1 class is not yet defined.
4. Still inside the import of base triggered by sub1, __init__ tries to import sub.sub2. That is a new module to Python, so it is imported.
5. During the import of sub2, when from base import Base is reached, Python recognizes base as already imported (although, again, not all of its initialization code has completed); it just binds the name in sub2's globals and keeps going.
6. Sub2 is defined as a subclass of base.Base.
7. The sub.sub2 import finishes, and Python resumes the __init__ method from step 4; it imports sub.sub3 and goes on to the b.inheritors() call (in base, not in __main__). At this point the only subclass of base.Base is Sub2, and that is what gets printed.
8. The import of base.py as base finishes, and Python resumes executing the body of sub.sub1: class Sub1 is defined as a subclass of base.Base.
9. Python resumes the execution of __main__.Base.__init__, which next imports sub.sub2, but that module has already run; the same goes for sub.sub3.
10. __main__.Base.inheritors is called in __main__, and prints no subclasses.
And that is the end of a complicated history.
What you should be doing
First: if you need the sys.path.append trickery, something is wrong with your package layout. Let your package be proj, and have proj/__init__.py import base if you want that to run (and dynamically import the other modules), but stop fiddling with sys.path to find things in your own package.
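The "dynamically import the other modules" part can be done without touching sys.path by walking a package with pkgutil.iter_modules and importing each submodule by its full dotted name. This is a sketch: import_submodules is a hypothetical helper, and the standard-library json package stands in for proj.sub in the usage lines.

```python
import importlib
import pkgutil
from types import ModuleType


def import_submodules(package: ModuleType) -> dict:
    """Import every direct submodule of *package*; return them keyed by full name."""
    modules = {}
    # package.__path__ lists the package's directories; the prefix makes
    # iter_modules yield absolute names such as "json.decoder".
    for info in pkgutil.iter_modules(package.__path__, package.__name__ + "."):
        modules[info.name] = importlib.import_module(info.name)
    return modules


if __name__ == "__main__":
    json_pkg = importlib.import_module("json")
    print(sorted(import_submodules(json_pkg)))
```

In the question's layout this would be called on proj.sub from code inside the proj package, so the submodules can use a normal package import of base instead of appending to sys.path.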
Second: the cls.__subclasses__ call is of little use, as it only tells you about the immediate subclasses of cls; a grandchild subclass will go unnoticed.
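If you do stay with __subclasses__(), grandchildren can be collected by recursing over it. A minimal sketch, where all_subclasses is a hypothetical helper name; like __subclasses__() itself, it only sees classes whose modules have already been imported.

```python
def all_subclasses(cls):
    """Return every direct and indirect subclass of *cls*, in discovery order."""
    found = []
    for sub in cls.__subclasses__():
        if sub not in found:
            found.append(sub)
            # Recurse so grandchildren (and deeper levels) are included too.
            for deeper in all_subclasses(sub):
                if deeper not in found:
                    found.append(deeper)
    return found


class Base:
    pass


class Child(Base):
    pass


class GrandChild(Child):
    pass


if __name__ == "__main__":
    print(Base.__subclasses__())   # only Child
    print(all_subclasses(Base))    # Child and GrandChild
```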
The most usual pattern is to keep a registry of the subclasses of your Base and, as they are created, add the new classes to that registry. This can be done with a metaclass in Python < 3.6, or with the __init_subclass__ method in Python 3.6 and later.
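A minimal sketch of the __init_subclass__ variant (Python 3.6+); the registry attribute name is an assumption, not something the answer prescribes.

```python
class Base:
    # Registry shared by the whole hierarchy; filled in as subclasses are defined.
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Called once for every new subclass, including grandchildren.
        Base.registry.append(cls)


class Sub1(Base):
    pass


class Sub2(Base):
    pass


class Sub2a(Sub2):  # a grandchild is registered as well
    pass


if __name__ == "__main__":
    print([c.__name__ for c in Base.registry])
```

A class still only lands in the registry once the module defining it has been imported, so the submodules need to be imported somewhere, for example from proj/__init__.py.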
Related
I am trying to implement a small library for Python 3.5 but keep struggling with how to correctly handle the structuring of the packages/modules and how to get the imports to work.
I keep running into the problem where python complains of being unable to import some name with an error like
ImportError: cannot import name 'SubClass1'
This seems to happen when "SubClass1" needs to import some other module but that other module also needs to know about SubClass1 (a cyclic import).
I need the cyclic import in my library because the base class has a factory method that creates the proper subclass instances (there are also other situations where cyclic imports are needed, e.g. checking the type of a function argument needs the import of where that type is defined, but that module may itself need the class where that check is done: another cyclic dependency!)
Here is example code:
The root directory contains the subdirectory dir1. The directory dir1 contains an empty file __init__.py, a file baseclass.py and a file subclass1.py.
The file ./dir1/subclass1.py contains:
from . baseclass import BaseClass

class SubClass1(BaseClass):
    pass
The file ./dir1/baseclass.py contains:
from . subclass1 import SubClass1

class BaseClass(object):
    def make(self,somearg):
        # .. some logic to decide which subclass to create
        ret = SubClass1()
        # .. which gets eventually returned by this factory method
        return ret
The file ./test1.py contains:
from dir1.subclass1 import SubClass1
sc1 = SubClass1()
This results in the following error:
Traceback (most recent call last):
  File "test1.py", line 1, in <module>
    from dir1.subclass1 import SubClass1
  File "/data/johann/tmp/python1/dir1/subclass1.py", line 1, in <module>
    from . baseclass import BaseClass
  File "/data/johann/tmp/python1/dir1/baseclass.py", line 1, in <module>
    from . subclass1 import SubClass1
ImportError: cannot import name 'SubClass1'
What is the standard/best way to solve this problem, ideally in a way that is backwards compatible to python 2.x and python 3 up to version 3.2?
I have read elsewhere that importing the module instead of something from a module may help here but I do not know how to just import the module (e.g. subclass1) in a relative way because "import . subclass1" or similar does not work.
Your issue is caused by a circular import. The baseclass module is trying to import SubClass1 from the subclass1 module, but subclass1 is trying to import BaseClass right back. You get an ImportError because the classes haven't been defined yet when the import statements run.
There are a few ways to solve the issue.
One option would be to change your style of import. Instead of importing the classes by name, just import the modules and look up the names as attributes later on.
from . import baseclass

class SubClass1(baseclass.BaseClass):
    pass
And:
from . import subclass1

class BaseClass:
    def make(self, somearg):
        # ...
        ret = subclass1.SubClass1()
        return ret
Because SubClass1 needs to use BaseClass immediately at definition time, this code may still fail if the baseclass module is imported before subclass1, so it's not ideal.
Another option would be to change baseclass to do its import below the definition of BaseClass. This way the subclass module will be able to import the name when it needs to:
class BaseClass:
    def make(self, somearg):
        # .. some logic to decide which subclass to create
        ret = SubClass1()
        return ret

from .subclass1 import SubClass1
This is not ideal because the normal place to put imports is at the top of the file. Putting them elsewhere makes the code more confusing. You may want to put a comment up at the top of the file explaining why you're delaying the import if you go this route.
Another option may be to combine your two modules into a single file. Python doesn't require each class to have its own module like some other languages do. When you have tightly coupled classes (like the ones in your example), it makes a lot of sense to put them all in one place. This lets you avoid the whole issue, since you don't need any imports at all.
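Here is a minimal sketch of the single-module version. It works because the name SubClass1 is looked up when make() is called, not when BaseClass is defined; the selection logic is a placeholder, as in the question.

```python
class BaseClass:
    def make(self, somearg):
        # SubClass1 is resolved at call time, so it may be defined further
        # down in the same module. Real selection logic would inspect somearg.
        return SubClass1()


class SubClass1(BaseClass):
    pass


if __name__ == "__main__":
    obj = BaseClass().make("anything")
    print(type(obj).__name__)
```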
Finally, there are some more complicated solutions, like dependency injection. Rather than the base class needing to know about the subclasses, each subclass could register itself by calling some function and passing a reference to itself. For example:
# no imports of subclasses!
class BaseClass:
    subclasses = []

    def make(self, somearg):
        for sub in self.subclasses:
            if sub.accepts(somearg):
                return sub()
        raise ValueError("no subclass accepts value {!r}".format(somearg))

    @classmethod
    def register(cls, sub):
        cls.subclasses.append(sub)
        return sub  # return the class so it can be used as a decorator!
And in subclass1.py:
from .baseclass import BaseClass

@BaseClass.register
class SubClass1(BaseClass):
    @classmethod
    def accepts(cls, somearg):
        # put logic for picking this subclass here!
        return True
This style of programming is a bit more complicated, but it can be nice since it's easier to extend than a version where BaseClass needs to know about all of the subclasses up front. There are a variety of ways to implement this style of code; using a register function is just one of them. One nice thing about it is that it doesn't strictly require inheritance (so you could register a class that doesn't actually inherit from BaseClass if you wanted to). If you are only dealing with actual inheriting subclasses, you might want to consider using a metaclass that does all the registration of subclasses for you automatically.
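A rough sketch of the metaclass approach, in Python 3 syntax (Python 2 would set a __metaclass__ class attribute instead). RegisteringMeta is a hypothetical name, and the accepts rule in SubClass1 is invented for illustration.

```python
class RegisteringMeta(type):
    """Metaclass that records every class created with it, except the root."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, "subclasses"):
            # First class built with this metaclass: give it the registry.
            cls.subclasses = []
        else:
            # Any later class inherits that list, so it appends itself to it.
            cls.subclasses.append(cls)


class BaseClass(metaclass=RegisteringMeta):
    def make(self, somearg):
        for sub in self.subclasses:
            if sub.accepts(somearg):
                return sub()
        raise ValueError("no subclass accepts value {!r}".format(somearg))


class SubClass1(BaseClass):  # registered automatically, no decorator needed
    @classmethod
    def accepts(cls, somearg):
        # Hypothetical selection rule for illustration.
        return isinstance(somearg, int)


if __name__ == "__main__":
    print(type(BaseClass().make(42)).__name__)
```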
I have a python source file with a class defined in it, and a class from another module imported into it. Essentially, this structure:
from parent import SuperClass
from other import ClassA
class ClassB(SuperClass):
    def __init__(self): pass
What I want to do is look in this module for all the classes defined in there, and only to find ClassB (and to overlook ClassA). Both ClassA and ClassB extend SuperClass.
The reason for this is that I have a directory of plugins which are loaded at runtime, and I get a full list of the plugin classes by introspecting on each .py file and loading the classes which extend SuperClass. In this particular case, ClassB uses the plugin ClassA to do some work for it, so is dependent upon it (ClassA, meanwhile, is not dependent on ClassB). The problem is that when I load the plugins from the directory, I get 2 instances of ClassA, as it gets one from ClassA's file, and one from ClassB's file.
For packages there is the approach:
__all__ = ['module_a', 'module_b']
to explicitly list the modules that you can import, but this lives in the __init__.py file, and each of the plugins is a .py file not a directory in its own right.
The question, then, is: can I limit access to the classes in a .py file, or do I have to make each one of them a directory with its own init file? Or, is there some other clever way that I could distinguish between these two classes?
You meant "for packages there is the approach...". Actually, that works for every module (__init__.py is a module, just with special semantics). Use __all__ inside the plugin modules and that's it.
But remember: __all__ only limits what you import using from xxxx import *; you can still access the rest of the module, and there's no way to avoid that using the standard Python import mechanism.
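A small sketch of both halves of that statement. To stay self-contained it builds a hypothetical module named plugin_demo in memory with types.ModuleType instead of writing a file.

```python
import sys
import types

# Build a tiny hypothetical plugin module in memory instead of on disk.
plugin = types.ModuleType("plugin_demo")
exec(
    "class ClassA: pass\n"
    "class ClassB: pass\n"
    "__all__ = ['ClassB']\n",
    plugin.__dict__,
)
sys.modules["plugin_demo"] = plugin

# A star-import only copies the names listed in __all__.
namespace = {}
exec("from plugin_demo import *", namespace)

if __name__ == "__main__":
    print("ClassB" in namespace, "ClassA" in namespace)  # star-import honours __all__
    print(plugin.ClassA)  # but ClassA is still reachable as an attribute
```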
If you're using some kind of active introspection technique (e.g. exploring the namespace of the module and then importing classes from it), you could check whether the class comes from the same file as the module itself.
You could also implement your own import mechanism (using importlib, for example), but that may be overkill...
Edit: for the "check if the class comes from the same module" part:
Say that I have two modules, mod1.py:
class A(object):
    pass
and mod2.py:
from mod1 import A

class B(object):
    pass
Now, if I do:
from mod2 import *
I've imported both A and B. But...
>>> A
<class 'mod1.A'>
>>> B
<class 'mod2.B'>
As you can see, the classes carry information about where they originated. And you can actually check it right away:
>>> A.__module__
'mod1'
>>> B.__module__
'mod2'
Using that information you can discriminate them easily.
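Applied to the plugin loader from the question, the check might look like the sketch below. plugin_classes is a hypothetical helper, and the two plugin modules are built in memory with types.ModuleType to mimic ClassB's file doing "from other import ClassA".

```python
import inspect
import types


class SuperClass:
    """Stand-in for the plugin base class from the question."""


def plugin_classes(module, base=SuperClass):
    """Return classes *defined* in module that subclass base, skipping imported ones."""
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base)
        and obj is not base
        and obj.__module__ == module.__name__  # defined here, not imported
    ]


# Two hypothetical in-memory plugin modules mirroring the question.
mod_a = types.ModuleType("plugin_a")
mod_a.SuperClass = SuperClass
exec("class ClassA(SuperClass): pass", mod_a.__dict__)

mod_b = types.ModuleType("plugin_b")
mod_b.SuperClass = SuperClass
mod_b.ClassA = mod_a.ClassA  # like "from other import ClassA"
exec("class ClassB(SuperClass): pass", mod_b.__dict__)

if __name__ == "__main__":
    print(plugin_classes(mod_b))  # only ClassB, not the imported ClassA
```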
I am new to python and found that I can import a module without importing any of the classes inside it. I have the following structure --
myLib/
    __init__.py
    A.py
    B.py
driver.py
Inside driver.py I do the following --
import myLib
tmp = myLib.A()
I get the following error trying to run it.
AttributeError: 'module' object has no attribute A
Eclipse does not complain when I do this, in fact the autocomplete shows A when I type myLib.A.
What does it mean when I import a module but not any of the classes inside it?
Python is not Java. A and B are not classes. They are modules. You need to import them separately. (And myLib is not a module but a package.)
The modules A and B might themselves contain classes, which might or might not be called A and B. You can have as many classes in a module as you like - or even none at all, as it is quite possible to write a large Python program with no classes.
To answer your question though, importing myLib simply places the name myLib inside your current namespace. Anything in __init__.py will be executed: if that file itself defines or imports any names, they will be available as attributes of myLib.
If you do from myLib import A, you have now imported the module A into the current namespace. But again, any of its classes still have to be referenced via the A name: so if you do have a class A there, you would instantiate it via A.A().
A third option is to do from myLib.A import A, which does import the class A into your current namespace. In this case, you can just call A() to instantiate the class.
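The three import styles can be compared side by side with a throwaway copy of the question's layout. This sketch writes a hypothetical myLib package (an empty __init__.py and an A.py holding a class A) into a temporary directory and puts that directory on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the question's layout in a temporary directory (illustration only).
root = Path(tempfile.mkdtemp())
(root / "myLib").mkdir()
(root / "myLib" / "__init__.py").write_text("")               # empty __init__.py
(root / "myLib" / "A.py").write_text("class A:\n    pass\n")  # module A holding class A
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import myLib                          # runs only myLib/__init__.py
had_A_before = hasattr(myLib, "A")    # False: nothing has imported the submodule yet

from myLib import A                   # binds the *module* myLib.A
obj1 = A.A()                          # so the class is reached through the module

from myLib.A import A                 # binds the *class*, shadowing the module name
obj2 = A()
```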
You need to do
from myLib import A
because A is not an attribute defined in myLib's __init__.py. When you do import myLib, only __init__.py is executed.
About packages
Here's a Python package. foo is on sys.path.
foo\
    __init__.py
    bar\
        __init__.py
        base.py
            class Base(object)
        derived.py
            import foo.bar.base as base
            class Derived(base.Base)
I've got nothing fancy going on yet. If I want to instantiate the Derived class from the derived module, I can do that easily enough:
import foo.bar.derived as derived
print(derived.Derived())
However, I'd like to just import the bar module and call bar.Derived(), because I plan to have lots of classes within lots of different modules, and I don't want to deal with all these tentacular import paths. My understanding is that I can simply import Derived into the namespace of the bar module, by modifying my project like so:
foo\
    __init__.py
    bar\
        __init__.py
            from foo.bar.derived import Derived
        base.py
            class Base(object)
        derived.py
            import foo.bar.base as base
            class Derived(base.Base)
Now I should be able to do the following:
import foo.bar as bar
print(bar.Derived())
But I get an AttributeError complaining that the foo module has no submodule called bar:
test.py (1): import foo.bar
foo\bar\__init__.py (1): from foo.bar.derived import Derived
foo\bar\derived.py (1): import foo.bar.base as base
AttributeError: 'module' object has no attribute 'bar'
In fact, my original test code (at top) doesn't work either! As soon as I try to import foo.bar, I get errors.
What I can glean from this error is that the import statement in __init__.py causes derived.py to be executed before bar is fully loaded, and therefore it can't import the module (also from bar) which contains its own base class. I'm coming from the C++ world, where ultra-nested namespaces aren't as integral and a simple forward declaration would negate this problem, but I've been led to believe that what I'm looking for is possible and at least a somewhat acceptably Pythonic solution. What am I doing wrong? What's the correct way to make classes from a submodule available in the parent module's namespace?
If you're working with Python 2.5 or later, try using explicit relative imports (http://www.python.org/dev/peps/pep-0328/#guido-s-decision):
test.py (1): import foo.bar
foo\bar\__init__.py (1): from .derived import Derived
foo\bar\derived.py (1): from . import base
(Note that if you are indeed working with Python 2.5 or 2.6, you'll need to include from __future__ import absolute_import in your modules.)
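A sketch that checks the relative-import layout end to end (Python 3 syntax). It writes a hypothetical copy of the foo/bar package into a temporary directory, using from .derived import Derived in bar/__init__.py and from . import base in derived.py, then imports foo.bar.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical on-disk copy of the question's foo/bar package.
FILES = {
    "foo/__init__.py": "",
    "foo/bar/__init__.py": "from .derived import Derived\n",
    "foo/bar/base.py": "class Base(object):\n    pass\n",
    "foo/bar/derived.py": (
        "from . import base\n"
        "\n"
        "class Derived(base.Base):\n"
        "    pass\n"
    ),
}

root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    path = root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

bar = importlib.import_module("foo.bar")  # equivalent to "import foo.bar as bar"
obj = bar.Derived()                       # Derived is available on the package itself
```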
in derived.py, use this:
EDIT: as JAB pointed out, implicit relative imports are deprecated (and Python 3 removed them entirely), so the following isn't recommended, although it still works in Python 2.7 with no deprecation warnings:
import base # this is all you need - it's in the current directory
Instead, use:
from . import base #
(or)
from foo.bar import base
instead of:
import foo.bar.base as base
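This will solve both your errors (since they come from the same issue). Your original import fails because, while foo/bar/__init__.py is still executing, bar has not yet been set as an attribute of the foo package, and import foo.bar.base as base reaches base through attribute lookups starting from foo (Python 3.7 and later handle this particular case).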
If I use a class in a module, how can I make the methods of its instance available at the module's top level?
Structure:
/package
    __init__.py
    /subPackage
        __init__.py
        module.py
        subModule.py
/theScript.py
python theScript.py
Source of theScript:
import package.subPackage.module
package.subPackage.module.method()
Source of /package/subPackage/module.py:
class module:
    def method(self): pass

moduleInstance = module()
I guess what I am asking is how I can avoid having to call package.subPackage.module.moduleInstance.method() and instead call package.subPackage.module.method().
I know I can just remove the class and the instance, but I prefer the class because it makes it easier for somebody to subclass it later without needing to modify our source directly. If I ultimately have to, I will just use module-level functions instead of a class with methods.
Here is how the random module in the standard lib solved this problem:
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
...
Seems a reasonable solution to me. Of course there is the drawback that you have to manually keep the method lists in sync, but the worst thing that will happen if you forget to add a method name is an error message that tells you exactly what is missing.