Dynamically imported module doesn't think it has class - python

Setup: Python 3.3
I'm making an application that looks through a folder called 'sources' for .py files and looks in them for classes that extend a class called 'SourceBase' that I defined. If a class extends SourceBase, I want to create a new instance of it to work with.
I've done a fair amount of research through the following posts, which I understand for the most part:
Dynamic importing of modules followed by instantiation of objects with a certain baseclass from said modules
How to dynamically load a Python class
Dynamic loading of python modules
My folder setup is like this, which I believe is relevant:
EPDownloader [package]
\
    epdownloader.py [main]
    SourceBase.py [contains SourceBase class]
    imageutils.py [this class will find and dynamically load the classes in the sources package]
    sources [package]
    \
        source1.py [has class that extends SourceBase]
        source2.py
        ...other plugins here...
My issue is that I'm using the following code (from the other Stack Overflow questions I listed above) and it's searching through my module for classes, but it doesn't find my classes. It just skips them. I'm not sure what's wrong. Here is my code that is performing the search (it's based on the first link I posted):
<!--language: python-->
def getSources(self):
    pluginbase=SourceBase.SourceBase
    searchpath='sources'
    #We want to iterate over all modules in the sources/ directory, allowing the user to make their own.
    for root, dirs, files in os.walk('./'+searchpath):
        print('files: ',files)
        candidates = [fname for fname in files if fname.endswith('.py')
                      and not fname.startswith('__')]
        classList=[]
        if candidates:
            for c in candidates:
                modname = os.path.splitext(c)[0]
                print('importing: ',modname)
                module=__import__(searchpath+'.'+modname) #<-- You can get the module this way
                print('Parsing module '+modname)
                for cls in dir(module): #<-- Loop over all objects in the module's namespace
                    print('Inspecting item from module: '+str(cls))
                    cls=getattr(module,cls) #this seems to still be a module when it hits source1
                    print('Get attribute: '+str(cls))
                    if (inspect.isclass(cls)): # Make sure it is a class
                        print('...is a class')
                        if inspect.getmodule(cls)==module: # Make sure it was defined in module, not just imported
                            print('...is in a module')
                            if issubclass(cls,pluginbase): # Make sure it is a subclass of base
                                print('...subclasses '+pluginbase.__name__)
                                classList.append(cls)
        print(classList)
Here is the relevant output it gives me (I trimmed a lot of other stuff this code outputs):
Inspecting item from module: source1
Get attribute: <module 'sources.source1' from '/Users/Mgamerz/Documents/workspace/code/EPDownloader/sources/source1.py'>
[] <--- signifies it failed to find the source class
I'm pretty sure my subclassing works, here's a snippet of the class:
from EPDownloader import SourceBase
class source1(SourceBase.SourceBase):
    def __init__(self):
        pass
I'm stumped by this problem. I've spent the last few hours on it and I don't know what to do. I have a feeling it's a simple fix I'm not seeing. Can someone help me find the bug here?
[Note: I looked through the StackOverflow formatting help, and don't see any way to format a 'highlight', the ones where it puts a grey background on text, but inline. It would help highlight parts of this question I'm trying to convey.]

Look at the documentation: http://docs.python.org/3.1/library/functions.html#import
When the name variable is of the form package.module, normally, the top-level package (the name up till the first dot) is returned, not the module named by name. However, when a non-empty fromlist argument is given, the module named by name is returned.
Simply replace
module=__import__(searchpath+'.'+modname)
with
module=__import__(searchpath+'.'+modname, None, None, "*")
This mirrors what "from sources.source1 import *" does internally: because the fromlist is non-empty, __import__ returns the sources.source1 module itself rather than the top-level sources package.
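The difference between the two call forms can be seen with a standard-library package. This is only a minimal sketch: json.decoder stands in for sources.source1, and importlib.import_module is shown as an alternative that was not part of the answer above.

```python
import importlib

# Without a fromlist, __import__ returns the top-level package.
top = __import__("json.decoder")

# With a non-empty fromlist, it returns the module named by the dotted path.
leaf = __import__("json.decoder", fromlist=["*"])

# importlib.import_module always returns the module named by the dotted path.
same = importlib.import_module("json.decoder")

print(top.__name__)
print(leaf.__name__)
print(same is leaf)
```

In the question's loop, importlib.import_module(searchpath + '.' + modname) would be a drop-in replacement for the __import__ call.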

There is something wrong with your __import__: instead of importing a module, you're importing the whole package (the whole 'sources' directory as a package).
I was able to fix your code like this:
for c in candidates:
    modname = os.path.splitext(c)[0]
    print('importing: ',modname)
    # NEW CODE
    sys.path.insert(0, searchpath)
    module=__import__(modname) #<-- You can get the module this way
    # END OF NEW CODE
    print('Parsing module '+modname)
    ...

Related

ImportError for top-level package when trying to use dill to pickle entire package source code alongside instance

I have the following project structure:
Package1
|--__init__.py
|--__main__.py
|--Module1.py
|--Module2.py
where Module1.py contains something like:
import dill as pickle
import Package1.Module2
# from https://stackoverflow.com/questions/52402783/pickle-class-definition-in-module-with-dill
def mainify(obj):
    import __main__
    import inspect
    import ast
    s = inspect.getsource(obj)
    m = ast.parse(s)
    co = compile(m, "<string>", "exec")
    exec(co, __main__.__dict__)
def Module1():
    """I hope the details of this class are not necessary for this example. I can add detail if necessary
    """
obj_to_pickle = Module1()
def write_session():
    mainify(Module1)
    mainify(Module2)
    with FileHandler.open_file(...) as f:
        pickle.dump(obj_to_pickle, f)
I run the code as a module via python -m Package1 ..., thus __main__.py is the entry point to package execution, though I hope these details aren't relevant (I can improve my example if necessary).
Now, when I try to load the pickled object, I get ModuleNotFoundError: No module named Package1.
How can I tell dill in this situation to understand that Package1 is the package? The mainify function seems to be getting the modules' source code into the pickle, but I believe the import statement in Module1.py that is import Package1.Module2.py is causing the ImportError. How can I tell dill to understand the reference to Package1?
NOTE: this reference can be fixed by adding the directory that Package1 is in via sys.path.append. But the whole point of pickling the package source alongside the instance is to make it possible to unpickle the instance without needing to do this.
Relevant posts:
Pickle class definition in module with dill
Why dill dumps external classes by reference, no matter what?
@courtyardz: I'm a contributor to dill, and your question is similar to others that have been asked in the past.
First, let me explain that generally dill assumes that all the modules necessary to deserialize an object are importable in the "unpickling" environment. Therefore modules are almost always saved by reference, with the current exception of modules that are not properly installed, like local modules (e.g. located in the working directory) or modules at non-canonical paths added to sys.path. There's also a function that's able to save the complete state of a module, which can be restored afterwards, but not the module itself.
That said, what exactly do you need? Is it to serialize an object alongside its class (including any objects in the module's namespace that it refers to), or is it really the whole module?
If you need to transfer the complete module to an interpreter session where it's not available, like in a different machine, this problem is under active discussion here: https://github.com/uqfoundation/dill/issues/123. There's no complete solution for this currently, but one possibility is to ship the module as a ZIP archive, and load it using the zipimport module (indirectly, by saving the zip file to disk, maybe in a temporary location, and adding its path to sys.path as described in Python's documentation).
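The ZIP-archive route mentioned above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a dill feature: the archive name shipped_code.zip and the module shipped_mod are hypothetical, and the archive is built locally here only to keep the example self-contained.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a small archive containing one module (hypothetical names).
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "shipped_code.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("shipped_mod.py", "def greet():\n    return 'hello from the zip'\n")

# On the receiving side: save the archive to disk and put its path on sys.path.
# The zipimport machinery then resolves imports from inside the archive.
sys.path.insert(0, archive)
importlib.invalidate_caches()

import shipped_mod

print(shipped_mod.greet())
print(shipped_mod.__file__)
```

Once the module is importable this way, an object pickled by reference to it can be loaded without installing the package.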
If you just need to serialize an object with its class, note that doing so has the limitation that objects of that class pickled by separate calls to dill.dump() or dill.dumps() will end up having different (although identical) classes when unpickled. This may or may not be a problem. There's also an open discussion about forcing the serialization of a class by value: https://github.com/uqfoundation/dill/issues/424.
The workaround you are trying to use should work because dill pickles classes defined in the __main__ module by value, as well as "orphaned" classes, i.e. classes that can't be found in the module where they were defined. However, for this to work the object must be created by the __main__.Module1 class (I suppose this is a class, even though you used def instead of class in your code example), not the Package1.Module1.Module1 class. If the class references global objects in Module1 in its methods, you may need to use the option recurse=True with dill.dump(s).
A simpler workaround, which may not work for your specific case as it involves multiple modules, is to temporarily change the __module__ attribute of the class. For example, in a module's body:
import dill
class X:
    pass
obj = X()
X.__module__ = None # temporarily orphan the class
with open('/path/to/file.pkl', 'wb') as file:
    dill.dump(obj, file) # X will be pickled by value because __module__ is None
X.__module__ = __name__ # de-orphan the class
Going back to your example, if you can't create the object with the "mainified" class, you may change the object's class temporarily too:
import __main__ # needed to reference the "mainified" class below
obj_to_pickle = Module1()
def write_session():
    mainify(Module1)
    mainify(Module2)
    obj_to_pickle.__class__ = __main__.Module1
    with FileHandler.open_file(...) as f:
        pickle.dump(obj_to_pickle, f)
    obj_to_pickle.__class__ = Module1
If the object has instance attributes of types defined in Package1, it won't work however.

Lazy-loading modules in python

I'm trying to put together a system that will handle lazy-loading of modules that don't explicitly exist. Basically I have an http server with a number of endpoints that I don't know ahead of time that I would like to programmatically offer for import. These modules would all have a uniform method signature, they just wouldn't exist ahead of time.
import lazy.route as test
import lazy.fake as test2
test('Does this exist?') # This sends a post request.
test2("This doesn't exist.") # Also sends a post request
I can handle all the logic I need around these imports with a uniform decorator, I just can't find any way of "decorating" imports in python, or actually interacting with them in any kind of programmatic way.
Does anyone have experience with this? I've been hunting around, and the closest thing I've found is the ast module, which, under my current understanding, would lead to a really awful, hacky implementation (something like finding all import statements and manually overwriting the import function).
Not looking for a handout, just a piece of the python codebase to start looking at, or an example of someone that's done something similar.
I got a little clever in my googling and managed to find a PEP that specifically addressed this issue, it just happens to be relatively unknown, probably because the subset of reasonable uses for this is pretty narrow.
I found an excellent piece of example code showing off the new sys.meta_path implementation. I've posted it below for information on how to dynamically bootstrap your import statements.
import sys
class VirtualModule(object):
    def hello(self):
        return 'Hello World!'
class CustomImporter(object):
    virtual_name = 'my_virtual_module'
    def find_module(self, fullname, path):
        """This method is called by Python if this class
        is on sys.meta_path. fullname is the fully-qualified
        name of the module to look for, and path is either
        __path__ (for submodules and subpackages) or None (for
        a top-level module/package).
        Note that this method will be called every time an import
        statement is detected (or __import__ is called), before
        Python's built-in package/module-finding code kicks in."""
        if fullname == self.virtual_name:
            # As per PEP #302 (which implemented the sys.meta_path protocol),
            # if fullname is the name of a module/package that we want to
            # report as found, then we need to return a loader object.
            # In this simple example, that will just be self.
            return self
        # If we don't provide the requested module, return None, as per
        # PEP #302.
        return None
    def load_module(self, fullname):
        """This method is called by Python if CustomImporter.find_module
        does not return None. fullname is the fully-qualified name
        of the module/package that was requested."""
        if fullname != self.virtual_name:
            # Raise ImportError as per PEP #302 if the requested module/package
            # couldn't be loaded. This should never be reached in this
            # simple example, but it's included here for completeness. :)
            raise ImportError(fullname)
        # PEP#302 says to return the module if the loader object (i.e,
        # this class) successfully loaded the module.
        # Note that a regular class works just fine as a module.
        return VirtualModule()
if __name__ == '__main__':
    # Add our import hook to sys.meta_path
    sys.meta_path.append(CustomImporter())
    # Let's use our import hook
    import my_virtual_module
    print my_virtual_module.hello()
The full blog post is here
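The example above is written for Python 2 and uses the original PEP 302 find_module/load_module protocol, which newer Python 3 releases no longer consult. A rough Python 3 equivalent, sketched here as an assumption rather than taken from the blog post, uses find_spec and exec_module instead. The class name VirtualFinder is hypothetical.

```python
import importlib.abc
import importlib.util
import sys


class VirtualFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, serving a single virtual module."""

    virtual_name = "my_virtual_module"

    def find_spec(self, fullname, path, target=None):
        # Claim only the virtual module; return None for everything else
        # so that the remaining finders on sys.meta_path are tried.
        if fullname == self.virtual_name:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Returning None asks the import system to create a default module.
        return None

    def exec_module(self, module):
        # Populate the freshly created module's namespace.
        module.hello = lambda: "Hello World!"


# Appended last, so it is only asked after the regular finders fail.
sys.meta_path.append(VirtualFinder())

import my_virtual_module

print(my_virtual_module.hello())
```

For the lazy endpoint use case in the question, exec_module is the place where a per-endpoint callable could be attached to the module.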

Importing Class in Python Subpackage imports more than requested

Overview
I'm running some scientific simulations and I want to process the resulting data in Python. The simulation produces a custom data type that is not used outside of the chain of programs that the authors of the simulation produced, so unfortunately I need what they provide me.
They want me to install two files:
A module called sdds.py that defines a class that provides all user functions and two demos
A compiled module called sddsdatamodule.so that only provides helper functions to sdds.py.
(I find it strange that they're offering me two modules that are so inextricably connected, it doesn't seem like good coding practice to me, but using their code is probably better than rewriting things from scratch.) I'd prefer not to install them directly into my path, side by side. They come from the same company, they're designed to do one specific task together: access and manipulate SDDS-type files.
So I thought I would put them in a package. I could install that on my path, it would be self-contained, and I could easily find and uninstall or upgrade the modules from one location. Then I could hide their un-Pythonic solution in a more-Pythonic package without significantly rewriting things. Seems elegant.
Details
The package I actually use is found here:
http://www.aps.anl.gov/Accelerator_Systems_Division/Accelerator_Operations_Physics/software.shtml#PythonBinaries
Unfortunately, they only support Windows and Mac OS X right now. Compiling the source code is quite onerous, and apparently they have no significant requests for Linux/Unix. I have a Mac, so thankfully this isn't a problem for me.
So my directory tree looks like this:
SDDSPython/ My toplevel package
__init__.py Designed to only import the SDDS class
sdds.py Defines SDDS class and two demo methods
sddsdatamodule.so Defines sddsdata module used by SDDS class.
My __init__.py file literally only contains this:
from sdds import SDDS
The sdds.py file contains the class definition and the two demo definitions. The only other code in the sdds.py file is:
import sddsdata, sys, time
class SDDS:
    (lots of code here)
def demo(output):
    (lots of code here)
def demo2(output):
    (lots of code here)
I can then import SDDSPython and check, using dir:
>>> import SDDSPython
>>> dir(SDDSPython)
['SDDS', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'sdds', 'sddsdata']
So I can now access the SDDS class via SDDSPython.SDDS
Question
How on earth did SDDSPython.sdds and SDDSPython.sddsdata get loaded into the SDDSPython namespace??
>>> SDDSPython.sdds
<module 'SDDSPython.sdds' from 'SDDSPython/sdds.pyc'>
>>> SDDSPython.sddsdata
<module 'SDDSPython.sddsdata' from 'SDDSPython/sddsdatamodule.so'>
I thought by creating an __init__.py file I was specifically excluding the sdds and sddsdata modules from being loaded into the SDDSPython namespace. What is going on? I can only assume this is happening due to something in the sddsdatamodule.so file? But how can a module affect its parent's namespace like that? I'm rather lost, and I don't know where to start. I've looked at the C code, but I don't see anything suspicious. To be fair- I probably don't know what something suspicious would look like, I'm probably not familiar enough with programming C extensions for Python.
Curious question--I did some investigation for you using a similar test case.
XML/
    __init__.py   - from indent import XMLIndentGenerator
    indent.py     - contains class XMLIndentGenerator, and Xml
    Sink.py
It appears that when you import a class from a module, even though you are importing just a portion, the entire module becomes accessible in the way you described, that is:
>>> import XML
>>> XML.indent
<module 'XML.indent' from 'XML\indent.py'>
>>> XML.indent.Xml  # did not include this in the from
<class 'XML.indent.Xml'>
>>> XML.Sink
Traceback (most recent call last):
AttributeError:yadayada no attribute 'Sink'
This is expected, since I did not import Sink in __init__.py.....BUT!
I added a line to indent.py:
import Sink
class XMLIndentGenerator(XMLGenerator):
    (code)
Now, since indent.py imports a module contained within the XML package, if I do:
>>> import XML
>>> XML.Sink
<module 'XML.Sink' from 'XML\Sink.pyc'>
So, it appears that because your imported sdds module also imports sddsdata, you are able to access it. That answers the "How" portion of your question, but "why" this is the case, I'm sure there's an answer somewhere in the docs :)
I hope this helps - I was literally doing this as I was typing the answer! A learning experience for me as well.
This happens because Python imports don't work the way you might think. They work like this:
1. The import machinery looks for a file that should be the module requested by the import.
2. A types.ModuleType instance is created, several attributes on it are set to correspond to the file (__file__, __name__ and so on), and that object is inserted into sys.modules under the fully qualified module name it would have.
3. If this is a submodule import (i.e., sdds.py, which is a submodule of SDDSPython), the newly created module is attached as an attribute to the existing module object of the parent package.
4. The file is "executed" with that module as its global scope; all names defined by that file appear as attributes of the module.
5. In the case of a from import, an attribute from the module may be returned to the importing script.
So that means if I import a module (say, foo.py) that has, as its source, only:
import bar
then there is a global in foo, called bar, and I can access it as foo.bar.
There is no capacity in Python for "only execute the part of this Python script I want to use right now." The whole thing runs.
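The attachment of a submodule to its parent package can be reproduced with a throwaway package. This is a minimal sketch for Python 3 (hence the explicit relative import); the names demo_pkg, inner, Thing and helper are hypothetical and only mirror the SDDSPython layout.

```python
import importlib
import os
import sys
import tempfile
import types

# Create a package whose __init__.py imports a single class from a submodule.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .inner import Thing\n")
with open(os.path.join(pkg_dir, "inner.py"), "w") as f:
    f.write("class Thing:\n    pass\n\ndef helper():\n    return 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

# Only Thing was requested, yet the submodule itself is now an attribute
# of the package, and everything defined in it is reachable through it.
public_names = sorted(n for n in dir(demo_pkg) if not n.startswith("__"))
print(public_names)
print(isinstance(demo_pkg.inner, types.ModuleType))
print(demo_pkg.inner.helper())
```

The same mechanism explains SDDSPython.sdds; SDDSPython.sddsdata appears because sdds.py in turn imports a sibling module from the package.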

Is there a way to get all modules 'under' a specific module in the hierarchy?

Is there a way of listing all Python modules directly underneath a specified module in the hierarchy?
I've got a Django web-app that is slowly growing, and I've re-organised it based on this article:
http://paltman.com/2008/01/29/breaking-apart-models-in-django/
However, I'm trying to improve on his technique by making use of introspection in the module initialization file (__init__.py) in order to auto-detect all instances of the Django model class in the subordinate jobs. I've sort of got this working, but it still needs a static list of modules in the tree above it to work.
In case people are interested, here's what my solution looks like:
from django.db.models.base import ModelBase
from sys import modules
moduleList = ['TechTree', 'PilotAbilities']
__all__ = []
for moduleName in moduleList:
    fullyQualifiedModuleName = '%s.%s' % (__name__, moduleName)
    moduleObj = __import__(fullyQualifiedModuleName)
    __all__ += [item for item in dir(moduleObj) if isinstance(getattr(moduleObj, item), ModelBase)]
Well, you can get os.path.dirname(parent.__file__), and then use glob.glob() or os.walk() to look for other __init__.py files.
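Instead of walking the directory by hand, the standard library's pkgutil.iter_modules can list the modules directly under a package. This is an alternative to the glob approach, not what the answer above describes; json is used here only as a stand-in for the Django models package.

```python
import json
import pkgutil

# Names of the modules and subpackages directly under the json package.
submodules = [info.name for info in pkgutil.iter_modules(json.__path__)]
print(submodules)

# The prefix argument yields fully qualified names, ready for importing.
qualified = [
    info.name
    for info in pkgutil.iter_modules(json.__path__, prefix=json.__name__ + ".")
]
print(qualified)
```

Inside a package's __init__.py, the same call with __path__ and __name__ could replace a static module list.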
This is just an attempt, but it seems to work. I'm running python 2.7.1 and dir(somemodule) gives me all associated modules. Tested with the openpyxl and os modules.
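import importlib
import types
def import_all(name):
    # Import first: dir() on the name string would only list str attributes.
    module = importlib.import_module(name)
    for attr in dir(module):
        # Recurse only into attributes that are themselves modules.
        if isinstance(getattr(module, attr), types.ModuleType):
            try:
                import_all(name + '.' + attr)
            except ImportError:
                pass # a module attribute that is not an importable submodule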
Note:
This is probably extremely unpythonic and discouraged, but it seems to work.
Note 2:
That was because I accidentally had 'openpyxl' (a test I was using) instead of the list submodules. Sorry

Returning an instance of a class from a file in python

In my program I have a package filled with various .py files each containing a class definition. I want to make a list where each entry is an instance of one of those classes. In addition, my program doesn't know how many files are in the package or what the files or classes are called, so I can't just import each file. Ideally, I should be able to modify the contents of the package (take out files, put new ones in, etc.) without having to rewrite other parts of the program. Is there a way to do this?
Originally, I had a 'if __name__ == '__main__': return foo()' line in each file and tried to append to the list using execfile(), but obviously this doesn't work. Any ideas?
Sorry if this is kinda vague. I'll try to clarify if needed. I'm using Python 2.5.4.
EDIT:
My program is a random character generator for Dungeons and Dragons. I made a package for every major data type the program needs. I have a package for Classes, Races, Items, etc. and when making a character, my program makes a list of each data type that it can sort through when making a character. For example, when equipping a character, the program can look at the Weapon list and filter out all the weapons that are unsuitable for that character and then randomly choose from the ones that remain.
I don't want to specify file names because I would like the ability to easily add to this program later. If later on down the road I wanted to add more weapon types to the program, I could just write a few new class descriptions and drop them in the Weapons package, and the program could use them without me needing to edit any other code.
This sounds like a bit of a bad design. It would probably be better if you elaborate on the problem and we can help you to solve it some other way. However, what you want isn't hard:
import types
import my_package
my_package_members = [getattr(my_package, i) for i in dir(my_package)]
my_modules = [i for i in my_package_members if type(i) == types.ModuleType]
instances = []
for my_module in my_modules:
    my_module_members = [getattr(my_module, i) for i in dir(my_module)]
    my_classes = [i for i in my_module_members
                  if type(i) in (types.TypeType, types.ClassType)]
    for my_class in my_classes:
        instances.append(my_class())
EDIT: Simplified the code a bit.
To achieve this, you will need to do the following:
1. Have your code enumerate the source files containing your code.
2. For each source file, import the code in the file into a new module.
3. For each module, locate all the classes it contains, instantiate each one, and add it to your final list.
To take each part in turn:
1. To enumerate the source files, use os.walk and os.path to find the files and build full paths to the source.
2. To import code from a given source file dynamically, you can call execfile(my_file, my_dict), where my_file is the full path to your source file and my_dict is a dictionary that receives the resulting names (any classes declared in the source file become entries in this dict, for example). Note that you only need this method if the files you are importing are not part of a valid Python module/package hierarchy (with an __init__.py file in the package); if they are, you can use __import__() instead.
3. To enumerate the classes declared in a given module, you could use inspect.getmembers().
If you're willing to do a bit more work, you can use pkg_resources' entry points to advertise and discover the relevant classes. The Fedora Account System uses this to provide plugin functionality.
Assuming, first, that all your modules exist as .py files in the package's directory:
import inspect, glob, os, sys
def thelistyouwant(pathtothepackage):
    sys.path.insert(0, pathtothepackage)
    result = []
    for fn in glob.glob(os.path.join(pathtothepackage, '*.py')):
        # glob returns full paths, so derive the bare module name first
        modname = os.path.splitext(os.path.basename(fn))[0]
        if modname.startswith('_'): continue # no __init__ or other private modules
        m = __import__(modname)
        classes = inspect.getmembers(m, inspect.isclass)
        if len(classes) != 1:
            print>>sys.stderr, "Skipping %s (%d != 1 classes!)" % (fn, len(classes))
            continue
        n, c = classes[0]
        try:
            result.append(c())
        except TypeError:
            print>>sys.stderr, "Skipping %s, can't build a %s()" % (fn, n)
    del sys.path[0]
    return result
Further assumptions: each module should have exactly 1 class (otherwise it's skipped with a warning) instantiable without arguments (ditto ditto); you don't want to look at __init__.py (if any; actually this code does not require the path to be an actual package, any directory will do, so __init__.py may or may not be present) nor any module whose name starts with an underscore ("private" modules of the package).
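For readers on Python 3, the same idea can be sketched with importlib and pkgutil. This is an adaptation under stated assumptions, not the code above: the package demo_weapons and its classes are hypothetical and created on the fly, and unlike the original it accepts any number of classes per module, keeping only those defined in the module itself.

```python
import importlib
import inspect
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk so the sketch is self-contained.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_weapons")
os.mkdir(pkg_dir)
files = {
    "__init__.py": "",
    "sword.py": "class Sword:\n    name = 'sword'\n",
    "bow.py": "class Bow:\n    name = 'bow'\n",
    "_private.py": "class Hidden:\n    name = 'hidden'\n",
}
for filename, source in files.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)


def instantiate_all(package_name):
    """Return one instance of every class defined in the package's public modules."""
    package = importlib.import_module(package_name)
    instances = []
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue  # skip private modules
        module = importlib.import_module(package_name + "." + info.name)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            # Ignore classes that were merely imported into the module.
            if cls.__module__ == module.__name__:
                instances.append(cls())
    return instances


sys.path.insert(0, root)
importlib.invalidate_caches()
weapons = instantiate_all("demo_weapons")
print(sorted(w.name for w in weapons))
```

Dropping a new file such as axe.py into the package directory would make it show up in the list without any other code changes.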
