How to load modules dynamically on package import?


Given the following example layout:
test/
    test.py
    formats/
        __init__.py
        format_a.py
        format_b.py
What I am trying to achieve is that whenever I import formats, the __init__.py looks for all available modules in the formats subdir, loads them and makes them available (right now simply through a variable, supported_formats). If there's a better, more Pythonic or otherwise different approach to dynamically loading modules at runtime based on the files physically available, please tell me.
My Approach
I tried something like this (in __init__.py):
supported_formats = [__import__(f[:f.index('.py')]) for f in glob.glob('*.py')]
So far I only get it to work when I run __init__.py from the command line (from the formats subdir or from other directories). But when I import it from test.py, it fails like this:
ImportError: No module named format_a.py
The same happens when I import it from the Python interpreter, if I start the interpreter in any directory other than the formats subdir.
Here's the whole code. It also looks for a specific class and stores one instance of each class in a dict, but loading the modules dynamically is the main part I don't get:
def dload(get_cls=True, get_mod=True, key=None, fstring_mod='*.py', fstring_class=''):
    if p.dirname(__file__):
        path = p.split(p.abspath(__file__))[0]
        fstring_mod = p.join(path, fstring_mod)
        print >> sys.stderr, 'Path-Glob:', fstring_mod
    modules = [p.split(fn)[1][:fn.index('.py')] for fn in glob.glob(fstring_mod)]
    print >> sys.stderr, 'Modules:', ', '.join(modules)
    modules = [__import__(m) for m in modules]
    if get_cls:
        classes = {} if key else []
        for m in modules:
            print >> sys.stderr, "-", m
            for c in [m.__dict__[c]() for c in m.__dict__ if c.startswith(fstring_class)]:
                print >> sys.stderr, " ", c
                if key:
                    classes[getattr(c, key)] = c
                else:
                    classes.append(c)
        if get_mod:
            return (modules, classes)
        else:
            return classes
    elif get_mod:
        return modules
_supported_formats = dload(get_mod=False, key='fid', fstring_mod='format_*.py', fstring_class='Format')
My Idea
All the messing with filesystem paths and the like is probably messy anyway. I would like to handle this with module namespaces or something similar, but I'm kind of lost right now on how to start and how to address the modules so that they are reachable from anywhere.

There are two fixes you need to make to your code:
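You should call __import__(m, globals(), locals()) instead of __import__(m). Passing the package's globals is what lets Python 2 locate the modules within the package (through an implicit relative import); Python 3 dropped implicit relative imports, so there you need the fully qualified name instead.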
Your code doesn't remove the .py extension properly since you call index() on the wrong string. If it will always be a .py extension, you can simply use p.split(fn)[1][:-3] instead.
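To see why the second fix matters, here is a small Python 3 sketch (the function names are hypothetical). fn.index('.py') is measured on the full path but applied to the much shorter basename, so the slice usually cuts nothing off, which matches the "No module named format_a.py" error. It only happens to work when fn has no directory part. os.path.splitext avoids the index arithmetic entirely.

```python
import os.path

def buggy_module_name(fn):
    # Original expression: the index comes from the full path `fn`,
    # but it is used to slice the basename.
    return os.path.split(fn)[1][:fn.index('.py')]

def module_name(fn):
    # Take the basename, then drop its extension.
    return os.path.splitext(os.path.basename(fn))[0]

path = '/home/user/test/formats/format_a.py'
print(buggy_module_name(path))  # format_a.py
print(module_name(path))        # format_a
```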

First you must make it so that your code works regardless of the current working directory. For that you use the __file__ variable. You should also use absolute imports.
So something like (untested):
supported_formats = {}
for fn in os.listdir(os.path.dirname(__file__)):
    if fn.endswith('.py'):
        exec ("from formats import %s" % fn[:-3]) in supported_formats
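The exec statement used above is Python 2 only. A sketch of the same idea for Python 3, using importlib.import_module with the fully qualified name instead of exec, might look like this. To keep it runnable on its own, it writes a throwaway formats package into a temporary directory; only the text in INIT_SOURCE is what you would put in formats/__init__.py, and the fid attributes are made up for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Body for formats/__init__.py: import every sibling .py file (except
# __init__.py itself) by its fully qualified name, e.g. 'formats.format_a'.
INIT_SOURCE = '''\
import importlib
import os

supported_formats = {}
for fn in sorted(os.listdir(os.path.dirname(__file__))):
    name, ext = os.path.splitext(fn)
    if ext == '.py' and name != '__init__':
        supported_formats[name] = importlib.import_module(__name__ + '.' + name)
'''

# Demo only: write a throwaway formats package into a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / 'formats'
pkg.mkdir()
(pkg / '__init__.py').write_text(INIT_SOURCE)
(pkg / 'format_a.py').write_text("fid = 'a'\n")
(pkg / 'format_b.py').write_text("fid = 'b'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import formats
print(sorted(formats.supported_formats))  # ['format_a', 'format_b']
```

Because the names are built from __name__ and __file__, this works no matter what the current working directory is.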

Modules are searched for on sys.path, so you should be able to extend sys.path with the directory that contains your modules. Note that you import a module by its name, without the '.py' extension: a name like 'module.py' is read as a submodule called py inside a package called module.
This is obviously not a solution, but may be handy nonetheless.
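A minimal sketch of extending sys.path, assuming a plain directory (not a package) that holds format_a.py; the temporary directory and the NAME attribute exist only for the demo. It also shows that the '.py' form of the name fails.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo only: a temporary directory holding a plain module.
extra_dir = Path(tempfile.mkdtemp())
(extra_dir / 'format_a.py').write_text("NAME = 'format a'\n")

sys.path.append(str(extra_dir))  # make the directory searchable
importlib.invalidate_caches()

mod = importlib.import_module('format_a')  # module name, no '.py'
print(mod.NAME)  # format a

try:
    importlib.import_module('format_a.py')  # read as submodule 'py' of format_a
    dotted_name_failed = False
except ModuleNotFoundError:
    dotted_name_failed = True
print(dotted_name_failed)  # True
```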

I thought that if you did something like that, 'formats' would be your package, so when you write import formats you should be able to access the rest of the modules inside that package, i.e. you would have something like formats.format_a.your_method.
Not sure though, I'm just a n00b.
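One caveat worth checking: importing a package does not import its submodules by itself; formats.format_a only becomes an attribute once something imports it. A small sketch with a throwaway package (named formats_demo here, a hypothetical name chosen so it cannot clash with a real formats) and an empty __init__.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo only: a package whose __init__.py is empty.
root = Path(tempfile.mkdtemp())
(root / 'formats_demo').mkdir()
(root / 'formats_demo' / '__init__.py').write_text('')
(root / 'formats_demo' / 'format_a.py').write_text(
    'def your_method():\n    return "format_a"\n')
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import formats_demo
before = hasattr(formats_demo, 'format_a')
print(before)  # False: the submodule has not been imported yet

import formats_demo.format_a  # importing it binds it on the package
print(formats_demo.format_a.your_method())  # format_a
```

So the formats.format_a.your_method style works once __init__.py (or the caller) imports the submodules, which is what the dynamic loading is for.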

Here's the code I came up with after the corrections from interjay. Still not sure if this is good style.
def load_modules(filemask='*.py', ignore_list=('__init__.py', )):
    modules = {}
    dirname = os.path.dirname(__file__)
    if dirname:
        filemask = os.path.join(dirname, filemask)
    for fn in glob.glob(filemask):
        fn = os.path.split(fn)[1]
        if fn in ignore_list:
            continue
        fn = os.path.splitext(fn)[0]
        modules[fn] = __import__(fn, globals(), locals())
    return modules

Related

Iterate on modules in Python

So I have a nested folder in which I have modules that perform some action.
Note: they are not classes, just scripts.
I would like to iterate on those modules.
What I have now:
from scripts.module_1 import train_module_1
from scripts.module_2 import train_module_2
from scripts.module_3 import train_module_3
from scripts.module_4 import train_module_4
def test_train_module_1():
    try:
        train_module_1.main('test.csv')
    except ValueError as value_error:
        assert False, "test_train_module_1 failed:" + str(value_error)
...
The same for all train modules
This is what my directory looks like (my code is in my_test.py):
tests
    my_test.py
    scripts
        module_1
            __init__.py
            train_module_1.py
            module_1_blabla.py
        module_2
            __init__.py
            train_module_2.py
            module_2_blabla.py
        ...
I wonder if I can somehow iterate over those modules: in each module, take only the files that start with "train_" and run the main function in each. I basically know how to do it, but I didn't find a good solution for this kind of iteration.
I need to get the modules from scripts dynamically, so that even if someone adds a module I won't need to change the code here.
Is there something like this:
for i in scripts.children():
    for j in i.children():
        if j.__name__.startswith('train_'):
            try:
                j.main(f'{j.__name__}_test.csv')
            except ValueError as value_error:
                assert False, f'test_{j.__name__} failed: {value_error}'
Thanks in advance

Yes, there are several approaches, depending on your exact needs.
You could, for instance, get a list of the module names in your directory and then import them using the built-in function __import__('...') like so:
for module_name in module_names:  # module_names: the list of names you collected
    mod = __import__(module_name)
    mod.main(module_name + "_test.csv")
If, on the other hand, you have already imported the modules, you can find them by looking at sys.modules (which is a dictionary of all currently imported modules).
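import sys
for name in list(sys.modules):  # copy the keys; the dict may change while iterating
    short_name = name.rsplit('.', 1)[-1]  # keys are full dotted names, e.g. 'scripts.module_1.train_module_1'
    if short_name.startswith("train_"):
        mod = sys.modules[name]
        mod.main(short_name + "_test.csv")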
UPDATE: Here is a more complete version that goes through a directory structure and finds all the Python-modules that start with train_, imports them, and executes their main-function.
import os
for dir in os.scandir('.'):
    if dir.is_dir():
        for file in os.scandir(dir.path):
            if file.name.startswith('train') and file.name.endswith('.py'):
                name = file.name[:-3]  # without the '.py' at the end
                package = __import__(dir.name + '.' + name)
                mod = getattr(package, name)
                mod.main()
Note that the __import__ function returns the top-level package of the dotted name (in this loop, the module_N package named by dir.name), so we have to retrieve the module we want through getattr() first.
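The difference between __import__ and importlib.import_module can be seen with a standard-library dotted name, so no extra files are needed. This sketch uses os.path purely as a stand-in for scripts.module_1.train_module_1:

```python
import importlib
import os

top = __import__('os.path')                # returns the top-level module: os
sub = importlib.import_module('os.path')   # returns the submodule itself

print(top is os)                    # True
print(sub is os.path)               # True
print(getattr(top, 'path') is sub)  # True: the getattr() step described above
```

With importlib.import_module the getattr() step is unnecessary, which is why it is usually the simpler choice for dynamic imports.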

You need two different things:
import some modules and only one specific name from each one
execute the main function from them
The importlib module can help for the first part:
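import importlib
train = {}
for i in '1234':
    # import the submodule itself; getattr() on the package only works
    # if the package's __init__.py has already imported it
    train[i] = importlib.import_module('scripts.module_' + i + '.train_module_' + i)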
You can then easily invoke them:
for i, t in train.items():
    try:
        t.main('test.csv')
    except ValueError as value_error:
        assert False, f"test_train_module_{i} failed:" + str(value_error)
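To avoid hard-coding '1234', the discovery can also be done with pkgutil.walk_packages, which recurses into subpackages. This is a sketch, not taken from either answer; it assumes each train_* module defines main(path), and it builds a throwaway scripts tree in a temporary directory so it can run on its own (the real project would just import its own scripts package).

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

def find_train_modules(package):
    """Yield (short name, module) for every module anywhere below `package`
    whose own name starts with 'train_'."""
    prefix = package.__name__ + '.'
    for info in pkgutil.walk_packages(package.__path__, prefix):
        short_name = info.name.rsplit('.', 1)[-1]
        if not info.ispkg and short_name.startswith('train_'):
            yield short_name, importlib.import_module(info.name)

# Demo only: build a throwaway scripts/module_N/train_module_N.py tree.
root = Path(tempfile.mkdtemp())
(root / 'scripts').mkdir()
(root / 'scripts' / '__init__.py').write_text('')
for n in (1, 2):
    sub = root / 'scripts' / f'module_{n}'
    sub.mkdir()
    (sub / '__init__.py').write_text('')
    (sub / f'train_module_{n}.py').write_text('def main(path):\n    return path\n')
    (sub / f'module_{n}_blabla.py').write_text('')
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import scripts

results = {}
for name, module in find_train_modules(scripts):
    try:
        results[name] = module.main(f'{name}_test.csv')
    except ValueError as value_error:
        raise AssertionError(f'test_{name} failed: {value_error}')
print(results)
```

A newly added module_N/train_module_N.py is picked up without touching the test code.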

name ' ' is not defined

I have a function which is stored in builtins. It is used to load Python modules with paths relative to the project's base directory. The project's base directory is stored in builtins.absolute.
Function below:
def projectRelativeImport(fileName, projectRelativePath, moduleName = None):
    # if moduleName not set, set it to file name with first letter capitalised
    if moduleName is None:
        moduleName = fileName[:1].capitalize() + fileName[1:]
    # we shouldn't be passing fileName with an extension unless moduleName is set due to previous if. So in those cases we add .py
    if len(fileName) >= 3 and fileName[-3:] != '.py':
        fileName = fileName + '.py'
    dir = os.path.join(builtins.absolute, projectRelativePath)
    full = os.path.join(dir, fileName)
    sys.path.append(dir)
    imp.load_source(moduleName, full)
    sys.path.remove(dir)
In one of my other files I use projectRelativeImport('inputSaveHandler', 'app/util', 'SaveHandler') to import SaveHandler from app/util/inputSaveHandler.py. projectRelativeImport runs through absolutely fine, and the correct strings are being passed to imp (I've printed them to check).
But a couple of lines after that execution I have a line
handler = SaveHandler.ConfHandler()
Which throws NameError: name 'SaveHandler' is not defined
I realise my project relative import function is a bit odd, especially since I have it globally saved using builtins (there's probably a better way but I only started using python over the last two days). But I'm just a bit confused as to why the name isn't being recognised. Do I need to return something from imp due to scope being rubbish as the project relative import function is in a different file?

I fixed this by returning from projectRelativeImport() what was passed back from imp.load_source as shown below:
sys.path.append(dir)
submodule = imp.load_source(moduleName, full)
sys.path.remove(dir)
return submodule
Then, when I call the import function, I assign the returned value to a variable with the same name I gave to the module (all very strange):
SaveHandler = projectRelativeImport('inputSaveHandler', 'app/util', 'SaveHandler')
I got to this because it worked without problems from the file projectRelativeImport was defined in, but not from any other file. So it was clearly a scope issue to me, and I figured I'd just try returning whatever imp gave me; it worked.
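The imp module is deprecated and was removed in Python 3.12. A sketch of the same "load a file and return the module" idea with importlib.util follows; load_module_from_path is a hypothetical helper name, and the temporary file stands in for app/util/inputSaveHandler.py. The key point from the answer still applies: loading a module never binds a name in the caller's namespace, so the caller has to assign the returned module.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

def load_module_from_path(module_name, file_path):
    """Load a module from an explicit file path and return it."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register it, as imp.load_source also did
    spec.loader.exec_module(module)
    return module

# Demo only: a throwaway file standing in for app/util/inputSaveHandler.py.
tmp = Path(tempfile.mkdtemp()) / 'inputSaveHandler.py'
tmp.write_text('class ConfHandler:\n    pass\n')

SaveHandler = load_module_from_path('SaveHandler', tmp)  # bind the name here
handler = SaveHandler.ConfHandler()
print(type(handler).__name__)  # ConfHandler
```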

Python - variable for the scriptroot [duplicate]

I would like to see what is the best way to determine the current script directory in Python.
I discovered that, due to the many ways of calling Python code, it is hard to find a good solution.
Here are some problems:
__file__ is not defined if the script is executed with exec, execfile
__module__ is defined only in modules
Use cases:
./myfile.py
python myfile.py
./somedir/myfile.py
python somedir/myfile.py
execfile('myfile.py') (from another script, which can be located in another directory and can have another current directory)
I know that there is no perfect solution, but I'm looking for the best approach that solves most of the cases.
The most used approach is os.path.dirname(os.path.abspath(__file__)) but this really doesn't work if you execute the script from another one with exec().
Warning
Any solution that relies on the current directory will fail: it can differ depending on how the script is called, and it can be changed inside the running script.

os.path.dirname(os.path.abspath(__file__))
is indeed the best you're going to get.
It's unusual to be executing a script with exec/execfile; normally you should be using the module infrastructure to load scripts. If you must use these methods, I suggest setting __file__ in the globals you pass to the script so it can read that filename.
There's no other way to get the filename in execed code: as you note, the CWD may be in a completely different place.
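A minimal sketch of the suggestion to set __file__ in the globals passed to exec (Python 3, where execfile no longer exists). The script is a throwaway file created for the demo; passing the filename to compile() as well makes tracebacks point at the real file.

```python
import tempfile
from pathlib import Path

# Demo only: a script that reports its own directory via __file__.
script = Path(tempfile.mkdtemp()) / 'myfile.py'
script.write_text(
    'from pathlib import Path\n'
    'script_dir = str(Path(__file__).resolve().parent)\n'
)

# Run it with exec(), supplying __file__ in the globals it sees.
namespace = {'__file__': str(script), '__name__': '__main__'}
exec(compile(script.read_text(), str(script), 'exec'), namespace)
print(namespace['script_dir'] == str(script.resolve().parent))  # True
```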

If you really want to cover the case that a script is called via execfile(...), you can use the inspect module to deduce the filename (including the path). As far as I am aware, this will work for all cases you listed:
filename = inspect.getframeinfo(inspect.currentframe()).filename
path = os.path.dirname(os.path.abspath(filename))
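In Python 3 the inspect approach depends on how the exec'd code was compiled: the frame reports whatever filename the code object carries. A sketch comparing a bare exec() of the source with exec(compile(source, path, 'exec')) (the temporary file is created only for the demo):

```python
import os
import tempfile

SOURCE = (
    'import inspect, os\n'
    'filename = inspect.getframeinfo(inspect.currentframe()).filename\n'
    'path = os.path.dirname(os.path.abspath(filename))\n'
)

script = os.path.join(tempfile.mkdtemp(), 'myfile.py')
with open(script, 'w') as f:
    f.write(SOURCE)

plain = {}
exec(SOURCE, plain)  # no filename attached to the code object
named = {}
with open(script) as f:
    exec(compile(f.read(), script, 'exec'), named)  # filename recorded by compile()

print(plain['filename'])  # <string>
print(named['path'] == os.path.dirname(os.path.abspath(script)))  # True
```

So the inspect trick recovers the path only when the caller compiled the code with its real filename, as Python 2's execfile() did.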

In Python 3.4+ you can use the simpler pathlib module:
from inspect import currentframe, getframeinfo
from pathlib import Path
filename = getframeinfo(currentframe()).filename
parent = Path(filename).resolve().parent
You can also use __file__ (when it's available) to avoid the inspect module altogether:
from pathlib import Path
parent = Path(__file__).resolve().parent

#!/usr/bin/env python
import inspect
import os
import sys
def get_script_dir(follow_symlinks=True):
    if getattr(sys, 'frozen', False): # py2exe, PyInstaller, cx_Freeze
        path = os.path.abspath(sys.executable)
    else:
        path = inspect.getabsfile(get_script_dir)
    if follow_symlinks:
        path = os.path.realpath(path)
    return os.path.dirname(path)
print(get_script_dir())
It works on CPython, Jython and PyPy. It works if the script is executed using execfile() (sys.argv[0] and __file__-based solutions would fail here). It works if the script is inside an executable zip file (or an egg). It works if the script is "imported" (PYTHONPATH=/path/to/library.zip python -mscript_to_run) from a zip file; it returns the archive path in this case. It works if the script is compiled into a standalone executable (sys.frozen). It works for symlinks (realpath eliminates symbolic links). It works in an interactive interpreter; it returns the current working directory in this case.

The os.path... approach was the 'done thing' in Python 2.
In Python 3, you can find directory of script as follows:
from pathlib import Path
script_path = Path(__file__).parent

Note: this answer is now a package (also with safe relative importing capabilities)
https://github.com/heetbeet/locate
$ pip install locate
$ python
>>> from locate import this_dir
>>> print(this_dir())
C:/Users/simon
For .py scripts as well as interactive usage:
I frequently use the directory of my scripts (for accessing files stored alongside them), but I also frequently run these scripts in an interactive shell for debugging purposes. I define this_dir as:
When running or importing a .py file, the file's base directory. This is always the correct path.
When running an .ipynb notebook, the current working directory. This is always the correct path, since Jupyter sets the working directory as the .ipynb base directory.
When running in a REPL, the current working directory. Hmm, what is the actual "correct path" when the code is detached from a file? Rather, make it your responsibility to change into the "correct path" before invoking the REPL.
Python 3.4 (and above):
from pathlib import Path
this_dir = Path(globals().get("__file__", "./_")).absolute().parent
Python 2 (and above):
import os
this_dir = os.path.dirname(os.path.abspath(globals().get("__file__", "./_")))
Explanation:
globals() returns all the global variables as a dictionary.
.get("__file__", "./_") returns the value from the key "__file__" if it exists in globals(), otherwise it returns the provided default value "./_".
The rest of the code just expands __file__ (or "./_") into an absolute filepath, and then returns the filepath's base directory.
Alternative:
If you know for certain that __file__ is available to your surrounding code, you can simplify to this:
>= Python 3.4: this_dir = Path(__file__).absolute().parent
>= Python 2: this_dir = os.path.dirname(os.path.abspath(__file__))

Would
import os
cwd = os.getcwd()
do what you want? I'm not sure what exactly you mean by the "current script directory". What would the expected output be for the use cases you gave?

Just use os.path.dirname(os.path.abspath(__file__)) and examine very carefully whether there is a real need for the case where exec is used. It could be a sign of troubled design if you are not able to use your script as a module.
Keep in mind Zen of Python #8, and if you believe there is a good argument for a use-case where it must work for exec, then please let us know some more details about the background of the problem.

First, a couple of missing use cases, if we're talking about ways to inject anonymous code:
code.compile_command()
code.interact()
imp.load_compiled()
imp.load_dynamic()
imp.load_module()
__builtin__.compile()
loading C-compiled shared objects (for example, _socket)
But the real question is: what is your goal? Are you trying to enforce some sort of security, or are you just interested in what's being loaded?
If you're interested in security, the filename that is being imported via exec/execfile is inconsequential; you should use rexec, which offers the following:
This module contains the RExec class, which supports r_eval(), r_execfile(), r_exec(), and r_import() methods, which are restricted versions of the standard Python functions eval(), execfile() and the exec and import statements. Code executed in this restricted environment will only have access to modules and functions that are deemed safe; you can subclass RExec to add or remove capabilities as desired.
(Note that rexec was disabled in Python 2.3 because of known security holes, and it no longer exists in Python 3.)
However, if this is more of an academic pursuit, here are a couple of goofy approaches that you might be able to dig a little deeper into.
Example scripts:
./deep.py
print ' >> level 1'
execfile('deeper.py')
print ' << level 1'
./deeper.py
print '\t >> level 2'
exec("import sys; sys.path.append('/tmp'); import deepest")
print '\t << level 2'
/tmp/deepest.py
print '\t\t >> level 3'
print '\t\t\t I can see the earths core.'
print '\t\t << level 3'
./codespy.py
import sys, os
def overseer(frame, event, arg):
    print "loaded(%s)" % os.path.abspath(frame.f_code.co_filename)
sys.settrace(overseer)
execfile("deep.py")
sys.exit(0)
Output
loaded(/Users/synthesizerpatel/deep.py)
 >> level 1
loaded(/Users/synthesizerpatel/deeper.py)
	 >> level 2
loaded(/Users/synthesizerpatel/<string>)
loaded(/tmp/deepest.py)
		 >> level 3
			 I can see the earths core.
		 << level 3
	 << level 2
 << level 1
Of course, this is a resource-intensive way to do it, since you'd be tracing all your code; not very efficient. But I think it's a novel approach, since it continues to work even as you get deeper into the nesting.
You can't override eval, although you can override execfile().
Note that this approach only covers exec/execfile, not import.
For higher-level module load hooking, you might be able to use sys.path_hooks (write-up courtesy of PyMOTW).
That's all I have off the top of my head.

Here is a partial solution, still better than all published ones so far.
import sys, os, os.path, inspect
#os.chdir("..")
if '__file__' not in locals():
    __file__ = inspect.getframeinfo(inspect.currentframe())[0]
print os.path.dirname(os.path.abspath(__file__))
This works with all calls, but if someone uses chdir() to change the current directory, it will also fail.
Notes:
sys.argv[0] is not going to work: it will return -c if you execute the script with python -c "execfile('path-tester.py')"
I published a complete test at https://gist.github.com/1385555 and you are welcome to improve it.

To get the absolute path to the directory containing the current script you can use:
from pathlib import Path
absDir = Path(__file__).parent.resolve()
Please note the .resolve() call is required, because that is the one making the path absolute. Without resolve(), you would obtain something like '.'.
This solution uses pathlib, which has been part of Python's stdlib since v3.4 (2014). This is preferable to other solutions using os.
The official pathlib documentation has a useful table mapping the old os functions to the new ones: https://docs.python.org/3/library/pathlib.html#correspondence-to-tools-in-the-os-module

This should work in most cases:
import os,sys
dirname=os.path.dirname(os.path.realpath(sys.argv[0]))

Hopefully this helps:-
If you run a script/module from anywhere you'll be able to access the __file__ variable which is a module variable representing the location of the script.
On the other hand, if you're using the interpreter you don't have access to that variable (you'll get a NameError), and os.getcwd() will give you the wrong directory if you're running the file from somewhere else.
This solution should give you what you're looking for in all cases:
from inspect import getsourcefile
from os.path import abspath
abspath(getsourcefile(lambda:0))
I haven't thoroughly tested it but it solved my problem.

If __file__ is available:
# -- script1.py --
import os
file_path = os.path.abspath(__file__)
print(os.path.dirname(file_path))
For those who want to be able to run the command from the interpreter, or to get the path of the place you're running the script from:
# -- script2.py --
import os
print(os.path.abspath(''))
This works from the interpreter. But when run in a script (or imported), it gives the path of the place where you ran the script from, not the path of the directory containing the script with the print.
Example:
If your directory structure is
test_dir (in the home dir)
├── main.py
└── test_subdir
├── script1.py
└── script2.py
with
# -- main.py --
from test_subdir import script1
from test_subdir import script2
The output is:
~/test_dir/test_subdir
~/test_dir

As previous answers require you to import some module, I thought that I would write one answer that doesn't. Use the code below if you don't want to import anything.
this_dir = '/'.join(__file__.split('/')[:-1])
print(this_dir)
If the script is at /path/to/script.py, this prints /path/to. Note that this raises a NameError in an interactive session, since no file is being executed and __file__ is not defined. It simply parses the directory out of __file__ by removing the last path component; in this case /script.py is removed to produce the output /path/to. It also assumes '/' as the path separator.

print(__import__("pathlib").Path(__file__).parent)

Importing python modules from multiple directories

I have a python 2.6 Django app which has a folder structure like this:
/foo/bar/__init__.py
I have another couple directories on the filesystem full of python modules like this:
/modules/__init__.py
/modules/module1/__init__.py
/other_modules/module2/__init__.py
/other_modules/module2/file.py
Each module __init__ has a class. For example module1Class() and module2Class() respectively. In module2, file.py contains a class called myFileClass().
What I would like to do is put some code in /foo/bar/__init__.py so I can import in my Django project like this:
from foo.bar.module1 import module1Class
from foo.bar.module2 import module2Class
from foo.bar.module2.file import myFileClass
The list of directories which have modules is contained in a tuple in a Django config which looks like this:
module_list = ("/modules", "/other_modules",)
I've tried using __import__ and vars() to dynamically generate variables like this:
import os
import sys
for m in module_list:
    sys.path.insert(0, m)
    for d in os.listdir(m):
        if os.path.isdir(d):
            vars()[d] = getattr(__import__(m.split("/")[-1], fromlist=[d]), d)
But that doesn't seem to work. Is there any way to do this?
Thanks!

I can see at least one problem with your code. The line...
if os.path.isdir(d):
...won't work, because os.listdir() returns relative pathnames, so you'll need to convert them to absolute pathnames, otherwise the os.path.isdir() will return False because the path doesn't exist (relative to the current working directory), rather than raising an exception (which would make more sense, IMO).
The following code works for me...
import sys
import os
# Directories to search for packages
root_path_list = ("/modules", "/other_modules",)
# Make a backup of sys.path
old_sys_path = sys.path[:]
# Add all paths to sys.path first, in case one package imports from another
for root_path in root_path_list:
    sys.path.insert(0, root_path)
# Add new packages to current scope
for root_path in root_path_list:
    filenames = os.listdir(root_path)
    for filename in filenames:
        full_path = os.path.join(root_path, filename)
        if os.path.isdir(full_path):
            locals()[filename] = __import__(filename)
# Restore sys.path
sys.path[:] = old_sys_path
# Clean up locals
del sys, os, root_path_list, old_sys_path, root_path, filenames, filename, full_path
Update
Thinking about it, it might be safer to check for the presence of __init__.py, rather than using os.path.isdir() in case you have subdirectories which don't contain such a file, otherwise the __import__() will fail.
So you could change the lines...
full_path = os.path.join(root_path, filename)
if os.path.isdir(full_path):
    locals()[filename] = __import__(filename)
...to...
full_path = os.path.join(root_path, filename, '__init__.py')
if os.path.exists(full_path):
    locals()[filename] = __import__(filename)
...but it might be unnecessary.
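A different technique, not the one used in this answer: a regular package's __path__ list controls where its submodules are searched for, so foo/bar/__init__.py could append the configured directories to it, and from foo.bar.module1 import module1Class would then resolve. This is a Python 3 sketch; all directories are throwaway stand-ins for /modules, /other_modules and the Django project, and module_list is written into __init__.py only for the demo. One trade-off: the modules are then named foo.bar.module1 and so on, so they should import each other relatively rather than by top-level name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo only: throwaway stand-ins for "/modules" and "/other_modules".
base = Path(tempfile.mkdtemp())
modules_dir = base / 'modules'
other_dir = base / 'other_modules'
(modules_dir / 'module1').mkdir(parents=True)
(other_dir / 'module2').mkdir(parents=True)
(modules_dir / 'module1' / '__init__.py').write_text('class module1Class:\n    pass\n')
(other_dir / 'module2' / '__init__.py').write_text('class module2Class:\n    pass\n')
(other_dir / 'module2' / 'file.py').write_text('class myFileClass:\n    pass\n')

# The project package foo.bar: its __init__.py extends the package's
# __path__ so submodules are also looked up in the configured directories.
module_list = (str(modules_dir), str(other_dir))
project = base / 'project'
(project / 'foo' / 'bar').mkdir(parents=True)
(project / 'foo' / '__init__.py').write_text('')
(project / 'foo' / 'bar' / '__init__.py').write_text(
    f'module_list = {module_list!r}\n'
    '__path__.extend(module_list)\n'
)
sys.path.insert(0, str(project))
importlib.invalidate_caches()

from foo.bar.module1 import module1Class
from foo.bar.module2 import module2Class
from foo.bar.module2.file import myFileClass
print(module1Class.__module__, myFileClass.__module__)
```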

We wound up biting the bullet and changing how we do things. Now the list of directories to find modules is passed in the Django config and each one is added to sys.path (similar to a comment Aya mentioned and something I did before but wasn't too happy with). Then for each module inside of it, we check for an __init__.py and if it exists, attempt to treat it as a module to use inside of the app without using the foo.bar piece.
This required some adjustment on how we interact with the modules and how developers code their modules (they now need to use relative imports within their module instead of the full path imports they used before) but I think this will be an easier design for developers to use long-term.
We didn't add these to INSTALLED_APPS because we do some exception handling where if we cannot import a module due to dependency issues or bad code our software will continue running just without that module. If they were in INSTALLED_APPS we wouldn't be able to leverage that flexibility on when/how to deal with those exceptions.
Thanks for all of the help!

Plugin design question

My program is broken down into two parts: the engine, which deals with user interface and other "main program" stuff, and a set of plugins, which provide methods to deal with specific input.
Each plugin is written in its own module, and provides a function that will allow me to send and retrieve data to and from the plugin.
The name of this function is the same across all plugins, so all I need is to determine which one to call and then the plugin will handle the rest.
I've placed all of the plugins in a sub-folder and written an __init__.py that imports each plugin, and then I import the folder (I think it's called a package?).
Anyway, currently I explicitly tell it what to import (basically "import this", "import that"). Is there a way to write it so that it imports every plug-in in that folder, so that I can add more plugins without having to edit the __init__.py file?

Here is the code I use to do this:
import pkgutil
def _loadPackagePlugins(package):
    "Load plugins from a specified package."
    ppath = package.__path__
    pname = package.__name__ + "."
    for importer, modname, ispkg in pkgutil.iter_modules(ppath, pname):
        module = __import__(modname, fromlist = "dummy")
The main difference from Jakob's answer is that it uses pkgutil.iter_modules instead of os.listdir. I used to use os.listdir and changed to doing it this way, but I don't remember why. It might have been that os.listdir failed when I packaged my app with py2exe and py2app.
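A Python 3 sketch of the same pkgutil.iter_modules idea that also returns the loaded plugins, using importlib.import_module instead of the fromlist trick. The package name plugins, the two plugin files and the shared handle() function are hypothetical, created in a temporary directory only so the example runs on its own.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

def load_package_plugins(package):
    """Import every direct submodule of `package`; return {short name: module}."""
    plugins = {}
    for info in pkgutil.iter_modules(package.__path__, package.__name__ + '.'):
        module = importlib.import_module(info.name)
        plugins[info.name.rsplit('.', 1)[-1]] = module
    return plugins

# Demo only: a throwaway 'plugins' package with two plugin modules that
# expose a function with the same (hypothetical) name, handle().
root = Path(tempfile.mkdtemp())
pkg = root / 'plugins'
pkg.mkdir()
(pkg / '__init__.py').write_text('')
(pkg / 'csv_plugin.py').write_text('def handle(data):\n    return "csv:" + data\n')
(pkg / 'json_plugin.py').write_text('def handle(data):\n    return "json:" + data\n')
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import plugins
loaded = load_package_plugins(plugins)
print(loaded['csv_plugin'].handle('x'))  # csv:x
```

Calling load_package_plugins(sys.modules[__name__]) at the end of the package's own __init__.py would be one way to have new plugin files picked up without editing that file.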

You could always have a dict called plugins, use __import__ to import the modules and store them that way.
e.g.
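plugins = {}
for plugin in os.listdir('plugins'):  # assumes the plugins directory is on sys.path
    name, ext = os.path.splitext(plugin)  # 'foo.py' -> ('foo', '.py')
    if ext == '.py' and name != '__init__':
        plugins[name] = __import__(name)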
This assumes that every plugin is a single file. Personally I would go with something that looks in each folder for a __run__.py file; like __init__.py in a package, it would indicate a plugin. That code would look something like this:
for root, dirs, files in os.walk('.'):
    for dir in dirs:
        if "__run__.py" in os.listdir(os.path.join(root, dir)):
            plugins[dir] = __import__(dir)
Code written without testing. YMMV
