I am writing a minimal replacement for mod_python's publisher.py
The basic premise is that it is loading modules based on a URL scheme:
/foo/bar/a/b/c/d
Whereby /foo/ might be a directory and 'bar' is a method ExposedBar in a publishable class in /foo/index.py. Likewise /foo might map to /foo.py and bar is a method in the exposed class. The semantics of this aren't really important. I have a line:
sys.path.insert(0, path_to_file) # /var/www/html/{bar|foo}
mod_obj = __import__(module_name)
mod_obj.__name__ = req.filename
Then the module is inspected for the appropriate class/functions/methods. When the process gets as far as it can the remaining URI data, /a/b/c is passed to that method or function.
This was working fine until I had /var/www/html/foo/index.py and /var/www/html/bar/index.py
When viewing in the browser, it is fairly random which 'index.py' gets selected, even though I set the first search path to '/var/www/html/foo' or '/var/www/html/bar' and then loaded __import__('index'). I have no idea why it is finding either by seemingly random choice. This is shown by:
__name__ is "/var/www/html/foo/index.py"
req.filename is "/var/www/html/foo/index.py"
__file__ is "/var/www/html/bar/index.py"
This question then is, why would the __import__ be randomly selecting either index. I would understand this if the path was '/var/www/html' but it isn't. Secondly:
Can I load a module by it's absolute path into a module object? Without modification of sys.path. I can't find any docs on __import__ or new.module() for this.
Can I load a module by it's absolute
path into a module object? Without
modification of sys.path. I can't find
any docs on __import__ or new.module()
for this.
import imp
import os
def module_from_path(path):
filename = os.path.basename(path)
modulename = os.path.splitext(filename)[0]
with open(path) as f:
return imp.load_module(modulename, f, path, ('py', 'U', imp.PY_SOURCE))
Related
Is there a way to make sure all imports listed in all of my python files are in PYTHONPATH. Basically a validator for all my imports in files.
My approach: Change all file path from / to "." e.g. foo/bar to foo.bar and then run:
pkgutil.find_loader("foo.bar")
Problem: It does not work if I have foo.bar.zoo.data as a module
export PYTHONPATH=$(pwd):"my_lib_path"
for root,dir,files in os.walk(work_dir):
for filename in files:
if not "__init__" in filename and filename.endswith(".py"):
print "Testing file {}".format(os.path.join(root,filename))
filepath = os.path.join(root,filename)
filepath = filepath.replace(work_dir,'',)
filepath = filepath.replace('/', '.')
filepath = filepath.lstrip(".").rstrip(".py")
print " testing filepath" , filepath
# tried this also
'''for imp, name, _ in pkgutil.iter_modules(root):
full_name = "{}.{}".format(root,name)
''' module = imp.find_module(full_name)
mod = pkgutil.find_loader(filepath)
The best way to do this is to use the importlib (Python 3.1+) or imp (Python 2.x) module to do all the steps an import would do up to, but not including, running the code.
The key functions for each Python version are:
3.4-3.8: importlib.util.find_spec
3.3-3.3: importlib.find_loader.
3.1-3.2: importlib.find_module
3.0-3.0: imp.find_module
1.5-2.7: imp.find_module
0.9-1.4: No idea. (There were no packages yet; everything was different…)
Advantages of doing it this way:
It works whether the module is a normal Python module, a .pyc-only module, a C extension module, a builtin, or some funky special type of module that you've installed a custom import hook for.
It works even if the module is inside a .egg, or in the frozen bootstrap collection, or if the whole library is wrapped up in a .zip or even buried inside a .exe.
It automatically takes care of the funky rules, like figuring out what is or isn't a valid namespace package extension directory (including dealing with things like site.py and old-style setuptools path-injection) that would be a huge pain to get right.
In 3.4+, this is literally the same code that import uses, because the entire import system is written in Python. For older versions, that's not true—but for 2.3+, it's guaranteed to get you the same thing you would get in an import hook, which is almost surely close enough.
Quoting the 3.7 docs:
importlib.util.find_spec(name, package=None)
Find the spec for a module, optionally relative to the specified package name. If the module is in sys.modules, then sys.modules[name].__spec__ is returned (unless the spec would be None or is not set, in which case ValueError is raised). Otherwise a search using sys.meta_path is done. None is returned if no spec is found.
If name is for a submodule (contains a dot), the parent module is automatically imported.
name and package work the same as for import_module().
(You don't really have to care what a spec is here; if Python can find the spec for a module, the module is present; if it returns None, the module is not present.)
So, this will just magically handle foo.bar.zoo.data.
If you look at the Examples, there's one that does exactly what you want, "Checking if a module can be imported".
def test_module(name):
if not importlib.util.find_spec(name):
raise ImportError(name)
From the 2.7 docs:
imp.find_module(name[, path])
Try to find the module name. If path is omitted or None, the list of directory names given by sys.path is searched, but first a few special places are searched: the function tries to find a built-in module with the given name (C_BUILTIN), then a frozen module (PY_FROZEN), and on some systems some other places are looked in as well (on Windows, it looks in the registry which may point to a specific file).
Otherwise, path must be a list of directory names; each directory is searched for files with any of the suffixes returned by get_suffixes() above. Invalid names in the list are silently ignored (but all list items must be strings).
If search is successful, the return value is a 3-element tuple (file, pathname, description):
file is an open file object positioned at the beginning, pathname is the pathname of the file found, and description is a 3-element tuple as contained in the list returned by get_suffixes() describing the kind of module found.
If the module does not live in a file, the returned file is None, pathname is the empty string, and the description tuple contains empty strings for its suffix and mode; the module type is indicated as given in parentheses above. If the search is unsuccessful, ImportError is raised. Other exceptions indicate problems with the arguments or environment.
If the module is a package, file is None, pathname is the package path and the last item in the description tuple is PKG_DIRECTORY.
This function does not handle hierarchical module names (names containing dots). In order to find P.M, that is, submodule M of package P, use find_module() and load_module() to find and load package P, and then use find_module() with the path argument set to P.__path__. When P itself has a dotted name, apply this recipe recursively.
As you can see, it's a little more complicated—it doesn't magically handle foo.bar.zoo.data; you will need to find foo, verify that it's a package, load it, find bar, etc., and finally find (but not load) data.
Something like this (untested):
def test_module(name):
parts = name.split('.')
path = None
for part in parts[:-1]:
file, pathname, description = imp.find_module(part, path)
if description[-1] != imp.PKG_DIRECTORY:
raise ImportError(name)
pkg = imp.load_module(part, file, pathname, description)
path = pkg.__path__
file, pathname, description = imp.find_module(part[-1], path)
I have a Python module which uses some resources in a subdirectory of the module directory. After searching around on stack overflow and finding related answers, I managed to direct the module to the resources by using something like
import os
os.path.join(os.path.dirname(__file__), 'fonts/myfont.ttf')
This works fine when I call the module from elsewhere, but it breaks when I call the module after changing the current working directory. The problem is that the contents of __file__ are a relative path, which doesn't take into account the fact that I changed the directory:
>>> mymodule.__file__
'mymodule/__init__.pyc'
>>> os.chdir('..')
>>> mymodule.__file__
'mymodule/__init__.pyc'
How can I encode the absolute path in __file__, or barring that, how can I access my resources in the module no matter what the current working directory is? Thanks!
Store the absolute path to the module directory at the very beginning of the module:
package_directory = os.path.dirname(os.path.abspath(__file__))
Afterwards, load your resources based on this package_directory:
font_file = os.path.join(package_directory, 'fonts', 'myfont.ttf')
And after all, do not modify of process-wide resources like the current working directory. There is never a real need to change the working directory in a well-written program, consequently avoid os.chdir().
Building on lunaryorn's answer, I keep a function at the top of my modules in which I have to build multiple paths. This saves me repeated typing of joins.
def package_path(*paths, package_directory=os.path.dirname(os.path.abspath(__file__))):
return os.path.join(package_directory, *paths)
To build the path, call it like this:
font_file = package_path('fonts', 'myfont.ttf')
Or if you just need the package directory:
package_directory = package_path()
Suppose I have a module file like this:
# my_module.py
print("hello")
Then I have a simple script:
# my_script.py
import my_module
This will print "hello".
Let's say I want to "override" the print() function so it returns "world" instead. How could I do this programmatically (without manually modifying my_module.py)?
What I thought is that I need somehow to modify the source code of my_module before or while importing it. Obvisouly, I cannot do this after importing it so solution using unittest.mock are impossible.
I also thought I could read the file my_module.py, perform modification, then load it. But this is ugly, as it will not work if the module is located somewhere else.
The good solution, I think, is to make use of importlib.
I read the doc and found a very intersecting method: get_source(fullname). I thought I could just override it:
def get_source(fullname):
source = super().get_source(fullname)
source = source.replace("hello", "world")
return source
Unfortunately, I am a bit lost with all these abstract classes and I do not know how to perform this properly.
I tried vainly:
spec = importlib.util.find_spec("my_module")
spec.loader.get_source = mocked_get_source
module = importlib.util.module_from_spec(spec)
Any help would be welcome, please.
Here's a solution based on the content of this great talk. It allows any arbitrary modifications to be made to the source before importing the specified module. It should be reasonably correct as long as the slides did not omit anything important. This will only work on Python 3.5+.
import importlib
import sys
def modify_and_import(module_name, package, modification_func):
spec = importlib.util.find_spec(module_name, package)
source = spec.loader.get_source(module_name)
new_source = modification_func(source)
module = importlib.util.module_from_spec(spec)
codeobj = compile(new_source, module.__spec__.origin, 'exec')
exec(codeobj, module.__dict__)
sys.modules[module_name] = module
return module
So, using this you can do
my_module = modify_and_import("my_module", None, lambda src: src.replace("hello", "world"))
This doesn't answer the general question of dynamically modifying the source code of an imported module, but to "Override" or "monkey-patch" its use of the print() function can be done (since it's a built-in function in Python 3.x). Here's how:
#!/usr/bin/env python3
# my_script.py
import builtins
_print = builtins.print
def my_print(*args, **kwargs):
_print('In my_print: ', end='')
return _print(*args, **kwargs)
builtins.print = my_print
import my_module # -> In my_print: hello
I first needed to better understand the import operation. Fortunately, this is well explained in the importlib documentation and scratching through the source code helped too.
This import process is actually split in two parts. First, a finder is in charge of parsing the module name (including dot-separated packages) and instantiating an appropriate loader. Indeed, built-in are not imported as local modules for example. Then, the loader is called based on what the finder returned. This loader get the source from a file or from a cache, and executed the code if the module was not previously loaded.
This is very simple. This explains why I actually did not need to use abstract classes from importutil.abc: I do not want to provide my own import process. Instead, I could create a subclass inherited from one of the classes from importuil.machinery and override get_source() from SourceFileLoader for example. However, this is not the way to go because the loader is instantiated by the finder so I do not have the hand on which class is used. I cannot specify that my subclass should be used.
So, the best solution is to let the finder do its job, and then replace the get_source() method of whatever Loader has been instantiated.
Unfortunately, by looking trough the code source I saw that the basic Loaders are not using get_source() (which is only used by the the inspect module). So my whole idea could not work.
In the end, I guess get_source() should be called manually, then the returned source should be modified, and finally the code should be executed. This is what Martin Valgur detailed in his answer.
If compatibility with Python 2 is needed, I see no other way than reading the source file:
import imp
import sys
import types
module_name = "my_module"
file, pathname, description = imp.find_module(module_name)
with open(pathname) as f:
source = f.read()
source = source.replace('hello', 'world')
module = types.ModuleType(module_name)
exec(source, module.__dict__)
sys.modules[module_name] = module
If importing the module before the patching it is okay, then a possible solution would be
import inspect
import my_module
source = inspect.getsource(my_module)
new_source = source.replace('"hello"', '"world"')
exec(new_source, my_module.__dict__)
If you're after a more general solution, then you can also take a look at the approach I used in another answer a while ago.
My solution updates the source file, which works for the inner import situation. The inner import means that transformers.models.albert import modeling_albert from the source file. In such case, even if I use the solution from Martin Valgur, it won't work. So I update the source file. Hope it help the people who have the same trouble with me.
import inspect
from transformers.models.albert import modeling_albert
# Get source
source = inspect.getsource(modeling_albert)
source_before = "AlbertModel(config, add_pooling_layer=False)"
source_after = "AlbertModel(config, add_pooling_layer=True)"
new_source = source.replace(source_before, source_after)
# Update file
file_path = modeling_albert.__spec__.origin
with open(file_path, 'w') as f:
f.write(new_source)
Not elegant, but works for me (may have to add a path):
with open ('my_module.py') as aFile:
exec (aFile.read () .replace (<something>, <something else>))
I have a python 2.6 Django app which has a folder structure like this:
/foo/bar/__init__.py
I have another couple directories on the filesystem full of python modules like this:
/modules/__init__.py
/modules/module1/__init__.py
/other_modules/module2/__init__.py
/other_modules/module2/file.py
Each module __init__ has a class. For example module1Class() and module2Class() respectively. In module2, file.py contains a class called myFileClass().
What I would like to do is put some code in /foo/bar/__init__.py so I can import in my Django project like this:
from foo.bar.module1 import module1Class
from foo.bar.module2 import module2Class
from foo.bar.module2.file import myFileClass
The list of directories which have modules is contained in a tuple in a Django config which looks like this:
module_list = ("/modules", "/other_modules",)
I've tried using __import__ and vars() to dynamically generate variables like this:
import os
import sys
for m in module_list:
sys.path.insert(0, m)
for d in os.listdir(m):
if os.path.isdir(d):
vars()[d] = getattr(__import__(m.split("/")[-1], fromlist=[d], d)
But that doesn't seem to work. Is there any way to do this?
Thanks!
I can see at least one problem with your code. The line...
if os.path.isdir(d):
...won't work, because os.listdir() returns relative pathnames, so you'll need to convert them to absolute pathnames, otherwise the os.path.isdir() will return False because the path doesn't exist (relative to the current working directory), rather than raising an exception (which would make more sense, IMO).
The following code works for me...
import sys
import os
# Directories to search for packages
root_path_list = ("/modules", "/other_modules",)
# Make a backup of sys.path
old_sys_path = sys.path[:]
# Add all paths to sys.path first, in case one package imports from another
for root_path in root_path_list:
sys.path.insert(0, root_path)
# Add new packages to current scope
for root_path in root_path_list:
filenames = os.listdir(root_path)
for filename in filenames:
full_path = os.path.join(root_path, filename)
if os.path.isdir(full_path):
locals()[filename] = __import__(filename)
# Restore sys.path
sys.path[:] = old_sys_path
# Clean up locals
del sys, os, root_path_list, old_sys_path, root_path, filenames, filename, full_path
Update
Thinking about it, it might be safer to check for the presence of __init__.py, rather than using os.path.isdir() in case you have subdirectories which don't contain such a file, otherwise the __import__() will fail.
So you could change the lines...
full_path = os.path.join(root_path, filename)
if os.path.isdir(full_path):
locals()[filename] = __import__(filename)
...to...
full_path = os.path.join(root_path, filename, '__init__.py')
if os.path.exists(full_path):
locals()[filename] = __import__(filename)
...but it might be unnecessary.
We wound up biting the bullet and changing how we do things. Now the list of directories to find modules is passed in the Django config and each one is added to sys.path (similar to a comment Aya mentioned and something I did before but wasn't too happy with). Then for each module inside of it, we check for an __init__.py and if it exists, attempt to treat it as a module to use inside of the app without using the foo.bar piece.
This required some adjustment on how we interact with the modules and how developers code their modules (they now need to use relative imports within their module instead of the full path imports they used before) but I think this will be an easier design for developers to use long-term.
We didn't add these to INSTALLED_APPS because we do some exception handling where if we cannot import a module due to dependency issues or bad code our software will continue running just without that module. If they were in INSTALLED_APPS we wouldn't be able to leverage that flexibility on when/how to deal with those exceptions.
Thanks for all of the help!
This snippet is from an earlier answer here on SO. It is about a year old (and the answer was not accepted). I am new to Python and I am finding the system path a real pain. I have a few functions written in scripts in different directories, and I would like to be able to import them into new projects without having to jump through hoops.
This is the snippet:
def import_path(fullpath):
""" Import a file with full path specification. Allows one to
import from anywhere, something __import__ does not do.
"""
path, filename = os.path.split(fullpath)
filename, ext = os.path.splitext(filename)
sys.path.append(path)
module = __import__(filename)
reload(module) # Might be out of date
del sys.path[-1]
return module
Its from here:
How to do relative imports in Python?
I would like some feedback as to whether I can use it or not - and if there are any undesirable side effects that may not be obvious to a newbie.
I intend to use it something like this:
import_path(/home/pydev/path1/script1.py)
script1.func1()
etc
Is it 'safe' to use the function in the way I intend to?
The "official" and fully safe approach is the imp module of the standard Python library.
Use imp.find_module to find the module on your precisely-specified list of acceptable directories -- it returns a 3-tuple (file, pathname, description) -- if unsuccessful, file is actually None (but it can also raise ImportError so you should use a try/except for that as well as checking if file is None:).
If the search is successful, call imp.load_module (in a try/finally to make sure you close the file!) with the above three arguments after the first one which must be the same name you passed to find_module -- it returns the module object (phew;-).
As mentioned, please consider thread safety, if appropriate. I prefer something closer to a solution posted in a similar post. The main differences below: the use of insert to specify priority of the import, correct restoration of sys.path using try...finally, and setting the global namespace.
# inspired by Alex Martelli's solution to
# http://stackoverflow.com/questions/1096216/override-namespace-in-python/1096247#1096247
def import_from_absolute_path(fullpath, global_name=None):
"""Dynamic script import using full path."""
import os
import sys
script_dir, filename = os.path.split(fullpath)
script, ext = os.path.splitext(filename)
sys.path.insert(0, script_dir)
try:
module = __import__(script)
if global_name is None:
global_name = script
globals()[global_name] = module
sys.modules[global_name] = module
finally:
del sys.path[0]
It does feel like a bit of a hack, but at the moment, I can't think of any unintended side effects that are likely to occur, at least not as long as you're just using this for your own scripts. Basically what it does is temporarily add the parent directory of the specified file (in your example, /home/pydev/path1/) to the list of paths that Python checks when it's looking for a module to import.
The only risk I can think of right now would arise in a multithreaded environment, where two or more threads (or processes) are running this function simultaneously. If thread A wants to import module A from path dirA/A.py, and thread B wants to import module B from path dirB/B.py, you'd wind up with both dirA and dirB in sys.path for a short time. And if there is a file named B.py in dirA, it's possible that thread B will find that (dirA/B.py) instead of the file it's looking for (dirB/B.py), thus importing the wrong module. For this reason, I wouldn't use it in production code, or code that you're going to distribute to other people (at least not without warning them that this hack is in here!). In a situation like that, you could write a more complex function that allows you to specify the file to import without messing with the standard set of paths. (That's what mod_python does, for example)
I would be worried that your script name might correspond with a module that shows up earlier in the path. To dispel this fear, I would fully replace the path with a new list containing just the directory containing the module, then put it back once the import has completed. Also, you should wrap this in some sort of lock so that multiple threads trying to do the same thing don't interfere with each other.