Lazy-loading modules in python

Lazy-loading modules in python - python

I'm trying to put together a system that will handle lazy-loading of modules that don't explicitly exist. Basically I have an http server with a number of endpoints that I don't know ahead of time that I would like to programmatically offer for import. These modules would all have a uniform method signature, they just wouldn't exist ahead of time.
import lazy.route as test
import lazy.fake as test2
test('Does this exist?') # This sends a post request.
test2("This doesn't exist.") # Also sends a post request
I can handle all the logic I need around these imports with a uniform decorator, I just can't find any way of "decorating" imports in python, or actually interacting with them in any kind of programmatic way.
Does anyone have experience with this? I've been hunting around, and the closest thing I've found is the ast module, which would lead to a really awful kind of hacky implementation in my current under my current understanding (something like finding all import statements and manually over-writing the import function)
Not looking for a handout, just a piece of the python codebase to start looking at, or an example of someone that's done something similar.

I got a little clever in my googling and managed to find a PEP that specifically addressed this issue, it just happens to be relatively unknown, probably because the subset of reasonable uses for this is pretty narrow.
I found an excellent piece of example code showing off the new sys.meta_path implementation. I've posted it below for information on how to dynamically bootstrap your import statements.
import sys
class VirtualModule(object):
def hello(self):
return 'Hello World!'
class CustomImporter(object):
virtual_name = 'my_virtual_module'
def find_module(self, fullname, path):
"""This method is called by Python if this class
is on sys.path. fullname is the fully-qualified
name of the module to look for, and path is either
__path__ (for submodules and subpackages) or None (for
a top-level module/package).
Note that this method will be called every time an import
statement is detected (or __import__ is called), before
Python's built-in package/module-finding code kicks in."""
if fullname == self.virtual_name:
# As per PEP #302 (which implemented the sys.meta_path protocol),
# if fullname is the name of a module/package that we want to
# report as found, then we need to return a loader object.
# In this simple example, that will just be self.
return self
# If we don't provide the requested module, return None, as per
# PEP #302.
return None
def load_module(self, fullname):
"""This method is called by Python if CustomImporter.find_module
does not return None. fullname is the fully-qualified name
of the module/package that was requested."""
if fullname != self.virtual_name:
# Raise ImportError as per PEP #302 if the requested module/package
# couldn't be loaded. This should never be reached in this
# simple example, but it's included here for completeness. :)
raise ImportError(fullname)
# PEP#302 says to return the module if the loader object (i.e,
# this class) successfully loaded the module.
# Note that a regular class works just fine as a module.
return VirtualModule()
if __name__ == '__main__':
# Add our import hook to sys.meta_path
sys.meta_path.append(CustomImporter())
# Let's use our import hook
import my_virtual_module
print my_virtual_module.hello()
The full blog post is here

Related

Show whether a Python module is loaded from bytecode

I'm trying to debug Hy's use of bytecode. In particular, each time a module is imported, I want to see the path it was actually imported from, whether source or bytecode. Under the hood, Hy manages modules with importlib. It doesn't explicitly read or write bytecode; that's taken care of by importlib.machinery.SourceFileLoader. So it looks like what I want to do is monkey-patch Python's importing system to print the import path each time an import happens. How can I do that? I should be able to figure out how to do it for Hy once I understand how to do it for Python.

The easiest way that does not involve coding, is to start Python with two(!) verbose flags:
python -vv myscript.py
you'll get a lot of output, including all the import statements and all the files Python tries to access in order to load the module. In this example I have a simple python script that does import json:
lots of output!
[...]
# trying /tmp/json.cpython-310-x86_64-linux-gnu.so
# trying /tmp/json.abi3.so
# trying /tmp/json.so
# trying /tmp/json.py
# trying /tmp/json.pyc
# /usr/lib/python3.10/json/__pycache__/__init__.cpython-310.pyc matches /usr/lib/python3.10/json/__init__.py
# code object from '/usr/lib/python3.10/json/__pycache__/__init__.cpython-310.pyc'
[...]
Alternatively but more complex: you could change the import statement itself. For that, you can overwrite __import__, which is invoked by the import statement itself. This way you could print out all the files import actually opens.

Seems like a good option would be to dynamically patch importlib.machinery.SourceFileLoader(fullname, path) and importlib.machinery.SourcelessFileLoader(fullname, path) to each print or write to a variable (a) the calling method and (b) the argument passed to the function.
If all you need to do is:
I want to see the path it was actually imported from, whether source or bytecode
And you don't need the import to "work properly", perhaps you can do a modified version of something like this. For example, I quickly modified their sample code to get this, I have not tested it so it may not work exactly, but it should get you on the right track:
# custom class to be the mock return value
class MockSourceLoader:
# mock SourceFileLoader method always returns that the module was loaded from source and its path
def SourceFileLoader(fullname, path):
return {"load type": "source", "fullname": fullname, "path": path}
def check_how_imported(monkeypatch):
# Any arguments may be passed and mock_get() will always return our
# mocked object
def mock_get(*args, **kwargs):
return MockSourceLoader
# apply the monkeypatch
monkeypatch.setattr(importlib.machinery, SourceFileLoader, SourceFileLoader)
You would of course provide a similar mock for Sourceless file loading for SourcelessFileLoader
For reference:
https://docs.python.org/3/library/importlib.html#:~:text=importlib.machinery.SourceFileLoader(fullname%2C%20path)%C2%B6
https://docs.python.org/3/library/importlib.html#:~:text=importlib.machinery.SourcelessFileLoader(fullname%2C%20path)

Mock an entire module in python

I have an application that imports a module from PyPI.
I want to write unittests for that application's source code, but I do not want to use the module from PyPI in those tests.
I want to mock it entirely (the testing machine will not contain that PyPI module, so any import will fail).
Currently, each time I try to load the class I want to test in the unittests, I immediately get an import error. so I thought about maybe using
try:
except ImportError:
and catch that import error, then use command_module.run().
This seems pretty risky/ugly and I was wondering if there's another way.
Another idea was writing an adapter to wrap that PyPI module, but I'm still working on that.
If you know any way I can mock an entire python package, I would appreciate it very much.
Thanks.

If you want to dig into the Python import system, I highly recommend David Beazley's talk.
As for your specific question, here is an example that tests a module when its dependency is missing.
bar.py - the module you want to test when my_bogus_module is missing
from my_bogus_module import foo
def bar(x):
return foo(x) + 1
mock_bogus.py - a file in with your tests that will load a mock module
from mock import Mock
import sys
import types
module_name = 'my_bogus_module'
bogus_module = types.ModuleType(module_name)
sys.modules[module_name] = bogus_module
bogus_module.foo = Mock(name=module_name+'.foo')
test_bar.py - tests bar.py when my_bogus_module is not available
import unittest
from mock_bogus import bogus_module # must import before bar module
from bar import bar
class TestBar(unittest.TestCase):
def test_bar(self):
bogus_module.foo.return_value = 99
x = bar(42)
self.assertEqual(100, x)
You should probably make that a little safer by checking that my_bogus_module isn't actually available when you run your test. You could also look at the pydoc.locate() method that will try to import something, and return None if it fails. It seems to be a public method, but it isn't really documented.

While #Don Kirkby's answer is correct, you might want to look at the bigger picture. I borrowed the example from the accepted answer:
import pypilib
def bar(x):
return pypilib.foo(x) + 1
Since pypilib is only available in production, it is not suprising that you have some trouble when you try to unit test bar. The function requires the external library to run, therefore it has to be tested with this library. What you need is an integration test.
That said, you might want to force unit testing, and that's generally a good idea because it will improve the confidence you (and others) have in the quality of your code. To widen the unit test area, you have to inject dependencies. Nothing prevents you (in Python!) from passing a module as a parameter (the type is types.ModuleType):
try:
import pypilib # production
except ImportError:
pypilib = object() # testing
def bar(x, external_lib = pypilib):
return external_lib.foo(x) + 1
Now, you can unit test the function:
import unittest
from unittest.mock import Mock
class Test(unittest.TestCase):
def test_bar(self):
external_lib = Mock(foo = lambda x: 3*x)
self.assertEqual(10, bar(3, external_lib))
if __name__ == "__main__":
unittest.main()
You might disapprove the design. The try/except part is a bit cumbersome, especially if you use the pypilib module in several modules of your application. And you have to add a parameter to each function that relies on the external library.
However, the idea to inject a dependency to the external library is useful, because you can control the input and test the output of your class methods, even if the external library is not within your control. Especially if the imported module is stateful, the state might be difficult to reproduce in a unit test. In this case, passing the module as a parameter may be a solution.
But the usual way to deal with this situation is called dependency inversion principle (the D of SOLID): you should define the (abstract) boundaries of your application, ie what you need from the outside world. Here, this is bar and other functions, preferably grouped in one or many classes:
import pypilib
import other_pypilib
class MyUtil:
"""
All I need from outside world
"""
#staticmethod
def bar(x):
return pypilib.foo(x) + 1
#staticmethod
def baz(x, y):
return other_pypilib.foo(x, y) * 10.0
...
# not every method has to be static
Each time you need one of these functions, just inject an instance of the class in your code:
class Application:
def __init__(self, util: MyUtil):
self._util = util
def something(self, x, y):
return self._util.baz(self._util.bar(x), y)
The MyUtil class must be as slim as possible, but must remain abstract from the underlying library. It is a tradeoff. Obviously, Application can be unit tested (just inject a Mock instead of an instance of MyUtil) while, under some circumstances (like a PyPi library not available during tests, a module that runs inside a framework only, etc.), MyUtil can be only tested within an integration test. If you need to unit test the boundaries of your application, you can use #Don Kirkby's method.
Note that the second benefit, after unit testing, is that if you change the libraries you are using (deprecation, license issue, cost, ...), you just have to rewrite the MyUtil class, using some other libraries or coding it from scratch. Your application is protected from the wild outside world.
Clean Code by Robert C. Martin has a full chapter on the boundaries.
Summary Before using #Don Kirkby's method or any other method, be sure to define the boundaries of your application irrespective of the specific libraries you are using. This, of course, does not apply to the Python standard library...

For a more explicit and granular approach:
import unittest
from unittest.mock import MagicMock, patch
try:
import bogus_module
except ModuleNotFoundError:
bogus_module = MagicMock()
#patch.dict('sys.modules', bogus_module=bogus_module)
class PlatformTests(unittest.TestCase):
...
Using the patch.dict decorator gives you granular control: it only applies to the class / method it is applied to.

Hacking python's import statement

I'm building a Python module for a fairly specific purpose. What I'd like to do with this is get more functionality behind importing things from it.
I'd like to have a setup by which saying from my_module import foo would run a function and pass the string "foo". This function would return the object that should be imported.
For example, maybe I want to make a cloud-based import system. I'd like to store community scripts in the cloud, and then download them when a user tries to import them.
Maybe I use the code from cloud import test_module. This would check a cache to decide whether test_module had been downloaded. If so, it would return that module. If not, it would download the module before importing it.
How can I accomplish something like this in Python, by which a dynamic range of submodules could be seamlessly imported from the cloud?

Full featured support for what you ask probably requires a bunch of complicated code using importlib and hooking into various parts of the import machinery. However, a more limited solution can be implemented with just a single custom class that pretends to be a module.
When you import a module, Python first checks in the sys.modules dictionary to see if the module is a key. If so, it returns the value associated with the key. It does this regardless of what the value is, so you can put any kind of object in sys.modules and Python will treat it like a module. A module's code can even replace its own entry in sys.modules, and the replacement will be used even the first time it is imported!
So, to implement your fancy module that downloads other modules on demand, replace the module itself with an instance of a custom class, and write that class a __getattr__ or __getattribute__ method that does the work you want.
Here's a trivial example module that returns a string for any attribute you look for in it. The string will always be the same as the requested attribute name. In your code, you'd want to do your fancy web-cache lookups and downloading, and then return the fetched module object instead of just returning a string.
class FakeModule(object):
def __getattribute__(self, name):
return name
import sys
sys.modules[__name__] = FakeModule()
On my system I've saved that as fakemodule.py. Now if I do from fakemodule import foo, I get foo with the value 'foo' in my local namespace.
Note that this only works for one level deep imports. If you do from fakemodule.subpackage import name it will not work because there's no fakemodule.subpackage entry in sys.modules.

Package-specific import hooks in Python

I'm working on creating a Python module that maps API provided by a different language/framework into Python. Ideally, I would like this to be presented as a single root package that exposes helper methods, and which maps all namespaces in that other framework to Python packages/modules. For the sake of convenience, let's take CLR as an example:
import clr.System.Data
import clr.System.Windows.Forms
Here clr is the magic top-level package which exposes CLR namespaces System.Data and System.Windows.Forms subpackages/submodules (so far as I can see, a package is just a module with child modules/packages; it is still valid to have other kinds of members therein).
I've read PEP-302 and wrote a simple prototype program that achieves a similar effect by installing a custom meta_path hook. The clr module itself is a proper Python module which, when imported, sets __path__ = [] (making it a package, so that import even attempts lookup for submodules at all), and registers the hook. The hook itself intercepts any package load where full name of the package starts with "clr.", dynamically creates the new module using imp.new_module(), registers it in sys.modules, and uses pixie dust and rainbows to fill it with classes and methods from the original API. Here's the code:
clr.py
import sys
import imp
class MyLoader:
def load_module(self, fullname):
try:
return sys.modules[fullname]
except KeyError:
pass
print("--- load ---")
print(fullname)
m = imp.new_module(fullname)
m.__file__ = "clr:" + fullname
m.__path__ = []
m.__loader__ = self
m.speak = lambda: print("I'm " + fullname)
sys.modules.setdefault(fullname, m)
return m
class MyFinder:
def find_module(self, fullname, path = None):
print("--- find ---")
print(fullname)
print(path)
if fullname.startswith("clr."):
return MyLoader()
return None
print("--- init ---")
__path__ = []
sys.meta_path.append(MyFinder())
test.py
import clr.Foo.Bar.Baz
clr.Foo.speak()
clr.Foo.Bar.speak()
clr.Foo.Bar.Baz.speak()
All in all this seems to work fine. Python guarantees that modules in the chain are imported left to right, so clr is always imported first, and it sets up the hook that allows the remainder of the chain to be imported.
However, I'm wondering if what I'm doing here is overkill. I am, after all, installing a global hook, that will be called for any module import, even though I filter out those that I don't care about. Is there, perhaps, some way to install a hook that will only be called for imports from my particular package, and not others? Or is the above the Right Way to do this kind of thing in Python?

In general, I think your approach looks fine. I wouldn't worry about it being "global", since the whole point is to specify which paths should be handled by you. Moving this test inside the import logic would just needlessly complicate it, so it's left to the implementer of the hook to decide.
Just one small concern, maybe you could use sys.path_hooks? It appears to be a bit less "powerful" than sys.meta_path
sys.path_hooks is a list of callables, which will be checked in
sequence to determine if they can handle a given path item. The
callable is called with one argument, the path item. The callable
must raise ImportError if it is unable to handle the path item, and
return an importer object if it can handle the path item.

Can I call and set the Python gettext module in a library and a module using it at the same time?

Im a coding a library including textual feedback that I need to translate.
I put the following lines in a _config.py module that I import everywhere in my app :
import gettext, os, sys
pathname = os.path.dirname(sys.argv[0])
localdir = os.path.abspath(pathname) + "/locale"
gettext.install("messages", localdir)
I have the *.mo files in ./locale/lang_LANG/LC_MESSAGES and I apply the _() function to all the strings that need to be translated.
Now I just added a feature for the user, supposedly a programmer, to be able to create his own messages. I don't want him to care about the underlying implementation, so I want him to be able to make it something straightforward like :
lib_object.message = "My message"
I used properties to make it clean, but what if my user whats to translate his own code (that uses mine) and does something like :
import gettext, os, sys
pathname = os.path.dirname(sys.argv[0])
localdir = os.path.abspath(pathname) + "/locale"
gettext.install("user_app", localdir)
lib_object.message = _("My message")
Is it a problem ? What can I do to avoid troubles without bothering my user ?

You can use the class based gettext api to isolate message catalogs. This is also what is recommended in the python gettext documentation.
The drawback is that you, or the other dev, will have to use the gettext method or define the _() method in the local scope, bound to the specific gettext class. An example of a class with its own string catalog:
import gettext
class MyClass(object):
def __init__(self, locale_for_instance):
self.lang = gettext.translation("appname", localedir, \
locale=locale_for_instance)
def some_method(self, arg):
return self.lang.gettext("You called some method")
def other_method(self, arg): # does the same thing
_ = self.lang.gettext
return _("You called some method")
You could stick the code for adding the _() in a decorator, so all the methods that need it is prefixed with something like #with_local_gettext
(Note, I've not tested the above could but It Should Work Just Fine(tm) )
If the goal is to not bother your user (and he's not very good) I guess you could use the class based approach in your code and let the user use the global one.

You can only gettext.install() once. In general it's useless for library work -- gettext.install() will only do the right thing if the module calling it is in charge of the whole program, since it will only provide you with one catalog to load from. Library code should do something akin to what Mailman does: have their own wrapper for gettext() that passes the right arguments for this module, then imports that as '_' in each module that wants to use it.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Lazy-loading modules in python - python

Related

Show whether a Python module is loaded from bytecode

Mock an entire module in python

Hacking python's import statement

Package-specific import hooks in Python

Can I call and set the Python gettext module in a library and a module using it at the same time?

Categories

Resources