Implicitly use namespace in imported modules in Python - python

I'm trying to make a library out of a Python project I don't own.
The project has the following directory layout:
.
├── MANIFEST.in
├── pyproject.toml
└── src
   ├── all.py
   ├── the.py
   └── sources.py
In pyproject.toml I have:
[tool.setuptools]
packages = ["mypkg"]
[tool.setuptools.package-dir]
mypkg = "src"
The problem I'm facing is that when I build and install this package I can't use it because the author is importing stuff without mypkg prefix in the various source files.
F.ex. in all.py
from the import SomeThing
Since I don't own the package I can't go modify all the sources but I still want to be able to build a library from it by just adding MANIFEST.in and pyproject.toml.
Is it possible to somehow instruct setuptools to build a package that won't litter site-packages with all the sources while still allowing them to be imported without the mypkg prefix?

It isn't possible without adding a custom import hook with the package. The hook takes the form of a module that is shipped with the package, and it must be imported before usage from your module (e.g. in src/all.py)
src/mypkgimp.py
import sys
import importlib
class MyPkgLoader(importlib.abc.Loader):
def find_spec(self, name, path=None, target=None):
# update the list with modules that should be treated special
if name in ['sources', 'the']:
return importlib.util.spec_from_loader(name, self)
return None
def create_module(self, spec):
# Uncomment if "normal" imports should have precedence
# try:
# sys.meta_path = [x for x in sys.meta_path[:] if x is not self]
# return importlib.import_module(spec.name)
# except ImportError:
# pass
# finally:
# sys.meta_path = [self] + sys.meta_path
# Otherwise, this will unconditionally shadow normal imports
module = importlib.import_module('.' + spec.name, 'mypkg')
# Final step: inject the module to the "shortened" name
sys.modules[spec.name] = module
return module
def exec_module(self, module):
pass
if not hasattr(sys, 'frozen'):
sys.meta_path = [MyPkgLoader()] + sys.meta_path
Yes, the above uses different methods described by the thread I have linked previously, as importlib have deprecated those methods in Python 3.10, refer to documentation for details.
Anyway, for the demo, put some dummy classes in the modules:
src/the.py
class SomeThing: ...
src/sources.py
class Source: ...
Now, modify src/all.py to have the following:
import mypkg.mypkgimp
from the import SomeThing
Example usage:
>>> from sources import Source
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'sources'
>>> from mypkg import all
>>> all.SomeThing
<class 'mypkg.the.SomeThing'>
>>> from sources import Source
>>> Source
<class 'mypkg.sources.Source'>
>>> from sources import Error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Error' from 'mypkg.sources' (/tmp/mypkg/src/sources.py)
Note how the import initially didn't work, but after mypkg.all got imported, the sources import now works globally. Hence care may be needed to not shadow "real" imports and I have provided the example to import using the "default"[*] import mechanism.
If you want the module names to look different (i.e. without the mypkg. prefix), that will be a separate question, as code typically don't check for their own module name for functionality (and never mind that this actually shows how the namespace is implicitly used - changing the actual name is more akin to a module relocation, yes this can be done, but a bit more complicated and this answer is long enough as it is).
[*] "default" as in not including behaviors introduced by this custom import hook - other import hooks may do their own other weird shenanigans.

Related

Python `pkgutil.get_data` disrupts future imports

Consider the following package structure:
.
├── module
│   ├── __init__.py
│   └── submodule
│   ├── attribute.py
│   ├── data.txt
│   └── __init__.py
└── test.py
and the following piece of code:
import pkgutil
data = pkgutil.get_data('module.submodule', 'data.txt')
import module.submodule.attribute
retval = module.submodule.attribute.hello()
Running this will raise the error:
Traceback (most recent call last):
File "test.py", line 7, in <module>
retval = module.submodule.attribute.hello()
AttributeError: module 'module' has no attribute 'submodule'
However, if you run the following:
import pkgutil
import module.submodule.attribute
data = pkgutil.get_data('module.submodule', 'data.txt')
retval = module.submodule.attribute.hello()
or
import pkgutil
import module.submodule.attribute
retval = module.submodule.attribute.hello()
it works fine.
Why does running pkgutil.get_data disrupt the future import?
First of all, this was a great question and a great opportunity to learn something new about python's import system. So let's dig in!
If we look at the implementation of pkgutil.get_data we see something like this:
def get_data(package, resource):
spec = importlib.util.find_spec(package)
if spec is None:
return None
loader = spec.loader
if loader is None or not hasattr(loader, 'get_data'):
return None
# XXX needs test
mod = (sys.modules.get(package) or
importlib._bootstrap._load(spec))
if mod is None or not hasattr(mod, '__file__'):
return None
# Modify the resource name to be compatible with the loader.get_data
# signature - an os.path format "filename" starting with the dirname of
# the package's __file__
parts = resource.split('/')
parts.insert(0, os.path.dirname(mod.__file__))
resource_name = os.path.join(*parts)
return loader.get_data(resource_name)
And the answer to your question is in this part of the code:
mod = (sys.modules.get(package) or
importlib._bootstrap._load(spec))
It looks at the already loaded packages and if the package we're looking for (module.submodule in this example) exists it uses it and if not, then tries to load the package using importlib._bootstrap._load.
So let's look at the implementation of importlib._bootstrap._load to see what's going on.
def _load(spec):
"""Return a new module object, loaded by the spec's loader.
The module is not added to its parent.
If a module is already in sys.modules, that existing module gets
clobbered.
"""
with _ModuleLockManager(spec.name):
return _load_unlocked(spec)
Well, There's right there! The doc says "The module is not added to its parent."
It means the submodule module is loaded but it's not added to the module module. So when we try to access the submodule via module there's no connection, hence the AtrributeError.
It makes sense for the get_data method to use this function as it just wants some other file in the package and there is no need to import the whole package and add it to its parent and its parents' parent and so on.
to see it yourself I suggest using a debugger and setting some breakpoints. Then you can see what happens step by step along the way.

Creating importable Python 3 package / module

I am having trouble creating an importable Python package/library/module or whatever the right nomenclature is. I am using Python 3.7
The file structure I am using is:
Python37//Lib//mypackage
mypackage
__init__.py
mypackage_.py
The code in __init__.py is:
from mypackage.mypackage_ import MyClass
The code in mypackage_.py is:
class MyClass:
def __init__(self, myarg = None):
self.myvar = myarg
And from my desktop I try running the following code:
import mypackage
x = MyClass(None)
But get the following error:
Traceback (most recent call last):
File "C:\Users\***\Desktop\importtest.py", line 3, in <module>
x = MyClass(None)
NameError: name 'MyClass' is not defined
You haven't imported the name MyClass into your current namespace. You've imported mypackage. To access anything within mypackage, you need to prefix the name with mypackage.<Name>
import mypackage
x = mypackage.MyClass(None)
As #rdas says, you need to prefix the name with mypackage.<Name>.
I don't recommend doing this, but you can wildcard import in order to make x = MyClass(None) work:
from mypackage import *
Now, everything from mypackage is imported and usable in the current namespace. However, you have to be careful with wildcard imports because they can create definition conflictions (if multiple modules have the same name for different things).

How to create a library from multiple sub-modules?

I'm trying to build a library with multiple sub-modules. The general structure looks like so:
package_test/
|___ setup.py
|___ README
|___ package_name/
|___|___ __init__.py
|___|___ submodule/
|___|___|___ __init__.py
|___|___|___ superawesome.py
|___|___ submodule2/
|___|___|___ __init__.py
|___|___|___ prettyokay.py
The goal is to allow a user to do something like:
from package_name.submodule import superawesome
from package_name.submodule2 import prettyokay
The __init__.py for each submodule currently contains (or the appropriate name for submodule2):
from .superawesome import superawesome
__all__ = ['superawesome']
The __init__.py for inside package_name contains:
__version__ = 'v0.0.alpha'
__all__ = ['submodule','submodule2']
When I attempt to use this, iPython shows this error:
From package_name.submodule import superawesome
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-03be4241edd5> in <module>()
----> 1 from package_name.submodule import superawesome
ModuleNotFoundError: No module named 'package_name.submodule'
My setup.py currently contains:
from distutils.core import setup
setup(
name = 'package_name',
packages = ['package_name'], # this must be the same as the name above
version = '0.1.1',
long_description=open('README').read(),
)
I need help figuring out how to do the layered importing structure in order to properly import the submodules as part of the whole module. This is clearly a contrived naming set, but in the real problem, each submodule contains a set of modules that belong together... but all of the submodules together are necessary for the library; so I really want to maintain this structure.

How to access the current executing module's attributes from other modules?

I have several 'app'-modules (which are being started by a main-application)
and a utility module with some functionality:
my_utility/
├── __init__.py
└── __main__.py
apps/
├── app1/
│ ├── __init__.py
│ └── __main__.py
├── app2/
│ ├── __init__.py
│ └── __main__.py
...
main_app.py
The apps are being started like this (by the main application):
python3 -m <app-name>
I need to provide some meta information (tied to the module) about each app which is readable by the main_app and the apps themselves:
apps/app1/__init__.py:
meta_info = {'min_platform_version': '1.0',
'logger_name': 'mm1'}
... and use it like this:
apps/app1/__main__.py:
from my_utility import handle_meta_info
# does something with meta_info (checking, etc.)
handle_meta_info()
main_app.py:
mod = importlib.import_module('app1')
meta_inf = getattr(mod, 'meta_info')
do_something(meta_inf)
The Problem
I don't know how to access meta_info from within the apps. I know I can
import the module itself and access meta_info:
apps/app1/__main__.py:
import app1
do_something(app1.meta_info)
But this is only possible if I know the name of the module. From inside another module - e.g. my_utility I don't know how to access the module which has been started in the first place (or it's name).
my_utility/__main__.py:
def handle_meta_info():
import MAIN_MODULE <-- don't know, what to import here
do_something(MAIN_MODULE.meta_info)
In other words
I don't know how to access meta_info from within an app's process (being started via python3 -m <name> but from another module which does not know the name of the 'root' module which has been started
Approaches
Always provide the module name when calling meta-info-functions (bad, because it's verbose and redundant)
from my_utility import handle_meta_info
handle_meta_info('app1')
add meta_info to __builtins__ (generally bad to pollute global space)
Parse the command line (ugly)
Analyze the call stack on import my_utility (dangerous, ugly)
The solution I'd like to see
It would be nice to be able to either access the "main" modules global space OR know it's name (to import)
my_utility/__main__.py:
def handle_meta_info():
do_something(__main_module__.meta_info)
OR
def handle_meta_info():
if process_has_been_started_as_module():
mod = importlib.import_module(name_of_main_module())
meta_inf = getattr(mod, 'meta_info')
do_something(meta_inf)
Any ideas?
My current (bloody) solution:
Inside my_utility I use psutil to get the command line the module has been started with (why not sys.argv? Because). There I extract the module name. This way I attach the desired meta information to my_utility (so I have to load it only once).
my_utility/__init__.py:
def __get_executed_modules_meta_info__() -> dict:
def get_executed_module_name()
from psutil import Process
from os import getpid
_cmdline = Process(getpid()).cmdline
try:
# normal case: app has been started via 'python3 -m <app>'
return _cmdline[_cmdline.index('-m') + 1]
except ValueError:
return None
from importlib import import_module
try:
_main_module = import_module(get_module_name())
return import_module(get_executed_module_name()).meta_info
except AttributeError:
return {}
__executed_modules_meta_info__ = __get_executed_modules_meta_info__()

Hiding implementation files in a package

I have a module called spellnum. It can be used as a command-line utility (it has the if __name__ == '__main__': block) or it can be imported like a standard Python module.
The module defines a class named Speller which looks like this:
class Speller(object):
def __init__(self, lang="en"):
module = __import__("spelling_" + lang)
# use module's contents...
As you can see, the class constructor loads other modules at runtime. Those modules (spelling_en.py, spelling_es.py, etc.) are located in the same directory as the spellnum.py itself.
Besides spellnum.py, there are other files with utility functions and classes. I'd like to hide those files since I don't want to expose them to the user and since it's a bad idea to pollute the Python's lib directory with random files. The only way to achieve this that I know of is to create a package.
I've come up with this layout for the project (inspired by this great tutorial):
spellnum/ # project root
spellnum/ # package root
__init__.py
spellnum.py
spelling_en.py
spelling_es.py
squash.py
# ... some other private files
test/
test_spellnum.py
example.py
The file __init__.py contains a single line:
from spellnum import Speller
Given this new layout, the code for dynamic module loading had to be changed:
class Speller(object):
def __init__(self, lang="en"):
spelling_mod = "spelling_" + lang
package = __import__("spellnum", fromlist=[spelling_mod])
module = getattr(package, spelling_mod)
# use module as usual
So, with this project layout a can do the following:
Successfully import spellnum inside example.py and use it like a simple module:
# an excerpt from the example.py file
import spellnum
speller = spellnum.Speller(es)
# ...
import spellnum in the tests and run those tests from the project root like this:
$ PYTHONPATH="`pwd`:$PYTHONPATH" python test/test_spellnum.py
The problem
I cannot execute spellnum.py directly with the new layout. When I try to, it shows the following error:
Traceback (most recent call last):
...
File "spellnum/spellnum.py", line 23, in __init__
module = getattr(package, spelling_mod)
AttributeError: 'module' object has no attribute 'spelling_en'
The question
What's the best way to organize all of the files required by my module to work so that users are able to use the module both from command line and from their Python code?
Thanks!
How about keeping spellnum.py?
spellnum.py
spelling/
__init__.py
en.py
es.py
Your problem is, that the package is called the same as the python-file you want to execute, thus importing
from spellnum import spellnum_en
will try to import from the file instead of the package. You could fiddle around with relative imports, but I don't know how to make them work with __import__, so I'd suggest the following:
def __init__(self, lang="en"):
mod = "spellnum_" + lang
module = None
if __name__ == '__main__':
module = __import__(mod)
else:
package = getattr(__import__("spellnum", fromlist=[mod]), mod)

Categories