I have several 'app'-modules (which are being started by a main-application)
and a utility module with some functionality:
my_utility/
├── __init__.py
└── __main__.py
apps/
├── app1/
│ ├── __init__.py
│ └── __main__.py
├── app2/
│ ├── __init__.py
│ └── __main__.py
...
main_app.py
The apps are being started like this (by the main application):
python3 -m <app-name>
I need to provide some meta information (tied to the module) about each app which is readable by the main_app and the apps themselves:
apps/app1/__init__.py:
meta_info = {'min_platform_version': '1.0',
             'logger_name': 'mm1'}
... and use it like this:
apps/app1/__main__.py:
from my_utility import handle_meta_info
# does something with meta_info (checking, etc.)
handle_meta_info()
main_app.py:
import importlib

mod = importlib.import_module('app1')
meta_inf = getattr(mod, 'meta_info')
do_something(meta_inf)
The Problem
I don't know how to access meta_info from within the apps. I know I can
import the module itself and access meta_info:
apps/app1/__main__.py:
import app1
do_something(app1.meta_info)
But this is only possible if I know the name of the module. From inside another module - e.g. my_utility - I don't know how to access the module which has been started in the first place (or its name).
my_utility/__main__.py:
def handle_meta_info():
    import MAIN_MODULE  # <-- don't know what to import here
    do_something(MAIN_MODULE.meta_info)
In other words
I don't know how to access meta_info from within an app's process (started via python3 -m <name>) from another module which does not know the name of the 'root' module that has been started.
Approaches
Always provide the module name when calling meta-info-functions (bad, because it's verbose and redundant)
from my_utility import handle_meta_info
handle_meta_info('app1')
Add meta_info to __builtins__ (generally bad to pollute the global namespace)
Parse the command line (ugly)
Analyze the call stack on import my_utility (dangerous, ugly)
The solution I'd like to see
It would be nice to be able to either access the "main" module's global space OR know its name (to import):
my_utility/__main__.py:
def handle_meta_info():
    do_something(__main_module__.meta_info)

OR

def handle_meta_info():
    if process_has_been_started_as_module():
        mod = importlib.import_module(name_of_main_module())
        meta_inf = getattr(mod, 'meta_info')
        do_something(meta_inf)
Any ideas?
My current (bloody) solution:
Inside my_utility I use psutil to get the command line the module has been started with (why not sys.argv? Because), and from it I extract the module name. This way I attach the desired meta information to my_utility (so I have to load it only once).
my_utility/__init__.py:
def __get_executed_modules_meta_info__() -> dict:
    def get_executed_module_name():
        from os import getpid
        from psutil import Process
        _cmdline = Process(getpid()).cmdline()
        try:
            # normal case: app has been started via 'python3 -m <app>'
            return _cmdline[_cmdline.index('-m') + 1]
        except ValueError:
            return None

    from importlib import import_module
    try:
        return import_module(get_executed_module_name()).meta_info
    except AttributeError:
        # also hit when get_executed_module_name() returned None
        return {}

__executed_modules_meta_info__ = __get_executed_modules_meta_info__()
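An alternative worth noting (a sketch, not the psutil approach above): Python exposes the running main module as sys.modules['__main__'], and when a package is started via python3 -m <app>, that module's __package__ holds the package name. handle_meta_info could then be written as:

import sys
from importlib import import_module

def handle_meta_info():
    # sys.modules['__main__'] is whatever module the interpreter started;
    # for 'python3 -m app1' its __package__ is 'app1'
    pkg_name = getattr(sys.modules['__main__'], '__package__', None)
    if pkg_name:
        # do_something is the hypothetical consumer from the question
        do_something(getattr(import_module(pkg_name), 'meta_info', {}))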
Related
I have the following directory structure for a number of unit tests (directories advanced and basic contain multiple files with test cases implemented in them):
Tests
├── advanced
│ ├── __init__.py
│ └── advanced_test.py
├── basic
│ ├── __init__.py
│ └── basic_test.py
└── helpers.py
with the following file contents:
# helpers.py
import unittest

def determine_option():
    # Some logic that returns the option
    return 1

class CustomTestCase(unittest.TestCase):
    def __init__(self, methodName: str = "runTest") -> None:
        super().__init__(methodName)
        self._option = determine_option()

    def get_option(self):
        # Some custom method used in advanced test cases
        return self._option

    def customAssert(self, first, second):
        # Some custom assertion code used in advanced and basic test cases
        self.assertEqual(first, second)
# basic_test.py
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from helpers import CustomTestCase

class BasicTest(CustomTestCase):
    # Includes test cases that use custom assertion method
    def test_pass(self) -> None:
        self.customAssert(1, 1)
# advanced_test.py
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from helpers import CustomTestCase

class AdvancedTest(CustomTestCase):
    # Includes test cases that use custom assertion method and some further functionality (e.g. get_option())
    def test_pass(self) -> None:
        self.customAssert(self.get_option(), 1)
The outlined structure above allows me to use the test discovery functionality of python unittest.
> python -m unittest discover -p *_test.py -v
test_pass (advanced.advanced_test.AdvancedTest) ... ok
test_pass (basic.basic_test.BasicTest) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK
In order to be able to import CustomTestCase from helpers.py I had to resort to the ugly and probably bad idea of adding the parent directory to sys.path. Attempting to import via from ..helpers import CustomTestCase does not play nicely with the test discovery of python unittest (ImportError: attempted relative import beyond top-level package).
How could the Tests directory be structured to allow defining such a CustomTestClass that can be used to implement the test cases in the subdirectory without resorting to the sys.path.insert hack used?
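One layout that should avoid the hack (a sketch, assuming you can run discovery from the directory containing Tests): make Tests itself a package by adding a top-level __init__.py and point discovery's top-level directory at its parent. The test modules are then imported as Tests.basic.basic_test, so helpers resolves as Tests.helpers:

Tests
├── __init__.py   (new: makes Tests a package)
├── advanced
│   ├── __init__.py
│   └── advanced_test.py
├── basic
│   ├── __init__.py
│   └── basic_test.py
└── helpers.py

# basic_test.py - no sys.path manipulation needed
from Tests.helpers import CustomTestCase

> python -m unittest discover -s Tests -t . -p "*_test.py" -v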
Suppose I have a project structure like this:
src
└── app
├── main.py
├── db
│ └── database.py
├── models
│ ├── model_a.py
│ └── model_b.py
└── tests
├── test_x.py
└── test_y.py
I want to check which file uses a class or a function from another file. I have a class called Test in main.py
class Test:
    pass
I used that class in model_a:
from ..main import Test
But in model_b I used:
from ..main import Test
from ..db.database import Data
I want to check which file uses another file, just like the tree command; a folder name is enough. I tried an old method, but it was inefficient, dirty, and not what I expected. The method was: I created a file in src named check.py and imported all the packages:
from app.db import database
from app.models import model_a, model_b
from app.tests import test_x, test_y
from app import main
print('__file__={0:<35} | __name__={1:<20} | __package__={2:<20}'.format(__file__,__name__,str(__package__)))
And I added this line at the bottom of all files:
print('__file__={0:<35} | __name__={1:<20} | __package__={2:<20}'.format(__file__,__name__,str(__package__)))
So when I run check.py I get this result:
__file__=/home/yagiz/Desktop/struct/src/app/main.py | __name__=app.main | __package__=app
__file__=/home/yagiz/Desktop/struct/src/app/db/database.py | __name__=app.db.database | __package__=app.db
__file__=/home/yagiz/Desktop/struct/src/app/models/model_a.py | __name__=app.models.model_a | __package__=app.models
__file__=/home/yagiz/Desktop/struct/src/app/models/model_b.py | __name__=app.models.model_b | __package__=app.models
__file__=/home/yagiz/Desktop/struct/src/app/tests/test_x.py | __name__=app.tests.test_x | __package__=app.tests
__file__=/home/yagiz/Desktop/struct/src/app/tests/test_y.py | __name__=app.tests.test_y | __package__=app.tests
__file__=/home/yagiz/Desktop/struct/src/check.py | __name__=__main__ | __package__=None
The result is dirty and doesn't meet my expectations. Is there a way to get an output like this?
main.py = app/models/model_a, app/models/model_b  # These files import something from main.py
model_b = None  # No file imports from model_b
Update: I tried @Hessam Korki's suggestion, but it doesn't work.
I looked up the source code of modulefinder and found that it records a bad module for every import statement, which is not useful for me.
Here is how it went. First I created a function, and I also created another project structure:
src
├── empty.py
├── __init__.py
├── main.py
├── module_finder.py
├── other
│ └── other.py
├── test
│ └── some_app.py
└── this_imports.py
Here is the module_finder.py that contains my function
from modulefinder import ModuleFinder

file_names = ["this_imports.py", "main.py", "test/some_app.py", "other/other.py", "empty.py"]

def check_imports(file_names):
    finder = ModuleFinder()
    for file in file_names:
        finder.run_script(file)
        print("\n", file)
        for name, mod in finder.modules.items():
            print('%s: ' % name, end='')
            print(','.join(list(mod.globalnames.keys())[:3]))
        print('\n'.join(finder.badmodules.keys()))
The empty file is empty (as expected); in main.py I have:
class Test:
    pass
In this_imports.py I only have:
from src.main import Test
In other/other.py I have:
from src.main import Test
from src.test import DifferentTest
And for the last one, in test/some_app.py I have:
from src.main import Test

class DifferentTest:
    pass
So the result should be:
empty.py = None
main.py = None
other/other.py = src.main , src.test
test/some_app.py = src.main
this_imports.py = src.main
But the function gives a wrong result; here is the output:
Filename: this_imports.py
__main__: Test
src.main
Filename: main.py
__main__: Test,__module__,__qualname__
src.main
Filename: test/some_app.py
__main__: Test,__module__,__qualname__
src.main
Filename: other/other.py
__main__: Test,__module__,__qualname__
src.main
src.test
Filename: empty.py
__main__: Test,__module__,__qualname__
src.main
src.test
I believe Python's ModuleFinder will effectively solve your problem. There is a key named '__main__' in ModuleFinder's modules dict which holds the modules that were imported in a Python file. After running the script through your project and storing the data in a way that suits your purpose, you should be good to go.
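A minimal sketch of that idea (note the fresh ModuleFinder per file; reusing one instance across several run_script calls is what made the output above accumulate earlier results):

from modulefinder import ModuleFinder

def imports_of(file_name):  # hypothetical helper, not part of modulefinder
    finder = ModuleFinder()
    finder.run_script(file_name)
    # resolvable imports land in finder.modules, unresolvable ones in finder.badmodules
    return sorted(finder.modules.keys()), sorted(finder.badmodules.keys())

print(imports_of('other/other.py'))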
What you are looking for is to find import dependencies in your package modules. You can run a static analysis on your package directory and parse the import nodes in the syntax trees (ast), and build a dependency graph. Something like below:
import os
from ast import NodeVisitor, parse

import networkx as nx

class Dependency():
    def __init__(self, root):
        self.root = root
        self.base = os.path.basename(root)
        self.dependency = nx.DiGraph()
        self.visitor = NodeVisitor()
        self.visitor.visit_ImportFrom = self.visit_ImportFrom
        self.current_node = None
        self.dependency.add_node(self.base)

    def visit_ImportFrom(self, node):
        # record an edge from the imported module to the importing file
        self.dependency.add_edge(node.module, self.current_node)
        self.visitor.generic_visit(node)

    def run(self):
        for root, dirs, files in os.walk(self.root):
            for file in files:
                if not file.endswith('.py'):
                    continue
                full_path = os.path.join(root, file)
                rel_path = full_path.split(self.root + os.sep)[1]
                # module-style name, e.g. 'src.models.model_a' (extension stripped)
                loc = os.path.splitext(rel_path)[0].replace(os.sep, '.')
                self.current_node = self.base + '.' + loc
                with open(full_path) as fp:
                    src = fp.read()
                tree = parse(src)
                self.visitor.generic_visit(tree)
        dependency = {}
        for src, target in nx.dfs_edges(self.dependency):
            if src in dependency:
                dependency[src].add(target)
            else:
                dependency[src] = set([target])
        return dependency
For the root location of any package whose import dependencies you want to map, you then do the following:
root = "path/to/your/src"
d = Dependency(root)
d.run()
This will return the dependency tree (as a dict). Note that we parsed only ImportFrom; you need to add Import to make it complete (as sketched below). Also, all imports are assumed absolute here (i.e. no .. etc.). If required, you can add that too (check the level field of the ImportFrom node to do so).
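For completeness, a sketch of the missing Import handler (same pattern as visit_ImportFrom above; it also needs self.visitor.visit_Import = self.visit_Import in __init__):

def visit_Import(self, node):
    # plain 'import x, y' statements: one edge per imported name
    for alias in node.names:
        self.dependency.add_edge(alias.name, self.current_node)
    self.visitor.generic_visit(node)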
Consider the following package structure:
.
├── module
│ ├── __init__.py
│ └── submodule
│ ├── attribute.py
│ ├── data.txt
│ └── __init__.py
└── test.py
and the following piece of code:
import pkgutil
data = pkgutil.get_data('module.submodule', 'data.txt')
import module.submodule.attribute
retval = module.submodule.attribute.hello()
Running this will raise the error:
Traceback (most recent call last):
File "test.py", line 7, in <module>
retval = module.submodule.attribute.hello()
AttributeError: module 'module' has no attribute 'submodule'
However, if you run the following:
import pkgutil
import module.submodule.attribute
data = pkgutil.get_data('module.submodule', 'data.txt')
retval = module.submodule.attribute.hello()
or
import pkgutil
import module.submodule.attribute
retval = module.submodule.attribute.hello()
it works fine.
Why does running pkgutil.get_data disrupt the future import?
First of all, this was a great question and a great opportunity to learn something new about Python's import system. So let's dig in!
If we look at the implementation of pkgutil.get_data we see something like this:
def get_data(package, resource):
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    loader = spec.loader
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    # XXX needs test
    mod = (sys.modules.get(package) or
           importlib._bootstrap._load(spec))
    if mod is None or not hasattr(mod, '__file__'):
        return None

    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    return loader.get_data(resource_name)
And the answer to your question is in this part of the code:
mod = (sys.modules.get(package) or
       importlib._bootstrap._load(spec))
It looks at the already loaded packages: if the package we're looking for (module.submodule in this example) exists, it uses it; if not, it tries to load the package using importlib._bootstrap._load.
So let's look at the implementation of importlib._bootstrap._load to see what's going on.
def _load(spec):
    """Return a new module object, loaded by the spec's loader.

    The module is not added to its parent.

    If a module is already in sys.modules, that existing module gets
    clobbered.

    """
    with _ModuleLockManager(spec.name):
        return _load_unlocked(spec)
Well, there it is! The docstring says "The module is not added to its parent."
It means the submodule module is loaded, but it's not added as an attribute of the module package. So when we try to access submodule via module there's no connection, hence the AttributeError.
It makes sense for the get_data method to use this function, as it just wants some other file in the package; there is no need to import the whole package and add it to its parent, and its parent's parent, and so on.
To see it for yourself, I suggest using a debugger and setting some breakpoints; then you can see what happens step by step along the way.
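To make the broken state visible without a debugger, here is a minimal check (a sketch assuming the package layout above, on an interpreter version where the behaviour reproduces):

import sys
import pkgutil

data = pkgutil.get_data('module.submodule', 'data.txt')
print('module' in sys.modules)                      # True: find_spec() imports the parent package
print('module.submodule' in sys.modules)            # True: loaded via _load()
print(hasattr(sys.modules['module'], 'submodule'))  # False: never attached to its parent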
I am running multiple tests in a tests package, and I want to print each module name in the package, without duplicating code.
So, I wanted to insert some code into __init__.py or conftest.py that will give me the executing module name.
Let's say my test modules are called: checker1, checker2, etc...
My directory structure is like this:
tests_dir/
├── __init__.py
├── conftest.py
├── checker1
├── checker2
└── checker3
So, inside __init__.py I tried inserting:
import os

def module_name():
    return os.path.splitext(__file__)[0]
But it still gives me __init__.py from each file when I call it.
I also tried using a fixture inside conftest.py, like:
@pytest.fixture(scope='module')
def module_name(request):
    return request.node.name
But it seems as if I still need to define a function inside each module to get module_name as a parameter.
What is the best method of getting this to work?
Edit:
In the end, what I did is explained here:
conftest.py
import pytest

@pytest.fixture(scope='module', autouse=True)
def module_name(request):
    return request.node.name
An example for a test file with a test function; the same needs to be added to each file and every function:
checker1.py
from conftest import *

def test_columns(expected_res, actual_res, module_name):
    expected_cols = expected_res.columns
    actual_cols = actual_res.columns
    val = expected_cols.difference(actual_cols)  # verify all expected cols are in actual_cols
    if not val.empty:
        log.error('[{}]: Expected columns are missing: {}'.format(module_name, val.values))
    assert val.empty
Notice the module_name fixture I added to the function's parameters. expected_res and actual_res are pandas DataFrames read from an Excel file, and log is a Logger object from the logging package.
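As a side note, a variant of the fixture (a sketch; request.module is pytest's handle on the collected module object) avoids the '.py' suffix that request.node.name carries at module scope:

# conftest.py
import pytest

@pytest.fixture(scope='module')
def module_name(request):
    return request.module.__name__  # e.g. 'checker1' rather than 'checker1.py'

Also, fixtures defined in conftest.py are discovered automatically, so the from conftest import * line should not be necessary.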
In each module (checker1, checker2, checker3, conftest.py), in the main function, execute
print(__name__)
When the __init__.py file imports those packages, it should print the module name along with it.
Based on your comment, you can perhaps modify the behaviour in the __init__.py file for local imports.
__init__.py
import sys, os
sys.path.append(os.path.split(__file__)[0])

def my_import(module):
    print("Module name is {}".format(module))
    exec("import {}".format(module))
testerfn.py
print(__name__)
print("Test")
Directory structure
tests_dir/
├── __init__.py
└── testerfn.py
Command to test
import tests_dir
tests_dir.my_import("testerfn")
I have a module - let's call it foo - and I want to make it usable via a python -m foo call. My program looks like this:
my_project
├── foo
│ └── __init__.py
└── my_program.py
In __init__.py I have some code which I run when calling python -m foo:
def bar(name):
    print(name)

# -- code used to 'run' the module
def main():
    bar("fritz")

if __name__ == "__main__":
    main()
Since I have a fair amount of execution code in __init__.py now (argparse stuff and some logic) I want to separate it into a __main__.py:
my_project
├── foo
│ ├── __init__.py
│ └── __main__.py
└── my_program.py
Although that looks very simple to me, I haven't managed to import stuff located in __init__.py from __main__.py yet.
I know - if foo is located in site-packages or accessible via PYTHONPATH I can just import foo.
But in case I want to execute __main__.py directly (e.g. from some IDE) with foo located anywhere (i.e. not a folder where Python looks for packages) - is there a way to import foo (i.e. __init__.py from the same directory)?
I tried import . and import foo - but both approaches fail (because they just mean something else, of course).
What I can do - at least to explain my goal - is something like this:
import os, sys
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
import foo
Works, but is ugly and a bit dangerous, since I don't even know if I really import foo from the same directory.
You can manually set the module import state as if __main__.py were executed with -m:
# foo/__main__.py
import os
import sys

if __package__ is None and __name__ == "__main__":  # executed without -m
    # set special attributes as if part of the package
    __file__ = os.path.abspath(__file__)
    __package__ = os.path.basename(os.path.dirname(__file__))
    # replace the sys.path entry for __main__'s directory with the package's
    # parent directory, so that the package itself becomes importable
    main_path = os.path.dirname(__file__)
    package_path = os.path.dirname(main_path)
    try:
        index = sys.path.index(main_path)
    except ValueError:
        raise RuntimeError('sys.path does not include script directory as expected')
    if index not in (0, 1):
        raise RuntimeError('expected script directory after current directory or matching it')
    sys.path[index] = package_path

# import regularly
from . import bar
This exploits the fact that python3 path/to/foo/__main__.py executes __main__ as a standalone script: __package__ is None and __name__ does not include the package either. The search path in this case is <current directory>, <__main__ directory>, ..., though the two get collapsed if they are the same - hence the index is either 0 or 1.
As with all trickery on internals, there is some transient state where invariants are violated. Do not perform any imports before the module is patched!
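If the patching works as intended, both invocations should now behave the same (a quick check, assuming the my_project layout above):

cd my_project
python3 -m foo                # run as a module: __main__ gets __package__ == 'foo'
python3 foo/__main__.py       # run as a plain script: fixed up by the snippet above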