I've got a script that imports modules dynamically based on configuration. I'm trying to implement a daemon context (using the python-daemon module) on the script, and it seems to be interfering with python's ability to find the modules in question.
Inside mymodule/__init__.py, in setup(), I do this:
load_modules(args, config, logger)
try:
    with daemon.DaemonContext(
        files_preserve = getLogfileHandlers(logger)
    ):
        main_loop(config)
I've got a call to setup() inside mymodule/__main__.py and I'm loading the whole thing this way:
PYTHONPATH=. python -m mymodule
This works fine, but a listening port that gets set up inside load_modules() is closed by the newly added daemon context, so I want to move that function call inside the daemon context like so:
try:
    with daemon.DaemonContext(
        files_preserve = getLogfileHandlers(logger)
    ):
        load_modules(args, config, logger)
        main_loop(config)
Modules are loaded inside load_modules() this way:
for mysubmodule in modules:
    try:
        i = importlib.import_module("mymodule.{}".format(mysubmodule))
    except ImportError as err:
        logger.error("import of mymodule.{} failed: {}".format(
            mysubmodule, err))
With load_modules() outside the daemon context this works fine. When I move it inside the daemon context it seems to be unable to find the modules it's looking for. I get this:
import of mymodule.submodule failed: No module named submodule
It looks like some sort of namespace problem -- I note that the exception only refers to the submodule portion of the module name I try to import -- but I've compared everything I can think of inside and outside the daemon context, and I can't find the important difference. sys.path is unchanged, and the daemon context isn't clearing the environment or chrooting. The cwd changes to / of course, but that shouldn't affect python's ability to find modules, since the absolute path to . appears in sys.path.
What am I missing here?
EDIT: I'm adding an SSCCE to make the situation more clear. The following three files create a module called "mymodule" that can be run from the command line as PYTHONPATH=. python -m mymodule. There are two calls to load_module() in __init__.py, one commented out. You can demonstrate the problem by swapping which one is commented.
mymodule/__main__.py
from mymodule import setup
import sys

if __name__ == "__main__":
    sys.exit(setup())
mymodule/__init__.py
import daemon
import importlib
import logging

def main_loop():
    logger = logging.getLogger('loop')
    logger.debug("Code runs here.")

def load_module():
    logger = logging.getLogger('load_module')
    submodule = 'foo'
    try:
        i = importlib.import_module("mymodule.{}".format(submodule))
    except ImportError as e:
        logger.error("import of mymodule.{} failed: {}".format(
            submodule, e))

def setup_logging():
    logfile = 'mymodule.log'
    fh = logging.FileHandler(logfile)
    root_logger = logging.getLogger()
    root_logger.addHandler(fh)
    root_logger.setLevel(logging.DEBUG)

def get_logfile_handlers(logger):
    handlers = []
    for handler in logger.handlers:
        handlers.append(handler.stream.fileno())
    return handlers

def setup():
    setup_logging()
    logger = logging.getLogger()
    # load_module()
    with daemon.DaemonContext(
        files_preserve = get_logfile_handlers(logger)
    ):
        load_module()
        main_loop()
mymodule/foo.py
import logging

logger = logging.getLogger('foo')
logger.debug("Inside foo.py")
I spent a good 4 hours trying to work this one out when I hit it in my own project. The clue is here:
If the module being imported is supposed to be contained within a package then the second argument passed to find_module(), __path__ on the parent package, is used as the source of paths.
(From https://docs.python.org/2/reference/simple_stmts.html#import)
Once you have successfully imported mymodule, Python 2 no longer uses sys.path to search for its submodules; it uses sys.modules["mymodule"].__path__. When you import mymodule, Python 2 unhelpfully sets its __path__ to the relative directory it was stored in:
mymodule.__path__ = ['mymodule']
After daemonizing, python's CWD is set to / and the only place the import internals search for mysubmodule is in /mymodule.
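You can watch this happen in an interactive session (a hypothetical transcript, run from the project's parent directory):

>>> import mymodule
>>> mymodule.__path__
['mymodule']

That entry is relative, so it silently stops resolving once the daemon context changes the CWD to /.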
I worked around this by using os.chdir() to change CWD back to the old dir after daemonizing:
oldcwd = os.getcwd()
with DaemonizeContext():
    os.chdir(oldcwd)
    # ... daemon things
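Alternatively, python-daemon's DaemonContext accepts a working_directory argument (it defaults to '/'), so you should be able to avoid the manual chdir() altogether. A sketch, reusing the names from the SSCCE above:

import os
import daemon

with daemon.DaemonContext(
    working_directory=os.getcwd(),  # keep the CWD so the relative __path__ entry still resolves
    files_preserve=get_logfile_handlers(logger)
):
    load_module()
    main_loop()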
This works fine, but a listening port that gets set up inside load_modules() is closed by the newly added daemon context, so
No. load_modules() should load modules. It should not open ports.
If you need to preserve a file or socket opened outside the context, pass it to files_preserve. If possible, it is preferred to simply open files and such inside the context instead, as I suggest above.
Related
I've been trying to figure out how best to set this up, cutting it down as much as I can. I have 4 Python files: core.py (main), logger_controller.py, config_controller.py, and a 4th as a module or singleton; we'll just call it tool.py.
The way I have it set up, logging has an init function that sets up Python's built-in logging with the necessary levels, formatter, directory location, etc. I call this init function in main.
import logging
import logger_controller

def main():
    logger_controller.init_log()
    logger = logging.getLogger(__name__)

if __name__ == "__main__":
    main()
config_controller uses configparser and is mainly a singleton acting as a controller for my config.
import configparser
import logging

logger = logging.getLogger(__name__)

class ConfigController(object):
    def __init__(self, *file_names):
        self.config_parser = configparser.ConfigParser()
        found_files = self.config_parser.read(file_names)
        if not found_files:
            raise ValueError("No config file found.")
        self._validate()

    def _validate(self):
        ...

    def read_config(self, section, field):
        try:
            data = self.config_parser.get(section, field)
        except (configparser.NoSectionError, configparser.NoOptionError) as e:
            logger.error(e)
            data = None
        return data

config = ConfigController("config.ini")
And then my problem is trying to create the 4th file and making sure both my logger and config parser are running before it. I also want this 4th one to be a singleton, so it follows a similar format as the config_controller.
So tool.py uses config_controller to pull anything it needs from the config file. It also has some error checking for when config_controller's read_config returns None, as that isn't validated in _validate. I did this because I wanted my logging to have a general layer of error checking and a more specific one: _validate just checks that the required fields and sections are in the config file, and then wherever a field is read handles the extra error checking.
So my main problem is this:
How do I have it where my logger and configparser are both running and available before anything else? I'm very much willing to rework all of this, but I'd like to keep the functionality of it all.
One attempt I tried that works, but seems very messy, is making my logger_controller a singleton that just returns Python's logging module.
import logging
import os

class MyLogger(object):
    def __new__(cls, *args, **kwargs):
        init_log()
        return logging

def init_log():
    ...

mylogger = MyLogger()
Then in core.py
from logger_controller import mylogger
logger = mylogger.getLogger(__name__)
I feel like there should be a better way to do the above, but I'm honestly not sure how.
A few ideas:
Would I be able to extend the logging class instead of just using that init_log function?
Maybe there's a way I can make all 3 individual modules such that they each initialize in the correct order? My attempts here didn't quite work, as I also have some internal data that I wouldn't want exposed to classes using the module, just the functionality.
I'd like to have all 3, logging, configparsing, and the tool, available anywhere I import them.
How I have it set up now "works", but if I were to import tool.py anywhere in core.py and an error occurs that I need to catch, my logger won't be able to log it, as the tool loads before the init of my logger.
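For illustration, one pattern that sidesteps the import-order problem is to make the setup lazy and idempotent, so whichever module logs first triggers initialization exactly once (a minimal sketch; the _initialized flag and get_logger() wrapper are illustrative names, and init_log() stands in for the existing setup function):

import logging

_initialized = False

def init_log():
    ...  # existing handler/formatter/level setup

def get_logger(name):
    """Return a named logger, configuring logging on first use."""
    global _initialized
    if not _initialized:
        init_log()
        _initialized = True
    return logging.getLogger(name)

Any module, including tool.py, can then do logger = get_logger(__name__) at import time without caring what loaded first.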
I have a directory tree
working_dir\
    main.py
    my_agent\
        my_worker.py
    my_utility\
        my_utils.py
Code in each file is as follows:
""" main.py """
import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from my_agent.my_worker import MyWorker
import ray
ray.init()
workers = [MyWorker.remote(i) for i in range(10)]
ids = [worker.get_id.remote() for worker in workers]
# print(*ids, sep='\n')
print(*ray.get(ids), sep='\n')
""" worker.py """
from my_utility import my_utils
import ray
#ray.remote
class MyWorker():
def __init__(self, id):
self.id = id
def get_id(self):
return my_utils.f(self.id)
""" my_utils.py """
def f(id):
return '{}: Everything is fine...'.format(id)
Here's part of the error message I received:
Traceback (most recent call last):
  File "/Users/aptx4869/anaconda3/envs/p35/lib/python3.5/site-packages/ray/function_manager.py", line 616, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
  File "/Users/aptx4869/anaconda3/envs/p35/lib/python3.5/site-packages/ray/cloudpickle/cloudpickle.py", line 894, in subimport
    __import__(name)
ImportError: No module named 'my_utility'

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    print(*ray.get(ids), sep='\n')
  File "/Users/aptx4869/anaconda3/envs/p35/lib/python3.5/site-packages/ray/worker.py", line 2377, in get
    raise value
ray.worker.RayTaskError: ray_worker (pid=30025, host=AiMacbook)
Exception: The actor with name MyWorker failed to be imported, and so cannot execute this method
If I remove all statements related to ray, the above code works fine. Therefore, I boldly guess that the reason is that ray runs each actor in a new process, and sys.path.append only works in the main process. So I add the following code to my_worker.py:
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
But it still does not work: the same error message shows up. Now I've run out of ideas; what should I do?
You are correct about what the issue is.
In your example, you modify sys.path in main.py in order to be able to import my_agent.my_worker and my_utility.my_utils.
However, this path change is not propagated to the worker processes, so if you were to run a remote function like
@ray.remote
def f():
    # Print sys.path on the worker process.
    import sys
    print(sys.path)

f.remote()
You would see that sys.path on the worker does not include the parent directory that you added.
The reason that modifying sys.path on the worker (e.g., in the MyWorker constructor) doesn't work is that the MyWorker class definition is pickled and shipped to the workers. Then the worker unpickles it, and the process of unpickling the class definition requires my_utils to be imported, and this fails because the actor constructor hasn't had a chance to run yet.
There are a couple of possible solutions here.
Run the script with something like
PYTHONPATH=$(dirname $(pwd)):$PYTHONPATH python main.py
(from within working_dir/). That should solve the issue because in this case the worker processes are forked from the scheduler process (which is forked from the main Python interpreter when you call ray.init()), and so the environment variable will be inherited by the workers (this doesn't happen for sys.path, presumably because it is not an environment variable).
It looks like adding the lines
parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
os.environ["PYTHONPATH"] = parent_dir + ":" + os.environ.get("PYTHONPATH", "")
in main.py (before the ray.init() call) also works for the same reason as above.
Consider adding a setup.py and installing your project as a Python package so that it's automatically on the relevant path.
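A minimal setup.py for the tree above might look like this (a sketch; the project name is made up, and it assumes my_agent and my_utility each contain an __init__.py):

# setup.py, placed in working_dir/
from setuptools import setup, find_packages

setup(
    name="my_project",         # hypothetical name
    version="0.1",
    packages=find_packages(),  # discovers my_agent and my_utility
)

Installed with pip install -e . from working_dir/, the packages become importable in every process, workers included.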
The new "Runtime Environments" feature, which didn't exist at the time of this post, should help with this issue: https://docs.ray.io/en/latest/handling-dependencies.html#runtime-environments. (See the working_dir and py_modules entries.)
I was wondering what would be the best way for me to structure my logs in a special situation.
I have a series of Python services that use the same Python files for communicating (e.g. com.py) with the HW. I have logging implemented in these modules, and I would like it to be associated with the main service that calls them.
How should I structure the logger logic so that if I have:
main_service_1->module_for_comunication
The logging goes to file main_serv_1.log
main_service_2->module_for_comunication
The logging goes to file main_serv_2.log
What would be the best practice in this case, without hardcoding anything?
Is there a way to know which file is importing com.py, so that inside com.py I can use that information to adapt the logging to the caller?
In my experience, for a situation like this, the cleanest and easiest to implement strategy is to pass the logger to the code that does the logging.
So, create a logger for each service you want to have log to a different file, and pass that logger in to the code from your communications module. You can use __name__ to get the name of the current module (the actual module name, without the .py extension).
In the example below I also implemented a fallback for the case where no logger is passed in.
com.py
from log import setup_logger

class Communicator(object):
    def __init__(self, logger=None):
        if logger is None:
            logger = setup_logger(__name__)
        self.log = logger

    def send(self, data):
        self.log.info('Sending %s bytes of data' % len(data))
svc_foo.py
from com import Communicator
from log import setup_logger

logger = setup_logger(__name__)

def foo():
    c = Communicator(logger)
    c.send('foo')
svc_bar.py
from com import Communicator
from log import setup_logger

logger = setup_logger(__name__)

def bar():
    c = Communicator(logger)
    c.send('bar')
log.py
from logging import FileHandler
import logging

def setup_logger(name):
    logger = logging.getLogger(name)
    handler = FileHandler('%s.log' % name)
    logger.addHandler(handler)
    return logger
main.py
from svc_bar import bar
from svc_foo import foo
import logging

# Add a StreamHandler for the root logger, so we get some console output in
# addition to file logging (for ease of testing). Also set the level of the
# root logger to INFO so our messages don't get filtered.
logging.basicConfig(level=logging.INFO)

foo()
bar()
So, when you execute python main.py, this is what you'll get:
On the console:
INFO:svc_foo:Sending 3 bytes of data
INFO:svc_bar:Sending 3 bytes of data
And svc_foo.log and svc_bar.log will each contain one line:
Sending 3 bytes of data
If a client of the Communicator class uses it without passing in a logger, the log output will end up in com.log (fallback).
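For example (following the files above):

from com import Communicator

c = Communicator()  # no logger passed in
c.send('xyz')       # handled by the fallback logger, ends up in com.log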
I see several options:
Option 1
Use __file__. __file__ is the pathname of the file from which the module was loaded. Depending on your structure, you should be able to identify the module by splitting that path, like so:
If the folder structure is
+- module1
|  +- __init__.py
|  +- main.py
+- module2
   +- __init__.py
   +- main.py
you should be able to obtain the module name with code placed in main.py:
import os

def get_name():
    # take the name of the directory that contains this file
    module_name = os.path.basename(os.path.dirname(os.path.abspath(__file__)))
    return module_name
This is not exactly DRY, because you need the same code in both main.py files.
Option 2
A bit cleaner is to open 2 terminal windows and use an environment variable. E.g. you can define MOD_LOG_NAME="main_service_1" in one terminal and MOD_LOG_NAME="main_service_2" in the other. Then, in your Python code, you can use something like:
import os

LOG_PATH_NAME = os.environ['MOD_LOG_NAME']
This follows separation of concerns.
Update (since the question evolved a bit)
Once you've established the distinct name, all you have to do is to configure the logger:
import logging
logging.basicConfig(filename=LOG_PATH_NAME, level=logging.DEBUG)
(or with the result of get_name()) and run the program.
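Putting Option 2 together, the whole thing could look like this (a sketch; MOD_LOG_NAME is the variable name assumed above, and the fallback default is made up):

import logging
import os

# Pick the log file from the environment, with a fallback for safety.
LOG_PATH_NAME = os.environ.get('MOD_LOG_NAME', 'default_service.log')
logging.basicConfig(filename=LOG_PATH_NAME, level=logging.DEBUG)

logging.getLogger(__name__).debug("service started")

Run as MOD_LOG_NAME=main_serv_1.log python main_service_1.py, and the module's output lands in main_serv_1.log.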
I want to create a global logging object in a Python module that is imported by many small scripts. The Python module is designed to provide a consistent setup of things like logging, logos, timing etc. for all scripts. I'm trying to set up the logging in this module in order to make it easy to change the logging characteristics of all of these scripts at once.
The relevant part of the Python module is as follows:
if engageLog:
    global log
    log = logging.getLogger(__name__)
    logging.root.addHandler(technicolor.ColorisingStreamHandler())
How could I write this such that the logging object is available in the scripts without requiring any setup beyond importing the module?
In the following example script, the module is called propyte:
#!/usr/bin/env python

"""
################################################################################
#                                                                              #
# script-1                                                                     #
#                                                                              #
################################################################################

Usage:
    script-1 [options]

Options:
    -h, --help               display help message
    --version                display version and exit
    -v, --verbose            verbose logging
    -u, --username=USERNAME  username
    --data=FILENAME          input data file [default: data.txt]
"""

name = "script-1"
version = "2015-10-21T1331Z"

import os
import sys
import time

import docopt
import propyte

def main(options):
    global program
    program = propyte.Program(options = options)

    # access options and arguments
    input_data_filename = options["--data"]

    log.info("")
    log.info("input data file: {filename}".format(
        filename = input_data_filename
    ))

    print("")
    program.terminate()

if __name__ == "__main__":
    options = docopt.docopt(__doc__)
    if options["--version"]:
        print(version)
        exit()
    main(options)
Just use the log attribute on the module:
import propyte
propyte.log # the global log object in the propyte module
Note however that the logging module configuration is already global. You can use logging.getLogger('somename') and it'll return a singleton logger object associated with that name. Configuration applied to that object will still be present when you retrieve the same logger in another module.
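For example, because getLogger() hands back the same object for the same name, configuration done in one module is visible in every other (a minimal sketch; the name 'propyte' mirrors the module above):

# in propyte (configures once)
import logging
log = logging.getLogger('propyte')
log.addHandler(logging.StreamHandler())
log.setLevel(logging.INFO)

# in any script (no extra setup needed)
import logging
log = logging.getLogger('propyte')  # the very same logger object
log.info("configured in propyte, logged here")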
In one of my testing scripts in Python I use this pattern several times:
sys.path.insert(0, "somedir")
mod = __import__(mymod)
sys.path.pop(0)
Is there a more concise way to temporarily modify the search path?
You could use a simple context manager:
import sys

class add_path():
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        sys.path.insert(0, self.path)

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            sys.path.remove(self.path)
        except ValueError:
            pass
Then to import a module you can do:
with add_path('/path/to/dir'):
    mod = __import__('mymodule')
On exit from the body of the with statement sys.path will be restored to the original state. If you only use the module within that block you might also want to delete its reference from sys.modules:
del sys.modules['mymodule']
Appending a value to sys.path only modifies it temporarily, i.e. for the current session only.
Permanent modifications are done by changing PYTHONPATH and the default installation directory.
So, if by temporary you meant for the current session only, then your approach is okay, but you can remove the pop part if somedir is not hiding any important modules that are expected to be found in PYTHONPATH, the current directory, or the default installation directory.
http://docs.python.org/2/tutorial/modules.html#the-module-search-path
Here is an alternative to the context-manager implementation from Eugene Yarmash (it uses contextlib and accepts pathlib.Path-compatible paths):
import contextlib
import importlib
import os
import sys
from typing import Iterator, Union

@contextlib.contextmanager
def add_sys_path(path: Union[str, os.PathLike]) -> Iterator[None]:
    """Temporarily add the given path to `sys.path`."""
    path = os.fspath(path)
    try:
        sys.path.insert(0, path)
        yield
    finally:
        sys.path.remove(path)

with add_sys_path('/path/to/dir'):
    mymodule = importlib.import_module('mymodule')
If your testing uses pytest, it has a great fixture that, among other traits, handles this exact case:
The monkeypatch fixture helps you to safely set/delete an attribute,
dictionary item or environment variable, or to modify sys.path for
importing...
All modifications will be undone after the requesting test function or
fixture has finished. The raising parameter determines if a KeyError
or AttributeError will be raised if the target of the set/deletion
operation does not exist.
In describing syspath_prepend:
Use monkeypatch.syspath_prepend to modify sys.path which will also
call pkg_resources.fixup_namespace_packages and
importlib.invalidate_caches().
sample use:
def test_myfunc(monkeypatch):
    with monkeypatch.context() as m:
        m.syspath_prepend('my/module/path')
        mod = __import__(mymod)
    # Out here, the context manager expires and syspath is reset