Import a Python file from a sibling folder when parallelizing with Ray

I have a directory tree:

working_dir\
    main.py
    my_agent\
        my_worker.py
    my_utility\
        my_utils.py
The code in each file is as follows:
""" main.py """
import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from my_agent.my_worker import MyWorker
import ray
ray.init()
workers = [MyWorker.remote(i) for i in range(10)]
ids = [worker.get_id.remote() for worker in workers]
# print(*ids, sep='\n')
print(*ray.get(ids), sep='\n')
""" worker.py """
from my_utility import my_utils
import ray
#ray.remote
class MyWorker():
def __init__(self, id):
self.id = id
def get_id(self):
return my_utils.f(self.id)
""" my_utils.py """
def f(id):
return '{}: Everything is fine...'.format(id)
Here's part of the error message I received:

Traceback (most recent call last):
  File "/Users/aptx4869/anaconda3/envs/p35/lib/python3.5/site-packages/ray/function_manager.py", line 616, in fetch_and_register_actor
    unpickled_class = pickle.loads(pickled_class)
  File "/Users/aptx4869/anaconda3/envs/p35/lib/python3.5/site-packages/ray/cloudpickle/cloudpickle.py", line 894, in subimport
    __import__(name)
ImportError: No module named 'my_utility'

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    print(*ray.get(ids), sep='\n')
  File "/Users/aptx4869/anaconda3/envs/p35/lib/python3.5/site-packages/ray/worker.py", line 2377, in get
    raise value
ray.worker.RayTaskError: ray_worker (pid=30025, host=AiMacbook)
Exception: The actor with name MyWorker failed to be imported, and so cannot execute this method
If I remove all statements related to ray, the above code works fine. Therefore, I boldly guess the reason is that ray runs each actor in a new process, and sys.path.append only takes effect in the main process. So I added the following code to my_worker.py:
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
But it still does not work: the same error message shows up. Now I've run out of ideas. What should I do?

You are correct about what the issue is.
In your example, you modify sys.path in main.py in order to be able to import my_agent.my_worker and my_utility.my_utils.
However, this path change is not propagated to the worker processes, so if you were to run a remote function like

@ray.remote
def f():
    # Print sys.path on the worker process.
    import sys
    print(sys.path)

f.remote()
You would see that sys.path on the worker does not include the parent directory that you added.
The reason that modifying sys.path on the worker (e.g., in the MyWorker constructor) doesn't work is that the MyWorker class definition is pickled and shipped to the workers. The worker then unpickles it, and unpickling the class definition requires importing my_utils, which fails because the sys.path fix in the actor constructor hasn't had a chance to run yet.
There are a few possible solutions here.

1. Run the script with something like

   PYTHONPATH=$(dirname $(pwd)):$PYTHONPATH python main.py

   (from within working_dir/). That should solve the issue because the worker processes are forked from the scheduler process (which is forked from the main Python interpreter when you call ray.init()), and so the environment variable will be inherited by the workers. This doesn't happen for sys.path, presumably because it is not an environment variable.

2. Add the lines

   parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
   os.environ["PYTHONPATH"] = parent_dir + ":" + os.environ.get("PYTHONPATH", "")

   in main.py (before the ray.init() call). This also works, for the same reason as above.

3. Consider adding a setup.py and installing your project as a Python package so that it's automatically on the relevant path; a sketch follows below.
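A minimal sketch of such a setup.py (the project name is hypothetical, and it assumes my_agent/ and my_utility/ each contain an __init__.py so that find_packages() discovers them):

from setuptools import setup, find_packages

setup(
    name="my_project",          # hypothetical project name
    version="0.1.0",
    packages=find_packages(),   # picks up my_agent and my_utility
)

After pip install -e . from working_dir/, both packages resolve in every process without any sys.path manipulation.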

The new "Runtime Environments" feature, which didn't exist at the time of this post, should help with this issue: https://docs.ray.io/en/latest/handling-dependencies.html#runtime-environments. (See the working_dir and py_modules entries.)
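A sketch of that approach (assuming a recent Ray version, with main.py launched from working_dir/):

import ray

# Ship the whole working directory to every worker; Ray adds it to each
# worker's sys.path, so my_agent and my_utility become importable there.
ray.init(runtime_env={"working_dir": "."})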

Related

dask worker ModuleNotFoundError when import is not in current directory

I have set up the file system as such:

\project
    \something
        __init__.py
        some.py   (with a function test() defined)
    run.py
And my run.py looks like this:
import os
import sys
import dask
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import time

def run_task1():
    sys.path.append('/project/something')
    from some import test
    return test()

def run_task2():
    from something.some import test  # works because the something dir is in the current working dir
    return test()

def run_controller():
    cluster = SLURMCluster(...)
    cluster.scale_up(2)
    client = Client(cluster)
    sys.path.append('/project/something')
    os.environ['PATH'] += ':/project/something'
    os.environ['PYTHONPATH'] = os.environ.get('PYTHONPATH', '') + ':/project/something'
    from some import test
    v1 = [
        # dask.delayed(run_task1)() for _ in range(2)  # <--- this works
        # dask.delayed(run_task2)() for _ in range(2)  # <--- this works too
        dask.delayed(test)() for _ in range(2)  # <--- fails, but I need to do this
    ]
    values = dask.compute(*v1)
    return values

values = run_controller()
values = run_controller()
And the error is that the worker fails immediately: it cannot run test() because it cannot import it from some.py. I verified that the dask worker's os.environ['PATH'], os.environ['PYTHONPATH'], and sys.path all contain the added path to some.py, but the dask worker still could not run it. Below is the error logged in the slurm log.
'''
ModuleNotFoundError: No module named 'some'
distributed.worker - ERROR - Could not deserialize task
'''
I need to run the function directly, i.e., I cannot use a wrapper that performs an explicit import on the dask worker, but that is exactly the approach that does not work.
I have a hacky workaround along the lines of run_task2(): creating a symlink to some.py in the current working dir. But I am wondering if there is a proper way to set up the environment of the dask worker so that a direct dask.delayed call on test() works.
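One approach worth trying (my suggestion, not from the thread): distributed's Client.upload_file ships a source file to the workers and makes it importable there, which is what deserializing dask.delayed(test)() requires:

from dask.distributed import Client

client = Client(cluster)  # cluster as set up in run_controller()
# Send some.py to the workers so the pickled reference to some.test
# can be resolved on the worker side.
client.upload_file('/project/something/some.py')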

"Most likely due to circular import" in Python

import threading
import time

start = time.perf_counter()

def do_something():
    print("Sleeping in 1 second")
    time.sleep(1)
    print("Done sleeping")

t1 = threading.Thread(target=do_something)
t2 = threading.Thread(target=do_something)

finish = time.perf_counter()
print(f"Finished in {round(finish-start,1)} seconds(s) ")
Does anyone know why this piece of code returns this error when run and how to fix it?
Traceback (most recent call last):
  File "c:/Users/amanm/Desktop/Python/Python Crash Course/threading.py", line 1, in <module>
    import threading
  File "c:\Users\amanm\Desktop\Python\Python Crash Course\threading.py", line 12, in <module>
    t1 = threading.Thread(target=do_something)
AttributeError: partially initialized module 'threading' has no attribute 'Thread' (most likely due to a circular import)
When I run this code in normal IDLE it seems to work but it doesn't work in Visual Studio Code.
It seems like the program file you have created is named threading.py, and you are importing a module also called threading. This causes a circular import because your file is shadowing the built-in module.
Please rename your program (e.g., threading_example.py).
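A quick diagnostic sketch to confirm the shadowing, run from the same folder:

import threading

# If this prints a path inside your project rather than the standard
# library, your own threading.py is what actually got imported.
print(threading.__file__)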
I solved my problem; example code:

Main.py:

if __name__ == "__main__":
    import package2
    pack2class = package2.Package2(main=self)

package2.py:

import Main

class Package2(object):
    def __init__(self, main: Main.MainClass):  # for code suggestions
        pass  # your code ...
When importing modules, Python checks the script's directory (and the current working directory) before the built-in modules. So you probably have a file named threading.py which doesn't have the necessary attributes; in other words, you made a circular import.

Module 'a' has no attribute 'b' while importing module from same directory

I have the following directory structure in my Python project:

- dump_specs.py
/impa
    - __init__.py
    - server.py
    - tasks.py
I had a problem with circular references. dump_specs.py needs a reference to app from server.py. server.py is a Flask app which needs references to the celery tasks from tasks.py. So dump_specs.py looks like:
#!/usr/bin/env python3
import impa.server

def dump_to_dir(dir_path):
    # Do something
    client = impa.server.app.test_client()
    # Do the rest of things
impa/server.py looks like:
#!/usr/bin/env python3
import impa.tasks
from flask import Flask  # import added for completeness

app = Flask(__name__)
# Definitions of endpoints, some of them use celery tasks -
# that's why I need the impa.tasks reference
And impa/tasks.py:
#!/usr/bin/env python3
from celery import Celery
import impa.server

def make_celery(app):
    celery = Celery(app.import_name,
                    broker=app.config['CELERY_BROKER_URL'],
                    backend=app.config['CELERY_RESULT_BACKEND'])
    TaskBase = celery.Task

    class ContextTask(TaskBase):
        abstract = True
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery

celery = make_celery(impa.server.app)
When I try to dump the specs with ./dump_specs.py, I get an error:
./dump_specs.py specs
Traceback (most recent call last):
  File "./dump_specs.py", line 9, in <module>
    import impa.server
  File "/build/impa/server.py", line 23, in <module>
    import impa.tasks
  File "/build/impa/tasks.py", line 81, in <module>
    celery = make_celery(impa.server.app)
AttributeError: module 'impa' has no attribute 'server'
And I can't understand what's wrong. Could someone explain what's happening and how to get rid of this error?
If I have managed to reproduce your problem correctly on my host, it should help you to insert import impa.tasks into dump_specs.py above import impa.server.
The way your modules depend on each other, the loading order is important. IIRC (the loading machinery is described in greater detail in the docs), when you first try to import impa.server, it will on line 23 try to import impa.tasks, but the import of impa.server is not complete at this point. There is an import impa.server in impa.tasks, but we do not go back and import it at this time (we'd otherwise end up in a full circle); we continue importing impa.tasks until we try to access impa.server.app, which we can't yet, because impa.server has not finished importing.
When possible, it would also help if the code that accesses another module in your package weren't executed at import time, i.e., if it ran inside a function or class that is called/used after the imports have completed, rather than directly at module level.
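For completeness, a sketch of the suggested reordering (hypothetical, mirroring the layout in the question):

#!/usr/bin/env python3
""" dump_specs.py """
# Importing impa.tasks first means impa.server finishes importing before
# the module-level make_celery(impa.server.app) call in impa.tasks runs.
import impa.tasks
import impa.server

def dump_to_dir(dir_path):
    client = impa.server.app.test_client()
    # Do the rest of things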

python importlib can't find a module inside a daemon context

I've got a script that imports modules dynamically based on configuration. I'm trying to implement a daemon context (using the python-daemon module) on the script, and it seems to be interfering with python's ability to find the modules in question.
Inside mymodule/__init__.py, in setup(), I do this:
load_modules(args, config, logger)
try:
    with daemon.DaemonContext(
        files_preserve = getLogfileHandlers(logger)
    ):
        main_loop(config)
I've got a call to setup() inside mymodule/__main__.py and I'm loading the whole thing this way:
PYTHONPATH=. python -m mymodule
This works fine, but a listening port that gets set up inside load_modules() is closed by the newly added daemon context, so I want to move that function call inside the daemon context like so:
try:
    with daemon.DaemonContext(
        files_preserve = getLogfileHandlers(logger)
    ):
        load_modules(args, config, logger)
        main_loop(config)
Modules are loaded inside load_modules() this way:
for mysubmodule in modules:
    try:
        i = importlib.import_module("mymodule.{}".format(mysubmodule))
    except ImportError as err:
        logger.error("import of mymodule.{} failed: {}".format(
            mysubmodule, err))
With load_modules() outside the daemon context this works fine. When I move it inside the daemon context it seems to be unable to find the modules it's looking for. I get this:
import of mymodule.submodule failed: No module named submodule
It looks like some sort of namespace problem -- I note that the exception only refers to the submodule portion of the module name I try to import -- but I've compared everything I can think of inside and outside the daemon context, and I can't find the important difference. sys.path is unchanged, and the daemon context isn't clearing the environment or chrooting. The cwd changes to / of course, but that shouldn't have any effect on python's ability to find modules, since the absolute path to . appears in sys.path.
What am I missing here?
EDIT: I'm adding an SSCCE to make the situation more clear. The following three files create a module called "mymodule" that can be run from the command line as PYTHONPATH=. python -m mymodule. There are two calls to load_module() in __init__.py, one commented out. You can demonstrate the problem by swapping which one is commented.
mymodule/__main__.py
from mymodule import setup
import sys

if __name__ == "__main__":
    sys.exit(setup())
mymodule/__init__.py
import daemon
import importlib
import logging

def main_loop():
    logger = logging.getLogger('loop')
    logger.debug("Code runs here.")

def load_module():
    logger = logging.getLogger('load_module')
    submodule = 'foo'
    try:
        i = importlib.import_module("mymodule.{}".format(submodule))
    except ImportError as e:
        logger.error("import of mymodule.{} failed: {}".format(
            submodule, e))

def setup_logging():
    logfile = 'mymodule.log'
    fh = logging.FileHandler(logfile)
    root_logger = logging.getLogger()
    root_logger.addHandler(fh)
    root_logger.setLevel(logging.DEBUG)

def get_logfile_handlers(logger):
    handlers = []
    for handler in logger.handlers:
        handlers.append(handler.stream.fileno())
    return handlers

def setup():
    setup_logging()
    logger = logging.getLogger()
    # load_module()
    with daemon.DaemonContext(
        files_preserve = get_logfile_handlers(logger)
    ):
        load_module()
        main_loop()
mymodule/foo.py
import logging

logger = logging.getLogger('foo')
logger.debug("Inside foo.py")
I spent a good 4 hours trying to work this one out when I hit it in my own project. The clue is here:
If the module being imported is supposed to be contained within a package then the second argument passed to find_module(), __path__ on the parent package, is used as the source of paths.
(From https://docs.python.org/2/reference/simple_stmts.html#import)
Once you have successfully imported mymodule, python2 no longer uses sys.path to search for its submodules; it uses sys.modules["mymodule"].__path__. When you import mymodule, python2 unhelpfully sets its __path__ to the relative directory it was stored in:
mymodule.__path__ = ['mymodule']
After daemonizing, python's CWD is set to / and the only place the import internals search for mysubmodule is in /mymodule.
I worked around this by using os.chdir() to change CWD back to the old dir after daemonizing:
oldcwd = os.getcwd()
with DaemonizeContext():
    os.chdir(oldcwd)
    # ... daemon things
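An alternative sketch (my variation, not from the answer above): absolutize the package's __path__ before daemonizing, so the chdir to / no longer matters:

import os
import sys

pkg = sys.modules["mymodule"]
# Rewrite the relative search path (e.g. ['mymodule']) into absolute
# paths before the daemon context changes the working directory.
pkg.__path__ = [os.path.abspath(p) for p in pkg.__path__]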
"This works fine, but a listening port that gets set up inside load_modules() is closed by the newly added daemon context, so ..."

No. load_modules() should load modules. It should not open ports. If you need to preserve a file or socket opened outside the context, pass it to files_preserve. If possible, it is preferred to simply open files and such inside the context instead, as I suggest above.
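A sketch of the files_preserve route for a socket opened before daemonizing (serve() is a placeholder for the real accept loop):

import socket
import daemon

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 8000))
sock.listen(5)

# Keep the listening socket's descriptor open across daemonization.
with daemon.DaemonContext(files_preserve=[sock.fileno()]):
    serve(sock)  # placeholder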

Python: perform relative import when using __import__?

Here are the files in this test:
main.py
app/
|- __init__.py
|- master.py
|- plugin/
|  |- __init__.py
|  |- p1.py
|  |_ p2.py
The idea is to have a plugin-capable app. New .py or .pyc files that adhere to my API can be dropped into the plugin folder.
I have a master.py file at the app level that contains global variables and functions that any and all plugins may need access to, as well as the app itself. For the purposes of this test, the "app" consists of a test function in app/__init__.py. In practice the app would probably be moved to separate code file(s), but then I'd just use import master in that code file to bring in the reference to master.
Here's the file contents:
main.py:
import app
app.test()
app.test2()
app/__init__.py:
import sys, os
from plugin import p1

def test():
    print "__init__ in app is executing test"
    p1.test()

def test2():
    print "__init__ in app is executing test2"
    scriptDir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "plugin")
    print "The scriptdir is %s" % scriptDir
    sys.path.insert(0, scriptDir)
    m = __import__("p2", globals(), locals(), [], -1)
    m.test()
app/master.py:
myVar = 0
app/plugin/__init__.py:
<empty file>
app/plugin/p1.py:
from .. import master

def test():
    print "test in p1 is running"
    print "from p1: myVar = %d" % master.myVar
app/plugin/p2.py:
from .. import master

def test():
    master.myVar = 2
    print "test in p2 is running"
    print "from p2, myVar: %d" % master.myVar
Since I explicitly import the p1 module, everything works as expected. However, when I use __import__ to import p2, I get the following error:
__init__ in app is executing test
test in p1 is running
from p1: myVar = 0
__init__ in app is executing test2
The scriptdir is ....../python/test1/app/plugin
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    app.test2()
  File "....../python/test1/app/__init__.py", line 17, in test2
    m = __import__("p2", globals(), locals(), [], -1)
  File "....../python/test1/app/plugin/p2.py", line 1, in <module>
    from .. import master
ValueError: Attempted relative import in non-package
Execution proceeds all the way through test() and errors out just as test2() executes its __import__ call, which in turn causes p2 to attempt a relative import (which does work when p1 is imported explicitly via the import statement, recall).
It's clear that using __import__ is doing something different than using the import statement. The Python docs state that import simply translates to an __import__ call internally, but there has to be more going on than meets the eye.
Since the app is plugin-based, coding explicit import statements in the main app would of course not be feasible, and the plain import statement can't be used with module names that are only known at runtime.
What am I missing here? How can I get Python to behave as expected when manually importing modules using __import__? It seems maybe I'm not fully understanding the idea of relative imports, or that I'm just missing something with respect to where the import is occurring (i.e. inside a function rather than at the root of the code file).
EDIT: I found the following possible, but unsuccessful solutions:
m = __import__("p2",globals(),locals(),"plugin")
(returns the same exact error as above)
m = __import__("plugin",fromlist="p2")
(returns a reference to app.plugin, not to app.plugin.p2)
m = __import__("plugin.p2",globals(),locals())
(returns a reference to app.plugin, not to app.plugin.p2)
import importlib
m = importlib.import_module("plugin.p2")
(returns:)
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    app.test2()
  File "....../python/test1/app/__init__.py", line 20, in test2
    m = importlib.import_module("plugin.p2")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named plugin.p2
I've had a similar problem.
__import__ only imports submodules if all parent __init__.py files are empty.
You should use importlib instead
import importlib
p2 = importlib.import_module('plugin.p2')
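If you need a genuinely relative import, importlib also accepts a leading dot plus an explicit package anchor; a small sketch (assuming the call runs from code inside the app package):

import importlib

# '.p2' is resolved relative to the given package anchor.
p2 = importlib.import_module('.p2', package='app.plugin')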
Have you tried the syntax from "How to use python's import function properly __import__()"? It worked for me with a similar problem...
I never did find a solution, so I ended up deciding to restructure the program.
What I did was set up the main app as a class. Then, I also changed each plugin into a class. Then, as I load each plugin using __import__, I also instantiate the class with a predefined name inside that plugin, passing in a reference to the main app class.
This means that each class can directly read and manipulate variables back in the host class simply by using the reference. It is totally flexible because anything that the host class exports is accessible by all the plugins.
This turns out to be more effective and doesn't depend on relative paths and any of that stuff. It also means one Python interpreter could in theory run multiple instances of the host app simultaneously (on different threads for example) and the plugins will still refer back to the correct host instance.
Here's basically what I did:
main.py:
import os, os.path, sys

class MyApp:
    _plugins = []

    def __init__(self):
        self.myVar = 0

    def loadPlugins(self):
        scriptDir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "plugin")
        sys.path.insert(0, scriptDir)
        for plug in os.listdir(scriptDir):
            if (plug[-3:].lower() == ".py"):
                m = __import__(os.path.basename(plug)[:-3])
                self._plugins.append(m.Plugin(self))

    def runTests(self):
        for p in self._plugins:
            p.test()

if (__name__ == "__main__"):
    app = MyApp()
    app.loadPlugins()
    app.runTests()
plugin/p1.py:
class Plugin:
    def __init__(self, host):
        self.host = host

    def test(self):
        print "from p1: myVar = %d" % self.host.myVar
plugin/p2.py:
class Plugin:
    def __init__(self, host):
        self.host = host

    def test(self):
        print "from p2: variable set"
        self.host.myVar = 1
        print "from p2: myVar = %d" % self.host.myVar
There is some room to improve this, for example, validating each imported .py file to see if it's actually a plugin and so on. But this works as expected.
I have managed to find a solution to the problem.
Taking your example, the following static import needs to be made dynamic:

from .plugin import p2

The "." before plugin means a relative import is needed, not an absolute one. I was able to do that with the following code snippet:

plugin = __import__('plugin', globals(), locals(), level=1, fromlist=['p2'])
p2 = getattr(plugin, 'p2')

level=1 is the relative-import parameter.
fromlist specifies which submodules to take from the plugin module.
As you mentioned, plugin holds the reference to 'plugin', so an additional getattr is needed to grab p2 from it.
