IPC with a Python subprocess

I'm trying to do some simple IPC in Python as follows: One Python process launches another with subprocess. The child process sends some data into a pipe and the parent process receives it.
Here's my current implementation:
# parent.py
import pickle
import os
import subprocess
import sys

read_fd, write_fd = os.pipe()
if hasattr(os, 'set_inheritable'):
    os.set_inheritable(write_fd, True)
child = subprocess.Popen((sys.executable, 'child.py', str(write_fd)),
                         close_fds=False)
try:
    with os.fdopen(read_fd, 'rb') as reader:
        data = pickle.load(reader)
finally:
    child.wait()
assert data == 'This is the data.'
# child.py
import pickle
import os
import sys

with os.fdopen(int(sys.argv[1]), 'wb') as writer:
    pickle.dump('This is the data.', writer)
On Unix this works as expected, but if I run this code on Windows, I get the following error, after which the program hangs until interrupted:
Traceback (most recent call last):
  File "child.py", line 4, in <module>
    with os.fdopen(int(sys.argv[1]), 'wb') as writer:
  File "C:\Python34\lib\os.py", line 978, in fdopen
    return io.open(fd, *args, **kwargs)
OSError: [Errno 9] Bad file descriptor
I suspect the problem is that the child process isn't inheriting the write_fd file descriptor. How can I fix this?
The code needs to be compatible with Python 2.7, 3.2, and all subsequent versions. This means that the solution can't depend on either the presence or the absence of the changes to file descriptor inheritance specified in PEP 446. As implied above, it also needs to run on both Unix and Windows.
(To answer a couple of obvious questions: The reason I'm not using multiprocessing is because, in my real-life non-simplified code, the two Python programs are part of Django projects with different settings modules. This means they can't share any global state. Also, the child process's standard streams are being used for other purposes and are not available for this.)
UPDATE: After setting the close_fds parameter, the code now works in all versions of Python on Unix. However, it still fails on Windows.

subprocess.PIPE is implemented for all platforms. Why don't you just use this?
If you want to manually create and use an os.pipe(), you need to take care of the fact that Windows does not support fork(). It uses CreateProcess() instead, which by default does not make the child inherit open files. But there is a way: each file descriptor can be made explicitly inheritable, which requires calling the Win32 API. I have implemented this in gipc; see the _pre/post_createprocess_windows() methods here.
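For illustration, here is a rough sketch of that approach (Windows only; the flag constant and the argv-based handle passing are illustrative choices of mine, not code taken from gipc):
# Sketch (Windows): make the write end of os.pipe() inheritable by setting
# HANDLE_FLAG_INHERIT on the underlying OS handle, then pass the *handle*
# number (not the C-runtime fd) to the child.
import ctypes
import msvcrt
import os
import subprocess
import sys

HANDLE_FLAG_INHERIT = 0x00000001  # from winbase.h

read_fd, write_fd = os.pipe()
write_handle = msvcrt.get_osfhandle(write_fd)
ctypes.windll.kernel32.SetHandleInformation(
    write_handle, HANDLE_FLAG_INHERIT, HANDLE_FLAG_INHERIT)

# close_fds=False so subprocess calls CreateProcess with handle inheritance on
child = subprocess.Popen(
    (sys.executable, 'child.py', str(write_handle)), close_fds=False)

# The child would then convert the inherited handle back into an fd:
#     import msvcrt, os, sys
#     write_fd = msvcrt.open_osfhandle(int(sys.argv[1]), 0)
#     writer = os.fdopen(write_fd, 'wb')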

As @Jan-Philip Gehrcke suggested, you could use subprocess.PIPE instead of os.pipe():
#!/usr/bin/env python
# parent.py
import sys
from subprocess import check_output
data = check_output([sys.executable or 'python', 'child.py'])
assert data.decode().strip() == 'This is the data.'
check_output() uses stdout=subprocess.PIPE internally.
You could use obj = pickle.loads(data) if child.py uses data = pickle.dumps(obj).
And the child.py could be simplified:
#!/usr/bin/env python
# child.py
print('This is the data.')
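If the goal is to keep passing a pickled object rather than plain text, a rough sketch of that variant (Python 3 shown for brevity; on Python 2 the child would write to sys.stdout directly, and on Windows that stream may need to be switched to binary mode):
# child.py: write the pickle to the binary stdout stream
import pickle
import sys

sys.stdout.buffer.write(pickle.dumps('This is the data.'))

# parent.py: capture the child's stdout and unpickle it
import pickle
import subprocess
import sys

raw = subprocess.check_output([sys.executable, 'child.py'])
assert pickle.loads(raw) == 'This is the data.'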
If the child process is written in Python then, for greater flexibility, you could import the child script as a module and call its functions instead of using subprocess. You could use the multiprocessing or concurrent.futures modules if you need to run some Python code in a different process.
If you can't use the standard streams, then your Django applications could use sockets to talk to one another.
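One concrete way to do that, offered as an illustration of my own rather than part of the original answer, is the standard library's multiprocessing.connection, which pickles objects over a socket or named pipe and works on both Unix and Windows (the address and authkey below are arbitrary placeholders):
# listener.py (run by the receiving project)
from multiprocessing.connection import Listener

listener = Listener(('localhost', 6000), authkey=b'secret')
conn = listener.accept()
data = conn.recv()        # objects are unpickled automatically
conn.close()
listener.close()

# client.py (run by the sending project)
from multiprocessing.connection import Client

conn = Client(('localhost', 6000), authkey=b'secret')
conn.send('This is the data.')   # objects are pickled automatically
conn.close()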
The reason I'm not using multiprocessing is because, in my real-life non-simplified code, the two Python programs are part of Django projects with different settings modules. This means they can't share any global state.
This seems bogus. multiprocessing may also use the subprocess module under the hood. If you don't want to share global state, then don't share it; that is the default for multiple processes. You should probably ask a more specific question about how to organize the communication between the various parts of your project in your particular case.

Related

Multiprocessing python within frozen script

I am trying to compile a script utilizing multiprocessing into a Windows executable. At first I ran into the same issue as "Why python executable opens new window instance when function by multiprocessing module is called on windows" when I compiled it into an executable. Following the accepted answer there, I adjusted my script so that
from multiprocessing import freeze_support

# my functions

if __name__ == "__main__":
    freeze_support()
    # my script
And this again works perfectly when run as a script. However, when I compile and run it I encounter an error dialog (screenshot omitted here). The part of the error I've underlined in green refers to
freeze_support()
in my script. Furthermore, it is not actually raised on that line, but when my script starts its processes, which it does with something like:
p = multiprocessing.Process(target=my_function, args=[my_list])
p.start()
p1 = multiprocessing.Process(target=my_function, args=[my_list])
p1.start()
p.join()
p1.join()
Is this an error in the multiprocessing module (specifically line 148) or am I misunderstanding the answer I linked, or something else?
I'll also note that the script does work correctly when compiled, but you have to click "OK" on an error message for every process that is spawned (quite a lot), and every error message is exactly the same. Would this mean I am improperly ending the process with p.join()?
I've also tried the solution at "Python 3.4 multiprocessing does not work with py2exe", which recommends adding
multiprocessing.set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe'))
to your script, yet this causes an error even when run as a plain script (not yet compiled):
FileNotFoundError: [WinError 2] The system cannot find the file specified
Thanks for the help!
freeze_support documentation: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.freeze_support
This appears to have been a problem for quite some time - I found references going back to 2014, at least. Since it appears to be harmless, the general recommendation is to suppress the error by replacing sys.stdout (and sys.stderr, which is flushed on the next line) with a dummy. Try this:
import os
import sys
from multiprocessing import freeze_support

if __name__ == '__main__':
    if sys.stdout is None:
        sys.stdout = sys.stderr = open(os.devnull, 'w')
    freeze_support()
This is not an issue of the multiprocessing library or py2exe per se but a side effect of the way you run the application. The py2exe documentation contains some discussion on this topic:
A program running under Windows can be of two types: a console
program or a windows program. A console program is one that runs in
the command prompt window (cmd). Console programs interact with users
using three standard channels: standard input, standard output and
standard error […].
As opposed to a console application, a windows application interacts
with the user using a complex event-driven user interface and
therefore has no need for the standard channels whose use in such
applications usually results in a crash.
Py2exe will work around these issues automatically in some cases, but here at least one of your processes has no attached standard output (sys.stdout is None), which means that sys.stdout.flush() is effectively None.flush(), and that raises the error you are getting. The documentation linked above has an easy fix that redirects all output to files.
import sys

sys.stdout = open("my_stdout.log", "w")
sys.stderr = open("my_stderr.log", "w")
Simply add those lines at the entry point of your processes. There is also a relevant documentation page on the interactions between Py2Exe and subprocesses.

Calling a python script with args from another python script

I am still a newbie to Python, so apologies in advance. I have found related topics on this but didn't find the best solution (e.g. "Run a python script from another python script, passing in args").
Basically, I have a python script (scriptB.py) that takes in a config file as argument and does some stuff. I need to call this script from another python script (scriptA.py).
If I had no arguments to pass, I could have just done
import scriptB
However, things got a little complicated because we need to pass the config file (myconfig.yml) as an argument.
One of the suggestions was to use;
os.system("python scriptB.py myconfig.yml")
But this is often reported as not a recommended approach, and one that often does not work.
Another suggestion was to use:
import subprocess
subprocess.Popen("scriptB.py myconfig.yaml", shell=True)
I am not very sure if this is a common practice.
Just want to point out that neither script has a main() function inside it.
Please advise on the best way to handle this.
Thanks,
This should work just fine:
import subprocess

subprocess.Popen(['python', '/full_path/scriptB.py', 'myconfig.yaml'],
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
See https://docs.python.org/3/library/subprocess.html#replacing-os-popen-os-popen2-os-popen3
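If you also need what scriptB.py writes to its standard streams (my addition, not part of the answer above), read the pipes with communicate() so they don't fill up:
import subprocess

proc = subprocess.Popen(['python', '/full_path/scriptB.py', 'myconfig.yaml'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()   # waits for the script and drains both pipes
print(out.decode())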
If you really need to run a separate process, using the multiprocessing library is probably best. I would make an actual function inside scriptB.py that does the work. In the below example I consider config_handler to be a function inside scriptB.py that actually takes the config file path argument.
1.) Create a function that will handle calling your external Python script; also, import your script and the function inside it that takes arguments:
scriptA.py: importing config_handler from scriptB
import multiprocessing
from scriptB import config_handler

def other_process(*args, **kwargs):
    # forward both positional and keyword arguments to Process
    p = multiprocessing.Process(*args, **kwargs)
    p.start()
2.) Then just call the process and feed your arguments to it:
scriptA.py: calling scriptB.py function, config_handler
other_process(name="config_process_name", target=config_handler, args=("myconfig.yml",))
Opinion:
From the information you have provided, I imagine you could manage to do this without separate processes. Just do things in sequence and make scriptB.py a library with a function you use in scriptA.py.
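A minimal sketch of that library-style approach, assuming scriptB.py exposes the config_handler(path) function used in the earlier example:
# scriptB.py
def config_handler(config_path):
    # ...read the config file and do the real work...
    print("handling config file: {}".format(config_path))

# scriptA.py
from scriptB import config_handler

config_handler("myconfig.yml")   # plain function call, no extra process needed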
It seems you got all your answers in the old thread, but if you really want to run it through the OS rather than through Python, this is what I do:
from subprocess import run, PIPE, DEVNULL
your_command = './scriptB.py myconfig.yaml'
run(your_command.split(), stdout=PIPE, stderr=DEVNULL)
In case you need the output:
output = run(your_command.split(), stdout=PIPE, stderr=DEVNULL).stdout.decode('utf-8')
If scriptB.py has a shebang line telling the shell it's a Python script, it will be run correctly.
The path can be either relative or absolute.
This is for Python 3.x.

How to create a child process using multiprocessing in Python2.7.10 without the child sharing resources with parent?

We are trying to move our Python 2.7.10 codebase from Windows to Linux. We recently discovered that the multiprocessing library in Python 2.7 behaves differently on Windows than on Linux. We have found many articles, like this one, describing the problem; however, we are unable to find a solution online for Python 2.7. There is a fix for this issue in Python 3.4, but we are unable to upgrade to Python 3.4. Is there any way to use multiprocessing in Python 2.7 on Linux without the child and parent sharing memory? We could also use guidance on modifying the forking.py code in Python 2.7 to ensure that the child and parent processes aren't sharing memory via copy-on-write. Thanks!
A possible solution is to use loky, a library which provides an implementation of Process with fork-exec in Python 2.7. The fork-exec start method behaves similarly to spawn, with a fresh interpreter in the newly spawned process. The library is mainly designed to provide a concurrent.futures API, but you can use mp = loky.backend.get_context() to get the same API as multiprocessing.
from loky.backend import get_context
import multiprocessing as mp


def child_without_os():
    print("Hello from {}".format(os.getpid()))


def child_with_os():
    import os
    print("Hello from {}".format(os.getpid()))


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser('Test loky backend')
    parser.add_argument('--use-fork', action="store_true",
                        help="Use start_method='fork' instead of 'loky'")
    parser.add_argument('--with-os', action="store_true",
                        help='Import os module in the child interpreter')
    args = parser.parse_args()

    # Only import os in the main module; this should fail if the interpreter
    # is not shared
    import os
    print("Main is {}".format(os.getpid()))

    if args.use_fork:
        ctx = mp
        print("Using fork context")
    else:
        ctx = get_context('loky_init_main')
        print("Using loky context")

    if args.with_os:
        target = child_with_os
    else:
        target = child_without_os

    p = ctx.Process(target=target)
    p.start()
    p.join()
This gives
# Use the default context; the child process has a copy-on-write interpreter
# state and can use the os module.
$ python2 test.py --use-fork
Main is 14630
Using fork context
Hello from 14633

# Use the loky context; the child process has a fresh interpreter
# state and needs to import the os module.
$ python2 test.py
Main is 14661
Using loky context
Process LokyInitMainProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/tom/Work/prog/loky/test.py", line 6, in child_without_os
    print("Hello from {}".format(os.getpid()))
NameError: global name 'os' is not defined

# Now using the correct child function, which imports the os module
$ python2 test.py --with-os
Main is 14700
Using loky context
Hello from 14705
(DISCLAIMER: I am one of the maintainers of loky).
As you're no doubt aware, the patches in the CPython bug tracker don't apply cleanly to Python 2.7's version of multiprocessing, and the patches include some extra features for semaphore.c so that semaphores are cleaned up properly afterwards.
I think your best bet would be to backport the multiprocessing module from Python 3. Copy the Python code over, rename it to just processing, discover the missing C features and work around them (e.g. clean up your own semaphores or don't use them). Although the library is big it may be straightforward to port only the features that you use. If you are able to publish the backport I'm sure many people would be interested in that project.
Depending on how heavily you rely on multiprocessing, a different option would be to just run more Python interpreters, by launching sys.executable with the subprocess module.
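For example, a rough sketch of that approach (worker.py and its command-line interface are hypothetical):
# parent.py: each task runs in a completely fresh interpreter; nothing is
# shared with the parent except what is passed on the command line.
import subprocess
import sys

procs = [subprocess.Popen([sys.executable, 'worker.py', str(task_id)])
         for task_id in range(4)]
for proc in procs:
    proc.wait()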

How should a Python file be written such that it can be both a module and a script with command line options and pipe capabilities?

I'm considering how a Python file could be made to be an importable module as well as a script that is capable of accepting command line options and arguments as well as pipe data. How should this be done?
My attempt seems to work, but I want to know if my approach is how such a thing should be done (if such a thing should be done). Could there be complexities (such as when importing it) that I have not considered?
#!/usr/bin/env python

"""
usage:
    program [options]

options:
    --version        display version and exit
    --datamode       engage data mode
    --data=FILENAME  input data file [default: data.txt]
"""

import docopt
import sys


def main(options):
    print("main")
    datamode = options["--datamode"]
    filename_input_data = options["--data"]
    if datamode:
        print("engage data mode")
        process_data(filename_input_data)
    if not sys.stdin.isatty():
        print("accepting pipe data")
        input_stream = sys.stdin
        input_stream_list = [line for line in input_stream]
        print("input stream: {data}".format(data=input_stream_list))


def process_data(filename):
    print("process data of file {filename}".format(filename=filename))


if __name__ == "__main__":
    options = docopt.docopt(__doc__)
    if options["--version"]:
        print(version)
        exit()
    main(options)
That's it, you're good.
Nothing matters[1] except the if __name__ == '__main__', as noted elsewhere
From the docs (emphasis mine):
A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt. A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported
I also like how python 2's docs poetically phrase it
It is this environment in which the idiomatic “conditional script” stanza causes a script to run:
That guard guarantees that the code underneath it will only be executed if the module is the one being run as the main program; put all your argument-grabbing code there. If there is no other top-level code except class/function declarations, the file will be safe to import.
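As a quick illustration (assuming the file above is saved as program.py), importing it executes nothing except the definitions, and you can still call the guarded entry point yourself with hand-built options:
# another_script.py
import program   # no CLI parsing, no side effects on import

# call the entry point explicitly with a hand-built options mapping
program.main({"--datamode": False, "--data": "data.txt"})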
Other complications?
Yes:
Multiprocessing (a new interpreter is started and things are re-imported): if __name__ == '__main__' covers that.
If you're used to C coding, you might be thinking you can protect your imports with ifdefs and the like. There are some analogous hacks in Python, but they're not what you're looking for.
I like having a main method like C and Java - when's that coming out? Never.
But I'm paranoid! What if someone changes my main function? Stop being friends with that person. As long as you're the user, I assume this isn't an issue.
I mentioned the -m flag. That sounds great, what's that?! Here and here, but don't worry about it.
Footnotes:
[1] Well, the fact that you put your main code in a function is nice; it means things will run slightly faster, since local variable lookups inside a function are faster than module-level global lookups.

Capturing log information from bunch of python functions in single file

I have 3 scripts. One is starttest.py, which kicks off the execution of methods called in test.py; the methods themselves are defined in module.py.
There are many print statements in each file, and I want to capture every print statement in my log file from starttest.py itself. I tried redirecting sys.stdout in starttest.py, but that only captures the print statements from starttest.py; it has no effect on the print statements in test.py and module.py.
Any suggestions for capturing the print statements from all of the files in a single place?
Before importing anything from test.py or module.py, replace the sys.stdout file object with one of your liking:
import sys
sys.stdout = open("test-output.txt", "wt")
# Import the rest
If you're running on a Unix-like operating system, there is a safer method that does not need to replace the file object reference, especially since overwriting sys.stdout does not guarantee that the previous object is destroyed:
import os
import sys

# 0o644 is the octal literal form that works on both Python 2.6+ and 3
fd = os.open("test-output.txt", os.O_WRONLY | os.O_CREAT, 0o644)
os.dup2(fd, sys.stdout.fileno())
os.close(fd)
Note that the above trick is used by almost all daemonization implementations for Python.
Even though not directly related, remember also that you can use your shell to redirect command output to a file (works on Windows too):
python starttest.py >test-output.txt
Maybe look at the logging module that comes with Python.
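For example, a minimal sketch of that idea, with logging configured once in starttest.py and a module-level logger replacing print() in the other files (the file and format names are just placeholders):
# starttest.py: configure logging once; records from every module's logger
# end up in the same file.
import logging

logging.basicConfig(filename="test-output.log", level=logging.INFO,
                    format="%(asctime)s %(name)s: %(message)s")

# test.py / module.py: replace print(...) with a module-level logger
import logging

log = logging.getLogger(__name__)

def do_work():
    log.info("this line goes to test-output.log")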
