I have been searching for a way to start and terminate long-running batch jobs in Python. Right now I'm using os.system() to launch a long-running batch job inside each child process. As you might have guessed, os.system() spawns a new process inside that child process (a grandchild process?), so I cannot kill the batch job from the grandparent process. To visualize what I have just described:
Main (grandparent) process, with PID = AAAA
    |
    |------> child process, with PID = BBBB
                 |
                 |------> os.system("some long-running batch file")
                              [grandchild process, with PID = CCCC]
So, my problem is that I cannot kill the grandchild process from the grandparent...
My question is: is there a way to start a long-running batch job inside a child process and be able to kill that batch job just by terminating the child process?
What are the alternatives to os.system() that I can use so that I can kill the batch job from the main process?
Thanks !!
The subprocess module is the proper way to spawn and control processes in Python.
From the docs:
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as: os.system, os.spawn*, os.popen*, popen2.*, commands.*
So, if you are on Python 2.4+, subprocess is the replacement for os.system.
For stopping processes, check out the terminate() and kill() methods of Popen objects.
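As a minimal sketch (the batch script name here is a placeholder, not from the original post), starting and stopping such a job looks like this:
import subprocess
import time

# 'long_running_job.sh' is a hypothetical stand-in for the real batch job.
p = subprocess.Popen(['/bin/sh', 'long_running_job.sh'])
time.sleep(5)            # ...do other work while the job runs...
p.terminate()            # sends SIGTERM on POSIX, TerminateProcess on Windows
p.wait()                 # reap the process so it doesn't become a zombie
print('exit code:', p.returncode)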
If you are on a POSIX-compatible system (e.g., Linux or OS X) and no Python code has to run after the child process, use os.execv. In general, avoid os.system and use the subprocess module instead.
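For illustration, a rough sketch of the fork-and-exec pattern this suggests (POSIX only; the script name is hypothetical). Because execv replaces the child's process image, the child's PID is the batch job's PID, so killing the child kills the job:
import os
import signal

pid = os.fork()
if pid == 0:
    # Child: replace this process image with the batch job itself.
    os.execv('/bin/sh', ['sh', 'long_running_job.sh'])
# Parent: stop the job later by signalling the child directly.
os.kill(pid, signal.SIGTERM)
os.waitpid(pid, 0)       # reap the child to avoid a zombie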
If you want control over the start and stop of child processes, you have to use threading. In that case, look no further than Python's threading module.
Related
I used multiprocessing.Pool to improve the performance of my Python server; when a task fails, I want to terminate its child processes immediately.
I found that if I create a process using Process, the terminate method meets my needs, but if I create a process with Pool.apply_async, the return type is ApplyResult, and it can't terminate the corresponding process.
Is there any other way to do it?
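For reference, a minimal sketch of the Process-based approach the question says already works (the sleeping task is just a stand-in for real work):
import time
from multiprocessing import Process

def task():
    time.sleep(60)       # stand-in for a long-running task

if __name__ == '__main__':
    p = Process(target=task)
    p.start()
    p.terminate()        # a bare Process exposes terminate(); ApplyResult does not
    p.join()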
I am new to threading and processes, and I have been trying to understand asyncio. Researching asyncio on Doug Hellmann's Python Module of the Week section on concurrency, I ran into the multiprocessing, threading, signal and subprocess modules.
I have been wondering why the subprocess module was named thus. Why is the module not called process? And what is 'sub' [meaning below] about it?
Edit: Forgotten addition
There's a Popen class and I assume the 'P' stands for process.
The GitHub code comment says:
Popen(...): A class for flexibly executing a command in a new process
Doesn't the existence of the Popen class, give more reason to call the module process instead of subprocess?
Processes in most operating systems form a parent-child relationship. Processes created by another process are called child processes or subprocesses of that process:
A child process in computing is a process created by another process (the parent process). This technique pertains to multitasking operating systems, and is sometimes called a subprocess or traditionally a subtask.
Python's subprocess module provides facilities to create new child processes (i.e., every process created with this module will be a subprocess of your Python program):
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
It does not deal with arbitrary processes, so it makes sense to name it subprocess instead of just process.
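A quick way to see this (a minimal sketch, assuming a POSIX shell): the shell started by the module reports the Python interpreter as its parent:
import os
import subprocess

print('python pid:', os.getpid())
# The child shell's $PPID is the PID of this Python process.
subprocess.run(['sh', '-c', 'echo child sees parent pid: $PPID'])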
subprocess provides an API for creating and communicating with secondary processes.
The "sub" in the module name refers to the fact that all processes you are going to start here will be child processes of your running Python process. They exist to support your Python code.
I'm wondering if this is the correct way to execute a system process and detach it from the parent, while allowing the parent to exit without creating a zombie and/or killing the child process. I'm currently using the subprocess module and doing this...
os.setsid()
os.umask(0)
p = subprocess.Popen(['nc', '-l', '8888'],
                     cwd=self.home,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
os.setsid() changes the process group, which I believe is what lets the process continue running when its parent exits, as it no longer belongs to the same process group.
Is this correct, and is this a reliable way of doing it?
Basically, I have a remote control utility that communicates through sockets and allows starting processes remotely, but I have to ensure that if the remote control dies, the processes it started continue running unaffected.
I was reading about double-forks and am not sure if this is necessary, and/or whether subprocess.Popen's close_fds somehow takes care of that and all that's needed is to change the process group?
Thanks.
Ilya
For Python 3.8.x, the process is a bit different. Use the start_new_session parameter, available since Python 3.2:
import shlex
import subprocess

cmd = "<full filepath plus arguments of child process>"
cmds = shlex.split(cmd)                              # split the command line into an argv list
p = subprocess.Popen(cmds, start_new_session=True)   # detach the child into a new session
This will allow the parent process to exit while the child process continues to run. Not sure about zombies.
The start_new_session parameter is supported on all POSIX systems, e.g. Linux, macOS, etc.
Tested on Python 3.8.1 on macOS 10.15.5
Popen on Unix is implemented using fork. That means you'll be safe if:
- you run Popen in your parent process, and
- you immediately exit the parent process.
When the parent process exits, the child process is inherited by the init process (launchd on OS X) and will keep running in the background.
The first two lines of your Python program are not needed; this works perfectly:
import subprocess

p = subprocess.Popen(['nc', '-l', '8888'],
                     cwd="/",
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
I was reading about double-forks and not sure if this is necessary
This would be needed if your parent process keeps running and you need to protect your children from dying with the parent. This answer shows how this can be done.
How the double-fork works:
1. create a child via os.fork()
2. in this child, call Popen(), which launches the long-running process
3. exit the child: the Popen process is inherited by init and runs in the background
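A minimal sketch of that sequence (POSIX only; 'sleep 60' stands in for the long-running command):
import os
import subprocess

pid = os.fork()
if pid == 0:
    # First child: launch the long-running process, then exit at once
    # so the grandchild is reparented to init and detached from us.
    subprocess.Popen(['sleep', '60'])
    os._exit(0)
# Parent: reap the first child so it doesn't linger as a zombie.
os.waitpid(pid, 0)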
Why does the parent have to exit immediately? What happens if it doesn't?
If you leave the parent running and the user stops the process, e.g. via Ctrl-C (SIGINT) or Ctrl-\ (SIGQUIT), then it would kill both the parent process and the Popen process.
What if it exits one second after forking?
Then, during that one-second window, your Popen process is vulnerable to Ctrl-C etc. If you need to be 100% sure, use the double fork.
I am writing a program that uses multiple worker processes (a pre-forking model) with the following code.
from multiprocessing import Process

for i in range(0, 3):
    Process(target=worker, args=(i,)).start()
I use Windows. I notice that they are run as separate processes when I wanted them to start as subprocesses instead. How do I make them subprocesses of the main process?
I am hesitant to use the subprocess module as it seems suited to run external processes (as far as I have used it).
An update: it seems Windows does not launch new processes as sub-processes; Python doesn't support getppid() (get the parent's PID) on Windows.
What do you call a subprocess? To me they are subprocesses of your main process. Here is my example and the returned output.
import time, os
from multiprocessing import Process

def worker():
    print "I'm process %s, my father is %s" % (os.getpid(), os.getppid())

print "I'm the main process %s" % os.getpid()

for i in range(0, 3):
    Process(target=worker).start()
The output is:
I'm the main process 5897
I'm process 5898, my father is 5897
I'm process 5899, my father is 5897
I'm process 5900, my father is 5897
You have 3 subprocesses attached to a main process...
You seem to be confusing terminology here. A subprocess is a separate process. The processes that are created will be children of the main process of your program, and in that sense are subprocesses. If you want threads, then use multithreading instead of multiprocessing, but note that Python won't use multiple cores/CPUs for multiple threads.
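To make the distinction concrete, a minimal sketch: threads run inside the main process, so every worker reports the same PID, unlike the Process example above:
import os
import threading

def worker(i):
    # All threads share the main process, so the PID is identical.
    print("worker %d running in process %s" % (i, os.getpid()))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()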
I am hesitant to use the subprocess module as it seems suited to run external processes
I'm sorry, I don't understand this remark.
Short answer: http://docs.python.org/library/threading.html
Longer: I don't understand the question, aitchnyu. In the typical Unix model, the only processes a process can start are subprocesses. I have a strong feeling that there's a vocabulary conflict between the two of us I don't know how to unravel. You seem to have something like an "internal process" in mind; what's an example of that, in any language or operating system?
I can attest that Python's subprocess module is widely used.
You write "... multiple working threads ..." Have you read the documentation to which I refer in the first line at the top of this response?