I am new to threading and processes. I have been trying to understand asyncio. Researching asyncio on Doug Hellinger's Python Module of the Week section of Concurrency, I ran into the multiprocessing, threading, signal and subprocess modules.
I have been wondering why the name subprocess module was named thus. Why is the module not called process. And what is 'sub' [meaning below] about it?
Edit: Forgotten addition
There's a Popen class and I assume the 'P' stands for process.
The Github code comment says:
Popen(...): A class for flexibly executing a command in a new process
Doesn't the existence of the Popen class, give more reason to call the module process instead of subprocess?
Processes in most operating systems form a parent-child relationship. Processes created by another process are called child processes or subprocesses of that process:
A child process in computing is a process created by another
process (the parent process). This technique pertains to multitasking
operating systems, and is sometimes called a subprocess or
traditionally a subtask.
Python subprocess module provides facilities to create new child processes (i.e. every process created with this module will be subprocess of your Python program):
The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes.
It does not deal with arbitrary processes, so it makes sense to name it subprocess instead of just process.
subprocess provides an API for creating and communicating with secondary processes.
The "sub" in the module name refers to the fact that all processes you are going to start here will be child processes of your running Python process. They exist to support your Python code.
Related
I am trying to create a Monitor script that monitors all the threads or a huge python script which has several loggers running, several thread running.
From Monitor.py i could run subprocess and forward the STDOUT which might contain my status of the threads.. but since several loggers are running i am seeing other logging in that..
Question: How can run the main script as a separate process and get custom messages, thread status without interfering with logging. ( passing PIPE as argument ? )
Main_Script.py * Runs Several Threads * Each Thread has separate Loggers.
Monitor.py * Spins up the Main_script.py * Monitors the each of the threads in MainScript.py ( may be obtain other messages from Main_script in the future)
So Far, I tried subprocess, process from Multiprocessing.
Subprocess lets me start the Main_script and forward the stdout back to monitor but I see the logging of threads coming in through the same STDOUT. I am using the “import logging “ Library to log the data from each threads to separate files.
I tried “process” from Multiprocessing. I had to call the main function of the main_script.py as a process and send a PIPE argument to it from monitor.py. Now I can’t see the Main_script.py as a separate process when I run top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
It seems like subprocess.Popen() and os.fork() both are able to create a child process. I would however like to know what the difference is between both. When would you use which one? I tried looking at their source code but I couldn't find fork()'s source code on my machine and it wasn't totally clear how Popen works on Unix machines.
Could someobody please elaborate?
Thanks
subprocess.Popen let's you execute an arbitrary program/command/executable/whatever in its own process.
os.fork only allows you to create a child process that will execute the same script from the exact line in which you called it. As its name suggests, it "simply" forks the current process into 2.
os.fork is only available on Unix, and subprocess.Popen is cross-platfrom.
So I read the documentation for you. Results:
os.fork only exists on Unix. It creates a child process (by cloning the existing process), but that's all it does. When it returns, you have two (mostly) identical processes, both running the same code, both returning from os.fork (but the new process gets 0 from os.fork while the parent process gets the PID of the child process).
subprocess.Popen is more portable (in particular, it works on Windows). It creates a child process, but you must specify another program that the child process should execute. On Unix, it is implemented by calling os.fork (to clone the parent process), then os.execvp (to load the program into the new child process). Because Popen is all about executing a program, it lets you customize the initial environment of the program. You can redirect its standard handles, specify command line arguments, override environment variables, set its working directory, etc. None of this applies to os.fork.
In general, subprocess.Popen is more convenient to use. If you use os.fork, there's a lot you need to handle manually, and it'll only work on Unix systems. On the other hand, if you actually want to clone a process and not execute a new program, os.fork is the way to go.
Subprocess.popen() spawns a new OS level process.
os.fork() creates another process which will resume at exactly the same place as this one. So within the first loop run, you get a fork after which you have two processes, the "original one" (which gets a pid value of the PID of the child process) and the forked one (which gets a pid value of 0).
I have a server-client architecture. The server is a QtApplication and contains a QThread.
I am trying to open a new client process using python's built in multiprocessing, from the QThread, and from that new proccess, open a new QtApplication . Such that the server and the client are both running QtApplications
The problem is that I am getting the following error:
WARNING: QApplication was not created in the main() thread.
The QApplication is being created in the main thread of the new processes, so I am not sure why this error is occurring.
A Qoute from Kovid Goyal:
Don't use multiprocessing. multiprocessing is not thread safe, on unix
it uses fork() without exec() which means that it inherits
everything from the parent process including locks (which are in an invalid state in the child process), file handles, global objects
like QApplication and so on. Just as an illustration of the problems
multiprocessing can cause, if you use it with the standard library
logging module you can have your worker processes crash, since the
logging module uses locks.
After all, what do you expect to happen to QApplication on fork()?
There's no way for fork() to have the QApplication object magically
re-initialize itself.
Using multiprocessing will bite you in the rear on any project of
moderate complexity. Heck it bit me on the rear even while
implementing a muti-core replacment for grep. Instead use subprocess
to launch a worker process, feed it a module name, functions and
arguments on stdin using cPickle or json and then have it run the
task.
See: http://python.6.x6.nabble.com/multiprocessing-with-QApplication-td4977972.html
I am writing a program that uses multiple worker processes (a pre-forking model) with the following code.
from multiprocessing import Process
for i in range(0,3):
Process(target=worker, args=(i,)).start()
I use Windows. I notice that they are run as separate processes when I wanted them to start as subprocesses instead. How do I make them subprocesses of the main process?
I am hesitant to use the subprocess module as it seems suited to run external processes (as far as I have used it).
An update: It seems Windows does not launch new processes as sub-processes. Python doesnt support getppid() (get parent's PID) in Windows.
What do you wall subprocess ? To me they are subprocess of your main process. Here my example and returned output.
import time, os
from multiprocessing import Process
def worker():
print "I'm process %s, my father is %s" % (os.getpid(), os.getppid())
print "I'm the main process %s" % os.getpid()
for i in range(0,3):
Process(target=worker).start()
The output is :
I'm the main process 5897
I'm process 5898, my father is 5897
I'm process 5899, my father is 5897
I'm process 5900, my father is 5897
You have 3 subprocesses attached to a main process...
You seem to be confusing terminology here. A subprocess is a separate process. The processes that are created will be children of the main process of your program, and in that sense are subprocesses. If you want threads, then use multithreading instead of multiprocessing, but note that Python won't use multiple cores/CPUs for multiple threads.
I am hesitant to use the subprocess module as it seems suited to run external processes
I'm sorry, I don't understand this remark.
Short answer: http://docs.python.org/library/threading.html
Longer: I don't understand the question, aitchnyu. In the typical Unix model, the only processes a process can start are subprocesses. I have a strong feeling that there's a vocabulary conflict between the two of us I don't know how to unravel. You seem to have something like an "internal process" in mind; what's an example of that, in any language or operating system?
I can attest that Python's subprocess module is widely used.
You write "... multiple working threads ..." Have you read the documentation to which I refer in the first line at the top of this response?
I have been searching for a way to start and terminate a long-running "batch jobs" in python. Right now I'm using "os.system()" to launch a long-running batch job inside each child process. As you might have guessed, "os.system()" spawns a new process inside that child process (grandchild process?), so I cannot kill the batch job from the grand-parent process. To provide some visualization of what I have just described:
Main (grandparent) process, with PID = AAAA
|
|------> child process with PID = BBBB
|
|------> os.system("some long-running batch file)
[grandchild process, with PID = CCCC]
So, my problem is I cannot kill the grandchild process from the grandparent...
My question is, is there a way to start a long-running batch job inside a child process, and being able to kill that batch job by just terminating the child process?
What are the alternatives to os.system() that I can use so that I can kill the batch-job from the main process ?
Thanks !!
subprocess module is the proper way to spawn and control processes in Python.
from the docs:
The subprocess module allows you to
spawn new processes, connect to their
input/output/error pipes, and obtain
their return codes. This module
intends to replace several other,
older modules and functions, such as:
os.systemos.spawnos.popenpopen2commands
so... if you are on Python 2.4+, subprocess is the replacement for os.system
for stopping processes, check out the terminate() and communicate() methods of Popen objects.
If you are on a Posix-compatible system (e.g., Linux or OS X) and no Python code has to be run after the child process, use os.execv. In general, avoid os.system and use the subprocess module instead.
If you want control over start and stop of child processes you have to use threading. In that case, look no further than Python's threading module.