I am writing a program that uses multiple worker processes (a pre-forking model) with the following code.
from multiprocessing import Process

for i in range(0, 3):
    Process(target=worker, args=(i,)).start()
I use Windows. I notice that they are run as separate processes when I wanted them to start as subprocesses instead. How do I make them subprocesses of the main process?
I am hesitant to use the subprocess module as it seems suited to run external processes (as far as I have used it).
An update: It seems Windows does not launch new processes as subprocesses. Python doesn't support os.getppid() (get the parent's PID) on Windows.
What do you call a subprocess? To me, they are subprocesses of your main process. Here is my example and the output it returns.
import time, os
from multiprocessing import Process

def worker():
    print "I'm process %s, my father is %s" % (os.getpid(), os.getppid())

print "I'm the main process %s" % os.getpid()

for i in range(0, 3):
    Process(target=worker).start()
The output is:
I'm the main process 5897
I'm process 5898, my father is 5897
I'm process 5899, my father is 5897
I'm process 5900, my father is 5897
You have 3 subprocesses attached to a main process...
You seem to be confusing terminology here. A subprocess is a separate process. The processes that are created will be children of your program's main process, and in that sense are subprocesses. If you want threads, then use the threading module instead of multiprocessing, but note that Python won't use multiple cores/CPUs for multiple threads (because of the GIL).
I am hesitant to use the subprocess module as it seems suited to run external processes
I'm sorry, I don't understand this remark.
Short answer: http://docs.python.org/library/threading.html
Longer: I don't understand the question, aitchnyu. In the typical Unix model, the only processes a process can start are subprocesses. I have a strong feeling that there's a vocabulary conflict between the two of us I don't know how to unravel. You seem to have something like an "internal process" in mind; what's an example of that, in any language or operating system?
I can attest that Python's subprocess module is widely used.
You write "... multiple working threads ..." Have you read the documentation to which I refer in the first line at the top of this response?
I am using a ProcessPool from the Pebble library to launch a subprocess that is prone to crashing. I'd like to log the process id of the crashed subprocess, but from the main process rather than the child process (the reason is that I have a log line in the main process with a bunch of relevant information related to one request, and I want to include the pid there instead of having it scattered across multiple log lines). Is there some way to access this process id? I can't seem to find this information in the documentation.
I guess as a workaround I can get the pid in the subprocess before doing anything using os.getpid() and use IPC to communicate it back to the parent process. But I'd like to avoid this if possible.
The ProcessPool is designed to abstract its inner workings from the user. Therefore, it hides access to the processes used to execute the workers.
If you need this information just for debugging purposes, my suggestion would be to mark your jobs with unique identifiers and then log, from the worker processes, both these identifiers and the worker PID. In this way you can correlate which job is causing your function to crash.
def function(jobid, *args):
    logging.debug("Job ID %d started on worker %d", jobid, os.getpid())
    ...

pool.schedule(function, (jobid, arg1, arg2))
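A fuller sketch of that idea, with the job ids, payloads, and error handling as illustrative assumptions rather than part of Pebble's API: the worker logs the job id together with its own PID, and the main process logs only the job id, so the two log lines can be correlated afterwards.

import logging
import os

from pebble import ProcessPool

def function(jobid, *args):
    # Runs in the worker process: tie the job id to this worker's PID.
    logging.debug("Job ID %d started on worker %d", jobid, os.getpid())
    return args

def main():
    logging.basicConfig(level=logging.DEBUG)
    pool = ProcessPool()
    for jobid, payload in enumerate(["a", "b", "c"]):
        future = pool.schedule(function, (jobid, payload))
        try:
            future.result()
        except Exception:
            # The main process knows only the job id; the matching worker
            # PID is in the log line emitted by the worker above.
            logging.exception("Job ID %d crashed", jobid)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()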
In one of my Django views, I am calling a python script and getting its pid with:
from subprocess import Popen
p = Popen(['python', 'script.py'])
mypid = p.pid
When trying to find out if the process still is running from another page, I use the following function on mypid (thanks to this question):
import os, errno

def doesProcessExist(pid):
    if pid < 0:
        return False
    try:
        os.kill(pid, 0)
    except OSError, e:
        return e.errno == errno.EPERM
    else:
        return True
No matter how long I wait, the process still shows up as running. The only thing that stops it is spawning a new python script process with Popen. Is there any way I can fix this? I am not sure if this is caused by Django not closing python properly after the script is finished, or by something else. In Ubuntu's process status manager, the process shows up as [python] <defunct>.
--
The problem is true for all script.py I have tried. I am currently using one as simple as:
from time import sleep
sleep(5)
Really, what you're doing is wrong. When you use a high-level wrapper like subprocess.Popen, you need to manage the process through that object. Just having the PID elsewhere isn't enough to manage it.
If you insist on dealing in PIDs instead of Popen objects, then you should use the low-level APIs in os.
Fortunately, you're not doing anything complicated, like creating pipes to talk to the child process. So, you can just launch it with your favorite spawn variant, then wait for it with waitpid or one of its variants.
I'm assuming you're doing this all in a single-process web server. If you're using a forking web server, where the other page could be in a different process, even using PIDs won't work. The parent process has to reap the child, not some other arbitrary process. If you want to make that work, you'll have to make things more complicated, and you're really going to have to learn about the Unix process model before anyone can explain it to you.
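For example, a minimal sketch of that low-level approach in a single-process server, where the spawn call and the helper function are illustrative assumptions:

import os

# Launch script.py without waiting for it (a spawn variant, as described above).
pid = os.spawnvp(os.P_NOWAIT, 'python', ['python', 'script.py'])

def does_process_exist(pid):
    try:
        reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    except OSError:
        # No such child: it has already been reaped (or was never our child).
        return False
    # waitpid() returns (0, 0) while the child is still running; a non-zero
    # pid means it has just exited and been reaped.
    return reaped_pid == 0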
What you see is a zombie process. It doesn't keep running. It can't. It is dead. The only thing that is left is some info that allows for related processes to retrieve its status.
To find out whether a subprocess is alive without blocking, call p.poll(). If it returns None then the process is still alive, otherwise you can safely forget about it (it is already reaped by .poll()).
The subprocess module calls a _cleanup() function inside the Popen() constructor that reaps zombie processes, so normally your script won't accumulate many zombies anyway.
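A minimal sketch of that approach for the Django case above, where the module-level registry and the helper functions are illustrative assumptions: keep the Popen objects around and check them with poll() instead of calling os.kill(pid, 0).

from subprocess import Popen

_running = {}  # pid -> Popen object (kept in the same process that spawned it)

def start_script():
    p = Popen(['python', 'script.py'])
    _running[p.pid] = p
    return p.pid

def does_process_exist(pid):
    p = _running.get(pid)
    if p is None:
        return False
    if p.poll() is None:
        return True          # still running
    del _running[pid]        # finished; poll() has already reaped it
    return False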
To see a list of zombie processes:
import os
#NOTE: don't use Popen() here
print os.popen(r"ps aux | grep Z | grep -v grep").read(),
Processes in Unix stick around until the parent waits for them. Calling wait() on the object returned by Popen will block until the process is done and reap it so that it goes away. Until you do that, it will exist as a zombie process. See this message for info on getting the process to go away in the background while your web server runs, without waiting for it in a foreground thread/view.
So, let's say that you do
p = subprocess.Popen(...)
At some point you need to call
p.wait()
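If you do not want the view to block on wait(), a minimal sketch of the background approach mentioned above, with the helper name as an illustrative assumption: hand the Popen object to a daemon thread that reaps it.

import threading
from subprocess import Popen

def launch_in_background(cmd):
    p = Popen(cmd)
    # The daemon thread reaps the child as soon as it exits, so it never
    # lingers as a zombie and the view can return immediately.
    reaper = threading.Thread(target=p.wait)
    reaper.daemon = True
    reaper.start()
    return p.pid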
I had some annoyances with spawning subprocesses, like getting correct output and so on. A wrapper library, envoy, solved all of my problems with an easy-to-use interface that hides most of the rough edges.
Using threads, I sometimes struggle with hanging processes that do not end, external programs launched within threads that I can't reach any more, and so on.
Is there any "threading for dummies" python library out there? Thanks
Is there any "threading for dummies" python library out there?
No, there is not. threading is pretty simple to use in simple cases. You use it to introduce concurrency in your program, i.e. whenever you want two or more actions to happen at the same time.
This is how you can let Peter build a house and let Igor drive to Moskow at the same time:
from threading import Thread
import time

def drive_bus():
    time.sleep(1)
    print "Igor: I'm Igor and I'm driving to... Moskow!"
    time.sleep(9)
    print "Igor: Yei, Moskow!"

def build_house():
    print "Peter: Let's start building a large house..."
    time.sleep(10.1)
    print "Peter: Urks, we have no tools :-("

threads = [Thread(target=drive_bus), Thread(target=build_house)]

for t in threads:
    t.start()

for t in threads:
    t.join()
Isn't that simple? Define the function to be run in another thread and create a threading.Thread instance with that function as target. Nothing has happened so far; only when you invoke start does the thread fire off, and start returns immediately.
Before letting your main thread exit, you should wait for all the threads you have spawned to finish. This is what t.join() does: it blocks and waits for the thread t to finish, and only then does it return.
I would recommend reading more about the actual Python library - it is simple enough. Your problem with hanging threads, provided it prevents your application from exiting, may be solved by using daemon threads.
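For example, a minimal sketch of the daemon-thread idea, where the worker function is an illustrative stand-in for a hanging task: a daemon thread cannot keep the interpreter alive on its own, so it is killed automatically when the main thread exits.

from threading import Thread
import time

def possibly_hanging_worker():
    # Simulates a task that never finishes on its own.
    while True:
        time.sleep(1)

t = Thread(target=possibly_hanging_worker)
t.daemon = True   # daemon threads are killed when the main thread exits
t.start()

time.sleep(2)
print "Main thread exiting; the daemon thread dies with it."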
What kind of task are you trying to achieve? If you are trying to run a task in parallel without actual use of the custom threading, you may find the package multiprocessing useful. Furthermore, there is an interesting piece of information on the python wiki about parallel processing.
Could you elaborate a bit more on the task please?
I want to start, from Python, some other Python code, preferably a function, but in another process.
It is mandatory to run this in another process, because I want to run some concurrency tests, like opening a file that was opened exclusively by the parent process (this has to fail).
Requirements:
multiplatform: linux, osx, windows
compatible with Python 2.6-3.x
I would seriously take a look at the documentation for Python's multiprocessing library. From the first sentence of the package's description:
multiprocessing is a package that supports spawning processes using an API similar to the threading module.
It then goes on to say that it side-steps the GIL, which is what it sounds like you're trying to avoid. See their example of a trivial set up:
from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
That's a function call being done in another process separate from the process you're inside. Again, all this from the documentation.
I have been searching for a way to start and terminate long-running "batch jobs" in Python. Right now I'm using os.system() to launch a long-running batch job inside each child process. As you might have guessed, os.system() spawns a new process inside that child process (a grandchild process?), so I cannot kill the batch job from the grandparent process. To provide some visualization of what I have just described:
Main (grandparent) process, with PID = AAAA
    |
    |------> child process with PID = BBBB
                  |
                  |------> os.system("some long-running batch file")
                           [grandchild process, with PID = CCCC]
So, my problem is I cannot kill the grandchild process from the grandparent...
My question is, is there a way to start a long-running batch job inside a child process, and being able to kill that batch job by just terminating the child process?
What are the alternatives to os.system() that I can use so that I can kill the batch-job from the main process ?
Thanks !!
The subprocess module is the proper way to spawn and control processes in Python.
From the docs:
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as: os.system, os.spawn*, os.popen*, popen2.*, commands.*
so... if you are on Python 2.4+, subprocess is the replacement for os.system
For stopping processes, check out the terminate() and communicate() methods of Popen objects.
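A minimal sketch of that approach, assuming a Unix-like system and an illustrative script name: start the batch job directly with Popen instead of os.system, so the main process holds a handle it can terminate.

import subprocess

# Start the long-running batch job directly, so we get a Popen handle.
p = subprocess.Popen(['/bin/sh', 'long_running_job.sh'])

# ... later, when the job should be stopped:
p.terminate()   # sends SIGTERM on Unix (TerminateProcess on Windows)
p.wait()        # reap the child so it does not linger as a zombie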
If you are on a Posix-compatible system (e.g., Linux or OS X) and no Python code has to be run after the child process, use os.execv. In general, avoid os.system and use the subprocess module instead.
If you want control over start and stop of child processes you have to use threading. In that case, look no further than Python's threading module.