What are the Windows equivalents to the resource limit mechanisms exposed on Unix systems by Python's resource module, and POSIX setrlimit?
Specifically, I'm limiting processor time for a child process to several seconds. If it hasn't completed within the constraint, it's terminated.
AFAIK, there is no portable way of getting information about the amount of processor time used by a child process in Python. But what subprocess module does (assuming you're starting the child with subprocess.Popen, which is recommended) give you is the process ID of the child process in Popen.pid. What you could do on Windows is run tasklist (see manual) using subprocess.check_output repeatedly and extract the info about the child proces from its output, using the PID as a filter.
As soon as the child process has has enough CPU time and if you used subprocess.Popen() to start the child process, you could use the Popen.kill method to kill it.
But I think it would be easier to kill the child process after after a specified number of seconds of wall time using a timer. Because if the child process hangs without using CPU time (for whatever reason), so does your python program that is waiting for it to consume CPU time.
Related
I wrote a data analysis program with python's multiprocessing library for parallelism. As I don't need to control the subprocess in detail, I used the multiprocessing.Pool for simplicity.
However, when running the program, I find all the sub-processes fall into status S(SLEEPING) after a short period of active(Running) state.
I investigated the wchan of the processes. The parent process and all but one sub-processes are waiting for _futex, the other one is waiting for pipe_wait.
Some information about my program:
I used multiprocessing.Pool#map to distribute the tasks.
The sub-process task contains disk IO and high memory usage. During the course of the program, the sub-process memory cost may exceed the memory capacity (32 sub-processes each takes at most 5% memory). The disk space is ample.
The arguments and return values of the mapped function are not very large in size (just the filenames of the file to be processed, to be specific).
I didn't explicitly create any pipe in my code.
This is the code skeleton of my program.
# imports emitted
def subprocess_task(filename):
read_the_file(filename) # Large disk IO
process_the_data() # High memory cost
write_the_file(new_filename) # Large disk IO
return newfile_name
if __name__=="__main__":
files=["","",...] # The filename of files to process, len(files)=32.
p=multiprocessing.Pool(32) # There are more than 32 cores on the computer.
res=p.map(subprocess_task,files)
p.close()
# Do something with res.
So I want to know why the processes stuck in such a state(especially the pipe_waiting one)? Does it have anything to do with the high memory usage, and how do I solve it?
Much thanks!
OK, after some efforts digging into pipe(7), multiprocessing source code and the log of my troublesome program, I finally identified the problem.
The sole child process which is pipe_wait seems suspicious, because of which I wasted hours trying to find the blocking pipe. However, the key problem has nothing to do with pipes.
The problem is solved when I put some print reporting the pid at some checkpoints in my program. The processes is not same when the tasks are submitted (which I will refer to as original processes) and when the program got stuck (referred as the stuck processes). One of the original 32 child processes is missing in the stuck processes, and the only stuck process which is pipe_wait is not present when the tasks are submitted.
So I can guess the reason now. And the multiprocessing source code corresponds with my guess.
As I said, the program costs lots of memory. At some point when the system is out of memory, the OOM killer kills one of the child processes, selected by some certain algorithm. The OOM killer is forcible and the process exited with all the finishing undone, which includes the communication with the multiprocessing.Pool.
As the source code indicates, the pool uses one thread to collect the task results, and another to manage the workers. The collector thread passively waits for the result to be sent by the child process, while the worker manager thread actively detects process exit by polling all processes.
Therefore, after the process is killed, the worker manager thread detects it, and repopulates the pool by spawning a new process. As no more task is submitted, the process is pipe_wait for some new task. That's the sole pipe_wait child process in my problem. Meanwhile, the result collector thread keeps waiting for the result from the killed thread, which will never arrive. So the other threads are also sleeping.
I have no root access to the environment, or this could be further verified by investigating OOM killer log.
I am trying to implement a job queuing system like torque PBS on a cluster.
One requirement would be to kill all the subprocesses even after the parent has exited. This is important because if someone's job doesn't wait its subprocesses to end, deliberately or unintentionally, the subprocesses become orphans and get adopted by process init, then it will be difficult to track down the subprocesses and kill them.
However, I figured out a trick to work around the problem, the magic trait is the cpu affinity of the subprocesses, because all subprocesses have the same cpu affinity with their parent. But this is not perfect, because the cpu affinity can be changed deliberately too.
I would like to know if there are anything else that are shared by parent process and its offspring, at the same time immutable
The process table in Linux (such as in nearly every other operating system) is simply a data structure in the RAM of a computer. It holds information about the processes that are currently handled by the OS.
This information includes general information about each process
process id
process owner
process priority
environment variables for each process
the parent process
pointers to the executable machine code of a process.
Credit goes to Marcus Gründler
Non of the information available will help you out.
But you can maybe use that fact that the process should stop, when the parent process id becomes 1(init).
#!/usr/local/bin/python
from time import sleep
import os
import sys
#os.getppid() returns parent pid
while (os.getppid() != 1):
sleep(1)
pass
# now that pid is 1, we exit the program.
sys.exit()
Would that be a solution to your problem?
There's 5 processes created and run by subprocess.Popen(someexternalCommand). It's important for the commands to be started sequentially, i.g. the second process must not start until the first has started and running.
The following code was used :
proc1 = TrafficGenUtils.Popen(someexternalCommand)
time.sleep(10)
proc2 = TrafficGenUtils.Popen(someexternalCommand)
time.sleep(10)
...
the order of execution was maintained when the number of processes was 3, but with 5 it's almost never maintained and increasing sleep to even 60 seconds doesn't help.
How to force the execution of Popen processes to be sequential.
Is the reason my code doesn't work linked to the fact that Popen relies on OS to create the processes but OS starts them whenever it pleases? I'm using Windows 8 and python 2.7
Popen starts a new process and therefore from that moment on it's up to the OS to schedule its run time. Not only can you not assume one process will run before another, but you can't even assume your parent process will yield execution to the child, at its actually likely not to.
In order to create sequenced execution you can use proc1.wait() to stop the parent process until the child process finishes its execution. This way you can guarantee that the processes will run in order, since the parent process will not go on until the child process ends and thus will not spawn the other children.
Read more in the Python Documentation.
I have to monitor a process continuously and I use the process ID to monitor the process. I wrote a program to send an email once the process had stopped so that I would manually reschedule it, but often I forget to reschedule the process ( basically another python program). I then came across the apscheduler module and used the cron style scheduling ( http://packages.python.org/APScheduler/cronschedule.html) to spawn a process once it has stopped. Now, I am able to spawn the process once PID of the process has been killed, but when I spawn it using the apscheduler I am not able to get the process id (PID) of the newly scheduled process; Hence, I am not able to monitor the process. Is there a function in apscheduler to get the process ID of the scheduled process?
Instead of relying on APSchedule to return the pid, why not have your program report the pid itself. It's quite common for daemons to have pidfiles, which are files at a known location that just contain the pid of the running process. Just wrap your main function in something like this:
import os
try:
with open("/tmp/myproc.pid") as pidfile:
pidfile.write(str(os.getpid()))
main()
finally:
os.remove("/tmp/myproc.pid")
Now whenever you want to monitor your process you can firstly check to see in the pid file exists, and if it does, retrieve the pid of the process for further monitoring. This has the benefit of being independent of a specific implementation of cron, and will make it easier in future if you want to write more programs that interact with the program locally.
I have been searching for a way to start and terminate a long-running "batch jobs" in python. Right now I'm using "os.system()" to launch a long-running batch job inside each child process. As you might have guessed, "os.system()" spawns a new process inside that child process (grandchild process?), so I cannot kill the batch job from the grand-parent process. To provide some visualization of what I have just described:
Main (grandparent) process, with PID = AAAA
|
|------> child process with PID = BBBB
|
|------> os.system("some long-running batch file)
[grandchild process, with PID = CCCC]
So, my problem is I cannot kill the grandchild process from the grandparent...
My question is, is there a way to start a long-running batch job inside a child process, and being able to kill that batch job by just terminating the child process?
What are the alternatives to os.system() that I can use so that I can kill the batch-job from the main process ?
Thanks !!
subprocess module is the proper way to spawn and control processes in Python.
from the docs:
The subprocess module allows you to
spawn new processes, connect to their
input/output/error pipes, and obtain
their return codes. This module
intends to replace several other,
older modules and functions, such as:
os.systemos.spawnos.popenpopen2commands
so... if you are on Python 2.4+, subprocess is the replacement for os.system
for stopping processes, check out the terminate() and communicate() methods of Popen objects.
If you are on a Posix-compatible system (e.g., Linux or OS X) and no Python code has to be run after the child process, use os.execv. In general, avoid os.system and use the subprocess module instead.
If you want control over start and stop of child processes you have to use threading. In that case, look no further than Python's threading module.