There are 5 processes created and run by subprocess.Popen(someexternalCommand). It's important for the commands to be started sequentially, i.e. the second process must not start until the first has started and is running.
The following code was used:
proc1 = TrafficGenUtils.Popen(someexternalCommand)
time.sleep(10)
proc2 = TrafficGenUtils.Popen(someexternalCommand)
time.sleep(10)
...
The order of execution was maintained when the number of processes was 3, but with 5 it is almost never maintained, and increasing the sleep to even 60 seconds doesn't help.
How can I force the execution of the Popen processes to be sequential?
Is the reason my code doesn't work linked to the fact that Popen relies on the OS to create the processes, but the OS starts them whenever it pleases? I'm using Windows 8 and Python 2.7.
Popen starts a new process, and from that moment on it's up to the OS to schedule its run time. Not only can you not assume one process will run before another, you can't even assume your parent process will yield execution to the child; in fact, it's likely not to.
In order to create sequenced execution you can use proc1.wait() to stop the parent process until the child process finishes its execution. This way you can guarantee that the processes will run in order, since the parent process will not go on until the child process ends and thus will not spawn the other children.
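For illustration, here is a minimal sketch of that pattern using plain subprocess.Popen (the command lists are placeholders, not the original someexternalCommand values):

import subprocess

# Placeholder command lines; substitute the real external commands here.
commands = [
    ['cmd1.exe', '--arg'],
    ['cmd2.exe', '--arg'],
    ['cmd3.exe', '--arg'],
]

for cmd in commands:
    proc = subprocess.Popen(cmd)
    proc.wait()  # block until this child exits before launching the next one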
Read more in the Python Documentation.
Related
I'm using python multiprocessing module to parallelize some computationally heavy tasks.
The obvious choice is to use a Pool of workers and then use the map method.
However, processes can fail. For instance, they may be silently killed by the oom-killer. Therefore I would like to be able to retrieve the exit code of the processes launched with map.
Additionally, for logging purposes, I would like to be able to know the PID of the process launched to execute each value in the iterable.
If you're using multiprocessing.Pool.map you're generally not interested in the exit code of the sub-processes in the pool, you're interested in what value they returned from their work item. This is because under normal conditions, the processes in a Pool won't exit until you close/join the pool, so there are no exit codes to retrieve until all work is complete, and the Pool is about to be destroyed. Because of this, there is no public API to get the exit codes of those sub-processes.
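For the normal case, a minimal sketch of that usage might look like this (square is a placeholder work function; the list of returned values is what you care about, not exit codes):

import multiprocessing

def square(x):
    # placeholder work function
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    print pool.map(square, [1, 2, 3, 4])  # prints [1, 4, 9, 16]
    pool.close()
    pool.join()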
Now, you're worried about exceptional conditions, where something out-of-band kills one of the sub-processes while it's doing work. If you hit an issue like this, you're probably going to run into some strange behavior. In fact, in my tests where I killed a process in a Pool while it was doing work as part of a map call, map never completed, because the killed process didn't complete. Python did, however, immediately launch a new process to replace the one I killed.
That said, you can get the pid of each process in your pool by accessing the multiprocessing.Process objects inside the pool directly, using the private _pool attribute:
import multiprocessing

pool = multiprocessing.Pool()
for proc in pool._pool:
    print proc.pid
So, one thing you could do is try to detect when a process has died unexpectedly (assuming you don't get stuck in a blocking call as a result). You can do this by examining the list of processes in the pool before and after making a call to map_async:
before = pool._pool[:]  # Make a copy of the list of Process objects in our pool
result = pool.map_async(func, iterable)  # Use map_async so we don't get stuck.
while not result.ready():  # Wait for the call to complete
    if any(proc.exitcode for proc in before):  # Abort if one of our original processes is dead.
        print "One of our processes has exited. Something probably went horribly wrong."
        break
    result.wait(timeout=1)
else:  # We'll enter this block if we don't reach `break` above.
    print result.get()  # Actually fetch the result list here.
We have to make a copy of the list because when a process in the Pool dies, Python immediately replaces it with a new process, and removes the dead one from the list.
This worked for me in my tests, but because it's relying on a private attribute of the Pool object (_pool) it's risky to use in production code. I would also suggest that it may be overkill to worry too much about this scenario, since it's very unlikely to occur and complicates the implementation significantly.
I am trying to implement a job queuing system like torque PBS on a cluster.
One requirement would be to kill all the subprocesses even after the parent has exited. This is important because if someone's job doesn't wait for its subprocesses to end, deliberately or unintentionally, the subprocesses become orphans and get adopted by the init process, and then it becomes difficult to track them down and kill them.
However, I figured out a trick to work around the problem: the magic trait is the CPU affinity of the subprocesses, because all subprocesses share the same CPU affinity as their parent. But this is not perfect, because the CPU affinity can be changed deliberately too.
I would like to know if there is anything else that is shared by a parent process and its offspring and is, at the same time, immutable.
The process table in Linux (as in nearly every other operating system) is simply a data structure in the RAM of a computer. It holds information about the processes that are currently handled by the OS.
This information includes general information about each process:
process id
process owner
process priority
environment variables for each process
the parent process
pointers to the executable machine code of a process.
Credit goes to Marcus Gründler
None of the information available will help you out.
But you can maybe use the fact that the process should stop when the parent process id becomes 1 (init).
#!/usr/local/bin/python
from time import sleep
import os
import sys

# os.getppid() returns the parent pid
while os.getppid() != 1:
    sleep(1)

# now that the parent pid is 1 (init), we exit the program.
sys.exit()
Would that be a solution to your problem?
In one of my Django views, I am calling a python script and getting its pid with:
from subprocess import Popen
p = Popen(['python', 'script.py'])
mypid = p.pid
When trying to find out if the process still is running from another page, I use the following function on mypid (thanks to this question):
import os
import errno

def doesProcessExist(pid):
    if pid < 0:
        return False
    try:
        os.kill(pid, 0)
    except OSError, e:
        return e.errno == errno.EPERM
    else:
        return True
No matter how long I wait, the process still shows up as running. The only thing that stops it is if I spawn a new python script process with Popen. Is there any way I can fix this? I am not sure if this is caused by Django not closing python properly after the script is finished, or by something else. In Ubuntu's process status manager, the process shows up as [python] <defunct>.
--
The problem occurs for every script.py I have tried. I am currently using one as simple as:
from time import sleep
sleep(5)
Really, what you're doing is wrong. When you use a high-level wrapper like subprocess.Popen, you need to manage the process through that object. Just having the PID elsewhere isn't enough to manage it.
If you insist on dealing in PIDs instead of Popen objects, then you should use the low-level APIs in os.
Fortunately, you're not doing anything complicated, like creating pipes to talk to the child process. So, you can just launch it with your favorite spawn variant, then wait for it with waitpid or one of its variants.
I'm assuming you're doing this all in a single-process web server. If you're using a forking web server, where the other page could be in a different process, even using PIDs won't work. The parent process has to reap the child, not some other arbitrary process. If you want to make that work, you'll have to make things more complicated, and you're really going to have to learn about the Unix process model before anyone can explain it to you.
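As an illustration of the low-level route (assuming mypid belongs to a direct child of the current process), a non-blocking check with os.waitpid might look roughly like this:

import os

# os.WNOHANG makes waitpid return (0, 0) if the child is still running.
# This only works when mypid is a direct child of this process.
finished_pid, status = os.waitpid(mypid, os.WNOHANG)
if finished_pid == 0:
    print "child is still running"
else:
    print "child has exited; raw wait status:", status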
What you see is a zombie process. It doesn't keep running. It can't. It is dead. The only thing that is left is some info that allows for related processes to retrieve its status.
To find out whether a subprocess is alive without blocking, call p.poll(). If it returns None then the process is still alive, otherwise you can safely forget about it (it is already reaped by .poll()).
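For example, with the Popen object p from the question, a non-blocking liveness check could look like this:

rc = p.poll()
if rc is None:
    print "script.py is still running"
else:
    print "script.py finished with return code", rc  # the child has been reaped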
The subprocess module calls a _cleanup() function that reaps zombie processes inside the Popen() constructor, so normally your script won't create many zombie processes anyway.
To see a list of zombie processes:
import os
#NOTE: don't use Popen() here
print os.popen(r"ps aux | grep Z | grep -v grep").read(),
Processes in Unix stick around until the parent waits for them. Calling wait on the object returned by Popen will wait for the process to be done and reap it, so it goes away. Until you do that it will exist as a zombie process. See this message for info on getting the process to go away in the background while your web server runs, without waiting for it in a foreground thread/view.
So, let's say that you do
p = subprocess.Popen(...)
At some point you need to call
p.wait()
What are the Windows equivalents to the resource limit mechanisms exposed on Unix systems by Python's resource module, and POSIX setrlimit?
Specifically, I'm limiting processor time for a child process to several seconds. If it hasn't completed within the constraint, it's terminated.
AFAIK, there is no portable way of getting information about the amount of processor time used by a child process in Python. But what the subprocess module does give you (assuming you're starting the child with subprocess.Popen, which is recommended) is the process ID of the child in Popen.pid. What you could do on Windows is run tasklist (see the manual) using subprocess.check_output repeatedly and extract the info about the child process from its output, using the PID as a filter.
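A rough sketch of that polling idea is shown below; the child command is a placeholder, and the tasklist flags (/FI to filter on PID, /V for verbose output including the CPU Time column, /FO CSV for easier parsing) should be checked against the manual:

import subprocess

child = subprocess.Popen(['some_child.exe'])  # placeholder child command

# Ask tasklist about this PID only, in verbose CSV form.
output = subprocess.check_output(
    ['tasklist', '/FI', 'PID eq %d' % child.pid, '/V', '/FO', 'CSV'])
print output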
As soon as the child process has had enough CPU time, and if you used subprocess.Popen() to start the child process, you could use the Popen.kill method to kill it.
But I think it would be easier to kill the child process after a specified number of seconds of wall time using a timer, because if the child process hangs without using CPU time (for whatever reason), so does your Python program that is waiting for it to consume CPU time.
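A minimal sketch of that wall-time approach, using threading.Timer together with Popen.kill (the command line is a placeholder):

import subprocess
import threading

proc = subprocess.Popen(['some_child.exe'])   # placeholder command
timer = threading.Timer(10.0, proc.kill)      # kill the child after 10 s of wall time
timer.start()
proc.wait()
timer.cancel()  # the child finished in time, so cancel the pending kill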
I am experiencing some problems when using subprocess.Popen() to spawn several instances of the same application from my python script, using threads to have them running simultaneously. In each thread I run the application using the Popen() call, and then I wait for it to finish by calling wait(). The problem seems to be that the wait() call does not actually wait for the process to finish. I experimented by using only one thread, and by printing out text messages when the process starts and when it finishes. So the thread function would look something like this:
def worker():
    while True:
        job = q.get()  # q is a global Queue of jobs
        print('Starting process %d' % job['id'])
        proc = subprocess.Popen(job['cmd'], shell=True)
        proc.wait()
        print('Finished process %d' % job['id'])
        job.task_done()
But even when I only use one thread, it will print out several "Starting process..." messages, before any "Finished process..." message appears. Are there any cases when wait() does not actually wait? I have several different external applications (C++ console applications), which in turn will have several instances running simultaneously, and for some of them my code works, but for others it won't. Could there be some issue with the external applications that somehow affects the call to wait()?
The code for creating the threads looks something like this:
for i in range(1):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

q.join()  # Wait for the queue to empty
Update 1:
I should also add that for some of the external applications I sometimes get a return code (proc.returncode) of -1073471801. For example, one of the external applications will give that return code the first two times Popen is called, but not the last two (when I have four jobs).
Update 2:
To clear things up, right now I have four jobs in the queue, which are four different test cases. When I run my code, for one of the external applications the first two Popen-calls generate the return code -1073471801. But if I print the exact command which Popen calls, and run it in a command window, it executes without any problems.
Solved!
I managed to solve the issues I was having. I think the problem was my lack of experience in threaded programming. I missed the fact that when I had created my first worker threads, they would keep on living until the python script exits. By mistake I created more worker threads each time I put new items on the queue (I do that in batches for every external program I want to run). So by the time I got to the fourth external application, I had four threads running simultaneously even though I only thought I had one.
You could also use check_call() instead of Popen. check_call() waits for the command to finish, even when shell=True, and then checks the exit code of the job, raising CalledProcessError if it is non-zero.
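A short sketch of that variant (the command string is a placeholder):

import subprocess

# Blocks until the command finishes; raises CalledProcessError on a
# non-zero exit code, so a normal return means the job succeeded.
subprocess.check_call('some_command --arg', shell=True)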
Sadly when running your subprocess using shell=True, wait() will only wait for the sh subprocess to finish and not for the command cmd.
I suggest not using shell=True if at all possible; if that's not possible, you can create a process group as in this answer and use os.waitpid to wait for the process group, not just the shell process.
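For instance, a rough sketch of the first option, passing the command as an argument list so no intermediate shell is involved and wait() waits for the command itself (the names are placeholders):

import subprocess

proc = subprocess.Popen(['myapp.exe', '--input', 'data.txt'])  # no shell=True
proc.wait()  # waits for myapp.exe itself, not an sh wrapper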
Hope this was helpful :)
Make sure all applications you are calling have valid system return codes when they finish.
I was having issues as well, but was inspired by yours.
Mine looks like this, and works beautifully:
startupinfo = subprocess.STARTUPINFO()
startupinfo.dwFlags = subprocess.STARTF_USESHOWWINDOW
startupinfo.wShowWindow = subprocess.SW_HIDE
proc = subprocess.Popen(command, startupinfo=startupinfo)
proc.communicate()
proc.wait()
Notice that this one hides the window as well.