I had an asynchronous function being called like this:
from multiprocessing import Process

def my_function(arg1, arg2):
    print 'Long process begins'

p = Process(target=my_function, args=(arg1, arg2,)).start()
How can I make this blocking? I need to finish the process before running the rest of the script.
Use p.join()
Block the calling thread until the process whose join() method is
called terminates or until the optional timeout occurs.
If timeout is None then there is no timeout.
A process can be joined many times.
A process cannot join itself because this would cause a deadlock. It
is an error to attempt to join a process before it has been started.
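One detail worth noting for the snippet in the question: start() returns None, so keep the Process object in its own variable and join it afterwards. A minimal sketch of the blocking version:

from multiprocessing import Process

def my_function(arg1, arg2):
    print 'Long process begins'

p = Process(target=my_function, args=(arg1, arg2,))
p.start()
p.join()  # blocks here until my_function has finished
# the rest of the script runs only after the process has exited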
I am writing a script that starts a process and checks its stdout (while it's running, not at the end of execution).
The obvious choice seemed to be a thread that blocks while reading lines from the process's stdout.
I have tested it with WSL2 bash using:
python __main__.py 'echo ok'
The outcome is random, resulting in one of the following cases:
Execution terminated without any output
"ok" printed as expected
"ok" printed follow by a 'ValueError: readline of closed file' exception
Any idea on what might be the problem ?
The code:
import argparse
from subprocess import Popen, PIPE
import sys
import threading

class ReadlineThread(threading.Thread):
    def __init__(self, proc):
        threading.Thread.__init__(self)
        self._proc = proc

    def run(self):
        while self._proc.poll() is None:
            line = self._proc.stdout.readline()
            sys.stdout.buffer.write(line)
            sys.stdout.flush()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('command', nargs='+', help='bar help')
    args = parser.parse_args()

    with Popen(args.command, stdout=PIPE, stderr=PIPE, shell=True) as proc:
        stdout_thread = ReadlineThread(proc)
        stdout_thread.start()

if __name__ == "__main__":
    main()
When you create a thread, it becomes part of the parent process. The parent process is the one that runs your main function. In your main function, you call stdout_thread.start(), which begins starting the thread and then immediately returns. After that, there is no more code in your main function, which results in Python shutting down the main process. Since your thread is part of the main process, it will be taken down when the main process terminates. Meanwhile, the thread you've started is still being created.
Here we have what is called a race condition. Your thread is starting while simultaneously the process it belongs to is shutting down. If your thread manages to start up and complete its work before the process terminates, you get your expected result. If the process terminates before the thread has started, you get no output. In the third situation, the process closes its stdout before the thread has finished reading it, resulting in an error.
To fix this, in your main function you should wait for your spawned thread to finish, which could be achieved by calling stdout_thread.join().
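A minimal sketch of that fix, reusing the main function from the question:

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('command', nargs='+', help='bar help')
    args = parser.parse_args()

    with Popen(args.command, stdout=PIPE, stderr=PIPE, shell=True) as proc:
        stdout_thread = ReadlineThread(proc)
        stdout_thread.start()
        stdout_thread.join()  # block until the reader thread has drained stdout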
I'm running the following python code:
import threading
import multiprocessing

def forever_print():
    while True:
        print("")

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    return

if __name__ == '__main__':
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process on control")
It terminates.
When I unwrapped main from the new process, and just ran it directly, like this:
if __name__ == '__main__':
    main()
The script went on forever, as I thought it should. Am I wrong to assume that, given that t is a non-daemon thread, p shouldn't halt in the first case?
I basically set up this little test because I've been developing an app in which threads are spawned inside subprocesses, and it's been showing some weird behaviour (sometimes it terminates properly, sometimes it doesn't). I guess what I wanted to know, in a broader sense, is whether there is some sort of "gotcha" when mixing these two Python libs.
My running environment: Python 2.7 on Ubuntu 14.04 LTS
For now, threads created by multiprocessing worker processes act like daemon threads with respect to process termination: the worker process exits without waiting for the threads it created to terminate. This is due to worker processes using os._exit() to shut down, which skips most normal shutdown processing (and in particular skips the normal exit processing code (sys.exit()) that .join()'s non-daemon threading.Threads).
The easiest workaround is for worker processes to explicitly .join() the non-daemon threads they create.
There's an open bug report about this behavior, but it hasn't made much progress: http://bugs.python.org/issue18966
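Applied to the example above, a minimal sketch of that workaround is to join the thread before main returns, so the worker process blocks (forever, in this toy case) instead of hitting os._exit() while the thread is still running:

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    t.join()  # explicitly wait for the non-daemon thread before returning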
You need to call t.join() in your main function.
As your main function returns, the process gets terminated with both its threads.
p.join() blocks the main thread waiting for the spawned process to end. Your spawned process then creates a thread but does not wait for it to end; it returns immediately, taking the thread down with it.
While threads share memory, processes don't. Therefore, the thread you create in the newly spawned process remains confined to that process. The parent process is not aware of it.
The gotcha is that the multiprocessing machinery calls os._exit() after your target function exits, which violently kills the child process, even if it has background threads running.
The code for Process.start() looks like this:
def start(self):
    '''
    Start child process
    '''
    assert self._popen is None, 'cannot start a process twice'
    assert self._parent_pid == os.getpid(), \
           'can only start a process object created by current process'
    assert not _current_process._daemonic, \
           'daemonic processes are not allowed to have children'
    _cleanup()
    if self._Popen is not None:
        Popen = self._Popen
    else:
        from .forking import Popen
    self._popen = Popen(self)
    _current_process._children.add(self)
Popen.__init__ looks like this:
def __init__(self, process_obj):
    sys.stdout.flush()
    sys.stderr.flush()
    self.returncode = None

    self.pid = os.fork()  # This forks a new process

    if self.pid == 0:  # This if block runs in the new process
        if 'random' in sys.modules:
            import random
            random.seed()
        code = process_obj._bootstrap()  # This calls your target function
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)  # Violent death of the child process happens here
The _bootstrap method is the one that actually executes the target function you passed to the Process object. In your case, that's main. main returns right after you start your background thread, even though the process doesn't exit at that point, because there's still a non-daemon thread running.
However, as soon as execution hits os._exit(code), the child process is killed, regardless of any non-daemon threads still executing.
I am using the multiprocessing module's Process class to spawn multiple processes; those processes execute some script and then die. What I wanted is a timeout applied to each process, so that a process dies if it can't finish within the timeout. I am using join(timeout) on the Process objects.
Since the join() function doesn't kill the process, it just blocks until the process finishes.
Now my question: are there any side effects of using join() with a timeout? For example, would the processes be cleaned up automatically after the main process dies, or do I have to kill those processes manually?
I am a newbie to Python and its multiprocessing module, so please be patient.
My code, which creates the Processes in a for loop:
q = Queue()

jobs = [
    Process(
        target=get_current_value,
        args=(q,),
        kwargs={
            'device': device,
            'service_list': service_list,
            'data_source_list': data_source_list
        }
    ) for device in device_list
]

for j in jobs:
    j.start()
for k in jobs:
    k.join()
The timeout argument just tells join how long to wait for the Process to exit before giving up. If the timeout expires, the Process does not exit; the join call simply unblocks. If you want to end your workers when the timeout expires, you need to do so manually. You can either use terminate, as suggested by wRAR, to uncleanly shut things down, or use some other signaling mechanism to tell the children to shut down cleanly:
p = Process(target=worker, args=(queue,))
p.start()
p.join(50)
if p.is_alive():  # join timed out without the process actually finishing
    p.terminate()  # unclean shutdown
If you don't want to use terminate, the alternative approach is really dependent on what the workers are doing. If they're consuming from a queue, you can use a sentinel:
def worker(queue):
    for item in iter(queue.get, None):  # None will break the loop
        # Do normal work
        pass

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    # Do normal work here
    # Time to shut down
    queue.put(None)
Or you could use an Event, if you're doing some other operation in a loop:
def worker(event):
    while not event.is_set():
        # Do work here
        pass

if __name__ == "__main__":
    event = multiprocessing.Event()
    p = multiprocessing.Process(target=worker, args=(event,))
    p.start()
    # Do normal work here
    # Time to shut down
    event.set()
Using terminate could be just fine, though, unless your child processes are using resources that could be corrupted if the process is unexpectedly shut down (like writing to a file or db, or holding a lock). If you're just doing some calculations in the worker, using terminate won't hurt anything.
join() does nothing with the child process. If you really want to terminate the worker process in a non-clean manner you should use terminate() (you should understand the consequences).
If you want children to be terminated when the main process exits, you should set the daemon attribute on them.
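A small sketch of that (the worker function here is just a placeholder, not from the original code):

from multiprocessing import Process

def worker():
    pass  # placeholder for the real work

p = Process(target=worker)
p.daemon = True  # must be set before start(); daemonic children are terminated when the parent exits
p.start()
# when the main process exits, p is terminated automatically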
Do subprocess calls in Python hang? That is, do subprocess calls operate in the same thread as the rest of the Python code, or is it a non-blocking model? I couldn't find anything in the docs or on SO on the matter. Thanks!
Most methods in the subprocess module are blocking, meaning that they wait for the subprocess to complete before returning. However, subprocess.Popen is non-blocking.
result = subprocess.call(cmd) # This will block until cmd is complete
p = subprocess.Popen(cmd) # This will return a Popen object right away
Once you have the Popen object, you can use the poll instance method to see if the subprocess is complete without blocking.
if p.poll() is None:  # Make sure you check against None, since it could return 0 when the process is complete
    print "Process is still running"
Subprocesses run in the background. In the subprocess module, there is a class called Popen that starts a process in the background. It has a wait() method you can use to wait for the process to finish. It also has a communicate() helper method that handles stdin/stdout/stderr and waits for the process to complete. The module also has convenience functions like call() and check_call() that create a Popen object and then wait for it to complete.
So, subprocess implements a non-blocking model but also gives you blocking helper functions.
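A quick sketch of both styles (the sleep command is just an example):

import subprocess

# Blocking: call() starts the process and waits for it to finish
ret = subprocess.call(["sleep", "2"])

# Non-blocking: Popen returns immediately while the command runs in the background
p = subprocess.Popen(["sleep", "2"])
# ... do other work here ...
out, err = p.communicate()  # wait for it to finish; both are None since no pipes were requested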
I have a Python script that calls another Python script. Inside the other script, some threads are spawned. How do I make the calling script wait until the called script is completely done running?
This is my code:
while len(mProfiles) < num:
    print distro + " " + str(len(mProfiles))
    mod_scanProfiles.main(distro)
    time.sleep(180)
    mProfiles = readProfiles(mFile, num, distro)
    print "yoyo"
How do I wait until mod_scanProfiles.main() and all its threads are completely finished? (I used time.sleep(180) for now, but it's not a good programming habit.)
You want to modify the code in mod_scanProfiles.main to block until all its threads are finished.
Assuming you make a call to subprocess.Popen in that function just do:
# in mod_scanProfiles.main:
p = subprocess.Popen(...)
p.wait()  # wait until the process completes
If you're not currently waiting for your threads to end you'll also want to call Thread.join (docs) to wait for them to complete. For example:
# assuming you have a list of thread objects somewhere
threads = [MyThread(), ...]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()