I am writing a script that starts a process and checks its stdout while it's running (not only at the end of execution).
The obvious choice seemed to be a thread that blocks reading lines from the process's stdout.
I have tested it with WSL2 bash using:
python __main__.py 'echo ok'
The outcome is random, resulting in one of the following cases:
Execution terminated without any output
"ok" printed as expected
"ok" printed follow by a 'ValueError: readline of closed file' exception
Any idea on what might be the problem ?
The code:
import argparse
from subprocess import Popen, PIPE
import sys
import threading

class ReadlineThread(threading.Thread):
    def __init__(self, proc):
        threading.Thread.__init__(self)
        self._proc = proc

    def run(self):
        while self._proc.poll() is None:
            line = self._proc.stdout.readline()
            sys.stdout.buffer.write(line)
            sys.stdout.flush()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('command', nargs='+', help='bar help')
    args = parser.parse_args()

    with Popen(args.command, stdout=PIPE, stderr=PIPE, shell=True) as proc:
        stdout_thread = ReadlineThread(proc)
        stdout_thread.start()

if __name__ == "__main__":
    main()
When you create a thread, it becomes part of the parent process, and the parent process is the one running your main function. In your main function, you call stdout_thread.start(), which begins starting the thread and then returns immediately. After that, there is no more code in your main function, which results in Python shutting down the main process. Since your thread is part of the main process, it is taken down when the main process terminates. Meanwhile, the thread you've started is still being created.
Here we have what is called a race condition. Your thread is starting while simultaneously the process it belongs to is shutting down. If your thread manages to start up and complete its work before the process terminates, you get your expected result. If the process terminates before the thread has started, you get no output. In the third situation, the process closes its stdout before the thread has finished reading it, resulting in an error.
To fix this, in your main function you should wait for your spawned thread to finish, which could be achieved by calling stdout_thread.join().
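For example, a minimal sketch of that fix applied to the main function from the question:

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('command', nargs='+', help='bar help')
    args = parser.parse_args()

    with Popen(args.command, stdout=PIPE, stderr=PIPE, shell=True) as proc:
        stdout_thread = ReadlineThread(proc)
        stdout_thread.start()
        stdout_thread.join()  # wait for the reader thread before the with-block closes the pipes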
I have a requirement to use Python to start a totally independent process. That means that even if the main process has exited, the sub-process can still run.
Just like the shell in Linux:
#./a.out &
then even if the SSH connection is lost, a.out can still keep running.
I need a similar but unified way across Linux and Windows.
I have tried the multiprocessing module:
import multiprocessing
import time

def fun():
    while True:
        print("Hello")
        time.sleep(3)

if __name__ == '__main__':
    p = multiprocessing.Process(name="Fun", target=fun)
    p.daemon = True
    p.start()
    time.sleep(6)
If I set p.daemon = True, the print("Hello") will stop after 6 s, just after the main process exits.
But if I set p.daemon = False, the main process won't exit on time, and if I press CTRL+C to force-quit the main process, the print("Hello") will also be stopped.
So, is there any way to keep printing this "Hello" even after the main process has exited?
The multiprocessing module is generally used to split a huge task into multiple subtasks and run them in parallel to improve performance.
In this case, you would want to use the subprocess module instead.
You can put your fun function in a separate file (sub.py):
import time

while True:
    print("Hello")
    time.sleep(3)
Then you can call it from the main file (main.py):
from subprocess import Popen
import time

if __name__ == '__main__':
    Popen(["python", "./sub.py"])
    time.sleep(6)
    print('Parent Exiting')
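If you also need the child to survive the terminal or SSH session itself (like ./a.out & in the question), a rough sketch of one way to detach it further is below; the platform-specific flags are my assumption, not part of the answer above:

import subprocess
import sys

kwargs = {}
if sys.platform == "win32":
    # Windows: detach from the console and start a new process group
    # (subprocess.DETACHED_PROCESS is exposed in Python 3.7+)
    kwargs["creationflags"] = (subprocess.DETACHED_PROCESS
                               | subprocess.CREATE_NEW_PROCESS_GROUP)
else:
    # POSIX: start the child in its own session so it outlives the parent's session
    kwargs["start_new_session"] = True

if __name__ == '__main__':
    subprocess.Popen([sys.executable, "./sub.py"],
                     stdin=subprocess.DEVNULL,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     **kwargs)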
The subprocess module can do it. If you have a .py file like this:
from subprocess import Popen
p = Popen([r'C:\Program Files\VideoLAN\VLC\vlc.exe'])
The file will end its run pretty quickly and exit, but vlc.exe will stay open.
In your case, because you want to use another function, you could in principle separate that function into another .py file.
I have the three scripts below, and when I run the main.py file, it spawns child.py, which in turn executes subchild.py and terminates quickly; subchild.py, however, keeps executing for a long time.
The problem with this is that main.py is blocked at p.communicate() until subchild.py terminates. If I open Task Manager and kill the running subchild.py, main.py immediately returns the output of child.py.
So my questions are:
main.py is supposed to wait only until child.py terminates, so why is it waiting for subchild.py to terminate?
How do I make p.communicate() not wait until subchild.py completes its execution?
# main.py file
if __name__ == '__main__':
    import subprocess
    p = subprocess.Popen(
        ["python", "child.py"],
        stdout=subprocess.PIPE,
        stdin=subprocess.PIPE,
        stderr=subprocess.PIPE
    )
    out, _ = p.communicate()
    print(out)

# child.py file
if __name__ == '__main__':
    import subprocess
    print("Child running")
    p = subprocess.Popen(["python", "subchild.py"])
    print("Child pid - ", p.pid)
    exit(1)

# subchild.py file
if __name__ == '__main__':
    import time
    time.sleep(10000)
Note: I'm trying this on Windows 7 Enterprise, with Python 3.6.6.
Update after comment:
On the main.py side I need child.py's pid, stdout, stderr and process object so that I can kill child.py from main.py at any later point if I want to. This code snippet is part of an API I am building, where the user may want the ability to kill the process. subprocess.call or subprocess.run would not give me control over the process object. I also won't have control over what child.py command I will receive as input for main.py, so I need to somehow avoid waiting for subchild.py and exit immediately with child.py's output as soon as it completes.
Your communicate call doesn't just wait for the child to complete. It first tries to read the entire contents of the child's stdout and stderr.
Even when the child has completed, the parent can't stop reading, because the grandchild inherits the child's stdin, stdout, and stderr. The parent needs to wait for the grandchild to complete, or at least for the grandchild to close its stdout and stderr, before the parent can be sure it's done reading.
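One possible workaround (an assumption on my part, not something from the answer above) is to have child.py redirect the grandchild's streams, so subchild.py does not inherit the pipes that main.py is reading:

# child.py (sketch): subchild.py no longer inherits main.py's pipes,
# so p.communicate() in main.py returns as soon as child.py exits
if __name__ == '__main__':
    import subprocess
    print("Child running")
    p = subprocess.Popen(
        ["python", "subchild.py"],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    print("Child pid - ", p.pid)
    exit(1)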
Right now, child.py will always wait for subchild.py to finish. If you don't need to wait, you can try subprocess.call() as an alternative in the child.py code. Replace the Popen() logic with the call() logic instead.
p = subprocess.call(["python", "subchild.py"])
Here is a reference: https://docs.python.org/2/library/subprocess.html#subprocess.call
An alternative is to keep the code as it is and instead change to using .call() in the main.py file.
import subprocess

p = subprocess.call(
    ["python", "child.py"],
    stdout=subprocess.PIPE,
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE
)
You mentioned you can't use .call() because you'll need to kill it, however this is not the case. Making this change should allow you to:
return from main.py and child.py quickly,
print the pid of the subchild process from child.py, and
leave subchild.py running.
Finally, I found the solution to this myself; it seems to be an open issue with Python. Would be glad to see it fixed.
https://bugs.python.org/issue26731
I'm running the following python code:
import threading
import multiprocessing

def forever_print():
    while True:
        print("")

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    return

if __name__ == '__main__':
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process on control")
It terminates.
When I unwrapped main from the new process, and just ran it directly, like this:
if __name__ == '__main__':
    main()
The script went on forever, as I thought it should. Am I wrong to assume that, given that t is a non-daemon thread, p shouldn't halt in the first case?
I basically set up this little test because I've been developing an app in which threads are spawned inside subprocesses, and it's been showing some weird behaviour (sometimes it terminates properly, sometimes it doesn't). I guess what I wanted to know, in a broader sense, is whether there is some sort of "gotcha" when mixing these two Python libs.
My running environment: Python 2.7 on Ubuntu 14.04 LTS
For now, threads created by multiprocessing worker processes act like daemon threads with respect to process termination: the worker process exits without waiting for the threads it created to terminate. This is due to worker processes using os._exit() to shut down, which skips most normal shutdown processing (and in particular skips the normal exit processing code (sys.exit()) that .join()'s non-daemon threading.Threads).
The easiest workaround is for worker processes to explicitly .join() the non-daemon threads they create.
There's an open bug report about this behavior, but it hasn't made much progress: http://bugs.python.org/issue18966
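A sketch of that workaround applied to the code in the question:

def main():
    t = threading.Thread(target=forever_print)
    t.start()
    t.join()  # keep the worker process alive until the thread exits (forever, in this example)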
You need to call t.join() in your main function.
As your main function returns, the process gets terminated with both its threads.
p.join() blocks the main thread, waiting for the spawned process to end. Your spawned process then creates a thread but does not wait for it to end; it returns immediately, thus trashing the thread itself.
While threads share memory, processes don't. Therefore, the thread you create in the newly spawned process remains confined to that process. The parent process is not aware of it.
The gotcha is that the multiprocessing machinery calls os._exit() after your target function exits, which violently kills the child process, even if it has background threads running.
The code for Process.start() looks like this:
def start(self):
    '''
    Start child process
    '''
    assert self._popen is None, 'cannot start a process twice'
    assert self._parent_pid == os.getpid(), \
        'can only start a process object created by current process'
    assert not _current_process._daemonic, \
        'daemonic processes are not allowed to have children'
    _cleanup()
    if self._Popen is not None:
        Popen = self._Popen
    else:
        from .forking import Popen
    self._popen = Popen(self)
    _current_process._children.add(self)
Popen.__init__ looks like this:
def __init__(self, process_obj):
    sys.stdout.flush()
    sys.stderr.flush()
    self.returncode = None

    self.pid = os.fork()  # This forks a new process
    if self.pid == 0:     # This if block runs in the new process
        if 'random' in sys.modules:
            import random
            random.seed()
        code = process_obj._bootstrap()  # This calls your target function
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)    # Violent death of the child process happens here
The _bootstrap method is the one that actually executes the target function you passed to the Process object; in your case, that's main. main returns right after you start your background thread, even though the process would not normally exit at that point, because there's still a non-daemon thread running.
However, as soon as execution hits os._exit(code), the child process is killed, regardless of any non-daemon threads still executing.
So I essentially have a case like this, where in my main script I have:
command = 'blender -b ' + settings.BLENDER_ROOT + 'uploadedFileCheck.blend -P ' + settings.BLENDER_ROOT + 'uploadedFileCheck.py -noaudio'
process = Popen(command.split(' ') ,stdout=PIPE, stderr=PIPE)
out, err = process.communicate()
And in the subprocess script uploadedFileCheck.py I have the lines
exportFile(fileIn, fileOut)
Thread(target=myfunction).start()
So I expect the subprocess to be finished, or at least to return to out, err after the exportFile() call, but it seems it's waiting for the Thread to finish as well. Does anyone understand this behavior?
Also, in case you're wondering, I'm calling that other Python file as a subprocess because the main script is in Python 2 and that script (Blender) is in Python 3, but that's irrelevant (and can't be changed).
A process won't exit until all its non-daemon threads have exited. By default, Thread objects in Python are created as non-daemon threads. If you want your script to exit as soon as the main thread is done, rather than waiting for the spawned thread to finish, set the daemon flag on the Thread object to True prior to starting it:
t = Thread(target=myfunction)
t.daemon = True
t.start()
Note that this will kill the daemon thread in a non-graceful way, without any cleanup occurring. If you're doing any kind of work in that thread that needs to be cleaned up, you should consider an approach where you signal the thread to shut itself down instead.
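For example, a minimal sketch of such a signalling approach using threading.Event (the names are only for illustration):

import threading
import time

stop_event = threading.Event()

def myfunction():
    # do the work in small chunks, checking the stop flag in between
    while not stop_event.is_set():
        time.sleep(0.1)
    # any cleanup can run here before the thread returns

t = threading.Thread(target=myfunction)
t.start()
# ... later, when the script wants to exit:
stop_event.set()  # ask the thread to stop
t.join()          # wait for it to finish cleanly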
From within my python script, I want to start another python script which will run in the background waiting for the instruction to terminate.
Host Python script (H1) starts subprocess P1.
P1 performs some short lived work & returns a sentinel to indicate that it is now going to sleep awaiting instructions to terminate.
H1 polls for this sentinel repeatedly. When it receives the sentinel, it performs some other IO bound task and when that completes, tells P1 to die gracefully (meaning close any resources that you have acquired).
Is this feasible to do with the subprocess module?
Yes, start the process with:
p=subprocess.Popen([list for the script to execute], stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
You can then read from p.stdout and p.stderr to watch for your sentinel and write to p.stdin to send messages to the child process. If you are running on a posix system, you might consider using pexpect instead; it doesn't support MS Windows, but it handles communicating with child processes better than subprocess.
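A minimal sketch of that pattern with subprocess; the script name, sentinel string and shutdown command here are made up for illustration:

from subprocess import Popen, PIPE

# start P1 (hypothetical script name)
p = Popen(["python", "p1.py"], stdin=PIPE, stdout=PIPE, stderr=PIPE,
          universal_newlines=True)

# H1: read P1's stdout until the agreed sentinel appears
while True:
    line = p.stdout.readline()
    if not line or line.strip() == "WORK DONE":
        break

# ... perform the other IO-bound task here ...

# tell P1 to shut down gracefully, then wait for it to exit
p.stdin.write("TERMINATE\n")
p.stdin.flush()
p.wait()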
"""H1"""
from multiprocessing import Process, Pipe
import sys
def P1(conn):
print 'P1: some short lived work'
sys.stdout.flush()
conn.send('work done')
# wait for shutdown command...
conn.recv()
conn.close()
print 'P1: shutting down'
if __name__ == '__main__':
parent_conn, child_conn = Pipe()
p = Process(target=P1, args=(child_conn,))
p.start()
print parent_conn.recv()
print 'H1: some other IO bound task'
parent_conn.send("game over")
p.join()
Output:
P1: some short lived work
work done
H1: some other IO bound task
P1: shutting down