I want to run many processes in parallel with the ability to read their stdout at any time. How should I do it? Do I need to run a thread for each subprocess.Popen() call, or what?
You can do it in a single thread.
Suppose you have a script that prints lines at random times:
#!/usr/bin/env python
#file: child.py
import os
import random
import sys
import time
for i in range(10):
    print("%2d %s %s" % (int(sys.argv[1]), os.getpid(), i))
    sys.stdout.flush()
    time.sleep(random.random())
If you'd like to collect the output as soon as it becomes available, you could use select on POSIX systems, as @zigg suggested:
#!/usr/bin/env python
from __future__ import print_function
from select import select
from subprocess import Popen, PIPE
# start several subprocesses
processes = [Popen(['./child.py', str(i)], stdout=PIPE,
                   bufsize=1, close_fds=True,
                   universal_newlines=True)
             for i in range(5)]
# read output
timeout = 0.1  # seconds
while processes:
    # remove finished processes from the list (O(N**2))
    for p in processes[:]:
        if p.poll() is not None:  # process ended
            print(p.stdout.read(), end='')  # read the rest
            p.stdout.close()
            processes.remove(p)

    # wait until there is something to read
    rlist = select([p.stdout for p in processes], [], [], timeout)[0]

    # read a line from each process that has output ready
    for f in rlist:
        print(f.readline(), end='')  # NOTE: it can block
A more portable solution (one that should work on Windows, Linux, and OS X) uses a reader thread for each process; see Non-blocking read on a subprocess.PIPE in python.
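For illustration, here's a minimal sketch of that reader-thread approach (my own adaptation of the queue pattern from the linked question to several processes; it assumes the child.py script above):

#!/usr/bin/env python
from __future__ import print_function
from subprocess import Popen, PIPE
from threading import Thread
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

def reader(proc, queue):
    # funnel one child's output into the shared queue, then signal EOF
    for line in iter(proc.stdout.readline, b''):
        queue.put((proc.pid, line))
    proc.stdout.close()
    queue.put(None)  # sentinel: this reader is done

processes = [Popen(['./child.py', str(i)], stdout=PIPE)
             for i in range(5)]
q = Queue()
for p in processes:
    t = Thread(target=reader, args=(p, q))
    t.daemon = True  # thread dies with the program
    t.start()

eof_count = 0
while eof_count < len(processes):
    item = q.get()  # blocks until any child produces a line or finishes
    if item is None:
        eof_count += 1
    else:
        pid, line = item
        print(pid, line.decode(), end='')

for p in processes:
    p.wait()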
Here's an os.pipe()-based solution that works on Unix and Windows:
#!/usr/bin/env python
from __future__ import print_function
import io
import os
import sys
from subprocess import Popen
ON_POSIX = 'posix' in sys.builtin_module_names
# create a pipe to get data
input_fd, output_fd = os.pipe()
# start several subprocesses
processes = [Popen([sys.executable, 'child.py', str(i)], stdout=output_fd,
                   close_fds=ON_POSIX)  # close input_fd in children
             for i in range(5)]
os.close(output_fd)  # close unused end of the pipe

# read output line by line as soon as it is available
with io.open(input_fd, 'r', buffering=1) as file:
    for line in file:
        print(line, end='')

for p in processes:
    p.wait()
You can also collect stdout from multiple subprocesses concurrently using twisted:
#!/usr/bin/env python
import sys
from twisted.internet import protocol, reactor
class ProcessProtocol(protocol.ProcessProtocol):
    def outReceived(self, data):
        print data,  # received chunk of stdout from child

    def processEnded(self, status):
        global nprocesses
        nprocesses -= 1
        if nprocesses == 0:  # all processes ended
            reactor.stop()

# start subprocesses
nprocesses = 5
for i in xrange(nprocesses):
    reactor.spawnProcess(ProcessProtocol(), sys.executable,
                         args=[sys.executable, 'child.py', str(i)],
                         usePTY=True)  # can change how child buffers stdout
reactor.run()
See Using Processes in Twisted.
You don't need to run a thread for each process. You can peek at each process's stdout stream without blocking, and only read from it if it has data available.
You do have to be careful not to accidentally block on them, though, if you're not intending to.
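For example, on POSIX systems a zero-timeout select() acts as a pure non-blocking peek. A minimal sketch, reusing the processes list from the select example above:

from select import select

# a zero timeout makes select() return immediately, listing only
# those pipes that already have data buffered
ready = select([p.stdout for p in processes], [], [], 0)[0]
for pipe in ready:
    print(pipe.readline(), end='')  # note: can still block on a partial line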
You can wait for process.poll() to finish, and run other stuff concurrently:
import time
import sys
from subprocess import Popen, PIPE
def ex1() -> None:
    command = 'sleep 2.1 && echo "happy friday"'
    proc = Popen(command, shell=True, stderr=PIPE, stdout=PIPE)
    while proc.poll() is None:
        # do stuff here
        print('waiting')
        time.sleep(0.05)

    out, _err = proc.communicate()
    print(out, file=sys.stderr)
    sys.stderr.flush()
    assert proc.poll() == 0

ex1()
Related
I have a script that is part of an automated test suite. It runs very slowly on Windows but not on Linux, and I have found out why. The process that we are testing ('frank') creates a child process (so, a grandchild). The Python code won't return until that grandchild process also ends (on Windows; it doesn't do this on Linux). The grandchild process will kill itself off after 5 seconds if there is no parent (it hangs around in case another process talks to it).
I've found I can stop the communicate function from hanging this way if I don't capture stdout, but I need stdout. I read somewhere that the communicate function waits for all pipes to be closed. I know that the stdout handle is duplicated for the grandchild, but I can't change the code I'm testing.
I've been searching for a solution. I tried some creation flags (still in the code) but that didn't help.
This is the cut down test -
import os
import sys
import threading
import subprocess
def read_from_pipe(process):
    last_stdout = process.communicate()[0]
    print(last_stdout)
CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008
# start process
command = 'frank my arguments'
cwd = "C:\\dev\\ui_test\\frank_test\\workspace\\report183"
p = subprocess.Popen(command,
                     stdout=subprocess.PIPE,
                     cwd=cwd)
# run thread to read from output
t = threading.Thread(target=read_from_pipe, args=[p])
t.start()
t.join(30)
print('finished')
Any ideas?
Thanks.
Peter.
After tips from @eryksun and a lot of Googling, I have this rather complicated lot of code! At one point, I considered cheating and doing os.system and redirecting to a temp file, but then I realised that our test code allows for a command timing out. os.system would just block forever if the child process doesn't die.
import os
import sys
import threading
import subprocess
import time
if os.name == 'nt':
    import msvcrt
    import ctypes
    # See https://stackoverflow.com/questions/55160319/python-subprocess-waiting-for-grandchild-on-windows-with-stdout-set for details on Windows code
    # Based on https://github.com/it2school/Projects/blob/master/2017/Python/party4kids-2/CarGame/src/pygame/tests/test_utils/async_sub.py
    from ctypes.wintypes import DWORD

    if sys.version_info >= (3,):
        null_byte = '\x00'.encode('ascii')
    else:
        null_byte = '\x00'

    def ReadFile(handle, desired_bytes, ol=None):
        c_read = DWORD()
        buffer = ctypes.create_string_buffer(desired_bytes + 1)
        success = ctypes.windll.kernel32.ReadFile(handle, buffer, desired_bytes, ctypes.byref(c_read), ol)
        buffer[c_read.value] = null_byte
        return ctypes.windll.kernel32.GetLastError(), buffer.value

    def PeekNamedPipe(handle):
        c_avail = DWORD()
        c_message = DWORD()
        success = ctypes.windll.kernel32.PeekNamedPipe(handle, None, 0, None, ctypes.byref(c_avail), ctypes.byref(c_message))
        return "", c_avail.value, c_message.value

    def read_available(handle):
        buffer, bytesToRead, result = PeekNamedPipe(handle)
        if bytesToRead:
            hr, data = ReadFile(handle, bytesToRead, None)
            return data
        return b''

def read_from_pipe(process):
    if os.name == 'posix':
        last_stdout = process.communicate()[0]
    else:
        handle = msvcrt.get_osfhandle(process.stdout.fileno())
        last_stdout = b''
        while process.poll() is None:
            last_stdout += read_available(handle)
            time.sleep(0.1)
        last_stdout += read_available(handle)
    print(last_stdout)
# start process
command = 'frank my arguments'
cwd = "C:\\dev\\ui_test\\frank_test\\workspace\\report183"
p = subprocess.Popen(command,
                     stdout=subprocess.PIPE,
                     cwd=cwd)
# run thread to read from output
t = threading.Thread(target=read_from_pipe, args=[p])
t.start()
t.join(30)
print('finished')
I have created a script which should run a command and kill it after 15 seconds:
import logging
import subprocess
import time
import os
import sys
import signal
#cmd = "ping 192.168.1.1 -t"
cmd = "C:\\MyAPP\MyExe.exe -t 80 -I C:\MyApp\Temp -M Documents"
proc=subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,shell=True)
for line in proc.stdout:
    print(line.decode("utf-8"), end='')

time.sleep(15)
os.kill(proc.pid, signal.SIGTERM)
#proc.kill()  # tried this too but no luck
This does not terminate my subprocess. However, if I comment out the part that logs to stdout, i.e.
for line in proc.stdout:
    print(line.decode("utf-8"), end='')
then the subprocess is killed.
I have tried proc.kill() and CTRL_C_EVENT too, but no luck.
Any help would be highly appreciated. Please see me as a novice to Python.
To terminate the subprocess in 15 seconds while printing its output line by line:
#!/usr/bin/env python
from __future__ import print_function
from threading import Timer
from subprocess import Popen, PIPE, STDOUT
# start process
cmd = r"C:\MyAPP\MyExe.exe -t 80 -I C:\MyApp\Temp -M Documents"
process = Popen(cmd, stdout=PIPE, stderr=STDOUT,
                bufsize=1, universal_newlines=True)

# terminate process in 15 seconds
timer = Timer(15, terminate, args=[process])
timer.start()

# print output
for line in iter(process.stdout.readline, ''):
    print(line, end='')
process.stdout.close()
process.wait()  # wait for the child process to finish
timer.cancel()
Note that you don't need shell=True here. You could define terminate() as:
def terminate(process):
    if process.poll() is None:
        try:
            process.terminate()
        except EnvironmentError:
            pass  # ignore
If you want to kill the whole process tree then define terminate() as:
from subprocess import call
from subprocess import call

def terminate(process):
    if process.poll() is None:
        call('taskkill /F /T /PID ' + str(process.pid))
Notes:

- Use raw-string literals for Windows paths (r"..."); otherwise you have to escape every backslash in the string literal.
- Drop shell=True. It creates an additional process for no reason here.
- universal_newlines=True enables text mode (on Python 3, bytes are decoded into Unicode text using the locale's preferred encoding automatically).
- iter(process.stdout.readline, '') is necessary for compatibility with Python 2 (otherwise the data may be printed with a delay, due to the read-ahead buffer bug).
- Use process.terminate() instead of process.send_signal(signal.SIGTERM) or os.kill(proc.pid, signal.SIGTERM).
- taskkill allows you to kill a whole process tree on Windows.
The problem is that reading from stdout blocks. You need to either read the subprocess's output or run the timer on a separate thread.
from subprocess import Popen, PIPE
from threading import Thread
from time import sleep
class ProcKiller(Thread):
    def __init__(self, proc, time_limit):
        super(ProcKiller, self).__init__()
        self.proc = proc
        self.time_limit = time_limit

    def run(self):
        sleep(self.time_limit)
        self.proc.kill()

p = Popen('while true; do echo hi; sleep 1; done', shell=True)
t = ProcKiller(p, 5)
t.start()
p.communicate()
EDITED to reflect the changes suggested in the comments:
from subprocess import Popen, PIPE
from threading import Thread
from time import sleep
from signal import SIGTERM
import os
class ProcKiller(Thread):
    def __init__(self, proc, time_limit):
        super(ProcKiller, self).__init__()
        self.proc = proc
        self.time_limit = time_limit

    def run(self):
        sleep(self.time_limit)
        os.kill(self.proc.pid, SIGTERM)

p = Popen('while true; do echo hi; sleep 1; done', shell=True)
t = ProcKiller(p, 5)
t.start()
p.communicate()
I have the following code and am trying to run it in IDLE on Linux.
import sys
from subprocess import PIPE, Popen
from threading import Thread
try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty  # python 3.x

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()
p = Popen(['youtube-dl', '-l', '-c', 'https://www.youtube.com/watch?v=utV1sdjr4PY'], stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True # thread dies with the program
t.start()
# ... do other things here
# read line without blocking
while True:
    try:
        line = q.get_nowait()  # or q.get(timeout=.1)
    except Empty:
        pass
        #print('no output yet')
    else:  # got line
        print line
But it is always printing "no output yet".
Edit: I edited the code and it is working now. But I have another problem: the percentage of the download is updated on a single line, but the code reads it only after the line is complete.
OK, let's put the comments in an answer.
import sys, os
from subprocess import PIPE, Popen
from time import sleep
import pty
master, slave = pty.openpty()
stdout = os.fdopen(master)
p = Popen(['youtube-dl', '-l', '-c', 'https://www.youtube.com/watch?v=AYlb-7TXMxM'],
          shell=False, stdout=slave, stderr=slave, close_fds=True)
while True:
    #line = stdout.readline().rstrip() - will strip the new line
    line = stdout.readline()
    if line != b'':
        sys.stdout.write("\r%s" % line)
        sys.stdout.flush()
    sleep(.1)
If you want a thread and a different while loop, I suggest wrapping this in a class and avoiding the queue (see the sketch below). The output is "unbuffered"; thanks @FilipMalckzak.
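A rough sketch of that suggestion (the class, its name, and its methods are my own illustration, not an established API; it wraps the pty reader from above in a thread so the main loop stays free):

import os
import pty
import sys
import threading
from subprocess import Popen

class OutputReader(threading.Thread):
    # hypothetical wrapper: read a child's pty output on its own thread
    def __init__(self, argv):
        threading.Thread.__init__(self)
        self.daemon = True  # don't keep the program alive
        master, slave = pty.openpty()
        self.stdout = os.fdopen(master)
        self.proc = Popen(argv, stdout=slave, stderr=slave, close_fds=True)
        os.close(slave)  # parent's copy; the child keeps its own

    def run(self):
        while True:
            try:
                line = self.stdout.readline()
            except (IOError, OSError):  # EIO when the child side closes
                break
            if not line:
                break
            sys.stdout.write("\r%s" % line)
            sys.stdout.flush()

reader = OutputReader(['youtube-dl', '-l', '-c',
                       'https://www.youtube.com/watch?v=AYlb-7TXMxM'])
reader.start()
# ... the main thread is free to do other work here ...
reader.proc.wait()
reader.join()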
I have implemented a variant on the code in this question:
A non-blocking read on a subprocess.PIPE in Python
To try and read the output in real time from this dummy program test.py:
import time, sys
print "Hello there"
for i in range(100):
    time.sleep(0.1)
    sys.stdout.write("\r%d" % i)
    sys.stdout.flush()
print
print "Go now or I shall taunt you once again!"
The variation on the other question is that the calling program must read character by character, not line by line, as the dummy program test.py outputs progress indication all on one line by use of \r. So here it is:
import sys,time
from subprocess import PIPE, Popen
from threading import Thread
try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty  # Python 3.x

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    while True:
        buffersize = 1
        data = out.read(buffersize)
        if not data:
            break
        queue.put(data)
    out.close()
p = Popen(sys.executable + " test.py", stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True # Thread dies with the program
t.start()
while True:
    p.poll()
    if p.returncode:
        break
    # Read line without blocking
    try:
        char = q.get_nowait()
        time.sleep(0.1)
    except Empty:
        pass
    else:  # Got line
        sys.stdout.write(char)
        sys.stdout.flush()
print "left loop"
sys.exit(0)
Two problems with this:

1. It never exits; p.returncode never gets a value and the loop is never left. How can I fix it?
2. It's really slow! Is there a way to make it more efficient without increasing buffersize?
As @Markku K. pointed out, you should use bufsize=0 to read one byte at a time.
Your code doesn't require a non-blocking read. You can simplify it:
import sys
from functools import partial
from subprocess import Popen, PIPE
p = Popen([sys.executable, "test.py"], stdout=PIPE, bufsize=0)
for b in iter(partial(p.stdout.read, 1), b""):
    print b  # it should print as soon as `sys.stdout.flush()` is called
             # in the test.py
p.stdout.close()
p.wait()
Note: reading 1 byte at a time is very inefficient.
Also, in general, there can be a block-buffering issue, which can sometimes be solved with the pexpect or pty modules, or with the unbuffer, stdbuf, or script command-line utilities.
For Python processes, you could use the -u flag to force unbuffering (of the binary layer) of the stdin, stdout, and stderr streams.
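For example, here's a minimal sketch of the -u flag with the test.py above (with -u, the child's explicit sys.stdout.flush() calls become unnecessary):

import sys
from subprocess import Popen, PIPE

# -u makes the child Python's stdout unbuffered at the binary layer,
# so each write reaches the pipe immediately
p = Popen([sys.executable, '-u', 'test.py'], stdout=PIPE)
for b in iter(lambda: p.stdout.read(1), b""):
    sys.stdout.write(b.decode())
    sys.stdout.flush()
p.stdout.close()
p.wait()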
The task I am trying to accomplish is to stream a Ruby file's output and print it out. (NOTE: I don't want to print out everything at once.)
main.py
from subprocess import Popen, PIPE, STDOUT
import pty
import os
file_path = '/Users/luciano/Desktop/ruby_sleep.rb'
command = ' '.join(["ruby", file_path])
master, slave = pty.openpty()
proc = Popen(command, bufsize=0, shell=True, stdout=slave, stderr=slave, close_fds=True)
stdout = os.fdopen(master, 'r', 0)
while proc.poll() is None:
    data = stdout.readline()
    if data != "":
        print(data)
    else:
        break
print("This is never reached!")
ruby_sleep.rb
puts "hello"
sleep 2
puts "goodbye!"
Problem
Streaming the file works fine. The hello/goodbye output is printed with the 2-second delay, exactly as the script should work. The problem is that readline() hangs at the end and never returns. I never reach the last print.
I know there are a lot of questions like this on Stack Overflow, but none of them solved my problem. I'm not that into the whole subprocess thing, so please give me a more hands-on/concrete answer.
Regards
edit
Fixed unintended code (nothing to do with the actual error).
I assume you use pty due to reasons outlined in Q: Why not just use a pipe (popen())? (all other answers so far ignore your "NOTE: I don't want to print out everything at once").
pty is Linux only as said in the docs:
Because pseudo-terminal handling is highly platform dependent, there is code to do it only for Linux. (The Linux code is supposed to work on other platforms, but hasn't been tested yet.)
It is unclear how well it works on other OSes.
You could try pexpect:
import sys
import pexpect
pexpect.run("ruby ruby_sleep.rb", logfile=sys.stdout)
Or stdbuf to enable line-buffering in non-interactive mode:
from subprocess import Popen, PIPE, STDOUT
proc = Popen(['stdbuf', '-oL', 'ruby', 'ruby_sleep.rb'],
             bufsize=1, stdout=PIPE, stderr=STDOUT, close_fds=True)
for line in iter(proc.stdout.readline, b''):
    print line,
proc.stdout.close()
proc.wait()
Or using pty from the stdlib, based on @Antti Haapala's answer:
#!/usr/bin/env python
import errno
import os
import pty
from subprocess import Popen, STDOUT
master_fd, slave_fd = pty.openpty()  # provide tty to enable
                                     # line-buffering on ruby's side
proc = Popen(['ruby', 'ruby_sleep.rb'],
             stdin=slave_fd, stdout=slave_fd, stderr=STDOUT, close_fds=True)
os.close(slave_fd)
try:
    while 1:
        try:
            data = os.read(master_fd, 512)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            break  # EIO means EOF on some systems
        else:
            if not data:  # EOF
                break
            print('got ' + repr(data))
finally:
    os.close(master_fd)
    if proc.poll() is None:
        proc.kill()
    proc.wait()
print("This is reached!")
All three code examples print 'hello' immediately (as soon as the first EOL is seen).
I leave the old, more complicated code example here because it may be referenced and discussed in other posts on SO.
Or using pty based on @Antti Haapala's answer:
import os
import pty
import select
from subprocess import Popen, STDOUT
master_fd, slave_fd = pty.openpty()  # provide tty to enable
                                     # line-buffering on ruby's side
proc = Popen(['ruby', 'ruby_sleep.rb'],
             stdout=slave_fd, stderr=STDOUT, close_fds=True)
timeout = .04  # seconds
while 1:
    ready, _, _ = select.select([master_fd], [], [], timeout)
    if ready:
        data = os.read(master_fd, 512)
        if not data:
            break
        print("got " + repr(data))
    elif proc.poll() is not None:  # select timeout
        assert not select.select([master_fd], [], [], 0)[0]  # detect race condition
        break  # proc exited
os.close(slave_fd) # can't do it sooner: it leads to errno.EIO error
os.close(master_fd)
proc.wait()
print("This is reached!")
Not sure what is wrong with your code, but the following seems to work for me:
#!/usr/bin/python
from subprocess import Popen, PIPE
import threading
p = Popen('ls', stdout=PIPE)
class ReaderThread(threading.Thread):
    def __init__(self, stream):
        threading.Thread.__init__(self)
        self.stream = stream

    def run(self):
        while True:
            line = self.stream.readline()
            if len(line) == 0:
                break
            print line,
reader = ReaderThread(p.stdout)
reader.start()
# Wait until subprocess is done
p.wait()
# Wait until we've processed all output
reader.join()
print "Done!"
Note that I don't have Ruby installed and hence cannot check your actual problem. It works fine with ls, though.
Basically what you are looking at here is a race condition between your proc.poll() and your readline(). Since the master file handle is never closed, if the code attempts a readline() after the Ruby process has finished writing its output, there will never be anything to read, but the pipe will never close either. The code only works if the shell process happens to close before your code tries another readline().
Here is the timeline:
1. readline()
2. print output
3. poll()
4. readline()
5. print output (last line of real output)
6. poll() (returns false since the process is not done)
7. readline() (waits for more output)
8. (the process is done, but the output pipe is still open, and no poll ever happens for it)
The easy fix is to just use the subprocess module as suggested in the docs, not in conjunction with openpty:
http://docs.python.org/library/subprocess.html
Here is a very similar problem for further study:
Using subprocess with select and pty hangs when capturing output
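A minimal sketch of that easy fix, assuming the same ruby_sleep.rb as above (a plain pipe plus communicate() avoids the poll()/readline() race entirely, at the cost of collecting all output at once):

from subprocess import Popen, PIPE, STDOUT

# communicate() reads until EOF and reaps the child,
# so there is no poll()/readline() race
proc = Popen(['ruby', 'ruby_sleep.rb'], stdout=PIPE, stderr=STDOUT)
out, _ = proc.communicate()
print(out.decode())
print("This is reached!")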
Try this:
proc = Popen(command, bufsize=0, shell=True, stdout=PIPE, close_fds=True)
for line in proc.stdout:
    print line
print("This is most certainly reached!")
As others have noted, readline() will block when reading data. It will even do so when your child process has died. I am not sure why this does not happen when executing ls as in the other answer, but maybe the Ruby interpreter detects that it is writing to a PIPE and therefore does not close it automatically.
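If you do need line-at-a-time output without risking a blocked readline(), one option on POSIX systems is to poll the pipe with select before each read. A minimal sketch (assuming the same ruby_sleep.rb; note that readline() can still block on a partial line that lacks a newline):

import select
from subprocess import Popen, PIPE

proc = Popen(['ruby', 'ruby_sleep.rb'], bufsize=0, stdout=PIPE)
while True:
    # wait up to 0.1s for data; an empty rlist means nothing is buffered yet
    rlist, _, _ = select.select([proc.stdout], [], [], 0.1)
    if rlist:
        line = proc.stdout.readline()
        if not line:  # EOF: the child closed its end of the pipe
            break
        print(line.decode(), end='')
    elif proc.poll() is not None:  # no data and the child has exited
        break
proc.wait()
print("This is reached!")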