I have the following situation:
I receive a request on a SocketIO server. I answer it (socket.emit(..)) and then start a task with a heavy computation load in another thread.
If the heavy computation is caused by subprocess.Popen (using subprocess.PIPE), it completely blocks every incoming request for as long as it is executing, even though it happens in a separate thread.
No problem - in this thread it was suggested to read the result of the subprocess asynchronously with a buffer size of 1, so that other threads get the chance to do something between those reads. Unfortunately, this did not help in my case.
I have also already monkey-patched eventlet, and that works fine - as long as I don't use subprocess.Popen with subprocess.PIPE in the thread.
The code sample below shows that the problem only occurs when using subprocess.Popen with subprocess.PIPE: if you uncomment functionWithSimulatedHeavyLoad() and comment out functionWithHeavyLoad() instead, everything works like a charm.
from flask import Flask
from flask.ext.socketio import SocketIO, emit
import eventlet
eventlet.monkey_patch()

app = Flask(__name__)
socketio = SocketIO(app)

import time
from threading import Thread

@socketio.on('client command')
def response(data, type=None, nonce=None):
    socketio.emit('client response', ['foo'])

    thread = Thread(target=testThreadFunction)
    thread.daemon = True
    thread.start()

def testThreadFunction():
    # functionWithSimulatedHeavyLoad()
    functionWithHeavyLoad()

def functionWithSimulatedHeavyLoad():
    time.sleep(5)

def functionWithHeavyLoad():
    from datetime import datetime
    import subprocess
    import sys
    from queue import Queue, Empty

    ON_POSIX = 'posix' in sys.builtin_module_names

    def enqueueOutput(out, queue):
        for line in iter(out.readline, b''):
            if line == '':
                break
            queue.put(line)
        out.close()

    # just anything that takes long to be computed
    shellCommand = 'find / test'

    p = subprocess.Popen(shellCommand, universal_newlines=True, shell=True,
                         stdout=subprocess.PIPE, bufsize=1, close_fds=ON_POSIX)
    q = Queue()
    t = Thread(target=enqueueOutput, args=(p.stdout, q))
    t.daemon = True
    t.start()
    t.join()

    text = ''
    while True:
        try:
            line = q.get_nowait()
            text += line
            print(line)
        except Empty:
            break

    socketio.emit('client response', {'text': text})

socketio.run(app)
The client receives the 'foo' message only after the blocking work in the functionWithHeavyLoad() function has completed, although it should receive it earlier.
This sample can be copied and pasted into a .py file, and the behavior can be reproduced instantly.
I am using Python 3.4.3, Flask 0.10.1, Flask-SocketIO 1.2, and eventlet 0.17.4.
Update
If I put this into the functionWithHeavyLoad function it actually works and everything's fine:
import shlex
shellCommand = shlex.split('find / test')
popen = subprocess.Popen(shellCommand, stdout=subprocess.PIPE)
lines_iterator = iter(popen.stdout.readline, b"")
for line in lines_iterator:
    print(line)
    eventlet.sleep()
The problem is: I used find for the heavy load in order to make the sample more easily reproducible for you. However, in my code I actually use tesseract "{0}" stdout -l deu as the shell command. This (unlike find) still blocks everything. Is this a tesseract issue rather than an eventlet one? But still: how can this block when it happens in a separate thread that reads line by line with a context switch, while find does not block?
Thanks to this question I learned something new today. Eventlet does offer a greenlet-friendly version of subprocess and its functions, but for some odd reason it does not monkey-patch this module in the standard library.
Link to the eventlet implementation of subprocess: https://github.com/eventlet/eventlet/blob/master/eventlet/green/subprocess.py
Looking at the eventlet patcher, the modules that are patched are os, select, socket, thread, time, MySQLdb, builtins and psycopg2. There is absolutely no reference to subprocess in the patcher.
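As a quick sanity check, you can ask the patcher directly whether a module has been patched (a minimal sketch using eventlet.patcher.is_monkey_patched):

import eventlet
eventlet.monkey_patch()

from eventlet import patcher
print(patcher.is_monkey_patched('os'))          # True after monkey_patch()
print(patcher.is_monkey_patched('subprocess'))  # False - not on the patcher's list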
The good news is that I was able to work with Popen() in an application very similar to yours, after I replaced:
import subprocess
with:
from eventlet.green import subprocess
But note that the currently released version of eventlet (0.17.4) does not support the universal_newlines option in Popen; you will get an error if you use it. Support for this option is in master (here is the commit that added the option). You will either have to remove that option from your call, or else install the master branch of eventlet directly from GitHub.
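To illustrate, a minimal sketch of the swap applied to the question's functionWithHeavyLoad() (the find command is the question's own placeholder):

from eventlet.green import subprocess  # instead of: import subprocess

# same Popen call as before, minus universal_newlines (unsupported in 0.17.4)
p = subprocess.Popen('find / test', shell=True,
                     stdout=subprocess.PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
    print(line)  # other green threads can run while readline waits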
Related
I am trying to work with a subprocess routine that spawns an interactive child process which expects user input. The process normally hangs immediately if I try to read its stdout stream directly.
I read through many solutions using fcntl, asynchronous operations, pexpect, and redirecting the output to files for reading. Although temporary log files should work, I don't want to go that route, as I would like to keep the process interactive within the Python interface. Of all of those, threads seemed to be the easiest and most straightforward approach (I could not get pexpect to work properly, although it seemed to be a good option, too).
Indeed, when I implemented the following code (stolen from Non-blocking read on a subprocess.PIPE in python):
import os
import subprocess as sp
from threading import Thread
from queue import Queue, Empty

class App:
    def __init__(self):
        proc = sp.Popen(['app'], stdin=sp.PIPE, stdout=sp.PIPE,
                        stderr=sp.PIPE, encoding='utf8')
        out = NonBlockingStreamReader(proc.stdout)
        print(out.readline(1))

class NonBlockingStreamReader:
    def __init__(self, stream):
        self.s = stream
        self.q = Queue()

        def populateQueue(stream, queue):
            while True:
                line = stream.readline()
                if line:
                    queue.put(line)
                else:
                    raise UnexpectedEndOfStream

        self.t = Thread(target=populateQueue, args=(self.s, self.q))
        self.t.daemon = True
        self.t.start()

    def readline(self, timeout=None):
        try:
            return self.q.get(block=timeout is not None, timeout=timeout)
        except Empty:
            return None

class UnexpectedEndOfStream(Exception):
    pass
everything worked flawlessly. Well, the problem is: it worked on Linux only, even though the solution should be Windows-compatible.
When I try to run this implementation on Windows, the newly created thread hangs the moment it tries to execute stream.readline(), never gets to actually populate the queue, and thus the output of out.readline(1) read from the main thread is None.
How can I make this work on Windows?
I've been having an odd issue with PyCharm and subprocesses created by the multiprocessing library locking up forever. I'm using Windows with Python 3.5. What I'm trying to do is:
Start a background thread to block on stdin (waiting for input)
Have the main thread check occasionally for input from stdin and then delegate the work to Python processes created using multiprocessing
However, I've found that newly created multiprocessing Processes lock up forever if and only if the following conditions are met:
I'm running the code via Pycharm (both the latest and older versions)
The background thread is blocking on stdin
Here's the simplest example I can create that reproduces the problem:
import multiprocessing
import threading
import sys

def noop():
    pass

def consume():
    while True:
        sys.stdin.readline()

if __name__ == '__main__':
    # create a daemon thread to block on stdin
    thread = threading.Thread(target=consume, daemon=True)
    thread.start()

    # create a background process
    process = multiprocessing.Process(target=noop)
    process.start()
I've Googled various combinations of "PyCharm stdin multiprocessing hang ..." and had no luck finding an explanation, and I can't figure out why a thread of the main process blocking on stdin should ever cause a subprocess to also block/hang, let alone why it would only happen when running the script in PyCharm. The only thing I can guess is that there might be some monkey-patching of either stdin or the multiprocessing library going on.
Has anyone else encountered this problem? Can anyone explain to me why this only occurs in PyCharm, and how I can make it work regardless of the Python editor I'm using?
I faced the same problem when I was trying to make multiple API calls to fetch data from a remote server. I replaced multiprocessing.dummy with ThreadPoolExecutor; it works in the same way as dummy.
Following is a short snippet of running code that writes the response to a json file:
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

chunk_index = 0  # assumed starting index; defined elsewhere in the original code
uids = []  # an array of the requisite parameters used in requests

with open('flight_config.json', 'w') as f:
    futures = []
    for i in range(chunk_index, len(uids)):
        print('For uid[{}], fetching started:'.format(i))
        chunk_index += 1
        auth_token = get_header()  # the answer's own helper
        with ThreadPoolExecutor(max_workers=7) as executor:
            future_to_url = {executor.submit(fetch_response_from_api, uid=uid, auth_token=auth_token): uid
                             for uid in uids[i]}
            for future in concurrent.futures.as_completed(future_to_url):
                result = future_to_url[future]
                try:
                    data = future.result()
                    print(data)
                except Exception as exc:
                    print('%r generated an exception: %s' % (result, exc))
                else:
                    print('%r page is %d bytes' % (result, len(data)))
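Stripped of the surrounding bookkeeping, the core submit/as_completed pattern is just this (a minimal sketch; fetch_response_from_api and get_header are the helpers from the snippet above, and the uids values are hypothetical):

from concurrent.futures import ThreadPoolExecutor, as_completed

uids = ['u1', 'u2', 'u3']  # hypothetical request parameters

with ThreadPoolExecutor(max_workers=7) as executor:
    future_to_uid = {executor.submit(fetch_response_from_api,
                                     uid=uid, auth_token=get_header()): uid
                     for uid in uids}
    for future in as_completed(future_to_uid):
        uid = future_to_uid[future]
        try:
            print(uid, future.result())
        except Exception as exc:
            print('%r generated an exception: %s' % (uid, exc))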
I'm trying to write a Tornado web app which runs a local command asynchronously, as a coroutine. This is the stripped-down example code:
#! /usr/bin/env python3

import shlex
import asyncio
import logging
import subprocess

from tornado.web import Application, url, RequestHandler
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop

logging.getLogger('asyncio').setLevel(logging.DEBUG)

async def run():
    command = "python3 /path/to/my/script.py"
    logging.debug('Calling command: {}'.format(command))
    process = asyncio.create_subprocess_exec(
        *shlex.split(command),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT
    )
    logging.debug(' - process created')
    result = await process
    stdout, stderr = result.communicate()
    output = stdout.decode()
    return output

def run_sync(self, path):
    command = "python3 /path/to/my/script.py"
    logging.debug('Calling command: {}'.format(command))
    try:
        result = subprocess.run(
            *shlex.split(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            check=True
        )
    except subprocess.CalledProcessError as ex:
        raise RunnerError(ex.output)
    else:
        return result.stdout

class TestRunner(RequestHandler):
    async def get(self):
        result = await run()
        self.write(result)

url_list = [
    url(r"/test", TestRunner),
]

HTTPServer(Application(url_list, debug=True)).listen(8080)
logging.debug("Tornado server started at port {}.".format(8080))

IOLoop.configure('tornado.platform.asyncio.AsyncIOLoop')
IOLoop.instance().start()
When /path/to/my/script.py is called directly, it executes as expected. Likewise, when I have TestRunner.get implemented as a regular, synchronous method (see run_sync), it executes correctly. However, when running the above app and calling /test, the log shows:
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:asyncio:execute program 'python3' stdout=stderr=<pipe>
DEBUG:asyncio:process 'python3' created: pid 21835
However, ps shows that the process hung and is now defunct:
$ ps -ef | grep 21835
berislav 21835 21834 0 19:19 pts/2 00:00:00 [python3] <defunct>
I have a feeling that I'm not implementing the right loop, or I'm doing it wrong, but all the examples I've seen show how to use asyncio.get_event_loop().run_until_complete(your_coro()), and I couldn't find much about combining asyncio and Tornado. All suggestions welcome!
Subprocesses are tricky because of the singleton SIGCHLD handler. In asyncio, this means that they only work with the "main" event loop. If you change tornado.ioloop.IOLoop.configure('tornado.platform.asyncio.AsyncIOLoop') to tornado.platform.asyncio.AsyncIOMainLoop().install(), then the example works. A few other cleanups were also necessary; here's the full code:
#! /usr/bin/env python3

import shlex
import asyncio
import logging

import tornado.platform.asyncio
from tornado.web import Application, url, RequestHandler
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop

logging.getLogger('asyncio').setLevel(logging.DEBUG)

async def run():
    command = "python3 /path/to/my/script.py"
    logging.debug('Calling command: {}'.format(command))
    process = await asyncio.create_subprocess_exec(
        *shlex.split(command),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT
    )
    logging.debug(' - process created')
    result = await process.wait()
    stdout, stderr = await process.communicate()
    output = stdout.decode()
    return output

tornado.platform.asyncio.AsyncIOMainLoop().install()
IOLoop.instance().run_sync(run)
Also note that Tornado has its own subprocess interface in tornado.process.Subprocess, so if that's the only thing you need asyncio for, consider using the Tornado version instead. Be aware that combining Tornado's and asyncio's subprocess interfaces in the same process may produce conflicts with the SIGCHLD handler, so you should pick one or the other, or use the libraries in such a way that the SIGCHLD handler is unnecessary (for example, by relying solely on stdout/stderr instead of the process's exit status).
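For reference, a rough sketch of the same run() coroutine using Tornado's own interface (assuming Tornado 4.3+ on Unix; the script path is the question's placeholder):

import subprocess
from tornado.process import Subprocess

async def run_tornado():
    Subprocess.initialize()  # registers the SIGCHLD handler (automatic in Tornado 5+)
    proc = Subprocess(
        ["python3", "/path/to/my/script.py"],
        stdout=Subprocess.STREAM,  # wrap stdout in an awaitable PipeIOStream
        stderr=subprocess.STDOUT,
    )
    output = await proc.stdout.read_until_close()
    await proc.wait_for_exit()  # relies on the SIGCHLD handler
    return output.decode()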
I think a quick code snippet will explain my problem better than words, so please have a look at this:
from flask import Flask
from flask.ext.socketio import SocketIO
from threading import Thread
import subprocess
import threading
from eventlet.green.subprocess import Popen

app = Flask(__name__)
socketio = SocketIO(app)

def get_tasks_and_emit():
    instance = Popen(["tasklist"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=1)
    lines_iterator = iter(instance.stdout.readline, b"")
    data = ""
    for line in lines_iterator:
        data += line.decode("utf8")
    socketio.emit("loaded", data)
    print("::: DEBUG - returned tasks with thread")

@app.route("/")
def index():
    html =  "<!DOCTYPE html>"
    html += "<script src=https://code.jquery.com/jquery-2.2.0.min.js></script>"
    html += "<script src=https://cdn.socket.io/socket.io-1.4.5.js></script>"
    html += "<script>"
    html += "var socket = io.connect(window.location.origin);"
    html += "socket.on('loaded', function(data) {alert(data);});"
    html += "function load_tasks_threaded() {$.get('/tasks_threaded');}"
    html += "function load_tasks_nonthreaded() {$.get('/tasks');}"
    html += "</script>"
    html += "<button onclick='load_tasks_nonthreaded()'>Load Tasks</button>"
    html += "<button onclick='load_tasks_threaded()'>Load Tasks (Threaded)</button>"
    return html

@app.route("/tasks")
def tasks():
    get_tasks_and_emit()
    print("::: DEBUG - returned tasks without thread")
    return ""

@app.route("/tasks_threaded")
def tasks_threaded():
    threading.Thread(target=get_tasks_and_emit).start()
    return ""

if __name__ == "__main__":
    socketio.run(app, port=7000, debug=True)
I am running this code on Windows using eventlet; if I don't use eventlet, everything is fine (but of course much slower due to the werkzeug threading mode). (And I just checked: it's not working on Linux either.)
I hope someone can point me in the right direction. (My Python version is 3.5.1, by the way.)
I found the problem. Apparently you have to monkey-patch the threading module, so I added
import eventlet
eventlet.monkey_patch(thread=True)
and then I also had a problem with long-running programs. I had the same problem as the asker of this StackOverflow post:
Using Popen in a thread blocks every incoming Flask-SocketIO request
So I added
eventlet.sleep()
to the for loop that processes the pipes.
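In context, the fixed loop in get_tasks_and_emit() looks like this:

import eventlet
eventlet.monkey_patch(thread=True)

# ... inside get_tasks_and_emit():
for line in lines_iterator:
    data += line.decode("utf8")
    eventlet.sleep()  # yield to the event loop so pending requests are served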
EDIT:
As temoto pointed out, you can alternatively just use the threading module from eventlet.green, like this:
from eventlet.green import threading
I'm trying to write a Python script that will enable me to start the Google App Engine dev_appserver using coverage.py, fetch the /test URL from the app that I launch, wait for the server to finish returning the page, then shut down the dev_appserver, and then generate a report.
My challenge is how to launch the dev_appserver in the background so that I can do the HTTP fetch, and then how to shut down the dev_appserver before generating my report.
I'm heading towards something like this:
# get_gae_coverage.py
# Launch dev_appserver with coverage.py
coverage run --source=./ /usr/local/bin/dev_appserver.py --clear_datastore --use_sqlite .

# Fetch /test
urllib.urlopen('http://localhost:8080/test').read()

# Shutdown dev_appserver somehow
# ??

# Generate coverage report
coverage report

What is the best way to write a Python script to do this?
You should go with subprocess.Popen:
import os
import signal
import subprocess
import time
import urllib

coverage_proc = subprocess.Popen(
    ['coverage', 'run', your_flag_list],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)

time.sleep(5)  # find the correct sleep value
urllib.urlopen('http://localhost:8080/test').read()
time.sleep(1)
os.kill(coverage_proc.pid, signal.SIGINT)
Here is another approach you can use to test whether the server is up and running:

line = coverage_proc.stdout.readline()
while '] Running application' not in line:
    line = coverage_proc.stdout.readline()
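Putting the pieces together (a sketch; it assumes Python 3's urllib.request and uses the '] Running application' line from the dev_appserver log as the readiness marker):

import os
import signal
import subprocess
import urllib.request

coverage_proc = subprocess.Popen(
    ['coverage', 'run', '--source=./', '/usr/local/bin/dev_appserver.py',
     '--clear_datastore', '--use_sqlite', '.'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True)

# block until the server reports that it is serving
line = coverage_proc.stdout.readline()
while '] Running application' not in line:
    line = coverage_proc.stdout.readline()

urllib.request.urlopen('http://localhost:8080/test').read()

# SIGINT (rather than SIGKILL) gives coverage a chance to write its data file
os.kill(coverage_proc.pid, signal.SIGINT)
coverage_proc.wait()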
threading is the way to accomplish this kind of task: start the dev_appserver in a thread (or in the main thread), collect the results using the coverage module while it is running, then kill the dev_appserver Python process from another thread, and you will have your coverage results.
Here is a sample snippet which runs dev_appserver.py in a thread and then waits 10 seconds before killing the Python process. You can adapt the end method to suit: instead of waiting 10 seconds, wait a few seconds (to let the Python process start), run the coverage testing, and once it is done, kill the appserver and finish coverage.
import threading
import subprocess
import time

hold_process = []

def start():
    print 'In the start process'
    proc = subprocess.Popen(['/usr/bin/python', 'dev_appserver.py', 'yourapp'])
    hold_process.append(proc)

def end():
    time.sleep(10)
    proc = hold_process.pop(0)
    print 'Killing the appserver process'
    proc.kill()

t = threading.Thread(name='startprocess', target=start)
t.daemon = True
w = threading.Thread(name='endprocess', target=end)

t.start()
w.start()
t.join()
w.join()