Running any web server event loop on a secondary thread - python

We have a rich backend application that handles messaging/queuing, database queries, and computer vision. An additional feature we need is tcp communication - preferably via http. The point is: this is not primarily a web application. We would expect to have a set of http channels set up for different purposes. Yes - we understand about messaging including topics and publish-subscribe: but direct tcp based request/response also has its place.
I have looked at and tried out a half dozen python http web servers. They either implicitly or explicitly describe a requirement to run the event loop on the main thread. This is for us a cart before the horse: the main thread is already occupied with other tasks including coordination of the other activities.
To illustrate the intended structure I will lift code from my aiohttp-specific question How to run an aiohttp web application in a secondary thread. In that question I tried running in another standalone script but on a subservient thread:
def runWebapp():
from aiohttp import web
async def handle(request):
name = request.match_info.get('name', "Anonymous")
text = "Hello, " + name
return web.Response(text=text)
app = web.Application()
app.add_routes([web.get('/', handle),
web.get('/{name}', handle)])
web.run_app(app)
if __name__ == '__main__':
from threading import Thread
t = Thread(target=runWebapp)
t.start()
print('thread started let''s nap..')
import time
time.sleep(50)
This gives error:
RuntimeError: There is no current event loop in thread 'Thread-1'.
This error turns out to mean "hey you're not running this on the main thread".
We can logically replace aiohttp with other web servers here. Are there any for which this approach of asking the web server's event handling loop to run on a secondary thread will work? So far I have also tried cherrypy, tornado, and flask.
Note that one prominent webserver that I have not tried is django. But that one seems to require an extensive restructuring of the application around the directory structures expected (/required?) for django. We would not want to do that given the application has a set of other purposes that supersede this sideshow of having http servers.
An approach that I have looked at is asyncio. I have not understood whether it can support running event loops on a side thread or not: if so then it would be an answer to this question.
In any case are there any web servers that explicitly support having their event loops off of the main thread?

You can create and set an event loop while on the secondary thread:
asyncio.set_event_loop(asyncio.new_event_loop())
cherrypy and flask already work without this; tornado works with this.
On aiohttp, you get another error from it calling loop.add_signal_handler():
ValueError: set_wakeup_fd only works in main thread
You need to skip that because only the main thread of the main interpreter is allowed to set a new signal handler, which means web servers running on a secondary thread cannot directly handle signals to do graceful exit.
Example: aiohttp
Set the event loop before calling run_app().
aiohttp 3.8+ already uses a new event loop in run_app(), so you can skip this.
Pass handle_signals=False when calling run_app() to not add signal handlers.
asyncio.set_event_loop(asyncio.new_event_loop()) # aiohttp<3.8
web.run_app(app, handle_signals=False)
Example: tornado
Set the event loop before calling app.listen().
asyncio.set_event_loop(asyncio.new_event_loop())
app.listen(8888)
tornado.ioloop.IOLoop.current().start()

Any Python program is run on a single thread which is the main. And when you create a Thread it does not mean that your program already uses two threads.
Unfortunately, it is not possible to use different event loops for every Thread but possible to do that using multiprocessing instead of threading.
It allows creating its own event loop for every single Process.
from multiprocessing import Process
from aiohttp import web
def runWebapp(port):
async def handle(request):
name = request.match_info.get("name", "Anonymous")
text = "Hello, " + name
return web.Response(text=text)
app = web.Application()
app.add_routes([
web.get("/", handle),
web.get("/{name}", handle)
])
web.run_app(app, port=port)
if __name__ == "__main__":
p1 = Process(target=runWebapp, args=(8080,))
p2 = Process(target=runWebapp, args=(8081,))
p1.start()
p2.start()

Related

Using asyncio in the middle of Python3.6 script

I have a Python task that reports on it's progress during execution using a status updater that has 1 or more handlers associated with it. I want the updates to be dispatched to each handler in an asynchronous way (each handler is responsible for making an I/O bound call with the updates, pushing to a queue, logging to a file, calling a HTTP endpoint etc). The status updater has a method like so with each handler.dispatch method being a coroutine. This was working until a handler using aiohttp was added and now I am getting weird errors from the aiohttp module.
def _dispatch(self, **updates):
event_loop = asyncio.get_event_loop()
tasks = (event_loop.create_task(handler.dispatch(**updates)) for handler in self._handlers)
event_loop.run_until_complete(asyncio.gather(*tasks))
Every example of asyncio I've seen basically has this pattern
if __name__ == '__main__':
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
My question is, is the way I am attempting to use the asyncio module in this case just completely wrong? Does the event loop need to be created once and only once and then everything else goes through that?

Multithreaded server using Python Websockets library

I am trying to build a Server and Client using Python Websockets library. I am following the example given here: Python OCPP documentation. I want to edit this example such that for each new client(Charge Point), the server(Central System) runs on new thread (or new loop as Websockets library uses Python's asyncio). Basically I want to receive messages from multiple clients simultaneously. I modified the main method of server(Central System), from the example that I followed, like this by adding a loop parameter in websockets.serve() in a hope that whenever a new client runs, it runs on a different loop:
async def main():
server = await websockets.serve(
on_connect,
'0.0.0.0',
9000,
subprotocols=['ocpp2.0'],
ssl=ssl_context,
loop=asyncio.new_event_loop(),
ping_timeout=100000000
)
await server.wait_closed()
if __name__ == '__main__':
asyncio.run(main())
I am getting the following error. Please help:
RuntimeError:
Task <Task pending coro=<main() running at server.py:36>
cb=_run_until_complete_cb() at /usr/lib/python3.7/asyncio/base_events.py:153]>
got Future <_GatheringFuture pending> attached to a different loop - sys:1:
RuntimeWarning: coroutine 'BaseEventLoop._create_server_getaddrinfo' was never awaited
Author of the OCPP package here. The examples in the project are able to handle multiple clients with 1 event loop.
The problem in your example is loop=asyncio.new_event_loop(). The websocket.serve() is called in the 'main' event loop. But you pass it a reference to a second event loop. This causes the future to be attached to a different event loop it was created it.
You should avoid running multiple event loops.

Mixing tornado and sqlalchemy

I'm trying to write a tornado web application that uses sqlalchemy in some request handlers. These handlers have two parts: one that takes a long time to complete, and another that uses sqlalchemy and is relatively fast.
I would like to make the slow part of the request asynchronous, but not the sqlalchemy part. Can I do something like the following code and be safe?
class ExampleHandler(BaseHandler):
async def post(self):
loop = asyncio.get_event_loop()
await loop.run_in_executor(...) # very slow (no sqlalchemy here)
with self.db_session() as s: # sqlalchemy session
s.add(...)
s.commit()
self.render(...)
The idea is to have sqlalchemy still blocking, but have the computational heavy part not blocking the application.
The tornado web server uses asynchronous code to get around the limit of the python Global Interpreter Lock. The GIL, as it is colloquially known, allows only one thread of execution to take place in the python interpreter process. Tornado is able to answer many requests simultaneously because of its use of an event loop. The event loop can perform one small task at a time. Let's take your own post handler to understand this better.
In this handler, when the python interpreter gets to the await keyword, it pauses the execution of the function and queues it for later on its event loop. It then checks the event loop to respond to other events that may have queued up there, like responding to a new connection or servicing another handler.
When you block in an asynchronous function, you freeze the entire event loop as it is unable to pause your function and service anything else. What this actually means for you is that your web server will not accept or service any requests while your async function blocks. It will appear as if your web server is hanging and indeed it is stuck.
To keep the server responsive, you have to find a way to execute your sqlalchemy query in an asynchronous non-blocking manner.

Tornado server caused Django unable to handle concurrent requests

I wrote a Django website that handles concurrent database requests and subprocess calls perfectly fine, if I just run "python manage.py runserver"
This is my model
class MyModel:
...
def foo(self):
args = [......]
pipe = subprocess.Popen(args, stdout=subproccess.PIPE, stderr=subprocess.PIPE)
In my view:
def call_foo(request):
my_model = MyModel()
my_model.foo()
However, after I wrap it using Tornado server, it's no longer able to handle concurrent request. When I click my website where it sends async get request to this call_foo() function, it seems like my app is not able to handle other requests. For example, if I open the home page url, it keeps waiting and won't display until the above subprocess call in foo() has finished.
If I do not use Tornado, everything works fine.
Below is my code to start the tornado server. Is there anything that I did wrong?
MAX_WAIT_SECONDS_BEFORE_SHUTDOWN = 5
def sig_handler(sig, frame):
logging.warning('Caught signal: %s', sig)
tornado.ioloop.IOLoop.instance().add_callback(force_shutdown)
def force_shutdown():
logging.info("Stopping tornado server")
server.stop()
logging.info('Will shutdown in %s seconds ...', MAX_WAIT_SECONDS_BEFORE_SHUTDOWN)
io_loop = tornado.ioloop.IOLoop.instance()
deadline = time.time() + MAX_WAIT_SECONDS_BEFORE_SHUTDOWN
def stop_loop():
now = time.time()
if now < deadline and (io_loop._callbacks or io_loop._timeouts):
io_loop.add_timeout(now + 1, stop_loop)
else:
io_loop.stop()
logging.info('Force Shutdown')
stop_loop()
def main():
parse_command_line()
logging.info("starting tornado web server")
os.environ['DJANGO_SETTINGS_MODULE'] = 'mydjango.settings'
django.setup()
wsgi_app = tornado.wsgi.WSGIContainer(django.core.handlers.wsgi.WSGIHandler())
tornado_app = tornado.web.Application([
(r'/(favicon\.ico)', tornado.web.StaticFileHandler, {'path': "static"}),
(r'/static/(.*)', tornado.web.StaticFileHandler, {'path': "static"}),
('.*', tornado.web.FallbackHandler, dict(fallback=wsgi_app)),
])
global server
server = tornado.httpserver.HTTPServer(tornado_app)
server.listen(options.port)
signal.signal(signal.SIGTERM, sig_handler)
signal.signal(signal.SIGINT, sig_handler)
tornado.ioloop.IOLoop.instance().start()
logging.info("Exit...")
if __name__ == '__main__':
main()
There is nothing wrong with your set-up. This is by design.
So, WSGI protocol (and so Django) uses syncronous model. It means that when your app starts processing a request it takes control and gives it back only when request is finished. That's why it can process single request at once. To allow simultaneous requests one usually launches wsgi application in multithreaded or multiprocessed mode.
The Tornado server on other side uses asynchronous model. The idea here is to have own scheduler instead of OS scheduler that works with threads and processes. So your code runs some logic, then launches some long task (DB call, URL fetch), sets up what to run when task finishes and gives control back to scheduler.
Giving controll back to scheduler is crucial part, it allows async server to work fast because it can start processing new request while previous is waiting for data.
This answer explains sync/async detailed. It focuses on client, but I think you can see the idea.
So whats wrong with your code: Popen does not give control to IOLoop. Python does nothing until your subprocess is finished, and so can not process other requests, even not Django's requests. runserver "works" here because it's multithreaded. So while locking entirely the thread, other threads can still process requests.
For this reason it's usually not recommended to run WSGI apps under async server like tornado. The doc claims it will be less scalable, but you can see the problem on your own code. So if you need both servers (e.g. Tornado for sockets and Django for main site), I'd suggest to run both behind nginx, and use uwsgi or gunicorn to run Django. Or take a look at django-channels app instead of tornado.
Besides, while it works on test environment, I guess it's not a recomended way to do what you try to achieve. It's hard to suggest the solution, as I don't know what do you call with Popen, but it seams to be something long running. Maybe you should take a look at Celery project. It's a package for running long-term background job.
However, back to running sub-processes. In Tornado you can use tornado.process.Subprocess. It's a wrapper over Popen to allow it to work with IOLoop. Unfortunately I don't know if you can use it in wsgi part under tornado. There are some projects I remember, like django futures but it seems to be abandoned.
As another quick and dirty fix - you can run Tornado with several processes. Check this example on how to fork server. But I will not recoment using this in production anyway (fork is OK, running wsgi fallback is not).
So to summarize, I would rewrite your code to do one of the following:
Run the Popen call in some background queue, like Celery
Process such views with Tornado and use tornado.processes module to run subprocess.
And overall, I'd seek for another deployment infrastructure, and would not run Durango under tornado.

How to run python-socketio in Thread?

I have python-socketio used in Flask and want to start Thread instance and emit signals from it when signal comes. In flask app I have:
import threading
def game(my_sio):
my_sio.emit('log', data = "Game started!")
return
#sio.on('start')
def startGame(sid):
t = threading.Thread(target = game, args = [sio])
t.start()
There's a simple example and it does not work. In server log I get:
engineio:a16afb90de2e44ab8a836498086c88f6: Sending packet MESSAGE data 2["log","Game started!"]
But client never gets it!
Javascript:
socket.on('log', function(a) {
console.log(a);
});
So what worked for me was switching to threading mode in Flask + python-socketio as documented here:
https://python-socketio.readthedocs.io/en/latest/server.html#standard-threads
I was using eventlet before and that caused the problem.
Another solution
Using eventlet is possible, but threads must be non-blocking and thus said, standard Threads are useless here.
Instead to create thread one has to use socketio.Server method start_background_task which takes function as an argument.
Also inside the threaded task, use eventlet.sleep() instead of the time.sleep() method.
But event that may not work without some hacks and use of monkey_patch coming with eventlet. See more in documentation. But if there are still problems, adding empty eventlet.sleep() in the import section right before the monkey_patch will do the trick. Found it somewhere on the web in a discussion.

Categories