How to run python-socketio in Thread?

I use python-socketio with Flask and want to start a Thread instance and emit events from it when a signal arrives. In the Flask app I have:
import threading
def game(my_sio):
    my_sio.emit('log', data="Game started!")
    return
@sio.on('start')
def startGame(sid):
    t = threading.Thread(target=game, args=[sio])
    t.start()
This is a simple example, and it does not work. In the server log I get:
engineio:a16afb90de2e44ab8a836498086c88f6: Sending packet MESSAGE data 2["log","Game started!"]
But the client never gets it!
JavaScript:
socket.on('log', function(a) {
    console.log(a);
});

So what worked for me was switching to threading mode in Flask + python-socketio as documented here:
https://python-socketio.readthedocs.io/en/latest/server.html#standard-threads
I was using eventlet before and that caused the problem.
Another solution
Using eventlet is possible, but threads must be non-blocking, so standard threads are useless here.
Instead, to create a thread, use the socketio.Server method start_background_task, which takes a function as an argument.
Also, inside the threaded task, use eventlet.sleep() instead of time.sleep().
But even that may not work without some hacks and the monkey_patch that comes with eventlet. See the documentation for more. If there are still problems, adding an empty eventlet.sleep() in the import section right before monkey_patch will do the trick. I found that somewhere on the web in a discussion.

Related

Running any web server event loop on a secondary thread

We have a rich backend application that handles messaging/queuing, database queries, and computer vision. An additional feature we need is tcp communication - preferably via http. The point is: this is not primarily a web application. We would expect to have a set of http channels set up for different purposes. Yes - we understand about messaging including topics and publish-subscribe: but direct tcp based request/response also has its place.
I have looked at and tried out a half dozen python http web servers. They either implicitly or explicitly describe a requirement to run the event loop on the main thread. This is for us a cart before the horse: the main thread is already occupied with other tasks including coordination of the other activities.
To illustrate the intended structure I will lift code from my aiohttp-specific question How to run an aiohttp web application in a secondary thread. In that question I tried running in another standalone script but on a subservient thread:
def runWebapp():
    from aiohttp import web
    async def handle(request):
        name = request.match_info.get('name', "Anonymous")
        text = "Hello, " + name
        return web.Response(text=text)
    app = web.Application()
    app.add_routes([web.get('/', handle),
                    web.get('/{name}', handle)])
    web.run_app(app)
if __name__ == '__main__':
    from threading import Thread
    t = Thread(target=runWebapp)
    t.start()
    print("thread started let's nap..")
    import time
    time.sleep(50)
This gives error:
RuntimeError: There is no current event loop in thread 'Thread-1'.
This error turns out to mean "hey you're not running this on the main thread".
We can logically replace aiohttp with other web servers here. Are there any for which this approach of asking the web server's event handling loop to run on a secondary thread will work? So far I have also tried cherrypy, tornado, and flask.
Note that one prominent webserver that I have not tried is django. But that one seems to require an extensive restructuring of the application around the directory structures expected (/required?) for django. We would not want to do that given the application has a set of other purposes that supersede this sideshow of having http servers.
An approach that I have looked at is asyncio. I have not understood whether it can support running event loops on a side thread or not: if so then it would be an answer to this question.
In any case are there any web servers that explicitly support having their event loops off of the main thread?

You can create and set an event loop while on the secondary thread:
asyncio.set_event_loop(asyncio.new_event_loop())
cherrypy and flask already work without this; tornado works with this.
On aiohttp, you get another error from it calling loop.add_signal_handler():
ValueError: set_wakeup_fd only works in main thread
You need to skip that because only the main thread of the main interpreter is allowed to set a new signal handler, which means web servers running on a secondary thread cannot directly handle signals to do graceful exit.
Example: aiohttp
Set the event loop before calling run_app().
aiohttp 3.8+ already uses a new event loop in run_app(), so you can skip this.
Pass handle_signals=False when calling run_app() to not add signal handlers.
asyncio.set_event_loop(asyncio.new_event_loop()) # aiohttp<3.8
web.run_app(app, handle_signals=False)
Example: tornado
Set the event loop before calling app.listen().
asyncio.set_event_loop(asyncio.new_event_loop())
app.listen(8888)
tornado.ioloop.IOLoop.current().start()
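The underlying idea, independent of any web framework, can be sketched with the standard library alone: create a fresh event loop, run it on a secondary thread, and hand it work from the main thread. The names run_loop, greet and main are illustrative only, and no web server is involved here.

```python
import asyncio
import threading


def run_loop(loop, ready):
    # Bind the new loop to this secondary thread, then run it until stopped.
    asyncio.set_event_loop(loop)
    loop.call_soon(ready.set)
    loop.run_forever()


async def greet(name):
    await asyncio.sleep(0.01)
    return "Hello, " + name


def main():
    loop = asyncio.new_event_loop()
    ready = threading.Event()
    t = threading.Thread(target=run_loop, args=(loop, ready), daemon=True)
    t.start()
    ready.wait(timeout=5)
    # The main thread stays free; it submits a coroutine to the other thread's loop.
    future = asyncio.run_coroutine_threadsafe(greet("Anonymous"), loop)
    result = future.result(timeout=5)
    # Stop the loop from outside its thread, then clean up.
    loop.call_soon_threadsafe(loop.stop)
    t.join(timeout=5)
    loop.close()
    return result
```

Because the loop never touches signal handlers here, the main-thread restriction described above does not come into play.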

Every Python program starts on a single thread, the main thread, and creating a Thread object does not by itself mean your program is already running on two threads.
If you would rather not manage a separate event loop in each Thread, you can use multiprocessing instead of threading.
Each Process can then create its own event loop.
from multiprocessing import Process
from aiohttp import web
def runWebapp(port):
    async def handle(request):
        name = request.match_info.get("name", "Anonymous")
        text = "Hello, " + name
        return web.Response(text=text)
    app = web.Application()
    app.add_routes([
        web.get("/", handle),
        web.get("/{name}", handle)
    ])
    web.run_app(app, port=port)
if __name__ == "__main__":
    p1 = Process(target=runWebapp, args=(8080,))
    p2 = Process(target=runWebapp, args=(8081,))
    p1.start()
    p2.start()

Tornado server caused Django unable to handle concurrent requests

I wrote a Django website that handles concurrent database requests and subprocess calls perfectly fine if I just run "python manage.py runserver".
This is my model:
class MyModel:
    ...
    def foo(self):
        args = [......]
        pipe = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
In my view:
def call_foo(request):
    my_model = MyModel()
    my_model.foo()
However, after I wrap it in a Tornado server, it is no longer able to handle concurrent requests. When I click a link on my website that sends an async GET request to this call_foo() function, my app seems unable to handle other requests. For example, if I open the home page URL, it keeps waiting and won't display until the subprocess call in foo() has finished.
If I do not use Tornado, everything works fine.
Below is my code to start the tornado server. Is there anything that I did wrong?
MAX_WAIT_SECONDS_BEFORE_SHUTDOWN = 5
def sig_handler(sig, frame):
    logging.warning('Caught signal: %s', sig)
    tornado.ioloop.IOLoop.instance().add_callback(force_shutdown)
def force_shutdown():
    logging.info("Stopping tornado server")
    server.stop()
    logging.info('Will shutdown in %s seconds ...', MAX_WAIT_SECONDS_BEFORE_SHUTDOWN)
    io_loop = tornado.ioloop.IOLoop.instance()
    deadline = time.time() + MAX_WAIT_SECONDS_BEFORE_SHUTDOWN
    def stop_loop():
        now = time.time()
        if now < deadline and (io_loop._callbacks or io_loop._timeouts):
            io_loop.add_timeout(now + 1, stop_loop)
        else:
            io_loop.stop()
            logging.info('Force Shutdown')
    stop_loop()
def main():
    parse_command_line()
    logging.info("starting tornado web server")
    os.environ['DJANGO_SETTINGS_MODULE'] = 'mydjango.settings'
    django.setup()
    wsgi_app = tornado.wsgi.WSGIContainer(django.core.handlers.wsgi.WSGIHandler())
    tornado_app = tornado.web.Application([
        (r'/(favicon\.ico)', tornado.web.StaticFileHandler, {'path': "static"}),
        (r'/static/(.*)', tornado.web.StaticFileHandler, {'path': "static"}),
        ('.*', tornado.web.FallbackHandler, dict(fallback=wsgi_app)),
    ])
    global server
    server = tornado.httpserver.HTTPServer(tornado_app)
    server.listen(options.port)
    signal.signal(signal.SIGTERM, sig_handler)
    signal.signal(signal.SIGINT, sig_handler)
    tornado.ioloop.IOLoop.instance().start()
    logging.info("Exit...")
if __name__ == '__main__':
    main()

There is nothing wrong with your set-up. This is by design.
So, the WSGI protocol (and so Django) uses a synchronous model. This means that when your app starts processing a request, it takes control and gives it back only when the request is finished. That's why it can process only a single request at a time. To allow simultaneous requests, one usually launches the WSGI application in multithreaded or multiprocess mode.
The Tornado server, on the other hand, uses an asynchronous model. The idea here is to have its own scheduler instead of the OS scheduler that works with threads and processes. So your code runs some logic, then launches some long task (DB call, URL fetch), sets up what to run when the task finishes, and gives control back to the scheduler.
Giving control back to the scheduler is the crucial part: it allows an async server to work fast because it can start processing a new request while the previous one is waiting for data.
This answer explains sync/async in detail. It focuses on the client, but I think you can see the idea.
So what's wrong with your code: Popen does not give control to the IOLoop. Python does nothing until your subprocess has finished, and so cannot process other requests, not even Django's requests. runserver "works" here because it's multithreaded. So while one thread is entirely blocked, other threads can still process requests.
For this reason it's usually not recommended to run WSGI apps under an async server like Tornado. The docs say it will be less scalable, but you can see the problem in your own code. So if you need both servers (e.g. Tornado for sockets and Django for the main site), I'd suggest running both behind nginx and using uwsgi or gunicorn to run Django. Or take a look at the django-channels app instead of Tornado.
Besides, while it works in a test environment, I guess it's not a recommended way to do what you are trying to achieve. It's hard to suggest a solution, as I don't know what you call with Popen, but it seems to be something long-running. Maybe you should take a look at the Celery project. It's a package for running long-running background jobs.
However, back to running subprocesses. In Tornado you can use tornado.process.Subprocess. It's a wrapper over Popen that allows it to work with the IOLoop. Unfortunately, I don't know if you can use it in the WSGI part under Tornado. There are some projects I remember, like django futures, but it seems to be abandoned.
As another quick and dirty fix, you can run Tornado with several processes. Check this example on how to fork the server. But I would not recommend using this in production anyway (fork is OK, running the WSGI fallback is not).
So to summarize, I would rewrite your code to do one of the following:
Run the Popen call in some background queue, like Celery.
Process such views with Tornado and use the tornado.process module to run the subprocess.
And overall, I'd look for another deployment infrastructure, and would not run Django under Tornado.
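The point about a subprocess either blocking the loop or yielding to it can be sketched with the standard library. This example swaps in asyncio's subprocess support rather than tornado.process.Subprocess, so it illustrates the idea and is not the Tornado API. The child command and the ticker coroutine are hypothetical stand-ins for the long-running call and for other requests being served.

```python
import asyncio
import sys


async def run_child():
    # Awaiting the child lets the event loop keep serving other work meanwhile.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(0.3); print('done')",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()


async def ticker(ticks, stop):
    # Stand-in for other requests handled while the child is running.
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(ticks, stop))
    output = await run_child()
    stop.set()
    await tick_task
    return output, len(ticks)
```

If run_child waited on a plain subprocess.Popen instead, the ticker would not advance until the child exited, which is the behaviour described in the question.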

I can't understand multithreading using cherrypy/bottle

I'm using bottle with a cherrypy server to utilize multithreading. As I understand it this makes each request handled by a different thread. So given the following code:
from bottle import request, route
somedict = {}
@route("/read")
def read():
    return somedict
@route("/write", method="POST")
def write():
    somedict[request.forms.get("key")] = request.forms.get("value")
Would somedict be thread safe? What if a daemon thread were run to manage somedict, say it's a dictionary of active sessions and the daemon thread prunes expired sessions? If not, would a simple locking mechanism suffice, and would I need to use it when reading, writing, and in the daemon thread, or just in the daemon thread?
Also, as I understand it, cherrypy is a true multithreaded server. Is there a more proper method I should use to implement a daemon thread while using cherrypy, as Python's threads are not true threads? I don't wish to delve much into the cherrypy environment, preferring to stick with bottle for this project, so if it involves moving away from bottle or migrating my app to cherrypy then it doesn't really matter for now. I'd still like to know, though, as I didn't see much in their documentation on threads at all.

In your particular example, yes, the (single) dict assignment you perform is threadsafe.
somedict[request.forms.get("key")] = request.forms.get("value")
But, more generally, the proper answer to your question is: you will indeed need to use a locking mechanism. This is true if, for example, you make multiple updates to somedict while handling a single request, and you need them to be made atomically.
The good news is: it's probably as simple as a mutex:
from bottle import request, route
import threading
somedict = {}
somedict_lock = threading.Lock()
@route("/read")
def read():
    with somedict_lock:
        return somedict
@route("/write", method="POST")
def write():
    with somedict_lock:
        somedict[request.forms.get("key1")] = request.forms.get("value1")
        somedict[request.forms.get("key2")] = request.forms.get("value2")

I had originally answered that a dict is threadsafe, but on further research, that answer was wrong. See here for a good explanation.
For a quick explanation, imagine two threads running this code at once:
d['k'] += 1
They might both read d['k'] at the same time, so the value ends up incremented by only 1 instead of 2.
I don't think it's an issue of your application locking up, more of some data being lost. If that's not acceptable, using threading.Lock is pretty easy and doesn't add much overhead.
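A minimal sketch of guarding that read-modify-write with threading.Lock follows. The names d_lock, increment and run are illustrative only.

```python
import threading

d = {'k': 0}
d_lock = threading.Lock()


def increment(times):
    for _ in range(times):
        # The lock makes the read, add and write-back one indivisible step.
        with d_lock:
            d['k'] += 1


def run(threads=4, times=10000):
    d['k'] = 0
    workers = [threading.Thread(target=increment, args=(times,))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return d['k']
```

With the lock held around each update, the final count always equals threads * times. Without it, updates can be lost as described above.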
Here's some good info on thread safety with CherryPy. You might also consider using something like gunicorn in place of CherryPy. It has a worker process model, so each somedict would be different for every process, so there would be no worry of thread-safety.

CherryPy is based on Python threads, so you should stay away from using it (or any other native Python HTTP server) purely as an HTTP server. I suggest that you go with uWSGI, which is multiprocess and thus doesn't have GIL issues. Since it is multiprocess, you won't be able to use simple thread-shared variables. You can use uWSGI's SharedArea, though, or any third-party data storage.

How do I invoke Twisted from a plugin to a GTK program that is already running the main loop?

I wrote a Rhythmbox plugin and I'm trying to add some code to download some JSON asynchronously. Callbacks are registered in the do_activate function:
def do_activate(self):
    shell = self.object
    sp = shell.props.shell_player
    self.db = shell.get_property('db')
    self.qm = RB.RhythmDBQueryModel.new_empty(self.db)
    self.pec_id = sp.connect('playing-song-changed', self.playing_entry_changed)
    self.pc_id = sp.connect('playing-changed', self.playing_changed)
    self.sc_id = sp.connect('playing-source-changed', self.source_changed)
    self.current_entry = None
    ...
I'm trying to download some content when playing_changed is triggered. It currently uses urllib2 to download the content synchronously, but this has the potential to block the UI for a short while. I'd like to use Twisted to solve the problem, but all the examples I've seen use reactor.run(), which blocks indefinitely.
I'm pretty new to Twisted and I was wondering, is there some way to handle this case asynchronously without blocking the main thread?
The full code is here

There isn't any way in Twisted to do asynchronous HTTP requests without running the I/O loop (reactor.run). Running the reactor enables you to use async features not present in Python by default. However, if your only reason to use Twisted is to make async HTTP calls, it might be overkill. Use simple threading instead and make your thread wait for the HTTP response.
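A minimal sketch of that threading approach, using only the standard library, might look like this. It assumes Python 3, so urllib.request stands in for urllib2. The names fetch_json, download_async and on_done are hypothetical. In a GTK plugin the callback would normally need to be handed back to the main loop before touching the UI, which is not shown here.

```python
import json
import threading
import urllib.request


def fetch_json(url):
    # Blocking download; only ever called from the worker thread.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def download_async(url, on_done, fetch=fetch_json):
    """Run fetch(url) on a worker thread and call on_done(result, error)."""
    def worker():
        try:
            result, error = fetch(url), None
        except Exception as exc:
            result, error = None, exc
        # Note: this callback runs on the worker thread, not the main thread.
        on_done(result, error)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The fetch parameter exists only so the download step can be replaced, for example when trying the function without network access.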

In the context of a Rhythmbox plugin, you probably need to deal with the fact that the GTK main loop is already running. This is a situation that Twisted supports in principle, but supported APIs to cooperatively initialize a reactor on a main loop that may or may not already have one are tricky.
You can work around it with a function like this:
def maybeInstallReactor():
    import sys
    if 'twisted.internet.reactor' not in sys.modules:
        from twisted.internet import gtk2reactor # s/2/3 if you're using gtk3
        reactor = gtk2reactor.install()
        reactor.startRunning()
        reactor._simulate()
    else:
        from twisted.internet import reactor
    return reactor
Make sure this function is called as early as possible in your program, before anything else gets imported (especially stuff from Twisted).
The startRunning call hooks the reactor up to the GLib main loop, and the _simulate call hooks up Twisted's timed events to a GLib timer.
Sadly, this does involve calling one private function, _simulate, so you'll have to be careful to make sure new versions of Twisted don't break it; but as a result of this question I opened a bug to make this use-case explicitly supported. Plus, beyond this one private method call, nothing else about your usage of Twisted needs to be weird.

Starting python bottle in a thread/Process and another daemon next to it

Ok, so this may be a little bit unorthodox or I'm just stupid or both :)
I'm trying a very simple setup where I start a bottle server in one Process instance and start a smallish TFTP server in another instance.
#!/usr/bin/env python
import bottle
import sys
import tftpy
from multiprocessing import Process
def main():
    try:
        t = Process(target=bottle.run(host='0.0.0.0', port=8080))
        t.daemon = True
        t.start()
        t.join()
        h = Process(target=tftpy.TftpServer('/srv/tftp').listen('0.0.0.0', 69))
        h.start()
        h.join()
    except KeyboardInterrupt:
        sys.stdout.write("Aborted by user.\n")
        sys.exit(1)
if __name__ == "__main__":
    main()
Unless I'm totally crazy, I'd expect them to start up in parallel. In reality, what happens is that bottle starts and locks the whole thing up. If I exit bottle, the TFTP daemon starts.
I also tried a similar approach with the threading module, with about the same results.
What am I doing wrong?

There are several issues:
You call run() in the main thread. You should pass the function itself as target and its arguments in args or kwargs instead:
Process(target=bottle.run, kwargs=dict(host='0.0.0.0', port=8080))
You call t.join(), which blocks until the t process ends, before h.start(). Join after all processes have been started instead.
bottle and tftpy might not be compatible with the multiprocessing module. You could try the subprocess module if so.
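The first two points can be sketched as follows. To stay self-contained, this uses threading.Thread, which takes the same target and kwargs arguments as multiprocessing.Process. The serve function is a hypothetical stand-in for a blocking call such as bottle.run().

```python
import threading
import time


def serve(name, started, duration=0.1):
    """Hypothetical stand-in for a blocking server loop."""
    started.append((name, threading.current_thread().name))
    time.sleep(duration)


def start_all():
    started = []
    workers = [
        # Pass the callable and its arguments; do not call serve(...) here.
        threading.Thread(target=serve, kwargs=dict(name="http", started=started)),
        threading.Thread(target=serve, kwargs=dict(name="tftp", started=started)),
    ]
    # Start every worker first, then join, so they run side by side.
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return started
```

Writing target=serve(...) would run serve in the calling thread before any worker exists, which is the behaviour described in the question.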

Well, I am not sure I understood what you are trying to accomplish, but if I were in your place I would try the python-daemon package.
I think that both bottle and TFTP could be daemonized.
As you are only looking for a simple test, I guess the examples given on the python-daemon webpage would be enough.
If you really like the idea of daemonizing things, I would also suggest that you look into the proper daemonizing approach for your platform, as this way you get several facilities to manage your daemons, making them more like the ones found in your OS.
For some simple examples:
http://troydhanson.wordpress.com/2012/08/21/minimal-sysvinit-launchd-and-upstart/
