Twisted PerspectiveBroker callRemote from wsgi webapp - python

I'm writing a wsgi application that needs to use Twisted PerspectiveBroker to call some remote methods. The problem is that the wsgi needs to return a rendered webpage, but the calls to the Twisted service are asynchronous. So basically my web app needs to call the remote methods, then do some other stuff, then it has to wait for the remote calls to finish, then render the page and return it to the client.
What is the best way to do this?
I am currently planning on using Flask to write the app.

A WSGI application runs in its own thread (or process). When running in Twisted's WSGI container, that is a different thread than the reactor is running in. Most of Twisted's APIs are not thread-safe: they may only be called in the reactor thread.
So, the basic way to call a Twisted API from a WSGI application is using reactor.callFromThread, which is thread-safe and which causes a function to be called in the reactor thread:
...
reactor.callFromThread(pbRemote.callRemote, "someMethod", some, args)
However, this discards the result, which you probably want. It's simple to build an API on top of reactor.callFromThread that preserves the result, though, and there's an implementation of that in Twisted, too:
from twisted.internet.threads import blockingCallFromThread
...
result = blockingCallFromThread(reactor, pbRemote.callRemote, "someMethod", some, args)
This call will block until the Deferred returned by callRemote fires and then it will return the result of that Deferred.
If you want to make the call, do some other work, and then wait for the call to finish, you have to get a little bit creative. You need to make the call and get the actual Deferred it returns, but not block on it:
resultHolder = blockingCallFromThread(
    reactor, lambda: [pbRemote.callRemote("someMethod", some, args)])
And then you can do what other work you need to do. And when you're ready to wait for the result of the PB call:
result = blockingCallFromThread(reactor, lambda: resultHolder[0])
This is all rather more awkward than using Twisted in a single-threaded scenario, so it may indeed be easier to use Twisted Web's native APIs rather than build a WSGI application. Remember that one of the primary goals of WSGI is to allow applications to be developed that are portable across different servers - Twisted, Apache, etc. If you're actually using Twisted APIs in your WSGI application, then it's not portable at all.
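To make the "start the call, do other work, then wait" mechanism from this answer concrete, here is a minimal standard-library sketch. It does not use Twisted: ToyLoop, start_call and blocking_call are hypothetical names, and ToyLoop is only a stand-in for a reactor that runs callables on its own thread. Unlike the real blockingCallFromThread, this toy version does not wait on Deferreds; it only passes plain return values and exceptions back to the calling thread.

```python
import queue
import threading
from concurrent.futures import Future


class ToyLoop:
    """Hypothetical stand-in for a reactor: runs queued callables on one thread."""

    def __init__(self):
        self._calls = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            func = self._calls.get()
            if func is None:  # sentinel posted by stop()
                break
            func()

    def call_from_thread(self, func, *args):
        """Thread-safe: schedule func(*args) on the loop thread, discarding the result."""
        self._calls.put(lambda: func(*args))

    def stop(self):
        self._calls.put(None)
        self._thread.join()


def start_call(loop, func, *args):
    """Schedule func(*args) on the loop thread and return a Future for its result."""
    future = Future()

    def run():
        try:
            future.set_result(func(*args))
        except Exception as exc:  # hand the failure back to the waiting thread
            future.set_exception(exc)

    loop.call_from_thread(run)
    return future


def blocking_call(loop, func, *args):
    """Schedule func(*args) on the loop thread and block until it has a result."""
    return start_call(loop, func, *args).result()
```

In a request handler you would call start_call(...) first, do the other work, and only then call .result() on the returned Future, which is the same shape as the resultHolder trick above.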

You can return server.NOT_DONE_YET to tell twisted.web the request is not finished. And then call request.write() and request.finish() to finish the request later, for example:
from twisted.web import server, resource
class MyResource(resource.Resource):
    def render_GET(self, request):
        # delayCall() stands for any call that returns a Deferred,
        # which will notify us when the work finishes
        d = delayCall()
        # write the response and finish the request
        def finish_req(data):
            request.write(data)
            request.finish()
        d.addCallback(finish_req)
        # tell twisted.web that this request is not finished yet
        return server.NOT_DONE_YET

Related

Returning a status code early for a time consuming function in Flask

I have an API route that calls a function (i.e. doSomethingForALongTime) that takes some time to finish. If we assume that the function doesn't return anything to the client, is there a workaround to just call the function and send the status code 200 while the function is doing its job?
@application.route('/api', methods=['GET'])
def api_route():
    doSomethingForALongTime()
    return 200
Yes, with some caveats. The usual way to handle this is to use a 'task queue', such as celery or rq (there's a walk-through of how to use rq in chapter 22 of the flask mega tutorial). These approaches require, at a minimum, that you have redis running and that you run separate worker processes.
The idea is to hand a task off to the task queue in your handler (route), then return a response to the browser while a worker in a separate process picks the task up from the queue and runs it.
It's also possible to run a 'worker thread' in your app, and have the handler queue up work for it. I have a proof-of-concept for that here, with the caveat that I've only used it for personal apps, and it is only really suitable for a personal webapp.
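The worker-thread variant can be sketched with the standard library alone. This is a minimal illustration, not the proof-of-concept mentioned above: tasks, worker, start_worker and handle_request are hypothetical names, and the 202 status is my own choice for "accepted, still working" (the question asks for 200). In Flask, the body of handle_request would sit inside the route function.

```python
import queue
import threading

# Jobs waiting for the background worker; each job is a (func, args) pair.
tasks = queue.Queue()


def worker():
    """Run queued jobs one at a time until a None sentinel arrives."""
    while True:
        job = tasks.get()
        try:
            if job is None:
                return
            func, args = job
            func(*args)
        except Exception as exc:  # keep the worker alive if one job fails
            print("background job failed:", exc)
        finally:
            tasks.task_done()


def start_worker():
    """Start the single background worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_request(func, *args):
    """Hypothetical handler body: queue the slow work and respond at once."""
    tasks.put((func, args))
    return "accepted", 202
```

Because the queue lives in the web process, queued jobs are lost if that process restarts, which is one reason a separate task queue is the usual choice outside personal apps.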

Tornado server caused Django unable to handle concurrent requests

I wrote a Django website that handles concurrent database requests and subprocess calls perfectly fine, if I just run "python manage.py runserver"
This is my model
class MyModel:
    ...
    def foo(self):
        args = [......]
        pipe = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
In my view:
def call_foo(request):
    my_model = MyModel()
    my_model.foo()
However, after I wrap it using Tornado server, it's no longer able to handle concurrent request. When I click my website where it sends async get request to this call_foo() function, it seems like my app is not able to handle other requests. For example, if I open the home page url, it keeps waiting and won't display until the above subprocess call in foo() has finished.
If I do not use Tornado, everything works fine.
Below is my code to start the tornado server. Is there anything that I did wrong?
MAX_WAIT_SECONDS_BEFORE_SHUTDOWN = 5
def sig_handler(sig, frame):
    logging.warning('Caught signal: %s', sig)
    tornado.ioloop.IOLoop.instance().add_callback(force_shutdown)
def force_shutdown():
    logging.info("Stopping tornado server")
    server.stop()
    logging.info('Will shutdown in %s seconds ...', MAX_WAIT_SECONDS_BEFORE_SHUTDOWN)
    io_loop = tornado.ioloop.IOLoop.instance()
    deadline = time.time() + MAX_WAIT_SECONDS_BEFORE_SHUTDOWN
    def stop_loop():
        now = time.time()
        if now < deadline and (io_loop._callbacks or io_loop._timeouts):
            io_loop.add_timeout(now + 1, stop_loop)
        else:
            io_loop.stop()
            logging.info('Force Shutdown')
    stop_loop()
def main():
    parse_command_line()
    logging.info("starting tornado web server")
    os.environ['DJANGO_SETTINGS_MODULE'] = 'mydjango.settings'
    django.setup()
    wsgi_app = tornado.wsgi.WSGIContainer(django.core.handlers.wsgi.WSGIHandler())
    tornado_app = tornado.web.Application([
        (r'/(favicon\.ico)', tornado.web.StaticFileHandler, {'path': "static"}),
        (r'/static/(.*)', tornado.web.StaticFileHandler, {'path': "static"}),
        ('.*', tornado.web.FallbackHandler, dict(fallback=wsgi_app)),
    ])
    global server
    server = tornado.httpserver.HTTPServer(tornado_app)
    server.listen(options.port)
    signal.signal(signal.SIGTERM, sig_handler)
    signal.signal(signal.SIGINT, sig_handler)
    tornado.ioloop.IOLoop.instance().start()
    logging.info("Exit...")
if __name__ == '__main__':
    main()
There is nothing wrong with your set-up. This is by design.
The WSGI protocol (and therefore Django) uses a synchronous model. That means that when your app starts processing a request, it takes control and gives it back only when the request is finished. That's why it can process only a single request at a time. To allow simultaneous requests, one usually launches the wsgi application in multithreaded or multiprocess mode.
The Tornado server, on the other hand, uses an asynchronous model. The idea here is to have its own scheduler instead of the OS scheduler that works with threads and processes. So your code runs some logic, then launches some long task (DB call, URL fetch), sets up what to run when the task finishes, and gives control back to the scheduler.
Giving control back to the scheduler is the crucial part: it allows an async server to work fast, because it can start processing a new request while the previous one is waiting for data.
This answer explains sync/async in detail. It focuses on the client side, but I think you can see the idea.
So what's wrong with your code: the Popen call does not hand control back to the IOLoop while the subprocess runs. Python does nothing until your subprocess is finished, and so it cannot process other requests, not even Django's requests. runserver "works" here because it's multithreaded. So while one thread is entirely blocked, other threads can still process requests.
For this reason it's usually not recommended to run WSGI apps under an async server like tornado. The doc claims it will be less scalable, but you can see the problem in your own code. So if you need both servers (e.g. Tornado for sockets and Django for the main site), I'd suggest running both behind nginx, and using uwsgi or gunicorn to run Django. Or take a look at the django-channels app instead of tornado.
Besides, while it works in a test environment, I guess it's not a recommended way to do what you are trying to achieve. It's hard to suggest a solution, as I don't know what you call with Popen, but it seems to be something long running. Maybe you should take a look at the Celery project. It's a package for running long-term background jobs.
However, back to running sub-processes. In Tornado you can use tornado.process.Subprocess. It's a wrapper over Popen that allows it to work with the IOLoop. Unfortunately I don't know if you can use it in the wsgi part under tornado. There are some projects I remember, like django futures, but it seems to be abandoned.
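As a rough illustration of what "a subprocess that works with the event loop" looks like, here is a sketch using the standard library's asyncio rather than tornado.process.Subprocess itself. run_command and demo are hypothetical names. The same idea applies to a native asynchronous handler; it does not help inside the synchronous WSGI fallback, where the Django view would still block.

```python
import asyncio
import sys


async def run_command(*args):
    """Run a command without blocking the event loop; return (exit_code, stdout)."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() is awaited, so the loop can serve other work meanwhile.
    stdout, _stderr = await proc.communicate()
    return proc.returncode, stdout.decode()


async def demo():
    """Start two child processes and wait for both; neither blocks the other."""
    slow = run_command(sys.executable, "-c", "import time; time.sleep(0.5); print('slow')")
    fast = run_command(sys.executable, "-c", "print('fast')")
    return await asyncio.gather(slow, fast)
```

Calling asyncio.run(demo()) returns one (exit_code, stdout) pair per command, in the order they were passed to gather.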
As another quick and dirty fix, you can run Tornado with several processes. Check this example on how to fork the server. But I would not recommend using this in production anyway (fork is OK, running the wsgi fallback is not).
So to summarize, I would rewrite your code to do one of the following:
Run the Popen call in some background queue, like Celery
Process such views with Tornado and use the tornado.process module to run the subprocess.
And overall, I'd look for another deployment infrastructure, and would not run Django under tornado.

Eventlet wsgi server and time-consuming operations in requests

Let's assume we have a WSGI app which is hosted on an event-driven single-threaded server:
from eventlet import wsgi
import eventlet
def app(env, start_response):
    # IO operations here
    ...
wsgi.server(eventlet.listen(('', 8090)), app)
Within app function, some I/O operations such as reading files or DB access must be performed.
Now, when we perform IO operations in app, the server is effectively blocked and can't serve other clients.
Q: What are possible solutions to this problem? How can I get the Eventlet wsgi server to perform time-consuming operations without getting blocked?
TL;DR: use mysqldb/psycopg, or pure Python DB drivers loaded via eventlet.import_patched(); use tpool.execute() for files and everything else.
Try to separate the operations that can be made to cooperate with Eventlet from those that cannot. Cooperation here means splitting the work into an "execute code" part and a "wait for result" part, with a notification mechanism for when the result is ready. The main notification mechanism for Eventlet is file descriptors.
So everything that waits on a file descriptor is a candidate to be green (non-blocking). Most importantly, that covers all network IO. If your blocking function is written in pure Python, just use import_patched(module_name) to replace its socket and other references with Eventlet's green versions. mysqldb and psycopg2 are special cases: C extension modules made cooperative thanks to explicit support from their authors. For everything else that blocks in non-Python code, your option is OS threads.
Unfortunately, waiting on actual disk files is full of quirks, so I recommend using OS threads, and we have a built-in thread pool to support that. Convert blocking_fun(filepath, something_else) to eventlet.tpool.execute(blocking_fun, filepath, something_else) and it no longer blocks everything. Check the tpool documentation for details.
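The thread-pool idea can be illustrated without Eventlet. The sketch below is not Eventlet code: it uses the standard library's asyncio and loop.run_in_executor as an analogue of tpool.execute, and blocking_read, handler, other_client and serve_both are hypothetical names. It shows a blocking function being handed to an OS thread so that another client is served while it runs.

```python
import asyncio
import time


def blocking_read(path_like, delay=0.2):
    """Hypothetical blocking function standing in for slow file or driver code."""
    time.sleep(delay)
    return "contents of " + path_like


async def handler(log):
    """Offload the blocking call to the default thread pool and await its result."""
    loop = asyncio.get_running_loop()
    log.append("read started")
    data = await loop.run_in_executor(None, blocking_read, "report.txt")
    log.append("read finished")
    return data


async def other_client(log):
    """A second request that arrives while the first one is still reading."""
    log.append("other client served")


async def serve_both():
    """Run both handlers on one loop and record the order of events."""
    log = []
    data, _ = await asyncio.gather(handler(log), other_client(log))
    return data, log
```

Running asyncio.run(serve_both()) yields a log in which "other client served" appears between "read started" and "read finished", which is the behaviour the tpool.execute conversion is meant to give you.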
If you can, re-engineer the whole application into blocking and non-blocking processes and have them communicate via sockets. This is hard from a code-rewriting point of view, but it is very simple at runtime and for debugging, and it makes for a robust, fail-proof design.

python http server, multiple simultaneous requests

I have developed a rather extensive http server written in python utilizing tornado. Without setting anything special, the server blocks on requests and can only handle one at a time. The requests basically access data (mysql/redis) and print it out in json. These requests can take upwards of a second at the worst case. The problem is that a request comes in that takes a long time (3s), then an easy request comes in immediately after that would take 5ms to handle. Well since that first request is going to take 3s, the second one doesn't start until the first one is done. So the second request takes >3s to be handled.
How can I make this situation better? I need that second simple request to begin executing regardless of other requests. I'm new to python, and more experienced with apache/php where there is no notion of two separate requests blocking each other. I've looked into mod_python to emulate the php example, but that seems to block as well. Can I change my tornado server to get the functionality that I want? Everywhere I read, it says that tornado is great at handling multiple simultaneous requests.
Here is the demo code I'm working with. I have a sleep command which I'm using to test if the concurrency works. Is sleep a fair way to test concurrency?
import tornado.httpserver
import tornado.ioloop
import tornado.web
import tornado.gen
import time
class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.engine
    def handlePing1(self):
        time.sleep(4)#simulating an expensive mysql call
        self.write("response to browser ....")
        self.finish()
    def get(self):
        start = time.time()
        self.handlePing1()
        #response = yield gen.Task(handlePing1)#i see tutorials around that suggest using something like this ....
        print "done with request ...", self.request.path, round((time.time()-start),3)
application = tornado.web.Application([
    (r"/.*", MainHandler),
])
if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    port=8833;
    http_server.listen(port)
    print "listening on "+str(port);
    tornado.ioloop.IOLoop.instance().start()
Thanks for any help!
Edit: remember that Redis is also single threaded, so even if you have concurrent requests, your bottleneck will be Redis. You won't be able to process more requests because Redis won't be able to process them.
Tornado is a single-threaded, event-loop-based server.
From the documentation:
By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
Concurrency in tornado is achieved through asynchronous callbacks. The idea is to do as little as possible in the main event loop (single-threaded) to avoid blocking and defer i/o operations through callbacks.
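The difference between blocking the event loop and yielding to it can be shown in a few lines. This sketch uses the standard library's asyncio instead of Tornado's own APIs, and all function names are hypothetical. Here time.sleep plays the role of a blocking database driver, as in the question's test, while asyncio.sleep plays the role of a non-blocking one.

```python
import asyncio
import time


async def slow_blocking(log):
    """Blocks the whole loop: time.sleep never yields control."""
    log.append("slow start")
    time.sleep(0.2)
    log.append("slow end")


async def slow_cooperative(log):
    """Yields to the loop while waiting, so other handlers can run."""
    log.append("slow start")
    await asyncio.sleep(0.2)
    log.append("slow end")


async def quick(log):
    """A cheap request that arrives right after the slow one."""
    log.append("quick")


async def serve(slow_handler):
    """Start the slow handler first, then the quick one, and record the order."""
    log = []
    await asyncio.gather(slow_handler(log), quick(log))
    return log
```

asyncio.run(serve(slow_blocking)) gives ["slow start", "slow end", "quick"], while asyncio.run(serve(slow_cooperative)) gives ["slow start", "quick", "slow end"]. So time.sleep is a fair model of a blocking call, and the quick request only gets ahead when the slow handler actually yields.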
If using asynchronous operations doesn't work for you (ex: no async driver for MySQL, or Redis), your only way of handling more concurrent requests is to run multiple processes.
The easiest way is to front your tornado processes with a reverse-proxy like HAProxy or Nginx. The tornado doc recommends Nginx: http://www.tornadoweb.org/en/stable/overview.html#running-tornado-in-production
You basically run multiple instances of your app on different ports. Ex:
python app.py --port=8000
python app.py --port=8001
python app.py --port=8002
python app.py --port=8003
A good rule of thumb is to run 1 process for each core on your server.
Nginx will take care of balancing incoming requests across the different backends. So if one of the requests is slow (~ 3s), you have n-1 other processes listening for incoming requests. It is possible, and very likely, that all processes will at some point be busy with a slow-ish request, in which case requests will be queued and processed when any process becomes free, i.e. has finished processing its request.
I strongly recommend you start with Nginx before trying HAProxy as the latter is a little bit more advanced and thus a bit more complex to setup properly (lots of switches to tweak).
Hope this helps. Key take-away: Tornado is great for async I/O, less so for CPU heavy workloads.
I had the same problem, but without tornado or mysql.
Do you have one database connection shared by the whole server?
I created a multiprocessing.Pool. Each worker process has its own db connection, set up by an init function. I wrap the slow code in a function and map it to the Pool, so there are no shared variables or connections.
Sleep does not block other threads, but a DB transaction may block them.
You need to set up the Pool at the top of your code.
def spawn_pool(fishes=None):
    global pool
    from multiprocessing import Pool
    def init():
        from storage import db #private connections
        db.connect() #connections stored in db-framework and will be global in each process
    pool = Pool(processes=fishes,initializer=init)
if __name__ == "__main__":
    spawn_pool(8)
from storage import db #shared connection for quick-type requests.
#code here
if __name__ == "__main__":
    start_server()
Many concurrent quick requests may slow down one big request, but that contention then happens only on the database server.

How do I invoke Twisted from a plugin to a GTK program that is already running the main loop?

I wrote a Rhythmbox plugin and I'm trying to add some code to download some JSON asynchronously. Callbacks are registered in the do_activate function:
def do_activate(self):
    shell = self.object
    sp = shell.props.shell_player
    self.db = shell.get_property('db')
    self.qm = RB.RhythmDBQueryModel.new_empty(self.db)
    self.pec_id = sp.connect('playing-song-changed', self.playing_entry_changed)
    self.pc_id = sp.connect('playing-changed', self.playing_changed)
    self.sc_id = sp.connect('playing-source-changed', self.source_changed)
    self.current_entry = None
    ...
I'm trying to download some content when playing_changed is triggered. It currently uses urllib2 to download the content synchronously, but this has the potential to block the UI for a short while. I'd like to use Twisted to solve the problem, but all the examples I've seen use reactor.run(), which blocks indefinitely.
I'm pretty new to Twisted and I was wondering, is there some way to handle this case asynchronously without blocking the main thread?
The full code is here
There isn't any way in Twisted to do asynchronous HTTP requests without running the IO loop (reactor.run). Running the reactor enables async features that are not present in Python by default. However, if your only reason to use Twisted is to make async HTTP calls, it might be overkill. Use simple threading instead and make your thread wait for the HTTP response.
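A minimal sketch of the threading approach, using only the standard library. It assumes Python 3 (urllib.request rather than the question's urllib2), and fetch_text and fetch_in_background are hypothetical names. Note that on_done runs on the worker thread; in a GTK plugin you would normally hand the result back to the main loop (for example with GLib.idle_add) before touching any widgets.

```python
import threading
import urllib.request


def fetch_text(url, timeout=10):
    """Blocking download; returns the response body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_in_background(url, on_done, fetch=fetch_text):
    """Fetch url on a worker thread, then call on_done(result, error)."""

    def work():
        try:
            result, error = fetch(url), None
        except Exception as exc:  # report the failure instead of losing it
            result, error = None, exc
        on_done(result, error)

    thread = threading.Thread(target=work, daemon=True)
    thread.start()
    return thread
```

The fetch parameter exists only so the download function can be swapped out, for instance with a fake in tests; the caller's thread returns immediately and the UI stays responsive while the download runs.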
In the context of a Rhythmbox plugin, you probably need to deal with the fact that the GTK main loop is already running. This is a situation that Twisted supports in principle, but supported APIs to cooperatively initialize a reactor on a main loop that may or may not already have one are tricky.
You can work around it with a function like this:
def maybeInstallReactor():
    import sys
    # sys.modules tells us whether a reactor has already been imported
    if 'twisted.internet.reactor' not in sys.modules:
        from twisted.internet import gtk2reactor # s/2/3 if you're using gtk3
        reactor = gtk2reactor.install()
        reactor.startRunning()
        reactor._simulate()
    else:
        from twisted.internet import reactor
    return reactor
Make sure this function is called as early as possible in your program, before anything else gets imported (especially stuff from Twisted).
The startRunning call hooks the reactor up to the GLib main loop, and the _simulate call hooks up Twisted's timed events to a GLib timer.
Sadly, this does involve calling one private function, _simulate, so you'll have to be careful to make sure new versions of Twisted don't break it; but as a result of this question I opened a bug to make this use-case explicitly supported. Plus, beyond this one private method call, nothing else about your usage of Twisted needs to be weird.
