Async Message Queue - which combination? - python

I have been trying to determine which combination of packages to use for a push messaging service behind a web site...
My current idea is to go with Tornado + Socket.IO (Tornadio) and ZMQ. But I was also looking at involving Mongrel2. Then there is also a similar project called Brubeck, that takes from Tornado, using ZMQ and Eventlet.
My main question is this: I'm trying to understand where the benefit of Mongrel2 would come into play if I were to use Tornado. At that point, is Tornado even necessary? I figured I would just be writing a Mongrel2 Python handler and that's it. I would like to focus on using websockets/jssockets, which is why Socket.IO was interesting: it handles all the backwards compatibility under the hood for you.
If the tools in the mix for consideration are: Python focus, Tornado, Mongrel2, ZMQ, Brubeck, and Socket.IO, what recommendations would you have for the best mix to support websockets? Having Mongrel2 was really appealing for the idea of scalability, and just turning on more python handlers.
Update 1/1/2012
At first I went with Tornado + TornadIO + ZeroMQ, and had a working server. But ultimately I ended up learning Go (www.golang.org) and rewrote my server in pure Go, using its built-in concurrency. It ended up being over 10x faster than my Python version, even with more features: http://www.justinfx.com/2011/07/28/go-language-for-python-programmers/
It seems to keep picking up speed as the Go team makes more releases towards Go 1.0.

Sounds like a job for the Flash/Javascript binding. http://www.zeromq.org/bindings:javascript
That way you have a ZMQ app in the browser that is a SUB to whatever PUB sockets are pushing relevant messages.
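To make the PUB/SUB idea concrete, here is a minimal in-process sketch in Python. It is not ZeroMQ: the `Publisher` class and its `subscribe`/`publish` methods are made up for illustration, and an `asyncio.Queue` stands in for each SUB socket. The one behaviour it borrows from ZeroMQ is that a subscriber filters messages by topic prefix.

```python
import asyncio


class Publisher:
    """Toy stand-in for a PUB socket (hypothetical, not the ZeroMQ API)."""

    def __init__(self):
        self._subscribers = []  # list of (topic_prefix, queue) pairs

    def subscribe(self, topic_prefix):
        # A subscriber only sees messages whose topic starts with its prefix.
        # An empty prefix matches every topic.
        queue = asyncio.Queue()
        self._subscribers.append((topic_prefix, queue))
        return queue

    def publish(self, topic, payload):
        # Fan the message out to every matching subscriber without waiting.
        for prefix, queue in self._subscribers:
            if topic.startswith(prefix):
                queue.put_nowait((topic, payload))


async def demo():
    pub = Publisher()
    chat = pub.subscribe("chat.")
    everything = pub.subscribe("")

    pub.publish("chat.room1", "hello")
    pub.publish("alerts.cpu", "high load")

    chat_msgs = [await chat.get()]
    all_msgs = [await everything.get(), await everything.get()]
    return chat_msgs, all_msgs, chat.empty()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real deployment the queues would be replaced by network sockets, with one publisher pushing to many browser-side subscribers.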

I am adding my own update to this question as the answer, since I never received any other answers, and so I can close this one down...
At first I went with Tornado + TornadIO + ZeroMQ, and had a working server. But ultimately I ended up learning Go (www.golang.org) and rewrote my server in pure Go, using its built-in concurrency. It ended up being over 10x faster than my Python version, even with more features: http://www.justinfx.com/2011/07/28/go-language-for-python-programmers/
It seems to keep picking up speed as the Go team makes more releases towards Go 1.0.

Related

Python or Node for Chat Application

I intend to start a new chat web application which allows users to join a chatroom and participate in the chat. I've heard a lot about how Node.js will be perfect for this. Plus, there are a lot of tutorials online that demonstrate building a Node + socket.io chat application. Personally, I have never given Node a shot. I know JavaScript well enough to work with jQuery and Backbone, but I've been avoiding Node due to my preference for Python for web development. What do you guys suggest? Should I try the app in Python (I have no idea where to get started), or should I spend some time and learn Node?
Thanks a lot!
I'm personally not a big fan of writing Python, and while I love Node and would recommend giving it a shot sometime, if you already know Python there's no reason you can't use it for this task; you may be interested in checking out Twisted or Tornado.
I will say that one of the big plusses for using Node.js for evented programming (as compared to doing it in other languages) is that all I/O is asynchronous by default in Node.js. In other environments, you need to make sure you only use non-blocking libraries.
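To illustrate the point about non-blocking libraries on the Python side, here is a small sketch using the standard library's asyncio. The function names (`blocking_io`, `ticker`, `call_directly`, `call_in_executor`) are made up for the example, and `time.sleep` stands in for any blocking library call. Calling it directly stalls every other task on the loop; handing it to a worker thread with `run_in_executor` does not.

```python
import asyncio
import time


def blocking_io(log, delay=0.3):
    # Stand-in for a blocking library call (e.g. a synchronous DB driver).
    time.sleep(delay)
    log.append("io done")


async def ticker(log, ticks=3, interval=0.01):
    # A well-behaved task that yields to the event loop between ticks.
    for _ in range(ticks):
        await asyncio.sleep(interval)
        log.append("tick")


async def call_directly():
    log = []

    async def careless():
        blocking_io(log)  # blocks the whole event loop while it sleeps

    await asyncio.gather(careless(), ticker(log))
    return log


async def call_in_executor():
    log = []
    loop = asyncio.get_running_loop()
    # The blocking call runs on a worker thread; the loop keeps ticking.
    await asyncio.gather(
        loop.run_in_executor(None, blocking_io, log),
        ticker(log),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(call_directly()))
    print(asyncio.run(call_in_executor()))
```

In the first run the ticks only start after the blocking call has finished; in the second they complete while it is still sleeping.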
Node.js is a popular choice for a chat-like application because it is very good at handling workloads that are data-intensive rather than CPU-bound. Personally I am a big fan of Node.js myself. BUT I am going to step up here and tell you that
the syntax of Node.js for handling asynchronous events becomes a pain once your project grows from a simple example into a full-blown application. I mean, how long can you keep doing this?
response.onComplete( function(data) {
    data.parseJson( function( json ) {
        json.getElement('hoo', function( value ) {
            value.HowDoIEscapeNow()
            .....
I do not mean to say anything against Node.js, but IMHO it's a completely different beast once you get into complex applications.
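For comparison, here is roughly how the same three dependent steps could look in Python with asyncio coroutines, where each step is awaited in sequence instead of nested. This is only a sketch: `fetch_response`, `parse_json` and `get_element` are hypothetical helpers named after the calls in the snippet above, and the JSON payload is invented.

```python
import asyncio
import json


async def fetch_response():
    # Pretend to wait on the network, then return a raw body.
    await asyncio.sleep(0)
    return '{"hoo": 42}'


async def parse_json(data):
    await asyncio.sleep(0)
    return json.loads(data)


async def get_element(doc, key):
    await asyncio.sleep(0)
    return doc[key]


async def handle():
    # Each step waits for the previous one, but the code stays flat.
    data = await fetch_response()
    doc = await parse_json(data)
    return await get_element(doc, "hoo")


if __name__ == "__main__":
    print(asyncio.run(handle()))
```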

Python with Twisted, or Node.js

I am working on a project which is I/O bound.
I have 3 dependent tasks:
1. scraping a site + extracting the main content(removing comments/ads etc)
2. as soon as 1 completes it sends the data to a summarizer
3. as soon as 2 completes it calls a view and renders a page
I know Python and Django at the moment. What technologies would you recommend for this project? (I know that Python + Twisted or Node.js are well suited to I/O-bound projects.)
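The three dependent steps above can be sketched with the standard library's asyncio, as one possible shape for such a pipeline. Everything here is a placeholder: `scrape`, `summarize` and `render` are hypothetical stubs, the "summary" is just the first three words, and no real site is contacted.

```python
import asyncio


async def scrape(url):
    # Placeholder for fetching the page and stripping comments/ads.
    await asyncio.sleep(0.01)
    return f"main content of {url}"


async def summarize(text, max_words=3):
    # Placeholder summarizer: keep only the first few words.
    await asyncio.sleep(0.01)
    return " ".join(text.split()[:max_words])


async def render(summary):
    # Placeholder for calling a view and rendering a page.
    return f"<p>{summary}</p>"


async def process(url):
    # Steps 1 -> 2 -> 3 run in order for a single URL.
    content = await scrape(url)
    summary = await summarize(content)
    return await render(summary)


async def process_many(urls):
    # Different URLs overlap their waiting time; results keep input order.
    return await asyncio.gather(*(process(url) for url in urls))


if __name__ == "__main__":
    print(asyncio.run(process_many(["http://a.example", "http://b.example"])))
```

The ordering constraint lives inside `process`, while `process_many` is where the I/O-bound waiting overlaps.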
If you're already using Python, you're probably better off sticking with a Python library, especially when there are so many powerful asynchronous Python libraries. Node.js is fine, but switching between Python and Javascript is unnecessary.
Anyway, your question is very very vague. You can absolutely use Twisted and it will probably do what you want just fine, as long as you learn the API well enough. Other asynchronous frameworks include gevent and a web server called Tornado.
There's also Celery which is used specifically for asynchronous processing of queues. It may or may not be helpful to what you want.
I recommend you do a lot of research, look at the documentation of the above libraries, and decide what'll fit your project best. If you have more specific questions you can ask the respective IRC channels of the library, or post a clearer question here.
I am finally using django-socketio.
https://github.com/stephenmcd/django-socketio
In case websockets are not supported, socketio falls back to long polling.

Eventlet or gevent or Stackless + Twisted, Pylons, Django and SQL Alchemy

We're using Twisted extensively for apps requiring a great deal of asynchronous io. There are some cases where stuff is cpu bound instead and for that we spawn a pool of processes to do the work and have a system for managing these across multiple servers as well - all done in Twisted. Works great. The problem is that it's hard to bring new team members up to speed. Writing asynchronous code in Twisted requires a near vertical learning curve. It's as if humans just don't think that way naturally.
We're considering a mixed approach perhaps. Maybe keep the xmlrpc server part and process management in Twisted and implement the other stuff in code that at least looks synchronous to some extent while not being as such. Then again I like explicit over implicit so I have to think about this a bit more. Anyway, onto greenlets: how well does that stuff work? So there's Stackless, and as you can see from my Gallentean avatar I'm well aware, first hand, of the tremendous success of its use in CCP's flagship EVE Online game. What about Eventlet or gevent? Well, for now only Eventlet works with Twisted. However, gevent claims to be faster as it's not a pure Python implementation but rather relies upon libevent instead. It also claims to have fewer idiosyncrasies and defects. gevent is maintained by one guy as far as I can tell. This makes me somewhat leery, but all great projects start this way so... Then there's PyPy. I haven't even finished reading about that one yet; I just saw it in this thread: Drawbacks of Stackless.
So confusing - I'm wondering what the heck to do - sounds like Eventlet is probably the best bet but is it really stable enough? Anyone out there have any experience with it? Should we go with Stackless instead as it's been around and is proven technology - just like Twisted is as well - and they do work together nicely. But still I hate having to have a separate version of Python to do this. what to do....
This somewhat obnoxious blog entry hit the nail on the head for me though: Asynchronous IO for Grownups. I don't get the "Twisted is like Java" remark, since to me Java is typically where you are in the threading mindset, but whatever. Nevertheless, if that monkey patch thing really works just like that, then wow. Just wow!
You might want to check out:
Comparing gevent to eventlet
Reports from users who moved from twisted or eventlet to gevent
Eventlet and gevent are not really comparable to Stackless, because Stackless ships with a standard library that is not aware of tasklets. There are implementations of socket for Stackless, but there isn't anything as comprehensive as gevent.monkey. CCP does not use bare-bones Stackless; it has something called Stackless I/O, which AFAIK is Windows-only and was never open sourced (?).
Both eventlet and gevent could be made to run on Stackless rather than on greenlet. At some point we even tried to do this as a GSoC project but did not find a student.
Answering part of your question - if you look at http://speed.pypy.org you'll see that using twisted on top of PyPy may give you some speedups. This depends of course on your workload, but it's probably worth checking out.
Cheers,
fijal
I've built a little real time web app on top of eventlet and repoze.bfg (I gave up on django quite a while ago). I've found eventlet and monkey patching to be just as easy as Ted says.
Gevent isn't pure Python, and it strictly depends on CPython.
Of the frameworks you mentioned, Eventlet (OpenStack) and Tornado (FriendFeed, Quora) have the largest deployments.

Any ready solution for basic asynchronous (non-blocking) HTTP clients with Stackless Python 3.1?

UPDATE: after much laboring with Py3, including writing my own asynchronous webserver (following a presentation given by Dave Beazley), i finally dumped Python (and a huge stack of my code )-: in favor of CoffeeScript running on NodeJS. Check it out: GitHub (where you'll find like 95% of all interesting code these days), npm (a package manager that couldn't be any more user friendly; good riddance, easy_install, you never lived up to your name), an insanely huge repository of modules (with tons of new stuff being published virtually 24/7), a huge and vibrant community, out-of-the-box asynchronous HTTP and file handling..., all that (thanks to V8) at one third the speed of light. What's not to like? Read more propaganda: "The future of Scripting" (slide hosting courtesy SpreeWebdesign).
I am looking for a way to serve HTTP (and do HTTP requests) in an asynchronous, non-blocking fashion. This seems to be hard to do when you’ve decided on Stackless Python 3.1 (also see here for docs) as i did.
There are some basic examples, like the pretty informative and detailed article How To Use Linux epoll with Python, and there is a Google Code project named stacklessexamples which contains some valuable information (but no Python 3.x compatible code).
So, after many days of doing research on the web and trying to put together the pieces i’ve found so far: does anyone know of a fairly usable asynchronous HTTP library? It doesn’t have to be WSGI-compliant (I am not interested in that).
The server part should be able to serve multiple non-blocking HTTP requests (and possibly do the basics of HTTP header processing); the HTTP client part should be able to retrieve, in a non-blocking way, web content via HTTP requests (also doing basic header processing, but no fancy stuff like authorization or so).
My research so far has shown me that non-blocking HTTP
is the only way that makes sense in a stackless, cooperatively scheduled environment;
is feasible in Stackless Python 3 by virtue of the standard library’s select epoll (introduced in Py2.6; some solutions prefer libevent, but that means another hurdle as the pyevent project seems to have stopped developing at Py2.5);
is sadly still not a household item, with most people relying on blocking HTTP.
The way it looks now, i would have to learn the basics of socket programming and roll my own HTTP server/client library. I still shy away from that task as i have very little background in that area and am bound to ‘repeat history’ that way.
I would be very happy about any relevant pointers. I very much prefer solutions that make use of select.epoll; i seem to remember it is much more scalable than the older asyncore (but maybe someone has more precise data on this). As a minimum requirement, solutions should run on Ubuntu 9.10.
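As a rough illustration of the readiness-based approach asked about here, the sketch below uses the standard library's `selectors` module, which was added in later Python 3 releases (not 3.1) and picks epoll on Linux through `DefaultSelector`. It is plain CPython, not Stackless, and it is nowhere near a real HTTP server: `serve_ready_connections` and `demo` are hypothetical names, the response is hard-coded, and `socket.socketpair()` stands in for accepted client connections so that no port is opened.

```python
import selectors
import socket

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"


def serve_ready_connections(sel, timeout=1.0):
    """Answer every connection that is readable right now."""
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        # The selector reported this socket readable, so recv() will not block.
        request = conn.recv(4096)
        if request.startswith(b"GET "):
            # A real server would wait for EVENT_WRITE before a large send.
            conn.sendall(RESPONSE)
        sel.unregister(conn)
        conn.close()
        handled += 1
    return handled


def demo():
    sel = selectors.DefaultSelector()  # epoll-backed on Linux
    clients = []
    for _ in range(3):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        clients.append(client)

    handled = serve_ready_connections(sel)
    replies = [client.recv(4096) for client in clients]

    for client in clients:
        client.close()
    sel.close()
    return handled, replies


if __name__ == "__main__":
    print(demo())
```

One `select()` call services all three connections on a single thread, which is the property that makes this style fit a cooperatively scheduled environment.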
I know this is like resurrecting the dead (and flow has probably long since solved his problem), but for completeness stackless is available for 3.1.3:
http://www.stackless.com/download
For information on implementing a HTTP server using stacklesssocket:
http://code.google.com/p/stacklessexamples/wiki/StacklessNetworking
The non-blocking HTTP case is handled very well by Twisted. What it does is create a series of callbacks and register those callbacks with a Deferred. The Twisted documentation is worth checking out. Stackless uses microthreads, whereas Twisted builds the entire web framework from fragments of non-blocking code chained together with callbacks, errbacks and Deferreds, running in a main reactor loop on a single thread. I think this should handle the async HTTP case better.
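To give a feel for the callback/errback chaining described above, here is a toy model in plain Python. It is emphatically not Twisted's API: `MiniDeferred`, `add_callback`, `add_errback` and `fire` are invented names, there is no reactor, and the chain runs synchronously. It only shows how a result flows down the callbacks and how an exception diverts it to the next errback.

```python
import json


class MiniDeferred:
    """Toy callback chain (hypothetical; not twisted.internet.defer.Deferred)."""

    def __init__(self):
        self._chain = []  # list of (callback, errback) pairs

    def add_callback(self, fn):
        self._chain.append((fn, None))
        return self

    def add_errback(self, fn):
        self._chain.append((None, fn))
        return self

    def fire(self, result):
        # Walk the chain: successes go to callbacks, failures to errbacks.
        for callback, errback in self._chain:
            handler = errback if isinstance(result, Exception) else callback
            if handler is None:
                continue  # this stage does not handle the current state
            try:
                result = handler(result)
            except Exception as exc:
                result = exc  # switch to the failure path
        return result


def build_chain():
    d = MiniDeferred()
    d.add_callback(json.loads)                  # parse the body
    d.add_callback(lambda doc: doc["title"])    # pull out one field
    d.add_errback(lambda failure: "untitled")   # recover from any failure
    return d


if __name__ == "__main__":
    print(build_chain().fire('{"title": "Home"}'))
    print(build_chain().fire("not json"))
```

In Twisted itself the result would arrive later from the reactor rather than being passed in by hand, so treat this purely as a mental model.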

What are my options for doing multithreaded/concurrent programming in Python?

I'm writing a simple site spider and I've decided to take this opportunity to learn something new in concurrent programming in Python. Instead of using threads and a queue, I decided to try something else, but I don't know what would suit me.
I have heard about Stackless, Celery, Twisted, Tornado, and other things. I don't want to have to set up a database and the whole other dependencies of Celery, but I would if it's a good fit for my purpose.
My question is: What is a good balance between suitability for my app and usefulness in general? I have taken a look at the tasklets in Stackless, but I'm not sure whether the urlopen() call will block or whether the tasklets will execute in parallel; I haven't seen that mentioned anywhere.
Can someone give me a few details on my options and what would be best to use?
Thanks.
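On the worry that urlopen() blocks: it does block the thread that calls it, so one common workaround is to push each call onto a worker thread and let an event loop coordinate them. Below is a minimal sketch with the standard library's asyncio, which is a different tool from the ones listed in the question. `crawl`, `fetch`, `blocking_fetch` and `fake_fetch` are hypothetical names, the concurrency limit of 5 is arbitrary, and `fake_fetch` exists only so the example runs offline.

```python
import asyncio
from urllib.request import urlopen


def blocking_fetch(url):
    # urlopen() blocks the calling thread until the response arrives.
    with urlopen(url, timeout=10) as response:
        return response.read()


async def fetch(url):
    # Run the blocking call on a worker thread so the loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fetch, url)


async def fake_fetch(url):
    # Offline stand-in for fetch(), used for demos and tests.
    await asyncio.sleep(0.01)
    return ("<html>" + url + "</html>").encode()


async def crawl(urls, fetcher=fetch, limit=5):
    # The semaphore caps how many fetches are in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return url, await fetcher(url)

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)


if __name__ == "__main__":
    pages = asyncio.run(crawl(["http://a.example/", "http://b.example/"],
                              fetcher=fake_fetch))
    print(pages)
```

Swapping `fake_fetch` for the default `fetch` would hit the network; link extraction and a visited set would still need to be added to make it a spider.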
Tornado is a web server, so it wouldn't help you much in writing a spider. Twisted is much more general (and, inevitably, complex), good for all kinds of networking tasks (and with good integration with the event loop of several GUI frameworks). Indeed, there used to be a twisted.web.spider (but it was removed years ago, since it was unmaintained -- so you'll have to roll your own on top of the facilities Twisted does provide).
I must say that Twisted gets my vote.
Performing event-driven tasks is fairly straightforward in Twisted. Integration with other important system components such as GTK+ and DBus is very easy.
The HTTP client support is basic for now but improving (>9.0.0): see related question.
The added bonus is that Twisted is available in the Ubuntu default repository ;-)
For a quick look at package sizes, see ohloh.net/p/compare. Of course source size is only a rough metric (what I'd really like is nr pages doc, nr pages examples, dependencies), but it can help.
