I am new at Server side,
but I have gotten a chance to design and implement a server that will cover around 2000~3000 client.
And I am thinking that I will use Python and Websocket, though I don't know this choice is appropriate.
In this point, I am curious on how to design the server.
I think there must be some architecture normally in use depending on capacity that server handles.
Otherwise, Could I use a Websocket server offered by some python package like Tornado or Django?
I hope that I can get any information on this.
Any advice?
I've had good experiences using haproxy in front of sockjs-tornado. Depending on how complex your server-side logic, routing, and persistence requirements are, you could write all your server endpoints using tornado and use SQLAlchemy to handle writes to a relational database or use a non SQL data store like Redis.
If your main requirement is real-time interactivity it might be worth investigating meteor as well.
One of solutions could be Pyramid, sockjs, gunicorn, and gevent. Nginx probably better suits to be a frontend than Apache, but of course if you do not have any lengthy processing on the backend, any decent asynchronous Python server with websocket and sockjs support (not sure about socket.io as an alternative) will work for you out of the box.
Lenghty processing should be offloaded to some queue workers anyway, so asynchronous server will fit the bill.
Just check whether all used datastore/database adapters are compatible with your server solution be it asynchronous or multi-threading.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Which of these frameworks / libraries would be the best choise for building modern multiuser web application? I would love to have an asynchronous webserver which will allow me to scale easly.
What solution will give the best performance / scalability / most useful framework (in terms of easy of use and easy of developing)?
It would be great if it will provide good functionality (websockets, rpc, streaming, etc).
What are the pros and cons of each solution?
"Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design". If you are building something that is similar to a e-commerce site, then you should probably go with Django. It will get your work done quick. You dont have to worry about too many technology choices. It provides everything thing you need from template engine to ORM. It will be slightly opinionated about the way you structure your app, which is good If you ask me. And it has the strongest community of all the other libraries, which means easy help is available.
"Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions". Beware - "microframework" may be misleading. This does not mean that Flask is a half-baked library. This mean the core of flask is very, very simple. Unlike Django, It will not make any Technology decisions for you. You are free to choose any template engine or ORM that pleases you. Even though it comes with Jinja template engine by default, you are always free to choose our own. As far as I know Flask comes in handy for writing APIs endpoints (RESTful services).
"Twisted is an event-driven networking engine written in python". This is a high-performance engine. The main reason for its speed is something called as deferred. Twisted is built on top of deferreds. For those of you who dont know about defereds, it is the mechanism through with asynchronous architecture is achieved. Twisted is very fast. But is not suitable for writing conventional webapps. If you want to do something low-level networking stuff, twisted is your friend.
"Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user". Tornado stands some where between Django and Flask. If you want to write something with Django or Flask, but if you need a better performance, you can opt for Tornado. it can handle C10k problem very well if it is architected right.
"Cyclone is a web server framework for Python that implements the Tornado API as a Twisted protocol". Now, what if you want something that is nearly as performant as Twisted but easy to write conventional webapps? Say hello to cyclone. I would prefer Cyclone over Tornado. It has an API that is very similar to Tornado. As a matter of fact, this is a fork of Tornado. But the problem is it has relativly small community. Alexandre Fiori is the only main commiter to the repo.
"Pyramid is a general, open source, Python web application development framework. Its primary goal is to make it easier for a Python developer to create web applications." I haven't really used Pyramid, but I went through the documentation. From what I understand, Pyramid is very similar to Flask and I think you can use Pyramid wherever Flask seems appropriate and vice-versa.
EDIT: Request to review any other frameworks are welcomed!
Source: http://dhilipsiva.com/2013/05/19/python-libraries-django-twisted-tornado-flask-cyclone-and-pyramid.html
This is obviously a somewhat biased answer, but that is not the same thing as a wrong answer; you should always use Twisted. I've answered similar questions before, but since your question is not quite the same, here are some reasons:
"Best Performance"
Twisted continuously monitors our performance at the speed.twistedmatrix.com website. We were also one of the first projects to be monitored by PyPy's similar site, thereby assuring the good performance of Twisted on the runtime that anyone concerned with high-performance applications in Python.
"Scalability"
To my knowledge, none of the listed frameworks have any built-in support for automatic scaling; they're all communication frameworks, so you have to do the work to communicate between your scaling nodes. However, Twisted has an advantage in its built-in support for local multi-processing. In fairness, there is a third-party add-on for Tornado that allows you to do the same thing. In recent releases, Twisted has added features that increase the number of ways you can share work between cores, and work is ongoing in that area. Twisted also has a couple of well-integrated, "native" RPC protocols which offer a construction-kit for whatever scaling idiom you want to pursue.
"Most Useful"
Lots of people seem to find Twisted very useful. So much so that many of them have extended it and made their extensions available to you.
"Functionality"
Out of the box, Twisted includes:
good support for test-driven development of all the following
TCP servers, clients, transport layer security
SSH client and server
IMAP4, ESMTP, POP3 clients and servers
DNS client and server
HTTP client and server
IRC, XMPP, OSCAR, MSN clients and servers
In this last department, at least, Twisted seems a clear winner for built-in functionality. And all this, in a package just over 2 megabytes!
I like #Glyph response.
Twisted is very comprehensive, rich python framework.
Twisted and Tornado have a very similar design.
And I like this design very much:
it's fast
easy to understand
easy to extend
doesn't require c-extensions
works on PyPy.
But I want to highlight Tornado, which I prefer and recently gain popularity.
Tornado, like Twisted, uses callback style programming, but it can be inlined using tornado.gen.engine (twisted.internet.inlineCallbacks in Twisted).
Codebase
The best comment is from http://cyclone.io site. cyclone tries to mix Twisted and Tornado because:
Twisted is one of the most mature libraries for non-blocking I/O available to the
public. Tornado is the open source version of FriendFeed’s web server,
one of the most popular and fast web servers for Python, with a very
decent API for building web applications.
The idea is to bridge Tornado's elegant and straightforward API to
Twisted's Event-Loop, enabling a vast number of supported protocols.
But in 2011 tornado.platform.twisted was out which brings similar functionality.
Performance
Tornado has much better performance. It also works seamlessly with PyPy, and get huge gain.
Scalability
The same like Twisted. Tornado has tornado.process and a lot of rpc services implemented on top of it.
Functionality
There are 71 Tornado based package, compared to 148 Twisted's and 48 Gevent's. But if you look carefully and compute median of packages upload time, you will see that Twisted ones are the oldest, then Gevent and Tornado the freshest.
Furthermore there is tornado.platform.twisted module which allows you to run code written for Twisted on Tornado.
Summary
With Tornado you can use a code from Twisted. There is no need to use cyclone which only twists your code (your code becomes more messy).
As for 2014, Tornado is considered as widely accepted and default async framework which works both on python2 and python3. Also the latest version 4.x brings a lot of functionality from https://docs.python.org/dev/library/asyncio.html.
I wrote an article, explaining why I consider that Tornado - the best Python web framework where I wrote much more about Tornado functionality.
(UPDATE: I'm sadly surprised about how few answers here recommend or even mention Gevent—I don't think it's in proportion to the popularity, performance and ease of use of this excellent library!)
Gevent and Twisted are not mutually exclusive, even though the contrary might seem obvious at first. There is a project called geventreactor which allows one to relatively smoothly leverage the best of both worlds, namely:
The efficient and cheap (cooperative green) thread model of Gevent, which is much easier to program in when it comes to concurrency—frankly, Twisted's inlineCallbacks is simply not up to the job in terms of performance when it comes to many coroutines, and neither in terms of ease/transparency of use: yield and Deferreds everywhere; often hard to build some abstractions; horrifyingly useless stack traces with both bare Deferreds as well as, and even more so with #inlineCallbacks.
All the built-in functionality of Twisted you can ever dream of, including but not limited to IReactorProcess.spawnProcess.
I'm personally currently using Gevent 1.0rc2 with Twisted 12.3 bridged by geventreactor. I have implemented my own as-of-yet unpublished additions and enhancements to geventreactor which I will publish soon, hopefully as part of geventreactor's original GitHub repository: https://github.com/jyio/geventreactor.
My current layout allows me to program in the nice programming model of Gevent, and leverage things such as a non-blocking socket, urllib2 and other modules. I can use regular Python code for doing regular things, as opposed to the learning curve and inconvenience of doing even simple, basic things the Twisted way. I can also easily use most 3rd party libraries that are normally either out of question with Twisted, or require the use of threads.
I can also completely avoid the awkward and often overly complex callback based programming by using greenlets (instead of Deferreds and callbacks, and/or #inlineCallbacks).
(This answer was written based on my personal experiences having used both Twisted and Gevent in real life projects, with significantly more experience using Twisted (but I don't claim to be a Twisted expert). The software I've had to write hasn't had to use too many of Twisted's features, so depending on the set of features you require of Twisted, the (relatively painless) extra complexity of mixing Gevent and Twisted might not be worth the trouble.)
I have 2 python servers running independently of one another but sharing the same database. They need to communicate with each other about when certain changes have been made to the database so the other server (if running) can reload cached data.
What would be my best options for communicating between two such programs?
I've thought of using sockets but it seems like a lot of work. Either one program will be polling connect whenever the other is off, or they both need to have server/client capabilities. I looked into named pipes but didn't see any easy portable solution (needs to run on windows and unix).
You could have each one implement a simple XMLRPC server. Each one can then execute code in the other, such as telling the other one it needs to update.
The easiest way is to use the database itself as a means of communication. Add a table for logging updates. Then, either machine can periodically query to see if the underlying data has been changed.
Another easy form of communication is email using the smtplib module. Our buildbots and version control repository use this form of communication.
If you want something with a little more "industrial" strength, consider using RabbitMQ or somesuch for messaging between servers.
I agree that sockets are usually too low-level. If you investigate "RabbitMQ", you should also investigate celery. It can use RabbitMQ as a back-end, but it can also use the database, and it neatly encapsulates the messaging mechanism. It is also integrated with django and gevent.
I'm interested in something based on Jabber but I didn't find a free/opensource one so I'm thinking of writing one.
I've installed a Jabber server and now thinking about the ways in which I can write the client. I'm thinking of one of either these two methods.
1) An ajax call made to a jabber script running on the webserver that takes care of connecting to the server. But then I thought because of the dependencies involved in the jabber client, it might end up consuming too much memory when a few clients connect.
2) The other method is to run a client running as a daemon that takes care of all the heavy lifting. This way I need to have only one instance of the client that sends a spoofed message (sender's name as that of whatever the user entered on the site). A simple script running on the webserver talks to this daemon over some sort of API (XMLRPC or Msgpack maybe?)
I think #2 is better but I'm not sure. Are there other ways I can implement this? I'm considering using Perl or Python for this.
Jabber is usually called XMPP nowadays, and there are dozens of clients and servers, something for every language. If you are using Javascript (you mention Ajax), you probably want Strophe. Most servers are modular, so you only load the features you need (consider Tigase, ejabberd, or xmpppy). Writing your own is even worse an idea than it sounds.
BOSH
Install prosody because it is really eaSily installed and has BOSH support built-in. You could skip this but then you need to find out how to use BOSH via ejabberd.
use strophe.js to implement this(using BOSH). New browsers support cross-domain request(CORS -> read Proxy-less BOSH part). The old browsers you could use proxy or use flash in the middle as proxy.
read Professional XMPP Programming with JavaScript and jQuery to learn strophe. It even has chapters explaining how to create chat.
Node.js
Or you could consider installing node.js to create your chat system using socket.io.
I have an XMPP server (likely — python, twisted, wokkel), which I prefer not to restart even in the development version, and I have some python module “worker” (which is interface to particular django project), which gets jid and message text and returns some response (text or XML, either way).
The question is, what would be the best way to connect them, considering that I may prefer to update the module part too often?
Another consideration is that it might be required to run multiple instances of “worker” for it all to be high-load-capable.
One possible way I see is implementing a thread in the server which checks if the module was changed and reload()s it if necessary.
The other way would be making something similar to fastcgi through sockets, although not HTTP-based.
My suggestion is:
Use RabbitMQ with XMPP adaptor.
Use Python carrot for AMQP since it can be used directly under Django.
I can't say that I understand all of your question, but the bit where you're asking how to connect django and twisted and multiple workers: I'd suggest using AMPQ. This gets you reliable message delivery, multiple consumers, persistence.
There's the txAMQP library for twisted.
https://launchpad.net/txamqp
A good primer to AMQP here, it's a good place to start:
http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/