I've just created a web chat server with Tornado over Python. The communication mechanism is to use long-polling and I/O events.
I want to benchmark this web chat server at large scale, meaning I want to test this chat server (Tornado based) to see how many chatters it can withstand.
Because I'm using cookies to identify sessions, presently I can only test with maximum 5 (IE, Firefox, Chrome, Safari, Opera) sessions per computer (cookie path has no use coz everything goes thru' the same web page), but in my office we only have limited number of computers.
I want to test this Tornado app at the extreme, hopefully it can withstand few thousand concurrent users like Tornado is advertising, but having no clue how to do this!
I would run the server in a mode where you let the client tell which client they are. i.e. change the code so it can be run this way as required. This is less secure, but makes testing easier. In production, don't use this option. This will give you a realistic test from a small number of client machines.
Related
I have a site, that performs some heavy calculations, using library for symbolic math.
Currently average calculation time is 5 seconds.
I know, that ask too broad question, but nevertheless, what is the optimized configuration for this type of sites? What server is best for this?
Currently, I'm using Apache with mod_wsgi, but I don't know how to correctly configure it.
On average site is receiving 40 requests per second.
How many processes, threads, MaxClients etc. should I set?
Maybe, it is better to use nginx/uwsgi/gunicorn (I'm using python as programming language)?
Anyway, any info is highly appreciated.
Andrew,
I believe that you can move some pieces of your deployment topology.
My suggestion is use nginx for delivering HTTP content, and expose your application using some web framework, i.e. tornadoweb (my preference, considering async core, and best documented if compared to twisted, even twisted being a really great framework)
You can communicate between nginx and tornado by proxy. It is simple to be configured.
You can replicate your service instance to distribute your calculation application inside the same machine and another hosts. It can be easily configured by nginx upstreams.
If you need more performance, you can break your application in small modules and integrate it using Async Messaging. You can choose using zeromq or rabbitmq, among other solutions.
Then, you can have different topologies, gradually applied during the evolution of your application.
1th Topology:
nginx -> tornadoweb
2th Topology:
nginx with loadbalance (upstreams) -> tornadoweb replicated on [1..n] instances
3rd Topology:
[2nd topology] -> your app integrated by messaging (zeromq, amqp(rabbitmq), ...)
My favorite is 3rd, for begining. But, you should start, for this moment, by 1th and 2nd
There are a lot of options. But, these thre may be sufficient for a simple organization of your app.
I spent quite some time now with researching Server Backends/API/Frameworks. I need a solution where I can store user content (JSON & Binary data).
The obvious choice would be a REST API. The only missing element is a push feature when data on server changed and clients should be notified instantly. With more research in this matter I discovered classic approaches (Comet, Push, Server sent events, Bayeux, BOSH, …) as well as the „new“ league, Websockets. I would definitely prefer the method with Websockets or using directly TCP Sockets. But this post is not about pros/cons of these two technologies so please restrain yourself from getting side tracked in comments.
At moment exists following projects which are very similar to my needs:
- Simperium (simperium.com), this looks very promising, but core/server is sadly not open source and god knows when, if ever, this step happens
- Realtime.co (framework.realtime.co/storage), hosted service, but same principle
- Some Frameworks for building servers such as Atmosphere (java, no WAMP), Cometd (java, project page looks like stuck in the 90’s), Autobahn (python, WAMP)
My actual favorite is the Autobahn framework (autobahn.ws). Especially using the WAMP protocol (subset of Websocket) as it offers exactly what I need. So the idea would be to build a python backend/server with Autobahn Python (based on Twisted framework) which manages all socket (WAMP) connections and include a Postgresql database for data storing. For all desired clients exists already WAMP libraries. The server would need to be able to do the typical REST API features:
- Send, update, delete requested data (JSON/Binary) from/to server/clients
- Synchronize & automatic conflict management
- Offline handling when connection breaks, automatic restart when connection available again
So finally the questions:
- Have I missed an open source project which covers exactly my needs?
- If I would like to develop my own server with autobahn and a database, could you point me to right direction? Have lot of concerns and not enough depth understanding.. I know Autobahn gives you already a server, but this one is not very close to my final needs.. how to build a server efficient so that he can handle all connected sockets? How handle when a client needs server push? Are there schemas, models or concept how such a server should look like?
- Twisted is a very powerful python framework but not regarded as the most convenient for writing apps.. But I guess a Socket based storage server with db access should be possible? When I run twisted as a web ressource and develop server components with other python framework, would this compromise the latency/performance much?
- Is such a desired server backend with lot of data storage (JSON fields and also binary data such as documents, images) reasonable to build with Sockets by a single devoloper/small team or is this smth. which only bigger companies like Dropbox can do at the moment?
Thank you very much for your help & time!
So finally the questions:
Have I missed an open source project which covers exactly my needs?
No you've covered the open source projects. Open source only gets you about halfway there though. To implement a Global Realtime Network requires equal parts implementation and equal parts operations. You have to think about dropped messages, retries, what happens if a particular geography gets hot how do you scale your servers ...etc. I would argue that an open source solution won't achieve what you want unless you're willing to invest significant resources into operations. I would recommend a service like PubNub: http://pubnub.com
If I would like to develop my own server with autobahn and a database, could you point me to right direction? Have lot of concerns and not enough depth understanding.. I know Autobahn gives you already a server, but this one is not very close to my final needs.. how to build a server efficient so that he can handle all connected sockets? How handle when a client needs server push? Are there schemas, models or concept how such a server should look like?
A good database to back a realtime framework would be Cassandra because it supports high write volumes and handles time series data well: http://cassandra.apache.org/.
Twisted is a very powerful python framework but not regarded as the most convenient for writing apps.. But I guess a Socket based storage server with db access should be possible? When I run twisted as a web ressource and develop server components with other python framework, would this compromise the latency/performance much?
I would not use Twisted. I would use Gevent:http://www.gevent.org/. Its coroutine based so you don't get into callback hell. To support more connections you just increase your greenlet pool to listen on the socket.
Is such a desired server backend with lot of data storage (JSON fields and also binary data such as documents, images) reasonable to build with Sockets by a single devoloper/small team or is this smth. which only bigger companies like Dropbox can do at the moment?
Again I would not build this on your own. A service like PubNub: http://pubnub.com which takes care of all the operational issues for you and has a clean API would service your needs with minimal cost. PubNub takes care of the protocol for you so if your on a mobile device that doesn't support WebSockets it will use TCP, HTTP or whatever the best transport is for the device.
I'm using Django/Postgres and Python for my web site and the background processes. I have hundreds of messages every minute populating my database and I would like to securely allow other customers access their data.
My customers use either linux or windows so I would like some solution that will be platform/database agnostic.
I looked so far at Piston , Twisted , Celery and RabbitMQ. All these have some way to exchange data. But I'm not sure what to use or if there are any better options.
For example I need the customers to be able to access only their data on my database. Another thing I need is to allow the customers to send a short command back to my servers. My servers will execute the command and return re result in real time back to the customer.
Any ideas?
You asked how your customers can securely transmit commands to your website and retrieve results in their response (near "real-time").
... have you considered hooking a reasonable API into your django app? If you're concerned about security, you can use authentication and serve it over HTTPS.
It's not as fancy as the messaging and queuing platforms that the kids are using these days but it'll get the job done.
Things to like about HTTP/HTTPS APIs:
They can be load balanced (highly available and scalable!)
They can be cached (mo' betta performance and the ability to still serve content while rate limiting how often a client can hit the DB)
Just about every programming language has a mature library that allows HTTP/HTTPS connections. Some have multiple, e.g. Python: urllib,urllib2,httplib
Every time I read either WSGI or CGI I cringe. I've tried reading on it before but nothing really has stuck.
What is it really in plain English?
Does it just pipe requests to a terminal and redirect the output?
From a totally step-back point of view, Blankman, here is my "Intro Page" for Web Server Gateway Interface:
PART ONE: WEB SERVERS
Web servers serve up responses. They sit around, waiting patiently, and then with no warning at all, suddenly:
a client process sends a request. The client process could be a web server, a bot, a mobile app, whatever. It is simply "the client"
the web server receives this request
deliberate mumble various things happen (see below)
The web server sends back something to the client
web server sits around again
Web servers (at least, the better ones) are VERY good at this. They scale up and down processing depending on demand, they reliably hold conversations with the flakiest of clients over really cruddy networks, and we never really have to worry about it. They just keep on serving.
This is my point: web servers are just that: servers. They know nothing about content, nothing about users, nothing in fact other than how to wait a lot and reply reliably.
Your choice of web server should reflect your delivery preference, not your software. Your web server should be in charge of serving, not processing or logical stuff.
PART TWO: (PYTHON) SOFTWARE
Software does not sit around. Software only exists at execution time. Software is not terribly accommodating when it comes to unexpected changes in its environment (files not being where it expects, parameters being renamed etc). Although optimisation should be a central tenet of your design (of course), software itself does not optimise. Developers optimise. Software executes. Software does all the stuff in the 'deliberate mumble' section above. Could be anything.
Your choice or design of software should reflect your application, your choice of functionality, and not your choice of web server.
This is where the traditional method of "compiling in" languages to web servers becomes painful. You end up putting code in your application to cope with the physical server environment or, at least, being forced to choose an appropriate 'wrapper' library to include at runtime, to give the illusion of uniformity across web servers.
SO WHAT IS WSGI?
So, at last, what is WSGI? WSGI is a set of rules, written in two halves. They are written in such a way that they can be integrated into any environment that welcomes integration.
The first part, written for the web server side, says "OK, if you want to deal with a WSGI application, here's how the software will be thinking when it loads. Here are the things you must make available to the application, and here is the interface (layout) that you can expect every application to have. Moreover, if anything goes wrong, here's how the app will be thinking and how you can expect it to behave."
The second part, written for the Python application software, says "OK, if you want to deal with a WSGI server, here's how the server will be thinking when it contacts you. Here are the things you must make available to the server, and here is the interface (layout) that you can expect every server to have. Moreover, if anything goes wrong, here's how you should behave and here's what you should tell the server."
So there you have it - servers will be servers and software will be software, and here's a way they can get along just great without one having to make any allowances for the specifics of the other. This is WSGI.
mod_wsgi, on the other hand, is a plugin for Apache that lets it talk to WSGI-compliant software, in other words, mod_wsgi is an implementation - in Apache - of the rules of part one of the rule book above.
As for CGI.... ask someone else :-)
WSGI runs the Python interpreter on web server start, either as part of the web server process (embedded mode) or as a separate process (daemon mode), and loads the script into it. Each request results in a specific function in the script being called, with the request environment passed as arguments to the function.
CGI runs the script as a separate process each request and uses environment variables, stdin, and stdout to "communicate" with it.
Both CGI and WSGI define standard interfaces that programs can use to handle web requests. The CGI interface is at a lower level than WSGI, and involves the server setting up environment variables containing the data from the HTTP request, with the program returning something formatted pretty much like a bare HTTP server response.
WSGI, on the other hand, is a Python-specific, slightly higher-level interface that allows programmers to write applications that are server-agnostic and which can be wrapped in other WSGI applications (middleware).
If you are unclear on all the terms in this space, and let's face it, it's a confusing acronym-laden one, there's also a good background reader in the form of an official python HOWTO which discusses CGI vs. FastCGI vs. WSGI and so on. I wish I'd read it first.
I have a Python-driven web interface powered by Apache 2.2 with mod_python and Python 2.4. I need to make an asynchronous process appear synchronous to users of this web interface.
When users access one module on this website:
An external SOAP interface will be contacted with a unique identifier and will respond with a number N
The external interface will respond asynchronously by contacting a SOAP server on my machine between 1 and 10 times (the number N tells us how many responses we will receive)
I need to somehow aggregate these responses and pass them to the original module which will display the information back to the user. The goal is to make the process appear synchronous to the user.
What is the best way to handle this synchronization issue? Is this something Twisted would be well-suited for?
I am not restricting myself to Python for the solution, though it is preferred because everything else on the server is in Python. I prefer a solution that is both scalable and will take a minimal amount of programming time (though I understand that these attributes are somewhat at odds).
Maybe you can use Orbited to get ajax push with long-lived HTTP connections to your web clients. Orbited is based on Twisted, so I think it makes sense to look at if you already know Twisted. Have a look at this tutorial to get started.