I have a site that performs some heavy calculations using a library for symbolic math.
Currently, the average calculation time is 5 seconds.
I know this is too broad a question, but nevertheless: what is the optimal configuration for this type of site? Which server is best for this?
Currently, I'm using Apache with mod_wsgi, but I don't know how to configure it correctly.
On average, the site receives 40 requests per second.
How many processes, threads, MaxClients etc. should I set?
Maybe it is better to use nginx/uWSGI/gunicorn (I'm using Python as my programming language)?
Anyway, any info is highly appreciated.
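For a sense of scale, the numbers in the question can be plugged into Little's law (requests in progress = arrival rate × time per request). The sketch below assumes each calculation is CPU-bound and keeps one core busy for its full 5 seconds; the 70% target utilization is an assumed headroom factor, not a measured value.

```python
import math


def in_flight(arrival_rate_per_s: float, seconds_per_request: float) -> float:
    """Little's law: average number of requests being processed at the same time."""
    return arrival_rate_per_s * seconds_per_request


def cores_needed(arrival_rate_per_s: float, seconds_per_request: float,
                 target_utilization: float = 0.7) -> int:
    """Cores needed if each request keeps one core busy for its whole duration.

    target_utilization is an assumed headroom factor (not a measured value),
    so that the machines are not driven at 100% CPU.
    """
    busy = in_flight(arrival_rate_per_s, seconds_per_request)
    return math.ceil(busy / target_utilization)


if __name__ == "__main__":
    print(in_flight(40, 5.0))     # 200.0 calculations running at once, on average
    print(cores_needed(40, 5.0))  # 286 cores at the assumed 70% utilization
```

If that assumption holds, the choice between Apache, nginx, uWSGI and gunicorn matters far less than raw CPU: about 200 calculations are in progress at any moment, so the work has to be spread over several machines (or the calculations made cheaper). A common rule of thumb for CPU-bound Python is roughly one WSGI process per core on each machine.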
Andrew,
I believe you can rearrange some pieces of your deployment topology.
My suggestion is to use nginx for delivering HTTP content and to expose your application through a web framework, e.g. tornadoweb (my preference, given its async core and its documentation, which is better than Twisted's, even though Twisted is a really great framework).
nginx can talk to Tornado as a reverse proxy, which is simple to configure.
You can replicate your service instance to distribute your calculation application across the same machine and other hosts. This is easily configured with nginx upstreams.
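By default, an nginx upstream block hands requests to its servers in (weighted) round-robin order. The toy Python model below only mimics that default to show how replicated Tornado instances share the load; the backend addresses are hypothetical, and real nginx also handles weights, failed servers and keep-alive, which this sketch ignores.

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator


def round_robin(backends: Iterable[str]) -> Iterator[str]:
    """Yield backend addresses in a repeating cycle, like nginx's default upstream method."""
    return itertools.cycle(list(backends))


# Hypothetical replicated Tornado instances: two on this host, one on another host.
BACKENDS = ["127.0.0.1:8001", "127.0.0.1:8002", "10.0.0.2:8001"]

if __name__ == "__main__":
    picker = round_robin(BACKENDS)
    assignments = Counter(next(picker) for _ in range(9))
    print(assignments)  # each backend receives 3 of the 9 requests
```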
If you need more performance, you can break your application into small modules and integrate them using asynchronous messaging. You can choose ZeroMQ or RabbitMQ, among other solutions.
You can then adopt different topologies gradually as your application evolves.
1st topology:
nginx -> tornadoweb
2nd topology:
nginx with load balancing (upstreams) -> tornadoweb replicated on [1..n] instances
3rd topology:
[2nd topology] -> your app integrated by messaging (zeromq, amqp (rabbitmq), ...)
My favorite is the 3rd, but for now you should start with the 1st and 2nd.
There are a lot of options, but these three may be sufficient for a simple organization of your app.
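A minimal sketch of the 3rd topology's shape: the web tier publishes a job message tagged with an id, workers consume it and send back a reply with the same id. Python's standard-library `queue` and threads stand in here for ZeroMQ or RabbitMQ (this is not either library's API); `heavy_calculation` is a hypothetical placeholder for the symbolic-math call, and in a real deployment the workers would be separate processes or hosts so that CPU-bound work is not limited by the GIL.

```python
import queue
import threading
import uuid


def heavy_calculation(x: int) -> int:
    """Hypothetical placeholder for the expensive symbolic-math call."""
    return x * x


def worker(jobs: "queue.Queue", results: "queue.Queue") -> None:
    """Consume job messages until a None sentinel arrives, publishing a reply for each."""
    while True:
        msg = jobs.get()
        if msg is None:
            break
        job_id, payload = msg
        results.put((job_id, heavy_calculation(payload)))


def run(payloads, n_workers: int = 3) -> list:
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # The web tier publishes one message per request and remembers its id.
    order = {}
    for i, p in enumerate(payloads):
        job_id = str(uuid.uuid4())
        order[job_id] = i
        jobs.put((job_id, p))
    for _ in threads:
        jobs.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()

    # Match replies to requests by id, since workers may finish out of order.
    out = [None] * len(order)
    while not results.empty():
        job_id, value = results.get()
        out[order[job_id]] = value
    return out


if __name__ == "__main__":
    print(run([1, 2, 3]))  # [1, 4, 9]
```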
I'm struggling with some architectural choices for a scalable internet-of-things application.
I've chosen to base my project on Twisted, augmented with the Cyclone framework to provide many Tornado conveniences (websockets, auth decorators, secure cookies, etc.).
Using a Twisted core has worked beautifully for me. I have numerous IP protocol and hardware interfaces, all of which turned out to have great library support inside Twisted (and adding new protocols and interfaces is where scope creep is most likely to hit my project), all while Twisted needs very little CPU and supports very high connection counts.
My problems are with second-order webapp functionality.
I pulled in Cyclone thinking that, with its auth goodies (OpenID, OAuth, user-auth decorators and secure cookies), it wouldn't take much to implement user/session/admin functionality in my webapp. After writing 500+ lines to abstract my database (via txmongo) and build just the user logins, it became clear that I:
didn't understand how little Cyclone/Tornado bring to the user/session/admin space, and
didn't understand how much code it takes to fill in the gaps if you're trying to build a multi-user auth webapp.
A friend pointed me to Flask, which I initially thought was completely redundant, until I found the Flask plugins. The combination of Flask-Login and Flask-Admin would completely cover my user, session and user-admin needs, sparing me what I would guess to be about 2k lines of code. Unfortunately, the Flask plugins are rife with blocking code and calls to blocking libraries. I don't see them as compatible with my project even inside a WSGI container, given that the user/session functionality runs on every page load. Additionally, I don't see any shortcuts that would let me port them to the async world with less work than rewriting them.
My question is:
In the Python async space (hopefully in the Twisted space, given my protocol needs), are there any plugins or alternative frameworks that provide ready-to-go user/login/admin functionality similar to Flask-Login and Flask-Admin?
P.S. I've looked at Klein as the obvious Twisted version of Flask, but it doesn't seem to have a plugin ecosystem, and I'm not finding any strong user/session/admin support there.
P.P.S. By the time I wrote this question, I had already written my own (crappy) user-login-session system. So what I'm really after is the "Admin" capability (automated CRUD functions on user-style records, including web UI rendering, all designed in a Twisted/async way). I asked about user/login in case there turns out to be an already-integrated solution (such as Flask-Login plus Flask-Admin), in which case I would happily drop my code and switch to it.
Do you really need everything to be async? Consider async WebSockets but synchronous page renders. If you must, put an async reverse proxy or load balancer in front of the app server, which will virtually eliminate the app server's I/O overhead.
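If some blocking, Flask-style code does have to live in the same process as the async core, one option is to run each blocking call in a thread pool so the event loop keeps serving sockets. In Twisted that is `twisted.internet.threads.deferToThread`; the sketch below shows the same idea with the standard library's `asyncio.to_thread` (Python 3.9+) rather than Twisted. `check_password_blocking` is a hypothetical stand-in for a blocking library call, and the sleep only simulates I/O.

```python
import asyncio
import time


def check_password_blocking(username: str, password: str) -> bool:
    """Hypothetical blocking call, e.g. a synchronous DB lookup inside a Flask plugin."""
    time.sleep(0.1)  # simulate blocking I/O
    return username == "admin" and password == "secret"


async def heartbeat(ticks: list) -> None:
    """Stands in for other async work (e.g. WebSocket traffic) that must keep running."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main() -> tuple:
    ticks: list = []
    # The blocking call runs in a worker thread, so the heartbeat is not stalled by it.
    ok, _ = await asyncio.gather(
        asyncio.to_thread(check_password_blocking, "admin", "secret"),
        heartbeat(ticks),
    )
    return ok, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))  # (True, 5)
```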
I have spent quite some time now researching server backends, APIs and frameworks. I need a solution where I can store user content (JSON & binary data).
The obvious choice would be a REST API. The only missing element is a push feature: when data on the server changes, clients should be notified instantly. With more research into this, I discovered the classic approaches (Comet, Push, Server-Sent Events, Bayeux, BOSH, …) as well as the "new" league, WebSockets. I would definitely prefer WebSockets or plain TCP sockets. But this post is not about the pros and cons of these two technologies, so please don't get sidetracked in the comments.
At the moment, the following projects are very similar to my needs:
- Simperium (simperium.com): looks very promising, but the core/server is sadly not open source, and god knows when, if ever, that will happen
- Realtime.co (framework.realtime.co/storage): a hosted service, but the same principle
- Some frameworks for building servers, such as Atmosphere (Java, no WAMP), CometD (Java, the project page looks stuck in the '90s) and Autobahn (Python, WAMP)
My current favorite is the Autobahn framework (autobahn.ws), especially with the WAMP protocol (a WebSocket subprotocol), as it offers exactly what I need. So the idea would be to build a Python backend/server with Autobahn Python (based on the Twisted framework) that manages all socket (WAMP) connections, with a PostgreSQL database for data storage. WAMP libraries already exist for all the clients I want to support. The server would need to provide the typical REST API features:
- Send, update and delete requested data (JSON/binary) from/to server/clients
- Synchronization & automatic conflict management
- Offline handling when the connection breaks, automatic restart when the connection is available again
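WAMP's publish/subscribe side is what provides the push: clients subscribe to a topic, and whenever server-side data changes, the server publishes an event that the router forwards to every subscriber. The in-process asyncio sketch below only illustrates that routing idea; it is not Autobahn's API, and the `Broker` class and topic name are hypothetical.

```python
import asyncio
from collections import defaultdict


class Broker:
    """Minimal topic-based pub/sub router: one asyncio.Queue per subscriber."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(set)

    def subscribe(self, topic: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].add(q)
        return q

    def unsubscribe(self, topic: str, q: asyncio.Queue) -> None:
        self._subscribers[topic].discard(q)

    def publish(self, topic: str, event: dict) -> int:
        """Push the event to every current subscriber; return how many received it."""
        for q in self._subscribers[topic]:
            q.put_nowait(event)
        return len(self._subscribers[topic])


async def demo() -> list:
    broker = Broker()
    a = broker.subscribe("com.example.docs.updated")
    b = broker.subscribe("com.example.docs.updated")
    # e.g. after a row is written to PostgreSQL, notify the interested clients
    broker.publish("com.example.docs.updated", {"id": 42, "rev": 3})
    return [await a.get(), await b.get()]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [{'id': 42, 'rev': 3}, {'id': 42, 'rev': 3}]
```

Offline handling and conflict management would have to be built on top of this; one possible approach (an assumption, not something WAMP does for you) is to tag each change with a revision number, like the hypothetical `rev` field, so a reconnecting client can ask for everything it missed.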
So finally the questions:
- Have I missed an open source project which covers exactly my needs?
- If I wanted to develop my own server with Autobahn and a database, could you point me in the right direction? I have a lot of concerns and not enough in-depth understanding. I know Autobahn already gives you a server, but it is not very close to my final needs. How do I build a server efficiently so that it can handle all the connected sockets? How do I handle a client that needs a server push? Are there schemas, models or concepts for how such a server should look?
- Twisted is a very powerful Python framework but not regarded as the most convenient for writing apps. But I guess a socket-based storage server with DB access should be possible? If I run Twisted as a web resource and develop the server components with another Python framework, would this compromise latency/performance much?
- Is such a server backend with a lot of data storage (JSON fields and also binary data such as documents and images) reasonable to build with sockets for a single developer/small team, or is this something only bigger companies like Dropbox can do at the moment?
Thank you very much for your help & time!
So finally the questions:
Have I missed an open source project which covers exactly my needs?
No, you've covered the open source projects. Open source only gets you about halfway there, though. Implementing a global realtime network takes equal parts implementation and operations. You have to think about dropped messages, retries, what happens if a particular geography gets hot, how you scale your servers, etc. I would argue that an open source solution won't achieve what you want unless you're willing to invest significant resources in operations. I would recommend a service like PubNub: http://pubnub.com
If I wanted to develop my own server with Autobahn and a database, could you point me in the right direction? I have a lot of concerns and not enough in-depth understanding. I know Autobahn already gives you a server, but it is not very close to my final needs. How do I build a server efficiently so that it can handle all the connected sockets? How do I handle a client that needs a server push? Are there schemas, models or concepts for how such a server should look?
A good database to back a realtime framework would be Cassandra because it supports high write volumes and handles time series data well: http://cassandra.apache.org/.
Twisted is a very powerful Python framework but not regarded as the most convenient for writing apps. But I guess a socket-based storage server with DB access should be possible? If I run Twisted as a web resource and develop the server components with another Python framework, would this compromise latency/performance much?
I would not use Twisted. I would use Gevent: http://www.gevent.org/. It's coroutine-based, so you don't get into callback hell. To support more connections, you just increase the size of the greenlet pool listening on the socket.
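With gevent, the "pool" is a `gevent.pool.Pool` passed to the server as its `spawn` argument, and its size caps how many connection handlers run at once. The standard-library sketch below approximates the same knob with asyncio and a semaphore rather than gevent itself; `MAX_CONCURRENT` is an assumed value and the upper-casing echo handler is a placeholder for real work.

```python
import asyncio

MAX_CONCURRENT = 1000  # assumed limit, playing the role of the greenlet pool size


async def start(host: str = "127.0.0.1", port: int = 0,
                max_concurrent: int = MAX_CONCURRENT):
    slots = asyncio.Semaphore(max_concurrent)

    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        # Connections beyond the limit are accepted but wait here for a free slot.
        async with slots:
            line = await reader.readline()
            writer.write(line.upper())  # placeholder work: echo in upper case
            await writer.drain()
        writer.close()
        await writer.wait_closed()

    return await asyncio.start_server(handle, host, port)


async def demo() -> bytes:
    server = await start()
    port = server.sockets[0].getsockname()[1]  # port 0 means "pick a free port"
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))  # b'HELLO\n'
```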
Is such a server backend with a lot of data storage (JSON fields and also binary data such as documents and images) reasonable to build with sockets for a single developer/small team, or is this something only bigger companies like Dropbox can do at the moment?
Again, I would not build this on your own. A service like PubNub (http://pubnub.com), which takes care of all the operational issues for you and has a clean API, would serve your needs at minimal cost. PubNub takes care of the protocol for you, so if you're on a mobile device that doesn't support WebSockets, it will use TCP, HTTP or whatever the best transport is for the device.
I need to create a project that has a web frontend to manage synchronous task execution (à la Fabric), async tasks (AMQP), and long-polling/AJAX for tabular viewing of results and queues/large, frequently changing datasets (think tail -f syslog). I have an existing Python codebase for a lot of the implementation-specific stuff.
After looking at a bunch of existing frameworks, the obvious answer appears to be Django+Celery. However, I do not want to "learn Django", nor do I need 95% of its functionality. I just need simple auth, maybe SQLAlchemy, easy AJAX and AMQP; XML-RPC would be helpful.
I would consider using Mongrel2, but I have a strong preference for RabbitMQ over 0MQ (for a few implementation-specific reasons).
I originally spent a great deal of time learning Twisted and ended up getting a few hundred useful LOC out of it, but I found that I was twisting (lol) too much of my platform code to fit its callback model. It actually 'fit the bill' very well (except for its own AMQP implementation), but it was so frustrating, and I went through so many iterations of code (one for each 'twisted aha moment'), that it's 100% out.
Can somebody please help me wade through the mire? Tornado? Pylons? repoze? Pyramid? Flask? Bottle? CherryPy? web2py? Paster/WebOb? Anything else from http://wiki.python.org/moin/WebFrameworks?
Edit:
To be clear, integration with RabbitMQ (or another AMQP provider) is of the utmost importance, and is really the crux of the problem.
I don't have a complete overview of Python web frameworks, but I want to share my view on two of them:
Bottle is light and works fine. If you want something easy to learn and easy to use, it may be the right choice. I used it for quite simple front-end apps running locally, and I liked it very much.
Tornado seems to me a very good non-blocking server for real-time web apps. Combined with tornadio, it makes AJAX long-polling quite easy. However, it may be a little harder to learn than Bottle. I would recommend having a look at the chat app in tornadio's examples folder.
I hope it helps
If you are going to use AMQP long term, then I would steer clear of Celery, because it uses AMQP in a weird way that suggests the developers did not understand the AMQP model.
bottle is a nice framework for knocking together RESTful apps (I use it to create mock servers for testing) and if you already have the code that does the real work, you may be surprised at how short a bottle app can be.
I'm currently building Python apps using RabbitMQ and using amqplib by way of kombu. I originally chose kombu in case I wanted to swap libraries and use pika or something else, but now I wish that I had just gone with amqplib and built a proper Pythonic AMQP model on top of that.
Do spend some time on the RabbitMQ site reading some of the blogs and slide presentations on AMQP before you get too deep into coding or you won't really understand the AMQP model and will make things harder for yourself.
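The heart of the AMQP 0-9-1 model that RabbitMQ implements is that producers always publish to an exchange, and bindings then decide which queues receive a copy of each message. The toy, in-memory Python model below illustrates direct and fanout routing only (topic exchanges, acknowledgements and durability are left out); it is not a client for kombu, amqplib or pika, and the exchange and queue names are hypothetical.

```python
from collections import defaultdict, deque


class ToyAmqpBroker:
    """Toy in-memory model of AMQP exchanges, queues and bindings."""

    def __init__(self) -> None:
        self.queues = {}                   # queue name -> deque of messages
        self.exchanges = {}                # exchange name -> "direct" | "fanout"
        self.bindings = defaultdict(list)  # exchange name -> [(routing_key, queue name)]

    def declare_queue(self, name: str) -> None:
        self.queues.setdefault(name, deque())

    def declare_exchange(self, name: str, kind: str) -> None:
        if kind not in ("direct", "fanout"):
            raise ValueError("this toy model only supports direct and fanout exchanges")
        self.exchanges[name] = kind

    def bind(self, exchange: str, queue: str, routing_key: str = "") -> None:
        self.bindings[exchange].append((routing_key, queue))

    def publish(self, exchange: str, routing_key: str, body: str) -> list:
        """Copy the message into every matching bound queue; return their names."""
        kind = self.exchanges[exchange]
        matched = [q for key, q in self.bindings[exchange]
                   if kind == "fanout" or key == routing_key]
        for q in matched:
            self.queues[q].append(body)
        return matched  # empty list: no matching binding, so the message is dropped


if __name__ == "__main__":
    b = ToyAmqpBroker()
    b.declare_exchange("tasks", "direct")
    b.declare_exchange("events", "fanout")
    for name in ("fabric_jobs", "audit", "ui_feed"):
        b.declare_queue(name)
    b.bind("tasks", "fabric_jobs", routing_key="deploy")
    b.bind("events", "audit")
    b.bind("events", "ui_feed")
    print(b.publish("tasks", "deploy", "deploy web01"))    # ['fabric_jobs']
    print(b.publish("tasks", "restart", "restart web01"))  # []
    print(b.publish("events", "", "job finished"))         # ['audit', 'ui_feed']
```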
Please don't use XML-RPC unless you have to talk to other apps. Bottle makes simple RESTful apps so simple that XML-RPC is just unnecessary complexity.
A couple of suggestions.
CherryPy is a great low-level framework. It doesn't provide a lot of functionality, but it provides a very easy system for mapping HTTP requests to function calls.
web.py is another extremely lightweight and easy to use framework. It is more comprehensive than CherryPy, including templates and other features.
Plain WSGI is not a bad choice if your needs are extremely simple. Simple things take a little more work than in CherryPy or web.py. WSGI is the lowest common denominator; most web frameworks these days are built on top of it.
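To give a sense of how little plain WSGI needs, here is a minimal application that maps URL paths to ordinary functions (roughly what CherryPy or Bottle do for you, minus the conveniences). It uses only the standard library; the `/status` route and its JSON payload are hypothetical, and `wsgiref` is a reference server meant for development, not production.

```python
import json
from http import HTTPStatus
from wsgiref.simple_server import make_server


def status(environ):
    """Hypothetical handler: report the length of a work queue."""
    return HTTPStatus.OK, {"queue_length": 0}


# Map URL paths to plain Python functions.
ROUTES = {"/status": status}


def app(environ, start_response):
    """WSGI entry point: called once per request with the CGI-style environ dict."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        code, payload = HTTPStatus.NOT_FOUND, {"error": "not found"}
    else:
        code, payload = handler(environ)
    body = json.dumps(payload).encode("utf-8")
    start_response(f"{code.value} {code.phrase}",
                   [("Content-Type", "application/json"),
                    ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Development server only; deploy behind mod_wsgi, uWSGI, gunicorn, etc.
    with make_server("127.0.0.1", 8000, app) as httpd:
        httpd.serve_forever()
```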
I need to build a webservice that is very computationally intensive, and I'm trying to get my bearings on how best to proceed.
I expect users to connect to my service, at which point some computation is done for some amount of time, typically less than 60s. The user knows that they need to wait, so this is not really a problem. My question is, what's the best way to structure a service like this and leave me with the least amount of headache? Can I use Node.js, web.py, CherryPy, etc.? Do I need a load balancer sitting in front of these pieces if used? I don't expect huge numbers of users, perhaps hundreds or into the thousands. I'll need a number of machines to host this number of users, of course, but this is uncharted territory for me, and if someone can give me a few pointers or things to read, that would be great.
Thanks.
Can I use Node.js, web.py, CherryPy, etc.?
Yes. Pick one. Django is nice, also.
Do I need a load balancer sitting in front of these pieces if used?
Almost never.
I'll need a number of machines to host this number of users,
Doubtful.
Remember that each web transaction has several distinct (and almost unrelated) parts.
A front-end (Apache HTTPD or NGINX or similar) accepts the initial web request. It can handle serving static files (.CSS, .JS, Images, etc.) so your main web application is uncluttered by this.
A reasonably efficient middleware like mod_wsgi can manage dozens (or hundreds) of backend processes.
If you choose a clever backend processing component like celery, you should be able to distribute the "real work" to the minimal number of processors to get the job done.
The results are fed back through mod_wsgi into Apache HTTPD (or, with NGINX, through a WSGI server such as uWSGI) and on to the user's browser.
Now the backend processes (managed by celery) are divorced from the essential web server. Apache HTTPD, mod_wsgi and celery together give you a great deal of parallelism, allowing you to use every scrap of processor resource.
Further, you may be able to decompose your "computationally intensive" process into parallel processes -- a Unix Pipeline is remarkably efficient and makes use of all available resources. You have to decompose your problem into step1 | step2 | step3 and make celery manage those pipelines.
You may find that this kind of decomposition leads to serving a far larger workload than you might have originally imagined.
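The step1 | step2 | step3 decomposition can be expressed as function composition, with independent inputs fanned out to a pool of workers. In Celery itself, a pipeline like this is written with `chain` (for example `chain(step1.s(x), step2.s(), step3.s())`); the standard-library sketch below only shows the shape. The three steps are hypothetical placeholders, and for CPU-bound Python you would pass a `ProcessPoolExecutor` (or use Celery workers) instead of the thread pool in the usage example.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial
from typing import Callable, Iterable, Sequence


def step1(text: str) -> str:
    """Hypothetical parsing step."""
    return text.strip()


def step2(text: str) -> list:
    """Hypothetical transformation step."""
    return text.split()


def step3(tokens: list) -> int:
    """Hypothetical reduction step."""
    return len(tokens)


def run_pipeline(steps: Sequence[Callable], value):
    """Feed the output of each step into the next, like step1 | step2 | step3."""
    for step in steps:
        value = step(value)
    return value


def process_all(inputs: Iterable, steps: Sequence[Callable], executor: Executor) -> list:
    # A partial() of module-level functions stays picklable, so a ProcessPoolExecutor works too.
    return list(executor.map(partial(run_pipeline, tuple(steps)), inputs))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(process_all(["  a b c ", "d e", ""], [step1, step2, step3], pool))  # [3, 2, 0]
```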
Many Python web frameworks will keep the user's session information in a single common database. This means that all of your backends can -- without any real work -- move the user's session from web server to web server, making "load balancing" seamless and automatic. Just have lots of HTTPD/NGINX front-ends that spawn Django (or web.py or whatever) which all share a common database. It works remarkably well.
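The point about sessions is that no front-end keeps session state in its own memory: each one reads and writes the same shared store, so any of them can serve the next request. The sketch below uses an SQLite file purely for illustration (in production this would be the common database mentioned above, e.g. via Django's database session backend); the `SessionStore` class and table layout are hypothetical.

```python
import json
import os
import secrets
import sqlite3
import tempfile


class SessionStore:
    """Hypothetical session store backed by a database shared by all app servers."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(session_key TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.conn.commit()

    def create(self, data: dict) -> str:
        """Store session data under a new random key (sent to the browser as a cookie)."""
        key = secrets.token_urlsafe(32)
        self.conn.execute("INSERT INTO sessions (session_key, data) VALUES (?, ?)",
                          (key, json.dumps(data)))
        self.conn.commit()
        return key

    def load(self, key: str):
        row = self.conn.execute("SELECT data FROM sessions WHERE session_key = ?",
                                (key,)).fetchone()
        return json.loads(row[0]) if row else None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sessions.db")
    server_a = SessionStore(path)  # the app behind front-end A
    server_b = SessionStore(path)  # the app behind front-end B
    key = server_a.create({"user": "andrew"})
    print(server_b.load(key))      # {'user': 'andrew'}
```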
I think you can build it however you like, as long as you can make it an asynchronous service so that the users don't have to wait.
Unless, of course, the users don't mind waiting in this context.
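One common way to make such a service asynchronous is to return a job id immediately and let the browser poll for the result. The thread-pool sketch below shows that flow with the standard library; `JobRunner`, `slow_computation` and the status strings are hypothetical, and in a real deployment the pool would typically be replaced by Celery workers, with job state kept somewhere every web process can see.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


def slow_computation(n: int) -> int:
    """Hypothetical stand-in for the long-running calculation."""
    time.sleep(0.1)
    return sum(i * i for i in range(n))


class JobRunner:
    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: dict = {}

    def submit(self, n: int) -> str:
        """Start the work and return at once with an id the client can poll."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(slow_computation, n)
        return job_id

    def status(self, job_id: str) -> dict:
        fut = self._jobs.get(job_id)
        if fut is None:
            return {"state": "unknown"}
        if not fut.done():
            return {"state": "running"}
        if fut.exception() is not None:
            return {"state": "failed", "error": str(fut.exception())}
        return {"state": "done", "result": fut.result()}

    def close(self) -> None:
        """Wait for outstanding jobs and release the worker threads."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    runner = JobRunner()
    job = runner.submit(10)
    runner.close()              # wait here only to make the demo deterministic
    print(runner.status(job))   # {'state': 'done', 'result': 285}
```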
I'd recommend using nginx, as it can handle rewriting, load balancing, SSL, etc. with a minimum of fuss.
If you want to make your web services asynchronous, you can try Twisted. It is a framework oriented to asynchronous tasks and implements many network protocols. It is easy to offer these services via XML-RPC (just put xmlrpc_ as the prefix of your method). Moreover, it scales very well to hundreds and thousands of users.
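In Twisted, methods of a `twisted.web.xmlrpc.XMLRPC` resource whose names start with `xmlrpc_` are exposed as remote procedures. To stay within the standard library, the sketch below shows the equivalent round trip with Python's own `xmlrpc.server`, which is synchronous and not Twisted; the `add` procedure is a hypothetical example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    """Hypothetical remote procedure."""
    return a + b


def start_server(host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Serve registered functions on a background thread (port 0 picks a free port)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.add(2, 3))  # 5
    server.shutdown()
    server.server_close()
```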
Celery is also a good option for making the most computationally intensive tasks asynchronous. It integrates very well with Django.
Is there a difference between using FAPWS3 and mod_wsgi when dealing with Django?
FAPWS3 seems a lot faster when serving requests to Python scripts. I would like to know if I'm missing anything. :)
Any ideas?
The underlying web server is not the bottleneck; your application and database access are. The differences between underlying web servers are going to be very minimal or nonexistent in the context of an actual full application stack. You cannot base decisions on hello-world-type tests, as they are pretty meaningless. Decisions should instead be based on the quality and stability of the hosting solutions under load, as well as ease of configuration and support, including your own competence to manage a particular setup. If you have no idea how to configure and support a particular web server properly, e.g. Apache, then why would you use it?
Here is the best explanation I have seen on the web so far:
http://nichol.as/benchmark-of-python-web-servers
Quote from nichol.as:
When you are just interested in quickly hosting your threaded application you really can’t go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements there is a lot to go for in terms of functionality. For example, protecting part of your website by using a LDAP server is as easy as enabling a module. Standalone CherryPy also shows great performance and functionality and is really a viable (fully Python) alternative which can lower memory requirements.
When you are a little more adventurous you can look at uWSGI and FAPWS3, they are relatively new compared to CherryPy and ModWSGI but they show a significant performance increase and do have lower memory requirements.