I am developing a Python based application (HTTP -- REST or jsonrpc interface) that will be used in a production automated testing environment. This will connect to a Java client that runs all the test scripts. I.e., no need for human access (except for testing the app itself).
We hope to deploy this on Raspberry Pi's, so I want it to be relatively fast and have a small footprint. It probably won't get an enormous number of requests (at max load, maybe a few per second), but it should be able to run and remain stable over a long time period.
I've settled on Bottle as a framework due to its simplicity (one file). This was a tossup vs Flask. Anybody who thinks Flask might be better, let me know why.
I have been a bit unsure about the stability of Bottle's built-in HTTP server, so I'm evaluating these three options:
Use Bottle only -- As http server + App
Use Bottle on top of uwsgi -- Use uwsgi as the HTTP server
Use Bottle with nginx/uwsgi
Questions:
If I am not doing anything but Python/uwsgi, is there any reason to add nginx to the mix?
Would the uwsgi/bottle (or Flask) combination be considered production-ready?
Is it likely that I will gain anything by using a separate HTTP server from Bottle's built-in one?
Flask vs Bottle comes down to a couple of things for me.
How simple is the app. If it is very simple, then bottle is my choice. If not, then I got with Flask. The fact that bottle is a single file makes it incredibly simple to deploy with by just including the file in our source. But the fact that bottle is a single file should be a pretty good indication that it does not implement the full wsgi spec and all of its edge cases.
What does the app do. If it is going to have to render anything other than Python->JSON then I go with Flask for its built in support of Jinja2. If I need to do authentication and/or authorization then Flask has some pretty good extensions already for handling those requirements. If I need to do caching, again, Flask-Cache exists and does a pretty good job with minimal setup. I am not entirely sure what is available for bottle extension-wise, so that may still be worth a look.
The problem with using bottle's built in server is that it will be single process / single thread which means you can only handle processing one request at a time.
To deal with that limitation you can do any of the following in no particular order.
Eventlet's wsgi wrapping the bottle.app (single threaded, non-blocking I/O, single process)
uwsgi or gunicorn (the latter being simpler) which is most ofter set up as single threaded, multi-process (workers)
nginx in front of uwsgi.
3 is most important if you have static assets you want to serve up as you can serve those with nginx directly.
2 is really easy to get going (esp. gunicorn) - though I use uwsgi most of the time because it has more configurability to handle some things that I want.
1 is really simple and performs well... plus there is no external configuration or command line flags to remember.
2017 UPDATE - We now use Falcon instead of Bottle
I still love Bottle, but we reached a point last year where it couldn't scale to meet our performance requirements (100k requests/sec at <100ms). In particular, we hit a performance bottleneck with Bottle's use of thread-local storage. This forced us to switch to Falcon, and we haven't looked back since. Better performance and a nicely designed API.
I like Bottle but I also highly recommend Falcon, especially where performance matters.
I faced a similar choice about a year ago--needed a web microframework for a server tier I was building out. Found these slides (and the accompanying lecture) to be very helpful in sifting through the field of choices: Web micro-framework BATTLE!
I chose Bottle and have been very happy with it. It's simple, lightweight (a plus if you're deploying on Raspberry Pis), easy to use, intuitive, has the features I need, and has been supremely extensible whenever I've needed to add features of my own. Many plugins are available.
Don't use Bottle's built-in HTTP server for anything but dev.
I've run Bottle in production with a lot of success; it's been very stable on Apache/mod_wsgi. nginx/uwsgi "should" work similarly but I don't have experience with it.
I also suggest you look at running bottle via gevent.pywsgi server. It's awesome, super simple to setup, asynchronous, and very fast.
Plus bottle has an adapter built for it already, so even easier.
I love bottle, and this concept that it is not meant for large projects is ridiculous. It's one of the most efficient and well written frameworks, and can be easily molded without a lot of hand wringing.
Related
In my small web-site I feel need to make some data widely available, to avoid exchanging with database for every request made. E.g. this could be the list of current users show in the bottom of every page or the time of last update of ranking.
The stuff works in Python (Flask) running upon nginx + uwsgi (this docker image).
I wonder, do I have some small cache or shared memory for keeping such information "out of the box", or I need to take care of explicitly setting up some dedicated cache? Or perhaps some thing like this is provided by nginx?
alternatively I still can use database for it has its own cache I think, anyway
Sorry if question seems to be naive/silly - for I come from java world (where things a bit different as we serve all requests with one fat instance of java application) - and have some difficulty grasping what powers does wsgi/uwsgi provide. Thanks in advance!
Firstly, nginx has cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for flask cacheing you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at caching section from Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
-- UPDATE --
Or you could solve on the frontend side storing data in the web browser local storage.
If there's nothing in the local storage you call the DB, else you use the information from local storage rather than making db call.
Hope it helps.
I have written a web API in BaseHTTPServer. It is meant to be used only on localhost. It returns JSON objects on GET/POST operations.
http://localhost:8888/operation?param
and code is like
def do_GET(self):
if self.path=="operation":
self.wfile.write("output")
But I am worried about keep-alive mechanisms (read: a webserver that can respawn workers), lack of multi-threading, and PITA-ful maintenance.
And like I said, I am looking at the development and deployment issues for choosing this web framework.
Development
The web interface is currently 250 lines and has very simple features. I am looking for something that lends itself well to clean maintenance and deployment. I dont want the framework's MVC, ORM, templating and other features messing my learning curve. UrL patterns that redirect to appropriate module is nice.
Deployment
It should deploy on a mature server with a WSGI module with minimum fuss. And such a setup has hot-deploy (for want of a better word), installing a new application or updating the code means copying the files to the www-root in the filesystem.
CherryPy and Flask seem interesting. Django and Web2Py seem too comprehensive.
The recommended way of deploying wsgi is as a long-running-process, either embedded or daeomonized, and not as a cgi script. Either way, its going to be a little different than just uploading files like in php, restarting the server/process by touching the config file is normally the closest you get to "hot-deployment" using wsgi.
Needless to say, the framework itself does not impose any kind of deployment restraints if it is wsgi compliant. Take your pick depending on your needs: apache+modwsgi, gunicorn, cherry.py, paste. None of them offer "hot-deployment" (afaik), you will still need to create a wsgi script and reload the processes. The filesystem layout is normally of no concern and that's good. You don't usually get autoreload either. I know werkzeug and cherry.py do, and werkzeug offers some really cool debugging tools too. Please note that tornado/werkzeug* itself offers an autoreload option, but is actually considered for development and not deployment, and not compatible with the wsgi module.
But no matter how painful or painless the deployment is, it is recommended to use something like fabric to automate your deployments, and setting up a wsgi web server isnt that that hard.
Choice of the framework itself is kind of tricky, and depends on what level you want to work in. Tornado, werkzeug are popular low level frameworks, (but also include higher level tools, and many are frameworks+webserver), but you could also work with webob directly and just plugin whatever else you need.
You have the microframeworks like flask or bottle, then the lightweight frameworks, like web2.py, or maybe pyramid (the lines on how heavy a framework are kind of blurry).
Then you have the "full-stack" django, grok, turbogears, etc...
And then you have zope, which has been on a diet but still very heavy.
Note that you can pretty much do anything with all of them (just depends how much you want to bend them), and in many cases you can swap components rather easily. I'd start try out a microframework like bottle or maybe flask (you don't have to use ORM's or templating, but are easily available once you do), but also take a look at webob.
*comment: added werkzeug to the not really autoreload camp.
For what you describe, id go with: Tornado Web Server
This is the hello world:
import tornado.ioloop
import tornado.web
class MainHandler(tornado.web.RequestHandler):
def get(self):
self.write("Hello, world")
application = tornado.web.Application([
(r"/", MainHandler),
])
if __name__ == "__main__":
application.listen(8888)
tornado.ioloop.IOLoop.instance().start()
It's highly scalable, and I think it may take you 10 minutes to set it up with your code.
I am an experienced Python developer starting to work on web service
backend system. The system feeds data (constantly) from the web to a
MySQL database. This data is later displayed by a frontend side (there
is no connection between the frontend and the backend). The backend
system constantly downloads flight information from the web (some of
the data is fetched via APIs, and some by downloading and parsing
text / xls files). I already have a script that downloads the data,
parses it, and inserts it to the MySQL db - all in a big loop. The
frontend side is just a bunch of php pages that properly display the
data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like the little control you get when using django ORM, I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web - hopefully in some sort of organized tasks format). This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used django before to write websites (aka
request handlers that return data). However, other than using Celery or django-cron I can't really see how it fits a role of a constant data feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If You are about to use SQLAlchemy, I would refrain from using Django: Django is fine if You are using the whole stack, but as You are about to rip Models off, I do not see much value in using it and I would take a look at another option (perhaps Pylons or pure old CherryPy would do).
Even more so if FEs will not run queries, but only ask API providers.
As for robustness, I am more satisfied with starting separate fcgi processess with supervise and using more lightweight web server (ligty / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for former, while cron that would just spawn scraping tasks is better for latter.
Does anybody know, how GAE limit Python interpreter? For example, how they block IO operations, or URL operations.
Shared hosting also do it in some way?
The sandbox "internally works" by them having a special version of the Python interpreter. You aren't running the standard Python executable, but one especially modified to run on Google App engine.
Update:
And no it's not a virtual machine in the ordinary sense. Each application does not have a complete virtual PC. There may be some virtualization going on, but Google isn't saying exactly how much or what.
A process has normally in an operating system already limited access to the rest of the OS and the hardware. Google have limited this even more and you get an environment where you are only allowed to read the very specific parts of the file system, and not write to it at all, you are not allowed to open sockets and not allowed to make system calls etc.
I don't know at which level OS/Filesystem/Interpreter each limitation is implemented, though.
From Google's site:
An application can only access other
computers on the Internet through the
provided URL fetch and email
services. Other computers can only
connect to the application by making
HTTP (or HTTPS) requests on the
standard ports.
An application cannot write to the
file system. An app can read files,
but only files uploaded with the
application code. The app must use
the App Engine datastore, memcache or
other services for all data that
persists between requests.
Application code only runs in
response to a web request, a queued
task, or a scheduled task, and must
return response data within 30
seconds in any case. A request
handler cannot spawn a sub-process or
execute code after the response has
been sent.
Beyond that, you're stuck with Python 2.5, you can't use any C-based extensions, more up-to-date versions of web frameworks won't work in some cases (Python 2.5 again).
You can read the whole article What is Google App Engine?.
I found this site
that has some pretty decent information. What exactly are you trying to do?
Here
FRESH!
Look here: http://code.google.com/appengine/docs/python/runtime.html
Your IO Operations are limited as follows (beyond disabled modules):
App Engine records how much of each resource an application uses in a calendar day, and considers the resource depleted when this amount reaches the app's quota for the resource. A calendar day is a period of 24 hours beginning at midnight, Pacific Time. App Engine resets all resource measurements at the beginning of each day, except for Stored Data which always represents the amount of datastore storage in use.
When an app consumes all of an allocated resource, the resource becomes unavailable until the quota is replenished. This may mean that your app will not work until the quota is replenished.
An application can determine how much CPU time the current request has taken so far by calling the Quota API. This is useful for profiling CPU-intensive code, and finding places where CPU efficiency can be improved for greater cost savings. You can measure the CPU used for the entire request, or call the API before and after a section of code then subtract to determine the CPU used between those two points.
Resource| Free Default Quota| Billing Enabled Default Quota
Blobstore |Stored Data| 1 GB| 1 GB free; no maximum
Resource |Billing Enabled| Default Quota
Daily Limit| Maximum Rate
Blobstore API Calls |140,000,000 calls| 72,000 calls/minute
Hmm my table isn't that good, but hopefully still readable.
EDIT: OK, I understand. But sir, you did not have to use the "f" word. :) And you know, it's kinda like the whole 'teach a man to fish' scenario. Google is who I always ask and that's why I'm answering questions here for fun.
EDIT AGAIN: OK that made more sense before the comment was tooked. So I went and answered the question a little more. I hope it helps.
IMO it's not a standard python, but a version specifically patched for app engine. In other words you can think more or less like an "higher level" VM that however is not emulating x86 instructions but python opcodes (if you don't know what they are try writing a small function named "foo" and the doing "import dis; dis.dis(foo)" you will see the python opcodes that the compiler produced).
By patching python you can impose to it whatever limitations you like. Of course you've however to forbid the use of user supplied C/C++ extension modules as a C/C++ module will have access to everything the process can access.
Using such a virtual environment you're able to run safely python code without the need to use a separate x86 VM for every instance.
So I'm trying to do more web development in python, and I've picked cherrypy, hosted by lighttpd w/ fastcgi. But my question is a very basic one: why do I need to restart lighttpd (or apache) every time I change my application code, or the code for an underlying library?
I realize this question extends from a basic mis(i.e. poor)understanding of the fastcgi model, so I'm open to any schooling here, but I'm used to just changing a PHP file and it showing up, versus having to bounce the web server.
Any elucidation/useful mockery appreciated.
This is because of performance. For development, autoreloading is helpful. But for production, you don't want to autoreload. This is actually a decently-sized bottleneck in say PHP. Every time you access a PHP webpage, the server has to parse and load each page from scratch. With Python, the script is already loaded and running after the first access.
As has been pointed out, CherryPy has a autoreload setting. I'd recommend using the CherryPy built-in server for development and using lighttpd for production. That will likely save you some time. The tutorial shows you how to do this.
From a system-software-writer's pointer of view: This all depends on how the meta-data about the server process is organized within your daemon (lighttpd or fcgi). Some programs are designed for one time only initialization -- MOSTLY this allows a much simpler and better performing internal programming model.
Often it is very hard to program a server process reload config data in a easy way. You might have to introduce locks and external event objects (signals in UNIX). When you can synchronize the data structures by design -- i.e., only initializing once .... why complicate things by making the data model modifiable multiple times ?