fastcgi, cherrypy, and python - python

So I'm trying to do more web development in python, and I've picked cherrypy, hosted by lighttpd w/ fastcgi. But my question is a very basic one: why do I need to restart lighttpd (or apache) every time I change my application code, or the code for an underlying library?
I realize this question extends from a basic mis(i.e. poor)understanding of the fastcgi model, so I'm open to any schooling here, but I'm used to just changing a PHP file and it showing up, versus having to bounce the web server.
Any elucidation/useful mockery appreciated.

This is because of performance. For development, autoreloading is helpful. But for production, you don't want to autoreload. This is actually a decently-sized bottleneck in, say, PHP: unless an opcode cache is in use, every time you access a PHP webpage the server has to parse and load each page from scratch. With Python under FastCGI, the application is already loaded and running after the first access, so a change to the code on disk is not picked up until the process is restarted.
As has been pointed out, CherryPy has an autoreload setting. I'd recommend using the CherryPy built-in server for development and using lighttpd for production. That will likely save you some time. The tutorial shows you how to do this.
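To see why a long-running Python process keeps serving the old code, here is a minimal stdlib sketch. It is not CherryPy's autoreloader, and the module name `demo_app_code` and the function `demo_stale_module` are made up for illustration. It imports a module, edits the file on disk, and shows that the running process keeps the old function until the module is explicitly reloaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_module():
    """Return (before_edit, after_edit_without_reload, after_reload)."""
    previous_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep the demo independent of .pyc caching
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "demo_app_code.py"  # hypothetical application module
        source.write_text("def handler():\n    return 'version 1'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_app_code")
            before = module.handler()

            # Simulate the developer editing the file while the process runs.
            source.write_text("def handler():\n    return 'version two, edited'\n")
            stale = module.handler()   # still the code loaded at import time

            importlib.reload(module)   # roughly what a restart or autoreload achieves
            fresh = module.handler()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_app_code", None)
            sys.dont_write_bytecode = previous_flag
    return before, stale, fresh
```

Calling `demo_stale_module()` shows the second value still coming from the old code. A FastCGI worker is in the same position as this process between the edit and the reload.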

From a system-software writer's point of view: this all depends on how the metadata about the server process is organized within your daemon (lighttpd or fcgi). Some programs are designed for one-time-only initialization -- MOSTLY because this allows a much simpler and better-performing internal programming model.
Often it is very hard to program a server process to reload its config data in an easy way. You might have to introduce locks and external event objects (signals in UNIX). When you can synchronize the data structures by design -- i.e., by only initializing once -- why complicate things by making the data model modifiable multiple times?
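As a rough illustration of the extra machinery a reloadable design needs, here is a minimal Python sketch. The class `ReloadableConfig` and its `loader` argument are hypothetical, not taken from lighttpd or any FastCGI library. The reload request only sets a flag, and the swap happens under a lock at a point the server loop chooses.

```python
import threading


class ReloadableConfig:
    """Config that is loaded once and can be swapped at a safe point."""

    def __init__(self, loader):
        self._loader = loader           # hypothetical callable returning a dict
        self._lock = threading.Lock()   # guards self._data across threads
        self._reload_requested = False
        self._data = loader()           # the one-time initialization

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def request_reload(self, signum=None, frame=None):
        # Usable as a signal handler: it only flips a flag and takes no lock,
        # so it cannot deadlock against code that was interrupted mid-read.
        self._reload_requested = True

    def apply_pending_reload(self):
        # Meant to be called by the server loop between requests.
        if not self._reload_requested:
            return False
        self._reload_requested = False
        new_data = self._loader()       # load outside the lock
        with self._lock:
            self._data = new_data       # swap in one step
        return True
```

On UNIX this could be wired up with `signal.signal(signal.SIGHUP, config.request_reload)`. Even this small version needs a lock, a flag, and a rule about where reloads may happen, which is the complication the initialize-once design avoids.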

Related

How to integrate BIRT with Python Django Project by using Py4j

Hi, is there anyone who can help me integrate BIRT reports with a Django project? Or any suggestions for connecting third-party reporting tools with Django, like Crystal or Crystal Clear Reports?
Some of the 3rd-party Crystal Reports viewers listed here provide a full command line API, so your python code can preview/export/print reports via subprocess.call()
The resulting process can span anything between an interactive Crystal Report viewer session (user can login, set/change parameters, print, export) and an automated (no user interaction) report printing/exporting.
While this would simplify your code, it would restrict deployment to Windows.
For prototyping, or if you don't mind performance, you can call BIRT from the command line.
For example, download the POJO runtime and use the script genReport.bat (IIRC) to generate a report to a file (eg. PDF format). You can specify the output options and the report parameters on the command line.
However, the BIRT startup is heavy overhead (several seconds).
For achieving reasonable performance, it is much better to perform this only once.
To achieve this goal, there are at least two possible ways:
You can use the BIRT viewer servlet (which is included as a WAR file with the POJO runtime). So you start the servlet with a web server, then you use HTTP requests to generate reports.
This looks technically old-fashioned (eg. no JSON Requests), but it should work. However, I never used this approach.
The other option is to write your own BIRT server.
In our product, we followed this approach.
You can take the viewer servlet as a template for seeing how this could work.
The basic idea is:
You start one (or possibly more than one) Java process.
The Java process initializes the BIRT runtime (this is what takes some seconds).
After that, the Java process listens for requests somehow (we used a plain socket listener, but of course you could use HTTP or some REST server framework as well).
A request would contain the following information:
which module to run
which output format
report parameters (specific to the module)
possibly other data/metadata, e.g. for authentication
This would create a RunAndRenderTask or separate RunTask and RenderTasks.
Depending on your reports, you might consider returning the resulting output (e.g. PDF) directly as a response, or using an asynchronous approach.
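On the Python (Django) side, the request described above could be sketched like this. The wire format (one JSON line over a plain socket), the field names, and the function names are all assumptions made for illustration; the answer does not specify them, and the Java side would have to agree on the same format.

```python
import json
import socket


def encode_report_request(module, output_format, parameters, metadata=None):
    """Serialize one report request as a newline-terminated JSON line."""
    request = {
        "module": module,             # which report module to run
        "format": output_format,      # e.g. "pdf"
        "parameters": parameters,     # report parameters, specific to the module
        "metadata": metadata or {},   # e.g. authentication data
    }
    return (json.dumps(request, sort_keys=True) + "\n").encode("utf-8")


def decode_report_request(line):
    """Inverse of encode_report_request, useful for tests or a stub server."""
    return json.loads(line.decode("utf-8"))


def request_report(host, port, payload, timeout=60.0):
    """Send an encoded request and return the raw bytes the server replies with."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks = []
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A Django view would build the payload with `encode_report_request(...)` and either stream the bytes from `request_report(...)` back to the browser or hand the call to a background job if the asynchronous approach is preferred.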
Note that BIRT will happily create several reports at the same time - multi-threading is no problem (except for the initialization), given enough RAM.
Be warned, however, that you will need at least a few days to build a POC for this "create your own server" approach, and probably some weeks for production quality.
So if you just want to build something fast to see if it is the right tool for you, you should start with the command line approach, then try the servlet approach, and only if you find that the servlet approach is not quite good enough should you go the "create your own server" way.
It's a pity that currently there doesn't seem to exist an open-source, production-quality, modern BIRT REST service.
That would make a really good contribution to the BIRT open-source project... (https://github.com/eclipse/birt)

Does python with wsgi (uwsgi) under nginx have some small default cache?

In my small web-site I feel the need to make some data widely available, to avoid hitting the database on every request. E.g. this could be the list of current users shown at the bottom of every page, or the time of the last update of the ranking.
The stuff works in Python (Flask) running upon nginx + uwsgi (this docker image).
I wonder, do I have some small cache or shared memory for keeping such information "out of the box", or do I need to take care of explicitly setting up some dedicated cache? Or perhaps something like this is provided by nginx?
Alternatively I can still use the database, since it has its own cache, I think.
Sorry if the question seems naive/silly - I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of a Java application) and have some difficulty grasping what powers wsgi/uwsgi provide. Thanks in advance!
Firstly, nginx has cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at caching section from Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
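# Note: werkzeug.contrib was removed in Werkzeug 1.0; on newer versions
# a SimpleCache class is available from the separate cachelib package.
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()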
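To make clear what "stored in the memory of the Python interpreter" implies, here is a minimal stdlib sketch of such an in-process cache. `TinyCache` is a made-up name and this is not Werkzeug's implementation. Because the dictionary lives inside one process, each uwsgi worker process would hold its own separate copy, so values set in one worker are not visible in another.

```python
import time


class TinyCache:
    """Per-process key/value cache with expiry (illustrative only)."""

    def __init__(self, default_timeout=300, clock=time.monotonic):
        self._items = {}                  # key -> (expires_at, value)
        self._default_timeout = default_timeout
        self._clock = clock               # injectable so tests need not sleep

    def set(self, key, value, timeout=None):
        ttl = self._default_timeout if timeout is None else timeout
        self._items[key] = (self._clock() + ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]          # drop expired entries lazily
            return None
        return value
```

For data such as "time of last ranking update" a short timeout keeps each worker reasonably fresh. If all workers must see exactly the same value, a shared store such as the database or a dedicated cache server is the safer choice.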
-- UPDATE --
Or you could solve it on the frontend side by storing the data in the web browser's local storage.
If there's nothing in local storage you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.

Django: Concurrent access to settings.py

I am not sure whether I have to care about concurrency, but I didn't find any documentation about it.
I have some data stored in my settings.py, like IP addresses, and each user can take one or give one back. So I have read and write operations, and I want only one user to access the file at any given moment.
How could I handle this?
And yes, I want to store the data at the settings.py. I also found the module django-concurrency, but I couldn't find anything about this in its documentation.
As e4c5 mentioned, conventionally settings.py is pretty light on logic. The loading mechanism for settings is pretty obscure and I, personally, like to stay away from things that are difficult to understand and interact with :)
You absolutely have to care about concurrency. How are you running your application? It's tricky because in the dev env you have a simple server and usually handle only a handful of requests at the same time (and a couple of years ago the dev server was single-threaded).
If you're running your application using a forking server, how will you share data between processes? One process won't even see the other processes' settings.py changes. I'm not even sure what it would look like with a threading server, but it would probably at least require a source code audit of your web server to understand the specifics of how requests are handled and how memory is shared.
Using a DB is by far the easiest solution (you should be able to use an in-memory store such as memcache or redis as an option too). DBs provide concurrency support out of the box, give you primitives for concurrent access to data, and will be a lot easier to reason about. And in the case of redis, which is single-threaded, you won't even have to worry about concurrent accesses to your shared IP addresses.
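A minimal sketch of the database approach, using the stdlib `sqlite3` module rather than the Django ORM so it stays self-contained. The table name `ip_pool` and the functions below are made up for illustration. The point is that taking an address happens inside one write transaction, so two users cannot be handed the same IP.

```python
import sqlite3


def create_pool(conn, addresses):
    """Create the pool table and register the free addresses."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_pool (address TEXT PRIMARY KEY, owner TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO ip_pool (address, owner) VALUES (?, NULL)",
        [(address,) for address in addresses],
    )


def take_ip(conn, user):
    """Atomically assign one free address to user; return it, or None if empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT address FROM ip_pool WHERE owner IS NULL ORDER BY address LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE ip_pool SET owner = ? WHERE address = ?", (user, row[0])
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row is not None else None


def give_back_ip(conn, user, address):
    """Release an address, but only if this user actually holds it."""
    cursor = conn.execute(
        "UPDATE ip_pool SET owner = NULL WHERE address = ? AND owner = ?",
        (address, user),
    )
    return cursor.rowcount == 1
```

The connection is expected to be opened with `sqlite3.connect(path, isolation_level=None)` so that the explicit `BEGIN IMMEDIATE` and `COMMIT` statements control the transaction. With Django and a server database, the same idea is usually expressed with a model plus a transaction and row locking.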
And yes, I want to store the data at the settings.py.
No, you definitely don't want to do that. The settings.py file is for configuring Django and any pluggable apps that you may use with it. It's not intended to be used as a place for dumping data. Data goes into a database.
And don't forget that the settings.py file is usually read only once.

Python bottle vs uwsgi/bottle vs nginx/uwsgi/bottle

I am developing a Python based application (HTTP -- REST or jsonrpc interface) that will be used in a production automated testing environment. This will connect to a Java client that runs all the test scripts. I.e., no need for human access (except for testing the app itself).
We hope to deploy this on Raspberry Pi's, so I want it to be relatively fast and have a small footprint. It probably won't get an enormous number of requests (at max load, maybe a few per second), but it should be able to run and remain stable over a long time period.
I've settled on Bottle as a framework due to its simplicity (one file). This was a tossup vs Flask. Anybody who thinks Flask might be better, let me know why.
I have been a bit unsure about the stability of Bottle's built-in HTTP server, so I'm evaluating these three options:
Use Bottle only -- As http server + App
Use Bottle on top of uwsgi -- Use uwsgi as the HTTP server
Use Bottle with nginx/uwsgi
Questions:
If I am not doing anything but Python/uwsgi, is there any reason to add nginx to the mix?
Would the uwsgi/bottle (or Flask) combination be considered production-ready?
Is it likely that I will gain anything by using a separate HTTP server from Bottle's built-in one?
Flask vs Bottle comes down to a couple of things for me.
How simple is the app? If it is very simple, then Bottle is my choice. If not, then I go with Flask. The fact that Bottle is a single file makes it incredibly simple to deploy by just including the file in our source. But the fact that Bottle is a single file should be a pretty good indication that it does not implement the full WSGI spec and all of its edge cases.
What does the app do? If it is going to have to render anything other than Python->JSON then I go with Flask for its built-in support of Jinja2. If I need to do authentication and/or authorization then Flask has some pretty good extensions already for handling those requirements. If I need to do caching, again, Flask-Cache exists and does a pretty good job with minimal setup. I am not entirely sure what is available for Bottle extension-wise, so that may still be worth a look.
The problem with using bottle's built in server is that it will be single process / single thread which means you can only handle processing one request at a time.
To deal with that limitation you can do any of the following in no particular order.
1) Eventlet's wsgi wrapping the bottle.app (single-threaded, non-blocking I/O, single process)
2) uwsgi or gunicorn (the latter being simpler), which is most often set up as single-threaded, multi-process (workers)
3) nginx in front of uwsgi.
3 is most important if you have static assets you want to serve up as you can serve those with nginx directly.
2 is really easy to get going (esp. gunicorn) - though I use uwsgi most of the time because it has more configurability to handle some things that I want.
1 is really simple and performs well... plus there is no external configuration or command line flags to remember.
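All of these options work with the same application because a Bottle app is a WSGI callable, and the server in front of it is interchangeable. The following stdlib-only sketch shows that interface without Bottle itself. The names `application`, `call_app`, and `serve` are made up for illustration, and `wsgiref` stands in for whichever server is chosen (like Bottle's built-in server, it handles one request at a time).

```python
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A tiny WSGI app returning JSON, standing in for a Bottle app object."""
    body = json.dumps(
        {"path": environ.get("PATH_INFO", "/"), "status": "ok"}
    ).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, the way any WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)   # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(app, host="127.0.0.1", port=8080):
    """Development-only server: single process, one request at a time."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Swapping `serve` for eventlet, gunicorn, or uwsgi changes how requests are scheduled, not how the application is written, which is why the server choice can be deferred until the load is known.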
2017 UPDATE - We now use Falcon instead of Bottle
I still love Bottle, but we reached a point last year where it couldn't scale to meet our performance requirements (100k requests/sec at <100ms). In particular, we hit a performance bottleneck with Bottle's use of thread-local storage. This forced us to switch to Falcon, and we haven't looked back since. Better performance and a nicely designed API.
I like Bottle but I also highly recommend Falcon, especially where performance matters.
I faced a similar choice about a year ago--needed a web microframework for a server tier I was building out. Found these slides (and the accompanying lecture) to be very helpful in sifting through the field of choices: Web micro-framework BATTLE!
I chose Bottle and have been very happy with it. It's simple, lightweight (a plus if you're deploying on Raspberry Pis), easy to use, intuitive, has the features I need, and has been supremely extensible whenever I've needed to add features of my own. Many plugins are available.
Don't use Bottle's built-in HTTP server for anything but dev.
I've run Bottle in production with a lot of success; it's been very stable on Apache/mod_wsgi. nginx/uwsgi "should" work similarly but I don't have experience with it.
I also suggest you look at running Bottle via the gevent.pywsgi server. It's awesome, super simple to set up, asynchronous, and very fast.
Plus bottle has an adapter built for it already, so even easier.
I love bottle, and this concept that it is not meant for large projects is ridiculous. It's one of the most efficient and well written frameworks, and can be easily molded without a lot of hand wringing.

Python Web Backend

I am an experienced Python developer starting to work on web service
backend system. The system feeds data (constantly) from the web to a
MySQL database. This data is later displayed by a frontend side (there
is no connection between the frontend and the backend). The backend
system constantly downloads flight information from the web (some of
the data is fetched via APIs, and some by downloading and parsing
text / xls files). I already have a script that downloads the data,
parses it, and inserts it to the MySQL db - all in a big loop. The
frontend side is just a bunch of php pages that properly display the
data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like the little control you get when using django ORM, I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web, hopefully in some sort of organized task format. This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used django before to write websites (aka
request handlers that return data). However, other than using Celery or django-cron I can't really see how it fits a role of a constant data feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If You are about to use SQLAlchemy, I would refrain from using Django: Django is fine if You are using the whole stack, but as You are about to rip Models off, I do not see much value in using it and I would take a look at another option (perhaps Pylons or pure old CherryPy would do).
Even more so if FEs will not run queries, but only ask API providers.
As for robustness, I am more satisfied with starting separate fcgi processes with supervise and using a more lightweight web server (lighty / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for the former, while a cron job that just spawns scraping tasks is better for the latter.
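The two behaviors (skip the step on failure, or keep retrying until the source is back) can be sketched in plain Python, independent of Celery or cron. The function `run_cycle` and its parameters are hypothetical names for illustration; `fetch` and `store` stand in for the existing download/parse and MySQL insert code.

```python
import time


def run_cycle(fetch, store, max_attempts=1, delay=1.0, sleep=time.sleep):
    """Run one fetch-and-store cycle and report whether data was stored.

    max_attempts=1 means "skip this cycle if the source is down".
    A larger value retries with a growing delay before giving up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            data = fetch()
        except Exception:
            if attempt == max_attempts:
                return False          # give up; the next scheduled run tries again
            sleep(delay * attempt)    # simple linear backoff between retries
            continue
        store(data)
        return True
    return False
```

A scheduler (a Celery periodic task or a cron entry) would call `run_cycle` once per interval. Keeping the loop body as a function like this also makes it easy to test without touching the network or the database.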
