In my small website I need to make some data widely available, to avoid hitting the database on every request. For example, this could be the list of current users shown at the bottom of every page, or the time of the last ranking update.
The application is written in Python (Flask) running on nginx + uWSGI (this Docker image).
I wonder: do I get some small cache or shared memory for keeping such information "out of the box", or do I need to explicitly set up some dedicated cache? Or is something like this perhaps provided by nginx?
Alternatively, I could still use the database, since it has its own cache anyway, I think.
Sorry if the question seems naive/silly - I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of a Java application) - and have some difficulty grasping what powers WSGI/uWSGI provides. Thanks in advance!
Firstly, nginx has a cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at caching section from Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
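A minimal usage sketch continuing the snippet above (the key name, the 5-minute timeout, and the load_users_from_db fallback are made up for illustration). Keep in mind that SimpleCache lives per process, so with several uWSGI workers each worker has its own copy; also note that in newer Werkzeug releases this helper has moved out of werkzeug.contrib into the separate cachelib package:
# store a value for 5 minutes, then read it back (get() returns None if missing/expired)
cache.set('current_users', ['alice', 'bob'], timeout=5 * 60)
users = cache.get('current_users')
if users is None:
    users = load_users_from_db()  # hypothetical fallback to the database
    cache.set('current_users', users, timeout=5 * 60)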
-- UPDATE --
Or you could solve it on the frontend side by storing data in the web browser's local storage.
If there's nothing in local storage you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.
I am not sure whether I have to care about concurrency, but I didn't find any documentation about it.
I have some data stored in my settings.py, like IP addresses, and each user can take one or give one back. So I have read and write operations, and I want only one user to be reading the file at any given moment.
How could I handle this?
And yes, I want to store the data in settings.py. I also found the module django-concurrency, but I couldn't find anything about this in its documentation.
As e4c5 mentioned, settings.py is conventionally pretty light on logic. The loading mechanism for settings is fairly obscure and, personally, I like to stay away from things that are difficult to understand and interact with :)
You absolutely have to care about concurrency. How are you running your application? It's tricky because in the dev environment you have a simple server that usually handles only a handful of requests at the same time (and a couple of years ago the dev server was single-threaded).
If you're running your application using a forking server, how will you share data between processes? One process won't even see another process's changes to settings.py. I'm not even sure what it would look like with a threading server, but it would probably at least require a source-code audit of your web server to understand the specifics of how requests are handled and how memory is shared.
Using a DB is by far the easiest solution (you should be able to use an in-memory store as an option too: memcache/redis/etc.). DBs provide concurrency support out of the box, are a lot easier to reason about, and provide primitives for concurrent access to data. And in the case of Redis, which is single-threaded, you won't even have to worry about concurrent accesses to your shared IP addresses.
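For example, a rough sketch with the redis-py client, keeping the shared addresses in a Redis set (the key name and the seed addresses are made up); SPOP and SADD are atomic, so two concurrent requests can't be handed the same address:
import redis

r = redis.Redis()  # assumes a local Redis instance on the default port

# seed the pool once, e.g. from a management command
r.sadd('available_ips', '10.0.0.1', '10.0.0.2', '10.0.0.3')

def take_ip():
    # atomically remove and return one free address, or None if the pool is empty
    ip = r.spop('available_ips')
    return ip.decode() if ip else None

def give_back_ip(ip):
    # return an address to the pool
    r.sadd('available_ips', ip)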
And yes, I want to store the data in settings.py.
No, you definitely don't want to do that. The settings.py file configures Django and any pluggable apps that you may use with it; it's not intended to be used as a place for dumping data. Data goes into a database.
And don't forget that the settings.py file is usually read only once.
I am developing a Python-based application (HTTP -- REST or JSON-RPC interface) that will be used in a production automated-testing environment. It will connect to a Java client that runs all the test scripts. I.e., there is no need for human access (except for testing the app itself).
We hope to deploy this on Raspberry Pis, so I want it to be relatively fast and have a small footprint. It probably won't get an enormous number of requests (at max load, maybe a few per second), but it should be able to run and remain stable over a long period of time.
I've settled on Bottle as a framework due to its simplicity (one file). This was a tossup vs Flask. Anybody who thinks Flask might be better, let me know why.
I have been a bit unsure about the stability of Bottle's built-in HTTP server, so I'm evaluating these three options:
Use Bottle only -- As http server + App
Use Bottle on top of uwsgi -- Use uwsgi as the HTTP server
Use Bottle with nginx/uwsgi
Questions:
If I am not doing anything but Python/uwsgi, is there any reason to add nginx to the mix?
Would the uwsgi/bottle (or Flask) combination be considered production-ready?
Is it likely that I will gain anything by using a separate HTTP server from Bottle's built-in one?
Flask vs Bottle comes down to a couple of things for me.
How simple is the app. If it is very simple, then bottle is my choice. If not, then I go with Flask. The fact that bottle is a single file makes it incredibly simple to deploy by just including the file in our source. But the fact that bottle is a single file should also be a pretty good indication that it does not implement the full WSGI spec and all of its edge cases.
What does the app do. If it is going to have to render anything other than Python -> JSON, then I go with Flask for its built-in support for Jinja2. If I need to do authentication and/or authorization, then Flask has some pretty good extensions already for handling those requirements. If I need to do caching, again, Flask-Cache exists and does a pretty good job with minimal setup. I am not entirely sure what is available for bottle extension-wise, so that may still be worth a look.
The problem with using bottle's built-in server is that it is single-process / single-threaded, which means you can only process one request at a time.
To deal with that limitation you can do any of the following, in no particular order:
Eventlet's WSGI server wrapping the bottle app (single-threaded, non-blocking I/O, single process)
uwsgi or gunicorn (the latter being simpler), most often set up as single-threaded, multi-process (workers)
nginx in front of uwsgi.
3 is most important if you have static assets you want to serve up as you can serve those with nginx directly.
2 is really easy to get going (esp. gunicorn) - though I use uwsgi most of the time because it has more configurability to handle some things that I want.
1 is really simple and performs well... plus there are no external configuration files or command-line flags to remember (a minimal sketch follows below).
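Here is what option 1 might look like, roughly; the port and the /ping route are placeholders, and the app itself is just a stand-in for your real bottle application:
import eventlet
from eventlet import wsgi
import bottle

app = bottle.Bottle()

@app.route('/ping')
def ping():
    return {'status': 'ok'}  # bottle serializes dicts to JSON automatically

# single process, single thread, non-blocking I/O via green threads
wsgi.server(eventlet.listen(('0.0.0.0', 8080)), app)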
2017 UPDATE - We now use Falcon instead of Bottle
I still love Bottle, but we reached a point last year where it couldn't scale to meet our performance requirements (100k requests/sec at <100ms). In particular, we hit a performance bottleneck with Bottle's use of thread-local storage. This forced us to switch to Falcon, and we haven't looked back since. Better performance and a nicely designed API.
I like Bottle but I also highly recommend Falcon, especially where performance matters.
I faced a similar choice about a year ago--needed a web microframework for a server tier I was building out. Found these slides (and the accompanying lecture) to be very helpful in sifting through the field of choices: Web micro-framework BATTLE!
I chose Bottle and have been very happy with it. It's simple, lightweight (a plus if you're deploying on Raspberry Pis), easy to use, intuitive, has the features I need, and has been supremely extensible whenever I've needed to add features of my own. Many plugins are available.
Don't use Bottle's built-in HTTP server for anything but dev.
I've run Bottle in production with a lot of success; it's been very stable on Apache/mod_wsgi. nginx/uwsgi "should" work similarly but I don't have experience with it.
I also suggest you look at running bottle via the gevent.pywsgi server. It's awesome, super simple to set up, asynchronous, and very fast.
Plus bottle has an adapter built for it already, so even easier.
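A rough sketch of what that might look like, assuming gevent is installed (host, port, and the route are placeholders); passing server='gevent' selects bottle's built-in gevent.pywsgi adapter:
from gevent import monkey
monkey.patch_all()  # patch the stdlib so blocking calls cooperate with gevent

import bottle

app = bottle.Bottle()

@app.route('/')
def index():
    return 'hello'

# server='gevent' uses bottle's built-in adapter around gevent.pywsgi
bottle.run(app, server='gevent', host='0.0.0.0', port=8080)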
I love bottle, and the notion that it is not meant for large projects is ridiculous. It's one of the most efficient and well-written frameworks, and it can be easily molded without a lot of hand-wringing.
I am using a Python module (PyCLIPS) and Django 1.3.
I want to develop a thread-safe class that implements the Object Pool and Singleton patterns and that also has to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, does something with it, pushes it back to the pool, and then sends a response with the object's ID.
Another request, which has that object's ID, gets the object with the given ID from the pool and repeats the steps above.
But the object's state has to be kept while it sits in the pool, for as long as the server is running.
It should be like a Singleton Session Bean in Java EE.
How should I do it? Is there something I should read?
Update:
I can't store objects from the pool in a database, because these objects are wrappers around a library written in C, which is the API for the expert system engine CLIPS.
Thanks!
Well, I think a different angle is necessary here. Django is not like Java; the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per-object data, which should always go in the DB layer.
Otherwise, you can always save stuff on the session, which Django provides for both logged-in users as well as for anonymous ones - see the docs on Django sessions.
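For instance, keeping a per-user value in the session from a view is just dictionary-style access; in this sketch the session key and the acquire_object_id_from_pool helper are hypothetical names:
from django.http import HttpResponse

def take_object(request):
    # request.session is persisted between requests (DB-backed by default)
    obj_id = request.session.get('pooled_object_id')  # None on first visit
    if obj_id is None:
        obj_id = acquire_object_id_from_pool()  # hypothetical helper
        request.session['pooled_object_id'] = obj_id
    return HttpResponse(str(obj_id))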
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app but rather accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer, but rather by some sort of external service that you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.
Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single-process WSGI app (on a different port) that exposes an API for all of the operations that you need. Whether you use a RESTful API or form posts is up to your personal preference.
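A very rough sketch of that idea using only the standard library; the create_clips_engine constructor, the engine.run operation, the /run path, and port 8001 are all placeholders. Because wsgiref's simple server is one process handling one request at a time, the in-memory object can safely live at module level:
from wsgiref.simple_server import make_server

# the long-lived, unpicklable object lives in this one process only
engine = create_clips_engine()  # hypothetical constructor for the wrapped C object

def app(environ, start_response):
    # dispatch on the path; a real service would expose one endpoint per operation
    if environ['PATH_INFO'] == '/run':
        result = engine.run()  # hypothetical operation on the engine
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [str(result).encode()]
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'unknown operation']

make_server('127.0.0.1', 8001, app).serve_forever()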
Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment: Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (i.e. the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A shot in the dark, but are they serializable using pickle? If so, then perhaps memcached might work.
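If they do pickle, Django's cache framework (which can be backed by memcached) makes each access a one-liner; the key name and the build_pool_object helper below are made up:
from django.core.cache import cache

obj = build_pool_object()  # hypothetical: the picklable wrapper object
cache.set('pool_object_42', obj)  # value is pickled before being sent to the backend
same_obj = cache.get('pool_object_42')  # None if it was evicted or never set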
I am an experienced Python developer starting to work on a web service backend system. The system feeds data (constantly) from the web to a MySQL database. This data is later displayed by a frontend side (there is no connection between the frontend and the backend). The backend system constantly downloads flight information from the web (some of the data is fetched via APIs, and some by downloading and parsing text / xls files). I already have a script that downloads the data, parses it, and inserts it into the MySQL db - all in a big loop. The frontend side is just a bunch of PHP pages that properly display the data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like how little control you get when using the Django ORM; I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web (hopefully in some sort of organized task format). This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I have used django before to write websites (i.e. request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constant data-feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If you are about to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but since you are about to rip the models off, I do not see much value in using it, and I would take a look at another option (perhaps Pylons or plain old CherryPy would do).
Even more so if the frontends will not run queries, but only talk to the API provider.
As for robustness, I am happier starting separate FastCGI processes with supervise and using a more lightweight web server (lighttpd / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step, or repeat it multiple times once the source is back up?
Periodic tasks might be good for the former, while a cron job that just spawns scraping tasks is better for the latter.
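If you go the periodic-task route with Celery, a minimal sketch might look roughly like this; the broker URL, the app name, the 5-minute schedule, and the task body are all placeholders for your own setup:
from celery import Celery

app = Celery('feeder', broker='redis://localhost:6379/0')  # broker URL is an assumption

@app.task
def fetch_flight_data():
    # download, parse, and insert into MySQL -- the body of your existing loop
    ...

app.conf.beat_schedule = {
    'fetch-every-5-minutes': {
        'task': fetch_flight_data.name,
        'schedule': 300.0,  # seconds
    },
}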
So I'm trying to do more web development in python, and I've picked cherrypy, hosted by lighttpd w/ fastcgi. But my question is a very basic one: why do I need to restart lighttpd (or apache) every time I change my application code, or the code for an underlying library?
I realize this question extends from a basic mis(i.e. poor)understanding of the fastcgi model, so I'm open to any schooling here, but I'm used to just changing a PHP file and it showing up, versus having to bounce the web server.
Any elucidation/useful mockery appreciated.
This is because of performance. For development, autoreloading is helpful. But for production, you don't want to autoreload. This is actually a decently sized bottleneck in, say, PHP: every time you access a PHP web page, the server has to parse and load the page from scratch. With Python, the script is already loaded and running after the first access.
As has been pointed out, CherryPy has an autoreload setting. I'd recommend using the CherryPy built-in server for development and using lighttpd for production. That will likely save you some time. The tutorial shows you how to do this.
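For reference, that autoreload behaviour is just a config switch (the exact key may vary with your CherryPy version), so a sketch would be:
import cherrypy

# development: restart the process whenever a source file changes
cherrypy.config.update({'engine.autoreload.on': True})

# production: turn it off and let the front-end server manage the process
# cherrypy.config.update({'engine.autoreload.on': False})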
From a system-software writer's point of view: this all depends on how the metadata about the server process is organized within your daemon (lighttpd or fcgi). Some programs are designed for one-time-only initialization -- mostly because this allows a much simpler and better-performing internal programming model.
It is often very hard to make a server process reload its config data in an easy way. You might have to introduce locks and external event objects (signals on UNIX). When you can synchronize the data structures by design -- i.e., by initializing only once -- why complicate things by making the data model modifiable multiple times?