How AppEngine instances work on the local server

I'm new to App Engine and don't really know how to phrase this question, which unfortunately means I don't know which keywords to search for. I hope I get some real help rather than the bashing a lot of people do.
I'm confused by the difference between how App Engine behaves online and how it behaves on the local development server.
Background info:
By the way, this is in Python.
Initially I assumed that an instance of the app or module is created when needed (or as configured), and that this instance then serves multiple requests from different clients. Under this behavior, any initialization code runs only once.
But on the local development server, every time I add something new, especially in main.py, the server catches the changes and runs them on the next browser refresh. This made me think: wait... does it run the entire script over and over again on every request?
Question:
Does an instance/module run the entire code on every request, or is this just an added behavior of the dev server to make development easier?

Both your assumptions - about behaviour in production and development - are wrong.
In production, GAE spins up instances as required. This may be in response to increased load, or the host may simply decide after a certain amount of time to recycle an instance by killing it and starting a new one. Initialization code will always be run whenever a new instance is started.
In development, you only get a single instance. However, the server watches your file system for changes. If it detects a change to the code itself, it will restart itself, and therefore re-run the initialization code. But if you don't make any code changes between requests, the existing process continues indefinitely, and init code will not be re-run.

Related

Does python with wsgi (uwsgi) under nginx have some small default cache?

On my small website I need to make some data widely available, to avoid going to the database on every request. For example, this could be the list of current users shown at the bottom of every page, or the time of the last ranking update.
The site runs on Python (Flask) on top of nginx + uwsgi (this docker image).
I wonder: do I have some small cache or shared memory for keeping such information "out of the box", or do I need to explicitly set up a dedicated cache? Or perhaps something like this is provided by nginx?
Alternatively, I can still use the database, since I think it has its own cache anyway.
Sorry if the question seems naive or silly; I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of the Java application) and have some difficulty grasping what capabilities wsgi/uwsgi provide. Thanks in advance!

Firstly, nginx has a cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/

Did you have a look at the caching section of the Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
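# werkzeug.contrib (including werkzeug.contrib.cache) was removed in Werkzeug 1.0; SimpleCache now lives in the separate cachelib package
from cachelib import SimpleCache
cache = SimpleCache()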
-- UPDATE --
Or you could solve it on the frontend side by storing data in the web browser's local storage.
If there's nothing in local storage, you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.

Google App Engine keeps deploying a new instance, or none at all (server error)

I built and deployed an app on GAE. Yesterday everything seemed to be working fine: requests sent to the app every few seconds succeeded with a response time of about 2.5 seconds. Today GAE keeps deploying a new instance for every request, or fails to create even one, resulting in unacceptably high response times (and much higher charges) or even 500 server errors.
I tried suspending and restarting the app a few times; it works again for a couple of requests, then reverts to the same behavior. In the console I can see that a new instance is immediately shut down after serving a request or, in the case of a server error, that GAE was unable to deploy a new instance.
I checked the quotas on the console, nothing seems to hint that I cannot send multiple requests from the same IP.
Has anyone experienced such issues, and if yes, what could be the cause(s) and remedies? Please note, I am very new to GAE so have no further clue right now on where to start.
EDIT: Just realized the average memory used by an instance (F2 in my case, which gives you 256MB) is very close to the maximum, at about 250MB. Could that be the issue? I will upgrade to F4 (512MB) and see what happens.

As per the documentation, a new instance may be created based on request rate, response latencies, and other application metrics.
Therefore, it’s expected behaviour for the GAE Standard instances to scale up and down depending on the traffic they receive.
Also, if the maximum memory usage for the instance class is reached, a shutdown process will be triggered as explained here.
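To check whether an instance really is running close to its memory limit, one option is to log the process's peak memory from inside the app. A minimal sketch using the standard-library `resource` module (UNIX only); the 256 MB figure is the F2 limit quoted in the question, and the 90% threshold and function names are arbitrary assumptions. Peak RSS is only an approximation of the memory figure App Engine itself reports.

```python
import resource
import sys

# Assumption: the F2 limit quoted in the question; adjust for your instance class.
INSTANCE_MEMORY_LIMIT_MB = 256


def peak_rss_mb():
    """Peak resident set size of the current process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def memory_warning(limit_mb=INSTANCE_MEMORY_LIMIT_MB, ratio=0.9):
    """Return a warning string if peak memory is at or above ratio * limit, else None."""
    peak = peak_rss_mb()
    if peak >= limit_mb * ratio:
        return f"peak RSS {peak:.0f} MB is at or above {ratio:.0%} of the {limit_mb} MB limit"
    return None
```

Calling `memory_warning()` at the end of a request handler and logging any non-`None` result would show whether instances are being shut down shortly after crossing the threshold.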
As for the failures to create a new instance, it's hard to tell what may be causing them without the Stackdriver Logging information. Off the top of my head, you may receive HTTP 500 errors due to having reached the response limit, but it could indeed happen for other reasons as well.
Finally, taking into account the nature of the issues, I think it's a good idea to test the GAE app's behaviour using a larger instance class and compare the results. If you no longer experience this with an F4 instance class, it's safe to assume that the previous instance class was simply not enough to satisfy the app's requirements.

Does local GAE read and write to a local datastore file on the hard drive while it's running?

I have just noticed that while an instance of my GAE application is running, nothing happens to the datastore file when I add or remove entries using Python code or in the admin console. I can even remove the file and still have all the data safe and sound in the admin area and accessible from code. But when I restart my application, all the data obviously goes away and I have a blank datastore. So the question: does GAE read all the data from the file only when it starts and then work with it in memory, saving the data after I stop the application? Does it make any requests to the datastore file while the application is running? If it doesn't save anything to the file while it's running, could data be lost if the application stops unexpectedly? Please make it clear for me if you know how it works in this respect.

How the datastore reads and writes its underlying files varies - the standard datastore is read on startup, and written progressively, journal-style, as the app modifies data. The SQLite backend uses a SQLite database.
You shouldn't have to care, though - neither backend is designed for robustness in the face of failure, as they're development backends. You shouldn't be modifying or deleting the underlying files, either.
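To make "read on startup, written progressively, journal-style" concrete, here is a toy sketch of that persistence pattern: a hypothetical `JournaledStore` using JSON lines, not dev_appserver's actual file format. It also shows why a backend like this isn't robust: there is no fsync and no handling of a half-written last line, and reads never touch the file, so deleting it while running leaves the in-memory data intact.

```python
import json
import os


class JournaledStore:
    """Toy key/value store: replay the journal on startup, append every change."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Startup: rebuild the in-memory state by replaying every record.
            with open(path, encoding="utf-8") as journal:
                for line in journal:
                    record = json.loads(line)
                    if record["op"] == "put":
                        self.data[record["key"]] = record["value"]
                    elif record["op"] == "delete":
                        self.data.pop(record["key"], None)

    def _append(self, record):
        # Progressive write: one JSON line per change (no fsync, so not crash-safe).
        with open(self.path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(record) + "\n")

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def get(self, key, default=None):
        # Reads are served from memory; the file is not consulted.
        return self.data.get(key, default)

    def delete(self, key):
        self._append({"op": "delete", "key": key})
        self.data.pop(key, None)
```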

By default the dev_appserver stores its data in a temporary location (which is why it disappears and you can't see anything changing).
If you don't want your data to disappear on restart set --datastore_path when running your dev server like:
dev_appserver.py --datastore_path /path/to/app/myapp.db /path/to/app
As Nick said, the dev server is not built to be bulletproof; it's designed to help you develop your app quickly. The production setup is very different and will not do anything unexpected when you are dealing with exceptional circumstances.

fastcgi, cherrypy, and python

So I'm trying to do more web development in Python, and I've picked CherryPy, hosted by lighttpd with FastCGI. But my question is a very basic one: why do I need to restart lighttpd (or Apache) every time I change my application code, or the code for an underlying library?
I realize this question stems from a basic misunderstanding (i.e. a poor understanding) of the FastCGI model, so I'm open to any schooling here, but I'm used to just changing a PHP file and having it show up, rather than having to bounce the web server.
Any elucidation/useful mockery appreciated.

This is for performance reasons. For development, autoreloading is helpful, but in production you don't want to autoreload. Reloading is actually a decently sized bottleneck in, say, PHP: every time you access a PHP webpage, the server has to parse and load each page from scratch. With Python, the script is already loaded and running after the first access.
As has been pointed out, CherryPy has an autoreload setting. I'd recommend using the CherryPy built-in server for development and lighttpd for production. That will likely save you some time. The tutorial shows you how to do this.
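As a rough illustration of what an autoreloader does during development, here is a minimal sketch that checks a module's source file modification time and re-imports it with `importlib.reload` when it changes (the `ModuleReloader` class is hypothetical). Real autoreloaders, CherryPy's among them, generally restart the whole process instead, which avoids the stale-state problems of reloading in place; this sketch only shows the watch-and-reload idea. Each check also costs a `stat()` call, which is part of why production setups leave it off.

```python
import importlib
import os


class ModuleReloader:
    """Re-import a module whenever its source file's mtime changes."""

    def __init__(self, module):
        self.module = module
        self._mtime = os.path.getmtime(module.__file__)

    def check(self):
        """Reload if the file changed since the last check; return True if reloaded."""
        mtime = os.path.getmtime(self.module.__file__)
        if mtime == self._mtime:
            return False
        self._mtime = mtime
        self.module = importlib.reload(self.module)
        return True
```

A development server would call `check()` before handling each request (or from a polling thread) and always go through `reloader.module` rather than keeping its own reference to the old module object.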

From a system-software writer's point of view: this all depends on how the metadata about the server process is organized within your daemon (lighttpd or fcgi). Some programs are designed for one-time-only initialization; MOSTLY this allows a much simpler and better-performing internal programming model.
Often it is very hard to make a server process reload its config data in an easy way. You might have to introduce locks and external event objects (signals in UNIX). When you can synchronize the data structures by design, i.e. by only initializing once, why complicate things by making the data model modifiable multiple times?
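To illustrate the extra machinery described above, here is a minimal sketch of reloading configuration on SIGHUP (UNIX only) in a long-running Python process. The names `ReloadableConfig` and `install_sighup_reload` are hypothetical. The signal handler only sets a flag and the actual reload happens at a safe point in the normal code path, because doing real work (or taking locks) inside a signal handler is a classic source of deadlocks.

```python
import signal


class ReloadableConfig:
    """Holds config that can be swapped at runtime after a reload request."""

    def __init__(self, loader):
        self._loader = loader          # callable returning a fresh config dict
        self._config = loader()        # one-time initialization at startup
        self._reload_requested = False

    def request_reload(self, signum=None, frame=None):
        # Signal-handler-safe: just set a flag, do no real work here.
        self._reload_requested = True

    def current(self):
        """Return the active config, reloading first if a reload was requested."""
        if self._reload_requested:
            # Clear the flag before loading, so a request that arrives during
            # the load triggers one more reload instead of being lost.
            self._reload_requested = False
            # Build a complete new object, then rebind the attribute in one step,
            # so callers see either the old config or the new one, never a mix.
            self._config = self._loader()
        return self._config


def install_sighup_reload(config):
    # UNIX only; signal.signal() must be called from the main thread.
    signal.signal(signal.SIGHUP, config.request_reload)
```

Even this small version needs a deferred-reload flag, a safe point to act on it, and care about partially updated state, and in a multi-process server every process has to be signalled separately. That is the extra complexity a one-time-initialization design avoids.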

How to Disable Django / mod_WSGI Page Caching

I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly though, there's another component on my page that displays a random image on each request, using order('?'), and that does refresh with different images each time, which would indicate the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file, finding that the countdown start variable was being set globally in the module, outside of the view functions. Moving that setting inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help on this.
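The bug found in the last edit can be reproduced without Django. Below is a minimal, framework-free sketch; the names and the target date are hypothetical, since the original views.py isn't shown. A value computed at module level is frozen at import time, once per process, so each mod_wsgi process serves its own stale number. Computing it inside the function gives a fresh value on every request.

```python
from datetime import datetime

# Hypothetical countdown target.
COUNTDOWN_END = datetime(2030, 1, 1)

# Buggy: evaluated once, when this module is imported by a worker process.
# Every request served by that process reuses the same "now".
IMPORT_TIME = datetime.now()


def remaining_buggy():
    return COUNTDOWN_END - IMPORT_TIME


def remaining_fixed(now=None):
    # Fixed: "now" is computed on every call, i.e. on every request.
    # The optional argument only exists to make the function easy to test.
    if now is None:
        now = datetime.now()
    return COUNTDOWN_END - now
```

With several processes, each one imported the module at a different moment, which matches the cycling values described in the earlier edits.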

From my experience with mod_wsgi in Apache, it is highly unlikely that either of them is doing the caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you please provide the HTTP headers for a page that is getting cached (Firebug can give these to you)? Headers I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post the MIDDLEWARE_CLASSES from your settings.py file? It is possible that you have a middleware that performs caching for you.
Can you grep your code for the following items: "load cache", "django.core.cache", and "cache_page"? A grep -R "search" * will work.
Does settings.py (or anything it imports, like "from localsettings import *") include CACHE_BACKEND?
What happens when you restart Apache (e.g. sudo service apache restart)? If a restart clears the issue, then it might be Apache doing the caching (it is possible that this could also clear out a locmem Django cache backend).

Did you specifically set up Django caching? From the docs it seems you would clearly know if Django was caching, as it requires work beforehand to get it working. Specifically, you need to configure where the cached data is stored.
http://docs.djangoproject.com/en/dev/topics/cache/
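For reference, a minimal sketch of a settings.py fragment that explicitly disables Django's cache framework. It assumes Django 1.3 or later, where the `CACHES` setting replaced the older `CACHE_BACKEND` string, and uses Django's built-in dummy backend, which implements the cache interface without storing anything. As the final edit to the question shows, this would not have fixed the timer, since the stale value came from a module-level variable rather than from a cache.

```python
# settings.py fragment (Django 1.3+).
# DummyCache accepts cache calls but never stores anything, so cache_page,
# {% cache %} blocks and the per-site cache middleware always miss.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.dummy.DummyCache",
    }
}
```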

Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that would explain why different responses can show different values for the timer: the time at which the timer is initialised is likely different for each process handling requests, which is why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out in what mode or configuration you are running Apache/mod_wsgi and perhaps post what that configuration is. Without knowing, there are too many unknowns.

I just came across this:
Support for Automatic Reloading
To help deployment tools you can activate support for automatic reloading. Whenever something changes the .wsgi file, mod_wsgi will reload all the daemon processes for us. For that, just add the following directive to your Directory section:
WSGIScriptReloading On
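Deployment scripts often trigger that reload simply by updating the .wsgi file's modification time. A minimal sketch in Python (the function name is hypothetical), equivalent to running `touch` on the file; it assumes `WSGIScriptReloading` is enabled as quoted above.

```python
from pathlib import Path


def trigger_wsgi_reload(script_path):
    """Bump the .wsgi file's mtime so mod_wsgi notices a change; return the new mtime."""
    script = Path(script_path)
    if not script.is_file():
        # Refuse to create a new, empty .wsgi file by accident.
        raise FileNotFoundError(f"no such WSGI script: {script}")
    script.touch()  # updates the modification time without changing contents
    return script.stat().st_mtime
```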
