I used Flask to build a mini server on Heroku. The server-side code looks something like this:

import json

from flask import Flask
from flask_cors import CORS, cross_origin

app = Flask(__name__)
schedule = {'Basketball': 'old value'}

@app.route("/")
@cross_origin()
def get_all_schedule():
    return json.dumps(schedule)

@app.route("/update", methods=['POST'])
def update_basketball_schedule():
    global schedule
    schedule['Basketball'] = 'new value'
    return json.dumps(schedule)

if __name__ == "__main__":
    app.run(host='0.0.0.0')
I have one global dictionary, schedule, to store the schedule data. I use a POST to the /update URL to update the schedule and a GET to / to fetch the data, which seems pretty straightforward.
I am testing this application in my Chrome browser. I called the POST URL once. Then, when I call /, sometimes it returns the dictionary with "new value" and sometimes it returns the dictionary with "old value". What is the reason for this behavior?
I am using a free dyno on Heroku.
My Procfile contains:
web: gunicorn server:app
Heroku dynos occasionally reset, die, or are otherwise disabled. Because of this, the values of all variables stored in memory are lost. To combat this, you can use Redis or another key/value store to hold your data.
I have one global dictionary schedule to store the schedule data
You can't rely on variables to maintain state like this.
For starters, Gunicorn will run with multiple processes by default:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2–4 Gunicorn worker processes on a free, hobby or standard-1x dyno. Your application may allow for a variation of this, depending on your application’s specific memory requirements.
We recommend setting a configuration variable for this setting. Gunicorn automatically honors the WEB_CONCURRENCY environment variable, if set.
heroku config:set WEB_CONCURRENCY=3
The WEB_CONCURRENCY environment variable is automatically set by Heroku, based on the processes’ Dyno size. This feature is intended to be a sane starting point for your application. We recommend knowing the memory requirements of your processes and setting this configuration variable accordingly.
Each request you make could be handled by any of the Gunicorn workers. And setting WEB_CONCURRENCY to 1 isn't the right solution for a variety of reasons. For example, as Jake says, Heroku dynos restart frequently (at least once per day) and your state will be lost then as well.
Fortunately, Heroku offers a number of data store addons, including in-memory stores like Redis that might be a good fit here. This would let you share state across all of your Gunicorn workers and across dyno restarts. It would even work across dynos in case you ever need to scale your application that way.
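As a minimal sketch of that approach with the redis-py client (the key name is my own; Heroku's Redis add-ons set a REDIS_URL config var):

import json
import os

import redis

# Heroku's Redis add-ons expose the connection string as REDIS_URL;
# the localhost fallback is just for local development.
r = redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))

def save_schedule(schedule):
    # Serialize so every Gunicorn worker (and every dyno) reads the same value.
    r.set("schedule", json.dumps(schedule))

def load_schedule():
    raw = r.get("schedule")
    return json.loads(raw) if raw else {}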
Related
I am new to gunicorn multiprocessing (enabled by calling gunicorn --workers=X).
I am using it with Flask to provide the WSGI implementation for our production frontend. To use multiprocessing, we pass the above-mentioned parameter to gunicorn.
Our Flask application also uses APScheduler (via Flask-APScheduler) to run a cron task every Y hours. This task searches for new database entries to process, and when it finds them, starts processing them one by one.
The task should obviously be run by only one worker. But because of gunicorn, X workers are now spawned, each running the task every Y hours, creating race conditions.
Is there a way to make the code atomic so that I can set the "processed" variable in the DB entry to true? Or, maybe, tell gunicorn to only run that specific code on the parent process, or first spawned worker?
Thanks for every input! :-)
The --preload parameter for gunicorn gives an opportunity to run code only in the parent (master) process.
All the code that runs before app.run() (or whatever you called your Flask() object) is apparently run in the parent process.
I didn't find any documentation on this, unfortunately, but this post led me to it.
So, running the APScheduler code before it ensures that it's only run (or registered, in this case) once.
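A minimal sketch of that layout, assuming plain APScheduler's BackgroundScheduler (the module path, job name, and interval are mine):

# app.py -- started with: gunicorn --preload --workers=4 app:app
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask

app = Flask(__name__)

def process_new_entries():
    # Look for unprocessed database entries and handle them one by one.
    pass

# With --preload this module-level code runs once, in the parent process,
# before the workers are forked -- so the job is registered a single time.
scheduler = BackgroundScheduler()
scheduler.add_job(process_new_entries, "interval", hours=6)  # "every Y hours"
scheduler.start()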
I have a Flask app running under Gunicorn, using the sync worker type with 20 worker processes. The app reads a lot of data on startup, which takes time and uses memory. Worse, each process loads its own copy, which causes it to take even longer and take 20X the memory. The data is static and doesn't change. I'd like to load it once and have all 20 workers share it.
If I use the preload_app setting, the data loads only once, in the master process, and initially takes only 1X memory, but then it seems to balloon to 20X once requests start coming in. I need fast random access to the data, so I'd rather not use IPC.
Is there any way to share static data among Gunicorn processes?
Memory mapped files will allow you to share pages between processes.
https://docs.python.org/3/library/mmap.html
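As a rough sketch of the idea (the file name and contents are assumptions; the point is that read-only mappings of the same file share physical pages):

import mmap

# One-time setup (normally done at deploy time): write the static data to a file.
with open("static_data.bin", "wb") as f:
    f.write(b"\x00" * 1_000_000)  # placeholder for the real dataset

# In each Gunicorn worker: map the file read-only. The OS page cache backs all
# workers' mappings with the same physical pages, so the memory is paid for once.
with open("static_data.bin", "rb") as f:
    data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

chunk = data[1024:2048]  # fast random access, no IPC round-trip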
Note that memory consumption statistics are usually misleading and unhelpful. It is usually better to consider the output of vmstat and see if you are swapping a lot.
Assuming your priority is to keep the data as a Python data structure instead of moving it to a database such as Redis, then you'll have to change things so that you can use a single process for your server.
Gunicorn can work with gevent to create a server that supports multiple clients within a single worker process using coroutines; that could be a good option for your needs.
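For example (assuming gevent is installed and the Flask object lives in app.py as app):

gunicorn --worker-class gevent --workers 1 app:app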
Let's say that we have a rather typical Django web application:
there is an Nginx in front of the app doing proxy stuff and serving static content
there is gunicorn starting workers to handle Django requests
there is a Django-based web app doing all kinds of fun stuff
there is a Redis server for sessions/cache
there is a MySQL database serving queries from Django
Some URLs have basically just a rendered Django template with almost no queries, some pages incorporate some info from Redis. But there are a few pages that do some rather involved database queries, which can (after all possible optimizations) take several seconds to execute on MySQL side.
And here is my problem: each time a gunicorn worker gets a request for one of these heavy URLs, it stops serving other requests for a while - it just sits there idle, waiting for the database to reply. If there are enough such queries, eventually all the workers sit idle waiting on the heavy URLs, leaving none to serve the other, faster pages.
Is there a way to either allow a worker to do other work while it is waiting on a database reply, or to somehow scale up the worker pool in such a situation (preferably without also scaling RAM usage and database connection count :))? At the very least, is there a way to find statistics on how many workers in a gunicorn pool are busy and how long each of them has been processing a request?
A simple way that might work in your case would be to increase the number of workers. The recommended number of workers is 2-4 x {NUM CPUS}. Depending on load and the type of requests to the site this might be enough.
The next step, if increasing the number of workers isn't enough, would be to look into using async workers (docs about it here). More detailed configuration options are described here. Note that depending on what type of async worker you choose, you will have to install either eventlet, gevent or tornado.
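A gevent-based configuration might look like this (the module path and the numbers are placeholders, and gevent must be pip-installed first):

gunicorn --worker-class gevent --workers 3 --worker-connections 100 myproject.wsgi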
So my friend told me that instances on Heroku are persistent (I'm not sure if the vocab is right, but he implied that all users share the same instance).
So, if I have app.py, and an instance runs it, then all users share that instance. That means we can use a dict as a temporary cache for storing small things for faster response time.
So for example, if I'm serving an API, I can maybe define a cache like this and then use it.
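For illustration, such a module-level cache might look like this hypothetical sketch (the names are made up, not the code from the linked API):

# A plain dict used as an in-process cache.
cache = {}

def fetch_user_from_db(user_id):
    # Stand-in for a slow lookup (database query, external API call, etc.).
    return {"id": user_id}

def get_user(user_id):
    # Serve from the cache when possible; fall back to the slow path once.
    if user_id not in cache:
        cache[user_id] = fetch_user_from_db(user_id)
    return cache[user_id]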
How true is that? I tried looking this up, but could not find anything.
I deployed the linked API to Heroku on 1 dyno, and with just a few requests per second it was taking over 100 seconds to serve them. So my understanding is that the cache wasn't working. (It might be useful to note that the majority of the time was spent in request queueing, according to New Relic.)
The Heroku Devcenter has several articles about the Heroku architecture.
Processes don't share memory. Moreover, your code is compiled into a slug and optimized for distribution to the dyno manager. In simple words, it means you don't even know which machine will execute your code. Theoretically, 5 users hitting your app may be routed to 5 different machines and processes.
Last but not least, keep in mind that if your app has only a single web dyno running, that web dyno will sleep. You have to have more than one web dyno to prevent web dynos from sleeping. When a dyno enters sleep mode, its memory is released and you will lose all the data held in memory.
This means that your approach will not work.
Generally speaking, in Heroku you should use external storages. For example, you can use the Memcached add-on and store your cache information in Memcached.
Also note you should not use the file system as cache. Not only because it's slower than Memcached, but also because the Cedar stack file system should be considered ephemeral.
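As a hedged sketch of what the Memcached approach might look like with the pylibmc client (the MEMCACHIER_* config vars follow the MemCachier add-on's conventions; treat the details as assumptions):

import os

import pylibmc

# Config vars as set by a Heroku Memcached add-on; localhost fallback for dev.
mc = pylibmc.Client(
    os.environ.get("MEMCACHIER_SERVERS", "127.0.0.1").split(","),
    binary=True,
    username=os.environ.get("MEMCACHIER_USERNAME"),
    password=os.environ.get("MEMCACHIER_PASSWORD"),
)

mc.set("schedule", {"Basketball": "new value"})  # values are pickled transparently
print(mc.get("schedule"))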
I'm building an app with Flask, but I don't know much about WSGI or its HTTP base, Werkzeug. When I start serving a Flask application with gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?
I do mean concurrent requests, and not requests per second or anything else.
When running the development server - which is what you get by running app.run() - you get a single synchronous process, which means at most 1 request is being processed at a time.
By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.
It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads; instead they use coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but tasks can be 'paused' while they wait on external work to finish (think database queries or network I/O).
This means that if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and in the notes on how gevent works on its intro page.
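To make the contrast concrete (app:app is a placeholder module path):

# Sync workers: at most 4 requests in flight, one per worker.
gunicorn --workers 4 app:app

# Async workers: each of the 4 processes multiplexes many requests via coroutines.
gunicorn --workers 4 --worker-class gevent app:app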
Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True parameter to the app.run() call, like:
app.run(host="your.host", port=4321, threaded=True)
Another option, as per the werkzeug docs, is to use the processes parameter, which receives a number > 1 indicating the maximum number of concurrent processes to handle requests:
threaded – should the process handle each request in a separate thread?
processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.
Something like:
app.run(host="your.host", port=4321, processes=3) #up to 3 processes
More info on the run() method here, plus the blog post that led me to the solution and the API references.
Note: on the Flask docs on the run() methods it's indicated that using it in a Production Environment is discouraged because (quote): "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."
However, they do point to their Deployment Options page for the recommended ways to do this when going for production.
Flask will process one request per thread at a time. If you have 2 processes with 4 threads each, that's 8 concurrent requests.
Flask doesn't spawn or manage threads or processes; that's the responsibility of the WSGI gateway (e.g. gunicorn).
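With Gunicorn, for example, that 2 x 4 layout could be requested via its threaded worker like so (the module path is assumed):

gunicorn --workers 2 --threads 4 app:app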
No - you can definitely handle more than that.
It's important to remember that deep, deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.
Namely, the CPU can only execute a very limited set of instructions, and it can't execute more than one instruction per clock tick (many instructions even take more than 1 tick).
Therefore, most concurrency we talk about in computer science is software concurrency.
In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.
These "things" can be processes, which are units of code that get run concurrently in the sense that each process thinks its running in its own world with its own, non-shared memory.
Another example is threads, which are units of code inside processes that allow concurrency as well.
The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.
The actual request limit depends on the HTTP server chosen, I/O, the OS, hardware, the network connection, etc.
Good luck!
*instructions are the very basic commands the CPU can run. examples - add two numbers, jump from one instruction to another