Flask contexts (application and request) vs thread-local variables


Flask Web Development says:
```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/')
def index():
    user_agent = request.headers.get('User-Agent')
    return '<p>Your browser is %s</p>' % user_agent
```
Note how in this view function request is used as if it was a global
variable. In reality, request cannot be a global variable if you
consider that in a multithreaded server the threads are working on
different requests from different clients at the same time, so each
thread needs to see a different object in request. Contexts enable
Flask to make certain variables globally accessible to a thread
without interfering with the other threads.
Understandable, but why not simply make request a thread-local variable? Under the hood, what exactly is request, and how is it different from a thread-local variable?

This was simply a design decision by Armin Ronacher (the author of Flask). You could indeed rewrite Flask to expose request as a plain thread-local variable, but that was not what he wanted to do here.
The idea of Flask in general is to keep things as simple as possible and to abstract a lot of thinking away. This is why many Flask helpers are implemented as 'global variables': you don't really have to think about the meaning behind them, because each one is bound to the incoming request.
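As for what request is under the hood: in Flask it is a werkzeug.local.LocalProxy. On every attribute access, the proxy looks up the request context that is currently active for the running thread or greenlet. Recent releases store that context in a contextvars.ContextVar. The stdlib-only sketch below imitates the idea. RequestProxy, FakeRequest and handle are hypothetical names, not Flask's code. Each thread binds its own request, and the single module-level request object forwards to whichever one is active.

```python
import contextvars
import threading

# Hypothetical stand-in for Flask's request-context storage.
_current_request = contextvars.ContextVar("current_request")


class RequestProxy:
    """Forwards attribute access to the request bound in the current context."""

    def __getattr__(self, name):
        # Raises LookupError if nothing is bound ("working outside of request context").
        return getattr(_current_request.get(), name)


request = RequestProxy()  # the "global" that view code imports


class FakeRequest:
    def __init__(self, user_agent):
        self.headers = {"User-Agent": user_agent}


def index():
    # View code uses `request` as if it were a plain global variable.
    return "<p>Your browser is %s</p>" % request.headers.get("User-Agent")


results = {}


def handle(user_agent):
    token = _current_request.set(FakeRequest(user_agent))  # push the request context
    try:
        results[user_agent] = index()
    finally:
        _current_request.reset(token)  # pop it again


threads = [threading.Thread(target=handle, args=(ua,)) for ua in ("Firefox", "Chrome")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["Firefox"])  # <p>Your browser is Firefox</p>
```

The binding is pushed and popped explicitly, so the same mechanism also works where no server thread exists, such as tests or CLI commands. That is one practical advantage over a bare thread-local.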

Related

Problem with a shared object when multiple Users are requesting my Flask Web-App

In my application, the state of a common object is changed by making requests, and the response depends on the state.
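```python
from flask import Flask, flash, render_template

app = Flask(__name__)
app.secret_key = 'change-me'  # flash() stores messages in the session, which needs a secret key

class SomeObj:
    def __init__(self, param):
        self.param = param

    def query(self):
        self.param += 1
        return self.param

global_obj = SomeObj(0)

@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')
```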
If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:
Client 1 queries. self.param is incremented by 1.
Before the return statement can be executed, execution switches over to client 2's thread, and self.param is incremented again.
Execution switches back to client 1, which is returned the number 2.
Then client 2's request continues and is also returned the number 2.
Since there were only two clients, the expected results were 1 and 2, not 2 and 2: one number was handed out twice and another was skipped.
Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?
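The interleaving described above is easy to provoke. The stdlib-only sketch below calls the same SomeObj.query() from 20 threads. It adds a time.sleep() between the increment and the return; this is an assumption made purely to widen the race window. LockedObj is a hypothetical fix that uses a threading.Lock to make increment-and-read atomic. A lock only protects threads inside one process, which is why the answer below still recommends an external store.

```python
import threading
import time


class SomeObj:
    def __init__(self, param):
        self.param = param

    def query(self):
        self.param += 1
        time.sleep(0.01)  # widen the window between increment and return
        return self.param


class LockedObj(SomeObj):
    def __init__(self, param):
        super().__init__(param)
        self._lock = threading.Lock()

    def query(self):
        with self._lock:  # increment and read now happen as one unit
            return super().query()


def run(obj, n=20):
    seen = []

    def worker():
        seen.append(obj.query())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)


print(run(SomeObj(0)))    # usually shows repeated values and gaps
print(run(LockedObj(0)))  # every number from 1 to 20 exactly once
```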

You can't use global variables to hold this sort of data. Not only is it not thread safe, it's not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.
Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
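As one concrete example of "a database", here is a sketch of a process-safe counter that uses SQLite from the standard library. The table layout and the init_db/next_value names are hypothetical. BEGIN IMMEDIATE makes the increment and the read a single write transaction. As a result, concurrent threads or worker processes on the same machine each get a distinct number. Across several machines you would need a networked database or redis instead.

```python
import sqlite3


def init_db(path):
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS counter ("
                "id INTEGER PRIMARY KEY CHECK (id = 1), value INTEGER NOT NULL)"
            )
            conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
    finally:
        conn.close()


def next_value(path):
    # isolation_level=None: no implicit transactions; BEGIN/COMMIT are issued by hand.
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes SQLite's write lock up front, so the UPDATE and the
        # SELECT run as one unit even when other threads or processes call
        # next_value() at the same moment (they wait up to `timeout` seconds).
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        (value,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

In the question's view, you would call next_value(path) instead of global_obj.query().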
The development server may run in a single thread and process. In that case you won't see the behavior you describe, since each request is handled synchronously. Enable threads or processes and you will see it: app.run(threaded=True) or app.run(processes=10). (Since Flask 1.0, the development server is threaded by default.)
Some WSGI servers may support gevent or another async worker. Global variables are still not safe there, because there's still no protection against most race conditions. You can still have a scenario where one worker gets a value and yields, another modifies it and yields, and then the first worker also modifies it, overwriting the second worker's change.
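The yield-based interleaving in the last paragraph can be reproduced deterministically with asyncio. Its tasks switch only at await points, much as gevent greenlets switch at blocking calls. This stdlib sketch is not gevent itself, and counter and both coroutines are hypothetical. It runs 100 read-yield-write increments. Without a lock, every task reads 0 before any task writes, so the final value is 1. With an asyncio.Lock, the final value is 100.

```python
import asyncio

counter = {"value": 0}


async def unsafe_increment():
    current = counter["value"]       # read
    await asyncio.sleep(0)           # yield to other tasks, like a greenlet switch
    counter["value"] = current + 1   # write back a value that may be stale


async def safe_increment(lock):
    async with lock:                 # no other task can interleave in here
        current = counter["value"]
        await asyncio.sleep(0)
        counter["value"] = current + 1


async def main():
    counter["value"] = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(100)))
    unsafe = counter["value"]

    counter["value"] = 0
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(lock) for _ in range(100)))
    return unsafe, counter["value"]


print(asyncio.run(main()))  # (1, 100)
```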
If you need to store some global data during a request, you may use Flask's g object. Another common case is some top-level object that manages database connections. The distinction for this type of "global" is that it's unique to each request, not used between requests, and there's something managing the set up and teardown of the resource.
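The pattern described in the last paragraph is a per-request resource with managed setup and teardown. It usually looks like the sketch below, which follows the get_db/teardown approach used in Flask's SQLite documentation. The DATABASE value and the get_db/close_db names are assumptions for illustration.

```python
import sqlite3

from flask import Flask, g

app = Flask(__name__)
DATABASE = ":memory:"  # hypothetical; point this at a real database file


def get_db():
    # Open at most one connection per app context (i.e. per request) and reuse it.
    if "db" not in g:
        g.db = sqlite3.connect(DATABASE)
    return g.db


@app.teardown_appcontext
def close_db(exception):
    # Runs when the app context is torn down at the end of every request.
    db = g.pop("db", None)
    if db is not None:
        db.close()


@app.route("/")
def index():
    (answer,) = get_db().execute("SELECT 40 + 2").fetchone()
    return str(answer)
```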

This is not really an answer to thread safety of globals.
But I think it is important to mention sessions here.
You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a threadsafe way.
This is possible with server-side sessions, which are available in a very neat Flask extension: https://pythonhosted.org/Flask-Session/
If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.
Here is a short demo:
```python
from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
# See the Configuration section of the Flask-Session docs for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

@app.route('/')
def reset():
    session["counter"] = 0
    return "counter was reset"

@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] += 1
    return "counter is {}".format(session["counter"])

@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] -= 1
    return "counter is {}".format(session["counter"])

if __name__ == '__main__':
    app.run()
```
After running pip install Flask-Session, you should be able to run this. Try accessing it from different browsers: you'll see that the counter is not shared between them.

Another example of a data source external to requests is a cache, such as what's provided by Flask-Caching or another extension.
Create a file common.py and place in it the following:
```python
from flask_caching import Cache

# Instantiate the cache
cache = Cache()
```
In the file where your Flask app is created, register your cache with the following code:
```python
from pathlib import Path

from flask import Flask

# Import cache
from common import cache

# ...
app = Flask(__name__)
cache.init_app(app=app, config={"CACHE_TYPE": "FileSystemCache", "CACHE_DIR": Path("/tmp")})
```
Now use it throughout your application by importing the cache and calling it as follows. Because the cache was registered with init_app, it looks up the current app, so call it inside a request or app context (for example, in a view function):
```python
# Import cache
from common import cache

# Store a value
cache.set("my_value", 1_000_000)
# Get a value
my_value = cache.get("my_value")
```

While fully accepting the previous upvoted answers, and discouraging the use of global variables for production and scalable Flask storage, for prototyping or really simple servers running under the Flask development server...
...
Single operations on Python's built-in data types are thread safe in CPython, as the Python FAQ explains. This includes inserting into or reading from a dict, and I personally used and tested a global dict this way. They are not process safe, though.
Insertions, lookups, and reads on such a server-global dict will be OK from each (possibly concurrent) Flask session running under the development server.
When such a global dict is keyed by a unique Flask session key, it can be rather useful for server-side storage of session-specific data that would not otherwise fit into the cookie (max size 4 kB).
Of course, such a server-global dict must be carefully guarded against growing too large, since it lives in memory. Some form of expiry for 'old' key/value pairs can be coded into request processing.
Again, it is not recommended for production or scalable deployments, but it is possibly OK for local task-oriented servers where a separate database is too much for the given task.
...
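If you do go the in-memory route described above, a small wrapper can add the expiry and size limit the answer mentions. This is a hypothetical sketch; ExpiringStore and its parameters are made up. Individual dict operations are atomic in CPython, but purge-then-insert is a compound operation, so a threading.Lock guards it. The store is still per process, so every worker process has its own copy.

```python
import threading
import time


class ExpiringStore:
    """In-memory, per-process store with a TTL and a maximum size."""

    def __init__(self, ttl_seconds=1800, max_items=10_000):
        self._ttl = ttl_seconds
        self._max_items = max_items
        self._data = {}  # key -> (expires_at, value)
        self._lock = threading.Lock()

    def set(self, key, value):
        now = time.monotonic()
        with self._lock:
            self._purge(now)
            if len(self._data) >= self._max_items and key not in self._data:
                # Still full after purging: evict the entry closest to expiry.
                oldest = min(self._data, key=lambda k: self._data[k][0])
                del self._data[oldest]
            self._data[key] = (now + self._ttl, value)

    def get(self, key, default=None):
        now = time.monotonic()
        with self._lock:
            item = self._data.get(key)
            if item is None or item[0] <= now:
                self._data.pop(key, None)  # drop it if it had expired
                return default
            return item[1]

    def _purge(self, now):
        # Caller must hold the lock.
        expired = [k for k, (expires_at, _) in self._data.items() if expires_at <= now]
        for k in expired:
            del self._data[k]
```

In a view, you might key the store by a random id kept in the Flask session, e.g. store.set(session["sid"], data). The "sid" key is hypothetical.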

How does Context Locals and The Request Context work together?

I am trying to learn how to make a variable available across methods through a decorator function in Flask.
I read the Flask request context documentation and wrote the following code, which works as intended.
a.py
```python
_request_ctx_stack.top.current_identity = payload.get('sub')
```
b.py
```python
current_identity = getattr(_request_ctx_stack.top, 'current_identity', None)
```
However, flask-jwt solves this problem by introducing an additional local proxy like this:
a.py
```python
from werkzeug.local import LocalProxy

current_identity = LocalProxy(lambda: getattr(_request_ctx_stack.top, 'current_identity', None))
_request_ctx_stack.top.current_identity = payload.get('sub')
```
b.py
```python
from a import current_identity
```
Why? I read the Werkzeug context locals documentation. Doesn't Flask already implement Werkzeug context locals for the request object?
Is there any advantage to introducing a LocalProxy?

The LocalProxy wraps the manual code you wrote for getting the value. Instead of needing that manual code everywhere you want to access current_identity, you access it as a proxy and it does the manual code for you.
It's most useful in libraries, where users wouldn't be expected to know how current_identity is set up, and would import the proxy to access it. The same applies to Flask itself: you're not expected to know how the request is set up, or where exactly it's stored, only that you access it by importing request.
A proxy is useful for data that is set up during, and local to, each request. If you used a true global variable, it would not behave the way you expect when multiple requests are being handled. See "Are global variables thread safe in flask? How do I share data between requests?"
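For comparison, here is a sketch of the same idea written against Flask's public API. The identity is stored on g, and a LocalProxy (which accepts a callable) exposes it as an importable name. The X-User header and the load_identity hook are hypothetical stand-ins for real token decoding. _request_ctx_stack has been deprecated since Flask 2.2 and removed in later releases, so g is the safer place for per-request data like this.

```python
from flask import Flask, g, request
from werkzeug.local import LocalProxy

app = Flask(__name__)

# Resolves to g.current_identity of whichever request is active right now.
current_identity = LocalProxy(lambda: g.get("current_identity"))


@app.before_request
def load_identity():
    # Hypothetical: a real app would decode a JWT here instead of trusting a header.
    g.current_identity = request.headers.get("X-User", "anonymous")


@app.route("/")
def whoami():
    # Other modules can `from a import current_identity` and use it like a global.
    return str(current_identity)
```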

Why binding to context is necessary in Werkzeug

I was reading the source code of the Werkzeug library on GitHub. In one of the examples (Simplewiki, to name it), the application.py file contains a function that binds the application to the current active context. I would like to know why this is necessary, or where I can find something that explains it.
The function is this:
```python
def bind_to_context(self):
    """
    Useful for the shell. Binds the application to the current active
    context. It's automatically called by the shell command.
    """
    local.application = self
```
And this is the part where the dispatcher binds the request:
```python
def dispatch_request(self, environ, start_response):
    """Dispatch an incoming request."""
    # set up all the stuff we want to have for this request. That is
    # creating a request object, propagating the application to the
    # current context and instantiating the database session.
    self.bind_to_context()
    request = Request(environ)
    request.bind_to_context()
```

As far as I know, contexts in Werkzeug are about separating the environment between different threads. For example, contexts are very common in the Flask framework, which is built on top of Werkzeug. You can run a Flask application in multi-threaded mode. In that case you have only one application object, which multiple threads access simultaneously. Each thread needs a piece of data within the app for its private use. Such data is stored in thread-local storage, and this is called the context.
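The thread-local storage this answer describes can be seen with the standard library's threading.local. That is roughly the role werkzeug.local.Local plays in the Simplewiki example. Werkzeug's version also isolates greenlets, and newer releases build it on contextvars. The handle function and the seen dict are hypothetical.

```python
import threading

local = threading.local()  # each thread sees its own attributes on this object
seen = {}


def handle(name):
    local.request = "request for " + name  # bind, like request.bind_to_context()
    # Any code running later in this thread can read local.request.
    seen[name] = local.request


threads = [threading.Thread(target=handle, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
print(hasattr(local, "request"))  # False: the main thread never bound one
```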

Efficient session variable server-side caching with Python+Flask

Scenario:
Major web app with Python+Flask
Flask-Login and Flask.session for basic session variables (user-id and session-id)
Flask.session and limitations? (Cookies)
Cookie-based, and basically persisted only on the client side.
For some session variables that will be read regularly (e.g., user permissions, custom application config), it feels awkward to carry all that info around in a cookie on every single page request and response.
Database is too much?
Since the session can be identified at the server side by introducing unique session id at login, some server-side session variable management can be used. Reading this data at the server side from a database also feels like unnecessary overhead.
Question
What is the most efficient way to handle the session variables at the server side?
Perhaps that could be a memory-based solution, but I am worried that different Flask app requests could be executed at different threads that would not share the memory-stored session data, or cause conflicts in case of simultaneous reading-writing.
I am looking for advice and best practice for planning the basic level architecture.

Flask-Caching
What you need is a server-side caching package such as Flask-Caching.
A simple setup:
```python
from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
app.config['CACHE_TYPE'] = 'SimpleCache'
cache = Cache(app)
```
Then use a cached variable explicitly:
```python
@app.route('/')
def load():
    foo = 'some value'  # placeholder for the data you want to cache
    cache.set("foo", foo)
    bar = cache.get("foo")
    return bar
```
There is much more in Flask-Caching, and it's the approach recommended by Flask.
If you run Gunicorn with multiple workers, you are better off with app.config['CACHE_TYPE'] = 'FileSystemCache'. SimpleCache keeps its data in the memory of a single process, so it is not shared between workers.

Your instinct is correct: it's probably not the way to do it.
Session data should only be ephemeral information that is not too troublesome to lose and recreate. For example, the user will just have to login again to restore it.
Configuration data or anything else that's necessary on the server and that must survive a logout is not part of the session and should be stored in a DB.
Now, if you really need to easily keep this information client-side and it's not too much of a problem if it's lost, then use a session cookie for logged in/out state and a permanent cookie with a long lifespan for the rest of the configuration information.
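The "permanent cookie with a long lifespan" suggestion could look like this in Flask, using Response.set_cookie with max_age. The theme cookie and both routes are hypothetical examples. A plain cookie is not signed, so the user can edit it. Keep anything security-relevant, such as permissions, on the server.

```python
from flask import Flask, make_response, request

app = Flask(__name__)
ONE_YEAR = 60 * 60 * 24 * 365  # seconds


@app.route("/prefs/<theme>")
def save_prefs(theme):
    resp = make_response("theme set to " + theme)
    # Persistent cookie: survives browser restarts and logout, unlike the session cookie.
    resp.set_cookie("theme", theme, max_age=ONE_YEAR, httponly=True, samesite="Lax")
    return resp


@app.route("/")
def index():
    return request.cookies.get("theme", "default")
```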
If the information is too much size-wise, then the only option I can think of is to store the data, other than the logged in/out state, in a DB.

Does Google App Engine run one instance of an app per one request? or for all requests?

Using Google App Engine:
```python
# more code ahead not shown
application = webapp.WSGIApplication([('/', Home)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
```
If two different users request the webpage from two different machines, will two individual instances of the server be invoked?
Or is just one instance of the server running all the time, handling all the requests?
What if one user opens the webpage twice in the same browser?
Edit:
According to the answers below, one instance may handle requests from different users in turn. Then consider the following code fragment, taken from the example Google gave:
```python
class User(db.Model):
    email = db.EmailProperty()
    nickname = db.StringProperty()
```
1. Are email and nickname here defined as class variables?
2. Do all the requests handled by the same server instance share the same variables, and thus interfere with each other by mistake? (Say, one user's email appears on another user's page.)
P.S. I know that I should read the manual and docs more, and I am doing so, but answers from experienced programmers will really help me understand faster and more thoroughly. Thanks!

An instance can handle many requests over its lifetime. In the Python runtime's threading model, each instance can only handle a single request at any given time. If two requests arrive at the same time, they might be handled one after the other by a single instance, or a second instance might be spawned to handle the second request.
EDIT:
In general, variables used by each request will be scoped to a RequestHandler instance's .get() or .post() method, and thus can't "leak" into other requests. You should be careful about using global variables in your scripts, as these will be cached in the instance and would be shared between requests. Don't use globals without knowing exactly why you want to (which is good advice for any application, for that matter), and you'll be fine.
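To make the warning about globals concrete: a module-level name lives as long as the instance and is shared by every request that instance serves. A variable created inside the handler method is fresh for each call. The sketch below simulates two requests without App Engine, using a hypothetical Handler class.

```python
request_log = []  # module-level: survives between requests on the same instance


class Handler:
    def get(self, email):
        greeting = "Hello " + email  # local: created anew for every request
        request_log.append(email)    # global: accumulates across requests
        return greeting, list(request_log)


print(Handler().get("alice@example.com"))
# ('Hello alice@example.com', ['alice@example.com'])
print(Handler().get("bob@example.com"))
# ('Hello bob@example.com', ['alice@example.com', 'bob@example.com'])
```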

App Engine dynamically builds up and tears down instances based on request volume.
From the docs:
App Engine applications are powered by any number of instances at any given time, depending on the volume of requests received by your application. As requests for your application increase, so do the number of instances powering it.
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine scales instances in reverse when request volumes decrease. In this way, App Engine ensures that all of your application's current instances are being used to optimal efficiency. This automatic scaling makes running App Engine so cost effective.
When an application is not being used at all, App Engine turns off its associated instances, but readily reloads them as soon as they are needed.
