If I have global variables in flask and have multiple users accessing the site at once, can one persons session overwrite the global variables of another persons session, or does flask make a unique instance of my site and program code each time its requested from a users browser?
Generally speaking, global variables are shared between requests.
Some WSGI servers can use a new separate process for each request, but that is not an efficient way to scale your requests. Most will use treading or several child processes to spread the load but even in the case of separate child processes each subprocess will have to handle multiple requests during its lifetime.
In other words: no, Flask will not protect your global variables from being shared between different users.
Related
In my application, the state of a common object is changed by making requests, and the response depends on the state.
class SomeObj():
def __init__(self, param):
self.param = param
def query(self):
self.param += 1
return self.param
global_obj = SomeObj(0)
#app.route('/')
def home():
flash(global_obj.query())
render_template('index.html')
If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:
Client 1 queries. self.param is incremented by 1.
Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
The thread switches back to client 1, and the client is returned the number 2, say.
Now the thread moves to client 2 and returns him/her the number 3.
Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.
Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?
You can't use global variables to hold this sort of data. Not only is it not thread safe, it's not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.
Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
The development server may run in single thread and process. You won't see the behavior you describe since each request will be handled synchronously. Enable threads or processes and you will see it. app.run(threaded=True) or app.run(processes=10). (In 1.0 the server is threaded by default.)
Some WSGI servers may support gevent or another async worker. Global variables are still not thread safe because there's still no protection against most race conditions. You can still have a scenario where one worker gets a value, yields, another modifies it, yields, then the first worker also modifies it.
If you need to store some global data during a request, you may use Flask's g object. Another common case is some top-level object that manages database connections. The distinction for this type of "global" is that it's unique to each request, not used between requests, and there's something managing the set up and teardown of the resource.
This is not really an answer to thread safety of globals.
But I think it is important to mention sessions here.
You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a threadsafe way.
This is possible with server-side sessions, and they are available in a very neat flask plugin: https://pythonhosted.org/Flask-Session/
If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.
Here is a short demo:
from flask import Flask, session
from flask_session import Session
app = Flask(__name__)
# Check Configuration section for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)
#app.route('/')
def reset():
session["counter"]=0
return "counter was reset"
#app.route('/inc')
def routeA():
if not "counter" in session:
session["counter"]=0
session["counter"]+=1
return "counter is {}".format(session["counter"])
#app.route('/dec')
def routeB():
if not "counter" in session:
session["counter"] = 0
session["counter"] -= 1
return "counter is {}".format(session["counter"])
if __name__ == '__main__':
app.run()
After pip install Flask-Session, you should be able to run this. Try accessing it from different browsers, you'll see that the counter is not shared between them.
Another example of a data source external to requests is a cache, such as what's provided by Flask-Caching or another extension.
Create a file common.py and place in it the following:
from flask_caching import Cache
# Instantiate the cache
cache = Cache()
In the file where your flask app is created, register your cache with the following code:
# Import cache
from common import cache
# ...
app = Flask(__name__)
cache.init_app(app=app, config={"CACHE_TYPE": "filesystem",'CACHE_DIR': Path('/tmp')})
Now use throughout your application by importing the cache and executing as follows:
# Import cache
from common import cache
# store a value
cache.set("my_value", 1_000_000)
# Get a value
my_value = cache.get("my_value")
While totally accepting the previous upvoted answers, and discouraging use of global variables for production and scalable Flask storage, for the purpose of prototyping or really simple servers, running under the flask 'development server'...
...
The Python built-in data types, and I personally used and tested the global dict, as per Python documentation are thread safe. Not process safe.
The insertions, lookups, and reads from such a (server global) dict will be OK from each (possibly concurrent) Flask session running under the development server.
When such a global dict is keyed with a unique Flask session key, it can be rather useful for server-side storage of session specific data otherwise not fitting into the cookie (max size 4 kB).
Of course, such a server global dict should be carefully guarded for growing too large, being in-memory. Some sort of expiring the 'old' key/value pairs can be coded during request processing.
Again, it is not recommended for production or scalable deployments, but it is possibly OK for local task-oriented servers where a separate database is too much for the given task.
...
Flask Web Development says:
from flask import request
#app.route('/')
def index():
user_agent = request.headers.get('User-Agent')
return '<p>Your browser is %s</p>' % user_agent
Note how in this view function request is used as if it was a global
variable. In reality, request cannot be a global variable if you
consider that in a multithreaded server the threads are working on
different requests from different clients at the same time, so each
thread needs to see a different object in request. Contexts enable
Flask to make certain variables globally accessible to a thread
without interfering with the other threads.
Understandable, but why not simply make request a thread-local variable? Under the hood, what exactly is request, and how is it different from a thread-local variable?
This was simply a design decision by Armin (the author of Flask). You could indeed rewrite Flask to operate as a thread-local, but that was not what he wanted to do here.
The idea of Flask (in general) is to keep things as simple as possible, and abstract a lot of thinking away. This is why a lot of Flask helpers are implemented as 'global variables': you don't really have to think about the meaning behind it, because each global is bound to the incoming request.
I know there are frameworks that exist, but I am trying to use wsgi directly to improve my own understanding.
I have my wsgi handler, and up at the top I have declared a variable i = 0.
In my application(environ, start_response) function, I declare global i, then I increment i whenever a button is pressed.
It is my understanding that the state of this variable is preserved as long as the server is running, so all users of the web app see the same i.
If I declare i inside the application function, then the value of i resets to 0 any time a request is made.
I'm wondering, how do you preserve i between a single user's requests, but not have it preserved across different users' sessions? So, a single user can make multiple posts, and i will increment, but if another user visits the web app, they start with i=0.
And I know you could store i in a database between posts, but is it possible to just keep i in memory between posts?
Web-apps are generally “shared-nothing”. In the context of WSGI, that means you have no idea how many times your application (and the counter with it) will be instantiated; that choice is up to the WSGI server which acts as your app's container.
If you want some notion of user sessions, you have to implement it explicitly, typically on top of cookies. If you want persistence, you need a component that explicitly supports it; that can be a shared database, or it can piggy-back on your cookie sessions.
Add WSGI session middleware and store the value these.
I'm just wondering if Django was designed to be a fully stateless framework?
It seems to encourage statelessness and external storage mechanisms (databases and caches) but I'm wondering if it is possible to store some things in the server's memory while my app is in develpoment and runs via manage.py runserver.
Sure it's possible. But if you are writing a web application you probably won't want to do that because of threading issues.
That depends on what you mean by "store things in the server's memory." It also depends on the type of data. If you can, you're better off storing "global data" in a database or in the file system somewhere. Unless it is needed every request it doesn't really make sense to store it in the Django instance itself. You'll need to implement some form of locking to prevent race conditions, but you'd need to worry about race conditions if you stored everything on the server object anyway.
Of course, if you're talking about user-by-user data, Django does support sessions. Or, and this is another perfectly good option if you're willing to make the user save the data, cookies.
The best way to maintain state in a django app on a per-user basis is request.session (see django sessions) which is a dictionary you can use to remember things about the current user.
For Application-wide state you should use the a persistent datastore (database or key/value store)
example view for sessions:
def my_view(request):
pages_viewed = request.session.get('pages_viewed', 1) + 1
request.session['pages_viewed'] = pages_viewed
...
If you wanted to maintain local variables on a per app-instance basis you can just define module level variables like so
# number of times my_view has been served since by this server
# instance since the last restart
served_since_restart = 0
def my_view(request):
served_since_restart += 1
...
If you wanted to maintain some server state across ALL app servers (like total number of pages viewed EVER) you should probably use a persistent key/value store like redis, memcachedb, or riak. There is a decent comparison of all these options here: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
You can do it with redis (via redis-py) like so (assuming your redis server is at "127.0.0.1" (localhost) and it's port 6379 (the default):
import redis
def my_view(request):
r = redis.Redis(host='127.0.0.1', port="6379")
served = r.get('pages_served_all_time', 0)
served += 1
r.set('pages_served_all_time', served)
...
There is LocMemCache cache backend that stores data in-process. You can use it with sessions (but with great care: this cache is not cross-process so you will have to use single process for deployment because it will not be guaranteed that subsequent requests will be handled by the same process otherwise). Global variables may also work (use threadlocals if they shouldn't be shared for all process threads; the warning about cross-process communication also applies here).
By the way, what's wrong with external storage? External storage provides easy cross-process data sharing and other features (like memory limiting algorithms for cache or persistance with databases).
Using google app engine:
# more code ahead not shown
application = webapp.WSGIApplication([('/', Home)],
debug=True)
def main():
run_wsgi_app(application)
if __name__ == "__main__":
main()
If two different users request the webpage on two different machine, two individual instances of the server will be invoked?
Or just one instance of the server is running all the time which handle all the requests?
How about if one user open the webpage twice in the same browser?
Edit:
According to the answers below, one instance may handle requests from different users turn by turn. Then consider the following fraction of code, taken from the example Google gave:
class User(db.Model):
email = db.EmailProperty()
nickname = db.StringProperty()
1, email and nickname here are defined as class variables?
2, All the requests handled by the same instance of server share the same variables and thus by mistake interfere with each other? (Say, one's email appears in another's page)
ps. I know that I should read the manual and doc more and I am doing it, however answers from experienced programmer will really help me understand faster and more through, thanks
An instance can handle many requests over its lifetime. In the python runtime's threading model, each instance can only handle a single request at any given time. If 2 requests arrive at the same time they might be handled one after the other by a single instance, or a second instance might be spawned to handle the request.
EDIT:
In general, variables used by each request will be scoped to a RequestHandler instance's .get() or .post() method, and thus can't "leak" into other requests. You should be careful about using global variables in your scripts, as these will be cached in the instance and would be shared between requests. Don't use globals without knowing exactly why you want to (which is good advice for any application, for that matter), and you'll be fine.
App Engine dynamically builds up and tears down instances based on request volume.
From the docs:
App Engine applications are powered by
any number of instances at any given
time, depending on the volume of
requests received by your application.
As requests for your application
increase, so do the number of
instances powering it.
Each instance has its own queue for
incoming requests. App Engine monitors
the number of requests waiting in each
instance's queue. If App Engine
detects that queues for an application
are getting too long due to increased
load, it automatically creates a new
instance of the application to handle
that load.
App Engine scales instances in reverse
when request volumes decrease. In this
way, App Engine ensures that all of
your application's current instances
are being used to optimal efficiency.
This automatic scaling makes running
App Engine so cost effective.
When an application is not being used
all, App Engine turns off its
associated instances, but readily
reloads them as soon as they are
needed.