I'm new to Python Flask REST web services. I'm trying to develop a REST web service with a shared queue: multiple threads will constantly write to that queue on the server side, and when a user calls a GET method, the service should return the first item in the shared queue.
I tried to get started by first implementing a shared variable. This is the code I used:
from flask import Flask
app = Flask(__name__)
count = 0  # Shared variable
@app.route("/")
def counter():
    count = count + 1
    return {'count': count}
if __name__ == "__main__":
    app.run()
But even the code above does not work. Then I thought of using a cache for the shared variable, but that would not be the correct way to implement a shared queue (my ultimate goal). Please give me your advice.
The thing you want to do is a little bit more complex than that, I'm afraid.
Flask (and other WSGI Python systems) do not necessarily run in a single thread: they normally spawn multiple threads and processes to handle incoming requests without blocking, and nothing by default stops multiple requests from accessing the same 'first task' at the same time. Thus global variables don't behave as they would in a simple single-threaded Python script.
You need some way for the different processes to all access the same single queue of data.
Usually, this means outsourcing the data queue to an external database. One popular option is Redis. There's a good intro to flask and redis for exactly this:
http://flask.pocoo.org/snippets/73/
I hope this helps you in the right direction!
You have a couple of bugs in your example. Here is a version that works:
from flask import Flask, jsonify
app = Flask(__name__)
count = 0  # Shared variable
@app.route("/")
def counter():
    global count
    count = count + 1
    return jsonify({'count': count})
if __name__ == "__main__":
    app.run()
The two problems you have in your version are:
You did not declare count as global in your view function. Without the global declaration, the assignment makes count a local variable of the same name, and reading it before it is assigned raises UnboundLocalError.
The response returned by the view function cannot be a dictionary (in Flask versions before 1.1); it needs to be a string or a Response object. I corrected this using jsonify() to convert the dict to a JSON response.
But note that this way of creating a shared value is not robust. In particular note that if you run this application under a web server that creates multiple processes then each process will have its own copy of the count value.
If you need to do this on a production server my recommendation is that you use a database to store your shared value(s).
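For the shared queue the question is ultimately after, here is a minimal sketch that uses only the standard library's queue.Queue, which does its own locking. It assumes everything runs in a single process (for example the development server); as noted above, it does not survive a multi-process deployment. The names shared_queue, producer and get_first_item are hypothetical, and get_first_item stands in for the body of a GET view.

```python
import queue
import threading

# Hypothetical names for illustration only.
shared_queue = queue.Queue()  # thread-safe FIFO, valid within ONE process

def producer(worker_id, n_items):
    """Simulate a server-side thread that writes items to the shared queue."""
    for i in range(n_items):
        shared_queue.put(f"worker-{worker_id}-item-{i}")

def get_first_item():
    """What a GET handler could return: the oldest item, or None if empty."""
    try:
        return shared_queue.get_nowait()
    except queue.Empty:
        return None

# Four writer threads, three items each.
threads = [threading.Thread(target=producer, args=(w, 3)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

first = get_first_item()
```

Using get_nowait() rather than get() keeps the handler from blocking a request when the queue is empty.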
Related
In my application, the state of a common object is changed by making requests, and the response depends on the state.
class SomeObj():
    def __init__(self, param):
        self.param = param
    def query(self):
        self.param += 1
        return self.param
global_obj = SomeObj(0)
@app.route('/')
def home():
    flash(global_obj.query())
    return render_template('index.html')
If I run this on my development server, I expect to get 1, 2, 3 and so on. If requests are made from 100 different clients simultaneously, can something go wrong? The expected result would be that the 100 different clients each see a unique number from 1 to 100. Or will something like this happen:
Client 1 queries. self.param is incremented by 1.
Before the return statement can be executed, the thread switches over to client 2. self.param is incremented again.
The thread switches back to client 1, and the client is returned the number 2, say.
Now the thread moves to client 2 and returns him/her the number 3.
Since there were only two clients, the expected results were 1 and 2, not 2 and 3. A number was skipped.
Will this actually happen as I scale up my application? What alternatives to a global variable should I look at?
You can't use global variables to hold this sort of data. Not only is it not thread safe, it's not process safe, and WSGI servers in production spawn multiple processes. Not only would your counts be wrong if you were using threads to handle requests, they would also vary depending on which process handled the request.
Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs. If you need to load and access Python data, consider multiprocessing.Manager. You could also use the session for simple data that is per-user.
The development server may run in a single thread and process. In that case you won't see the behavior you describe, since each request is handled synchronously. Enable threads or processes and you will see it: app.run(threaded=True) or app.run(processes=10). (Since Flask 1.0 the development server is threaded by default.)
Some WSGI servers may support gevent or another async worker. Global variables are still not thread safe because there's still no protection against most race conditions. You can still have a scenario where one worker gets a value, yields, another modifies it, yields, then the first worker also modifies it.
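To make the race concrete, here is a minimal sketch of the question's SomeObj with a threading.Lock added (the lock is an assumption of this sketch, not part of the original code). It makes the increment and the read one atomic step, so 100 threads each get a unique number. It only protects threads inside one process; it does nothing for the multi-process case described above.

```python
import threading

class SomeObj:
    def __init__(self, param):
        self.param = param
        self._lock = threading.Lock()  # added for this sketch

    def query(self):
        # Increment and read as one unit; no other thread can interleave.
        with self._lock:
            self.param += 1
            return self.param

global_obj = SomeObj(0)
results = [None] * 100  # one slot per simulated client

def client(slot):
    results[slot] = global_obj.query()

threads = [threading.Thread(target=client, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads can both increment before either returns, which is exactly the skipped-number scenario from the question.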
If you need to store some global data during a request, you may use Flask's g object. Another common case is some top-level object that manages database connections. The distinction for this type of "global" is that it's unique to each request, not used between requests, and there's something managing the set up and teardown of the resource.
This is not really an answer to thread safety of globals.
But I think it is important to mention sessions here.
You are looking for a way to store client-specific data. Every connection should have access to its own pool of data, in a threadsafe way.
This is possible with server-side sessions, and they are available in a very neat flask plugin: https://pythonhosted.org/Flask-Session/
If you set up sessions, a session variable is available in all your routes and it behaves like a dictionary. The data stored in this dictionary is individual for each connecting client.
Here is a short demo:
from flask import Flask, session
from flask_session import Session
app = Flask(__name__)
# Check Configuration section for more details
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)
@app.route('/')
def reset():
    session["counter"] = 0
    return "counter was reset"
@app.route('/inc')
def routeA():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] += 1
    return "counter is {}".format(session["counter"])
@app.route('/dec')
def routeB():
    if "counter" not in session:
        session["counter"] = 0
    session["counter"] -= 1
    return "counter is {}".format(session["counter"])
if __name__ == '__main__':
    app.run()
After pip install Flask-Session, you should be able to run this. Try accessing it from different browsers and you'll see that the counter is not shared between them.
Another example of a data source external to requests is a cache, such as what's provided by Flask-Caching or another extension.
Create a file common.py and place in it the following:
from flask_caching import Cache
# Instantiate the cache
cache = Cache()
In the file where your flask app is created, register your cache with the following code:
from pathlib import Path
from flask import Flask
# Import cache
from common import cache
# ...
app = Flask(__name__)
cache.init_app(app=app, config={"CACHE_TYPE": "filesystem", "CACHE_DIR": Path("/tmp")})
Now use the cache throughout your application by importing it and calling it as follows:
# Import cache
from common import cache
# store a value
cache.set("my_value", 1_000_000)
# Get a value
my_value = cache.get("my_value")
While totally accepting the previous upvoted answers, and discouraging use of global variables for production and scalable Flask storage, for the purpose of prototyping or really simple servers, running under the flask 'development server'...
...
Python's built-in data types (I personally used and tested a global dict) are thread safe for individual operations, as per the Python documentation. They are not process safe.
The insertions, lookups, and reads from such a (server global) dict will be OK from each (possibly concurrent) Flask session running under the development server.
When such a global dict is keyed with a unique Flask session key, it can be rather useful for server-side storage of session specific data otherwise not fitting into the cookie (max size 4 kB).
Of course, since it lives in memory, such a server global dict should be carefully guarded against growing too large. Some way of expiring the 'old' key/value pairs can be coded into request processing.
Again, it is not recommended for production or scalable deployments, but it is possibly OK for local task-oriented servers where a separate database is too much for the given task.
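As an illustration of the expiry idea, here is a minimal sketch of an in-memory store with a time-to-live per entry. ExpiringStore and all of its method names are hypothetical, and the injectable clock parameter exists only so the behaviour can be shown without sleeping. Like the global dict it wraps, this works within one process only.

```python
import threading
import time

class ExpiringStore:
    """In-memory dict whose entries expire after ttl seconds (hypothetical helper)."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value):
        with self._lock:
            self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[0] <= self._clock():
                # Missing or expired: drop it and report the default.
                self._data.pop(key, None)
                return default
            return entry[1]

    def prune(self):
        """Drop every expired entry; could be called during request processing."""
        with self._lock:
            now = self._clock()
            expired = [k for k, (exp, _) in self._data.items() if exp <= now]
            for k in expired:
                del self._data[k]
            return len(expired)

# Usage with a fake clock so the example does not need to wait.
fake_now = [0.0]
store = ExpiringStore(ttl=10, clock=lambda: fake_now[0])
store.set("session-abc", {"cart": [1, 2]})
fresh = store.get("session-abc")   # still within the 10 second window
fake_now[0] = 11.0
stale = store.get("session-abc")   # past the window, treated as missing
```

The lock matters here because get() and prune() are compound operations (check, then delete), which the atomicity of single dict operations does not cover.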
...
So I'm building a long-running query web app for internal use.
My goal is to have a Flask app with a daemon process that starts when the server starts and updates a global dictionary object.
I don't have any sample code to post, as I've tried to accomplish this in many ways and none have been successful.
The daemon will create a worker pool (multiprocessing.Pool) to loop through all database instances and run a couple of queries on each.
It seems that no matter how I try to implement this (right now, using the Flask development server), it locks up the app and nothing else can be done while it's running. I have tried reading through a bunch of documentation, but as usual a lot of other knowledge is assumed and I end up overwhelmed.
I'm wondering if anyone can offer some guidance, even if it's just places to look, because I have searched all over for 'flask startup routine' and similar but have found nothing of use. It seems that when I deploy this to our server, I may be able to define some startup daemons in my .wsgi file, but until then is there any way to do this locally? Is that even the right approach when I do push it out for general use?
Otherwise, I was thinking of setting up a cron job that continuously runs a Python script that does the queries I need and dumps the results to a MongoDB instance or something similar, so that the clients can simply pull from that (doing all of the queries on the server side of the Flask app just locks up the server, so nothing else can be done with it, e.g. I can't take action on the info, kill spids, etc.).
Any help with this would be a major help; my brain has been spinning for days.
from flask import Flask
from celery import Celery
app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = 'amqp://guest@localhost//'
app.config['CELERY_RESULT_BACKEND'] = 'amqp://guest@localhost//'
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)
output = 0
@app.before_first_request
def init():
    task = my_task.apply_async()
@app.route('/')
def hello_world():
    global output
    return 'Hello World! - ' + str(output)
@celery.task
def my_task():
    global output
    result = 0
    for i in range(100):
        result += i
        output = result
if __name__ == '__main__':
    app.run()
Depending on how complex your query is, you could consider running it in a second thread. Because of the GIL, individual operations on common data structures (such as a dictionary update) are atomic, so for simple cases you don't need to worry about them being thread safe. A nice thing about threads is that, even though there's a GIL, they generally don't block other threads from executing during intense I/O (such as a thread running queries). See 2. Trivial example:
import threading
import time
import random
from flask import Flask
app = Flask(__name__)
data_store = {'a': 1}
def interval_query():
    # Stand-in for the real queries: refresh the shared dict once a second.
    while True:
        time.sleep(1)
        vals = {'a': random.randint(0, 100)}
        data_store.update(vals)
# daemon=True replaces the deprecated thread.setDaemon(True)
thread = threading.Thread(name='interval_query', target=interval_query, daemon=True)
thread.start()
@app.route('/')
def hello_world():
    return str(data_store['a'])
if __name__ == "__main__":
    app.run()
Well, first of all: don't try to solve this problem by yourself. Don't use threads or any kind of multiprocessing. Why? Because later on you will want to scale up, and the best way is to leave this to the server (gunicorn, uWSGI). If you try to handle it yourself, it will very likely collide with how these servers work.
Instead, use one service for processing requests, plus a message queue with a worker process that handles asynchronous tasks. This approach scales better.
From your question it seems that you are not looking for an answer but rather for guidance, so have a look here: http://flask.pocoo.org/docs/0.10/patterns/celery/ and here: https://www.quora.com/Celery-distributed-task-queue-What-is-the-difference-between-workers-and-processes
The advantage is that the web worker / task worker / Celery solution scales much better than the alternatives, as the only bottleneck is the database.
I'm using bottle with a cherrypy server to utilize multithreading. As I understand it this makes each request handled by a different thread. So given the following code:
from bottle import request, route
somedict = {}
@route("/read")
def read():
    return somedict
@route("/write", method="POST")
def write():
    somedict[request.forms.get("key")] = request.forms.get("value")
Would somedict be thread safe? What if a daemon thread were run to manage somedict, say it's a dictionary of active sessions and the daemon thread prunes expired sessions? If not, would a simple locking mechanism suffice, and would I need to use it when reading, when writing, and in the daemon thread, or just in the daemon thread?
Also, as I understand it, CherryPy is a true multithreaded server. Is there a more proper method I should use to implement a daemon thread while using CherryPy, given that Python's threads are not true threads? I don't wish to delve much into the CherryPy environment, preferring to stick with bottle for this project, so if it involves moving away from bottle or migrating my app to CherryPy then it doesn't really matter for now. I'd still like to know, though, as I didn't see much in their documentation on threads at all.
In your particular example, yes, the (single) dict assignment you perform is threadsafe.
somedict[request.forms.get("key")] = request.forms.get("value")
But, more generally, the proper answer to your question is: you will indeed need to use a locking mechanism. This is true if, for example, you make multiple updates to somedict while handling a single request, and you need them to be made atomically.
The good news is: it's probably as simple as a mutex:
from bottle import request, route
import threading
somedict = {}
somedict_lock = threading.Lock()
@route("/read")
def read():
    with somedict_lock:
        # Return a snapshot so serialization happens on data copied under the lock.
        return dict(somedict)
@route("/write", method="POST")
def write():
    with somedict_lock:
        # Both updates become visible to readers together.
        somedict[request.forms.get("key1")] = request.forms.get("value1")
        somedict[request.forms.get("key2")] = request.forms.get("value2")
I had originally answered that a dict is threadsafe, but on further research, that answer was wrong. See here for a good explanation.
For a quick explanation, imagine two threads running this code at once:
d['k'] += 1
They might both read d['k'] at the same time, and thus instead of being incremented by 2, be incremented only by 1.
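One way to see why the increment is not atomic is to look at the bytecode CPython generates for it. The sketch below uses the standard dis module; bump is a hypothetical helper, and the exact opcode names differ between CPython versions, so treat the printed list as illustrative rather than fixed.

```python
import dis

def bump(d):
    d['k'] += 1

# Collect the opcode names CPython emits for the increment.
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)
```

The list contains a separate instruction that reads d['k'], another that adds 1, and a later STORE_SUBSCR that writes the result back. A thread switch between the read and the write is what loses an update.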
I don't think it's an issue of your application locking up, more of just some data being lost. If that's not acceptable, using threading.Lock is pretty easy, and doesn't add that much overhead.
Here's some good info on thread safety with CherryPy. You might also consider using something like gunicorn in place of CherryPy. It has a worker process model, so each somedict would be different for every process, so there would be no worry of thread-safety.
CherryPy is based on Python threads, so you should stay away from using it as an HTTP server only (and any other native HTTP server). I suggest that you go with uWSGI, which is multiprocess and thus doesn't have GIL issues. Since it is multiprocess, you won't be able to use simple thread-shared variables. You can use uWSGI's SharedArea though or any 3rd party data storage.