SOLVED: It turns out the problem comes from gunicorn preloading and forking interacting badly with apscheduler. See comment.
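For the record, since the comment itself isn't reproduced here: with gunicorn preloading, app.py runs once in the master process, so the scheduler thread lives in the master and the forked workers keep serving their stale copy of data. Below is a hedged sketch of one common fix, starting a scheduler in each worker via gunicorn's post_fork server hook; exposing an unstarted sched from app.py is my illustration, not the original comment's code.

# gunicorn.conf.py
# post_fork is a standard gunicorn server hook; it runs inside each
# worker process right after the master forks it.
def post_fork(server, worker):
    from app import sched  # assumes app.py builds, but does not start, the scheduler
    sched.start()          # each worker now refreshes its own copy of the data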
Background
I am writing a simple Flask API that performs a periodic background query to a SQL database using apscheduler, then serves incoming REST requests with Flask. The API performs different aggregations depending on the incoming request.
I have a data class with methods for 1) querying/updating and 2) responding to aggregation requests. The problem is that the Flask resource seems to be stuck on an older version of the data, even though the logs show the query/update method being called properly.
Code so far
I broke my app down into modules as follows:
app/
├── app.py
└── apis
    ├── __init__.py
    └── model1.py
Data model file
In model1.py, I define the data class and the API endpoints with a flask-restplus namespace, and initialize the data object:
import logging

import pandas as pd
from flask_restplus import Namespace, Resource

api = Namespace('sales')

@api.route('/check')
class check_sales(Resource):
    def post(self):
        req = api.payload
        result = data.get_sales(**req)
        return result, 200

class sales_today():
    def __init__(self):
        self.data = None
        self.update()

    def update(self):
        # some logging here
        self.data = self.check_sql()
        logging.debug("Last Order: %s" % str(self.data.sales_time.max()))

    def check_sql(self):
        query = """
        SELECT region, store, item, sales_count, MAX(UtcTimeStamp) as sales_time
        FROM db GROUP BY 1, 2, 3
        """
        sales = pd.read_gbq(query)
        return sales

    def get_sales(self, **kwargs):
        '''
        kwargs here is a dict where we filter and sum
        '''
        # start with an all-True mask, then AND in each requested filter
        mask = pd.Series(True, index=self.data.index)
        for arg_name in kwargs:
            if isinstance(kwargs[arg_name], str):
                arg_value = kwargs[arg_name].split(',')
                mask = mask & self.data[arg_name].isin(arg_value)
        result = {k: v for k, v in kwargs.items()}
        result['count'] = int(self.data.loc[mask]['sales_count'].sum())
        result['last_updated'] = str(self.data.sales_time.max())
        return result

data = sales_today()
Module init file
In __init__.py inside app/apis I expose the data object instance as well as the API namespace.
from .model1 import api as ns_model1
from .model1 import data as data_model1

def add_apins(api):
    api.add_namespace(ns_model1, path='/model1')
Main app file
In the main app.py file I set up the scheduler to refresh the data every 5 minutes with apscheduler. I then serve the app with gunicorn.
import atexit

from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask
from flask_restplus import Resource, Api
from werkzeug.serving import run_simple

from apis import add_apins
from apis import data_model1

# parameters
port = 8888
poll_freq = '0-59/5'

# flask app
main_app = Flask(__name__)
api = Api()
add_apins(api)
api.init_app(main_app)

# background scheduler
sched = BackgroundScheduler()
sched.add_job(data_model1.update, 'cron', minute=poll_freq)
sched.start()
atexit.register(lambda: sched.shutdown(wait=False))

if __name__ == "__main__":
    # serve(main_app, host='0.0.0.0', port=port)  # ssl_context="adhoc" for https testing locally
    run_simple(application=main_app, hostname='0.0.0.0', port=port, use_debugger=True)
Expectation and issues
Since the data is refreshed every 5 minutes, I would expect the last_updated value in any /check response to match the latest value in the logs (the logging.debug line in the update() method). However, I'm getting responses where last_updated equals the time the app was initially started.
I have confirmed in the DB that the data there is indeed up to date, and the logging also confirms that the update() method runs every 5 minutes and reports the latest timestamp.
I also noticed that the app runs fine with python app.py on Windows, but when served with gunicorn it starts exhibiting this weird behaviour.
I am quite puzzled as to where things go wrong. Could it be scoping? Or am I passing the instance between modules incorrectly?
Thank you so much for your time and help. Any ideas would be much appreciated.
Related
I'm trying to run Flask from an imported module (creating a wrapper using decorators).
Basically I have:
app.py:
import mywrapper

@mywrapper.entrypoint
def test():
    print("HEYO!")
mywrapper.py:

from flask import Flask

ENTRYPOINT = None

app = Flask(__name__)

@app.route("/")
def listen():
    """Start the model API service"""
    return ENTRYPOINT()

def entrypoint(f):
    global ENTRYPOINT
    ENTRYPOINT = f
    return f
Running FLASK_APP=app python -m flask run, however, results in:
flask.cli.NoAppException: Failed to find Flask application or factory in module "app". Use "FLASK_APP=app:name" to specify one.
Is there any trick to getting Flask to run like this? Or is it just not possible? The purpose of this is to abstract Flask away in this situation.
In my head, flask should import app.py, which imports mywrapper.py, which should generate the app and route, yet this doesn't seem to be what occurs.
Any help would be appreciated.
So I've since learnt that Flask searches only the chosen module's namespace for a variable containing a Flask object (an isinstance check, so a subclass instance also qualifies).
There may be a smart way around this limitation, but I instead decided it was more sensible to wrap the Flask class itself. If people want direct Flask functionality, I don't really care in this situation, so the only real limitation is that some function names are off limits.
Basically:
mywrapper.py:

from flask import Flask

class Wrapper(Flask):
    def __init__(self, name):
        super().__init__(name)
        self.entrypoint_func = None

        @self.route("/")
        def listen():
            return self.entrypoint_func()

    def entrypoint(self, f):
        assert self.entrypoint_func is None, "Entrypoint can only be set once"
        self.entrypoint_func = f
        return f
and app.py:

from mywrapper import Wrapper

app = Wrapper(__name__)

@app.entrypoint
def test():
    print("HEYO!")
    return "SUCCESS"
This is still abstracted enough that I am happy with the results.
Here is code to run a Flask app along with a Bokeh server, inspired by flask_gunicorn_embed.py on GitHub.
At first it works like a charm; however, after refreshing the page, this error occurs:
Models must be owned by only a single document: ... (rest truncated)
By the way, the code is run using gunicorn. Also, the create_figure() function returns a layout.
import asyncio
from threading import Thread

from flask import Flask, render_template
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop

from bokeh.application import Application
from bokeh.application.handlers import FunctionHandler
from bokeh.embed import server_document
from bokeh.server.server import BaseServer
from bokeh.server.tornado import BokehTornado
from bokeh.server.util import bind_sockets

from Decision_Tree.Plot.decision_tree import create_figure

if __name__ == '__main__':
    import sys
    sys.exit()

app = Flask(__name__)

def modify_doc(doc):
    # Create the plot
    plot = create_figure()
    # Embed plot into HTML via Flask Render
    doc.add_root(plot)

bkapp = Application(FunctionHandler(modify_doc))

# This is so that if this app is run using something like "gunicorn -w 4" then
# each process will listen on its own port
sockets, port = bind_sockets("x.x.x.x", 0)

@app.route('/', methods=['GET'])
def bkapp_page():
    script = server_document('http://x.x.x.x:%d/bkapp' % port)
    return render_template("index.html", script=script, template="Flask")

def bk_worker():
    asyncio.set_event_loop(asyncio.new_event_loop())

    bokeh_tornado = BokehTornado({'/bkapp': bkapp}, extra_websocket_origins=["x.x.x.x:5000"])
    bokeh_http = HTTPServer(bokeh_tornado)
    bokeh_http.add_sockets(sockets)

    server = BaseServer(IOLoop.current(), bokeh_tornado, bokeh_http)
    server.start()
    server.io_loop.start()

Thread(target=bk_worker).start()
Looking forward to any help!
P.S. Domain replaced with x.x.x.x intentionally.
You have not included all the code, so it is impossible to say for certain, but the most likely explanation is that you are creating Bokeh models somewhere and re-using them between different calls to modify_doc. For example, this would be the case if your create_figure function referred to a global ColumnDataSource (or whatever) that was created outside the function as a module global. This will not work: Bokeh models cannot be re-used between different docs/sessions. Every call to modify_doc needs to return an entirely new set of Bokeh models for the session; otherwise different users would have shared state, which is not good for many reasons (so it is explicitly disallowed by raising that exception).
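To illustrate the pattern the answer describes, here is a minimal sketch; the ColumnDataSource and data below are made up for illustration, not taken from the question's create_figure:

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# BAD: a module-level model is created once at import time and then
# shared by every session -- this is what triggers
# "Models must be owned by only a single document".
# shared_source = ColumnDataSource(data=dict(x=[1, 2], y=[3, 4]))

def create_figure():
    # GOOD: build a brand-new set of models on every call, so each
    # session/document owns its own models.
    source = ColumnDataSource(data=dict(x=[1, 2], y=[3, 4]))
    fig = figure(title="decision tree")
    fig.line('x', 'y', source=source)
    return fig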
I would like to upload multiple files using a thread. This way the files can upload in the background and not make the user wait.
Here is my simplified code:
In app.py:

from flask import request, render_template

from file_upload import upload_process

@app.route('/complete', methods=['POST'])
def complete():
    id = 5  # for simplified example
    upload_process(id)  # My thread
    ...
    return render_template('complete.html')
In file_upload.py:

from threading import Thread

from flask import request

def upload_process(id):
    thr = Thread(target=upload_files, args=[id])
    thr.start()

def upload_files(id):
    file_1 = request.files['file_1']
    file_2 = request.files['file_2']
    file_3 = request.files['file_3']

    newFiles = FileStorage(id=id, file_1=file_1.read(),
                           file_2=file_2.read(), file_3=file_3.read())
    db.session.add(newFiles)
    db.session.commit()
I get the error:
RuntimeError: Working outside of request context.
This typically means that you attempted to use functionality that needed an active HTTP request. Consult the documentation on testing for information about how to avoid this problem.
How would I get the request to work within the upload_files function?
(Without threading, the files upload correctly.)
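Not part of the original question, but the usual way out of this error is to read everything you need from request while the request context is still active, and hand plain data to the thread. A minimal sketch reusing the question's FileStorage and db names, with an app context added because Flask-SQLAlchemy typically needs one in a worker thread:

from threading import Thread

from flask import current_app, request

def upload_process(id):
    # Read the uploads here, while the request context is still active
    file_data = {name: request.files[name].read()
                 for name in ('file_1', 'file_2', 'file_3')}
    app = current_app._get_current_object()  # the real app object is safe to pass to a thread
    Thread(target=upload_files, args=[app, id, file_data]).start()

def upload_files(app, id, file_data):
    # No `request` access in the thread -- only the plain bytes passed in
    with app.app_context():
        newFiles = FileStorage(id=id, **file_data)
        db.session.add(newFiles)
        db.session.commit()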
What is the best way to handle unit tests that rely on calling code that in turn relies on the current app's configuration?
e.g.
code.py

from flask import current_app

def some_method():
    app = current_app._get_current_object()
    value = app.config['APP_STATIC_VAR'] * 10
    return value
test_code.py

class TestCode(unittest.TestCase):
    def test_some_method(self):
        app = create_app('app.settings.TestConfig')
        value = some_method()
        self.assertEqual(10, value)
Running the test above, I get a 'RuntimeError: working outside of application context' error when the app = create_app('app.settings.TestConfig') line is executed.
Calling app = create_app during the test doesn't do the trick. What is the best way to unit test in this case, where I need the config to be read by the application?
You are accessing the app outside of an app context when you call some_method(). To fix it, wrap the call:

with app.app_context():
    value = some_method()
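Put together, a minimal sketch of the fixed test, assuming the question's create_app factory; the APP_STATIC_VAR override is my own assumption so that the assertion holds:

import unittest

from app import create_app    # the factory from the question
from code import some_method  # the module under test

class TestCode(unittest.TestCase):
    def test_some_method(self):
        app = create_app('app.settings.TestConfig')
        app.config['APP_STATIC_VAR'] = 1  # assumed value: 1 * 10 == 10
        with app.app_context():
            value = some_method()
        self.assertEqual(10, value)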
I'm using WSGI/Apache2 and am trying to declare my database pool on init, to be accessible via a global var from my endpoints. I'm using Redis and Cassandra (DSE, specifically). It's my understanding that both the Redis and DSE libs offer pool management so this shouldn't be an issue.
My folder structure for my WSGI app looks something akin to
folder/
    tp.wsgi
    app/
        __init__.py
        decorators/
            cooldec.py
        mod_api/
            controllers.py
tp.wsgi looks like the following:

#! /usr/bin/env python2.7
import sys
import logging

logging.basicConfig(stream=sys.stderr)
sys.path.insert(0, "/opt/tp")

from app import app

def application(environ, start_response):
    return app(environ, start_response)
__init__.py looks like the following:

#! /usr/bin/env python2.7
from flask import Flask
from cassandra.cluster import Cluster

# Import our handlers
from app.mod_api.files import mod_files

# Setup routine
def setup():
    # Instantiate Flask
    app = Flask('app')

    # Set up a connection to Cassandra
    cassandraSession = Cluster(['an ip address', 'an ip address']).connect('keyspace')
    cassandraSession.default_timeout = None

    # Register our blueprints
    app.register_blueprint(mod_files)

    ...

    return app, cassandraSession

app, cassandraSession = setup()
I'm calling a decorator defined in cooldec.py that handles authentication (I use that term loosely, for a reason; I ask that we not go down the path of using Flask extensions for authentication, as that's out of scope for this question and isn't applicable in my use-case [see: loose usage of the term 'authentication']).
In cooldec.py and controllers.py I'm trying to access the cassandraSession global, but I keep getting global name 'cassandraSession' is not defined. I know what the error means, but I'm not sure why I'm seeing it. It's my understanding that the way I've set up my WSGI app allows cassandraSession to be accessible within the scope of the app, no?
I found Preserving state in mod_wsgi Flask application, but it hasn't really shed any light on what I'm doing wrong.
My issue was the location of my imports. I made a few changes to tp.wsgi and __init__.py and got what I need working; that is, calling from app import cassandraSession from within cooldec.py and controllers.py.
Below is how I've set up the aforementioned files.
tp.wsgi

#! /usr/bin/env python2.7
import sys
import logging

logging.basicConfig(stream=sys.stderr)
sys.path.insert(0, "/opt/tp")

from app import app as application
__init__.py

#! /usr/bin/env python2.7
# API Module
from flask import Flask, jsonify
from cassandra.cluster import Cluster

# Create our API
app = Flask('app')

# Define a cassandra cluster/session we can use
cassandraSession = Cluster(['an ip address', 'an ip address']).connect('keyspace')
cassandraSession.default_timeout = None

# ... Register blueprints
These are overly simplified edits, but they give an idea of what I was doing wrong (e.g. declaring in the wrong file and importing improperly).
In both cooldec.py and controllers.py we can now do:

from app import cassandraSession

rows = cassandraSession.execute('select * from table')
Tip for new WSGI developers: Continue to think "in python".
+ WARNING +
I have yet to find an absolute answer on whether or not this is safe to do. Doing this with sqlalchemy is perfectly OK due to how sqlalchemy handles connection pooling. I am, as of yet, unaware whether this is safe with Cassandra/DSE, so proceed with caution if you follow this approach.
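Not from the original post, but relevant to the warning above: the DataStax Python driver documents Session as thread-safe and intended to be shared, while connections do not survive a process fork, so one defensive pattern is to create the session lazily per process. The helper below is my own sketch, not the post's code:

import os

from cassandra.cluster import Cluster

_session = None
_session_pid = None

def get_session():
    """Return a per-process Cassandra session, recreating it after a fork."""
    global _session, _session_pid
    if _session is None or _session_pid != os.getpid():
        _session = Cluster(['an ip address', 'an ip address']).connect('keyspace')
        _session_pid = os.getpid()
    return _session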