Hi, I'm just learning Python, threading, and Flask.
What is the proper way to check whether a thread is running in a Flask app? I already tried storing the thread in a global variable:
global thread_a
thread_a = threading.Thread(target=work)  # work is the background function
thread_a.start()
and I tried using flask.current_app:
flask.current_app.thread_a = thread_a
Then I check whether the thread is running with:
thread_a.is_alive()
This works fine on my local machine. However, when I deploy to a remote server (in this case OpenShift), the variable is sometimes defined and sometimes not, so I suspect this method doesn't always work?
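To be concrete, here is a minimal, self-contained version of what I'm doing (the work function is just a placeholder for the real background job):

import threading
from flask import Flask, jsonify

app = Flask(__name__)
thread_a = None  # module-level handle to the background thread

def work():
    # placeholder for the real background job
    import time
    time.sleep(60)

@app.route('/start')
def start():
    global thread_a
    if thread_a is None or not thread_a.is_alive():
        thread_a = threading.Thread(target=work, daemon=True)
        thread_a.start()
    return jsonify(running=thread_a.is_alive())

@app.route('/status')
def status():
    # False if the thread was never started (or has finished) in this process
    return jsonify(running=thread_a is not None and thread_a.is_alive())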
My application creates a Flask app as well as a background process that does work with my MySQL database (through SQLAlchemy) every so often:
import threading
from datetime import timedelta

from task_manager import TaskManager

# Session is a sessionmaker created earlier
task_manager = TaskManager(timedelta(seconds=1), timedelta(seconds=1), Session)
threading.Thread(target=task_manager.scheduler_loop).start()
app.run(debug=True, host='0.0.0.0', port=5000)
Whenever this process finds an available task (this is in the scheduler_loop that's running in the separate thread), it does some work:
with db_session(self.Session) as session:
    task = session.query(Task).filter(or_(Task.date == None, Task.date <= datetime.now())).order_by(Task.priority).first()
    if task is not None:
        if task.type == "create_paper_task":
            self.create_paper(session, task.paper_title)
        elif task.type == "update_citations_task":
            self.update_citations(session, task.paper)
        session.delete(task)
...
def create_paper(self, session, paper_title):
    ...
    # For the purposes of testing, I replaced a long API call with this sleep.
    time.sleep(3)
    paper = Paper(paper_title, year)
    paper.citations.append(Citation(citations, datetime.now()))
    session.add(paper)
If I try to use this code, the SQLAlchemy queries are run twice. Two Paper objects are created, and I get this error (presumably the Task being deleted twice):
/app/task_manager.py:17: SAWarning: DELETE statement on table 'create_paper_task' expected to delete 1 row(s); 0 were matched. Please set confirm_deleted_rows=False within the mapper configuration to prevent this warning.
The actual code itself isn't running twice, and there definitely aren't multiple scheduler threads running: I've tested this using print statements.
Now, the weirdest part about this is that the issue ONLY occurs when:
1. There's a long wait during the execution. If the time.sleep is removed, there's no problem, and
2. The Flask app is running and the scheduler loop is running in a separate thread. If the Flask app isn't running, or the scheduler_loop is running in the main thread (so obviously the Flask app isn't running), then there's no problem.
Also, the Flask app isn't being used at all while I'm testing this, so that's not the issue.
The app.run function of Flask will run your initialization code twice when you set debug=True. This is part of the way Flask can detect code changes and dynamically restart as needed. The downside is that this is causing your thread to run twice which in turn creates a race condition on reading and executing your tasks, which indeed would only show up when the task takes long enough for the second thread to start working.
See this question/answer for more details about what is happening: Why does running the Flask dev server run itself twice?
To avoid this you could add code to avoid the second execution, but that has the limitation that the auto-reloading feature for modified code will no longer work. In general, it would probably be better to use something like Celery to handle task execution instead of building your own solution. However, as mentioned in the linked answer, you could use something like
from werkzeug.serving import is_running_from_reloader

if is_running_from_reloader():
    from task_manager import TaskManager
    task_manager = TaskManager(timedelta(seconds=1), timedelta(seconds=1), Session)
    threading.Thread(target=task_manager.scheduler_loop).start()
which would keep your thread from being created unless you are in the second (reloaded) process. Note this would prevent your thread from executing at all if you remove debug=True.
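An alternative sketch (my own suggestion, not from the linked answer): if you can live without auto-reloading, skip the is_running_from_reloader() guard entirely and disable the reloader, so the startup code runs exactly once:

from task_manager import TaskManager

task_manager = TaskManager(timedelta(seconds=1), timedelta(seconds=1), Session)
threading.Thread(target=task_manager.scheduler_loop).start()

# Debug mode without the reloader: the module is executed only once,
# so the scheduler thread is started only once.
app.run(debug=True, use_reloader=False, host='0.0.0.0', port=5000)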
I have a Tornado app which is using python firebase_admin SDK.
When I run in single process:
console_server = tornado.httpserver.HTTPServer(ConsoleApplication())
console_server.listen(options.console_port, options.bind_addr)
tornado.ioloop.IOLoop.instance().start()
firebase_admin works fine. But when I change it to run in multiprocess:
console_server = tornado.httpserver.HTTPServer(ConsoleApplication())
console_server.bind(options.console_port, options.bind_addr)
console_server.start(4)
tornado.ioloop.IOLoop.instance().start()
The last line of the following code is where it gets stuck:
if not len(firebase_admin._apps):
    cred = ...
    self.app = firebase_admin.initialize_app(cred)

self.app = firebase_admin.get_app()
self.db = firestore.client()
...
ref = self.db.document(USER_DOC.format(org, value))
user_ref = ref.get()
It seems like get() never resolves, since I don't get any exception.
Does anyone have an idea why this is happening, or at least how I can debug it?
The multiprocess fork (i.e. the start(4) call) must come very early in the life of your application. In particular, most things that touch the network must come after the fork (bind() is one of the few exceptions, and must come before the fork in this case).
You (probably) need to reorganize things so that you're creating the firebase app after the fork. This can be annoying if you're using the HTTPServer.start method, so you may want to switch to calling tornado.process.fork_processes() directly instead (this is documented as the "advanced multi-process" pattern).
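A rough sketch of that "advanced multi-process" layout (ConsoleApplication and the option values are taken from the question; the Firebase setup is only illustrative):

import tornado.httpserver
import tornado.ioloop
import tornado.netutil
import tornado.process
import firebase_admin
from firebase_admin import firestore

# Bind the listening sockets before forking.
sockets = tornado.netutil.bind_sockets(options.console_port, options.bind_addr)

# Fork the worker processes.
tornado.process.fork_processes(4)

# Anything that opens network connections (firebase_admin, gRPC, DB clients)
# is created here, i.e. after the fork, once per child process.
cred = ...
firebase_app = firebase_admin.initialize_app(cred)
db = firestore.client()

server = tornado.httpserver.HTTPServer(ConsoleApplication())
server.add_sockets(sockets)
tornado.ioloop.IOLoop.current().start()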
I know it's an old question, but I want to share my experience regarding this issue to help future visitors.
I recently developed a script with multiprocessing that uses the Firebase Admin Python SDK. Everything worked fine on my local Windows machine, but when I deployed it for production on a Linux server, I noticed the script was getting stuck in the get() function.
After hours of searching, I found out that the default start method of a Python process differs between Windows and Unix environments: Windows uses spawn as the default start method, whereas Unix uses fork. You can learn more about start methods in the documentation.
So to make it work in my Linux server, I just changed the start method to spawn:
import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # <-- Set spawn as the start method
    # The rest of your script here
    # ...
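For reference, a stripped-down sketch of the pattern I ended up with (the document path and the default-credentials initialization are placeholders for my real setup):

import multiprocessing

import firebase_admin
from firebase_admin import firestore

def worker(doc_path):
    # Each spawned process is a fresh interpreter, so it creates its own
    # Firebase app and Firestore client instead of inheriting forked gRPC state.
    firebase_admin.initialize_app()  # assumes default credentials are configured
    db = firestore.client()
    print(db.document(doc_path).get().to_dict())

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    procs = [multiprocessing.Process(target=worker, args=('users/example-doc',))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()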
I have two different Flask projects. I want to run them on a server under different URLs.
Currently, I can only see one project running at a time.
I tried running them on the same port with different URLs, and also on different ports, but only one project runs at a time.
Project 1
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5001, debug=True)
Project 2
I tried running
export FLASK_APP=app.py
flask run --host 0.0.0.0 --port 5000
I also tried this way:
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5000, debug=True)
I recently did a parallel threading operation on my own Flask website, so I completely understand your confusion, and I'm going to explain this to the best of my abilities.
When creating parallel operations, it's best to use multi-threading. Basically, multi-threading splits operations up and runs them simultaneously on the CPU. This must be supported by the CPU, but most CPUs today support multi-threading.
Anyway, on to the application. I initialized the Flask application objects so that data is shared between all the threads, using the main thread as the memory handler. Afterwards, I created the pages. Then, within the initialization 'if statement' (if __name__ == '__main__', akin to a driver/main entry point in other languages), I initialized and started the threads to do their parts of the application.
Notes:
Flask doesn't allow debug mode when it isn't running on the main thread, which basically means you cannot use multi-threading on the Flask apps while debugging the application. That's not a big problem: VS Code has a great output console that gives me enough information to figure out issues within the application. Tracking down threading errors can be painful at times, though, so it's best to watch your steps while debugging.
Another thing: you can still use Flask's threaded option, which I like to enable on any Flask application I make because it handles client connections better. With threaded disabled, a client connects and holds up the main thread for a moment before releasing it; with threaded enabled, clients can open and release multiple requests concurrently, instead of all clients piping through one thread.
Why is that important? Well, if a client runs a heavy script that has to do operations on the host machine, that page's request will take a larger amount of time. In turn, that client holds the main thread, so no-one else can connect.
My Code for your Issue:
import threading
from flask import Flask

# My typical setup for a Flask app.
# ./media is a folder that holds my JS, imgs, CSS, etc.
app1 = Flask(__name__, static_folder='./media')
app2 = Flask(__name__, static_folder='./media')

@app1.route('/')
def index1():
    return 'Hello World 1'

@app2.route('/')
def index2():
    return 'Hello World 2'

# With multi-threaded apps, YOU CANNOT USE DEBUG!
# Though you can sub-thread.
def runFlaskApp1():
    app1.run(host='127.0.0.1', port=5000, debug=False, threaded=True)

def runFlaskApp2():
    app2.run(host='127.0.0.1', port=5001, debug=False, threaded=True)

if __name__ == '__main__':
    # Execute the threads separately.
    t1 = threading.Thread(target=runFlaskApp1)
    t2 = threading.Thread(target=runFlaskApp2)
    t1.start()
    t2.start()
PS: Run this app by doing python app.py instead of
export FLASK_APP=app.py
flask run --host 0.0.0.0 --port 5000
Hope this helps you, and happy developing!
I've read some documentation online about how to do remote debugging with PyCharm - https://www.jetbrains.com/help/pycharm/remote-debugging.html
But there was one key issue with that for what I was trying to do with my setup - Nginx connecting to uWSGI, which then connects to my Flask app. As far as I can tell, setting up something like this:
import sys
sys.path.append('pycharm-debug.egg')

import pydevd
pydevd.settrace('localhost', port=11211,
                stdoutToServer=True, stderrToServer=True,
                suspend=False)
print 'connected'

from wsgi_configuration_module import app
My wsgi_configuration_module.py file is the uWSGI file used in Production, i.e. no debug.
This connects the debugger to the main/master process of uWSGI, which runs only once, at uWSGI startup/reload. But if you try to set a breakpoint in the code handling your requests, I've found it either skips over it or hangs entirely without ever hitting it, and uWSGI shows a gateway error after the timeout.
The problem here, as far as I can see, is exactly that last point: the debugger connects to the uWSGI/application master process, which is not any of the individual request processes.
To solve this in my situation, two things needed to change, one of which is the uWSGI configuration for my app. Our production file looks something like:
[uwsgi]
...
master = true
enable-threads = true
processes = 5
But here, to give the debugger (and us) an easy time to connect to the request process, and stay connected, we change this to
[uwsgi]
...
master = true
enable-threads = false
processes = 1
Make it the master, disable threads, and limit it to only 1 process - http://uwsgi-docs.readthedocs.io/en/latest/Options.html
Then, in the startup Python file, instead of setting the debugger to connect when the entire Flask app starts, you set it to connect in a function decorated with the handy Flask hook before_first_request (http://flask.pocoo.org/docs/0.12/api/#flask.Flask.before_first_request), so the startup script changes to something like:
import sys
import wsgi_configuration_module

sys.path.append('pycharm-debug.egg')
import pydevd

app = wsgi_configuration_module.app

@app.before_first_request
def before_first_request():
    pydevd.settrace('localhost', port=11211,
                    stdoutToServer=True, stderrToServer=True,
                    suspend=False)
    print 'connected'
So now you've limited uWSGI to no threads and only one process, to reduce the chance of any mix-up between them and the debugger, and set pydevd to connect only before the very first request. Now the debugger connects (for me) successfully once, at the first request, prints 'connected' only once, and from then on breakpoints are hit in any of your request endpoint functions without issue.
I am working on migrating an existing Python GAE (Google App Engine) standard environment app to the flexible environment. I read through the guide and decided to try out the python-compat runtime, as it's always good to reuse as much code as possible.
In the standard environment app, we use background_thread.start_new_background_thread() to spawn a bunch of infinite-loop threads that work on some background tasks forever. However, I couldn't get start_new_background_thread working in the flexible environment, even for a really simple app like this sample:
github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/background
I keep getting the following error while running the app in the cloud (it works fine locally though).
I debugged it using the cloud debugger, but there was no error message available at all when the exception was raised in background_thread.py.
Any idea how I can run a long-lived background thread in the flexible environment with the python-compat runtime? Thanks!
One of the differences between App Engine standard and App Engine flexible is that with Flex we're really just running a docker container. I can think of 2 approaches to try out.
1. Just use Python multiprocessing
App Engine standard enforces a sandbox that mostly means no direct use of threads or processes. With Flex, you should be able to just use the standard Python library for starting a new subprocess:
https://docs.python.org/3/library/subprocess.html
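For instance, a rough sketch using the multiprocessing module (scheduler_loop and do_scheduled_work are just placeholders for your background work):

import multiprocessing
import time

def do_scheduled_work():
    # placeholder for the real background task
    print('working...')

def scheduler_loop():
    # stand-in for the infinite-loop background threads from the question
    while True:
        do_scheduled_work()
        time.sleep(1)

if __name__ == '__main__':
    worker = multiprocessing.Process(target=scheduler_loop, daemon=True)
    worker.start()
    # In the real app, the web server would start here and keep the
    # main process (and therefore the daemon worker) alive.
    worker.join()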
2. Use supervisord and docker
If that doesn't work, another approach you could take here is to customize the docker image you're using in Flex and use supervisord to start multiple processes. First, generate the Dockerfile by cd-ing into the folder with your sources and running:
gcloud preview app gen-config --custom
This will create a Dockerfile that you can customize. Now, you are going to want to start 2 processes - the process we were starting (I think for python-compat it's gunicorn) and your background process. The easiest way to do that with docker is to use supervisord:
https://docs.docker.com/engine/admin/using_supervisord/
After modifying your Dockerfile and adding a supervisord.conf, you can just deploy your app as you normally would with gcloud preview app deploy.
Hope this helps!
I wish the documentation said that background_thread was not a supported API.
Anyway, I've found some hacks to help with some thread incompatibilities. App Engine uses os.environ to read a lot of settings. The "real" threads in your application will have a bunch of environment variables set there; the background threads you start will have none. One hack I've used is to copy some of the environment variables. For example, I needed to set the SERVER_SOFTWARE variable in the background threads in order to get the App Engine cloud storage library to work. We use something like:
import os
import threading

_global_server_software = None
_SERVER_SOFTWARE = 'SERVER_SOFTWARE'

def environ_wrapper(function, args):
    if _global_server_software is not None:
        os.environ[_SERVER_SOFTWARE] = _global_server_software
    function(*args)

def start_thread_with_app_engine_environ(function, *args):
    # HACK: Required for the cloudstorage API to work on Flexible environment threads.
    # App Engine relies on a lot of environment variables to work correctly. New threads get none
    # of those variables. cloudstorage uses SERVER_SOFTWARE to determine if it is a test instance.
    global _global_server_software
    if _global_server_software is None and os.environ.get(_SERVER_SOFTWARE) is not None:
        _global_server_software = os.environ[_SERVER_SOFTWARE]

    t = threading.Thread(target=environ_wrapper, args=(function, args))
    t.start()
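Usage is then a drop-in replacement for starting a thread directly (my_background_func and its argument are placeholders):

# instead of threading.Thread(target=my_background_func, args=(some_arg,)).start()
start_thread_with_app_engine_environ(my_background_func, some_arg)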