APScheduler does not run scheduled tasks: Flask + uWSGI - python

I have an application running on Flask and uWSGI with a job store in SQLite. I start the scheduler along with the application, and add new jobs through add_job when a certain URL is visited.
I can see that the jobs are saved correctly in the job store and I can view them through the API, but they are not executed at the appointed time.
A few important details:
uwsgi.ini
processes = 1
enable-threads = true
__init__.py
scheduler = APScheduler()
scheduler.init_app(app)
with app.app_context():
    scheduler.start()
main.py
scheduler.add_job(
    id='{}{}'.format(test.id, g.user.id),
    func=pay_day,
    args=[test.id, g.user.id],
    trigger='interval',
    minutes=test.timer
)
in service.py
def pay_day(tid, uid):
    with scheduler.app.app_context():
        # some code here
Interesting behavior: if a task is created by visiting the URL and the application is restarted afterwards, the task will be executed. But if the application is already running and one of the users creates a task by visiting the URL, that task will not be executed until the application is restarted.
I don't get any errors or exceptions, even in the scheduler logs.
I'm out of ideas about how to make this work or what I did wrong. I need a hint.

uWSGI employs some tricks which disable the Global Interpreter Lock and with it, the use of threads which are vital to the operation of APScheduler. To fix this, you need to re-enable the GIL using the --enable-threads switch. See the uWSGI documentation for more details.
I know that you already had enable-threads = true in uwsgi.ini, but try enabling it from the command line as well.
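For example, assuming you keep your existing uwsgi.ini, the invocation might look like this (only the --enable-threads switch is the point here):
uwsgi --ini uwsgi.ini --enable-threads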

Related

Celery link_error raises NotRegistered exception

I have a client Celery application issuing tasks for a worker (using Redis), and it's working OK. Both the client and worker applications use the same config:
app = Celery('clientApp', broker='redis://redis:6379/0',backend='redis://redis:6379/0')
# Listen to queue2
app = Celery('workerApp', broker='redis://redis:6379/0',backend='redis://redis:6379/0')
# Listen to queue1
Now I want to execute a handler on success or on error, so I used something like this:
task = Signature('mytask', queue='queue1')
task.apply_async(
    link=Signature("handle_success", queue='queue2'),
    link_error=Signature("handle_error", queue='queue2'))
This calls handle_success correctly on success, but it does not call handle_error when mytask raises an exception. Can you see any reason why? The goal is for the client to execute handle_error when the worker's task fails (just as it executes handle_success when the worker's task completes successfully).
celery.exceptions.NotRegistered: 'handle_error'
I get no error or info messages when the Celery applications start, the backend is the same URL for both apps, and handle_success / handle_error correctly show up in the registered tasks for the client.
Resolved by using the hack listed in this article.
If the link_error argument is a single task, it will get executed by the worker directly (unlike link). One way to force the worker to send the task back to the client is to use a chain.
from .tasks import error_callback
app.send_task("system_b.foo", link_error=(error_callback.si() | error_callback.si()))
Or use a dummy task plus a helper function to make it clearer; see the article.
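For illustration, a minimal sketch of that dummy-task variant. The no-op task noop and the helper name are hypothetical; only the chaining trick itself comes from the answer above:
from celery import chain
from .tasks import error_callback, noop  # noop is a hypothetical registered do-nothing task

def client_errback(sig):
    # Wrapping the errback in a chain makes the worker send it on to its queue
    # instead of executing it in place.
    return chain(noop.si(), sig)

app.send_task("system_b.foo", link_error=client_errback(error_callback.si()))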

How to use the logging module in Python with gunicorn

I have a flask-based app. When I run it locally, I run it from the command line, but when I deploy it, I start it with gunicorn with multiple workers.
I want to use the logging module to log to a file. The docs I've found for this are https://docs.python.org/3/library/logging.html and https://docs.python.org/3/howto/logging-cookbook.html .
I am confused over the correct way to use logging when my app may be launched with gunicorn. The docs address threading but assume I have control of the master process. Points of confusion:
Will logger = logging.getLogger('myapp') return the same logger object in different gunicorn worker threads?
If I am attaching a logging FileHandler to my logger in order to log to a file, how can I avoid doing this multiple times in the different workers?
My understanding - which may be wrong - is that if I just call logger.setLevel(logging.DEBUG), this will send messages via the root logger which may have a higher default logging level and may ignore debug messages, and so I also need to call logging.basicConfig(logging.DEBUG) in order for my debug messages to get through. But the docs say not to call logging.basicConfig() from a thread. How can I correctly set the root logging level when using gunicorn? Or do I not need to?
This is my typical Flask/Gunicorn setup. Note that gunicorn is run via supervisor.
wsgi_web.py. Note ProxyFix to get a client's real IP address from Nginx.
from werkzeug.contrib.fixers import ProxyFix
from app import create_app
import logging
gunicorn_logger = logging.getLogger('gunicorn.error')
application = create_app(logger_override=gunicorn_logger)
application.wsgi_app = ProxyFix(application.wsgi_app, num_proxies=1)
Edit February 2020, for newer versions of werkzeug use the following and adjust the parameters to ProxyFix as necessary:
from werkzeug.middleware.proxy_fix import ProxyFix
from app import create_app
import logging
gunicorn_logger = logging.getLogger('gunicorn.error')
application = create_app(logger_override=gunicorn_logger)
application.wsgi_app = ProxyFix(application.wsgi_app, x_for=1, x_host=1)
Flask application factory create_app
def create_app(logger_override=None):
    app = Flask(__name__)
    if logger_override:
        # working solely with the flask logger
        app.logger.handlers = logger_override.handlers
        app.logger.setLevel(logger_override.level)
        # OR, working with multiple loggers
        # for logger in (app.logger, logging.getLogger('sqlalchemy')):
        #     logger.handlers = logger_override.handlers
        #     logger.setLevel(logger_override.level)
    # more
    return app
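As a quick illustration of what this buys you (the route below is hypothetical, just to show where messages go), view code can now log through the Flask logger and the messages end up in gunicorn's error log at gunicorn's configured log level:
# wsgi_web.py (continued)
@application.route('/health')
def health():
    # Handled by gunicorn's handlers, so this lands in the --error-logfile
    application.logger.info('health check hit')
    return 'ok'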
Gunicorn command (4th line) within the supervisor conf. Note the --log-level parameter has been set to info in this instance, and note the X-REAL-IP header passed to --access-logformat:
[program:web]
directory = /home/paul/www/example
environment = APP_SETTINGS="app.config.ProductionConfig"
command = /home/paul/.virtualenvs/example/bin/gunicorn wsgi_web:application -b localhost:8000 --workers 3 --worker-class gevent --keep-alive 10 --log-level info --access-logfile /home/paul/www/logs/admin.gunicorn.access.log --error-logfile /home/paul/www/logs/admin.gunicorn.error.log --access-logformat '%%({X-REAL-IP}i)s %%(l)s %%(u)s %%(t)s "%%(r)s" %%(s)s %%(b)s "%%(f)s" "%%(a)s"'
user = paul
autostart=true
autorestart=true
Each worker is an isolated process with its own memory so you can't really share the same logger across different workers.
Your code runs inside these workers because the master process only cares about managing the workers.
The master process is a simple loop that listens for various process signals and reacts accordingly. It manages the list of running workers by listening for signals like TTIN, TTOU, and CHLD. TTIN and TTOU tell the master to increase or decrease the number of running workers.
In Gunicorn itself, there are two main run modes:
Sync
Async
So this is different from threading; this is multiprocessing.
However, since Gunicorn 19, a threads option can be used to process requests in multiple threads. Using threads assumes use of the gthread worker.
With this in mind, the logging code will be written once and will be invoked each time a new worker is created. You can use the Singleton pattern to ensure the same logger instance is used inside the same worker.
For configuring the logger itself, you just need to follow the normal process of setting the root logger level and the levels of the individual loggers.
basicConfig() won't affect the root handler if it's already set up:
This function does nothing if the root logger already has handlers configured for it.
To set the level on root explicitly do
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(name)
Then the level can be set on the handler or on the logger itself.
handler = logging.handlers.TimedRotatingFileHandler(log_path, when='midnight', backupCount=30)
handler.setLevel(min_level)
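Putting those pieces together, a minimal per-worker sketch might look like this (the log path, logger name, and format string are placeholders; each gunicorn worker process runs this once at import time):
import logging
import logging.handlers

LOG_PATH = '/var/log/myapp/app.log'          # hypothetical path

logging.basicConfig(level=logging.INFO)      # root level, set once per worker process
logger = logging.getLogger('myapp')          # the same name returns the same logger within a process

handler = logging.handlers.TimedRotatingFileHandler(LOG_PATH, when='midnight', backupCount=30)
handler.setLevel(logging.DEBUG)
handler.setFormatter(logging.Formatter('%(asctime)s pid=%(process)d %(levelname)s %(name)s: %(message)s'))

logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.debug('worker logging configured')    # %(process)d tags each line with the worker's pid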
You can check this similar answer for logging-related details: Set logging levels
More resources:
http://docs.gunicorn.org/en/stable/design.html

Long running script from flask endpoint

I've been pulling my hair out trying to figure this one out, hoping someone else has already encountered this and knows how to solve it :)
I'm trying to build a very simple Flask endpoint that just needs to call a long-running, blocking PHP script (think while true {...}). I've tried a few different methods to launch the script asynchronously, but the problem is that my browser never actually receives the response back, even though the code for generating the response after running the script is executed.
I've tried using both multiprocessing and threading; neither seems to work:
# multiprocessing attempt
@app.route('/endpoint')
def endpoint():
    def worker():
        subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)

    p = multiprocessing.Process(target=worker)
    print '111111'
    p.start()
    print '222222'
    return json.dumps({
        'success': True
    })
# threading attempt
@app.route('/endpoint')
def endpoint():
    def thread_func():
        subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)

    t = threading.Thread(target=thread_func)
    print '111111'
    t.start()
    print '222222'
    return json.dumps({
        'success': True
    })
In both scenarios I see the 111111 and 222222, yet my browser still hangs on the response from the endpoint. I've tried p.daemon = True for the process, as well as p.terminate(), but no luck. I had hoped launching a script with nohup in a different shell and a separate process/thread would just work, but somehow Flask or uWSGI is affected by it.
Update
Since this does work locally on my Mac when I start my Flask app directly with python app.py and hit it directly without going through my Nginx proxy and uWSGI, I'm starting to believe it may not be the code itself that has the issue. And because my Nginx just forwards the request to uWSGI, I believe it may be something there that's causing it.
Here is my ini configuration for the domain for uWSGI, which I'm running in emperor mode:
[uwsgi]
protocol = uwsgi
max-requests = 5000
chmod-socket = 660
master = True
vacuum = True
enable-threads = True
auto-procname = True
procname-prefix = michael-
chdir = /srv/www/mysite.com
module = app
callable = app
socket = /tmp/mysite.com.sock
This kind of thing is the actual, and probably the main, use case for Python Celery (https://docs.celeryproject.org/). As a general rule, do not run long-running, CPU-bound jobs in the wsgi process. It's tricky, it's inefficient, and most importantly, it's more complicated than setting up an async task in a Celery worker. If you just want to prototype, you can set the broker to memory and not use an external server, or run a single-threaded Redis on the very same machine.
This way you can launch the task and call task.result(), which is blocking but blocks in an IO-bound fashion, or even better you can return immediately by retrieving the task_id and building a second endpoint /result?task_id=<task_id> that checks whether the result is available:
from celery.result import AsyncResult

result = AsyncResult(task_id, app=app)
if result.state == "SUCCESS":
    return result.get()
else:
    return result.state  # or do something else depending on the state
This way you have a non-blocking wsgi app that does what it is best suited for: short, CPU-light calls that involve at most IO with OS-level scheduling. You can then rely directly on the wsgi server's workers/processes/threads (whatever you need to scale the API in uwsgi, gunicorn, etc.) for 99% of workloads, while Celery scales horizontally by increasing the number of worker processes.
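To make that concrete, here is a minimal sketch under those assumptions. The task name run_php_script, the broker/backend URLs, and the /run and /result routes are all illustrative, not part of the answer above:
import subprocess
from celery import Celery
from celery.result import AsyncResult
from flask import Flask, jsonify, request

celery_app = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/0')
flask_app = Flask(__name__)

@celery_app.task
def run_php_script():
    # The long-running, blocking work lives in the Celery worker, not the wsgi process.
    subprocess.run(['php', 'script.php'], check=True)

@flask_app.route('/run')
def run():
    task = run_php_script.delay()          # enqueue only; the request returns immediately
    return jsonify({'task_id': task.id})

@flask_app.route('/result')
def result():
    res = AsyncResult(request.args['task_id'], app=celery_app)
    if res.state == 'SUCCESS':
        return jsonify({'state': res.state, 'result': res.get()})
    return jsonify({'state': res.state})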
This approach works for me: it calls the timeout command (sleep 10s) on the command line and lets it work in the background, and it returns the response immediately.
@app.route('/endpoint1')
def endpoint1():
    subprocess.Popen('timeout 10', shell=True)
    return 'success1'
However, I have not tested this on a WSGI server, only locally.
Would it be enough to use a background task? Then you only need to import threading, e.g.:
import threading
import ....

def endpoint():
    """My endpoint."""
    try:
        t = BackgroundTasks()
        t.start()
    except RuntimeError as exception:
        return f"An error occurred during endpoint: {exception}", 400
    return "successfully started.", 200

class BackgroundTasks(threading.Thread):
    def run(self, *args, **kwargs):
        ...  # do long running stuff

Heroku database management from worker

Wondering if anyone can help me with this or at least guide me in the right direction.
I currently have a web and a worker process running. I need a task to run 24/7 while the dynos are online; its job is to access the database and remove records that have expired by checking each record's "expiry" value against the current timestamp.
My worker.py file:
import os
import redis
from rq import Worker, Queue, Connection

listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
This is as shown in the Heroku documentation.
Then in my app.py:
from rq import Queue
from worker import conn
from datetime import datetime

q = Queue(connection=conn)

def myFunction():
    while True:
        for item in Users.query.all():
            if int(item.expiry) < (datetime.now().timestamp()):
                db.session.delete(item)
                db.session.commit()

if __name__ == '__main__':
    q.enqueue(myFunction)
    app.run()
My Procfile looks like this:
web: gunicorn app:app
worker: python worker.py
When I run this, expired records are not removed from the database. Is there any way I can solve this or diagnose the issue further?
The code that enqueues your task is inside the if __name__ == '__main__': block, so it only runs when your script is run directly, e.g. via python app.py. But you are running this on Heroku via the Procfile, which loads it as a module into gunicorn, so that code is never executed. You need to put it somewhere else.
Note, though, that I can't see any reason for using rq here at all. It is meant for creating workers that dynamically run offline tasks when they are enqueued by your web processes. But you seem to want one function to run continuously; rq is irrelevant here, and you should just run that code directly via the Procfile.
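For instance, a minimal sketch of that idea. The file name clean_expired.py, the Procfile entry, the sleep interval, and the imports from app are assumptions; the deletion loop itself is the one from the question:
# clean_expired.py -- run from its own Procfile entry, e.g.  worker: python clean_expired.py
import time
from datetime import datetime

from app import app, db, Users   # assumes app.py exposes the Flask app, db, and the Users model

def clean_expired():
    for item in Users.query.all():
        if int(item.expiry) < datetime.now().timestamp():
            db.session.delete(item)
    db.session.commit()

if __name__ == '__main__':
    while True:
        with app.app_context():   # Flask-SQLAlchemy queries need an application context
            clean_expired()
        time.sleep(60)            # arbitrary pause instead of a hot loop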

Gunicorn: do not reload worker

Can I tell Gunicorn to fail when one of the workers fails to boot? I don't want gunicorn to automatically handle and reload the worker for me; I want it to fail rather than try to launch the worker again and again. Should I raise a specific exception to the master process or send some signal? Or can I provide a command-line argument when launching the master process?
I want to implement something like this logic in the worker:
if cond():
    sys.exit(1)
and then have all of gunicorn stop without relaunching this one worker.
So, the solution is to use Gunicorn hooks. There are a lot of them, but for this particular case you can use the worker_int hook.
An example of usage might be the following (a simplified version, launched with gunicorn app_module:app --config gunicorn_config.py). Content of gunicorn_config.py:
import sys

workers = 1
loglevel = 'debug'

def worker_int(worker):
    print('Exit because of worker failure')
    sys.exit(1)
And your worker code might be a simple Flask app, for example (content of app_module.py):
from flask import Flask
app = Flask(__name__)
Other useful hooks (sketched briefly after this list):
on_exit - before exiting gunicorn
pre_request - before a worker processes the request
on_starting - before the master process is initialized
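For orientation, a minimal sketch of how these hooks could look in the same gunicorn_config.py (the log messages are placeholders):
def on_starting(server):
    # runs in the master just before it is initialized
    server.log.info('Master is starting')

def pre_request(worker, req):
    # runs in a worker just before it processes each request
    worker.log.debug('%s %s', req.method, req.path)

def on_exit(server):
    # runs just before gunicorn exits
    server.log.info('Shutting down')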
That's it!
