Gunicorn do not reload worker - python

Can I tell the Gunicorn to fail when one of the workers failed to boot? I don't want gunicorn to automatically handle and reload the worker for me, but I want it to fail not trying to launch worker again and again. Should I raise any specific exception to master process or some signal? Or I can provide a command line argument when launching master process?
I want to implement smth like this logic in worker:
if cond():
sys.exit(1)
and then all the gunicorn to stop without relaunching this one worker

So, the solution is to use Gunicorn hooks. There are a lot of them but for this particular case you can use worker_int hook.
Example of usage might be the following (simplified version, launched with gunicorn app_module:app --config gunicorn_config.py), content of gunicorn_config.py:
import sys
workers = 1
loglevel = 'debug'
def worker_int(worker):
print('Exit because of worker failure')
sys.exit(1)
And you worker code might be a simple Flask app for example (content of app_module.py:
from flask import Flask
app = Flask()
Other useful hooks:
on_exit - before exiting gunicorn
pre_request - before a worker processes the request
on_starting - before the master process is initialized
That's it!

Related

APScheduler does not run scheduled tasks: Flask + uWSGI

I have an application on a Flask and uWSGI with a jobstore in a SQLite. I start the scheduler along with the application, and add new tasks through add_task when some url is visited.
I see that the tasks are saved correctly in the jobstore, I can view them through the API, but it does not execute at the appointed time.
A few important data:
uwsgi.ini
processes = 1
enable-threads = true
__init__.py
scheduler = APScheduler()
scheduler.init_app(app)
with app.app_context():
scheduler.start()
main.py
scheduler.add_job(
id='{}{}'.format(test.id, g.user.id),
func = pay_day,
args = [test.id, g.user.id],
trigger ='interval',
minutes=test.timer
)
in service.py
def pay_day(tid, uid):
with scheduler.app.app_context():
*some code here*
Interesting behavior: if you create a task by going to the URL and restart the application after that, the task will be executed. But if the application is running and one of the users creates a task by going to the URL, then this task will not be completed until the application is restarted.
I don't get any errors or exceptions, even in the scheduler logs.
I already have no idea how to make it work and what I did wrong. I need a hint.
uWSGI employs some tricks which disable the Global Interpreter Lock and with it, the use of threads which are vital to the operation of APScheduler. To fix this, you need to re-enable the GIL using the --enable-threads switch. See the uWSGI documentation for more details.
I know that you had enable-threads = true in uwsgi.ini, but try the to enable it using the command line.

Background worker process crashing

I'm creating a web app that takes username as input and perform some tasks with it. I'm using flask for the web and heroku to deploy it.I've two main python scripts these run the app. app.py and task.py. Everything is okey into my app.py file, where i used flask code for making the app but into my task.py file I've a
Initialisation step
from instabot import Bot
bot = Bot()
bot.login(username="myusername", password="mypassword")
now I deployed these script into the Procfile like
worker: python task.py
web: gunicorn app:app
but after deploying when i check it's log I'm getting
logs
2022-01-28T18:30:06.376795+00:00
heroku[worker.1]: Starting process
with command `python task.py`
2022-01-
28T18:30:07.081356+00:00
heroku[worker.1]: State changed
from starting to up
2022-01-
28T18:30:08.213687+00:00
heroku[worker.1]: Process exited
with status 0
2022-01-
28T18:30:08.274498+00:00
heroku[worker.1]: State changed
from up to crashed
as you can see it's getting crashed!! i don't know what's my mistake is can you please help me to figure out : (
The code in my worker is almost stolen from the redis-rq documentation, with some changes to make it work also in our local development enviroment:
import os
from rq import Queue, Connection
from tools import get_redis_store
# we need to use a different worker when we are in Heroku
if 'DYNO' in os.environ:
from rq.worker import HerokuWorker as Worker
else:
from rq import Worker
redis_url = os.getenv('REDIS_URL')
if not redis_url:
raise RuntimeError('Set up Redis first.')
listen = ['default']
conn = get_redis_store()
if __name__ == '__main__':
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
On tools.py
_redis_store = None
def get_redis_store():
'''
Get a connection pool to redis based on the url configured
on env variable REDIS_URL
Returns
-------
redis.ConnectionPool
'''
global _redis_store
if not _redis_store:
redis_url = os.getenv('REDIS_URL')
if redis_url:
logger.debug('starting redis: %s ' % redis_url)
if redis_url.startswith('rediss://'):
# redis 6 encripted connection
# heroku needs disable ssl verification
_redis_store = redis.from_url(
redis_url, ssl_cert_reqs=None)
else:
_redis_store = redis.from_url(redis_url)
else:
logger.debug('redis not configured')
return _redis_store
On the main app, to queue a task:
from rq import Queue
from tools import get_redis_store
redis_store = get_redis_store()
queue = Queue(connection=redis_store)
def _slow_task(param1, param2, paramn):
# here I execute the slow task
my_slow_code(param1, param2)
# when I need to execute the slow task
queue.enqueue(_slow_task, param1, param2)
There are multiple options to use redis on Heroku, depending on what you use, you could need to change the env variable REDIS_URL. The environment variable DYNO is checked to know if the code is running on Heroku or in a developer machine.

How to use the logging module in Python with gunicorn

I have a flask-based app. When I run it locally, I run it from the command line, but when I deploy it, I start it with gunicorn with multiple workers.
I want to use the logging module to log to a file. The docs I've found for this are https://docs.python.org/3/library/logging.html and https://docs.python.org/3/howto/logging-cookbook.html .
I am confused over the correct way to use logging when my app may be launched with gunicorn. The docs address threading but assume I have control of the master process. Points of confusion:
Will logger = logging.getLogger('myapp') return the same logger object in different gunicorn worker threads?
If I am attaching a logging FileHandler to my logger in order to log to a file, how can I avoid doing this multiple times in the different workers?
My understanding - which may be wrong - is that if I just call logger.setLevel(logging.DEBUG), this will send messages via the root logger which may have a higher default logging level and may ignore debug messages, and so I also need to call logging.basicConfig(logging.DEBUG) in order for my debug messages to get through. But the docs say not to call logging.basicConfig() from a thread. How can I correctly set the root logging level when using gunicorn? Or do I not need to?
This is my typical Flask/Gunicorn setup. Note gunicorn is ran via supervisor.
wsgi_web.py. Note ProxyFix to get a client's real IP address from Nginx.
from werkzeug.contrib.fixers import ProxyFix
from app import create_app
import logging
gunicorn_logger = logging.getLogger('gunicorn.error')
application = create_app(logger_override=gunicorn_logger)
application.wsgi_app = ProxyFix(application.wsgi_app, num_proxies=1)
Edit February 2020, for newer versions of werkzeug use the following and adjust the parameters to ProxyFix as necessary:
from werkzeug.middleware.proxy_fix import ProxyFix
from app import create_app
import logging
gunicorn_logger = logging.getLogger('gunicorn.error')
application = create_app(logger_override=gunicorn_logger)
application.wsgi_app = ProxyFix(application.wsgi_app, x_for=1, x_host=1)
Flask application factory create_app
def create_app(logger_override=None):
app = Flask(__name__)
if logger_override:
# working solely with the flask logger
app.logger.handlers = logger_override.handlers
app.logger.setLevel(logger_override.level)
# OR, working with multiple loggers
# for logger in (app.logger, logging.getLogger('sqlalchemy')):
# logger.handlers = logger_override.handlers
# logger.setLevel(logger_override.level)
# more
return app
Gunicorn command (4th line) within supervisor conf, note the --log-level parameter has been set to info in this instance. Note X-REAL-IP passed to access --access-logformat
[program:web]
directory = /home/paul/www/example
environment = APP_SETTINGS="app.config.ProductionConfig"
command = /home/paul/.virtualenvs/example/bin/gunicorn wsgi_web:application -b localhost:8000 --workers 3 --worker-class gevent --keep-alive 10 --log-level info --access-logfile /home/paul/www/logs/admin.gunicorn.access.log --error-logfile /home/paul/www/logs/admin.gunicorn.error.log --access-logformat '%%({X-REAL-IP}i)s %%(l)s %%(u)s %%(t)s "%%(r)s" %%(s)s %%(b)s "%%(f)s" "%%(a)s"'
user = paul
autostart=true
autorestart=true
Each worker is an isolated process with its own memory so you can't really share the same logger across different workers.
Your code runs inside these workers because the master process only cares about managing the workers.
The master process is a simple loop that listens for various process
signals and reacts accordingly. It manages the list of running workers
by listening for signals like TTIN, TTOU, and CHLD. TTIN and TTOU tell
the master to increase or decrease the number of running workers.
In Gunicorn itself, there are two main run modes
Sync
Async
So this is different from threading, this is multiprocessing.
However since Gunicorn 19, a threads option can be used to process requests in
multiple threads. Using threads assumes use of the gthread worker.
With this in mind, the logging code will be written once and will be invoked multiple times each time a new worker is created. You can use Singelton pattern to ensure the same logger instance is being used inside the same worker.
For configuring the logger itself, you just need to follow the normal process of setting the root logger levels and the different loggers levels.
basicConfig() won't affect the root handler if it's already setup:
This function does nothing if the root logger already has handlers configured for it.
To set the level on root explicitly do
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(name)
Then the level can be set on the handler or logger level.
handler = logging.handlers.TimedRotatingFileHandler(log_path, when='midnight', backupCount=30)
handler.setLevel(min_level)
You can check this similar answer for logging related details
Set logging levels
More Resources :
http://docs.gunicorn.org/en/stable/design.html

Heroku database management from worker

Wondering if anyone can help me with this or at least guide me the correct way.
I currently have a web and a worker process running. I need a task to run 24/7 while the dynos are online, it's job is to access the database and remove records that have expired by checking the "expiry" value for each record against the current timestamp.
My worker.py file:
import os
import redis
from rq import Worker, Queue, Connection
listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
This is as shown by the heroku documentation.
Then in my app.py:
from rq import Queue
from worker import conn
from datetime import datetime
q = Queue(connection=conn)
def myFunction():
while True:
for item in Users.query.all():
if int(item.expiry) < (datetime.now().timestamp()):
db.session.delete(item)
db.session.commit()
If __name__ == “__main__”:
q.enqueue(myFunction)
app.run()
My profile looks like so:
web: gunicorn app:app
worker: python worker.py
When I run this, expired records are not removed from the database. Is there anyway I can solve this or diagnose the issue further?
The code that enqueues your task is inside the __name__ == “__main__” block, so it only runs when your script is run directly - eg via python app.py. But you are running this on Heroku via the procfile, which loads it as a module into gunicorn - so that code is never executed. You need to put it somewhere else.
Note though I can't see any reason for using rq here at all. That's used for creating workers that dynamically run offline tasks when enqueued by your web processes. But you seem to want one function to run continuously; rq is irrelevant here, you should just run that code directly via the procfile.

Get worker id from Gunicorn worker itself

I run multiple gunicorn workers with workers=4 setting. So as I understand I have 5 different processes: one gunicorn master process and 4 other worker processes. Can I get information about which worker is serving request at the moment? So can I get any worker id from inside worker itself like for each request send back a response with content: this request was served by worker:worker_id?
Within a worker code just use
import os
print(os.getpid())
Process id is a good enough identifier for such a case. Another option which is overkill obviously is to create a worker-id-file for each worker at this point https://docs.gunicorn.org/en/stable/settings.html?highlight=hooks#post-worker-init and read from it when needed. Do not forget to remove this file on exit https://docs.gunicorn.org/en/stable/settings.html?highlight=hooks#worker-exit
For debugging purpose, you may use post_request hook to log the worker pid
def post_response(worker, req, environ, resp):
worker.log.debug("%s", worker.pid)

Categories