Get worker id from Gunicorn worker itself - python

I run multiple Gunicorn workers with the workers=4 setting, so as I understand it I have 5 different processes: one Gunicorn master process and 4 worker processes. Can I find out which worker is serving a request at the moment? In other words, can I get some worker id from inside the worker itself, so that for each request I could send back a response with content: this request was served by worker:worker_id?

Within the worker code just use:
import os
print(os.getpid())
The process id is a good enough identifier for such a case. Another option, which is obviously overkill, is to create a worker-id file for each worker in the post_worker_init hook (https://docs.gunicorn.org/en/stable/settings.html?highlight=hooks#post-worker-init) and read from it when needed. Do not forget to remove this file in the worker_exit hook (https://docs.gunicorn.org/en/stable/settings.html?highlight=hooks#worker-exit).
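For illustration, a minimal sketch of that file-based approach in gunicorn_config.py, assuming the file path and naming scheme are free choices (Gunicorn does not mandate them):
import os

def post_worker_init(worker):
    # called just after a worker has initialized the application:
    # write one file per worker, named after its pid
    with open('/tmp/gunicorn_worker_%s' % worker.pid, 'w') as f:
        f.write(str(worker.pid))

def worker_exit(server, worker):
    # called after a worker has exited: clean the file up again
    path = '/tmp/gunicorn_worker_%s' % worker.pid
    if os.path.exists(path):
        os.remove(path)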

For debugging purposes, you can use the post_request hook to log the worker pid:
def post_request(worker, req, environ, resp):
    worker.log.debug("%s", worker.pid)

Related

Celery not processing tasks every time

I have the below configuration for Celery:
celery = Celery(__name__,
                broker=os.environ.get('CELERY_BROKER_URL', 'redis://'),
                backend=os.environ.get('CELERY_BROKER_URL', 'redis://'))
celery.config_from_object(APP_SETTINGS)
ssl = celery.conf.get('REDIS_SSL', True)
r = redis.StrictRedis(REDIS_BROKER, int(REDIS_BROKER_PORT), 0,
                      charset='utf-8', decode_responses=True, ssl=ssl)
db_uri = celery.conf.get('SQLALCHEMY_DATABASE_URI')

@celery.task
def process_task(data):
    # some code here
    ...
I am calling process_task inside an API endpoint like:
process_task.delay(data)
Sometimes it processes the tasks and sometimes it does not.
Can someone help me resolve this issue?
I am running the worker like celery worker -A api.celery --loglevel=DEBUG --concurrency=10
Once all the worker processes are busy, new tasks will just sit in the queue waiting for the next idle worker process to pick them up. This is most likely why you perceive this as "not processing tasks every time". Go through the Monitoring and Management section of the Celery documentation to find out how to monitor your Celery cluster. For starters, run celery -A api.celery inspect active to check the currently running tasks.
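If you prefer to check this from Python rather than the command line, a rough sketch using the inspect API (assuming celery is the app instance defined in api.celery):
insp = celery.control.inspect()
print(insp.active())     # tasks currently being executed by the workers
print(insp.reserved())   # tasks prefetched from the queue but not yet started
print(insp.scheduled())  # tasks waiting on an ETA or countdown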

Celery link_error raises NotRegistered exception

I have a client Celery application issuing tasks for a worker (using Redis), and it works ok. Both the client and the worker application use the same config:
app = Celery('clientApp', broker='redis://redis:6379/0', backend='redis://redis:6379/0')
# Listens to queue2
app = Celery('workerApp', broker='redis://redis:6379/0', backend='redis://redis:6379/0')
# Listens to queue1
Now I want to execute a handler on success or error, so I used something like this:
task = Signature('mytask', queue='queue1')
task.apply_async(
    link=Signature("handle_success", queue='queue2'),
    link_error=Signature("handle_error", queue='queue2'))
This calls handle_success correctly on success but does not call handle_error when mytask raises an exception. Can you see any reason why? The goal here is for the client to execute handle_error on tasks failed by the worker (just like it executes handle_success when the worker task completes successfully).
celery.exceptions.NotRegistered: 'handle_error'
I get no error or info messages when the Celery applications start, the backend is the same URL for both apps, and handle_success / handle_error correctly show up in the registered tasks for the client.
Resolved by using the hack listed in this article.
If the link_error argument is a single task, it will get executed by the worker directly (unlike link). One way to force the worker to send the task to the client is to use a chain:
from .tasks import error_callback
app.send_task("system_b.foo", link_error=(error_callback.si() | error_callback.si()))
Or use a dummy task plus helper functions to make it clearer, see the article.

How do I get a Celery worker to consume an 'outside' RabbitMQ queue?

I have the following scripts:
celery_tasks.py
from celery import Celery
app = Celery(broker='amqp://guest:guest@localhost:5672//')
app.conf.task_default_queue = 'test_queue'
@app.task(acks_late=True)
def test(a):
    return a
publish.py
from celery_tasks import test
test.delay('abc')
When I run publish.py and start the worker (celery -A celery_tasks worker --loglevel=DEBUG), the 'abc' content is published to 'test_queue' and is consumed by the worker.
Is there a way for the worker to consume something from a queue that was not posted by Celery? For example, when I put something into test_queue directly through RabbitMQ, without going through the Celery publisher, and run the Celery worker, it gives me the following warning:
[WARNING/MainProcess] Received and deleted unknown message. Wrong destination?!?
The full contents of the message body was: body: 'abc' (3b)
{content_type:None content_encoding:None
delivery_info:{'exchange': '', 'redelivered': False, 'delivery_tag': 1, 'consumer_tag': 'None2', 'routing_key': 'test_queue'} headers={}}
Is there a way to solve this?
Celery has a specific message format and a set of headers that need to be present for a message to comply with it. Therefore you would have to reverse engineer that format to craft a Celery-compliant message from a producer that is not Celery.
Keep in mind that Celery is not really made to send plain messages across the broker, but to send tasks, which are enhanced messages and therefore carry extra information in the header part of the AMQP message.
It's a late answer, but custom consumers might help you. I'm using this approach for consuming messages from RabbitMQ, where the messages are published by another app with pika.
http://docs.celeryproject.org/en/latest/userguide/extending.html#custom-message-consumers
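For reference, a minimal sketch of such a custom consumer along the lines of that guide (the queue name matches the question, but the class name and callback are illustrative):
from celery import Celery, bootsteps
from kombu import Consumer, Queue

# the queue that is filled from outside of Celery (e.g. with pika)
outside_queue = Queue('test_queue', routing_key='test_queue')

class OutsideQueueConsumer(bootsteps.ConsumerStep):
    def get_consumers(self, channel):
        return [Consumer(channel,
                         queues=[outside_queue],
                         callbacks=[self.handle_message],
                         accept=['json', 'text/plain'])]

    def handle_message(self, body, message):
        # a plain message, no Celery task protocol involved
        print('Received message: {0!r}'.format(body))
        message.ack()

app = Celery(broker='amqp://guest:guest@localhost:5672//')
app.steps['consumer'].add(OutsideQueueConsumer)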

Gunicorn do not reload worker

Can I tell Gunicorn to fail when one of the workers fails to boot? I don't want Gunicorn to automatically handle and reload the worker for me; I want it to fail instead of trying to launch the worker again and again. Should I raise a specific exception in the master process, or send it some signal? Or can I provide a command line argument when launching the master process?
I want to implement something like this logic in the worker:
if cond():
    sys.exit(1)
and then have the whole Gunicorn process stop without relaunching this one worker.
So, the solution is to use Gunicorn hooks. There are a lot of them, but for this particular case you can use the worker_int hook.
An example of usage might be the following (simplified version, launched with gunicorn app_module:app --config gunicorn_config.py), content of gunicorn_config.py:
import sys
workers = 1
loglevel = 'debug'
def worker_int(worker):
    print('Exit because of worker failure')
    sys.exit(1)
And your worker code might be a simple Flask app, for example (content of app_module.py):
from flask import Flask
app = Flask(__name__)
Other useful hooks (see the sketch after this list):
on_exit - before exiting gunicorn
pre_request - before a worker processes the request
on_starting - before the master process is initialized
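A rough sketch of how those hooks could be declared in gunicorn_config.py (the signatures follow the Gunicorn docs, the log messages are just placeholders):
def on_starting(server):
    # runs in the master, just before it is initialized
    server.log.info("master is starting")

def pre_request(worker, req):
    # runs in the worker, right before the request is processed
    worker.log.debug("about to handle %s %s", req.method, req.path)

def on_exit(server):
    # runs just before Gunicorn exits
    server.log.info("shutting down")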
That's it!

Celery can't get result from given task

I'm trying to configure Celery with RabbitMQ. The server works fine, my worker receives the task and returns a successful result, but the communication(?) fails. I'm following the First Steps from the Celery docs. I started the tasks worker and created the file tasks.py. My connection:
app = Celery('tasks', backend='amqp', broker='amqp://')
Logs inside worker (correct):
[2015-03-13 21:00:46,146: INFO/MainProcess] Task tasks.add[ee0fd026-d08e-4380-b010-9bbe65cb8b8f] succeeded in 0.00891784499981s: 4
but I can't get the result and the status is pending:
add_task = tasks.add.delay(2,2)
In [4]: add_task.status
Out[4]: 'PENDING'
add_task.status gets the state of the task as soon as you queue it (remember, you are using .delay, not executing it immediately), which will be PENDING.
To get the state of the task from the backend use AsyncResult.
res = tasks.add.AsyncResult(add_task.task_id)
This will work, unless you have set the backend to ignore task results.
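A short usage sketch, assuming the same tasks module as above (the timeout value is arbitrary):
res = tasks.add.AsyncResult(add_task.task_id)
print(res.state)           # e.g. 'PENDING', 'SUCCESS'
print(res.get(timeout=5))  # blocks until the worker's result is available (4 here)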
