event loop is closed in a celery worker

event loop is closed in a celery worker - python

I am facing multiple issues using an async ODM inside my celery worker
First i wasn't able to init my database models using celery worker signal
i am using beanie for the db connection.
First Implementation
from asyncer import syncify
from asgiref.sync import async_to_sync
client = AsyncIOMotorClient(
DATABASE_URL, uuidRepresentation="standard" )
db = client[DB_NAME]
async def db_session():
await init_beanie(
database=db,
document_models=[Project, User],
)
#worker_ready.connect
def startup_celery_ecosystem(**kwargs):
logger.info('Startup celery worker process')
async_to_sync(db_session)()
logger.info('FINISHED : Startup celery worker process')
async def get_users():
users = User.find()
users_list = await users.to_list()
return users_list
#celery_app.task
def pool_db():
async_to_sync(get_users)()
#syncify(get_users)() same error User class is not initialized yet (init_beanie should have already initialized all the models )
With this implementation i could not access my database using the User and Project class and it raises an error as if User and Project haven't been instantiated yet
The workaround is to call db_session() at the module level which solve the problem with database models instantiation, But now when querying the database i get the following error from my celery task
RuntimeError: Event loop is closed
Second Implementation
from asyncer import syncify
from asgiref.sync import async_to_sync client = AsyncIOMotorClient(
DATABASE_URL, uuidRepresentation="standard" )
db = client[DB_NAME]
async def db_session():
await init_beanie(
database=db,
document_models=[Project, User],
)
# now init_beanie at module level
async_to_sync(db_session)()
async def get_users():
users = User.find()
users_list = await users.to_list()
return users_list
#celery_app.task
def pool_db():
# this raises the following Runtime error RuntimeError('Event loop is closed')
async_to_sync(get_users)()
#syncify(get_users)() same error
i am not very familiar with how asyncio is implemented and how asyncer and asgiref allows to run async code inside a sync thread which left me confused, any help would be appriciated

After many investigation using flower for monitoring workers and logging the workers Id ( processes ids) it turns out that Celery worker itself does not process any tasks, it spawns other child processes ( this is my case because i am using the default executor pool which is prefork), while the signal ( worker_ready.connect ) is only run on the supervisor process Celery worker and not the childs, and since processes are isoleted memory wise, this means that you can't have access to db connection or any initialized ressources from the child processes.
Now i am using celery with gevent which only spawn 1 process, because initially my project doesn't require CPU heavy tasks which means i don't need all the cpu power provided by the prefork pool

Related

How to determine the name of a task in celery?

I have a fastAPI app where I want to call a celery task
I can not import the task as they are in two different code base. So I have to call it using its name.
in tasks.py
imagery = Celery(
"imagery", broker=os.getenv("BROKER_URL"), backend=os.getenv("REDIS_URL")
)
...
#imagery.task(bind=True, name="filter")
def filter_task(self, **kwargs) -> Dict[str, Any]:
print('running task')
The celery worker is running with this command:
celery worker -A worker.imagery -P threads --loglevel=INFO --queues=imagery
Now in my FastAPI code base I want to run the filter task.
So my understanding is I have to use the celery.send_task() function
In app.py I have
from celery import Celery, states
from celery.execute import send_task
from fastapi import FastAPI
from starlette.responses import JSONResponse, PlainTextResponse
from app import models
app = FastAPI()
tasks = Celery(broker=os.getenv("BROKER_URL"), backend=os.getenv("REDIS_URL"))
#app.post("/filter", status_code=201)
async def upload_images(data: models.FilterProductsModel):
"""
TODO: use a celery task(s) to query the database and upload the results to S3
"""
data = ['ok', 'un test']
data = ['ok', 'un test']
result = tasks.send_task('workers.imagery.filter', args=list(data))
return PlainTextResponse(f"here is the id: {str(result.ready())}")
After calling the /filter endpoint, I don't see any task being picked up by the worker.
So I tried different name in send_task()
filter
imagery.filter
worker.imagery.filter
How come my task never get picked up by the worker and nothing shows in the log?
Is my task name wrong?
Edit:
The worker process run in docker. Here is the fullpath of the file on its disk.
tasks.py : /workers/worker.py
So if I follow the import schema. the name of the task would be workers.worker.filter but this does not work, nothing get printed in the logs of docker. Is a print supposed to appear in the STDOUT of the celery cli?

Your Celery worker is subscribed to the imagery queue only . On the other hand, you try to send the task to the default queue (if you did not change configuration, the name of that queue is celery) with result = tasks.send_task('workers.imagery.filter', args=list(data)). It is not surprising you do not see task being executed by your worker as you have been sending tasks to the default queue whole time.
To fix this, try the following:
result = tasks.send_task('workers.imagery.filter', args=list(data), queue='imagery')

OP Here.
This is the solution I used.
task = signature("filter", kwargs=data.dict() ,queue="imagery")
res = task.delay()
As mentioned by #DejanLekic I had to specify the queue.

Aborting code execution in a Python Process without terminating the process

Let's say I have a (websocket) API, api.py, as such:
from flask import Flask, request
from flask_socketio import SocketIO, emit
from worker import Worker
app = Flask()
socketio = SocketIO(app)
worker = Worker()
worker.start()
#socketio.on('connect')
def connect():
print("Client", request.sid, "connected")
#socketio.on('get_results')
def get_results(query):
"""
The only endpoing of the API.
"""
print("Client", request.sid, "requested results for query", query)
# Set the worker to work, wait for results to be ready, and
# send the results back to the client.
worker.task_queue.put(query)
results = worker.result_queue.get()
emit("results", results)
#socketio.on('disconnect')
def disconnect():
print("Client", request.sid, "disconnected, perhaps before results where ready")
# What to do here?
socketio.run(app, host='')
The a API will serve many clients but only has a single worker to produce the results that should be served. worker.py:
from multiprocessing import Process, Queue
class Worker(Process):
def __init__(self):
super().__init__()
self.task_queue = Queue()
self.result_queue = Queue()
self.some_stateful_variable = 0
# Do other computationally expensive work
def reset_state(self):
# Computationally inexpensive.
pass
def do_work(self, task):
# Computationally expensive. Takes long time.
# Modifies internal state.
pass
def run(self):
while True:
task = self.task_queue.get()
results = self.do_work(task)
self.result_queue.put(results)
The worker gets a request, i.e. a task to do, and sets forth producing a result. When the result is ready, the client will be served it.
But not all clients are patient. They may leave, i.e. disconnect from the API, before the results are ready. They don't want them, and the worker therefore ends up working on a task that does not need to finish. That makes other client in queue wait unnecessarily. How to avoid this situation, and get the worker to abort executing do_work for a task that does not need to finish?

In client side: when user closes browser tab or leave the page send request to your Flask server, the request should contain id of the task you would like to cancel.
In server side put cancel status for the task in database or any shared variable between Flask Server and your Worker Process
Divide Task processing in several stages and check status of task in database before each stage, if status is canceled - stop the task processing.
Another choice for point 1 is to do some monitoring in Server side in separate Process - count interval between status requests from client side.

I've handled similar problems by launching an entirely separate process via:
sp.call('start python path\\worker.py', shell=True)
worker.py would then report its PID back to the api.py via redis, then its straightforward to kill the process at any point from api.py
Of course, how viable that is for you will depend on how much data resides within api.py and is shared to worker.py - whether its feasible for that to also pass via redis or not is for you to decide.
The added benefit is you decouple socket from heavy compute - and you can go quasi-multi-core (single thread per worker.py). You could go full multi core by incorporating multiprocessing into each worker.py if you wished.

Python Flask API with stomp listener multiprocessing

TLDR:
I need to setup a flask app for multiprocessing such that the API and stomp queue listener are running in separate processes and therefore not interfering with each other's operations.
Details:
I am building a python flask app that has API endpoints and also creates a message queue listener to connect to an activemq queue with the stomp package.
I need to implement multiprocessing such that the API and listener do not block each other's operation. That way the API will accept new requests and the listener will continue to listen for new messages and carry out tasks accordingly.
A simplified version of the code is shown below (some details are omitted for brevity).
Problem: The multiprocessing is causing the application to get stuck. The worker's run method is not called consistently, and therefore the listener never gets created.
# Start the worker as a subprocess -- this is not working -- app gets stuck before the worker's run method is called
m = Manager()
shared_state = m.dict()
worker = MyWorker(shared_state=shared_state)
worker.start()
After several days of troubleshooting I suspect the problem is due to the multiprocessing not being setup correctly. I was able to prove that this is the case because when I stripped out all of the multiprocessing code and called the worker's run method directly, the all of the queue management code is working correctly, the CustomWorker module creates the listener, creates the message, and picks up the message. I think this indicates that the queue management code is working correctly and the source of the problem is most likely due to the multiprocessing.
# Removing the multiprocessing and calling the worker's run method directly works without getting stuck so the issue is likely due to multiprocessing not being setup correctly
worker = MyWorker()
worker.run()
Here is the code I have so far:
App
This part of the code creates the API and attempts to create a new process to create the queue listener. The 'custom_worker_utils' module is a custom module that creates the stomp listener in the CustomWorker() class run method.
from flask import Flask, request, make_response, jsonify
from flask_restx import Resource, Api
import sys, os, logging, time
basedir = os.path.dirname(os.getcwd())
sys.path.append('..')
from custom_worker_utils.custom_worker_utils import *
from multiprocessing import Manager
# app.py
def create_app():
app = Flask(__name__)
app.config['BASE_DIR'] = basedir
api = Api(app, version='1.0', title='MPS Worker', description='MPS Common Worker')
logger = get_logger()
'''
This is a placeholder to trigger the sending of a message to the first queue
'''
#api.route('/initialapicall', endpoint="initialapicall", methods=['GET', 'POST', 'PUT', 'DELETE'])
class InitialApiCall(Resource):
#Sends a message to the queue
def get(self, *args, **kwargs):
mqconn = get_mq_connection()
message = create_queue_message(initial_tracker_file)
mqconn.send('/queue/test1', message, headers = {"persistent":"true"})
return make_response(jsonify({'message': 'Initial Test Call Worked!'}), 200)
# Start the worker as a subprocess -- this is not working -- app gets stuck before the worker's run method is called
m = Manager()
shared_state = m.dict()
worker = MyWorker(shared_state=shared_state)
worker.start()
# Removing the multiprocessing and calling the worker's run method directly works without getting stuck so the issue is likely due to multiprocessing not being setup correctly
#worker = MyWorker()
#worker.run()
return app
Custom worker utils
The run() method is called, connects to the queue and creates the listener with the stomp package
# custom_worker_utils.py
from multiprocessing import Manager, Process
from _datetime import datetime
import os, time, json, stomp, requests, logging, random
'''
The listener
'''
class MyListener(stomp.ConnectionListener):
def __init__(self, p):
self.process = p
self.logger = p.logger
self.conn = p.mqconn
self.conn.connect(_user, _password, wait=True)
self.subscribe_to_queue()
def on_message(self, headers, message):
message_data = json.loads(message)
ticket_id = message_data[constants.TICKET_ID]
prev_status = message_data[constants.PREVIOUS_STEP_STATUS]
task_name = message_data[constants.TASK_NAME]
#Run the service
if prev_status == "success":
resp = self.process.do_task(ticket_id, task_name)
elif hasattr(self, 'revert_task'):
resp = self.process.revert_task(ticket_id, task_name)
else:
resp = True
if (resp):
self.logger.debug('Acknowledging')
self.logger.debug(resp)
self.conn.ack(headers['message-id'], self.process.conn_id)
else:
self.conn.nack(headers['message-id'], self.process.conn_id)
def on_disconnected(self):
self.conn.connect('admin', 'admin', wait=True)
self.subscribe_to_queue()
def subscribe_to_queue(self):
queue = os.getenv('QUEUE_NAME')
self.conn.subscribe(destination=queue, id=self.process.conn_id, ack='client-individual')
def get_mq_connection():
conn = stomp.Connection([(_host, _port)], heartbeats=(4000, 4000))
conn.connect(_user, _password, wait=True)
return conn
class CustomWorker(Process):
def __init__(self, **kwargs):
super(CustomWorker, self).__init__()
self.logger = logging.getLogger("Worker Log")
log_level = os.getenv('LOG_LEVEL', 'WARN')
self.logger.setLevel(log_level)
self.mqconn = get_mq_connection()
self.conn_id = random.randrange(1,100)
for k, v in kwargs.items():
setattr(self, k, v)
def revert_task(self, ticket_id, task_name):
# If the subclass does not implement this,
# then there is nothing to undo so just return True
return True
def run(self):
lst = MyListener(self)
self.mqconn.set_listener('queue_listener', lst)
while True:
pass

Seems like Celery is excatly what you need.
Celery is a task queue that can distribute work across worker-processes and even across machines.
Miguel Grinberg created a great post about that, Showing how to accept tasks via flask and spawn them using Celery as tasks.
Good Luck!

To resolve this issue I have decided to run the flask API and the message queue listener as two entirely separate applications in the same docker container. I have installed and configured supervisord to start and the processes individually.
[supervisord]
nodaemon=true
logfile=/home/appuser/logs/supervisord.log
[program:gunicorn]
command=gunicorn -w 1 -c gunicorn.conf.py "app:create_app()" -b 0.0.0.0:8081 --timeout 10000
directory=/home/appuser/app
user=appuser
autostart=true
autorestart=true
stdout_logfile=/home/appuser/logs/supervisord_worker_stdout.log
stderr_logfile=/home/appuser/logs/supervisord_worker_stderr.log
[program:mqlistener]
command=python3 start_listener.py
directory=/home/appuser/mqlistener
user=appuser
autostart=true
autorestart=true
stdout_logfile=/home/appuser/logs/supervisord_mqlistener_stdout.log
stderr_logfile=/home/appuser/logs/supervisord_mqlistener_stderr.log

Django Celery get task count

I am currently using django with celery and everything works fine.
However I want to be able to give the users an opportunity to cancel a task if the server is overloaded by checking how many tasks are currently scheduled.
How can I achieve this ?
I am using redis as broker.
I just found this :
Retrieve list of tasks in a queue in Celery
It is somehow relate to my issue but I don't need to list the tasks , just count them :)

Here is how you can get the number of messages in a queue using celery that is broker-agnostic.
By using connection_or_acquire, you can minimize the number of open connections to your broker by utilizing celery's internal connection pooling.
celery = Celery(app)
with celery.connection_or_acquire() as conn:
conn.default_channel.queue_declare(
queue='my-queue', passive=True).message_count
You can also extend Celery to provide this functionality:
from celery import Celery as _Celery
class Celery(_Celery)
def get_message_count(self, queue):
'''
Raises: amqp.exceptions.NotFound: if queue does not exist
'''
with self.connection_or_acquire() as conn:
return conn.default_channel.queue_declare(
queue=queue, passive=True).message_count
celery = Celery(app)
num_messages = celery.get_message_count('my-queue')

If your broker is configured as redis://localhost:6379/1, and your tasks are submitted to the general celery queue, then you can get the length by the following means:
import redis
queue_name = "celery"
client = redis.Redis(host="localhost", port=6379, db=1)
length = client.llen(queue_name)
Or, from a shell script (good for monitors and such):
$ redis-cli -n 1 -h localhost -p 6379 llen celery

If you have already configured redis in your app, you can try this:
from celery import Celery
QUEUE_NAME = 'celery'
celery = Celery(app)
client = celery.connection().channel().client
length = client.llen(QUEUE_NAME)

Get a redis client instance used by Celery, then check the queue length. Don't forget to release the connection every time you use it (use .acquire):
# Get a configured instance of celery:
from project.celery import app as celery_app
def get_celery_queue_len(queue_name):
with celery_app.pool.acquire(block=True) as conn:
return conn.default_channel.client.llen(queue_name)
Always acquire a connection from the pool, don't create it manually. Otherwise, your redis server will run out of connection slots and this will kill your other clients.

I'll expand on the answer of #StephenFuhry around the not-found error, because more or less broker-agnostic way of retrieving queue length is beneficial even if Celery suggests to mess with brokers directly. In Celery 4 (with Redis broker) this error looks like:
ChannelError: Channel.queue_declare: (404) NOT_FOUND - no queue 'NAME' in vhost '/'
Observations:
ChannelError is a kombu exception (if fact, it's amqp's and kombu "re-exports" it).
On Redis broker Celery/Kombu represent queues as Redis lists
Redis collection type keys are removed whenever the collection becomes empty
If we look at what queue_declare does, it has these lines:
if passive and not self._has_queue(queue, **kwargs):
raise ChannelError(...)
Kombu Redis virtual transport's _has_queue is this:
def _has_queue(self, queue, **kwargs):
with self.conn_or_acquire() as client:
with client.pipeline() as pipe:
for pri in self.priority_steps:
pipe = pipe.exists(self._q_for_pri(queue, pri))
return any(pipe.execute())
The conclusion is that on a Redis broker ChannelError raised from queue_declare is okay (for an existing queue of course), and just means that the queue is empty.
Here's an example of how to output all active Celery queues' lengths (normally should be 0, unless your worker can't cope with the tasks).
from kombu.exceptions import ChannelError
def get_queue_length(name):
with celery_app.connection_or_acquire() as conn:
try:
ok_nt = conn.default_channel.queue_declare(queue=name, passive=True)
except ChannelError:
return 0
else:
return ok_nt.message_count
for queue_info in celery_app.control.inspect().active_queues().values():
print(queue_info[0]['name'], get_queue_length(queue_info[0]['name']))

Have Celery broadcast return results from all workers

Is there a way to get all the results from every worker on a Celery Broadcast task? I would like to monitor if everything went ok on all the workers. A list of workers that the task was send to would also be appreciated.

No, that is not easily possible.
But you don't have to limit yourself to the built-in amqp result backend,
you can send your own results using Kombu (http://kombu.readthedocs.org),
which is the messaging library used by Celery:
from celery import Celery
from kombu import Exchange
results_exchange = Exchange('myres', type='fanout')
app = Celery()
#app.task(ignore_result=True)
def something():
res = do_something()
with app.producer_or_acquire(block=True) as producer:
producer.send(
{'result': res},
exchange=results_exchange,
serializer='json',
declare=[results_exchange],
)
producer_or_acquire will create a new kombu.Producer using the celery
broker connection pool.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

event loop is closed in a celery worker - python

Related

How to determine the name of a task in celery?

Aborting code execution in a Python Process without terminating the process

Python Flask API with stomp listener multiprocessing

Django Celery get task count

Have Celery broadcast return results from all workers

Categories

Resources