Celery - Only one instance per task/process?

In the celery docs, section Instantiation (http://celery.readthedocs.org/en/latest/userguide/tasks.html#custom-task-classes) the following is stated:
A task is not instantiated for every request, but is registered in the task registry as a global instance.
This means that the __init__ constructor will only be called once per process, and that the task class is semantically closer to an Actor.
Nevertheless, when I run the following example I see that the __init__ method is called at least 3 times. What is wrong with the setup? CELERYD_CONCURRENCY = 1 should make sure that there is only one process per worker, right?
$ celery -A proj beat
celery beat v3.1.17 (Cipater) is starting.
init Task1
40878160
x=1.0
init Task1
40878352
x=1.0
init Task1
40879312
x=1.0
Configuration ->
. broker -> amqp://guest:**@localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> now (0s)
[2015-02-05 23:05:21,875: INFO/MainProcess] beat: Starting...
[2015-02-05 23:05:21,971: INFO/MainProcess] Scheduler: Sending due task task1-every-5-seconds (proj.tasks.t1)
[2015-02-05 23:05:26,972: INFO/MainProcess] Scheduler: Sending due task task1-every-5-seconds (proj.tasks.t1)
celery.py:
from __future__ import absolute_import
from datetime import timedelta
from celery import Celery

app = Celery('proj',
             broker='amqp://guest@localhost//',
             backend='amqp://',
             include=['proj.tasks'])

app.conf.update(
    CELERY_REDIRECT_STDOUTS=True,
    CELERY_TASK_RESULT_EXPIRES=60,
    CELERYD_CONCURRENCY=1,
    CELERYBEAT_SCHEDULE={
        'task1-every-5-seconds': {
            'task': 'proj.tasks.t1',
            'schedule': timedelta(seconds=5)
        },
    },
    CELERY_TIMEZONE='GMT',
)

if __name__ == '__main__':
    app.start()
tasks.py:
from __future__ import absolute_import
from proj.celery import app
from celery import Task
import time

class Foo():
    def __init__(self, x):
        self.x = x

class Task1(Task):
    abstract = True

    def __init__(self):
        print "init Task1"
        print id(self)
        self.f = Foo(1.0)
        print "x=1.0"

@app.task(base=Task1)
def t1():
    t1.f.x += 1
    print t1.f.x

So, as per your comment, you need to maintain one connection per thread.
Why not use thread-local storage then? It should be a safe solution in your case.
from threading import local

thread_storage = local()

def get_or_create_connection(*args, **kwargs):
    if not hasattr(thread_storage, 'connection'):
        thread_storage.connection = Connection(*args, **kwargs)
    return thread_storage.connection

@app.task()
def do_stuff():
    connection = get_or_create_connection('some', connection='args')
    connection.ping()
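For illustration, a small standalone sketch of the same pattern (the Connection class here is a hypothetical stand-in, since the snippet above assumes one is already importable): each thread lazily creates its own connection on first use and then reuses it.
import threading

thread_storage = threading.local()

class Connection:
    # Stand-in for a real broker/database connection.
    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs

def get_or_create_connection(*args, **kwargs):
    # First access in a given thread creates the connection; later calls reuse it.
    if not hasattr(thread_storage, 'connection'):
        thread_storage.connection = Connection(*args, **kwargs)
    return thread_storage.connection

def check():
    c1 = get_or_create_connection('some', connection='args')
    c2 = get_or_create_connection('some', connection='args')
    print(threading.current_thread().name, c1 is c2)  # True within the same thread

check()                             # MainThread True
t = threading.Thread(target=check)  # a second thread gets its own, separate instance
t.start()
t.join()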


Different queues in celery

I have a project where I'm starting my FastAPI using a file (python main.py):
import uvicorn
from configuration import API_HOST, API_PORT

if __name__ == "__main__":
    uvicorn.run("endpoints:app", host="localhost", port=8811, reload=True, access_log=False)
Inside endpoints.py I have:
from celery import Celery
from fastapi import FastAPI
import os
import threading
import time

# Create object for FastAPI
app = FastAPI(
    title="MYFASTAPI",
    description="MYDESCRIPTION",
    version=1.0,
    contact="ME!",
)

celery = Celery(__name__)
celery.conf.broker_url = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379")
celery.conf.result_backend = os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379")
celery.conf.task_track_started = True
celery.conf.task_serializer = "pickle"
celery.conf.result_serializer = "pickle"
celery.conf.accept_content = ["pickle"]
# By default Celery uses as many worker processes as the instance has CPU cores.
celery.conf.worker_concurrency = os.cpu_count()

# Start the celery worker. I start it in a separate thread, so FastAPI can run in parallel.
worker = celery.Worker()

def start_worker():
    worker.start()

ce = threading.Thread(target=start_worker)
ce.start()

@app.post("/taskA")
def taskA():
    task = ask_taskA.delay()
    return {"task_id": task.id}

@celery.task(name="ask_taskA", bind=True)
def ask_taskA(self):
    time.sleep(100)

@app.post("/get_results")
def get_results(task_id):
    task_result = celery.AsyncResult(task_id)
    return {'task_status': task_result.status}
Given this code, how can I have two different queues, assign a specific number of workers to each queue, and assign a specific task to one of these queues?
I have read that people usually run Celery as:
celery -A proj worker
but the project structure limited me because of some imports, so I ended up starting the Celery worker in a separate thread (which works perfectly).
Based on the official Celery documentation on manual routing (https://docs.celeryq.dev/en/stable/userguide/routing.html#manual-routing), you can declare the different queues like this:
from kombu import Queue

app.conf.task_default_queue = 'default'
app.conf.task_queues = (
    Queue('default', routing_key='task.#'),
    Queue('feed_tasks', routing_key='feed.#'),
)
app.conf.task_default_exchange = 'tasks'
app.conf.task_default_exchange_type = 'topic'
app.conf.task_default_routing_key = 'task.default'
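To cover the rest of the question (sending a specific task to one of these queues and giving each queue its own number of workers), here is a hedged sketch along the lines of the same routing documentation; it uses the celery app object from endpoints.py and reuses the ask_taskA task name, while the worker node names and concurrency values are illustrative assumptions:
# Route the ask_taskA task to feed_tasks; unrouted tasks keep the default queue.
celery.conf.task_routes = {
    'ask_taskA': {'queue': 'feed_tasks', 'routing_key': 'feed.import'},
}

# The queue can also be chosen per call instead of per task:
# ask_taskA.apply_async(queue='feed_tasks')
If the workers can run as standalone processes, each queue can then get its own concurrency, for example (assuming celery -A endpoints resolves to the celery object defined in endpoints.py):
celery -A endpoints worker -Q feed_tasks --concurrency=2 -n feed@%h
celery -A endpoints worker -Q default --concurrency=4 -n default@%h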

Django Celery Periodic task is not running on given crontab

I am using the below packages.
celery==5.1.2
Django==3.1
I have 2 periodic celery tasks, and I want the first task to run every 15 minutes and the second to run every 20 minutes. The problem is that the first task runs on time, while the second runs at seemingly random times.
However, I do see the scheduler message on the console on time for both tasks:
Scheduler: Sending due task <task_name> (<task_name>)
Please find the following files,
celery.py
from celery import Celery, Task

app = Celery('settings')
...

class PeriodicTask(Task):
    @classmethod
    def on_bound(cls, app):
        app.conf.beat_schedule[cls.name] = {
            "schedule": cls.run_every,
            "task": cls.name,
            "args": cls.args if hasattr(cls, "args") else (),
            "kwargs": cls.kwargs if hasattr(cls, "kwargs") else {},
            "options": cls.options if hasattr(cls, "options") else {}
        }
tasks.py
from celery.schedules import crontab
from settings.celery import app, PeriodicTask
...

@app.task(
    base=PeriodicTask,
    run_every=crontab(minute='*/15'),
    name='task1',
    options={'queue': 'queue_name'}
)
def task1():
    logger.info("task1 called")

@app.task(
    base=PeriodicTask,
    run_every=crontab(minute='*/20'),
    name='task2'
)
def task2():
    logger.info("task2 called")
Please help me to find the bug here. Thanks!

Django Celery periodic task example

I need a minimal example of a periodic task (run some function every 5 minutes, or run something at 12:00:00, etc.).
In my myapp/tasks.py, I have,
from celery.task.schedules import crontab
from celery.decorators import periodic_task
from celery import task

@periodic_task(run_every=(crontab(hour="*", minute=1)), name="run_every_1_minutes", ignore_result=True)
def return_5():
    return 5

@task
def test():
    return "test"
When I run the celery worker, it does show the tasks (given below) but does not return any values (in either the terminal or Flower).
[tasks]
. mathematica.core.tasks.test
. run_every_1_minutes
Please provide a minimal example or hints to achieve the desired result.
Background:
I have a config/celery.py which contains the following:
import os
from celery import Celery
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.local")
app = Celery('config')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
And in my config/__init__.py, I have
from .celery import app as celery_app
__all__ = ['celery_app']
I added a function like the one below in myapp/tasks.py:
from celery import task

@task
def test():
    return "test"
When I run test.delay() from the shell, it runs successfully and also shows the task information in Flower.
To run periodic tasks you should also run celery beat. You can run it with this command:
celery -A proj beat
Or if you are using one worker:
celery -A proj worker -B
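Note that celery.decorators.periodic_task and celery.task have been deprecated and removed in newer Celery releases, so on current versions the equivalent minimal setup is a plain task plus a beat_schedule entry, roughly like this sketch (assuming the config/celery.py app shown above):
from celery.schedules import crontab
from config.celery import app

@app.task(name="run_every_1_minutes", ignore_result=True)
def return_5():
    return 5

app.conf.beat_schedule = {
    "run-every-minute": {
        "task": "run_every_1_minutes",
        "schedule": crontab(),  # every minute; crontab(hour=12, minute=0) would mean 12:00 daily
    },
}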

celery feature "reply_to" don't works as expected

I need to configure which queue Celery should put the result of a task execution into. I am doing it the way described in the documentation (item "reply_to"):
@app.task(reply_to='export_task')  # <= configured right way
def test_func():
    return "here is result of task"
Expected behavior
The task result should be in a queue named "export_task" (as configured in the decorator).
Actual behavior
The task result ends up in a queue with a name like:
d5587446-0149-3133-a3ed-d9a297d52a96
celery report:
python -m celery -A my_worker report
software -> celery:3.1.24 (Cipater) kombu:3.0.37 py:3.5.1
billiard:3.3.0.23 py-amqp:1.4.9
platform -> system:Windows arch:64bit, WindowsPE imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:amqp results:rpc:///
CELERY_ACCEPT_CONTENT: ['json']
CELERY_RESULT_BACKEND: 'rpc:///'
CELERY_QUEUES:
(<unbound Queue main_check -> <unbound Exchange main_check(direct)> -> main_check>,)
CELERYD_CONCURRENCY: 10
CELERY_TASK_SERIALIZER: 'json'
CELERY_RESULT_PERSISTENT: True
CELERY_ROUTES: {
'my_worker.test_func': {'queue': 'main_check'}}
BROKER_TRANSPORT: 'amqp'
CELERYD_MAX_TASKS_PER_CHILD: 3
CELERY_RESULT_SERIALIZER: 'json'
Steps to reproduce
Create the project files.
celery_app.py:
from celery import Celery
from kombu import Exchange, Queue

app = Celery('worker')

app.conf.update(
    CELERY_ROUTES={
        'my_worker.test_func': {'queue': 'main_check'},
    },
    BROKER_TRANSPORT='amqp',
    CELERY_RESULT_BACKEND='rpc://',
    CELERY_RESULT_PERSISTENT=True,
    # CELERY_DEFAULT_DELIVERY_MODE='persistent',
    # CELERY_RESULT_EXCHANGE='export_task',
    CELERYD_CONCURRENCY=10,
    CELERYD_MAX_TASKS_PER_CHILD=3,
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    CELERY_ACCEPT_CONTENT=['json'],
    CELERY_QUEUES=(
        Queue('main_check', Exchange('main_check', type='direct'), routing_key='main_check'),
    ),
)
my_worker.py:
from celery_app import app

@app.task(reply_to='export_task')
def test_func():
    return "here is result of task"
then start celery:
python -m celery -A my_worker worker --loglevel=info
then, in a Python debug console, add a new task:
from my_worker import *
result = test_func.delay()
I asked for help on the official issue tracker, but nobody responded.
I don't see in your code where that queue (export_task) has been declared.
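For illustration, a hedged sketch of what that remark points at: declaring an export_task queue next to main_check in the question's configuration, so the name actually exists on the broker (whether this alone makes reply_to deliver results there is a separate question):
from kombu import Exchange, Queue

app.conf.update(
    CELERY_QUEUES=(
        Queue('main_check', Exchange('main_check', type='direct'), routing_key='main_check'),
        Queue('export_task', Exchange('export_task', type='direct'), routing_key='export_task'),
    ),
)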

How can I set up Celery to call a custom worker initialization?

I am quite new to Celery and I have been trying to set up a project with 2 separate queues (one to calculate and the other to execute). So far, so good.
My problem is that the workers in the execute queue need to instantiate a class with a unique object_id (one id per worker). I was wondering if I could write a custom worker initialization to initialize the object at start and keep it in memory until the worker is killed.
I found a similar question on custom_task but the proposed solution does not work in my case.
Considering the following toy example:
celery.py
from celery import Celery

app = Celery('proj',
             broker='amqp://guest@localhost//',
             backend='amqp://',
             include=['proj.tasks'])

app.conf.update(
    CELERY_TASK_RESULT_EXPIRES=60,
    CELERY_ROUTES={"proj.tasks.add1": {"queue": "q1"}},
)

if __name__ == '__main__':
    app.start()
tasks.py
from proj.celery import app
from celery.signals import worker_init

@worker_init.connect(sender='worker1@hostname')
def configure_worker1(*args, **kwargs):
    # SETUP id=1 for add1 here???
    pass

@worker_init.connect(sender='worker2@hostname')
def configure_worker2(*args, **kwargs):
    # SETUP id=2 for add1 here???
    pass

@app.task
def add1(y):
    return id + y

@app.task
def add(x, y):
    return x + y
initializing:
celery multi start worker1 -A proj -l info -Q q1
celery multi start worker2 -A proj -l info -Q q1
celery multi start worker3 -A proj -l info
Is this the right approach? If so, what should I write in the configure_worker1 function in tasks.py to set up id at worker initialization?
Thanks
I found the answer by following this: http://docs.celeryproject.org/en/latest/userguide/tasks.html#instantiation
The tasks.py looks like this:
from proj.celery import app
from celery import Task

class Task1(Task):
    def __init__(self):
        self._x = 1.0

class Task2(Task):
    def __init__(self):
        self._x = 2.0

@app.task(base=Task1)
def add1(y):
    return add1._x + y

@app.task(base=Task2)
def add2(y):
    return add2._x + y
initializing as before:
celery multi start worker1 -A proj -l info -Q q1
celery multi start worker2 -A proj -l info -Q q1
celery multi start worker3 -A proj -l info
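A quick usage sketch to go with this (assuming the amqp result backend and the CELERY_ROUTES entry from celery.py above, which sends add1 to queue q1 while add2 falls back to the default queue served by worker3):
from proj.tasks import add1, add2

r1 = add1.delay(2.0)  # handled on q1 by worker1/worker2, uses Task1, so _x == 1.0
r2 = add2.delay(2.0)  # handled on the default queue by worker3, uses Task2, so _x == 2.0
print(r1.get())       # 3.0
print(r2.get())       # 4.0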
