Celery not discovering tasks inside Docker


I have a very simple implementation.
/lib/queue/__init__.py
from celery import Celery
from os import environ

REDIS_URI = environ.get('REDIS_URI')
app = Celery('tasks',
             broker=f'redis://{REDIS_URI}')
app.autodiscover_tasks([
    'lib.queue.cache',
], force=True)
/lib/queue/cache/tasks.py
from lib.queue import app

@app.task
def some_task():
    pass
Dockerfile
RUN git clone <my_repo> /usr/src/lib
WORKDIR /usr/src/lib
RUN python3 setup.py install
CMD ["celery", "-A", "worker:app", "worker", "--loglevel=info", "--concurrency=4"]
/worker.py
from lib.queue import app
This works fine when I start the worker from the command line without Docker.
celery -A worker:app worker --loglevel=info
> [tasks]
> . lib.queue.cache.tasks.some_task
However, when I run it inside Docker, the tasks remain blank:
> [tasks]
Question:
Any thoughts as to why celery would not be able to find the library and tasks inside Docker? I am using another Dockerfile with an almost identical setup to push the tasks, and it is able to access lib.queue.cache.tasks no problem.
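One way to narrow this down, offered only as a hedged sketch: run a small check inside the container to see whether each dotted name is importable by the interpreter that Celery uses. The helper name is_importable is hypothetical and not part of Celery; the module names are the ones from the question. A False result would suggest the installed package is missing that subpackage, but it does not prove the cause.

```python
import importlib.util


def is_importable(dotted_name):
    """Return True if the import system can locate the given dotted module name."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or is not a package
        return False


if __name__ == "__main__":
    # Names taken from the question; adjust to your own layout
    for name in ("lib.queue", "lib.queue.cache", "lib.queue.cache.tasks"):
        print(name, is_importable(name))
```

Running the same script on the host and in the container makes the comparison direct.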

Because I have been asked to provide my solution a couple times, here it is. HOWEVER, it may not really be helpful since what I am doing now is slightly different.
Inside my worker file, where app is defined, I have just a single task.
import asyncio
from importlib import import_module

app = Celery("tasks", broker=f"redis://{REDIS_URI}:{REDIS_PORT}/{REDIS_DB}")

@app.task
def run_task(task_name, *args, **kwargs):
    print(f"Running {task_name}. Received...")
    print(f"- args: {args}")
    print(f"- kwargs: {kwargs}")
    # task_name has the form "<module>.<function>", relative to common.tasks
    module_name, method_name = task_name.split(".")
    module = import_module(f".{module_name}", package="common.tasks")
    task = getattr(module, method_name)
    # The target is a coroutine function, so run it to completion on an event loop
    loop = asyncio.get_event_loop()
    retval = loop.run_until_complete(task(*args, **kwargs))
This may not be relevant to most people since it takes a string argument to import a coroutine and execute that. This really is because my tasks are sharing some functions that also need to execute in async world.
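The dispatch idea can be tried on its own, without Celery. The following is a minimal sketch under stated assumptions: run_by_name is a hypothetical helper, it takes a full dotted path rather than a name relative to common.tasks, and it uses asyncio.run instead of get_event_loop. asyncio.sleep stands in for a real task coroutine.

```python
import asyncio
from importlib import import_module


def run_by_name(dotted_name, *args, **kwargs):
    """Import "<module>.<coroutine function>" and run it to completion."""
    module_name, func_name = dotted_name.rsplit(".", 1)
    func = getattr(import_module(module_name), func_name)
    # asyncio.run creates a fresh event loop and closes it afterwards
    return asyncio.run(func(*args, **kwargs))


if __name__ == "__main__":
    # asyncio.sleep(delay, result) stands in for a real task coroutine here
    print(run_by_name("asyncio.sleep", 0, result="done"))
```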

Related

Python - Celery autoreload

How do you develop when using Celery?
It seems to require a restart for every change.
I'm using this command:
watchmedo auto-restart --directory=proj/ -p '*.py' --recursive -- celery -A proj worker --concurrency=1 --loglevel=INFO
celery.py
import os

from celery import Celery
from decouple import AutoConfig

cwd = os.getcwd()
DOTENV_FILE = cwd + '/proj/config/.env'
# Pass the variable itself, not the string 'DOTENV_FILE'
config = AutoConfig(search_path=DOTENV_FILE)
app = Celery('proj',
             broker=config('CELERY_BROKER_URL'),
             backend=config('CELERY_RESULT_BACKEND'),
             include=['proj.tasks'])
app.conf.update(
    result_expires=3600,
)

if __name__ == '__main__':
    app.start()
tasks.py
from .celery import app

@app.task
def add(x, y):
    return x + y

Even if there is a technical solution for this kind of reloading, I would suggest not involving Celery while you develop your task function because, well, it's just a function! My approach is to get the function working first, then add the Celery parts and check that it integrates well with other things such as chained tasks, Django, etc. The same technique applies to unit testing.
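As a rough illustration of that workflow (a sketch, not the answerer's code): keep the logic in a plain function, test it directly, and only wrap it as a task afterwards. The register helper is hypothetical; it relies on app.task also being callable directly on a function, which is how the decorator form works.

```python
def add(x, y):
    """Plain function: no broker or worker is needed to exercise it."""
    return x + y


def register(app):
    """Wrap the finished function as a task on a Celery app passed in by the caller."""
    return app.task(add)


if __name__ == "__main__":
    # Exercise the logic directly, with no Celery involved
    assert add(2, 3) == 5
    print("add() behaves as expected")
```

Once the function is stable, calling register(app) from the module that defines the Celery app gives you the task object, and the worker only needs restarting when the wiring changes.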

Python Celery Received unregistered task of type - import app error

I can't import my celery app to run tasks from my main Python application. I want to be able to run celery tasks from the myprogram.py file.
My celery_app.py file is as follows:
import celery

app = celery.Celery('MyApp', broker='redis://localhost:6379/0')
app.conf.broker_url = 'redis://localhost:6379/0'
app.conf.result_backend = 'redis://localhost:6379/0'
app.autodiscover_tasks()


@app.task(ignore_result=True)
def task_to_run():
    print("Task Running")


# The following call sends the task to the queue; a worker must be running to execute it
task_to_run.delay()

if __name__ == '__main__':
    app.start()
Application structure
projectfolder/core/celery_app.py # Celery app
projectfolder/core/myprogram.py # My Python application
projectfolder/core/other python files...
The file myprogram.py contains the following:
from .celery_app import task_to_run
task_to_run.delay()
Error:
Received unregistered task of type 'projectfolder.core.celery_app.task_to_run'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you're using relative imports?
strategy = strategies[type_]
KeyError: 'projectfolder.core.celery_app.task_to_run'
Thanks

Interesting, I didn't know about autodiscover_tasks; I guess it's new in 4.1.
As I see in the documentation, this function takes a list of packages to search. You might want to call it with:
app.autodiscover_tasks(['core.celery_app'])
or it might be better to extract the task into a separate file called tasks.py, and then it would be just:
app.autodiscover_tasks(['core'])
Alternatively, you can use the include parameter when creating the Celery instance:
app = celery.Celery('MyApp', broker='redis://localhost:6379/0', include=['core.celery_app'])
or wherever your tasks are.
Good luck

Celery with RabbitMQ: AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'

I'm running the First Steps with Celery Tutorial.
We define the following task:
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
Then call it:
>>> from tasks import add
>>> add.delay(4, 4)
But I get the following error:
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
I'm running both the Celery worker and the RabbitMQ server. Rather strangely, the Celery worker reports the task as succeeding:
[2014-04-22 19:12:03,608: INFO/MainProcess] Task test_celery.add[168c7d96-e41a-41c9-80f5-50b24dcaff73] succeeded in 0.000435483998444s: 19
Why isn't this working?

Just keep reading the tutorial. It is explained in the Keeping Results chapter.
To start Celery you only need to provide the broker parameter, which is required to send messages about tasks. If you want to retrieve the state and results returned by finished tasks, you also need to set the backend parameter. You can find the full list with descriptions in the Configuration docs: CELERY_RESULT_BACKEND.

I suggest having a look at:
http://www.cnblogs.com/fangwenyu/p/3625830.html
There you will see that instead of
app = Celery('tasks', broker='amqp://guest@localhost//')
you should be writing
app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')
This is it.

In case anyone made the same easy to make mistake as I did: The tutorial doesn't say so explicitly, but the line
app = Celery('tasks', backend='rpc://', broker='amqp://')
is an EDIT of the line in your tasks.py file. Mine now reads:
app = Celery('tasks', backend='rpc://', broker='amqp://guest@localhost//')
When I run python from the command line I get:
$ python
>>> from tasks import add
>>> result = add.delay(4,50)
>>> result.ready()
False
All tutorials should be easy to follow, even when a little drunk. So far this one doesn't reach that bar.

What is not clear from the tutorial is that the tasks.py module needs to be edited so that you change the line:
app = Celery('tasks', broker='pyamqp://guest@localhost//')
to include the RPC result backend:
app = Celery('tasks', backend='rpc://', broker='pyamqp://')
Once done, Ctrl + C the celery worker process and restart it:
celery -A tasks worker --loglevel=info
The tutorial is confusing in that we're making the assumption that creation of the app object is done in the client testing session, which it is not.

In your project directory find the settings file.
Then run the below command in your terminal:
sudo vim settings.py
copy/paste the below config into your settings.py:
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'
Note: This sets the result backend, where task states and results are stored, if you are using the django-celery package for your Django project.

Celery relies on both a backend AND a broker.
This solved it for me using only Redis:
app = Celery("tasks", backend="redis://localhost", broker="redis://localhost")
Remember to restart the worker in your terminal after changing the config.

I solved this error by passing the app after the task ID:
response = AsyncResult(taskID, app=celery_app)
where celery_app = Celery('ANYTHING', broker=BROKER_URL, backend=BACKEND_URL)
If you want to get the status of the Celery task, to know whether it is "PENDING", "SUCCESS", or "FAILURE":
status = response.status

My case was simple: I was using an interactive Python console, and Python had cached the imported module. I killed the console and started it again, and everything worked as it should.
import celery

app = celery.Celery('tasks', broker='redis://localhost:6379',
                    backend='mongodb://localhost:27017/celery_tasks')

@app.task
def add(x, y):
    return x + y
In Python console.
>>> from tasks import add
>>> result = add.delay(4, 4)
>>> result.ready()
True

Switching from Windows to Linux solved the issue for me
Windows is not guaranteed to work, it's mentioned here

I had the same issue. What resolved it for me was importing the Celery file (celery.py) in the __init__.py of your app, with something like:
from .celery import CELERY_APP as celery_app
__all__ = ('celery_app',)
if you use a celery.py file as described here

Celery auto reload on ANY changes

I can make Celery reload itself automatically when there are changes to modules listed in CELERY_IMPORTS in settings.py.
I tried listing parent modules so that changes in their child modules would also be detected, but changes in child modules were not detected. From that I understand that Celery does not detect changes recursively. I searched the documentation but did not find an answer to my problem.
It really bothers me to have to add every Celery-related part of my project to CELERY_IMPORTS to detect changes.
Is there a way to tell Celery to "reload yourself automatically when anything anywhere in the project changes"?
Thank You!

Celery --autoreload doesn't work and it is deprecated.
Since you are using django, you can write a management command for that.
Django has autoreload utility which is used by runserver to restart WSGI server when code changes.
The same functionality can be used to reload Celery workers. Create a separate management command called celery. Write a function that kills the existing worker and starts a new one. Now hook this function into autoreload as follows.
import shlex
import subprocess

from django.core.management.base import BaseCommand
from django.utils import autoreload


def restart_celery():
    # Stop any running worker, then start a fresh one
    cmd = 'pkill celery'
    subprocess.call(shlex.split(cmd))
    cmd = 'celery -A foo worker -l info'
    subprocess.call(shlex.split(cmd))


class Command(BaseCommand):

    def handle(self, *args, **options):
        print('Starting celery worker with autoreload...')
        # For Django>=2.2
        autoreload.run_with_reloader(restart_celery)
        # For django<2.1
        # autoreload.main(restart_celery)
Now you can run the Celery worker with python manage.py celery, and it will reload automatically when the codebase changes.
This is for development only; do not use it in production. Code taken from my other answer here.

You can manually include additional modules with -I|--include. Combine this with GNU tools like find and awk and you'll be able to find all .py files and include them.
$ celery -A app worker --autoreload --include=$(find . -name "*.py" -type f | awk '{sub("\./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
Let's break it down:
find . -name "*.py" -type f
find searches recursively for all files whose names end in .py. The output looks something like this:
./app.py
./some_package/foo.py
./some_package/bar.py
Then:
awk '{sub("\./",""); gsub("/", "."); sub(".py",""); print}' ORS=','
This line takes the output of find as input and removes the leading ./ from each line. Then it replaces every / with a dot. The last sub() replaces .py with an empty string. ORS replaces all newlines with ,. This outputs:
app,some_package.foo,some_package.bar,
The last command, sed, removes the trailing ,.
So the command that is being executed looks like:
$ celery -A app worker --autoreload --include=app,some_package.foo,some_package.bar
If you have a virtualenv inside your source you can exclude it by adding -path .path_to_your_env -prune -o:
$ celery -A app worker --autoreload --include=$(find . -path .path_to_your_env -prune -o -name "*.py" -type f | awk '{sub("\./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
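For comparison, the path-to-module conversion done by the find/awk/sed pipeline can be sketched in Python with pathlib. This is an illustrative alternative, not part of the original answer; module_names and its exclude_dirs parameter are hypothetical names, and "venv" is only an example directory.

```python
from pathlib import Path


def module_names(root=".", exclude_dirs=()):
    """Return dotted module names for every .py file under root."""
    root = Path(root)
    names = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        # Skip anything inside an excluded directory such as a virtualenv
        if any(part in exclude_dirs for part in rel.parts[:-1]):
            continue
        # some_package/foo.py -> some_package.foo
        names.append(".".join(rel.with_suffix("").parts))
    return names


if __name__ == "__main__":
    # Comma-separated value in the same shape the shell pipeline produces
    print(",".join(module_names(".", exclude_dirs=("venv",))))
```

Unlike the awk version, this strips only the final .py suffix, so a directory or file name that merely contains "py" is left intact.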

You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery worker --app=worker.app --concurrency=1 --loglevel=INFO
More detailed

I used watchdog's watchmedo utility. It works great, but for some reason the PyCharm debugger was not able to debug the subprocess spawned by watchmedo.
So if your project has werkzeug as a dependency, you can use the werkzeug._reloader.run_with_reloader function to autoreload the Celery worker on code changes. Plus, it works with the PyCharm debugger.
"""
Filename: celery_dev.py
"""
import sys

from werkzeug._reloader import run_with_reloader

# this is the celery app path in my application, change it according to your project
from web.app import celery_app


def run():
    # create copy of "argv" and remove script name
    argv = sys.argv.copy()
    argv.pop(0)
    # start the celery worker
    celery_app.worker_main(argv)


if __name__ == '__main__':
    run_with_reloader(run)
Sample PyCharm debug configuration.
NOTE:
This is a private werkzeug API and is working as of Werkzeug==2.0.3. It may stop working in future versions. Use at your own risk.

OrangeTux's solution didn't work out for me, so I wrote a little Python script to achieve more or less the same. It monitors file changes using inotify, and triggers a Celery restart if it detects an IN_MODIFY, IN_ATTRIB, or IN_DELETE.
#!/usr/bin/env python
"""Runs a celery worker, and reloads on a file change. Run as ./run_celery [directory]. If
directory is not given, default to cwd."""
import os
import sys
import signal
import time
import multiprocessing
import subprocess
import threading
import inotify.adapters

CELERY_CMD = tuple("celery -A amcat.amcatcelery worker -l info -Q amcat".split())
CHANGE_EVENTS = ("IN_MODIFY", "IN_ATTRIB", "IN_DELETE")
WATCH_EXTENSIONS = (".py",)


def watch_tree(stop, path, event):
    """
    @type stop: multiprocessing.Event
    @type event: multiprocessing.Event
    """
    path = os.path.abspath(path)
    for e in inotify.adapters.InotifyTree(path).event_gen():
        if stop.is_set():
            break
        if e is not None:
            _, attrs, path, filename = e
            if filename is None:
                continue
            # Ignore files whose extension is not being watched
            if not any(filename.endswith(ename) for ename in WATCH_EXTENSIONS):
                continue
            if any(ename in attrs for ename in CHANGE_EVENTS):
                event.set()


class Watcher(threading.Thread):
    def __init__(self, path):
        super(Watcher, self).__init__()
        self.celery = subprocess.Popen(CELERY_CMD)
        self.stop_event_wtree = multiprocessing.Event()
        self.event_triggered_wtree = multiprocessing.Event()
        self.wtree = multiprocessing.Process(target=watch_tree, args=(self.stop_event_wtree, path, self.event_triggered_wtree))
        self.wtree.start()
        self.running = True

    def run(self):
        while self.running:
            if self.event_triggered_wtree.is_set():
                self.event_triggered_wtree.clear()
                self.restart_celery()
            time.sleep(1)

    def join(self, timeout=None):
        self.running = False
        self.stop_event_wtree.set()
        self.celery.terminate()
        self.wtree.join()
        self.celery.wait()
        super(Watcher, self).join(timeout=timeout)

    def restart_celery(self):
        self.celery.terminate()
        self.celery.wait()
        self.celery = subprocess.Popen(CELERY_CMD)


if __name__ == '__main__':
    watcher = Watcher(sys.argv[1] if len(sys.argv) > 1 else ".")
    watcher.start()
    signal.signal(signal.SIGINT, lambda signal, frame: watcher.join())
    signal.pause()
You should probably change CELERY_CMD, or any other global variables.
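Where inotify is not available (it is Linux-only), the change-detection step could be approximated by polling modification times with the standard library. This is a different technique from the script above, shown only as a hedged sketch; snapshot and has_changed are hypothetical helper names.

```python
import os


def snapshot(root, extensions=(".py",)):
    """Map each watched file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                full = os.path.join(dirpath, name)
                try:
                    mtimes[full] = os.stat(full).st_mtime_ns
                except OSError:
                    # The file vanished between listing and stat; skip it
                    continue
    return mtimes


def has_changed(before, after):
    """True if any watched file was added, removed, or modified."""
    return before != after
```

A loop that takes a snapshot once per second and restarts the worker whenever has_changed returns True would play the same role as the watch_tree process, at the cost of walking the tree on every tick.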

There was an issue in @AlexTT's answer. I don't know whether I should comment on his answer or post this as an answer.
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery -A <app> worker --concurrency=1 --loglevel=INFO

This is the way I made it work in Django:
# worker_dev.py (put it next to manage.py)
from django.utils import autoreload


def run_celery():
    from projectname import celery_app

    celery_app.worker_main(["-Aprojectname", "-linfo", "-Psolo"])


print("Starting celery worker with autoreload...")
autoreload.run_with_reloader(run_celery)
Then run python worker_dev.py. This has the advantage of working inside a Docker container.

This is a huge adaptation from Suor's code.
I made a custom Django command which can be called like this:
python manage.py runcelery
So, every time the code changes, celery's main process is gracefully killed and then executed again.
Change the CELERY_COMMAND variable as you wish.
# File: runcelery.py
import os
import signal
import subprocess
import time

import psutil
from django.core.management.base import BaseCommand
from django.utils import autoreload

DELAY_UNTIL_START = 5.0
CELERY_COMMAND = 'celery --config my_project.celeryconfig worker --loglevel=INFO'


class Command(BaseCommand):
    help = ''

    def kill_celery(self, parent_pid):
        os.kill(parent_pid, signal.SIGTERM)

    def run_celery(self):
        time.sleep(DELAY_UNTIL_START)
        subprocess.run(CELERY_COMMAND.split(' '))

    def get_main_process(self):
        for process in psutil.process_iter():
            if process.ppid() == 0:  # PID 0 has no parent
                continue
            parent = psutil.Process(process.ppid())
            if process.name() == 'celery' and parent.name() == 'celery':
                return parent
        return

    def reload_celery(self):
        parent = self.get_main_process()
        if parent is not None:
            self.stdout.write('[*] Killing Celery process gracefully..')
            self.kill_celery(parent.pid)
        self.stdout.write('[*] Starting Celery...')
        self.run_celery()

    def handle(self, *args, **options):
        autoreload.run_with_reloader(self.reload_celery)

Can Python Celery be started in-process in a thread?

I want to write a test case for my Celery code.
But Celery usually needs to be started in a new process, e.g. $ celery -A CELERY_MODULE worker. Does that mean I can't run my test case code directly?
I have configured Celery with the in-memory store to avoid extra I/O in the test case. With that configuration, the task queue can't simply be shared between different processes.

Here is my naive implementation.
The Celery entry point is celery.bin.celeryd.WorkerCommand; it parses the arguments and runs the worker.
Use the solo pool to avoid multiprocessing in this case. Of course, you need to install that library first.
You could use this before your Celery test case starts.
#!/usr/bin/env python
#vim: encoding=utf-8
import time
import unittest
from threading import Thread

from celery import Celery, states
from celery.bin.celeryd import WorkerCommand


class CELERY_CONFIG(object):
    BROKER_URL = "memory://"
    CELERY_CACHE_BACKEND = "memory"
    CELERY_RESULT_BACKEND = "cache"
    CELERYD_POOL = "solo"


class CeleryTestCase(unittest.TestCase):

    def test_inprocess(self):
        app = Celery(__name__)
        app.config_from_object(CELERY_CONFIG)

        @app.task
        def dumpy_task(dct):
            return 321

        worker = WorkerCommand(app)
        #worker.execute_from_commandline(["-P solo"])
        t = Thread(target=worker.execute_from_commandline, args=(["-c 1"],))
        t.daemon = True
        t.start()
        ar = dumpy_task.apply_async(({"a": 123},))
        while ar.status != states.SUCCESS:
            time.sleep(.01)
        self.assertEqual(states.SUCCESS, ar.status)
        self.assertEqual(ar.result, 321)
        t.join(0)
