How to pass settings to an RQ task? - python

I wonder what's the best approach to pass settings from an RQ worker to the task function?
I want to keep all of my settings outside of the app in a text file, read it while the worker is starting, and then have those values available when performing the actual task.
Right now I'm using os.environ.
Thanks!

It appears from the "Custom worker classes" section of the documentation that the way to do this is to write a custom worker class. Here is a pattern that I've been using, which loads a section from a ConfigParser file into a virtual Python module named config:
import configparser
import rq.worker
import sys
import types


class ConfigurableWorker(rq.worker.Worker):
    """An RQ Worker class which loads configuration from config.ini"""

    def work(self, *args, **kwargs):
        """Perform RQ work"""
        # Store some configuration on a virtual module
        sys.modules['config'] = types.ModuleType('config')
        import config
        config_reader = configparser.ConfigParser()
        config_reader.read('config.ini')
        for key, value in config_reader['rq'].items():
            setattr(config, key, value)
        # Do work
        super().work(*args, **kwargs)
And then from the job function:
def my_bunch_of_work():
    import config
    con = db_connection(url=config.my_db_connection)
Running the rq worker with a custom worker class can be done with the --worker-class argument.
$ rq worker --worker-class my.prog.ConfigurableWorker
It might be nice to do some kind of connection pooling setup in the ConfigurableWorker.work method, but I'm unsure of the safety of that without knowing more about the RQ worker's fork model (os.fork()).

Assuming you are talking about python-rq, you may want to read the workers documentation here and the config file documentation here.
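For reference, RQ's config file is just a Python settings module passed to the worker with -c/--config. A minimal sketch follows, assuming the -c option and the REDIS_URL/QUEUES setting names from the RQ docs; the my_settings module name is only an example:

# my_settings.py -- example RQ settings module
REDIS_URL = 'redis://localhost:6379/0'   # Redis connection the worker should use
QUEUES = ['default']                     # queues the worker listens on

# start the worker with:
#   $ rq worker -c my_settings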

Try using the os module.
# In the worker
import os
os.environ['TOKEN'] = 'xxx'

# In the job
# (the environment is inherited by child processes started via
#  os.system(), popen(), or fork() and execv())
someVariable = os.environ['TOKEN']
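If the goal is to keep those values in a text file, as the question asks, one option is to load the file into os.environ before starting the worker. Here is a minimal sketch under that assumption; the settings.env file and start_worker.py wrapper are hypothetical names:

# start_worker.py -- load KEY=VALUE pairs from settings.env, then launch the worker
import os
import subprocess

with open('settings.env') as fh:
    for line in fh:
        line = line.strip()
        if line and not line.startswith('#'):
            key, _, value = line.partition('=')
            os.environ[key.strip()] = value.strip()

# the rq worker (and therefore every job it runs) inherits this environment
subprocess.run(['rq', 'worker'])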

Related

Python: logging object loses its file handler when passed to an RQ task queue

Problem
I pass a logging (logger) object, which is supposed to add lines to test.log, to a function background_task() that is run by the rq utility (a task queue manager). The logger has a FileHandler assigned to it to allow logging to test.log. Until background_task() is run, you can see the file handler present in logger.handlers, but when the logger is passed to background_task() and background_task() is run by rq worker, logger.handlers ends up empty.
But if I ditch rq (and Redis) and just run background_task() right away, the content of logger.handlers is preserved. So it has something to do with rq (and probably with task queuing in general; it's a new topic for me).
Steps to reproduce
Run add_job.py: python3 add_job.py. You'll see the output of print(logger.handlers) called from within add_job(): a handlers list containing the FileHandler added in get_job_logger().
Run rq worker to start executing the queued task. You'll see the output of print(logger.handlers) once again, but this time called from within background_task(), and the list will be empty! The handlers of the logging (logger) object somehow get lost when the function that accepts the logger as an argument is run by rq worker. What gives?
Here's how it looks in the terminal:
$ python3 add_job.py
[<FileHandler /home/user/project/test.log (INFO)>]
$ rq worker
17:44:45 Worker rq:worker:2bbad3623e95438f81396c662cb01284: started, version 1.10.1
17:44:45 Subscribing to channel rq:pubsub:2bbad3623e95438f81396c662cb01284
17:44:45 *** Listening on default...
17:44:45 default: tasks.background_task(<RootLogger root (INFO)>) (5a5301be-efc3-49a7-ab0c-f7cf0a4bd3e5)
[]
Source code
add_job.py
import logging
from logging import FileHandler

from redis import Redis
from rq import Queue

from tasks import background_task


def add_job():
    r = Redis()
    qu = Queue(connection=r)
    logger = get_job_logger()
    print(logger.handlers)
    job = qu.enqueue(background_task, logger)


def get_job_logger():
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    logger_file_handler = FileHandler('test.log')
    logger_file_handler.setLevel(logging.INFO)
    logger.addHandler(logger_file_handler)
    return logger


if __name__ == '__main__':
    add_job()
tasks.py
def background_task(logger):
    print(logger.handlers)
Answered here.
FileHandler does not get carried over into other threads. You start the FileHandler in the main thread and rq worker starts other threads. Memory is not shared like that.
Hm, I see... Thanks!
I assumed the FileHandler was being serialized or whatnot when written to Redis as a part of the whole logger object and then reinitialized when popping out of the queue.
Anyway, I'll try passing a file path to the function and initializing the logger from within it. That way, the FileHandler object stays in one thread.
EDIT: yeah, it works
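For reference, a minimal sketch of that fix, assuming the same tasks.py/add_job.py layout as above; the logger name and handler setup inside the task are only illustrative:

# tasks.py -- build the logger inside the task from a plain file path
import logging
from logging import FileHandler


def background_task(log_path):
    logger = logging.getLogger('background_task')
    logger.setLevel(logging.INFO)
    logger.addHandler(FileHandler(log_path))
    logger.info('running inside the rq worker')

# add_job.py then enqueues the path instead of the logger object:
#   qu.enqueue(background_task, 'test.log')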

Python Celery Received unregistered task of type - import app error

I can't import my celery app to run tasks from my main Python application. I want to be able to run celery tasks from the myprogram.py file.
My celery_app.py file is as follows:
import celery

app = celery.Celery('MyApp', broker='redis://localhost:6379/0')
app.conf.broker_url = 'redis://localhost:6379/0'
app.conf.result_backend = 'redis://localhost:6379/0'
app.autodiscover_tasks()


@app.task(ignore_result=True)
def task_to_run():
    print("Task Running")


# The following call runs a worker in celery
task_to_run.delay()

if __name__ == '__main__':
    app.start()
Application structure
projectfolder/core/celery_app.py # Celery app
projectfolder/core/myprogram.py # My Python application
projectfolder/core/other python files...
The file myprogram.py contains the following:
from .celery_app import task_to_run
task_to_run.delay()
Error:
Received unregistered task of type 'projectfolder.core.celery_app.task_to_run'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you're using relative imports?
strategy = strategies[type_]
KeyError: 'projectfolder.core.celery_app.task_to_run'
Thanks
Interesting, I didn't know about autodiscover_tasks; I guess it's new in 4.1.
As I see in the documentation, this function takes a list of packages to search. You might want to call it with:
app.autodiscover_tasks(['core.celery_app'])
or it might be better to extract the task to a separate file called tasks.py, and then it would be just:
app.autodiscover_tasks(['core'])
Alternatively, you can use the include parameter when creating the Celery instance:
app = celery.Celery('MyApp', broker='redis://localhost:6379/0', include=['core.celery_app']) (or wherever your tasks are).
Good luck
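For illustration, a minimal sketch of the tasks.py extraction suggested above, assuming the core package layout from the question:

# core/tasks.py -- keep tasks in their own module so autodiscovery can find them
from .celery_app import app


@app.task(ignore_result=True)
def task_to_run():
    print("Task Running")

# core/celery_app.py then calls app.autodiscover_tasks(['core']),
# and myprogram.py imports the task with:
#   from .tasks import task_to_run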

Django Celery - Passing an object to the views and between tasks using RabbitMQ

This is the first time I'm using Celery, and honestly, I'm not sure I'm doing it right. My system has to run on Windows, so I'm using RabbitMQ as the broker.
As a proof of concept, I'm trying to create a single object where one task sets the value and another task reads the value, and I also want to show the current value of the object when I go to a certain URL. However, I'm having problems sharing the object between everything.
This is my celery.py
from __future__ import absolute_import, unicode_literals
import os

from celery import Celery
from django.conf import settings

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'cesGroundStation.settings')

app = Celery('cesGroundStation')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)


@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
The object I'm trying to share is:
class SchedulerQ():
    item = 0

    def setItem(self, item):
        self.item = item

    def getItem(self):
        return self.item
This is my tasks.py
from celery import shared_task
from time import sleep

from scheduler.schedulerQueue import SchedulerQ

schedulerQ = SchedulerQ()


@shared_task()
def SchedulerThread():
    print("Starting Scheduler")
    counter = 0
    while(1):
        counter += 1
        if(counter > 100):
            counter = 0
        schedulerQ.setItem(counter)
        print("In Scheduler thread - " + str(counter))
        sleep(2)
    print("Exiting Scheduler")


@shared_task()
def RotatorsThread():
    print("Starting Rotators")
    while(1):
        item = schedulerQ.getItem()
        print("In Rotators thread - " + str(item))
        sleep(2)
    print("Exiting Rotators")


@shared_task()
def setSchedulerQ(schedulerQueue):
    schedulerQ = schedulerQueue


@shared_task()
def getSchedulerQ():
    return schedulerQ
I'm starting my tasks in my apps.py. I'm not sure if this is the right place, as the tasks/workers don't seem to work until I start the workers in a separate console where I run celery -A cesGroundStation -l info.
from django.apps import AppConfig

from scheduler.schedulerQueue import SchedulerQ
from scheduler.tasks import SchedulerThread, RotatorsThread, setSchedulerQ, getSchedulerQ


class SchedulerConfig(AppConfig):
    name = 'scheduler'

    def ready(self):
        schedulerQ = SchedulerQ()
        setSchedulerQ.delay(schedulerQ)
        SchedulerThread.delay()
        RotatorsThread.delay()
In my views.py I have this:
def schedulerQ():
    queue = getSchedulerQ.delay()
    return HttpResponse("Your list: " + queue)
The Django app runs without errors; however, my output from celery -A cesGroundStation -l info is this: [screenshot: Celery command output]
First, it seems to start multiple SchedulerThread tasks; second, the SchedulerQ object isn't being passed to the Rotators, as they never read the updated value.
And if I go to the URL that shows the views.schedulerQ view, I get this error: [screenshot: Django views error]
I have very, very little experience with Python, Django, and web development in general, so I have no idea where to start with that last error. Solutions suggest using Redis to pass the object to the views, but I don't know how I'd do that using RabbitMQ. Later on, the schedulerQ object will implement a queue, and the scheduler and rotators will act more as a producer/consumer pair, with the view showing the contents of the queue, so I believe using the database might be too resource-intensive. How can I share this object across all tasks, and is this even the right approach?
The right approach would be to use some persistence layer, such as a database or result backend, to store the information you want to share between tasks (in this example, what you are currently putting in your class).
Celery operates on a distributed message-passing paradigm. A good way to distill that idea for this example is that your module will be executed independently every time a task is dispatched. Whenever a task is dispatched to Celery, you must assume it is running in a separate interpreter and loaded independently of other tasks. That SchedulerQ class is instantiated anew each time.
You can share information between tasks in the ways described in the docs linked previously, and some of the best-practice tips discuss data persistence concerns.
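As an illustration of that advice, here is a minimal sketch that swaps the in-memory SchedulerQ for Django's cache framework as the shared layer. It assumes a cache backend that is actually shared between processes (for example Redis or Memcached) is configured, and the scheduler_item key is just an example name:

# scheduler/tasks.py -- share state through a persistence layer instead of a module-level object
from time import sleep

from celery import shared_task
from django.core.cache import cache


@shared_task()
def SchedulerThread():
    counter = 0
    while True:
        counter = 0 if counter > 100 else counter + 1
        cache.set('scheduler_item', counter)    # write the shared value
        sleep(2)


@shared_task()
def RotatorsThread():
    while True:
        item = cache.get('scheduler_item', 0)   # read the shared value
        print("In Rotators thread - " + str(item))
        sleep(2)

The view can then call cache.get('scheduler_item') directly instead of dispatching a getSchedulerQ task.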

Celery auto reload on ANY changes

I can make Celery reload itself automatically when there are changes to modules listed in CELERY_IMPORTS in settings.py.
I tried giving it parent modules so it would detect changes even in child modules, but it did not detect changes in child modules. That makes me think the detection is not done recursively by Celery. I searched the documentation but did not find an answer to my problem.
It is really tedious to add every Celery-related part of my project to CELERY_IMPORTS just to detect changes.
Is there a way to tell Celery to "auto-reload yourself when there is any change anywhere in the project"?
Thank You!
Celery --autoreload doesn't work, and it is deprecated.
Since you are using Django, you can write a management command for that.
Django has an autoreload utility which is used by runserver to restart the WSGI server when code changes.
The same functionality can be used to reload celery workers. Create a separate management command called celery. Write a function to kill the existing worker and start a new worker. Then hook this function into autoreload as follows.
import shlex
import subprocess

from django.core.management.base import BaseCommand
from django.utils import autoreload


def restart_celery():
    cmd = 'pkill celery'
    subprocess.call(shlex.split(cmd))
    cmd = 'celery worker -l info -A foo'
    subprocess.call(shlex.split(cmd))


class Command(BaseCommand):

    def handle(self, *args, **options):
        print('Starting celery worker with autoreload...')

        # For Django>=2.2
        autoreload.run_with_reloader(restart_celery)

        # For django<2.1
        # autoreload.main(restart_celery)
Now you can run celery worker with python manage.py celery which will autoreload when codebase changes.
This is only for development purposes; do not use it in production. Code taken from my other answer here.
You can manually include additional modules with -I|--include. Combine this with GNU tools like find and awk and you'll be able to find all .py files and include them.
$ celery -A app worker --autoreload --include=$(find . -name "*.py" -type f | awk '{sub("\./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
Let's explain it:
find . -name "*.py" -type f
find searches recursively for all files ending in .py. The output looks something like this:
./app.py
./some_package/foo.py
./some_package/bar.py
Then:
awk '{sub("\./",""); gsub("/", "."); sub(".py",""); print}' ORS=','
This line takes the output of find as input and removes all occurrences of ./. Then it replaces every / with a .. The last sub() replaces .py with an empty string. ORS replaces all newlines with ,. This outputs:
app,some_package.foo,some_package.bar,
The last command, sed, removes the trailing ,.
So the command that is being executed looks like:
$ celery -A app worker --autoreload --include=app,some_package.foo,some_package.bar
If you have a virtualenv inside your source you can exclude it by adding -path .path_to_your_env -prune -o:
$ celery -A app worker --autoreload --include=$(find . -path .path_to_your_env -prune -o -name "*.py" -type f | awk '{sub("\./",""); gsub("/", "."); sub(".py",""); print}' ORS=',' | sed 's/.$//')
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery worker --app=worker.app --concurrency=1 --loglevel=INFO
More detailed
I used the watchdog watchmedo utility; it works great, but for some reason the PyCharm debugger was not able to debug the subprocess spawned by watchmedo.
So if your project has werkzeug as a dependency, you can use the werkzeug._reloader.run_with_reloader function to autoreload the celery worker on code change. Plus, it works with the PyCharm debugger.
"""
Filename: celery_dev.py
"""
import sys
from werkzeug._reloader import run_with_reloader
# this is the celery app path in my application, change it according to your project
from web.app import celery_app
def run():
# create copy of "argv" and remove script name
argv = sys.argv.copy()
argv.pop(0)
# start the celery worker
celery_app.worker_main(argv)
if __name__ == '__main__':
run_with_reloader(run)
[screenshot: sample PyCharm debug configuration]
NOTE:
This is a private werkzeug API, and it works as of Werkzeug==2.0.3. It may stop working in future versions. Use at your own risk.
OrangeTux's solution didn't work out for me, so I wrote a little Python script to achieve more or less the same. It monitors file changes using inotify, and triggers a celery restart if it detects an IN_MODIFY, IN_ATTRIB, or IN_DELETE event.
#!/usr/bin/env python
"""Runs a celery worker, and reloads on a file change. Run as ./run_celery [directory]. If
directory is not given, default to cwd."""
import os
import sys
import signal
import time
import multiprocessing
import subprocess
import threading

import inotify.adapters

CELERY_CMD = tuple("celery -A amcat.amcatcelery worker -l info -Q amcat".split())
CHANGE_EVENTS = ("IN_MODIFY", "IN_ATTRIB", "IN_DELETE")
WATCH_EXTENSIONS = (".py",)


def watch_tree(stop, path, event):
    """
    @type stop: multiprocessing.Event
    @type event: multiprocessing.Event
    """
    path = os.path.abspath(path)

    for e in inotify.adapters.InotifyTree(path).event_gen():
        if stop.is_set():
            break

        if e is not None:
            _, attrs, path, filename = e

            if filename is None:
                continue

            # Skip files that do not match the watched extensions
            if not any(filename.endswith(ename) for ename in WATCH_EXTENSIONS):
                continue

            if any(ename in attrs for ename in CHANGE_EVENTS):
                event.set()


class Watcher(threading.Thread):
    def __init__(self, path):
        super(Watcher, self).__init__()
        self.celery = subprocess.Popen(CELERY_CMD)
        self.stop_event_wtree = multiprocessing.Event()
        self.event_triggered_wtree = multiprocessing.Event()
        self.wtree = multiprocessing.Process(target=watch_tree, args=(self.stop_event_wtree, path, self.event_triggered_wtree))
        self.wtree.start()
        self.running = True

    def run(self):
        while self.running:
            if self.event_triggered_wtree.is_set():
                self.event_triggered_wtree.clear()
                self.restart_celery()
            time.sleep(1)

    def join(self, timeout=None):
        self.running = False
        self.stop_event_wtree.set()
        self.celery.terminate()
        self.wtree.join()
        self.celery.wait()
        super(Watcher, self).join(timeout=timeout)

    def restart_celery(self):
        self.celery.terminate()
        self.celery.wait()
        self.celery = subprocess.Popen(CELERY_CMD)


if __name__ == '__main__':
    watcher = Watcher(sys.argv[1] if len(sys.argv) > 1 else ".")
    watcher.start()
    signal.signal(signal.SIGINT, lambda signal, frame: watcher.join())
    signal.pause()
You should probably change CELERY_CMD, or any other global variables.
There was an issue in @AlexTT's answer; I don't know if I should comment on his answer or put this as an answer.
You can use watchmedo
pip install watchdog
Start celery worker indirectly via watchmedo
watchmedo auto-restart --directory=./ --pattern=*.py --recursive -- celery -A <app> worker --concurrency=1 --loglevel=INFO
This is the way I made it work in Django:
# worker_dev.py (put it next to manage.py)
from django.utils import autoreload


def run_celery():
    from projectname import celery_app
    celery_app.worker_main(["-Aprojectname", "-linfo", "-Psolo"])


print("Starting celery worker with autoreload...")
autoreload.run_with_reloader(run_celery)
Then run python worker_dev.py. This has the advantage of working inside a Docker container.
This is a huge adaptation from Suor's code.
I made a custom Django command which can be called like this:
python manage.py runcelery
So, every time the code changes, celery's main process is gracefully killed and then executed again.
Change the CELERY_COMMAND variable as you wish.
# File: runcelery.py
import os
import signal
import subprocess
import time

import psutil
from django.core.management.base import BaseCommand
from django.utils import autoreload

DELAY_UNTIL_START = 5.0
CELERY_COMMAND = 'celery --config my_project.celeryconfig worker --loglevel=INFO'


class Command(BaseCommand):
    help = ''

    def kill_celery(self, parent_pid):
        os.kill(parent_pid, signal.SIGTERM)

    def run_celery(self):
        time.sleep(DELAY_UNTIL_START)
        subprocess.run(CELERY_COMMAND.split(' '))

    def get_main_process(self):
        for process in psutil.process_iter():
            if process.ppid() == 0:  # PID 0 has no parent
                continue
            parent = psutil.Process(process.ppid())
            if process.name() == 'celery' and parent.name() == 'celery':
                return parent
        return

    def reload_celery(self):
        parent = self.get_main_process()
        if parent is not None:
            self.stdout.write('[*] Killing Celery process gracefully..')
            self.kill_celery(parent.pid)
        self.stdout.write('[*] Starting Celery...')
        self.run_celery()

    def handle(self, *args, **options):
        autoreload.run_with_reloader(self.reload_celery)

Can Python Celery be started in-process in a thread?

I want to make a test case for my Celery code.
But usually Celery needs to be started in a new process, like $ celery -A CELERY_MODULE worker. Does that mean I can't run my test case code directly?
I configured Celery with an in-memory store to avoid the extra I/O in the test case. That configuration can't simply share the task queue between different processes.
Here is my naive implementation.
The Celery entry point is celery.bin.celeryd.WorkerCommand; it parses the args and runs the worker.
Use the solo pool to avoid multiprocessing in the test case. Of course, you need to install that library first.
You can use this before your Celery test case starts.
#!/usr/bin/env python
# vim: encoding=utf-8
import time
import unittest
from threading import Thread

from celery import Celery, states
from celery.bin.celeryd import WorkerCommand


class CELERY_CONFIG(object):
    BROKER_URL = "memory://"
    CELERY_CACHE_BACKEND = "memory"
    CELERY_RESULT_BACKEND = "cache"
    CELERYD_POOL = "solo"


class CeleryTestCase(unittest.TestCase):

    def test_inprocess(self):
        app = Celery(__name__)
        app.config_from_object(CELERY_CONFIG)

        @app.task
        def dumpy_task(dct):
            return 321

        worker = WorkerCommand(app)
        # worker.execute_from_commandline(["-P solo"])
        t = Thread(target=worker.execute_from_commandline, args=(["-c 1"],))
        t.daemon = True
        t.start()

        ar = dumpy_task.apply_async(({"a": 123},))
        while ar.status != states.SUCCESS:
            time.sleep(.01)

        self.assertEqual(states.SUCCESS, ar.status)
        self.assertEqual(ar.result, 321)

        t.join(0)
