Test if a celery task is still being processed - python

How can I test if a task (task_id) is still processed in celery? I have the following scenario:
Start a task in a Django view
Store the BaseAsyncResult in the session
Shutdown the celery daemon (hard) so the task is not processed anymore
Check if the task is 'dead'
Any ideas? Can a lookup all task being processed by celery and check if mine is still there?

define a field (PickledObjectField) in your model to store the celery task:
class YourModel(models.Model):
.
.
celery_task = PickledObjectField()
.
.
def task():
self.celery_task = SubmitTask.apply_async(args = self.task_detail())
self.save()
In case your task is not specific on any model you should create one specifically for the celery tasks.
or else I suggest using django-celery. It has a nice monitoring feature:
http://ask.github.com/celery/userguide/monitoring.html#django-admin-monitor, saves the tasks details in a django model in a nice graphical way.

I think there is a better way than to store a task object in the model. For example, in case you wanted to check if a group of task (parallel) have completed:
# in the script you launch the task
from celery import group
job = group(
task1.s(param1, param2),
task2.s(param3, param4)
)
result = job.apply_async()
result.save()
# save the result ID in your model
your_obj_model = YourModel.objects.get(id='1234')
your_obj_model.task_id = result.id
your_obj_model.save()
Then in your view
from celery.result import GroupResult
# ...
task_result = GroupResult.restore(your_obj_model.task_id)
task_finished = task_result.ready()
# will be True or False

Related

Python / rq - How to pass information from the caller to the worker?

I want to use rq to run tasks on a separate worker to gather data from a measuring instrument. The end of the task will be signaled by a user pressing a button on a dash app.
The problem is that the task itself does not know when to terminate since it doesn't have access to the dash app's context.
I already use meta to pass information from the worker back to the caller but can I pass information from the caller to the worker?
Example task:
from rq import get_current_job
from time import time
def mock_measurement():
job = get_current_job()
t_start = time()
# Run the measurement
t = []
i = []
job.meta['should_stop'] = False # I want to use this tag to tell the job to stop
while not job.meta['should_stop']:
t.append(time() - t_start)
i.append(np.random.random())
job.meta['data'] = (t, i)
job.save_meta()
sleep(5)
print("Job Finished")
From the console, I can start a job as such
queue = rq.Queue('test-app', connection=Redis('localhost', 6379))
job = queue.enqueue('tasks.mock_measurement')
and I would like to be able to do this from the console to signify to the worker it can stop running:
job.meta['should_stop'] = True
job.save_meta()
job.refresh
However, while the commands above return without an error, they do not actually update the meta dictionary.
Because you didn't fetch the updated meta. But, don't do this!!
Invoking save_meta and refresh in caller and worker will lose data.
Instead, Use job.connection.set(job + ':should_stop', 1, ex=300) to set flag, and use job.connection.get(job + ':should_stop') to check if flag is set.

Job Scheduling in Django

I need to implement a scheduled task in our Django app. DBader's schedule seems to be a good candidate for the job, however when run it as part of a Django project, it doesn't seem to produce the desired effect.
Specifically, this works fine as an independent program:
import schedule
import time
import logging
log = logging.getLogger(__name__)
def handleAnnotationsWithoutRequests(settings):
'''
From settings passed in, grab job-ids list
For each job-id in that list, perform annotation group/set logic [for details, refer to handleAnnotationsWithRequests(requests, username)
sans requests, those are obtained from db based on job-id ]
'''
print('Received settings: {}'.format(str(settings)))
def job():
print("I'm working...")
#schedule.every(3).seconds.do(job)
#schedule.every(2).seconds.do(handleAnnotationsWithoutRequests, settings={'a': 'b'})
invoc_time = "10:33"
schedule.every().day.at(invoc_time).do(handleAnnotationsWithoutRequests, settings={'a': 'b'})
while True:
schedule.run_pending()
time.sleep(1)
But this (equivalent) code run in Django context doesn't result in an invocation.
def handleAnnotationsWithoutRequests(settings):
'''
From settings passed in, grab job-ids list
For each job-id in that list, perform annotation group/set logic [for details, refer to handleAnnotationsWithRequests(requests, username)
sans requests, those are obtained from db based on job-id ]
'''
log.info('Received settings: {}'.format(str(settings)))
def doSchedule(settings):
'''
with scheduler library
Based on time specified in settings, invoke .handleAnnotationsWithoutRequests(settings)
'''
#settings will need to be reconstituted from the DB first
#settings = {}
invocationTime = settings['running_at']
import re
invocationTime = re.sub(r'([AaPp][Mm])', "", invocationTime)
log.info("Invocation time to be used: {}".format(invocationTime))
schedule.every().day.at(invocationTime).do(handleAnnotationsWithoutRequests, settings=settings)
while True:
schedule.run_pending()
time.sleep(1)
so the log from handleAnnotationsWithoutRequests() doesn't appear on the console.
Is this scheduling library compatible with Django? Are there any usage samples that one could refer me to?
I'm suspecting some thread issues are at work here. Perhaps there are better alternatives to be used? Suggestions are welcome.
Thank you in advance.
For web servers, you probably don't want something that runs in-process:
An in-process scheduler for periodic jobs [...]
https://github.com/Tivix/django-cron has proven a working solution.
There's also the heavyweight champion Celery and Celerybeat.
I do this a lot with Django Commands
The pattern I use is to setup a new Django command in my app and then make it a long-running process inside a never-ending while() loop.
I the loop iterates continuously with a custom defined sleep(1) timer.
The short version is here, with a bit of pseudo-code thrown in. You can see a working version of this pattern in my Django Reference Implementation.
class Command(BaseCommand):
help = 'My Long running job'
def handle(self, *args, **options):
self.stdout.write(self.style.SUCCESS(f'Starting long-running job.'))
while True:
if conditions met for job:
self.job()
sleep(5)
def job(self):
self.stdout.write(self.style.SUCCESS(f'Running the job...'))

Django Celery - Passing an object to the views and between tasks using RabbitMQ

This is the first time I'm using Celery, and honestly, I'm not sure I'm doing it right. My system has to run on Windows, so I'm using RabbitMQ as the broker.
As a proof of concept, I'm trying to create a single object where one task sets the value, another task reads the value, and I also want to show the current value of the object when I go to a certain url. However I'm having problems sharing the object between everything.
This is my celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault('DJANGO_SETTINGS_MODULE','cesGroundStation.settings')
app = Celery('cesGroundStation')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
#app.task(bind = True)
def debug_task(self):
print('Request: {0!r}'.format(self.request))
The object I'm trying to share is:
class SchedulerQ():
item = 0
def setItem(self, item):
self.item = item
def getItem(self):
return self.item
This is my tasks.py
from celery import shared_task
from time import sleep
from scheduler.schedulerQueue import SchedulerQ
schedulerQ = SchedulerQ()
#shared_task()
def SchedulerThread():
print ("Starting Scheduler")
counter = 0
while(1):
counter += 1
if(counter > 100):
counter = 0
schedulerQ.setItem(counter)
print("In Scheduler thread - " + str(counter))
sleep(2)
print("Exiting Scheduler")
#shared_task()
def RotatorsThread():
print ("Starting Rotators")
while(1):
item = schedulerQ.getItem()
print("In Rotators thread - " + str(item))
sleep(2)
print("Exiting Rotators")
#shared_task()
def setSchedulerQ(schedulerQueue):
schedulerQ = schedulerQueue
#shared_task()
def getSchedulerQ():
return schedulerQ
I'm starting my tasks in my apps.py...I'm not sure if this is the right place as the tasks/workers don't seem to work until I start the workers in a separate console where I run the celery -A cesGroundStation -l info.
from django.apps import AppConfig
from scheduler.schedulerQueue import SchedulerQ
from scheduler.tasks import SchedulerThread, RotatorsThread, setSchedulerQ, getSchedulerQ
class SchedulerConfig(AppConfig):
name = 'scheduler'
def ready(self):
schedulerQ = SchedulerQ()
setSchedulerQ.delay(schedulerQ)
SchedulerThread.delay()
RotatorsThread.delay()
In my views.py I have this:
def schedulerQ():
queue = getSchedulerQ.delay()
return HttpResponse("Your list: " + queue)
The django app runs without errors, however my output from "celery -A cesGroundStation -l info" is this: Celery command output
First it seems to start multiple "SchedulerThread" tasks, secondly the "SchedulerQ" object isn't being passed to the Rotators, as it's not reading the updated value.
And if I go to the url for which shows the views.schedulerQ view I get this error:
Django views error
I have very, very little experience with Python, Django and Web Development in general, so I have no idea where to start with that last error. Solutions suggest using Redis to pass the object to the views, but I don't know how I'd do that using RabbitMQ. Later on the schedulerQ object will implement a queue and the scheduler and rotators will act as more of a producer/consumer dynamic with the view showing the contents of the queue, so I believe using the database might be too resource intensive. How can I share this object across all tasks, and is this even the right approach?
The right approach would be to use some persistence layer, such as a database or results back end to store the information you want to share between tasks if you need to share information between tasks (in this example, what you are currently putting in your class).
Celery operates on a distributed message passing paradigm - a good way to distill that idea for this example, is that your module will be executed independently every time a task is dispatched. Whenever a task is dispatched to Celery, you must assume it is running in a seperate interpreter and loaded independently of other tasks. That SchedulerQ class is instantiated anew each time.
You can share information between tasks in ways described in the docs linked previously and some best practice tips discuss data persistence concerns.

Python Django Asynchronous Request handling

I am working in an application where i am doing a huge data processing to generate a completely new set of data which is then finally saved to database. The application is taking a huge time in processing and saving the data to data base. I want to improve the user experience to some extent by redirecting user to result page first and then doing the data saving part in background(may be in the asynchronous way) . My problem is that for displaying the result page i need to have the new set of processed data. Is there any way that i can do so that the data processing and data saving part is done in background and whenever the data processing part is completed(before saving to database) i would get the processed data in result page?.
Asynchronous tasks can be accomplished in Python using Celery. You can simply push the task to Celery queue and the task will be performed in an asynchronous way. You can then do some polling from the result page to check if it is completed.
Other alternative can be something like Tornado.
Another strategy is to writing a threading class that starts up custom management commands you author to behave as worker threads. This is perhaps a little lighter weight than working with something like celery, and of course has both advantages and disadvantages. I also used this technique to sequence/automate migration generation/application during application startup (because it lives in a pipeline). My gunicorn startup script then starts these threads in pre_exec() or when_ready(), etc, as appropriate, and then stops them in on_exit().
# Description: Asychronous Worker Threading via Django Management Commands
# Lets you run an arbitrary Django management command, either a pre-baked one like migrate,
# or a custom one that you've created, as a worker thread, that can spin forever, or not.
# You can use this to take care of maintenance tasks at start-time, like db migration,
# db flushing, etc, or to run long-running asynchronous tasks.
# I sometimes find this to be a more useful pattern than using something like django-celery,
# as I can debug/use the commands I write from the shell as well, for administrative purposes.
import json
import os
import requests
import sys
import time
import uuid
import logging
import threading
import inspect
import ctypes
from django.core.management import call_command
from django.conf import settings
class DjangoWorkerThread(threading.Thread):
"""
Initializes a seperate thread for running an arbitrary Django management command. This is
one (simple) way to make asynchronous worker threads. There exist richer, more complex
ways of doing this in Django as well (django-cerlery).
The advantage of this pattern is that you can run the worker from the command line as well,
via manage.py, for the sake of rapid development, easy testing, debugging, management, etc.
:param commandname: name of a properly created Django management command, which exists
inside the app/management/commands folder in one of the apps in your project.
:param arguments: string containing command line arguments formatted like you would
when calling the management command via manage.py in a shell
:param restartwait: integer seconds to wait before restarting worker if it dies,
or if a once-through command, acts as a thread-loop delay timer
"""
def __init__(self, commandname,arguments="",restartwait=10,logger=""):
super(DjangoWorkerThread, self).__init__()
self.commandname = commandname
self.arguments = arguments
self.restartwait = restartwait
self.name = commandname
self.event = threading.Event()
if logger:
self.l = logger
else:
self.l = logging.getLogger('root')
def run(self):
"""
Start the thread.
"""
try:
exceptioncount = 0
exceptionlimit = 10
while not self.event.is_set():
try:
if self.arguments:
self.l.info('Starting ' + self.name + ' worker thread with arguments ' + self.arguments)
call_command(self.commandname,self.arguments)
else:
self.l.info('Starting ' + self.name + ' worker thread with no arguments')
call_command(self.commandname)
self.event.wait(self.restartwait)
except Exception as e:
self.l.error(self.commandname + ' Unkown error: {}'.format(str(e)))
exceptioncount += 1
if exceptioncount > exceptionlimit:
self.l.error(self.commandname + " : " + self.arguments + " : Exceeded exception retry limit, aborting.")
self.event.set()
finally:
self.l.info('Stopping command: ' + self.commandname + " " + self.arguments)
def stop(self):
"""Nice Stop
Stop nicely by setting an event.
"""
self.l.info("Sending stop event to self...")
self.event.set()
#then make sure it's dead...and schwack it harder if not.
#kill it with fire! be mean to your software. it will make you write better code.
self.l.info("Sent stop event, checking to see if thread died.")
if self.isAlive():
self.l.info("Still not dead, telling self to murder self...")
time.sleep( 0.1 )
os._exit(1)
def start_worker(command_name, command_arguments="", restart_wait=10,logger=""):
"""
Starts a background worker thread running a Django management command.
:param str command_name: the name of the Django management command to run,
typically would be a custom command implemented in yourapp/management/commands,
but could also be used to automate standard Django management tasks
:param str command_arguments: a string containing the command line arguments
to supply to the management command, formatted as if one were invoking
the command from a shell
"""
if logger:
l = logger
else:
l = logging.getLogger('root')
# Start the thread
l.info("Starting worker: "+ command_name + " : " + command_arguments + " : " + str(restart_wait) )
worker = DjangoWorkerThread(command_name,command_arguments, restart_wait,l)
worker.start()
l.info("Worker started: "+ command_name + " : " + command_arguments + " : " + str(restart_wait) )
# Return the thread instance
return worker
#<----------------------------------------------------------------------------->
def stop_worker(worker,logger=""):
"""
Gracefully shutsdown the worker thread
:param threading.Thread worker: the worker thread object
"""
if logger:
l = logger
else:
l = logging.getLogger('root')
# Shutdown the thread
l.info("Stopping worker: "+ worker.commandname + " : " + worker.arguments + " : " + str(worker.restartwait) )
worker.stop()
worker.join(worker.restartwait)
l.info("Worker stopped: "+ worker.commandname + " : " + worker.arguments + " : " + str(worker.restartwait) )
The long running task can be offloaded with Celery. You can still get all the updates and results. Your web application code should take care of polling for updates and results. http://blog.miguelgrinberg.com/post/using-celery-with-flask
explains how one can achieve this.
Some useful steps:
Configure celery with result back-end.
Execute the long running task asynchronously.
Let the task update its state periodically or when it executes some stage in job.
Poll from web application to get the status/result.
Display the results on UI.
There is a need for bootstrapping it all together, but once done it can be reused and it is fairly performant.
It's the same process that a synchronous request. You will use a View that should return a JsonResponse. The 'tricky' part is on the client side, where you have to make the async call to the view.

Where do I register an rq-scheduler job in a Django app?

I'd like to use django_rq and rq-scheduler for offline tasks, but I'm unsure of where to call rq-scheduler's ability to schedule repeating tasks. Right now, I've added my scheduling to a tasks.py module in my app, and import that in __init__.py. There has to be a better way to do this, though, right?
Thanks in advance.
I created a custom management command which modifies and replaces the rqscheduler command included in django_rq. An example is provided here: https://github.com/rq/rq-scheduler/issues/51#issuecomment-362352497
The best place I've found to run it is from your AppConfig in apps.py.
def ready(self):
scheduler = django_rq.get_scheduler('default')
# Delete any existing jobs in the scheduler when the app starts up
for job in scheduler.get_jobs():
job.delete()
# Have 'mytask' run every 5 minutes
scheduler.schedule(datetime.utcnow(), 'mytask', interval=60*5)
I've added scheduling to a __init__ module in one of my project application (in terms of Django), but wrapped with small function which prevents queueing jobs twice or more. Scheduling strategy may be dependent of your specific needs (i.e. you may need additional checking for a job arguments).
Code that works for me and fit my needs:
import django_rq
from collections import defaultdict
import tasks
scheduler = django_rq.get_scheduler('default')
jobs = scheduler.get_jobs()
functions = defaultdict(lambda: list())
map(lambda x: functions[x.func].append(x.meta.get('interval')), jobs)
now = datetime.datetime.now()
def schedule_once(func, interval):
"""
Schedule job once or reschedule when interval changes
"""
if not func in functions or not interval in functions[func]\
or len(functions[func])>1:
# clear all scheduled jobs for this function
map(scheduler.cancel, filter(lambda x: x.func==func, jobs))
# schedule with new interval
scheduler.schedule(now+datetime.timedelta(seconds=interval), func,
interval=interval)
schedule_once(tasks.some_task_a, interval=60*5)
schedule_once(tasks.some_task_b, interval=120)
Also I've wrapped this snippet to avoid imports at the package level:
def init_scheduler():
# paste here initialization code
init_scheduler()
you should use django command to run schedule job https://docs.djangoproject.com/en/3.2/howto/custom-management-commands/
like this
enter image description here
class Command(BaseCommand):
def handle(self, *args, **options):
scheduler = django_rq.get_scheduler('crontab_job')
for job in scheduler.get_jobs():
scheduler.cancel(job)
# 定时任务例子1
scheduler.cron(
"*/3 * * * *", # 每周一零点零时零分执行 0 0 * * 0 测试可以使用 */3 * * * * 每3分钟执行一次
func=gong_an_job, # Function to be queued
kwargs={'msg': '我是王龙飞1,我喜欢修仙', 'number': 1}, # Keyword arguments passed into function when executed
repeat=None, # Repeat this number of times (None means repeat forever)
queue_name='crontab_job', # In which queue the job should be put in
use_local_timezone=False # Interpret hours in the local timezone
)
# 定时任务例子2
scheduler.cron(
"*/5 * * * *", # 每周一零点零时零分执行 0 0 * * 0 测试可以使用 */3 * * * * 每3分钟执行一次
func=gong_an_job, # Function to be queued
kwargs={'msg': '我是王龙飞222222,我喜欢修仙', 'number': 22222}, # Keyword arguments passed into function when executed
repeat=None, # Repeat this number of times (None means repeat forever)
queue_name='crontab_job', # In which queue the job should be put in
use_local_timezone=False # Interpret hours in the local timezone
)
#create crontab job
python manage.py rq_crontab_job
#check crontab job and put crontab job to queue
python manage.py rqscheduler --queue crontab_job
#run crontab job
python manage.py rqworker crontab_job
I think the first answer is greate,but in multi-Progress enviroment may have some probelm,you should only run once to create crontab job !

Categories