I am using the multiprocessing package in Python to start a run in a subprocess, dedicating one thread to the run so that the rest of the threads on the machine can be used for heavy computation. While the process is running, I would like to show its progress. Is there a way to access the report that is updated during the heavy computation? Here is a small example to illustrate what I would like to achieve.
import multiprocessing as _mp

def compute_squared(queue):
    number_squared = 20
    report = {}

    def run(number_squared, report):
        for i in range(number_squared):
            j = i ** 2
            report[i] = j

    run(number_squared, report)
    queue.put({
        "report": report,
        "name": "compute squared"
    })

if __name__ == "__main__":
    queue = _mp.Queue()
    run_process = _mp.Process(
        target=compute_squared,
        args=(queue, ))
    run_process.start()
    while run_process.is_alive():
        continue
    final_report = queue.get()["report"]
While run_process.is_alive(), I want to print the report from inside compute_squared so that I can trace the progress. I can access the final report using queue.get(), but is there a way to access the intermediate reports?
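One pattern that would fit this example (a sketch of my own, not something from the thread) is to put a snapshot of the report on the queue each time it changes, and drain the queue while polling:

import queue as _queue
import multiprocessing as _mp

def compute_squared(q):
    report = {}
    for i in range(20):
        report[i] = i ** 2
        # Push a snapshot of the report so the parent can trace progress
        q.put({"report": dict(report), "name": "compute squared"})

if __name__ == "__main__":
    q = _mp.Queue()
    run_process = _mp.Process(target=compute_squared, args=(q,))
    run_process.start()
    final_report = None
    # Drain intermediate reports while the process runs (q.empty() is a
    # heuristic, but it is adequate for a sketch like this)
    while run_process.is_alive() or not q.empty():
        try:
            final_report = q.get(timeout=0.1)["report"]
            print(final_report)  # intermediate progress
        except _queue.Empty:
            pass
    run_process.join()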
I want to use rq to run tasks on a separate worker to gather data from a measuring instrument. The end of the task will be signaled by a user pressing a button in a Dash app.
The problem is that the task itself does not know when to terminate, since it doesn't have access to the Dash app's context.
I already use meta to pass information from the worker back to the caller, but can I pass information from the caller to the worker?
Example task:
from time import time, sleep

import numpy as np
from rq import get_current_job

def mock_measurement():
    job = get_current_job()
    t_start = time()

    # Run the measurement
    t = []
    i = []

    job.meta['should_stop'] = False  # I want to use this tag to tell the job to stop
    while not job.meta['should_stop']:
        t.append(time() - t_start)
        i.append(np.random.random())
        job.meta['data'] = (t, i)
        job.save_meta()
        sleep(5)

    print("Job Finished")
From the console, I can start a job like this:
import rq
from redis import Redis

queue = rq.Queue('test-app', connection=Redis('localhost', 6379))
job = queue.enqueue('tasks.mock_measurement')
and I would like to be able to do this from the console to signal to the worker that it can stop running:
job.meta['should_stop'] = True
job.save_meta()
job.refresh()
However, while the commands above return without an error, they do not actually update the meta dictionary.
That's because you didn't fetch the updated meta. But don't do this!
Invoking save_meta and refresh in both the caller and the worker will lose data: each side overwrites the other's copy of the meta dictionary.
Instead, use job.connection.set(job.id + ':should_stop', 1, ex=300) to set the flag, and job.connection.get(job.id + ':should_stop') to check whether it is set.
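For illustration, here is a minimal sketch of that flag pattern; the key name built from job.id is my own choice, and any unique key on the same connection would do:

# tasks.py -- worker side: poll a plain Redis key instead of job.meta
import numpy as np
from time import time, sleep
from rq import get_current_job

def mock_measurement():
    job = get_current_job()
    stop_key = job.id + ':should_stop'  # arbitrary key name, scoped by job id
    t_start = time()
    t = []
    i = []
    while not job.connection.get(stop_key):
        t.append(time() - t_start)
        i.append(np.random.random())
        job.meta['data'] = (t, i)  # worker -> caller, unchanged
        job.save_meta()
        sleep(5)
    print("Job Finished")

From the console, the caller then signals the worker with:

job.connection.set(job.id + ':should_stop', 1, ex=300)  # caller -> worker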
I'm trying to pause a celery task temporarily based on a user's button click.
What I've done is:
When a user clicks a button, I fire an AJAX request that updates my celery task state to "PAUSE".
Then my tactic was: when I initiate a task in celery, it runs a for loop.
On every iteration of the loop, I read the 'state' from my database and check whether it is set to PAUSE; if it is, I want to sleep for 60 seconds, or sleep until the user hits the resume button; same idea.
This is my code:
import json
from time import sleep

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

@celery.task(bind=True)
def runTask(self, arr):
    for items in arr:
        current_task_id = self.request.id
        item = r.get('celery-task-meta-' + current_task_id)
        load_as_json = json.loads(item)
        if "PAUSE" in load_as_json['status']:
            sleep(50)
@app.route('/start')
def start_task():
    runTask.apply_async(args=[arr])
    return 'task started running'
Here is what my pause API endpoint looks like:
@app.route('/stop/<task_id>')
def updateTaskState(task_id):
    loadAsJson = json.loads(r.get('celery-task-meta-' + str(task_id)))
    loadAsJson['status'] = 'PAUSE'
    dump_as_json = json.dumps(loadAsJson)
    r.set('celery-task-meta-' + str(task_id), dump_as_json)
    return 'updated state'
From what I conceptually understand, the reason I'm not seeing an updated state is that the task has already been executed and isn't able to retrieve updated values from the database.
FYI: the task state is set to PAUSE immediately; I checked this by creating a separate script that checks the state within a while loop. Every time I click the button that fires the AJAX request to update the state, my db gets updated, and the separate script reads "PAUSE"; however, within the @celery.task decorator I can't seem to get the updated state.
Below is the separate script I used to test, and it seems to be updating the state as expected; I just can't get the updated state within the task decorator... weirdly.
import json

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

while True:
    response = r.get('celery-task-meta-b1534a87-e18b-4f0a-89e2-08348d833056')
    loadAsJson = json.loads(response)
    print(loadAsJson['status'])
Faced with the same question and no good answers, I came up with a solution you might like; it is not dependent on the message queue you are using (i.e. Redis or RabbitMQ). The key for me was that the update_state method in the celery.app.task.Task class takes task_id as an optional parameter. In my case I am running long-running file copy and checksum tasks through multiple worker nodes, and sometimes the user wants to pause one running task to reduce the performance requirements on the storage and allow other tasks to finish first. I am also running a stateless Flask REST API to initiate the backend tasks and retrieve the status of running tasks, so I needed a way to have an API call come in to pause and resume the tasks.
Here is my test function, which can receive a "message" to pause itself by monitoring its own state:
import time

@celery.task(bind=True)
def long_test(self, i):
    print('long test starting with delay of ' + str(i) + ' seconds on each loop')
    print('task_id = ' + str(self.request.id))
    self.update_state(state='PROCESSING')
    count = 0
    while True:
        task = celery.AsyncResult(self.request.id)
        while task.state == 'PAUSING' or task.state == 'PAUSED':
            if task.state == 'PAUSING':
                self.update_state(state='PAUSED')
            time.sleep(i)
        if task.state == 'RESUME':
            self.update_state(state='PROCESSING')
        print('long test loop ' + str(count) + ' ' + str(task.state))
        count += 1
        time.sleep(i)
Then, in order to pause or resume I can do the following:
>>> from project.celeryworker.tasks import long_test
>>> from project import create_app, make_celery
>>> flaskapp = create_app()
>>> celery = make_celery(flaskapp)
>>> from celery.app.task import Task
>>> long_test.apply_async(kwargs={'i': 5})
<AsyncResult: bf19d50f-cf04-47f0-a069-6545fb253887>
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='PAUSING')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PAUSED'
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='RESUME')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PROCESSING'
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='PAUSING')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PAUSED'
I want to have a while loop running continuously in the background on my web server. I still want the ability to turn the loop on and off from Flask by giving a command to my celery worker. The while loop in celery seems to run only once.
from celery import Celery

@app.task
def count(i):
    if i == 1:  # turn on command
        while True:  # a while loop to achieve what I want to do
            i = i + 1
            return i  # this returns on the first pass, so the loop runs only once
    elif i == 0:  # turn off command given by flask
        return i
I also tried celery beat, but it requires me to give the arguments in advance rather than accepting a command from another source.
from datetime import timedelta

app.conf.update(
    CELERYBEAT_SCHEDULE={
        'add-every-1-seconds': {
            'task': 'tasks.count',
            'schedule': timedelta(seconds=1),
            # 'args': (1,)
        },
    })
Thanks to @dim's answer. The code I have now is:
import time

@app.task
def count(i):
    if i == 1:
        while True:  # a while loop to achieve what I want to do
            i = i + 1
            time.sleep(1)
            print(i)
            print('i am counting')
To start the worker:
$ celery -A tasks worker -l info
And call it from Python:
>>> from tasks import count
>>> result = count.delay(1)
To stop the loop from Python:
>>> result.revoke(terminate=True)
Hope this will be useful for people wanting to have a loop in their celery task.
I am working on an application where I am doing heavy data processing to generate a completely new set of data, which is then finally saved to the database. The application takes a long time processing and saving the data. I want to improve the user experience by redirecting the user to the result page first and then saving the data in the background (perhaps asynchronously). My problem is that to display the result page I need the new set of processed data. Is there any way to run the data processing and data saving in the background such that, whenever the data processing is completed (before saving to the database), I get the processed data on the result page?
Asynchronous tasks can be accomplished in Python using Celery. You can simply push the task onto the Celery queue and it will be performed asynchronously. You can then poll from the result page to check whether it has completed.
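For illustration, a minimal sketch of that pattern; the broker/backend URL and the process_data task body are assumptions of mine, not part of the original question:

# tasks.py -- a sketch of pushing work to Celery and polling for the result
from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def process_data(raw):
    # Stand-in for the heavy processing step
    return {'rows': len(raw)}

The result page can then poll the AsyncResult:

result = process_data.delay([1, 2, 3])
result.ready()  # poll: True once the task has finished
result.get()    # the processed data, once ready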
Another alternative is something like Tornado.
Another strategy is to write a threading class that starts up custom management commands you author to behave as worker threads. This is perhaps a little lighter-weight than working with something like celery, and of course has both advantages and disadvantages. I also used this technique to sequence/automate migration generation/application during application startup (because it lives in a pipeline). My gunicorn startup script then starts these threads in pre_exec() or when_ready(), etc., as appropriate, and then stops them in on_exit().
# Description: Asynchronous Worker Threading via Django Management Commands
# Lets you run an arbitrary Django management command, either a pre-baked one like migrate,
# or a custom one that you've created, as a worker thread that can spin forever, or not.
# You can use this to take care of maintenance tasks at start-time, like db migration,
# db flushing, etc., or to run long-running asynchronous tasks.
# I sometimes find this to be a more useful pattern than using something like django-celery,
# as I can debug/use the commands I write from the shell as well, for administrative purposes.
import os
import time
import logging
import threading

from django.core.management import call_command
class DjangoWorkerThread(threading.Thread):
    """
    Initializes a separate thread for running an arbitrary Django management command. This is
    one (simple) way to make asynchronous worker threads. There exist richer, more complex
    ways of doing this in Django as well (django-celery).
    The advantage of this pattern is that you can run the worker from the command line as well,
    via manage.py, for the sake of rapid development, easy testing, debugging, management, etc.
    :param commandname: name of a properly created Django management command, which exists
    inside the app/management/commands folder in one of the apps in your project.
    :param arguments: string containing command line arguments formatted like you would
    when calling the management command via manage.py in a shell
    :param restartwait: integer seconds to wait before restarting worker if it dies,
    or if a once-through command, acts as a thread-loop delay timer
    """
    def __init__(self, commandname, arguments="", restartwait=10, logger=""):
        super(DjangoWorkerThread, self).__init__()
        self.commandname = commandname
        self.arguments = arguments
        self.restartwait = restartwait
        self.name = commandname
        self.event = threading.Event()
        if logger:
            self.l = logger
        else:
            self.l = logging.getLogger('root')
    def run(self):
        """
        Thread body: run the management command in a loop until stopped.
        """
        try:
            exceptioncount = 0
            exceptionlimit = 10
            while not self.event.is_set():
                try:
                    if self.arguments:
                        self.l.info('Starting ' + self.name + ' worker thread with arguments ' + self.arguments)
                        call_command(self.commandname, self.arguments)
                    else:
                        self.l.info('Starting ' + self.name + ' worker thread with no arguments')
                        call_command(self.commandname)
                    self.event.wait(self.restartwait)
                except Exception as e:
                    self.l.error(self.commandname + ' Unknown error: {}'.format(str(e)))
                    exceptioncount += 1
                    if exceptioncount > exceptionlimit:
                        self.l.error(self.commandname + " : " + self.arguments + " : Exceeded exception retry limit, aborting.")
                        self.event.set()
        finally:
            self.l.info('Stopping command: ' + self.commandname + " " + self.arguments)
    def stop(self):
        """Nice Stop
        Stop nicely by setting an event.
        """
        self.l.info("Sending stop event to self...")
        self.event.set()
        # Then make sure it's dead... and schwack it harder if not.
        # Kill it with fire! Be mean to your software. It will make you write better code.
        self.l.info("Sent stop event, checking to see if thread died.")
        if self.is_alive():
            self.l.info("Still not dead, telling self to murder self...")
            time.sleep(0.1)
            os._exit(1)
def start_worker(command_name, command_arguments="", restart_wait=10, logger=""):
    """
    Starts a background worker thread running a Django management command.
    :param str command_name: the name of the Django management command to run,
    typically would be a custom command implemented in yourapp/management/commands,
    but could also be used to automate standard Django management tasks
    :param str command_arguments: a string containing the command line arguments
    to supply to the management command, formatted as if one were invoking
    the command from a shell
    """
    if logger:
        l = logger
    else:
        l = logging.getLogger('root')
    # Start the thread
    l.info("Starting worker: " + command_name + " : " + command_arguments + " : " + str(restart_wait))
    worker = DjangoWorkerThread(command_name, command_arguments, restart_wait, l)
    worker.start()
    l.info("Worker started: " + command_name + " : " + command_arguments + " : " + str(restart_wait))
    # Return the thread instance
    return worker
#<----------------------------------------------------------------------------->
def stop_worker(worker, logger=""):
    """
    Gracefully shuts down the worker thread.
    :param threading.Thread worker: the worker thread object
    """
    if logger:
        l = logger
    else:
        l = logging.getLogger('root')
    # Shut down the thread
    l.info("Stopping worker: " + worker.commandname + " : " + worker.arguments + " : " + str(worker.restartwait))
    worker.stop()
    worker.join(worker.restartwait)
    l.info("Worker stopped: " + worker.commandname + " : " + worker.arguments + " : " + str(worker.restartwait))
The long-running task can be offloaded with Celery. You can still get all the updates and results; your web application code should take care of polling for them. http://blog.miguelgrinberg.com/post/using-celery-with-flask explains how to achieve this.
Some useful steps:
Configure celery with a result backend.
Execute the long-running task asynchronously.
Let the task update its state periodically, or when it completes some stage of the job.
Poll from the web application to get the status/result.
Display the results in the UI.
There is a need to bootstrap it all together, but once done it can be reused and it is fairly performant.
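As a rough sketch of those steps (the task body and the meta fields here are illustrative assumptions, not from the linked post):

from celery import Celery

app = Celery('jobs',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')  # step 1: result backend

@app.task(bind=True)
def long_job(self, items):
    results = []
    for n, item in enumerate(items):
        results.append(item * 2)  # stand-in for the real processing
        self.update_state(state='PROGRESS',  # step 3: report progress
                          meta={'done': n + 1, 'total': len(items)})
    return results

# Steps 2 and 4, from the web application:
# result = long_job.delay(list(range(100)))  # execute asynchronously
# result.state, result.info                  # poll status and progress
# result.get()                               # final result for the UI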
It's the same process as a synchronous request. You will use a view that returns a JsonResponse. The 'tricky' part is on the client side, where you have to make the asynchronous call to the view.
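A minimal sketch of that server side, assuming Django (the view name, URL, and payload are placeholders of mine):

# views.py -- the view simply reports whatever the background job has produced
from django.http import JsonResponse

def task_status(request):
    # Look up the progress/result of your background work here
    return JsonResponse({'status': 'done', 'result': 42})

# The 'tricky' client side polls this view asynchronously, e.g. with
# fetch('/task-status/') from JavaScript, and updates the page once the
# JSON reports that the work has finished.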