I want to have a while loop running continuously in the background on my web server. I still want to be able to turn the loop on and off from Flask by sending a command to my Celery worker. The while loop in Celery seems to run only once.
from celery import Celery
@app.task
def count(i):
    if i == 1:  # turn on command
        while True:  # a while loop to achieve what I want to do
            i = i + 1
        return i
    elif i == 0:  # turn off command given by flask
        return i
I also tried celery_beat, but it requires me to give the arguments in advance rather than accepting a command from another source.
app.conf.update(
    CELERYBEAT_SCHEDULE={
        'add-every-1-seconds': {
            'task': 'tasks.count',
            'schedule': timedelta(seconds=1),
            # 'args': (1,)
        },
    })
Thanks to @dim's answer. The code I have now is:
import time

@app.task
def count(i):
    if i == 1:
        while True:  # a while loop to achieve what I want to do
            i = i + 1
            time.sleep(1)
            print(i)
            print('i am counting')
To start the worker:
$ celery -A tasks worker -l info
And call it from Python (using .delay() so that an AsyncResult is returned):
>>> from tasks import count
>>> result = count.delay(1)
To stop the loop from Python:
>>> result.revoke(terminate=True)
Hope this is useful for people who want a loop in their Celery task.
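As a sketch of the Flask side (the route names, module layout, and the global task-id variable are my assumptions, not part of the answer above), the loop can be started with .delay() and stopped with revoke():

from flask import Flask
from tasks import count  # the Celery task module shown above

flask_app = Flask(__name__)
running_task_id = None  # illustrative only; a real app would persist this in a DB or cache

@flask_app.route('/start')
def start_loop():
    global running_task_id
    result = count.delay(1)      # enqueue the looping task on the worker
    running_task_id = result.id  # remember the id so the task can be revoked later
    return 'loop started: ' + running_task_id

@flask_app.route('/stop')
def stop_loop():
    if running_task_id:
        count.AsyncResult(running_task_id).revoke(terminate=True)  # kill the loop on the worker
        return 'loop stopped'
    return 'no loop running'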
Related
I have a periodic task which should trigger another task. The final expected behavior: the first task should collect some data from an external service, then loop over this data (a list) and call another task, passing the current loop iteration as an argument. I want the tasks called in the loop to be asynchronous.
I wrote code that runs a task periodically, but I can't figure out how this task should call the other task, because when I do it with the .delay() method nothing happens.
Here is some simplified code that I want to run:
@celery_app.task(name="Hello World")
def hello_world():
    print("HELLO WORLD PRINT")
    add.delay(2, 2)
    return 'Hello'

@celery_app.task
def add(x, y):
    with open(f"./{str(datetime.datetime.now())}.txt", 'w') as file:
        file.write(str(x + y))
    print(f"x + y = {x + y}")
    return x + y
For now, hello_world() runs every 30 seconds and I see HELLO WORLD PRINT in the logs as a result, but the add task is not running. I can see neither its print output nor the file it should create.
Update for a comment, here is how I use the queue:
celery_app.conf.task_routes = {
    "project.app.hello_world": {
        "queue": 'test_queue'
    },
    "project.app.add": {
        "queue": 'test_queue'
    },
}
There are a few ways to solve the problem.
The obvious one is to put the queue name in the .apply_async call, for example add.apply_async((10, 10), queue="test_queue").
Another solution is to put the queue in the task decorator, i.e. @celery_app.task(queue="test_queue").
I have never configured task_routes, but I believe it is possible to specify it there like you tried...
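Whichever way the routing is configured, the worker also has to consume from that queue, since by default it only listens on the default celery queue. As a sketch (assuming the Celery app module is project, as in the routes above):
$ celery -A project worker -Q test_queue -l info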
I want to use rq to run tasks on a separate worker to gather data from a measuring instrument. The end of the task will be signaled by a user pressing a button on a dash app.
The problem is that the task itself does not know when to terminate since it doesn't have access to the dash app's context.
I already use meta to pass information from the worker back to the caller but can I pass information from the caller to the worker?
Example task:
from rq import get_current_job
from time import time, sleep
import numpy as np

def mock_measurement():
    job = get_current_job()
    t_start = time()
    # Run the measurement
    t = []
    i = []
    job.meta['should_stop'] = False  # I want to use this tag to tell the job to stop
    while not job.meta['should_stop']:
        t.append(time() - t_start)
        i.append(np.random.random())
        job.meta['data'] = (t, i)
        job.save_meta()
        sleep(5)
    print("Job Finished")
From the console, I can start a job as such
queue = rq.Queue('test-app', connection=Redis('localhost', 6379))
job = queue.enqueue('tasks.mock_measurement')
and I would like to be able to do this from the console to signify to the worker it can stop running:
job.meta['should_stop'] = True
job.save_meta()
job.refresh()
However, while the commands above return without an error, they do not actually update the meta dictionary.
That's because you didn't fetch the updated meta before overwriting it. But don't do this anyway!
Calling save_meta and refresh from both the caller and the worker will lose data, because each side overwrites the other's copy of meta.
Instead, use job.connection.set(job.key + b':should_stop', 1, ex=300) to set the flag, and job.connection.get(job.key + b':should_stop') to check whether it is set.
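For reference, here is a minimal sketch of the task rewritten to poll that Redis flag instead of its own meta (the key name and the 300-second expiry follow the suggestion above; the imports mirror the original snippet):

from time import time, sleep
from rq import get_current_job
import numpy as np

def mock_measurement():
    job = get_current_job()
    t_start = time()
    t, i = [], []
    # loop until the caller sets the '<job key>:should_stop' key in Redis
    while not job.connection.get(job.key + b':should_stop'):
        t.append(time() - t_start)
        i.append(np.random.random())
        job.meta['data'] = (t, i)
        job.save_meta()  # the worker is now the only writer of meta, so nothing gets clobbered
        sleep(5)
    print("Job Finished")

From the console, stopping it becomes job.connection.set(job.key + b':should_stop', 1, ex=300), with no save_meta or refresh involved on the caller's side.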
I have a problem with daily scheduled tasks using crontab.
Here is my celery.py
app.conf.beat_schedule = {
    'run-cache-updater': {
        'task': 'tasks.run_cache_updater',
        'schedule': crontab(
            minute=0,
            hour='1-4'
        ),
    }
}
Below is my tasks.py. What I am doing there is getting all the records from the DB and triggering other jobs to update my caches on Redis.
@app.task
def run_cache_updater():
    batch_size = 1000
    cache_records = models.CacheRecords.objects.all()

    def _chunk_list(all_records, size_of_batch):
        for i in range(0, len(all_records), size_of_batch):
            yield [item.id for item in all_records[i: i + size_of_batch]]

    for items in _chunk_list(cache_records, batch_size):
        update_cache.delay(items)

@app.task
def update_cache(ids_in_chunks):
    for id in ids_in_chunks:
        # Some calls are done here. Then sleep for 200 ms.
        time.sleep(0.2)
My tasks run fine. However, they run between 1 and 4 as expected, but then start again every 4 hours, i.e. 8-11, 15-18, and so on.
What am I doing wrong here and how can I fix it?
This sounds like a Celery bug; it's probably worth raising on their GitHub repo.
However, as a workaround, you could try the more explicit notation, hour='1,2,3,4', just in case the issue is in the parsing of that specific crontab interval style.
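For example, keeping everything else in celery.py the same, the workaround would look like this:

app.conf.beat_schedule = {
    'run-cache-updater': {
        'task': 'tasks.run_cache_updater',
        'schedule': crontab(
            minute=0,
            hour='1,2,3,4'
        ),
    }
}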
I'm trying to pause a Celery task temporarily based on a user button click.
What I've done is:
When a user clicks a button, I send an AJAX request that updates my Celery task state to "PAUSE".
Then my tactic was: when I initiate a task in Celery, it runs a for loop.
On every iteration of the for loop, I read the 'state' from my database and check whether it is set to PAUSE; if it is, I want to sleep for 60 seconds, or sleep until the user hits the resume button, same idea.
This is my code:
r = redis.StrictRedis(host='localhost', port=6379, db=0)

@celery.task(bind=True)
def runTask(self, arr):
    for items in arr:
        current_task_id = self.request.id
        item = r.get('celery-task-meta-' + current_task_id)
        load_as_json = json.loads(item)
        if "PAUSE" in load_as_json['status']:
            sleep(50)

@app.route('/start')
def start_task():
    runTask.apply_async(args=[arr])
    return 'task started running'
Here is what my pause API endpoint looks like:
@app.route('/stop/<task_id>')
def updateTaskState(task_id):
    loadAsJson = json.loads(r.get('celery-task-meta-' + str(task_id)))
    loadAsJson['status'] = 'PAUSE'
    dump_as_json = json.dumps(loadAsJson)
    r.set('celery-task-meta-' + str(task_id), dump_as_json)
    return 'updated state'
From what I conceptually understand, the reason I'm not seeing an updated state is that the task has already started executing and isn't able to retrieve updated values from the database.
FYI: the task state is set to PAUSE immediately; I checked this by creating a separate script that reads the state in a while loop. Every time I click the button that sends the AJAX request to update the state, my db gets updated and the separate script reads "PAUSE"; however, inside the @celery.task decorator I can't seem to see the updated state.
Below is the separate script I used to test, and it seems to pick up the state updates as expected; I just can't get the updated state within the task decorator... weirdly.
r = redis.StrictRedis(host='localhost', port=6379, db=0)
last_key = r.keys()
while True:
    response = r.get('celery-task-meta-b1534a87-e18b-4f0a-89e2-08348d833056')
    loadAsJson = json.loads(response)
    print(loadAsJson['status'])
Faced with the same question and no good answers, I came up with a solution you might like, and it is not dependent on the message queue you are using (Redis or RabbitMQ). The key for me was that the update_state method in the celery.app.task.Task class takes task_id as an optional parameter. In my case I am running long-running file copy and checksum tasks through multiple worker nodes, and sometimes the user wants to pause one running task to reduce the load on the storage and allow other tasks to finish first. I am also running a stateless Flask REST API to initiate the backend tasks and retrieve the status of running tasks, so I needed a way for an API call to come in and pause or resume a task.
Here is my test function, which can receive a "message" to pause itself by monitoring its own state:
@celery.task(bind=True)
def long_test(self, i):
    print('long test starting with delay of ' + str(i) + ' seconds on each loop')
    print('task_id = ' + str(self.request.id))
    self.update_state(state='PROCESSING')
    count = 0
    while True:
        task = celery.AsyncResult(self.request.id)
        while task.state == 'PAUSING' or task.state == 'PAUSED':
            if task.state == 'PAUSING':
                self.update_state(state='PAUSED')
            time.sleep(i)
        if task.state == 'RESUME':
            self.update_state(state='PROCESSING')
        print('long test loop ' + str(count) + ' ' + str(task.state))
        count += 1
        time.sleep(i)
Then, in order to pause or resume I can do the following:
>>> from project.celeryworker.tasks import long_test
>>> from project import create_app, make_celery
>>> flaskapp = create_app()
>>> celery = make_celery(flaskapp)
>>> from celery.app.task import Task
>>> long_test.apply_async(kwargs={'i': 5})
<AsyncResult: bf19d50f-cf04-47f0-a069-6545fb253887>
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='PAUSING')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PAUSED'
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='RESUME')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PROCESSING'
>>> Task.update_state(self=celery, task_id='bf19d50f-cf04-47f0-a069-6545fb253887', state='PAUSING')
>>> celery.AsyncResult('bf19d50f-cf04-47f0-a069-6545fb253887').state
'PAUSED'
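To wire this into the stateless Flask REST API mentioned above, something like the following could work (a sketch only; flaskapp and celery come from the session above, while the route names and view functions are my assumptions):

from celery.app.task import Task

@flaskapp.route('/pause/<task_id>')
def pause_task(task_id):
    # mark the task as PAUSING; the task itself switches to PAUSED on its next loop
    Task.update_state(self=celery, task_id=task_id, state='PAUSING')
    return 'pausing ' + task_id

@flaskapp.route('/resume/<task_id>')
def resume_task(task_id):
    # the task sees RESUME, switches back to PROCESSING and continues
    Task.update_state(self=celery, task_id=task_id, state='RESUME')
    return 'resuming ' + task_id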
I need to run multiple background asynchronous functions using multiprocessing. I have a working Popen solution, but it looks a bit unnatural. Example:
from time import sleep
from multiprocessing import Process, Value
import subprocess

def worker_email(keyword):
    subprocess.Popen(["python", "mongoworker.py", str(keyword)])
    return True

keywords_list = ['apple', 'banana', 'orange', 'strawberry']

if __name__ == '__main__':
    for keyword in keywords_list:
        # Do work
        p = Process(target=worker_email, args=(keyword,))
        p.start()
        p.join()
If I try not to use Popen, like:
def worker_email(keyword):
    print('Before:' + keyword)
    sleep(10)
    print('After:' + keyword)
    return True
The functions run one by one, not asynchronously. So, how do I run all the functions at the same time without using Popen?
UPD: I'm using multiprocessing.Value to return results from Process, like:
def worker_email(keyword, func_result):
    sleep(10)
    print('Yo:' + keyword)
    func_result.value = 1
    return True

func_result = Value('i', 0)
p = Process(target=worker_email, args=(doc['check_id'], func_result))
p.start()

# Change status
if func_result.value == 1:
    stream.update_one({'_id': doc['_id']}, {"$set": {"status": True}}, upsert=False)
But it doesn't work without .join(). Any ideas how to make this work, or a similar approach? :)
If you just remove the line p.join() it should work.
You only need p.join() if you want to wait for the process to finish before executing further code. At the end of the program, Python waits for all processes to finish before exiting, so you don't need to worry about that.
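For example, a sketch of the non-Popen version where every Process is started first and only joined afterwards (the join loop is optional and only needed if you want to wait for all of them):

from time import sleep
from multiprocessing import Process

def worker_email(keyword):
    print('Before:' + keyword)
    sleep(10)
    print('After:' + keyword)

keywords_list = ['apple', 'banana', 'orange', 'strawberry']

if __name__ == '__main__':
    processes = [Process(target=worker_email, args=(kw,)) for kw in keywords_list]
    for p in processes:
        p.start()  # all four start immediately and run concurrently
    for p in processes:
        p.join()   # optional: only wait here if you need everything finished before continuing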
I solved the problem of getting the Process result by moving the result check and status update into the worker function. Something like:
from time import sleep
from pymongo import MongoClient

# Update task status if work is done
def update_status(task_id, func_result):
    # Connect to DB
    client = MongoClient('mongodb://localhost:27017/')
    db = client.admetric
    stream = db.stream
    # Update task status if OK
    if func_result:
        stream.update_one({'_id': task_id}, {"$set": {"status": True}}, upsert=False)
    # Close DB connection
    client.close()

# Do work
def yo_func(keyword):
    sleep(10)
    print('Yo:' + keyword)
    return True

# Worker function
def worker_email(keyword, task_id):
    update_status(task_id, yo_func(keyword))
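A usage sketch of how this worker could be launched without join(); doc and the stream collection are assumed to come from the caller's surrounding code, as in the earlier snippets:

from multiprocessing import Process

if __name__ == '__main__':
    for doc in stream.find({'status': False}):  # illustrative query over the 'stream' collection
        p = Process(target=worker_email, args=(doc['check_id'], doc['_id']))
        p.start()  # no join() needed; worker_email updates MongoDB itself when it finishes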