I am using celery 3 with django, and flower to monitor tasks.
Is there any way that, if my task fails, I can fix the code, then get the task ID and restart that task?
Is that possible?
Or is there even a way to manually place any failed task in another queue, so it can be processed again after fixing the cause of the failure?
A bit of a hack, but what works for me is creating a new task instance with the same task ID. For example, a task with ID 'abc' runs and fails. I then "restart" the task by running:
my_task.apply_async(args=('whatever',), task_id='abc')
In reality it is less of a "restart" and more just a replacement of the original task result, but it gets the job done. Definitely open to better suggestions here as it does feel a bit clumsy.
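A minimal sketch of that replacement approach, assuming the original arguments are known to the caller (the default result backend does not store them):

from celery.result import AsyncResult

result = AsyncResult('abc')  # ID of the task that failed
if result.state == 'FAILURE':
    # Re-submit under the same ID so the old FAILURE result is overwritten.
    my_task.apply_async(args=('whatever',), task_id='abc')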
I have an application with several luigi tasks (I did not write that app). Now I want to introduce another task, in the middle of the process, that will monitor some AWS instances. This task, once started, should run until the end, and it must run in parallel with the other tasks. See the linked schema for a better picture.
Link to the schema
I looked in the documentation but could not find a solution. I am new to luigi and probably missed something.
I don't think you missed anything. I don't think luigi covers that use case. However, one thing you could do is have Task 3 require only Task 2, have Task 4 require Task 2 instead of Task 3, and have Task 3 continually run some code while monitoring the output of Task 5 to know when it should close. It's not the prettiest, but it should work.
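A rough sketch of that monitoring approach, assuming Task2 and Task5 are the existing task classes from your pipeline and check_aws_instances is a placeholder for the actual monitoring code:

import time
import luigi

class MonitorTask(luigi.Task):  # plays the role of "Task 3"
    def requires(self):
        return Task2()  # assumed existing task from the pipeline

    def run(self):
        # Keep monitoring until Task 5's output appears, then finish.
        while not Task5().output().exists():
            check_aws_instances()  # placeholder monitoring call
            time.sleep(60)
        with self.output().open('w') as f:
            f.write('done')

    def output(self):
        return luigi.LocalTarget('monitor_task_done.marker')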
However, there are a couple of problems I can foresee (which is why it probably isn't supported by luigi). If you have enough Task 3's running, you might never complete the workflow, as Task 4 never gets run. That's why this isn't recommended: you are essentially creating hidden requirements that the dependency graph doesn't know about. Another issue is that Task 3 might never run until you are done with all of the Task 5's, in which case it's useless.
One last idea: instead of having Task 3 at all, at the end of Task 2 or the beginning of Task 4 you start a process on the scheduler node (using plain luigi.Task rather than an extension that runs the work on another node of the cluster). Then at the end of Task 5 you stop that process. There are some other edge cases you'll need to consider, though, to make sure the process doesn't run too short or too long.
Good luck!
I'm working on a Python-based system to enqueue long-running tasks to workers.
The tasks originate from an outside service that generates a "token", but once they're created from that token, they should run continuously and stop only when explicitly removed by code.
The task starts a WebSocket connection and loops on it. If the socket is closed, it reopens it. Basically, the task should never reach a conclusion.
My goals in architecting this solution are:
When gracefully restarting a worker (for example to load new code), the task should be re-added to the queue, and picked up by some worker.
The same should happen on an ungraceful shutdown.
Two workers shouldn't work on the same token.
Other processes may create more tasks that should be directed to the same worker that's handling a specific token. This will be resolved by sending those tasks to a queue named after the token, which the worker should start listening to after starting the token's task (a sketch follows this list). I am listing this requirement as an explanation of why a task engine is even required here.
Independent servers, fast code reload, etc. - Minimal downtime per task.
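A rough sketch of the per-token queue idea from the list above, assuming Celery's add_consumer remote-control command, a hypothetical 'token.<id>' queue-naming scheme, and that the handling worker's hostname is known:

from celery import current_app

def attach_token_queue(token, worker_hostname):
    # Tell the worker that picked up this token to also consume from a queue
    # named after the token; later tasks for the same token are then sent
    # with apply_async(..., queue='token.{}'.format(token)).
    current_app.control.add_consumer(
        'token.{}'.format(token),
        destination=[worker_hostname],
    )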
All our server-side code is Python, and it looks like Celery is the best platform for it.
Are we using the right technology here? Any other architectural choices we should consider?
Thanks for your help!
According to the docs:
When shutdown is initiated the worker will finish all currently executing tasks before it actually terminates, so if these tasks are important you should wait for it to finish before doing anything drastic (like sending the KILL signal).
If the worker won’t shutdown after considerate time, for example because of tasks stuck in an infinite-loop, you can use the KILL signal to force terminate the worker, but be aware that currently executing tasks will be lost (unless the tasks have the acks_late option set).
You may get something like what you want by using retry or acks_late.
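A minimal sketch of what that could look like, assuming an existing Celery app and a placeholder run_websocket_loop for the long-running body (acks_late lets the broker redeliver an unacknowledged task if the worker dies; retry re-queues it explicitly on error):

from myproj.celery import app  # hypothetical module holding your Celery app

@app.task(bind=True, acks_late=True, max_retries=None)
def handle_token(self, token):
    try:
        run_websocket_loop(token)  # placeholder: open socket, loop, reconnect
    except Exception as exc:
        # Push the task back onto the queue so some worker picks it up again.
        raise self.retry(exc=exc, countdown=5)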
Overall I reckon you'll need to implement some extra application-side job control, plus, maybe, a lock service.
But, yes, overall you can do this with celery. Whether there are better technologies... that's out of the scope of this site.
I'm running Django, Celery and RabbitMQ. What I'm trying to achieve is to ensure that tasks related to one user are executed in order (specifically, one at a time; I don't want task concurrency per user).
Whenever a new task is added for a user, it should depend on the most recently added task. Additional functionality might include not adding a task to the queue if a task of this type is already queued for this user and has not yet started.
I've done some research and:
I couldn't find a way to link newly created task with already queued one in Celery itself, chains seem to be only able to link new tasks.
I think that both functionalities are possible to implement with custom RabbitMQ message handler, though it might be hard to code after all.
I've also read about celery-tasktree and this might be the easiest way to ensure execution order, but how do I link a new task with a task tree or queue that has already been applied_async? Is there any way I could implement that additional no-duplicate functionality using this package?
Edit: There is also this "lock" example in the celery cookbook. While the concept is fine, I can't see a way to make it work as intended in my case: if I can't acquire the lock for a user, the task would have to be retried, which means pushing it to the end of the queue.
What would be the best course of action here?
If you configure the celery workers so that they can only execute one task at a time (see worker_concurrency setting), then you could enforce the concurrency that you need on a per user basis. Using a method like
NUMBER_OF_CELERY_WORKERS = 10

def get_task_queue_for_user(user):
    return "user_queue_{}".format(user.id % NUMBER_OF_CELERY_WORKERS)
to get the task queue based on the user id, every task will be assigned to the same queue for each user. The workers would need to be configured to only consume tasks from a single task queue.
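A possible call site, plus the worker setup this assumes (one single-concurrency worker per queue; my_task, some_arg and 'proj' are placeholders):

# Route the task to the queue derived from the user id.
my_task.apply_async(args=(some_arg,), queue=get_task_queue_for_user(user))

# Each worker consumes from exactly one of the queues, e.g.:
#   celery worker -A proj -Q user_queue_9 --concurrency=1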
It would play out like this:
User 49 triggers a task
The task is sent to user_queue_9
When the one and only celery worker that is listening to user_queue_9 is ready to consume a new task, the task is executed
This is a hacky answer though, because
requiring just a single celery worker for each queue is a brittle system -- if the celery worker stops, the whole queue stops
the workers are running inefficiently
Here's what I'm trying to achieve:
I have a pyramid view that puts a rather large task task_1 in the default queue
task_1 does a couple of simple database things then adds task_2 to the default queue a whole bunch of times with different arguments.
task_2 instantiates some stuff in the database
Here is what is happening:
All my RAM is eaten up and my computer starts paging frantically. So frantically, in fact, that the keyboard doesn't work well enough to stop the offending process.
The question is: How do I fix this?
Here is what I have done so far, some assumptions I'm making and what I intend to try:
I wrote a small script to take snapshots of what is using up the memory, since a whole lot of stuff is added to the queue. RabbitMQ is playing nice, while celery gradually increases its footprint, so the problem is there.
My informative celery log shows tasks being completed so it isn't just one task going haywire.
As far as I can see, celery simply isn't releasing memory after a task is completed. I think this has to do with sqlalchemy, simply because it sometimes does some funny things. Case in point: I once had a celery task that added some stuff to the database via sqlalchemy, which worked fine for a while, but then died. I took the part of the code that dealt with sqlalchemy and stuck it in an external script that the celery task launched as a self-contained process, and all my problems went away. It's a bit of a mission to do that every time I want to interact with a database in a celery task, though, so I would much prefer not to go that route.
Currently I'm reading a bit more about how celery actually works, but I think that periodically restarting the celery worker would do the trick. Am I on the right path? Or is there something simple and obvious that I'm missing?
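If periodic restarts are indeed the answer, a minimal sketch of letting celery recycle its own processes rather than restarting the worker by hand, using Celery 3's max-tasks-per-child setting (the threshold of 100 is an arbitrary example):

# settings.py: recycle each pool process after it has run 100 tasks,
# so any memory the process is holding is returned to the OS.
CELERYD_MAX_TASKS_PER_CHILD = 100

# Or on the command line:
#   celery worker --maxtasksperchild=100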
I have a task which I execute once a minute using celerybeat. It works fine. Sometimes, though, the task takes a few seconds more than a minute to run, so two instances of the task end up running at once. This leads to race conditions that mess things up.
I can (and probably should) fix my task to work properly but I wanted to know if celery has any builtin ways to ensure this. My cursory Google searches and RTFMs yielded no results.
You could add a lock, using something like memcached or just your db.
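A minimal sketch of the cache-based variant, assuming a memcached-backed Django cache and an existing Celery app; the lock key name and do_the_actual_work are placeholders:

from django.core.cache import cache

from myproject.celery import app  # hypothetical module holding your Celery app

LOCK_EXPIRE = 60 * 5  # should exceed the task's worst-case runtime

@app.task
def my_minutely_task():
    # cache.add is atomic on memcached: it only succeeds if the key is absent.
    if not cache.add('my_minutely_task_lock', 'locked', LOCK_EXPIRE):
        return  # a previous run is still going, skip this one
    try:
        do_the_actual_work()  # placeholder for the real task body
    finally:
        cache.delete('my_minutely_task_lock')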
If you are using a cron schedule or a time interval to run periodic tasks, you will still have the problem. You can always use a locking mechanism based on a db, a cache, or even the filesystem, or schedule the next task from the previous one, though that is maybe not the best approach.
This question can probably help you:
django celery: how to set task to run at specific interval programmatically
You can try adding a class field to the object that holds the function you're running, and use that field as a "someone else is already working on this or not" control.
A lock is a good approach with either beat or a cron.
But, be aware that beat jobs run at worker start time, not at beat run time.
This was causing me to get a race condition even with a lock. Let's say the worker is off and beat throws 10 jobs into the queue. When celery starts up with 4 processes, all 4 of them grab a task, and in my case 1 or 2 would get and set the lock at the same time.
Solution one is to use a cron with a lock, as a cron will execute at that time, not at worker start time.
Solution two is to use a slightly more advanced locking mechanism that handles race conditions. For redis look into setnx, or the newer redlock.
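A minimal sketch of the setnx-style approach using redis-py's set with nx and an expiry (the connection details and TTL are assumptions):

import redis

r = redis.Redis()  # assumes a local Redis instance

def acquire_lock(name, ttl=300):
    # SET key value NX EX ttl: atomic "set only if absent" with an expiry,
    # so a crashed worker can't hold the lock forever.
    return bool(r.set(name, 'locked', nx=True, ex=ttl))

def release_lock(name):
    r.delete(name)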
This blog post is really good, and includes a decorator pattern that uses redis-py's locking mechanism: http://loose-bits.com/2010/10/distributed-task-locking-in-celery.html.