Search and selectively remove tasks from Celery - python

I want to clean up my Celery queues. How can I search tasks by their type or arguments and then selectively remove some of them? I am using Redis as a broker, if that matters; I would prefer not to deal with this at the Redis level, though.

The only option I see here is to code directly against the Kombu library, which is what Celery uses as its messaging layer; it can talk to all of the supported brokers, including Redis, in an abstract way. A rough sketch of that approach is below.
I would discourage this practice anyway, since the need to clean up queues is often a symptom of bad design.
Regards
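If you do want to go that route, here is a rough, untested sketch of how it might look with Kombu. The broker URL, the queue name ("celery" is Celery's default), and the filtering predicate are assumptions you would need to adapt:

from kombu import Connection

def prune_queue(broker_url, queue_name, should_remove):
    # Drain the queue, drop messages matching should_remove, requeue the rest.
    with Connection(broker_url) as conn:
        queue = conn.SimpleQueue(queue_name)
        pending = []
        while True:
            try:
                # Unacked messages are taken off the queue, so this loop ends.
                pending.append(queue.get(block=False))
            except queue.Empty:
                break
        for msg in pending:
            task_name = msg.headers.get("task")      # task name (Celery message protocol v2)
            args_repr = msg.headers.get("argsrepr")  # repr() of the positional arguments
            if should_remove(task_name, args_repr):
                msg.ack()       # acknowledging removes the message for good
            else:
                msg.requeue()   # put it back on the queue
        queue.close()

# Example: drop all queued "myapp.tasks.resize_image" tasks (hypothetical task name)
# prune_queue("redis://localhost:6379/0", "celery",
#             lambda name, args: name == "myapp.tasks.resize_image")

Test this on a throwaway queue first; re-queuing behaviour can differ between broker transports.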

Related

Using Celery with RabbitMQ as broker vs using just RabbitMQ + Pika for async tasks, advantages of using one over another

The debate I am currently in is whether we should stick with our RabbitMQ implementation using Pika or move to Celery, and what advantages there would be if we went with Celery. From what I have understood, Celery is a distributed task queue that simplifies the management of task distribution. It uses a broker (RabbitMQ, Redis, and so on) for sending and receiving messages between clients and workers, and it can optionally use a backend such as Redis to store results.
Whereas RabbitMQ is a message queue which can be used to run jobs asynchronously. Ultimately, whether we use Celery or implement things ourselves with RabbitMQ and Pika in Python, it will do the same job: execute long-running processes in the background.
A few advantages that I see in using Celery are:
Can store the result of each task, using a result backend (such as Redis).
Easier to implement.
Allows adding retries.
Is a distributed task queue and can be run on multiple nodes/clusters.
Packages like Flower can be used to monitor each task, its state, result, time taken, and other metadata.
Task chaining.
But on the other hand, it seems it restricts us from using some of RabbitMQ's features, and it also has some limitations, such as connecting to the broker synchronously (issue on GitHub: https://github.com/celery/celery/issues/3884).
I am familiar with the question already asked here, Why use Celery instead of RabbitMQ?, but it does not make this clear.
Any help would be highly appreciated.
Erm... it seems to me like you are comparing mosquitoes and elephants here. RabbitMQ+Pika is not a replacement for Celery. However, RabbitMQ+Pika can help you implement a (miniature) service such as Celery, if that is really what you want.
If you use RabbitMQ as the broker, Celery (actually Kombu) will use something similar to Pika - the amqp library maintained under the Celery organization (py-amqp) - to communicate with it.
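To make the "easier to implement" and "retries" points above concrete, here is a minimal Celery sketch; the app name, broker/backend URLs, and task body are illustrative assumptions, not the only way to set this up:

from celery import Celery

app = Celery(
    "myproject",
    broker="amqp://guest@localhost//",    # RabbitMQ as the broker
    backend="redis://localhost:6379/1",   # optional result backend
)

@app.task(bind=True, max_retries=3, default_retry_delay=10)
def process_order(self, order_id):
    try:
        # ... the long-running work goes here ...
        return {"order_id": order_id, "status": "done"}
    except Exception as exc:
        # Celery re-queues the task and tracks the retry count for us.
        raise self.retry(exc=exc)

With raw Pika you would be writing the publishing, consuming, acking, retry bookkeeping, and result storage yourself; that is essentially the trade-off being discussed here.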

What is the relationship between Celery and RabbitMQ?

Is Celery mostly just a high-level interface for message queues like RabbitMQ? I am trying to set up a system with multiple scheduled workers doing concurrent HTTP requests, but I am not sure whether I would need either of them. Another thing I am wondering is where you write the actual task code for the workers to execute, if I am using Celery or RabbitMQ.
RabbitMQ is indeed a message queue, and Celery uses it to send messages to and from workers. Celery is more than just an interface for RabbitMQ, though: it is what you use to create workers, kick off tasks, and define your tasks. It sounds like your use case makes sense for Celery/RabbitMQ. You create a task using the @app.task decorator; check the docs for more info. In previous projects, I've set up a module for Celery where I define any tasks I need, and then pulled in functions from other modules to use in those tasks. A minimal sketch of that layout is below.
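A bare-bones sketch of such a module might look like this (the app name, broker URL, and task are assumptions; adapt them to your project):

# tasks.py
from urllib.request import urlopen

from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

@app.task
def fetch_status(url):
    # Runs on a worker process whenever a fetch_status message arrives.
    with urlopen(url) as response:
        return response.status

You would start a worker with celery -A tasks worker and enqueue work from any other process with fetch_status.delay("https://example.com").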
Celery is the task management framework--the API you use to schedule jobs, the code that gets those jobs started, the management tools (e.g. Flower) you use to monitor what's going on.
RabbitMQ is one of several "brokers" (message transports) Celery can use. It's an oversimplification to say that Celery is a high-level interface to RabbitMQ: RabbitMQ is not actually required for Celery to run and do its job properly. But, in practice, they are often paired together, and Celery is a higher-level way of accomplishing some things that you could do at a lower level with just RabbitMQ (or another queue or message-delivery system).

Is it possible to use a single celery instance to connect to multiple brokers?

I have a use case where there are two RabbitMQs which I would like to connect to, RabbitMQ instance A and instance B. Assume for the moment that I cannot combine these two instances into a single RabbitMQ instance and they must be separate. Please note that these two instances have different exchanges/queues and are not by-any-means replications of the data or messages.
Is it possible, using a single celery application, to connect to two brokers, and their exchanges/queues at: amqp://<instance-a>:5672 and amqp://<instance-b>:5672?
I have looked through the documentation and this doesn't seem to be possible, celery seems to be monolithic for the most part--however I am relatively new to celery (and Python) so I may have missed something.
I suspect you might be "abusing" Celery as a RabbitMQ consumer. Using RabbitMQ as a message queue (or event queue) is a great idea, but you don't need Celery to consume from it (and frankly, since Celery is not designed for this kind of work, it would probably bite you later on).
So you are better off choosing a RabbitMQ client library (Kombu, Pika, and Puka are the major Python options) and building yourself a decent consumer; a minimal Kombu sketch is below.
You can also try the Shovel plugin for RabbitMQ, which can be used to "shovel" messages from one queue/exchange to another. That might also work.
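For reference, a plain Kombu consumer is fairly small. This is only a sketch (the queue name, broker URL, and message handling are assumptions), and you would run one such process per broker instance:

from kombu import Connection, Queue
from kombu.mixins import ConsumerMixin

class EventWorker(ConsumerMixin):
    def __init__(self, connection, queues):
        self.connection = connection
        self.queues = queues

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=self.queues, callbacks=[self.on_message])]

    def on_message(self, body, message):
        print("received:", body)   # replace with real handling
        message.ack()

if __name__ == "__main__":
    # Run a second copy with instance B's URL to cover both brokers.
    with Connection("amqp://<instance-a>:5672//") as conn:
        EventWorker(conn, [Queue("events")]).run()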

Using Multiple Installations of Celery with a Redis Backend

Is it possible to use the same Redis database for multiple projects using Celery? Something like using the same database for multiple projects as a cache with a key prefix. Or do I have to use a separate database for every installation?
To summarize from this helpful blog post: https://kfalck.net/2013/02/21/run-multiple-celeries-on-a-single-redis/
Specify a different database number for each project, e.g. redis://localhost/0 and redis://localhost/1
Define and use different queue names for the different projects. On the task side, define CELERY_DEFAULT_QUEUE, and when starting up your worker, use the -Q parameter to specify that queue. Read more about routing here: http://docs.celeryproject.org/en/latest/userguide/routing.html
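In code, those two points amount to something like this sketch (the project names, database numbers, and queue names are placeholders):

# project A's celery app
from celery import Celery

app = Celery("project_a", broker="redis://localhost:6379/0")
app.conf.task_default_queue = "project_a"   # CELERY_DEFAULT_QUEUE in old-style settings

# project B's celery app, in its own codebase
app = Celery("project_b", broker="redis://localhost:6379/1")
app.conf.task_default_queue = "project_b"

Each worker is then started against its own queue, e.g. celery -A project_a worker -Q project_a.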
I've used a redis backend for celery while also using the same redis db with prefixed cache data. I was doing this during development, I only used redis for the result backend not to queue tasks, and the production deployment ended up being all AMQP (redis only for caching). I didn't have any problems and don't see why one would (other than performance issues).
For running multiple celery projects with different task definitions, I think the issue would be if you have two different types of workers that each can only handle a subset of job types. Without separate databases, I'm not sure how the workers would be able to tell which jobs they could process.
I'd probably either want to make sure all workers had all task types defined and could process anything, or would want to keep the separate projects in separate databases. This wouldn't require installing anything extra, you'd just specify a REDIS_DB=1 in one of your celery projects. There might be another way to do this. I don't know for sure that multiple DBs are required, but it kinda makes sense.
If you're only using redis for a result backend, maybe that would work for having multiple celery projects on one redis db... I'm not really sure.

Is Celery appropriate for use with many small, distributed systems?

I'm writing some software which will manage a few hundred small systems in “the field” over an intermittent 3G (or similar) connection.
Home base will need to send jobs to the systems in the field (eg, “report on your status”, “update your software”, etc), and the systems in the field will need to send jobs back to the server (eg, “a failure has been detected”, “here is some data”, etc).
I've spent some time looking at Celery and it seems to be a perfect fit: celeryd running at home base could collect jobs for the systems in the field, a celeryd running on the field systems could collect jobs for the server, and these jobs could be exchanged as clients become available.
So, is Celery a good fit for this problem? Specifically:
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Is there any other reason Celery could make my life difficult if I use it like I've described?
Thanks!
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Not at all. Just create a queue for each worker, e.g. say each node listens to a round robin queue called default and each node has its own queue named after its node name:
(a)$ celeryd -n a.example.com -Q default,a.example.com
(b)$ celeryd -n b.example.com -Q default,b.example.com
(c)$ celeryd -n c.example.com -Q default,c.example.com
Routing a task directly to a node is simple:
>>> get_status.apply_async(args, kwargs, queue="a.example.com")
or by configuration using a Router:
# Always route "app.get_status" to "a.example.com"
CELERY_ROUTES = {"app.get_status": {"queue": "a.example.com"}}
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
The worker gracefully recovers from broker connection failures (at least with RabbitMQ; I'm not sure about all the other backends, but this is easy to test and fix - you only need to add the related exceptions to a list). For the client, you can always retry sending the task if the connection is down, or you can set up HA with RabbitMQ: http://www.rabbitmq.com/pacemaker.html
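For the publisher side, apply_async can also retry the publish itself; a small sketch, where the retry_policy numbers are purely illustrative:

get_status.apply_async(
    args, kwargs,                    # same placeholders as in the example above
    queue="a.example.com",
    retry=True,                      # retry publishing if the broker connection drops
    retry_policy={
        "max_retries": 3,            # give up after 3 attempts
        "interval_start": 0,         # first retry immediately
        "interval_step": 0.5,        # then back off by 0.5s per retry
        "interval_max": 2.0,         # never wait more than 2s between retries
    },
)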
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Remote control commands; also, with the other transports only "direct" exchanges are supported (not "topic" or "fanout"). But this will be supported in Kombu (http://github.com/ask/kombu).
I would seriously reconsider using RabbitMQ. Why do you think it's not a good fit? IMHO I wouldn't look elsewhere for a system like this (except maybe ZeroMQ if the system is transient and you don't require message persistence).
Is there any other reason Celery could make my life difficult if I use it like I've described?
I can't think of anything from what you describe above. Since the concurrency model is multiprocessing it does require some memory (I'm working on adding support for thread pools and eventlet pools, which may help in some cases).
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
In that case I think you use the word overkill lightly. It really depends on how much code and tests you need to write without it. I think it's better to improve an already existing general solution, and in theory it sounds like it should work well for your application.
I would probably set up a (django) web service to accept requests. The web service could do the job of validating requests and deflecting bad ones; then Celery can just do the work.
This would require the remote devices to poll the web service to see whether their jobs are done, though. That may or may not be appropriate, depending on what exactly you're doing; a bare-bones sketch of that dispatch-then-poll pattern follows.
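The sketch below leaves out the Django view/URL wiring; the task module and queue naming convention are hypothetical, following the example earlier in this answer:

from celery.result import AsyncResult
from myproject.tasks import get_status   # hypothetical task module

def submit_job(system_id):
    # Called by the web service when a job is submitted for a field system.
    result = get_status.apply_async(queue="%s.example.com" % system_id)
    return result.id                      # hand this id back to the caller

def poll_job(task_id):
    # Called when the remote device polls to see whether its job has finished.
    result = AsyncResult(task_id)
    if result.ready():
        return {"state": result.state, "value": result.result}
    return {"state": result.state}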
