Is it possible to use the same Redis database for multiple projects using Celery? Like using the same database for multiple projects as a cache, with a key prefix. Or do I have to use a separate database for every installation?
To summarize from this helpful blog post: https://kfalck.net/2013/02/21/run-multiple-celeries-on-a-single-redis/
Specify a different database number for each project, e.g. redis://localhost/0 and redis://localhost/1
Define and use different queue names for the different projects. On the task side, define CELERY_DEFAULT_QUEUE, and when starting up your worker, use the -Q parameter to specify that queue. Read more about routing here: http://docs.celeryproject.org/en/latest/userguide/routing.html
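A minimal sketch of both approaches combined (project names, ports, and queue names are illustrative; task_default_queue is the newer spelling of CELERY_DEFAULT_QUEUE):

from celery import Celery

# Project A: Redis database 0, its own default queue
app_a = Celery('project_a', broker='redis://localhost:6379/0')
app_a.conf.task_default_queue = 'project_a'

# Project B: Redis database 1, its own default queue
app_b = Celery('project_b', broker='redis://localhost:6379/1')
app_b.conf.task_default_queue = 'project_b'

# Start each worker so it only consumes its own project's queue:
#   celery -A project_a worker -Q project_a
#   celery -A project_b worker -Q project_b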
I've used a Redis backend for Celery while also keeping prefixed cache data in the same Redis DB. I was doing this during development; I only used Redis for the result backend, not to queue tasks, and the production deployment ended up being all AMQP (Redis only for caching). I didn't have any problems and don't see why one would (other than performance issues).
For running multiple celery projects with different task definitions, I think the issue would be if you have two different types of workers that each can only handle a subset of job types. Without separate databases, I'm not sure how the workers would be able to tell which jobs they could process.
I'd probably either want to make sure all workers had all task types defined and could process anything, or would want to keep the separate projects in separate databases. This wouldn't require installing anything extra; you'd just point one of your Celery projects at a different database number (e.g. REDIS_DB=1). There might be another way to do this. I don't know for sure that multiple DBs are required, but it kinda makes sense.
If you're only using redis for a result backend, maybe that would work for having multiple celery projects on one redis db... I'm not really sure.
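For completeness, a sketch of that kind of shared setup, assuming Django with django-redis (neither is named above): the Celery result backend and the prefixed cache share one Redis database.

# settings.py (illustrative)
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'

CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://localhost:6379/0',
        'KEY_PREFIX': 'myproject',  # keeps cache keys from colliding with Celery's
    },
}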
Recently I read something about this, and the point was that Celery is more productive. Now I can't find detailed information about the difference between these two and what the best way to use them would be.
Straight from the documentation:
If you need to perform heavy background computation and you don't necessarily need it to be run by the same process (for example, you don't need to share memory, variables, etc), you might benefit from using other bigger tools like Celery. They tend to require more complex configurations, a message/job queue manager, like RabbitMQ or Redis, but they allow you to run background tasks in multiple processes, and especially, in multiple servers. To see an example, check the Project Generators, they all include Celery already configured.

But if you need to access variables and objects from the same FastAPI app, or you need to perform small background tasks (like sending an email notification), you can simply just use BackgroundTasks.
Have a look at this answer as well.
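For reference, the BackgroundTasks half of that comparison looks roughly like this (endpoint and function names are made up; the pattern follows the FastAPI docs):

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def write_notification(email: str, message: str = ''):
    # stand-in for a small piece of work, e.g. sending an email
    with open('log.txt', 'a') as log:
        log.write(f'notification for {email}: {message}\n')

@app.post('/send-notification/{email}')
async def send_notification(email: str, background_tasks: BackgroundTasks):
    # runs in the same process, after the response has been sent
    background_tasks.add_task(write_notification, email, message='some notification')
    return {'message': 'Notification will be sent in the background'}

Anything heavier than this (long computations, work that must survive a restart, work spread over several machines) is where Celery starts to pay off.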
I'm specifically interested in avoiding conflicts when multiple users upload (upload_file) slightly different versions of the same python file or zip contents.
It would seem this is not really a supported use case as the worker process is long-running and subject to the environment changes/additions of others.
I like the library for easy, on-demand local/remote context switching, so would appreciate any insight on what options we might have, even if it means some seamless deploy-like step for user-specific worker processes.
Usually the solution to having different user environments is to launch and destroy networks of different Dask workers/schedulers on the fly on top of some other job scheduler like Kubernetes, Marathon, or Yarn.
If you need to reuse the same set of Dask workers, then you could also be careful to specify the workers= keyword consistently, but this would be error-prone.
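As a rough illustration of the workers= approach (scheduler and worker addresses are made up), using dask.distributed:

from dask.distributed import Client

def process(record):
    # stand-in for a user-specific task
    return record * 2

# Addresses are illustrative; you would normally discover them from the scheduler.
client = Client('tcp://scheduler:8786')
user_a_workers = ['tcp://10.0.0.11:45789', 'tcp://10.0.0.12:45789']

# Note: upload_file ships the module to every worker currently connected,
# which is exactly where conflicts between users' versions come from.
client.upload_file('user_a_module.py')

# Pin user A's tasks to "their" workers; error-prone, as noted above,
# because every submit call has to remember to pass workers=.
future = client.submit(process, 21, workers=user_a_workers, allow_other_workers=False)
print(future.result())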
I have a use case where there are two RabbitMQs which I would like to connect to, RabbitMQ instance A and instance B. Assume for the moment that I cannot combine these two instances into a single RabbitMQ instance and they must be separate. Please note that these two instances have different exchanges/queues and are not by-any-means replications of the data or messages.
Is it possible, using a single celery application, to connect to two brokers, and their exchanges/queues at: amqp://<instance-a>:5672 and amqp://<instance-b>:5672?
I have looked through the documentation and this doesn't seem to be possible; Celery seems to be fairly monolithic. However, I am relatively new to Celery (and Python), so I may have missed something.
I suspect you might be "abusing" Celery as a RabbitMQ consumer. Using RabbitMQ as a message queue (or event queue) is a great idea, but you don't need to use Celery to consume from it (and frankly, since Celery is not designed for this kind of work, it would probably bite you later on).
So you'd be better off choosing a RabbitMQ client library (Kombu, Pika, and Puka are the major Python options) and building yourself a decent consumer.
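A minimal Kombu consumer, as a sketch (broker URL, exchange, and queue names are placeholders for your instance B setup):

from kombu import Connection, Exchange, Queue
from kombu.mixins import ConsumerMixin

class EventConsumer(ConsumerMixin):
    def __init__(self, connection, queues):
        self.connection = connection
        self.queues = queues

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=self.queues,
                         callbacks=[self.on_message],
                         accept=['json'])]

    def on_message(self, body, message):
        print('received:', body)   # do the real work here
        message.ack()

events = Queue('instance-b-events', Exchange('events', type='direct'),
               routing_key='events')

with Connection('amqp://guest:guest@<instance-b>:5672//') as conn:
    EventConsumer(conn, [events]).run()

You could run one such consumer per broker (instance A and instance B) alongside a normal Celery app, instead of trying to point a single Celery app at two brokers.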
You can also try the Shovel plugin for RabbitMQ, which can be used to "shovel" messages from one queue/exchange to another. That might also work.
I want to clean up my Celery queues. How can I search the tasks by their type and arguments and then selectively remove some of them? I am using Redis as a broker, if that matters; I'd prefer not to deal with this at the Redis level, though.
The only option I see here is to code directly against the Kombu library, which is what Celery uses as its AMQP abstraction layer and which can talk to all the supported brokers, including Redis.
I would discourage this practice anyway, as the need to clean up queues is often a symptom of bad design.
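If you do go down that road anyway, a rough sketch with Kombu (queue name, broker URL, and the task name being filtered out are all illustrative; requeue behaviour can vary by transport):

from kombu import Connection

BROKER_URL = 'redis://localhost:6379/0'
QUEUE_NAME = 'celery'   # Celery's default queue; change to the one you want to clean

def should_remove(message):
    # With Celery's default JSON protocol the task name is in the headers
    # and the args/kwargs are in the body; adjust this predicate as needed.
    return message.headers.get('task') == 'myapp.tasks.obsolete_task'

with Connection(BROKER_URL) as conn:
    queue = conn.SimpleQueue(QUEUE_NAME)
    kept = []
    while True:
        try:
            message = queue.get(block=False)
        except queue.Empty:
            break
        if should_remove(message):
            message.ack()        # acknowledging discards the message
        else:
            kept.append(message)
    for message in kept:
        message.requeue()        # put everything else back on the queue
    queue.close()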
I'm writing some software which will manage a few hundred small systems in “the field” over an intermittent 3G (or similar) connection.
Home base will need to send jobs to the systems in the field (eg, “report on your status”, “update your software”, etc), and the systems in the field will need to send jobs back to the server (eg, “a failure has been detected”, “here is some data”, etc).
I've spent some time looking at Celery and it seems to be a perfect fit: celeryd running at home base could collect jobs for the systems in the field, a celeryd running on the field systems could collect jobs for the server, and these jobs could be exchanged as clients become available.
So, is Celery a good fit for this problem? Specifically:
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Is there any other reason Celery could make my life difficult if I use it like I've described?
Thanks!
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Not at all. Just create a queue for each worker, e.g. say each node listens to a round robin queue called default and each node has its own queue named after its node name:
(a)$ celeryd -n a.example.com -Q default,a.example.com
(b)$ celeryd -n b.example.com -Q default,b.example.com
(c)$ celeryd -n c.example.com -Q default,c.example.com
Routing a task directly to a node is simple:
>>> get_status.apply_async(args, kwargs, queue="a.example.com")
or by configuration using a Router:
# Always route "app.get_status" to "a.example.com"
CELERY_ROUTES = {"app.get_status": {"queue": "a.example.com"}}
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
The worker gracefully recovers from broker connection failures (at least with RabbitMQ; I'm not sure about all the other backends, but this is easy to test and fix, since you only need to add the related exceptions to a list). For the client you can always retry sending the task if the connection is down, or you can set up HA with RabbitMQ: http://www.rabbitmq.com/pacemaker.html
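For example, publishing can be told to retry automatically on connection errors (the retry_policy values below are illustrative):

result = get_status.apply_async(
    args=args, kwargs=kwargs,
    queue='a.example.com',
    retry=True,
    retry_policy={
        'max_retries': 5,      # give up after 5 attempts
        'interval_start': 0,   # retry immediately the first time
        'interval_step': 2,    # then back off by 2 seconds per retry
        'interval_max': 30,    # and cap the delay at 30 seconds
    },
)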
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Remote control commands are only available with RabbitMQ, and with the other transports only "direct" exchanges are supported (not "topic" or "fanout"). But this will be supported in Kombu (http://github.com/ask/kombu).
I would seriously reconsider the decision not to use RabbitMQ. Why do you think it's not a good fit? IMHO I wouldn't look elsewhere for a system like this (except maybe ZeroMQ if the system is transient and you don't require message persistence).
Is there any other reason Celery could make my life difficult if I use it like I've described?
I can't think of anything from what you describe above. Since the concurrency model is multiprocessing it does require some memory (I'm working on adding support for thread pools and eventlet pools, which may help in some cases).
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
In that case I think you use the word overkill lightly. It really depends on how much code and tests you need to write without it. I think it's better to improve an already existing general solution, and in theory it sounds like it should work well for your application.
I would probably set up a (django) web service to accept requests. The web service could do the job of validating requests and deflecting bad requests. Then celery can just do the work.
This would require the remote devices to poll the web service to see if their jobs were done though. That may or may not be appropriate, depending on what exactly you're doing.
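A rough sketch of that shape, assuming Django plus Celery (the task and view names are made up):

from celery import shared_task
from django.http import HttpResponseBadRequest, JsonResponse

@shared_task
def update_software(system_id):
    ...  # the actual work happens in a Celery worker

def submit_job(request):
    system_id = request.POST.get('system_id')
    if not system_id:
        return HttpResponseBadRequest('system_id is required')  # deflect bad requests here
    result = update_software.delay(system_id)
    return JsonResponse({'task_id': result.id})  # devices poll later using this id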