In production we are seeing some unknown errors occurring in Celery when we use Redis as the message broker, so we thought of migrating to RabbitMQ until the errors get fixed. That way, if there is an error in either one of them in the future, we can quickly switch between them. Is this idea feasible, and is it possible to implement?
Thanks in advance
In general, yes, transports are interchangeable, with some caveats. Celery will work the same way when you swap between supported brokers. It's important, however, to know what contracts Celery offers you when you use it and which behaviors may be broker-specific.
Caveats:
1. Some Celery settings/features depend on a specific transport (for example, the broker_use_ssl setting is only supported by the redis and amqp/RabbitMQ transports).
2. Different brokers have different default settings/behavior (for example, Redis defaults to a 1 hour visibility timeout while SQS defaults to 30 seconds).
3. In addition to differences in defaults, your broker could be configured to behave differently completely independently of your application configuration/Celery settings (for example, queue settings chosen when creating an SQS queue).
4. You have to consider the potential for data/message loss. Any in-flight messages or messages on the queue will be lost when you switch, so you'll want to handle the swap gracefully to avoid losing messages; in other words, you need to migrate or redeliver any existing messages when you switch between transports. Similarly, existing rate limit counters will not carry over, message deduplication mechanisms will not transfer, and so on.
5. Workers must be restarted for a broker change to take effect. Settings cannot be changed on the fly or automatically in response to errors.
So, yes, you can change transports whenever you want in general. However, it's probably not a good strategy for added fault tolerance or automated failover, particularly because of caveats (4) and (5).
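To illustrate how caveats (1) and (2) can be handled, here is a minimal sketch of selecting the broker once at deploy time; the URLs and the CELERY_BROKER environment variable are illustrative assumptions, not a failover mechanism:

import os
from celery import Celery

# Pick the broker once, at startup (workers must be restarted to change it).
BROKERS = {
    "redis": "redis://localhost:6379/0",
    "rabbitmq": "amqp://guest:guest@localhost:5672//",
}
broker_name = os.environ.get("CELERY_BROKER", "rabbitmq")

app = Celery("tasks", broker=BROKERS[broker_name])

# Transport-specific options only apply to the matching transport, so keep
# them conditional (redis defaults to a 1 hour visibility timeout).
if broker_name == "redis":
    app.conf.broker_transport_options = {"visibility_timeout": 7200}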
I would like to use NATS to distribute tasks among several worker processes. Everything works as expected if I have at least one worker online, but if there are no worker processes, messages are simply thrown away: when I later start a worker, I get none of the messages that were published while it was offline.
I know how to do it with RabbitMQ, but is it possible to do it with NATS?
The project is in Python: the producer process uses aiohttp, and the worker processes, also in Python, do some CPU-heavy tasks.
Are you familiar with JetStream? JetStream retains messages so they can be replayed. You can configure your stream to only discard the message once it's been acknowledged.
I'm not sure what the state of the Python client is in regard to JetStream; I know it is being worked on. https://github.com/nats-io/nats.py
As of this writing, the official Python client does not support JetStream (issue-209).
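For reference, here is a minimal sketch of the work-queue pattern as it looks in later nats-py releases that do support JetStream; the stream and subject names are illustrative:

import asyncio
import nats
from nats.js.api import RetentionPolicy, StreamConfig

async def main():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    # Work-queue retention: a message is removed only once a consumer
    # acks it, so jobs published while no worker is online are retained.
    await js.add_stream(StreamConfig(
        name="TASKS",
        subjects=["tasks.*"],
        retention=RetentionPolicy.WORK_QUEUE,
    ))

    await js.publish("tasks.resize", b"job payload")

    # A worker started later still receives the backlog.
    sub = await js.pull_subscribe("tasks.*", durable="workers")
    for msg in await sub.fetch(1, timeout=5):
        await msg.ack()

    await nc.close()

asyncio.run(main())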
We are looking to run Celery/Redis in a Kubernetes cluster, and currently do not have Redis persistence enabled (everything is in-memory). I am concerned about: Redis restarts (losing in-memory data), worker restarts/outages (due to crashes and/or pod scheduling), and transient network issues.
When using Celery to do task processing using Redis, what is required to ensure that tasks are reliable?
On the redis side, just make sure that you are using the backup features:
https://redis.io/topics/persistence
How to recover redis data from snapshot(rdb file) copied from another machine?
On the celery side, make sure your tasks are idempotent: if a task is submitted more than once, the effect should be the same as running it once.
If a task is in the middle of processing when there is a restart, then once Redis and the app are back up, Celery should see the incomplete task and try to schedule it again.
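As a minimal sketch of those two ideas together (late acks plus an idempotency guard), assuming a Redis broker; the helpers and the in-memory store are illustrative stand-ins:

from celery import Celery

app = Celery("jobs", broker="redis://redis:6379/0")

# Ack only after the task finishes, and redeliver tasks whose worker
# died mid-execution; this is safe only because the task is idempotent.
app.conf.task_acks_late = True
app.conf.task_reject_on_worker_lost = True

_done = set()  # stand-in for a durable store; use Redis or a DB in practice

def already_processed(order_id):
    return order_id in _done

def mark_processed(order_id):
    _done.add(order_id)

@app.task
def process_order(order_id):
    # Idempotency guard: a redelivered task becomes a no-op.
    if already_processed(order_id):
        return
    print(f"processing {order_id}")  # illustrative business logic
    mark_processed(order_id)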
In order to make your Celery cluster more robust when using Redis as a broker (and result backend), I recommend using one (or more) replicas. Unfortunately redis-py does not yet have support for clustered Redis, but that is just a matter of time. In replicated mode, when the master server goes down, a replica takes its place, and this is (almost) entirely transparent. Celery also supports Redis Sentinels.
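For example, pointing Celery at a Sentinel-managed Redis looks roughly like this (host names and the master name are illustrative):

from celery import Celery

app = Celery("tasks")
app.conf.broker_url = (
    "sentinel://sentinel-1:26379;"
    "sentinel://sentinel-2:26379;"
    "sentinel://sentinel-3:26379"
)
app.conf.broker_transport_options = {"master_name": "mymaster"}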
Celery became much more robust over the years in terms of ensuring that tasks get redelivered in some critical cases. If a task fails because the worker was lost (there is a configuration parameter for this, task_reject_on_worker_lost), because some exception was thrown, etc., it will be redelivered and executed again.
I only created the last 2 queue names that show in the RabbitMQ management web UI in the table below:
The rest of the table is full of hash-like queue names, which I don't recognize:
1- Who created them? (I know it is Celery, but which process, task, etc.)
2- Why are they created, and what are they created for?
I notice that when the number of pushed messages increases, the number of those hash-like queues increases as well.
When using Celery, RabbitMQ is used as the default result backend, and it also stores the errors of failing tasks (those that raised exceptions). With the amqp result backend, every new task creates a new queue on the server to hold its result; with thousands of tasks, the broker may be overloaded with queues, and this will affect performance in negative ways.
Each queue in RabbitMQ is a separate Erlang process, so if you're planning to keep many results simultaneously you may have to increase the Erlang process limit, and the maximum number of file descriptors your OS allows.
Old results will not be cleaned up automatically, so we have to tell RabbitMQ to do so. The configuration line below dictates the time-to-live of these temporary queues (the default is 1 day):
CELERY_AMQP_TASK_RESULT_EXPIRES = <number of seconds>
Or, we can change the result backend entirely, so that results are not stored in RabbitMQ at all; for example (Redis shown as an illustrative choice):
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
We may also ignore the result entirely:
CELERY_IGNORE_RESULT = True
Also, when ignoring the result, we can still keep the errors stored for later use, which means one more queue for the failing tasks:
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True
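Putting the settings above together (old-style names, illustrative values; the options are alternatives, not meant to be combined):

# Option 1: keep the amqp result backend but expire result queues sooner.
CELERY_AMQP_TASK_RESULT_EXPIRES = 3600  # seconds; the default is 1 day

# Option 2: store results outside RabbitMQ (Redis is just an example).
# CELERY_RESULT_BACKEND = "redis://localhost:6379/0"

# Option 3: do not store results at all, optionally keeping errors.
# CELERY_IGNORE_RESULT = True
# CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True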
I will not mark this question as answered yet; I'm waiting for a better answer.
References:
This SO link
Celery documentation
Rabbitmq documentation
I have a service that needs a sort of coordinator component. The coordinator will manage entities that need to be assigned to users, taken away from users if they do not respond in a timely manner, and also handle user responses if they do respond. The coordinator will also need to contact messaging services to notify the users they have something to handle.
I want the coordinator to be a single-threaded process, as the load is not expected to be too much for the first few years of usage, and I'd much rather postpone all the concurrency issues to when I really need to handle them (if at all).
The coordinator will receive new entities and user responses from a Django webserver. I thought the easiest way to handle this is with Celery tasks - the webserver just starts a task that the coordinator consumes on its own time.
For this to happen, I need the coordinator to contain a Celery worker, and to replace the current worker main loop with my own version (one that checks the broker for a new message and handles the scheduling).
How feasible is it? The alternative is to avoid Celery and use RabbitMQ directly. I'd rather not do that.
Replace these names: coordinator with rabbitmq (or some other broker kombu supports), and users with celery workers.
I am pretty sure you can do all you need (and much more) just by configuring celery / kombu and rabbitmq and without writing too many (if any) lines of code.
Small note: Celery features scheduled tasks (e.g. apply_async with a countdown or eta).
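For example, the take-away-on-timeout behavior can be built from a scheduled task; the task body and helper functions here are hypothetical:

from celery import Celery

app = Celery("coordinator", broker="amqp://guest:guest@rabbitmq:5672//")

ASSIGNMENT_TIMEOUT = 15 * 60  # illustrative: 15 minutes to respond

def has_responded(user_id, entity_id):  # hypothetical stub
    return False

def reassign(entity_id):  # hypothetical stub
    print(f"reassigning {entity_id}")

@app.task
def reclaim_if_unanswered(entity_id, user_id):
    # If the user still has not responded, take the entity back.
    if not has_responded(user_id, entity_id):
        reassign(entity_id)

# When assigning an entity, schedule the timeout check for later:
# reclaim_if_unanswered.apply_async((entity_id, user_id),
#                                   countdown=ASSIGNMENT_TIMEOUT)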
I'm writing some software which will manage a few hundred small systems in “the field” over an intermittent 3G (or similar) connection.
Home base will need to send jobs to the systems in the field (eg, “report on your status”, “update your software”, etc), and the systems in the field will need to send jobs back to the server (eg, “a failure has been detected”, “here is some data”, etc).
I've spent some time looking at Celery and it seems to be a perfect fit: celeryd running at home base could collect jobs for the systems in the field, a celeryd running on the field systems could collect jobs for the server, and these jobs could be exchanged as clients become available.
So, is Celery a good fit for this problem? Specifically:
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Is there any other reason Celery could make my life difficult if I use it like I've described?
Thanks!
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Not at all. Just create a queue for each worker, e.g. say each node listens to a round robin queue called default and each node has its own queue named after its node name:
(a)$ celeryd -n a.example.com -Q default,a.example.com
(b)$ celeryd -n b.example.com -Q default,b.example.com
(c)$ celeryd -n c.example.com -Q default,c.example.com
Routing a task directly to a node is simple:
>>> get_status.apply_async(args, kwargs, queue="a.example.com")
or by configuration using a Router:
# Always route "app.get_status" to "a.example.com"
CELERY_ROUTES = {"app.get_status": {"queue": "a.example.com"}}
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
The worker gracefully recovers from broker connection failures (at least with RabbitMQ; I'm not sure about all the other backends, but this is easy to test and fix: you only need to add the related exceptions to a list). For the client, you can always retry sending the task if the connection is down, or you can set up HA with RabbitMQ: http://www.rabbitmq.com/pacemaker.html
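A minimal sketch of such client-side publish retries (the numbers are illustrative; retry_policy is Celery's standard publish retry option):

result = get_status.apply_async(
    args, kwargs,
    queue="a.example.com",
    retry=True,                # retry publishing if the connection is down
    retry_policy={
        "max_retries": 5,      # give up after five attempts
        "interval_start": 0,   # first retry immediately
        "interval_step": 2,    # wait two more seconds per attempt
        "interval_max": 30,    # never wait more than thirty seconds
    },
)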
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Remote control commands; also, with the other transports only "direct" exchanges are supported (not "topic" or "fanout"). But this will be supported in Kombu (http://github.com/ask/kombu).
I would seriously reconsider using RabbitMQ. Why do you think it's not a good fit?
IMHO I wouldn't look elsewhere for a system like this (except maybe ZeroMQ if the system is transient and you don't require message persistence).
Is there any other reason Celery could make my life difficult if I use it like I've described?
I can't think of anything from what you describe above. Since the concurrency model is multiprocessing, it does require some memory (I'm working on adding support for thread pools and eventlet pools, which may help in some cases).
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
In that case I think you use the word overkill lightly. It really depends on how much code and tests you would need to write without it. I think it's better to improve an already existing general solution, and in theory it sounds like it should work well for your application.
I would probably set up a (django) web service to accept requests. The web service could do the job of validating requests and deflecting bad requests. Then celery can just do the work.
This would require the remote devices to poll the web service to see if their jobs were done though. That may or may not be appropriate, depending on what exactly you're doing.