Is Celery appropriate for use with many small, distributed systems?

I'm writing some software which will manage a few hundred small systems in “the field” over an intermittent 3G (or similar) connection.
Home base will need to send jobs to the systems in the field (eg, “report on your status”, “update your software”, etc), and the systems in the field will need to send jobs back to the server (eg, “a failure has been detected”, “here is some data”, etc).
I've spent some time looking at Celery and it seems to be a perfect fit: celeryd running at home base could collect jobs for the systems in the field, a celeryd running on the field systems could collect jobs for the server, and these jobs could be exchanged as clients become available.
So, is Celery a good fit for this problem? Specifically:
The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Is there any other reason Celery could make my life difficult if I use it like I've described?
Thanks!
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)

The majority of tasks will be directed to an individual worker (eg, “send the ‘get_status’ job to ‘system51’”) — will this be a problem?
Not at all. Just create a queue for each worker. Say each node listens to a round-robin queue called default, and each node also has its own queue named after its node name:
(a)$ celeryd -n a.example.com -Q default,a.example.com
(b)$ celeryd -n b.example.com -Q default,b.example.com
(c)$ celeryd -n c.example.com -Q default,c.example.com
Routing a task directly to a node is simple:
>>> get_status.apply_async(args, kwargs, queue="a.example.com")
or by configuration using a Router:
# Always route "app.get_status" to "a.example.com"
CELERY_ROUTES = {"app.get_status": {"queue": "a.example.com"}}
Does it gracefully handle adverse network conditions (like, eg, connections dying)?
The worker gracefully recovers from broker connection failures (at least with RabbitMQ; I'm not sure about all the other backends, but this is easy to test and fix: you only need to add the related exceptions to a list).
For the client you can always retry sending the task if the connection is down, or you can set up HA with RabbitMQ: http://www.rabbitmq.com/pacemaker.html
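In recent versions the client can also request publish retries per call. A sketch (the policy numbers are just illustrative):

result = get_status.apply_async(
    args, kwargs,
    queue="a.example.com",
    retry=True,  # retry publishing if the broker connection is down
    retry_policy={"max_retries": 3,
                  "interval_start": 0,   # first retry immediately
                  "interval_step": 2,    # back off by 2s per retry
                  "interval_max": 10},   # never wait more than 10s
)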
What functionality is only available if RabbitMQ is being used as a backend? (I'd rather not run RabbitMQ on the field systems)
Remote control commands only work with RabbitMQ, and with the other transports only "direct" exchanges are supported (not "topic" or "fanout"). But these will be supported in Kombu (http://github.com/ask/kombu).
I would seriously reconsider using RabbitMQ. Why do you think it's not a good fit? IMHO I wouldn't look elsewhere for a system like this (except maybe ZeroMQ, if the system is transient and you don't require message persistence).
Is there any other reason Celery could make my life difficult if I use it like I've described?
I can't think of anything from what you describe above. Since the concurrency model is multiprocessing it does require some memory (I'm working on adding support for thread pools and eventlet pools, which may help in some cases).
(it would be valid to suggest that Celery is overkill, but there are other reasons that it would make my life easier, so I would like to consider it)
In that case I think you're using the word overkill lightly. It really depends on how much code and how many tests you would need to write without it. I think it's better to improve an already existing general solution, and in theory it sounds like it should work well for your application.

I would probably set up a (django) web service to accept requests. The web service could do the job of validating requests and deflecting bad requests. Then celery can just do the work.
This would require the remote devices to poll the web service to see if their jobs were done, though. That may or may not be appropriate, depending on what exactly you're doing.

Related

Is it feasible to switch often between redis and rabbitmq in celery?

In production we are facing some unknown errors occurring in celery when we use redis as the message broker, so we thought of migrating to rabbitmq until the errors get fixed, so that in the future, if there is an error in either one of them, we can quickly switch between the two. Is this idea feasible, and is it possible to implement?
Thanks in advance
In general, yes, transports are interchangeable, with some caveats. Celery will work the same way when you swap between supported brokers. It's important, however, to know what contracts Celery offers you when you use it and which behaviors may be broker-specific.
Caveats:
1. Some celery settings/features depend on a specific transport (for example, the broker_use_ssl setting is only valid for the amqp and redis transports).
2. Different brokers have different default settings/behavior (for example, redis defaults to a 1 hour visibility timeout while SQS uses 30 seconds by default).
3. In addition to differences in defaults, your broker could be configured to behave differently completely independently of your application configuration / celery settings (for example, queue settings chosen when creating an SQS queue).
4. You have to consider the potential for data/message loss. Any in-flight messages or messages on the queue will be lost when you switch, so you'll want to handle the swap gracefully to avoid losing messages -- in other words, you need to migrate or redeliver any existing messages when you switch between transports. Similarly, existing rate limit counters will not transition over, message deduplication mechanisms will not transfer, etc.
5. Workers must be restarted for a broker change to take effect. Settings cannot be changed 'on-the-fly' or automatically in response to errors.
So, yes, you can change transports whenever you want in general. However, it's probably not a good strategy for added fault tolerance or automated failover, particularly because of caveats (4) and (5).
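In practice the swap itself is just a configuration change plus a worker restart. A minimal sketch, assuming the broker URL is injected through an environment variable (the variable name and URLs are illustrative):

import os
from celery import Celery

# Redeploy with a different CELERY_BROKER_URL and restart the workers to
# switch, e.g. from redis://localhost:6379/0 to amqp://guest@localhost:5672//
broker_url = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
app = Celery("myapp", broker=broker_url)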

Is it possible to use a single celery instance to connect to multiple brokers?

I have a use case where there are two RabbitMQs which I would like to connect to, RabbitMQ instance A and instance B. Assume for the moment that I cannot combine these two instances into a single RabbitMQ instance and they must be separate. Please note that these two instances have different exchanges/queues and are not by any means replications of the data or messages.
Is it possible, using a single celery application, to connect to two brokers, and their exchanges/queues at: amqp://<instance-a>:5672 and amqp://<instance-b>:5672?
I have looked through the documentation and this doesn't seem to be possible, celery seems to be monolithic for the most part--however I am relatively new to celery (and Python) so I may have missed something.
I suspect you might be "abusing" celery as a rabbitmq consumer. Using rabbitmq as a message queue (or event queue) is a great idea, but you don't need to use celery to consume from it (and frankly, since celery is not adapted to this kind of work, it would probably bite you later on).
So you are better off choosing a rabbitmq client abstraction library (Kombu, Pika, and Puka are the major python options) and building yourself a decent consumer.
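For illustration, a bare-bones Kombu consumer might look like this (the connection URL, exchange, and queue names are made up for the example):

from kombu import Connection, Exchange, Queue

events = Exchange("events", type="direct")
queue = Queue("instance-a-events", exchange=events, routing_key="events")

def handle(body, message):
    print("received:", body)
    message.ack()

with Connection("amqp://guest:guest@instance-a:5672//") as conn:
    with conn.Consumer(queue, callbacks=[handle]):
        while True:
            conn.drain_events()  # blocks until a message arrives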
You can also try the shovel plugin for rabbitmq, which can be used to "shovel" messages from one queue/exchange to another. That might also work for your use case.

Using Celery for Realtime, Synchronous External API Querying with Gevent

I'm working on a web application that will receive a request from a user and have to hit a number of external APIs to compose the answer to that request. This could be done directly from the main web thread using something like gevent to fan out the request.
Alternatively, I was thinking, I could put incoming requests into a queue and use workers to distribute the load. The idea would be to try to keep it real time, while splitting up the requests amongst several workers. Each of these workers would be querying only one of the many external APIs. The response they receive would then go through a series of transformations, be saved into a DB, be transformed to a common schema and saved in a common DB to finally be composed into one big response that would be returned through the web request. The web request is most likely going to be blocking all this time, with a user waiting, so keeping the queueing and dequeueing as fast as possible is important.
The external API calls can easily be turned into individual tasks. I think the linking from one API task to a transformation to a DB-saving task could be done using a chain, etc., and the final result combining all the results could be returned to the web thread using a chord.
Some questions:
Can this (and should this) be done using celery?
I'm using django. Should I try to use django-celery over plain celery?
Each one of those tasks might spawn off other tasks - such as logging what just happened or other types of branching off. Is this possible?
Could tasks return the data they get - i.e. potentially KBs of data - through celery (redis as the underlying broker in this case), or should they write to the DB and just pass pointers to that data around?
Each task is mostly I/O bound, and was initially just going to use gevent from the web thread to fan out the requests and skip the whole queuing design, but it turns out that it would be reused for a different component. Trying to keep the whole round trip through the Qs real time will probably require many workers making sure the queues are mostly empty. Or will it? Would running the gevent worker pool help with this?
Do I have to write gevent-specific tasks or will using the gevent pool deal with network IO automagically?
Is it possible to assign priority to certain tasks?
What about keeping them in order?
Should I skip celery and just use kombu?
It seems like celery is geared more towards "tasks" that can be deferred and are not time sensitive. Am I nuts for trying to keep this real time?
What other technologies should I look at?
Update: Trying to hash this out a bit more. I did some reading on Kombu and it seems to be able to do what I'm thinking of, although at a much lower level than celery. Here is a diagram of what I had in mind.
What seems to be possible with raw queues, as accessible with Kombu, is for a number of workers to subscribe to a broadcast message. The type and number of workers do not need to be known by the publisher if using a queue. Can something similar be achieved using Celery? It seems like if you want to make a chord, you need to know at runtime what tasks are going to be involved in the chord, whereas in this scenario you can simply add listeners to the broadcast, and simply make sure they announce they are in the running to add responses to the final queue.
Update 2: I see there is the ability to broadcast. Can you combine this with a chord? In general, can you combine celery with raw kombu? This is starting to sound like a question about smoothies.
I will try to answer as many of the questions as possible.
Can this (and should this) be done using celery?
Yes, you can.
I'm using django. Should I try to use django-celery over plain celery?
Django has good support for celery, and django-celery would make your life much easier during development.
Each one of those tasks might spawn off other tasks - such as logging what just happened or other types of branching off. Is this possible?
You can start subtasks from within a task, with ignore_result=True for tasks that exist only for their side effects.
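A sketch of that pattern (the app setup and task names here are illustrative):

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")

@app.task(ignore_result=True)
def log_event(payload):
    # side effect only; no result is stored in the backend
    print("event:", payload)

@app.task
def query_api(url):
    data = {"url": url}    # stand-in for the real API call
    log_event.delay(data)  # branch off without waiting on it
    return data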
Could tasks return the data they get - i.e. potentially KBs of data - through celery (redis as the underlying broker in this case), or should they write to the DB and just pass pointers to that data around?
I would suggest putting the results in the DB and passing the id around; that will make your broker and workers happy. Less data transfer/pickling, etc.
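The shape of it, sketched reusing the app from the sketch above (ApiResult and call_api are hypothetical, not from the question):

@app.task
def fetch_and_store(url):
    data = call_api(url)                          # hypothetical API helper
    obj = ApiResult.objects.create(payload=data)  # persist the big payload
    return obj.id                                 # pass the id, not the data

@app.task
def transform(result_id):
    obj = ApiResult.objects.get(id=result_id)
    # ... work on obj.payload, save the transformed version ...
    return result_id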
Each task is mostly I/O bound, and was initially just going to use gevent from the web thread to fan out the requests and skip the whole queuing design, but it turns out that it would be reused for a different component. Trying to keep the whole round trip through the Qs real time will probably require many workers making sure the queues are mostly empty. Or will it? Would running the gevent worker pool help with this?
Since the process is I/O bound, gevent will definitely help here. However, how high the concurrency should be for a gevent-pooled worker is something I'm looking for an answer to as well.
Do I have to write gevent-specific tasks or will using the gevent pool deal with network IO automagically?
Gevent does the monkey patching automatically when you use it in a pool. But the libraries that you use should play well with gevent. Otherwise, if you're parsing some data with simplejson (which is written in C), that would block other gevent greenlets.
Is it possible to assign priority to certain tasks?
You cannot assign specific priorities to individual tasks, but you can route them to different queues and then have those queues listened to by varying numbers of workers. The more workers on a particular queue, the higher the effective priority of the tasks on that queue.
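For example (the queue and task names are illustrative), route the urgent tasks to their own queue:

CELERY_ROUTES = {"app.urgent_task": {"queue": "high"}}

and give that queue more worker processes:

$ celery -A proj worker -Q high --concurrency=8
$ celery -A proj worker -Q default --concurrency=2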
What about keeping them in order?
Chain is one way to maintain order, and chord is a good way to summarize. Celery takes care of this, so you don't have to worry about it. Even when using the gevent pool, it is still possible to reason about the order of task execution.
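Sketched with celery's canvas primitives (call_api, transform, save, combine, and urls are illustrative names, not from the question):

from celery import chain, chord

# one chain per external API call, all summarized by a final callback
workflow = chord(
    [chain(call_api.s(url), transform.s(), save.s()) for url in urls],
    combine.s(),
)
async_result = workflow.apply_async()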
Should I skip celery and just use kombu?
You can, if your use case will not change to something more complex over time, and if you are willing to manage your processes through celeryd + supervisord yourself. Also, if you don't care about the task monitoring that comes with tools such as celerymon, Flower, etc.
It seems like celery is geared more towards "tasks" that can be deferred and are not time sensitive.
Celery supports scheduled tasks as well, if that is what you meant by that statement.
Am I nuts for trying to keep this real time?
I don't think so. As long as your consumers are fast enough, it will be as good as real time.
What other technologies should I look at?
Pertaining to celery, you should choose the result store wisely. My suggestion would be to use Cassandra, which is good for realtime data (both write- and query-wise). You can also use redis or mongodb; they come with their own sets of problems as result stores, but a little tweaking in configuration can go a long way.
If you mean something completely different from celery, then you can look into asyncio (Python 3.5) and zeromq for achieving the same. I can't comment more on that though.

Embed a celery worker in my own code

I have a service that needs a sort of coordinator component. The coordinator will manage entities that need to be assigned to users, taken away from users if the users do not respond in a timely manner, and also handle user responses if they do respond. The coordinator will also need to contact messaging services to notify the users they have something to handle.
I want the coordinator to be a single-threaded process, as the load is not expected to be too much for the first few years of usage, and I'd much rather postpone all the concurrency issues to when I really need to handle them (if at all).
The coordinator will receive new entities and user responses from a Django webserver. I thought the easiest way to handle this is with Celery tasks - the webserver just starts a task that the coordinator consumes on its own time.
For this to happen, I need the coordinator to contain a celery worker, and replace the current worker mainloop with my own version (one that checks the broker for a new message and handles the scheduling).
How feasible is it? The alternative is to avoid Celery and use RabbitMQ directly. I'd rather not do that.
Replace these names: coordinator with rabbitmq (or some other broker kombu supports), and users with celery workers.
I am pretty sure you can do all you need (and much more) just by configuring celery / kombu and rabbitmq, without writing too many (if any) lines of code.
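For the single-threaded requirement specifically, a worker can be started with the solo pool, which runs tasks inline in one thread (the app and queue names are illustrative):

$ celery -A coordinator worker -Q coordinator --pool=solo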
small note: Celery features scheduled tasks.

Maximally simple django timed/scheduled tasks (e.g.: reminders)?

Question is relevant to this and this;
the difference is, I'd prefer something with possibly more precision and low load (per-minute cron job isn't preferable for those) and with minimal overhead (i.e. installing celery with rabbitmq seems like a big overkill).
An example task for such is personal reminders server (with reminders that could be edited over web and sent out through e-mail or XMPP).
I'm probably looking for something more like node.js's setTimeout but for django (and though I might prefer to implement reminders in node.js anyway, it's still a possibly interesting question).
For example, it's possible to start new threads in a django app (with functions consisting of sleep() and send()); in what ways can this be bad?
The problems with using threads for this solution are the typical problems with Python threads that always drive people towards multi-process solutions instead. The problem is compounded here by the fact your thread isn't driven by the normal request-response cycle. This is summarized nicely by Malcolm Tredinnick here:
Have to disagree. Threads are not a good solution to this problem. The issue is process management. As written, your threads will never be rejoined. Webserver processes have a lifecycle uncontrollable by you (the MaxRequestsPerChild Apache parameter and similar things in other servers) and you are messing with that by using threads.
If you need a process with a lifecycle that is not matched by the request-response path — something long running and independent of the response — a completely separate process is definitely the right model to use. Using a thread is tying it to the response lifecycle, which will have unintended side-effects.
A possible solution for you might be to have a long running process performing your tasks which gets a wake-up signal from a light cron process.
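One minimal sketch of that shape, using a Unix signal as the wake-up (the check_reminders body is a placeholder):

import signal
import time

def check_reminders():
    pass  # query due reminders here and send them out

def wake_up(signum, frame):
    check_reminders()

signal.signal(signal.SIGUSR1, wake_up)  # cron runs: kill -USR1 <pid>

while True:
    time.sleep(60)       # the sleep is cut short when the signal arrives
    check_reminders()    # periodic fallback check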
Another possibility would be to build something using 0mq, which is much lighter than AMQP-style queues (at the cost of some features, of course). Tarek Ziade is working on a Mozilla project called powerhose that uses 0mq; it looks super simple, and has a heartbeat capability with resolution to the second.
