I only created the last 2 queue names that show in Rabbitmq management Webui in the below table:
The rest of the table has hash-like queues, which I don't know:
1- Who created them? (I know it is celery, but which process, task,etc.)
2- Why they are created, and what they are created for?.
I can notice that when the number of pushed messages increase, the number of those hash-like messages increase.
When using celery, Rabbitmq is used as a default result backend, and also to store errors of failing
tasks(that raised exceptions).
Every new task creates a new queue on the server, with thousands of tasks the
broker may be overloaded with queues and this will affect performance
in negative ways.
Each queue in Rabbit will be a separate Erlang process, so if you’re planning to
keep many results simultaneously you may have to increase the Erlang
process limit, and the maximum number of file descriptors your OS
allows.
Old results will not be cleaned automatically, so we have to tell
rabbit to do so.
The below conf. line dictates the time to live of the temp
queues. The default is 1 day
CELERY_AMQP_TASK_RESULT_EXPIRES = Number of seconds
OR, We can change the backend store totally, and not make it in Rabbit.
CELERY_BACKEND = "amqp"
We may also ignore it:
CELERY_IGNORE_RESULT = True.
Also, when ignoring the result, we can also keep the errors stored for later usage,
which means one more queue for the failing tasks.
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True.
I will not mark this question as answered, waiting for a better answer.
Rererences:
This SO link
Celery documentation
Rabbitmq documentation
Related
Using distributed to schedule lots of interdependent tasks, running on google compute engine. When I start an extra instance with workers halfway, no tasks get scheduled to it (though it registers fine with the scheduler). I presume this is because (from http://distributed.readthedocs.io/en/latest/scheduling-state.html#distributed.scheduler.decide_worker):
"If the task requires data communication, then we choose to minimize the number of bytes sent between workers. This takes precedence over worker occupancy."
Once I'm halfway running the task tree, all remaining tasks depend on the result of tasks which have already run. So, if I interpret the above quote right, nothing will ever be scheduled on the new workers, no matter how idle they are, as the dependent data is never already there but always on a 'old' worker.
However, I do make sure the amount of data to transfer is fairly minimal, usually just a small string. So in this case it would make much more sense to let idleness prevail over data communication. Would it be possible to allow this (e.g. setting a 'scheduler policy'? Or maybe even have a data-vs-idleness tradeoff coefficent which could be tuned?
Update after comment #1:
Complicating factor: every task is using the resources framework to make sure it either runs on the set of workers for cpu-bound tasks ("CPU=1") or on the set of workers for network-bound tasks ("NET=1"). This separation was made to avoid overloading up/download servers and restrict up/download tasks to a certain max, while still being able to scale the other tasks. However, according to http://distributed.readthedocs.io/en/latest/work-stealing.html, task stealing will not happen in these cases? Is there a way to allow task stealing while keeping the resource restrictions?
Update 2: I see there is an open issue for that: https://github.com/dask/distributed/issues/1389. Are there plans to implement this?
While Dask prefers to schedule work to reduce communication it also acknowledges that this isn't always best. Generally Dask runs a task on the machine where it will finish first, taking into account both communication costs and existing task backlogs on overloaded workers.
For more information on load balancing you might consider reading this documentation page:
http://distributed.readthedocs.io/en/latest/work-stealing.html
Are there substantial performance implications that I should keep in mind when Celery workers are pulling from multiple (or perhaps many) queues? For example, would there be a significant performance penalty if my system were designed so that workers pulled from 10 to 15 queues rather than just 1 or 2? As a follow-up, what if some of those queues are sometimes empty?
The short answer to your question on queue limits is:
Don't worry having multiple queues will not be worse or better, broker are designed to handle huge numbers of them. Off course in a lot of use cases you don't need so many, except really advanced one. Empty queues don't create any problem, they just take a tiny amount of memory on the broker.
Don't forget also that you have other things like exchanges and bindings, also there you don't have real limits but is better you understand the performance implication of each of them before using it (a TOPIC exchange will use more CPU than a direct one for example)
To give you a more complete answer let's look at the performance topic from a more generic point of view.
When looking at a distributed system based on message passing like Celery there are 2 main topics to analyze from the point of view of performance:
The workers number and concurrency factor.
As you probably already know each celery worker has a concurrency parameter that sets how many tasks can be executed at the same time, this should be set in relation with the server capacity(CPU, RAM, I/O) and off course also based on the type of tasks that the specific consumer will execute (depends on the queue that it will consume).
Off course depending on the total number of tasks you need to execute in a certain time window you will need to decide how many workers/servers you will need to have up and running.
The broker, the Single point of Failure in this architecture style.
The broker, especially RabbitMQ, is designed to manage millions of messages without any problem, however more messages it will need to store more memory will use and more are the messages to route more CPU it will use.
This machine should be well tuned too and if possible be in high availability.
Off course the main thing to avoid is the messages are consumed at a lower rate than they are produced otherwise your queue will keep growing and your RabbitMQ will explode. Here you can find some hints.
There are cases where you may also need to increase the number of tasks executed in a certain time frame but on only in response to peaks of requests. The nice thing about this architecture is that you can monitor the size of the queues and when you understand is growing to fast you could create new machines on the fly with a celery worker already configured and than turn it off when they are not needed. This is a quite cost saving and efficient approach.
One hint, remember to don't store celery tasks results in RabbitMQ.
I have some independent tasks which I am currently putting into different/independent workers.
To be understood easily I will walk you through an example. Let's say I have three independent tasks namely sleep, eat, smile. A task may need to work under different celery configurations. So, I think, it is better to separate each of these tasks into different directories with different workers. Some tasks may be required to work on different servers.
I am planning add some more tasks in the future and each of them will be implemented by different developers.
Providing these conditions, there are more than one workers associated to each individual task.
Now, here is the problem and my question.
When I start three smile tasks, one of these will be fetched by smile's worker and carried out. But the next task will be fetched from eat's worker and never will be carried out.
So, what is the accepted, most common pattern? Should I send each tasks into different queues and workers should listen its own queue?
The answer depends on couple of things that should be taken in consideration:
Does order of commands should be preserved ?
If so the best approach is placing some sort of command pattern as serialized message so each fetched/consumed message can be executed in it's order in single place in your application.
If it's not an issue for you - you can play with topic exchange while publishing different message types in single exchange, and having different workers receiving the messages by predefined pattern. This by the way will let you easily add another task lets say "drink" without changing a line in already existing transportation topology/already existing workers.
Are you planning scaling queues among different machines to increase throughput ?
In case you have very intense traffic of tasks (in terms of frequency) it may be worth creating different queue for each task type so latter when you grow you can place each one on different node in rabbit cluster.
In a similar setup, I decided to go with specific queues for different tasks, and then I can decide which worker listens on which queue (which can also be changed dynamically !).
I'm working on a web application that will receive a request from a user and have to hit a number of external APIs to compose the answer to that request. This could be done directly from the main web thread using something like gevent to fan out the request.
Alternatively, I was thinking, I could put incoming requests into a queue and use workers to distribute the load. The idea would be to try to keep it real time, while splitting up the requests amongst several workers. Each of these workers would be querying only one of the many external APIs. The response they receive would then go through a series transformations, be saved into a DB, be transformed to a common schema and saved in a common DB to finally be composed into one big response that would be returned through the web request. The web request is most likely going to be blocking all this time, with a user waiting, so keeping
the queueing and dequeueing as fast as possible is important.
The external API calls can easily be turned into individual tasks. I think the linking
from one api task to a transformation to a DB saving task could be done using a chain, etc, and the final result combining all results returned to the web thread using a chord.
Some questions:
Can this (and should this) be done using celery?
I'm using django. Should I try to use django-celery over plain celery?
Each one of those tasks might spawn off other tasks - such as logging what just
happened or other types of branching off. Is this possible?
Could tasks be returning the data they get - i.e. potentially Kb of data through celery (redis as underlying in this case) or should they write to the DB, and just pass pointers to that data around?
Each task is mostly I/O bound, and was initially just going to use gevent from the web thread to fan out the requests and skip the whole queuing design, but it turns out that it would be reused for a different component. Trying to keep the whole round trip through the Qs real time will probably require many workers making sure the queueus are mostly empty. Or is it? Would running the gevent worker pool help with this?
Do I have to write gevent specific tasks or will using the gevent pool deal with network IO automagically?
Is it possible to assign priority to certain tasks?
What about keeping them in order?
Should I skip celery and just use kombu?
It seems like celery is geared more towards "tasks" that can be deferred and are
not time sensitive. Am I nuts for trying to keep this real time?
What other technologies should I look at?
Update: Trying to hash this out a bit more. I did some reading on Kombu and it seems to be able to do what I'm thinking of, although at a much lower level than celery. Here is a diagram of what I had in mind.
What seems to be possible with raw queues as accessible with Kombu is the ability for a number of workers to subscribe to a broadcast message. The type and number does not need to be known by the publisher if using a queue. Can something similar be achieved using Celery? It seems like if you want to make a chord, you need to know at runtime what tasks are going to be involved in the chord, whereas in this scenario you can simply add listeners to the broadcast, and simply make sure they announce they are in the running to add responses to the final queue.
Update 2: I see there is the ability to broadcast Can you combine this with a chord? In general, can you combine celery with raw kombu? This is starting to sound like a question about smoothies.
I will try to answer as many of the questions as possible.
Can this (and should this) be done using celery?
Yes you can
I'm using django. Should I try to use django-celery over plain celery?
Django has a good support for celery and would make the life much easier during development
Each one of those tasks might spawn off other tasks - such as logging
what just happened or other types of branching off. Is this possible?
You can start subtasks from withing a task with ignore_result = true for only side effects
Could tasks be returning the data they get - i.e. potentially Kb of
data through celery (redis as underlying in this case) or should they
write to the DB, and just pass pointers to that data around?
I would suggest putting the results in db and then passing id around would make your broker and workers happy. Less data transfer/pickling etc.
Each task is mostly I/O bound, and was initially just going to use
gevent from the web thread to fan out the requests and skip the whole
queuing design, but it turns out that it would be reused for a
different component. Trying to keep the whole round trip through the
Qs real time will probably require many workers making sure the
queueus are mostly empty. Or is it? Would running the gevent worker
pool help with this?
Since the process is io bound then gevent will definitely help here. However, how much the concurrency should be for gevent pool'd worker, is something that I'm looking for answer too.
Do I have to write gevent specific tasks or will using the gevent pool
deal with network IO automagically?
Gevent does the monkey patching automatically when you use it in pool. But the libraries that you use should play well with gevent. Otherwise, if your parsing some data with simplejson (which is written in c) then that would block other gevent greenlets.
Is it possible to assign priority to certain tasks?
You cannot assign specific priorities to certain tasks, but route them to different queue and then have those queues being listened to by varying number of workers. The more the workers for a particular queue, the higher would be the priority of that tasks on that queue.
What about keeping them in order?
Chain is one way to maintain order. Chord is a good way to summarize. Celery takes care of it, so you dont have to worry about it. Even when using gevent pool, it would at the end be possible to reason about the order of the tasks execution.
Should I skip celery and just use kombu?
You can, if your use case will not change to something more complex over time and also if you are willing to manage your processes through celeryd + supervisord by yourself. Also, if you don't care about the task monitoring that comes with tools such as celerymon, flower, etc.
It seems like celery is geared more towards "tasks" that can be
deferred and are not time sensitive.
Celery supports scheduled tasks as well. If that is what you meant by that statement.
Am I nuts for trying to keep this real time?
I don't think so. As long as your consumers are fast enough, it will be as good as real time.
What other technologies should I look at?
Pertaining to celery, you should choose result store wisely. My suggestion would be to use cassandra. It is good for realtime data (both write and query wise). You can also use redis or mongodb. They come with their own set of problems as result store. But then a little tweaking in configuration can go a long way.
If you mean something completely different from celery, then you can look into asyncio (python3.5) and zeromq for achieving the same. I can't comment more on that though.
I'm using Celery with Redis as broker and I can see that the queue is actually a redis list with the serialized task as the items.
My question is, if I have an AsyncResult object as a result of calling <task>.delay(), is there a way to determine the item's position in the queue?
UPDATE:
I'm finally able to get the position using:
from celery.task.control import inspect
i = inspect()
i.reserved()
but its a bit slow since it needs to communicate with all the workers.
The inspect.reserved()/scheduled() you mention may work, but not
always accurate since it only take into account the tasks
that the workers have prefetched.
Celery does not allow out of band operations on the queue, like removing messages
from the queue, or reordering them, because it will not scale in a distributed system.
The messages may not have reached the queue yet, which can result
in race conditions and in practice it is not a sequential queue with transactional
operations, but a stream of messages originating from several locations.
That is, the Celery API is based around strict message passing semantics.
It is possible to access the queue directly on some of the brokers
Celery supports (like Redis or Database), but that is not part of the public API,
and you are discouraged from doing so, but of course if you are not planning on
supporting operations at scale you should do whatever is the most convenient for you
and discard my advice.
If this is just to give the user some idea when his job will be completed, then
I'm sure you could come up with an algorithm to predict when the task will be executed,
if you just had the length of the queue and the time at which each task was inserted.
The first is just a redis.len("celery"), and the latter you could
add yourself by listening to the task_sent signal:
from celery.signals import task_sent
#task_sent.connect
def record_insertion_time(id, **kwargs):
redis.zadd("celery.insertion_times", id)
Using a sorted set here: http://redis.io/commands/zadd
For a pure message passing solution you could use a dedicated monitor
that consumes the Celery event stream and predicts when tasks will finish.
http://docs.celeryproject.org/en/latest/userguide/monitoring.html#event-reference
(just noticed that the task-sent is missing the timestamp field in
the documentation, but a timestamp is sent with that event so I will fix it).
The events also contain a "clock" field which is a logical clock
(see http://en.wikipedia.org/wiki/Lamport_timestamps),
this can be used to detect the order of events in a distributed
system without depending on the system time on each machine
to be in sync (which is ~impossible to achieve).