I'm confused about Task execution using queues. I've read the documentation and thought I understood bucket_size and rate, but when I send 20 Tasks to a queue configured with a rate of 5/h and a bucket_size of 5, all 20 Tasks execute one after another as quickly as possible, finishing in under a minute.
from google.appengine.ext import deferred

deferred.defer(spam.cookEggs,
               egg_keys,
               _queue="tortoise")
queue:
- name: tortoise
  rate: 5/h
  bucket_size: 5
What I want is that whether I create 10 or 100 Tasks, only 5 of them run per hour, so 20 Tasks would take approximately 4 hours to complete. I want their execution spread out.
UPDATE
The problem was that I assumed Task execution rate rules were enforced when running locally, but that is not the case. You cannot test execution rates locally. When I deployed to production, the rate and bucket size I had set behaved as I expected.
Execution rates are not honored by the dev_appserver. This issue does not occur in production.
[Answer discovered by Nick Johnson and/or question author; posting here as community wiki so we have something that can get marked accepted]
You want to set bucket_size to 1, or else you'll get "bursts" of queued activity like the one you saw.
From the documentation:
bucket_size
Limits the burstiness of the queue's processing, i.e. a higher bucket size allows bigger spikes in the queue's execution rate. For example, consider a queue with a rate of 5/s and a bucket size of 10. If that queue has been inactive for some time (allowing its "token bucket" to fill up), and 20 tasks are suddenly enqueued, it will be allowed to execute 10 tasks immediately. But in the following second, only 5 more tasks will be able to be executed because the token bucket has been depleted and is refilling at the specified rate of 5/s.
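Applied to the queue from the question, the queue.yaml would become something like this (a sketch only; it keeps the original name and rate and just lowers the bucket size as suggested above):

queue:
- name: tortoise
  rate: 5/h
  bucket_size: 1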
I want to run 1000 tasks in parallel. These are short-running batch jobs that use the same taskdef (hence the same container), with only the args being different (basically the arg passed is a value from 0 through 999).
I used Airflow to call the ECSOperator in a loop, just as explained here:
https://headspring.com/2020/06/17/airflow-parallel-tasks/.
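For reference, the looped-operator pattern looks roughly like this (a sketch only; the cluster, task definition, container, and subnet names are placeholders, and the exact operator import path depends on your Airflow/Amazon provider version):

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator  # "ECSOperator" in older providers

with DAG("ecs_fan_out", start_date=datetime(2021, 1, 1), schedule_interval=None, catchup=False) as dag:
    for i in range(1000):
        EcsRunTaskOperator(
            task_id="batch_%d" % i,
            cluster="my-cluster",                 # placeholder
            task_definition="my-batch-taskdef",   # same taskdef for every task
            launch_type="FARGATE",
            overrides={"containerOverrides": [
                {"name": "my-container", "command": ["python", "job.py", str(i)]}  # only the arg differs
            ]},
            network_configuration={"awsvpcConfiguration": {"subnets": ["subnet-placeholder"]}},
        )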
When I look at the 'Tasks' tab for my ECS cluster in AWS, I see the tasks queued up with a mix of PROVISIONING, PENDING and RUNNING.
The RUNNING jobs are just a handful - most of the tasks are in PENDING state which eventually go into RUNNING state.
Questions:
Why are most jobs in the PENDING state? What are they waiting for (is there a limit on RUNNING jobs)? How can I check what a task is doing while it is PENDING?
Why are the RUNNING jobs just a handful? How can I make most, if not all, tasks go to the RUNNING state simultaneously? Is there some limit on how many jobs can run simultaneously when using Fargate?
The Services tab is empty - I have not configured any Services. Isn't this meant only for long-running jobs/daemons, or can batch jobs like mine take advantage of it too (and reach the goal of getting all 1000 tasks to run at the same time)?
I have not set up anything in the 'Capacity Providers' tab. Will that help in getting more tasks to run in parallel?
I am not clear on the concept of autoscaling here - isn't Fargate supposed to provision the 1000 CPUs if need be so that all those tasks can run in parallel? Is there a default limit, and if so, how do I control it?
So much to unpack.
1-2: there is a TPS (tasks per second) provisioning throughput to be considered. We (AWS) are in the process of documenting these limits better (which we don't do today), but for 1000 tasks expect it to take "a few minutes" to get ALL of them into the RUNNING state. If you see them taking "hours" to reach RUNNING, that's not normal. Also note that each account/region has a default concurrent task limit of 1000 (which is not to be confused with the throughput at which you can scale to 1000 concurrently running tasks).
3: No. As you said, that's just a control loop so that you can say "I always want n tasks (or DAEMON tasks) running" and ECS will keep it that way. You are essentially using an external control loop (Airflow) that manages the tasks. This won't have any influence on the throughput.
4: No (or at least I don't think so). You may check whether Airflow supports launching tasks using capacity providers instead of the traditional "launch type" mode.
5: the autoscaler (in the context of Fargate) is pretty much an ECS Service construct (see point #3). There you basically say "I want to run between n and m tasks and scale in/out based on these metrics", and ECS/Auto Scaling will make the task count fluctuate accordingly. As I said, you are doing all this externally, launching tasks individually. If Airflow says "launch 1000 tasks", there is no autoscaling... just a rush to go from 0 to 1000 (see #1 and #2).
I am trying to limit the rate of one celery task. Here is how I am doing it:
from project.celery import app
app.control.rate_limit('task_a', '10/m')
It is working well. However, there is a catch. Other tasks that this worker is responsible for are being blocked as well.
Let's say, 100 of task_a have been scheduled. As it is rate-limited, it will take 10 minutes to execute all of them. During this time, task_b has been scheduled as well. It will not be executed until task_a is done.
Is it possible to not block task_b?
By the looks of it, this is just how it works. I just didn't get that impression after reading the documentation.
Other options include:
Separate worker and queue only for this task
Adding an eta to task_a so that all of them are scheduled to run during the night
What is the best practice in such cases?
This should be part of the task declaration to work on a per-task basis. The way you are doing it via control is probably why it has this side effect on other tasks:
@app.task(rate_limit='10/m')
def task_a():
    ...
After more reading
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
You will probably have to do this in a separate queue.
The easiest (no coding required) way is separating the task into its own queue and running a dedicated worker just for this purpose.
There's no shame in that, it is totally fine to have many Celery queues and workers, each dedicated just for a specific type of work. As an added bonus you may get some more control over the execution, you can easily turn workers ON/OFF to pause certain processes if needed, etc.
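A minimal sketch of that setup, reusing the app imported in the question, a hypothetical queue name rate_limited, and Celery 4+ setting names:

from project.celery import app  # same app object as in the question

# Route task_a to its own queue so its rate limit cannot hold up other tasks.
app.conf.task_routes = {'task_a': {'queue': 'rate_limited'}}

# Run a dedicated worker for that queue, and keep the normal workers on the default queue:
#   celery -A project worker -Q rate_limited --concurrency=1
#   celery -A project worker -Q celery

This pairs with the per-task rate_limit on the task declaration shown above, so only the dedicated worker ever throttles.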
On the other hand, having lots of specialized workers idle most of the time (waiting for a specific job to be queued) is not particularly memory-efficient.
Thus, if you need to rate-limit more tasks and expect those dedicated workers to be idle most of the time, you may consider increasing efficiency and implementing a Token Bucket. With that, all your workers can be general-purpose and you can scale them naturally as your overall load increases, knowing that the work distribution will no longer be crippled by a single task's rate limit.
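As an illustration, a bare-bones token bucket looks something like this (an in-process sketch only; for a fleet of Celery workers the bucket state would have to live in shared storage such as Redis, and the class and names here are illustrative, not a library API):

import time

class TokenBucket:
    """Allow up to `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens that can accumulate
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Inside a task you would call try_acquire() and, if it returns False,
# retry the task later (e.g. with a countdown) instead of running it now.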
I have a question about CELERYD_CONCURRENCY and CELERYD_PREFETCH_MULTIPLIER.
Because my English isn't good enough to follow the official site's description, I want to make sure I understand it.
I set CELERYD_CONCURRENCY=40
I think it will use 40 workers to do things, but I usually see INFO/MainProcess and seldom see INFO/Worker-n.
Is it because the tasks are fast, so they don't need to be assigned to a worker?
Here is the task architecture:
period_task is a Celery periodic task, and mail_it is a normal Celery task.
from celery import shared_task

@shared_task
def period_task():
    do_something()
    ...
    # fan out one mail_it task per address
    for mail in mail_list:
        mail_it.delay(mail)
And the second question is about CELERYD_PREFETCH_MULTIPLIER, whose default value is 4.
Does it mean that each worker can take 4 tasks from the queue at a time? So with 40 workers, can I get 40*4 tasks?
My understanding:
CELERYD_CONCURRENCY:
This is the number of THREADS/GREENTHREADS a given worker will have. Celery calls these "processes". This is the number of tasks a single worker can execute in parallel. I believe Celery creates this number PLUS ONE internally, and that the additional one is for actually managing/assigning to the others (in your case, 40 of them!). In my experience, you likely don't need/want 40 (closer to 1 or 2 per CPU), but your mileage may vary.
CELERYD_PREFETCH_MULTIPLIER:
Prefetch is how many tasks are reserved per "process", according to the docs. It's a bit like a mini-queue just for that specific thread. This would indeed mean that your ONE started worker would potentially 'reserve' 40 * 4 tasks to do. Keep in mind that these reserved tasks cannot be "stolen" or sent to another worker or thread, so if they are long-running you may wish to disable this feature to allow faster workers to pick up the slack of slower ones.
If this isn't clear in your current setup, I might suggest adding a sleep() to your task so you can observe the behaviour.
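For example, something like the following makes the behaviour visible in the worker logs (a sketch; the task and the numbers are hypothetical, and the old-style CELERYD_* names used in the question correspond to worker_concurrency / worker_prefetch_multiplier in newer Celery versions):

import time

from celery import shared_task

@shared_task
def slow_task(n):
    # Sleep long enough to watch how many tasks run at once (concurrency)
    # and how many each process has reserved (prefetch).
    time.sleep(10)
    return n

# In the settings used by the question:
# CELERYD_CONCURRENCY = 4          # 4 worker processes
# CELERYD_PREFETCH_MULTIPLIER = 1  # each process reserves only 1 extra task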
Say, there are 40 different users of a mobile app calling the server, which delivers some content created using FFMPEG.
It takes about 5 seconds to create the content for each user.
I was just wondering if FFMPEG would process the commands simultaneously, or if it would be done in a queue.
Basically, would it take approximately 5 seconds for everyone, or would it take between 5 and 200 seconds per person, depending on their position in the queue?
Also, if it is done via queueing, how could I make the tasks run simultaneously? I don't want my users to wait for a long time.
Depends on how many worker processes you have.
Since you added the Heroku tag, I'm assuming you're using Heroku. On Heroku, one dyno is one such worker process.
Routing is more or less random on Heroku, but provided you have a large number of users (40 is probably not enough though), you should be able to serve as many users as you have dynos simultaneously.
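To make the relationship concrete: each worker process runs one ffmpeg command at a time, so the number of simultaneous encodes equals the number of processes/dynos you run. A sketch of what each worker does (the command and paths are placeholders):

import subprocess

def create_content(input_path, output_path):
    # One encode takes ~5 s, so a single worker process serves roughly one
    # user every 5 s; serving 40 users in ~5 s needs on the order of 40 processes/dynos.
    subprocess.run(["ffmpeg", "-y", "-i", input_path, output_path], check=True)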
Azure, Amazon and other instance based cloud providers can be used to carry out website load tests (by spinning up numerous instances running programs that send requests to a set of URLs) and I was wondering if I would be able to do this with Google App Engine.
So far, however, it seems this is not the case. The only implementation I can think of at the moment is setting up the maximum number of cron jobs, each executing at the highest frequency, with each task requesting a bunch of URLs and at the same time pushing further tasks into the task queue.
According to my calculations this is only enough to fire off a maximum of 25 concurrent requests (as an application can have at most 20 cron tasks, each executing no more frequently than once a minute, and the default queue has a throughput rate of 5 task invocations per second).
Any ideas if there is a way I could have more concurrent requests fetching URLs in an automated way?
The taskqueue API allows 100 task invocations per second per queue with the following max active queues quota:
Free: 10 active queues (not including the default queue)
Billing: 100 active queues (not including the default queue)
With a single UrlFetch per task, multiplying [max number of active queues] * [max task invocations per second] * [60 seconds], you can reach these nominal UrlFetch call rates:
Free: 11 * 100 * 60 = 66,000 UrlFetch calls/minute
Billing: 101 * 100 * 60 = 606,000 UrlFetch calls/minute
(the 11 and 101 include the default queue on top of the quota above)
These rates are capped by the allowed UrlFetch calls-per-minute quota:
Free: 3,000 calls/minute
Billing: 32,000 calls/minute
As you can see, the Task Queue + UrlFetch APIs can be used effectively to suit your load-testing needs.
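A rough sketch of that pattern on the (Python 2) App Engine standard runtime - a fan-out handler spreads task invocations across several queues, and each task performs a single UrlFetch (the queue names and target URL are placeholders, and the queues would need matching high-rate entries in queue.yaml):

import webapp2
from google.appengine.api import taskqueue, urlfetch

TARGET_URL = "http://example.com/page-under-test"   # placeholder
QUEUES = ["load-%d" % i for i in range(10)]          # defined in queue.yaml with e.g. rate: 100/s

class FanOutHandler(webapp2.RequestHandler):
    def get(self):
        # Spread the invocations across all active queues.
        for i in range(1000):
            taskqueue.add(url="/hit", queue_name=QUEUES[i % len(QUEUES)])

class HitHandler(webapp2.RequestHandler):
    def post(self):
        # One UrlFetch per task invocation.
        urlfetch.fetch(TARGET_URL, deadline=10)

app = webapp2.WSGIApplication([("/fanout", FanOutHandler), ("/hit", HitHandler)])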
Load testing against a public url may not be as accurate as getting boxes attached directly to the same switch as your target server. There are so many uncontrollable network effects.
Depending on your exact circumstances, I would recommend borrowing a few desktop boxes for the purpose and using them. Any half-decent machine should be able to generate two to three thousand calls a minute.
That said, it really depends on the target scale you wish to achieve.