Celery worker stops after being idle for a few hours - python

I have a Flask app that runs under WSGI. For a few tasks I'm planning to use Celery with RabbitMQ. But as the title says, I am facing an issue where the Celery tasks run fine for the first few minutes, but after a long period of inactivity the worker just dies off.
Celery config:
CELERY_BROKER_URL='amqp://guest:guest@localhost:5672//'
BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
BROKER_POOL_LIMIT = None
From this question, I added BROKER_HEARTBEAT and BROKER_HEARTBEAT_CHECKRATE.
I run the worker inside the venv with celery -A acmeapp.celery worker & to run it in the background. When I check the status during the first few minutes, it shows that one node is online and gives an OK response. But after a few hours of the app being idle, when I check the Celery status, it shows Error: No nodes replied within time constraint.
I am new to Celery and I don't know what to do now.

Your Celery worker might be trying to reconnect to the broker until it reaches the retry limit. If that is the case, setting these options in your config file should fix the problem.
BROKER_CONNECTION_RETRY = True
BROKER_CONNECTION_MAX_RETRIES = 0
The first line will make it retry whenever it fails, and the second one will disable the retry limit.
If that alone doesn't solve the problem, you can also try a higher connection timeout (specified in seconds) using this option:
BROKER_CONNECTION_TIMEOUT = 120
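For reference, a consolidated config combining the settings discussed above might look like this (a sketch only; adjust the broker URL to your environment):
CELERY_BROKER_URL = 'amqp://guest:guest@localhost:5672//'
BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
BROKER_POOL_LIMIT = None
BROKER_CONNECTION_RETRY = True
BROKER_CONNECTION_MAX_RETRIES = 0  # 0 disables the retry limit (retry forever)
BROKER_CONNECTION_TIMEOUT = 120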
Hope it helps!

Related

PreconditionFailed with Celery and RabbitMQ

I am experiencing an issue between Celery:5.2.7 and RabbitMQ:3.11 (through Docker). My program is periodically sending tasks to Celery, like below:
import time

from tasks import getElements

def collectElements(elements):
    for el in elements:
        getElements.apply_async(queue="queue_getelements", kwargs={"elDict": el.__dict__})

collectElements(elements)
time.sleep(600)
while True:
    collectElements(elements)
    time.sleep(120)
Strangely, the queue queue_getelements freezes after the first launch of collectElements(elements) (after 5 minutes/600 seconds), and the message appears after 30 minutes (which is the default consumer_timeout):
[2022-10-05 02:45:32,706: CRITICAL/MainProcess] Unrecoverable error: PreconditionFailed(406, 'PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more', (0, 0), '')
I have tried to change the default consumer_timeout in the configuration by increasing it (as seen in the solutions here and here), but the freeze still happens before the first loop of my program. Celery seems to receive the tasks only the first time, and freezes afterward. If I stop the worker and relaunch it, it again receives the tasks that are waiting on RabbitMQ:
celery -A tasks worker --loglevel=info -E -Q queue_getelements -c 4
Has anyone experienced this issue before? Any help would be appreciated, thank you in advance!

Heroku with Django, Celery and CloudAMQP - timeout error

I am building an online shop, following Chapter 7 of the book "Django 3 by Example" by Antonio Melé.
Everything works fine on my local machine. It also works well when I deploy it to Heroku.
However, when I try to use Celery and RabbitMQ (CloudAMQP, Little Lemur plan, on Heroku), the email message the worker was supposed to send to the customer is not sent. The task takes more than 30 seconds and then it crashes:
heroku[router]: at=error code=H12 desc="Request timeout" method=POST
I have created a tasks.py file with the task that sends the emails. My settings.py file includes the following lines for Celery:
broker_url = os.environ.get('CLOUDAMQP_URL')
broker_pool_limit = 1
broker_heartbeat = None
broker_connection_timeout = 30
result_backend = None
event_queue_expires = 60
worker_prefetch_multiplier = 1
worker_concurrency = 50
This was taken from https://www.cloudamqp.com/docs/celery.html
And my Procfile is as follows,
web: gunicorn shop.wsgi --log-file -
worker: celery worker --app=tasks.app
Am I missing something?
Thanks!
I'm fairly familiar with Heroku, though not with your tech stack. The general approach to dealing with a Heroku timeout is this:
First, determine exactly what is causing the timeout. One or more things are taking a lot of time.
Now you have 3 main options.
Heroku Scheduler (or one of several other similar add-ons). Very useful if you can run a script of some sort via a terminal command, and a check every 10 minutes/1 hour/24 hours to see whether the script needs to run is good enough for you. I typically find this to be the most straightforward solution, but it's not always an acceptable one. Depending on what you are emailing, an email being delayed 5-15 minutes might be acceptable.
Background process worker. It looks like this is what you are trying to do with Celery, but it isn't configured right; I probably can't help much on that (a rough sketch of the usual pattern follows this list).
Optimize. The reason Heroku sets a 30 second timeout is that, generally speaking, there isn't a good reason for a user to wait 30 seconds for a response. I'm scratching my head as to why sending an email would take more than 30 seconds, unless you need to send a few hundred of them or the email is very, very large. Alternatively, you might be doing a lot of work before you send the email, though that raises the question of why not do that work separately from the send-email command. I suspect you should look into the why of this before you try to get a background process worker set up.
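For what it's worth, the usual Celery pattern is to enqueue the slow work and return immediately. A minimal sketch, assuming a hypothetical task name send_order_email (the real project's task and arguments will differ):
# tasks.py -- hypothetical example, only to illustrate offloading the email send
from celery import shared_task
from django.core.mail import send_mail

@shared_task
def send_order_email(order_id):
    # The slow part (building and sending the email) runs in the worker,
    # not inside the web request.
    send_mail(
        subject='Order received',
        message=f'Order {order_id} was created.',
        from_email='shop@example.com',
        recipient_list=['customer@example.com'],
    )

# In the view, enqueue and return right away:
#     send_order_email.delay(order.id)
That way the web request finishes well inside Heroku's 30-second limit.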
After several days of trying to solve this issue, I contacted the support team at CloudAMQP.
They helped me figure out that the problem was that Celery was not identifying my BROKER_URL properly.
Then I came across this nice comment by @jainal09 here. There was an extra variable that should be set in settings.py:
CELERY_BROKER_URL = '<broker address given by Heroku config>'
Adding that extra line solved the problem. Now Heroku is able to send the email correctly.
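For anyone following the same book, here is a rough sketch of how that setting usually gets picked up (the shop project name comes from the Procfile above; the exact file layout is an assumption):
# shop/celery.py
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'shop.settings')

app = Celery('shop')
# Reads every setting prefixed with CELERY_ from settings.py,
# including CELERY_BROKER_URL.
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

# settings.py
# CELERY_BROKER_URL = os.environ.get('CLOUDAMQP_URL')
With that wiring in place, the worker process in the Procfile can be pointed at the project app (e.g. celery -A shop worker) rather than tasks.app.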

Not able to see the output messages of the tasks performed in celery-rabbitMQ combination

I am trying out a Celery and RabbitMQ combination for asynchronous task scheduling. Below is the sample program I tried in the PyCharm IDE. Everything seems to be working, but I am not able to see the return value from the tasks.
I am also monitoring the RabbitMQ management console, but I still cannot see the return value from the tasks.
I don't understand where I am going wrong; this is my first attempt at Celery and RabbitMQ.
I have created a tasks.py file with 2 sample tasks (with the proper decorators applied), each returning a value.
I have also started the RabbitMQ server (using the {rabbitmq-server start} command).
Then I started the Celery worker; command used: {celery -A tasks worker --loglevel=info}
Now, when I try to execute these tasks using the delay() method, the command ({reverse.delay('andy')}) runs and I get something like this, but I am not able to see the returned value.
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//', backend='rpc://')

@app.task
def reverse(string):
    return string[::-1]

@app.task
def add(x, y):
    return x + y
Well, I have figured out the issue... It seems the latest versions of Celery don't play well with Windows. To fix the issue, I installed the 'eventlet' package and it takes care of the rest.
One thing to note is that we need to start the Celery worker with eventlet support; the command is below:
celery -A <module_name> worker --loglevel=info -P eventlet
You can check a task's return value by using the result property. The syntax below is Python 3 with type hints so that you can follow along with what is going on.
import time

from celery.result import AsyncResult
from tasks import reverse  # the task defined above

result: AsyncResult = reverse.delay('andy')
task_id: str = result.id
while not result.ready():
    time.sleep(1)
    result = AsyncResult(task_id)
if result.successful():
    return_value = result.result
n.b., the above is a naive example because in a production environment, you typically don't busy wait like we did here. Instead the client issues periodic polls checking whether the status is ready and, if not, returns control to some other portion of the code (like a UI with a spinner).
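As a shorter (still blocking) variant, AsyncResult also has a get() method; a small sketch using the reverse task from above:
# Wait up to 10 seconds for the result; requires a result backend
# (e.g. the backend='rpc://' configured in tasks.py above).
value = reverse.delay('andy').get(timeout=10)
print(value)  # prints 'ydna'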

How to schedule a single-time event in Django in Heroku?

I have a question about the software design necessary to schedule an event that is going to be triggered once in the future in Heroku's distributed environment.
I believe it's better to write what I want to achieve, but I have certainly done my research and could not figure it out myself even after two hours of work.
Let's say in my views.py I have a function:
def after_6_hours():
    print('6 hours passed.')

def create_game():
    print('Game created')
    # of course "time" will be an error, but that's just an example
    scheduler.do(after_6_hours, time=now + 6)
So what I want to achieve is to be able to run the after_6_hours function exactly 6 hours after create_game has been invoked. Now, as you can see, this function is defined outside the usual clock.py or tasks.py files.
Now, how can I have my whole application running on Heroku, all the time, and be able to add this job to the queue of this imaginary-for-now scheduler library?
On a side note, I can't use the Temporizer add-on on Heroku. The combination of APScheduler and Python rq looked promising, but the examples are trivial, all scheduled in the same clock.py file, and I simply don't know how to tie everything together with the setup I have. Thanks in advance!
On Heroku you can have your Django application running in a Web Dyno, which will be responsible for serving your application and also for scheduling the tasks.
For example (please note that I did not test-run the code):
Create after_hours.py, which will contain the function you are going to schedule (note that we are going to use the same source code in the worker too).
def after_6_hours():
    print('6 hours passed.')
In your views.py, use rq (note that rq alone is not enough in your situation, as you have to schedule the task) and rq-scheduler:
from redis import Redis
from rq_scheduler import Scheduler
from datetime import timedelta

from after_hours import after_6_hours

def create_game():
    print('Game created')
    scheduler = Scheduler(connection=Redis())  # Get a scheduler for the "default" queue
    scheduler.enqueue_in(timedelta(hours=6), after_6_hours)  # Schedules the job to run 6 hours later.
Calling create_game() should schedule after_6_hours() to run 6 hours later.
Hint: You can provision Redis in Heroku using Redis To Go add-on.
The next step is to run the rqscheduler tool, which polls Redis every minute to see if there is any job to be executed at that time and, if so, places it in the queue (to which the rq workers will be listening).
Now, in a Worker Dyno create a file after_hours.py
def after_6_hours():
    print('6 hours passed.')
    # Better return something
And create another file worker.py:
import os

import redis
from rq import Worker, Queue, Connection

from after_hours import after_6_hours

listen = ['high', 'default', 'low']  # while scheduling the task in views.py we sent it to default

redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
and run this worker.py
python worker.py
That should run the scheduled task (after_6_hours in this case) in the Worker Dyno.
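Putting the dynos together, a rough Procfile sketch might look like the following (yourproject is a placeholder, and the rqscheduler process still has to be pointed at the Redis To Go instance as described in the rq-scheduler docs):
web: gunicorn yourproject.wsgi --log-file -
worker: python worker.py
scheduler: rqscheduler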
Please note that the key here is to make the same source code (after_hours.py in this case) available to the worker too. The same is emphasized in the rq docs:
Make sure that the worker and the work generator share exactly the
same source code.
If it helps, there is a hint in the docs to deal with different code bases.
For cases where the web process doesn't have access to the source code
running in the worker (i.e. code base X invokes a delayed function
from code base Y), you can pass the function as a string reference,
too.
q = Queue('low', connection=redis_conn)
q.enqueue('my_package.my_module.my_func', 3, 4)
Hopefully rq-scheduler also respects this way of passing a string reference instead of a function object.
You can use any scheduling tool (Celery/RabbitMQ, APScheduler, etc.) as long as you keep this point in mind.

subprocess in views.py does not work

I need a function that starts several beanstalk workers before starting to record some videos with different cameras. All of this runs on beanstalk. As I need to start the workers before the video recording, I want to use a subprocess, but this does not work. The most curious thing is that if I run the subprocess alone in a different Python script outside of this function (in the shell), it works! This is my code (the one which is not working):
os.chdir(path_to_the_manage.py)
subprocess.call("python manage.py beanstalk_worker -w 4", shell=True)

phase = get_object_or_404(Phase, pk=int(phase_id))
cameras = Video.objects.filter(phase=phase)

###########################################################################
## BEANSTALK
###########################################################################
num_workers = 4
time_to_run = 86400
[...]
for camera in cameras:
    arg = phase_id + ' ' + settings.PATH_ORIGIN_VIDEOS + ' ' + camera.name
    beanstalk_client.call('video.startvlcserver', arg=arg, ttr=time_to_run)
I want to include the subprocess because it's annoying to have to start the beanstalk workers manually for every video recording I want to do.
Thanks in advance.
I am not quite sure that subprocess.call is what you are looking for. I believe the issue is that subprocess.call is synchronous: it doesn't return until the command finishes, so the work happens within the context of the web request. This ties up resources, and if the request times out or the user cancels, weird things could happen.
I have never used beanstalkd, but with Celery (another job queue) the celeryd worker process is always running, waiting for jobs. This makes it easy to manage using supervisord. If you look at how beanstalkd is deployed, I wouldn't be surprised if they recommend doing the same thing. This should include starting your beanstalk workers outside of the context of a view.
From the command line
python manage.py beanstalk_worker -w 4
Once your beanstalk workers are set up and running, you can send jobs to the queue from your view using an async beanstalk API call.
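In other words, with the workers already started separately, the view can keep only the enqueueing part of the code from the question; a sketch (the view name is hypothetical, and Phase, Video, settings.PATH_ORIGIN_VIDEOS and beanstalk_client come from the question's project):
# views.py -- sketch: no subprocess call, the workers are already running
from django.conf import settings
from django.shortcuts import get_object_or_404

def start_recording(request, phase_id):  # hypothetical view name
    phase = get_object_or_404(Phase, pk=int(phase_id))
    cameras = Video.objects.filter(phase=phase)
    time_to_run = 86400
    for camera in cameras:
        # Each call only enqueues a job; the already-running workers pick it up.
        arg = phase_id + ' ' + settings.PATH_ORIGIN_VIDEOS + ' ' + camera.name
        beanstalk_client.call('video.startvlcserver', arg=arg, ttr=time_to_run)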
https://groups.google.com/forum/#!topic/django-users/Vyho8TFew2I
