Dealing with django + celery when it goes offline - python

I am trying to figure out how to deal with celery in my django project when it goes offline for some reason.
i found in the documentation the worker-offline event, so i am guessing i can somehow catch this event when celery goes offline and email myself informing my celery worker is down.
My question is how do i implement this behaviour? Are there any examples or a Django app? Is this how i am supposed to deal with these situations?

In my production setup, I use supervisor to daemonize and control celery workers. You can a have script running on the system monitoring the state of the supervisor controlled states (fire an alert anytime the state of the process enters FATAL state).
Assuming your workers are continuously spitting out text in a log file, you can setup a log freshness check. If you detect the log has not been updated for X seconds/minute, you fire an alert.
There's also Celery Flower, a system designed for remote monitoring and managing Celery remotely. I have not used it in production, so I cannot tell whether it meets your specific needs.

You can process events using real-time processing, write a consumer and announce when event come in, a example can be found in Monitoring and Management Guide, replace the task-failed event with worker-offline event and write your own handler to deal with the event your captured.

I ended up creating this Django Reusable App with a REST API to monitor my celery workers from an external service or machine:
https://github.com/psychok7/django-celery-inspect

Related

Periodic and Non periodic tasks with Django + Telegram + Celery

I am building a project based on Django and one of my intentions is to have a telegram bot which is receiving information from a Telegram group. I was able to implement the bot to send messages in Telegram, no issues.
In this moment I have a couple of Celery tasks which are running with Beat and also the Django web, which are decopled. All good here.
I have seen that the python-telegram-bot is running a function in one of the examples (https://github.com/python-telegram-bot/python-telegram-bot/blob/master/examples/echobot.py) which is waiting idle to receive data from Telegram. Now, all my tasks in Celery are in this moment periodic and are called each 10 or 60 minutes by Beat.
How can I run this non-periodic task with Celery in my configuration? I am saying non-periodic because I understood that it will wait for content until it is manually interrupted.
Django~=3.2.6
celery~=5.1.2
CELERY_BEAT_SCHEDULE = {
'task_1': {
'task': 'apps.envc.tasks.Fetch1',
'schedule': 600.0,
},
'task_2': {
'task': 'apps.envc.tasks.Fetch2',
'schedule': crontab(minute='*/60'),
},
'task_3': {
'task': 'apps.envc.tasks.Analyze',
'schedule': 600,
},
}
In my tasks.py I have one of the tasks like this:
#celery_app.task(name='apps.envc.tasks.TelegramBot')
def TelegramBot():
status = start_bot()
return status
And as the start_bot implemenation, I simply copied the echobot.py example and I have added my TOKEN there (of course the functions for different commands from the example are also there).
Set up a webhook instead of polling with Celery
With Django, you shouldn't be using Celery to run Telegram polling (what you call PTB's “non-periodic task”, which is better described as a long-running process or service). Celery is designed for definite tasks, not indefinitely-running processes.
As Django implies that you're already running a web server, then the webhook option is a better fit. (Remember that you can either do polling or set up a webhook in order to receive updates from Telegram's servers.) The option that #CallMeStag suggested, of using a non-threading webhook setup, makes the most sense for Django-PTB integration.
You can do the bot setup (defining and registering your handler functions on a Dispatcher instance) in a separate module; to avoid threading, you should pass update_queue=None, workers=0 to your Dispatcher instantiation. And then, use it in a Django view, like this:
import json
from django.views.decorators.csrf import csrf_exempt
from telegram import Update
from .telegram_init import telegram_bot, telegram_dispatcher
...
#csrf_exempt
def telegram_webhook(request):
data = json.loads(request.body)
update = Update.de_json(data, telegram_bot)
telegram_dispatcher.process_update(update)
return JsonResponse({})
where telegram_bot is the Bot instance that I use for instantiating telegram_dispatcher. (I left out error handling in this snippet.)
Why avoid threading? Threads in the more general sense are not forbidden in Django, but in the context of PTB, threading usually means running bot updaters or dispatchers in a long-running thread that share an update/message queue, and that's a complication that doesn't look nice nor play well with, for example, a typical Django deployment that uses multiple Gunicorn workers in separate processes. There is, however, a motivation for using multithreading (multiple processes, actually, using Celery) in Django-PTB integration; see below.
Development environment caveat
The above setup is what you'd want to use for a basic production system. But during dev, unless your dev machine is internet-facing with a fixed IP, you probably can't use a webhook, so you'd still want to do polling. One way to do this is by creating a custom Django management command:
<my_app>/management/commands/polltelegram.py:
from django.core.management.base import BaseCommand
from my_django_project.telegram_init import telegram_updater
class Command(BaseCommand):
help = 'Run Telegram bot polling.'
def handle(self, *args, **options):
updater.start_polling()
self.stdout.write(
'Telegram bot polling started. '
'Press CTRL-BREAK to terminate.'
)
updater.idle()
self.stdout.write('Polling stopped.')
And then, during dev, run python manage.py polltelegram to fetch and process Telegram updates. (Run this along with python manage.py runserver to be able to use the main Django app simultaneously; the polling runs in a separate process with this setup, not just a separate thread.)
When Celery makes sense
Celery does have a role to play if you're integrating PTB with Django, and this is when reliability is a concern. For instance, when you want to be able to retry sending replies in case of transient network issues. Another potential issue is that the non-threading webhook setup detailed above can, in a high-traffic scenario, run into flood/rate limits. PTB's current solution for this, MessageQueue, uses threading, and while it can work, it can introduce other problems, for example interference with Django's autoreload function when running runserver during dev.
A more elegant and reliable solution is to use Celery to run the message sending function of PTB. This allows for retries and rate limiting for better reliability.
Briefly described, this integration can still use the non-threading webhook setup above, but you have to isolate the Bot.send_message() function into a Celery task, and then make sure that all handlers call this Celery task asynchronously instead of using the bot to run send_message() in the webhook process 'eagerly'.
In PTB, Updater.start_polling/webhook() starts a background thread that waits for incoming updates. Updater.idle() blocks the main thread and when receiving a stop signal, it ends the background thread mentioned above.
I'm not familiar with Celery and only know the basics of Django, but I see a few options here that I'd like to point out.
You can run the PTB-related code in a standalone thread, i.e. a thread that calls Updater.start_polling and Updater.idle. To end that thread on shutdown, you'll have to forward the stop signal to that thread
Vice versa, you can run PTB in the main thread and the Django & Celeray related tasks in a standalone thread
You don't have to use Updater. Since you're using Django anyway, you could switch to a webhook-based solution for receiving updates, where Django serves as webhook for you. You can even eliminate threading for PTB completely by calling Dispatcher.process_update manually. Please see this wiki page for more info on custom webhook solutions
Finally, I'd like to mention that PTB comes with a built-in solution of scheduling tasks, see the wiki page on Job Queue. This may or may not be relevant for you depending on your setup.
Dislaimer: I'm currently the maintainer of python-telegram-bot

Creating queue for Flask backend that can handle multiple users

I am creating a robot that has a Flask and React (running on raspberry pi zero) based interface for users to request it to perform tasks. When a user requests a task I want the backend to put it in a queue, and have the backend constantly looking at the queue and processing it on a one-by-one basis. Each tasks can take anywhere from 15-60 seconds so they are pretty lengthy.
Currently I just immediately do the task in the same python process that is running the Flask server, and from testing locally It seems like i can go to the react app in two different browsers and request tasks at the same time and it looks like the raspberry pi is trying to run them in parallel (from what I'm seeing in the printed logs).
What is the best way to allow multiple users to go to the front-end and queue up tasks? When multiple users go to the react app I assume they all connect to the same instance of the back-end. So it it enough just to add a dequeue to the back-end and protect it with a mutex lock (what is the pythonic way to use mutexes?). Or is this too simple? Do I need some other process or method to implement the task queue (such as writing/reading to an external file to act as the queue)?
In general, the most popular way to run tasks in Python is using Celery. It is a Python framework that runs on a separate process, continuously checking a queue (like Redis or AMQP) for tasks. When it finds one, it executes it, and logs the result to a "result backend" (like a database or Redis again). Then you have the Flask servers just push the tasks to the queue.
In order to notify the users, you could use polling from the React app, which is just requesting an update every 5 seconds until you see from the result backend that the task has completed successfully. As soon as you see that, stop polling and show the user the notification.
You can easily have multiple worker processes run in parallel, if the app would become large enough to need it. In general, you just need to remember to have every process do what it's needed to do: Flask servers should answer web requests, and Celery servers should process tasks. Not the other way around.

The best practice to run periodic tasks in Python?

In my system user is allowed to set notifications schedule. He can choose any date and time when he wants to get messages. I have discovered one mechanims is named as Celery in Python. That executes tasks asyncronly. Due this I have pair of questions:
How to intergrate Celery with user interface?
Are there any Celery alternatives?
Is it panacea?
What you are looking for is something to process background tasks submitted to a queue from your web server. To that end, Celery is a good option and easy to configure. A more comprehensive list can be found here. None of these options would integrate with a user interface, they would integrate with your web server. They can queue jobs based on what is sent from the client side, which could be included as part of handling the request-response flow.
Also, this article provides a good reference for how to schedule periodic tasks using celery.

How to trigger email when celery worker goes down?

I have configured django celery with rabbitmq in my server. Currently I am having only one node for my tasks.
I have tried with celery-flower, events, celerycam, etc. for monitoring the the worker/tasks status and it worked well.
My Problem is:-
I want to send mail notification if worker goes down for some reason.
I thought of creating cron job and running every 5 mins and check the status of worker(not sure this the correct way)
Is there any other extensions or other way to do this without cron??
Run your workers using supervisor. There's an example in the documentation. Then, take a look at this answer for how to send an email when the worker process goes down.

reliable way to deploy new code into a production celery cluster without pausing service

I have a few celery nodes running in production with rabbitmq and I have been handling deploys with service interruption. I have to take down the whole site in order to deploy new code out to celery. I have max tasks per child set to 1, so in theory, if I make changes to an existing task, they should take effect when the next time they are run, but what about registering new tasks? I know that restarting the daemon won't kill running workers, but instead will let them die on their own, but it still seems dangerous. Is there an elegant solution to this problem?
The challenging part here seems to be identifying which celery tasks are new versus old. I would suggest creating another vhost in rabbitmq and performing the following steps:
Update django web servers with new code and reconfigure to point to the new vhost.
While tasks are queuing up in the new vhost, wait for celery works to finish up with the tasks from the old vhost.
When workers have completed, update the code and configuration to the new vhost
I haven't actually tried this but I don't see why this wouldn't work. One annoying aspect is having to alternate between the vhosts with each deploy.
a kind of work around for you can be to set the config variable MAX_TASK_PER_CHILD.
This variable specify the number of task that a Pool Worker execute before kill himself.
Off course when a new Pool Worker is executed this will load the new code.
On my system normally I use to restart celery leaving other task running on background, normally everything goes fine, sometimes happen that one of this task is never killed and you can still kill it with a script.

Categories