Periodic and Non periodic tasks with Django + Telegram + Celery - python

I am building a project based on Django and one of my intentions is to have a telegram bot which is receiving information from a Telegram group. I was able to implement the bot to send messages in Telegram, no issues.
In this moment I have a couple of Celery tasks which are running with Beat and also the Django web, which are decopled. All good here.
I have seen that the python-telegram-bot is running a function in one of the examples (https://github.com/python-telegram-bot/python-telegram-bot/blob/master/examples/echobot.py) which is waiting idle to receive data from Telegram. Now, all my tasks in Celery are in this moment periodic and are called each 10 or 60 minutes by Beat.
How can I run this non-periodic task with Celery in my configuration? I am saying non-periodic because I understood that it will wait for content until it is manually interrupted.
Django~=3.2.6
celery~=5.1.2
CELERY_BEAT_SCHEDULE = {
'task_1': {
'task': 'apps.envc.tasks.Fetch1',
'schedule': 600.0,
},
'task_2': {
'task': 'apps.envc.tasks.Fetch2',
'schedule': crontab(minute='*/60'),
},
'task_3': {
'task': 'apps.envc.tasks.Analyze',
'schedule': 600,
},
}
In my tasks.py I have one of the tasks like this:
#celery_app.task(name='apps.envc.tasks.TelegramBot')
def TelegramBot():
status = start_bot()
return status
And as the start_bot implemenation, I simply copied the echobot.py example and I have added my TOKEN there (of course the functions for different commands from the example are also there).

Set up a webhook instead of polling with Celery
With Django, you shouldn't be using Celery to run Telegram polling (what you call PTB's “non-periodic task”, which is better described as a long-running process or service). Celery is designed for definite tasks, not indefinitely-running processes.
As Django implies that you're already running a web server, then the webhook option is a better fit. (Remember that you can either do polling or set up a webhook in order to receive updates from Telegram's servers.) The option that #CallMeStag suggested, of using a non-threading webhook setup, makes the most sense for Django-PTB integration.
You can do the bot setup (defining and registering your handler functions on a Dispatcher instance) in a separate module; to avoid threading, you should pass update_queue=None, workers=0 to your Dispatcher instantiation. And then, use it in a Django view, like this:
import json
from django.views.decorators.csrf import csrf_exempt
from telegram import Update
from .telegram_init import telegram_bot, telegram_dispatcher
...
#csrf_exempt
def telegram_webhook(request):
data = json.loads(request.body)
update = Update.de_json(data, telegram_bot)
telegram_dispatcher.process_update(update)
return JsonResponse({})
where telegram_bot is the Bot instance that I use for instantiating telegram_dispatcher. (I left out error handling in this snippet.)
Why avoid threading? Threads in the more general sense are not forbidden in Django, but in the context of PTB, threading usually means running bot updaters or dispatchers in a long-running thread that share an update/message queue, and that's a complication that doesn't look nice nor play well with, for example, a typical Django deployment that uses multiple Gunicorn workers in separate processes. There is, however, a motivation for using multithreading (multiple processes, actually, using Celery) in Django-PTB integration; see below.
Development environment caveat
The above setup is what you'd want to use for a basic production system. But during dev, unless your dev machine is internet-facing with a fixed IP, you probably can't use a webhook, so you'd still want to do polling. One way to do this is by creating a custom Django management command:
<my_app>/management/commands/polltelegram.py:
from django.core.management.base import BaseCommand
from my_django_project.telegram_init import telegram_updater
class Command(BaseCommand):
help = 'Run Telegram bot polling.'
def handle(self, *args, **options):
updater.start_polling()
self.stdout.write(
'Telegram bot polling started. '
'Press CTRL-BREAK to terminate.'
)
updater.idle()
self.stdout.write('Polling stopped.')
And then, during dev, run python manage.py polltelegram to fetch and process Telegram updates. (Run this along with python manage.py runserver to be able to use the main Django app simultaneously; the polling runs in a separate process with this setup, not just a separate thread.)
When Celery makes sense
Celery does have a role to play if you're integrating PTB with Django, and this is when reliability is a concern. For instance, when you want to be able to retry sending replies in case of transient network issues. Another potential issue is that the non-threading webhook setup detailed above can, in a high-traffic scenario, run into flood/rate limits. PTB's current solution for this, MessageQueue, uses threading, and while it can work, it can introduce other problems, for example interference with Django's autoreload function when running runserver during dev.
A more elegant and reliable solution is to use Celery to run the message sending function of PTB. This allows for retries and rate limiting for better reliability.
Briefly described, this integration can still use the non-threading webhook setup above, but you have to isolate the Bot.send_message() function into a Celery task, and then make sure that all handlers call this Celery task asynchronously instead of using the bot to run send_message() in the webhook process 'eagerly'.

In PTB, Updater.start_polling/webhook() starts a background thread that waits for incoming updates. Updater.idle() blocks the main thread and when receiving a stop signal, it ends the background thread mentioned above.
I'm not familiar with Celery and only know the basics of Django, but I see a few options here that I'd like to point out.
You can run the PTB-related code in a standalone thread, i.e. a thread that calls Updater.start_polling and Updater.idle. To end that thread on shutdown, you'll have to forward the stop signal to that thread
Vice versa, you can run PTB in the main thread and the Django & Celeray related tasks in a standalone thread
You don't have to use Updater. Since you're using Django anyway, you could switch to a webhook-based solution for receiving updates, where Django serves as webhook for you. You can even eliminate threading for PTB completely by calling Dispatcher.process_update manually. Please see this wiki page for more info on custom webhook solutions
Finally, I'd like to mention that PTB comes with a built-in solution of scheduling tasks, see the wiki page on Job Queue. This may or may not be relevant for you depending on your setup.
Dislaimer: I'm currently the maintainer of python-telegram-bot

Related

Creating queue for Flask backend that can handle multiple users

I am creating a robot that has a Flask and React (running on raspberry pi zero) based interface for users to request it to perform tasks. When a user requests a task I want the backend to put it in a queue, and have the backend constantly looking at the queue and processing it on a one-by-one basis. Each tasks can take anywhere from 15-60 seconds so they are pretty lengthy.
Currently I just immediately do the task in the same python process that is running the Flask server, and from testing locally It seems like i can go to the react app in two different browsers and request tasks at the same time and it looks like the raspberry pi is trying to run them in parallel (from what I'm seeing in the printed logs).
What is the best way to allow multiple users to go to the front-end and queue up tasks? When multiple users go to the react app I assume they all connect to the same instance of the back-end. So it it enough just to add a dequeue to the back-end and protect it with a mutex lock (what is the pythonic way to use mutexes?). Or is this too simple? Do I need some other process or method to implement the task queue (such as writing/reading to an external file to act as the queue)?
In general, the most popular way to run tasks in Python is using Celery. It is a Python framework that runs on a separate process, continuously checking a queue (like Redis or AMQP) for tasks. When it finds one, it executes it, and logs the result to a "result backend" (like a database or Redis again). Then you have the Flask servers just push the tasks to the queue.
In order to notify the users, you could use polling from the React app, which is just requesting an update every 5 seconds until you see from the result backend that the task has completed successfully. As soon as you see that, stop polling and show the user the notification.
You can easily have multiple worker processes run in parallel, if the app would become large enough to need it. In general, you just need to remember to have every process do what it's needed to do: Flask servers should answer web requests, and Celery servers should process tasks. Not the other way around.

Django: How to ignore tasks with Celery?

Without changing the code itself, Is there a way to ignore tasks in Celery?
For example, when using Django mails, there is a Dummy Backend setting. This is perfect since it allows me, from a .env file to deactivate mail sending in some environments (like testing, or staging). The code itself that handles mail sending is not changed with if statements or decorators.
For celery tasks, I know I could do it in code using mocks or decorators, but I'd like to do it in a clean way that is 12factors compliant, like with Django mails. Any idea?
EDIT to explain why I want to do this:
One of the main motivation behind this, is that it creates coupling between Django web server and Celery tasks.
For example, when running unit tests, if the broker server (Redis for me) is not running, then if delay() method is called, it freezes forever, because there is no timeout when Celery tries to send a task to Redis.
From an architecture view, this is very bad. I'd like my unit tests can run properly without the requirement to run a Celery broker!
Thanks!
As far as the coupling is concerned, your Django application would still be tied to celery if you use a dummy backend. Just your tasks won't execute. Maybe this is acceptable in your case but in my opinion, it can cause some problems. For example, if the piece of code you are trying to test, submits a task to celery, and in a later part, tries to retrieve the result for that task, it will fail. Because the dummy backend will never execute the task.
For unit testing, as you mentioned in your question, you can use the task_always_eager setting. If you turn it on, your Django app will no longer depend upon a running worker. It will execute tasks in the same thread in a synchronous fashion and return the result.

The best practice to run periodic tasks in Python?

In my system user is allowed to set notifications schedule. He can choose any date and time when he wants to get messages. I have discovered one mechanims is named as Celery in Python. That executes tasks asyncronly. Due this I have pair of questions:
How to intergrate Celery with user interface?
Are there any Celery alternatives?
Is it panacea?
What you are looking for is something to process background tasks submitted to a queue from your web server. To that end, Celery is a good option and easy to configure. A more comprehensive list can be found here. None of these options would integrate with a user interface, they would integrate with your web server. They can queue jobs based on what is sent from the client side, which could be included as part of handling the request-response flow.
Also, this article provides a good reference for how to schedule periodic tasks using celery.

Tornado Subprocess confusion

I am trying to build a Tornado web server which takes requests from multiple clients. The request consists of:
a. For a given directory name passed through an URL, zip the files, etc and FTP it out.
b. Providing a status of sorts if the task is completed.
So, rather than making it a synchronous and linear process, I wanted to break it down into multiple subtasks. The client will submit the URL request and then simply receive a response of sorts 'job submitted'. A bit later, the client can come along asking status on this job. During this time the job obviously has to finish its task.
I am confused between what modules to use - Tornado Subprocess, Popen contructor, Subprocess.Call, etc. I've read Python docs but can't find anything where the task is running longer and Tornado is not supposed to wait for it to finish. So, I need a mechanism to start a job, let it run its course but relinquish the client and then when asked by client provide a status on it.
Any help is appreciated. Thanks.
Python programmers widely use Celery for a set of processes to manage a queue of tasks. Set up Celery with RabbitMQ and write a Celery worker (perhaps with Celery Canvas that does the work you need: zips a directory, ftps it to somewhere, etc.
The Tornado-Celery integration package provides something that appears close to what you need to integrate your Tornado application with Celery.
This is all a lot of moving parts to install and configure at first, of course, but it will prepare you for a maintainable application architecture.

Celery: why do I need a broker for periodic tasks?

I have a standalong script that scrapes a page, initiates a connection to a database, and writes database to it. I need it to execute periodically after x hours. I can make it with using a bash script, with the pseudocode:
while true
do
python scraper.py
sleep 60*60*x
done
From what I read about message brokers, they are used for sending "signals" from one running program to another, like HTTP in principle. Like I have a piece of code that accepts an email id from user, it sends signal with email-id to another piece of code that will send the email.
I need celery to run a periodic task on heroku. I already have a mongodb on a separate server. WHy do I need to run another server for rabbitmq or redis just for this? Can I use celery without the broker?
Celery architecture is designed to scale and distribute tasks across several servers. For sites like yours it might be an overkill. Queue service is generally needed to maintain the task list and signal the status of finished tasks.
You might want to take a look in Huey instead. Huey is small-scale Celery "Clone" needing only Redis as an external dependency, not RabbitMQ. It's still using Redis queue mechanism to line the tasks in queue.
There also exists Advanced Python scheduler which does not need even Redis, but can hold the state of the queue in memory in-process.
Alternatively if you have very small amount of periodical tasks, no delayed tasks, I would just use Cron and pure Python scripts to run the tasks.
As the Celery documentation explains:
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, a client adds a message to the queue, which the broker then delivers to a worker.
You can use your existing MongoDB database as broker. see Using MongoDB.
For the application like this, its better use Django Background Tasks
,
Installation
Install from PyPI:
pip install django-background-tasks
Add to INSTALLED_APPS:
INSTALLED_APPS = (
# ...
'background_task',
# ...
)
Migrate your database:
python manage.py makemigrations background_task
python manage.py migrate
Creating and registering tasks
To register a task use the background decorator:
from background_task import background
from django.contrib.auth.models import User
#background(schedule=60)
def notify_user(user_id):
# lookup user by id and send them a message
user = User.objects.get(pk=user_id)
user.email_user('Here is a notification', 'You have been notified')
This will convert the notify_user into a background task function. When you call it from regular code it will actually create a Task object and stores it in the database. The database then contains serialised information about which function actually needs running later on. This does place limits on the parameters that can be passed when calling the function - they must all be serializable as JSON. Hence why in the example above a user_id is passed rather than a User object.
Calling notify_user as normal will schedule the original function to be run 60 seconds from now:
notify_user(user.id)
This is the default schedule time (as set in the decorator), but it can be overridden:
notify_user(user.id, schedule=90) # 90 seconds from now
notify_user(user.id, schedule=timedelta(minutes=20)) # 20 minutes from now
notify_user(user.id, schedule=timezone.now()) # at a specific time
Also you can run original function right now in synchronous mode:
notify_user.now(user.id) # launch a notify_user function and wait for it
notify_user = notify_user.now # revert task function back to normal function.
Useful for testing.
You can specify a verbose name and a creator when scheduling a task:
notify_user(user.id, verbose_name="Notify user", creator=user)

Categories