What is the best practice for running periodic tasks in Python?

In my system, users are allowed to set a notification schedule: they can choose any date and time at which they want to receive messages. I have discovered a Python mechanism called Celery that executes tasks asynchronously. This leaves me with a few questions:
How do I integrate Celery with the user interface?
Are there any Celery alternatives?
Is it a panacea?

What you are looking for is something to process background tasks submitted to a queue from your web server. To that end, Celery is a good option and easy to configure. A more comprehensive list can be found here. None of these options would integrate with a user interface; they would integrate with your web server. They can queue jobs based on what is sent from the client side, which could be done as part of handling the request-response flow.
Also, this article provides a good reference for how to schedule periodic tasks using Celery.
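For concreteness, a minimal sketch of a periodic task with Celery beat might look like this (the broker URL, task body, and schedule are placeholders, not anything from the question):

    # tasks.py - minimal sketch; broker URL and send_notification are examples.
    from celery import Celery

    app = Celery("notifications", broker="redis://localhost:6379/0")

    @app.task
    def send_notification(user_id):
        # Look up the user's schedule and deliver the message here.
        print(f"Notifying user {user_id}")

    # Run every 60 seconds; start the scheduler with `celery -A tasks beat`
    # and a worker with `celery -A tasks worker`.
    app.conf.beat_schedule = {
        "send-notifications-every-minute": {
            "task": "tasks.send_notification",
            "schedule": 60.0,
            "args": (42,),
        },
    }

For a one-off message at a user-chosen time, send_notification.apply_async(args=(42,), eta=some_datetime) is another option.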

Related

Creating queue for Flask backend that can handle multiple users

I am creating a robot that has a Flask and React based interface (running on a Raspberry Pi Zero) for users to request it to perform tasks. When a user requests a task I want the backend to put it in a queue, and have the backend constantly looking at the queue and processing it on a one-by-one basis. Each task can take anywhere from 15-60 seconds, so they are pretty lengthy.
Currently I just do the task immediately in the same Python process that is running the Flask server. From testing locally, it seems like I can go to the React app in two different browsers and request tasks at the same time, and it looks like the Raspberry Pi is trying to run them in parallel (from what I'm seeing in the printed logs).
What is the best way to allow multiple users to go to the front-end and queue up tasks? When multiple users go to the React app I assume they all connect to the same instance of the back-end. So is it enough just to add a deque to the back-end and protect it with a mutex lock (and what is the Pythonic way to use mutexes?)? Or is this too simple? Do I need some other process or method to implement the task queue (such as writing/reading an external file to act as the queue)?
In general, the most popular way to run tasks in Python is using Celery. It is a Python framework that runs in a separate process, continuously checking a queue (like Redis or AMQP) for tasks. When it finds one, it executes it and logs the result to a "result backend" (like a database or Redis again). The Flask servers then just push tasks to the queue.
In order to notify the users, you could use polling from the React app: just request an update every 5 seconds until you see from the result backend that the task has completed successfully. As soon as you see that, stop polling and show the user the notification.
You can easily have multiple worker processes run in parallel if the app becomes large enough to need it. In general, you just need to remember to have every process do what it is meant to do: Flask servers should answer web requests, and Celery workers should process tasks. Not the other way around.
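As a rough illustration (the task body, endpoint names, and Redis URLs below are invented for the example), the Flask side only enqueues the task and exposes a status endpoint the React app can poll:

    # app.py - illustrative sketch only; names and URLs are placeholders.
    from celery import Celery
    from celery.result import AsyncResult
    from flask import Flask, jsonify

    flask_app = Flask(__name__)
    celery_app = Celery("robot", broker="redis://localhost:6379/0",
                        backend="redis://localhost:6379/1")

    @celery_app.task
    def perform_robot_task(params):
        # The long-running (15-60 s) robot work goes here.
        return {"done": True, "params": params}

    @flask_app.route("/tasks", methods=["POST"])
    def submit_task():
        # Enqueue and return immediately; the worker runs tasks one by one.
        result = perform_robot_task.delay({"action": "example"})
        return jsonify({"task_id": result.id}), 202

    @flask_app.route("/tasks/<task_id>")
    def task_status(task_id):
        # The React app polls this until the state is SUCCESS.
        result = AsyncResult(task_id, app=celery_app)
        return jsonify({"state": result.state})

Running the worker with a single process, e.g. celery -A app.celery_app worker --concurrency=1, gives the one-by-one processing described in the question.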

Fair scheduling of task execution time between web service users

Suppose we have the following web service. Its main function is taking screenshots of a given website URL. There is a REST API and a user interface for entering URLs. For each new URL a Celery task is created. For the frontend UI it is important that the screenshots for a URL arrive within a reasonable time, like 10 seconds.
Now a user, intentionally or because of a software error, enters a few hundred URLs. This bloats the task queue, and other users must wait until all those tasks are done.
So the request here is to:
Run tasks in some fair order. The simplest solution is to run one task per user at a time, like: user1 task, user2 task, user1 task, user2 task, and so on.
Have some priorities on tasks, e.g. tasks of priority 1 are always done before tasks of priority 2.
Currently, we use our own handcrafted module. It stores tasks in Redis and pushes them to Celery in fair order. To avoid depending on Celery's ordering, it pushes only as many tasks as there are free Celery workers available, checking the Celery workers for free slots every 100 milliseconds.
Are there any libraries or services which meet my requirements?
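For reference, a very rough sketch of what such a fair dispatcher loop could look like (the Redis key layout, the round-robin pass, and the screenshot_task name are all invented for illustration, not our actual module):

    # dispatcher.py - illustrative sketch; key layout and task name are assumptions.
    import time

    import redis
    from celery_app import app, screenshot_task  # hypothetical project module

    r = redis.Redis()

    def free_worker_slots():
        # Compare configured concurrency against currently running tasks.
        inspect = app.control.inspect()
        stats = inspect.stats() or {}
        active = inspect.active() or {}
        total = sum(w["pool"]["max-concurrency"] for w in stats.values())
        busy = sum(len(tasks) for tasks in active.values())
        return max(total - busy, 0)

    def dispatch_once():
        slots = free_worker_slots()
        # One pass: take at most one URL per user, so no user can hog the workers.
        for user_key in r.scan_iter("queue:*"):
            if slots <= 0:
                break
            url = r.lpop(user_key)
            if url:
                screenshot_task.delay(url.decode())
                slots -= 1

    while True:
        dispatch_once()
        time.sleep(0.1)  # re-check for free workers every 100 ms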
How many tasks do you have?
How many users do you have?
Sounds like you need a per-user rate-limiting mechanism in your web server.
For your question, there are several options:
You can use Celery task routing and assign different tasks to different queues (and then consume those queues with different workers).
Celery supports task priority; you can read about it here.
You can rate-limit per task in Celery - again, it depends on your usage.
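As a loose illustration of those three options (the queue name, task name, and numbers are invented):

    # Sketch of Celery routing, priority, and rate limiting; names are examples.
    from celery import Celery

    app = Celery("screenshots", broker="amqp://localhost")

    # 1. Routing: send screenshot tasks to their own queue, consumed by
    #    dedicated workers started with: celery -A tasks worker -Q screenshots
    app.conf.task_routes = {"tasks.take_screenshot": {"queue": "screenshots"}}

    # 2. Priority: allow priority levels on the queue, then set one per call,
    #    e.g. take_screenshot.apply_async(args=[url], priority=1)
    app.conf.task_queue_max_priority = 10

    # 3. Rate limiting: cap how often this task type runs per worker.
    @app.task(rate_limit="10/m")
    def take_screenshot(url):
        ...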
EDIT:
@uhbif19 I described those features since you asked for them - you wanted a way to achieve priority, and with them you can send tasks with a specific priority.
In your current architecture you might want to decrease priority to abusers and avoid starvation of other users.
A better way to tackle this problem IMO is to add a rate-limiting mechanism in the gateway and ensure that a single user won't be able to abuse the system and starve all the other users.
Good luck!

What is the relationship between Celery and RabbitMQ?

Is Celery mostly just a high-level interface for message queues like RabbitMQ? I am trying to set up a system with multiple scheduled workers doing concurrent HTTP requests, but I am not sure if I would need either of them. Another thing I am wondering is where you write the actual task code for the workers to complete, if I am using Celery or RabbitMQ?
RabbitMQ is indeed a message queue, and Celery uses it to send messages to and from workers. Celery is more than just an interface for RabbitMQ: Celery is what you use to create workers, kick off tasks, and define your tasks. It sounds like your use case makes sense for Celery/RabbitMQ. You create a task using the @app.task decorator. Check the docs for more info. In previous projects, I've set up a module for Celery, where I define any tasks I need. Then you can pull in functions from other modules to use in your tasks.
Celery is the task management framework--the API you use to schedule jobs, the code that gets those jobs started, the management tools (e.g. Flower) you use to monitor what's going on.
RabbitMQ is one of several "brokers" (message transports) that Celery can use. It's an oversimplification to say that Celery is a high-level interface to RabbitMQ; RabbitMQ is not actually required for Celery to run and do its job properly. But, in practice, they are often paired together, and Celery is a higher-level way of accomplishing some things that you could do at a lower level with just RabbitMQ (or another queue or message delivery backend).
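For instance, a minimal sketch of where the task code lives (the module name, broker URL, and fetch_url task are made up):

    # tasks.py - minimal sketch; names are placeholders.
    import requests
    from celery import Celery

    app = Celery("scraper", broker="amqp://localhost")

    @app.task
    def fetch_url(url):
        # The actual work a worker performs when it picks up the task.
        return requests.get(url, timeout=10).status_code

    # From your scheduling code:  fetch_url.delay("https://example.com")
    # Then start a worker with:   celery -A tasks worker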

Display progress of a long running Python task in Django

I currently have a typical Django structure set up for a project and one web application.
The web application is set up so that a user inputs some information, and this information is taken as the input to run a Python program.
This Python program can sometimes take quite a while to finish (grabbing things from the web and doing some text-mining scoring) - sometimes multiple minutes.
On the command line, this program would periodically display where it was in the process (it would first say how many things it had found to score against, and then report how far through scoring those it was), which was very useful. However, when I moved this over to a Django setup, I no longer have this capability (at least, not in the same way, since this output now goes to log files).
The way I set it up is that there is an input view, and then a results view. The results view takes the input and runs the Python program. It won't display the results until the entire program is run. So on the user side, the browser just sits there for sometimes minutes before the results are displayed. Obviously, this is not ideal.
Does anyone know of the best way to bring status information on a task to Django?
I've looked into Celery a little bit, but I think since I'm still a beginner in Django I'm confusing myself with some of the documentation. For instance: even if the task is sent off asynchronously to a worker, how does the browser grab the current state of the program? Also, consistent documentation for Celery on Django seems to be lacking (I've seen people set up Celery many different ways in their Django projects).
I would appreciate any input here, I've been stuck on this for a while now.
My first suggestion is to psychologically separate Celery from Django when you start to think of the two. They can run in the same environment, but Celery is to asynchronous processes what Django is to HTTP requests.
Also remember that Celery, unlike Django, requires other services to function, namely a message broker. So by using Celery you will increase your architectural requirements.
To address your specific use case, you'll need a system to publish messages from each Celery task to a message broker, and your web client will need to subscribe to those messages.
There's a lot involved here, but the short version is that you can use Redis as your Celery message broker as well as your pub/sub service to get messages back to the browser. You can then use, for example, django-redis-websockets to subscribe the browser to the task state messages in Redis.
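If occasional polling is acceptable, a simpler alternative to websockets is to have the task report progress through Celery's result backend and expose it via a small Django view (the task, the progress fields, and the view name below are invented):

    # tasks.py - sketch; score_items and the meta fields are placeholders.
    from celery import shared_task

    @shared_task(bind=True)
    def score_items(self, query):
        items = ["a", "b", "c"]  # stand-in for the things found to score
        for i, item in enumerate(items, start=1):
            # ... do the scoring work for item here ...
            self.update_state(state="PROGRESS",
                              meta={"current": i, "total": len(items)})
        return {"current": len(items), "total": len(items)}

    # views.py - the browser polls this with the task id returned at submission.
    from celery.result import AsyncResult
    from django.http import JsonResponse

    def task_progress(request, task_id):
        # Error states would need extra handling in a real view.
        result = AsyncResult(task_id)
        info = result.info if isinstance(result.info, dict) else {}
        return JsonResponse({"state": result.state, "progress": info})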

Embed a celery worker in my own code

I have a service that needs a sort of coordinator component. The coordinator will manage entities that need to be assigned to users, taken away from users if the users do not respond in a timely manner, and also handle user responses if they do respond. The coordinator will also need to contact messaging services to notify the users that they have something to handle.
I want the coordinator to be a single-threaded process, as the load is not expected to be too much for the first few years of usage, and I'd much rather postpone all the concurrency issues to when I really need to handle them (if at all).
The coordinator will receive new entities and user responses from a Django webserver. I thought the easiest way to handle this is with Celery tasks - the webserver just starts a task that the coordinator consumes on its own time.
For this to happen, I need the coordinator to contain a Celery worker and replace the current worker main loop with my own version (one that checks the broker for a new message and handles the scheduling).
How feasible is it? The alternative is to avoid Celery and use RabbitMQ directly. I'd rather not do that.
Replace these names: coordinator with RabbitMQ (or some other broker Kombu supports) and users with Celery workers.
I am pretty sure you can do all you need (and much more) just by configuring Celery / Kombu and RabbitMQ, without writing too many (if any) lines of code.
Small note: Celery also features scheduled tasks.
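If the single-threaded requirement is the main concern, one hedged option is to run a worker in-process with the solo pool (the app and task names here are placeholders, not your coordinator's API):

    # coordinator.py - sketch only; app name and task body are placeholders.
    from celery import Celery

    app = Celery("coordinator", broker="amqp://localhost")

    @app.task
    def handle_entity(entity_id):
        # Assignment / timeout / notification logic would live here.
        print(f"Handling entity {entity_id}")

    if __name__ == "__main__":
        # The solo pool keeps everything in a single thread and process, so the
        # Django server just calls handle_entity.delay(...) and the coordinator
        # consumes tasks one at a time on its own schedule.
        app.worker_main(argv=["worker", "--loglevel=info", "--pool=solo"])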
