Implement a background job inside an HTTP microservice - Python

I have a microservice (written in Python) which exposes a few endpoints; each of these endpoints can be triggered by an HTTP request (Flask).
I have one specific endpoint which takes a long time to finish, so I thought I would run a background job once this endpoint is triggered, in order to reduce its response time.
For example, I want Flask to start the process on an HTTP request, and when it starts, to push a task into a RabbitMQ queue. Should the consumer be in the same app or in a different service?

The consumer could live in the same codebase so it can share the models, functions and other tools; just run it as a separate worker process.
Celery is redundant for such a task; try Pika or Dramatiq.
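For example, a minimal sketch with Pika (the queue name, route and the do_the_long_work function are illustrative, not from the question): the Flask endpoint publishes a message to RabbitMQ and returns immediately, and a worker in the same codebase, started as a separate process, consumes it.

# publisher side, inside the Flask app
import json
import pika
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/long-task', methods=['POST'])
def long_task():
    payload = request.get_json()
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='long_tasks', durable=True)
    channel.basic_publish(exchange='', routing_key='long_tasks', body=json.dumps(payload))
    connection.close()
    return jsonify({'status': 'queued'}), 202

# consumer side, same codebase, started separately (e.g. python worker.py)
def run_worker():
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='long_tasks', durable=True)

    def handle(ch, method, properties, body):
        do_the_long_work(json.loads(body))  # your existing long-running function
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue='long_tasks', on_message_callback=handle)
    channel.start_consuming()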

Related

Which requests should be handled by the webserver and which by a task queue worker?

I am working on a Python web app that uses Celery to schedule and execute user job requests.
Most of the time the requests submitted by a user can't be resolved immediately and thus it makes sense to me to schedule them in a queue.
However, now that I have the whole queuing architecture in place, I'm confused about whether I should delegate all the request processing logic to the queue/workers or if I should leave some of the work to the webserver itself.
For example, apart from the job scheduling, there are times where a user only needs to perform a simple database query, or retrieve a static JSON file. Should I also delegate these "synchronous" requests to the queue/workers?
Right now, my webserver controllers don't do anything except validating incoming JSON request schemas and forwarding them to the queue. What are the pros and cons of having a dumb webserver like this?
I believe the setup you have right now, plus giving the workers the small jobs too, is good. That way the workers would be overloaded first in the event of an attack or a huge influx of requests. :)
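For what it's worth, a hedged sketch of that "dumb webserver" controller (the schema, route and task names are mine, not from the question): the only work done in the web process is schema validation; everything else, even the small jobs, goes to the queue.

from flask import Flask, request, jsonify
from jsonschema import validate, ValidationError
from tasks import process_request  # a Celery task defined elsewhere (hypothetical)

app = Flask(__name__)

JOB_SCHEMA = {
    'type': 'object',
    'properties': {'action': {'type': 'string'}, 'params': {'type': 'object'}},
    'required': ['action'],
}

@app.route('/jobs', methods=['POST'])
def submit_job():
    payload = request.get_json()
    try:
        validate(instance=payload, schema=JOB_SCHEMA)  # the only work done here
    except ValidationError as exc:
        return jsonify({'error': exc.message}), 400
    result = process_request.delay(payload)            # everything else goes to the workers
    return jsonify({'task_id': result.id}), 202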

Creating queue for Flask backend that can handle multiple users

I am creating a robot that has a Flask and React based interface (running on a Raspberry Pi Zero) for users to request it to perform tasks. When a user requests a task I want the backend to put it in a queue, and have the backend constantly looking at the queue and processing it on a one-by-one basis. Each task can take anywhere from 15 to 60 seconds, so they are pretty lengthy.
Currently I just do the task immediately in the same Python process that is running the Flask server, and from testing locally it seems like I can go to the React app in two different browsers and request tasks at the same time, and it looks like the Raspberry Pi is trying to run them in parallel (from what I'm seeing in the printed logs).
What is the best way to allow multiple users to go to the front-end and queue up tasks? When multiple users go to the React app I assume they all connect to the same instance of the back-end. So is it enough just to add a deque to the back-end and protect it with a mutex lock (what is the Pythonic way to use mutexes?), or is this too simple? Do I need some other process or method to implement the task queue (such as writing/reading an external file to act as the queue)?
In general, the most popular way to run background tasks in Python is Celery. It is a Python framework that runs in a separate process, continuously checking a queue (like Redis or AMQP) for tasks. When it finds one, it executes it and logs the result to a "result backend" (like a database, or Redis again). The Flask servers then just push tasks to the queue.
To notify the users, you could use polling from the React app: request an update every 5 seconds until you see from the result backend that the task has completed successfully. As soon as you see that, stop polling and show the user the notification.
You can easily have multiple worker processes running in parallel if the app becomes large enough to need it. In general, you just need to remember to have every process do what it needs to do: Flask servers should answer web requests, and Celery workers should process tasks. Not the other way around.
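A minimal hedged sketch of that layout (broker/backend URLs, module, route and task names are illustrative): the Flask route only enqueues the robot task and returns its ID, and a second route lets the React app poll for the state.

# tasks.py -- the worker side; run with: celery -A tasks worker
from celery import Celery

celery_app = Celery('tasks',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def perform_robot_task(task_spec):
    # the 15-60 second robot work goes here; task_spec is whatever the UI sent
    ...
    return {'status': 'done'}

# app.py -- the Flask side; it only enqueues and reports
from flask import Flask, jsonify, request
from tasks import celery_app, perform_robot_task

app = Flask(__name__)

@app.route('/tasks', methods=['POST'])
def enqueue_task():
    result = perform_robot_task.delay(request.get_json())
    return jsonify({'task_id': result.id}), 202

@app.route('/tasks/<task_id>', methods=['GET'])
def task_status(task_id):
    result = celery_app.AsyncResult(task_id)
    return jsonify({'state': result.state})  # PENDING, STARTED, SUCCESS, FAILURE, ...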

Celery: understanding the big picture

Celery seems to be a great tool, but I have hard time understanding how the various Celery components work together:
The workers
The apps
The tasks
The message Broker (like RabbitMQ)
From what I understand, the command line:
celery -A not-clear-what-this-option-is worker
should run some sort of celery "worker server" which would itself need to connect to a broker server (I'm not so sure why so many servers are needed).
Then in any python code, some task may be sent to the worker by instantiating an app:
app = Celery('my_module', broker='pyamqp://guest@localhost//')
and then by decorating functions with this app in the following way:
@app.task
def my_func():
    ...
so that "my_func()" can now be called as "my_func.delay()" to be run in an asynchronous way.
Here are my questions:
What happens when my_func.delay() is called? Which server talks to which first, and what is sent where?
What is the option to put behind the "-A" of the celery command? Is it really needed?
Suppose I have a process X which instantiates a Celery app to launch task A, and suppose I have another process Y which wants to know the status of task A launched by X. I assume there is a way for Y to do so, but I don't know how. I suppose that Y should create its own instance of a Celery app. But then:
What function should Y call on its Celery app to get this information (and what is the "identifier" of task A inside process Y)?
How does this work in terms of communication, that is, when does the request go through the broker, and when does it go to the worker(s)?
If anyone has some information about these questions, I would be grateful. I intend to use Celery in a Django project, where some requests to the server can trigger various time-consuming tasks, and/or inquire about the status of previously launched tasks (pending, finished, error, etc.).
About the broker:
The main role of the broker is to mediate communication between the client and the worker.
Basically, a lot of information is generated and processed while your worker is running,
and taking care of this information is the broker's role.
E.g. you can configure Redis so that no information is lost if the server is shut down while running a process.
The worker:
you can think of the worker as an instance independent of your application, which will only execute the tasks that you delegate to it
About the state of a task:
there are ways to ask Celery about the status of a task, but I would not recommend building your application logic around this
if you want to take the output of one task and turn it into the input of another, I would recommend you use a queue (a sketch follows below)
run task A, and before it finishes, insert its result objects into the queue
task B will listen to the queue and process whatever comes up
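A hedged sketch of that hand-off, using a Redis list as the queue (the queue key, task names and the "work" itself are illustrative): task A pushes result objects as it produces them, and task B pops and processes whatever shows up.

import json
import redis
from celery import Celery

app = Celery('pipeline', broker='redis://localhost:6379/0')
results = redis.StrictRedis()

@app.task
def task_a(items):
    for item in items:
        processed = item.upper()  # placeholder for the real work
        # hand each result over before task_a finishes
        results.rpush('task_a_results', json.dumps(processed))

@app.task
def task_b():
    # process whatever task_a has pushed so far
    while (raw := results.lpop('task_a_results')) is not None:
        print('processing', json.loads(raw))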
The command:
on the terminal you can see in more detail what each argument means by running celery -h or celery --help
but the argument basically specifies which Celery application instance you intend to run; normally it points to the module where the instance you have configured and intend to execute can be found
usage: celery [-h] [-A APP] [-b BROKER] [--result-backend RESULT_BACKEND]
[--loader LOADER] [--config CONFIG] [--workdir WORKDIR]
[--no-color] [--quiet]
I hope this can provide an initial overview for those who get here
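To make the task-status part of the question concrete, a small hedged sketch (a result backend must be configured for this to work; names and URLs are illustrative): process X keeps the task ID returned by delay(), and process Y builds its own app pointing at the same broker/backend and asks it for the state.

from celery import Celery

# both processes build an app pointing at the same broker and result backend
app = Celery('my_module',
             broker='pyamqp://guest@localhost//',
             backend='redis://localhost:6379/0')

@app.task
def my_func():
    return 42

# process X: fire the task and keep its identifier
async_result = my_func.delay()
task_id = async_result.id  # this string is the "identifier" of task A

# process Y: rebuild a handle on the same task from its ID
result = app.AsyncResult(task_id)
print(result.state)        # PENDING, STARTED, SUCCESS, FAILURE, ...
if result.ready():
    print(result.result)   # the return value (or exception) of my_func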
Celery is used to make functions run in the background. Imagine you have a web API that does a job and returns a response. You know that job would seriously affect the response time of the API. So you transfer that particular job to Celery, and your API responds instantly. Examples of jobs that affect the performance of an API are:
Routing to email servers
Routing to SMS Gateways
Database backup
Chained database operations
File conversion
Now, let's cover each component of Celery.
The workers
Celery workers execute the job (the function). They run asynchronously alongside your application. By default you get one worker subprocess per processor core (configurable with --concurrency). You can assign a name to a Celery worker and route specific tasks to it.
The apps
The app is the name of project you're working on. You'll have to specify that name in the celery instance.
The tasks
The functions you need to be executed in the background. Every task Celery executes will have a task ID, a state (and more). You can get those by inspecting a particular task.
The message Broker
The tasks that will be executed in the background have to be moved from your Python project to the Celery workers. Message brokers act as the medium here. So functions, together with their arguments, are transferred to the broker, and the workers fetch them from the broker to execute.
Some codes
celery -A project_name worker -n worker_name
celery -A project_name inspect active
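And on the Python side, a hedged sketch of what actually reaches the broker (project, task and queue names are illustrative): calling .delay() or .apply_async() serializes the function's name and arguments into a message, and a worker started as above fetches and executes it.

from celery import Celery

app = Celery('project_name', broker='pyamqp://guest@localhost//')

@app.task
def send_welcome_email(user_id, template='default'):
    # imagine this routes to the email server mentioned above
    print(f'sending email to user {user_id} with template {template}')

# shortcut: enqueue with positional/keyword arguments
send_welcome_email.delay(42, template='signup')

# the long form lets you pick a queue, delay execution, etc.
# (the worker must consume that queue, e.g. started with -Q emails)
send_welcome_email.apply_async(args=[42], kwargs={'template': 'signup'},
                               queue='emails', countdown=10)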
More in documentation
docs.celeryproject.org

Realtime progress tracking of celery tasks

I have a main Celery task that starts multiple sub-tasks (thousands), each doing multiple actions (the same actions per sub-task).
What I want is, from the main Celery task, to track in real time, for each action, how many are done and how many have failed per sub-task.
In summary:
Main task: receives a list of objects, and a list of actions to perform for each object.
For each object, a sub-task is started to perform the actions for that object.
The main task is finished when all the sub-tasks are finished.
So I need to know, from the main task, the real-time progress of the sub-tasks.
The app I am developing uses Django/AngularJS, and I need to show the real-time progress asynchronously in the front-end.
I am new to Celery, and I am confused and don't know how to implement this.
Any help would be appreciated.
Thanks in advance.
I have done this before; there's too much code to put in here, so please allow me to simply give the outline, as I trust you can take care of the actual implementation and configuration:
Socket.io-based microservice to send real time events to browser
First, Django is synchronous, so it's not easy doing anything real-time with it.
So I resorted to a socket.io process. You could say it's a microservice that only listens to a Redis-backed "channel" and sends notifications to a browser client listening on a given channel.
Celery -> Redis -> Socket.io -> Browser
I made it so each channel is identified by a Celery task ID. So when I fire a Celery task from the browser, I get the task ID, keep it and start listening to events from socket.io via that channel.
In chronological order it looks like this:
Fire off the Celery task, get the ID
Keep the ID in your client app, open a socket.io channel to listen for updates
The celery task sends messages to Redis, this will trigger socket.io events
Socket.io relays the messages to the browser, in real time
Reporting the progress
As for the actual updating of the task's status, I just make the Celery task, within its code, send a message on Redis with something like {'done': 2, 'total_to_be_done': 10} (representing a task that has completed 2 out of 10 steps, i.e. 20% progress; I prefer to send both numbers for better UI/UX):
import json
import redis

redis_pub = redis.StrictRedis()

# one channel per task; substitute the real Celery task ID for <task_id>
channel = 'task:<task_id>:progress'

# publish the progress payload; the socket.io service subscribed to this
# channel relays it to the browser
redis_pub.publish(channel, json.dumps({'done': 2, 'total_to_be_done': 10}))
You can find documentation for publishing messages on Redis with Python in the redis-py docs.
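For the other end of that channel, a hedged sketch of the relay microservice, written here with python-socketio (the original could just as well be a Node socket.io server; the event name and port are mine): it subscribes to the Redis channel and forwards every progress message to the browser.

import json
import eventlet
import redis
import socketio

sio = socketio.Server(cors_allowed_origins='*')
wsgi_app = socketio.WSGIApp(sio)

def relay_progress(task_id):
    # subscribe to the same channel the Celery task publishes on
    pubsub = redis.StrictRedis().pubsub()
    pubsub.subscribe(f'task:{task_id}:progress')
    for message in pubsub.listen():
        if message['type'] == 'message':
            # forward the {'done': x, 'total_to_be_done': y} payload to the browser
            sio.emit('task_progress', json.loads(message['data']))

if __name__ == '__main__':
    sio.start_background_task(relay_progress, '<task_id>')
    eventlet.wsgi.server(eventlet.listen(('', 5001)), wsgi_app)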
AngularJS/Socket.io integration
You can use or at least get some inspiration from a library like angular-socket-io

SocketIO emit from Asynchronous Celery worker is not working

I am using Flask-SocketIO to create a real-time notification system. There is an external API server that calls the socketio server in a separate thread via an RPC. The method invoked by the RPC creates a Celery task that when consumed, calls a method that invokes socketio.emit(). However, the message doesn't seem to actually be sent as no message is received in the javascript client. My instinct tells me that as the Celery worker is running in a separate process, the socketio.emit() method being called is not sending to the connected client although the objects exist at the same place in memory. The server is running gevent and Celery is receiving and completing the tasks as seen by the logs. Further I have verified that socketio.emit() is being called by the Celery worker and I have verified that when the task is called directly, bypassing Celery, socketio works as expected. Any ideas for how to get socketio to communicate correctly when it is being referenced by a celery task in a separate process?
Did you forget to add the message_queue?
socketio.init_app(app, message_queue='redis://localhost:6379/0')
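If that's the issue, the Flask-SocketIO pattern looks roughly like this (a sketch; the Redis URL, event and task names are illustrative): the web process and the Celery worker each create a SocketIO object bound to the same message queue, so an emit from the worker travels through Redis to the process that actually holds the client connections.

# web process (the one served by gevent/eventlet)
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, message_queue='redis://localhost:6379/0')

# Celery worker process: a SocketIO handle bound to the same queue,
# no Flask app needed here
from celery import Celery
from flask_socketio import SocketIO as ExternalSocketIO

celery = Celery('worker', broker='redis://localhost:6379/0')
external_sio = ExternalSocketIO(message_queue='redis://localhost:6379/0')

@celery.task
def notify(payload):
    # this publishes the event to Redis; the web process that owns the
    # websocket connections picks it up and emits it to the clients
    external_sio.emit('notification', payload)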
You can run Celery in multiprocessing or eventlet mode.
By default, Celery uses multiprocessing (the prefork pool) to set up a new process for each worker. Eventlet uses green threads instead, which I believe is what you want in this scenario, since you want shared memory.
You may find this documentation useful.
