SocketIO emit from Asynchronous Celery worker is not working - python

I am using Flask-SocketIO to create a real-time notification system. An external API server calls the socketio server in a separate thread via an RPC. The method invoked by the RPC creates a Celery task that, when consumed, calls a method that invokes socketio.emit(). However, the message doesn't seem to actually be sent, as no message is received in the JavaScript client.

My instinct tells me that because the Celery worker runs in a separate process, the socketio.emit() call is not reaching the connected client, even though the objects exist at the same place in memory. The server is running gevent, and Celery is receiving and completing the tasks, as seen in the logs. Further, I have verified that socketio.emit() is being called by the Celery worker, and I have verified that when the task is called directly, bypassing Celery, socketio works as expected.

Any ideas for how to get socketio to communicate correctly when it is being referenced by a Celery task in a separate process?

Did you forget to add the message_queue?
socketio.init_app(app, message_queue='redis://localhost:6379/0')
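Once the message queue is configured, the Celery task can also emit through it directly by creating its own, queue-only SocketIO instance (the documented "emitting from an external process" pattern). A minimal sketch, assuming Redis on localhost, a Celery instance named celery, and placeholder event/namespace names:

# In the Celery worker process: no Flask app needed, only the message queue.
# The web server picks the event up from Redis and forwards it to the clients.
from flask_socketio import SocketIO

external_sio = SocketIO(message_queue='redis://localhost:6379/0')

@celery.task
def notify_client(payload):
    # 'notification' and '/notifications' are placeholder names
    external_sio.emit('notification', payload, namespace='/notifications')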

You can run Celery with its default multiprocessing (prefork) pool or with the eventlet pool.
By default, Celery uses multiprocessing and spawns a separate OS process for each worker. Eventlet uses green threads inside a single process, which I believe is what you want in this scenario since you want shared memory.
You may find this documentation useful.
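If you go the eventlet route, the pool is chosen when the worker is started. A sketch, assuming your Celery app lives in a module named tasks:

pip install eventlet
celery -A tasks worker --pool=eventlet --concurrency=100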

Related

How do gevent workers behave with sync vs async processes?

I have a Flask app that uses a Gunicorn server for hosting. We are running into an issue where our workers keep getting locked up by long-running requests to different microservices. We currently only have Gunicorn set to give us 3 workers, so if there are 3 requests that are waiting on calls to those microservices, the server is completely locked up.
I started searching around and ran into this post:
gunicorn async worker class
This made sense to me, and it seemed like I could make the endpoint whose only job is to call these microservices asynchronous, then install gunicorn[gevent] and add --worker-class gevent to my start script. I implemented this and tested by using only 1 worker and adding a long time.sleep to the microservice being called. Everything worked perfectly and my server could process other requests while waiting for the async process to complete.
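For reference, the start-script change described above amounts to something like this (the module and app names are placeholders):

pip install "gunicorn[gevent]"
gunicorn --worker-class gevent --workers 1 myapp:app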
Then I opened Pandora's box and added another long time.sleep to a synchronous endpoint within my server, expecting that because this endpoint is synchronous, everything would be locked up during the time it took for the one worker to finish that process. I was surprised that my worker kept responding to pings, even while it was processing the synchronous task.
To me, this suggests that the Gunicorn server is adding threads for the worker to use even when the worker is in the middle of processing a synchronous task, instead of only freeing up the worker to operate on a new thread while waiting for an asynchronous IO process, as the above post suggested.
I'm relatively new to thread safety so I want to confirm if I am going about solving my issue the right way with this implementation, and if so, how can I expect the new worker class to generate new threads for synchronous vs asynchronous processes?

implement a background job inside a http microservice

I have a microservice (written in Python) which exposes a few endpoints, each of which can be executed via an HTTP request (Flask).
I have one specific endpoint which takes a long time to finish, so I thought of running a background job once this endpoint is triggered, in order to reduce its response time.
For example, I want Flask to start the process on an HTTP request and, when it starts, push a task onto a RabbitMQ queue. Should the consumer be in the same app or in a different service?
The consumer could live in the same codebase, so it can share the models, functions and other tooling; just run it as a separate worker process.
Celery is overkill for such a task; try Pika or Dramatiq.
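A minimal sketch of that split with Pika (the queue name, route, and the handle/do_the_long_work helpers are placeholder names; connection reuse, error handling and message persistence are left out for brevity):

# producer.py - the Flask endpoint only publishes a message and returns immediately
import json
import pika
from flask import Flask, request

app = Flask(__name__)

@app.route('/long-job', methods=['POST'])
def long_job():
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='long_jobs', durable=True)
    channel.basic_publish(exchange='', routing_key='long_jobs',
                          body=json.dumps(request.get_json()))
    connection.close()
    return {'status': 'queued'}, 202

# consumer.py - same codebase, run as a separate worker process
import json
import pika

def handle(ch, method, properties, body):
    do_the_long_work(json.loads(body))   # placeholder for the actual work
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='long_jobs', durable=True)
channel.basic_consume(queue='long_jobs', on_message_callback=handle)
channel.start_consuming()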

Understanding celery worker nodes

I am trying to understand the working of celery and AMQP here.
My scenario
I install celery in my machine
pip install celery
I make tasks using
from celery import Celery
app = Celery('tasks', backend='amqp', broker='amqp://')
@app.task
def print_hello():
    print('hello there')
As far as I understand, Celery converts this task into a message and sends it to a broker (Redis or RabbitMQ) via the AMQP protocol. These messages are then queued and delivered to worker nodes, which process them.
My questions are,
Suppose I created the task in a Java environment; if the message is sent to an external worker node, does that mean the worker node server must have Java installed on it to execute the task?
If the message is picked up by an external worker node, how do the worker node and the broker find each other? In the above code I only have the broker address for storing tasks.
Also, why are we storing the tasks in a broker? Why couldn't we implement an exchange algorithm in Celery and send the messages directly to the workers?
What is the difference between SOAP and AMQP?
The workers need not only Python, but all the code for the tasks you want to run on them.
But you don't address the nodes specifically, that is precisely why there is a broker. You put your tasks on the queue, and the workers pick them up.
I have no idea why you've mentioned SOAP in this context. It has nothing whatsoever to do with anything.
The specific answers to your questions are:
"if the message is sent to a external worker node" is slightly misleading. A message is not sent to a worker node per se. It is sent to the Broker (identified by a URL) and specifically an Exchange on that broker with a Routing Key which sees it landing in a Queue. Workers are all configured with the same Broker URL and read this Queue, and it's very much a case of [first-in-best-dressed][1], the first Worker to consume the message (to read a message in an AMQP it is removed from the Queue in one atomic operation). The [messages][2] are language independent. The Workers however are written in Python and the task definition must be in Python, though the Python task definition can of course call out to any other library by whatever means to execute the task. But in a sense yes, whatever run time libraries your task needs in order to run it needs to have on the same machine as the Worker, and they must have a Python wrapper around them so the Worker can load them.
"If the message is picked by external worker node, how does worker node and broker find each other?" - This question is misleading. They don't find each other. The Worker is configured with the exact same Broker URL as the Client is. It has know the URL. The way Celery typically solves this in Python is that the code snippet you shared is loaded by both the Client, and the Worker. This is in fact one of the beauties of Celery. That you write you tasks in Python and you load the definitions in the Worker unaltered. They thus use the same Broker, and have the same Task defined. The #app.task actually creates a Task class instance which has two very important methods: apply_async() which is what creates and sends the message requesting the task, and run() which runs the decorated function. The former is called int he Client. The latter by the Worker (to actually run the task).
"Why are we storing the tasks in a broker?" -Tasks are not stored in a broker. The task is defined in a python file like your code snippet. As described in 2. The same definition is read by both Client and Worker. A messages is sent from Client to Worker asking it to run the task.
"Why couldn't we implement exchange algorithm in celery and send the message direct to workers?" - I'll have to take a guess here, but I would ask, Why reinvent the wheel? There is a standard defined, AMQP (the Advanced Message Queueing Protocol), and there are a number of implementations of that standard. Why write yet another one? Celery is FOSS, and like so much FOSS I imagine the people who started writing it wanted to focus on task management not message management and chose to lean on AMQP for message management. A fair choice. But for what it's worth Celery does implement quite a lot in Kombu, to provide a Python API to AMQP.
SOAP (abbreviation for Simple Object Access Protocol) is a messaging protocol specification for exchanging structured information in the implementation of web services in computer networks.
AMQP (abbreviation for Advanced Message Queuing Protocol) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security.
SOAP typically sits much higher in the protocol stack. The differences are described here:
https://www.amqp.org/product/different

Celery: understanding the big picture

Celery seems to be a great tool, but I have hard time understanding how the various Celery components work together:
The workers
The apps
The tasks
The message Broker (like RabbitMQ)
From what I understand, the command line:
celery -A not-clear-what-this-option-is worker
should run some sort of celery "worker server" which would itself need to connect to a broker server (I'm not so sure why so many servers are needed).
Then in any python code, some task may be sent to the worker by instantiating an app:
app = Celery('my_module', broker='pyamqp://guest@localhost//')
and then by decorating functions with this app in the following way:
@app.task
def my_func():
    ...
so that "my_func()" can now be called as "my_func.delay()" to be run in an asynchronous way.
Here are my questions:
What happens when my_func.delay() is called? Which server talks to which first, and what is sent where?
What is the option to put behind the "-A" of the celery command? Is it really needed?
Suppose I have a process X which instantiates a Celery app to launch the task A, and suppose I have another process Y that wants to know the status of task A launched by X. I assume there is a way for Y to do so, but I don't know how. I suppose that Y should create its own instance of a Celery app. But then:
What function should Y's Celery app call to get this information (and what is the "identifier" of task A inside process Y)?
How does this work in terms of communication, that is, when does the request go through the Broker, and when does it go to the worker(s)?
If anyone has some information about these questions, I would be grateful. I intend to use Celery in a Django project, where some requests to the server can trigger various time consuming tasks, and/or inquire about the status of previously launched tasks (pending, finished, error, etc...).
About the broker:
The main role of the broker is to mediate communication between the client and the worker
basically a lot of information is being generated and processed while your worker is running
taking care of this information is the broker's role
e.g. you can configure redis so that no information is lost if the server is shut down while running a process
The worker:
you can think of the worker as an instance independent of your application, which will only execute those tasks that you delegate to it
About the state of a task:
there are ways to consult celery to find out the status of a task, but I would not recommend building your application logic depending on this
if you want to take the output of one process and turn it into the input of another one, using tasks, I would recommend you use a queue
run task A, and before it finishes, insert your result objects into the queue
task B will listen to the queue and process whatever comes up
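That said, if you do want to query the state of task A from process Y (keeping the caveat above in mind), the usual pattern is to pass the task id around and look it up with AsyncResult. A minimal sketch, assuming both processes are configured with the same broker and a result backend, and where task_a is a placeholder for the task from the question:

# process X: launch the task and keep its id
result = task_a.delay()
task_id = result.id                      # hand this string over to process Y

# process Y: its own Celery app, pointed at the same broker and result backend
from celery import Celery
from celery.result import AsyncResult

app = Celery('my_module', broker='pyamqp://guest@localhost//', backend='rpc://')

res = AsyncResult(task_id, app=app)
print(res.state)                         # PENDING / STARTED / SUCCESS / FAILURE ...
if res.ready():
    print(res.result)                    # the return value (or the raised exception)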
The command:
on the terminal you can see in more detail what each argument means by running celery -h or celery --help
but the argument basically specifies which Celery application instance you intend to run, so it normally points at where the instance you have configured can be found (for example, the module that defines it)
usage: celery [-h] [-A APP] [-b BROKER] [--result-backend RESULT_BACKEND]
[--loader LOADER] [--config CONFIG] [--workdir WORKDIR]
[--no-color] [--quiet]
I hope this can provide an initial overview for those who get here
Celery is used to make functions run in the background. Imagine you have a web API that does a job and returns a response. You know that job would seriously affect the response time of the API, so you transfer that particular job to Celery and your API responds instantly. Examples of jobs that affect the performance of an API are:
Routing to email servers
Routing to SMS Gateways
Database backup
Chained database operations
File conversion
Now, let's cover each component of Celery.
The workers
Celery workers execute the job (the function), and they do so asynchronously with respect to your application. By default you get one worker process per processor core, and this concurrency is configurable. You can also assign a name to a Celery worker and route specific tasks to it.
The apps
The app is the name of the project you're working on. You'll have to specify that name in the Celery instance.
The tasks
The functions you need to be executed in the background. Every task Celery executes has a task id, a state (and more). You can get these by inspecting a particular task.
The message Broker
The tasks that will be executed in the background have to be moved from your Python project to the Celery workers. Message brokers act as the medium here: functions with their arguments are turned into messages and handed to the broker, and Celery workers fetch them from the broker to execute.
Some commands
celery -A project_name worker -n worker_name
celery -A project_name inspect active
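The same information is available from Python through the control API; a short sketch, reusing the app instance from your project (worker names and output will depend on your setup):

# programmatic counterpart of the inspect command
insp = app.control.inspect()
print(insp.active())      # tasks currently being executed by each worker
print(insp.reserved())    # tasks received by workers but not yet started
print(insp.registered())  # task names each worker knows about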
More in the documentation:
docs.celeryproject.org

Custom signalling with Celery tasks

We're using Celery's Tasks for handling some messages that define what work needs to be done. Since these Task objects are spawned by Celery and not manually, there's a limitation on messaging that is imposed by Celery, where the signals sent can only be those defined on the app.
Currently, the plan is to have an AMQP consumer thread running within the Task process that Celery creates, which will be responsible for setting a flag/storing a value/whatever every time a specially crafted message is sent to a specific signals queue.
Is there a specific feature of Celery that enables sending such signals without hacking in the extraneous and independent consumer? No Django in our environment.
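For reference, a sketch of the consumer-thread plan described above using Kombu (the broker URL, queue name and payload format are assumptions, and the "flag" here is just a threading.Event):

# runs inside the worker process, e.g. started from the task itself
import threading
from queue import Empty

from kombu import Connection

stop_flag = threading.Event()            # the "flag" the task body can poll

def listen_for_signals():
    with Connection('amqp://guest:guest@localhost//') as conn:
        signal_queue = conn.SimpleQueue('task_signals')
        while not stop_flag.is_set():
            try:
                message = signal_queue.get(block=True, timeout=1)
            except Empty:
                continue
            # payload format is an assumption; adapt it to whatever the sender publishes
            if message.payload.get('command') == 'stop':
                stop_flag.set()
            message.ack()
        signal_queue.close()

threading.Thread(target=listen_for_signals, daemon=True).start()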
