I'm currently working on a Python web app that needs to use RabbitMQ.
The app is structured like this:
The client connects to an HTTP server.
The client's request is sent to a message queue that is connected to the main service of my app.
The main service receives the message and returns the user's information.
I understand how to make RabbitMQ work using the documentation and tutorials on the website, but I have trouble seeing how it can work with real tasks like displaying a web page or printing a file. How will my service, connected to the message queue, read the received message and say: "oh, I'm going to display this web page"?
Sorry if this is confusing; if you need further explanation of what I'm trying to do, just tell me.
Thanks for reading!
RabbitMQ is good for sending messages to a service that can execute a long-running process - e.g. downloading a big file or generating a complex animation. A web server can't (or shouldn't) execute long-running processes.
The web page sends a message to RabbitMQ (e.g. with the parameters for the long-running process) and gets a unique number. When the service has a free worker, it checks whether there is a new message in the queue, gets it (with the unique number) and starts the worker. When the worker finishes the job, the service sends the result back to RabbitMQ with the same unique number.
At the same time, the web page uses JavaScript to run a loop that periodically checks RabbitMQ for a result with this unique number. As long as there is no result it may display a progress bar; once there is a result, it may display it.
Example: Celery - Distributed Task Queue.
Celery can use RabbitMQ to communicate with Django or Flask.
(but it can also use other brokers, e.g. Redis)
Using Celery with Django.
Flask - Celery Background Tasks
From Celery repo
Celery is usually used with a message broker to send and receive messages.
The RabbitMQ, Redis transports are feature complete, but there's also
experimental support for a myriad of other solutions, including using
SQLite for local development.
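As a rough sketch of the submit-and-poll pattern described above, using Flask and Celery with RabbitMQ as the broker (the task name, the routes and the job body are invented for this example; Celery's task id plays the role of the "unique number"):

# tasks.py - hedged sketch of the long-running job
from celery import Celery

celery_app = Celery('tasks', broker='amqp://guest@localhost//', backend='rpc://')

@celery_app.task
def long_running_job(n):
    # stand-in for a slow operation (downloading a big file, rendering, ...)
    return sum(i * i for i in range(n))

# web.py - Flask endpoints: one to submit the job, one for the page to poll
from flask import Flask, jsonify
from tasks import long_running_job

app = Flask(__name__)

@app.route('/start/<int:n>')
def start(n):
    result = long_running_job.delay(n)      # publishes a message to RabbitMQ
    return jsonify({'task_id': result.id})  # the "unique number" for the client

@app.route('/status/<task_id>')
def status(task_id):
    result = long_running_job.AsyncResult(task_id)
    if result.ready():
        return jsonify({'state': result.state, 'result': result.get()})
    return jsonify({'state': result.state})  # JavaScript polls until SUCCESS

The worker that actually executes the job runs as a separate process, started e.g. with celery -A tasks worker.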
I'm receiving a Heroku timeout error with code H12 when I'm calling an API via my Flask app. The API usually responds within 2 minutes. I'm calling the API on a different thread so that the main Flask app thread keeps running.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as executor:
    future = executor.submit(shub_api, website, merchant.id)
    result = future.result()  # blocks until shub_api returns
There is some documentation on Heroku about running background tasks; however, the Python examples use Redis, which I know nothing about. Are there other solutions to this problem?
This is not working because of the way Heroku is architected.
When your web application is deployed to Heroku, it runs on dynos. Dynos are "ephemeral webservers" that only live for a small amount of time. This means that when a user makes a request to your app, the user's request will be handled by a dyno that may only live for a short period of time.
Heroku dynos are constantly starting, stopping, and being moved around to other physical hosts. This means that web dynos should not be used to run tasks that take a long time to complete (there are different worker dynos for that).
Furthermore, every web request that is served by a Heroku dyno has a 30-second timeout. What this means is that if someone makes an HTTP request to your app on Heroku, your app must start responding to the client within 30 seconds, otherwise, Heroku's routing layer will issue an H12 TIMEOUT error to you because it thinks your app has frozen or gotten stuck in a loop somewhere.
To sum it up: Heroku's architecture is such that it is designed from the ground up to follow web best practices, which means having your HTTP requests finish quickly (< 30 seconds) and not relying on your web servers being permanent fixtures where you can just run code on them all the time.
What you should do to resolve this issue instead is to use a background worker process (essentially it's just a second type of dyno you can run some code on that will process long-running tasks) and have your web application send a notification to your worker process to start running your task code.
This is typically done via a message queue like Redis, AWS SQS, etc. This Heroku article explains the concept in more detail.
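For illustration, here is a minimal sketch of this pattern using RQ with Redis (one common choice on Heroku; the shub_api call is the one from the question, its import path and everything else here are assumptions):

# jobs.py - hedged sketch: enqueue the slow API call for a worker dyno
import os
from redis import Redis
from rq import Queue

redis_conn = Redis.from_url(os.environ.get('REDIS_URL', 'redis://localhost:6379'))
queue = Queue(connection=redis_conn)

def start_job(website, merchant_id):
    # called from the Flask view; returns well within Heroku's 30-second limit
    job = queue.enqueue('myapp.api.shub_api', website, merchant_id)  # dotted path assumed
    return job.id  # the client can later poll a status endpoint with this id

A separate worker dyno (declared in the Procfile) then runs the rq worker command and executes the enqueued jobs.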
I'm trying to add some real-time features to my Django application; for that I'm using RabbitMQ and Celery in my Django project. What I would like to do is this: I have an external Python script which sends data to RabbitMQ, and from RabbitMQ it should be retrieved by the Django app.
I'm sending some dummy data, like this:
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='Test')

channel.basic_publish(exchange='',
                      routing_key='Test',
                      body='Hello World!')
print(" [x] Sent 'Hello World!'")
connection.close()
What I would like to do is: as soon as I send Hello World!, my Django app should receive the string, so that I can perform some operations with it, such as saving it in my database, passing it to an HTML template or simply printing it to my console.
My actual problem is that I still have no idea how to do this. I added Celery to my Django project but I don't know how to connect to RabbitMQ and receive the message. Would I have to do it with Django Channels? Is there some tutorial on this? I found various material about using RabbitMQ and Celery with Django, but nothing on this particular matter.
This is not directly connected to Celery.
You could solve it like this:
1. Create a management command in Django that does whatever needs to be done with the incoming message/data (https://docs.djangoproject.com/en/3.0/howto/custom-management-commands/).
2. Create a consumer, which could also be a Django command that is started in a separate process; it is not part of the regular Django process (though it can be part of your Django code). In this consumer, you listen to the queue, and whenever data comes in, the management command from (1.) is called.
Of course, (1.) and (2.) could also be done in one single command; I've separated them to better illustrate the different aspects. You might also have different tasks and reuse one consumer. And if you already have (1.), you can reuse it like this and test it easily without the overhead of the consumer.
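As a rough sketch of (2.), assuming pika is used as the RabbitMQ client and the management command from (1.) is called process_incoming (both names are made up for the example):

# myapp/management/commands/consume_queue.py - hedged sketch of a consumer command
import pika
from django.core.management import call_command
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Listen to the 'Test' queue and hand each message to another command"

    def handle(self, *args, **options):
        connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        channel = connection.channel()
        channel.queue_declare(queue='Test')

        def on_message(ch, method, properties, body):
            # delegate the actual work to the management command from (1.)
            call_command('process_incoming', body.decode('utf-8'))
            ch.basic_ack(delivery_tag=method.delivery_tag)

        channel.basic_consume(queue='Test', on_message_callback=on_message)
        channel.start_consuming()  # blocks; run this command in its own process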
More on RabbitMQ Python consumers - here is the Celery consumer, because Celery of course has its own consumer: https://github.com/celery/celery/blob/master/celery/worker/consumer/consumer.py
It looks rather generic, though; a simple consumer should be less complex.
(I have only ever written NSQ Python consumers as part of Django, so I have no direct experience with RabbitMQ consumers, only with RabbitMQ as a backend for Celery.)
EDIT: What you should ask yourself is - do I want the realtime data saved and stored in my Django app, first of all?
If yes - then RabbitMQ+Consumer is a very valid approach.
If no, if this is just for the user - you could also think about directly exposing it via an API to your frontend (and there use ajax calls to fetch it).
If no but you want to buffer the data to avoid hitting the other app that generates the data - then a queue is a very nice tool. In this case though, you might change the consumer not to save the data but to expose it to your frontend. If you only have to support new browsers you could use websockets which are supported now with Django 3:
https://blog.heroku.com/in_deep_with_django_channels_the_future_of_real_time_apps_in_django
While using Django, I noticed that there is a delay when I send an email, and to overcome this I had to use Celery. So I wanted to know what is actually happening under the hood: why does Django/Python need an async tool to accomplish this task? I have learned about multithreading and multiprocessing in Python, but could somebody give me a brief idea of what exactly happens when Django tries to send an email without using Celery?
Think of sending an email like sending a request; in a synchronous context the process goes as follows:
Send request
Wait for response..........
Receive response
The whole time you're waiting for the response, that thread cannot do anything else; it's sitting idle, wasting time that could be used for something else (such as serving other users' requests).
I'd like to make a distinction here between your usage of "asynchronous" and Celery.
Python's actual asynchronous implementation uses an "event loop" to dispatch and receive messages. The "waiting" is handled by the event loop, which is used exclusively to receive messages and dispatch them to receivers; this way the code that sent the request is no longer wasting CPU cycles waiting, and it will be invoked by the event loop when the response is ready. This is a very vague description of how Python's async works. It won't necessarily make the whole process faster for the user unless there are a lot of emails being sent.
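A tiny illustration of that idea with asyncio (not how Django sends mail, just a sketch of the event-loop behaviour):

# hedged sketch: while one coroutine "waits", the event loop runs the others
import asyncio

async def fake_send_email(i):
    await asyncio.sleep(2)          # stands in for waiting on an SMTP server
    print(f"email {i} sent")

async def main():
    # three "sends" overlap their waiting, so this takes ~2s instead of ~6s
    await asyncio.gather(*(fake_send_email(i) for i in range(3)))

asyncio.run(main())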
Celery, on the other hand, is an asynchronous task queue in which you have producers (your web application) sending messages, a broker (a data store) which stores and distributes messages, and consumers (workers) which pull messages from the broker and process them. The consumers are a totally separate process (often a totally separate server) from your web application, which frees up your web application to focus on returning the response to the client as soon as possible. The process when sending emails through Celery would look like this:
The web application sends a message to the broker and returns the response to the user. Here's a JSON pseudo-message. (The broker actually stores the messages as either pickled objects or JSON.)
{
    "task": "my_app.send_email",
    "args": ["Subject Line", "Hello, World! This is your email contents", "to_email@example.com", "from_email@example.com"],
    "kwargs": {}  # no keyword arguments
}
The Celery worker constantly checks with the broker for new messages to process when it is not currently processing. Sometimes the worker will pull in batches of messages so there is less overhead; this is configurable.
The Celery worker executes a function (defined by the "task" in the message), using the arguments and keyword arguments.
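Roughly, the corresponding task might look like this (a hedged sketch; the task name matches the pseudo-message above, and Django's send_mail is used for illustration):

# my_app/tasks.py - hedged sketch of the task the worker would execute
from celery import shared_task
from django.core.mail import send_mail

@shared_task(name="my_app.send_email")
def send_email(subject, body, to_email, from_email):
    # runs in the worker process, not in the web request
    send_mail(subject, body, from_email, [to_email])

# In the view, the web app only enqueues the message and returns immediately:
# send_email.delay("Subject Line", "Hello, World! ...", "to_email@example.com", "from_email@example.com")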
That is a very simple example of why you may want to use Celery to send emails: so you can return the response to the user as fast as possible! It's also well suited to longer-running tasks, such as processing image thumbnails (a sketch of such a task follows the steps below):
User uploads an image, which you store somewhere (Amazon S3 for example)
You send a message to the broker saying "execute my process_image_thumbnails task with the file's S3 URL as the argument".
You return the response to your user. It's nice and quick from the user's perspective.
A worker picks up the message, downloads the file from S3, and processes it into thumbnails of varying sizes.
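A hedged sketch of such a worker task (the bucket handling, key layout and thumbnail sizes are invented for the example; boto3 and Pillow are assumed):

# my_app/tasks.py - hedged sketch of a thumbnail-processing task
import io
import boto3
from PIL import Image
from celery import shared_task

@shared_task
def process_image_thumbnails(s3_bucket, s3_key):
    s3 = boto3.client("s3")
    original = io.BytesIO()
    s3.download_fileobj(s3_bucket, s3_key, original)    # pull the upload from S3

    for size in (128, 256, 512):                         # example sizes
        original.seek(0)
        image = Image.open(original).convert("RGB")
        image.thumbnail((size, size))
        out = io.BytesIO()
        image.save(out, format="JPEG")
        out.seek(0)
        s3.upload_fileobj(out, s3_bucket, f"thumbnails/{size}/{s3_key}")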
As you use celery for more new use cases you encounter new problems. For example, what do we do if someone requests the thumbnail while it's processing? I'll leave that to your imagination.
I am working on a Python 2.7 project with a simple event loop that checks a variety of data sources (RabbitMQ, MongoDB, Postgres, etc.) for new data, processes the data and writes it to the next stage.
I would like to embed a web server in the application so it can receive simple REST commands, for shutting it down, diagnosis etc.
However, from reading the documentation on the available web servers, it wasn't clear whether they would allow the event loop described above to function outside of the web server's own event loop. I.e. it looks like I would have to do something like launch my event loop from a REST call and have the loop live on an I/O thread, or similar.
Can someone explain which embedded server (cherrypy, bottle, flask, etc) / concurrency framework (tornado, gevent, twisted etc.) are best suited for this problem?
Thank you in advance!
I would recommend you use a separate process for the app that will receive REST commands (use Pyramid or Flask), and have it send messages over RabbitMQ to the real-time part. I like Kombu myself for interfacing with RabbitMQ, and your message bus will nicely decouple your web/REST needs from your event-driven needs. Your event-driven part just gets messages off the bus and doesn't need to know anything about REST.
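A minimal sketch of the REST side with Flask and Kombu (the queue name, route and message fields are invented for the example):

# rest_control.py - hedged sketch: a small Flask app that publishes control
# messages to RabbitMQ via Kombu; the event-loop process consumes them.
from flask import Flask, jsonify
from kombu import Connection, Exchange, Queue

app = Flask(__name__)
control_exchange = Exchange("control", type="direct")
control_queue = Queue("control", exchange=control_exchange, routing_key="control")

@app.route("/shutdown", methods=["POST"])
def shutdown():
    with Connection("amqp://guest:guest@localhost//") as conn:
        producer = conn.Producer(serializer="json")
        producer.publish({"command": "shutdown"},
                         exchange=control_exchange,
                         routing_key="control",
                         declare=[control_queue])
    return jsonify({"status": "shutdown requested"})

On the event-loop side you would attach a Kombu Consumer to the same queue and call the connection's drain_events (with a short timeout) as part of each pass through your loop.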
I am trying to design a web application that processes large quantities of large mixed-media files coming from asynchronous processes. Each process can take several minutes.
The files are either uploaded as a POST body or pulled by the web server according to a source URL provided. The files can be processed by a variety of external tools in a synchronous or asynchronous way.
I need to be able to load balance this application so I can process multiple large files simultaneously for as much as I can afford to scale.
I think Python is my best choice for this project, but besides this, I am open to any solution. The app can either deliver the file back or rely on a messaging channel to notify the clients about the completion of the process.
Some approaches I thought I might use:
1) Use a non-blocking web server such as Tornado that keeps the connection open until the file processing is done. The external processing command is launched and the web server waits until the file is ready and pipes the resulting IO stream directly back to the web app that returns it. Since the processes sending requests are asynchronous, they might afford to wait (unless memory or some other issues come up).
2) Use a regular web server like Cherrypy (which I am more confident with) and have the webapp use a messaging channel to report the processing progress. The web server returns a HTTP response as soon as it receives the file, validates it and sends it to a background process. At the same time it sends a message notifying the process start. The background process then takes care of delivering the file to an available location and sending another message to the channel notifying the location of the new file. This solution looks more flexible than 1), but requires writing a separate script to handle the messages outside the web application, as well as a separate storage space for the temp files that have to be cleaned up at a certain point.
3) Use some internal messaging capability of any of the web servers mentioned above, which I am not familiar with...
Edit: something like CherryPy's pub-sub engine (http://cherrypy.readthedocs.org/en/latest/extend.html?highlight=messaging#publish-subscribe-pattern) could be a good solution.
Any suggestions?
Thank you,
gm
I had a similar situation come up with a really large-scale data processing engine that my team implemented. We wanted to build our API calls in Flask, some of which can take many hours to complete, but still have a way to notify the user in real time about what is going on.
Basically, what I came up with was what you described as option 2. On the same machine where I am serving the Flask app through Apache, I created a Tornado app that serves a websocket which reports progress to the end user. Once my main page is served, it establishes the websocket connection to the Tornado server, and the Flask app periodically sends updates to the Tornado app and down to the end user. Even if the browser is closed during the long-running request, Apache keeps the request alive and processing, and upon logging back in, I can still see the current progress.
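A rough sketch of the Tornado side (the class names, port and routes are invented for the example; the Flask app would POST progress updates to /update, and browsers connect to /progress):

# progress_server.py - hedged sketch of a Tornado app that relays progress updates
# from the Flask app (via HTTP POST) to browsers (via websocket).
import json
import tornado.ioloop
import tornado.web
import tornado.websocket

clients = set()

class ProgressSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        clients.add(self)          # a browser connected

    def on_close(self):
        clients.discard(self)

class UpdateHandler(tornado.web.RequestHandler):
    def post(self):
        # the Flask app POSTs e.g. {"task": "...", "percent": 42} here
        message = json.loads(self.request.body)
        for client in clients:
            client.write_message(message)

app = tornado.web.Application([
    (r"/progress", ProgressSocket),
    (r"/update", UpdateHandler),
])

if __name__ == "__main__":
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()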
I wrote about this solution in some more detail here:
http://jonfeatherstone.com/2013/08/01/mongo-and-websockets-for-application-logging/
Good luck!