Using RabbitMQ with Plone - Celery or not? - python

I hope I am posting this in the right place.
I am researching RabbitMQ for potential use in our Plone sites. We currently use Async on a dedicated worker client in the Plone server, but we are thinking about building a dedicated RabbitMQ server that will handle all Plone messaging and other activity.
My specific question is: what are the advantages of using Celery to work with RabbitMQ in Plone versus just using RabbitMQ? I found this Plone add-on for Celery integration, but I'm not sure if that is the best route to go. I noticed Celery has the Flower tool for monitoring the queues, which would be a huge plus.
As a side question, if you feel so inclined, does anyone have any tips or references for integrating RabbitMQ with Plone to handle all of these requests? I have been doing research and get the general gist of RabbitMQ, but I can't seem to make the connection with Plone activities, such as Content Rules and PloneFormGen submissions for example. So far I have found this add-on, which I am going to install and see if I can figure out, but I am just trying to get a little guidance if I can.
Thanks for your time!

First, ask yourself whether you need the features of RabbitMQ or whether you just want to run some asynchronous tasks in Python with Plone.
If you don't really need RabbitMQ, you could look into David Glick's gists for how to integrate Celery with Plone (and still use RabbitMQ with Celery):
https://gist.github.com/davisagli/5824662
https://gist.github.com/davisagli/5824709
You could also look into collective.taskqueue (simple queues without Celery or RabbitMQ), but it does not provide any monitoring solution yet.
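To give a rough idea of its shape, queueing work with collective.taskqueue essentially means queueing an internal request against a browser view of your own. The sketch below assumes a view name of my own invention, so treat it as an illustration rather than a drop-in example:

# Sketch of queueing an asynchronous job with collective.taskqueue; the view
# name "@@my-task-view" is a placeholder for a browser view you register.
from collective.taskqueue import taskqueue

def queue_my_task():
    # Queues an internal request to the given path; a consuming thread
    # configured in zope.conf / buildout executes it later.
    taskqueue.add('/Plone/@@my-task-view')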
If you really need RabbitMQ, skip Celery and try out collective.zamqp. Celery tries to be a broker in itself and would prevent you from using most of AMQP's and RabbitMQ's built-in features.
RabbitMQ ships with a great web admin plugin for monitoring, and there are also plugins for third-party monitoring systems (such as Zenoss).
I'm sorry that collective.zamqp is still missing narrative documentation, but you can look into collective.zamqpdemo for various examples of its configuration and usage.
In short, c.zamqp lets you configure broker usage in terms of producers and consumers:
from five import grok
from zope.interface import Interface
from collective.zamqp.producer import Producer
from collective.zamqp.consumer import Consumer


class CreateItemProducer(Producer):
    """Produces item creation requests"""

    grok.name("amqpdemo.create")  # is also used as default routing key

    connection_id = "superuser"
    serializer = "msgpack"
    queue = "amqpdemo.create"
    durable = False


class ICreateItemMessage(Interface):
    """Marker interface for item creation message"""


class CreateItemConsumer(Consumer):
    """Consumes item creation messages"""

    grok.name("amqpdemo.create")  # is also used as the queue name

    connection_id = "superuser"
    marker = ICreateItemMessage
    durable = False
Messages are published through a transaction-bound producer (so that they are sent only after a successful transaction):
import uuid
from zope.component import getUtility
from collective.zamqp.interfaces import IProducer
producer = getUtility(IProducer, name="amqpdemo.create")
producer._register()  # register the producer to be bound to the current (successful) transaction
message = {"title": u"My title"}
producer.publish(message)
And consume the messages in a familiar content event handler environment:
from five import grok
from zope.component.hooks import getSite
from collective.zamqp.interfaces import IMessageArrivedEvent
from plone.dexterity.utils import createContentInContainer


@grok.subscribe(ICreateItemMessage, IMessageArrivedEvent)
def createItem(message, event):
    """Consume an item creation message"""
    portal = getSite()
    obj = createContentInContainer(
        portal, "Document", checkConstraints=True, **message.body)
    message.ack()
Finally, it decouples broker connection configuration from code; the actual connection parameters can be defined in buildout.cfg (allowing you to run as many consuming instances as required):
[instance]
recipe = plone.recipe.zope2instance
...
zope-conf-additional =
    %import collective.zamqp
    <amqp-broker-connection>
        connection_id superuser
        heartbeat 120
        # These are defaults, but can be defined when required:
        # hostname localhost
        # virtual_host /
        # username guest
        # password guest
    </amqp-broker-connection>
    <amqp-consuming-server>
        connection_id superuser
        site_id Plone
        user_id admin
        vhm_method_prefix /VirtualHostBase/https/example.com:443/Plone/VirtualHostRoot
    </amqp-consuming-server>
c.zamqp cannot be called directly from RestrictedPython, so integrating it with PloneFormGen would require either a custom action adapter or a custom External Method called from PFG's Python script adapter.
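Such an External Method could be as small as the following sketch; the method and form field names are hypothetical, while the producer name matches the example above:

# Sketch of an External Method (e.g. Extensions/pfg_to_amqp.py) that a PFG
# Python script adapter could call; "amqpdemo.create" is the producer defined
# above, and the form field names are hypothetical.
from zope.component import getUtility
from collective.zamqp.interfaces import IProducer

def publish_submission(fields):
    producer = getUtility(IProducer, name="amqpdemo.create")
    producer._register()  # bind publishing to the current transaction
    producer.publish({"title": fields.get("topic", u"Untitled")})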

Related

Python-Flask: setting up a queue if a process is already running

I have a Python Flask application where a person requests data by submitting a query from a form. The data goes through a certain Python script that makes some API requests and converts the data into a geographic standard.
The thing is, because this can take some time depending on how many data points there are, it will have to happen in the background (we are researching this for Azure). There is also another problem: queueing. If one request is running, another one cannot be started, and the last command cannot be saved:
@app.route('/handle_data', methods=['POST'])
def handle_data():
    sper_year = int(request.form["Speryear"])
    email = request.form["inputEmail"]
    url = request.form["api-url-input"]
    random_string = get_random_string(5)
    # app.route('/request-completed')
    Requested_data = Program_Converter.main(url, sper_year, random_string)
Requested_data = Program_Converter.main(url, sper_year, random_string) is the call that needs to be queued.
How do I do this?
I believe that the most recommended way is to run this task asynchronously. Take a look at Celery and pick a result backend (I recommend Redis); with this setup you can give Celery a task that runs your GIPOD_Converter process in the background and stores the result somewhere you choose, from where it can then be sent back to the user.
Note that Celery will give you a task id, and it is up to your client (web interface or mobile app, I'm not sure what you're working with) to poll an endpoint and wait for the Celery task to finish.
There are a couple of examples of how to implement this all over the web, but take a look at the official Flask documentation and check out this article by the mighty Miguel Grinberg; I believe those are the best starting points for you.
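As a rough sketch of the shape this could take (the task and route names are made up, a local Redis instance is assumed as both broker and result backend, and Program_Converter / get_random_string are the helpers from your own code):

# Rough sketch only: assumes a local Redis instance and reuses the asker's
# Program_Converter.main and get_random_string helpers (not defined here).
from celery import Celery
from flask import Flask, request, jsonify

app = Flask(__name__)
celery = Celery(__name__,
                broker="redis://localhost:6379/0",
                backend="redis://localhost:6379/0")

@celery.task
def convert(url, sper_year, random_string):
    # Runs in a Celery worker process, not in the web request.
    return Program_Converter.main(url, sper_year, random_string)

@app.route("/handle_data", methods=["POST"])
def handle_data():
    task = convert.delay(request.form["api-url-input"],
                         int(request.form["Speryear"]),
                         get_random_string(5))
    # Hand the task id back so the client can poll for the result.
    return jsonify({"task_id": task.id}), 202

@app.route("/status/<task_id>")
def status(task_id):
    result = celery.AsyncResult(task_id)
    return jsonify({"state": result.state,
                    "result": result.result if result.successful() else None})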

Dramatiq doesn't add tasks to the queue

I'm trying to run some dramatiq actors from my Falcon API method, like this:
def on_post(self, req, resp):
    begin_id = int(req.params["begin_id"])
    count = int(req.params["count"])

    for page_id in range(begin_id, begin_id + count):
        process_vk_page.send(f"https://vk.com/id{page_id}")

    resp.status = falcon.HTTP_200
My code reaches the "send" method and goes through the loop without any problems, but there are no new tasks in the queue! The actor itself is not called, and the "default" queue in my broker is empty. If I set a custom queue, it is still empty. My actor looks like this:
@dramatiq.actor(broker=broker)
def process_vk_page(link: str):
    pass
Where broker is
broker = RabbitmqBroker(url="amqp://guest:guest@rabbitmq:5672")
The RabbitMQ logs show that it is connecting fine.
I've done some additional research in the debugger. It builds the message (which is meant to be sent to the broker) fine, and broker.enqueue in Actor.send_with_options() raises no exceptions, although I can't really follow its internal logic. I don't really know why it fails, but it is definitely RabbitmqBroker.enqueue() that is causing the problem.
The broker is RabbitMQ 3.8.2 on Erlang 22.2.1, running in Docker from the rabbitmq Docker Hub image with default settings. The Dramatiq version is 1.7.0.
In the RabbitMQ logs there are only connections to the broker when the app starts and disconnections when I shut it down, like this:
2020-01-05 08:25:35.622 [info] <0.594.0> accepting AMQP connection <0.594.0> (172.20.0.1:51242 -> 172.20.0.3:5672)
2020-01-05 08:25:35.627 [info] <0.594.0> connection <0.594.0> (172.20.0.1:51242 -> 172.20.0.3:5672): user 'guest' authenticated and granted access to vhost '/'
2020-01-05 08:28:35.625 [error] <0.597.0> closing AMQP connection <0.597.0> (172.20.0.1:51246 -> 172.20.0.3:5672):
missed heartbeats from client, timeout: 60s
The broker is defined in __init__.py of the main package and imported in the subpackages. I'm not sure that specifying the same broker instance in the decorators of all the functions is fine, but there is nothing in the docs that forbids it. I guess it doesn't matter, since if I create a new broker for each actor it still doesn't work.
I've tried using Redis as the broker, but I still get the same issue.
What might be the reason for this?
Most likely the issue is that you're not telling the workers which broker to use, since you're not declaring a default broker.
You haven't mentioned how your files are laid out in your application, but, assuming your broker is defined as broker inside tasks.py, you would have to let your workers know about it like so:
dramatiq tasks:broker
See the examples at the end of dramatiq --help for more information and patterns.
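Alternatively, the module that defines the actors can register the broker as the default, so that both the Falcon process and the workers pick it up automatically. A sketch, assuming the actors live in tasks.py:

# tasks.py -- sketch of registering a default broker so the web process and
# the `dramatiq tasks` workers talk to the same RabbitMQ instance.
import dramatiq
from dramatiq.brokers.rabbitmq import RabbitmqBroker

broker = RabbitmqBroker(url="amqp://guest:guest@rabbitmq:5672")
dramatiq.set_broker(broker)

@dramatiq.actor
def process_vk_page(link: str):
    pass

The workers would then be started with dramatiq tasks (or dramatiq tasks:broker, as above).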

How do I statically configure Celery application differently in production and development?

I want to use Celery to implement a task queue to perform long(ish) running tasks like interacting with external APIs (e.g. Twilio for SMS sending). However, I use different API credentials in production and in development.
I can't figure out how to statically configure Celery (i.e. from the commandline) to pass in the appropriate API credentials. Relatedly, how does my application code (which launches Celery tasks) specify which Celery queue to talk to if there are both development and production queues?
Thanks for any help you can offer.
Avi
EDIT: bonus points for a working example of how to use Celery's --config option.
The way that I do it is using an environment variable. As a simple example...
import os
from celery import Celery

# By convention, my configuration files are in a "configs/XXX.ini" file, with
# XXX being the configuration name (e.g., "staging.ini")
config_filename = os.path.join('configs', os.environ['CELERY_CONFIG'] + '.ini')
configuration = read_config_file(config_filename)

# Now you can create the Celery object using your configuration...
celery = Celery('mymodule', broker=configuration['CELERY_BROKER_URL'])

@celery.task
def add_stuff(x, y):
    ....
You end up running from the command line like so...
export CELERY_CONFIG=staging
celery -A mymodule worker
This question has an example of doing something like this, but they say "how can I do this in a way that is not so ugly?" As far as I'm concerned, this is quite acceptable, and not "ugly" at all.
According to the twelve-factor app methodology, you should use environment variables instead of command-line parameters.
This is especially true if you are dealing with sensitive information like access credentials, because command-line arguments are visible in the ps output. The other idea (storing credentials in config files) is far from ideal, because you should avoid keeping sensitive information in the VCS.
That is why many container services and PaaS providers favor this approach: easier instrumentation and automated deployments.
You may want to take a look at Python Deployment Anti-patterns.
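Following that approach, the broker URL and the API credentials can be read straight from the environment. The variable names below and the Twilio usage are only illustrative:

# Sketch of a twelve-factor style configuration; the environment variable
# names and the Twilio call are illustrative, not prescriptive.
import os
from celery import Celery

celery = Celery('mymodule', broker=os.environ['CELERY_BROKER_URL'])

@celery.task
def send_sms(to, body):
    # Credentials come from the environment of whichever worker runs the task,
    # so production and development workers can use different accounts.
    from twilio.rest import Client  # assumes the twilio package is installed
    client = Client(os.environ['TWILIO_ACCOUNT_SID'],
                    os.environ['TWILIO_AUTH_TOKEN'])
    client.messages.create(to=to, from_=os.environ['TWILIO_FROM_NUMBER'], body=body)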

Dynamic pages with Django & Celery

I have a Celery task registered in my tasks.py file. When someone POSTs to /run/pk I run the task with the given parameters. This task also executes other tasks (normal Python functions), and I'd like to update my page (the HttpResponse returned at /run/pk) whenever a subtask finishes its work.
Here is my task:
from celery.decorators import task

@task
def run(project, branch=None):
    if branch is None:
        branch = project.branch

    print 'Creating the virtualenv'
    create_virtualenv(project, branch)
    print 'Virtualenv created'  ##### Here I want to send a signal or something to update my page

    runner = runner(project, branch)
    print 'Using {0}'.format(runner)

    try:
        result, output = runner.run()
    except Exception as e:
        print 'Error: {0}'.format(e)
        return False

    print 'Finished'
    run = Run(project=project, branch=branch,
              output=output, **result._asdict())
    run.save()
    return True
Sending push notifications to the client's browser using Django isn't easy, unfortunately. The simplest implementation is to have the client continuously poll the server for updates, but that increases the amount of work your server has to do by a lot. Here's a better explanation of your different options:
Django Push HTTP Response to users
If you weren't using Django, you'd use WebSockets for these notifications. However, Django isn't built for using WebSockets. Here is a good explanation of why this is, and some suggestions for how to go about using WebSockets:
Making moves w/ websockets and python / django ( / twisted? )
Many years having passed since this question was asked, Channels is a way you could now achieve this with Django.
The Channels website describes itself as a "project to make Django able to handle more than just plain HTTP requests, including WebSockets and HTTP2, as well as the ability to run code after a response has been sent for things like thumbnailing or background calculation."
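As a very rough sketch (the group name, the consumer class, and the event format are all made up for illustration), a Channels WebSocket consumer that your background task could notify through the channel layer might look like this:

# Rough sketch of a Channels WebSocket consumer; the "run-updates" group name
# and the event format are assumptions for illustration only.
import json

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer


class RunProgressConsumer(WebsocketConsumer):

    def connect(self):
        async_to_sync(self.channel_layer.group_add)("run-updates", self.channel_name)
        self.accept()

    def disconnect(self, close_code):
        async_to_sync(self.channel_layer.group_discard)("run-updates", self.channel_name)

    def run_update(self, event):
        # Invoked when the background task sends {"type": "run.update", ...}
        # to the "run-updates" group via the channel layer.
        self.send(text_data=json.dumps({"status": event["status"]}))

The Celery task would then obtain the layer with channels.layers.get_channel_layer() and push an update to the group (wrapping group_send with async_to_sync) whenever a subtask finishes.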
There is a service called Pusher that will take care of all the messy parts of push notifications in HTML5. They supply client-side and server-side libraries to handle all the messaging and notifications, while taking care of all the HTML5 WebSocket nuances.

Python, Twisted, Django, reactor.run() causing problem

I have a Django web application. I also have a spell server written using Twisted running on the same machine as Django (on localhost:8090). The idea is that when the user does some action, the request comes to Django, which in turn connects to this Twisted server, and the server sends data back to Django. Finally, Django puts this data into an HTML template and serves it back to the user.
Here's where I am having a problem. In my Django app, when the request comes in, I create a simple Twisted client to connect to the locally running Twisted server.
...
factory = Spell_Factory(query)
reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
reactor.run(installSignalHandlers=0)
print factory.results
...
The reactor.run() call is causing a problem, since it's an event loop: the next time this same code is executed by Django, I am unable to connect to the server. How does one handle this?
The above two answers are correct. However, considering that you've already implemented a spelling server, run it as one. You can start by running it on the same machine as a separate process, at localhost:PORT. It seems you already have a very simple binary protocol interface; you could implement an equally simple Python client using the standard library's socket interface in blocking mode.
However, I suggest playing around with twisted.web and exposing a simple web interface instead. You can use JSON to serialize and deserialize the data, which is well supported by Django. Here's a very quick example:
import json

from twisted.web import server, resource
from twisted.python import log


class Root(resource.Resource):

    def getChild(self, path, request):
        # represents / on your web interface
        return self


class WebInterface(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        log.msg('GOT a GET request.')
        # read request.args if you need to process query args
        # ... call some internal service and get output ...
        return json.dumps(output)


class SpellingSite(server.Site):

    def __init__(self, *args, **kwargs):
        self.root = Root()
        server.Site.__init__(self, self.root, **kwargs)
        self.root.putChild('spell', WebInterface())
And to run it you can use the following skeleton .tac file:
from twisted.application import service, internet
site = SpellingSite()
application = service.Application('WebSpell')
# attach the service to its parent application
service_collection = service.IServiceCollection(application)
internet.TCPServer(PORT, site).setServiceParent(service_collection)
Running your service as another first-class service allows you to run it on another machine one day if you find the need; exposing a web interface also makes it easy to scale it horizontally behind a reverse-proxying load balancer.
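On the Django side, the view could then call the spell service with a plain blocking HTTP request. The URL, port, and query parameter name below are placeholders, and urllib2 is used only to match the question's Python 2 code:

# Sketch of the Django side querying the Twisted web service; the URL, port
# and the "q" parameter are placeholders for however you expose the service.
import json
import urllib
import urllib2  # Python 2, as in the question; use urllib.request on Python 3

def query_spell_server(query):
    url = "http://localhost:8090/spell?q=" + urllib.quote(query)
    response = urllib2.urlopen(url, timeout=5)
    return json.loads(response.read())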
reactor.run() should be called only once in your whole program. Don't think of it as "start this one request I have", think of it as "start all of Twisted".
Running the reactor in a background thread is one way to get around this; your Django application can then use blockingCallFromThread and call Twisted APIs as it would any blocking API. You will need a little cooperation from your WSGI container, though, because you will need to make sure that the background Twisted thread is started and stopped at the appropriate times (when your interpreter is initialized and torn down, respectively).
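A sketch of what the Django-side call could look like once the reactor is running in a background thread; Spell_Factory and the host/port constants come from the question, and the factory exposing a Deferred named "deferred" is an assumption:

# Sketch only: assumes the reactor was started in a background thread at
# process start-up and that Spell_Factory fires a Deferred with its results.
from twisted.internet import reactor
from twisted.internet.threads import blockingCallFromThread

def spellcheck(query):
    factory = Spell_Factory(query)

    def connect_and_wait():
        # Must run in the reactor thread; returning a Deferred makes
        # blockingCallFromThread block until it fires.
        reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
        return factory.deferred  # assumed attribute, see the note above

    return blockingCallFromThread(reactor, connect_and_wait)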
You could also use Twisted as your WSGI container, and then you don't need to start or stop anything special; blockingCallFromThread will just work immediately. See the command-line help for twistd web --wsgi.
You would have to stop the reactor after you get results from the Twisted server or after some error/timeout occurs. So on each Django request that needs to query your Twisted server, you would run the reactor and then stop it. But that is not supported by the Twisted library: the reactor is not restartable. Possible solutions:
Use a separate thread for the Twisted reactor, but you will need to deploy your Django app with a server that supports long-running threads (I don't know of any off-hand, but you could write your own easily :-)).
Don't use Twisted for implementing the client protocol; just use the stdlib's plain socket module.
