integration testing rabbitmq consumer

integration testing rabbitmq consumer - python

I have a simple consumer written in python using pika and rabbitmq.
The consumer connects to rabbitmq and listens on a queue. When a message arrives it transforms the message and publishes it on another queue.
Presented here: https://bitbucket.org/snippets/fbanke/8e7zbX
I would like to make test cases to test the interaction between the consumer and the queue. For example, I would like to make sure that when a message is consumed the "basic_ack" function is called to let rabbitmq know that the message was processed.
Another test case is if the consumer reconnects to rabbitmq if the connection gets dropped.
And so on. I want to test the interaction between the consumer and the queue not the actual business logic in the consumer.
If I mock the pika objects it requires me to understand 100% how the API behaves and any misunderstanding of the API will result in faulty code. Code that passes the tests, but actually doesn't work.
I would rather test the consumer using a live queue, and manipulate it from the test to see if the consumer behaves as expected.
For example
1. setup the queue
2. start the consumer
3. publish a valid message to the queue
4. assert that the message was consumed by the worker
or
setup the queue
start the consumer
kill the queue
assert that the worker terminated as expected
Does there exist any best-practice on how to do this? I can find many examples of similar tests on databases, but not for queues. It seems that I need to start the consumer in a separate thread and work with it, but it seems that there is no infrastructure to support this.

Related

Understanding celery worker nodes

I am trying to understand the working of celery and AMQP here.
My scenario
I install celery in my machine
pip install celery
I make tasks using
from celery import Celery
app = Celery('tasks', backend='amqp', broker='amqp://')
#app.task
def print_hello():
print 'hello there'
As far as I understood, celery converts this task to message and send to brokers(redis or rabbitmq) via AMQP protocol. Then these messages are queued and delivered to worker nodes to process the message.
My questions are,
Suppose I created task in a Java environment and if the message is sent to a external worker node, does it mean the worker node server must have Java installed in it to execute the task ?
If the message is picked by external worker node, how does worker node and broker find each other ? In the above code I only have the broker address to store tasks.
Also Why are we storing the tasks in a broker ? Why couldn't we implement exchange algorithm in celery and send the message direct to workers ?
What is the difference between SOAP and AMQP ?

The workers need not only Python, but all the code for the tasks you want to run on them.
But you don't address the nodes specifically, that is precisely why there is a broker. You put your tasks on the queue, and the workers pick them up.
I have no idea why you've mentioned SOAP in this context. It has nothing whatsoever to do with anything.

The specific answers to your questions are:
"if the message is sent to a external worker node" is slightly misleading. A message is not sent to a worker node per se. It is sent to the Broker (identified by a URL) and specifically an Exchange on that broker with a Routing Key which sees it landing in a Queue. Workers are all configured with the same Broker URL and read this Queue, and it's very much a case of [first-in-best-dressed][1], the first Worker to consume the message (to read a message in an AMQP it is removed from the Queue in one atomic operation). The [messages][2] are language independent. The Workers however are written in Python and the task definition must be in Python, though the Python task definition can of course call out to any other library by whatever means to execute the task. But in a sense yes, whatever run time libraries your task needs in order to run it needs to have on the same machine as the Worker, and they must have a Python wrapper around them so the Worker can load them.
"If the message is picked by external worker node, how does worker node and broker find each other?" - This question is misleading. They don't find each other. The Worker is configured with the exact same Broker URL as the Client is. It has know the URL. The way Celery typically solves this in Python is that the code snippet you shared is loaded by both the Client, and the Worker. This is in fact one of the beauties of Celery. That you write you tasks in Python and you load the definitions in the Worker unaltered. They thus use the same Broker, and have the same Task defined. The #app.task actually creates a Task class instance which has two very important methods: apply_async() which is what creates and sends the message requesting the task, and run() which runs the decorated function. The former is called int he Client. The latter by the Worker (to actually run the task).
"Why are we storing the tasks in a broker?" -Tasks are not stored in a broker. The task is defined in a python file like your code snippet. As described in 2. The same definition is read by both Client and Worker. A messages is sent from Client to Worker asking it to run the task.
"Why couldn't we implement exchange algorithm in celery and send the message direct to workers?" - I'll have to take a guess here, but I would ask, Why reinvent the wheel? There is a standard defined, AMQP (the Advanced Message Queueing Protocol), and there are a number of implementations of that standard. Why write yet another one? Celery is FOSS, and like so much FOSS I imagine the people who started writing it wanted to focus on task management not message management and chose to lean on AMQP for message management. A fair choice. But for what it's worth Celery does implement quite a lot in Kombu, to provide a Python API to AMQP.
SOAP (abbreviation for Simple Object Access Protocol) is a messaging protocol specification for exchanging structured information in the implementation of web services in computer networks.
AMQP (abbreviation for Advanced Message Queuing Protocol) is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security.
SOAP is typically much higher level int the protocol stack. Described here:
https://www.amqp.org/product/different

how to Asynchronously consume from 2 RabbitMQ queues from one consumer using pika

I am writing a Consumer that need to consume from two different queues.
1-> for the actual messages(queue declared before hand).
2-> for command messages to control the behavior of the consumer(dynamically declared by the consumer and binds to an existing exchange with a routing key in a specific format(need one for each instance of consumer running))
I am using selection connection to consume async'ly.
self.channel.basic_qos(prefetch_count = self.prefetch_count)
log.info("Establishing channel with the Queue: "+self.commandQueue)
print "declaring command queue"
self.channel.queue_declare(queue=self.commandQueue,
durable = True,
exclusive=False,
auto_delete=True,
callback = self.on_command_queue_declared)
The queue is not being declared or the callback is not getting called.
On the other hand the messages from the actual message Queue are not being consumed since i added this block of code.
Pika logs do not show any errors nor the consumer app crashes.
does anybody know why this is happening or is there a better way to do this?

Have you looked at the example here: http://pika.readthedocs.org/en/latest/examples/asynchronous_consumer_example.html ?
And some blocking examples:
http://pika.readthedocs.org/en/latest/examples/blocking_consume.html
http://pika.readthedocs.org/en/latest/examples/blocking_consumer_generator.html
Blocking and Select connection comparison: http://pika.readthedocs.org/en/latest/examples/comparing_publishing_sync_async.html
Blocking and Select connections in pika 0.10.0 pre-release are faster and there are a number of bug fixes in that version.

RabbitMQ: can both consuming and publishing be done in one thread?

can both consuming and publishing be done in one Python thread using RabbitMQ channels?

Actually this isn't a problem at all and you can do it quite easily with for example pika the problem is however that you'd have to stop the consuming since it's a blocking loop or do the producing during the consume of a message.
Consuming and producing is a normal usecase, especially in pika since it isn't threadsafe, when for example you'd want to implement some form of filter on the messages, or, perhaps a smart router, which in turn will pass on the messages to another queue.

I don't think you should want to. MQ means asynch processing. Doing both consuming and producing in the same thread defeats the purpose in my opinion.

I'd recommend taking a look at Celery (http://celery.readthedocs.org/en/latest/) to manage worker tasks. With that, you won't need to integrate with RMQ directly as it will handle the the producing and consuming for you.
But, if you do desire to integrate with RMQ directly and manage your own workers, check out Kombu (http://kombu.readthedocs.org/en/latest/) for the integration. There are non-blocking consumers and producers that would permit you to have both in the same event loop.

I think the simple answer to your question is yes. But it depends on what you want to do. My guess is you have a loop that is consuming from your thread on one channel and after some (small or large) processing it decides to send it on to another queue (or exchange) on a different channel then I do not see any problem with that at all. Though it might be preferable to dispatch it to a different thread it is not necessary.
If you give more details about your process then it might help give a more specific answer.

Kombu is a common python library for working with RabbitMQ (Celery uses it under the hood). It is worth pointing out here that the answer to your question for the simplest use of Kombu that I tried is "No - you can't receive and publish on the same consumer callback thread."
Specifically if there are several messages in the queue for a consumer that has registered a callback for that topic and that callback does some processing and publishes the results then the publishing of the result will cause the 2nd message in the queue to hit the callback before it has returned from the publish from 1st message - so you end up with a recursive call to the callback. If you have n message on the queue your call stack will end up n message deep before it unwinds. Obviously that explodes pretty quickly.
One solution (not necessarily the best) is to have the callback just post the message into a simple queue internal to the consumer that could be processed on the main process thread (i.e. off the callback thread)
def process_message(self, body: str, message: Message):
# Queue the message for processing off this thread:
print("Start process_message ----------------")
self.do_process_message(body, message) if self.publish_on_callback else self.queue.put((body, message))#
print("End process_message ------------------")
def do_process_message(self, body: str, message: Message):
# Deserialize and "Process" the message:
print(f"Process message: {body}")
# ... msg processing code...
# Publish a processing output:
processing_output = self.get_processing_output()
print(f"Publishing processing output: {processing_output}")
self.rabbit_msg_transport.publish(Topics.ProcessingOutputs, processing_output)
# Acknowledge the message:
message.ack()
def run_message_loop(self):
while True:
print("Waiting for incoming message")
self.rabbit_connection.drain_events()
while not self.queue.empty():
body, message = self.queue.get(block=False)
self.do_process_message(body, message)
In this snippet above process_message is the callback. If publish_on_callback is True you'll see recursion in the callback n deep for n message on rabbit queue. If publish_on_callback is False it runs correctly without recursion in the callback.
Another approach is to use a second Connection for the Producer Exchange - separate from the Connection used for the Consumer. This also works so that callback from consuming a message and publishing the result completes before the callback is again fired for the next message on queue.

Can we create queue in rabbitmq with python

I'm working on project that need to control sending queue by code. So I just curious that anybody use to create queue in rabbitmq by python/django code? :)

Usual python clients should do from django (but beware, you may need to block the request when you're running AMQP commands). Take a look at rabbitmq tutorials
http://www.rabbitmq.com/getstarted.html
https://github.com/rabbitmq/rabbitmq-tutorials
There are at least three python clients: python-amqplib, pika and puka.
Also, you may find www.celeryproject.org useful.

In AMQP, you don't create a queue. Instead, you declare a queue, and if the queue doesn't already exist, then it is created.
In some cases all you need to do is to declare the queue in the processes that consume messages. But if you want persistent and durable queues then it is best to declare them beforehand with a shell script, or in the message publisher. Even if the message publisher does not do anything with the queue, it can still declare it to ensure that messages from the exchange are never dropped.

What's the best pattern to design an asynchronous RPC application using Python, Pika and AMQP?

The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the subscriptions in JSON form through the RabbitMQ message broker.
I have tried several strategies, and the best so far is the following, which is still not fully working:
Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.
I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses in channel [B], and the consumer waits for requests on channel [A] and send responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so I have only one task executed at each consumer at each time.
What I need in the end:
the consumer [A] subscribes his tasks (around 5k each time) to the cluster
the broker dispatches N messages/requests for each consumer, where N is the number of concurrent tasks it can handle
when a single task is finished, the consumer replies to the broker/producer with the result
the producer receives the replies, update the computation status and, in the end, prints some reports
Restrictions:
If another user submits work, all of his tasks will be queued after the previous user (I guess this is automatically true from the queue system, but I haven't thought about the implications on a threaded environment)
Tasks have an order to be submitted, but the order they are replied is not important
UPDATE
I have studied a bit further and my actual problem seems to be that I use a simple function as callback to the pika's SelectConnection.channel.basic_consume() function. My last (unimplemented) idea is to pass a threading function, instead of a regular one, so the callback would not block and the consumer can keep listening.

As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this depending on what your callback does.
If your callback is IO-bound (doing lots of networking or disk IO) you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one piece of python code is ever running in a single python process. This means that if you are doing lots of computation with python code, these solutions will likely not be much faster than what you already have.
Another option would be to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful when doing parallel work. You could implement this by either using a Queue, having the parent process being the consumer and farming out work to its children, or by simply starting up multiple processes which each consume on their own. I would suggest, unless your application is highly concurrent (1000s of workers), to simply start multiple workers, each of which consumes from their own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than simply losing the request.
A last option, if you control the producer and it is also written in Python, is to use a task library like celery to abstract the task/queue workings for you. I have used celery for several large projects and have found it to be very well written. It will also handle the multiple consumer issues for you with the appropriate configuration.

Your setup sounds good to me. And you are right, you can simply set the callback to start a thread and chain that to a separate callback when the thread finishes to queue the response back over Channel B.
Basically, your consumers should have a queue of their own (size of N, amount of parallelism they support). When a request comes in via Channel A, it should store the result in the queue shared between the main thread with Pika and the worker threads in the thread pool. As soon it is queued, pika should respond back with ACK, and your worker thread would wake up and start processing.
Once the worker is done with its work, it would queue the result back on a separate result queue and issue a callback to the main thread to send it back to the consumer.
You should take care and make sure that the worker threads are not interfering with each other if they are using any shared resources, but that's a separate topic.

Being unexperienced in threading, my setup would run multiple consumer processes (the number of which basically being your prefetch count). Each would connect to the two queues and they would process jobs happily, unknowning of eachother's existence.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.