I am using the txamqp Python library to connect to an AMQP broker (RabbitMQ), and I have a consumer with the following callback:
@defer.inlineCallbacks
def message_callback(self, message, queue, chan):
    """This callback is a queue listener.

    It is called whenever a message is consumed from the queue;
    c.f. test_amqp.ConsumeTestCase for use cases.
    """
    # Re-register the callback here to keep getting further messages from the queue
    queue.get().addCallback(self.message_callback, queue, chan).addErrback(self.message_errback)
    print " [x] Received a valid message: [%r]" % (message.content.body,)
    yield self.smpp.sendDataRequest(SubmitSmPDU)
    # ACK the message; this removes it from the queue
    chan.basic_ack(message.delivery_tag)
When "ack"ing a message, it will be deleted (to confirm ?) from the queue, but what happens when the message is not "ack"ed ? i need to get a "retry" mechanism where i can postpone the message to be callbacked again later on and to keep track of how much retries did it take.
And how can i list/delete messages from a queue ?
RabbitMQ has a nice management plugin; however, even that does not let you delete messages from queues.
You would basically have to write your own application, or figure out which of the third-party management applications can delete messages.
It's resolved: in order to retry a message from the queue, it is necessary to reject the message with the requeue flag set; it will then be put back on the queue.
If I reject it with a timer (callLater in Twisted), the requeuing of the message is postponed for however long I want.
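For illustration, a minimal sketch of that idea, assuming the callback signature from the question; basic_reject is the standard AMQP 0-9-1 method as exposed by txamqp, while the delay value and the postpone_retry name are placeholders:

from twisted.internet import reactor

def postpone_retry(self, message, chan, delay=5.0):
    # Instead of ack'ing, schedule a basic_reject with requeue=True so the
    # broker puts the message back on the queue after `delay` seconds.
    reactor.callLater(delay, chan.basic_reject,
                      delivery_tag=message.delivery_tag, requeue=True)

Note that requeueing this way does not carry a retry counter: to track how many retries a message has had, one would typically ack the original and republish a copy with an incremented header instead.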
Related
We have a Kafka consumer which reads messages, does some work, and then publishes the result to a Kafka topic, using the script below.
Producer config:
{
    "bootstrap.servers": "localhost:9092"
}
I haven't set any other configuration options such as queue.buffering.max.messages, queue.buffering.max.ms, or batch.num.messages.
I am assuming these will all keep their default values:
queue.buffering.max.messages : 100000
queue.buffering.max.ms : 0
batch.num.messages : 10000
My understanding: when the internal queue reaches either queue.buffering.max.ms or batch.num.messages, the messages are published to Kafka in a separate thread. In my configuration queue.buffering.max.ms is 0, so every message is published as soon as I call produce(). Correct me if I am wrong.
My producer snippet:
def send(topic, message):
    p.produce(topic, json.dumps(message), callback=delivery_callback)
    p.flush()
From this post I understand that calling flush() after every message makes the producer synchronous. With the script above it takes ~45 ms to publish a message to Kafka.
If I change the above snippet to:
def send(topic, message):
    p.produce(topic, json.dumps(message), callback=delivery_callback)
    p.poll(0)
will performance improve? Can you clarify my understanding?
Thanks
The difference between flush() and poll() is explained in the client's documentation.
For flush(), it states:
Wait for all messages in the Producer queue to be delivered. This is a
convenience method that calls poll() until len() is zero or the
optional timeout elapses.
For poll():
Polls the producer for events and calls the corresponding callbacks
(if registered).
Calling poll() just after send() does not make the producer synchronous, as it is unlikely that the message just sent has already reached the broker and that a delivery report has already come back to the client.
flush(), instead, blocks until all previously sent messages have been delivered (or have errored), effectively making the producer synchronous.
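To make the contrast concrete, here is a minimal sketch of the asynchronous style, assuming confluent-kafka as in the question; the topic name and payloads are placeholders:

import json
from confluent_kafka import Producer

p = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_callback(err, msg):
    if err is not None:
        print("Delivery failed: %s" % err)

for message in [{"id": 1}, {"id": 2}]:  # placeholder payloads
    # Queue the message and return immediately...
    p.produce("test-topic", json.dumps(message), callback=delivery_callback)
    # ...then serve any pending delivery reports without blocking.
    p.poll(0)

# Block once, at shutdown, until everything queued has been delivered.
p.flush()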
I am writing a consumer that needs to consume from two different queues:
1-> one for the actual messages (a queue declared beforehand);
2-> one for command messages that control the behavior of the consumer (declared dynamically by the consumer and bound to an existing exchange with a routing key in a specific format; one is needed for each running consumer instance).
I am using a SelectConnection to consume asynchronously.
self.channel.basic_qos(prefetch_count=self.prefetch_count)
log.info("Establishing channel with the Queue: " + self.commandQueue)
print "declaring command queue"
self.channel.queue_declare(queue=self.commandQueue,
                           durable=True,
                           exclusive=False,
                           auto_delete=True,
                           callback=self.on_command_queue_declared)
The queue is not being declared, or the callback is not getting called.
On the other hand, messages from the actual message queue are no longer being consumed since I added this block of code.
The pika logs do not show any errors, nor does the consumer app crash.
Does anybody know why this is happening, or is there a better way to do this?
Have you looked at the example here: http://pika.readthedocs.org/en/latest/examples/asynchronous_consumer_example.html ?
And some blocking examples:
http://pika.readthedocs.org/en/latest/examples/blocking_consume.html
http://pika.readthedocs.org/en/latest/examples/blocking_consumer_generator.html
Blocking and Select connection comparison: http://pika.readthedocs.org/en/latest/examples/comparing_publishing_sync_async.html
Blocking and Select connections in pika 0.10.0 pre-release are faster and there are a number of bug fixes in that version.
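For reference, a minimal sketch of the callback chain a SelectConnection expects, in the pika 0.9/0.10-era API that the linked examples use; queue, exchange, and routing-key names are placeholders. The key point is that basic_consume on the command queue may only be issued from the queue_declare/queue_bind callbacks, and none of those callbacks fire until the connection's ioloop is running:

import pika

def on_message(channel, method, properties, body):
    print("message: %r" % body)
    channel.basic_ack(method.delivery_tag)

def on_queue_bound(channel, frame):
    # Only now is it safe to consume from both queues.
    channel.basic_consume(on_message, queue="commands.instance-1")
    channel.basic_consume(on_message, queue="messages")  # pre-declared queue

def on_queue_declared(channel, frame):
    channel.queue_bind(queue="commands.instance-1", exchange="commands",
                       routing_key="command.instance-1",
                       callback=lambda f: on_queue_bound(channel, f))

def on_channel_open(channel):
    channel.basic_qos(prefetch_count=10)
    channel.queue_declare(queue="commands.instance-1", durable=True,
                          exclusive=False, auto_delete=True,
                          callback=lambda f: on_queue_declared(channel, f))

def on_connection_open(connection):
    connection.channel(on_channel_open)

connection = pika.SelectConnection(pika.ConnectionParameters("localhost"),
                                   on_open_callback=on_connection_open)
connection.ioloop.start()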
I use RabbitMQ in Python via amqplib. I am trying to use AMQP for something more than just a queue, if that's possible: searching messages by ID, modifying them before they are dequeued, deleting them from the queue before they are dequeued. These operations are used to store and update a real queue of users for a load balancer, and that queue may be updated asynchronously whenever a real user's state changes (for example, a user dies, so his AMQP message must be deleted; or a user changes state, and every such change must be reflected in that user's AMQP message in the users' AMQP queue), all before the actual dequeuing of the message happens.
My questions are the following:

1. Is there a way, through amqplib, to modify an AMQP message body in some queueN before it is dequeued, finding the message by some ID in its header? I mean, I want to modify the message body before the receiver dispatches it.
2. Is there a way for a worker to pop exactly 5 (or any number of) last messages from queueN via amqplib?
3. Can I asynchronously delete a message from queueN before it is dequeued, so that its neighbors take its place in the queue?
4. How can a message ID1 in queueN find out its real current queue position, counted from the beginning of queueN? Does AMQP store/update the real queue position of each message?

Thanks in advance.
UPDATE: according to the RabbitMQ documentation, there are problems with such random access to messages in an AMQP queue. Please advise another suitable design for a queue in Python that supports fast asynchronous access to its elements: searching a message by its body, updating/deleting queued messages, and quickly getting any message's queue index. We tried a deque plus an additional dict with user_info, but in that case we have to lock the deque+dict on each update to avoid race conditions. The main purpose is to serve a load balancer's queue and get rid of blocking when counting changes in the queue.
What you're describing sounds like a pretty typical middleware pipeline. While that achieves the same effect of modifying messages before they are delivered to their intended consumer, it doesn't work by accessing queues.
The basic idea is that all messages first go into a special queue where they are delivered to the middleware. The middleware then composes a new message, based on the one it just received, and publishes that to the intended recipient's queue.
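A minimal sketch of that pipeline, using amqplib as in the question; the queue names and the transform() helper are placeholders:

from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest")
ch = conn.channel()

def middleware(msg):
    # Compose a new message based on the one just received...
    new_body = transform(msg.body)  # transform() is a hypothetical helper
    # ...and publish it to the intended recipient's queue.
    ch.basic_publish(amqp.Message(new_body), exchange="", routing_key="users")
    ch.basic_ack(msg.delivery_info["delivery_tag"])

ch.basic_consume(queue="intake", callback=middleware)
while ch.callbacks:
    ch.wait()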
I want to create a RabbitMQ receiver/consumer in Python and am not sure how to check for messages. I am trying to do this in my own loop, not using the callbacks in pika.
If I understand things correctly, in the Java client I can use basicGet() to check whether any messages are available without blocking. I don't mind blocking while getting messages, but I don't want to block until there is a message.
I can't find any clear examples and haven't yet figured out the corresponding call in pika.
If you want to do it synchronously then you will need to look at the pika BlockingConnection
The BlockingConnection creates a layer on top of Pika’s asynchronous
core providing methods that will block until their expected response
has returned. Due to the asynchronous nature of the Basic.Deliver and
Basic.Return calls from RabbitMQ to your application, you are still
required to implement continuation-passing style asynchronous methods
if you’d like to receive messages from RabbitMQ using basic_consume or
if you want to be notified of a delivery failure when using
basic_publish.
More info and an example here
https://pika.readthedocs.org/en/0.9.12/connecting.html#blockingconnection
You can periodically check the queue size using the example from this answer: Get Queue Size in Pika (AMQP Python)
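A minimal sketch of that check, assuming a pika BlockingConnection; the queue name is a placeholder. A passive queue_declare returns the current message count without creating or altering the queue:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
# passive=True inspects the queue instead of declaring it.
state = channel.queue_declare(queue="test", passive=True)
print("messages waiting: %d" % state.method.message_count)
connection.close()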
A queue processing loop can be done iteratively with the help of process_data_events():
import pika

# A stubborn callback that still wants to be in the code.
def mq_callback(ch, method, properties, body):
    print(" Received: %r" % body)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
queue_state = channel.queue_declare(queue="test")

# Configure a callback.
channel.basic_consume(mq_callback, queue="test")

try:
    # My own loop here:
    while True:
        # Do other processing here, then...

        # ...process message queue events, returning as soon as possible.
        # Issues mq_callback() when applicable.
        connection.process_data_events(time_limit=0)
finally:
    connection.close()
Can both consuming and publishing be done in one Python thread using RabbitMQ channels?
Actually this isn't a problem at all, and you can do it quite easily with, for example, pika. The catch is that you'd have to stop consuming, since it's a blocking loop, or do the producing while consuming a message.
Consuming and producing together is a normal use case, especially in pika, since it isn't thread-safe: for example, when you want to implement some form of filter on the messages, or perhaps a smart router, which in turn passes the messages on to another queue.
I don't think you should want to. MQ means async processing. Doing both consuming and producing in the same thread defeats the purpose, in my opinion.
I'd recommend taking a look at Celery (http://celery.readthedocs.org/en/latest/) to manage worker tasks. With that, you won't need to integrate with RMQ directly, as it will handle the producing and consuming for you.
But, if you do desire to integrate with RMQ directly and manage your own workers, check out Kombu (http://kombu.readthedocs.org/en/latest/) for the integration. There are non-blocking consumers and producers that would permit you to have both in the same event loop.
I think the simple answer to your question is yes, but it depends on what you want to do. My guess is that you have a loop consuming from one channel in your thread, and after some (small or large) processing it decides to send the message on to another queue (or exchange) on a different channel; I do not see any problem with that at all. Though it might be preferable to dispatch it to a different thread, it is not necessary.
If you give more details about your process then it might help give a more specific answer.
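As an illustration of the pattern described above, a minimal sketch of single-threaded consume-then-publish with a pika BlockingConnection; queue names and the processing step are placeholders:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

def on_message(ch, method, properties, body):
    result = body.upper()  # some (small or large) processing
    # Publish the processed message onward, from the same thread.
    ch.basic_publish(exchange="", routing_key="results", body=result)
    ch.basic_ack(method.delivery_tag)

channel.basic_consume(on_message, queue="work")
channel.start_consuming()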
Kombu is a common Python library for working with RabbitMQ (Celery uses it under the hood). It is worth pointing out here that, for the simplest use of Kombu that I tried, the answer to your question is "No - you can't receive and publish on the same consumer callback thread."
Specifically, if there are several messages in the queue for a consumer that has registered a callback for that topic, and that callback does some processing and publishes the results, then publishing the result will cause the 2nd message in the queue to hit the callback before it has returned from the publish of the 1st message - so you end up with a recursive call to the callback. If you have n messages on the queue, your call stack will end up n messages deep before it unwinds. Obviously that explodes pretty quickly.
One solution (not necessarily the best) is to have the callback just post the message into a simple queue internal to the consumer, which can then be processed on the main process thread (i.e. off the callback thread):
def process_message(self, body: str, message: Message):
    # Queue the message for processing off this thread:
    print("Start process_message ----------------")
    if self.publish_on_callback:
        self.do_process_message(body, message)
    else:
        self.queue.put((body, message))
    print("End process_message ------------------")

def do_process_message(self, body: str, message: Message):
    # Deserialize and "Process" the message:
    print(f"Process message: {body}")
    # ... msg processing code...

    # Publish a processing output:
    processing_output = self.get_processing_output()
    print(f"Publishing processing output: {processing_output}")
    self.rabbit_msg_transport.publish(Topics.ProcessingOutputs, processing_output)

    # Acknowledge the message:
    message.ack()

def run_message_loop(self):
    while True:
        print("Waiting for incoming message")
        self.rabbit_connection.drain_events()
        while not self.queue.empty():
            body, message = self.queue.get(block=False)
            self.do_process_message(body, message)
In the snippet above, process_message is the callback. If publish_on_callback is True, you'll see the recursion in the callback go n levels deep for n messages on the rabbit queue. If publish_on_callback is False, it runs correctly without recursion in the callback.
Another approach is to use a second Connection for the producer exchange, separate from the Connection used for the consumer. This also works, so that the callback from consuming a message and publishing the result completes before the callback fires again for the next message on the queue.
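A minimal sketch of that two-connection approach in Kombu; URLs, exchange, and routing key are placeholders:

from kombu import Connection, Exchange, Producer

# One connection drains consumer events; a separate one carries publishes.
consume_conn = Connection("amqp://guest:guest@localhost//")
publish_conn = Connection("amqp://guest:guest@localhost//")
producer = Producer(publish_conn.channel(),
                    exchange=Exchange("processing-outputs", type="topic"))

def process_message(body, message):
    # Publishing on the second connection does not feed events back into
    # the consumer's drain_events() loop, so the callback is not re-entered.
    producer.publish(body, routing_key="outputs")
    message.ack()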