RabbitMQ / Pika - guaranteeing that messages are received in the order created? - python

As a simple example, I'm adding 5 items to a new RabbitMQ (v2.6.1) queue:
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='not.my.real.server.net'))
channel = connection.channel()
channel.queue_declare(queue='dw.neil', durable=True)

# add 5 messages to the queue, the numbers 1-5
for x in range(5):
    message = x + 1
    channel.basic_publish(exchange='', routing_key='dw.neil', body=str(message))
    print " [x] Sent '%s'" % message

connection.close()
I purge my queue and then run the above code to add the 5 items:
nkodner@hadoop4 sports_load_v2$ python send_5.py
[x] Sent '1'
[x] Sent '2'
[x] Sent '3'
[x] Sent '4'
[x] Sent '5'
Now, I'm trying to simulate failed processing with the following code to consume from the queue. Notice that the call to basic_ack is commented out:
#!/usr/bin/env python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='not.my.real.server.net'))
channel = connection.channel()
channel.queue_declare(queue='dw.neil', durable=True)

method_frame, header_frame, body = channel.basic_get(queue='dw.neil')
print method_frame, header_frame
print "body: %s" % body
#channel.basic_ack(delivery_tag=method_frame.delivery_tag)
connection.close()
I run the receiving code to grab an item off the queue. As I'd expect, I get item #1:
nkodner@hadoop4 sports_load_v2$ python r5.py
<Basic.GetOk(['message_count=9', 'redelivered=False', 'routing_key=dw.neil', 'delivery_tag=1', 'exchange='])>
<BasicProperties([])>
body: 1
Since the call to channel.basic_ack() is commented out, I would expect the unacknowledged message to be placed back on the queue so that the next consumer gets it. I would hope that message #1 is the first message out of the queue again, with the Redelivered property set to True. Instead, message #2 is received:
nkodner@hadoop4 sports_load_v2$ python r5.py
<Basic.GetOk(['message_count=9', 'redelivered=False', 'routing_key=dw.neil', 'delivery_tag=1', 'exchange='])>
<BasicProperties([])>
body: 2
And all of the other messages in the queue are received before #1 comes back with the Redelivered flag set to True:
...
nkodner@hadoop4 sports_load_v2$ python r5.py
<Basic.GetOk(['message_count=9', 'redelivered=False', 'routing_key=dw.neil', 'delivery_tag=1', 'exchange='])>
<BasicProperties([])>
body: 5
nkodner@hadoop4 sports_load_v2$ python r5.py
<Basic.GetOk(['message_count=9', 'redelivered=True', 'routing_key=dw.neil', 'delivery_tag=1', 'exchange='])>
<BasicProperties([])>
body: 1
Are there any properties or options I could set so that I keep getting #1 delivered until it's acknowledged?
My use-case is loading a data warehouse with sequentially-generated files. We're using message-based processing to let my program know that some new files are ready to be loaded into the DW. We have to process the files in the order in which they're generated.

This has been addressed in RabbitMQ 2.7.0 - we were running 2.6.1.
From the release notes:
New features in this release include:
order preserved of messages re-queued for a consumer

Try using channel.basic_reject - this should push the unacknowledged message back to RabbitMQ, which will treat the message as a new message. Also, if you have a failed message stuck, you can use channel.basic_recover to tell RabbitMQ to redeliver all unacknowledged messages.
http://pika.github.com/communicating.html#channel.Channel.basic_reject
http://pika.github.com/communicating.html#channel.Channel.basic_recover
http://www.rabbitmq.com/extensions.html#negative-acknowledgements provides distinguishing information on Basic.Reject vs Basic.Nack.
Message ordering semantics are explained at http://www.rabbitmq.com/semantics.html
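To make the reject flow concrete, here is a minimal sketch in the style of the question's consumer (process() is a hypothetical stand-in for your own handling):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='not.my.real.server.net'))
channel = connection.channel()
channel.queue_declare(queue='dw.neil', durable=True)

method_frame, header_frame, body = channel.basic_get(queue='dw.neil')
if method_frame:
    try:
        process(body)  # hypothetical processing step
        channel.basic_ack(delivery_tag=method_frame.delivery_tag)
    except Exception:
        # requeue=True pushes the message back instead of discarding it;
        # on RabbitMQ 2.7.0+ it returns to its original position in the queue.
        channel.basic_reject(delivery_tag=method_frame.delivery_tag, requeue=True)
connection.close()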

Related

Kafka consumer receives messages if group_id is set to None, but doesn't receive any messages if it is not None?

I have the following Kafka consumer. It works well if I set group_id to None - it receives all historical messages and my newly tested message.
consumer = KafkaConsumer(
    topic,
    bootstrap_servers=bootstrap_servers,
    auto_offset_reset=auto_offset_reset,
    enable_auto_commit=enable_auto_commit,
    group_id=group_id,
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)
for m in consumer:
However, it doesn't receive anything if I set group_id to some value. I tried running the test producer to send new messages, and nothing is received.
The consumer console does show the following message:
2020-11-07 00:56:01 INFO ThreadPoolExecutor-0_0 base.py (Re-)joining group my_group
2020-11-07 00:56:07 INFO ThreadPoolExecutor-0_0 base.py Successfully joined group my_group with generation 497
2020-11-07 00:56:07 INFO ThreadPoolExecutor-0_0 subscription_state.py Updated partition assignment: []
2020-11-07 00:56:07 INFO ThreadPoolExecutor-0_0 consumer.py Setting newly assigned partitions set() for group my_group
One partition of a topic can only be consumed by one consumer within the same ConsumerGroup.
If you do not set the group.id, the KafkaConsumer will generate a new, random group.id for you. As this group.id is unique, you will see data being consumed.
If you have multiple consumers running with an identical group.id, only one consumer will read the data while the others stay idle, consuming nothing.
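As a sketch of those semantics (topic and broker names are placeholders): with a single-partition topic, only one of two consumers sharing a group_id is assigned the partition, and the other stays idle.

from kafka import KafkaConsumer

# Both consumers join 'my_group'; the single partition of 'my_topic' is
# assigned to only one of them, so the other receives nothing.
consumer_a = KafkaConsumer('my_topic', bootstrap_servers='localhost:9092',
                           group_id='my_group')
consumer_b = KafkaConsumer('my_topic', bootstrap_servers='localhost:9092',
                           group_id='my_group')

# With group_id=None, each consumer gets its own random group and both
# would receive every message.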
I know this is not the solution to the author's problem. Still, if you landed here, you might be having this problem for another reason, same as I had.
So, at least for kafka-python v2.0.2 and the Aiven Kafka broker setup, the problem was solved by adding a dry call to consumer.poll().
This is especially weird since this is not required when no group_id is assigned.
Output from:
def get():
    for message in consumer:
        print(message.value)
    consumer.commit()
is nothing in this case.
While the below works as expected, reading out only new messages since the last commit():
Output from:
def get():
    consumer.poll()
    for message in consumer:
        print(message.value)
    consumer.commit()
It outputs all messages in this topic since the last commit, as expected.
JFYI, the class constructor looks like this:
consumer = KafkaConsumer(
    topics,
    bootstrap_servers=self._service_uri,
    auto_offset_reset='earliest',
    enable_auto_commit=False,
    client_id='my_consumer_name',
    group_id=self.GROUP_ID,
    security_protocol="SSL",
    ssl_cafile=self._ca_path,
    ssl_certfile=self._cert_path,
    ssl_keyfile=self._key_path,
)
¯\_(ツ)_/¯

kafka-python error when no message has been received for a long period

I use the kafka-python==2.0.0 library.
With the piece of code below, if I do not receive a message for 1 hour, the next messages pushed to the Kafka topic are not processed by the consumer; however, the loop does not stop.
I would like my listener to run 24/7 without losing the connection.
consumer = KafkaConsumer(
    os.environ.get('MY_TOPIC'),
    bootstrap_servers=broker,
    api_version=my_version,
    security_protocol='SASL_PLAINTEXT',
    sasl_mechanism='GSSAPI',
    sasl_kerberos_service_name=service_name,
    group_id='MY_GRP_ID',
    max_poll_records=1
)
try:
    for msg in consumer:
        ##PROCESS function ...
        consumer.commit()
finally:
    consumer.close()
I finally used the poll method:
from kafka import KafkaConsumer

# To consume latest messages and auto-commit offsets
consumer = KafkaConsumer('my-topic',
                         group_id='my-group',
                         bootstrap_servers=['localhost:9092'])
while True:
    # Response format is {TopicPartition('topic1', 1): [msg1, msg2]}
    msg_pack = consumer.poll(timeout_ms=500)
    for tp, messages in msg_pack.items():
        for message in messages:
            # message value and key are raw bytes -- decode if necessary!
            # e.g., for unicode: `message.value.decode('utf-8')`
            print("%s:%d:%d: key=%s value=%s" % (tp.topic, tp.partition,
                                                 message.offset, message.key,
                                                 message.value))
The advantage of this syntax, to my mind, is better visibility into how messages are retrieved. It is not in my example, but I could also manage stopping the program more cleanly by looking for a SIGTERM signal, as in the sketch below.
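A minimal sketch of that SIGTERM handling around the same poll loop (my own sketch; topic and broker names are placeholders):

import signal
from kafka import KafkaConsumer

running = True

def handle_sigterm(signum, frame):
    # Flip the flag so the poll loop exits after the current batch.
    global running
    running = False

signal.signal(signal.SIGTERM, handle_sigterm)

consumer = KafkaConsumer('my-topic',
                         group_id='my-group',
                         bootstrap_servers=['localhost:9092'])
try:
    while running:
        msg_pack = consumer.poll(timeout_ms=500)
        for tp, messages in msg_pack.items():
            for message in messages:
                print("%s:%d:%d: key=%s value=%s" % (
                    tp.topic, tp.partition, message.offset,
                    message.key, message.value))
        consumer.commit()
finally:
    consumer.close()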

Binding presence plugin exchange to another exchange produces no messages

I'm using RabbitMQ 3.5.1 with the rabbit_presence_exchange (binary distribution) and rabbitmq_event_exchange (to help debug this issue) plugins and the Python Pika client.
The presence plugin works by giving you a new exchange type: x-presence. Binding a queue to this with a routing key generates presence notifications when the queue is bound and unbound (where the routing key is the username for example). Binding a queue without a routing key signs you up to receive presence notifications.
This is fine; I can successfully generate and receive presence notifications like this. However, now I'd like to route the presence messages through an exchange. Initially, I tried to use a headers exchange but I wasn't seeing any messages coming through, so I changed to a fanout exchange (in case I'd set up the header matching incorrectly), but I'm still not seeing anything come through.
This is my script to generate and receive presence messages without the additional exchange (i.e. this is the one that works):
#!/usr/bin/env python3
import pika
import names

MY_NAME = names.get_first_name()
PRESENCE_EXCHANGE = 'presence'

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

channel.exchange_declare(exchange=PRESENCE_EXCHANGE,
                         exchange_type='x-presence')

result = channel.queue_declare('', exclusive=True)
queue_name = result.method.queue
print('My name is %s and my queue is %r' % (MY_NAME, queue_name))

channel.queue_bind(exchange=PRESENCE_EXCHANGE,
                   queue=queue_name,
                   routing_key=MY_NAME)
channel.queue_bind(exchange=PRESENCE_EXCHANGE,
                   queue=queue_name,
                   routing_key='')

def on_message(ch, method, properties, body):
    print(method, '\n', properties, '\n', body)
    exchange = method.exchange
    if exchange == PRESENCE_EXCHANGE:
        action = properties.headers['action']
        who = properties.headers['key']
        if action == 'bind':
            print(' [+] %s has come online.' % (who,))
        elif action == 'unbind':
            print(' [-] %s has gone offline.' % (who,))

channel.basic_consume(queue=queue_name,
                      on_message_callback=on_message,
                      auto_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
try:
    channel.start_consuming()
except KeyboardInterrupt:
    pass
finally:
    connection.close()
I modified the above to route the presence messages to a built-in fanout exchange and binding my queue to that:
...
print('My name is %s and my queue is %r' % (MY_NAME, queue_name))

channel.queue_bind(exchange=PRESENCE_EXCHANGE,
                   queue=queue_name,
                   routing_key=MY_NAME)
channel.exchange_bind(source=PRESENCE_EXCHANGE,
                      destination='amq.fanout',
                      routing_key='')
channel.queue_bind(exchange='amq.fanout',
                   queue=queue_name)

def on_message(ch, method, properties, body):
    ...
I'm stumped as to why the exchange isn't receiving the messages. Erlang isn't one of my languages, so I'm having trouble reading the presence plugin's source to determine whether this is supported (though I can't see why it wouldn't be).
If anyone has any ideas (or a better way to handle presence with RabbitMQ) I'd love to hear it.
EDIT:
With this code and two clients running, my exchanges and bindings look like this:
Listing exchanges ...
direct
amq.direct direct
amq.fanout fanout
amq.headers headers
amq.match headers
amq.rabbitmq.event topic
amq.rabbitmq.log topic
amq.rabbitmq.trace topic
amq.topic topic
presence x-presence
Listing bindings ...
exchange amq.gen-6aU7qS-ikR4cLmxcT6VKDQ queue amq.gen-6aU7qS-ikR4cLmxcT6VKDQ []
exchange amq.gen-MiyEpW9VIxD49PE9SqATFA queue amq.gen-MiyEpW9VIxD49PE9SqATFA []
amq.fanout exchange amq.gen-6aU7qS-ikR4cLmxcT6VKDQ queue amq.gen-6aU7qS-ikR4cLmxcT6VKDQ []
amq.fanout exchange amq.gen-MiyEpW9VIxD49PE9SqATFA queue amq.gen-MiyEpW9VIxD49PE9SqATFA []
presence exchange amq.fanout exchange []
presence exchange amq.gen-6aU7qS-ikR4cLmxcT6VKDQ queue Sheila []
presence exchange amq.gen-MiyEpW9VIxD49PE9SqATFA queue Joaquin []
I posted this same question to the rabbitmq-users mailing list and got this response:
The plugin implicitly assumes that it works with regular (queue-to-exchange) bindings [3].
RabbitMQ 3.5.x has been out of support for at least a couple of years. Please upgrade [1][2].
[1] https://www.rabbitmq.com/upgrade.html
[2] https://www.rabbitmq.com/changelog.html
[3] https://github.com/rabbitmq/presence-exchange/blob/master/src/presence_exchange.erl#L72
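Given that constraint, one possible workaround (my own sketch, not from the mailing-list thread) is to consume the presence events from a directly-bound queue and republish them to amq.fanout yourself:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

result = channel.queue_declare('', exclusive=True)
relay_queue = result.method.queue
# Binding with an empty routing key signs this queue up for notifications.
channel.queue_bind(exchange='presence', queue=relay_queue, routing_key='')

def relay(ch, method, properties, body):
    # Forward each presence event to amq.fanout for downstream consumers.
    ch.basic_publish(exchange='amq.fanout', routing_key='',
                     body=body, properties=properties)

channel.basic_consume(queue=relay_queue, on_message_callback=relay, auto_ack=True)
channel.start_consuming()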

Publishing a message to queue in RabbitPy

I am looking at RabbitPy for consuming and publishing messages. The consumption part is fine. However, I want to send messages back to a specific queue after doing something with the inbound message, and I cannot find any way to do this in the documentation.
https://rabbitpy.readthedocs.io/en/latest/api/message.html
Here is my consume code:
with rabbitpy.Connection('amqp://guest:guest@localhost:5672/%2f') as conn:
    with conn.channel() as channel:
        queue_read = rabbitpy.Queue(channel, QUEUE_NAME)
        queue_write = rabbitpy.Queue(channel, QUEUE_NAME_RESPONSE)
        # Exit on CTRL-C
        try:
            # Consume the message
            for body in queue_read:
                # do something on inbound...
Here is all I seem to be able to set when creating and sending a message:
message = rabbitpy.Message(channel, result)
message.publish(EXCHANGE, RESPONSE_ROUTING_KEY, mandatory=False)
There is nothing related to a queue. How can I send a message to a specific queue with a routing key?
In Pika I would use the following:
channel = cnn.channel()
channel.exchange_declare(exchange=EXCHANGE, type='direct', durable=True)
channel.queue_declare(queue=QUEUE_NAME_RESPONSE, durable=True)
channel.queue_bind(exchange=EXCHANGE, queue=QUEUE_NAME_RESPONSE)
channel.basic_publish(exchange=EXCHANGE, routing_key=RESPONSE_ROUTING_KEY,
                      body=return_JSON,
                      properties=pika.BasicProperties(content_type='text/plain',
                                                      delivery_mode=2))
channel.close()
How can I do this with RabbitPy?
I may have solved this by binding the routing key to the exchange and the queue. I have the following:
queue_write.bind(EXCHANGE, routing_key=RESPONSE_ROUTING_KEY, arguments=None)
Using EXCHANGE and RESPONSE_ROUTING_KEY, the queue is bound.
For example, say we have two queues, Queue1 and Queue2, one exchange named Exchange, and two routing keys, routingkey1 and routingkey2:
Exchange + routingkey1 ==> Queue1
Exchange + routingkey2 ==> Queue2
Assume that we have the above bindings. When your publisher sends a message using Exchange + routingkey1, it will be sent to Queue1. If the publisher uses routingkey2, the message will be sent to Queue2.
You can create the necessary binding using the RabbitMQ Management UI or from your code; more details can be found here.
In RabbitPy you can do something like this:
amqp = rabbitpy.AMQP(channel)
amqp.queue_bind(queue='Queue1', exchange='Exchange', routing_key='RoutingKey1')
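Putting it together for the question's setup, here is a sketch that declares, binds and then publishes through the exchange (assuming EXCHANGE, QUEUE_NAME_RESPONSE and RESPONSE_ROUTING_KEY are defined as in the question; the body is a placeholder):

import rabbitpy

with rabbitpy.Connection('amqp://guest:guest@localhost:5672/%2f') as conn:
    with conn.channel() as channel:
        exchange = rabbitpy.Exchange(channel, EXCHANGE, durable=True)
        exchange.declare()
        queue_write = rabbitpy.Queue(channel, QUEUE_NAME_RESPONSE, durable=True)
        queue_write.declare()
        # The binding is what ties the routing key to the target queue.
        queue_write.bind(EXCHANGE, routing_key=RESPONSE_ROUTING_KEY)

        message = rabbitpy.Message(channel, 'my response payload',
                                   properties={'delivery_mode': 2})
        message.publish(EXCHANGE, RESPONSE_ROUTING_KEY)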

How to create a delayed queue in RabbitMQ?

What is the easiest way to create a delay (or parking) queue with Python, Pika and RabbitMQ? I have seen similar questions, but none for Python.
I find this a useful idea when designing applications, as it allows us to throttle messages that need to be re-queued.
There is always the possibility that you will receive more messages than you can handle; maybe the HTTP server is slow, or the database is under too much stress.
I also found it very useful in scenarios where there is zero tolerance for losing messages: while re-queuing messages that could not be handled may solve that, it can also cause problems where the same message is queued over and over again, potentially causing performance issues and log spam.
I found this extremely useful when developing my applications, as it gives you an alternative to simply re-queuing your messages. It can easily reduce the complexity of your code, and it is one of many powerful hidden features in RabbitMQ.
Steps
First we need to set up two basic channels, one for the main queue and one for the delay queue. In my example at the end, I include a couple of additional flags that are not required but make the code more reliable, such as confirm_delivery, delivery_mode and durable. You can find more information on these in the RabbitMQ manual.
After we have set up the channels, we add a binding to the main channel that we can use to send messages from the delay channel to our main queue.
channel.queue_bind(exchange='amq.direct',
                   queue='hello')
Next we need to configure our delay channel to forward messages to the main queue once they have expired.
delay_channel.queue_declare(queue='hello_delay', durable=True, arguments={
    'x-message-ttl': 5000,
    'x-dead-letter-exchange': 'amq.direct',
    'x-dead-letter-routing-key': 'hello'
})
x-message-ttl (Message Time To Live)
This is normally used to automatically remove old messages from the queue after a specific duration, but by adding two optional arguments we can change this behaviour, and instead have this parameter determine, in milliseconds, how long messages will stay in the delay queue.
x-dead-letter-routing-key
This variable allows us to transfer the message to a different queue once it has expired, instead of the default behaviour of removing it completely.
x-dead-letter-exchange
This variable determines which exchange is used to transfer the message from the hello_delay queue to the hello queue.
Publishing to the delay queue
When we are done setting up all the basic Pika parameters, you simply send a message to the delay queue using basic_publish:
delay_channel.basic_publish(exchange='',
                            routing_key='hello_delay',
                            body="test",
                            properties=pika.BasicProperties(delivery_mode=2))
Once you have executed the script you should see the following queues created in your RabbitMQ management module.
Example:
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(
    'localhost'))

# Create normal 'Hello World' type channel.
channel = connection.channel()
channel.confirm_delivery()
channel.queue_declare(queue='hello', durable=True)

# We need to bind this channel to an exchange, that will be used to transfer
# messages from our delay queue.
channel.queue_bind(exchange='amq.direct',
                   queue='hello')

# Create our delay channel.
delay_channel = connection.channel()
delay_channel.confirm_delivery()

# This is where we declare the delay, and routing for our delay channel.
delay_channel.queue_declare(queue='hello_delay', durable=True, arguments={
    'x-message-ttl': 5000,  # Delay until the message is transferred, in milliseconds.
    'x-dead-letter-exchange': 'amq.direct',  # Exchange used to transfer the message from A to B.
    'x-dead-letter-routing-key': 'hello'  # Name of the queue we want the message transferred to.
})

delay_channel.basic_publish(exchange='',
                            routing_key='hello_delay',
                            body="test",
                            properties=pika.BasicProperties(delivery_mode=2))

print " [x] Sent"
You can use the RabbitMQ official plugin: x-delayed-message.
First, download and copy the ez file into Your_rabbitmq_root_path/plugins.
Second, enable the plugin (no need to restart the server):
rabbitmq-plugins enable rabbitmq_delayed_message_exchange
Finally, publish your message with an "x-delay" header, like:
headers.put("x-delay", 5000);
Notice:
It does not ensure your message's safety: if your message expires during your rabbitmq-server's downtime, the message is unfortunately lost. So be careful when you use this scheme.
Enjoy it, and find more info in rabbitmq-delayed-message-exchange.
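For Python users, publishing through the plugin with Pika might look like this sketch (the exchange name 'delayed' is my own placeholder; the x-delayed-message type only exists once the plugin is enabled):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# 'x-delayed-type' tells the plugin how to route once the delay elapses.
channel.exchange_declare(exchange='delayed',
                         exchange_type='x-delayed-message',
                         arguments={'x-delayed-type': 'direct'})
channel.queue_declare(queue='hello', durable=True)
channel.queue_bind(exchange='delayed', queue='hello', routing_key='hello')

# Delay this message for 5 seconds via the 'x-delay' header.
channel.basic_publish(exchange='delayed',
                      routing_key='hello',
                      body='test',
                      properties=pika.BasicProperties(headers={'x-delay': 5000}))
connection.close()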
FYI, here is how to do this in Spring 3.2.x:
<rabbit:queue name="delayQueue" durable="true" queue-arguments="delayQueueArguments"/>

<rabbit:queue-arguments id="delayQueueArguments">
    <entry key="x-message-ttl">
        <value type="java.lang.Long">10000</value>
    </entry>
    <entry key="x-dead-letter-exchange" value="finalDestinationTopic"/>
    <entry key="x-dead-letter-routing-key" value="finalDestinationQueue"/>
</rabbit:queue-arguments>

<rabbit:fanout-exchange name="finalDestinationTopic">
    <rabbit:bindings>
        <rabbit:binding queue="finalDestinationQueue"/>
    </rabbit:bindings>
</rabbit:fanout-exchange>
A NodeJS implementation.
Everything is pretty clear from the code. Hope it will save somebody some time.
var ch = channel;

ch.assertExchange("my_intermediate_exchange", 'fanout', {durable: false});
ch.assertExchange("my_final_delayed_exchange", 'fanout', {durable: false});

// setup intermediate queue which will never be listened.
// all messages are TTLed so when they are "dead", they come to another exchange
ch.assertQueue("my_intermediate_queue", {
    deadLetterExchange: "my_final_delayed_exchange",
    messageTtl: 5000, // 5sec
}, function (err, q) {
    ch.bindQueue(q.queue, "my_intermediate_exchange", '');
});

ch.assertQueue("my_final_delayed_queue", {}, function (err, q) {
    ch.bindQueue(q.queue, "my_final_delayed_exchange", '');
    ch.consume(q.queue, function (msg) {
        console.log("delayed - [x] %s", msg.content.toString());
    }, {noAck: true});
});
A message in a Rabbit queue can be delayed in 2 ways:
- using a queue TTL
- using a per-message TTL
If all messages in the queue are to be delayed for a fixed time, use a queue TTL.
If each message has to be delayed by a varying time, use a per-message TTL.
I have explained it using Python 3 and the pika module.
The pika BasicProperties argument 'expiration', in milliseconds, has to be set to delay a message in the delay queue.
After setting the expiration time, publish the message to a delayed_queue (not the actual queue where consumers are waiting to consume); once the message in delayed_queue expires, it will be routed to the actual queue using the exchange 'amq.direct'.
def delay_publish(self, messages, queue, headers=None, expiration=0):
    """
    Connect to RabbitMQ and publish messages to the queue

    Args:
        queue (string): queue name
        messages (list or single item): messages to publish to rabbit queue
        expiration (int): TTL in milliseconds for message
    """
    delay_queue = "".join([queue, "_delay"])
    logging.info('Publishing To Queue: {queue}'.format(queue=delay_queue))
    logging.info('Connecting to RabbitMQ: {host}'.format(
        host=self.rabbit_host))
    credentials = pika.PlainCredentials(
        RABBIT_MQ_USER, RABBIT_MQ_PASS)
    parameters = pika.ConnectionParameters(
        self.rabbit_host, RABBIT_MQ_PORT,
        RABBIT_MQ_VHOST, credentials, heartbeat_interval=0)
    connection = pika.BlockingConnection(parameters)

    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(exchange='amq.direct',
                       queue=queue)

    delay_channel = connection.channel()
    delay_channel.queue_declare(queue=delay_queue, durable=True,
                                arguments={
                                    'x-dead-letter-exchange': 'amq.direct',
                                    'x-dead-letter-routing-key': queue
                                })

    properties = pika.BasicProperties(
        delivery_mode=2, headers=headers, expiration=str(expiration))

    if type(messages) not in (list, tuple):
        messages = [messages]

    try:
        for message in messages:
            try:
                json_data = json.dumps(message)
            except Exception as err:
                logging.error(
                    'Error Jsonify Payload: {err}, {payload}'.format(
                        err=err, payload=repr(message)), exc_info=True
                )
                if (type(message) is dict) and ('data' in message):
                    message['data'] = {}
                    message['error'] = 'Payload Invalid For JSON'
                    json_data = json.dumps(message)
                else:
                    raise

            try:
                delay_channel.basic_publish(
                    exchange='', routing_key=delay_queue,
                    body=json_data, properties=properties)
            except Exception as err:
                logging.error(
                    'Error Publishing Data: {err}, {payload}'.format(
                        err=err, payload=json_data), exc_info=True
                )
                raise
    except Exception:
        raise
    finally:
        logging.info(
            'Done Publishing. Closing Connection to {queue}'.format(
                queue=delay_queue
            )
        )
        connection.close()
Depending on your scenario and needs, I would recommend one of the following approaches:
Using the official plugin, https://www.rabbitmq.com/blog/2015/04/16/scheduling-messages-with-rabbitmq/; but it has a capacity issue if the total count of delayed messages exceeds a certain number (https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/issues/72), it has no high-availability option, and it loses data when a message's delay elapses during an MQ restart.
Implementing a set of cascading delayed queues, just like NServiceBus did (https://docs.particular.net/transports/rabbitmq/delayed-delivery); a rough sketch of the idea follows.
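A rough illustration of the cascading idea (my own sketch with plain Pika, not NServiceBus's actual topology): a few fixed-TTL tiers that all dead-letter into the destination queue, with a requested delay served by the nearest tier.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='work', durable=True)
channel.queue_bind(exchange='amq.direct', queue='work')

# Fixed-TTL tiers; pick the tier closest to the delay you need.
for ttl in (1000, 10000, 60000):
    channel.queue_declare(queue='delay.%d' % ttl, durable=True, arguments={
        'x-message-ttl': ttl,
        'x-dead-letter-exchange': 'amq.direct',
        'x-dead-letter-routing-key': 'work',
    })

# A ~10 second delay: publish into the 10000 ms tier.
channel.basic_publish(exchange='', routing_key='delay.10000', body='test')
connection.close()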
