Why do my AWS SQS messages not get deleted? - python

I have an AWS SQS queue which receives messages. I iterate through them, print the details, and then attempt to delete them. Unfortunately they are not being deleted, even though I get a success response. I can't figure out why they are not being removed, as I am sure I've used similar code before.
The basic example I'm trying is like this:
import boto3

# Create SQS client
sqs = boto3.client('sqs',
                   region_name='',
                   aws_access_key_id='',
                   aws_secret_access_key=''
                   )

queue_url = ''

# Receive message from SQS queue
response = sqs.receive_message(
    QueueUrl=queue_url,
    AttributeNames=[
        'All'
    ],
    MaxNumberOfMessages=10,
    MessageAttributeNames=[
        'All'
    ],
    VisibilityTimeout=0,
    WaitTimeSeconds=0
)

print(len(response['Messages']))

for index, message in enumerate(response['Messages']):
    print("Index Number: ", index)
    print(message)
    receipt_handle = message['ReceiptHandle']
    # do some function
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle
    )

Probably because you are using VisibilityTimeout=0. This means that the message immediately goes back to the SQS queue. So there is nothing to delete for you.

You are setting VisibilityTimeout=0 and WaitTimeSeconds=0 - the message will time out and become visible again after zero seconds.
This is probably not what you want - you should try higher values here and read the docs about them: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
You can measure your usual processing time and set the visibility timeout to a safely larger value, so that messages are still redelivered in case of errors.
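For example, a minimal sketch of the receive/delete loop with non-zero timeouts (the 30-second visibility timeout and 10-second long poll are just illustrative values, not recommendations):

response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=10,       # long polling instead of returning immediately
    VisibilityTimeout=30      # hide the message long enough to process and delete it
)

for message in response.get('Messages', []):
    # ... process the message here ...
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=message['ReceiptHandle']   # delete with the handle from this receive
    )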

Key points to understand regarding SQS:
Visibility timeout
Wait time seconds
Receipt handle
When a message is received by a Lambda function, it gets a 'ReceiptHandle' for that receive.
This 'ReceiptHandle' changes whenever a new Lambda thread receives the message, making the previous handle invalid.
Since VisibilityTimeout = 0 and WaitTimeSeconds = 0 (as pointed out by @Mandraenke), the message is immediately made available to another Lambda thread with a new 'ReceiptHandle'.
The 'ReceiptHandle' held by the previous Lambda thread therefore becomes invalid, so the delete issued by that thread no longer removes the message.

Related

How to process messages in Kafka only once, so that a service doesn't reprocess all messages when it is restarted

This is my first time using Kafka. I am learning Kafka with a microservice architecture and I have run into the following issue.
Every time I restart my service it processes all the messages in the topics. Is there a way I could process those messages only once, flag them as read or something?
This is my snippet in Python 3:
import json
from kafka import KafkaConsumer

class EmailStreamConsumer:
    def __init__(self, bootstrap_servers='localhost:9092'):
        self.__bootstrap_servers = bootstrap_servers
        self.__new_emails_consumer = KafkaConsumer('NewEmails', bootstrap_servers=bootstrap_servers,
                                                   auto_offset_reset='earliest')
        self.__sent_emails_consumer = KafkaConsumer('SentEmails', bootstrap_servers=bootstrap_servers,
                                                    auto_offset_reset='earliest')

    def start(self):
        for message in self.__new_emails_consumer:
            value = message.value.decode('utf-8')
            email = json.loads(value)
            self.send_email(email['content'], email['to_email'], email['title'], email['from_email'])
            print("%s:%d:%d: key=%s value=%s" % (
                message.topic, message.partition, message.offset, message.key, message.value))
I would like the service to send the emails only once, even when the service is restarted.
I think your problem is that you don't have a GROUP ID for your Kafka-Consumer
Just add:
String groupId = "kafka-new-emails";
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, groupId);
Your application will start reading from the latest committed offset, because the consumer group records where your last commit was. Also, if you have more than one consumer and one of them goes down, the consumer group will trigger a rebalance so that a consumer that is still online reads from the partition that was assigned to the consumer that went down.
If the consumer acknowledges to Kafka that it has read a message, then you will not have this problem.
This can be done in two ways.
Approach 1: enable auto-commit once we get the messages.
For this approach we need to add the property enable.auto.commit with the value true.
Approach 2: if we need programmatic control, we can use commitSync() and commitAsync().
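Since the question uses kafka-python rather than the Java client, here is a rough equivalent sketch of the same idea; the topic, server and group names are just placeholders based on the question:

from kafka import KafkaConsumer
import json

# Joining a consumer group makes Kafka remember the last committed offset,
# so a restarted service resumes where it left off instead of replaying the topic.
consumer = KafkaConsumer(
    'NewEmails',
    bootstrap_servers='localhost:9092',
    group_id='kafka-new-emails',    # consumer group name (placeholder)
    auto_offset_reset='earliest',   # only used when the group has no committed offset yet
    enable_auto_commit=False        # commit manually for programmatic control (Approach 2)
)

for message in consumer:
    email = json.loads(message.value.decode('utf-8'))
    # ... send the email here ...
    consumer.commit()               # mark the processed offsets as committed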

pika add headers to nack response

I am modifying pika headers using
properties.headers = {
'myheader': myheader
}
But I am acking and nacking with the delivery_tag
channel.basic_nack(delivery_tag=delivery_tag, requeue=False)
How can I pass the updated properties with the headers to the ack and nack functions? Or what is the pika way of doing this?
It is correct that basic_nack cannot change the headers.
The way to do this is not to use NACK at all, but to generate and return a 'new' message (which is simply the current message you are handling, with the new headers added to it).
It appears that a NACK is basically doing this anyway, according to the AMQP spec.
So my logic is to use basic_ack on success, and message generation with updated headers on failure. In my case I 'redirect' the new message to a dead-letter exchange which a dead-letter queue is bound to.
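A minimal sketch of that pattern with pika might look like this; the dead-letter exchange name and the process() helper are placeholders, not part of the original code:

import pika

def on_message(channel, method, properties, body):
    try:
        process(body)   # your own handler (placeholder)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception as exc:
        # Copy the incoming properties but add/overwrite the extra header.
        headers = dict(properties.headers or {})
        headers['myheader'] = str(exc)
        new_props = pika.BasicProperties(content_type=properties.content_type,
                                         headers=headers)
        # Publish the same body with the new headers to the dead-letter exchange,
        # then ack the original instead of nacking it.
        channel.basic_publish(exchange='my-dlx',               # placeholder DLX name
                              routing_key=method.routing_key,
                              body=body,
                              properties=new_props)
        channel.basic_ack(delivery_tag=method.delivery_tag)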
As an aside before the pika docs extract below: you seem to have misspelled basic_nack in places... is that just a typo in the question, or is it your actual issue?
def basic_nack(self, delivery_tag=None, multiple=False, requeue=True):
    """This method allows a client to reject one or more incoming messages.
    It can be used to interrupt and cancel large incoming messages, or
    return untreatable messages to their original queue.

    :param integer delivery-tag: int/long The server-assigned delivery tag
    :param bool multiple: If set to True, the delivery tag is treated as
                          "up to and including", so that multiple messages
                          can be acknowledged with a single method. If set
                          to False, the delivery tag refers to a single
                          message. If the multiple field is 1, and the
                          delivery tag is zero, this indicates
                          acknowledgement of all outstanding messages.
    :param bool requeue: If requeue is true, the server will attempt to
                         requeue the message. If requeue is false or the
                         requeue attempt fails the messages are discarded or
                         dead-lettered.

    """
    self._raise_if_not_open()
    return self._send_method(
        spec.Basic.Nack(delivery_tag, multiple, requeue))
Sorry for that, but as far as I know it is not possible for basic_nack (or basic_ack) to modify the headers. The problem is that the modified message will be put in the dead-letter queue as a new message with a new ID.

kafka-python read from last produced message after a consumer restart

I am using kafka-python to consume messages from a Kafka queue (Kafka version 0.10.2.0). In particular, I am using the KafkaConsumer type.
If the consumer stops and is restarted after a while, I would like it to restart from the latest produced message, that is, to drop all the messages produced while the consumer was down.
How can I achieve this?
Thanks
Thanks,
it works!
This is a simplified version of my code:
consumer = KafkaConsumer('mytopic', bootstrap_servers=[server], group_id=group_id, enable_auto_commit=True)

# dummy poll
consumer.poll()

# go to the end of the stream
consumer.seek_to_end()

# start iterating
for message in consumer:
    print(message)

consumer.close()
The documentation states that the poll() method is incompatible with the iterator interface, which I guess is the one I use in the loop at the end of my script. However, from initial testing, this code seems to work correctly.
Is it safe to use it? Or did I misunderstand the documentation?
Thanks
You will need to seekToEnd() to the end of the log.
Keep in mind that you first need to subscribe to a topic before you can seek. Also, subscribing is lazy, so you will need to add a "dummy poll" before you can seek, too.
consumer.subscribe(...)
consumer.poll() // dummy poll
consumer.seekToEnd()
// now enter your regular poll-loop
In response to your question in your answer:
It is my understanding that when you execute consumer.poll() a dictionary is returned. So, when I wanted to poll for information I used a loop to walk through the dictionary.
consumer = KafkaConsumer('mytopic', bootstrap_servers=[server], group_id=group_id, enable_auto_commit=True)
messages = consumer.poll()
data = []
for msg in messages:
    for value in messages[msg]:
        # Add just the values to the list
        data.append(value[6])
I believe what you are doing is getting the iterator with consumer = KafkaConsumer('mytopic', bootstrap_servers=[server], group_id=group_id, enable_auto_commit=True) and then walking the iterator with
# start iterating
for message in consumer:
    print(message)
It doesn't look like you are actually getting just the records returned by the poll (up to 500 by default). You can confirm this by adding max_poll_records=5 to your KafkaConsumer configuration. Then, when you run the code, if more than 5 messages print out you can tell that you aren't using the poll functionality.
Hope that helps!
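To make the distinction concrete, here is a small sketch that consumes only via poll() and never touches the iterator; the topic name is a placeholder and server/group_id are the same variables used in the snippets above:

from kafka import KafkaConsumer

consumer = KafkaConsumer('mytopic',
                         bootstrap_servers=[server],
                         group_id=group_id,
                         enable_auto_commit=True,
                         max_poll_records=5)     # small batch so the limit is visible

# poll() returns a dict: {TopicPartition: [ConsumerRecord, ...]}
batch = consumer.poll(timeout_ms=1000)
for tp, records in batch.items():
    for record in records:
        print(record.offset, record.value)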
Here is a convenient way to collect all messages returned by a poll in a list:
while True:
    messages = []  # store all message values
    crs = []       # store all consumer records
    tpd = consumer.poll(timeout_ms=60000, max_records=1)
    for tp in tpd.values():            # each value is a list of ConsumerRecords
        crs.extend(tp)
    for cr in crs:
        messages.append(json.loads(cr.value))
    print(messages)

Pika python asynchronous publisher: how to send data from user via console?

I am using the standard asynchronous publisher example, and I noticed that the publisher keeps publishing the same message in a loop forever.
So I commented out the schedule_next_message call in publish_message to stop that loop.
But what I really want is for the publisher to start and publish only when a user gives it a "message_body" and a "key";
basically, for the publisher to publish the user's input.
I was not able to find any examples or hints of how to make the publisher take input from the user in real time.
I am new to RabbitMQ, pika, Python, etc.
Here is the snippet of code I am talking about:
def publish_message(self):
    """If the class is not stopping, publish a message to RabbitMQ,
    appending a list of deliveries with the message number that was sent.
    This list will be used to check for delivery confirmations in the
    on_delivery_confirmations method.

    Once the message has been sent, schedule another message to be sent.
    The main reason I put scheduling in was just so you can get a good idea
    of how the process is flowing by slowing down and speeding up the
    delivery intervals by changing the PUBLISH_INTERVAL constant in the
    class.
    """
    if self._stopping:
        return

    message = {"service": "sendgrid", "sender": "nutshi#gmail.com", "receiver": "nutshi#gmail.com",
               "subject": "test notification", "text": "sample email"}
    routing_key = "email"

    properties = pika.BasicProperties(app_id='example-publisher',
                                      content_type='application/json',
                                      headers=message)

    self._channel.basic_publish(self.EXCHANGE, routing_key,
                                json.dumps(message, ensure_ascii=False),
                                properties)
    self._message_number += 1
    self._deliveries.append(self._message_number)
    LOGGER.info('Published message # %i', self._message_number)
    #self.schedule_next_message()
    #self.stop()

def schedule_next_message(self):
    """If we are not closing our connection to RabbitMQ, schedule another
    message to be delivered in PUBLISH_INTERVAL seconds.
    """
    if self._stopping:
        return
    LOGGER.info('Scheduling next message for %0.1f seconds',
                self.PUBLISH_INTERVAL)
    self._connection.add_timeout(self.PUBLISH_INTERVAL,
                                 self.publish_message)

def start_publishing(self):
    """This method will enable delivery confirmations and schedule the
    first message to be sent to RabbitMQ
    """
    LOGGER.info('Issuing consumer related RPC commands')
    self.enable_delivery_confirmations()
    self.schedule_next_message()
The site does not let me add the solution as an answer, but I was able to solve my issue using raw_input().
Thanks
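For reference, one way that could look inside the example publisher class, assuming Python 2's raw_input() as mentioned above; the attribute names simply follow the snippet in the question and this is only a sketch, not the poster's exact code:

def publish_user_message(self):
    """Ask the user for a routing key and a message body on the console,
    then publish exactly one message built from that input."""
    if self._stopping:
        return

    routing_key = raw_input("Routing key: ")   # use input() on Python 3
    body = raw_input("Message body: ")

    properties = pika.BasicProperties(app_id='example-publisher',
                                      content_type='text/plain')
    self._channel.basic_publish(self.EXCHANGE, routing_key, body, properties)

    self._message_number += 1
    self._deliveries.append(self._message_number)
    LOGGER.info('Published message # %i', self._message_number)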
I know I'm a bit late to answer the question, but have you looked at this one?
It seems a bit more related to what you need than a full async publisher. Normally you use those with a Python Queue to pass messages between threads.
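For comparison, here is a much simpler synchronous sketch using pika's BlockingConnection that publishes a single console-supplied message; the server address is a placeholder and the queue name comes from whatever routing key the user types:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

routing_key = input("Routing key: ")
body = input("Message body: ")

channel.queue_declare(queue=routing_key)    # make sure the target queue exists
# The default exchange ('') routes directly to the queue named by routing_key.
channel.basic_publish(exchange='', routing_key=routing_key, body=body)
print("Sent %r to %r" % (body, routing_key))
connection.close()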

How to get messages receive count in Amazon SQS using boto library in Python?

I am using the boto library in Python to get Amazon SQS messages. In exceptional cases I don't delete messages from the queue, in order to give a couple more chances to recover from temporary failures. But I don't want to keep receiving failed messages constantly. What I would like to do is either delete messages after they have been received more than 3 times, or not receive a message at all if its receive count is more than 3.
What is the most elegant way of doing this?
There are at least a couple of ways of doing this.
When you read a message in boto, you receive a Message object or some subclass thereof. The Message object has an "attributes" field that is a dict containing all message attributes known by SQS. One of the things SQS tracks is the approximate # of times the message has been read. So, you could use this value to determine whether the message should be deleted or not but you would have to be comfortable with the "approximate" nature of the value.
Alternatively, you could record message ID's in some sort of database and increment a count field in the database each time you read the message. This could be done in a simple Python dict if the messages are always being read within a single process or it could be done in something like SimpleDB if you need to record readings across processes.
Hope that helps.
Here's some example code:
>>> import boto.sqs
>>> c = boto.sqs.connect_to_region()
>>> q = c.lookup('myqueue')
>>> messages = c.receive_message(q, num_messages=1, attributes='All')
>>> messages[0].attributes
{u'ApproximateFirstReceiveTimestamp': u'1365474374620',
u'ApproximateReceiveCount': u'2',
u'SenderId': u'419278470775',
u'SentTimestamp': u'1365474360357'}
>>>
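For the second approach (tracking the count yourself), a purely illustrative single-process sketch could be as simple as a dict keyed by the message ID; the helper name and limit are placeholders:

from collections import defaultdict

receive_counts = defaultdict(int)   # message id -> times seen (single process only)

def seen_too_often(message, limit=3):
    """Return True once a message has been received more than `limit` times."""
    receive_counts[message.id] += 1
    return receive_counts[message.id] > limit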
Another way could be to put an extra identifier at the end of the message in your SQS queue. This identifier can keep count of the number of times the message has been read.
Also, if you want your service not to poll these messages again and again, you can create one more queue, say a "Dead Message Queue", and transfer the messages that have crossed the threshold to that queue.
AWS has built-in support for this; just follow the steps below:
create a dead letter queue
enable the redrive policy for the source queue by checking "Use Redrive Policy"
select the dead letter queue you created in step 1 for "Dead Letter Queue"
set "Maximum Receives" to "3" or any value between 1 and 1000
How it works: whenever a message is received by a worker, its receive count increments. Once it reaches the "Maximum Receives" count, the message is pushed to the dead letter queue. Note that even if you access the message via the AWS console, the receive count increments.
Source: Using Amazon SQS Dead Letter Queues
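If you prefer to set this up programmatically, here is a hedged boto3 sketch of attaching a redrive policy (note the question uses the older boto library; the region, queue URL and ARN below are placeholders):

import json
import boto3

sqs = boto3.client('sqs', region_name='us-east-1')   # placeholder region

# ARN of an existing dead letter queue (placeholder value)
dlq_arn = 'arn:aws:sqs:us-east-1:123456789012:my-dead-letter-queue'

# After 3 receives, SQS moves the message to the dead letter queue automatically.
sqs.set_queue_attributes(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/123456789012/my-source-queue',
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': dlq_arn,
            'maxReceiveCount': '3'
        })
    }
)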
Get the ApproximateReceiveCount attribute from the message you read.
Move it to another queue (so you can manage error messages) or just delete it.
foreach (var message in response.Messages)
{
    try
    {
        var notifyMessage = JsonConvert.DeserializeObject<NotificationMessage>(message.Body);
        Global.Sqs.DeleteMessageFromQ(message.ReceiptHandle);
    }
    catch (Exception ex)
    {
        var receiveMessageCount = int.Parse(message.Attributes["ApproximateReceiveCount"]);
        if (receiveMessageCount > 3)
            Global.Sqs.DeleteMessageFromQ(message.ReceiptHandle);
    }
}
It can be done in a few steps.
Create an SQS connection:
sqsconnrec = SQSConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
Create the queue object:
request_q = sqsconnrec.create_queue("queue_Name")
Load the queue messages:
messages = request_q.get_messages()
Now you get an array of message objects, and to find the total number of messages just do len(messages).
Should work like a charm.
