Python + Azure Storage Queue receive_messages()

I'm using Azure Queue Storage to pass blob paths to an Azure Function so it can access blobs on the same storage account. (It turns out I've more or less manually recreated a blob-triggered Azure Function.)
I'm using the QueueClient class to get messages from the queue, and there are two methods:
Azure Python Documentation
receive_messages(**kwargs)
peek_messages(max_messages=None, **kwargs)
I would like to be able to scale this function horizontally, so each time it's triggered (I've set it up as an HTTP function triggered from an Azure Logic App) it grabs the FIRST message in the queue, and only the first, and deletes that message once it has been retrieved.
My problem is that peek_messages() does not make the message invisible or return a pop_receipt for deletion later, and receive_messages() does not have a max_messages parameter, so I can't take one and only one message.
Does anyone have any knowledge of how to get around this roadblock?

You can try receiving messages in batches by passing the messages_per_page argument to receive_messages(). From this link:
# Receive messages by batch
messages = queue.receive_messages(messages_per_page=5)
for msg_batch in messages.by_page():
    for msg in msg_batch:
        print(msg.content)
        queue.delete_message(msg)

@Robert,
To fetch only one message from a queue you can use the code below:
# Fetch at most one message per page and hide it from other consumers for 30 seconds
pages = queue.receive_messages(visibility_timeout=30, messages_per_page=1).by_page()
page = next(pages)  # first page, containing a single message
msg = next(page)    # the one message on that page
print(msg)
The documentation of receive_messages() is wrong.
Please see this for more information.
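For completeness, here is a minimal sketch of the full receive-one-and-delete flow with the azure-storage-queue v12 QueueClient. The connection string, queue name, and 30-second visibility timeout are placeholders/assumptions to adapt to your setup:

from azure.storage.queue import QueueClient

# Placeholders - substitute your own connection string and queue name
queue = QueueClient.from_connection_string("<connection-string>", "myqueue")

# Take at most one message and hide it from other consumers for 30 seconds
for msg in queue.receive_messages(messages_per_page=1, visibility_timeout=30):
    blob_path = msg.content
    # ... process the blob at blob_path ...
    queue.delete_message(msg)  # delete_message uses msg.id and msg.pop_receipt
    break  # stop after the first message

If processing takes longer than the visibility timeout, the message becomes visible again and another instance may pick it up, so size the timeout to your processing time.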

Related

Telegram Telethon: Sharing media downloads across multiple different clients

We tried to use one Telegram client to continuously stream messages from a list of channels and produce those messages to Kafka. A second Telegram client then consumes the messages and downloads the associated media (photos/videos) using client.download_media(). Our issue is that this only works if clients 1 and 2 are the same account, not when they are different accounts. We are not sure if this has to do with the session files, the access hash, or something else.
Is support for our use case possible? The main thing we are trying to address is that the async media download could result in a large backlog, and that backlog would be lost if our server dies. That's why we wanted to put the messages into Kafka for short-term storage in the first place. We would also appreciate any better suggestions.
This is the producer side:
async with client:
    messages = client.iter_messages(channel_id, limit=10)
    async for message in messages:
        print(message)
        if message.media is not None:
            # orig_media = message.media
            # converted_media = BinaryReader(bytes(orig_media)).tgread_object()
            # print('orig, media', orig_media)
            # print('converted media', converted_media)
            message_bytes = bytes(message)  # convert to bytes
            producer.produce(topic, message_bytes)
This is the consumer side, with a different client:
with self._client:
    # telethon.errors.rpcerrorlist.FileReferenceExpiredError: The file reference has expired and is no
    # longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)
    try:
        self._client.loop.run_until_complete(self._client.download_media(orig_media, in_memory))
    except Exception as e:
        print(e)
Media files (among many other things in Telegram) contain an access_hash. While Account-A and Account-B will both see media with ID 1234, Account-A may have a hash of 5678 and Account-B may have a hash of 8765.
This is a roundabout way of saying that every account sees an access_hash that is only valid within that account. If a different account tries to use that same hash, the request will fail, because that account needs its own hash.
There is no way to bypass this, other than giving the other account actual access to the media in question so that it can obtain its own hash.
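One way to work within that constraint is to have the consumer account re-fetch the message itself, so Telethon resolves an access_hash that is valid for that account, rather than reusing media bytes serialized by the producer. A minimal sketch, assuming the consumer account has also joined the channel and that only channel_id and message.id (not the raw media) are sent through Kafka; download_for_consumer is just an illustrative helper name:

# Consumer side: re-fetch the message with this account's own session,
# then download the media using the access_hash Telethon resolves for it
async def download_for_consumer(client, channel_id, message_id):
    msg = await client.get_messages(channel_id, ids=message_id)
    if msg is not None and msg.media is not None:
        return await client.download_media(msg, file=bytes)  # in-memory bytes
    return None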

Get the size of a single message in Google Cloud PubSub

I have a setup where I am publishing messages to Google Cloud PubSub service.
I wish to get the size of each individual message that I am publishing to PubSub. So for this, I identified the following approaches (Note: I am using the Python clients for publishing and subscribing, following a line-by-line implementation as presented in their documentation):
View the message count from the Google Cloud Console using the 'Monitoring' feature
Create a pull subscription client and view the size using message.size in the callback function for the messages that are being pulled from the requested topic.
Estimate the size of the messages before publishing by converting them to JSON as per the PubSub message schema and using sys.getsizeof()
For a sample message like the following, which I published using a Python publisher client:
{
    "data": 'Test_message',
    "attributes": {
        'dummyField1': 'dummyFieldValue1',
        'dummyField2': 'dummyFieldValue2'
    }
}
I get 101 as the message.size output from the following callback function in the subscription client:
def callback(message):
    print(f"Received {message.data}.")
    if message.attributes:
        print("Attributes:")
        for key in message.attributes:
            value = message.attributes.get(key)
            print(f"{key}: {value}")
    print(message.size)
    message.ack()
Whereas the size displayed on Cloud Console Monitoring is something around 79 B.
So these are my questions:
Why are the sizes different for the same message?
Is the output of message.size in bytes?
How do I view the size of a message before publishing using the python client?
How do I view the size of a single message on the Cloud Console, rather than an aggregated measure of size during a given timeframe, which is what I could find in the Monitoring section?
In order to further contribute to the community, I am summarising our discussion as an answer.
Regarding message.size, it is an attribute from a message in the subscriber client. In addition, according to the documentation, its definition is:
Returns the size of the underlying message, in bytes
Thus you would not be able to use it before publishing.
By contrast, message_size is a metric in Google Cloud Metrics and is used by Cloud Monitoring, here.
Finally, the last topic discussed was that your aim is to monitor your quota expenditure so you can stay in the free tier. For this reason, the best option would be to use Cloud Monitoring and set up alerts based on metrics such as pubsub.googleapis.com/topic/byte_cost. Here are some links where you can find more about it: Quota utilisation, Alert event based, Alert Policies.
Regarding your third question about viewing the message size before publishing, the billable message size is the sum of the message data, the attributes (key plus value), 20 bytes for the timestamp, and some bytes for the message_id. See the Cloud Pub/Sub Pricing guide. Note that a minimum of 1000 bytes is billable regardless of message size, so if your messages may be smaller than 1000 bytes it's important to have good batch settings. The message_id is assigned server-side and is not guaranteed to be a certain size, but it is returned by the publish call as a future, so you can inspect it after publishing. This should allow you to get a pretty accurate estimate of message cost within the publisher client. Note that you can also use the monitoring client library to read Cloud Monitoring metrics from within the Python client.
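Based on that breakdown, a rough pre-publish estimate can be computed in the publisher client. This is only a sketch under the pricing-guide assumptions above (20 bytes for the timestamp, roughly 16 bytes for the server-assigned message_id, 1000-byte billing minimum); the helper name and the 16-byte figure are assumptions, not an official API:

def estimate_billable_size(data: bytes, attributes: dict) -> int:
    """Rough pre-publish estimate of a Pub/Sub message's billable size in bytes."""
    size = len(data)
    for key, value in attributes.items():
        size += len(key.encode("utf-8")) + len(value.encode("utf-8"))
    size += 20  # timestamp, per the pricing guide
    size += 16  # approximate server-assigned message_id (assumption)
    return max(size, 1000)  # 1000-byte billing minimum

# Example with the sample message above
print(estimate_billable_size(
    b"Test_message",
    {"dummyField1": "dummyFieldValue1", "dummyField2": "dummyFieldValue2"},
))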
Regarding your fourth question, there’s no way to extract single data points from a distribution metric (Unless you have only published one message during the time period in the query in which case the mean would tell you the size of that one message).

Getting a 400 error when trying to send an Azure Service Bus message with broker properties

I have a small Python app that sends information to my Azure Service Bus. I've noticed that each message has a "broker_properties" dictionary, and there is a property named "Label" in it that I can access later from the service bus.
I am trying to send my message with that property populated:
properties = {"Label":label}
msg = Message(bytes(messagebody, "utf-8"), bus_service, broker_properties=properties)
bus_service.send_queue_message("queue", msg)
But this doesn't seem to work. When the command above is executed I get back an error from Azure:
The value '{'Label': 'testtest'}' of the HTTP header 'BrokerProperties' is invalid.
Is this a bug in Python Azure SDK or am I doing something wrong?
According to your code, the issue was caused by using a Python dict object as the value of broker_properties, but the broker_properties value should be a JSON string. Please refer to the test code in the Azure SDK for Python on GitHub.
So please modify your code as below:
properties = '{"Label": "%s"}' % label
Or
import json
properties = json.dumps({"Label":label})
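Putting it together, a minimal sketch of the corrected send with the legacy azure-servicebus ServiceBusService API; the namespace and key values are placeholders, and label/messagebody are the variables from your snippet:

import json
from azure.servicebus import ServiceBusService, Message

# Placeholders - use your own namespace and SAS key
bus_service = ServiceBusService(
    service_namespace="<namespace>",
    shared_access_key_name="RootManageSharedAccessKey",
    shared_access_key_value="<key>",
)

properties = json.dumps({"Label": label})  # BrokerProperties must be a JSON string
msg = Message(bytes(messagebody, "utf-8"), broker_properties=properties)
bus_service.send_queue_message("queue", msg)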

Why does search in the Gmail API return different results than search in the Gmail website?

I'm using the gmail API to search emails from users. I've created the following search query:
ticket after:2015/11/04 AND -from:me AND -in:trash
When I run this query in the browser interface of Gmail I get 11 messages (as expected). When I run the same query in the API however, I get only 10 messages. The code I use to query the gmail API is written in Python and looks like this:
searchQuery = 'ticket after:2015/11/04 AND -from:me AND -in:trash'
messagesObj = google.get('/gmail/v1/users/me/messages', data={'q': searchQuery}, token=token).data
print messagesObj.resultSizeEstimate # 10
I sent the same message on to another gmail address and tested it from that email address and (to my surprise) it does show up in an API-search with that other email address, so the trouble is not the email itself.
After endlessly emailing around through various test Gmail accounts, I *think* (but am not 100% sure) that the browser-interface search function has a different definition of "me". It seems that the API search does not include emails which come from email addresses with the same name, while these results are in fact included in the results of the browser search. For example: if "Pete Kramer" sends an email from petekramer@icloud.com to pete@gmail.com (which both have their name set to "Pete Kramer"), it will show in the browser search and it will NOT show in the API search.
Can anybody confirm that this is the problem? And if so, is there a way to circumvent this to get the same results as the browser search returns? Or does anybody else know why the results from the Gmail browser search differ from the Gmail API search? All tips are welcome!
I would suspect it is the after query parameter that is giving you trouble. 2015/11/04 is not a valid ES5 ISO 8601 date. You could try the alternative after:<time_in_seconds_since_epoch>
# 2015-11-04 <=> 1446595200
searchQuery = 'ticket AND after:1446595200 AND -from:me AND -in:trash'
messagesObj = google.get('/gmail/v1/users/me/messages', data={'q': searchQuery}, token=token).data
print messagesObj.resultSizeEstimate # 11 hopefully!
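If it helps, the epoch value can be computed rather than hard-coded. A small sketch, treating the date as midnight UTC (which is an assumption about how you want the cutoff interpreted):

import calendar
from datetime import datetime

# 2015-11-04 00:00:00 UTC -> seconds since the epoch
after = calendar.timegm(datetime(2015, 11, 4).timetuple())
print(after)  # 1446595200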
The q parameter of the /messages/list works the same as on the web UI for me (tried on https://developers.google.com/gmail/api/v1/reference/users/messages/list#try-it )
I think the problem is that you are calling /messages rather than /messages/list
The first time your application connects to Gmail, or if partial synchronization is not available, you must perform a full sync. In a full sync operation, your application should retrieve and store as many of the most recent messages or threads as are necessary for your purpose. For example, if your application displays a list of recent messages, you may wish to retrieve and cache enough messages to allow for a responsive interface if the user scrolls beyond the first several messages displayed. The general procedure for performing a full sync operation is as follows:
Call messages.list to retrieve the first page of message IDs.
Create a batch request of messages.get requests for each of the messages returned by the list request. If your application displays message contents, you should use format=FULL or format=RAW the first time your application retrieves a message and cache the results to avoid additional retrieval operations. If you are retrieving a previously cached message, you should use format=MINIMAL to reduce the size of the response as only the labelIds may change.
Merge the updates into your cached results. Your application should store the historyId of the most recent message (the first message in the list response) for future partial synchronization.
Note: You can also perform synchronization using the equivalent Threads resource methods. This may be advantageous if your application primarily works with threads or only requires message metadata.
Partial synchronization
If your application has synchronized recently, you can perform a partial sync using the history.list method to return all history records newer than the startHistoryId you specify in your request. History records provide message IDs and type of change for each message, such as message added, deleted, or labels modified since the time of the startHistoryId. You can obtain and store the historyId of the most recent message from a full or partial sync to provide as a startHistoryId for future partial synchronization operations.
Limitations
History records are typically available for at least one week and often longer. However, the time period for which records are available may be significantly less and records may sometimes be unavailable in rare cases. If the startHistoryId supplied by your client is outside the available range of history records, the API returns an HTTP 404 error response. In this case, your client must perform a full sync as described in the previous section.
From the Gmail API documentation:
https://developers.google.com/gmail/api/guides/sync
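For reference, here is a minimal sketch of the full-sync steps above using google-api-python-client. It assumes a service object already authorized as in the Gmail API quickstart; the query string and the handle callback name are just illustrations:

# Step 1: call messages.list to retrieve the first page of message IDs
query = 'ticket after:1446595200 -from:me -in:trash'
resp = service.users().messages().list(userId='me', q=query).execute()
ids = [m['id'] for m in resp.get('messages', [])]

# Step 2: batch messages.get requests for each returned ID
def handle(request_id, response, exception):
    if exception is None:
        print(response['id'], response.get('snippet', ''))

batch = service.new_batch_http_request(callback=handle)
for msg_id in ids:
    batch.add(service.users().messages().get(userId='me', id=msg_id, format='full'))
batch.execute()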

Accessing Amazon Web Services SQS Queue in Python

I currently have an Amazon Web Services SQS queue which I use throughout my PHP project. I am now trying to write a service in Python which adds items to the SQS queue. However, I cannot get a connection to my existing queue. The code I have is:
import boto.sqs
from boto.sqs.message import Message
conn = boto.sqs.connect_to_region('us-west-2', aws_access_key_id='my key', aws_secret_access_key='my secret key')
print(conn.get_all_queues())
When I run the above code I get an empty array back instead of my current queue. Any ideas why this is happening or how to fix it? Thanks.
You can create the Queue object directly, as long as you have the queue URL and an SQSConnection object:
q = Queue(connection, url)
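For example, a minimal sketch with the legacy boto 2 SDK; the queue URL is a placeholder, and conn.get_queue('queue-name') is an alternative if your credentials are allowed to look queues up by name:

import boto.sqs
from boto.sqs.queue import Queue
from boto.sqs.message import Message

conn = boto.sqs.connect_to_region('us-west-2',
                                  aws_access_key_id='my key',
                                  aws_secret_access_key='my secret key')

# Build the Queue object directly from its URL (placeholder URL)
q = Queue(conn, 'https://sqs.us-west-2.amazonaws.com/123456789012/my-queue')

m = Message()
m.set_body('hello from Python')
q.write(m)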
