Azure Stream Analytics input/output - Python

I implemented a very simple Stream Analytics query:
SELECT
Collect()
FROM
Input TIMESTAMP BY ts
GROUP BY
TumblingWindow(second, 3)
I produce events on an Event Hub input with a Python script:
...
iso_ts = datetime.fromtimestamp(ts).isoformat()
data = dict(ts=iso_ts, value=value)
msg = json.dumps(data, encoding='utf-8')
# bus_service is a ServiceBusService instance
bus_service.send_event(HUB_NAME, msg)
...
I consume from a queue:
...
while True:
    msg = bus_service.receive_queue_message(Q_NAME, peek_lock=False)
    print msg.body
...
The problem is that I cannot see any error anywhere in the Azure portal (the input and the output are tested and are OK), but I cannot get any output from my running process!
I've attached a screenshot of the diagnostics taken while the query is running:
Can somebody give me an idea for where to start troubleshooting?
Thank you so much!
UPDATE
Ok, I guess I isolated the problem.
First of all, the query format should be like this:
SELECT
Collect()
INTO
[output-alias]
FROM
[input-alias] TIMESTAMP BY ts
GROUP BY
TumblingWindow(second, 3)
I tried removing the TIMESTAMP BY clause and everything went well, so I guess that the problem is with that clause.
I paste an example of JSON-serialized input data:
{
"ts": "1970-01-01 01:01:17",
"value": "foo"
}
One could argue that the timestamp is too old (the seventies), but I also tried with current timestamps and I didn't get any output or any error on the input.
Can somebody imagine what is going wrong? Thank you!

I discovered that my question was a duplicate of "Basic query with TIMESTAMP BY not producing output".
So, the solution is that you cannot use data from the seventies: Stream Analytics considers all those tuples late arrivals and drops them.
I re-tried producing in-time tuples and, after a long latency, I could see the output.
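For reference, here is a minimal producer sketch that stamps each event with the current UTC time so that the late-arrival policy does not discard it (it reuses bus_service and HUB_NAME from the snippet above; the value is made up):
import json
from datetime import datetime

# current UTC time in ISO 8601 format, recent enough that Stream Analytics
# will not classify the event as late-arriving and drop it
data = dict(ts=datetime.utcnow().isoformat(), value='foo')
bus_service.send_event(HUB_NAME, json.dumps(data))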
Thanks to everybody!

Can you check the Service Bus queue in the Azure portal for the number of messages received?


How to get PartitionKeyRangeId of a container in Azure Cosmos Python SDK

I am trying to implement the pull model to query the change feed using the Azure Cosmos Python SDK. I found that, to parallelise the querying process, the official documentation mentions using a FeedRange value and creating a FeedIterator to iterate through each range of partition key values obtained from the FeedRange.
Currently my code snippet to query the change feed looks like this, and it is pretty straightforward:
# function to get items from the change feed based on a condition
def get_response(container_client, condition, last_continuation_token=None):
    if condition:
        # historical data read (from the beginning)
        response = container_client.query_items_change_feed(
            is_start_from_beginning=True,
            # partition_key_range_id=0
        )
    else:
        # reading from a checkpoint, using a previously saved continuation token
        response = container_client.query_items_change_feed(
            is_start_from_beginning=False,
            continuation=last_continuation_token,
        )
    return response
The problem with this approach is efficiency when getting all the items from the beginning (historical data read). I tried this method with a pretty small dataset of 500 items and the response took around 60 seconds. When dealing with millions or even billions of items, the response might take too long to return.
Would querying the change feed in parallel for each partition key range save time?
If yes, how do I get the PartitionKeyRangeId in the Python SDK?
Are there any problems I need to consider when implementing this?
I hope I make sense!
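For what it's worth, a hedged sketch of a parallel read: it assumes a recent azure-cosmos release (4.9.0 or later) that exposes container.read_feed_ranges() and a feed_range keyword on query_items_change_feed; both names should be verified against your installed SDK version before relying on this:
from concurrent.futures import ThreadPoolExecutor

def read_range(feed_range):
    # drain the change feed for a single feed range from the beginning
    return list(container.query_items_change_feed(
        feed_range=feed_range,
        is_start_from_beginning=True,
    ))

# each feed range roughly corresponds to one physical partition
feed_ranges = list(container.read_feed_ranges())
with ThreadPoolExecutor() as pool:
    results = list(pool.map(read_range, feed_ranges))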

Write custom timestamps to InfluxDB with Python

I'm currently struggling with a basic function of the influx_client in Python. I have a set of time series data which I want to add into an influxdb on a different client. My current code looks kinda like this:
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import ASYNCHRONOUS

client = InfluxDBClient(url=f"http://{ip}:{port_db}", token=token, org=org)
write_api = client.write_api(write_options=ASYNCHRONOUS)
p = Point("title_meas").field("column_data", value_data)
write_api.write(bucket=bucket, org=org, record=p)
Now I have a specific timestamp for each point which I want to use as the InfluxDB key/timestamp, but whatever I try, it keeps adding the system time of my host device (and as I'm working with historical data, I need to adjust the timespecs). How can I achieve my custom timestamps, or is there an easier way instead of using the Point method to add my data line by line... something like a Pandas DataFrame maybe?
Thankful for every advice.
You can write via line protocol, Point objects, a Pandas DataFrame, or a JSON dictionary. All are viable methods.
If you care about throughput, line protocol is the fastest, but if tiny speed differences are not important, just use whichever you want. I highly recommend reading this. The "tag" you are looking to modify on the Influx data point is called "_time".
To add it to a Point do:
p = Point("title_meas").field("column_data", value_data).time('2021-08-09T18:04:56.865943')
or json dictionary protocol:
p = {'measurement': 'title_meas', 'time': '2021-08-09T18:04:56.865943',
     'tags': {'sometag': 'sometag'},
     'fields': {'column_data': value_data}}
Easiest way to ensure timestamps are what you expect is to use UTC/ISO format.
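Since the question mentions Pandas: the same write_api can also take a whole DataFrame in one call, using its DatetimeIndex as the timestamps. A minimal sketch, reusing write_api, bucket and org from above (the sample values are made up):
import pandas as pd

# the DatetimeIndex supplies the _time value for each row
df = pd.DataFrame(
    {"column_data": [1.0, 2.0]},
    index=pd.to_datetime(["2021-08-09T18:04:56Z", "2021-08-09T18:05:56Z"]),
)
write_api.write(bucket=bucket, org=org, record=df,
                data_frame_measurement_name="title_meas")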

How do you split a big message in different lines in python telebot?

So hey. My previous question was not well received so I'll try to do better this time.
One of my help commands for the bot sends them a list of commands that they can do. Here's the code for the specific part of the problem:
def help_command(update, context):
    update.message.reply_text("What do you need my help in?")
    update.message.reply_text("/commandhelp - Know my commands")
    update.message.reply_text("/helpmegetapartner - Get advice on getting a partner")

dp.add_handler(CommandHandler("jhelp", help_command))
Now, over time I will add more commands, which may not fall under the given list. But there's no way (that I know of) to send the same content as one single message with line breaks. This method will bombard them with messages and make them hate me. Please help!
You can use triple quotes like this:
help_command_text = """What do you need my help in?
/commandhelp - Know my commands
/helpmegetapartner - Get advice on getting a partner
/anothercommand ...
"""
And then
update.message.reply_text(help_command_text)
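If the command list keeps growing, one alternative sketch is to build the message from a dict and join the lines, so adding a command is a one-line change (the commands dict below is just an illustration):
commands = {
    "/commandhelp": "Know my commands",
    "/helpmegetapartner": "Get advice on getting a partner",
}
# one message, one line per command
help_command_text = "What do you need my help in?\n" + "\n".join(
    f"{cmd} - {desc}" for cmd, desc in commands.items()
)
update.message.reply_text(help_command_text)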

How to make a Slack App/Bot post a code block (Python)

In Slack you're able to post as a user in a code block, like on Stack Overflow, like so.
As a posting user, you do this by typing "```", which then changes your input box to one formatted for code.
I need to get my Slack App/Bot to post a Tabulate table as a code block so the formatting stays consistent with my Python output. At the moment, my code looks like this:
client.chat_postMessage(channel="#google-analytics-test",text="```" + table)
This simply posts the table in a text format with "```" added onto the start of it.
This is what comes from the bot:
How it should look coming from the user:
Any help on this would be greatly appreciated, any alternative methods to get the Tabulate Table being posted by the bot in the right format would also be very welcomed!
You also need "```" after the table, and both markers should be on separate lines. This should do it:
client.chat_postMessage(channel="#google-analytics-test",text="```\n" + table + "\n```")
Note the added newlines "\n".
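For context, a fuller sketch of the whole flow with tabulate (the rows and headers here are made-up placeholders):
from tabulate import tabulate

# render the data as a fixed-width text table
rows = [["/home", 120], ["/about", 45]]
table = tabulate(rows, headers=["Page", "Sessions"])

# wrap it in a fenced block so Slack keeps the monospace alignment
client.chat_postMessage(
    channel="#google-analytics-test",
    text="```\n" + table + "\n```",
)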

Python - I want to increase the Row-Index automatically

I am absolutely new to Python, or coding for that matter, hence any help would be greatly appreciated. I have around 21 Salesforce orgs and am trying to get some information from each org into one place to send out in an email.
import pandas as pd
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
username = df.loc[[1],'uname'].values[0]
password = df.loc[[1],'passw'].values[0]
sectocken = df.loc[[1],'stoken'].values[0]
I have saved all my usernames, passwords and security tokens in the secretCSV.csv file, and with the above code I can get the data for one row, since the index value I have given is fixed. I would like to know how I can loop through this, increasing the index value after each loop until all rows from the CSV file are read.
Thank you in advance for any assistance you all can offer.
Adil
--
You can iterate over the dataframe, but it's highly not recommended (not efficient, looks bad, too much code, etc.),
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
so DO NOT DO THIS EVEN IF IT WORKS:
for i in range(0, df.shape[0]):
    username = df.loc[[i],'uname'].values[0]
    password = df.loc[[i],'passw'].values[0]
    sectocken = df.loc[[i],'stoken'].values[0]
Instead, do this:
sec_list = [(u,p,s) for _,u,p,s in df.values]
now you have a sec_list with tuples (username, password, sectocken)
access example: sec_list[0][1] - as in row=0 and get the password (located at [1]).
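From there, looping over all 21 orgs is one line (get_license is a hypothetical stand-in for the question's getLicense function):
for username, password, sectocken in sec_list:
    # fetch the license details for one Salesforce org
    get_license(username, password, sectocken)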
Pandas is great when you want to apply operations to a large set of data, but it is usually not a good fit when you want to manipulate individual cells in Python. Each cell would need to be converted to a Python object each time it's touched.
For your goals, I think the standard csv module is what you want:
import csv

with open("secretCSV.csv", newline='') as f:
    # DictReader uses the header row, so the extra 'client' column is fine
    for row in csv.DictReader(f):
        username, password, sectoken = row['uname'], row['passw'], row['stoken']
        # do all the things
Thank you everyone for your responses. I think I will first start with learning Python and then get back to this. I should have learnt coding before coding. :)
Also, I was able to iterate (sorry, most of you said not to iterate over the dataframe) and get the credentials from the file.
I actually have 21 Salesforce orgs and am trying to get license information from each of them and email it to certain people on a daily basis. I didn't want to expose the Salesforce credentials, hence I went with a flat-file option.
I have built the code to get the Salesforce license details and am able to pull them in the format I want for one client. However, I have to do this for 21 clients, so I thought of iterating over the credentials and running the getLicense function in a loop until all 21 clients' data is fetched.
I will learn Python, or at least a little bit more than what I know now, and come back to this again. Until then, Informatica and batch scripts will have to do.
Thank you again to each one of you for your help!
Adil
--
