I'm a beginner in Python. I have an Azure Function that runs on a timer trigger. The function reads a batch of raw JSON data, in string format, from an Azure Service Bus.
The sample below contains two rows of data; in reality I receive about 50, and messages like this arrive continuously. Now I want to split the message row by row and then archive it to Azure Storage.
The message looks like the sample below (row 1 and row 2 concatenated):
{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTACHED_DEVICE_SERIAL_NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}{"Name":"","Seri":4560000,"SiName":"","As":"","PId":2107401,"ICheck":0,"SeeNum":40509704561424,"Type":0,"Counter":34,"PaId":0,"MeType":31,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 
09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTACHED_DEVICE_SERIAL_NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.0254566,\"10\":-0.054562772}},\"NUMBER_TAG\":\"2145600\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":1,"id":"074222a38-2816-42c7-b95c-6644448ba9d","t":-2}
Row 1 is:
{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTACHED_DEVICE_SERIAL_NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}
Row 2 is:
{"Name":"","Seri":4560000,"SiName":"","As":"","PId":2107401,"ICheck":0,"SeeNum":40509704561424,"Type":0,"Counter":34,"PaId":0,"MeType":31,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTTAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-2}
The structure of a row is shown in the image below:
In my opinion, I should first split each row, then create a DataFrame and insert each value into the related column, and after that append it to a blob. Is that right?
How can I do this, and what solution would you suggest?
Edit: my code for reading from the Service Bus:
from azure.servicebus import ServiceBusClient, ServiceBusMessage

connection_str = "**"
topic_name = "***"
subscription_name = "***"

servicebus_client = ServiceBusClient.from_connection_string(
    conn_str=connection_str, logging_enable=True)

with servicebus_client:
    # get the Subscription Receiver object for the subscription
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=topic_name, subscription_name=subscription_name)
    with receiver:
        for msg in receiver:
            print("Received: " + str(msg))
            # complete the message so that it is removed from the subscription
            receiver.complete_message(msg)
Since the messages are sent individually, you can process them individually; there is no need to concatenate them into a string. Just keep appending them to a DataFrame. The sample below is for a queue, but you can extend it to a topic/subscription. I've also attached the results to show what the output looks like.
from azure.servicebus import ServiceBusClient
import pandas as pd
import json
from pandas import json_normalize

CONNECTION_STR = 'Endpoint=sb://xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
QUEUE_NAME = 'xxxxxxxxxxx'

servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR)

with servicebus_client:
    receiver = servicebus_client.get_queue_receiver(queue_name=QUEUE_NAME)
    # create an empty DataFrame to collect the rows
    df = pd.DataFrame()
    with receiver:
        received_msgs = receiver.receive_messages(max_message_count=10, max_wait_time=5)
        for msg in received_msgs:
            msg_dict = json.loads(str(msg))
            df2 = json_normalize(msg_dict)
            # DataFrame.append is deprecated (removed in pandas 2.0); use pd.concat
            df = pd.concat([df, df2], ignore_index=True)
            receiver.complete_message(msg)
    print(df)
print("Receive is done.")
Name Seri SiName As ... Id Asse id t
0 21000000 ... 0 None 075f0a38-2816-42c7-b95c-66c425b8ba9d -1
1 4560000 ... 0 None 075f0a38-2816-42c7-b95c-66c425b8ba9d -2
[2 rows x 21 columns]
Receive is done.
Consider sample data with three rows:
data = '{"Name": "Hassan", "code":"12"}{"Name": "Jack", "code":"345"}{"Name": "Jack", "code":"345"}'
Here is how you can get a DataFrame from this data (splitting on '}' works here because the objects contain no nested braces):

from ast import literal_eval
import pandas as pd

data = [literal_eval(d + '}') for d in data.split('}')[:-1]]
df = pd.DataFrame.from_records(data)
Output:
Name code
0 Hassan 12
1 Jack 345
2 Jack 345
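The literal_eval split works for these flat objects, but it breaks as soon as an object contains nested braces, which the real Service Bus messages above do inside their "text" field. A sketch of a more robust splitter using the standard library's json.JSONDecoder.raw_decode, which understands nesting and escaped quotes:

```python
import json

def split_concatenated_json(raw):
    """Split a string of back-to-back JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(raw):
        # raw_decode returns the parsed object and the index just past its end
        obj, pos = decoder.raw_decode(raw, pos)
        objects.append(obj)
        # skip any whitespace between consecutive objects
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
    return objects

rows = split_concatenated_json(
    '{"Name": "Hassan", "code": "12"}{"Name": "Jack", "code": "345"}'
)
print(rows)
```

The resulting list of dicts can be fed straight to pd.DataFrame.from_records, exactly as in the answer above.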
I'm new to OOP in Python, and I want to read CSV files using OOP.
I have a CSV file with 5 columns separated by commas.
I want to read that CSV file, with each column stored as a column of a new DataFrame.
So, suppose I have data like these:
1434,"2021-08-13 06:31:59",unread,082196788998,kuse.hamdy#gmail.com
1433,"2021-08-13 06:09:41",unread,081554220007,ritaambarwati1#umsida.ac.id
1432,"2021-08-13 05:35:07",unread,081911075017,rifqinaufalfayyadh#gmail.com
I want the OOP code to read that CSV file and store it in a new table like this:
id date status number email
1434 2021-08-13 06:31:59 unread 089296788998 kuse.hamdy#gmail.com
1433 2021-08-13 06:09:41 unread 081554271927 ritati1#yahoo.com
1432 2021-08-13 05:35:07 unread 081911075017 rifqinaufalfayyadh#gmail.com
I tried this code:

import csv

class Complete_list:
    def __init__(self, row, header, list_):
        self.__dict__ = dict(zip(header, row))
        self.list_ = list_

    def __repr__(self):
        return self.list_

data = list(csv.reader(open("complete_list.csv")))
instances = [Complete_list(a, data[1], "date_{}".format(i + 1)) for i, a in enumerate(data[1:])]

for i in instances:
    j = i.list_.split(',')
    print(j)
Somehow, I could not access the values in each list separated by commas and put them into a new DataFrame with multiple columns. Instead, I got a result like this:
['date_1']
['date_2']
['date_3']
To be honest, you are better off using libraries like pandas, but this is how I would approach it.
class complete_list:
    def __init__(self, path, header=None):
        self.data = path
        self.header = header

    def read(self):
        with open(self.data, 'r') as f:
            # strip the trailing newline before splitting on commas
            data = [x.strip().split(',') for x in f.readlines()]
        return data

    def printer(self):
        if self.header:
            a, b, c, d, e = self.header
            yield f'{a:^10} {b:^15} {c:^25}{d:10}{e:^10}'
        for i in self.read():
            yield f'{i[0]:^10}| {i[1]:^10} | {i[2]:^10} | {i[3]:^10} | {i[4]:^10}'

headers = ['id', 'date', 'status', 'number', 'email']
# printer() is a generator, so iterate over it to actually print
for line in complete_list('yes.txt', header=headers).printer():
    print(line)
Output:
id date status number email
1434 | "2021-08-13 06:31:59" | unread | 082196788998 | kuse.hamdy#gmail.com
1433 | "2021-08-13 06:09:41" | unread | 081554220007 | ritaambarwati1#umsida.ac
1432 | "2021-08-13 05:35:07" | unread | 081911075017 | rifqinaufalfayyadh#gmail.com
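Note that split(',') keeps the literal quote characters around the date field, which is why they show up in the output above. The standard library's csv module strips the quoting for you; a minimal sketch on the sample rows:

```python
import csv
import io

sample = (
    '1434,"2021-08-13 06:31:59",unread,082196788998,kuse.hamdy#gmail.com\n'
    '1433,"2021-08-13 06:09:41",unread,081554220007,ritaambarwati1#umsida.ac.id\n'
)

# csv.reader splits on commas and removes the surrounding quotes
for row in csv.reader(io.StringIO(sample)):
    print(row)
```

The same reader accepts a file object directly, so `open("complete_list.csv")` works in place of the StringIO above.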
The pandas library is the perfect tool for that:
import pandas as pd
df = pd.read_csv("data.csv", sep=",", names=['id', 'date', 'status', 'number', 'email'])
print(df)
id date status number email
0 1434 2021-08-13 06:31:59 unread 82196788998 kuse.hamdy#gmail.com
1 1433 2021-08-13 06:09:41 unread 81554220007 ritaambarwati1#umsida.ac.id
2 1432 2021-08-13 05:35:07 unread 81911075017 rifqinaufalfayyadh#gmail.com
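One caveat: read_csv infers the number column as integers, which drops the leading zeros (compare 082196788998 in the input with 82196788998 in the output above). If the leading zeros matter, the dtype parameter can keep that column as strings; a sketch on a one-row stand-in for data.csv:

```python
import io
import pandas as pd

# a one-row stand-in for data.csv
csv_text = '1434,"2021-08-13 06:31:59",unread,082196788998,kuse.hamdy#gmail.com\n'

df = pd.read_csv(io.StringIO(csv_text), sep=",",
                 names=['id', 'date', 'status', 'number', 'email'],
                 dtype={'number': str})
print(df['number'][0])  # the leading zero is preserved
```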
I'm a beginner with Python.
I want to read the JSON file data1 shown in the attachment.
I have tried to read all the columns in the file, but I can only read the 'data' nest; I don't know how to read the columns in both the 'data' and 'quote' nests. Can you please help me?
Thank you.
My code:

import json
import pandas as pd

data = json.load(open('C:/JSON_IMPORT/data1.json'))
df = pd.DataFrame(data["data"])
print(df)
**JSON file:**
```
{"status":
{"timestamp":"2021-03-16T19:27:55.404Z","error_code":0,"error_message":null,"elapsed":173,"credit_count":22,"notice":null,"total_count":4368},
"data":[{"id":1,
"name":"Bitcoin",
"symbol":"BTC",
"slug":"bitcoin",
"num_market_pairs":9862,
"date_added":"2013-04-28T00:00:00.000Z",
"tags":["mineable","pow","sha-256","store-of-value","state-channels","coinbase-ventures-portfolio","three-arrows-capital-portfolio","polychain-capital-portfolio"],
"max_supply":21000000,
"circulating_supply":18655725,
"total_supply":18655725,
"platform":null,
"cmc_rank":1,
"last_updated":"2021-03-16T19:26:11.000Z",
"quote":{
"USD":{
"price":55643.86231386882,
"volume_24h":57006039705.56386,
"percent_change_1h":-0.22948654,
"percent_change_24h":-0.66133846,
"percent_change_7d":3.26713607,
"percent_change_30d":14.24843475,
"percent_change_60d":54.21680422,
"percent_change_90d":168.83609047,
"market_cap":1038076593265.4004,
"last_updated":"2021-03-16T19:26:11.000Z"}}
}
]
}
```
Here you go. You should use pd.json_normalize and concatenate the result with the DataFrame made from data['status']:
df = pd.concat([pd.DataFrame(data['status'], index=[0]),
                pd.json_normalize(data, record_path=['data'])],
               axis=1)
print(df)
# > timestamp error_code error_message elapsed credit_count notice total_count id name symbol slug num_market_pairs date_added tags max_supply circulating_supply total_supply platform cmc_rank last_updated quote.USD.price quote.USD.volume_24h quote.USD.percent_change_1h quote.USD.percent_change_24h quote.USD.percent_change_7d quote.USD.percent_change_30d quote.USD.percent_change_60d quote.USD.percent_change_90d quote.USD.market_cap quote.USD.last_updated
0 2021-03-16T19:27:55.404Z 0 null 173 22 null 4368 1 Bitcoin BTC bitcoin 9862 2013-04-28T00:00:00.000Z ['mineable', 'pow', 'sha-256', 'store-of-value', 'state-channels', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio'] 21000000 18655725 18655725 null 1 2021-03-16T19:26:11.000Z 55643.86231386882 57006039705.56386 -0.22948654 -0.66133846 3.26713607 14.24843475 54.21680422 168.83609047 1038076593265.4004 2021-03-16T19:26:11.000Z