Parsing Panda to_dict

I have data being fetched via an API, but the data is in HTML format, so I used pandas to convert the HTML and called to_dict on the result. When I fetch the data in Django, however, the result comes back wrapped in a string, so I can't use a for loop to parse it. How do I remove the string wrapper so that I can iterate over the data?
Data:
output = fetchdata(datacenter)
## Dict format to fetch
context = {
    'datacenter': datacenter,
    'output': output
}
Here is the output:
{'datacenter': 'DC1', 'output': b"[{'Device': 'device01', 'Port': 'Ge0/0/5', 'Provider': 'L3', 'ID': 3324114459135, 'Remote': 'ISP Circuit', 'Destination Port': 'ISP Port'}, {'Device': 'device02', 'Port': 'Ge0/0/5', 'Provider': 'L3', 'ID': 334555114459135, 'Remote': 'ISP Circuit', 'Destination Port': 'ISP Port'}]\n"}
I would like to grab the data from the output and present it in table format.

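The output is a bytes string, so it has to be parsed before you can loop over it. If it holds valid JSON:
import json
data = json.loads(output)
The sample shown uses single quotes, which is Python literal syntax rather than JSON, so json.loads will raise json.JSONDecodeError on it. In that case use ast.literal_eval(output.decode()) instead.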
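A minimal sketch of the parsing step, assuming output is a bytes object like the sample in the question. It tries json.loads first and falls back to ast.literal_eval; the helper name parse_output and the shortened sample are illustrative only.

```python
import ast
import json


def parse_output(raw):
    """Turn the bytes/str payload into a list of dicts."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    try:
        return json.loads(text)        # strict JSON (double quotes)
    except json.JSONDecodeError:
        return ast.literal_eval(text)  # Python-literal syntax (single quotes)


# Shortened copy of the sample payload from the question.
output = b"[{'Device': 'device01', 'Port': 'Ge0/0/5', 'Provider': 'L3'}, {'Device': 'device02', 'Port': 'Ge0/0/5', 'Provider': 'L3'}]\n"

rows = parse_output(output)
for row in rows:
    print(row["Device"], row["Port"], row["Provider"])
```

Putting rows (rather than the raw bytes) into the context dict should let a Django template loop over it with {% for row in output %}.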

Related

how to read json file with json.load

I want to pick "ocr_text" out of this JSON.
How can I pick ocr_text with json.loads?
{'message': 'Success', 'result': [{'message': 'Success', 'input': '1.jpg', 'prediction': [{'id': 'a6447ad9-80f7-4bce-bb5e-588bef3874e6', 'label': 'number_plate', 'xmin': 93, 'ymin': 405, 'xmax': 248, 'ymax': 445, 'score': 0.99992895, **'ocr_text': 'MH 02 CB 4545'**, 'type': 'field', 'status': 'correctly_predicted', 'page_no': 0, 'label_id': '45aaf761-4b60-42e9-b9a7-21d7ea8b927a'}], auto=compress&expires=1670532718&or=90&s=373803a82f093ab6b3b68d530f85f294', 'original_with_long_expiry': 'https://nnts.imgix.net/uploadedfiles/59aedc47-df0d-4e93-a52d-dd7076da1287/PredictionImages/658c79d6-c4c7-4ce3-8dfc-41d8884d5719.jpeg?expires=1686070318&or=0&s=849652a08454ccca0ac5cfb779c0cba3'}, 'uploadedfiles/59aedc47-df0d-4e93-a52d-dd7076da1287/RawPredictions/1-2022-12-08T16-51-56.347.jpg': {'original': 'https://nanonets.s3.us-west-2.amazonaws.com/uploadedfiles/59aedc47-df0d-4e93-a52d-dd7076da1287/RawPredictions/1-2022-12-08T16-51-56.347.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5F4WPNNTLX3QHN4W%2F20221208%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20221208T165158Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&response-cache-control=no-cache&X-Amz-Signature=6ccfc59eb43ffe89dda229ca2a91f09f883596014c7ab0bba6028432f506438d', 'original_compressed': '', 'thumbnail': '', 'acw_rotate_90': '', 'acw_rotate_180': '', 'acw_rotate_270': '', 'original_with_long_expiry': ''}}}

The JSON you provided is malformed (it is truncated and uses single quotes), but here is the general approach.
If the JSON data is in a string in your code, you can use json.loads like so:
import json
json_string = ... # this is the string with JSON data
json_dict = json.loads(json_string)
print(json_dict["result"][0]["prediction"][0]["ocr_text"])
First, json.loads parses the JSON data in json_string into a dictionary (json_dict).
Then you index into it like any nested dictionary and list: "ocr_text" sits inside the first item of "prediction", which sits inside the first item of "result".
The Python json module documentation has more info if you'd like.
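As a runnable illustration, here is a sketch that uses a trimmed-down, valid-JSON stand-in for the response (the real payload in the question is truncated, so the sample string below is an assumption about its shape). It collects every ocr_text rather than only the first one.

```python
import json

# Trimmed-down, valid-JSON stand-in for the response in the question.
json_string = """
{
  "message": "Success",
  "result": [
    {
      "input": "1.jpg",
      "prediction": [
        {"label": "number_plate", "ocr_text": "MH 02 CB 4545"}
      ]
    }
  ]
}
"""

response = json.loads(json_string)

# Walk every result and every prediction inside it.
ocr_texts = [
    prediction["ocr_text"]
    for result in response["result"]
    for prediction in result["prediction"]
]
print(ocr_texts)
```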

converting a deep nested loop from JSON into Pandas DF

I am getting info from an API, and getting this as the resulting JSON file:
{'business_discovery': {'media': {'data': [{'media_url': 'a link',
'timestamp': '2022-01-01T01:00:00+0000',
'caption': 'Caption',
'media_type': 'type',
'media_product_type': 'product_type',
'comments_count': 1,
'like_count': 1,
'id': 'ID'},
{'media_url': 'link',
# ... and so on
# NOTE: I scrubbed the numbers with dummy data
I know I can run this script to get everything inside data:
# "a" is the json without business discovery or media, which would be this:
a = {'data': [{'media_url': 'a link',
'timestamp': '2022-01-01T01:00:00+0000',
'caption': 'Caption',
'media_type': 'type',
'media_product_type': 'product_type',
'comments_count': 1,
'like_count': 1,
'id': 'ID'},
{'media_url': 'link',
# ... and so on
media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code = [], [], [], [], [], [], [], []
for result in a['data']:
    media_url.append(result[u'media_url'])  # Appending all the info within the JSON to a list
    timestamp.append(result[u'timestamp'])
    caption.append(result[u'caption'])
    media_type.append(result[u'media_type'])
    media_product_type.append(result[u'media_product_type'])
    comment_count.append(result[u'comments_count'])
    like_count.append(result[u'like_count'])
    id_code.append(result[u'id'])  # All info exists, even when a value is 0
df = pd.DataFrame([media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code]).T
When I run the above code directly on the response from the API, I get errors saying that data is not found.
Working from a copy with the outer keys stripped is fine for now, but I am trying to figure out a way to "hop" over both business_discovery and media and get straight to data, so that I don't have to copy and paste the part below them.

Use pd.json_normalize, where data is the full API response:
df = pd.json_normalize(data=data["business_discovery"]["media"], record_path="data")
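A small self-contained sketch of that call, using dummy values shaped like the response in the question (the variable name response and the sample values are assumptions):

```python
import pandas as pd

# Dummy response shaped like the one in the question.
response = {
    "business_discovery": {
        "media": {
            "data": [
                {"media_url": "a link", "caption": "Caption", "like_count": 1, "id": "ID1"},
                {"media_url": "link", "caption": "Other", "like_count": 2, "id": "ID2"},
            ]
        }
    }
}

# Index past the two wrapper keys, then flatten the list stored under "data".
df = pd.json_normalize(response["business_discovery"]["media"], record_path="data")
print(df)
```

Because each item under data is already a flat dictionary, pd.DataFrame(response["business_discovery"]["media"]["data"]) should give the same frame; json_normalize becomes more useful once the items themselves contain nested dictionaries.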

Why reading a json format file resulting all the records going to _corrupt_record in pyspark

I am reading data from an api call and the data is in the form of json like below:
{'success': True, 'errors': [], 'requestId': '151a2#fg', 'warnings': [], 'result': [{'id': 10322433, 'name': 'sdfdgd', 'desc': '', 'createdAt': '2016-09-20T13:48:58Z+0000', 'updatedAt': '2020-07-16T13:08:03Z+0000', 'url': 'https://eda', 'subject': {'type': 'Text', 'value': 'Register now'}, 'fromName': {'type': 'Text', 'value': 'ramjdn fg'}, 'fromEmail': {'type': 'Text', 'value': 'ffdfee#ozx.com'}, 'replyEmail': {'type': 'Text', 'value': 'ffdfee#ozx.com'}, 'folder': {'type': 'Folder', 'value': 478, 'folderName': 'sjha'}, 'operational': False, 'textOnly': False, 'publishToMSI': False, 'webView': False, 'status': 'approved', 'template': 1031, 'workspace': 'Default', 'isOpenTrackingDisabled': False, 'version': 2, 'autoCopyToText': True, 'preHeader': None}]}
Now when I create a dataframe out of this data using the code below:
df = spark.read.json(sc.parallelize([data]))
I get only one column, _corrupt_record; the dataframe output is shown below. I have tried setting multiLine to true but still do not get the desired output.
+--------------------+
|     _corrupt_record|
+--------------------+
|{'id': 12526, 'na...|
+--------------------+
The expected output is a dataframe with the JSON exploded into separate columns, with id as one column, name as another, and so on.
I have tried a lot of things but have not been able to fix this.

I made some changes and it worked.
I needed to define a custom schema, then used this bit of code:
data = sc.parallelize([items])
df = spark.createDataFrame(data, schema=schema)
And it worked.
If there is a more optimized solution, please feel free to share.
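One likely cause of the _corrupt_record column, assuming data is a Python dict (as the True/None values and single quotes suggest): the JSON reader then receives Python's text representation of the dict, which is not valid JSON. The sketch below shows the difference using only the standard library; the commented Spark line reuses the spark and sc names from the question and is not run here.

```python
import json

# Small stand-in for the API response in the question.
data = {"success": True, "result": [{"id": 10322433, "name": "sdfdgd", "preHeader": None}]}

as_repr = str(data)         # Python syntax: single quotes, True, None
as_json = json.dumps(data)  # JSON syntax: double quotes, true, null

# Check whether a JSON parser accepts the Python representation.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(repr_is_valid_json)
print(json.loads(as_json) == data)

# With the question's objects this would become (assumption, not run here):
# df = spark.read.json(sc.parallelize([json.dumps(data)]))
```

If that assumption holds, serializing with json.dumps before handing the string to Spark may avoid the need for a hand-written schema.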

Sending python requests and handling JSON lists

I am sending requests to a crypto network for data on accounts. You get sent back information, but I haven't yet encountered lists being sent in JSON until now. I want to parse certain information, but am having trouble because the JSON is a list and is not as easy to parse compared to normal JSON data.
import requests
import json
url = 'https://s1.ripple.com:51234/'
payload = {
    "method": "account_objects",
    "params": [
        {
            "account": "r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59",
            "ledger_index": "validated",
            "type": "state",
            "deletion_blockers_only": False,
            "limit": 10
        }
    ]
}
response = requests.post(url, data=json.dumps(payload))
print(response.text)
data = response.text
parsed = json.loads(data)
price = parsed['result']
price = price['account_objects']
for Balance in price:
    print(Balance)
You will receive all the tokens the account holds and the value. I can not figure out how to parse this correctly and receive the correct one. This particular test account has a lot of tokens so I will only show the first tokens info.
RESULT
{'Balance': {'currency': 'ASP', 'issuer': 'rrrrrrrrrrrrrrrrrrrrBZbvji', 'value': '0'}, 'Flags': 65536, 'HighLimit': {'currency': 'ASP', 'issuer': 'r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59', 'value': '0'}, 'HighNode': '0', 'LedgerEntryType': 'RippleState', 'LowLimit': {'currency': 'ASP', 'issuer': 'r3vi7mWxru9rJCxETCyA1CHvzL96eZWx5z', 'value': '10'}, 'LowNode': '0', 'PreviousTxnID': 'BF7555B0F018E3C5E2A3FF9437A1A5092F32903BE246202F988181B9CED0D862', 'PreviousTxnLgrSeq': 1438879, 'index': '2243B0B630EA6F7330B654EFA53E27A7609D9484E535AB11B7F946DF3D247CE9'}
I want to get the first bit of info here: {'Balance': {'currency': 'ASP', 'issuer': 'rrrrrrrrrrrrrrrrrrrrBZbvji', 'value': '0'},
Specifically 'value' and the number.
I have tried to parse 'Balance', but since it is inside a list it is not as straightforward.

You're mixing up lists and dictionaries. price is a list of dictionaries, so each item in the loop is a dictionary, and you access its contents by key:
for Balance in price:
    print(Balance['Balance'])
Yields the following results:
{'currency': 'CHF', 'issuer': 'rrrrrrrrrrrrrrrrrrrrBZbvji', 'value': '-0.3488146605801446'}
{'currency': 'BTC', 'issuer': 'rrrrrrrrrrrrrrrrrrrrBZbvji', 'value': '0'}
{'currency': 'USD', 'issuer': 'rrrrrrrrrrrrrrrrrrrrBZbvji', 'value': '-11.68225001668339'}
If you only want to extract the value, go one level deeper:
for Balance in price:
    print(Balance['Balance']['value'])
Which yields:
-0.3488146605801446
0
-11.68225001668339
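To pick out one particular token instead of printing them all, the same loop can filter on the currency code. This is a sketch that runs offline on a stand-in list built from the outputs above; the helper name balance_for is hypothetical, and the isinstance check is a defensive assumption for entries whose Balance is not a dictionary.

```python
# Stand-in for parsed['result']['account_objects'], using values from the outputs above.
account_objects = [
    {"Balance": {"currency": "CHF", "issuer": "rrrrrrrrrrrrrrrrrrrrBZbvji", "value": "-0.3488146605801446"}},
    {"Balance": {"currency": "BTC", "issuer": "rrrrrrrrrrrrrrrrrrrrBZbvji", "value": "0"}},
    {"Balance": {"currency": "USD", "issuer": "rrrrrrrrrrrrrrrrrrrrBZbvji", "value": "-11.68225001668339"}},
]


def balance_for(objects, currency):
    """Return the value string of the first entry in the given currency, or None."""
    for obj in objects:
        balance = obj.get("Balance")
        if isinstance(balance, dict) and balance.get("currency") == currency:
            return balance["value"]
    return None


usd = balance_for(account_objects, "USD")
print(usd)
```

The value comes back as a string, so wrap it in decimal.Decimal (or float) before doing arithmetic with it.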

I assume that under parsed['result']['account_objects'] you have a list of dictionaries, and that each dictionary has a key like 'Balance': {'currency': 'ASP', 'issuer': 'rrrrrrrrrrrrrrrrrrrrBZbvji', 'value': '0'}? If so, why don't you iterate over the list and access each dictionary, like:
account_objects = parsed['result']['account_objects']
for account_object in account_objects:
    print(account_object['Balance'])

I want to change json format to table format

I want to change JSON format to tabular format using Python.
Dicts and lists are nested in the data.
Currently:
{'tables': [{'name': 'PrimaryResult', 'columns': [{'name': 'TimeGenerated', 'type': 'datetime'}, {'name': 'OperationName', 'type': 'string'}, {'name': 'Category', 'type': 'string'}], 'rows': [['2021-08-24T04:08:01.966Z', 'Restore application', 'ApplicationManagement'], ['2021-08-24T06:52:22.14Z', 'Bulk create users - started (bulk)', 'UserManagement'], ['2021-08-24T06:52:22.671Z', 'Bulk create users - finished (bulk)', 'UserManagement'], ['2021-08-24T06:52:22.471Z', 'Add user', 'UserManagement'], ['2021-08-24T06:52:22.501Z', 'Add user', 'UserManagement'], ['2021-08-24T06:52:22.594Z', 'Add user', 'UserManagement'], ['2021-08-24T06:52:22.513Z', 'Add user', 'UserManagement'], ['2021-08-24T06:54:48.482Z', 'Enable Strong Authentication', 'UserManagement'], ['2021-08-24T06:54:48.487Z', 'Update user', 'UserManagement'], ['2021-08-24T06:54:33.391Z', 'Enable Strong Authentication', 'UserManagement']]}]}
Table
headers: tables | TimeGenerated | OperationName | Category
eg: PrimaryResult, 2021-08-24T04:08:01.966Z, Restore application, ApplicationManagement

Here is a quick and straightforward solution:
import pandas as pd
import json
# Open JSON file
with open('{your_file_path}') as json_file:
    data = json.load(json_file)
# Create dataframe from the first table's rows and column names
table = data['tables'][0]
pd_data = table['rows']
pd_columns = [col['name'] for col in table['columns']]
df = pd.DataFrame(data=pd_data, columns=pd_columns)
You can then export the dataframe to any of the table formats pandas supports.
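The requested header also has a tables column holding the table name. Here is a pandas-free sketch of that step, assuming every entry in tables has the name/columns/rows layout shown in the question (the function name tables_to_records is illustrative):

```python
def tables_to_records(payload):
    """Flatten {'tables': [...]} into one dict per row, tagged with the table name."""
    records = []
    for table in payload["tables"]:
        headers = [col["name"] for col in table["columns"]]
        for row in table["rows"]:
            record = {"tables": table["name"]}
            record.update(zip(headers, row))
            records.append(record)
    return records


# Two rows taken from the sample in the question.
payload = {
    "tables": [
        {
            "name": "PrimaryResult",
            "columns": [
                {"name": "TimeGenerated", "type": "datetime"},
                {"name": "OperationName", "type": "string"},
                {"name": "Category", "type": "string"},
            ],
            "rows": [
                ["2021-08-24T04:08:01.966Z", "Restore application", "ApplicationManagement"],
                ["2021-08-24T06:52:22.471Z", "Add user", "UserManagement"],
            ],
        }
    ]
}

records = tables_to_records(payload)
for record in records:
    print(", ".join(record.values()))
```

The resulting list of dictionaries can be passed straight to pd.DataFrame(records) if a dataframe is still wanted.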
