How can I extract data from json - python

I want to extract code from JSON format.
import json
json_data = '''
{
"Body": {
"stkCallback": {
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1.0
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
}
}
'''
json_da = data['Body']
list_data = data['Body']['MpesaReceiptNumber']
print (json_da)
print (list_data)
I want to print this: NBN52K8A1J

The problem is that you have a list of dicts you need to search first:
>>> for obj in data['Body']['stkCallback']['CallbackMetadata']['Item']:
...     print(obj)
...
{'Name': 'Amount', 'Value': 1.0}
{'Name': 'MpesaReceiptNumber', 'Value': 'NBN52K8A1J'}
{'Name': 'Balance'}
{'Name': 'TransactionDate', 'Value': 20190223144807}
{'Name': 'PhoneNumber', 'Value': 254725696042}
One possibility is
>>> [x['Value'] for x in data['Body']['stkCallback']['CallbackMetadata']['Item'] if x['Name'] == 'MpesaReceiptNumber'][0]
'NBN52K8A1J'
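If more than one value is needed from the same callback, it may be simpler to turn the list into a dictionary once and look things up by name afterwards. A minimal sketch, assuming the structure from the question (a trimmed copy is inlined so the snippet runs on its own); values_by_name is only an illustrative name:

```python
import json

# Trimmed copy of the callback from the question, inlined so the sketch is self-contained
json_data = '''
{"Body": {"stkCallback": {"CallbackMetadata": {"Item": [
    {"Name": "Amount", "Value": 1.0},
    {"Name": "MpesaReceiptNumber", "Value": "NBN52K8A1J"},
    {"Name": "Balance"},
    {"Name": "TransactionDate", "Value": 20190223144807},
    {"Name": "PhoneNumber", "Value": 254725696042}
]}}}}
'''
data = json.loads(json_data)
items = data["Body"]["stkCallback"]["CallbackMetadata"]["Item"]

# Map each Name to its Value; .get() yields None for entries
# such as "Balance" that carry no "Value" key
values_by_name = {item["Name"]: item.get("Value") for item in items}

print(values_by_name["MpesaReceiptNumber"])  # NBN52K8A1J
print(values_by_name["Balance"])             # None
```

This assumes each Name appears only once in Item; if a name repeats, the last entry wins.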

Just use the json library. Then you can print its inner elements (note that this relies on the receipt number being the second entry in Item):
import json
json_data = '{"Body":{"stkCallback":{"MerchantRequestID":"22531-976234-1","CheckoutRequestID":"ws_CO_DMZ_250600506_23022019144745852","ResultCode":0,"ResultDesc":"The service request is processed successfully.","CallbackMetadata":{"Item":[{"Name":"Amount","Value":1.00},{"Name":"MpesaReceiptNumber","Value":"NBN52K8A1J"},{"Name":"Balance"},{"Name":"TransactionDate","Value":20190223144807},{"Name":"PhoneNumber","Value":254725696042}]}}}}'
a = json.loads(json_data)
print(a["Body"]["stkCallback"]["CallbackMetadata"]["Item"][1]["Value"])

You were almost there. You just have to look at each dict in the list and check whether its name is the one you want:
data = json.loads(json_data)
list_data = data['Body']["stkCallback"]['CallbackMetadata']['Item']
var = None  # stays None if no item matches
for x in list_data:
    if x['Name'] == 'MpesaReceiptNumber':
        var = x['Value']
        break
print(var)
You can reuse this later by replacing the name in the if check with a variable, so that you can grab a different value depending on that variable.
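As a rough sketch of that idea, the loop can be wrapped in a small function that takes the name as a parameter. The function name get_item_value and the default argument are assumptions made for illustration, and sample is a cut-down stand-in for the parsed callback:

```python
def get_item_value(callback, name, default=None):
    """Return the Value of the first Item whose Name matches, or default."""
    items = callback["Body"]["stkCallback"]["CallbackMetadata"]["Item"]
    for item in items:
        if item.get("Name") == name:
            # Some entries (e.g. "Balance") have no "Value" key at all
            return item.get("Value", default)
    return default

# Cut-down stand-in for the parsed callback from the question
sample = {"Body": {"stkCallback": {"CallbackMetadata": {"Item": [
    {"Name": "Amount", "Value": 1.0},
    {"Name": "MpesaReceiptNumber", "Value": "NBN52K8A1J"},
    {"Name": "Balance"},
]}}}}

print(get_item_value(sample, "MpesaReceiptNumber"))  # NBN52K8A1J
print(get_item_value(sample, "PhoneNumber", "n/a"))  # n/a
```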

I find using pprint to get the shape of the data structure is helpful when you're learning how to navigate it all out.
import json
import pprint
json_data = '{"Body":{"stkCallback":{"MerchantRequestID":"22531-976234-1","CheckoutRequestID":"ws_CO_DMZ_250600506_23022019144745852","ResultCode":0,"ResultDesc":"The service request is processed successfully.","CallbackMetadata":{"Item":[{"Name":"Amount","Value":1.00},{"Name":"MpesaReceiptNumber","Value":"NBN52K8A1J"},{"Name":"Balance"},{"Name":"TransactionDate","Value":20190223144807},{"Name":"PhoneNumber","Value":254725696042}]}}}}'
data = json.loads(json_data)
pprint.pprint(data)
Results in:
{'Body': {'stkCallback': {'CallbackMetadata': {'Item': [{'Name': 'Amount', 'Value': 1.0},
{'Name': 'MpesaReceiptNumber', 'Value': 'NBN52K8A1J'},
{'Name': 'Balance'},
{'Name': 'TransactionDate', 'Value': 20190223144807},
{'Name': 'PhoneNumber', 'Value': 254725696042}]},
'CheckoutRequestID': 'ws_CO_DMZ_250600506_23022019144745852',
'MerchantRequestID': '22531-976234-1',
'ResultCode': 0,
'ResultDesc': 'The service request is processed successfully.'}}}
So you should be able to see that data["Body"]["stkCallback"]["CallbackMetadata"]["Item"] gets to the depth you need for your data.
>>> pprint.pprint(data["Body"]["stkCallback"]["CallbackMetadata"]["Item"])
[{'Name': 'Amount', 'Value': 1.0},
{'Name': 'MpesaReceiptNumber', 'Value': 'NBN52K8A1J'},
{'Name': 'Balance'},
{'Name': 'TransactionDate', 'Value': 20190223144807},
{'Name': 'PhoneNumber', 'Value': 254725696042}]
So next you need to iterate through that list and find the entry (if one exists) whose Name is MpesaReceiptNumber.
receipt_no = None
for item in data["Body"]["stkCallback"]["CallbackMetadata"]["Item"]:
    if item.get('Name') == 'MpesaReceiptNumber':
        receipt_no = item.get('Value')
print(f"The receipt # is: {receipt_no}")

If you parse the json you will notice that the path to the data is not simply ['Body']['MpesaReceiptNumber']. In fact you have a list of dicts inside ['Item'] that needs to be searched.
One suggestion is to run the following code to find the data you are looking for:
import json
json_data = '{"Body":{"stkCallback":{"MerchantRequestID":"22531-976234-1","CheckoutRequestID":"ws_CO_DMZ_250600506_23022019144745852","ResultCode":0,"ResultDesc":"The service request is processed successfully.","CallbackMetadata":{"Item":[{"Name":"Amount","Value":1.00},{"Name":"MpesaReceiptNumber","Value":"NBN52K8A1J"},{"Name":"Balance"},{"Name":"TransactionDate","Value":20190223144807},{"Name":"PhoneNumber","Value":254725696042}]}}}}'
data = (json.loads(json_data))
list_data = data['Body']['stkCallback']['CallbackMetadata']['Item']
# Returns:
# [{'Name': 'Amount', 'Value': 1.0}, {'Name': 'MpesaReceiptNumber', 'Value':'NBN52K8A1J'}, {'Name': 'Balance'}, {'Name': 'TransactionDate', 'Value': 20190223144807}, {'Name': 'PhoneNumber', 'Value': 254725696042}]
# Now find Name: 'MpesaReceiptNumber' inside the dict list
find_it = next(item for item in list_data if item["Name"] == "MpesaReceiptNumber")
find_it = find_it['Value']
print (find_it)
Result
NBN52K8A1J
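One thing to keep in mind with next(): without a second argument it raises StopIteration when nothing matches. A hedged variant that returns None instead might look like this; find_item is a made-up helper name and list_data is a shortened copy of the list above:

```python
list_data = [
    {"Name": "Amount", "Value": 1.0},
    {"Name": "MpesaReceiptNumber", "Value": "NBN52K8A1J"},
    {"Name": "Balance"},
]

def find_item(items, name):
    # The second argument to next() is returned when the generator is exhausted,
    # so a missing name gives None instead of raising StopIteration
    return next((item for item in items if item.get("Name") == name), None)

receipt = find_item(list_data, "MpesaReceiptNumber")
missing = find_item(list_data, "PhoneNumber")

print(receipt["Value"] if receipt else None)  # NBN52K8A1J
print(missing)                                # None
```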

Use jq.
First off, it can "pretty print" any JSON data.
Put the value of json_data into a file test.json, and then show the formatted output of the JSON data with:
$ jq . <test.json
{
"Body": {
"stkCallback": {
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
}
}
Next, to extract values, a selector path needs to be given on the jq command line:
jq '.Body.stkCallback.CallbackMetadata.Item|.[]|select(.Name == "MpesaReceiptNumber")|.Value' test.json
"NBN52K8A1J"
Now to make this sequence easier to understand, let's break it down component by component.
To extract and return only the .Body:
$ jq '.Body' <test.json
{
"stkCallback": {
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
}
Now let's fetch the stkCallback component:
$ jq '.Body.stkCallback' <test.json
{
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
OK, now the CallbackMetadata:
$ jq '.Body.stkCallback.CallbackMetadata' <test.json
{
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
Next, the Item part:
$ jq '.Body.stkCallback.CallbackMetadata.Item' <test.json
[
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
Notice that the result is an array of objects? Let's unpack the objects out of the array:
$ jq '.Body.stkCallback.CallbackMetadata.Item|.[]' <test.json
{
"Name": "Amount",
"Value": 1
}
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
}
{
"Name": "Balance"
}
{
"Name": "TransactionDate",
"Value": 20190223144807
}
{
"Name": "PhoneNumber",
"Value": 254725696042
}
Now the result is just a stream of objects, each with a "Name" and (usually) a "Value". So, let's select just the one we (you) wanted:
$ jq '.Body.stkCallback.CallbackMetadata.Item|.[]|select(.Name == "MpesaReceiptNumber")' <test.json
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
}
Cool. We've got the object we wanted. Let's extract just the value now:
$ jq '.Body.stkCallback.CallbackMetadata.Item|.[]|select(.Name == "MpesaReceiptNumber")|.Value' <test.json
"NBN52K8A1J"
And, there you go.
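Since the question is tagged Python, the same step-by-step descent can be mirrored there as well. This is only a sketch of the idea, not a jq binding: doc is a cut-down stand-in for the contents of test.json, and path is an illustrative name for the list of keys:

```python
from functools import reduce

# Cut-down stand-in for the parsed contents of test.json
doc = {"Body": {"stkCallback": {"CallbackMetadata": {"Item": [
    {"Name": "Amount", "Value": 1},
    {"Name": "MpesaReceiptNumber", "Value": "NBN52K8A1J"},
    {"Name": "Balance"},
]}}}}

# Rough equivalent of .Body.stkCallback.CallbackMetadata.Item:
# descend one key at a time
path = ["Body", "stkCallback", "CallbackMetadata", "Item"]
items = reduce(lambda node, key: node[key], path, doc)

# Rough equivalent of .[] | select(.Name == "MpesaReceiptNumber") | .Value
values = [item["Value"] for item in items if item.get("Name") == "MpesaReceiptNumber"]

print(values)  # ['NBN52K8A1J']
```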

Simplest way to get the value associated with a specific CallbackMetadata item name:
json_string = '''
{
"Body": {
"stkCallback": {
...
}
'''
json_data = json.loads(json_string)
for item in json_data["Body"]["stkCallback"]["CallbackMetadata"]["Item"]:
    if item["Name"] == "MpesaReceiptNumber":
        print(item["Value"])  # -> NBN52K8A1J

Related

JSON filter "smaller than" condition

I have a JSON which looks like this:
{
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
},
{
"Name": "!",
"Number": "28"
}
]
}
and I want to get everything that is smaller than 28. The result should look like this:
{
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
}
]
}
I looked for a solution, but all I found was how to remove an exact value.
I'm doing this with a much larger file; this is just an example.
You can do it with a simple for loop:
import json
with open('your_path_here.json', 'r') as f:
    data = json.load(f)
# Iterate over a copy so that removing items does not skip elements
for elem in data['data'][:]:
    if int(elem['Number']) >= 28:
        data['data'].remove(elem)
print(data)
>>> {
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
}
]
}
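A note on removing items in a loop: calling remove() on the same list that the for loop is walking over can silently skip elements, because the remaining items shift left after each removal. The toy example below (plain integers rather than the JSON from the question) illustrates the effect and the usual alternative of building a new list:

```python
numbers = [20, 28, 30, 25]

# Removing while iterating over the same list: once 28 is removed, 30 slides
# into its slot and the loop moves on without ever checking it
mutated = list(numbers)
for n in mutated:
    if n >= 28:
        mutated.remove(n)

# Building a new list checks every element exactly once
rebuilt = [n for n in numbers if n < 28]

print(mutated)  # [20, 30, 25]
print(rebuilt)  # [20, 25]
```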
An example could use list comprehension:
data = {
"data": [
{
"Name": "Hello",
"Number": "20"
},
{
"Name": "Beautiful",
"Number": "22"
},
{
"Name": "World",
"Number": "25"
},
{
"Name": "!",
"Number": "28"
}
]
}
filter_ = 28
filtered = {
"data": [
item for item in data["data"]
if int(item["Number"]) < filter_
]
}
print(filtered)
Basically, this iterates through data["data"], checks whether the current item's number is less than the filter (28 in this case), and adds the matching items to the list. You're left with:
{'data': [{'Name': 'Hello', 'Number': '20'}, {'Name': 'Beautiful', 'Number': '22'}, {'Name': 'World', 'Number': '25'}]}
...which should be what you need, but unformatted.
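If the formatted layout matters, json.dumps with an indent argument can pretty-print the result. A small sketch, with the filtered dict from above inlined so it runs on its own; formatted is just an illustrative name:

```python
import json

# Same content as the filtered result shown above
filtered = {"data": [
    {"Name": "Hello", "Number": "20"},
    {"Name": "Beautiful", "Number": "22"},
    {"Name": "World", "Number": "25"},
]}

# indent=4 makes json.dumps emit one key per line with 4-space nesting
formatted = json.dumps(filtered, indent=4)
print(formatted)
```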
However, for larger JSON files, you might want to look into ijson, which allows you to load json files in a memory-efficient way. Here's an example:
import ijson
import json
filter_ = 28
# ijson works best with files opened in binary mode
with open('data.json', 'rb') as file:
    items = ijson.items(file, 'data.item')
    filtered = [item for item in items if int(item["Number"]) < filter_]
with open('filtered.json', 'w') as output:
    json.dump(filtered, output, indent=2)

How to get this json specific word from this array

I have this JSON and I would like to get only the name from every array. How do I write that in Python?
Currently, I have this: li = [item.get(data_new[0]'id') for item in data_new]
where data_new is my JSON data.
[
{
"id": "1687fbfa-8936-4b77-a7bc-123f9f276c49",
"attributes": [
{
"name": "status",
"value": "rejected",
"scope": "identity"
},
{
"name": "created_ts",
"value": "2020-06-25T16:22:07.578Z",
"scope": "system"
},
{
"name": "updated_ts",
"value": "2020-07-08T12:43:09.361Z",
"scope": "system"
},
{
"name": "artifact_name",
"value": "release-v10",
"scope": "inventory"
},
{
"name": "device_type",
"value": "proddemo-device",
"scope": "inventory"
}
],
"updated_ts": "2020-07-08T12:43:09.361Z"
},
{
"id": "0bf2a1fe-6004-473f-88b7-aab061972115",
"attributes": [
{
"name": "status",
"value": "rejected",
"scope": "identity"
},
{
"name": "created_ts",
"value": "2020-07-01T16:23:00.631Z",
"scope": "system"
},
{
"name": "updated_ts",
"value": "2020-07-08T17:41:16.45Z",
"scope": "system"
},
{
"name": "artifact_name",
"value": "Module_logs_v7",
"scope": "inventory"
},
{
"name": "cpu_model",
"value": "ARMv8 Processor",
"scope": "inventory"
},
{
"name": "device_type",
"value": "device",
"scope": "inventory"
},
{
"name": "hostname",
"value": "device004",
"scope": "inventory"
},
{
"name": "ipv4_br-d6eae8b3a339",
"value": "172.0.0.1/18",
"scope": "inventory"
}
],
"updated_ts": "2020-07-08T12:43:09.361Z"
}
]
This is an output snippet from my API. From this output I want to retrieve the value of the attribute whose name is hostname; as you can see, that is the second-to-last entry, where "name": "hostname".
So I want to retrieve the value only for the entry where the name is "hostname". How can I do that?
Please guide me through.
a = [{'id': '291ae0e5956c69c2267489213df4459d19ed48a806603def19d417d004a4b67e',
'attributes': [{'name': 'ip_addr',
'value': '1.2.3.4',
'descriptionName': 'IP address'},
{'name': 'ports', 'value': ['8080', '8081'], 'description': 'Open ports'}],
'updated_ts': '2016-10-03T16:58:51.639Z'},
{'id': '76f40e5956c699e327489213df4459d1923e1a806603def19d417d004a4a3ef',
'attributes': [{'name': 'mac',
'value': '00:01:02:03:04:05',
'descriptionName': 'MAC address'}],
'updated_ts': '2016-10-04T18:24:21.432Z'}]
descriptionName = []
for i in a:
    for j in i["attributes"]:
        for k in j:
            if k == "descriptionName":
                descriptionName.append(j[k])
One liner:
[j["descriptionName"] for i in a for j in i["attributes"] if "descriptionName" in j]
Output:
['IP address', 'MAC address']
Update 1:
To get all names (with a set to the list from the question):
One liner code -
[j["name"] for i in a for j in i["attributes"] if "name" in j.keys()]
Output:
['status',
 'created_ts',
 'updated_ts',
 'artifact_name',
 'device_type',
 'status',
 'created_ts',
 'updated_ts',
 'artifact_name',
 'cpu_model',
 'device_type',
 'hostname',
 'ipv4_br-d6eae8b3a339']
To get the value for which name is "hostname":
[j["value"] for i in a for j in i["attributes"] if "name" in j.keys() and j["name"] == "hostname"]
Output:
['device004']
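If you also need to know which device each hostname belongs to, one option is to build a dictionary keyed by device id. The sketch below uses a shortened stand-in for the API output (ids truncated, most attributes dropped), and attribute_value is a hypothetical helper name:

```python
# Shortened stand-in for the API output in the question (ids truncated for brevity)
devices = [
    {"id": "1687fbfa", "attributes": [
        {"name": "status", "value": "rejected"},
        {"name": "device_type", "value": "proddemo-device"},
    ]},
    {"id": "0bf2a1fe", "attributes": [
        {"name": "status", "value": "rejected"},
        {"name": "hostname", "value": "device004"},
    ]},
]

def attribute_value(device, name, default=None):
    # First attribute whose "name" matches; default if the device has none
    return next(
        (attr["value"] for attr in device["attributes"] if attr.get("name") == name),
        default,
    )

hostnames = {device["id"]: attribute_value(device, "hostname") for device in devices}
print(hostnames)  # {'1687fbfa': None, '0bf2a1fe': 'device004'}
```

Devices without a hostname attribute show up with None, which makes them easy to filter out afterwards.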

How to access certain values inside a string in Python

I need to extract only a particular element value inside the string.
Below is the code which I used to get the AdsInsight data using Facebook AdsInsight API.
class LibFacebook:
    def __init__(self, app_id, app_secret, access_token, ad_account_id):
        FacebookAdsApi.init(app_id, app_secret, access_token)
        self.account = AdAccount(ad_account_id)
        #get ads insight
        insights = self.account.get_insights(fields=[
            AdsInsights.Field.campaign_id,
            AdsInsights.Field.actions,
        ], params={
            'level': AdsInsights.Level.campaign,
        })
        print(insights)
Output
<AdsInsights> {
"campaign_id": "23843294609751234",
"actions": [
{
"action_type": "post_reaction",
"value": "1"
},
{
"action_type": "landing_page_view",
"value": "78"
},
{
"action_type": "link_click",
"value": "163"
}
]
Question: Along with the campaign_id value (23843294609751234), I need only the value of landing_page_view, i.e. 78 (and not the other action items), and I want to put both in a DataFrame. How do I access them?
Further Information: AdsInsights.Field.actions is of type string.
type(AdsInsights.Field.actions)
str
Hope this will work.
Let's assume your data is a list of AdsInsights objects:
obj = [{
"campaign_id": "23843294609751234",
"actions" : [
{
"action_type": "post_reaction",
"value": "1"
},
{
"action_type": "landing_page_view",
"value": "78"
},
{
"action_type": "link_click",
"value": "163"
}
]
},
{
"campaign_id": "112233",
"actions" : [
{
"action_type": "post_reaction",
"value": "1"
},
{
"action_type": "landing_page_view",
"value": "100"
},
{
"action_type": "link_click",
"value": "163"
}
]
}]
You can get the result like this:
result_arr = []
for i in obj:
    datadict = {}
    datadict["campaign_id"] = i.get("campaign_id")
    for action in i.get("actions"):
        if action.get("action_type") == "landing_page_view":
            datadict["value"] = action.get("value")
    result_arr.append(datadict)
result_arr would be
[{'campaign_id': '23843294609751234', 'value': '78'},
{'campaign_id': '112233', 'value': '100'}]
Next, convert the list of dictionaries to a DataFrame:
import pandas as pd
df = pd.DataFrame(result_arr)

Is there a more efficient way to get the result (O(n+m) rather than O(n*m))?

The original data is shown below. Every item has a type field, such as interests, family, behaviors, etc., and I want to group the items by this type field.
return_data = [
{
"id": "112",
"name": "name_112",
"type": "interests",
},
{
"id": "113",
"name": "name_113",
"type": "interests",
},
{
"id": "114",
"name": "name_114",
"type": "interests",
},
{
"id": "115",
"name": "name_115",
"type": "behaviors",
},
{
"id": "116",
"name": "name_116",
"type": "family",
},
{
"id": "117",
"name": "name_117",
"type": "interests",
},
...
]
And the expected output data format is:
output_data = [
{"interests":[
{
"id": "112",
"name": "name_112"
},
{
"id": "113",
"name": "name_113"
},
...
]
},
{
"behaviors": [
{
"id": "115",
"name": "name_115"
},
...
]
},
{
"family": [
{
"id": "116",
"name": "name_116"
},
...
]
},
...
]
And here is my attempt:
type_list = []
for item in return_data:
    if item['type'] not in type_list:
        type_list.append(item['type'])
interests_list = []
for type in type_list:
    temp_list = []
    for item in return_data:
        if item['type'] == type:
            temp_list.append({"id": item['id'], "name": item['name']})
    interests_list.append({type: temp_list})
Obviously my attempt is inefficient, as it is O(n*m), but I cannot find a more efficient way to solve the problem.
Is there a more efficient way to get the result? Any comments are welcome, thanks.
Use a defaultdict to store a list of items for each type:
from collections import defaultdict
# group by type
temp_dict = defaultdict(list)
for item in return_data:
    temp_dict[item["type"]].append({"id": item["id"], "name": item["name"]})
# convert back into a list with the desired format
output_data = [{k: v} for k, v in temp_dict.items()]
Output:
[
{
'behaviors': [
{'name': 'name_115', 'id': '115'}
]
},
{
'family': [
{'name': 'name_116', 'id': '116'}
]
},
{
'interests': [
{'name': 'name_112', 'id': '112'},
{'name': 'name_113', 'id': '113'},
{'name': 'name_114', 'id': '114'},
{'name': 'name_117', 'id': '117'}
]
},
...
]
If you don't want to import defaultdict, you could use a vanilla dictionary with setdefault:
# temp_dict = {}
temp_dict.setdefault(item["type"], []).append(...)
This behaves in exactly the same way, if a little less efficiently.
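Spelled out in full, the setdefault variant might look like the sketch below. The three-item return_data is a shortened stand-in for the list in the question:

```python
# Shortened stand-in for return_data from the question
return_data = [
    {"id": "112", "name": "name_112", "type": "interests"},
    {"id": "115", "name": "name_115", "type": "behaviors"},
    {"id": "113", "name": "name_113", "type": "interests"},
]

# Single pass over the data: setdefault creates the list the first time
# a type is seen and returns the existing list afterwards
temp_dict = {}
for item in return_data:
    temp_dict.setdefault(item["type"], []).append(
        {"id": item["id"], "name": item["name"]}
    )

# Convert back into a list of single-key dicts, as in the desired format
output_data = [{k: v} for k, v in temp_dict.items()]
print(output_data)
```

On Python 3.7 and later, dictionaries keep insertion order, so the groups come out in the order each type was first seen.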
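Please see how a Python dictionary can be used as a map.
typeMap, delimiter = {}, ","  # assumed starting values; not given in the original answer
for item in return_data:
    typeMap[item['type']] = typeMap.get(item['type'], '') + delimiter + item['name']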

Create dynamic json object in python

I have a dictionary that contains multiple keys and values, and the values themselves contain key-value pairs. I don't understand how to create dynamic JSON from this dictionary in Python. Here's the dictionary:
image_dict = {"IMAGE_1":{"img0":"IMAGE_2","img1":"IMAGE_3","img2":"IMAGE_4"},"IMAGE_2":{"img0":"IMAGE_1", "img1" : "IMAGE_3"},"IMAGE_3":{"img0":"IMAGE_1", "img1":"IMAGE_2"},"IMAGE_4":{"img0":"IMAGE_1"}}
My expected result looks like this:
{
"data": [
{
"image": {
"imageId": {
"id": "IMAGE_1"
},
"link": {
"target": {
"id": "IMAGE_2"
},
"target": {
"id": "IMAGE_3"
},
"target": {
"id": "IMAGE_4"
}
}
},
"updateData": "link"
},
{
"image": {
"imageId": {
"id": "IMAGE_2"
},
"link": {
"target": {
"id": "IMAGE_1"
},
"target": {
"id": "IMAGE_3"
}
}
},
"updateData": "link"
},
{
"image": {
"imageId": {
"id": "IMAGE_3"
},
"link": {
"target": {
"id": "IMAGE_1"
},
"target": {
"id": "IMAGE_2"
}
}
},
"updateData": "link"
} ,
{
"image": {
"imageId": {
"id": "IMAGE_4"
},
"link": {
"target": {
"id": "IMAGE_1"
}
}
},
"updateData": "link"
}
]
}
I tried to solve it but I didn't get the expected result.
import json
result = {"data":[]}
for k,v in sorted(image_dict.items()):
    for a in sorted(v.values()):
        result["data"].append({"image":{"imageId":{"id": k},
                               "link":{"target":{"id": a}}},"updateData": "link"})
print(json.dumps(result, indent=4))
In Python dictionaries you can't have 2 values with the same key. So you can't have multiple targets all called "target". So you can index them. Also I don't know what this question has to do with dynamic objects but here's the code I got working:
import re
dict_res = {}
ind = 0
for image in image_dict:
    lin_ind = 0
    # Number the keys so they stay unique inside the dictionary
    sub_dict = {'image' + str(ind): {'imageId': {'id': image}, 'link': {}}}
    for sub in image_dict[image].values():
        sub_dict['image' + str(ind)]['link'].update({'target' + str(lin_ind): {'id': sub}})
        lin_ind += 1
    dict_res.update(sub_dict)
    ind += 1
# Strip the numeric suffixes from the string form so the keys read "image" and "target"
dict_res = re.sub(r'target\d', 'target', re.sub(r'image\d', 'image', str(dict_res)))
print(dict_res)
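If the consumer of the JSON can accept a slightly different shape, another option is to keep the data as real JSON and put the targets in a list instead of repeating the "target" key. This is a sketch of that alternative layout, not the exact format asked for in the question:

```python
import json

image_dict = {
    "IMAGE_1": {"img0": "IMAGE_2", "img1": "IMAGE_3", "img2": "IMAGE_4"},
    "IMAGE_2": {"img0": "IMAGE_1", "img1": "IMAGE_3"},
    "IMAGE_3": {"img0": "IMAGE_1", "img1": "IMAGE_2"},
    "IMAGE_4": {"img0": "IMAGE_1"},
}

result = {"data": []}
for image_id, links in sorted(image_dict.items()):
    result["data"].append({
        "image": {
            "imageId": {"id": image_id},
            # A list of {"target": ...} objects stands in for the repeated "target" keys
            "link": [{"target": {"id": target}} for target in sorted(links.values())],
        },
        "updateData": "link",
    })

print(json.dumps(result, indent=4))
```

Because every value is an ordinary dict or list, json.dumps produces valid JSON that any parser can read back.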
