Finding specific values in nested json using Python - python

I have a file containing a large number of nested json objects. I pasted a snippet of it below. I am trying to use python to query all of the objects in the file to pull out those objects that have at least one custom feeds - url value that begins with "http://commshare" Some objects will not have any custom feeds, and the others will have one or more custom feed each of which might or might not begin with that string I am searching for. Any help would be appreciated! I am very new to Python.
Example JSON:
[{
"empid": "12345",
"values": {
"custom_feeds": {
"custom_feeds": [
{
"name": "Bulletins",
"url": "http://infoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
]
},
"gadgetTitle": "InfoSec Updates",
"newWindow": false,
"article_limit_value": 10,
"show_source": true
}
},
{
"empid": "23456",
"values": {
"custom_feeds": {
"custom_feeds": [
{
"name": "1 News",
"url": "http://blogs.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
{
"name": "2 News",
"url": "http://info.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
{
"name": "3 News",
"url": "http://blogs.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
{
"name": "4 News",
"url": "http://commshare.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
]
},
"gadgetTitle": "Org News",
"newWindow": false,
"article_limit_value": 10,
"show_source": true
}
}, {
"empid": "34567",
"values": {
"custom_feeds": {
"custom_feeds": []
},
"gadgetTitle": "Org News",
"newWindow": false,
"article_limit_value": 10,
"show_source": true
}
}]

Assuming your file is named input.json and you want the object for each feed, you could parse the JSON and create a new list where the feeds meet your criteria using list comprehension:
import json
with open('input.json') as input_file:
items = json.loads(input_file.read())
feeds = [{'name': feed['name'], 'url': feed['url'], 'empid': item['empid']}
for item in items
for feed in item['values']['custom_feeds']['custom_feeds']
if feed['url'].startswith('http://commshare')]
assert feeds == [{'name': '4 News', 'url': 'http://commshare.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'empid': '23456'}]

Related

replace nested document array mongodb with python

i have this document in mongodb
{
"_id": {
"$oid": "62644af0368cb0a46d7c2a95"
},
"insertionData": "23/04/2022 19:50:50",
"ipfsMetadata": {
"Name": "data.json",
"Hash": "Qmb3FWgyJHzJA7WCBX1phgkV93GiEQ9UDWUYffDqUCbe7E",
"Size": "431"
},
"metadata": {
"sessionDate": "20220415 17:42:55",
"dataSender": "user345",
"data": {
"height": "180",
"weight": "80"
},
"addtionalInformation": [
{
"name": "poolsize",
"value": "30m"
},
{
"name": "swimStyle",
"value": "mariposa"
},
{
"name": "modality",
"value": "swim"
},
{
"name": "gender-title",
"value": "schoolA"
}
]
},
"fileId": {
"$numberLong": "4"
}
}
I want to update nested array document, for instance the name with gender-tittle. This have value schoolA and i want to change to adult like the body. I give the parameter number of fileId in the post request and in body i pass this
post request : localhost/sessionUpdate/4
and body:
{
"name": "gender-title",
"value": "adultos"
}
flask
#app.route('/sessionUpdate/<string:a>', methods=['PUT'])
def sessionUpdate(a):
datas=request.json
r=str(datas['name'])
r2=str(datas['value'])
print(r,r2)
r3=collection.update_one({'fileId':a, 'metadata.addtionalInformation':r}, {'$set':{'metadata.addtionalInformation.$.value':r2}})
return str(r3),200
i'm getting the 200 but the document don't update with the new value.
As you are using positional operator $ to work with your array, make sure your select query is targeting array element. You can see in below query that it is targeting metadata.addtionalInformation array with the condition that name: "gender-title"
db.collection.update({
"fileId": 4,
"metadata.addtionalInformation.name": "gender-title"
},
{
"$set": {
"metadata.addtionalInformation.$.value": "junior"
}
})
Here is the Mongo playground for your reference.

Referring to parts of large JSON files

I am currently trying to have python parse JSON similar to the one at https://petition.parliament.uk/petitions/560216.json.
My problem is that the data I need is nested in a lot of parts and I don't know how to tell python which part to take.
A simplified version of the data I need is below
{
"data": {
"attributes": {
"signatures_by_country": [
{
"name": "Afghanistan",
"code": "AF",
"signature_count": 1
},
{
"name": "Algeria",
"code": "DZ",
"signature_count": 2
},
]
}
}
}
I am trying to pull the "signature_count" part.
The below code collect what you have asked to a list
data = {
"data": {
"attributes": {
"signatures_by_country": [
{
"name": "Afghanistan",
"code": "AF",
"signature_count": 1
},
{
"name": "Algeria",
"code": "DZ",
"signature_count": 2
},
]
}
}
}
counts = [x['signature_count'] for x in data['data']['attributes']['signatures_by_country']]
print(counts)
output
[1,2]
Count by country below
counts = [{x['name']:x['signature_count']} for x in data['data']['attributes']['signatures_by_country']]
output
[{'Afghanistan': 1}, {'Algeria': 2}]

Getting the specified data from the json format using python?

I am having a json data like this
and I want the specific data from this json format based on the condition given below
{
"ok": true,
"members": [
{
"id": "W012A3CDE",
"team_id": "T012AB3C4",
"name": "spengler",
"deleted": false,
"color": "9f69e7",
"real_name": "spengler",
"tz": "America/Los_Angeles",
"tz_label": "Pacific Daylight Time",
"tz_offset": -25200,
"profile": {
"avatar_hash": "ge3b51ca72de",
"status_text": "Print is dead",
"status_emoji": ":books:",
}
},
{
"id": "W07QCRPA4",
"team_id": "T0G9PQBBK",
"name": "glinda",
"deleted": false,
"color": "9f69e7",
"real_name": "Glinda Southgood",
"tz": "America/Los_Angeles",
"tz_label": "Pacific Daylight Time",
"tz_offset": -25200,
"profile": {
"phone": "",
"skype": "",
"real_name": "Glinda Southgood",
"real_name_normalized": "Glinda Southgood",
"display_name": "Glinda the Fairly Good",
"display_name_normalized": "Glinda the Fairly Good",
"email": "glenda#south.oz.coven"
},
}
],
"cache_ts": 1498777272,
"response_metadata": {
"next_cursor": "dXNlcjpVMEc5V0ZYTlo="
}
}
now I want to get only name and id from the members where deleted:false from this json format using python Can anyone help me
Just a simple list comprehension should be enough assuming you've managed to load the json into a dict.
response = json.loads(<your_json_string_here>)
undeleted_members = [dict(id=member['id'], name=member['name']) for member in response['members'] if not member['deleted']]
print(undeleted_members)
Returns a list of dicts with just the name & ID like:
{'id': 'W07QCRPA4', 'name': 'glinda'}
Or if you want separate lists for IDs & names:
all_ids = [member['id'] for member in response['members'] if not member['deleted']]
all_names = [member['name'] for member in response['members'] if not member['deleted']]

How can I extract data from json

I want to extract code from JSON format.
import json
json_data = '''
{
"Body": {
"stkCallback": {
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1.0
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
}
}
'''
json_da = data['Body']
list_data = data['Body']['MpesaReceiptNumber']
print (json_da)
print (list_data)
I want to print this: NBN52K8A1J
The problem is that you have a list of dicts you need to search first:
>>> for obj in data['Body']['stkCallback']['CallbackMetadata']['Item']:
... print(obj)
...
{'Name': 'Amount', 'Value': 1.0}
{'Name': 'MpesaReceiptNumber', 'Value': 'NBN52K8A1J'}
{'Name': 'Balance'}
{'Name': 'TransactionDate', 'Value': 20190223144807}
{'Name': 'PhoneNumber', 'Value': 254725696042}
One possibility is
>>> [x['Value'] for x in data['Body']['stkCallback']['CallbackMetadata']['Item'] if x['Name'] == 'MpesaReceiptNumber'][0]
'NBN52K8A1J'
Just use the library json. Then you can print its inner elements
import json
json_data = '{"Body":{"stkCallback":{"MerchantRequestID":"22531-976234-1","CheckoutRequestID":"ws_CO_DMZ_250600506_23022019144745852","ResultCode":0,"ResultDesc":"The service request is processed successfully.","CallbackMetadata":{"Item":[{"Name":"Amount","Value":1.00},{"Name":"MpesaReceiptNumber","Value":"NBN52K8A1J"},{"Name":"Balance"},{"Name":"TransactionDate","Value":20190223144807},{"Name":"PhoneNumber","Value":254725696042}]}}}}'
a = json.loads(json_data)
print(a["Body"]["stkCallback"]["CallbackMetadata"]["Item"][1]["Value"])
You were almost there just have to get the the key value pair itself from the dicts and check if it is the name you wanted:
data = json.loads(json_data)
list_data = data['Body']["stkCallback"]['CallbackMetadata']['Item']
var: str
for x in list_data:
if x['Name'] == 'MpesaReceiptNumber':
var = x['Value']
break
print(var)
You can use this in the future easily by replacing the if check with the name of something else so you can grab the value depending on a variable.
I find using pprint to get the shape of the data structure is helpful when you're learning how to navigate it all out.
import json
import pprint
json_data = '{"Body":{"stkCallback":{"MerchantRequestID":"22531-976234-1","CheckoutRequestID":"ws_CO_DMZ_250600506_23022019144745852","ResultCode":0,"ResultDesc":"The service request is processed successfully.","CallbackMetadata":{"Item":[{"Name":"Amount","Value":1.00},{"Name":"MpesaReceiptNumber","Value":"NBN52K8A1J"},{"Name":"Balance"},{"Name":"TransactionDate","Value":20190223144807},{"Name":"PhoneNumber","Value":254725696042}]}}}}'
data = json.loads(json_data)
pprint.pprint(data)
Results in:
{'Body': {'stkCallback': {'CallbackMetadata': {'Item': [{'Name': 'Amount', 'Value': 1.0},
{'Name': 'MpesaReceiptNumber', 'Value': 'NBN52K8A1J'},
{'Name': 'Balance'},
{'Name': 'TransactionDate', 'Value': 20190223144807},
{'Name': 'PhoneNumber', 'Value': 254725696042}]},
'CheckoutRequestID': 'ws_CO_DMZ_250600506_23022019144745852',
'MerchantRequestID': '22531-976234-1',
'ResultCode': 0,
'ResultDesc': 'The service request is processed successfully.'}}}
So you should be able to see that data["Body"]["stkCallback"]["CallbackMetadata"]["Item"] gets to to the depth you need for your data.
>>> pprint.pprint(data["Body"]["stkCallback"]["CallbackMetadata"]["Item"])
[{'Name': 'Amount', 'Value': 1.0},
{'Name': 'MpesaReceiptNumber', 'Value': 'NBN52K8A1J'},
{'Name': 'Balance'},
{'Name': 'TransactionDate', 'Value': 20190223144807},
{'Name': 'PhoneNumber', 'Value': 254725696042}]
So next you need to iterate through that list and find a match (if one exists) for the MpesaReceiptNumber key.
receipt_no = None
for item in data["Body"]["stkCallback"]["CallbackMetadata"]["Item"]:
if item.get('Name') == 'MpesaReceiptNumber':
receipt_no = item.get('Value')
print(f"The receipt # is: {receipt_no}")
If you parse the json you will notice that the path to the data is not simply ['Body']['MpesaReceiptNumber']. In fact you have a list of dicts inside ['Item'] that needs to be searched.
Parsed data tree
One suggestion is to run the following code to find the data you are looking for:
import json
json_data = '{"Body":{"stkCallback":{"MerchantRequestID":"22531-976234-1","CheckoutRequestID":"ws_CO_DMZ_250600506_23022019144745852","ResultCode":0,"ResultDesc":"The service request is processed successfully.","CallbackMetadata":{"Item":[{"Name":"Amount","Value":1.00},{"Name":"MpesaReceiptNumber","Value":"NBN52K8A1J"},{"Name":"Balance"},{"Name":"TransactionDate","Value":20190223144807},{"Name":"PhoneNumber","Value":254725696042}]}}}}'
data = (json.loads(json_data))
list_data = data['Body']['stkCallback']['CallbackMetadata']['Item']
# Returns:
# [{'Name': 'Amount', 'Value': 1.0}, {'Name': 'MpesaReceiptNumber', 'Value':'NBN52K8A1J'}, {'Name': 'Balance'}, {'Name': 'TransactionDate', 'Value': 20190223144807}, {'Name': 'PhoneNumber', 'Value': 254725696042}]
# Now find Name: 'MpesaReceiptNumber' inside the dict list
find_it = next(item for item in list_data if item["Name"] == "MpesaReceiptNumber")
find_it = find_it['Value']
print (find_it)
Result
NBN52K8A1J
Use jq.
First off, it can "pretty print" any JSON data.
Put the value of json_data into a file test.json, and then show the formatted output of the JSON data with:
$ jq <test.json
{
"Body": {
"stkCallback": {
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
}
}
Next, to extract values, a selector path needs to be given on the jq command line:
jq '.Body.stkCallback.CallbackMetadata.Item|.[]|select(.Name == "MpesaReceiptNumber")|.Value' test.json
"NBN52K8A1J"
Now to make this sequence easier to understand, let's break it down component by component.
To extract and return only the .Body:
$ jq '.Body' <test.json
{
"stkCallback": {
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
}
Now let's fetch the stkCallback component:
$ jq '.Body.stkCallback' <test.json
{
"MerchantRequestID": "22531-976234-1",
"CheckoutRequestID": "ws_CO_DMZ_250600506_23022019144745852",
"ResultCode": 0,
"ResultDesc": "The service request is processed successfully.",
"CallbackMetadata": {
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
}
Ok, now the callbackMetadata:
$ jq '.Body.stkCallback.CallbackMetadata' <test.json
{
"Item": [
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
}
Next, the Item part:
$ jq '.Body.stkCallback.CallbackMetadata.Item' <test.json
[
{
"Name": "Amount",
"Value": 1
},
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
},
{
"Name": "Balance"
},
{
"Name": "TransactionDate",
"Value": 20190223144807
},
{
"Name": "PhoneNumber",
"Value": 254725696042
}
]
Notice that the result is an array of list items? Let's filter the data out of the array:
$ jq '.Body.stkCallback.CallbackMetadata.Item|.[]' <test.json
{
"Name": "Amount",
"Value": 1
}
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
}
{
"Name": "Balance"
}
{
"Name": "TransactionDate",
"Value": 20190223144807
}
{
"Name": "PhoneNumber",
"Value": 254725696042
}
Now the result is just the list of tuples, each with a "Name" and "Value". So, let's select just the one we (you) wanted:
$ jq '.Body.stkCallback.CallbackMetadata.Item|.[]|select(.Name == "MpesaReceiptNumber")' <test.json
{
"Name": "MpesaReceiptNumber",
"Value": "NBN52K8A1J"
}
Cool. We've got the tuple we wanted. Let's extract just the value now:
$ jq '.Body.stkCallback.CallbackMetadata.Item|.[]|select(.Name == "MpesaReceiptNumber")|.Value' <test.json
"NBN52K8A1J"
And, there you go.
Simplest way to get the value associated with a specific CallbackMetadata item name:
json_string = '''
{
"Body": {
"stkCallback": {
...
}
'''
json_data = json.loads(json_string)
for item in json_data["Body"]["stkCallback"]["CallbackMetadata"]["Item"]:
if item["Name"] == "MpesaReceiptNumber":
print(item["Value"]) # -> NBN52K8A1J

Accessing nested value in loop in json using python

I want to fetch the value of each api3 in this json object where each array has api3 value.
{
"count": 10,
"result": [
{
"type": "year",
"year": {
"month": {
"api1": {
"href": "https://Ap1.com"
},
"api2": {
"href": "FETCH-CONTENT"
},
"api3": {
"href": "https://Ap3.com"
},
"api4": {
"href": "https://Ap4.com"
}
},
"id": "sdvnkjsnvj",
"summary": "summeryc",
"type": "REST",
"apiId": "mlksmfmksdfs",
"idProvider": {
"id": "sfsmkfmskf",
"name": "Apikey"
},
"tags": []
}
},
{
"type": "year1",
"year": {
"month": {
"api1": {
"href": "https://Ap11.com"
},
"api2": {
"href": "FETCH-CONTENT-1"
},
"api3": {
"href": "https://Ap13.com"
},
"api4": {
"href": "https://Ap14.com"
}
},
"id": "sdvnkjsnvj",
"summary": "summeryc",
"type": "REST",
"apiId": "mlksmfmksdfs",
"idProvider": {
"id": "sfsmkfmskf",
"name": "Apikey"
},
"tags": []
}
},
I am able to get the whole json object and first value inside it.
with open('C:\python\examplee.json','r+') as fr:
data = json.load(fr)
print(data["result"])
Thank you in advance for helping me figuring this.
For each element in list of result key, get the value for the nested dictionary within item
print([item['year']['month']['api3'] for item in data['result']])
The output will be [{'href': 'https://Ap3.com'}, {'href': 'https://Ap13.com'}]
Or if you want to get the href value as well
print([item['year']['month']['api3']['href'] for item in data['result']])
The output will be
['https://Ap3.com', 'https://Ap13.com']
So your whole code will look like
data = {}
with open('C:\python\examplee.json','r+') as fr:
data = json.load(fr)
print([item['year']['month']['api3']['href'] for item in dct['result']])
Looks like your JSON schema is static so you can just use this:
print([x['year']['month']['api3']['href'] for x in data['result']])
will return you:
['https://Ap3.com', 'https://Ap13.com']

Categories