Get property in nested JSON in Python

I am trying to get the values of the properties in JSON but I'm having a hard time fetching the ones inside an object array.
I have a function that reads a test JSON file, with these lines of code:
def get_test_body() -> str:
    directory = str(pathlib.Path(__file__).parent.parent.as_posix())
    f = open(directory + '/tests/json/test.json', "r")
    body = json.loads(f.read())
    f.close()
    return body
This is the first half of the JSON file (names modified):
"id": "112358",
"name": "test",
"source_type": "SqlServer",
"connection_string_name": "123134-SQLTest-ConnectionString",
"omg_test": "12312435-123123-41232b5-asd123-1232145",
"triggers": [
{
"frequency": "Day",
"interval": 1,
"start_time": "2019-06-17T21:37:00",
"end_time": "2019-06-18T21:37:00",
"schedule": [
{
"hours": [
2
],
"minutes": [
0
],
"week_days": [],
"month_days": [],
"monthly_occurrences": []
}
]
}
]
The triggers array has more objects nested inside it, and I couldn't figure out the syntax to reach them.
I am then able to fetch some of the top-level data using:
name = body['name']
But I couldn't fetch anything under the triggers array. I tried body['triggers']['frequency'] and even ['triggers'][0] (lol), but I couldn't get either to work. I'm fairly new to Python; any help would be appreciated!

I'm getting the right output, even with what you posted:
import json

string = """
{
    "id": "112358",
    "name": "test",
    "source_type": "SqlServer",
    "connection_string_name": "123134-SQLTest-ConnectionString",
    "omg_test": "12312435-123123-41232b5-asd123-1232145",
    "triggers": [
        {
            "frequency": "Day",
            "interval": 1,
            "start_time": "2019-06-17T21:37:00",
            "end_time": "2019-06-18T21:37:00",
            "schedule": [
                {
                    "hours": [2],
                    "minutes": [0],
                    "week_days": [],
                    "month_days": [],
                    "monthly_occurrences": []
                }
            ]
        }
    ]
}
"""
str_dict = json.loads(string)
print(str_dict["triggers"][0]["frequency"])
Giving me Day
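To go deeper (for example into schedule inside triggers), the same pattern repeats at every level: index into each list, then key into each dict. A minimal sketch using a trimmed-down copy of the JSON above:

```python
import json

body = json.loads("""
{
    "triggers": [
        {
            "frequency": "Day",
            "schedule": [
                {"hours": [2], "minutes": [0]}
            ]
        }
    ]
}
""")

# Each level of nesting needs its own index or key:
for trigger in body["triggers"]:       # list -> iterate (or index with [0])
    print(trigger["frequency"])        # dict -> key
    for sched in trigger["schedule"]:  # another list inside
        print(sched["hours"][0])       # list of ints -> index
```

The same values can be reached in one expression, e.g. body["triggers"][0]["schedule"][0]["hours"][0].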

Related

calculate json values average in python

How can I calculate in Python the average of values from a JSON file like the following example:
"items": [
{
"start": "0.6",
"end": "0.9",
"alter": [
{
"conf": "0.6",
"content": ""
}
],
"type": "pron"
},
]
import json

with open("./file.json") as f:
    dict_data = json.load(f)  # passing the file object; returns the JSON as a dictionary
confidences = [float(i['alternatives'][0]['confidence']) for i in dict_data['items']]
confidence_avg = sum(confidences) / len(confidences)
print(confidence_avg)
Output:
0.8534666666666667
For starters, your JSON file is missing the first and last curly brackets, so I've added them manually. Without them, it is not valid JSON.
Use json.loads to parse the JSON string and return a dict.
The confidence values are stored as strings, so they need to be transformed to floats.
Add them one by one and divide by the number of confidence values. In this case we assume each item has only 1.
import json

json_str = r"""{
    "items": [
        {
            "start_time": "0.0",
            "end_time": "0.46",
            "alternatives": [
                {
                    "confidence": "0.9534",
                    "content": "رسالة"
                }
            ],
            "type": "pronunciation"
        },
        {
            "start_time": "0.46",
            "end_time": "0.69",
            "alternatives": [
                {
                    "confidence": "0.6475",
                    "content": "اللغة"
                }
            ],
            "type": "pronunciation"
        },
        {
            "start_time": "0.69",
            "end_time": "1.23",
            "alternatives": [
                {
                    "confidence": "0.9595",
                    "content": "العربية"
                }
            ],
            "type": "pronunciation"
        }
    ]
}"""
items = json.loads(json_str)["items"]
average = 0
for item in items:
    confidence = float(item["alternatives"][0]["confidence"])
    average += confidence
average /= len(items)
print(average)
Output:
0.8534666666666667
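As an aside, the same average can be computed a bit more compactly with statistics.mean; a sketch over a stripped-down version of the same data (only the fields the calculation touches):

```python
import json
from statistics import mean

json_str = """{"items": [
    {"alternatives": [{"confidence": "0.9534"}]},
    {"alternatives": [{"confidence": "0.6475"}]},
    {"alternatives": [{"confidence": "0.9595"}]}
]}"""
items = json.loads(json_str)["items"]

# mean() accepts a generator, so no intermediate list is needed
avg = mean(float(item["alternatives"][0]["confidence"]) for item in items)
print(avg)
```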

Extract values from json based on select condition using python

I am trying to extract values from JSON based on a select condition using Python.
My Json file looks like below:
{
    "bindings": [
        {
            "members": [
                "user:rohithmn3#gmail.com"
            ],
            "role": "roles/browser"
        },
        {
            "members": [
                "serviceAccount:admin-user#linuxacademy-3.iam.gserviceaccount.com",
                "user:rohithmn03#gmail.com"
            ],
            "role": "roles/owner"
        },
        {
            "members": [
                "user:rohithmn3#gmail.com"
            ],
            "role": "roles/viewer"
        }
    ],
    "etag": "BwrRsH-UhJ0=",
    "version": 1
}
I am trying to parse the above file in Python based on the user. For example: get the roles defined for user rohithmn3#gmail.com; as per the JSON, the output should be:
roles/browser
roles/viewer
Regards,
Rohith
Using a list comprehension and dictionary input d:
var = 'rohithmn3#gmail.com'
res = [subd['role'] for subd in d['bindings'] if 'user:'+var in subd['members']]
print(res)
['roles/browser', 'roles/viewer']
Setup
d = {
    "bindings": [
        {
            "members": [
                "user:rohithmn3#gmail.com"
            ],
            "role": "roles/browser"
        },
        {
            "members": [
                "serviceAccount:admin-user#linuxacademy-3.iam.gserviceaccount.com",
                "user:rohithmn03#gmail.com"
            ],
            "role": "roles/owner"
        },
        {
            "members": [
                "user:rohithmn3#gmail.com"
            ],
            "role": "roles/viewer"
        }
    ],
    "etag": "BwrRsH-UhJ0=",
    "version": 1
}
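If a plain loop is easier to follow than the comprehension, the same filter can be written as (same d and lookup as in the answer, data inlined here to keep the sketch self-contained):

```python
d = {
    "bindings": [
        {"members": ["user:rohithmn3#gmail.com"], "role": "roles/browser"},
        {"members": ["serviceAccount:admin-user#linuxacademy-3.iam.gserviceaccount.com",
                     "user:rohithmn03#gmail.com"], "role": "roles/owner"},
        {"members": ["user:rohithmn3#gmail.com"], "role": "roles/viewer"},
    ],
}

var = 'rohithmn3#gmail.com'
roles = []
for binding in d['bindings']:
    # 'in' on a list checks for an exact string match among the members
    if 'user:' + var in binding['members']:
        roles.append(binding['role'])
print(roles)  # ['roles/browser', 'roles/viewer']
```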

dictionary does not give me unique Ids in python

I have the output of an elasticsearch query saved in a file. The first few lines looks like this:
{"took": 1,
"timed_out": false,
"_shards": {},
"hits": {
"total": 27,
"max_score": 6.5157733,
"hits": [
{
"_index": "dbgap_062617",
"_type": "dataset",
***"_id": "595189d15152c64c3b0adf16"***,
"_score": 6.5157733,
"_source": {
"dataAcquisition": {
"performedBy": "\n\t\tT\n\t\t"
},
"provenance": {
"ingestTime": "201",
},
"studyGroup": [
{
"Identifier": "1",
"name": "Diseas"
}
],
"license": {
"downloadURL": "http",
},
"study": {
"alternateIdentifiers": "yes",
},
"disease": {
"name": [
"Coronary Artery Disease"
]
},
"NLP_Fields": {
"CellLine": [],
"MeshID": [
"C0066533",
],
"DiseaseID": [
"C0010068"
],
"ChemicalID": [],
"Disease": [
"coronary artery disease"
],
"Chemical": [],
"Meshterm": [
"migen",
]
},
"datasetDistributions": [
{
"dateReleased": "20150312",
}
],
"dataset": {
"citations": [
"20032323"
],
**"description": "The Precoc.",**
**"title": "MIGen_ExS: PROCARDIS"**
},
.... and the list goes on with a bunch of other items ....
From all of these nodes I was interested in the unique _ids, titles, and descriptions. So, I created a dictionary and extracted the parts I was interested in using the json module. Here is my code:
import json

s = {}
d = open('local file', 'w')
with open('localfile', 'r') as ready:
    for line in ready:
        test = json.loads(line, encoding='utf-8')
        for i in (test['hits']['hits']):
            for x in i:
                s.setdefault(i['_id'], [i['_source']['dataset']['description'],
                                        i['_source']['dataset']['title']])
for k, v in s.items():
    d.write(k + '\t' + v[0] + '\t' + v[1] + '\n')
d.close()
Now, when I run it, it gives me a file with duplicated _ids! Isn't a dictionary supposed to give me unique _ids? My original output file has lots of duplicated ids that I wanted to get rid of.
Also, I ran set() on the _ids alone to count the unique ones, and it came to 138. But with the dictionary, if I remove the generated duplicate ids, it comes down to 17!
Can someone please tell me why this is happening?
If you want a unique ID and you're using a database, it will create one for you. If you're not, you'll need to generate a unique number or string. Depending on how the dictionaries are created, you could use the timestamp of when the dictionary was created, or you could use uuid.uuid4(). For more info on uuid, see the docs.
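It may also help to see that dict keys really are unique: setdefault keeps only the first value stored under a key, so duplicate lines in the output can only come from _ids that differ in some invisible way (whitespace, case) or from writing the file more than once. A small sketch with hypothetical hits (not the asker's real data):

```python
# Two hits sharing an _id: setdefault keeps the first value only.
hits = [
    {"_id": "a1", "_source": {"dataset": {"description": "first", "title": "t1"}}},
    {"_id": "a1", "_source": {"dataset": {"description": "dupe", "title": "t1"}}},
    {"_id": "b2", "_source": {"dataset": {"description": "second", "title": "t2"}}},
]

s = {}
for hit in hits:
    ds = hit["_source"]["dataset"]
    s.setdefault(hit["_id"], [ds["description"], ds["title"]])

print(len(s))      # 2 -- one entry per unique _id
print(s["a1"][0])  # first -- the duplicate was ignored
```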

Read sequence of JSON objects with spaces and commas in between with Python

I had never had any experience with parsing JSON files until last week, when I was given this task: read a 23 MB JSON file with a Python script and store some specific data to CSV. I've been searching a lot in the last few days for how to parse it and have seen different implementations, but nothing works in my case. Here is an example of the JSON objects in the file:
{
    "created": "2017-01-19T04:39:41.012",
    "expired": "2017-01-21T04:39:41.012",
    "id": "0000e0be-d2c6-4a89-ad37-8f71d0dd9e9a",
    "mixed": false,
    "pool_id": "189591",
    "reward": 0.5,
    "status": "EXPIRED",
    "task_suite_id": "f1aa98d6-ff25-4dde-81f5-2587ccbe36af",
    "tasks": [
        {
            "id": "ffbc4048-cc5a-4578-b0d9-0705a588b55d",
            "input_values": {
                "address-ru": "\u0420\u043e\u0441\u0441\u0438\u044f, \u0421\u0432\u0435\u0440\u0434\u043b\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c, \u041f\u0435\u0440\u0432\u043e\u0443\u0440\u0430\u043b\u044c\u0441\u043a, 1-\u044f \u041f\u0438\u043b\u044c\u043d\u0430\u044f \u0443\u043b\u0438\u0446\u0430",
                "company-id": "1542916387",
                "coordinates": "56.91969408920,60.03087172680",
                "country": "RU",
                "language": "RU",
                "name-ru": "\u0421\u0443\u043f\u0435\u0440\u043c\u0430\u0440\u043a\u0435\u0442",
                "org-weight": "30",
                "rubric": [
                    {
                        "name-ru": "\u0421\u0443\u043f\u0435\u0440\u043c\u0430\u0440\u043a\u0435\u0442",
                        "rubric-id": 184108079
                    }
                ]
            }
        }
    ],
    "user_id": "165684b434e6390fb8da262978601397"
},
{
    "created": "2017-02-24T16:08:10.280",
    "expired": "2017-02-26T16:08:10.280",
    "id": "0001b81e-dbcc-4de3-985d-4397b97dbffa",
    "mixed": false,
    "pool_id": "189591",
    "reward": 0.5,
    "status": "EXPIRED",
    "task_suite_id": "5dcbbd70-e570-4026-8246-a30bb462f35d",
    "tasks": [
        {
            "id": "90437e00-d15c-4679-b7be-6d3660efdbce",
            "input_values": {
                "address-ru": "\u041c\u043e\u0441\u043a\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b., \u041a\u043e\u0440\u043e\u043b\u0435\u0432, \u043c\u0438\u043a\u0440\u043e\u0440\u0430\u0439\u043e\u043d \u0412\u0430\u043b\u0435\u043d\u0442\u0438\u043d\u043e\u0432\u043a\u0430, \u0443\u043b. \u0413\u043e\u0440\u044c\u043a\u043e\u0433\u043e, 12, \u043a\u043e\u0440\u043f.\u0412",
                "company-id": "662316782",
                "coordinates": "55.915326,37.869891",
                "country": "RU",
                "language": "RU",
                "meta": [
                    {
                        "permlink-id": 1119957838
                    }
                ],
                "name-ru": "\u041d\u0435\u0430\u0442\u044d\u043b",
                "org-weight": "30",
                "rubric": [
                    {
                        "name-ru": "\u0420\u0435\u043c\u043e\u043d\u0442 \u0438\u0437\u043c\u0435\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043f\u0440\u0438\u0431\u043e\u0440\u043e\u0432",
                        "rubric-id": 184106846
                    },
                    {
                        "name-ru": "\u0412\u043e\u0434\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438, \u0433\u0430\u0437\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438, \u0442\u0435\u043f\u043b\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438",
                        "rubric-id": 184106834
                    },
                    {
                        "name-ru": "\u041e\u0442\u043e\u043f\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 \u043e\u0431\u043e\u0440\u0443\u0434\u043e\u0432\u0430\u043d\u0438\u0435 \u0438 \u0441\u0438\u0441\u0442\u0435\u043c\u044b",
                        "rubric-id": 184107475
                    }
                ]
            }
        }
    ],
    "user_id": "0ba1f0e613c9b1db5fcbddd342e44a15"
},
...and so on for several hundred thousand lines.
If I manually remove the spaces and commas between JSON objects, this code (which I found on Stack Overflow) seems to work:
import json

json_objects = []

def stream_read_json(file):
    start_pos = 0
    while True:
        try:
            obj = json.load(file)
            yield obj
            return
        except json.JSONDecodeError as e:
            file.seek(start_pos)
            json_str = file.read(e.pos)
            obj = json.loads(json_str)
            start_pos += e.pos
            yield obj

with open('task1.json', 'r') as source:
    objCount = 0
    for data in stream_read_json(source):
        json_objects.append(data)
        objCount += 1
        print('Added ' + str(objCount) + 'th json object.')
But I just can't find anywhere how to skip these spaces and commas while reading the JSON file. It is even more frustrating that I can't find any tutorial or manual on writing a JSON parser in Python for different cases, so that I could do it myself without bothering Stack Overflow.
Any hints and thoughts will be very appreciated. Thank you in advance.
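One approach worth trying (not from the code above) is json.JSONDecoder.raw_decode, which parses a single object from a string and reports the position where it stopped, so the commas and whitespace between objects can simply be skipped. A sketch, assuming the file fits in memory as in the question:

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON object from a string that contains
    several objects separated by commas and/or whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace and stray commas between objects.
        while pos < len(text) and text[pos] in ' \t\r\n,':
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns (object, index-after-object)
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = '{"id": "a"} , {"id": "b"}\n\n{"id": "c"}'
print([o["id"] for o in iter_json_objects(sample)])  # ['a', 'b', 'c']
```

For the 23 MB file in the question, reading the whole file with open(...).read() and feeding it to this generator should be fine; only much larger files would need true chunked streaming.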

API Nested JSON Response TO CSV

I am trying to convert a Nested JSON Response to CSV. Following is the JSON Response
{
    "rows": [
        [
            {
                "postId": 188365573,
                "messageId": 198365562,
                "accountId": 214,
                "messageType": 2,
                "channelType": "TWITTER",
                "accountType": "TWITTER",
                "taxonomy": {
                    "campaignId": "2521_4",
                    "clientCustomProperties": {
                        "PromotionChannelAbbreviation": ["3tw"],
                        "PromotionChannels": ["Twitter"],
                        "ContentOwner": ["Audit"],
                        "Location": ["us"],
                        "Sub_Category": ["dbriefs"],
                        "ContentOwnerAbbreviation": ["aud"],
                        "PrimaryPurpose_Outcome": ["Engagement"],
                        "PrimaryPurposeOutcomeAbbv": ["eng"]
                    },
                    "partnerCustomProperties": {},
                    "tags": [],
                    "urlShortnerDomain": "2721_spr.ly"
                },
                "approval": {
                    "approvalOption": "NONE",
                    "comment": ""
                },
                "status": "SENT",
                "createdDate": 1433331585000,
                "scheduleDate": 1435783440000,
                "version": 4,
                "deleted": false,
                "publishedDate": 1435783441000,
                "statusID": "6163465412728176",
                "permalink": "https://twitter.com/Acctg/status/916346541272498176",
                "additional": {
                    "links": []
                }
            },
            0
        ],
        [
            {
                "postId": 999145171,
                "messageId": 109145169,
                "accountId": 21388,
                "messageType": 2,
                "channelType": "TWITTER",
                "accountType": "TWITTER",
                "taxonomy": {
                    "campaignId": "2521_4",
                    "clientCustomProperties": {
                        "PromotionChannelAbbreviation": ["3tw"],
                        "Eminence_Registry_Number": ["1000159"],
                        "PromotionChannels": ["Twitter"],
                        "ContentOwner": ["Ctr. Health Solutions"],
                        "Location": ["us"],
                        "Sub_Category": ["fraud"],
                        "ContentOwnerAbbreviation": ["chs"],
                        "PrimaryPurpose_Outcome": ["Awareness"],
                        "PrimaryPurposeOutcomeAbbv": ["awa"]
                    },
                    "partnerCustomProperties": {},
                    "tags": [],
                    "urlShortnerDomain": "2521_spr.ly"
                },
                "approval": {
                    "approvalOption": "NONE",
                    "comment": ""
                },
                "status": "SENT",
                "createdDate": 1434983660000,
                "scheduleDate": 1435753800000,
                "version": 4,
                "deleted": false,
                "publishedDate": 1435753801000,
                "statusID": "616222222198407168",
                "permalink": "https://twitter.com/Health/status/6162222221984070968",
                "additional": {
                    "links": []
                }
            },
            0
        ]
    ]
}
And the Python code I am using to convert this is:
import json
import csv

# importing the data
with open('Post_Insights_test.json') as Test:
    data1 = json.load(Test)
# opening the csv
csvdata = open('Data_table2.csv', 'w')
csvwriter = csv.writer(csvdata, delimiter=',')
# Taking the keys out from the 1st dict, only those which aren't nested
header = data1["rows"][1][0].keys()
csvwriter.writerow(header)
for i in range(0, 70):
    csvwriter.writerow(data1["rows"][i][0].values())
csvdata.close()
The problems are the following:
1. Unable to get the keys for nested responses like taxonomy.
2. Unable to get the values for nested responses like taxonomy.
3. Many responses have different headers/keys, so ideally I should have all of them as headers in my CSV, but I can't figure out how to do that in Python.
4. My CSV shows a gap of one row after every entry; I don't know why.
Please help. All criticism is welcome. Kind regards.
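One possible way to address all four points (a sketch, with stand-in data rather than the real response): flatten each nested post into a single-level dict with dotted keys, take the union of all keys as the header, and open the CSV with newline=''; the missing newline='' argument is the usual cause of the blank row after every entry on Windows.

```python
import csv
import json

def flatten(d, prefix=""):
    """Flatten nested dicts into dotted keys; lists are JSON-encoded strings."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = json.dumps(value) if isinstance(value, list) else value
    return flat

# Stand-ins for data1["rows"][i][0]; note they have different keys.
rows = [
    {"postId": 1, "taxonomy": {"campaignId": "2521_4", "tags": []}},
    {"postId": 2, "status": "SENT"},
]

flat_rows = [flatten(r) for r in rows]
header = sorted({k for r in flat_rows for k in r})  # union of all keys

# newline='' stops csv from emitting the extra blank line on Windows
with open("Data_table2.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(flat_rows)  # missing keys are filled with restval
```

DictWriter handles the differing keys per row: any key absent from a row is written as the empty string instead of shifting the columns.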
