How can I calculate, in Python, the average of the confidence values in a JSON file like the following example:
"items": [
{
"start": "0.6",
"end": "0.9",
"alter": [
{
"conf": "0.6",
"content": ""
}
],
"type": "pron"
},
]
import json
with open("./file.json") as f:
    dict_data = json.load(f)  # json.load takes a file object and returns the parsed JSON as a dict
# Confidences are stored as strings, so convert each one to float
confidences = [float(i['alternatives'][0]['confidence']) for i in dict_data['items']]
confidence_avg = sum(confidences) / len(confidences)
print(confidence_avg)
Output:
0.8534666666666667
For starters, your JSON file is missing the first and last curly brackets, so I've added them manually. Without them, it is not valid JSON.
Use json.loads to parse the JSON string and return a dict.
The confidence values are stored as strings, so they need to be transformed to floats.
Add them up one by one and divide by the number of confidence values. In this case we assume each item has only one alternative.
import json
json_str = r"""{
"items": [
{
"start_time": "0.0",
"end_time": "0.46",
"alternatives": [
{
"confidence": "0.9534",
"content": "رسالة"
}
],
"type": "pronunciation"
},
{
"start_time": "0.46",
"end_time": "0.69",
"alternatives": [
{
"confidence": "0.6475",
"content": "اللغة"
}
],
"type": "pronunciation"
},
{
"start_time": "0.69",
"end_time": "1.23",
"alternatives": [
{
"confidence": "0.9595",
"content": "العربية"
}
],
"type": "pronunciation"
}
]
}"""
items = json.loads(json_str)["items"]
average = 0
for item in items:
    confidence = float(item["alternatives"][0]["confidence"])
    average += confidence
average /= len(items)
print(average)
Output:
0.8534666666666667
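If an item can carry more than one alternative, the same idea extends by collecting every confidence before averaging. This is only a minimal sketch, assuming the `items` / `alternatives` / `confidence` layout used in the answer above; `average_confidence` is a hypothetical helper name, and `fmean` comes from the standard-library `statistics` module (Python 3.8+):

```python
import json
from statistics import fmean

def average_confidence(json_text):
    """Average every confidence value across all items and alternatives."""
    items = json.loads(json_text)["items"]
    # Confidences are stored as strings, so convert each one to float.
    confidences = [
        float(alt["confidence"])
        for item in items
        for alt in item["alternatives"]
    ]
    if not confidences:
        raise ValueError("no confidence values found")
    return fmean(confidences)

# Made-up sample: the first item has two alternatives, the second has one.
sample = """{"items": [
    {"alternatives": [{"confidence": "0.5"}, {"confidence": "1.0"}]},
    {"alternatives": [{"confidence": "0.75"}]}
]}"""
result = average_confidence(sample)
print(result)
```

Raising on an empty list avoids a division by zero when the file contains no items.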
Related
I am trying to get the values of the properties in JSON but I'm having a hard time fetching the ones inside an object array.
I have a function that gets a test JSON which has these lines of code:
import json
import pathlib
def get_test_body() -> dict:
    directory = str(pathlib.Path(__file__).parent.parent.as_posix())
    # The with block closes the file even if parsing fails
    with open(directory + '/tests/json/test.json', "r") as f:
        body = json.loads(f.read())
    return body
This is the first half of the JSON file (modified the names):
"id": "112358",
"name": "test",
"source_type": "SqlServer",
"connection_string_name": "123134-SQLTest-ConnectionString",
"omg_test": "12312435-123123-41232b5-asd123-1232145",
"triggers": [
{
"frequency": "Day",
"interval": 1,
"start_time": "2019-06-17T21:37:00",
"end_time": "2019-06-18T21:37:00",
"schedule": [
{
"hours": [
2
],
"minutes": [
0
],
"week_days": [],
"month_days": [],
"monthly_occurrences": []
}
]
}
]
The triggers array has more objects within it, and I couldn't figure out the syntax for accessing them.
I am then able to fetch some of the data using:
name = body['name']
But I couldn't fetch anything under the triggers array. I tried using body['triggers']['frequency'] and even ['triggers'][0] (lol), but I couldn't get it to work. I'm fairly new to Python, so any help would be appreciated!
I am getting the right output, even with what you did:
import json
string = """
{
"id": "112358",
"name": "test",
"source_type": "SqlServer",
"connection_string_name": "123134-SQLTest-ConnectionString",
"omg_test": "12312435-123123-41232b5-asd123-1232145",
"triggers": [
{
"frequency": "Day",
"interval": 1,
"start_time": "2019-06-17T21:37:00",
"end_time": "2019-06-18T21:37:00",
"schedule": [
{
"hours": [
2
],
"minutes": [
0
],
"week_days": [],
"month_days": [],
"monthly_occurrences": []
}
]
}
]
}
"""
str_dict = json.loads(string)
print(str_dict["triggers"][0]["frequency"])
This gives me Day.
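The reason body['triggers']['frequency'] fails is that triggers is a list of objects, so an index (or a loop) has to come before the key lookup. A short sketch using a trimmed copy of the JSON from the question; the variable names are only illustrative:

```python
import json

body = json.loads("""
{
  "name": "test",
  "triggers": [
    {
      "frequency": "Day",
      "interval": 1,
      "schedule": [{"hours": [2], "minutes": [0]}]
    }
  ]
}
""")

# "triggers" is a list, so pick an element by index before using a key.
first_trigger = body["triggers"][0]
frequency = first_trigger["frequency"]

# Nested lists work the same way: index, then key, then index again.
first_hour = first_trigger["schedule"][0]["hours"][0]

# Looping avoids hard-coding the index when there are several triggers.
frequencies = [trigger["frequency"] for trigger in body["triggers"]]

print(frequency, first_hour, frequencies)
```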
I am trying to convert Python code to extract key/value pairs from JSON output (originating from Microsoft Form Recognizer) and cannot recreate the loop within VB.NET (UiPath).
So far I created a nested for-each loop in UiPath to loop through each key/value pair within each page.
import json
response_file = "response.json"
# Specify the list of keys to be reported.
keys = set(["Number","Opened"])
with open(response_file, mode = "r", encoding = "utf-8") as f:
    data = json.load(f)
# Loop over all pages in the document.
for page in data["pages"]:
    # Loop over all key/value pairs in the page.
    for kvp in page["keyValuePairs"]:
        key_txt = " ".join([x["text"] for x in kvp["key"]])
        # Report only the pre-specified subset of keys.
        if key_txt in keys:
            print("key: %s" % key_txt)
            vals = [x["text"] for x in kvp["value"]]
            print("value: %s" % " ".join(vals))
The JSON example I am using:
{
"status": "success",
"pages": [
{
"number": 1,
"height": 792,
"width": 612,
"clusterId": 0,
"keyValuePairs": [
{
"key": [
{
"text": "Number",
"boundingBox": [
71.6,
704.6,
109.0,
704.6,
109.0,
693.6,
71.6,
693.6
]
}
],
"value": [
{
"text": "RITM0041763",
"boundingBox": [
178.7,
704.6,
241.4,
704.6,
241.4,
693.6,
178.7,
693.6
],
"confidence": 1.0
}
]
},
{
"key": [
{
"text": "Opened",
"boundingBox": [
321.0,
704.6,
357.8,
704.6,
357.8,
693.6,
321.0,
693.6
]
}
],
"value": [
{
"text": "09/21/2018 09:04:01 AM",
"boundingBox": [
428.1,
704.6,
536.5,
704.6,
536.5,
693.6,
428.1,
693.6
],
"confidence": 1.0
}
]
The error in UiPath (running on VB.NET) is "bracketed identifier is missing closing ']'".
I have a rather weird requirement: I have the JSON below and somehow have to convert it into a flat CSV.
[
{
"authorizationQualifier": "SDA",
"authorizationInformation": " ",
"securityQualifier": "ASD",
"securityInformation": " ",
"senderQualifier": "ASDAD",
"senderId": "FADA ",
"receiverQualifier": "ADSAS",
"receiverId": "ADAD ",
"date": "140101",
"time": "0730",
"standardsId": null,
"version": "00501",
"interchangeControlNumber": "123456789",
"acknowledgmentRequested": "0",
"testIndicator": "T",
"functionalGroups": [
{
"functionalIdentifierCode": "ADSAD",
"applicationSenderCode": "ASDAD",
"applicationReceiverCode": "ADSADS",
"date": "20140101",
"time": "07294900",
"groupControlNumber": "123456789",
"responsibleAgencyCode": "X",
"version": "005010X221A1",
"transactions": [
{
"name": "ASDADAD",
"transactionSetIdentifierCode": "adADS",
"transactionSetControlNumber": "123456789",
"implementationConventionReference": null,
"segments": [
{
"BPR03": "ad",
"BPR14": "QWQWDQ",
"BPR02": "1.57",
"BPR13": "23223",
"BPR01": "sad",
"BPR12": "56",
"BPR10": "32424",
"BPR09": "12313",
"BPR08": "DA",
"BPR07": "123456789",
"BPR06": "12313",
"BPR05": "ASDADSAD",
"BPR16": "21313",
"BPR04": "SDADSAS",
"BPR15": "11212",
"id": "aDSASD"
},
{
"TRN02": "2424",
"TRN03": "35435345",
"TRN01": "3435345",
"id": "FSDF"
},
{
"REF02": "fdsffs",
"REF01": "sfsfs",
"id": "fsfdsfd"
},
{
"DTM02": "2432424",
"id": "sfsfd",
"DTM01": "234243"
}
],
"loops": [
{
"id": "24324234234",
"segments": [
{
"N101": "sfsfsdf",
"N102": "sfsf",
"id": "dgfdgf"
},
{
"N301": "sfdssfdsfsf",
"N302": "effdssf",
"id": "fdssf"
},
{
"N401": "sdffssf",
"id": "sfds",
"N402": "sfdsf",
"N403": "23424"
},
{
"PER06": "Wsfsfdsfsf",
"PER05": "sfsf",
"PER04": "23424",
"PER03": "fdfbvcb",
"PER02": "Pedsdsf",
"PER01": "sfsfsf",
"id": "fdsdf"
}
]
},
{
"id": "2342",
"segments": [
{
"N101": "sdfsfds",
"N102": "vcbvcb",
"N103": "dsfsdfs",
"N104": "343443",
"id": "fdgfdg"
},
{
"N401": "dfsgdfg",
"id": "dfgdgdf",
"N402": "dgdgdg",
"N403": "234244"
},
{
"REF02": "23423342",
"REF01": "fsdfs",
"id": "sfdsfds"
}
]
}
]
}
]
}
]
}
]
The column header name corresponding to a deeper key-value pair may take a nested form, like functionalGroups[0].transactions[0].segments[0].BPR15.
I am able to do this in Java using this GitHub project (here you can find the output format I desire in the explanation) in one line:
flatJson = JSONFlattener.parseJson(new File("files/simple.json"), "UTF-8");
The output was:
date,securityQualifier,testIndicator,functionalGroups[1].functionalIdentifierCode,functionalGroups[1].date,functionalGroups[1].applicationReceiverCode, ...
140101,00,T,HP,20140101,ETIN,...
But I want to do this in Python. I tried as suggested in this answer:
with open('data.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data, record_prefix=True)
with open('temp2.csv', "w", newline='\n') as csv_file:
    csv_file.write(df.to_csv())
However, for the functionalGroups column, it dumps JSON as a cell value.
I also tried as suggested in this answer:
with open('data.json') as f: # this ensures opening and closing file
    a = json.loads(f.read())
df = pandas.DataFrame(a)
print(df.transpose())
But this also seems to do the same:
0
acknowledgmentRequested 0
authorizationInformation
authorizationQualifier SDA
date 140101
functionalGroups [{'functionalIdentifierCode': 'ADSAD', 'applic...
interchangeControlNumber 123456789
receiverId ADAD
receiverQualifier ADSAS
securityInformation
securityQualifier ASD
senderId FADA
senderQualifier ASDAD
standardsId None
testIndicator T
time 0730
version 00501
Is it possible to do what I want in Python?
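One way to get column names of the form functionalGroups[0].transactions[0].segments[0].BPR15 without any third-party library is a small recursive flattener combined with the standard csv module. The following is only a sketch under assumptions: `flatten` and `to_csv` are hypothetical helper names, the sample record is a cut-down version of the JSON above, and empty lists simply produce no column.

```python
import csv
import io

def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by path strings."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalar (str, number, bool, None): store it under the full path.
        flat[prefix] = value
    return flat

def to_csv(records):
    """Flatten each top-level record and return the CSV text."""
    rows = [flatten(record) for record in records]
    # Collect column names in first-seen order across all rows.
    header = []
    for row in rows:
        for key in row:
            if key not in header:
                header.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are left empty
    return buffer.getvalue()

records = [{
    "date": "140101",
    "functionalGroups": [{
        "version": "005010X221A1",
        "transactions": [{"segments": [{"BPR15": "11212"}]}],
    }],
}]
csv_text = to_csv(records)
print(csv_text)
```

For a real file, `records` would be the result of `json.load` on the opened file, and `csv_text` could be written out with a normal file write.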
I had no experience with parsing JSON files until last week, when I was given this task: read a 23 MB JSON file with a Python script and store some specific data in a CSV. I've spent the last few days searching for how to parse it and have seen different ways of doing it in Python, but nothing works in my case. Here is an example of the JSON objects in the file:
{
"created": "2017-01-19T04:39:41.012",
"expired": "2017-01-21T04:39:41.012",
"id": "0000e0be-d2c6-4a89-ad37-8f71d0dd9e9a",
"mixed": false,
"pool_id": "189591",
"reward": 0.5,
"status": "EXPIRED",
"task_suite_id": "f1aa98d6-ff25-4dde-81f5-2587ccbe36af",
"tasks": [
{
"id": "ffbc4048-cc5a-4578-b0d9-0705a588b55d",
"input_values": {
"address-ru": "\u0420\u043e\u0441\u0441\u0438\u044f, \u0421\u0432\u0435\u0440\u0434\u043b\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c, \u041f\u0435\u0440\u0432\u043e\u0443\u0440\u0430\u043b\u044c\u0441\u043a, 1-\u044f \u041f\u0438\u043b\u044c\u043d\u0430\u044f \u0443\u043b\u0438\u0446\u0430",
"company-id": "1542916387",
"coordinates": "56.91969408920,60.03087172680",
"country": "RU",
"language": "RU",
"name-ru": "\u0421\u0443\u043f\u0435\u0440\u043c\u0430\u0440\u043a\u0435\u0442",
"org-weight": "30",
"rubric": [
{
"name-ru": "\u0421\u0443\u043f\u0435\u0440\u043c\u0430\u0440\u043a\u0435\u0442",
"rubric-id": 184108079
}
]
}
}
],
"user_id": "165684b434e6390fb8da262978601397"
},
{
"created": "2017-02-24T16:08:10.280",
"expired": "2017-02-26T16:08:10.280",
"id": "0001b81e-dbcc-4de3-985d-4397b97dbffa",
"mixed": false,
"pool_id": "189591",
"reward": 0.5,
"status": "EXPIRED",
"task_suite_id": "5dcbbd70-e570-4026-8246-a30bb462f35d",
"tasks": [
{
"id": "90437e00-d15c-4679-b7be-6d3660efdbce",
"input_values": {
"address-ru": "\u041c\u043e\u0441\u043a\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b., \u041a\u043e\u0440\u043e\u043b\u0435\u0432, \u043c\u0438\u043a\u0440\u043e\u0440\u0430\u0439\u043e\u043d \u0412\u0430\u043b\u0435\u043d\u0442\u0438\u043d\u043e\u0432\u043a\u0430, \u0443\u043b. \u0413\u043e\u0440\u044c\u043a\u043e\u0433\u043e, 12, \u043a\u043e\u0440\u043f.\u0412",
"company-id": "662316782",
"coordinates": "55.915326,37.869891",
"country": "RU",
"language": "RU",
"meta": [
{
"permlink-id": 1119957838
}
],
"name-ru": "\u041d\u0435\u0430\u0442\u044d\u043b",
"org-weight": "30",
"rubric": [
{
"name-ru": "\u0420\u0435\u043c\u043e\u043d\u0442 \u0438\u0437\u043c\u0435\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043f\u0440\u0438\u0431\u043e\u0440\u043e\u0432",
"rubric-id": 184106846
},
{
"name-ru": "\u0412\u043e\u0434\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438, \u0433\u0430\u0437\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438, \u0442\u0435\u043f\u043b\u043e\u0441\u0447\u0435\u0442\u0447\u0438\u043a\u0438",
"rubric-id": 184106834
},
{
"name-ru": "\u041e\u0442\u043e\u043f\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 \u043e\u0431\u043e\u0440\u0443\u0434\u043e\u0432\u0430\u043d\u0438\u0435 \u0438 \u0441\u0438\u0441\u0442\u0435\u043c\u044b",
"rubric-id": 184107475
}
]
}
}
],
"user_id": "0ba1f0e613c9b1db5fcbddd342e44a15"
},
...and so on for several hundred thousand lines.
If I manually remove the spaces and commas between the JSON objects, this code (which I found on Stack Overflow) seems to work:
import json
json_objects = []
def stream_read_json(file):
    start_pos = 0
    while True:
        try:
            obj = json.load(file)
            yield obj
            return
        except json.JSONDecodeError as e:
            file.seek(start_pos)
            json_str = file.read(e.pos)
            obj = json.loads(json_str)
            start_pos += e.pos
            yield obj
with open('task1.json', 'r') as source:
    objCount = 0
    for data in stream_read_json(source):
        json_objects.append(data)
        objCount += 1
        print('Added ' + str(objCount) + 'th json object.')
But I just can't find anywhere how to get rid of these spaces and commas while reading the JSON file. It is even more frustrating that I can't find any tutorial or manual on how to write a JSON parser in Python for different cases, so that I could do it myself without bothering Stack Overflow.
Any hints and thoughts will be very appreciated. Thank you in advance.
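One possible approach that avoids editing the file by hand is `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and returns the value together with the index where it stopped. The sketch below assumes the whole file fits in memory (23 MB should) and that objects are separated only by whitespace and commas; `iter_json_objects` is a hypothetical helper name and the sample text is made up.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from text, skipping separators."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # Skip whitespace and commas between consecutive objects.
        while pos < length and (text[pos].isspace() or text[pos] == ","):
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = """{
    "id": "0000e0be",
    "reward": 0.5
},
{
    "id": "0001b81e",
    "reward": 0.5
},
"""
objects = list(iter_json_objects(sample))
print(len(objects), [obj["id"] for obj in objects])
```

With a real file, `text` would come from `open('task1.json', encoding='utf-8').read()`. Because the helper is a generator, each object can be processed and written to the CSV as it is produced rather than collected in a list.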
I have data in JSON format:
data = {"outfit":{"shirt":"red","pants":{"jeans":"blue","trousers":"khaki"}}}
I'm attempting to plot this data into a decision tree using InfoVis, because it looks pretty and interactive. The problem is that their graph takes JSON data in this format:
data = {id:"nodeOutfit",
    name:"outfit",
    data:{},
    children:[{
        id:"nodeShirt",
        name:"shirt",
        data:{},
        children:[{
            id:"nodeRed",
            name:"red",
            data:{},
            children:[]
        }]
    }, {
        id:"nodePants",
        name:"pants",
        data:{},
        children:[{
            id:"nodeJeans",
            name:"jeans",
            data:{},
            children:[{
                id:"nodeBlue",
                name:"blue",
                data:{},
                children:[]
            }]
        }, {
            id:"nodeTrousers",
            name:"trousers",
            data:{},
            children:[{
                id:"nodeKhaki",
                name:"khaki",
                data:{},
                children:[]
            }]
        }]
    }]
}
Note the addition of 'id', 'data' and 'children' to every key and value and calling every key and value 'name'. I feel like I have to write a recursive function to add these extra values. Is there an easy way to do this?
Here's what I want to do but I'm not sure if it's the right way. Loop through all the keys and values and replace them with the appropriate:
for name, list in data.iteritems():
    for dict in list:
        for key, value in dict.items():
            #Need something here which changes the value for each key and values
            #Not sure about the syntax to change "outfit" to name:"outfit" as well as
            #adding id:"nodeOutfit", data:{}, and 'children' before the value
Let me know if I'm way off.
Here is their example http://philogb.github.com/jit/static/v20/Jit/Examples/Spacetree/example1.html
And here's the data http://philogb.github.com/jit/static/v20/Jit/Examples/Spacetree/example1.code.html
A simple recursive solution:
import json
from collections import OrderedDict
data = {"outfit":{"shirt":"red","pants":{"jeans":"blue","trousers":"khaki"}}}
def node(name, children):
    # Build one node; OrderedDict keeps the keys in insertion order
    n = OrderedDict()
    n['id'] = 'node' + name.capitalize()
    n['name'] = name
    n['data'] = {}
    n['children'] = children
    return n
def convert(d):
    # A dict becomes a list of nodes, one per key; any other value becomes a single leaf node
    if isinstance(d, dict):
        return [node(k, convert(v)) for k, v in d.items()]
    else:
        return [node(d, [])]
print(json.dumps(convert(data), indent=1))
Note that convert returns a list, not a dict, because data could have more than one top-level key besides 'outfit'.
Output:
[
{
"id": "nodeOutfit",
"name": "outfit",
"data": {},
"children": [
{
"id": "nodeShirt",
"name": "shirt",
"data": {},
"children": [
{
"id": "nodeRed",
"name": "red",
"data": {},
"children": []
}
]
},
{
"id": "nodePants",
"name": "pants",
"data": {},
"children": [
{
"id": "nodeJeans",
"name": "jeans",
"data": {},
"children": [
{
"id": "nodeBlue",
"name": "blue",
"data": {},
"children": []
}
]
},
{
"id": "nodeTrousers",
"name": "trousers",
"data": {},
"children": [
{
"id": "nodeKhaki",
"name": "khaki",
"data": {},
"children": []
}
]
}
]
}
]
}
]