Parse a file with JSON objects in Python

Parse a file with JSON objects in Python - python

I have a file with this type of structure:
{
"key" : "A",
"description" : "1",
"uninterestingInformation" : "whatever"
}
{
"key" : "B",
"description" : "2",
"uninterestingInformation" : "whatever"
}
{
"key" : "C",
"description" : "3",
"uninterestingInformation" : "whatever"
}
I want to build a dictionary in Python that contains the key as key and the description as value. I have more fields, but just the 2 of them are interesting for me.
This file is not exactly a .json file, is a file with a lot of similar json objects.
json.loads is not working, obviously.
Any suggestion on how to read the data?
I've already read this post, but my json object is not on one line...
EDIT:
If it wasn't clear in my explanations, the example is quite accurate, I have a lot of similar JSON objects, one after another, separated by new line (\n), with no comma. So, overall the file is not a valid JSON file, while each object is a valid JSON object.
The solution I've applied finally was:
api_key_file = open('mongo-config.json').read()
api_key_file = '[' + api_key_file + ']'
api_key_file= api_key_file.replace("}\n{", "},\n{")
api_key_data = json.loads(api_key_file)
api_key_description = {}
for data in api_key_data:
api_key_description[data['apiKey']] = data['description']
It worked well for my situation. There are maybe better ways of doing this explained in the comments bellow.

Another option would be to use the literal_eval function from the ast module, after making the necessary changes so that it fits the format of a valid type:
from ast import literal_eval
inJson = '''{
"key" : "A"
"description" : "1"
"uninterestingInformation" : "whatever"
}
{
"key" : "B"
"description" : "2"
"uninterestingInformation" : "whatever"
}
{
"key" : "C"
"description" : "3"
"uninterestingInformation" : "whatever"
}'''
inJson = "[" + inJson.replace("}", "},")[:-1] + "]"
inJson = inJson.replace("\"\n ","\",")
newObject = literal_eval(inJson)
print(newObject)
Output:
[{'key': 'A', 'description': '1', 'uninterestingInformation': 'whatever'}, {'key': 'B', 'description': '2', 'uninterestingInformation': 'whatever'}, {'key': 'C', 'description': '3', 'uninterestingInformation': 'whatever'}]

You can use re.split to split the file content into appropriate JSON strings for parsing:
import re
import json
j='''{
"key" : "A",
"description" : "1",
"uninterestingInformation" : "whatever"
}
{
"key" : "B",
"description" : "2",
"uninterestingInformation" : "whatever"
}
{
"key" : "C",
"description" : "3",
"uninterestingInformation" : "whatever"
}'''
print(list(map(json.loads, re.split(r'(?<=})\n(?={)', j))))
This outputs:
[{'key': 'A', 'description': '1', 'uninterestingInformation': 'whatever'}, {'key': 'B', 'description': '2', 'uninterestingInformation': 'whatever'}, {'key': 'C', 'description': '3', 'uninterestingInformation': 'whatever'}]

Related

How to Get Rid of Fields in a Dictionary of Dictionaries?

I'm making this function on Python called clean_data that has...
Parameters: A dictionary of dictionaries (all strings), and a list of strings containing the fields we care about.
Returns: A dictionary of dictionaries with only the fields we care about, and with appropriate data types.
Fields we care about:
Opponent (string)
Power Plays (“PP” -- int)
Sample input:
{"1/1" : {"opponent" : "BU", "X" : "3", "PP" : "0"},
"1/2" : {"opponent" : "HC", "X" : "4", "PP" : "1"},
"1/5" : {"opponent" : "BC", "X" : "8", "PP" : "0"}}
["opponent", "PP"]
Expected output:
{"1/1" : {"opponent" : "BU", "PP" : 0},
"1/2" : {"opponent" : "HC", "PP" : 1},
"1/5" : {"opponent" : "BC", "PP" : 0}}
Currently, I have
def clean_data(my_dict, field_ls):
t = {}
for x in field_ls:
field = x
for line in my_dict:
for i in (my_dict[line]):
if i != field:
my_dict[line].pop(i)
However, I don't quite seem to get the output I want as the pop(i) isn't really working. How can I fix my problem?

Instead of popping keys, it seems it's easier (since there's only two of them) to select the key-value pairs you actually want.
def clean_data(my_dict):
out = {}
for k,d in my_dict.items():
out[k] = {'opponent': str(d['opponent']), 'PP': int(d['PP'])}
return out
Output:
{'1/1': {'opponent': 'BU', 'PP': 0},
'1/2': {'opponent': 'HC', 'PP': 1},
'1/5': {'opponent': 'BC', 'PP': 0}}
If you absolutely have to pop keys from a dictionary, then you can copy the keys using list constructor, then iterate over that list and pop unwanted fields
def clean_data(input_dict, dict_of_fields):
my_dict = input_dict.copy()
for d in my_dict.values():
for field in list(d.keys()):
if field not in dict_of_fields:
d.pop(field)
else:
d[field] = dict_of_fields[field](d[field])
return my_dict
Example:
sample_data = {"1/1" : {"opponent" : "BU", "X" : "3", "PP" : "0"},
"1/2" : {"opponent" : "HC", "X" : "4", "PP" : "1"},
"1/5" : {"opponent" : "BC", "X" : "8", "PP" : "0"}}
fields = {"opponent": str, "PP": int}
Output:
>>> print(clean_data(sample_data, fields))
{'1/1': {'opponent': 'BU', 'PP': 0},
'1/2': {'opponent': 'HC', 'PP': 1},
'1/5': {'opponent': 'BC', 'PP': 0}}

How do I parse nested json objects?

I am trying to load a JSON file to parse the contents nested in the root object. Currently I have the JSON file open and loaded as such:
with open(outputFile.name) as f:
data = json.load(f)
For the sake of the question here is an example of what the contents of the JSON file are like:
{
"rootObject" :
{
"person" :
{
"address" : "some place ave. 123",
"age" : 47,
"name" : "Joe"
},
"kids" :
[
{
"age" : 20,
"name" : "Joey",
"studySubject":"math"
},
{
"age" : 16,
"name" : "Josephine",
"studySubject":"chemistry"
}
],
"parents" :
{
"father" : "Joseph",
"mother" : "Joette"
}
How do I access the nested objects in "rootObject", such as "person", "kids" and its contents, and "parents"?

Below code using recursive function can extract values using specific key in a nested dictionary or 'lists of dictionaries':
data = {
"rootObject" :
{
"person" :
{
"address" : "some place ave. 123",
"age" : 47,
"name" : "Joe"
},
"kids" :
[
{
"age" : 20,
"name" : "Joey",
"studySubject":"math"
},
{
"age" : 16,
"name" : "Josephine",
"studySubject":"chemistry"
}
],
"parents" :
{
"father" : "Joseph",
"mother" : "Joette"
}
}}
def get_vals(nested, key):
result = []
if isinstance(nested, list) and nested != []: #non-empty list
for lis in nested:
result.extend(get_vals(lis, key))
elif isinstance(nested, dict) and nested != {}: #non-empty dict
for val in nested.values():
if isinstance(val, (list, dict)): #(list or dict) in dict
result.extend(get_vals(val, key))
if key in nested.keys(): #key found in dict
result.append(nested[key])
return result
get_vals(data, 'person')
Output
[{'address': 'some place ave. 123', 'age': 47, 'name': 'Joe'}]

The code for loading the JSON object should look like this:
from json import loads, load
with open("file.json") as file:
var = loads(load(file))
# loads() transforms the string in a python dict object

python how to search a string, count values and group by in json

I have a python program calling an API that receives the result as below:
{
"result": [
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "3"
},
{
"company" : "BMW",
"model" : "7"
},
{
"company" : "AUDI",
"model" : "A3"
},
{
"company" : "AUDI",
"model" : "A7"
},
]
}
Now my task is to identify the number of occurrences of elements from the list in JSON output and group them. The expected output should look like this:
{
"BMW" :
{
"5series" : 3,
"3series" : 1,
"7series" : 1,
},
"AUDI" :
{
"A3" : 1,
"A7" : 1,
},
"MERCEDES":
{
"EClass" : 0,
"SClass" : 0
}
}
I need to find the "company" from list of elements. This will include names that may not be in JSON response sometimes, then the expected output should include that as 0. The "model" names (3,5,7,A3 etc..,) are fixed, so we know that's those are only ones that may or may not be in json api response.
For ex: The List has 3 company names in below code. - companyname = ["BMW,"AUDI","MERCEDES"] . However, sometimes, the JSON API response may not have one or more elements. In this case, "MERCEDES" is missing, but the final output should include "MERCEDES" as well with value as 0.
Here is what i have tried so far:
def modelcount():
companyname= ["BMW","AUDI","MERCEDES"]
url = apiurl
#Send Request
apiresponse = requests.get(url, auth=(user, password), headers=headers, proxies=proxies)
# Decode the JSON response into a dictionary and use the data
data = apiresponse.json()
print(len(data['result']))
3series= 0
5series= 0
7series= 0
A3=0
A7=0
EClass = 0
SClass = 0
modelcountjson = {}
for name in companyname:
for item in data['result']:
models= {}
if item['company'] == name:
if item['model'] == 3:
3series = 3series + 1
elif item['model'] == 5:
5series = 5series + 1
elif item['model'] == 7:
7series = 7series + 1
models['3series'] = 3series
models['5series'] = 5series
models['7series'] = 7series
#I still haven't written AUDI, MERCEDES above. This is where i feel i am writing inefficiently.
modelcountjson[name] = models
return jsonify(modelcountjson)
```
As the number of models grow, I am worried of code getting redundant with many for loops and may cause performance overhead. I am looking for help on achieving the end result in most efficient way.
Thank you so much for your help.

A useful package for working directly with JSON-style dictionaries and lists is toolz (see documentation for more details). This way you can concisely group the data and count occurrences of each model while handling potentially missing data separately:
from toolz import itertoolz
result = {
"result": [
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "3"
},
{
"company" : "BMW",
"model" : "7"
},
{
"company" : "AUDI",
"model" : "A3"
},
{
"company" : "AUDI",
"model" : "A7"
},
]
}
final_output = {}
grouped_result = itertoolz.groupby('company', result['result'])
if 'MERCEDES' not in grouped_result:
final_output['MERCEDES'] = {
'EClass': 0,
'SClass': 0
}
for key, value in grouped_result.items():
models = itertoolz.pluck('model', value)
final_output[key] = itertoolz.frequencies(models)
The output results in:
{'AUDI': {'A3': 1, 'A7': 1}, 'BMW': {'3': 1, '5': 3, '7': 1}, 'MERCEDES': {'EClass': 0, 'SClass': 0}}

You could go for a bit of a separation of code and config:
conf = {
'BMW': {'format': '{}series', 'keys': ['3', '5', '7']},
'AUDI': {'format': '{}', 'keys': ['A3', 'A7']},
'MERCEDES': {'format': '{}Class', 'keys': ['E', 'S']},
}
def modelcount():
# retrieve `data`
# ...
result = {
k: {
v['format'].format(key): 0 for key in v['keys']
} for k, v in conf.items()
}
for car in data['result']:
com = car['company']
mod = car['model']
key = conf[com]['format'].format(mod)
result[com][key] += 1
for com in result:
result[com]['Total'] = sum(result[com].values())
return result
>>> modelcount()
{'BMW': {'3series': 1, '5series': 3, '7series': 1},
'AUDI': {'A3': 1, 'A7': 1},
'MERCEDES': {'EClass': 0, 'SClass': 0}}
This way, for more companies and models, you will only have to touch the conf, not the code. The time complexity of this is O(m+n) with m the total number of distinct models and n the number of cars in the API response.

Extract values from oddly-nested Python

I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only so my output will be Service1, Service2, Service2. I have extracted json values before from complex json or dict structure. But now I must have hit a mental block.
This is my json structure.
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.

Or, if you want the "code" stuffs :
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']

somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print([x for x in somejson["offers"].keys()])
In Python 3 you must use the list comprehension because in Python 3 keys() is a 'view', not a list.

This should probably do the trick , if you are not certain about the number of Services in the json.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
print(somejson["offers"][service]["code"])

Is there a good way python to compare complex dictionaries with unkonwn depth & level

I have two dictionary objects which are very complex and created by converting large xml files into python dictionaries.
I don't know the depth of the dictionaries and just want to compare and want the following output...
e.g. My dictionaries are like this
d1 = {"great grand father":
{"name":"John",
"grand father":
{"name":"Tom",
"father":
{"name":"Andy",
"Me":
{"name":"Mike",
"son":
{"name":"Tom"}
}
}
}
}
}
d2 is also a similar but could be possible any one of the field is missing or changed as below
d2 = {"great grand father":
{"name":"John",
"grand father":
{"name":"Tom",
"father":
{"name":"Andy",
"Me":
{"name":"Tonny",
"son":
{"name":"Tom"}
}
}
}
}
}
The dictionary comparison should give me results like this -
Expected Key/Val : Me->name/"Mike"
Actual Key/Val : Me->name/"Tonny"
If the key "name" does not exists in "Me" in d2, it should give me following output
Expected Key/Val : Me->name/"Mike"
Actual Key/Val : Me->name/NOT_FOUND
I repeat the dictionary depth could be variable or dynamically generated. The two dictionaries here are given as examples...
All the dictionary comparison questions and their answers which I have seen in SO are related fixed depth Dictionaries.....

You're in luck, I did this as part of a project where I worked.
You need a recursive function something like:
def checkDifferences(dict_a,dict_b,differences=[])
You can first check for keys that don't exist in one or the other.
e.g
Expected Name/Tom Actual None
Then you compare the types of the values i.e check if the value is a dict or a list etc.
If it is then you can recursively call the function using the value as dict_a/b. When calling recursively pass the differences array.
If the type of the value is a list and the list may have dictionaries within it then you need to covert the list to a dict and call the function on the converted dictionary.
I'm sorry I can't help more but I no longer have access to the source code. Hopefully this is enough to get you started.

Here I found a way to compare any two dictionaries -
I have tried with various dictionaries of any depths and worked for me. The code is not so modular but just for the reference -
import pprint
pp = pprint.PrettyPrinter(indent=4)
dict1 = { 'Person' : { 'Male' : {'Boys' : {'Roger' : {'age' : 20},
'Rafa' : {'age' : 25}
}
},
'Female' : { 'Girls' : {'Serena' : {'age' : 23},
'Maria' : {'age' : 15}
}
}
},
'Animal' : { 'Huge' : {'Elephant' : {'color' : 'black' }
}
}
}
'''
dict2 = { 'Person' : { 'Male' : {'Boys' : {'Roger' : {'age' : 20}
}
},
'Female' : { 'Girls' : {'Serena' : {'age' : 23},
'Maria' : {'age' : 1}
}
}
}
}
dict2 = { 'Person' : { 'Male' : {'Boys' : {'Roger' : {'age' : 20},
'Rafa' : {'age' : 2}
}
}
}
}
'''
dict2 = { 'Person' : { 'Male' : {'Boys' : {'Roger' : {'age' : 2}}},
'Female' : 'Serena'}
}
key_list = []
err_list = {}
def comp(exp,act):
for key in exp:
key_list.append(key)
exp_val = exp[key]
try:
act_val = act[key]
is_dict_exp = isinstance(exp_val,__builtins__.dict)
is_dict_act = isinstance(act_val,__builtins__.dict)
if is_dict_exp == is_dict_act == True:
comp(exp_val,act_val)
elif is_dict_exp == is_dict_act == False:
if not exp_val == act_val:
temp = {"Exp" : exp_val,"Act" : act_val}
err_key = "-->".join(key_list)
if err_list.has_key(key):
err_list[err_key].update(temp)
else:
err_list.update({err_key : temp})
else:
temp = {"Exp" : exp_val, "Act" : act_val}
err_key = "-->".join(key_list)
if err_list.has_key(key):
err_list[err_key].update(temp)
else:
err_list.update({err_key : temp})
except KeyError:
temp = {"Exp" : exp_val,"Act" : "NOT_FOUND"}
err_key = "-->".join(key_list)
if err_list.has_key(key):
err_list[err_key].update(temp)
else:
err_list.update({err_key : temp})
key_list.pop()
comp(dict1,dict2)
pp.pprint(err_list)
Here is the output of my code -
{ 'Animal': { 'Act': 'NOT_FOUND',
'Exp': { 'Huge': { 'Elephant': { 'color': 'black'}}}},
'Person-->Female': { 'Act': 'Serena',
'Exp': { 'Girls': { 'Maria': { 'age': 15},
'Serena': { 'age': 23}}}},
'Person-->Male-->Boys-->Rafa': { 'Act': 'NOT_FOUND', 'Exp': { 'age': 25}},
'Person-->Male-->Boys-->Roger-->age': { 'Act': 2, 'Exp': 20}
}
One can also try with other dictionaries given in commented code..
One more thing - The keys are checked in expected dictionary and the matched with an actual. If we pass dictionaries in alternate order the other way matching is also possible...
comp(dict2,dict1)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parse a file with JSON objects in Python - python

Related

How to Get Rid of Fields in a Dictionary of Dictionaries?

How do I parse nested json objects?

python how to search a string, count values and group by in json

Extract values from oddly-nested Python

Is there a good way python to compare complex dictionaries with unkonwn depth & level

Categories

Resources