I have a JSON file as shown below,
**test.json**
{
"header1" :
{
"header1_body1":
{
"some_key":"some_value",
.......................
},
"header1_body2":
{
"some_key":"some_value",
.......................
}
},
"header2":
{
"header2_body1":
{
"some_key":"some_value",
.......................
},
"header2_body2":
{
"some_key":"some_value",
.......................
}
}
}
I would like to group the JSON content into lists as below:
header1 = ['header1_body1','header1_body2']
header2 = ['header2_body1','header2_body2']
header1, header2 can go up to header n, so the lists have to be created dynamically, each containing its values as shown above.
How can I achieve this?
What's the optimal way to approach it?
SOLUTION:
import json

with open('test.json') as json_data:
    d = json.load(json_data)
    for k, v in d.items():
        if k == "header1" or k == "header2":
            globals()['{}'.format(k)] = list(d[k].keys())
Now header1 and header2 can be accessed as lists:
for i in header1:
    print(i)
Assuming you read the JSON into a variable d (maybe using json.loads), you could iterate over the keys (sorted, if you want a stable order) and build the lists from the keys of the current value:
for key in sorted(d.keys()):
    l = [x for x in sorted(d[key].keys())]  # using list comprehension
    print(key + ' = ' + str(l))
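With the sample data above (assuming the placeholder dots are removed so the file parses), this prints:
header1 = ['header1_body1', 'header1_body2']
header2 = ['header2_body1', 'header2_body2']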
Fixing your json structure:
{
"header1" :
{
"header1_body1":
{
"some_key":"some_value"
},
"header1_body2":
{
"some_key":"some_value"
}
},
"header2":
{
"header2_body1":
{
"some_key":"some_value"
},
"header2_body2":
{
"some_key":"some_value"
}
}
}
And then loading and creating lists:
import json

with open('test.json') as json_data:
    dictdump = json.load(json_data)
header = []
for key, value in dictdump.items():
    header.append(list(value.keys()))
for header_num in range(0, len(header)):
    print("header{} : {}".format(header_num + 1, header[header_num]))
Gives:
header1 : ['header1_body1', 'header1_body2']
header2 : ['header2_body1', 'header2_body2']
Once you load your JSON, you can get the list you want for any key by doing something like the following (the headers variable below is a placeholder for your loaded JSON). You don't need to convert it to a list to work with it as an iterable, but I wrapped it in list(...) to match the output in your question.
list(headers['header1'].keys())
If you need to actually store the list of keys for each of your "header" dicts in some sort of accessible format, then you could create another dictionary that contains the lists you want. For example:
import json
data = """{
"header1" : {
"header1_body1": {
"some_key":"some_value"
},
"header1_body2": {
"some_key":"some_value"
}
},
"header2": {
"header2_body1": {
"some_key":"some_value"
},
"header2_body2": {
"some_key":"some_value"
}
}
}"""
headers = json.loads(data)
# get the list of keys for a specific header
header = list(headers['header1'].keys())
print(header)
# ['header1_body1', 'header1_body2']
# if you really want to store them in another dict
results = {h[0]: list(h[1].keys()) for h in headers.items()}
print(results)
# OUTPUT
# {'header1': ['header1_body1', 'header1_body2'], 'header2': ['header2_body1', 'header2_body2']}
You can use recursion:
d = {'header1': {'header1_body1': {'some_key': 'some_value'}, 'header1_body2': {'some_key': 'some_value'}}, 'header2': {'header2_body1': {'some_key': 'some_value'}, 'header2_body2': {'some_key': 'some_value'}}}
def flatten(_d):
    for a, b in _d.items():
        yield a
        if isinstance(b, dict):
            yield from flatten(b)
new_results = {a:[i for i in flatten(b) if i.startswith(a)] for a, b in d.items()}
Output:
{'header1': ['header1_body1', 'header1_body2'], 'header2': ['header2_body1', 'header2_body2']}
I've seen similar questions but none that exactly match what I'm doing, and I believe other developers might face the same issue if they are working with MongoDB.
I'm looking to compare two nested dict objects containing dicts and arrays and return a dict with additions and deletions (like you would get from git diff on two files).
Here is what I have so far:
def dict_diff(alpha, beta, recurse_adds=False, recurse_dels=False):
    """
    :return: differences between two python dict with adds and dels
    example:
    (This is the expected output)
    {
        'adds':
        {
            'specific_hours': [{'ends_at': '2015-12-25'}],
        }
        'dels':
        {
            'specific_hours': [{'ends_at': '2015-12-24'}],
            'subscription_products': {'review_management': {'thiswillbedeleted': 'deleteme'}}
        }
    }
    """
    if type(alpha) is dict and type(beta) is dict:
        a_keys = alpha.keys()
        b_keys = beta.keys()
        dels = {}
        adds = {}
        for key in a_keys:
            if type(alpha[key]) is list:
                if alpha[key] != beta[key]:
                    adds[key] = dict_diff(alpha[key], beta[key], recurse_adds=True)
                    dels[key] = dict_diff(alpha[key], beta[key], recurse_dels=True)
            elif type(alpha[key]) is dict:
                if alpha[key] != beta[key]:
                    adds[key] = dict_diff(alpha[key], beta[key], recurse_adds=True)
                    dels[key] = dict_diff(alpha[key], beta[key], recurse_dels=True)
            elif key not in b_keys:
                dels[key] = alpha[key]
            elif alpha[key] != beta[key]:
                adds[key] = beta[key]
                dels[key] = alpha[key]
        for key in b_keys:
            if key not in a_keys:
                adds[key] = beta[key]
    elif type(alpha) is list and type(beta) is list:
        index = 0
        adds = []
        dels = []
        for elem in alpha:
            if alpha[index] != beta[index]:
                dels.append(alpha[index])
                adds.append(beta[index])
                # print('update', adds, dels)
            index += 1
    else:
        raise Exception("dict_diff function can only get dict objects")
    if recurse_adds:
        if bool(adds):
            return adds
        return {}
    if recurse_dels:
        if bool(dels):
            return dels
        return {}
    return {'adds': adds, 'dels': dels}
The result I'm getting now is:
{'adds': {'specific_hours': [{'ends_at': '2015-12-24',
'open_hours': ['07:30-11:30', '12:30-21:30'],
'starts_at': '2015-12-22'},
{'ends_at': '2015-01-03',
'open_hours': ['07:30-11:30'],
'starts_at': '2015-01-0'}],
'subscription_products': {'review_management': {}}},
'dels': {'specific_hours': [{'ends_at': '2015-12-24',
'open_hours': ['07:30-11:30', '12:30-21:30'],
'starts_at': '2015-12-2'},
{'ends_at': '2015-01-03',
'open_hours': ['07:30-11:30'],
'starts_at': '2015-01-0'}],
'subscription_products': {'review_management': {'thiswillbedeleted': 'deleteme'}}}}
And these are the two objects I'm trying to compare:
alpha = {
'specific_hours': [
{
"starts_at": "2015-12-2",
"ends_at": "2015-12-24",
"open_hours": [
"07:30-11:30",
"12:30-21:30"
]
},
{
"starts_at": "2015-01-0",
"ends_at": "2015-01-03",
"open_hours": [
"07:30-11:30"
]
}
],
'subscription_products': {'presence_management':
{'expiration_date': 1953291600,
'payment_type': {
'free': 'iamfree',
'test': "test",
},
},
'review_management':
{'expiration_date': 1511799660,
'payment_type': {
'free': 'iamfree',
'test': "test",
},
'thiswillbedeleted': "deleteme",
}
},
}
beta = {
'specific_hours': [
{
"starts_at": "2015-12-22",
"ends_at": "2015-12-24",
"open_hours": [
"07:30-11:30",
"12:30-21:30"
]
},
{
"starts_at": "2015-01-0",
"ends_at": "2015-01-03",
"open_hours": [
"07:30-11:30"
]
}
],
'subscription_products': {'presence_management':
{'expiration_date': 1953291600,
'payment_type': {
'free': 'iamfree',
'test': "test",
},
},
'review_management':
{'expiration_date': 1511799660,
'payment_type': {
'free': 'iamfree',
'test': "test",
},
}
},
}
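For reference, here is a minimal sketch (not the code above; simple_dict_diff is just a name for this sketch) of how the recursion could return both adds and dels in a single pass instead of relying on the recurse_adds/recurse_dels flags. For simplicity it treats lists as opaque values rather than diffing them element by element:
def simple_dict_diff(alpha, beta):
    # collect additions and deletions between two nested dicts
    adds, dels = {}, {}
    for key in set(alpha) | set(beta):
        if key not in beta:
            dels[key] = alpha[key]
        elif key not in alpha:
            adds[key] = beta[key]
        elif isinstance(alpha[key], dict) and isinstance(beta[key], dict):
            sub = simple_dict_diff(alpha[key], beta[key])
            if sub['adds']:
                adds[key] = sub['adds']
            if sub['dels']:
                dels[key] = sub['dels']
        elif alpha[key] != beta[key]:
            adds[key] = beta[key]
            dels[key] = alpha[key]
    return {'adds': adds, 'dels': dels}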
With the given script I am able to get output as shown in a screenshot,
but there is a column named cve.description.description_data which is again in JSON format. I want to extract that data as well.
import json
import pandas as pd
from pandas.io.json import json_normalize

# load json object
with open('nvdcve-1.0-modified.json') as f:
    d = json.load(f)

# the parent node is 'CVE_Items'
nycphil = json_normalize(d['CVE_Items'])
nycphil.head(3)

works_data = json_normalize(data=d['CVE_Items'], record_path='cve')
works_data.head(3)

nycphil.to_csv("test4.csv")
If I change it to works_data = json_normalize(data=d['CVE_Items'], record_path='cve.description') it gives this error:
"result = result[spec] KeyError: 'cve.description'"
JSON format as follows:
{
"CVE_data_type":"CVE",
"CVE_data_format":"MITRE",
"CVE_data_version":"4.0",
"CVE_data_numberOfCVEs":"1000",
"CVE_data_timestamp":"2018-04-04T00:00Z",
"CVE_Items":[
{
"cve":{
"data_type":"CVE",
"data_format":"MITRE",
"data_version":"4.0",
"CVE_data_meta":{
"ID":"CVE-2001-1594",
"ASSIGNER":"cve#mitre.org"
},
"affects":{
"vendor":{
"vendor_data":[
{
"vendor_name":"gehealthcare",
"product":{
"product_data":[
{
"product_name":"entegra_p&r",
"version":{
"version_data":[
{
"version_value":"*"
}
]
}
}
]
}
}
]
}
},
"problemtype":{
"problemtype_data":[
{
"description":[
{
"lang":"en",
"value":"CWE-255"
}
]
}
]
},
"references":{
"reference_data":[
{
"url":"http://apps.gehealthcare.com/servlet/ClientServlet/2263784.pdf?DOCCLASS=A&REQ=RAC&DIRECTION=2263784-100&FILENAME=2263784.pdf&FILEREV=5&DOCREV_ORG=5&SUBMIT=+ ACCEPT+"
},
{
"url":"http://www.forbes.com/sites/thomasbrewster/2015/07/10/vulnerable- "
},
{
"url":"https://ics-cert.us-cert.gov/advisories/ICSMA-18-037-02"
},
{
"url":"https://twitter.com/digitalbond/status/619250429751222277"
}
]
},
"description":{
"description_data":[
{
"lang":"en",
"value":"GE Healthcare eNTEGRA P&R has a password of (1) value."
}
]
}
},
"configurations":{
"CVE_data_version":"4.0",
"nodes":[
{
"operator":"OR",
"cpe":[
{
"vulnerable":true,
"cpe22Uri":"cpe:/a:gehealthcare:entegra_p%26r",
"cpe23Uri":"cpe:2.3:a:gehealthcare:entegra_p\\&r:*:*:*:*:*:*:*:*"
}
]
}
]
},
"impact":{
"baseMetricV2":{
"cvssV2":{
"version":"2.0",
"vectorString":"(AV:N/AC:L/Au:N/C:C/I:C/A:C)",
"accessVector":"NETWORK",
"accessComplexity":"LOW",
"authentication":"NONE",
"confidentialityImpact":"COMPLETE",
"integrityImpact":"COMPLETE",
"availabilityImpact":"COMPLETE",
"baseScore":10.0
},
"severity":"HIGH",
"exploitabilityScore":10.0,
"impactScore":10.0,
"obtainAllPrivilege":false,
"obtainUserPrivilege":false,
"obtainOtherPrivilege":false,
"userInteractionRequired":false
}
},
"publishedDate":"2015-08-04T14:59Z",
"lastModifiedDate":"2018-03-28T01:29Z"
}
]
}
I want to flatten all data.
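A side note on the KeyError above: json_normalize treats a dotted string given to record_path as a single key, so nested paths have to be passed as a list of keys. A minimal sketch (the meta columns are just example choices) that flattens cve.description.description_data:
import json
from pandas.io.json import json_normalize  # older pandas import location, as in the question

with open('nvdcve-1.0-modified.json') as f:
    d = json.load(f)

# walk cve -> description -> description_data; keep the CVE ID and publish date as metadata
desc = json_normalize(
    d['CVE_Items'],
    record_path=['cve', 'description', 'description_data'],
    meta=[['cve', 'CVE_data_meta', 'ID'], 'publishedDate'],
)
print(desc.head())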
Assuming the multiple URLs delineate between rows and all other metadata repeats, consider a recursive function call to extract every key-value pair in the nested JSON object, d.
The recursive function uses global to update the needed global objects, which are then bound into a list of dictionaries for the pd.DataFrame() call. The last loop at the end updates the recursive function's dictionary, inner, to integrate the different URLs (stored in multi).
import json
import pandas as pd

# load json object
with open('nvdcve-1.0-modified.json') as f:
    d = json.load(f)

multi = []; inner = {}

def recursive_extract(i):
    global multi, inner
    if type(i) is list:
        if len(i) == 1:
            for k, v in i[0].items():
                if type(v) in [list, dict]:
                    recursive_extract(v)
                else:
                    inner[k] = v
        else:
            multi = i
    if type(i) is dict:
        for k, v in i.items():
            if type(v) in [list, dict]:
                recursive_extract(v)
            else:
                inner[k] = v

recursive_extract(d['CVE_Items'])

data_dict = []
for i in multi:
    tmp = inner.copy()
    tmp.update(i)
    data_dict.append(tmp)

df = pd.DataFrame(data_dict)
df.to_csv('Output.csv')
Output (all columns the same except for URL, widened for emphasis)
As the title explains, I've been trying to generate a nested JSON object. In this case I have for loops getting the data out of the dictionary dic. Below is the code:
f = open("test_json.txt", 'w')
flag = False
temp = ""
start = "{\n\t\"filename\"" + " : \"" +initial_filename+"\",\n\t\"data\"" +" : " +" [\n"
end = "\n\t]" +"\n}"
f.write(start)
for i, (key,value) in enumerate(dic.iteritems()):
f.write("{\n\t\"keyword\":"+"\""+str(key)+"\""+",\n")
f.write("\"term_freq\":"+str(len(value))+",\n")
f.write("\"lists\":[\n\t")
for item in value:
f.write("{\n")
f.write("\t\t\"occurance\" :"+str(item)+"\n")
#Check last object
if value.index(item)+1 == len(value):
f.write("}\n"
f.write("]\n")
else:
f.write("},") # close occurrence object
# Check last item in dic
if i == len(dic)-1:
flag = True
if(flag):
f.write("}")
else:
f.write("},") #close lists object
flag = False
#check for flag
f.write("]") #close lists array
f.write("}")
Expected output is:
{
"filename": "abc.pdf",
"data": [{
"keyword": "irritation",
"term_freq": 5,
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
}]
}, {
"keyword": "bomber",
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
}],
"term_freq": 5
}]
}
But currently I'm getting an output like below:
{
"filename": "abc.pdf",
"data": [{
"keyword": "irritation",
"term_freq": 5,
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
},] // Here lies the problem "," before array(last element)
}, {
"keyword": "bomber",
"lists": [{
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 1
}, {
"occurance": 2
},], // Here lies the problem "," before array(last element)
"term_freq": 5
}]
}
Please help; I've been trying to solve it but failed. Please don't mark it as a duplicate since I have already checked the other answers and they didn't help at all.
Edit 1:
Input is basically taken from a dictionary dic whose mapping type is <String, List>
for example: "irritation" => [1,3,5,7,8]
where irritation is the key, mapped to a list of page numbers.
This is basically read in the outer for loop, where key is the keyword and value is the list of pages on which that keyword occurs.
Edit 2:
import collections

dic = collections.defaultdict(list)  # declaring the dictionary
dic[key].append(value)  # inserting the values - details omitted here
for key in dic:
    # dic[key] is the list of values for that key
    print key, ":", dic[key], "\n"  # prints the data in the dictionary
What #andrea-f suggested looks good to me; here is another solution.
Feel free to pick from both :)
import json
dic = {
"bomber": [1, 2, 3, 4, 5],
"irritation": [1, 3, 5, 7, 8]
}
filename = "abc.pdf"
json_dict = {}
data = []
for k, v in dic.iteritems():
    tmp_dict = {}
    tmp_dict["keyword"] = k
    tmp_dict["term_freq"] = len(v)
    tmp_dict["lists"] = [{"occurrance": i} for i in v]
    data.append(tmp_dict)
json_dict["filename"] = filename
json_dict["data"] = data
with open("abc.json", "w") as outfile:
    json.dump(json_dict, outfile, indent=4, sort_keys=True)
It's the same idea: I first create a big json_dict to be saved directly as JSON. I use the with statement to save the JSON, so the file is closed properly even if an exception occurs.
Also, have a look at the documentation of json.dumps() if you need to further tweak your JSON output.
EDIT
And just for fun, if you don't like the tmp variable, you can build all of data in a one-liner :)
json_dict["data"] = [{"keyword": k, "term_freq": len(v), "lists": [{"occurrance": i} for i in v]} for k, v in dic.iteritems()]
The final solution would then be something not totally readable, like this:
import json
json_dict = {
    "filename": "abc.pdf",
    "data": [{
        "keyword": k,
        "term_freq": len(v),
        "lists": [{"occurrance": i} for i in v]
    } for k, v in dic.iteritems()]
}
with open("abc.json", "w") as outfile:
    json.dump(json_dict, outfile, indent=4, sort_keys=True)
EDIT 2
It looks like you don't just want to save your JSON as the desired output, but also to be able to read it back.
In fact, you can also use json.dumps() in order to print your JSON:
with open('abc.json', 'r') as handle:
    new_json_dict = json.load(handle)
print json.dumps(new_json_dict, indent=4, sort_keys=True)
There is still one problem here, though: "filename" is printed after "data", because the d of data sorts before the f.
To force the order, you will have to use an OrderedDict when generating the dict (and drop sort_keys when dumping). Be careful, the syntax is ugly (imo) with Python 2.X.
Here is the new complete solution ;)
import json
from collections import OrderedDict
dic = {
'bomber': [1, 2, 3, 4, 5],
'irritation': [1, 3, 5, 7, 8]
}
json_dict = OrderedDict([
    ('filename', 'abc.pdf'),
    ('data', [OrderedDict([
        ('keyword', k),
        ('term_freq', len(v)),
        ('lists', [{'occurrance': i} for i in v])
    ]) for k, v in dic.iteritems()])
])
with open('abc.json', 'w') as outfile:
    json.dump(json_dict, outfile)

# Now to read the ordered json file
with open('abc.json', 'r') as handle:
    new_json_dict = json.load(handle, object_pairs_hook=OrderedDict)

print json.dumps(new_json_dict, indent=4)
Will output:
{
"filename": "abc.pdf",
"data": [
{
"keyword": "bomber",
"term_freq": 5,
"lists": [
{
"occurrance": 1
},
{
"occurrance": 2
},
{
"occurrance": 3
},
{
"occurrance": 4
},
{
"occurrance": 5
}
]
},
{
"keyword": "irritation",
"term_freq": 5,
"lists": [
{
"occurrance": 1
},
{
"occurrance": 3
},
{
"occurrance": 5
},
{
"occurrance": 7
},
{
"occurrance": 8
}
]
}
]
}
But be careful: most of the time it is better to save a regular .json file so it stays usable across languages.
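As a side note, if Python 3.7 or later is an option (an assumption here), plain dicts preserve insertion order, so the same ordered output can be produced without OrderedDict. A sketch:
import json

dic = {
    'bomber': [1, 2, 3, 4, 5],
    'irritation': [1, 3, 5, 7, 8]
}

# Python 3.7+ dicts keep insertion order, so 'filename' stays before 'data'
json_dict = {
    'filename': 'abc.pdf',
    'data': [{'keyword': k, 'term_freq': len(v),
              'lists': [{'occurrance': i} for i in v]}
             for k, v in dic.items()],
}

with open('abc.json', 'w') as outfile:
    json.dump(json_dict, outfile, indent=4)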
Your current code is not working because when the loop reaches the next-to-last item it writes a },; on the next iteration it sets the flag, but the previous iteration already added the comma because it assumed another element would follow.
If this is your dict: a = {"bomber":[1,2,3,4,5]} then you can do:
import json
file_name = "a_file.json"
file_name_input = "abc.pdf"
new_output = {}
new_output["filename"] = file_name_input
new_data = []
i = 0
for key, val in a.iteritems():
    new_data.append({"keyword": key, "lists": [], "term_freq": len(val)})
    for p in val:
        new_data[i]["lists"].append({"occurrance": p})
    i += 1
new_output['data'] = new_data
Then save the data by:
f = open(file_name, 'w+')
f.write(json.dumps(new_output, indent=4, sort_keys=True, default=unicode))
f.close()
I have the following object in python:
{
name: John,
age: {
years:18
},
computer_skills: {
years:4
},
mile_runner: {
years:2
}
}
I have an array with 100 people with the same structure.
What is the best way to go through all 100 people and make it such that there is no more "years"? In other words, each object in the 100 would look something like:
{
name: John,
age:18,
computer_skills:4,
mile_runner:2
}
I know I can do something in pseudocode:
for(item in list):
    if('years' in (specific key)):
        specifickey = item[(specific key)][(years)]
But is there a smarter/more efficient way?
Your pseudo-code is already pretty good I think:
for person in persons:
    for k, v in person.items():
        if isinstance(v, dict) and 'years' in v:
            person[k] = v['years']
This overwrites every property that is a dictionary with a years key with that key's value.
Unlike other solutions (like dict comprehensions), this modifies the objects in place, so no extra memory is needed to hold a copy.
def flatten(d):
    ret = {}
    for key, value in d.iteritems():
        if isinstance(value, dict) and len(value) == 1 and "years" in value:
            ret[key] = value["years"]
        else:
            ret[key] = value
    return ret
d = {
"name": "John",
"age": {
"years":18
},
"computer_skills": {
"years":4
},
"mile_runner": {
"years":2
}
}
print flatten(d)
Result:
{'age': 18, 'mile_runner': 2, 'name': 'John', 'computer_skills': 4}
Dictionary comprehension:
import json
with open("input.json") as f:
cont = json.load(f)
print {el:cont[el]["years"] if "years" in cont[el] else cont[el] for el in cont}
prints
{u'age': 18, u'mile_runner': 2, u'name': u'John', u'computer_skills': 4}
where input.json contains
{
"name": "John",
"age": {
"years":18
},
"computer_skills": {
"years":4
},
"mile_runner": {
"years":2
}
}
This is linear in the number of elements; you can't really hope for anything lower.
As people said in the comments, it isn't exactly clear what your "object" is, but assuming that you actually have a list of dicts like this:
list = [{
'name': 'John',
'age': {
'years': 18
},
'computer_skills': {
'years':4
},
'mile_runner': {
'years':2
}
}]
Then you can do something like this:
for item in list:
    for key in item:
        try:
            item[key] = item[key]['years']
        except (TypeError, KeyError):
            pass
Result:
list = [{'age': 18, 'mile_runner': 2, 'name': 'John', 'computer_skills': 4}]