Find parent name of a JSON attribute - unknown structure - Python

There is a JSON file with unknown structure.
I need to find an attribute of a known name in this file and, if it exists, return the name of its parent node, or nodes (if there are multiple instances of the attribute).
Example #1:
Input file:
{
    "attr1": {
        "attr2": {
            "attr3": "somevalue",
            "attr7": "someothervalue"
        }
    }
}
Attribute name: "attr7"
Expected return value: "attr2"
Example #2:
Input file:
{
"some": {
"deeply": {
"nested": {
"stuff": {
"array1": [
{"this":"value1"},
{"this":"value2"},
{"this":"value3"}
]
}
}
}
}
}
Attribute name: "this"
Expected return value: "array1"
Example #3:
(similar to #2 but with a duplicate)
Input file:
{
"some": {
"deeply": {
"nested": {
"this": {
"array1": [
{"this":"value1"},
{"this":"value2"},
{"this":"value3"}
]
}
}
}
}
}
Attribute name: "this"
Expected return value: "array1", "nested"
My starting point is:
import json

if __name__ == "__main__":
    jsonFileName = "file.json"
    attributeName = "this"
    # Close the file automatically once the data is loaded.
    with open(jsonFileName, "r") as jsonFile:
        jsonData = json.load(jsonFile)
    # ???
I found this question: "Access JSON element with parent name unknown", but it does not really apply to my case because they know the structure of their data and I don't.
Any hints?

So, after some back and forth with a more experienced colleague, I came up with the following solution:
def findKey(jsonData, keyName: str, accessPath: str):
    # Only dicts have keys; strings, numbers and other scalars cannot hold the attribute.
    if not isinstance(jsonData, dict):
        return None
    for key in jsonData.keys():
        if key == keyName:
            # First match wins: return the path that leads to it.
            return accessPath + f"/{keyName};"
        if isinstance(jsonData[key], list):
            for jd in jsonData[key]:
                fk = findKey(jd, keyName, accessPath + "/[]" + key)
                if fk:
                    return fk
        elif isinstance(jsonData[key], dict):
            fk = findKey(jsonData[key], keyName, accessPath + "/{}" + key)
            if fk:
                return fk
    return None
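The function above stops at the first match and returns an access path rather than the parent name. As a minimal sketch of the behaviour asked for in the question (the name find_parents and its parameters are my own, not from the original post), the following collects the name of the nearest named ancestor for every occurrence. It assumes that items of a list inherit the key of the list, and it reports None for an attribute found at the top level:

```python
def find_parents(node, key_name, parent=None, found=None):
    """Collect the names of the nearest named ancestors holding key_name."""
    if found is None:
        found = []
    if isinstance(node, dict):
        # Record each parent once, in the order it is discovered.
        if key_name in node and parent not in found:
            found.append(parent)
        for key, value in node.items():
            find_parents(value, key_name, key, found)
    elif isinstance(node, list):
        # List items have no name of their own, so they inherit the list's key.
        for item in node:
            find_parents(item, key_name, parent, found)
    return found


example_3 = {
    "some": {"deeply": {"nested": {"this": {"array1": [
        {"this": "value1"}, {"this": "value2"}, {"this": "value3"}
    ]}}}}
}
print(find_parents(example_3, "this"))  # ['nested', 'array1']
```

The result is ordered by discovery, so for Example #3 it lists "nested" before "array1"; sort it if a particular order matters.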

Related

How to update/change both keys and values separately (not dedicated key-value pair) in a deeply nested JSON in python 3.x

I have a JSON file where I need to replace the UUID and update it with another one. I'm having trouble replacing the deeply nested keys and values.
Below is my JSON file that I need to read in python, replace the keys and values and update the file.
JSON file - myfile.json
{
"name": "Shipping box",
"company":"Detla shipping",
"description":"---",
"details" : {
"boxes":[
{
"box_name":"alpha",
"id":"a3954710-5075-4f52-8eb4-1137be51bf14"
},
{
"box_name":"beta",
"id":"31be3763-3d63-4e70-a9b6-d197b5cb6929"
}
]
},
"container": [
"a3954710-5075-4f52-8eb4-1137be51bf14":[],
"31be3763-3d63-4e70-a9b6-d197b5cb6929":[]
],
"data":[
{
"data_series":[],
"other":50
},
{
"data_series":[],
"other":40
},
{
"data_series":
{
"a3954710-5075-4f52-8eb4-1137be51bf14":
{
{
"dimentions":[2,10,12]
}
},
"31be3763-3d63-4e70-a9b6-d197b5cb6929":
{
{
"dimentions":[3,9,12]
}
}
},
"other":50
}
]
}
I want to achieve something like the following:
"details" : {
"boxes":[
{
"box_name":"alpha",
"id":"replace_uuid"
},
}
.
.
.
"data":[ {
"data_series":
{
"replace_uuid":
{
{
"dimentions":[2,10,12]
}
}
]
In a deeply nested dictionary like this, how can I replace every occurrence of a UUID, whether it appears as a key or as a value, with another string (here replace_uuid)?
I tried pop() and dotty_dict, but I wasn't able to replace the entries inside the nested list.
I was able to achieve it in the following way:
import json
import uuid

def uuid_change():
    """Generate a random UUID string."""
    return str(uuid.uuid4())

with open("myfile.json") as f:
    data = json.load(f)  # renamed from dict to avoid shadowing the built-in

for box in data["details"]["boxes"]:
    old_id = box["id"]
    replace_id = uuid_change()
    box["id"] = replace_id
    # Rename the matching key in every "container" entry.
    for entry in data["container"]:
        if old_id in entry:
            entry[replace_id] = entry.pop(old_id)
    # Rename the matching key in the third "data" entry.
    series = data["data"][2]["data_series"]
    if old_id in series:
        series[replace_id] = series.pop(old_id)
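The approach above hard-codes where the UUID can appear. A minimal sketch of a structure-independent alternative follows; replace_everywhere is a hypothetical helper, not part of the original post. It assumes the document has already been loaded into Python dicts and lists, and it builds a new structure instead of mutating the input, which avoids changing a dict while iterating over it:

```python
def replace_everywhere(node, old, new):
    """Return a copy of node with old replaced by new in keys and values."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): replace_everywhere(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [replace_everywhere(item, old, new) for item in node]
    # Scalars: swap exact matches only, leave everything else untouched.
    return new if node == old else node


# Hypothetical, simplified data shaped like the file in the question.
source = {
    "details": {"boxes": [{"box_name": "alpha", "id": "old-uuid"}]},
    "data": [{"data_series": {"old-uuid": {"dimentions": [2, 10, 12]}}, "other": 50}],
}
updated = replace_everywhere(source, "old-uuid", "replace_uuid")
print(updated["details"]["boxes"][0]["id"])     # replace_uuid
print(list(updated["data"][0]["data_series"]))  # ['replace_uuid']
```

Only whole keys and whole string values are compared, so a UUID embedded inside a longer string would not be replaced.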

Read and parse JSON-like string file in python

I don't know how to deal with this JSON-like string file in Python (it's not exactly JSON, but I don't know what the format is called).
file = '''
data_structure {
key: "A"
arg {
name: "A1"
type { type: STRING }
function: "A1-1"
examples: "apple1"
examples: "apple2"
}
arg {
name: "A2"
type { type: STRING }
function: "A2-1"
examples: "grape1"
examples: "grape2"
}
description: "ALL"
}
data_structure {
key: "C"
type { type: STRING }
arg {
name: "C1"
function: "C1-1"
examples: "cake1"
examples: "cake2"
}
}
'''
Is there any way to parse it into an object, something like the following?
file_a = open(file, "rt").read()
# file_a = convert(file_a)
print(list(file_a.data_structure.key))
result: ["A", "C"]
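The syntax resembles Protocol Buffers text format, though that is a guess based on its shape. Without relying on any schema, a small hand-written parser may be enough. The sketch below is an assumption-laden illustration (tokenize, parse_block and parse are hypothetical names): it handles only name: value pairs and name { ... } blocks, does not support escaped quotes inside strings, and stores every field as a list because names such as arg and examples repeat:

```python
import re

# One token per match: a quoted string, a brace or colon, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|([{}:])|([A-Za-z_][A-Za-z0-9_]*)')


def tokenize(text):
    tokens = []
    for match in TOKEN.finditer(text):
        if match.group(1) is not None:
            tokens.append(("str", match.group(1)))
        elif match.group(2) is not None:
            tokens.append(("sym", match.group(2)))
        else:
            tokens.append(("word", match.group(3)))
    return tokens


def parse_block(tokens, pos=0):
    """Parse fields until a closing brace or the end of the input.

    Returns (fields, pos); fields maps each name to a list of its values.
    """
    fields = {}
    while pos < len(tokens) and tokens[pos] != ("sym", "}"):
        kind, name = tokens[pos]
        if kind != "word" or pos + 1 >= len(tokens):
            raise ValueError(f"unexpected token {tokens[pos]!r}")
        following = tokens[pos + 1]
        if following == ("sym", ":") and pos + 2 < len(tokens):
            value = tokens[pos + 2][1]  # scalar field: name: value
            pos += 3
        elif following == ("sym", "{"):
            value, pos = parse_block(tokens, pos + 2)  # nested block
            pos += 1  # skip the closing brace
        else:
            raise ValueError(f"unexpected token {following!r}")
        fields.setdefault(name, []).append(value)
    return fields, pos


def parse(text):
    fields, _ = parse_block(tokenize(text))
    return fields


sample = '''
data_structure {
  key: "A"
  arg { name: "A1" type { type: STRING } examples: "apple1" examples: "apple2" }
}
data_structure {
  key: "C"
  type { type: STRING }
}
'''
parsed = parse(sample)
print([ds["key"][0] for ds in parsed["data_structure"]])  # ['A', 'C']
```

The result is a plain dict rather than an object with attribute access, so the lookup is parsed["data_structure"] instead of file_a.data_structure.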

extract data from nested loop

I am extracting data from a JSON response under the following conditions:
I loop through the JSON and check the value of active. If active is true, I look under cls in the same parent object, and if type is alpha1 I return the eces value (in this case 260551).
If, after looping through the JSON, no entry has active set to true, or active is true but no type under cls in that same parent is alpha1, I return "not found".
I am getting the value of eces correctly, but how can I also get the fields address, c_m, active and type, then build a key-value mapping of all the extracted data and save it to a JSON file?
Here is what I have tried:
found = False
for di in d:
    if di.get('active', False):
        for cl in di.get('cls', []):
            if cl.get('type') == 'alpha1':
                print(di['eces'])
                found = True
if not found:
    print("Not found")
desired json output :
{
"res1": [{
"eces": "260551",
"res2": [{
"c_m": 345,
"clsfrmt": [
{
"address": "{\"I_G\":\"CD\",\"I_D\":\"01\",\"I_Y\":\"C1\",\"I_XD\":\"04\",\"I_TY\":1,\"S_L\":\"https://testappsampler.com\",\"O_DC\":\"\"}",
"type": "Alpha"
}
],
"active": true
}]
}]
}
I am stuck on creating the JSON data in this structure; any help would be great.
While I'd advise refactoring this code properly, this will create the mapping in a very straightforward manner:
import json

dump = []
for di in d:
    if di.get('active', False):
        for cl in di.get('cls', []):
            if cl.get('type') == 'alpha1':
                dump.append(
                    {
                        "res1": [{
                            "eces": di['eces'],
                            "res2": [{
                                "c_m": di['c_m'],
                                "clsfrmt": [
                                    {
                                        # use the cls entry that matched, not always the first one
                                        "address": cl['address'],
                                        "type": cl['type']
                                    }
                                ],
                                "active": di['active']
                            }]
                        }]
                    }
                )
s = json.dumps(dump)  # this is your JSON string
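The question also asks to save the result to a file and to report when nothing matches. A minimal sketch of both steps follows; the sample list d, the helper build_mapping and the file name result.json are assumptions for illustration, since the real response is not shown:

```python
import json
import os
import tempfile

# Hypothetical sample shaped like the response described in the question.
d = [
    {"eces": "260551", "c_m": 345, "active": True,
     "cls": [{"type": "alpha1", "address": "addr-1"}]},
    {"eces": "999999", "c_m": 1, "active": False,
     "cls": [{"type": "alpha1", "address": "addr-2"}]},
]


def build_mapping(records, wanted_type="alpha1"):
    """Return one nested entry per active record with a matching cls type."""
    result = []
    for record in records:
        if not record.get("active", False):
            continue
        for cl in record.get("cls", []):
            if cl.get("type") == wanted_type:
                result.append({
                    "res1": [{
                        "eces": record["eces"],
                        "res2": [{
                            "c_m": record["c_m"],
                            "clsfrmt": [{"address": cl["address"], "type": cl["type"]}],
                            "active": record["active"],
                        }],
                    }]
                })
    return result


mapping = build_mapping(d)
if not mapping:
    print("Not found")

# Assumed output location; json.dump writes directly to the open file.
out_path = os.path.join(tempfile.gettempdir(), "result.json")
with open(out_path, "w") as fh:
    json.dump(mapping, fh, indent=2)
```

An empty list doubles as the "not found" signal here, so no separate flag is needed.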

Extracting JSON value from key

I have a JSON object like so, and I need to extract the name value of any object, using the id. I have tried many different iterations of this but I can't seem to get anything to work. Any general pointers would be much appreciated. Thank you.
{
"weeks":[
{
"1":[
{
"name":"Stackoverflow Question",
"description":"Have you ever asked a question on StackoverFlow?",
"date":"11/25/2019",
"id":"whewhewhkahfasdjkhgjks"
},
{
"name":"I Can't Believe It's Not Butter!",
"description":"Can you believe it? I sure can't.",
"date":"11/25/2019",
"id":"agfasdgasdgasdgawe"
}
]
},
{
"2":[
{
"name":"Hello World",
"description":"A hello world.",
"date":"12/02/2019",
"id":"aewgasdgewa"
},
{
"name":"Testing 123",
"description":"Sometimes people don't say it be like it is but it do.",
"date":"12/04/2019",
"id":"asdgasdgasdgasd"
}
]
}
]
}
If you need to find the name based on the id, try the code below:
def get_name(data, id):
    for week in data['weeks']:
        for i in week:            # i is the week number key, e.g. "1"
            for j in week[i]:     # j is one entry of that week
                if j['id'] == id:
                    return j['name']
    return None
get_name(data, 'asdgasdgasdgasd')
Output:
'Testing 123'
Not sure if this is what you are looking for:
for week in a["weeks"]:
    for entries in week.values():
        for entry in entries:
            print(entry['name'])
where the variable a is your dict.
Is the structure fixed, or can the depth of the JSON differ from the example?
This one also works if there are more or fewer levels of nesting.
It searches every dictionary inside a JSON-like structure for field_name and, where its value equals matching_value, returns the value stored under output_name.
Maybe it helps you when your data structure changes :)
data = {
"weeks":[
{
"1":[
{
"name":"Stackoverflow Question",
"description":"Have you ever asked a question on StackoverFlow?",
"date":"11/25/2019",
"id":"whewhewhkahfasdjkhgjks"
},
{
"name":"I Can't Believe It's Not Butter!",
"description":"Can you believe it? I sure can't.",
"date":"11/25/2019",
"id":"agfasdgasdgasdgawe"
}
]
},
{
"2":[
{
"name":"Hello World",
"description":"A hello world.",
"date":"12/02/2019",
"id":"aewgasdgewa"
},
{
"name":"Testing 123",
"description":"Sometimes people don't say it be like it is but it do.",
"date":"12/04/2019",
"id":"asdgasdgasdgasd"
}
]
}
]
}
def extract_name(data, field_name: str, matching_value: str, output_name: str):
    """
    :param data: json-like datastructure in which you want to search
    :param field_name: the field name with which you want to match
    :param matching_value: the value you want to match
    :param output_name: the name of the value which you want to get
    :return: the value stored under output_name in the first match, or None
    """
    if isinstance(data, list):
        for item in data:
            res = _inner_extract_name(item, field_name, matching_value, output_name)
            if res is not None:
                return res
    elif isinstance(data, dict):
        for item in data.values():
            res = _inner_extract_name(item, field_name, matching_value, output_name)
            if res is not None:
                return res

def _inner_extract_name(item, field_name, matching_value, output_name):
    if isinstance(item, dict):
        res = extract_name(item, field_name, matching_value, output_name)
        if field_name in item:
            if item[field_name] == matching_value:
                if output_name in item:
                    return item[output_name]
    else:
        res = extract_name(item, field_name, matching_value, output_name)
    return res

if __name__ == "__main__":
    name = extract_name(data, "id", "aewgasdgewa", "name")
    print(name)

nested json to csv using pandas normalize

With the script below I am able to get the output I showed in a screenshot,
but there is a column named cve.description.description_data which is itself still in JSON format. I want to extract that data as well.
import json
import pandas as pd

# load json object
with open('nvdcve-1.0-modified.json') as f:
    d = json.load(f)

# flatten each entry of 'CVE_Items' into one row
# (pd.json_normalize is the top-level name since pandas 1.0)
nycphil = pd.json_normalize(d['CVE_Items'])
nycphil.head(3)
works_data = pd.json_normalize(data=d['CVE_Items'], record_path='cve')
works_data.head(3)
nycphil.to_csv("test4.csv")
If I change it to works_data = pd.json_normalize(data=d['CVE_Items'], record_path='cve.descr') it gives this error:
"result = result[spec] KeyError: 'cve.description'"
JSON format as follows:
{
"CVE_data_type":"CVE",
"CVE_data_format":"MITRE",
"CVE_data_version":"4.0",
"CVE_data_numberOfCVEs":"1000",
"CVE_data_timestamp":"2018-04-04T00:00Z",
"CVE_Items":[
{
"cve":{
"data_type":"CVE",
"data_format":"MITRE",
"data_version":"4.0",
"CVE_data_meta":{
"ID":"CVE-2001-1594",
"ASSIGNER":"cve#mitre.org"
},
"affects":{
"vendor":{
"vendor_data":[
{
"vendor_name":"gehealthcare",
"product":{
"product_data":[
{
"product_name":"entegra_p&r",
"version":{
"version_data":[
{
"version_value":"*"
}
]
}
}
]
}
}
]
}
},
"problemtype":{
"problemtype_data":[
{
"description":[
{
"lang":"en",
"value":"CWE-255"
}
]
}
]
},
"references":{
"reference_data":[
{
"url":"http://apps.gehealthcare.com/servlet/ClientServlet/2263784.pdf?DOCCLASS=A&REQ=RAC&DIRECTION=2263784-100&FILENAME=2263784.pdf&FILEREV=5&DOCREV_ORG=5&SUBMIT=+ ACCEPT+"
},
{
"url":"http://www.forbes.com/sites/thomasbrewster/2015/07/10/vulnerable- "
},
{
"url":"https://ics-cert.us-cert.gov/advisories/ICSMA-18-037-02"
},
{
"url":"https://twitter.com/digitalbond/status/619250429751222277"
}
]
},
"description":{
"description_data":[
{
"lang":"en",
"value":"GE Healthcare eNTEGRA P&R has a password of (1) value."
}
]
}
},
"configurations":{
"CVE_data_version":"4.0",
"nodes":[
{
"operator":"OR",
"cpe":[
{
"vulnerable":true,
"cpe22Uri":"cpe:/a:gehealthcare:entegra_p%26r",
"cpe23Uri":"cpe:2.3:a:gehealthcare:entegra_p\\&r:*:*:*:*:*:*:*:*"
}
]
}
]
},
"impact":{
"baseMetricV2":{
"cvssV2":{
"version":"2.0",
"vectorString":"(AV:N/AC:L/Au:N/C:C/I:C/A:C)",
"accessVector":"NETWORK",
"accessComplexity":"LOW",
"authentication":"NONE",
"confidentialityImpact":"COMPLETE",
"integrityImpact":"COMPLETE",
"availabilityImpact":"COMPLETE",
"baseScore":10.0
},
"severity":"HIGH",
"exploitabilityScore":10.0,
"impactScore":10.0,
"obtainAllPrivilege":false,
"obtainUserPrivilege":false,
"obtainOtherPrivilege":false,
"userInteractionRequired":false
}
},
"publishedDate":"2015-08-04T14:59Z",
"lastModifiedDate":"2018-03-28T01:29Z"
}
]
}
I want to flatten all data.
Assuming the multiple URLs distinguish the rows and all other metadata repeats, consider a recursive function that extracts every key-value pair in the nested JSON object, d.
The recursive function uses global to update the module-level objects, which are then combined into a list of dictionaries for the pd.DataFrame() call. The loop at the end copies the recursive function's dictionary, inner, once per URL (stored in multi) and merges that URL into the copy.
import json
import pandas as pd

# load json object
with open('nvdcve-1.0-modified.json') as f:
    d = json.load(f)

multi = []   # the one list with several items (the URLs)
inner = {}   # every scalar key-value pair found anywhere in the document

def recursive_extract(i):
    global multi, inner
    if type(i) is list:
        if len(i) == 1:
            # single-item list: descend into its only element
            for k, v in i[0].items():
                if type(v) in [list, dict]:
                    recursive_extract(v)
                else:
                    inner[k] = v
        else:
            # multi-item list: keep it to build one row per item
            multi = i
    if type(i) is dict:
        for k, v in i.items():
            if type(v) in [list, dict]:
                recursive_extract(v)
            else:
                inner[k] = v

recursive_extract(d['CVE_Items'])

data_dict = []
for i in multi:
    tmp = inner.copy()
    tmp.update(i)
    data_dict.append(tmp)

df = pd.DataFrame(data_dict)
df.to_csv('Output.csv')
Output (all columns the same except for URL, widened for emphasis)
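The answer above keeps only the last path segment as the column name, so keys that repeat at different depths (for example CVE_data_version) overwrite each other. As an alternative sketch that uses only the standard library (flatten is a hypothetical helper, and the record below is a shortened stand-in for one CVE_Items entry), each value can be stored under its full dotted path, with list positions as path segments:

```python
import csv
import io


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = node  # drop the trailing dot
    return flat


# Shortened, hypothetical stand-in for one entry of d['CVE_Items'].
record = {
    "cve": {
        "CVE_data_meta": {"ID": "CVE-2001-1594"},
        "description": {"description_data": [{"lang": "en", "value": "example text"}]},
    },
    "publishedDate": "2015-08-04T14:59Z",
}
row = flatten(record)

# Write the single row as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Empty dicts and lists produce no columns in this sketch, and records with lists of different lengths yield different column sets, so the header would need to be the union of all keys when writing many rows.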
