I am creating a multi-level nested dictionary by reading a large CSV file. Each row of the file stores the information for one book, in the following format. We can assume each row has 6 columns (author, title, year, category, url, citations), and all column entries are formatted consistently. For example:
Author,Title,Year,Category,Url,Citations
"jk rowling, etc....",goblet of fire,1973,magic: fantasy: english literature,http://doi.acm.org/10.1145/800010.808066,6
"Weiner, Leonard H.",cracking the coding interview,1973,LA: assessment: other,http://doi.acm.org/10.1145/800010.808105,2
"Tolkien",hobbit,1953,magic: fantasy: medieval,http://doi.acm.org/10.1145/800010.808066,6
I want the output to match how each row in the csv file is parsed, similar to the following:
(Note: the number of nested dictionaries depends on the book's categories under the Category header of the CSV. Keys are the successive categories (order matters), separated by the ':' delimiter. Think of each row's category ordering as a directory path: several files can share the same path up to a certain point, or they can share the whole path and sit in the same folder.)
results = {'1973':{
"magic": {
"fantasy": {
"English literature": {
"name": "goblet of fire",
"citations": 6,
"url": "http://doi.acm.org/10.1145/800010.808066"
}
},
"medieval": {
"name": "The Hobbit",
"citations": 7,
"url": "http://doi.acm.org/10.1145/800fdfdffd010.808066"
}
}
},
'1953':{
"la": {
"assessment": {
"other": {
"name": "cracking the coding interview",
"citations": 6,
"url": "http://doi.acm.org/10.1145/800010.808105"
}
}
}
}
}
Obviously some books will share leading categories, as in the example above. Some books might also share exactly the same sequence of categories. I think I should iterate recursively through the category string of each row, creating new sub-dictionaries wherever the row deviates from the existing category order, then creating a dictionary representation of the book once there are no more categories to check. I'm just not sure exactly how to start.
Here's what I have so far, it's just a standard setup of reading csv files:
with open(DATA_FILE, 'r') as data_file:
    data = csv.reader(data_file)
Essentially, I want to create a tree representation of this CSV using nested dictionaries, with the category path (e.g. magic:fantasy:etc...) determining which subtree to traverse or create. If two or more books have the same path, I want all of them to be leaves under that key, rather than each new book overwriting the previous leaf. Each leaf is a dictionary representation of the book described by one row of the CSV.
You can group your data by category (using a simple dictionary, as you mentioned that you cannot use any modules other than csv) and then apply recursion:
import csv
_, *data = csv.reader(open('filename.csv'))
new_data = [[i[3].split(': '), *i[4:], *i[:3]] for i in data]
def group(d):
    _d = {}
    for a, *b in d:
        if a[0] not in _d:
            _d[a[0]] = [[a[1:], *b]]
        else:
            _d[a[0]].append([a[1:], *b])
    r = {a:{'books':[{'name':c[-2], 'citations':c[2], 'url':c[1], 'author':c[3]} for c in b if not c[0]], **(lambda x:{} if not x else group(x))([c for c in b if c[0]])} for a, b in _d.items()}
    return {a:{c:d for c, d in b.items() if d} for a, b in r.items()}
import json
print(json.dumps(group(new_data), indent=4))
Output:
{
"magic": {
"fantasy": {
"english literature": {
"books": [
{
"name": "goblet of fire",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "jk rowling, etc...."
}
]
},
"medieval": {
"books": [
{
"name": "hobbit",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "Tolkien"
}
]
}
}
},
"LA": {
"assessment": {
"other": {
"books": [
{
"name": "cracking the coding interview",
"citations": "2",
"url": "http://doi.acm.org/10.1145/800010.808105",
"author": "Weiner, Leonard H."
}
]
}
}
}
}
Edit: grouping by publication date:
import csv
_, *data = csv.reader(open('filename.csv'))
new_data = [[i[3].split(': '), *i[4:], *i[:3]] for i in data]
_data = {}
for i in new_data:
    if i[-1] not in _data:
        _data[i[-1]] = [i]
    else:
        _data[i[-1]].append(i)
final_result = {a:group(b) for a, b in _data.items()}
Output:
{
"1973": {
"magic": {
"fantasy": {
"english literature": {
"books": [
{
"name": "goblet of fire",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "jk rowling, etc...."
}
]
}
}
},
"LA": {
"assessment": {
"other": {
"books": [
{
"name": "cracking the coding interview",
"citations": "2",
"url": "http://doi.acm.org/10.1145/800010.808105",
"author": "Weiner, Leonard H."
}
]
}
}
}
},
"1953": {
"magic": {
"fantasy": {
"medieval": {
"books": [
{
"name": "hobbit",
"citations": "6",
"url": "http://doi.acm.org/10.1145/800010.808066",
"author": "Tolkien"
}
]
}
}
}
}
}
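The same year-then-category grouping can also be sketched without recursion, by walking each row's category path with dict.setdefault. This is only an illustrative alternative, not the answer's method: the function name build_tree and the SAMPLE string are made up for the example, and io.StringIO stands in for the real file.

```python
import csv
import io

# Sample rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = '''Author,Title,Year,Category,Url,Citations
"jk rowling, etc....",goblet of fire,1973,magic: fantasy: english literature,http://doi.acm.org/10.1145/800010.808066,6
"Weiner, Leonard H.",cracking the coding interview,1973,LA: assessment: other,http://doi.acm.org/10.1145/800010.808105,2
"Tolkien",hobbit,1953,magic: fantasy: medieval,http://doi.acm.org/10.1145/800010.808066,6
'''

def build_tree(rows):
    """Nest books by year, then by each successive category (hypothetical helper)."""
    tree = {}
    for row in rows:
        # Start at the year level, creating it on first sight.
        node = tree.setdefault(row['Year'], {})
        # Descend one level per category, creating missing levels as we go.
        for category in row['Category'].split(':'):
            node = node.setdefault(category.strip(), {})
        # Books with an identical path accumulate in one list instead of overwriting.
        node.setdefault('books', []).append({
            'name': row['Title'],
            'citations': row['Citations'],
            'url': row['Url'],
            'author': row['Author'],
        })
    return tree

tree = build_tree(csv.DictReader(io.StringIO(SAMPLE)))
```

One caveat of this sketch: a category literally named "books" would collide with the list of leaves, so a real file may need a different reserved key.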
Separate categories by their nest
Parse CSV to pandas dataframe
Groupby by category in a loop
use to_dict() to convert to dict in a groupby loop
You can do something like the following:
import pandas as pd
df = pd.read_csv('yourcsv.csv', sep=',')
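Next, you want to isolate the Category column and split its content on the colons:
cols_no_categ = list(df.columns)
cols_no_categ.remove('Category')
category = df['Category']
DICT = {}
for c in category:
    # .loc is needed to combine a boolean row mask with a list of columns
    dicto = df.loc[df.Category == c, cols_no_categ].to_dict()
    s = c.split(': ')
    # create the intermediate levels on demand (assumes exactly three category levels)
    DICT.setdefault(s[0], {}).setdefault(s[1], {})[s[2]] = dicto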
Related
Trying to append to a nested json file
My goal is to append some values to a JSON file.
Here is my original JSON file
{
"web": {
"all": {
"side": {
"tags": [
"admin"
],
"summary": "Generates",
"operationId": "Key",
"consumes": [],
"produces": [
"application/json"
],
"responses": {
"200": {
"description": "YES",
"schema": {
"type": "string"
}
}
},
"Honor": [
{
"presidential": []
}
]
}
}
}
}
It is my intention to add two additional lines inside the key "Honor", with the values "Required" : "YES" and "Prepay" : "NO". As a result of appending the two values, I will have the following JSON file.
{
"web": {
"all": {
"side": {
"tags": [
"admin"
],
"summary": "Generates",
"operationId": "Key",
"consumes": [],
"produces": [
"application/json"
],
"responses": {
"200": {
"description": "YES",
"schema": {
"type": "string"
}
}
},
"Honor": [
{
"presidential": [],
"Required" : "YES",
"Prepay" : "NO"
}
]
}
}
}
}
Below is the Python code that I have written
import json
def write_json(data,filename ="exmpleData.json"):
    with open(filename,"w") as f:
        json.dump(data,f,indent=2)
with open ("exmpleData.json") as json_files:
    data= json.load(json_files)
temp = data["Honor"]
y = {"required": "YES","type": "String"}
temp.append(y)
write_json(data)
I am receiving the following error message:
    temp = data["Honor"]
KeyError: 'Honor'
I would appreciate any guidance that you can provide to help me achieve my goal. I am running Python 3.7
'Honor' is deeply nested in other dictionaries, and its value is a 1-element list containing a dictionary. Here's how to access:
import json
def write_json(data, filename='exmpleData.json'):
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)
with open('exmpleData.json') as json_files:
    data = json.load(json_files)
# 'Honor' is deeply nested in other dictionaries
honor = data['web']['all']['side']['Honor']
# Its value is a 1-element list containing another dictionary.
honor[0]['Required'] = 'YES'
honor[0]['Prepay'] = 'NO'
write_json(data)
I'd recommend that you practice your fundamentals a bit more since you're making many mistakes in your data structure handling. The good news is, your JSON load/dump is fine.
The cause of your error message is that data doesn't have an "Honor" property. Data only has a "web" property, which contains "all" which contains "side" which contains "Honor", which contains an array with a dictionary that holds the properties you are trying to add to. So you want to set temp with temp = data['web']['all']['side']['Honor'][0]
You also cannot use append on python dictionaries. Instead, check out dict.update().
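As a minimal sketch of the dict.update() suggestion: the data below is a trimmed copy of the question's JSON, inlined so the snippet runs without the file, and the variable name honor_entry is only illustrative.

```python
# A trimmed copy of the question's JSON, inlined so the sketch runs on its own.
data = {"web": {"all": {"side": {"Honor": [{"presidential": []}]}}}}

# Walk down to the dictionary inside the one-element "Honor" list.
honor_entry = data["web"]["all"]["side"]["Honor"][0]

# dict.update merges the new keys into the existing dictionary in place.
honor_entry.update({"Required": "YES", "Prepay": "NO"})
```

Because update works in place, data itself now holds the two new keys and can be passed to write_json unchanged.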
I have a nested dict (as below). The goal is to extract the values of "req_key1" and "rkey1" and append them to a list.
raw_json = {
"first_key": {
"f_sub_key1": "some_value",
"f_sub_key2": "some_value"
},
"second_key": {
"another_key": [{
"s_sub_key1": [{
"date": "2022-01-01",
"day": {
"key1": "value_1",
"keyn": "value_n"
}
}],
"s_sub_key2": [{
"req_key1": "req_value1",
"req_key2": {
"rkey1": "rvalue_1",
"rkeyn": "rvalue_n"
}
}]
}]
}
}
I am able to append the values to a list and below is my approach.
emp_ls = []
filtered_key = raw_json["second_key"]["another_key"]
for i in filtered_key:
    for k in i.get("s_sub_key2"):
        emp_ls.append({"first_val": k.get("req_key1"), "second_val": k["req_key2"].get("rkey1") })
print(emp_ls)
Is this a good approach, i.e. one that can be used in production, or is there a better way to do this task?
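One thing worth noting about the loop above: if "s_sub_key2" is missing, i.get(...) returns None and iterating over it raises a TypeError, and k["req_key2"] raises a KeyError when that key is absent. A more defensive variant might look like the sketch below. The function name extract_pairs is hypothetical, raw_json is trimmed to the relevant branch, and whether silently skipping missing levels is acceptable depends on the application.

```python
# Trimmed to the branch of the question's data that the extraction touches.
raw_json = {
    "second_key": {
        "another_key": [{
            "s_sub_key2": [{
                "req_key1": "req_value1",
                "req_key2": {"rkey1": "rvalue_1", "rkeyn": "rvalue_n"},
            }]
        }]
    }
}

def extract_pairs(payload):
    """Collect req_key1 / rkey1 pairs, skipping levels that are missing."""
    pairs = []
    entries = payload.get("second_key", {}).get("another_key", [])
    for entry in entries:
        # `or []` also covers a key that is present but set to None.
        for item in entry.get("s_sub_key2") or []:
            pairs.append({
                "first_val": item.get("req_key1"),
                "second_val": (item.get("req_key2") or {}).get("rkey1"),
            })
    return pairs

emp_ls = extract_pairs(raw_json)
```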
I have a csv file with multiple values in one cell like in this format:
ID, Name, Role, Task, Responsibility
123, Stephen, "1. Give, 2. Take", "1.1. DO, 1.2. AB, 2.1. DF", "1.1.1. FG, 1.1.2. GH, 1.2.1. SG, 2.1.1. DF, 2.1.2. JK"
I added some white space for readability.
I need to convert this csv file into nested json format like:
{
"Name" : "Stephen",
"123": {
"1": {
"Role": "Give",
"1.1": {
"Task": "DO",
"1.1.1": {
"Responsibility": "FG"
},
"1.1.2": {
"Responsibility": "GH"
}
},
"1.2": {
"Task": "AB",
"1.2.1": {
"Responsibility": "SG"
}
}
},
"2": {
"Role": "Take",
"2.1": {
"Task": "DF",
"2.1.1": {
"Responsibility": "DF"
},
"2.1.2": {
"Responsibility": "JK"
}
}
}
}
}
The numbers go like this: 1, 1.1, 1.2.1, 2.2, 2.3, 2.3.1. I need to detect such cells (or such columns) and convert them into key:value pairs as above.
You can use recursion with itertools.groupby:
from itertools import groupby as gb
def to_dict(data):
    d = [(a, list(b)) for a,b in gb(sorted(data, key=lambda x:x[0][0]), key=lambda x:x[0][0])]
    return {b[0][1]:{**b[0][-1], **to_dict([[j, k, l] for [_, *j], k, l in b if j])} for a,b in d}
import re, json
s = """
ID, Name, Role, Task, Responsibility
123, Stephen, "1. Give, 2. Take", "1.1. DO, 1.2. AB, 2.1. DF", "1.1.1. FG, 1.1.2. GH, 1.2.1. SG, 2.1.1. DF, 2.1.2. JK"
"""
#below: parse desired values from data and format header
[h1, h2, *h], [_id, n, *_data] = [re.findall(r'(?<=")[^"]+|\w+', i) for i in filter(None, s.split('\n'))]
#transform numerical paths as lists
data = [[b.split('. ') for b in i.split(', ')] for i in _data if i != ', ']
#associate original file headers to the transformed data
formed = [l for a, b in zip(h, data) for l in [[c.split('.'), c, {a:d}] for c, d in b]]
print(json.dumps({h2:n, h1:to_dict(formed)}, indent=4))
Output:
{
"Name": "Stephen",
"ID": {
"1": {
"Role": "Give",
"1.1": {
"Task": "DO",
"1.1.1": {
"Responsibility": "FG"
},
"1.1.2": {
"Responsibility": "GH"
}
},
"1.2": {
"Task": "AB",
"1.2.1": {
"Responsibility": "SG"
}
}
},
"2": {
"Role": "Take",
"2.1": {
"Task": "DF",
"2.1.1": {
"Responsibility": "DF"
},
"2.1.2": {
"Responsibility": "JK"
}
}
}
}
}
What are the options for extracting value from JSON depending on other parameters (using python)? For example, JSON:
"list": [
{
"name": "value",
"id": "123456789"
},
{
"name": "needed-value",
"id": "987654321"
}
]
When using json_name["list"][0]["id"] it obviously returns 123456789. Is there a way to indicate "name" value "needed-value" so i could get 987654321 in return?
For example:
import json as j
s = '''
{
"list": [
{
"name": "value",
"id": "123456789"
},
{
"name": "needed-value",
"id": "987654321"
}
]
}
'''
js = j.loads(s)
print([x["id"] for x in js["list"] if x["name"] == "needed-value"])
The best way to handle this is to refactor the json as a single dictionary. Since "name" and "id" are redundant you can make the dictionary with the value from "name" as the key and the value from "id" as the value.
import json
j = '''{
"list":[
{
"name": "value",
"id": "123456789"
},{
"name": "needed-value",
"id": "987654321"
}
]
}'''
jlist = json.loads(j)['list']
d = {jd['name']: jd['id'] for jd in jlist}
print(d) ##{'value': '123456789', 'needed-value': '987654321'}
Now you can iterate the items like you normally would from a dictionary.
for k, v in d.items():
    print(k, v)
# value 123456789
# needed-value 987654321
And since the names are now hashed, you can check membership more efficiently than continually querying the list.
assert 'needed-value' in d
jsn = {
"list": [
{
"name": "value",
"id": "123456789"
},
{
"name": "needed-value",
"id": "987654321"
}
]
}
def get_id(items, name):
    # "items" avoids shadowing the built-in list, which is still needed below
    for el in items:
        if el['name'] == name:
            yield el['id']
print(list(get_id(jsn['list'], 'needed-value')))
Once parsed, this JSON becomes a dictionary whose "list" key holds a list of dictionaries. With this in mind, you can index straight into that list when you know the position of the entry you need.
In your case, I would use json_name["list"][1]["id"]
If, however, you don't know the position of your needed value within the list, you can run an old-fashioned for loop this way:
def find_id(entries, name):
    for user in entries:
        if user["name"] == name:
            return user["id"]
find_id(json_name["list"], "needed-value")
This assumes only one entry in your list has the name you are looking for; the loop returns the first match.
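The first-match lookup can also be written with the built-in next() and a generator expression, which makes it easy to supply a fallback when nothing matches. A small sketch follows; the names find_id and default are chosen for illustration.

```python
jsn = {
    "list": [
        {"name": "value", "id": "123456789"},
        {"name": "needed-value", "id": "987654321"},
    ]
}

def find_id(items, name, default=None):
    """Return the id of the first entry whose name matches, else default."""
    # next() stops at the first match; the second argument is used when none exists.
    return next((el["id"] for el in items if el["name"] == name), default)

found = find_id(jsn["list"], "needed-value")
missing = find_id(jsn["list"], "no-such-name")
```

Here found holds the id of the matching entry, while missing falls back to None instead of raising.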
I have a CSV file and am trying to compose JSON from it. There are multiple records in the file, but I am only giving one set of sample records here. The structure is driven by the claimId, and there is nesting on claimLineDetail and claimSpecDiag. I guess I have to create some sort of list to handle this, but then the problem is how to append it into the required structure. Is it possible to build the different sections separately and append them later? I am not sure, as there are many columns. I really need some guidance to achieve the desired result.
Code :
import csv,json
data = []
with open('JsonRequestPricingMedical.csv','r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
csv file :
claimId,subscriberId,claimType,claimSubType,providerId,totalChargeAmt,claimLineNo,pos_code,procedureCode,subdiagnosisCode,svcLineFromDt,svcLineToDt,chargedAmt,clmLineUnits,presentOnAdmit,diagnosisCode
18A000730400,101924200,M,M,002664514003,585,1,11,92014,H43393,2017-06-19,2017-06-19,160,1,U,H43393
18A000730400,101924200,M,M,002664514003,585,2,12,92015,H43395,2017-06-19,2017-06-19,160,2,U,H43394
Desired JSON
[
{
"claimsHeader":" {
"claimId": "18A000730400",
"subscriberId": "101924200",
"claimType":{
"code": "M"
},
"claimSubType": {
"code": "M"
},
"providerId" :"002664514003",
"totalChargeAmt": "585",
"claimLineDetail" :[
{
"claimLineNo": "1",
"placeOfService": {
"code": "11"
},
"procedureCode": {
"code": "92014"
},
"subDiagnosisCd": {
"code": "H43393"
},
"svcLineFromDt": "2017-06-19",
"svcLineToDt": "2017-06-19",
"chargedAmt": "160",
"clmLineUnits": "1",
},
{
"claimLineNo": "2",
"placeOfService": {
"code": "12"
},
"procedureCode": {
"code": "92015"
},
"subDiagnosisCd": {
"code": "H433945
},
"svcLineFromDt": "2017-06-19",
"svcLineToDt": "2017-06-19",
"chargedAmt": "160",
"clmLineUnits": "2",
}
],
{
"claimSpecDiag": [
"presentOnAdmit": "",
"diagnosisCode": "H43393",
},
{
"presentOnAdmit": "",
"diagnosisCode": "H43394",
}
]
}
]
When you read a CSV, each line holds values separated by a special character, in your case commas: ",".
If you read the file line by line yourself, you can separate the values with line_variables = row.split(',') (csv.reader already does this split for you).
Skip the first line (the header), and for every other line, do something like:
result = {
    "claimsHeader": {
        "claimId": line_variables[0],
        "subscriberId": line_variables[1],
        "claimType": {
            "code": line_variables[2]
        }
        ...
Finally, just add the result to a list (created just before your for loop) with your_list.append(result).
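Putting those steps together, here is a rough sketch that groups rows by claimId with csv.DictReader, so the header fields are written once and each row adds one entry to the nested lists. Several things are assumptions: the desired JSON in the question is not well-formed, so claimSpecDiag is placed inside claimsHeader here; presentOnAdmit is copied from the CSV rather than left empty; and build_claims and SAMPLE are illustrative names, with io.StringIO standing in for the real file.

```python
import csv
import io
import json

# The two sample rows from the question; io.StringIO stands in for the real file.
SAMPLE = '''claimId,subscriberId,claimType,claimSubType,providerId,totalChargeAmt,claimLineNo,pos_code,procedureCode,subdiagnosisCode,svcLineFromDt,svcLineToDt,chargedAmt,clmLineUnits,presentOnAdmit,diagnosisCode
18A000730400,101924200,M,M,002664514003,585,1,11,92014,H43393,2017-06-19,2017-06-19,160,1,U,H43393
18A000730400,101924200,M,M,002664514003,585,2,12,92015,H43395,2017-06-19,2017-06-19,160,2,U,H43394
'''

def build_claims(rows):
    """Group CSV rows by claimId into one nested record per claim (hypothetical helper)."""
    claims = {}
    for row in rows:
        # The first row of a claim creates the header; later rows reuse it.
        header = claims.setdefault(row['claimId'], {
            'claimId': row['claimId'],
            'subscriberId': row['subscriberId'],
            'claimType': {'code': row['claimType']},
            'claimSubType': {'code': row['claimSubType']},
            'providerId': row['providerId'],
            'totalChargeAmt': row['totalChargeAmt'],
            'claimLineDetail': [],
            'claimSpecDiag': [],  # assumed to live inside the header
        })
        header['claimLineDetail'].append({
            'claimLineNo': row['claimLineNo'],
            'placeOfService': {'code': row['pos_code']},
            'procedureCode': {'code': row['procedureCode']},
            'subDiagnosisCd': {'code': row['subdiagnosisCode']},
            'svcLineFromDt': row['svcLineFromDt'],
            'svcLineToDt': row['svcLineToDt'],
            'chargedAmt': row['chargedAmt'],
            'clmLineUnits': row['clmLineUnits'],
        })
        header['claimSpecDiag'].append({
            'presentOnAdmit': row['presentOnAdmit'],
            'diagnosisCode': row['diagnosisCode'],
        })
    return [{'claimsHeader': header} for header in claims.values()]

claims = build_claims(csv.DictReader(io.StringIO(SAMPLE)))
print(json.dumps(claims, indent=2))
```

Keying the intermediate dictionary on claimId is what allows the sections to be built up row by row and assembled into the final list at the end.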