Set values for empty list within nested dictionaries - python

I have a CTE query that returns rows of linked values (i.e. child -> parent).
Then in Python I am trying to create a nested dictionary that would represent something like this:
{
"name": "Child_example",
"parents": [
{
"name": "child_parent_1",
"parents": [{"name": "child_parent_1_parent", "parents": [{"name": "end", "parents": []}]}]
},
{
"name": "child_parent_2",
"parents": [{"name": "end", "parents": []}]
},
{
"name": "child_parent_3",
"parents": [{"name": "child_parent_3_parent", "parents": [{"name": "end", "parents": []}]}]
}
]
}
My input data looks something like this (it can have more rows):
child_col             | parent_col            | name           | depth
----------------------|-----------------------|----------------|------
Child_example         | child_parent_1_col    | child_parent_1 | 0
Child_example         | child_parent_2_col    | child_parent_2 | 0
Child_example         | child_parent_3_col    | child_parent_3 | 0
child_parent_1_col    | child_parent_1_parent | 1_parent       | 1
child_parent_2_col    | end                   | 1_parent       | 1
child_parent_3_col    | child_parent_3_parent | 3_parent       | 1
child_parent_3_parent | end                   | end_3          | 2
child_parent_1_parent | end                   | end_1          | 2
However with my code so far:
from collections import defaultdict

r_dict = defaultdict(list)
depth_zero = [x for x in rows if x.depth == 0]
for row in depth_zero:
    r_dict['name'] = row.path_key
    r_dict['parents'].append({'name': row.path_parent_key, 'parents': []})
depth_not_zero = [x for x in rows if x.depth != 0]
# Set inner levels
for parent in r_dict['parents']:
    name = parent['name']
    inner_parent = parent['parents'].copy()
    for row in depth_not_zero:
        if row.path_key == name:
            inner_parent.append({'name': row.path_parent_key, 'parents': []})
            name = row.path_parent_key
    parent['parents'] = inner_parent
I only manage to append each entry to the initial "parents" list, instead of setting it on the innermost nested "parents". I know it has to do with this line of code:
inner_parent.append({'name': row.path_parent_key, 'parents': []})
But I cannot work out how to essentially get and set it. Would this be a case for recursion instead of the way I am doing it?
Below is an example of the first nested dictionary output that I am currently creating with my code:
{
"name": "Child_example",
"parents": [
{
"name": "child_parent_1",
"parents": [
{"name": "child_parent_1", "parents": []}, {"name": "end", "parents": []}
]
}
]
}

I'm a bit baffled by the way you are assigning the "name" value: "Child_example" comes from child_col, "child_parent_1" from name, "child_parent_3_parent" from parent_col. So I simplified it a bit: I put in the second column of the child row the same value as in the first column of its parents rows. That said, if you really need to take the names from different columns it's just a matter of adding some ifs.
My proposal is to loop over the rows in reverse order, creating the inner dicts and then moving them into the outer ones:
rows = [["c1","p1c1",0],
        ["c1","p2c1",0],
        ["c1","p3c1",0],
        ["p1c1","p1p1c1",1],
        ["p2c1","end",1],
        ["p3c1","p1p3c1",1],
        ["p1p3c1","end",2],
        ["p1p1c1","end",2]]
r_dict = {}
for row in reversed(rows):
    if row[1] == "end":
        r_dict[row[0]] = {"name":row[0], "parents":[]}
    else:
        if not row[0] in r_dict:
            r_dict[row[0]] = {"name":row[0], "parents":[]}
        r_dict[row[0]]["parents"].append(r_dict[row[1]])
        del r_dict[row[1]]
r_dict
{'c1': {'name': 'c1', 'parents': [{'name': 'p3c1', 'parents': [{'name': 'p1p3c1', 'parents': []}]}, {'name': 'p2c1', 'parents': []}, {'name': 'p1c1', 'parents': [{'name': 'p1p1c1', 'parents': []}]}]}}
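On the recursion question: a recursive version is also possible. The following is only a minimal sketch, not the method from the answer above. It assumes the rows can be reduced to (child, parent) pairs and that the hierarchy has no cycles; the names `parents_of` and `build` are made up for illustration. Unlike the loop above, it keeps the terminal "end" entries, as in the desired output from the question.

```python
from collections import defaultdict

# Assumed input: (child, parent) pairs, using the simplified names from above.
rows = [
    ("c1", "p1c1"), ("c1", "p2c1"), ("c1", "p3c1"),
    ("p1c1", "p1p1c1"), ("p2c1", "end"), ("p3c1", "p1p3c1"),
    ("p1p3c1", "end"), ("p1p1c1", "end"),
]

# Map every child to the list of its direct parents.
parents_of = defaultdict(list)
for child, parent in rows:
    parents_of[child].append(parent)


def build(name):
    """Return the nested dict for `name`, recursing into its parents.

    Assumes the data contains no cycles; a cycle would recurse forever.
    """
    return {
        "name": name,
        "parents": [build(p) for p in parents_of.get(name, [])],
    }


tree = build("c1")
print(tree)
```

Each call only looks up the direct parents of one name, so the nesting depth follows the data rather than a fixed number of loops.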

Related

Merge dictionaries with same key from two lists of dicts in python

I have two dictionaries, as below. Both dictionaries have a list of dictionaries as the value associated with their properties key; each dictionary within these lists has an id key. I wish to merge my two dictionaries into one such that the properties list in the resulting dictionary only has one dictionary for each id.
{
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
and the other dictionary:
{
"name":"harry",
"properties":[
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
The output I am trying to achieve is:
{
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic",
"language": "english"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
As id: N3 is common in both the lists, those 2 dicts should be merged with all the fields. So far I have tried using itertools and
ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = tuple(d[k] for d in ds)
Could someone please help in figuring this out?
Here is one approach:
a = {
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
b = {
"name":"harry",
"properties":[
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
# Create dicts mapping each id to its index in the respective properties list
a_ids = {item['id']: index for index,item in enumerate(a['properties'])} #{'N3': 0, 'N5': 1}
b_ids = {item['id']: index for index,item in enumerate(b['properties'])} #{'N3': 0, 'N6': 1}
# Loop through one of the dicts created
for id in a_ids.keys():
    # If same ID exists in another dict, update it with the key value
    if id in b_ids:
        b['properties'][b_ids[id]].update(a['properties'][a_ids[id]])
    # If it does not exist, then just append the new dict
    else:
        b['properties'].append(a['properties'][a_ids[id]])
print (b)
Output:
{'name': 'harry', 'properties': [{'id': 'N3', 'type': 'energetic', 'language': 'english', 'status': 'OPEN'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}]}
It might help to treat the two objects as elements each in their own lists. Maybe you have other objects with different name values, such as might come out of a JSON-formatted REST request.
Then you could do a left outer join on both name and id keys:
#!/usr/bin/env python
a = [
{
"name": "harry",
"properties": [
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
]
b = [
{
"name": "harry",
"properties": [
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
]
a_names = set()
a_prop_ids_by_name = {}
a_by_name = {}
for ao in a:
    an = ao['name']
    a_names.add(an)
    if an not in a_prop_ids_by_name:
        a_prop_ids_by_name[an] = set()
    for ap in ao['properties']:
        api = ap['id']
        a_prop_ids_by_name[an].add(api)
    a_by_name[an] = ao
res = []
for bo in b:
    bn = bo['name']
    if bn not in a_names:
        res.append(bo)
    else:
        ao = a_by_name[bn]
        bp = bo['properties']
        for bpo in bp:
            if bpo['id'] not in a_prop_ids_by_name[bn]:
                ao['properties'].append(bpo)
        res.append(ao)
print(res)
The idea above is to process list a for names and ids. The names and ids-by-name are instances of a Python set. So members are always unique.
Once you have these sets, you can do the left outer join on the contents of list b.
There are two cases. If an object in b does not exist in a (i.e. it shares no name with any object in a), you add that object to the result as-is. If an object in b does exist in a (i.e. it shares a name), you iterate over that object's properties and look for ids not already in the a ids-by-name set. You add the missing properties to the object from a, and then add that processed object to the result. Note that properties whose id is already present are left as they are in a, so their fields are not merged.
Output:
[{'name': 'harry', 'properties': [{'id': 'N3', 'status': 'OPEN', 'type': 'energetic'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}]}]
This doesn't do any error checking on input. This relies on name values being unique per object. So if you have duplicate keys in objects in both lists, you may get garbage (incorrect or unexpected output).
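If the fields of entries with the same id should be combined as well (as with N3 in the question), one option is to key the properties by id. This is only a sketch under a few assumptions: every property has an "id" key, values from the second dictionary win on conflicts, and the helper name `merge_properties` is invented here. It builds new dicts and leaves both inputs unmodified.

```python
a = {
    "name": "harry",
    "properties": [
        {"id": "N3", "status": "OPEN", "type": "energetic"},
        {"id": "N5", "status": "OPEN", "type": "hot"},
    ],
}
b = {
    "name": "harry",
    "properties": [
        {"id": "N3", "type": "energetic", "language": "english"},
        {"id": "N6", "status": "OPEN", "type": "cool"},
    ],
}


def merge_properties(first, second):
    """Merge two dicts so that 'properties' holds one entry per id."""
    merged = {}
    for item in first["properties"] + second["properties"]:
        # setdefault creates a fresh dict per id, so the inputs stay untouched;
        # update() lets later items add or overwrite fields.
        merged.setdefault(item["id"], {}).update(item)
    # Copy the remaining top-level keys (such as "name") from `first`.
    return dict(first, properties=list(merged.values()))


result = merge_properties(a, b)
print(result)
```

Because dicts keep insertion order, the ids appear in the order they are first seen: N3, N5, N6.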

Remove duplicate dictionary from a list on the basis of key value by priority

Suppose I have the following type of list of dictionaries:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"},
{"Name": "Zeke", "Type": "Reader"}
]
I want to remove duplicates of "Name" on the basis of "Type" by the following priority (Admin > Writer > Reader), so the end result should be:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Writer"}
]
I found a similar question but it removes duplicates for one explicit type of key-value: Link
Can someone please guide me on how to move forward with this?
This is a modified form of the solution suggested by @azro. Their solution and the other solution do not take into account the priority you mentioned; you can handle that with a priority dict, as in the following code.
from itertools import groupby

iterlist = [
    {"Name": "Mike", "Type": "Writer"},
    {"Name": "Mike", "Type": "Reader"},
    {"Name": "Mike", "Type": "Admin"},
    {"Name": "Zeke", "Type": "Reader"},
    {"Name": "Zeke", "Type": "Writer"}
]
# map each type to its rank; a lower number means a higher priority
priorities = {i:idx for idx, i in enumerate(['Admin', 'Writer', 'Reader'])}
sort_key = lambda x:(x['Name'], priorities[x['Type']])
groupby_key = lambda x:x['Name']
result = [next(i[1]) for i in groupby(sorted(iterlist, key=sort_key), key=groupby_key)]
print(result)
Output
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
You can also use pandas in the following way:
transform the list of dictionary to data frame:
import pandas as pd
df = pd.DataFrame(iterlist)
create a mapping dict:
m = {'Admin': 3, 'Writer': 2, 'Reader': 1}
create a priority column using replace:
df['pri'] = df['Type'].replace(m)
sort_values by pri and groupby by Name and get the first element only:
df = df.sort_values('pri', ascending=False).groupby('Name').first().reset_index()
drop the pri column and return to dictionary using to_dict:
df.drop('pri', axis='columns').to_dict(orient='records')
This will give you the following:
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
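Here is a solution you can try out. Note that it keeps the first entry it sees for each name, so it only honours the priority when the list is already ordered by priority within each name:
unique = {}
for v in iterlist:
    # check if key exists, if not update to `unique` dict
    if not unique.get(v['Name']):
        unique[v['Name']] = v
print(unique.values())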
dict_values([{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}])
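A single pass can also respect the priority regardless of input order, without sorting. The sketch below assumes every "Type" value appears in the priority table; the names `PRIORITY` and `dedupe_by_priority` are illustrative only.

```python
# Lower rank means higher priority (Admin > Writer > Reader).
PRIORITY = {"Admin": 0, "Writer": 1, "Reader": 2}

iterlist = [
    {"Name": "Mike", "Type": "Writer"},
    {"Name": "Mike", "Type": "Reader"},
    {"Name": "Mike", "Type": "Admin"},
    {"Name": "Zeke", "Type": "Reader"},
    {"Name": "Zeke", "Type": "Writer"},
]


def dedupe_by_priority(items):
    """Keep, for each name, the entry whose type has the highest priority."""
    best = {}
    for item in items:
        current = best.get(item["Name"])
        # Replace the stored entry only if the new one ranks strictly higher.
        if current is None or PRIORITY[item["Type"]] < PRIORITY[current["Type"]]:
            best[item["Name"]] = item
    return list(best.values())


result = dedupe_by_priority(iterlist)
print(result)
```

An unknown type raises a KeyError here, which may or may not be the behaviour you want.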

Creating a flare json to be used in D3 from pandas dataframe

I have a dataframe that I want to convert to a hierarchical flare json to be used in a D3 visualization like this: D3 sunburst
My dataframe contains hierarchical data such as this:
And the output I want should look like this:
{"name": "flare","children":
[
{"name": "Animal", "children":
[
{"name": "Mammal", "children":
[
{"name": "Fox","value":35000},
{"name": "Lion","value":25000}
]
},
{"name": "Fish", "children":
[
{"name": "Cod","value":35000}
]
}
]
},
{"name": "Plant", "children":
[
{"name": "Tree", "children":
[
{"name": "Oak","value":35000}
]
}
]
}
]
}
I have tried several approaches, but can't get it right. Here is my non-working code, inspired by this post: Pandas to D3. Serializing dataframes to JSON
import json
from collections import defaultdict
import pandas as pd
df = pd.DataFrame({'group1':["Animal", "Animal", "Animal", "Plant"],'group2':["Mammal", "Mammal", "Fish", "Tree"], 'group3':["Fox", "Lion", "Cod", "Oak"],'value':[35000,25000,15000,1500] })
tree = lambda: defaultdict(tree)
d = tree()
for _, (group0,group1, group2, group3, value) in df.iterrows():
    d['name'][group0]['children'] = group1
    d['name'][group1]['children'] = group2
    d['name'][group2]['children'] = group3
    d['name'][group3]['children'] = value
json.dumps(d)
I am working on a similar visualization project that requires moving data from a Pandas DataFrame to a JSON file that works with D3.
I came across your post while looking for a solution and ended up writing something based on this GitHub repository and with input from the link you provided in this post.
The code is not pretty and is a bit hacky and slow. But based on my project, it seems to work just fine for any amount of data as long as it has three levels and a value field. You should be able to simply fork the D3 Sunburst notebook and replace the flare.json file with this code's output.
The modification that I made here, based on the original GitHub post, is to handle three levels of data. So, if the name of the level 0 node exists, then append from level 1 on. Likewise, if the name of the level 1 node exists, then append the level 2 node (the third level). Otherwise, append the full path of data. If you need more, some kind of recursion might do the trick, or just keep hacking it to add more levels.
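For more than three levels, one alternative to adding further branches is to walk down the tree one level at a time and create nodes as needed. The following is only a sketch, separate from this answer's own three-level code, which follows it. It assumes every row has the shape (level_0, ..., level_n, value); the function name `rows_to_flare` is invented for illustration. With pandas, the rows could come from `df.values.tolist()`.

```python
import json


def rows_to_flare(rows, root_name="flare"):
    """Build a flare-style tree from rows of (level_0, ..., level_n, value)."""
    root = {"name": root_name, "children": []}
    for *levels, value in rows:
        node = root
        for name in levels:
            children = node.setdefault("children", [])
            # Reuse the child with this name if it exists, else create it.
            child = next((c for c in children if c["name"] == name), None)
            if child is None:
                child = {"name": name}
                children.append(child)
            node = child
        # The last level is the leaf and carries the value.
        node["value"] = value
    return root


rows = [
    ["Animal", "Mammal", "Fox", 35000],
    ["Animal", "Mammal", "Lion", 25000],
    ["Animal", "Fish", "Cod", 15000],
    ["Plant", "Tree", "Oak", 1500],
]
flare = rows_to_flare(rows)
print(json.dumps(flare, indent=2))
```

The linear search through `children` keeps the sketch short; for large inputs a name-to-node lookup dict per level would avoid the repeated scans.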
# code snip to format Pandas DataFrame to json for D3 Sunburst Chart
# libraries
import json
import pandas as pd
# example data with three levels and a single value field
data = {'group1': ['Animal', 'Animal', 'Animal', 'Plant'],
        'group2': ['Mammal', 'Mammal', 'Fish', 'Tree'],
        'group3': ['Fox', 'Lion', 'Cod', 'Oak'],
        'value': [35000, 25000, 15000, 1500]}
df = pd.DataFrame.from_dict(data)
print(df)
""" The sample dataframe
   group1  group2 group3  value
0  Animal  Mammal    Fox  35000
1  Animal  Mammal   Lion  25000
2  Animal    Fish    Cod  15000
3   Plant    Tree    Oak   1500
"""
# initialize a flare dictionary
flare = {"name": "flare", "children": []}
# iterate through dataframe values
for row in df.values:
    level0 = row[0]
    level1 = row[1]
    level2 = row[2]
    value = row[3]
    # create a dictionary with all the row data
    d = {'name': level0,
         'children': [{'name': level1,
                       'children': [{'name': level2,
                                     'value': value}]}]}
    # initialize key lists
    key0 = []
    key1 = []
    # iterate through first level node names
    for i in flare['children']:
        key0.append(i['name'])
        # iterate through next level node names
        key1 = []
        for _, v in i.items():
            if isinstance(v, list):
                for x in v:
                    key1.append(x['name'])
    # add the full row of data if the root is not in key0
    if level0 not in key0:
        d = {'name': level0,
             'children': [{'name': level1,
                           'children': [{'name': level2,
                                         'value': value}]}]}
        flare['children'].append(d)
    elif level1 not in key1:
        # if the root exists, then append only the next level children
        d = {'name': level1,
             'children': [{'name': level2,
                           'value': value}]}
        flare['children'][key0.index(level0)]['children'].append(d)
    else:
        # if the root and the level 1 node exist, then append only the leaf
        d = {'name': level2,
             'value': value}
        flare['children'][key0.index(level0)]['children'][key1.index(level1)]['children'].append(d)
# uncomment next three lines to save as json file
# save to some file
# with open('filename_here.json', 'w') as outfile:
#     json.dump(flare, outfile)
print(json.dumps(flare, indent=2))
""" the expected output of this json data
{
"name": "flare",
"children": [
{
"name": "Animal",
"children": [
{
"name": "Mammal",
"children": [
{
"name": "Fox",
"value": 35000
},
{
"name": "Lion",
"value": 25000
}
]
},
{
"name": "Fish",
"children": [
{
"name": "Cod",
"value": 15000
}
]
}
]
},
{
"name": "Plant",
"children": [
{
"name": "Tree",
"children": [
{
"name": "Oak",
"value": 1500
}
]
}
]
}
]
}
"""

Particular nested dictionary from a Pandas DataFrame for circle packing

I am trying to create a particular nested dictionary from a Pandas DataFrame, in order to then visualize it.
dat = pd.DataFrame({'cat_1' : ['marketing', 'marketing', 'marketing', 'communications'],
                    'child_cat' : ['marketing', 'social media', 'marketing', 'communications'],
                    'skill' : ['digital marketing','media marketing','research','seo'],
                    'value' : ['80', '101', '35', '31']})
and I would like to turn this into a dictionary that looks a bit like this:
{
"name": "general skills",
"children": [
{
"name": "marketing",
"children": [
{
"name": "marketing",
"children": [
{
"name": "digital marketing",
"value": 80
},
{
"name": "research",
"value": 35
}
]
},
{
"name": "social media", // notice that this is a sibling of the parent marketing
"children": [
{
"name": "media marketing",
"value": 101
}
]
}
]
},
{
"name": "communications",
"children": [
{
"name": "communications",
"children": [
{
"name": "seo",
"value": 31
}
]
}
]
}
]
}
So cat_1 is the parent node, child_cat is its children, and skill is its child too. I am having trouble with creating the additional children lists. Any help?
With a lot of inefficiencies I came up with this solution. It is probably highly sub-optimal:
final = {}
# control dict to get only one broad category
contrl_dict = {}
contrl_dict['dummy'] = None
final['name'] = 'variants'
final['children'] = []
# line is the values of each row
for idx, line in enumerate(df_dict.values):
    # parent categories dict
    broad_dict_1 = {}
    print(line)
    # this takes every value of the row minus the value in the end
    for jdx, col in enumerate(line[:-1]):
        # look into the broad category first
        if jdx == 0:
            # check in our control dict - does this category exist? if not add it and continue
            if not col in contrl_dict.keys():
                # if it doesn't it appends it
                contrl_dict[col] = 'added'
                # then the broad dict parent takes the name
                broad_dict_1['name'] = col
                # the children are the children broad categories which will be populated further
                broad_dict_1['children'] = []
                # go to broad categories 2
                for ydx, broad_2 in enumerate(list(df_dict[df_dict.broad_categories == col].broad_2.unique())):
                    # sub categories dict
                    prov_dict = {}
                    prov_dict['name'] = broad_2
                    # children is again a list
                    prov_dict['children'] = []
                    # now isolate the skills and values of each broad_2 category and append them
                    for row in df_dict[df_dict.broad_2 == broad_2].values:
                        prov_d_3 = {}
                        # go to each row
                        for xdx, direct in enumerate(row):
                            # in each row, values 2 and 3 are name and value respectively add them
                            if xdx == 2:
                                prov_d_3['name'] = direct
                            if xdx == 3:
                                prov_d_3['size'] = direct
                        prov_dict['children'].append(prov_d_3)
                    broad_dict_1['children'].append(prov_dict)
            # if it already exists in the control dict then it moves on
            else:
                continue
            final['children'].append(broad_dict_1)
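A shorter route is to group the rows into nested plain dicts first and only then convert them to the name/children shape. This is a sketch rather than a drop-in replacement for the code above: it works on plain tuples instead of a DataFrame, assumes exactly the four columns from the question, converts the string values to int, and uses the made-up helper names `nest` and `to_children`.

```python
def nest(rows):
    """Group (cat_1, child_cat, skill, value) rows into nested dicts."""
    grouped = {}
    for cat, child_cat, skill, value in rows:
        grouped.setdefault(cat, {}).setdefault(child_cat, {})[skill] = int(value)
    return grouped


def to_children(mapping):
    """Recursively turn nested dicts into a list of name/children nodes."""
    children = []
    for name, content in mapping.items():
        if isinstance(content, dict):
            children.append({"name": name, "children": to_children(content)})
        else:
            # A non-dict entry is a leaf holding the numeric value.
            children.append({"name": name, "value": content})
    return children


rows = [
    ("marketing", "marketing", "digital marketing", "80"),
    ("marketing", "social media", "media marketing", "101"),
    ("marketing", "marketing", "research", "35"),
    ("communications", "communications", "seo", "31"),
]
result = {"name": "general skills", "children": to_children(nest(rows))}
print(result)
```

Grouping by the full path (cat_1, then child_cat) also keeps a child category separate when the same child name occurs under two different parents.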

compare two list of dicts in python 2

I want to compare two lists of dictionaries in Python. I have a list sent from the frontend and a query result stored in the same function. I want to compare the two lists on the key barcode and, where they match, append the name from the second dictionary to the first one,
for example:
data_from_frontend = [
{ barcode: '1', name_en: 'milk' },
{ barcode: '2', name_en: 'water' },
{ barcode: '3', name_en: 'cheese' },
{ barcode: '10', name_en: 'pepsi' },
]
result_from_query = [
{ barcode: '1', name: 'PID012343' },
{ barcode: '2', name: 'PID123454' },
{ barcode: '10', name: 'PID123432' },
]
I want to compare the two lists by barcode. Where they match, I want to merge the pair into a new entry, and the entries that don't match should go into another list. So the outcome would be two new variables: [matched + name] and not_found. How can I achieve that?
Here is what I've tried:
equal = []
not_equal = []
no_barcode = []
x = [ { "age": "22" }, { "name": "John Doe" }, { "name": "Jane Doe" }, { "name": "Doctor" }, { "name": "Engineer" } ]
y = [ { "name": "Engineer" }, { "name": "Jane Doe" }, { "name": "Doctor" } ]
x_sort = sorted(x, key=lambda k: ("name" not in k, k.get("name", None)))
y_sort = sorted(y, key=lambda k: ("name" not in k, k.get("name", None)))
print(y_sort)
for x_val in x_sort:
    if "name" not in x_val.keys():
        no_barcode.append(x_val)
    else:
        for y_val in y_sort:
            if x_val["name"] == y_val["name"]:
                equal.append(x_val)
        mapped = map(lambda k: k["name"], y_sort)
        if x_val["name"] not in mapped:
            not_equal.append(x_val)
print('equal')
print(equal)
print('not equal')
print(not_equal)
First of all, you should fix your dict keys and enclose them in quotes.
Then you can use a generator expression to find items, for example:
print('initial dict:')
pprint.pprint(data_from_frontend)
for item in result_from_query:
    item_found = next((i for i in data_from_frontend if i['barcode'] == item['barcode']), False)
    if item_found:
        item_found['name'] = item['name']
print('dict after search:')
pprint.pprint(data_from_frontend)
will produce:
initial dict:
[{'barcode': '1', 'name_en': 'milk'},
{'barcode': '2', 'name_en': 'water'},
{'barcode': '3', 'name_en': 'cheese'},
{'barcode': '10', 'name_en': 'pepsi'}]
dict after search:
[{'barcode': '1', 'name': 'PID012343', 'name_en': 'milk'},
{'barcode': '2', 'name': 'PID123454', 'name_en': 'water'},
{'barcode': '3', 'name_en': 'cheese'},
{'barcode': '10', 'name': 'PID123432', 'name_en': 'pepsi'}]
Passing False as the default to next() avoids a StopIteration error when a barcode from the query result does not exist in the frontend data.
P.S. Don't forget to import pprint if you want to use it.
P.P.S. You can of course create a new dict instead of modifying the existing one, using the same logic.
You can get your barcode matching results like this:
barcode = 'barcode'
name_en = 'name_en'
name = 'name'
# shallow copy: the list is new, but the dicts inside are shared with data_from_frontend
matching_result = data_from_frontend[:]
for i in range(len(data_from_frontend)):
    for j in range(len(result_from_query)):
        if data_from_frontend[i][barcode] == result_from_query[j][barcode]:
            matching_result[i][name] = result_from_query[j][name]
            break
print(matching_result)
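Neither answer above produces the separate not_found list the question asks for. One way to get both lists in a single pass is a lookup dict keyed by barcode. This is a hedged sketch: it assumes the keys have been quoted as suggested and that barcodes are unique in the query result; the name `names_by_barcode` is illustrative. `dict(item, name=...)` is used because it builds a new dict in both Python 2 and Python 3.

```python
data_from_frontend = [
    {"barcode": "1", "name_en": "milk"},
    {"barcode": "2", "name_en": "water"},
    {"barcode": "3", "name_en": "cheese"},
    {"barcode": "10", "name_en": "pepsi"},
]
result_from_query = [
    {"barcode": "1", "name": "PID012343"},
    {"barcode": "2", "name": "PID123454"},
    {"barcode": "10", "name": "PID123432"},
]

# Build the lookup once so each frontend item needs only one dict access.
names_by_barcode = {row["barcode"]: row["name"] for row in result_from_query}

matched = []
not_found = []
for item in data_from_frontend:
    name = names_by_barcode.get(item["barcode"])
    if name is None:
        not_found.append(item)
    else:
        # Copy the item and add the name, leaving the input list untouched.
        matched.append(dict(item, name=name))

print(matched)
print(not_found)
```

Compared with the nested loops above, the lookup dict avoids rescanning the query result for every frontend item.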
