Python - Getting the intersection of two JSON files

I'm looking for a way to calculate the intersection of two JSON files. I searched around and found that I can use sets for my problem. This works okay, but I need a more detailed view of the intersection, and this is where the problems start.
How I calculate the intersection:
import json as j
import jsonschema as js  # third-party package: jsonschema
from collections import OrderedDict

def calcIntersect(ValidationFile, json_object1, json_object2):
    with open(ValidationFile) as schema_file:
        schema = j.load(schema_file)
    js.Draft4Validator.check_schema(schema)
    with open(json_object1) as spec_file:
        spec1 = j.load(spec_file, object_pairs_hook=OrderedDict)
    js.validate(spec1, schema)
    with open(json_object2) as spec_file:
        spec2 = j.load(spec_file, object_pairs_hook=OrderedDict)
    js.validate(spec2, schema)
    x = set(spec1) & set(spec2)
    print(x)
Example Data1:
{
    "Car": {
        "Brand": "Audi",
        "Nationality": "Germany",
        "Modelname": "A6"
    },
    "Engine": {
        "cubic capacity": "2967",
        "Enginetype": "V6",
        "Fuel": "Diesel",
        "MaxSpeed": "250"
    },
    "Colors": {
        "Carcolor": "Black",
        "Interiorrcolor": "white"
    }
}
Example Data2:
{
    "Car": {
        "Brand": "Audi",
        "Nationality": "USA",
        "Modelname": "A6"
    },
    "Engine": {
        "cubic capacity": "2995",
        "Enginetype": "V6",
        "Fuel": "Petrol",
        "MaxSpeed": "250"
    },
    "Colors": {
        "Carcolor": "Black",
        "Interiorrcolor": "Black"
    }
}
Example output:
{'Car', 'Colors', 'Engine'}
These are just the keys, but I need the dictionaries. At the moment it gives me these keys to say that there is an intersection in them. Maybe in 'Car' there is an "Audi" in both files, and the nationality is different because one car is produced in America and the other in Germany. But it still returns 'Car' and not the "Audi".
I hope I was able to describe my problem a bit. It's my first question.

The following lines, inspired by #likeon's answer, will give you a dictionary whose keys are the keys of the intersecting objects in your specs, and whose values are arrays containing the intersecting objects.
intersect = {key: [o, spec2[key]] for key, o in spec1.iteritems()
             if key in spec2}
Edit:
If you are using Python 3, you must use items instead of iteritems:
intersect = {key: [o, spec2[key]] for key, o in spec1.items()
             if key in spec2}
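For a quick, self-contained illustration (with tiny inline specs standing in for the loaded files), the comprehension pairs up both objects for every shared key:

```python
# Toy specs standing in for the JSON files loaded in the question.
spec1 = {"Car": {"Brand": "Audi", "Nationality": "Germany"},
         "Colors": {"Carcolor": "Black"}}
spec2 = {"Car": {"Brand": "Audi", "Nationality": "USA"},
         "Extras": {"Sunroof": "yes"}}

# For every key present in both specs, keep the two objects side by side.
intersect = {key: [o, spec2[key]] for key, o in spec1.items() if key in spec2}
print(intersect)
# {'Car': [{'Brand': 'Audi', 'Nationality': 'Germany'}, {'Brand': 'Audi', 'Nationality': 'USA'}]}
```

Note that keys present in only one spec ("Colors", "Extras") are dropped entirely.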

Why don't you just iterate over spec1 and compare values with spec2, like this:
x = {k: v for k, v in spec1.iteritems() if k in spec2 and spec2[k] == v}
(In Python 3, use items() instead of iteritems().)

You'll need a recursive solution:
json1 = {
    "Car": {
        "Brand": "Audi",
        "Nationality": "Germany",
        "Modelname": "A6"
    },
    "Engine": {
        "cubic capacity": "2967",
        "Enginetype": "V6",
        "Fuel": "Diesel",
        "MaxSpeed": "250"
    },
    "Colors": {
        "Carcolor": "Black",
        "Interiorrcolor": "white"
    }
}
json2 = {
    "Car": {
        "Brand": "Audi",
        "Nationality": "USA",
        "Modelname": "A6"
    },
    "Engine": {
        "cubic capacity": "2995",
        "Enginetype": "V6",
        "Fuel": "Petrol",
        "MaxSpeed": "250"
    },
    "Colors": {
        "Carcolor": "Black",
        "Interiorrcolor": "Black"
    }
}
def common_dict(d1, d2):
    # Recurse into sub-dicts; keep leaves only when both values are equal.
    output = {}
    for k in set(d1) & set(d2):
        o1, o2 = d1[k], d2[k]
        if isinstance(o1, dict) and isinstance(o2, dict):
            output[k] = common_dict(o1, o2)
        elif o1 == o2:
            output[k] = o1
    return output

print(common_dict(json1, json2))
# {'Engine': {'MaxSpeed': '250', 'Enginetype': 'V6'}, 'Car': {'Brand': 'Audi', 'Modelname': 'A6'}, 'Colors': {'Carcolor': 'Black'}}


Update Nested Dictionary value using Recursion

I want to update the Dict dictionary's values from the inp dictionary's values, using recursion or a loop.
The format should not change, meaning the same structure must be kept while using recursion or a loop.
Please suggest a solution that is applicable to any level of nesting, not just this particular case.
dict = {
    "name": "john",
    "quality": {
        "type1": "honest",
        "type2": "clever"
    },
    "marks": [
        {"english": 34},
        {"math": 90}
    ]
}
inp = {
    "name": "jack",
    "type1": "dumb",
    "type2": "liar",
    "english": 28,
    "math": 89
}
Another solution, changing the dict in-place:
dct = {
    "name": "john",
    "quality": {"type1": "honest", "type2": "clever"},
    "marks": [{"english": 34}, {"math": 90}],
}
inp = {
    "name": "jack",
    "type1": "dumb",
    "type2": "liar",
    "english": 28,
    "math": 89,
}
def change(d, inp):
    if isinstance(d, list):
        for i in d:
            change(i, inp)
    elif isinstance(d, dict):
        for k, v in d.items():
            if not isinstance(v, (list, dict)):
                d[k] = inp.get(k, v)
            else:
                change(v, inp)

change(dct, inp)
print(dct)
Prints:
{
    "name": "jack",
    "quality": {"type1": "dumb", "type2": "liar"},
    "marks": [{"english": 28}, {"math": 89}],
}
First, make sure you change the name of the first dictionary, say to myDict, since dict is the name of a built-in type in Python and shouldn't be shadowed.
The below function will do what you are looking for, in a recursive manner.
def recursive_swipe(input_var, updates):
    if isinstance(input_var, list):
        output_var = []
        for entry in input_var:
            output_var.append(recursive_swipe(entry, updates))
    elif isinstance(input_var, dict):
        output_var = {}
        for label in input_var:
            if isinstance(input_var[label], (list, dict)):
                output_var[label] = recursive_swipe(input_var[label], updates)
            elif label in updates:
                output_var[label] = updates[label]
            else:
                # Keep leaves that have no replacement in `updates`.
                output_var[label] = input_var[label]
    else:
        output_var = input_var
    return output_var

myDict = recursive_swipe(myDict, inp)
You may look for more optimal solutions if there are some limits to the formatting of the two dictionaries that were not stated in your question.
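As one compact out-of-place variant (a sketch of the same idea, not code from the question), the update can also be written with comprehensions that rebuild each level:

```python
def updated(node, updates):
    # Rebuild dicts/lists recursively, replacing each leaf value with
    # updates[key] when that key is present, otherwise keeping the old value.
    if isinstance(node, dict):
        return {k: (updated(v, updates) if isinstance(v, (list, dict))
                    else updates.get(k, v))
                for k, v in node.items()}
    if isinstance(node, list):
        return [updated(item, updates) for item in node]
    return node

my_dict = {"name": "john",
           "quality": {"type1": "honest", "type2": "clever"},
           "marks": [{"english": 34}, {"math": 90}]}
inp = {"name": "jack", "type1": "dumb", "type2": "liar", "english": 28, "math": 89}
print(updated(my_dict, inp))
# {'name': 'jack', 'quality': {'type1': 'dumb', 'type2': 'liar'}, 'marks': [{'english': 28}, {'math': 89}]}
```

Because it returns a new object, the original dictionary is left untouched, which can be safer than an in-place update.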

Convert nested JSON to Dataframe with columns referencing nested paths

I am trying to convert a nested JSON into a CSV file with three columns: the level 0 key, the branch, and the lowest level leaf.
For example, in the JSON below:
{
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}
I want to create an output like this (didn't include entire tree example just to get point across):
level 0 | branch       | leaf
--------+--------------+--------
protein | protein.meat | chicken
protein | protein.meat | beef
I tried using json_normalize but the actual file will not have paths that I can use to identify the nested fields, as each dictionary is unique.
This returns the level 0 field, but I need to have these as rows, not columns. Any help would be very much appreciated.
I created a function that can unnest the JSON based on key values, like this:
import json

with open('path/to/json') as m:
    my_json = json.load(m)

def unnest_json(data):
    for key, value in data.items():
        print(str(key) + '.' + str(value))
        if isinstance(value, dict):
            unnest_json(value)
        elif isinstance(value, list):
            for val in value:
                if isinstance(val, str):
                    pass
                elif isinstance(val, list):
                    pass
                else:
                    unnest_json(val)

unnest_json(my_json)
Probably not the cleanest approach, but I think you can use some sort of recursive function (traverse in the code below) to convert the dictionary into a list of column values and then convert them to a pandas DataFrame.
data = {
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}
import pandas as pd

def traverse(col_values, dictionary, rows):
    # Depth-first walk: extend the branch path until an empty dict (a leaf).
    for key in dictionary:
        new_col_values = list(col_values)
        if dictionary[key]:
            new_col_values[1] += '.' + key
            traverse(new_col_values, dictionary[key], rows)
        else:
            new_col_values[2] = key
            rows.append(new_col_values)

rows = []
for key in data:
    traverse([key, str(key), None], data[key], rows)

df = pd.DataFrame(rows, columns=["level 0", "branch", "leaf"])
print(df)

MongoDB Python Update/ Insert dict in dict without overwriting

I can't insert my new document value (a dict) without overwriting my existing data. I've looked through many different resources and can't find an answer.
I've also thought of putting the values from first_level_dict into a list, "first_level_dict": [dict1, dict2], but I wouldn't know how to append the dict either.
Sample Data:
# Create the document
target_dict = {
    "_id": 55,
    "Root_dict": {
        "first_level_dict": {
            "second_level_dict1": {"Content1": "Value1"}
        }
    },
    "Root_key": "Root_value"
}
collection.insert_one(target_dict)
The result I'm looking for:
result_dict = {
    "_id": 55,
    "Root_dict": {
        "first_level_dict": {
            "second_level_dict1": {"Content1": "Value1"},
            "second_level_dict2": {"Content2": "Value2"}
        }
    },
    "Root_key": "Root_value"
}
Update: New Values example 2:
# New values sample
new_values = {
    "_id": 55,
    "Root_dict": {
        "first_level_dict": {
            "secon_level_dict2": {"Content2": "Value2"},
            "secon_level_dict3": {"Content3": "Value3"}
        }
    }
}
collection.insert_one(target_dict)
Update: The result I'm looking for example 2:
result_dict = {
    "_id": 55,
    "Root_dict": {
        "first_level_dict": {
            "second_level_dict1": {"Content1": "Value1"},
            "second_level_dict2": {"Content2": "Value2"},
            "second_level_dict3": {"Content3": "Value3"},
        }
    },
    "Root_key": "Root_value"
}
What I've tried:
# Update document "$setOnInsert"
q = {"_id": 55}
target_dict = {"$set": {"Root_dict": {"first_level_dict": {"second_level_dict2": {"Content2": "Value2"}}}}}
collection.update_one(q, target_dict)
What I've tried example 2:
# Update document
q = {"_id": 55}
target_dict = {"$set": {"Root_dict.first_level_dict": {
    "second_level_dict2": {"Content2": "Value2"},
    "second_level_dict3": {"Content3": "Value3"}}}}
collection.update_one(q, target_dict)
Try using the dot notation (note that $set must be quoted in Python):
target_dict = {"$set": {"Root_dict.first_level_dict.second_level_dict2": {"Content2": "Value2"}}}
Additionally, to update/add multiple fields (for "example 2"):
target_dict = {"$set": {
    "Root_dict.first_level_dict.second_level_dict2": {"Content2": "Value2"},
    "Root_dict.first_level_dict.second_level_dict3": {"Content3": "Value3"}
}}
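If the nesting varies, the dot-notation keys for $set can be generated from a nested dict with a small helper (a hypothetical flatten function written for this answer, not part of PyMongo):

```python
def to_dot_notation(node, prefix=""):
    # Flatten nested dicts into {"a.b.c": leaf} pairs suitable for "$set".
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(to_dot_notation(value, path))
        else:
            flat[path] = value
    return flat

update = {"$set": to_dot_notation({
    "Root_dict": {"first_level_dict": {
        "second_level_dict2": {"Content2": "Value2"},
        "second_level_dict3": {"Content3": "Value3"},
    }}
})}
print(update)
# {'$set': {'Root_dict.first_level_dict.second_level_dict2.Content2': 'Value2',
#           'Root_dict.first_level_dict.second_level_dict3.Content3': 'Value3'}}
```

Because only the leaf paths are set, sibling keys that already exist in the document (like second_level_dict1) are left untouched; collection.update_one(q, update) would then apply it.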

Wrong value from a JSON's last key level using .values()

"fwt-master2": {
    "ipv4": {
        "rtr": {
            "ip": "1.2.3.4",
            "net": "3.4.5.6",
            "netlen": "24",
            "netmask": "255.255.255.0",
            "broadcast": "7.8.9.1"
        }
    }
I am trying to get the ip value from this JSON file without specifying the path to each element (i.e., without using fwt-master2["ipv4"]["rtr"]["ip"]).
Using the .values() method (.values()[0].values()[0].values()[0]), I am getting the netlen value (24) instead of the ip value, which is actually the first element.
Why is such a thing happening?
I think using nested code to find the key's value is the best way; this way, you just search whether the "broadcast" key is in the dict and then print its value.
Try something from here: find all occurrences of a key
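The linked idea can be sketched as a generator that yields every value stored under a given key at any depth (a generic sketch, not the code behind that link):

```python
def find_all(node, target):
    # Walk dicts and lists recursively, yielding every value under `target`.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target:
                yield value
            yield from find_all(value, target)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, target)

data = {"fwt-master2": {"ipv4": {"rtr": {"ip": "1.2.3.4", "netlen": "24"}}}}
print(list(find_all(data, "ip")))
# ['1.2.3.4']
```

This avoids relying on the ordering assumptions that made the chained .values() calls fragile.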
If you only know that the target key is "ip", then you can use recursion:
s = {"fwt-master2": {
    "ipv4": {
        "rtr": {
            "ip": "1.2.3.4",
            "net": "3.4.5.6",
            "netlen": "24",
            "netmask": "255.255.255.0",
            "broadcast": "7.8.9.1"
        }
    }
}}

def get_ip(d):
    return [i for c in filter(None, [b if a == 'ip' else get_ip(b) if isinstance(b, dict) else None
                                     for a, b in d.items()]) for i in c]

print(''.join(get_ip(s)))
Output:
1.2.3.4
I decided to go through your dictionary and found that it's incomplete!
Your dictionary:
"fwt-master2": {
    "ipv4": {
        "rtr": {
            "ip": "1.2.3.4",
            "net": "3.4.5.6",
            "netlen": "24",
            "netmask": "255.255.255.0",
            "broadcast": "7.8.9.1"
        }
    }
Actually it should be like this (added the missing curly braces: the first one and the last two):
{"fwt-master2": { "ipv4": { "rtr": { "ip": "1.2.3.4", "net": "3.4.5.6", "netlen": "24", "netmask": "255.255.255.0", "broadcast": "7.8.9.1" }}}}
Well, it happens. So, assuming the updated dictionary above is the actual one, here is how you can achieve your goal:
>>> d = {"fwt-master2": { "ipv4": { "rtr": { "ip": "1.2.3.4", "net": "3.4.5.6", "netlen": "24", "netmask": "255.255.255.0", "broadcast": "7.8.9.1" }}}}
>>> obj = []
>>> obj.append(d)
>>> obj
[{'fwt-master2': {'ipv4': {'rtr': {'net': '3.4.5.6', 'netlen': '24', 'ip': '1.2.3.4', 'netmask': '255.255.255.0', 'broadcast': '7.8.9.1'}}}}]
>>> key_list = ['netmask', 'broadcast', 'ip', 'net']
>>> def recursive_items(dictionary):
...     for key, value in dictionary.items():
...         if type(value) is dict:
...             yield from recursive_items(value)
...         else:
...             yield (key, value)
...
>>> def find_key(obj):
...     for e in obj:
...         for key, value in recursive_items(e):
...             if key in key_list:
...                 print(key, value)
...
>>> find_key(obj)
net 3.4.5.6
ip 1.2.3.4
netmask 255.255.255.0
broadcast 7.8.9.1
Have Fun

Python filter nested dict given list of key names

Is there a way to filter a nested dict in Python, so that I can see only the keys I specified?
Example:
x = {
    "field": [
        {
            "nm_field": "ch_origem_sistema_chave",
            "inf_tabelado": {
                "dropdown_value": "",
                "dropdown_key": "",
                "url_lista": "",
                "chave_relacional": ""
            },
        },
        {
            "nm_field": "ax_andamento_data",
            "inf_tabelado": {
                "dropdown_value": "",
                "dropdown_key": "",
                "url_lista": "",
                "chave_relacional": ""
            },
        }
    ],
    "_metadata": {
        "dt_reg": "22/01/2014 16:17:16",
        "dt_last_up": "10/04/2014 16:30:44",
    },
    "url_detalhes": "/DetalhesDocsPro.aspx",
    "url_app": "/docspro",
}
y = filter(x, ['dropdown_value', 'nm_field', 'url_app', 'dt_reg'])
Then y would be something like:
{
    "field": [
        {
            "nm_field": "ch_origem_sistema_chave",
            "inf_tabelado": {
                "dropdown_value": "",
            },
        },
        {
            "nm_field": "ax_andamento_data",
            "inf_tabelado": {
                "dropdown_value": "",
            },
        }
    ],
    "_metadata": {
        "dt_reg": "22/01/2014 16:17:16",
    },
    "url_app": "/docspro",
}
I've tried to do something using defaultdict, but had no success with lists at any level of recursion. I also found it difficult to work with different data structures.
Here's a modified version of 2rs2ts's answer that returns a new object rather than modifying the old one (and handles filtering on non-leaf nodes):
import copy

def fltr(node, vals):
    if isinstance(node, dict):
        retVal = {}
        for key in node:
            if key in vals:
                retVal[key] = copy.deepcopy(node[key])
            elif isinstance(node[key], list) or isinstance(node[key], dict):
                child = fltr(node[key], vals)
                if child:
                    retVal[key] = child
        if retVal:
            return retVal
        else:
            return None
    elif isinstance(node, list):
        retVal = []
        for entry in node:
            child = fltr(entry, vals)
            if child:
                retVal.append(child)
        if retVal:
            return retVal
        else:
            return None
With this, you will call
y = fltr(x, ['dropdown_value', 'nm_field', 'url_app', 'dt_reg'])
and get
{
    "field": [
        {
            "inf_tabelado": {
                "dropdown_value": ""
            },
            "nm_field": "ch_origem_sistema_chave"
        },
        {
            "inf_tabelado": {
                "dropdown_value": ""
            },
            "nm_field": "ax_andamento_data"
        }
    ],
    "url_app": "/docspro",
    "_metadata": {
        "dt_reg": "22/01/2014 16:17:16"
    }
}
Note that this will return None if everything is filtered. For example,
fltr(x, [])
will always return None, no matter what is in x.
Here's a solution which walks the structure in a depth-first manner to find the "leaf" nodes which you are checking to see if they're in your list of elements to preserve. When it finds such an element, it removes it from the dictionary with del. (So this is done in-place.)
def fltr(d, vals):
    if isinstance(d, dict):
        vals_to_del = []
        for k in d:
            if k in vals:
                continue
            if not isinstance(d[k], list) and not isinstance(d[k], dict):
                vals_to_del.append(k)
            else:
                fltr(d[k], vals)
        for k in vals_to_del:
            del d[k]
    elif isinstance(d, list):
        for i in d:
            fltr(i, vals)
Note that I didn't define a function called filter, because it's a built-in one and you don't want to shadow it.
>>> fltr(x, ['dropdown_value', 'nm_field', 'url_app', 'dt_reg'])
>>> x
{'field': [{'inf_tabelado': {'dropdown_value': ''}, 'nm_field': 'ch_origem_sistema_chave'}, {'inf_tabelado': {'dropdown_value': ''}, 'nm_field': 'ax_andamento_data'}], 'url_app': '/docspro', '_metadata': {'dt_reg': '22/01/2014 16:17:16'}}
