I have two nested dictionaries with similar structure and some matching keys, and I want to merge them into a third dictionary in a specific way. The first dictionary holds default values that should be used when a key is absent from the second dictionary; the second dictionary has some keys that match the defaults and some that don't exist there at all. In either case I want the second dictionary's value to overwrite the default, or to be added as a new key, in the third dictionary. See the (shortened) example below:
default:
{"model_name": "null",
"description": "null",
"frequency": "d",
"tasks": [
{
"target": "elastic",
"metrics": "null",
"model_type": "null",
"alert": {
"type": "pagerduty",
"threshold": 5,
"service_id" : "P94CEA6"
}
}
]
}
second dict:
{"model_name": "dqs_cie_registration_09",
"description": "test cie registration",
"tasks": [
{
"source": "elastic",
"metrics": [
"indid_unique_cnt", "zs"
],
"model_type": "Deep_Dive",
"elastic_config": "config",
"read_object": "dqs_rtfs_d_*",
"watcher": "cie_watch_zs_3d.json",
"target_write_index": "dqs_target_write_index"
}
]
}
I'd like to merge them so the result is:
{"model_name": "dqs_cie_registration_09",
 "description": "test cie registration",
 "frequency": "d",
 "tasks": [
     {
         "target": "elastic",
         "source": "elastic",
         "metrics": ["indid_unique_cnt", "zs"],
         "model_type": "Deep_Dive",
         "elastic_config": "config",
         "read_object": "dqs_rtfs_d_*",
         "watcher": "cie_watch_zs_3d.json",
         "target_write_index": "dqs_target_write_index",
         "alert": {
             "type": "pagerduty",
             "threshold": 5,
             "service_id": "P94CEA6"
         }
     }
 ]
}
In other words, the third dict is the result of merging the second dict onto the first.
I haven't really gotten anywhere with this, but I feel there is a really easy way to implement it that I just don't remember.
The following merge routine produces the desired result:
import copy    # to provide deepcopy
import pprint  # pretty-print the result

def merge(a, b):
    """Merge b into a (to preserve a, make a deepcopy prior to calling merge)."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Dictionaries
        for k, v in b.items():
            if k in a:
                # Conditionally overwrite keys that already exist in a
                if isinstance(a[k], str):
                    if a[k] == "null":
                        a[k] = copy.deepcopy(v)
                else:
                    merge(a[k], v)
            else:
                # Add keys that only exist in b
                a[k] = copy.deepcopy(v)
    elif isinstance(a, list) and isinstance(b, list):
        # Lists: merge element-wise when the lengths match
        if len(a) == len(b):
            for i, item in enumerate(b):
                if isinstance(a[i], str) and isinstance(item, str):
                    if a[i] == "null":
                        a[i] = item
                else:
                    merge(a[i], item)
Usage
d1 = {"model_name": "null",
"description": "null",
"frequency": "d",
"tasks": [
{
"target": "elastic",
"metrics": "null",
"model_type": "null",
"alert": {
"type": "pagerduty",
"threshold": 5,
"service_id" : "P94CEA6"
}
}
]
}
d2 = {"model_name": "dqs_cie_registration_09",
"description": "test cie registration",
"tasks": [
{
"source": "elastic",
"metrics": [
"indid_unique_cnt", "zs"
],
"model_type": "Deep_Dive",
"elastic_config": "config",
"read_object": "dqs_rtfs_d_*",
"watcher": "cie_watch_zs_3d.json",
"target_write_index": "dqs_target_write_index"
}
]
}
merge(d1, d2) # to preserve d1 create a deepcopy prior to merge (i.e. temp = copy.deepcopy(d1))
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(d1)
Output
{ 'description': 'test cie registration',
'frequency': 'd',
'model_name': 'dqs_cie_registration_09',
'tasks': [ { 'alert': { 'service_id': 'P94CEA6',
'threshold': 5,
'type': 'pagerduty'},
'elastic_config': 'config',
'metrics': ['indid_unique_cnt', 'zs'],
'model_type': 'Deep_Dive',
'read_object': 'dqs_rtfs_d_*',
'source': 'elastic',
'target': 'elastic',
'target_write_index': 'dqs_target_write_index',
'watcher': 'cie_watch_zs_3d.json'}
]
}
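If you would rather keep d1 untouched, a thin convenience wrapper around the routine above (a sketch, not part of the original answer) can return a new merged dict:
def merged(defaults, overrides):
    # Deep-copy the defaults first, then merge the overrides in place.
    result = copy.deepcopy(defaults)
    merge(result, overrides)
    return result

d3 = merged(d1, d2)   # d1 and d2 are left unchanged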
Related
I have 2 dictionaries:
data = {
"filter":
{
"and":
[
{
"or":
[
{
"and":
[
{"category": "profile", "key": "languages", "operator": "IN", "value": "EN"},
{"category": "skill", "key": "26366", "value": 100, "operator": "EQ"},
],
},
],
},
{"or": [{"category": "skill", "key": "45165", "operator": "NE"}]},
{"or": [{"category": "skill", "key": "48834", "value": 80, "operator": "GT"}]},
{"or": [{"category": "profile", "key": "gender", "operator": "EQ", "value": "FEMALE"}]},
],
},
}
new_val = {'26366': '11616', '45165': '11613', '48834': '11618'}
I want to update values in the "data" dictionary with the values from the "new_val" dictionary,
so that 26366 (in "data") becomes 11616 (from "new_val"), 45165 becomes 11613, and 48834 becomes 11618.
The nesting of the "data" dictionary can vary (both deeper and shallower).
Also, the key in the "data" dictionary is not always literally "key"; it can be "skill_id", "filter_id", and so on.
And get this result:
{
"filter":
{
"and":
[
{
"or":
[
{
"and":
[
{"category": "profile", "key": "languages", "operator": "IN", "value": "EN"},
{"category": "skill", "key": "11616", "value": 100, "operator": "EQ"},
],
},
],
},
{"or": [{"category": "skill", "key": "11613", "operator": "NE"}]},
{"or": [{"category": "skill", "key": "11618", "value": 80, "operator": "GT"}]},
{"or": [{"category": "profile", "key": "gender", "operator": "EQ", "value": "FEMALE"}]},
],
},
}
To return an updated dict without modifying the old one:
def updated_in_depth(d, replace):
    if isinstance(d, dict):
        return {k: updated_in_depth(v, replace)
                for k, v in d.items()}
    elif isinstance(d, list):
        return [updated_in_depth(x, replace) for x in d]
    else:
        return replace.get(d, d)
Testing with your data and new_val:
>>> updated_in_depth(data, new_val)
{'filter': {'and': [{'or': [{'and': [
{'category': 'profile', 'key': 'languages', 'operator': 'IN', 'value': 'EN'},
{'category': 'skill', 'key': '11616', 'value': 100, 'operator': 'EQ'}]}]},
{'or': [{'category': 'skill', 'key': '11613', 'operator': 'NE'}]},
{'or': [{'category': 'skill', 'key': '11618', 'value': 80, 'operator': 'GT'}]},
{'or': [{'category': 'profile', 'key': 'gender', 'operator': 'EQ', 'value': 'FEMALE'}]}]}}
If you only need to change known positions, you can index into the structure directly, e.g.:
data['filter']['and'][0]['or'][0]['and'][1]['key'] = '11616'
To search for the keys recursively you can do:
from copy import deepcopy

def replace(d, new_vals):
    if isinstance(d, dict):
        # replace key (if there's a match):
        if "key" in d:
            d["key"] = new_vals.get(d["key"], d["key"])
        for v in d.values():
            replace(v, new_vals)
    elif isinstance(d, list):
        for v in d:
            replace(v, new_vals)

new_data = deepcopy(data)
replace(new_data, new_val)
print(new_data)
Prints:
{
"filter": {
"and": [
{
"or": [
{
"and": [
{
"category": "profile",
"key": "languages",
"operator": "IN",
"value": "EN",
},
{
"category": "skill",
"key": "11616",
"value": 100,
"operator": "EQ",
},
]
}
]
},
{"or": [{"category": "skill", "key": "11613", "operator": "NE"}]},
{
"or": [
{
"category": "skill",
"key": "11618",
"value": 80,
"operator": "GT",
}
]
},
{
"or": [
{
"category": "profile",
"key": "gender",
"operator": "EQ",
"value": "FEMALE",
}
]
},
]
}
}
If you don't need a copy of data, you can omit the deepcopy:
replace(data, new_val)
print(data)
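If the key is not always literally "key" (the question mentions "skill_id", "filter_id", and so on), here is a sketch of a variant that replaces any string value that has a mapping, regardless of the key name; it assumes the old ids never appear as values that must be preserved:
def replace_any_key(d, new_vals):
    # Hypothetical variant: swap any string value found in new_vals,
    # no matter which key it is stored under.
    if isinstance(d, dict):
        for k, v in d.items():
            if isinstance(v, str) and v in new_vals:
                d[k] = new_vals[v]
            else:
                replace_any_key(v, new_vals)
    elif isinstance(d, list):
        for v in d:
            replace_any_key(v, new_vals)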
You can build a recursive function like this
def walk_dict(d):
    if isinstance(d, list):
        for item in d:
            walk_dict(item)
    elif isinstance(d, dict):
        if 'key' in d and d['key'] in new_val:
            d['key'] = new_val[d['key']]
        for k, v in d.items():
            walk_dict(v)

walk_dict(data)
print(data)
As many have advised, a recursive function will do the trick:
def a(d):
    if isinstance(d, dict):    # if dictionary, apply a to all values
        d = {k: a(d[k]) for k in d.keys()}
        return d
    elif isinstance(d, list):  # if list, apply to all elements
        return [a(x) for x in d]
    else:                      # apply to d directly (it is a number, a string or a bool)
        return new_val[d] if d in new_val else d
When a is called, it checks the type of the variable d:
if d is a list, it applies a to each element and returns the updated list;
if d is a dict, it applies a to all values and returns the updated dict;
otherwise, it returns the mapped new value if the old one is found among the new_val keys, and d unchanged otherwise.
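A minimal usage sketch (assuming data and new_val are defined as in the question; a builds a new structure rather than mutating the old one):
updated = a(data)
print(updated['filter']['and'][1]['or'][0]['key'])   # -> '11613'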
data = {
"filter":
{
"and":
[
{
"or":
[
{
"and":
[
{"category": "profile", "key": "languages", "operator": "IN", "value": "EN"},
{"category": "skill", "key": "11616", "value": 100, "operator": "EQ"},
],
},
],
},
{"or": [{"category": "skill", "key": "11613", "operator": "NE"}]},
{"or": [{"category": "skill", "key": "11618", "value": 80, "operator": "GT"}]},
{"or": [{"category": "profile", "key": "gender", "operator": "EQ", "value": "FEMALE"}]},
],
},
}
class Replace:
    def __init__(self, data):
        self.data = data

    def start(self, d):
        data = self.data

        def replace(data):
            if type(data) == list:
                for v in data:
                    replace(v)
            if type(data) == dict:
                for k, v in data.items():
                    if type(v) in (dict, list):
                        replace(v)
                    if type(v) == str:
                        if v in d:
                            data[k] = d[v]

        replace(data)
        return data

new_data = Replace(data).start({'26366': '11616',
                                '45165': '11613',
                                '48834': '11618'})
print(new_data)
Here is the PHP code that I want to write in Python.
<?php
$json = '{
"targeting": [
{
"country": {
"allow": [
"US",
"DE"
]
},
"region" : {
"allow" : {
"US" : [
33
],
"DE" : [
10383
]
}
},
"city": {
"allow": {
"US": [
57
],
"DE": [
3324
]
}
},
"os": {
"allow": [
{
"name": "Android",
"comparison": "GTE",
"version": "2.3.1"
},
{
"name": "Apple TV Software",
"comparison": "EQ",
"version": "4.4"
},
{
"name": "Windows",
"comparison": "EQ",
"version": "Vista"
}
]
},
"isp" : {
"allow" : {
"US" : [
"Att"
],
"DE" : [
"Telekom"
]
}
},
"ip": {
"allow": [
"11.12.13.0-17.18.19.22",
"6.0.0.0",
"10.0.0.0-10.0.0.2",
"11.0.0.0/24"
]
},
"device_type": [
"mobile"
],
"browser": {
"allow": [
"Yandex.Browser for iOS",
"SlimBrowser",
"Edge Mobile"
]
},
"brand": {
"allow": [
"Smartbook Entertainment",
"Walton",
"PIPO"
]
},
"sub": {
"allow": {
"1": [
"A",
"B"
]
},
"deny": {
"2": [
"C",
"D"
]
},
"deny_groups": [
{
"1": ""
},
{
"1": "X",
"2": "Y"
}
]
},
"connection": [
"wi-fi",
"cellular"
],
"block_proxy": true,
"affiliate_id": [
1
],
"url": "http://test-url.com"
}
]
}';
$arr = json_decode($json);
$postData = http_build_query($arr);
//POST SomeURLhere
echo urldecode($postData);
What I need is to send this json in this format
targeting[0][country][allow][]=TR
targeting[0][os][allow][][name]=iOS
targeting[1][country][allow][]=DE
targeting[1][os][allow][][name]=iOS
I guess I need to figure out how to use http_build_query in Python.
Referring to this answer, I found the solution.
from collections.abc import MutableMapping
from urllib.parse import urlencode, unquote

def flatten(dictionary, parent_key=False, separator='.', separator_suffix=''):
    """
    Turn a nested dictionary into a flattened dictionary.
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :param separator_suffix: The string appended after each nested key
    :return: A flattened dictionary
    """
    items = []
    for key, value in dictionary.items():
        new_key = str(parent_key) + separator + key + separator_suffix if parent_key else key
        if isinstance(value, MutableMapping):
            items.extend(flatten(value, new_key, separator, separator_suffix).items())
        elif isinstance(value, list) or isinstance(value, tuple):
            for k, v in enumerate(value):
                items.extend(flatten({str(k): v}, new_key, separator, separator_suffix).items())
        else:
            items.append((new_key, value))
    return dict(items)
req = {'check': 'command',
       'parameters': ({'parameter': '1', 'description': '2'},
                      {'parameter': '3', 'description': '4'})}
req = flatten(req, False, '[', ']')
query = urlencode(req)
query_parsed = unquote(query)
print(query)
print(query_parsed)
And the outputs:
check=command&parameters%5B0%5D%5Bparameter%5D=1&parameters%5B0%5D%5Bdescription%5D=2&parameters%5B1%5D%5Bparameter%5D=3&parameters%5B1%5D%5Bdescription%5D=4
check=command&parameters[0][parameter]=1&parameters[0][description]=2&parameters[1][parameter]=3&parameters[1][description]=4
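Applying the same flatten plus urlencode approach to the original targeting payload (abbreviated here to one field; the full JSON from the question works the same way) produces keys in the bracketed format, though with explicit numeric indices rather than PHP's empty []:
payload = {'targeting': [{'country': {'allow': ['US', 'DE']}}]}  # abbreviated sample
flat = flatten(payload, False, '[', ']')
print(unquote(urlencode(flat)))
# targeting[0][country][allow][0]=US&targeting[0][country][allow][1]=DE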
Suppose I have a table represented in JSON as a list of dicts, where the keys of each item are the same:
J = [
{
"symbol": "ETHBTC",
"name": "Ethereum",
:
},
{
"symbol": "LTC",
"name": "LiteCoin"
:
},
And suppose I require efficient lookup, e.g. symbols['ETHBTC']['name']
I can transform with symbols = { item['symbol']: item for item in J }, producing:
{
"ETHBTC": {
"symbol": "ETHBTC",
"name": "Ethereum",
:
},
"LTCBTC": {
"symbol": "LTCBTC",
"name": "LiteCoin",
:
},
(Ideally I would also remove the now redundant symbol field).
However, what if each item itself contains a "table-as-list-of-dicts"?
Here's a fuller minimal example (I've removed lines not pertinent to the problem):
J = {
"symbols": [
{
"symbol":"ETHBTC",
"filters":[
{
"filterType":"PRICE_FILTER",
"minPrice":"0.00000100",
},
{
"filterType":"PERCENT_PRICE",
"multiplierUp":"5",
},
],
},
{
"symbol":"LTCBTC",
"filters":[
{
"filterType":"PRICE_FILTER",
"minPrice":"0.00000100",
},
{
"filterType":"PERCENT_PRICE",
"multiplierUp":"5",
},
],
}
]
}
So the challenge is to transform this structure into:
J = {
"symbols": {
"ETHBTC": {
"filters": {
"PRICE_FILTER": {
"minPrice": "0.00000100",
:
}
I can write a flatten function:
def flatten(L: list, key) -> dict:
    def remove_key_from(D):
        del D[key]
        return D
    return {D[key]: remove_key_from(D) for D in L}
Then I can flatten the outer list and loop through each key/val in the resulting dict, flattening val['filters']:
J['symbols'] = flatten(J['symbols'], key="symbol")
for symbol, D in J['symbols'].items():
    D['filters'] = flatten(D['filters'], key="filterType")
Is it possible to improve upon this using glom (or otherwise)?
Initial transform has no performance constraint, but I require efficient lookup.
I don't know if you'd call it Pythonic, but you could make your function more generic by using recursion and dropping the key argument. Since you already assume that your lists contain dictionaries, you can take advantage of Python's dynamic typing and accept any kind of input:
from pprint import pprint

def flatten_rec(I) -> dict:
    if isinstance(I, dict):
        I = {k: flatten_rec(v) for k, v in I.items()}
    elif isinstance(I, list):
        I = {list(D.values())[0]: {k: flatten_rec(v) for k, v in list(D.items())[1:]}
             for D in I}
    return I

pprint(flatten_rec(J))
Output:
{'symbols': {'ETHBTC': {'filters': {'PERCENT_PRICE': {'multiplierUp': '5'},
'PRICE_FILTER': {'minPrice': '0.00000100'}}},
'LTCBTC': {'filters': {'PERCENT_PRICE': {'multiplierUp': '5'},
'PRICE_FILTER': {'minPrice': '0.00000100'}}}}}
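Note that flatten_rec assumes the grouping key is the first key of each dict in a list (dicts preserve insertion order in Python 3.7+), which holds for the data above. A quick lookup against the result, for example:
flat = flatten_rec(J)
print(flat['symbols']['ETHBTC']['filters']['PRICE_FILTER']['minPrice'])  # 0.00000100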
Since you have different transformation rules for different keys, you can keep a list of the key names that require "grouping" on:
t = ['symbol', 'filterType']

def transform(d):
    if (m := {a: b for a, b in d.items() if a in t}):
        return {[*m.values()][0]: transform({a: b for a, b in d.items() if a not in m})}
    return {a: b if not isinstance(b, list)
            else {x: y for j in b for x, y in transform(j).items()}
            for a, b in d.items()}

import json
print(json.dumps(transform(J), indent=4))
{
"symbols": {
"ETHBTC": {
"filters": {
"PRICE_FILTER": {
"minPrice": "0.00000100"
},
"PERCENT_PRICE": {
"multiplierUp": "5"
}
}
},
"LTCBTC": {
"filters": {
"PRICE_FILTER": {
"minPrice": "0.00000100"
},
"PERCENT_PRICE": {
"multiplierUp": "5"
}
}
}
}
}
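Either way, once the list entries are keyed by their grouping values, the lookups the question asks for become plain dict indexing, for example:
transformed = transform(J)
print(transformed['symbols']['LTCBTC']['filters']['PERCENT_PRICE']['multiplierUp'])  # '5'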
I have a nested dictionary as the following.
myDict= {
"id": 10,
"state": "MY LIST",
"Stars":
{
"BookA": {
"id": 10,
"state": "new book",
"Mystery": {
"AuthorA":
{
"id": "100",
"state": "thriller"
},
"AuthorB":
{
"id": "112",
"state": "horror"
}
},
"Thriller": {
"Store1":
{
"id": "300",
"state": "Old"
}
}
}
}
}
I want to return a dictionary with all of the "state": "..." entries removed. That means I want to remove every "state" field and get the output below.
I want this to be a generic method, since the dictionary could be nested to many levels.
myDict =
{
    "id": 10,
    "Stars": {
        "BookA": {
            "id": 10,
            "Mystery": {
                "AuthorA": {
                    "id": "100"
                },
                "AuthorB": {
                    "id": "112"
                }
            },
            "Thriller": {
                "Store1": {
                    "id": "300"
                }
            }
        }
    }
}
I tried the following but it doesn't seem to work; it only removes the "state": "MY LIST" entry at the top level. Can someone help me resolve the issue?
def get(self):
    removelist = ["state"]
    new_dict = {}
    for key, item in myDict.items():
        if key not in removelist:
            new_dict.update({key: item})
    return new_dict
It doesn't remove all the "state" values.
You can use a DFS:
def remove_keys(d, keys):
    if isinstance(d, dict):
        return {k: remove_keys(v, keys) for k, v in d.items() if k not in keys}
    else:
        return d
The idea is to recursively remove the keys from subtrees: for every subtree that is a nested dict, return a dict without the keys to remove, using a dict comprehension; for every leaf (a single value), just return the value.
Test:
from pprint import pprint
pprint(remove_keys(myDict, ['state']))
Output:
{'Stars': {'BookA': {'Mystery': {'AuthorA': {'id': '100'},
'AuthorB': {'id': '112'}},
'Thriller': {'Store1': {'id': '300'}},
'id': 10}},
'id': 10}
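If the data could also contain lists of nested dicts (the question's example only nests dicts, so this is an assumption), the same DFS extends naturally:
def remove_keys(d, keys):
    if isinstance(d, dict):
        return {k: remove_keys(v, keys) for k, v in d.items() if k not in keys}
    if isinstance(d, list):
        # Also walk lists, in case a value holds a list of nested dicts.
        return [remove_keys(v, keys) for v in d]
    return d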
The problem is you aren't handling the nested dictionaries.
def get(self):
    removelist = ["state"]
    new_dict = {}
    for key, item in myDict.items():
        if key not in removelist:
            new_dict.update({key: item})
            if isinstance(item, dict):
                # You'll need to handle this use case.
                pass
    return new_dict
To elaborate, let's look back at your dictionary:
myDict= {
"id": 10, # int
"state": "MY LIST", # string
"Stars": { # dictionary
"BookA": {
"id": 10, # int
"state": "new book", # string
"Mystery": { # dictionary
"AuthorA": {
"id": "100",
"state": "thriller"
},
"AuthorB": {
"id": "112",
"state": "horror"
}
},
"Thriller": {
"Store1": {
"id": "300",
"state": "Old"
}
}
}
}
}
I commented in the types for clarity. Your code currently walks myDict and ignores the key "state" at the top level only. When you hit the "Stars" key, whose value is another dictionary, you need to parse that dictionary as well so its nested "state" keys are also ignored.
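One possible way to fill in that branch (a hypothetical completion written as a plain function over d, not the answerer's code) is to clean nested dictionaries recursively:
def get_clean(d, removelist=("state",)):
    new_dict = {}
    for key, item in d.items():
        if key in removelist:
            continue
        if isinstance(item, dict):
            # Recurse so nested "state" keys are removed as well.
            new_dict[key] = get_clean(item, removelist)
        else:
            new_dict[key] = item
    return new_dict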
I have data in JSON format:
data = {"outfit":{"shirt":"red,"pants":{"jeans":"blue","trousers":"khaki"}}}
I'm attempting to plot this data into a decision tree using InfoVis, because it looks pretty and interactive. The problem is that their graph takes JSON data in this format:
data = {id:"nodeOutfit",
        name:"outfit",
        data:{},
        children:[{
            id:"nodeShirt",
            name:"shirt",
            data:{},
            children:[{
                id:"nodeRed",
                name:"red",
                data:{},
                children:[]
            }]
        }, {
            id:"nodePants",
            name:"pants",
            data:{},
            children:[{
                id:"nodeJeans",
                name:"jeans",
                data:{},
                children:[{
                    id:"nodeBlue",
                    name:"blue",
                    data:{},
                    children:[]
                }]
            }, {
                id:"nodeTrousers",
                name:"trousers",
                data:{},
                children:[{
                    id:"nodeKhaki",
                    name:"khaki",
                    data:{},
                    children:[]
                }]
            }]
        }]
       }
Here's what I want to do, but I'm not sure if it's the right way: loop through all the keys and values and replace them with the appropriate structure:
for name, list in data.iteritems():
    for dict in list:
        for key, value in dict.items():
            # Need something here which changes the value for each key and value.
            # Not sure about the syntax to change "outfit" to name:"outfit", as well as
            # adding id:"nodeOutfit", data:{}, and 'children' before the value.
            pass
Let me know if I'm way off.
Here is their example http://philogb.github.com/jit/static/v20/Jit/Examples/Spacetree/example1.html
And here's the data http://philogb.github.com/jit/static/v20/Jit/Examples/Spacetree/example1.code.html
A simple recursive solution:
data = {"outfit":{"shirt":"red","pants":{"jeans":"blue","trousers":"khaki"}}}
import json
from collections import OrderedDict

def node(name, children):
    n = OrderedDict()
    n['id'] = 'node' + name.capitalize()
    n['name'] = name
    n['data'] = {}
    n['children'] = children
    return n

def convert(d):
    if type(d) == dict:
        return [node(k, convert(v)) for k, v in d.items()]
    else:
        return [node(d, [])]

print(json.dumps(convert(data), indent=True))
Note that convert returns a list, not a dict, since data could have more than one top-level key, not just 'outfit'.
output:
[
{
"id": "nodeOutfit",
"name": "outfit",
"data": {},
"children": [
{
"id": "nodeShirt",
"name": "shirt",
"data": {},
"children": [
{
"id": "nodeRed",
"name": "red",
"data": {},
"children": []
}
]
},
{
"id": "nodePants",
"name": "pants",
"data": {},
"children": [
{
"id": "nodeJeans",
"name": "jeans",
"data": {},
"children": [
{
"id": "nodeBlue",
"name": "blue",
"data": {},
"children": []
}
]
},
{
"id": "nodeTrousers",
"name": "trousers",
"data": {},
"children": [
{
"id": "nodeKhaki",
"name": "khaki",
"data": {},
"children": []
}
]
}
]
}
]
}
]
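A small usage note (not part of the answer above): since the question's data has a single top-level key, the SpaceTree's single root can be taken as the first element of the converted list:
root = convert(data)[0]
print(root['id'])    # nodeOutfit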