How to mask a Python 3 nested dictionary to return a new dictionary with only certain items?

I am consuming several endpoints of an API that is very verbose in the data it returns. I would like to provide a subset of this data to another piece of code elsewhere.
Suppose I am given several dictionaries like this (which I plan to loop through and filter):
asset = {
    'id': 1,
    'name': 'MY-PC',
    'owner': 'me',
    'location': 'New York City',
    'model': {
        'id': 1,
        'name': 'Surface',
        'manufacturer': {
            'id': 1,
            'name': 'Microsoft'
        }
    }
}
I want to create a function that takes that dictionary along with a "mask" and uses the mask to create a new dictionary containing only the allowed items. This might be an example mask (though I can work with whatever format makes the resulting code the most concise):
mask = {
    'id': True,
    'name': True,
    'model': {
        'id': True,
        'name': True,
        'manufacturer': {
            'name': True
        }
    }
}
The function should then return this:
result = {
    'id': 1,
    'name': 'MY-PC',
    'model': {
        'id': 1,
        'name': 'Surface',
        'manufacturer': {
            'name': 'Microsoft'
        }
    }
}
Is there something already built into Python 3 that would help with this? If I have to do this manually, it looks like it's going to get quite ugly quickly. I found itertools.compress, but that seems to be meant for lists and won't handle the complexity of nested dictionaries.

You can recursively build a new dict from the mask by selecting only the corresponding values from the main dict:
def prune_dict(dct, mask):
    result = {}
    for k, v in mask.items():
        if isinstance(v, dict):
            value = prune_dict(dct[k], v)
            if value:  # check that dict is non-empty
                result[k] = value
        elif v:
            result[k] = dct[k]
    return result
print(prune_dict(asset, mask))
{'id': 1,
'model': {'id': 1, 'manufacturer': {'name': 'Microsoft'}, 'name': 'Surface'},
'name': 'MY-PC'}
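Since the question mentions looping over several API responses, here is a small sketch of applying the same idea to a list of dicts. It assumes (this is not stated in the answer) that some responses may lack a key the mask asks for, so it skips missing keys instead of raising KeyError. The name prune_dict_lenient and the second sample asset are hypothetical.

```python
def prune_dict_lenient(dct, mask):
    """Like prune_dict, but skips mask keys that are missing from dct (assumption)."""
    result = {}
    for k, v in mask.items():
        if k not in dct:
            continue  # key absent in this response: skip instead of raising KeyError
        if isinstance(v, dict):
            # Only recurse when the data at this key is itself a dict
            if isinstance(dct[k], dict):
                value = prune_dict_lenient(dct[k], v)
                if value:  # drop sub-dicts that end up empty
                    result[k] = value
        elif v:
            result[k] = dct[k]
    return result


mask = {"id": True, "name": True, "model": {"name": True, "manufacturer": {"name": True}}}
assets = [
    {"id": 1, "name": "MY-PC", "owner": "me",
     "model": {"id": 1, "name": "Surface", "manufacturer": {"id": 1, "name": "Microsoft"}}},
    {"id": 2, "name": "NO-MODEL-PC", "owner": "you"},  # hypothetical response without "model"
]
# Filter every response with the same mask
pruned = [prune_dict_lenient(asset, mask) for asset in assets]
print(pruned)
```

Whether silently skipping is right depends on the API: if a missing key indicates a bug, the stricter prune_dict above, which raises KeyError, may be preferable.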

This would be a good chance to use recursion. Here is some sample code I haven't tested:
def copy(asset, result, mask):
    for key_name, value in mask.items():
        if isinstance(value, dict):
            # Nested mask: create a sub-dict and fill it recursively
            result[key_name] = x = {}
            copy(asset[key_name], x, value)
        elif value:
            # Leaf marked True: copy the value as-is
            result[key_name] = asset[key_name]
y = {}
copy(asset, y, mask)

This would probably be a recursive function. Also, for the mask, I recommend this format: mask = ["id", "name", "model.id", "model.name", "model.manufacturer.name"]
Then, you'd first keep only entries that are named in the mask:
def filterstage1(dictionary, mask):
    result = {}
    for key in dictionary:
        if isinstance(dictionary[key], dict):
            # Strip the "key." prefix from the mask entries that point inside this sub-dict
            newmask = [maskname[maskname.find(".") + 1:] for maskname in mask if maskname.startswith(key + ".")]
            result[key] = filterstage1(dictionary[key], newmask)
        elif key in mask:
            result[key] = dictionary[key]
    return result
Then, depending on whether or not you want to remove sub-dictionaries that were not in the mask and had no subelements, you can include the second stage:
def filterstage2(dictionary, mask):
    result = {}
    for key in dictionary:
        # Drop empty sub-dicts unless the key itself was named in the mask
        if not (isinstance(dictionary[key], dict) and dictionary[key] == {} and key not in mask):
            result[key] = dictionary[key]
    return result
Final code: filterstage2(filterstage1(dictionary, mask), mask). You can combine the two stages together if you wish.
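Combining the two stages into one recursive pass might look like the sketch below; the function name filter_by_paths is hypothetical. One design difference from the two-stage version: if a dotted path names a sub-dict itself (for example "model"), this sketch keeps the whole sub-dict, whereas filterstage1 would recurse into it with no sub-paths.

```python
def filter_by_paths(dictionary, paths):
    """Keep only entries named by dotted paths such as "model.manufacturer.name".

    Single-pass sketch of filterstage1 + filterstage2: sub-dicts that end up
    empty are dropped, and naming a sub-dict outright keeps it whole.
    """
    result = {}
    for key, value in dictionary.items():
        if key in paths:
            # This key was requested directly: keep its value as-is
            result[key] = value
        elif isinstance(value, dict):
            # Strip the "key." prefix from the paths that point inside this sub-dict
            prefix = key + "."
            sub_paths = [p[len(prefix):] for p in paths if p.startswith(prefix)]
            sub_result = filter_by_paths(value, sub_paths)
            if sub_result:  # drop sub-dicts with nothing selected
                result[key] = sub_result
    return result


asset = {
    'id': 1, 'name': 'MY-PC', 'owner': 'me', 'location': 'New York City',
    'model': {'id': 1, 'name': 'Surface', 'manufacturer': {'id': 1, 'name': 'Microsoft'}},
}
mask = ["id", "name", "model.id", "model.name", "model.manufacturer.name"]
print(filter_by_paths(asset, mask))
```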

Related

Python build dict from a mixture of dict keys and list values

Input body is:
{'columns': ['site_ref', 'site_name', 'region'], 'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'], ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}
How could I build a new dict that will take the column values as keys and then populate a new dict for every list item within the data values for 1 to n?
Desired output would be:
{
    {
        "site_ref": "R005000003192",
        "site_name": "AIRTH DSR NS896876",
        "region": "WEST"
    },
    {
        "site_ref": "R005000003195",
        "site_name": "AIRTHREY DSR NS814971",
        "region": "WEST"
    }
}
I have attempted to iterate over it with:
for i in range(len(result["data"])):
    new_dict = []
    new_dict.append(dict(zip(result["columns"], result["data"][i])))
But I cannot seem to get it to complete the iteration.
Note that if the output should be a dictionary, its entries would need keys, and those are missing from the desired output above. However, you could create a list of dictionaries as follows:
d = {'columns': ['site_ref', 'site_name', 'region'], 'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'], ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}
res = [{k: v for k, v in zip(d['columns'], datum)} for datum in d['data']]
print(res)
prints
[{'site_ref': 'R005000003192',
'site_name': 'AIRTH DSR NS896876',
'region': 'WEST'},
{'site_ref': 'R005000003195',
'site_name': 'AIRTHREY DSR NS814971',
'region': 'WEST'}]
If you want keys after all, you could e.g. use the numbering (i.e. 1 for the first, n for the n-th datum) for the keys, as follows:
res = {i+1: {k: v for k, v in zip(d['columns'], datum)} for i, datum in enumerate(d['data'])}
print(res)
prints
{1: {'site_ref': 'R005000003192',
'site_name': 'AIRTH DSR NS896876',
'region': 'WEST'},
2: {'site_ref': 'R005000003195',
'site_name': 'AIRTHREY DSR NS814971',
'region': 'WEST'}}
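For reference, the loop from the question was close: it only kept the last row because new_dict = [] was re-created on every iteration. Creating the list once, before the loop, is enough. This is a sketch; the name rows is just for illustration.

```python
result = {'columns': ['site_ref', 'site_name', 'region'],
          'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'],
                   ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}

rows = []  # create the list once, outside the loop
for datum in result["data"]:
    # Pair each column name with the matching value in this row
    rows.append(dict(zip(result["columns"], datum)))
print(rows)
```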

Retrieve key from nested dict based on value, where key names are unknown

I have the following dict:
{
    'foo': {
        'name': 'bar',
        'options': None,
        'type': 'qux'
    },
    'baz': {
        'name': 'grault',
        'options': None,
        'type': 'plugh'
    },
}
The names of the top-level keys are unknown at runtime. I am unable to figure out how to get the name of the top-level key where the value of type is plugh. I have tried all kinds of iterators, loops, comprehensions etc., but I'm not great with Python. Any pointers would be appreciated.
Try this:
for key, inner_dict in dict_.items():
    if inner_dict['type'] == 'plugh':
        print(key)
Or, if you want a one-liner that gets the first key matching the condition:
key = next(key for key, inner_dict in dict_.items() if inner_dict['type'] == 'plugh')
print(key)
output:
baz
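Note that next() raises StopIteration when no inner dict matches. Passing a default as the second argument avoids that, and a list comprehension collects every matching key in case more than one top-level key has the same type. A short sketch (the variable names are illustrative):

```python
dict_ = {
    'foo': {'name': 'bar', 'options': None, 'type': 'qux'},
    'baz': {'name': 'grault', 'options': None, 'type': 'plugh'},
}
# First matching key, or None if nothing matches (no StopIteration)
first = next((key for key, inner in dict_.items() if inner['type'] == 'plugh'), None)
missing = next((key for key, inner in dict_.items() if inner['type'] == 'xyzzy'), None)
# Every matching key, in insertion order
all_matches = [key for key, inner in dict_.items() if inner['type'] == 'plugh']
print(first, missing, all_matches)
```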
Try iterating over the dict keys and checking the 'type' of each inner dict:
for key in d:
    if d[key]['type'] == 'plugh':
        print(key)
baz
You need to iterate over your data like this:
def top_level_key(search_key, data):
    for key, value in data.items():
        if value['type'] == search_key:
            return key
print(top_level_key('plugh', data_dict))
Besides running a loop to filter the target, another option is to use JSONPath, which is quite similar to XPath:
# pip install jsonpath-ng==1.5.2
# python 3.6
from jsonpath_ng.ext import parse
dct = {
    'foo': {
        'name': 'bar',
        'options': None,
        'type': 'qux'
    },
    'baz': {
        'name': 'grault',
        'options': None,
        'type': 'plugh'
    },
}
parse_str = '$[?@.type="plugh"]'
jsonpath_expr = parse(parse_str)
jsonpath_results = jsonpath_expr.find(dct)
if len(jsonpath_results) > 0:
    result = jsonpath_results[0].value
    print(result)
    # {'name': 'grault', 'options': None, 'type': 'plugh'}
else:
    result = None
Ref: https://pypi.org/project/jsonpath-ng/ for more about JSONPath syntax.

How to replace a nested Python dictionary value, given the key as a dot-separated string?

a = {
    'user': {
        'username': 'mic_jack',
        'name': {
            'first': 'Micheal',
            'last': 'Jackson'
        },
        'email': 'micheal#domain.com',
        #...
        #... Infinite level of another nested dict
    }
}
str_key_1 = 'user.username=john'
str_key_2 = 'user.name.last=henry'
#...
#str_key_n = 'user.level2.level3...leveln=XXX'
Consider that this 'str_key' string can have any number of dots/levels.
Expected Output:
a = {
    'user': {
        'username': 'john',  # username should be replaced
        'name': {
            'first': 'Micheal',
            'last': 'henry'  # last name should be replaced
        },
        'email': 'micheal#domain.com',
        ...
        ...  # Infinite level of another nested dict
    }
}
I'm looking for answers that apply an 'n'-level nested key string, rather than statically replacing with a['user']['username'] = 'John'. Answers must work for any number of 'dotted' string values.
Thanks in advance!
There are three steps:
Separate the key-value pair string into a fully-qualified key and value.
Split the key into path components.
Traverse the dictionary to find the relevant value to update.
Here's an example of what the code might look like:
# Split by the delimiter, making sure to split once only
# to prevent splitting when the delimiter appears in the value
key, value = str_key_n.split("=", 1)
# Break the dot-joined key into parts that form a path
key_parts = key.split(".")
# The last part is required to update the dictionary
last_part = key_parts.pop()
# Traverse the dictionary using the parts
current = a
while key_parts:
    current = current[key_parts.pop(0)]
# Update the value
current[last_part] = value
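The same three steps can be wrapped in a reusable function. This sketch swaps the explicit while loop for functools.reduce with operator.getitem to walk down to the parent dict; set_dotted is a hypothetical name, and like the code above it assumes every intermediate key already exists.

```python
from functools import reduce
import operator


def set_dotted(data, assignment):
    """Apply an assignment string like 'user.name.last=henry' to a nested dict in place."""
    # Split on the first '=' only, so values may themselves contain '='
    key, value = assignment.split("=", 1)
    # All parts but the last form the path to the parent dict
    *parents, last = key.split(".")
    # Walk down to the dict that holds the final key (returns data itself if no parents)
    target = reduce(operator.getitem, parents, data)
    target[last] = value


a = {
    'user': {
        'username': 'mic_jack',
        'name': {'first': 'Micheal', 'last': 'Jackson'},
        'email': 'micheal#domain.com',
    }
}
for s in ['user.username=john', 'user.name.last=henry']:
    set_dotted(a, s)
print(a)
```

A missing intermediate key raises KeyError here, which matches the behaviour of the loop above.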
I'd go with a recursive function to accomplish this, assuming your key value strings are all valid:
def assign_value(sample_dict, str_keys, value):
    access_key = str_keys[0]
    if len(str_keys) == 1:
        sample_dict[access_key] = value
    else:
        sample_dict[access_key] = assign_value(sample_dict[access_key], str_keys[1:], value)
    return sample_dict
The idea is to traverse the dict until you hit the lowest key, then assign the new value to that last key:
if __name__ == "__main__":
    sample_dict = {
        'user': {
            'username': 'mic_jack',
            'name': {
                'first': 'Micheal',
                'last': 'Jackson'
            },
            'email': 'micheal#domain.com'
        }
    }
    str_key_1 = 'user.username=john'
    str_keys_1, value_1 = str_key_1.split('=')
    sample_dict = assign_value(sample_dict, str_keys_1.split('.'), value_1)
    print("result: {}".format(sample_dict))
    str_key_2 = 'user.name.last=henry'
    str_keys_2, value_2 = str_key_2.split('=')
    sample_dict = assign_value(sample_dict, str_keys_2.split('.'), value_2)
    print("result: {}".format(sample_dict))
To use assign_value, you first need to split the original string into the keys and the value, as shown above.
If you're okay with using exec() and modifying your str_key(s), you could do something like:
def get_keys_value(string):
    keys, value = string.split("=")
    return keys, value
def get_exec_string(dict_name, keys):
    exec_string = dict_name
    for key in keys.split("."):
        exec_string = exec_string + "[" + key + "]"
    exec_string = exec_string + "=" + "value"
    return exec_string
str_key_1 = "'user'.'username'=john"
str_key_2 = "'user'.'name'.'last'=henry"
str_key_list = [str_key_1, str_key_2]
for str_key in str_key_list:
    keys, value = get_keys_value(str_key)  # split into key-string and value
    exec_string = get_exec_string("a", keys)  # extract keys from key-string
    exec(exec_string)
print(a)
# prints {'user': {'email': 'micheal#domain.com', 'name': {'last': 'henry', 'first': 'Micheal'}, 'username': 'john'}}
str_key_1 = 'user.username=john'
str_key_2 = 'user.name.last=henry'
a = {
    'user': {
        'username': 'mic_jack',
        'name': {
            'first': 'Micheal',
            'last': 'Jackson'
        },
        'email': 'micheal#domain.com',
        #...
        #... Infinite level of another nested dict
    }
}
def MutateDict(key):
    strkey, strval = key.split('=')[0], key.split('=')[1]
    strkeys = strkey.split('.')
    print("strkeys = ", strkeys)
    target = a
    k = ""
    for k in strkeys:
        print(target.keys())
        if k in target.keys():
            prevTarget = target
            target = target[k]
        else:
            print("Invalid key specified")
            return
    prevTarget[k] = strval
MutateDict(str_key_1)
print(a)
MutateDict(str_key_2)
print(a)

Flattening a dictionary of dictionaries that contain lists

I have a dictionary of dictionaries that looks like this:
data = {
    'data': 'input',
    'test': {
        'and': {
            'range': {'month': [{'start': 'Jan', 'end': 'July'}]},
            'Student': {'Name': ['ABC'], 'Class': ['10']}
        }
    }
}
I need to flatten this dict into a DataFrame. I tried to use json_normalize() to flatten the dictionary, and the output I got looked like this:
My desired output is something like the one given below.
This can be done in R using as.data.frame(unlist(data)), but I want to do the same flattening in Python. I am a novice in Python, so I don't have much idea how to do this.
I have made an attempt to normalize your json object by writing a recursive function as follows:
data = {
    'data': 'input',
    'test': {
        'and': {
            'range': {'month': [{'start': 'Jan', 'end': 'July'}]},
            'Student': {'Name': ['ABC'], 'Class': ['10']}
        }
    }
}
sequence = ""
subDicts = []
def findAllSubDicts(data):
    global subDicts
    global sequence
    for key, value in data.items():
        sequence += key
        #print(sequence)
        if isinstance(value, str):
            subDicts.append([sequence, value])
            sequence = sequence[:sequence.rfind(".")+1]
            #print(sequence)
        elif isinstance(value, dict):
            tempSequence = sequence[:sequence.rfind(".")+1]
            sequence += "."
            #print(sequence)
            findAllSubDicts(value)
            sequence = tempSequence
        elif isinstance(value, list) and isinstance(value[0], dict):
            # Save the parent prefix before appending "." (as in the dict branch)
            tempSequence = sequence[:sequence.rfind(".")+1]
            sequence += "."
            #print(sequence)
            findAllSubDicts(value[0])
            sequence = tempSequence
        elif isinstance(value, list) and len(value) == 1:
            tempSequence = sequence[:sequence.rfind(".")+1]
            subDicts.append([sequence, value[0]])
            sequence = tempSequence
    return subDicts
outDict = findAllSubDicts(data)
for i in outDict:
    print(i[0].ljust(40, " "), end=" ")
    print(i[1])
Printing the results will give you:
data input
test.and.range.month.start Jan
test.and.range.month.end July
test.and.Student.Name ABC
test.and.Student.Class 10
Notify me if you need any clarification or any modification in my code.
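A variant without the global sequence/subDicts state is a recursive generator that carries the key path as an argument. This is a sketch, not the answer's code: flatten is a hypothetical name and, like the answer, it does not add list indices to the path, so a list with several elements would yield repeated paths. The resulting (path, value) pairs could then be passed to pandas.DataFrame(rows, columns=["key", "value"]) to get a two-column DataFrame.

```python
def flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs from nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Extend the path with this key and a trailing dot
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        # Assumption: list elements share their parent's path (no index added)
        for item in obj:
            yield from flatten(item, prefix)
    else:
        # Leaf value: drop the trailing dot from the accumulated path
        yield prefix[:-1], obj


data = {
    'data': 'input',
    'test': {
        'and': {
            'range': {'month': [{'start': 'Jan', 'end': 'July'}]},
            'Student': {'Name': ['ABC'], 'Class': ['10']}
        }
    }
}
rows = list(flatten(data))
for path, value in rows:
    print(path.ljust(40), value)
```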

How to merge two nested dictionaries in Python, appending lists?

For example, I have two dicts:
schema = {
    'type': 'object',
    'properties': {
        'reseller_name': {
            'type': 'string',
        },
        'timestamp': {
            'type': 'integer',
        },
    },
    'required': ['reseller_name', 'timestamp'],
}
and
schema_add = {
    'properties': {
        'user_login': {
            'type': 'string',
        },
    },
    'required': ['user_login'],
}
How can I get the following merged result, with the lists appended:
schema_result = {
    'type': 'object',
    'properties': {
        'reseller_name': {
            'type': 'string',
        },
        'timestamp': {
            'type': 'integer',
        },
        'user_login': {
            'type': 'string',
        },
    },
    'required': ['reseller_name', 'timestamp', 'user_login'],
}
Rules:
"Same path" means, for example, properties and required in both schema and schema_add.
If both dicts have dicts at the same path, those are merged with the same rules.
If both dicts have lists at the same path, the first list is extended with the second.
If both dicts have simple values (or a dict and a non-dict, or a list and a non-list) at the same path, the second value overrides the first.
If only one dict has a key at some path, that key and its value are set.
Not sure where the problem lies, but the way you're writing it down is almost like a computer program, and the example is like a test case. Why don't you start from this?
def add_dict(d1, d2):
    newdict = {}
    for key, value in d1.items():
        if key in d2:
            # apply the merge rules here and add the result to newdict
            ...
        else:
            # key only in d1: simply add it
            newdict[key] = value
    for key, value in d2.items():
        if key not in d1:
            # key only in d2: simply add it
            newdict[key] = value
    return newdict
This can probably be written more tightly, but it might be easier to edit like this.
Edit: after writing the last comment, I couldn't help but write a nicer implementation:
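def merge_values(a, b):
    # A value present on only one side wins outright
    if a is None:
        return b
    if b is None:
        return a
    # Both sides have values: apply the merge rules
    if isinstance(a, dict) and isinstance(b, dict):
        return add_dict(a, b)
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    # Simple values (or mismatched types): the second overrides the first
    return b
def add_dict(d1, d2):
    return {
        key: merge_values(d1.get(key), d2.get(key))
        for key in set(d1).union(d2)
    }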
My own solution, with @Nicolas78's help:
def merge(obj_1, obj_2):
    if isinstance(obj_1, dict) and isinstance(obj_2, dict):
        result = {}
        for key, value in obj_1.items():
            if key not in obj_2:
                result[key] = value
            else:
                result[key] = merge(value, obj_2[key])
        for key, value in obj_2.items():
            if key not in obj_1:
                result[key] = value
        return result
    if isinstance(obj_1, list) and isinstance(obj_2, list):
        return obj_1 + obj_2
    return obj_2
Here is a simple solution to this problem, assuming the structure of the sample data will not change:
import copy

def merge_nested_dicts(schema, schema_add):
    # Deep-copy so the original schema is not modified in place
    new_schema = copy.deepcopy(schema)
    for k in schema:
        if k in schema_add:
            if isinstance(schema_add[k], dict):
                new_schema[k].update(schema_add[k])
            if isinstance(schema_add[k], list):
                new_schema[k] = new_schema[k] + schema_add[k]
    return new_schema
Try this if you know the keys exactly.
schema['properties'].update(schema_add['properties'])
schema['required'].extend(schema_add['required'])
The result is merged into schema.
If you do not know the keys exactly, a loop is required to find the inner lists and dictionaries:
for key in schema:
    if isinstance(schema[key], dict):
        if key in schema_add and isinstance(schema_add[key], dict):
            schema[key].update(schema_add[key])
    elif isinstance(schema[key], list):
        if key in schema_add and isinstance(schema_add[key], list):
            schema[key].extend(schema_add[key])
The result can be merged into a different dict as well.
