For example I have two dicts:
schema = {
'type': 'object',
'properties': {
'reseller_name': {
'type': 'string',
},
'timestamp': {
'type': 'integer',
},
},
'required': ['reseller_name', 'timestamp'],
}
and
schema_add = {
'properties': {
'user_login': {
'type': 'string',
},
},
'required': ['user_login'],
}
How can I get the following merged result, where lists are appended:
schema_result = {
'type': 'object',
'properties': {
'reseller_name': {
'type': 'string',
},
'timestamp': {
'type': 'integer',
},
'user_login': {
'type': 'string',
},
},
'required': ['reseller_name', 'timestamp', 'user_login'],
}
Rules:
In the example, properties and required are the paths shared by schema and schema_add.
If both dicts have dicts at the same path, they are merged using these same rules.
If both dicts have lists at the same path, the second list is appended to the first.
If both dicts have simple values (or a dict and a non-dict, or a list and a non-list) at the same path, the second value overrides the first.
If only one dict has a key at some path, that key and value are set as-is.
Not sure where the problem lies, but the way you're writing it down is almost like a computer program, and the example is like a test case. Why don't you start from this?
def add_dict(d1, d2):
    newdict = {}
    for key, value in d1.items():
        if key in d2:
            # apply rules, add to newdict
            ...
        else:
            # simply add
            newdict[key] = value
    for key, value in d2.items():
        if key not in d1:
            # simply add
            newdict[key] = value
    return newdict
This can probably be written more tightly, but it might be easier to edit like that.
Edit: after writing the last comment, I couldn't help but write a nicer implementation:
def merge_values(a, b):
    if a is None or b is None:
        # only one side has a value: keep whichever one it is
        return b if a is None else a
    # now handle cases where both have values
    if type(a) == dict:
        return add_dict(a, b)
    if type(a) == list:
        ...
def add_dict(d1, d2):
    return dict(
        [
            (key,
             merge_values(
                 d1.get(key, None),
                 d2.get(key, None)))
            for key
            in set(d1.keys()).union(d2.keys())
        ])
My own solution, with @Nicolas78's help:
def merge(obj_1, obj_2):
    if type(obj_1) == dict and type(obj_2) == dict:
        result = {}
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        for key, value in obj_1.items():
            if key not in obj_2:
                result[key] = value
            else:
                result[key] = merge(value, obj_2[key])
        for key, value in obj_2.items():
            if key not in obj_1:
                result[key] = value
        return result
    if type(obj_1) == list and type(obj_2) == list:
        return obj_1 + obj_2
    return obj_2
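To check the rules against the example from the question, here is a minimal self-contained sketch for Python 3. It follows the same idea as the `merge` function above but is written slightly more compactly; the sample dicts are copied from the question, and nothing else is assumed.

```python
def merge(obj_1, obj_2):
    """Merge obj_2 into a new object built from obj_1, following the question's rules."""
    if isinstance(obj_1, dict) and isinstance(obj_2, dict):
        # Shallow copy: values that exist only in obj_1 are shared, not deep-copied.
        result = dict(obj_1)
        for key, value in obj_2.items():
            result[key] = merge(obj_1[key], value) if key in obj_1 else value
        return result
    if isinstance(obj_1, list) and isinstance(obj_2, list):
        return obj_1 + obj_2  # lists are concatenated into a new list
    return obj_2  # simple or mismatched values: the second one wins


schema = {
    'type': 'object',
    'properties': {
        'reseller_name': {'type': 'string'},
        'timestamp': {'type': 'integer'},
    },
    'required': ['reseller_name', 'timestamp'],
}
schema_add = {
    'properties': {'user_login': {'type': 'string'}},
    'required': ['user_login'],
}

schema_result = merge(schema, schema_add)
print(schema_result['required'])
```

Because lists are concatenated with `+` and dicts are rebuilt, the top-level inputs are left unchanged, which is usually what you want when the base schema is reused.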
I am adding a simple solution to this problem, assuming that the structure of the sample data will not change.
def merge_nested_dicts(schema, schema_add):
    # note: new_schema is the same object as schema, so schema is modified in place
    new_schema = schema
    for k in schema:
        if k in schema_add:
            if isinstance(schema_add[k], dict):
                new_schema[k].update(schema_add[k])
            if isinstance(schema_add[k], list):
                new_schema[k] = new_schema[k] + schema_add[k]
    return new_schema
Try this if you know the keys exactly:
schema['properties'].update(schema_add['properties'])
schema['required'].extend(schema_add['required'])
The result is merged into schema in place.
If you do not know the keys exactly, then one loop is required to find the inner lists and dictionaries:
for key, value in schema.items():
    if isinstance(value, dict):
        if isinstance(schema_add.get(key), dict):
            value.update(schema_add[key])
    elif isinstance(value, list):
        if isinstance(schema_add.get(key), list):
            value.extend(schema_add[key])
The result can be merged into a different dict as well.
Related
Suppose that I have a dict named data like below:
{
001: {
'data': {
'fruit': 'apple',
'vegetable': 'spinach'
},
'text': 'lorem ipsum',
'status': 10
},
002: {
.
.
.
}
}
I want to flatten(?) the data key and convert it to this:
{
001: {
'fruit': 'apple',
'vegetable': 'spinach',
'text': 'lorem ipsum',
'status': 10
},
002: {
.
.
.
}
}
I am trying to achieve this using dict comprehensions. Below implementation is with for loops:
mydict = {}
for id, values in data.items():
    mydict[id] = {}
    for label, value in values.items():
        if label == 'data':
            for x, y in value.items():
                mydict[id][x] = y
        else:
            mydict[id][label] = value
I tried below comprehension but it gives syntax error:
mydict = {
id: {x: y} for x, y in value.items() if label == 'data' else {label: value}
for id, values in data.items() for label, value in values.items()}
Is there a way to achieve this using comprehensions only?
With dict expansions:
mydict = {i:{**v['data'], **{k:u for k, u in v.items() if k != "data"}} for i, v in data.items()}
The if clause in a comprehension (dict, list, set, generator) applies to the iteration itself; it cannot be used for the production. For that you need a conditional expression in the production.
Generally speaking, comprehensions are really a reorganisation of a specific kind of (possibly nested) iterations:
a bunch of iterations and conditions, possibly nested
a single append/set
So
for a in b:
    if c:
        for d in e:
            for f in g:
                if h:
                    thing.append(i)
can be comprehension-ified, just move the production (i) to the head and put the other bits in a flat sequence:
thing = [
i
for a in b
if c
for d in e
for f in g
if h
]
Now, your comprehension makes no sense: it starts by iterating over value before value is defined, a comprehension filter has no else, and even with parentheses added, {x: y} for x, y in value.items() is not a value. Comprehensions also do not "merge" items, so with:
mydict = {
id: {label: value}
for id, values in data.items() for label, value in values.items()
}
Well you'll get only the last {label: value} for each id, because that's how dicts work.
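A quick way to see the overwriting behaviour is to run the comprehension on a one-entry sample. This is only an illustrative sketch: the sample data is a trimmed version of the question's, and `id_` and `last_only` are names chosen here.

```python
data = {
    "001": {
        "data": {"fruit": "apple", "vegetable": "spinach"},
        "text": "lorem ipsum",
        "status": 10,
    },
}

# Every (label, value) pair produces the same outer key, so each new
# one-item dict replaces the previous one instead of being merged into it.
last_only = {
    id_: {label: value}
    for id_, values in data.items()
    for label, value in values.items()
}

print(last_only)  # {'001': {'status': 10}}
```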
Here if you consider the production loop, it's this:
for id, values in data.items():
    mydict[id] = {}
This means that this is your dict comprehension:
mydict = {
id: {}
for id, values in data.items()
}
the rest of the iteration is filling the value, so it needs to be a separate iteration inside the production:
mydict = {
id: {
label: value ???
for label, value in values.items()
}
for id, values in data.items()
}
In which case you hit the issue that this doesn't quite work, because you can't "conditionally iterate" in comprehensions, it's all or nothing.
Except you can: the right side of in is a normal expression, so you can do whatever you want with it, meaning you can unfold-or-refold:
mydict = {
id: {
x: y
for label, value in values.items()
for x, y in (value.items() if label == 'data' else [(label, value)])
}
for id, values in data.items()
}
This is a touch more expensive in the non-data case as you need to re-wrap the key and value in a tuple and list, but that's unlikely to be a huge deal.
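The same trick can be wrapped in a small helper so the key to unfold is a parameter. This is a sketch under assumptions: the function name `lift_key` and its `key` parameter are made up for illustration, and the sample data is taken from the question with string ids.

```python
def lift_key(data, key="data"):
    """Return a new dict where each row's nested `key` dict is merged into the row."""
    return {
        id_: {
            x: y
            for label, value in values.items()
            # Unfold the nested dict, or re-wrap a plain entry as a one-pair list.
            for x, y in (value.items() if label == key else [(label, value)])
        }
        for id_, values in data.items()
    }


data = {
    "001": {
        "data": {"fruit": "apple", "vegetable": "spinach"},
        "text": "lorem ipsum",
        "status": 10,
    },
}

flat = lift_key(data)
print(flat)
```

Note that the helper assumes the value stored under `key` is always a dict; a row where it is something else would fail on `.items()`.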
Another alternative, instead of using a conditional comprehension, is to use splatting to merge the two dicts (one of which you create via a comprehension):
mydict = {
id: {
**values['data'],
**{label: value for label, value in values.items() if label != 'data'}
}
for id, values in data.items()
}
This can also be applied to the original to simplify it:
mydict = {}
for id, values in data.items():
    mydict[id] = {}
    for label, value in values.items():
        if label == 'data':
            mydict[id].update(value)
        else:
            mydict[id][label] = value
Let me simplify:
sample_data = {
"001": {
"data": {
"fruit": 'apple',
"vegetable": 'spinach'
},
"text": 'lorem ipsum',
"status": 10
},
"002": {
"data": {
"fruit": 'apple',
"vegetable": 'spinach'
},
"text": 'lorem ipsum',
"status": 10
}
}
for key, row in sample_data.items():
    if 'data' in row.keys():
        info = sample_data[key].pop('data')
        sample_data[key] = {**row, **info}
print(sample_data)
I have the following dict:
{
'foo': {
'name': 'bar',
'options': None,
'type': 'qux'
},
'baz': {
'name': 'grault',
'options': None,
'type': 'plugh'
},
}
The names of the top-level keys are unknown until runtime. I am unable to figure out how to get the name of the top-level key where the value of type is plugh. I have tried all kinds of iterators, loops, comprehensions, etc., but I'm not great with Python. Any pointers would be appreciated.
Try this:
for key, inner_dict in dict_.items():
    if inner_dict['type'] == 'plugh':
        print(key)
Or, if you want a one-liner to get the first key matching the condition:
key = next(key for key, inner_dict in dict_.items() if inner_dict['type'] == 'plugh')
print(key)
output:
baz
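One thing to watch with `next()` on a generator: if nothing matches, it raises `StopIteration`. A hedged sketch of a more forgiving variant passes a default as the second argument and uses `.get()` in case an inner dict has no `type` key. The names `dict_`, `found` and `missing` are only for illustration.

```python
dict_ = {
    'foo': {'name': 'bar', 'options': None, 'type': 'qux'},
    'baz': {'name': 'grault', 'options': None, 'type': 'plugh'},
}

# The second argument to next() is returned when the generator is exhausted.
found = next(
    (key for key, inner in dict_.items() if inner.get('type') == 'plugh'),
    None,
)
missing = next(
    (key for key, inner in dict_.items() if inner.get('type') == 'nope'),
    None,
)

print(found)    # baz
print(missing)  # None
```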
Try iterating over the dict keys and checking each element:
for key in d:
    if d[key]['type'] == 'plugh':
        print(key)
Output:
baz
You need to iterate over your data like this:
def top_level_key(search_key, data):
    for key, value in data.items():
        if value['type'] == search_key:
            return key
print(top_level_key('plugh', data_dict))
Besides running a loop to filter for the target, you have another option: JSONPath, which is quite similar to XPath.
# pip install jsonpath-ng==1.5.2
# python 3.6
from jsonpath_ng.ext import parse
dct = {
'foo': {
'name': 'bar',
'options': None,
'type': 'qux'
},
'baz': {
'name': 'grault',
'options': None,
'type': 'plugh'
},
}
parse_str = '$[?@.type="plugh"]'
jsonpath_expr = parse(parse_str)
jsonpath_results = jsonpath_expr.find(dct)
if len(jsonpath_results) > 0:
    result = jsonpath_results[0].value
    print(result)
    # {'name': 'grault', 'options': None, 'type': 'plugh'}
else:
    result = None
Ref: see https://pypi.org/project/jsonpath-ng/ to find out more about JSONPath syntax.
a = {
'user': {
'username': 'mic_jack',
'name': {
'first': 'Micheal',
'last': 'Jackson'
},
'email': 'micheal@domain.com',
#...
#... Infinite level of another nested dict
}
}
str_key_1 = 'user.username=john'
str_key_2 = 'user.name.last=henry'
#...
#str_key_n = 'user.level2.level3...leveln=XXX'
Consider that a 'str_key' string like these can contain any number of dots/levels.
Expected Output:
a = {
'user': {
'username': 'john', # username should be replaced
'name': {
'first': 'Micheal',
'last': 'henry' # last name should be replaced
},
'email': 'micheal@domain.com',
...
... # Infinite level of another nested dict
}
}
I'm looking for answers that handle a key string with 'n' levels of nesting, rather than statically assigning a['user']['username'] = 'John'. Answers must work for any number of dot-separated levels.
Thanks in advance!
There are three steps:
Separate the key-value pair string into a fully-qualified key and value.
Split the key into path components.
Traverse the dictionary to find the relevant value to update.
Here's an example of what the code might look like:
# Split by the delimiter, making sure to split once only
# to prevent splitting when the delimiter appears in the value
key, value = str_key_n.split("=", 1)
# Break the dot-joined key into parts that form a path
key_parts = key.split(".")
# The last part is required to update the dictionary
last_part = key_parts.pop()
# Traverse the dictionary using the parts
current = a
while key_parts:
    current = current[key_parts.pop(0)]
# Update the value
current[last_part] = value
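The three steps can also be wrapped in a reusable function. The following is a minimal sketch rather than a definitive implementation: the name `set_by_path` and the `sep` and `delim` parameters are assumptions introduced here, and intermediate keys are expected to exist already.

```python
def set_by_path(target, assignment, sep=".", delim="="):
    """Apply an assignment string such as 'user.name.last=henry' to a nested dict."""
    # Step 1: split once only, so the delimiter may appear inside the value.
    path, value = assignment.split(delim, 1)
    # Step 2: break the key into path components; the last one is the key to set.
    *parents, last = path.split(sep)
    # Step 3: walk down to the dict that holds the final key.
    current = target
    for part in parents:
        current = current[part]  # raises KeyError if an intermediate key is missing
    current[last] = value


a = {
    'user': {
        'username': 'mic_jack',
        'name': {'first': 'Micheal', 'last': 'Jackson'},
    }
}

set_by_path(a, 'user.username=john')
set_by_path(a, 'user.name.last=henry')
print(a)
```

The values stay strings, as in the snippet above; converting them to other types would need an extra parsing step that the question does not specify.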
I'd go with a recursive function to accomplish this, assuming your key value strings are all valid:
def assign_value(sample_dict, str_keys, value):
    access_key = str_keys[0]
    if len(str_keys) == 1:
        sample_dict[access_key] = value
    else:
        sample_dict[access_key] = assign_value(sample_dict[access_key], str_keys[1:], value)
    return sample_dict
The idea is to traverse your dict until you hit the lowest key, and then assign the new value to that last key:
if __name__ == "__main__":
    sample_dict = {
        'user': {
            'username': 'mic_jack',
            'name': {
                'first': 'Micheal',
                'last': 'Jackson'
            },
            'email': 'micheal@domain.com'
        }
    }
    str_key_1 = 'user.username=john'
    str_keys_1, value_1 = str_key_1.split('=')
    sample_dict = assign_value(sample_dict, str_keys_1.split('.'), value_1)
    print("result: {} ".format(sample_dict))
    str_key_2 = 'user.name.last=henry'
    str_keys_2, value_2 = str_key_2.split('=')
    sample_dict = assign_value(sample_dict, str_keys_2.split('.'), value_2)
    print("result: {}".format(sample_dict))
To use assign_value, you need to split your original string into the keys and the value, as seen above.
If you're okay with using exec() and modify your str_key(s), you could do something like:
def get_keys_value(string):
    keys, value = string.split("=")
    return keys, value
def get_exec_string(dict_name, keys):
    exec_string = dict_name
    for key in keys.split("."):
        exec_string = exec_string + "[" + key + "]"
    exec_string = exec_string + "=" + "value"
    return exec_string
str_key_1 = "'user'.'username'=john"
str_key_2 = "'user'.'name'.'last'=henry"
str_key_list = [str_key_1, str_key_2]
for str_key in str_key_list:
    keys, value = get_keys_value(str_key) # split into key-string and value
    exec_string = get_exec_string("a", keys) # extract keys from key-string
    exec(exec_string)
print(a)
# prints {'user': {'email': 'micheal@domain.com', 'name': {'last': 'henry', 'first': 'Micheal'}, 'username': 'john'}}
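If the key strings might come from an untrusted source, `exec()` will run whatever they contain, so an alternative that avoids it may be preferable. One possible sketch uses `functools.reduce` with `operator.getitem` to walk the path; the helper name `set_dotted` is made up here, and the original unquoted `user.name.last=henry` format is assumed.

```python
from functools import reduce
from operator import getitem


def set_dotted(dct, str_key):
    """Set a nested value from a string like 'user.name.last=henry' without exec()."""
    path, value = str_key.split("=", 1)
    *parents, last = path.split(".")
    # reduce(getitem, ['user', 'name'], dct) evaluates dct['user']['name'];
    # with an empty parents list it simply returns dct itself.
    reduce(getitem, parents, dct)[last] = value


a = {
    'user': {
        'username': 'mic_jack',
        'name': {'first': 'Micheal', 'last': 'Jackson'},
    }
}

for str_key in ['user.username=john', 'user.name.last=henry']:
    set_dotted(a, str_key)
print(a)
```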
str_key_1 = 'user.username=john'
str_key_2 = 'user.name.last=henry'
a = {
'user': {
'username': 'mic_jack',
'name': {
'first': 'Micheal',
'last': 'Jackson'
},
'email': 'micheal@domain.com',
#...
#... Infinite level of another nested dict
}
}
def MutateDict(key):
    strkey, strval = key.split('=')[0], key.split('=')[1]
    strkeys = strkey.split('.')
    print("strkeys = " ,strkeys)
    target = a
    k = ""
    for k in strkeys:
        print(target.keys())
        if k in target.keys():
            prevTarget = target
            target = target[k]
        else:
            print ("Invalid key specified")
            return
    prevTarget[k] = strval
MutateDict(str_key_1)
print(a)
MutateDict(str_key_2)
print(a)
I have a dictionary of dictionaries that looks like this:
data={'data': 'input',
'test':
{
'and':
{
'range': {'month': [{'start': 'Jan','end': 'July'}]},
'Student': {'Name': ['ABC'], 'Class': ['10']}
}
}
}
I need to flatten this dict into a dataframe. I tried to use json_normalize() to flatten the dictionary, and the output I got looked like this:
My desired output is something like the one given below.
This can be done in R by using as.data.frame(unlist(data)), but I want to do the same flattening in Python. I am a novice in Python, so I don't have much idea how to do this.
I have made an attempt to normalize your json object by writing a recursive function as follows:
data={'data': 'input',
'test':
{
'and':
{
'range': {'month': [{'start': 'Jan','end': 'July'}]},
'Student': {'Name': ['ABC'], 'Class': ['10']}
}
}
}
sequence = ""
subDicts = []
def findAllSubDicts(data):
    global subDicts
    global sequence
    for key, value in data.items():
        sequence += key
        #print(sequence)
        if isinstance(value, str):
            subDicts.append([sequence,value])
            sequence = sequence[:sequence.rfind(".")+1]
            #print(sequence)
        elif isinstance(value, dict):
            tempSequence = sequence[:sequence.rfind(".")+1]
            sequence += "."
            #print(sequence)
            findAllSubDicts(value)
            sequence = tempSequence
        elif isinstance(value, list) and isinstance(value[0], dict):
            sequence += "."
            tempSequence = sequence[:sequence.rfind(".")+1]
            #print(sequence)
            findAllSubDicts(value[0])
            sequence = tempSequence
        elif isinstance(value, list) and len(value)==1:
            tempSequence = sequence[:sequence.rfind(".")+1]
            subDicts.append([sequence,value[0]])
            sequence = tempSequence
    return subDicts
outDict = findAllSubDicts(data)
for i in outDict:
    print(i[0].ljust(40," "), end=" ")
    print(i[1])
Printing the results will give you:
data input
test.and.range.month.start Jan
test.and.range.month.end July
test.and.Student.Name ABC
test.and.Student.Class 10
Notify me if you need any clarification or any modification in my code.
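The same flattening can be done without global state by passing the path prefix down the recursion. This is a sketch under assumptions: the generator name `flatten` and its `prefix` parameter are invented here, and list positions are left out of the path to match the output shown above, which only stays unambiguous while each list holds a single element.

```python
def flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs from nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for item in obj:
            # The list index is not added to the path (see the assumption above).
            yield from flatten(item, prefix)
    else:
        yield prefix, obj


data = {
    'data': 'input',
    'test': {
        'and': {
            'range': {'month': [{'start': 'Jan', 'end': 'July'}]},
            'Student': {'Name': ['ABC'], 'Class': ['10']},
        }
    },
}

flat = dict(flatten(data))
for path, value in flat.items():
    print(path.ljust(40), value)
```

If pandas is available, passing `flat` to `pandas.Series` should give a single-column structure comparable to the R `unlist` result, though that step is not shown or tested here.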
I am consuming several endpoints of an API that is very verbose in the data it returns. I would like to provide a subset of this data to another piece of code elsewhere.
Suppose I am given several dictionaries like this (which I plan to loop through and filter):
asset = {
'id': 1,
'name': 'MY-PC',
'owner': 'me',
'location': 'New York City',
'model': {
'id': 1,
'name': 'Surface',
'manufacturer': {
'id': 1,
'name': 'Microsoft'
}
}
}
I want to create a function that will take that dictionary in, along with a "mask" which will be used to create a new dictionary of only the allowed items. This might be an example mask (though, I can work with whatever format makes the resulting code the most concise):
mask = {
'id': True,
'name': True,
'model': {
'id': True,
'name': True,
'manufacturer': {
'name': True
}
}
}
The function should then return this:
result = {
'id': 1,
'name': 'MY-PC',
'model': {
'id': 1,
'name': 'Surface',
'manufacturer': {
'name': 'Microsoft'
}
}
}
Is there something already built into Python 3 that would help aid in this? It looks like if I have to do this manually, it's going to get quite ugly quickly. I found itertools.compress, but that seems like it's for lists and won't handle the complexity of dictionaries.
You can recursively build a new dict from the mask by selecting only values corresponding in the main dict:
def prune_dict(dct, mask):
    result = {}
    for k, v in mask.items():
        if isinstance(v, dict):
            value = prune_dict(dct[k], v)
            if value: # check that dict is non-empty
                result[k] = value
        elif v:
            result[k] = dct[k]
    return result
print(prune_dict(asset, mask))
{'id': 1,
'model': {'id': 1, 'manufacturer': {'name': 'Microsoft'}, 'name': 'Surface'},
'name': 'MY-PC'}
This would be a good chance to use recursion. Here is some sample code that I haven't tested:
def copy(asset, result, mask):
    for key_name, value in mask.items():
        if value is True:
            result[key_name] = asset[key_name]
        else:
            result[key_name] = x = {}
            copy(asset[key_name], x, value)
y = {}
copy(asset, y, mask)
This would probably be a recursive function. Also, for the mask, I recommend this format: mask = ["id", "name", "model.id", "model.name", "model.manufacturer.name"]
Then, you'd first keep only entries that are named in the mask:
def filterstage1(dictionary, mask):
    result = {}
    for key in dictionary:
        if isinstance(dictionary[key], dict):
            # keep the mask entries under this key, with the "key." prefix stripped
            newmask = [maskname[maskname.find(".") + 1:] for maskname in mask if maskname.startswith(key + ".")]
            result[key] = filterstage1(dictionary[key], newmask)
        elif key in mask:
            result[key] = dictionary[key]
    return result
Then, depending on whether or not you want to remove sub-dictionaries that were not in the mask and had no subelements, you can include the second stage:
def filterstage2(dictionary, mask):
    result = {}
    for key in dictionary:
        if not (isinstance(dictionary[key], dict) and dictionary[key] == {} and key not in mask):
            result[key] = dictionary[key]
    return result
Final code: filterstage2(filterstage1(dictionary, mask), mask). You can combine the two stages together if you wish.
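As a sketch of what combining the two stages might look like, the function below filters by dotted paths and skips empty sub-dictionaries in one pass. The name `filter_by_paths` is an assumption introduced for illustration; the `asset` data and the mask format come from the question and the answer above.

```python
def filter_by_paths(dictionary, mask):
    """Keep only the entries named by dotted paths in mask (a list of strings)."""
    result = {}
    for key, value in dictionary.items():
        if key in mask:
            # The key itself is listed: keep its value unchanged.
            result[key] = value
        elif isinstance(value, dict):
            prefix = key + "."
            # Mask entries that apply below this key, with the prefix stripped.
            submask = [name[len(prefix):] for name in mask if name.startswith(prefix)]
            sub = filter_by_paths(value, submask) if submask else {}
            if sub:  # drop sub-dictionaries that end up empty
                result[key] = sub
    return result


asset = {
    'id': 1,
    'name': 'MY-PC',
    'owner': 'me',
    'location': 'New York City',
    'model': {
        'id': 1,
        'name': 'Surface',
        'manufacturer': {'id': 1, 'name': 'Microsoft'},
    },
}
mask = ["id", "name", "model.id", "model.name", "model.manufacturer.name"]

filtered = filter_by_paths(asset, mask)
print(filtered)
```

One design consequence of this format is that keys containing a literal dot cannot be addressed; the nested-dict mask from the question does not have that limitation.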