I am looking to efficiently merge two (fairly arbitrary) data structures: one representing a set of default values and one representing overrides. Example data below. (Naively iterating over the structures works, but is very slow.) Thoughts on the best approach for handling this case?
_DEFAULT = { 'A': 1122, 'B': 1133, 'C': [ 9988, { 'E': [ { 'F': 6666, }, ], }, ], }
_OVERRIDE1 = { 'B': 1234, 'C': [ 9876, { 'D': 2345, 'E': [ { 'F': 6789, 'G': 9876, }, 1357, ], }, ], }
_ANSWER1 = { 'A': 1122, 'B': 1234, 'C': [ 9876, { 'D': 2345, 'E': [ { 'F': 6789, 'G': 9876, }, 1357, ], }, ], }
_OVERRIDE2 = { 'C': [ 6543, { 'E': [ { 'G': 9876, }, ], }, ], }
_ANSWER2 = { 'A': 1122, 'B': 1133, 'C': [ 6543, { 'E': [ { 'F': 6666, 'G': 9876, }, ], }, ], }
_OVERRIDE3 = { 'B': 3456, 'C': [ 1357, { 'D': 4567, 'E': [ { 'F': 6677, 'G': 9876, }, 2468, ], }, ], }
_ANSWER3 = { 'A': 1122, 'B': 3456, 'C': [ 1357, { 'D': 4567, 'E': [ { 'F': 6677, 'G': 9876, }, 2468, ], }, ], }
This is an example of how to run the tests.
(The dictionary update doesn't work; it is just a stub function.)
import itertools

_OVERRIDES = (_OVERRIDE1, _OVERRIDE2, _OVERRIDE3)
_ANSWERS = (_ANSWER1, _ANSWER2, _ANSWER3)

def mergeStuff( default, override ):
    # This doesn't work: update() replaces nested values wholesale
    result = dict( default )
    result.update( override )
    return result

def main():
    # itertools.izip is Python 2; on Python 3 use the built-in zip()
    for override, answer in itertools.izip( _OVERRIDES, _ANSWERS ):
        result = mergeStuff(_DEFAULT, override)
        print('ANSWER: %s' % (answer) )
        print('RESULT: %s\n' % (result) )
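You cannot do that by simply "iterating"; you need a recursive routine like this:
def merge(a, b):
    # Two dicts: start from the defaults, then merge each overriding key recursively
    if isinstance(a, dict) and isinstance(b, dict):
        d = dict(a)
        d.update({k: merge(a.get(k, None), b[k]) for k in b})
        return d
    # Two lists: merge position by position (izip_longest is zip_longest on Python 3)
    if isinstance(a, list) and isinstance(b, list):
        return [merge(x, y) for x, y in itertools.izip_longest(a, b)]
    # Anything else: the override wins unless it is missing (None)
    return a if b is None else b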
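For readers on Python 3, here is a minimal sketch of the same recursive idea, run against the question's second override. The parameter names and the sample variable names (`defaults`, `override`, `result`) are illustrative choices, not part of the original answer. It assumes that `None` in an override means "no value here", so an override cannot set a value to `None`.

```python
from itertools import zip_longest


def merge(default, override):
    """Return a new structure with `override` layered over `default`."""
    if isinstance(default, dict) and isinstance(override, dict):
        merged = dict(default)  # shallow copy of the defaults
        for key, value in override.items():
            merged[key] = merge(default.get(key), value)
        return merged
    if isinstance(default, list) and isinstance(override, list):
        # Pair items by position; the shorter list is padded with None
        return [merge(d, o) for d, o in zip_longest(default, override)]
    # Scalars or mismatched types: the override wins unless it is None
    return default if override is None else override


defaults = {'A': 1122, 'B': 1133, 'C': [9988, {'E': [{'F': 6666}]}]}
override = {'C': [6543, {'E': [{'G': 9876}]}]}
result = merge(defaults, override)
print(result)
# {'A': 1122, 'B': 1133, 'C': [6543, {'E': [{'F': 6666, 'G': 9876}]}]}
```

Untouched branches of the defaults are shared with the result rather than copied, so treat the result as read-only or deep-copy it before mutating.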
If you want your code to be fast, don't copy like crazy.
You don't really need to merge two dicts. You can just chain them.
A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple update() calls.
import UserDict  # Python 2 module; Python 3.3+ ships collections.ChainMap

class ChainMap(UserDict.DictMixin):
    """Combine multiple mappings for sequential lookup"""

    def __init__(self, *maps):
        self._maps = maps

    def __getitem__(self, key):
        # Return the value from the first mapping that contains the key
        for mapping in self._maps:
            try:
                return mapping[key]
            except KeyError:
                pass
        raise KeyError(key)

def main():
    for override, answer in itertools.izip( _OVERRIDES, _ANSWERS ):
        result = ChainMap(override, _DEFAULT)
http://docs.python.org/dev/library/collections#chainmap-objects
http://code.activestate.com/recipes/305268/
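As a small illustration of the chaining approach, the sketch below uses the standard-library `collections.ChainMap` (Python 3.3+) on shortened sample data. The variable names are illustrative. Note that the lookup is per top-level key only: a nested list or dict in the override hides the corresponding default as a whole, so this does not by itself produce the nested merge asked for in the question.

```python
from collections import ChainMap

defaults = {'A': 1122, 'B': 1133, 'C': [9988, {'E': [{'F': 6666}]}]}
override = {'B': 1234, 'C': [9876]}

# Lookups search `override` first, then fall back to `defaults`
settings = ChainMap(override, defaults)

print(settings['A'])  # 1122, only present in the defaults
print(settings['B'])  # 1234, the override wins
print(settings['C'])  # [9876], the override's list replaces the default list entirely
```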
If you know one structure is always a subset of the other, just iterate over the superset: in O(n) time you can check, element by element, whether each item exists in the subset and add it if it doesn't. As far as I know there is no magical way of doing this other than checking element by element, which, as I said, is not bad since it runs in O(n).
dict.update() is what you need, although it only merges the top level of the dict. It modifies the original dict in place, so make a copy first if you want to keep the original.
Related
I have a dictionary like this
d = {
'Benefits': {
1: {
'BEN1': {
'D': [{'description': 'D1'}],
'C': [{'description': 'C1'}]
}
},
2: {
'BEN2': {
'D': [{'description': 'D2'}],
'C': [{'description': 'C2'}]
}
}
}
}
I am trying to sort the innermost dictionaries (the ones whose values are lists) by key.
For example, I want 'C' to come first and 'D' second.
Here is the code I tried:
d1 = collections.OrderedDict(sorted(d.items()))
Unfortunately, it does not give the correct result.
This is my expected output:
{'Benefits':
{1:
{'BEN1':
{'C':[{'description': 'C1'}], 'D': [{'description': 'D1'}]
}
},
2:
{'BEN2':
{'C': [{'description': 'C2'}], 'D': [{'description': 'D2'}]
}
}
}
}
I am using Python 3.5. I am trying to get an order like this:
{'C':[{'description': 'C1'}], 'D': [{'description': 'D1'}]}
The following code sorts any dictionary by key and recursively sorts every value that is itself a dictionary. It makes no assumptions about the content of the dictionary being sorted. It uses an OrderedDict, but if you can be sure it will always run on Python 3.7 or greater (where dict preserves insertion order as a language guarantee; CPython 3.6 does so as an implementation detail), a simple change lets it use a plain dict.
from collections import OrderedDict
d = {
'Benefits': {
1: {
'BEN1': {
'D': [{'description': 'D1'}],
'C': [{'description': 'C1'}]
}
},
2: {
'BEN2': {
'D': [{'description': 'D2'}],
'C': [{'description': 'C2'}]
}
}
}
}
def sort_dict(d):
    # Build [key, value] pairs in key order so nested values can be replaced in place
    items = [[k, v] for k, v in sorted(d.items(), key=lambda x: x[0])]
    for item in items:
        if isinstance(item[1], dict):
            item[1] = sort_dict(item[1])
    return OrderedDict(items)
    #return dict(items)

print(sort_dict(d))
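The function above descends into nested dictionaries but leaves dictionaries that sit inside lists untouched. A possible variant that also walks lists is sketched below. It assumes Python 3.7+ (plain dicts keep insertion order) and that the keys at each level can be compared with each other. The name `sort_nested` and the sample data are illustrative only.

```python
def sort_nested(obj):
    """Return a copy of obj with every dict, at any depth, ordered by key."""
    if isinstance(obj, dict):
        return {key: sort_nested(obj[key]) for key in sorted(obj)}
    if isinstance(obj, list):
        # Lists keep their order; only the dicts inside them are sorted
        return [sort_nested(value) for value in obj]
    return obj


data = {
    'Benefits': {
        2: {'BEN2': {'D': [{'z': 1, 'a': 2}], 'C': [{'description': 'C2'}]}},
        1: {'BEN1': {'D': [{'description': 'D1'}], 'C': [{'description': 'C1'}]}},
    }
}
sorted_data = sort_nested(data)
print(sorted_data)
```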
d1 = collections.OrderedDict(sorted(d.items()))
This does not work because it sorts only the top level, which holds the single 'Benefits' item. You want to sort the inner items, so you have to reach them and sort them there.
import collections

d1 = {'Benefits': {}}
for a_benefit in d['Benefits']:
    d1['Benefits'][a_benefit] = {}
    for a_ben in d['Benefits'][a_benefit]:
        # Keep the OrderedDict: a plain dict does not preserve order on Python 3.5
        d1['Benefits'][a_benefit][a_ben] = collections.OrderedDict(
            sorted(d['Benefits'][a_benefit][a_ben].items()))
I am starting with a dict received from an api
start_dict = {
"a": 795,
"b": 1337,
"c": [
{
"d1": 2,
"d2": [
{
"e1": 4
}
]
}
]
}
I need to create a separate dict from that dict, in which each key/value pair is split into its own dict with an "element_name" entry and a "value" entry, while keeping the nesting intact.
values =
{
"fields": [
{
"element_name": "a",
"value": 795
},
{
"element_name": "b",
"value": 1337
},
{
"element_name": "c",
"value": [
{
"element_name": "d1",
"value": 2
},
{
"element_name": "d2",
"value" : [
{
"element_name": "e1",
"value": 4
}
]
}
]
}
]
}
The actual dict is quite a bit larger, but there is no more than one dict nested two levels deep in the original, though there are many singly nested dicts. This is the only format in which the API will accept new data, so I am stuck until I figure it out. Any help is greatly appreciated; I am quite new to Python (3 weeks), so if this is something simple please don't be too harsh.
You can build the output with a recursive function:
def transform(ob):
    if isinstance(ob, list):
        return [transform(v) for v in ob]
    elif not isinstance(ob, dict):
        return ob
    return [{'element_name': k, 'value': transform(v)}
            for k, v in ob.items()]

values = {'fields': transform(start_dict)}
Each key-value pair is transformed into a {'element_name': key, 'value': value} dictionary in a list, and any value that is itself a list or dictionary is transformed by a recursive call.
Demo:
>>> from pprint import pprint
>>> def transform(ob):
...     if isinstance(ob, list):
...         return [transform(v) for v in ob]
...     elif not isinstance(ob, dict):
...         return ob
...     return [{'element_name': k, 'value': transform(v)}
...             for k, v in ob.items()]
...
>>> start_dict = {
... "a": 795,
... "b": 1337,
... "c": [
... {
... "d1": 2,
... "d2": [
... {
... "e1": 4
... }
... ]
... }
... ]
... }
>>> pprint({'fields': transform(start_dict)})
{'fields': [{'element_name': 'a', 'value': 795},
            {'element_name': 'c',
             'value': [[{'element_name': 'd1', 'value': 2},
                        {'element_name': 'd2',
                         'value': [[{'element_name': 'e1', 'value': 4}]]}]]},
            {'element_name': 'b', 'value': 1337}]}
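The demo output wraps each transformed dict in an extra list (`[[...]]`), whereas the expected output in the question shows a single flat list under "value". If the flat shape is what the API needs, one possible adjustment is to splice the entries of a dict found inside a list directly into the surrounding list. This is a sketch under that assumption; the name `transform_flat` is hypothetical, and the output order relies on Python 3.7+ dict ordering.

```python
def transform_flat(ob):
    """Like transform(), but dicts inside a list are merged into that list."""
    if isinstance(ob, dict):
        return [{'element_name': k, 'value': transform_flat(v)}
                for k, v in ob.items()]
    if isinstance(ob, list):
        out = []
        for item in ob:
            converted = transform_flat(item)
            if isinstance(item, dict):
                out.extend(converted)  # splice the entries in, no extra nesting
            else:
                out.append(converted)
        return out
    return ob


start_dict = {"a": 795, "b": 1337, "c": [{"d1": 2, "d2": [{"e1": 4}]}]}
values = {'fields': transform_flat(start_dict)}
print(values)
```

With several dicts in one list, their entries all end up in the same flat list, so this only fits data where that is acceptable.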
I know similar questions have been asked before, but I am really having problems applying them to my special case:
Let's say I have a dictionary with varying depths, for example:
dicti = {'files':
            {'a':{'offset':100, 'start': 0},
             'b':{
                 'c':{'offset':50, 'start':0},
                 'd':{'offset':70, 'start':0}
             },
             'e':{
                 'f':{'offset':80, 'start':0},
                 'g':{'offset':30, 'start':0},
                 'h':{'offset':20, 'start':0}
             }
            }
        }
and so on (with many more levels and entries).
Now I want a copy of that dictionary with the same structure and keys, except that wherever 'offset' (at any level) is greater than, say, 50, 'offset' should be changed to 0.
I guess some kind of iterative function would be best, but I cannot get my head around it.
You might use the standard machinery for the copy and then modify the copied dictionary (solution #1 in my example), or you might do copying and modification in the same function (solution #2).
In either case, you're looking for a recursive function.
import copy
from pprint import pprint
dicti = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0},
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0},
}
}
}
# Solution 1, two passes
def modify(d):
    if isinstance(d, dict):
        if d.get('offset', 0) > 50:
            d['offset'] = 0
        for k, v in d.items():
            modify(v)

dictj = copy.deepcopy(dicti)
modify(dictj)
pprint(dictj)

# Solution 2, copy and modify in one pass
def copy_and_modify(d):
    if isinstance(d, dict):
        d2 = {k: copy_and_modify(v) for k, v in d.items()}
        # Default to 0 so dicts without 'offset' never compare None > 50
        if d2.get('offset', 0) > 50:
            d2['offset'] = 0
        return d2
    return d

dictj = copy_and_modify(dicti)
pprint(dictj)
A recursive solution is going to be more intuitive. You want something like the following:
def copy_dict(d):
    new_dict = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Nested dictionary: copy it recursively
            new_dict[key] = copy_dict(value)
        elif key == 'offset' and value > 50:
            new_dict[key] = 0
        else:
            new_dict[key] = value
    return new_dict
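Since the question asks about an iterative function, here is a sketch of a non-recursive alternative that deep-copies the input and then walks the copy with an explicit stack. The function name `reset_offsets` and its `limit` parameter are illustrative, and the sketch assumes offsets are plain numbers.

```python
import copy


def reset_offsets(data, limit=50):
    """Return a deep copy of data with every 'offset' above limit set to 0."""
    result = copy.deepcopy(data)
    stack = [result]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            offset = node.get('offset')
            if isinstance(offset, (int, float)) and offset > limit:
                node['offset'] = 0
            stack.extend(node.values())  # visit nested values later
        elif isinstance(node, list):
            stack.extend(node)
    return result


dicti = {'files': {
    'a': {'offset': 100, 'start': 0},
    'b': {'c': {'offset': 50, 'start': 0}, 'd': {'offset': 70, 'start': 0}},
    'e': {'f': {'offset': 80, 'start': 0}, 'g': {'offset': 30, 'start': 0}},
}}
dictj = reset_offsets(dicti)
print(dictj)
```

An explicit stack avoids Python's recursion limit on very deep structures, at the cost of being slightly less direct than the recursive versions.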
d = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0}
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0}
}
}
}
def transform(item):
    new_item = item.copy()  # consider usage of deepcopy if needed
    if new_item['offset'] == 80:
        new_item['offset'] = 'CHANGED'
    return new_item

def visit(item):
    # A dict with an 'offset' key is a leaf record; anything else is a container.
    # Testing membership (not truthiness) keeps records with offset 0 working.
    if 'offset' in item:
        return transform(item)
    else:
        return {k: visit(v) for k, v in item.items()}

result = visit(d)
print(result)
Output:
{
'files': {
'b': {
'd': {
'offset': 70,
'start': 0
},
'c': {
'offset': 50,
'start': 0
}
},
'e': {
'g': {
'offset': 30,
'start': 0
},
'h': {
'offset': 20,
'start': 0
},
'f': {
'offset': 'CHANGED',
'start': 0
}
},
'a': {
'offset': 100,
'start': 0
}
}
}
Further reading on the techniques used in this answer:
Recursion
Visitor pattern
You could call a recursive function that changes the value once the condition is met:
dicti = {'files':
{'a':{'offset':100, 'start': 0},
'b':{
'c':{'offset':50, 'start':0},
'd':{'offset':70, 'start':0}
},
'e':{
'f':{'offset':80, 'start':0},
'g':{'offset':30, 'start':0},
'h':{'offset':20, 'start':0}
}
}
}
def dictLoop(dt):
    # Note: this modifies dt in place; deep-copy it first if the original must be kept
    for k, v in dt.items():
        if isinstance(v, int):
            if k == 'offset' and v > 50:
                dt[k] = 0
        else:
            dictLoop(v)
    return dt

print(dictLoop(dicti))
I have JSON data with a structure like this:
{ "a":"1",
  "b":[{ "a":"4",
         "b":[{}],
         "c":"6"}],
  "c":"3"
}
Here the value of the key a is always unique, even when nested.
I want to separate my JSON data so that it looks like this:
{"a":"1",
 "b":[],
 "c":"3"
},
{"a":"4",
 "b":[],
 "c":"6"
}
The JSON data can be nested to an arbitrary depth.
How can I do that?
I'd use an input and output stack:
x = {
"a":1,
"b":[
{
"a":2,
"b":[ { "a":3, }, { "a":4, } ]
}
]
}
input_stack = [x]
output_stack = []
while input_stack:
    # for the first element in the input stack
    front = input_stack.pop(0)
    b = front.get('b')
    # put all nested elements onto the input stack:
    if b:
        input_stack.extend(b)
    # then put the element onto the output stack:
    output_stack.append(front)
output_stack ==
[{'a': 1, 'b': [{'a': 2, 'b': [{'a': 3}, {'a': 4}]}]},
{'a': 2, 'b': [{'a': 3}, {'a': 4}]},
{'a': 3},
{'a': 4}]
output_stack can be a dict, of course. In that case, replace
output_stack.append(front)
with
output_dict[front['a']] = front
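The loop above keeps each element's nested "b" list intact, while the expected output in the question shows "b" emptied in every record. A possible variant that produces that shape is sketched below. It assumes the children always live under the key "b" and that empty placeholder dicts (`{}`) should be dropped; the function name `split_records` and its `child_key` parameter are hypothetical.

```python
from collections import deque


def split_records(root, child_key='b'):
    """Flatten a nested record into a list of records with emptied child lists."""
    queue = deque([root])
    records = []
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        # Queue the non-empty child dicts for later processing
        children = [c for c in node.get(child_key, []) if isinstance(c, dict) and c]
        queue.extend(children)
        flat = dict(node)  # shallow copy so the input is not modified
        if child_key in flat:
            flat[child_key] = []
        records.append(flat)
    return records


data = {"a": "1", "b": [{"a": "4", "b": [{}], "c": "6"}], "c": "3"}
records = split_records(data)
print(records)
# [{'a': '1', 'b': [], 'c': '3'}, {'a': '4', 'b': [], 'c': '6'}]
```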
Not sure about a Python implementation, but in JavaScript this could be done using recursion:
function flatten(objIn) {
    var out = [];
    function unwrap(obj) {
        var arrayItem = {};
        for(var idx in obj) {
            if(!obj.hasOwnProperty(idx)) {continue;}
            if(typeof obj[idx] === 'object') {
                if(isNaN(parseInt(idx)) === true) {
                    arrayItem[idx] = [];
                }
                unwrap(obj[idx]);
                continue;
            }
            arrayItem[idx] = obj[idx];
        }
        if(JSON.stringify(arrayItem) !== '{}') {
            out.unshift(arrayItem);
        }
    }
    unwrap(objIn);
    return out;
}
This will only work as expected if the object key names are not numbers.
I'm using the User object from the Google App Engine environment, and just tried the following:
pprint(user)
print vars(user)
The results:
pprint(user)
users.User(email='test#example.com',_user_id='18580000000000')
print vars(user)
{'_User__federated_identity': None, '_User__auth_domain': 'gmail.com',
'_User__email': 'test#example.com', '_User__user_id': '1858000000000',
'_User__federated_provider': None}
Several issues here (sorry for the multipart):
1. How come I'm not seeing all the variables in my object? It's not showing auth_domain, which has a value.
2. Is there a way to have it list properties that are = None? None is a legitimate value, so why does it treat those properties like they don't exist?
3. Is there a way to get pprint to line-break between properties?
pprint is printing the repr of the instance, while vars simply returns the instance's __dict__, whose repr is then printed. Here's an example:
>>> import pprint
>>> class Foo(object):
...     def __init__(self, a, b):
...         self.a = a
...         self.b = b
...     def __repr__(self):
...         return 'Foo(a=%s)' % self.a
...
>>> f = Foo(a=1, b=2)
>>> vars(f)
{'a': 1, 'b': 2}
>>> pprint.pprint(f)
Foo(a=1)
>>> vars(f) is f.__dict__
True
You see that the special method __repr__ here (called by pprint(), the print statement, repr(), and others) explicitly only includes the a member, while the instance's __dict__ contains both a and b, and is reflected by the dictionary returned by vars().
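For classes you control, one way to make the repr show every attribute, including those set to None, is to build it from vars(). The sketch below uses a hypothetical `Account` class as a stand-in; it is not the App Engine `User` class, whose `__repr__` you cannot change this way.

```python
class Account(object):  # hypothetical class, for illustration only
    def __init__(self, email, user_id, auth_domain=None):
        self.email = email
        self.user_id = user_id
        self.auth_domain = auth_domain

    def __repr__(self):
        # List every instance attribute, sorted by name, including None values
        fields = ', '.join('%s=%r' % (name, value)
                           for name, value in sorted(vars(self).items()))
        return '%s(%s)' % (type(self).__name__, fields)


acct = Account('test@example.com', '1858')
print(repr(acct))
# Account(auth_domain=None, email='test@example.com', user_id='1858')
```

For an object whose class you cannot edit, printing `vars(obj)` directly gives the same information.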
There are a couple of ways to get different line breaks in an object print-dump of this kind.
Sample data:
d = dict(a=1, b=2, c=dict(d=3, e=[4, 5, 6], f=dict(g=7)), h=[8,9,10])
Standard print with no friendly spacing:
>>> print d
{'a': 1, 'h': [8, 9, 10], 'c': {'e': [4, 5, 6], 'd': 3, 'f': {'g': 7}}, 'b': 2}
Two possible solutions:
(1) Using pprint with width=1 gives you one leaf node per line, but possibly >1 keys per line:
>>> import pprint
>>> pprint.pprint(d, width=1)
{'a': 1,
 'b': 2,
 'c': {'d': 3,
       'e': [4,
             5,
             6],
       'f': {'g': 7}},
 'h': [8,
       9,
       10]}
(2) Using json.dumps gives you max one key per line, but some lines with just a closing bracket:
>>> import json
>>> print json.dumps(d, indent=4)
{
    "a": 1,
    "h": [
        8,
        9,
        10
    ],
    "c": {
        "e": [
            4,
            5,
            6
        ],
        "d": 3,
        "f": {
            "g": 7
        }
    },
    "b": 2
}
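The two sessions above use the Python 2 print statement. A rough Python 3 equivalent of both approaches is sketched here, using `pprint.pformat` so the text can be stored or logged as well as printed. Passing `sort_keys=True` to `json.dumps` is an addition that makes the JSON key order predictable; the variable names are illustrative.

```python
import json
from pprint import pformat

d = dict(a=1, b=2, c=dict(d=3, e=[4, 5, 6], f=dict(g=7)), h=[8, 9, 10])

# (1) pprint-style: width=1 forces a line break wherever one is allowed
narrow = pformat(d, width=1)
print(narrow)

# (2) JSON-style: at most one key per line, closing brackets on their own lines
as_json = json.dumps(d, indent=4, sort_keys=True)
print(as_json)
```

The JSON route only works when every value is JSON-serializable, while pformat accepts any object that has a repr.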
In reference to question 3, "Is there a way to get pprint to line-break between properties?":
The Python docs describe it this way:
The formatted representation keeps objects on a single line if it can, and breaks them onto multiple lines if they don’t fit within the allowed width.
The width parameter (which can be passed to the PrettyPrinter constructor) is where you specify what is allowed. I set mine to width=1, and that seems to do the trick.
As an example:
pretty = pprint.PrettyPrinter(indent=2)
results in...
{ 'acbdf': { 'abdf': { 'c': { }}, 'cbdf': { 'bdf': { 'c': { }}, 'cbd': { }}},
'cef': { 'abd': { }}}
whereas
pretty = pprint.PrettyPrinter(indent=2,width=1)
results in...
{ 'acbdf': { 'abdf': { 'c': { }},
'cbdf': { 'bdf': { 'c': { }},
'cbd': { }}},
'cef': { 'abd': { }}}
Hope that helps.