Bulk insert mapped array using pymongo fails due to BulkWriteError

I am trying to bulk insert documents into MongoDB using the Python library pymongo.
import pymongo
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([p for i in range(10)])
tryManyInsert()
But it keeps failing with a BulkWriteError.
Traceback (most recent call last):
  File "/prog_path/testMongoCon.py", line 9, in <module>
    tryManyInsert();
  File "/prog_path/testMongoCon.py", line 7, in tryManyInsert
    mongoColl.insert_many([p for i in range(10)])
  File "/myenv_path/lib/python3.6/site-packages/pymongo/collection.py", line 724, in insert_many
    blk.execute(self.write_concern.document)
  File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 493, in execute
    return self.execute_command(sock_info, generator, write_concern)
  File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 331, in execute_command
    raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred
I am inserting only 10 docs sequentially without an _id, so the conditions in this answer / discussion don't apply here. A similar question has no answer.
I have tried pymongo 3.4 and pymongo 3.5.1; both give the same error. I am on Python 3.6 and MongoDB 3.2.10.
What am I doing wrong here?

Python is still referring to p as the same object for each array member. You want a copy() of p for each array member:
import pymongo
from copy import copy
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([copy(p) for i in range(10)])
tryManyInsert()
Or even simply:
mongoColl.insert_many([{ 'x': 1, 'y': True, 'z': None } for i in range(10)])
Unless you do that, the _id only gets assigned once and you are simply repeating "the same document" with the same _id in the argument to insert_many(). Hence the duplicate key error.
As a quick demonstration:
from bson import ObjectId
p = { 'a': 1 }
def addId(obj):
    obj['_id'] = ObjectId()
    return obj
# list() forces evaluation; in Python 3, map() returns a lazy iterator
docs = list(map(addId, [p for i in range(2)]))
print(docs)
Gives you:
[
  {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')},
  {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')}
]
Or more succinctly:
p = { 'a': 1 }
def addKey(x):
    x[0]['b'] = x[1]
    return x[0]
docs = list(map(addKey, [[p,i] for i,p in enumerate([p for i in range(3)])]))
print(docs)
Gives:
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
Which clearly demonstrates that each index value overwrites the previous one, because every element is the same object that was passed in.
But using copy() to take a copy of the value:
from bson import ObjectId
from copy import copy
p = { 'a': 1 }
def addId(obj):
    obj['_id'] = ObjectId()
    return obj
docs = list(map(addId, [copy(p) for i in range(2)]))
print(docs)
Gives you:
[
  {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc00')},
  {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc01')}
]
Or our base demonstration:
p = { 'a': 1 }
def addKey(x):
    x[0]['b'] = x[1]
    return x[0]
docs = list(map(addKey, [[p,i] for i,p in enumerate([copy(p) for i in range(3)])]))
print(docs)
Returns:
[{'a': 1, 'b': 0}, {'a': 1, 'b': 1}, {'a': 1, 'b': 2}]
So this is basically how Python works. Unless you deliberately create a new value, every list element is a reference to the same object, and each pass through the loop updates that one object rather than producing a "new one".
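The aliasing described above can be checked without a database. The following is a minimal sketch; the `tags` key and the variable names are illustrative assumptions, and `deepcopy` is included only to show what a nested document would need, which the original answer does not discuss.

```python
from copy import copy, deepcopy

p = {'x': 1, 'tags': ['a']}

# Every element of this list is the very same dict object.
shared = [p for _ in range(3)]
assert all(d is p for d in shared)

# copy() creates a new top-level dict per element...
shallow = [copy(p) for _ in range(3)]
assert shallow[0] is not shallow[1]
# ...but nested values are still shared between the copies.
assert shallow[0]['tags'] is shallow[1]['tags']

# deepcopy() duplicates nested containers as well.
deep = [deepcopy(p) for _ in range(3)]
assert deep[0]['tags'] is not deep[1]['tags']

# Adding a key to one shallow copy does not touch the others.
shallow[0]['_id'] = 'first'
print('_id' in shallow[1], '_id' in p)
```

For flat documents like the one in the question, copy() is enough; deepcopy() only matters once the template contains lists or sub-documents that are modified per element.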

Related

Is there a method to updating value with unknown sub key in Python 3 Dictionary

I'm working with a dictionary in Python 3 and I want to change a value in it using a key path that is only known at runtime.
For instance, let's create a dictionary as follows:
myDict = {
    'foo': {
        'a':12,
        'b':14
    },
    'bar': {
        'c':12,
        'b':14
    },
    'moo': {
        'a':12,
        'd':14
    },
}
At this point, a key that is not known in advance arrives and is used to locate the desired data path.
So, if the received key is "myDict.foo.a", I must change the value of "a" in foo; if the key is "myDict.moo.a", I must change the value of "a" in moo. As in this example, the key to use is unknown beforehand, and I have a value to store at the identified key (data path).
Under these conditions, how can I change a dictionary value given an unknown key (data path)?
To explain better, here is a non-working code snippet that sketches what I am trying to do:
dictionary = init_dic() # initialization step for dictionary
desired_value = 1 # a variable to use for change operation in dictionary
received_key = get_key() # receive unknown key group (e.g. myDict.foo.a)
dictionary[received_key] = desired_value # The question of this topic
Thank you for reading, have a good day!

For the question I asked, I found a solution like the one below using "exec".
myDict = {}
myDict["myDict"] = { 'foo': { 'a':12, 'b':14 }, 'bar': { 'c':12, 'b':14 }, 'moo': { 'a':12, 'd':14 }, }
print(myDict)
{'myDict': {'foo': {'a': 12, 'b': 14}, 'bar': {'c': 12, 'b': 14},
 'moo': {'a': 12, 'd': 14}}}
data_path = "myDict.foo.a"
desired_value = 1
exec_string = "myDict"
for path in data_path.split("."):
    exec_string += "[\"{}\"]".format(path)
exec_string += " = {}".format(desired_value)
exec(exec_string)
print(myDict)
{'myDict': {'foo': {'a': 1, 'b': 14}, 'bar': {'c': 12, 'b': 14},
 'moo': {'a': 12, 'd': 14}}}
This approach needs some validation steps but, for me, it is enough for now. For instance, an incoming data path could contain an unwanted command such as open("file.txt","w"), so a control mechanism is required.
If you have a different solution to suggest, could you post it here?
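One way to avoid exec entirely is to walk the nested dictionaries key by key. The following is a minimal sketch, not the poster's method; the helper name `set_by_path` is hypothetical, and it assumes every intermediate key already exists and that keys never contain a dot.

```python
from functools import reduce

def set_by_path(root, dotted_path, value):
    """Set root[k1][k2]...[kn] = value for a path 'k1.k2...kn'."""
    *parents, last = dotted_path.split('.')
    # Walk down to the dict that holds the final key; a missing
    # intermediate key raises KeyError instead of running any code.
    target = reduce(lambda d, key: d[key], parents, root)
    target[last] = value

data = {'myDict': {'foo': {'a': 12, 'b': 14},
                   'moo': {'a': 12, 'd': 14}}}
set_by_path(data, 'myDict.foo.a', 1)
print(data['myDict']['foo'])
```

Because the path is only ever used as dictionary keys, a string such as open("file.txt","w") is treated as a (missing) key and never evaluated.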

Lift up all occurrences of a type in a nested dictionary to a top level key

I have a need in a project to find all of a given type in a nested dictionary and move them all to a top level key in the same dictionary.
So far I have the below code, which seems to work. In the example I'm looking for all the items that are integers and moving them to a 'numbers' key.
I'd prefer it if the lift_numbers_to_top function made and returned a copy of the dictionary rather than editing it in place, but I haven't been able to work out a nice way to pass the copy and the numbers back from the recursive function to itself, if that makes sense.
a_dictionary = {
    "one": 1,
    "two": 2,
    "text": "Hello",
    "more_text": "Hi",
    "internal_dictionary": {
        "three": 3,
        "two": 2,
        "even_more_text": "Hey",
        "another_internal_dictionary": {
            "four": 4,
            "five": 5,
            "last_text": "howdy"
        }
    }
}
def extract_integers(dictionary, level_key=None):
    numbers = {}
    for key in dictionary:
        if type(dictionary[key]) == int:
            numbers[level_key + "__" + key if level_key else key] = dictionary[key]
    return numbers
def lift_numbers_to_top(dictionary, level_key=None):
    numbers = {}
    if type(dictionary) == dict:
        numbers = extract_integers(dictionary, level_key)
        for key in numbers:
            keyNumber = key.split('__')[-1]
            del dictionary[keyNumber]
        for key in dictionary:
            numbers = {**numbers, **lift_numbers_to_top(dictionary[key], key)}
    return numbers
a_dictionary['numbers'] = lift_numbers_to_top(a_dictionary)
print(a_dictionary)
Result:
{
    'text': 'Hello',
    'more_text': 'Hi',
    'internal_dictionary': {
        'even_more_text': 'Hey',
        'another_internal_dictionary': {
            'last_text': 'howdy'
        },
    },
    'numbers': {
        'one': 1,
        'two': 2,
        'internal_dictionary__two': 2,
        'internal_dictionary__three': 3,
        'another_internal_dictionary__four': 4,
        'another_internal_dictionary__five': 5,
    }
}

Use a match function to determine what to lift, and pass the target object that receives the moved key-value pairs along to the recursive calls. If that target is missing, you know the current call is for the top level. The match function should return the new key for the new dictionary.
To produce a new dictionary, just create one and put the recursion results into it.
I prefer to use @singledispatch to handle different types when recursing:
from functools import singledispatch
@singledispatch
def lift_values(obj, match, targetname=None, **kwargs):
    """Lift key-value pairs from a nested structure to the top
    For key-value pairs anywhere in the nested structure, if
    match(path, value) returns a value other than `None`, the
    key-value pair is moved to the top-level dictionary when targetname
    is None, or to a new dictionary stored under targetname when it is
    not None, using the return value of the match function as the key.
    path is the tuple of all keys and indices leading to the value.
    For example, for an input
        {'foo': True, 'bar': [{'spam': False, 'ham': 42}]}
    and the match function lambda p, v: p if isinstance(v, bool) else None
    and targetname "flags", this function returns
        {'flags': {('foo',): True, ('bar', 0, 'spam'): False}, 'bar': [{'ham': 42}]}
    """
    # leaf nodes, no match testing needed, no moving of values
    return obj
@lift_values.register(list)
def _handle_list(obj, match, _path=(), **kwargs):
    # list values, no lifting, just passing on the recursive call
    return [lift_values(v, match, _path=_path + (i,), **kwargs)
            for i, v in enumerate(obj)]
@lift_values.register(dict)
def _handle_dict(obj, match, targetname=None, _path=(), _target=None):
    result = {}
    if _target is None:
        # this is the top-level object, key-value pairs are lifted to
        # a new dictionary stored at this level:
        if targetname is not None:
            _target = result[targetname] = {}
        else:
            # no target name? Lift key-value pairs into the top-level
            # object rather than a separate sub-object.
            _target = result
    for key, value in obj.items():
        new_path = _path + (key,)
        new_key = match(new_path, value)
        if new_key is not None:
            _target[new_key] = value
        else:
            result[key] = lift_values(
                value, match, _path=new_path, _target=_target)
    return result
I included a dispatch function for lists; your sample doesn't use lists, but they are common in JSON data structures, so I anticipate you will probably want it anyway.
The match function must accept two arguments: the path to the key-value pair and the value. It should return a new key to use, or None if the value should not be lifted.
For your case, the match function would be:
def lift_integers(path, value):
    if isinstance(value, int):
        return '__'.join(path[-2:])
result = lift_values(a_dictionary, lift_integers, 'numbers')
Demo on your sample input dictionary:
>>> from pprint import pprint
>>> def lift_integers(path, value):
...     if isinstance(value, int):
...         return '__'.join(path[-2:])
...
>>> lift_values(a_dictionary, lift_integers, 'numbers')
{'numbers': {'one': 1, 'two': 2, 'internal_dictionary__three': 3, 'internal_dictionary__two': 2, 'another_internal_dictionary__four': 4, 'another_internal_dictionary__five': 5}, 'text': 'Hello', 'more_text': 'Hi', 'internal_dictionary': {'even_more_text': 'Hey', 'another_internal_dictionary': {'last_text': 'howdy'}}}
>>> pprint(_)
{'internal_dictionary': {'another_internal_dictionary': {'last_text': 'howdy'},
                         'even_more_text': 'Hey'},
 'more_text': 'Hi',
 'numbers': {'another_internal_dictionary__five': 5,
             'another_internal_dictionary__four': 4,
             'internal_dictionary__three': 3,
             'internal_dictionary__two': 2,
             'one': 1,
             'two': 2},
 'text': 'Hello'}
Personally, I'd use the full path as the key in the lifted dictionary to avoid name clashes; either by joining the full path into a new string key with some unique delimiter, or just by making the path tuple itself the new key:
>>> lift_values(a_dictionary, lambda p, v: p if isinstance(v, int) else None, 'numbers')
{'numbers': {('one',): 1, ('two',): 2, ('internal_dictionary', 'three'): 3, ('internal_dictionary', 'two'): 2, ('internal_dictionary', 'another_internal_dictionary', 'four'): 4, ('internal_dictionary', 'another_internal_dictionary', 'five'): 5}, 'text': 'Hello', 'more_text': 'Hi', 'internal_dictionary': {'even_more_text': 'Hey', 'another_internal_dictionary': {'last_text': 'howdy'}}}
>>> pprint(_)
{'internal_dictionary': {'another_internal_dictionary': {'last_text': 'howdy'},
                         'even_more_text': 'Hey'},
 'more_text': 'Hi',
 'numbers': {('internal_dictionary', 'another_internal_dictionary', 'five'): 5,
             ('internal_dictionary', 'another_internal_dictionary', 'four'): 4,
             ('internal_dictionary', 'three'): 3,
             ('internal_dictionary', 'two'): 2,
             ('one',): 1,
             ('two',): 2},
 'text': 'Hello'}

You can walk through the dict recursively and pop all elements whose values are ints to create a new dict:
>>> def extract(d):
...     new_d = {}
...     for k,v in d.items():
...         if type(v) == int:
...             new_d[k] = d[k]
...         elif type(v) == dict:
...             for k2,v2 in extract(v).items():
...                 new_d[k2 if '__' in k2 else k+'__'+k2] = v2
...     return new_d
...
>>> a_dictionary['numbers'] = extract(a_dictionary)
>>> pprint(a_dictionary)
{'internal_dictionary': {'another_internal_dictionary': {'last_text': 'howdy'},
                         'even_more_text': 'Hey'},
 'more_text': 'Hi',
 'numbers': {'another_internal_dictionary__five': 5,
             'another_internal_dictionary__four': 4,
             'internal_dictionary__three': 3,
             'internal_dictionary__two': 2,
             'one': 1,
             'two': 2},
 'text': 'Hello'}
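The question also asked for a version that returns a copy instead of editing in place. One possible sketch is to have the recursive function return two things at once: the copy with the integers removed, and the integers it found. The name `split_ints` and the sample data are hypothetical; the key scheme (parent key, two underscores, key) follows the question, so the same name clashes remain possible.

```python
def split_ints(d, prefix=None):
    """Return (copy_without_ints, lifted_ints); d itself is not modified."""
    rest, numbers = {}, {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse; the child dict's key becomes the prefix for its ints.
            rest[key], found = split_ints(value, key)
            numbers.update(found)
        elif type(value) is int:  # like the question's check, this skips bools
            numbers[prefix + '__' + key if prefix else key] = value
        else:
            rest[key] = value
    return rest, numbers

sample = {'one': 1, 'text': 'Hello',
          'inner': {'two': 2, 'more': 'Hi', 'deep': {'three': 3}}}
result, numbers = split_ints(sample)
result['numbers'] = numbers
print(result)
```

Returning a pair from each call is what lets the copy and the collected numbers travel back up the recursion together.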

Python: Merging two arbitrary data structures

I am looking to efficiently merge two (fairly arbitrary) data structures: one representing a set of defaults values and one representing overrides. Example data below. (Naively iterating over the structures works, but is very slow.) Thoughts on the best approach for handling this case?
_DEFAULT = { 'A': 1122, 'B': 1133, 'C': [ 9988, { 'E': [ { 'F': 6666, }, ], }, ], }
_OVERRIDE1 = { 'B': 1234, 'C': [ 9876, { 'D': 2345, 'E': [ { 'F': 6789, 'G': 9876, }, 1357, ], }, ], }
_ANSWER1 = { 'A': 1122, 'B': 1234, 'C': [ 9876, { 'D': 2345, 'E': [ { 'F': 6789, 'G': 9876, }, 1357, ], }, ], }
_OVERRIDE2 = { 'C': [ 6543, { 'E': [ { 'G': 9876, }, ], }, ], }
_ANSWER2 = { 'A': 1122, 'B': 1133, 'C': [ 6543, { 'E': [ { 'F': 6666, 'G': 9876, }, ], }, ], }
_OVERRIDE3 = { 'B': 3456, 'C': [ 1357, { 'D': 4567, 'E': [ { 'F': 6677, 'G': 9876, }, 2468, ], }, ], }
_ANSWER3 = { 'A': 1122, 'B': 3456, 'C': [ 1357, { 'D': 4567, 'E': [ { 'F': 6677, 'G': 9876, }, 2468, ], }, ], }
This is an example of how to run the tests:
(The dictionary update doesn't work; it is just a stub function.)
import itertools
def mergeStuff( default, override ):
    # This doesn't work
    result = dict( default )
    result.update( override )
    return result
def main():
    for override, answer in itertools.izip( _OVERRIDES, _ANSWERS ):
        result = mergeStuff(_DEFAULT, override)
        print('ANSWER: %s' % (answer) )
        print('RESULT: %s\n' % (result) )

You cannot do that by "iterating", you'll need a recursive routine like this:
def merge(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        d = dict(a)
        d.update({k: merge(a.get(k, None), b[k]) for k in b})
        return d
    if isinstance(a, list) and isinstance(b, list):
        return [merge(x, y) for x, y in itertools.izip_longest(a, b)]
    return a if b is None else b
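For readers on Python 3, where `itertools.izip_longest` is named `itertools.zip_longest`, a recursive merge of this kind can be sketched and checked as follows. The `default` and `override` values are shortened from the question's `_DEFAULT` and `_OVERRIDE2`; the variable names are illustrative.

```python
from itertools import zip_longest

def merge(a, b):
    """Recursively overlay b on top of a without modifying either."""
    if isinstance(a, dict) and isinstance(b, dict):
        d = dict(a)
        d.update({k: merge(a.get(k), b[k]) for k in b})
        return d
    if isinstance(a, list) and isinstance(b, list):
        # zip_longest pads the shorter list with None, which the
        # final line treats as "no override at this position".
        return [merge(x, y) for x, y in zip_longest(a, b)]
    return a if b is None else b

default = {'A': 1122, 'B': 1133, 'C': [9988, {'E': [{'F': 6666}]}]}
override = {'C': [6543, {'E': [{'G': 9876}]}]}
merged = merge(default, override)
print(merged)
```

One consequence of using None as the "missing" marker is that an override cannot explicitly set a value to None; the default is kept in that case.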

If you want your code to be fast, don't copy like crazy.
You don't really need to merge two dicts. You can just chain them.
A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple update() calls.
import itertools
import UserDict  # Python 2 module
class ChainMap(UserDict.DictMixin):
    """Combine multiple mappings for sequential lookup"""
    def __init__(self, *maps):
        self._maps = maps
    def __getitem__(self, key):
        for mapping in self._maps:
            try:
                return mapping[key]
            except KeyError:
                pass
        raise KeyError(key)
def main():
    for override, answer in itertools.izip( _OVERRIDES, _ANSWERS ):
        result = ChainMap(override, _DEFAULT)
http://docs.python.org/dev/library/collections#chainmap-objects
http://code.activestate.com/recipes/305268/
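In Python 3 the same idea is available as `collections.ChainMap`. The sketch below uses made-up data to show both what chaining gives you and where it differs from a recursive merge: lookups stop at the first mapping that has the key, so nested values are not combined.

```python
from collections import ChainMap

default = {'A': 1122, 'B': 1133, 'C': {'E': 1, 'F': 2}}
override = {'B': 1234, 'C': {'E': 9}}

# The first mapping wins, so the override goes first.
view = ChainMap(override, default)

print(view['A'])  # only in default
print(view['B'])  # taken from override
# The whole nested value comes from override; 'F' is not merged in.
print(view['C'])
```

This makes chaining a good fit for flat settings and a poor fit for the nested lists and dictionaries in the question, unless nested levels are chained separately.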

If you know one structure is always a subset of the other, then just iterate over the superset; in O(n) time you can check element by element whether each one exists in the subset and, if it doesn't, put it there. As far as I know there's no magical way of doing this other than checking it manually element by element. Which, as I said, is not bad, as it can be done with O(n) complexity.

dict.update() is what you need. But it overrides the original dict, so make a copy of the original one if you want to keep it.

Python pprint issues

I'm using the User object from the Google App Engine environment, and just tried the following:
pprint(user)
print vars(user)
The results:
pprint(user)
users.User(email='test@example.com',_user_id='18580000000000')
print vars(user)
{'_User__federated_identity': None, '_User__auth_domain': 'gmail.com',
 '_User__email': 'test@example.com', '_User__user_id': '1858000000000',
 '_User__federated_provider': None}
Several issues here (sorry for the multipart):
1. Why am I not seeing all the variables in my object? It's not showing auth_domain, which has a value.
2. Is there a way to have it list properties that are None? None is a legitimate value, so why does it treat those properties as if they don't exist?
3. Is there a way to get pprint to line-break between properties?

pprint is printing the repr of the instance, while vars simply returns the instance's __dict__, whose repr is then printed. Here's an example:
>>> import pprint
>>> class Foo(object):
...     def __init__(self, a, b):
...         self.a = a
...         self.b = b
...     def __repr__(self):
...         return 'Foo(a=%s)' % self.a
...
>>> f = Foo(a=1, b=2)
>>> vars(f)
{'a': 1, 'b': 2}
>>> pprint.pprint(f)
Foo(a=1)
>>> vars(f) is f.__dict__
True
You see that the special method __repr__ here (called by pprint(), the print statement, repr(), and others) explicitly only includes the a member, while the instance's __dict__ contains both a and b, and is reflected by the dictionary returned by vars().

There are a couple ways to get different line breaks in an object print-dump of this kind.
Sample data:
d = dict(a=1, b=2, c=dict(d=3, e=[4, 5, 6], f=dict(g=7)), h=[8,9,10])
Standard print with no friendly spacing:
>>> print d
{'a': 1, 'h': [8, 9, 10], 'c': {'e': [4, 5, 6], 'd': 3, 'f': {'g': 7}}, 'b': 2}
Two possible solutions:
(1) Using pprint with width=1 gives you one leaf node per line, but possibly >1 keys per line:
>>> import pprint
>>> pprint.pprint(d, width=1)
{'a': 1,
 'b': 2,
 'c': {'d': 3,
       'e': [4,
             5,
             6],
       'f': {'g': 7}},
 'h': [8,
       9,
       10]}
(2) Using json.dumps gives you max one key per line, but some lines with just a closing bracket:
>>> import json
>>> print json.dumps(d, indent=4)
{
    "a": 1,
    "h": [
        8,
        9,
        10
    ],
    "c": {
        "e": [
            4,
            5,
            6
        ],
        "d": 3,
        "f": {
            "g": 7
        }
    },
    "b": 2
}

In reference to question 3, "Is there a way to get pprint to line-break between properties?":
The Python docs describe the behaviour as follows:
The formatted representation keeps objects on a single line if it can, and breaks them onto multiple lines if they don’t fit within the allowed width.
The "width" parameter (which can be passed to the PrettyPrinter constructor) is where you specify what is allowable. I set mine to width=1, and that seems to do the trick.
As an example:
pretty = pprint.PrettyPrinter(indent=2)
results in...
{ 'acbdf': { 'abdf': { 'c': { }}, 'cbdf': { 'bdf': { 'c': { }}, 'cbd': { }}},
'cef': { 'abd': { }}}
whereas
pretty = pprint.PrettyPrinter(indent=2,width=1)
results in...
{ 'acbdf': { 'abdf': { 'c': { }},
'cbdf': { 'bdf': { 'c': { }},
'cbd': { }}},
'cef': { 'abd': { }}}
Hope that helps.
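Putting the two points of this thread together: pprint only sees an object's repr, but its attribute dictionary can be pretty-printed with a narrow width to get one attribute per line. This is a small Python 3 sketch with a made-up `Foo` class, not the App Engine `User` object.

```python
import pprint

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        # Deliberately incomplete, like the repr in the question.
        return 'Foo(a=%s)' % self.a

f = Foo(a=1, b=None)

# The instance itself is formatted via its __repr__.
as_object = pprint.pformat(f)

# vars(f) is a plain dict, so None values are shown and
# width=1 forces a line break after every item.
as_dict = pprint.pformat(vars(f), width=1)

print(as_object)
print(as_dict)
```

pformat() returns the string that pprint() would print, which is convenient for logging or for comparing output in tests.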

Python - Unflatten dict

I have this multi-dimensional dict:
a = {'a' : 'b', 'c' : {'d' : 'e'}}
And I have written a simple function to flatten that dict:
def __flatten(self, dictionary, level = []):
    tmp_dict = {}
    for key, val in dictionary.items():
        if type(val) == dict:
            tmp_dict.update(self.__flatten(val, level + [key]))
        else:
            tmp_dict['.'.join(level + [key])] = val
    return tmp_dict
After calling this function with dict a, I get this result:
{'a' : 'b', 'c.d' : 'e'}
Now, after performing a few operations on this flattened dict, I need to build a new multi-dimensional dict from it. Example:
>> unflatten({'a' : 0, 'c.d' : 1})
{'a' : 0, 'c' : {'d' : 1}}
The only problem I have is that I do not have a function unflatten :)
Can anyone help with this? I have no idea how to do it.
EDIT:
Another example:
{'a' : 'b', 'c.d.e.f.g.h.i.j.k.l.m.n.o.p.r.s.t.u.w' : 'z'}
Should be after unflatten:
{'a': 'b', 'c': {'d': {'e': {'f': {'g': {'h': {'i': {'j': {'k': {'l': {'m': {'n': {'o': {'p': {'r': {'s': {'t': {'u': {'w': 'z'}}}}}}}}}}}}}}}}}}}
And another:
{'a' : 'b', 'c.d' : 'z', 'c.e' : 1}
To:
{'a' : 'b', 'c' : {'d' : 'z', 'e' : 1}}
This greatly increases the difficulty of the task, I know. That's why I had a problem with this and found no solution in hours.

def unflatten(dictionary):
    resultDict = dict()
    for key, value in dictionary.items():
        parts = key.split(".")
        d = resultDict
        for part in parts[:-1]:
            if part not in d:
                d[part] = dict()
            d = d[part]
        d[parts[-1]] = value
    return resultDict
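A quick way to gain confidence in an unflatten function of this shape is a round trip through a matching flatten. The sketch below pairs a standalone `flatten` (a hypothetical rewrite of the question's method as a plain function) with the iterative approach above, and assumes keys are strings that never contain a dot.

```python
def flatten(d, prefix=''):
    """Turn nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in d.items():
        path = prefix + '.' + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def unflatten(flat):
    """Rebuild nested dicts from dotted keys."""
    result = {}
    for key, value in flat.items():
        *parents, last = key.split('.')
        node = result
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[last] = value
    return result

nested = {'a': 'b', 'c': {'d': 'z', 'e': 1}}
flat = flatten(nested)
print(flat)
print(unflatten(flat) == nested)
```

The round trip is not lossless for every input: an empty nested dict produces no flattened keys at all, so it disappears on the way back.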

from collections import defaultdict
def unflatten(d):
    ret = defaultdict(dict)
    for k,v in d.items():
        k1,delim,k2 = k.partition('.')
        if delim:
            ret[k1].update({k2:v})
        else:
            ret[k1] = v
    return ret
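Note that `partition('.')` splits only at the first dot, so with this approach a key such as 'c.d.e' ends up as {'c': {'d.e': ...}}. One way to extend the same idea to arbitrary depth, as a sketch rather than the answerer's method, is to group the remainders by their first component and recurse on each group. The name `unflatten_deep` is hypothetical.

```python
def unflatten_deep(d):
    """Unflatten dotted keys of any depth by recursing on the remainders."""
    grouped = {}   # first component -> {rest of key: value}
    result = {}
    for key, value in d.items():
        head, delim, rest = key.partition('.')
        if delim:
            grouped.setdefault(head, {})[rest] = value
        else:
            result[head] = value
    for head, sub in grouped.items():
        # Each group still has dotted keys one level shorter.
        result[head] = unflatten_deep(sub)
    return result

print(unflatten_deep({'a': 'b', 'c.d.e': 'z', 'c.d.f': 1}))
```

If the input contains both a plain key 'c' and dotted keys starting with 'c.', this sketch lets the nested form win; whether that is the desired behaviour depends on the data.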

Here's one utilizing Python 3.5+ features, like typing and destructuring assignments. Try the tests out on repl.it.
from typing import Any, Dict
def unflatten(
    d: Dict[str, Any],
    base: Dict[str, Any] = None,
) -> Dict[str, Any]:
    """Convert any keys containing dotted paths to nested dicts
    >>> unflatten({'a': 12, 'b': 13, 'c': 14})  # no expansion
    {'a': 12, 'b': 13, 'c': 14}
    >>> unflatten({'a.b.c': 12})  # dotted path expansion
    {'a': {'b': {'c': 12}}}
    >>> unflatten({'a.b.c': 12, 'a': {'b.d': 13}})  # merging
    {'a': {'b': {'c': 12, 'd': 13}}}
    >>> unflatten({'a.b': 12, 'a': {'b': 13}})  # insertion-order overwrites
    {'a': {'b': 13}}
    >>> unflatten({'a': {}})  # insertion-order overwrites
    {'a': {}}
    """
    if base is None:
        base = {}
    for key, value in d.items():
        root = base
        ###
        # If a dotted path is encountered, create nested dicts for all but
        # the last level, then change root to that last level, and key to
        # the final key in the path.
        #
        # This allows one final setitem at the bottom of the loop.
        #
        if '.' in key:
            *parts, key = key.split('.')
            for part in parts:
                root.setdefault(part, {})
                root = root[part]
        if isinstance(value, dict):
            value = unflatten(value, root.get(key, {}))
        root[key] = value
    return base

I wrote one years ago in Python 2 and 3 which I've adapted below. It was for making it easier to check if a given dictionary is a subset of a larger dictionary irrespective of whether provided in flattened or scaffolded form.
A bonus feature: Should there be consecutive integer indexes (as in 0, 1, 2, 3, 4 etc.), this will also convert them back into lists as well.
def unflatten_dictionary(field_dict):
    field_dict = dict(field_dict)
    new_field_dict = dict()
    field_keys = list(field_dict)
    field_keys.sort()
    for each_key in field_keys:
        field_value = field_dict[each_key]
        processed_key = str(each_key)
        current_key = None
        current_subkey = None
        for i in range(len(processed_key)):
            if processed_key[i] == "[":
                current_key = processed_key[:i]
                start_subscript_index = i + 1
                end_subscript_index = processed_key.index("]")
                current_subkey = int(processed_key[start_subscript_index : end_subscript_index])
                # reserve the remainder descendant keys to be processed later in a recursive call
                if len(processed_key[end_subscript_index:]) > 1:
                    current_subkey = "{}.{}".format(current_subkey, processed_key[end_subscript_index + 2:])
                break
            # next child key is a dictionary
            elif processed_key[i] == ".":
                split_work = processed_key.split(".", 1)
                if len(split_work) > 1:
                    current_key, current_subkey = split_work
                else:
                    current_key = split_work[0]
                break
        if current_subkey is not None:
            if current_key.isdigit():
                current_key = int(current_key)
            if current_key not in new_field_dict:
                new_field_dict[current_key] = dict()
            new_field_dict[current_key][current_subkey] = field_value
        else:
            new_field_dict[each_key] = field_value
    # Recursively unflatten each dictionary on each depth before returning back to the caller.
    all_digits = True
    highest_digit = -1
    for each_key, each_item in new_field_dict.items():
        if isinstance(each_item, dict):
            new_field_dict[each_key] = unflatten_dictionary(each_item)
        # validate the keys can safely converted to a sequential list.
        all_digits &= str(each_key).isdigit()
        if all_digits:
            next_digit = int(each_key)
            if next_digit > highest_digit:
                highest_digit = next_digit
    # If all digits and can be sequential order, convert to list.
    if all_digits and highest_digit == (len(new_field_dict) - 1):
        digit_keys = list(new_field_dict)
        digit_keys.sort()
        new_list = []
        for k in digit_keys:
            i = int(k)
            if len(new_list) <= i:
                # Pre-populate missing list elements if the array index keys are out of order
                # and the current element is ahead of the current length boundary.
                while len(new_list) <= i:
                    new_list.append(None)
            new_list[i] = new_field_dict[k]
        new_field_dict = new_list
    return new_field_dict
# Test
if __name__ == '__main__':
    input_dict = {'a[0]': 1,
                  'a[1]': 10,
                  'a[2]': 5,
                  'b': 10,
                  'c.test.0': "hi",
                  'c.test.1': "bye",
                  "c.head.shoulders": "richard",
                  "c.head.knees": 'toes',
                  "z.trick.or[0]": "treat",
                  "z.trick.or[1]": "halloween",
                  "z.trick.and.then[0]": "he",
                  "z.trick.and.then[1]": "it",
                  "some[0].nested.field[0]": 42,
                  "some[0].nested.field[1]": 43,
                  "some[2].nested.field[0]": 44,
                  "mixed": {
                      "statement": "test",
                      "break[0]": True,
                      "break[1]": False,
                  }}
    expected_dict = {'a': [1, 10, 5],
                     'b': 10,
                     'c': {
                         'test': ['hi', 'bye'],
                         'head': {
                             'shoulders': 'richard',
                             'knees' : 'toes'
                         }
                     },
                     'z': {
                         'trick': {
                             'or': ["treat", "halloween"],
                             'and': {
                                 'then': ["he", "it"]
                             }
                         }
                     },
                     'some': {
                         0: {
                             'nested': {
                                 'field': [42, 43]
                             }
                         },
                         2: {
                             'nested': {
                                 'field': [44]
                             }
                         }
                     },
                     "mixed": {
                         "statement": "test",
                         "break": [True, False]
                     }}
    # test
    print("Input:")
    print(input_dict)
    print("====================================")
    print("Output:")
    actual_dict = unflatten_dictionary(input_dict)
    print(actual_dict)
    print("====================================")
    print(f"Test passed? {expected_dict==actual_dict}")

As a rough draft (it could use a little improvement in variable name choice, and perhaps robustness, but it works for the example given):
def unflatten(d):
    result = {}
    for k,v in d.items():
        if '.' in k:
            k1, k2 = k.split('.', 1)
            v = {k2: v}
            k = k1
        result[k] = v
    return result
