Is there a library that would help me rearrange the levels of a nested dictionary?
E.g., from this:
{1:{"A":"i","B":"ii","C":"i"},2:{"B":"i","C":"ii"},3:{"A":"iii"}}
To this:
{"A":{1:"i",3:"iii"},"B":{1:"ii",2:"i"},"C":{1:"i",2:"ii"}}
I.e., the first two levels of a three-level dictionary are swapped. So instead of 1 mapping to A and 3 mapping to A, we have A mapping to 1 and 3.
The solution should be practical for an arbitrary depth and should be able to move a level from any position to any other.
>>> d = {1:{"A":"i","B":"ii","C":"i"},2:{"B":"i","C":"ii"},3:{"A":"iii"}}
>>> keys = ['A','B','C']
>>> e = {key:{k:d[k][key] for k in d if key in d[k]} for key in keys}
>>> e
{'C': {1: 'i', 2: 'ii'}, 'B': {1: 'ii', 2: 'i'}, 'A': {1: 'i', 3: 'iii'}}
Thank god for dict comprehensions.
One way to think about this would be to consider your data as a (named) array and to take the transpose. An easy way to achieve this would be to use the data analysis package Pandas:
import pandas as pd
df = pd.DataFrame({1: {"A":"i","B":"ii","C":"i"},
                   2: {"B":"i","C":"ii"},
                   3: {"A":"iii"}})
df.transpose().to_dict()
{'A': {1: 'i', 2: nan, 3: 'iii'},
'B': {1: 'ii', 2: 'i', 3: nan},
'C': {1: 'i', 2: 'ii', 3: nan}}
I don't really care about performance for my application of this, so I haven't checked how efficient it is. It's based on bubble sort, so my guess is ~O(N^2).
Maybe this is convoluted, but essentially the code below works as follows:
- dict_swap_index takes a nested dictionary and a list. The list should have the format [i,j,k], and its length should equal the depth of the dictionary. Each element says which position the corresponding level should move to, e.g. [2,0,1] means move level 0 to position 2, level 1 to position 0 and level 2 to position 1.
- This function performs a bubble sort on the order list and dict_, calling deep_swap to swap the dictionary levels that correspond to the elements being swapped in the order list.
- deep_swap recursively calls itself to find the given level and returns a dictionary which has been re-ordered.
- swap_two_level_dict is called to swap the top two levels of a dictionary.
Essentially the idea is to perform a bubble sort on the dictionary, but instead of swapping elements in a list swap levels in a dictionary.
from collections import defaultdict
from copy import deepcopy  # used by deep_swap

def dict_swap_index(dict_, order):
    # Bubble sort `order`; each swap of neighbours i and i+1
    # also swaps dictionary levels i and i+1.
    for pas_no in range(len(order)-1,0,-1):
        for i in range(pas_no):
            if order[i] > order[i+1]:
                temp = order[i]
                order[i] = order[i+1]
                order[i+1] = temp
                dict_ = deep_swap(dict_, i)
    return dict_, order

def deep_swap(dict_, level):
    # Swap levels `level` and `level+1`, recursing down to the right depth.
    dict_ = deepcopy(dict_)
    if level==0:
        dict_ = swap_two_level_dict(dict_)
    else:
        for key in dict_:
            dict_[key] = deep_swap(dict_[key], level-1)
    return dict_

def swap_two_level_dict(a):
    # Swap the top two levels: a[key1][key2] becomes b[key2][key1].
    b = defaultdict(dict)
    for key1, value1 in a.items():
        for key2, value2 in value1.items():
            b[key2].update({key1: value2})
    return b
e.g.
test_dict = {'a': {'c': {'e':0, 'f':1}, 'd': {'e':2,'f':3}}, 'b': {'c': {'g':4,'h':5}, 'd': {'j':6,'k':7}}}
result = dict_swap_index(test_dict, [2,0,1])
result
(defaultdict(dict,
{'c': defaultdict(dict,
{'e': {'a': 0},
'f': {'a': 1},
'g': {'b': 4},
'h': {'b': 5}}),
'd': defaultdict(dict,
{'e': {'a': 2},
'f': {'a': 3},
'j': {'b': 6},
'k': {'b': 7}})}),
[0, 1, 2])
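An alternative to swapping adjacent levels one pair at a time is to flatten the nested dictionary into (key path, value) pairs, permute each path, and rebuild. The following is only a sketch under assumptions: the names flatten and reorder_levels are made up for illustration, every branch is assumed to be exactly len(order) levels deep, and order uses a different convention from the answer above (order[i] is the old level that ends up at new level i).

```python
def flatten(d, depth, prefix=()):
    """Yield (key_path, value) pairs for a dict nested exactly `depth` levels."""
    if depth == 0:
        yield prefix, d
        return
    for key, sub in d.items():
        yield from flatten(sub, depth - 1, prefix + (key,))

def reorder_levels(d, order):
    """Rebuild `d` so that new level i holds the keys of old level order[i]."""
    result = {}
    for path, value in flatten(d, len(order)):
        new_path = [path[i] for i in order]
        node = result
        for key in new_path[:-1]:
            node = node.setdefault(key, {})  # create intermediate dicts on demand
        node[new_path[-1]] = value
    return result

d = {1: {"A": "i", "B": "ii", "C": "i"}, 2: {"B": "i", "C": "ii"}, 3: {"A": "iii"}}
swapped = reorder_levels(d, [1, 0])

test_dict = {'a': {'c': {'e': 0, 'f': 1}, 'd': {'e': 2, 'f': 3}},
             'b': {'c': {'g': 4, 'h': 5}, 'd': {'j': 6, 'k': 7}}}
rotated = reorder_levels(test_dict, [1, 2, 0])
```

Each leaf is visited once, so the cost is proportional to the number of leaves times the depth, and the result is built from plain dicts rather than defaultdicts.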
Is it possible to make a function that will return a nested dict depending on the arguments?
def foo(key):
    d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
    return d[key]
foo(['c']['d'])
I'm expecting:
3
I'm getting:
TypeError: list indices must be integers or slices, not str
I understand that it's possible to return the whole dict, or to hard-code the function to return a particular part of the dict, like
if 'c' and 'd' in kwargs:
    return d['c']['d']
elif 'c' and 'e' in kwargs:
    return d['c']['e']
but that would be very inflexible.
When you write ['c']['d'], you index the list ['c'] with the string 'd', which isn't possible. So what you can do is apply the second index to the value the function returns:
foo('c')['d']
Or you could alter your function so that it does the indexing itself:
def foo(*args):
    d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
    d_old = dict(d) # in case you have to keep the dict for other operations in the function
    for i in args:
        d = d[i]
    return d
>>> foo('c','d')
3
d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
def funt(keys):
    val = d
    for key in keys:
        if val:
            val = val.get(key)
    return val
funt(['c', 'd'])
This additionally handles the case where a key is not present: the function returns None instead of raising.
One possible solution would be to iterate over multiple keys:
def foo(keys, d=None):
    if d is None:
        d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, }
    if len(keys) == 1:
        return d[keys[0]]
    return foo(keys[1:], d[keys[0]])
foo(['c', 'd'])
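The loop and the recursion in the answers above both fold a sequence of keys into successive lookups, which can also be written with functools.reduce and operator.getitem from the standard library. This is a minimal sketch; get_nested and its default parameter are hypothetical names chosen for illustration.

```python
from functools import reduce
from operator import getitem

def get_nested(d, keys, default=None):
    """Follow `keys` into the nested dict `d`; return `default` if the path is missing."""
    try:
        return reduce(getitem, keys, d)
    except (KeyError, TypeError, IndexError):
        # KeyError: missing key; TypeError: tried to index a non-container such as an int
        return default

d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}
print(get_nested(d, ['c', 'd']))
print(get_nested(d, ['c', 'x']))
print(get_nested(d, ['a', 'd']))
```

The first call prints 3; the other two print None because the path does not exist. Unlike a truthiness check on the intermediate value, this does not stop early on falsy values such as 0 or an empty string.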
I have a dictionary composed of {key: value}.
I select a set of keys from this dictionary.
I'd like to build a new dictionary with {keyA: set of all keys which have the same value as keyA}.
I already have a solution, shown below. Is there a faster way to do it?
It seems very slow to me, and I imagine I'm not the only one in this situation!
for key1 in selectedkeys:
    if key1 not in seen:
        seen.add(key1)
        equal[key1] = set([key1])  # equal to itself
        for key2 in selectedkeys:
            if key2 not in seen and dico[key1] == dico[key2]:
                equal[key1].add(key2)
        seen.update(equal[key1])
Try this:
>>> a = {1:1, 2:1, 3:2, 4:2}
>>> ret_val = {}
>>> for k, v in a.items():  # a.iteritems() on Python 2
...     ret_val.setdefault(v, []).append(k)
...
>>> ret_val
{1: [1, 2], 2: [3, 4]}
def convert(d):
    result = {}
    for k, v in d.items(): # or d.iteritems() if using python 2
        if v not in result:
            result[v] = set()
        result[v].add(k)
    return result
Or just use collections.defaultdict(set), as long as you are careful not to look up a missing key later (a lookup on a defaultdict silently creates the entry) :-)
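A minimal sketch of the defaultdict(set) variant mentioned above. Converting the result back to a plain dict before returning it is my own addition (not part of the answer) and sidesteps the caveat about accidentally creating entries on lookup.

```python
from collections import defaultdict

def convert(d):
    """Map each value of `d` to the set of keys that carry it."""
    result = defaultdict(set)
    for k, v in d.items():
        result[v].add(k)
    # Freeze into a plain dict so that a later lookup of a missing
    # value raises KeyError instead of inserting an empty set.
    return dict(result)

groups = convert({1: 1, 2: 1, 3: 2, 4: 2})
print(groups)
```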
So you want to create a dictionary that maps key to "the set of all keys which have the same value as key" for each selected key in a given source dictionary.
Thus, if the source dictionary is:
{'a': 1, 'b': 2, 'c': 1, 'd': 2, 'e': 3, 'f': 1, 'g': 3}
and the selected keys are a, b, and e, the result should be:
{'a': {'a', 'c', 'f'}, 'e': {'g', 'e'}, 'b': {'b', 'd'}}
One way to achieve this would be to use a defaultdict to build a value to key table, and then use that to build the required result from the specified keys:
from collections import defaultdict
def value_map(source, keys):
    # value -> set of all keys that map to it
    table = defaultdict(set)
    for key, value in source.items():
        table[value].add(key)
    return {key: table[source[key]] for key in keys}
source = {'a': 1, 'b': 2, 'c': 1, 'd': 2, 'e': 3, 'f': 1, 'g': 3}
print(value_map(source, ['a', 'b', 'e']))
Output:
{'a': {'a', 'c', 'f'}, 'e': {'g', 'e'}, 'b': {'b', 'd'}}
Since you select a set of keys from the original dictionary, we can modify @Nilesh's solution for your purpose.
a = {1:1, 2:1, 3:2, 4:2}
keys = [1, 3] # lets say this is the list of keys
ret_val = {}
for i in keys:
    for k,v in a.items():
        if a[i]==v:
            ret_val.setdefault(i, []).append(k)
print (ret_val)
{1: [1, 2], 3: [3, 4]}
This was more or less stated in the comments by @Patrick Haugh:
d = {1: 1, 2: 1, 3: 2, 4: 2}  # your dictionary (sample data)
s = set(d.values())
d2 = {i: [] for i in s}
for k in d:
    d2[d[k]].append(k)
I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by leading whitespace. This desire is very much like that documented here. I've solved that but now have the problem of handling the case where repeating keys are overwritten instead of being cast into a list.
Essentially:
a:
    b: c
    d: e
a:
    b: c2
    d: e2
    d: wrench
is cast into {"a":{"b":"c2","d":"wrench"}} when it should be cast into
{"a":[{"b":"c","d":"e"},{"b":"c2","d":["e2","wrench"]}]}
A self-contained example:
import json

def jsonify_indented_tree(tree):
    #convert indented text into json
    parsedJson= {}
    parentStack = [parsedJson]
    for i, line in enumerate(tree):
        data = get_key_value(line)
        if data['key'] in parsedJson.keys(): #if parent key is repeated, then cast value as list entry
            # stuff that doesn't work
            # if isinstance(parsedJson[data['key']],list):
            #     parsedJson[data['key']].append(parsedJson[data['key']])
            # else:
            #     parsedJson[data['key']]=[parsedJson[data['key']]]
            print('Hey - Make a list now!')
        if data['value']: #process child by adding it to its current parent
            currentParent = parentStack[-1] #.getLastElement()
            currentParent[data['key']] = data['value']
            if i != len(tree)-1:
                #determine when to switch to next branch
                level_dif = data['level']-get_key_value(tree[i+1])['level'] #peek next line level
                if (level_dif > 0):
                    del parentStack[-level_dif:] #reached leaf, process next branch
        else:
            #group node, push it as the new parent and keep on processing.
            currentParent = parentStack[-1] #.getLastElement()
            currentParent[data['key']] = {}
            newParent = currentParent[data['key']]
            parentStack.append(newParent)
    return parsedJson

def get_key_value(line):
    key = line.split(":")[0].strip()
    value = line.split(":")[1].strip()
    level = len(line) - len(line.lstrip())
    return {'key':key,'value':value,'level':level}

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

#nested_string=['a:', '\tb:\t\tc', '\td:\t\te', 'a:', '\tb:\t\tc2', '\td:\t\te2']
#nested_string=['w:','\tgeneral:\t\tcase','a:','\tb:\t\tc','\td:\t\te','a:','\tb:\t\tc2','\td:\t\te2']
nested_string=['a:',
               '\tb:\t\tc',
               '\td:\t\te',
               'a:',
               '\tb:\t\tc2',
               '\td:\t\te2',
               '\td:\t\twrench']
pp_json(jsonify_indented_tree(nested_string))
This approach is (logically) a lot more straightforward (though longer):
1. Track the level and key-value pair of each line in your multi-line string.
2. Store this data in a level-keyed dict of lists: {level1:[dict1,dict2]}
3. Append only a string representing the key in a key-only line: {level1:[dict1,dict2,"nestKeyA"]}
4. Since a key-only line means the next line is one level deeper, process that on the next level: {level1:[dict1,dict2,"nestKeyA"],level2:[...]}. The contents of some deeper level level2 may itself be just another key-only line (and the next loop will add a new level level3 such that it will become {level1:[dict1,dict2,"nestKeyA"],level2:["nestKeyB"],level3:[...]}) or a new dict dict3 such that {level1:[dict1,dict2,"nestKeyA"],level2:[dict3]}
5. Steps 1-4 continue until the current line is indented less than the previous one (signifying a return to some prior scope). This is what the data structure looks like on my example per line iteration:
   0, {0: []}
   1, {0: [{'k': 'sds'}]}
   2, {0: [{'k': 'sds'}, 'a']}
   3, {0: [{'k': 'sds'}, 'a'], 1: [{'b': 'c'}]}
   4, {0: [{'k': 'sds'}, 'a'], 1: [{'b': 'c'}, {'d': 'e'}]}
   5, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: []}
   6, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: [{'b': 'c2'}]}
   7, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: [{'b': 'c2'}, {'d': 'e2'}]}
   Then two things need to happen. First, the list of dicts needs to be inspected for duplicate keys, and the values of any dicts sharing a key combined in a list; this will be demonstrated in a moment. Second, as can be seen between iterations 4 and 5, the list of dicts from the deepest level (here 1) is combined into one dict. Finally, to demonstrate duplicate handling, observe:
   [7b, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, 'a'], 1: [{'b': 'c2'}, {'d': 'e2'}, {'d': 'wrench'}]}]
   [7c, {0: [{'k': 'sds'}, {'a': {'d': 'e', 'b': 'c'}}, {'a': {'d': ['wrench', 'e2'], 'b': 'c2'}}], 1: []}]
   where wrench and e2 are placed in a list that itself goes into a dict keyed by their original key.
6. Repeat Steps 1-5, hoisting deeper-scoped dicts up and onto their parent keys until the current line's scope (level) is reached.
7. Handle the termination condition to combine the list of dicts on the zeroth level into a dict.
Here's the code:
import json

def get_kvl(line):
    key = line.split(":")[0].strip()
    value = line.split(":")[1].strip()
    level = len(line) - len(line.lstrip())
    return {'key':key,'value':value,'level':level}

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

def jsonify_indented_tree(tree): #convert indented sgml header into json
    level_map= {0:[]}
    tree_length=len(tree)-1
    for i, line in enumerate(tree):
        data = get_kvl(line)
        if data['level'] not in level_map.keys():
            level_map[data['level']]=[] # initialize
        prior_level=get_kvl(tree[i-1])['level']
        level_dif = data['level']-prior_level # +: line is deeper, -: shallower, 0:same
        if data['value']:
            level_map[data['level']].append({data['key']:data['value']})
        if not data['value'] or i==tree_length:
            if i==tree_length: #end condition
                level_dif = -len(list(level_map.keys()))
            if level_dif < 0:
                for level in reversed(range(prior_level+level_dif+1,prior_level+1)): # (end, start)
                    #check for duplicate keys in current deepest (child) sibling group,
                    # merge them into a list, put that list in a dict
                    key_freq={} #track repeated keys
                    for n, dictionary in enumerate(level_map[level]):
                        current_key=list(dictionary.keys())[0]
                        if current_key in list(key_freq.keys()):
                            key_freq[current_key][0]+=1
                            key_freq[current_key][1].append(n)
                        else:
                            key_freq[current_key]=[1,[n]]
                    for k,v in key_freq.items():
                        if v[0]>1: #key is repeated
                            duplicates_list=[]
                            for index in reversed(v[1]): #merge value of key-repeated dicts into list
                                duplicates_list.append(list(level_map[level].pop(index).values())[0])
                            level_map[level].append({k:duplicates_list}) #push that list into a dict on the same stack it came from
                    if i==tree_length and level==0: #end condition
                        #convert list-of-dict into dict
                        parsed_nest={k:v for d in level_map[level] for k,v in d.items()}
                    else:
                        #push current deepest (child) sibling group onto parent key
                        key=level_map[level-1].pop() #string
                        #convert child list-of-dict into dict
                        level_map[level-1].append({key:{k:v for d in level_map[level] for k,v in d.items()}})
                        level_map[level]=[] #reset deeper level
            level_map[data['level']].append(data['key'])
    return parsed_nest

nested_string=['k:\t\tsds', #need a starter key,value pair otherwise this won't work... fortunately I always have one
               'a:',
               '\tb:\t\tc',
               '\td:\t\te',
               'a:',
               '\tb:\t\tc2',
               '\td:\t\te2',
               '\td:\t\twrench']
pp_json(jsonify_indented_tree(nested_string))
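The duplicate-handling step described above (combining a sibling group of single-key dicts, with repeated keys collected into a list) can be looked at in isolation. The helper below is a hypothetical sketch, not the code from the answer: merge_siblings is a made-up name, and unlike the answer it keeps duplicates in their original order.

```python
def merge_siblings(dicts):
    """Combine a list of dicts into one; values of repeated keys are collected in a list."""
    merged = {}
    repeated = set()  # keys whose value has already been turned into a list
    for d in dicts:
        for key, value in d.items():
            if key not in merged:
                merged[key] = value
            elif key in repeated:
                merged[key].append(value)
            else:
                merged[key] = [merged[key], value]
                repeated.add(key)
    return merged

siblings = [{'b': 'c2'}, {'d': 'e2'}, {'d': 'wrench'}]
print(merge_siblings(siblings))
```

For the sibling group from iteration 7b this gives {'b': 'c2', 'd': ['e2', 'wrench']}. Tracking repeated keys in a separate set avoids mistaking a value that was already a list for an accumulated list of duplicates.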
I want to write code that takes the following inputs:
list (list of maps)
request_keys (list of strings)
operation (add, subtract, multiply, concat)
The code would look through the list for maps that have the same value for all keys except the keys given in request_keys. Upon finding two maps whose values for those remaining keys match, the code would apply the operation (add, multiply, subtract, concat) to the request_keys values of the two maps and combine them into one map. This combined map would replace the other two maps.
I have written the following piece of code to do this. The code only does the add operation, but it can be extended to support the other operations.
In [83]: list
Out[83]:
[{'a': 2, 'b': 3, 'c': 10},
{'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 4, 'c': 4},
{'a': 2, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 3}]
In [84]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def func(list,request_keys):
:    new_list = []
:    found_indexes = []
:    for i in range(0,len(list)):
:        new_item = list[i]
:        if i in found_indexes:
:            continue
:        for j in range(0,len(list)):
:            if i != j and {k: v for k,v in list[i].iteritems() if k not in request_keys} == {k: v for k,v in list[j].iteritems() if k not in request_keys}:
:                found_indexes.append(j)
:                for request_key in request_keys:
:                    new_item[request_key] += list[j][request_key]
:        new_list.append(new_item)
:    return new_list
:--
In [85]: func(list,['c'])
Out[85]: [{'a': 2, 'b': 3, 'c': 18}, {'a': 2, 'b': 4, 'c': 4}]
What I want to know is: is there a faster, more memory-efficient, cleaner and more Pythonic way of doing the same?
Thank you
You manually generate all the combinations and then compare each of those combinations. This is pretty wasteful. Instead, I suggest grouping the dictionaries in another dictionary by their matching keys, then adding the "same" dictionaries. Also, you forgot the operator parameter.
import collections, operator, functools
def func(lst, request_keys, op=operator.add):
    matching_dicts = collections.defaultdict(list)
    for d in lst:
        key = tuple(sorted(((k, d[k]) for k in d if k not in request_keys)))
        matching_dicts[key].append(d)
    for group in matching_dicts.values():
        merged = dict(group[0])
        merged.update({key: functools.reduce(op, (g[key] for g in group))
                       for key in request_keys})
        yield merged
What this does: First, it builds a dictionary whose keys are the key-value pairs that must be equal for two dictionaries to match, and whose values are lists of all the dictionaries that have those key-value pairs. Then it iterates over those groups, using the first dict of each group as a prototype and updating it with the sum (or product, or whatever, depending on the operator) over all the dicts in that group for each of the request_keys.
Note that this returns a generator. If you want a list, just call it like list(func(...)), or accumulate the merged dicts in a list and return that list.
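As a usage sketch, the function from the answer is repeated below so the snippet runs on its own, followed by calls with different operators. The sample data is the list from the question; the string example at the end is made up to illustrate that operator.add also covers the "concat" case.

```python
import collections
import functools
import operator

def func(lst, request_keys, op=operator.add):
    # Group dicts by the (key, value) pairs that must match.
    matching_dicts = collections.defaultdict(list)
    for d in lst:
        key = tuple(sorted((k, d[k]) for k in d if k not in request_keys))
        matching_dicts[key].append(d)
    # Fold each group's request_keys values together with `op`.
    for group in matching_dicts.values():
        merged = dict(group[0])
        merged.update({key: functools.reduce(op, (g[key] for g in group))
                       for key in request_keys})
        yield merged

data = [{'a': 2, 'b': 3, 'c': 10},
        {'a': 2, 'b': 3, 'c': 3},
        {'a': 2, 'b': 4, 'c': 4},
        {'a': 2, 'b': 3, 'c': 2},
        {'a': 2, 'b': 3, 'c': 3}]

summed = list(func(data, ['c']))                     # default: operator.add
multiplied = list(func(data, ['c'], operator.mul))
concatenated = list(func([{'a': 1, 'c': 'x'}, {'a': 1, 'c': 'y'}], ['c']))
print(summed, multiplied, concatenated)
```

Because the prototype is copied with dict(group[0]), the input dicts are left unmodified, unlike the loop in the question, which updates list[i] in place.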
from itertools import groupby
from operator import itemgetter
def mergeDic(inputData, request_keys):
    keys = inputData[0].keys()
    comparedKeys = [item for item in keys if item not in request_keys]
    grouper = itemgetter(*comparedKeys)
    result = []
    # groupby only groups adjacent items, so sort by the same key first
    for key, grp in groupby(sorted(inputData, key = grouper), grouper):
        grp = list(grp)  # a groupby group can be iterated only once; keep it for every request_key
        temp_dict = dict(zip(comparedKeys, key))
        for request_key in request_keys:
            temp_dict[request_key] = sum(item[request_key] for item in grp)
        result.append(temp_dict)
    return result
inputData = [{'a': 2, 'b': 3, 'c': 10},
             {'a': 2, 'b': 3, 'c': 3},
             {'a': 2, 'b': 4, 'c': 4},
             {'a': 2, 'b': 3, 'c': 2},
             {'a': 2, 'b': 3, 'c': 3}]
from pprint import pprint
pprint(mergeDic(inputData,['c']))
I'm trying to associate keys with a unique identifier. That is, transform dict1 to dict2:
dict1={'A': {'A': 1},
'B': {'B': .5, 'C': .36, 'E': .14},
'C': {'A': .5, 'C': .5},
'D': {'G': 1},
'E': {'F': 1},
'F': {}
}
dict2={1: {1: 1},
2: {2: .5, 3: .36, 5: .14},
3: {1: .5, 3: .5},
4: {7: 1},
5: {6: 1},
6: {}
}
I came up with something recursively but my code isn't working too well for nested keys. Any suggestions on how to fix the code or approach this problem?
def transform(d, count = 1):
    output={}
    for k,v in d.iteritems():
        k=count
        count = count + 1
        if isinstance(v,dict):
            v=transform(v, count)
        output[k]=v
    return output
You are missing a few parts. First of all, you need to pass any conversion that you have already determined (e.g. A = 1) to your function when you call it recursively - otherwise you won't use the same replacement for the same key in the nested dictionaries. Also, you need some way to ensure that when you generate a new key, that key is used up and won't be used again. When you increment count in your function, this will only affect it within the current call to the function - any calls higher up the chain will keep using a lower count, and so keys will be used multiple times.
My attempt:
import itertools
def transform(d, key_generator=None, conversion=None):
    if key_generator is None:
        key_generator = itertools.count(start=1)
    if conversion is None:
        conversion = {}
    output = {}
    for k, v in d.iteritems():
        if k in conversion:
            k = conversion[k]
        else:
            next_key = next(key_generator)
            conversion[k] = next_key
            k = next_key
        if isinstance(v, dict):
            v = transform(v, key_generator, conversion)
        output[k] = v
    return output
Testing:
conversion = {}
transform(dict1, conversion=conversion)
print conversion
{1: {1: 1},
2: {1: 0.5, 2: 0.5},
3: {2: 0.36, 3: 0.5, 4: 0.14},
4: {5: 1},
5: {},
6: {7: 1}}
{'A': 1, 'C': 2, 'B': 3, 'E': 4, 'D': 6, 'G': 7, 'F': 5}
Because of the undetermined iteration order of the dictionaries (and because, even if you sort the initial dictionary, E will be handled before D), this conversion is not quite what you were looking for, but it's pretty close.
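One way to avoid depending on iteration order is to work in two passes: first collect every key that appears at any level, then assign identifiers in a fixed order and relabel. The sketch below assumes that identifiers should follow the sorted order of the keys, which happens to reproduce dict2 from the question; collect_keys and relabel are hypothetical names, and the code is written for Python 3.

```python
def collect_keys(d, found=None):
    """Return the set of keys used at any nesting level of `d`."""
    found = set() if found is None else found
    for k, v in d.items():
        found.add(k)
        if isinstance(v, dict):
            collect_keys(v, found)
    return found

def relabel(d, mapping):
    """Return a copy of `d` with every key replaced by mapping[key], recursively."""
    return {mapping[k]: (relabel(v, mapping) if isinstance(v, dict) else v)
            for k, v in d.items()}

dict1 = {'A': {'A': 1},
         'B': {'B': .5, 'C': .36, 'E': .14},
         'C': {'A': .5, 'C': .5},
         'D': {'G': 1},
         'E': {'F': 1},
         'F': {}}

# Assumption: ids follow sorted key order, i.e. 'A' -> 1, 'B' -> 2, ...
mapping = {k: i for i, k in enumerate(sorted(collect_keys(dict1)), start=1)}
dict2 = relabel(dict1, mapping)
print(mapping)
print(dict2)
```

Since the mapping is fixed before any relabeling happens, the same key gets the same identifier wherever it appears, regardless of the order in which the dictionaries are traversed.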