Comparing nested dictionaries

I would like to compare nested dictionaries as follows:
d = {'siteA': {'00000000': 3, '11111111': 4, '22222222': 5},
'siteB': {'00000000': 1, '11111111': 2, '22222222': 5}}
e = {'siteA': {'00000000': 5}}
f = {'siteB': {'33333333': 10}}
g = {'siteC': {'00000000': 8}}
d is the full dictionary that e, f and g will be compared against.
If the entry in e is found in d (here siteA-00000000), I would like both values (in this case 3 and 5) added up to give 8.
If the entry in f is not found in d['siteB'] (which is the case here), I would like to add it to d['siteB'].
If the site in g is not found, I would like to add it to d.
Thanks!

collections.Counter is useful for summing values in dictionaries and adding keys where they do not exist. Since Counter is a subclass of dict, this should not break other operations. Apart from a one-off conversion cost, it is efficient and designed specifically for such tasks.
from collections import Counter
# convert d to dictionary of Counter objects
d = {k: Counter(v) for k, v in d.items()}
# add items from e
for k, v in e.items():
    if k in d:
        d[k] += Counter(v)
# add items from f; inner keys missing from d[k] are added
for k, v in f.items():
    if k in d:
        d[k] += Counter(v)
# add item from g if the site is not found
for k, v in g.items():
    if k not in d:
        d[k] = Counter(v)
Result:
print(d)
{'siteA': Counter({'00000000': 8, '22222222': 5, '11111111': 4}),
 'siteB': Counter({'33333333': 10, '22222222': 5, '11111111': 2, '00000000': 1}),
 'siteC': Counter({'00000000': 8})}

You can use Counter from collections in combination with defaultdict.
As the name suggests, the counter counts the same elements, and a defaultdict lets you access non-existing keys by providing a default value (an empty Counter in this case). Your code then becomes
from collections import Counter, defaultdict
d = defaultdict(Counter)
d['siteA'] = Counter({'00000000': 3, '11111111': 4, '22222222': 5})
d['siteB'] = Counter({'00000000': 1, '11111111': 2, '22222222': 5})
print(d.items())
> dict_items([('siteA', Counter({'22222222': 5, '11111111': 4, '00000000': 3})),
> ('siteB', Counter({'22222222': 5, '11111111': 2, '00000000': 1}))])
# d + e:
d['siteA'].update({'00000000': 5})
print(d.items())
> dict_items([('siteA', Counter({'00000000': 8, '22222222': 5, '11111111': 4})),
> ('siteB', Counter({'22222222': 5, '11111111': 2, '00000000': 1}))])
# d + f
d['siteB'].update({'33333333': 10})
print(d.items())
> dict_items([('siteA', Counter({'00000000': 8, '22222222': 5, '11111111': 4})),
> ('siteB', Counter({'33333333': 10, '22222222': 5, '11111111': 2, '00000000': 1}))])
# d + g
d['siteC'].update({'00000000': 8})
print(d.items())
> dict_items([('siteA', Counter({'00000000': 8, '22222222': 5, '11111111': 4})),
> ('siteB', Counter({'33333333': 10, '22222222': 5, '11111111': 2, '00000000': 1})),
> ('siteC', Counter({'00000000': 8}))])
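The three manual update calls can also be folded into one loop. This is a minimal sketch, assuming the dictionaries d, e, f and g from the question; the helper name merge_into is made up for illustration and is not part of collections.

```python
from collections import Counter, defaultdict

def merge_into(target, source):
    # Counter.update adds counts instead of replacing them, and the
    # defaultdict creates an empty Counter for sites not seen before.
    for site, counts in source.items():
        target[site].update(counts)

d = defaultdict(Counter)
d['siteA'] = Counter({'00000000': 3, '11111111': 4, '22222222': 5})
d['siteB'] = Counter({'00000000': 1, '11111111': 2, '22222222': 5})
e = {'siteA': {'00000000': 5}}
f = {'siteB': {'33333333': 10}}
g = {'siteC': {'00000000': 8}}

for other in (e, f, g):
    merge_into(d, other)

print(d['siteA']['00000000'])  # 8
print(d['siteB']['33333333'])  # 10
print(d['siteC'])              # Counter({'00000000': 8})
```

Because all three cases (sum, add inner key, add site) go through the same call, no membership checks are needed.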

Given the format of your dictionaries dict[site][address], let's say, this merge function will take the values from dictFrom and insert them into dictTo according to your rules.
def merge(dictTo, dictFrom):
    for site in dictFrom:
        if site not in dictTo:
            dictTo[site] = {}
        for address in dictFrom[site]:
            dictTo[site][address] = dictTo[site].get(address, 0) + dictFrom[site][address]
merge(d, e)
merge(d, f)
merge(d, g)
This may be preferable to jpp's answer because the objects at dict[site] are all still basic dicts.
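If mutating d in place is not wanted, the same rules can be applied to a copy. The following is only a sketch under that assumption; the function name merged is hypothetical, and copy.deepcopy is used so the inner dicts of the original are not shared with the result.

```python
import copy

def merged(dict_to, dict_from):
    # Hypothetical non-mutating variant: work on a deep copy of dict_to.
    result = copy.deepcopy(dict_to)
    for site, addresses in dict_from.items():
        inner = result.setdefault(site, {})
        for address, value in addresses.items():
            inner[address] = inner.get(address, 0) + value
    return result

d = {'siteA': {'00000000': 3, '11111111': 4, '22222222': 5},
     'siteB': {'00000000': 1, '11111111': 2, '22222222': 5}}
e = {'siteA': {'00000000': 5}}
f = {'siteB': {'33333333': 10}}
g = {'siteC': {'00000000': 8}}

total = merged(merged(merged(d, e), f), g)
print(total['siteA'])          # {'00000000': 8, '11111111': 4, '22222222': 5}
print(total['siteB'])          # {'00000000': 1, '11111111': 2, '22222222': 5, '33333333': 10}
print(total['siteC'])          # {'00000000': 8}
print(d['siteA']['00000000'])  # 3, the original is untouched
```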

Related

Two-keys dictionary into one key dictionary of lists

I am trying to implement a simple task. I have a dictionary with keys (ti, wi)
y={('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
I want to create a new dictionary where keys will be wi, and value is a list of all ti. So I want to have an output dictionary like:
{'w1': [1, 2, 3], 'w2': [4, 5, 6]}
I wrote the following code:
y={('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
y_w={}
y_t=[]
for w in range(1,3):
    y_t.clear()
    for t in range(1,4):
        print('t= ', t, 'w= ', w, 'y=' , y['t{0}'.format(t), 'w{0}'.format(w)])
        y_t.append(y['t{0}'.format(t), 'w{0}'.format(w)])
    print(y_t)
    y_w['w{0}'.format(w)]=y_t
print(y_w)
But the result I am getting is
{'w1': [4, 5, 6], 'w2': [4, 5, 6]}
I cannot understand where the first list went. Can someone explain where I went wrong? Is there a nicer way to do it, maybe without for loops?

Your problem lies in the assumption that setting the value in the dictionary somehow freezes the list.
It's no accident the lists have the same values: They are identical, two pointers to the same list. Observe:
>>> a_dict = {}
>>> a_list = []
>>> a_list.append(23)
>>> a_dict["a"] = a_list
>>> a_list.clear()
>>> a_list.append(42)
>>> a_dict["b"] = a_list
>>> a_dict
{'a': [42], 'b': [42]}
You could fix your solution by replacing y_t.clear() with y_t = [], which does create a new list:
y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
y_w = {}
for w in range(1,3):
    y_t = []
    for t in range(1,4):
        print('t= ', t, 'w= ', w, 'y=' , y['t{0}'.format(t), 'w{0}'.format(w)])
        y_t.append(y['t{0}'.format(t), 'w{0}'.format(w)])
    print(y_t)
    y_w['w{0}'.format(w)]=y_t
print(y_w)
But there are, as you suspect, easier ways of doing this, for example the defaultdict solution shown by Riccardo Bucco.
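To make the aliasing visible, here is a small sketch (the variable names are only illustrative) that checks object identity with is and stores an independent copy with list(...):

```python
a_list = [23]
a_dict = {"a": a_list}          # stores a reference, not a snapshot
print(a_dict["a"] is a_list)    # True

a_dict["b"] = list(a_list)      # list(...) makes an independent shallow copy
a_list.clear()
a_list.append(42)

print(a_dict)                   # {'a': [42], 'b': [23]}
```

The entry under "a" follows every later change to a_list, while the copy under "b" keeps the contents it had when it was stored.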

Try this:
from collections import defaultdict
d = defaultdict(list)
for k, v in y.items():
    d[k[1]].append(v)
d = dict(d)
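For reference, a self-contained run of this approach on the y from the question might look like the sketch below. The function name group_by_second is made up for illustration, and the order of each list relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

def group_by_second(pairs):
    # defaultdict(list) creates an empty list the first time a key is seen,
    # so every value can simply be appended.
    grouped = defaultdict(list)
    for (_, w), value in pairs.items():
        grouped[w].append(value)
    return dict(grouped)

y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3,
     ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}

print(group_by_second(y))  # {'w1': [1, 2, 3], 'w2': [4, 5, 6]}
```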

The y_t.clear() line is causing the problem: it empties the same list object that is already stored in y_w. If you replace it with y_t = [], it will work as you expect.

You could first find all unique keys:
unique_keys = set(list(zip(*y))[1])
and then create the dict with list-values using those:
{u: [v for k, v in y.items() if k[1] == u] for u in unique_keys}

According to your output here's what you can try:
y = {('t1', 'w1'): 1, ('t2', 'w1'): 2, ('t3', 'w1'): 3, ('t1', 'w2'): 4, ('t2', 'w2'): 5, ('t3', 'w2'): 6}
def new_dict_with_keys(dictionary):
    new_dictionary = dict()
    # Go through the dictionary keys to read each key's value
    for tuple_key in dictionary:
        if "w1" in tuple_key or "w2" in tuple_key:
            # Determine which key to use
            if "w1" in tuple_key:
                key = "w1"
            else:
                key = "w2"
            # Check if the new dictionary has "w1" or "w2" as an item
            # If it does not, create a new list
            if new_dictionary.get(key) is None:
                new_dictionary[key] = list()
            # Append the value in the respective key
            new_dictionary[key].append(dictionary[tuple_key])
    # Return the dictionary with the items
    return new_dictionary
print(new_dict_with_keys(y))
# Prints: {'w1': [1, 2, 3], 'w2': [4, 5, 6]}

Here's a solution using itertools.groupby:
import itertools as it
from operator import itemgetter
items = sorted((k, v) for (_, k), v in y.items())
groups = it.groupby(items, key=itemgetter(0))
result = {k: [v for _, v in vs] for k, vs in groups}
# {'w1': [1, 2, 3], 'w2': [4, 5, 6]}

How to assign certain scores from a list to values in multiple lists and get the sum for each value in python?

Could you explain how to assign certain scores from a list to values in multiple lists and get the total score for each value?
score = [1,2,3,4,5]  # assign a score based on the position in the list
l_1 = ['a','b','c','d','e']
# assign a=1, b=2, c=3, d=4, e=5
l_2 = ['c','a','d','e','b']
# assign c=1, a=2, d=3, e=4, b=5
I am trying to get the result like
{'e':9, 'b': 7, 'd':7, 'c': 4, 'a': 3}
Thank you!

You can zip the values of score to each list, which gives you a tuple of (key, value) for each letter-score combination. Make each zipped object a dict. Then use a dict comprehension to add the values for each key together.
d_1 = dict(zip(l_1, score))
d_2 = dict(zip(l_2, score))
{k: v + d_2[k] for k, v in d_1.items()}
# {'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
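The comprehension above looks every key of d_1 up in d_2, so it assumes both lists contain exactly the same letters. If that may not hold, a sketch along the following lines could be used instead; the second list here is a made-up example in which 'f' replaces 'b'.

```python
score = [1, 2, 3, 4, 5]
l_1 = ['a', 'b', 'c', 'd', 'e']
l_2 = ['c', 'a', 'd', 'e', 'f']   # hypothetical: 'f' instead of 'b'

d_1 = dict(zip(l_1, score))
d_2 = dict(zip(l_2, score))

# Take the union of the keys and count a missing letter as 0.
totals = {k: d_1.get(k, 0) + d_2.get(k, 0) for k in d_1.keys() | d_2.keys()}

print(sorted(totals.items()))
# [('a', 3), ('b', 2), ('c', 4), ('d', 7), ('e', 9), ('f', 5)]
```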

You would be better off using the zip function:
dic = {'a':0, 'b': 0, 'c':0, 'd': 0, 'e': 0}
def score(dic, *args):
    for lst in args:
        for k, v in zip(lst, range(len(lst))):
            dic[k] += v+1
    return dic
l_1 = ['a','b','c','d','e']
l_2 = ['c','a','d','e','b']
score(dic, l_1, l_2)

Instead of storing your lists in separate variables, you should put them in a list of lists so that you can iterate through it and calculate the sums of the scores according to each key's indices in the sub-lists:
score = [1, 2, 3, 4, 5]
lists = [
    ['a','b','c','d','e'],
    ['c','a','d','e','b']
]
d = {}
for l in lists:
    for i, k in enumerate(l):
        d[k] = d.get(k, 0) + score[i]
d would become:
{'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}

from collections import defaultdict
score = [1,2,3,4,5] # note 0: this list is only needed for non-consecutive scores such as [5,6,9,10,4]
l_1 = ['a','b','c','d','e']
l_2 = ['c','a','d','e','b']
score_dict = defaultdict(int)
'''
note 0:
if your scores are always consecutive,
like score = [2,3,4,5,6] or [5,6,7,8,9]...
you don't need a separate list of scores; you can set
start = score_of_char_at_first_position_ie_at_zero-th_index
like start = 2, or start = 5
otherwise use this function:
def add2ScoreDict(lst):
    for pos_score, char in zip(score, lst):
        score_dict[char] += pos_score
'''
def add2ScoreDict(lst):
    for pos, char in enumerate(lst, start=1):
        score_dict[char] += pos
# note 1
add2ScoreDict(l_1)
add2ScoreDict(l_2)
#print(score_dict) # defaultdict(<class 'int'>, {'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9})
score_dict = dict(sorted(score_dict.items(), reverse = True, key=lambda x: x[1]))
print(score_dict) # {'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3}
Edit 1:
If you have multiple lists, put them in list_of_list = [l_1, l_2] so that you don't have to call add2ScoreDict yourself again and again.
# for note 1
for lst in list_of_list:
    add2ScoreDict(lst)

You could zip both lists with score into one list l3, then use a dictionary comprehension with filter to construct your dictionary. The key is index 1 of each newly formed tuple in l3 (the letter), and the value is the sum of all index 0's (the scores) in a sublist of l3 that is filtered down to the tuples with a matching index 1.
score = [1,2,3,4,5]
l_1 = ['a', 'b', 'c', 'd', 'e']
l_2 = ['c', 'a', 'd', 'e', 'b']
l3 = [*zip(score, l_1), *zip(score,l_2)]
d = {i[1]: sum([j[0] for j in list(filter(lambda x: x[1] ==i[1], l3))]) for i in l3}
{'a': 3, 'b': 7, 'c': 4, 'd': 7, 'e': 9}
Expanded Explanation:
d = {}
for i in l3:
    f = list(filter(lambda x: x[1] == i[1], l3))
    vals = []
    for j in f:
        vals.append(j[0])
    total_vals = sum(vals)
    d[i[1]] = total_vals

The simplest way is probably to use a Counter from the Python standard library.
from collections import Counter
tally = Counter()
scores = [1, 2, 3, 4, 5]
def add_scores(letters):
    for letter, score in zip(letters, scores):
        tally[letter] += score
L1 = ['a', 'b', 'c', 'd', 'e']
add_scores(L1)
L2 = ['c', 'a', 'd', 'e', 'b']
add_scores(L2)
print(tally)
$ python tally.py
Counter({'e': 9, 'b': 7, 'd': 7, 'c': 4, 'a': 3})
zip is used to pair letters and scores, a for loop to iterate over them and a Counter to collect the results. A Counter is actually a dictionary, so you can write things like
tally['a']
to get the score for letter a or
for letter, score in tally.items():
    print('Letter %s scored %s' % (letter, score))
to print the results, just as you would with a normal dictionary.
Finally, small ells and letter O's can be troublesome as variable names because they are hard to distinguish from ones and zeros. The Python style guide (often referred to as PEP8) recommends avoiding them.
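Two Counter features go beyond a plain dictionary and may be handy here. The sketch below rebuilds the tally from the question's lists and then uses most_common for a ranked view and a lookup of a letter that never appeared:

```python
from collections import Counter

scores = [1, 2, 3, 4, 5]
tally = Counter()
for letters in (['a', 'b', 'c', 'd', 'e'], ['c', 'a', 'd', 'e', 'b']):
    for letter, score in zip(letters, scores):
        tally[letter] += score

print(tally.most_common(2))   # [('e', 9), ('b', 7)]
print(tally['z'])             # 0, missing keys count as zero
```

Looking up a missing letter returns 0 without adding that letter to the Counter.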

Remap data according to ranking

I have the following:
d = {"a":3,"b":2,"c":3,"d":2,"e":2,"f":3,"g":4, "h":6}
m = {v: i+1 for i,v in enumerate(sorted(set(d.values()),reverse=True))}
r = {k:m[d[k]] for k in d}
where r is:
{'a': 3, 'd': 4, 'b': 4, 'c': 3, 'e': 4, 'f': 3, 'g': 2, 'h': 1}
So "h" has the highest value, 6, in d so it is remapped to 1 in r. Then 'g' is ranked 2 since it has the next highest value, 4 in d.
My solution works fine but I was wondering if there is a more elegant solution.

Before Python 3.7, dicts did not guarantee any order, so you needed an OrderedDict for that; from Python 3.7 on, a plain dict preserves insertion order.
Use Counter to get the ranks. Then turn that into a list of tuples or into an OrderedDict. Note that this gives tied values distinct ranks (in the order first encountered) rather than a shared rank.
from collections import Counter, OrderedDict
d = {"a":3,"b":2,"c":3,"d":2,"e":2,"f":3,"g":4, "h":6}
c = Counter(d)
# if you want a list of tuples
ranked_list = [(pair[0],rank+1) for rank,pair in enumerate(c.most_common())]
# [('h', 1),('g', 2),('a', 3),('c', 4),('f', 5),('b', 6),('d', 7), ('e', 8)]
# if you want a dict:
ranked_dict = OrderedDict(ranked_list)
# OrderedDict([('h', 1),('g', 2),('a', 3),('c', 4),('f', 5),('b', 6),('d', 7), ('e', 8)])

You can use this:
d = {"a":3,"b":2,"c":3,"d":2,"e":2,"f":3,"g":4, "h":6}
# sort the dictionary items by -value, throw away old value and use the
# enumerate position starting at 1 instead - no backreferencing in the old
# dict needed here
k = {k:idx for idx,(k,_) in enumerate(sorted(d.items(), key = lambda x:-x[1]),1)}
print(k)
Output:
{'h': 1, 'g': 2, 'a': 3, 'c': 4, 'f': 5, 'b': 6, 'd': 7, 'e': 8}

def ranker(d):
    ranks = sorted(set(d.values()),reverse=True)
    ranks = {r:i+1 for i,r in enumerate(ranks)}
    return {k: ranks[v] for k,v in d.items()}
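As a quick check, here is the same function (repeated so the snippet runs on its own, with enumerate's start argument as a small stylistic change) applied to the d from the question. Equal values share a rank, which is what the question asks for:

```python
def ranker(d):
    # Distinct values, highest first, numbered from 1.
    ranks = sorted(set(d.values()), reverse=True)
    ranks = {value: i for i, value in enumerate(ranks, start=1)}
    return {k: ranks[v] for k, v in d.items()}

d = {"a": 3, "b": 2, "c": 3, "d": 2, "e": 2, "f": 3, "g": 4, "h": 6}
print(ranker(d))
# {'a': 3, 'b': 4, 'c': 3, 'd': 4, 'e': 4, 'f': 3, 'g': 2, 'h': 1}
```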

Creating dictionaries from list which contains the headers as elements

bit of a rookie scraper here, trying to make a dictionary out of scraped table.
I scraped a table using selenium, which didn't have different headers and cells, and now I am stuck with an appended list I made myself that features firstly the header names and then all the values such as:
list = [H1, H2, H3, ValueA1, ValueA2, ValueA3, ValueB1, ValueB2, ValueB3 ....]
My desired output is a list of dictionaries that uses the first three objects as dictionary keys, the next three objects as dictionary values, and so on.
Thank you

Though this is a code request, I'll bite:
In [3]: l = ['asdf', 'qwer', 1, 2, 3, 4, 5, 6, 7, 8]
In [4]: n_headers = 2
In [5]: [{k: v for k, v in zip(l[:n_headers], l[i:i + n_headers])}
for i in range(n_headers, len(l), n_headers)]
Out[5]:
[{'qwer': 2, 'asdf': 1},
{'qwer': 4, 'asdf': 3},
{'qwer': 6, 'asdf': 5},
{'qwer': 8, 'asdf': 7}]
This'll end up slicing the list quite a few times, which you can avoid with the iter() trick:
In [9]: g = zip(*[iter(l)] * 2)
In [10]: hdrs = next(g)
In [11]: hdrs
Out[11]: ('asdf', 'qwer')
In [12]: [{k: v for k, v in zip(hdrs, h)} for h in g]
Out[12]:
[{'qwer': 2, 'asdf': 1},
{'qwer': 4, 'asdf': 3},
{'qwer': 6, 'asdf': 5},
{'qwer': 8, 'asdf': 7}]
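The iter() trick works because the same iterator object is repeated, so each step of zip pulls consecutive items from one shared stream. A sketch that wraps it in a helper, with the function name rows_to_dicts and the sample data chosen only for illustration:

```python
def rows_to_dicts(flat, n_headers):
    # One shared iterator repeated n_headers times: each zip step pulls
    # n_headers consecutive items, cutting the flat list into rows.
    rows = zip(*[iter(flat)] * n_headers)
    headers = next(rows)          # the first row holds the header names
    return [dict(zip(headers, row)) for row in rows]

table = ['H1', 'H2', 'H3',
         'ValueA1', 'ValueA2', 'ValueA3',
         'ValueB1', 'ValueB2', 'ValueB3']

for record in rows_to_dicts(table, 3):
    print(record)
# {'H1': 'ValueA1', 'H2': 'ValueA2', 'H3': 'ValueA3'}
# {'H1': 'ValueB1', 'H2': 'ValueB2', 'H3': 'ValueB3'}
```

Note that zip stops at the shortest input, so an incomplete trailing row is silently dropped.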

Unclear if this is what you're looking for but for a 'list' of dictionaries:
i = 3
d={}
result=[]
while i < len(list): #Iterating over list
    d[list[i%3]]=list[i]
    i += 1
    if (i%3==0): #Add to your list for every third element
        result.append(d)
        d={}
Output would be along the lines of
[{'H2': 'ValueA2', 'H3': 'ValueA3', 'H1': 'ValueA1'}, {'H2': 'ValueB2', 'H3': 'ValueB3', 'H1': 'ValueB1'}]

Use a combination of zip and iter, assuming 3 headers:
lst = [ 'H1', 'H2', 'H3', 'ValueA1', 'ValueA2', 'ValueA3', 'ValueB1', 'ValueB2', 'ValueB3', 'ValueC1', 'ValueC2', 'ValueC3' ]
grps = list( zip(*([iter(lst)] * 3)) )
[ dict( zip( grps[0], grps[i]) ) for i in range(1,len(grps))]
Output:
[{'H1': 'ValueA1', 'H2': 'ValueA2', 'H3': 'ValueA3'},
{'H1': 'ValueB1', 'H2': 'ValueB2', 'H3': 'ValueB3'},
{'H1': 'ValueC1', 'H2': 'ValueC2', 'H3': 'ValueC3'}]

Using secondary key to sum dictionary values

I have a dict structured like [a][b]=(c) such as:
{'cat': {1:1, 2:3, 3:1, 4:1}, 'dog': {1:8, 2:2, 3:4}, 'egg': {5:1, 6:2}, 'frog': {2:1, 4:1, 5:1}, 'nuts': {3:1}, 'idea': {4:1}}
What I'd like to be able to do is search by the [b] key and sum the corresponding c belonging to that. So I'd get the following outputs:
1: 9, 2: 6, 3: 6
...and so on.
Does this require restructuring of the dict?

You can iterate on the dictionary values which are dicts and sum up the values for each key using a collections.defaultdict. Then you'll simply access the result dictionary to find out the summed value for each key with no need to search:
from collections import defaultdict
d = {'cat': {1:1, 2:3, 3:1, 4:1}, 'dog': {1:8, 2:2, 3:4}, 'egg': {5:1, 6:2}, 'frog': {2:1, 4:1, 5:1}, 'nuts': {3:1}, 'idea': {4:1}}
result = defaultdict(int)
for i in d.values():
    for j in i:
        result[j] += i[j]
print(result)
# defaultdict(<class 'int'>, {1: 9, 2: 6, 3: 6, 4: 3, 5: 2, 6: 2})
>>> print(result[1])
9

I'll assume you have a dictionary like this:
d = {'cat': {1:1, 2:3, 3:1, 4:1}, 'dog': {1:8, 2:2, 3:4}, 'egg': {5:1, 6:2}, 'frog': {2:1, 4:1, 5:1}, 'nuts': {3:1}, 'idea': {4:1}}
Now we have to write a function which takes a parameter (an integer) and sums all values for this integer across all inner dictionaries.
def calc(b):
    result = 0
    for val in d.values():
        if b in val:
            result += val[b]
    return result

If all you need is the total then you can sum up all the values using:
>>> b = 2
>>> sum(a.get(b, 0) for a in d.values())
6
If you want all the bs then you use collections.Counter() which behaves like a dict to do all the heavy lifting:
>>> from collections import Counter
>>> sum((Counter(a) for a in d.values()), Counter())
Counter({1: 9, 2: 6, 3: 6, 4: 3, 5: 2, 6: 2})
But if you are really fussy and want a dict:
>>> dict(sum((Counter(a) for a in d.values()), Counter()))
{1: 9, 2: 6, 3: 6, 4: 3, 5: 2, 6: 2}
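Summing Counters with sum() builds a new Counter at every step. As an alternative sketch that accumulates in place, Counter.update can be called in a loop (it adds counts rather than replacing them):

```python
from collections import Counter

d = {'cat': {1: 1, 2: 3, 3: 1, 4: 1}, 'dog': {1: 8, 2: 2, 3: 4},
     'egg': {5: 1, 6: 2}, 'frog': {2: 1, 4: 1, 5: 1},
     'nuts': {3: 1}, 'idea': {4: 1}}

totals = Counter()
for inner in d.values():
    totals.update(inner)   # adds counts in place, no intermediate Counters

print(dict(totals))
# {1: 9, 2: 6, 3: 6, 4: 3, 5: 2, 6: 2}
```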
