Getting the max from a nested default dictionary - python

I'm trying to obtain the maximum value from every dictionary in a default dictionary of default dictionaries, using Python 3.
Dictionary Set Up:
d = defaultdict(lambda: defaultdict(int))
My loop iterates over the dictionaries and the CSV data I'm using just fine, but when I call max, it doesn't always return the actual maximum.
Example output:
defaultdict(<class 'int'>, {0: 106, 2: 35, 3: 12})
max = (0, 106)
defaultdict(<class 'int'>, {0: 131, 1: 649, 2: 338, 3: 348, 4: 276, 5: 150, 6: 138, 7: 89, 8: 54, 9: 22, 10: 5, 11: 2})
max = (0, 131)
defaultdict(<class 'int'>, {0: 39, 1: 13, 2: 30, 3: 15, 4: 5, 5: 10, 6: 1, 8: 1})
max = (0, 39)
defaultdict(<class 'int'>, {0: 40, 1: 53, 2: 97, 3: 80, 4: 154, 5: 203, 6: 173, 7: 142, 8: 113, 9: 76, 10: 55, 11: 22, 12: 13, 13: 7})
max = (0, 40)
So sometimes it's right, but far from perfect.
My approach was informed by the answer to this question, but I adapted it to try and make it work for a nested default dictionary. Here's the code I'm using to find the max:
for sub_d in d:
    outer_dict = d[sub_d]
    print(max(outer_dict.items(), key=lambda x: outer_dict.get(x, 0)))
Any insight would be greatly appreciated. Thanks so much.

If you check the values produced by outer_dict.items(), they are actually (key, value) tuples, and since those tuples are not keys in your dictionary, get returns 0 for every one of them, so max simply returns the first item (the one with key 0).
max(outer_dict.keys(), key=lambda x: outer_dict.get(x, 0))
will get you the key of the maximum value, and you can then retrieve the value itself by looking that key up in the dictionary.
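For example, a minimal sketch of that idea on one inner dictionary (the sample data below is made up; the variable name follows the question):
outer_dict = {0: 131, 1: 649, 2: 338, 3: 348}    # example inner dict
best_key = max(outer_dict, key=outer_dict.get)   # key with the largest value -> 1
best_pair = (best_key, outer_dict[best_key])     # (1, 649)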

In
max(outer_dict.items(), key=lambda x: outer_dict.get(x, 0))
the outer_dict.items() call returns a view of the (key, value) pairs in outer_dict. So the key function gets passed a (key, value) tuple as its x argument and then tries to find that tuple as a key in outer_dict, and of course that never succeeds, so the get call always returns 0.
Instead, we can use a key function that extracts the value from the tuple, e.g.:
nested = {
    'a': {0: 106, 2: 35, 3: 12},
    'b': {0: 131, 1: 649, 2: 338, 3: 348, 4: 276, 5: 150, 6: 138, 7: 89,
          8: 54, 9: 22, 10: 5, 11: 2},
    'c': {0: 39, 1: 13, 2: 30, 3: 15, 4: 5, 5: 10, 6: 1, 8: 1},
    'd': {0: 40, 1: 53, 2: 97, 3: 80, 4: 154, 5: 203, 6: 173, 7: 142,
          8: 113, 9: 76, 10: 55, 11: 22, 12: 13, 13: 7},
}

for k, subdict in nested.items():
    print(k, max((t for t in subdict.items()), key=lambda t: t[1]))
output
a (0, 106)
b (1, 649)
c (0, 39)
d (5, 203)
A more efficient alternative to that lambda is to use itemgetter. Here's a version that puts the maxima into a dictionary:
from operator import itemgetter
nested = {
    'a': {0: 106, 2: 35, 3: 12},
    'b': {0: 131, 1: 649, 2: 338, 3: 348, 4: 276, 5: 150, 6: 138, 7: 89,
          8: 54, 9: 22, 10: 5, 11: 2},
    'c': {0: 39, 1: 13, 2: 30, 3: 15, 4: 5, 5: 10, 6: 1, 8: 1},
    'd': {0: 40, 1: 53, 2: 97, 3: 80, 4: 154, 5: 203, 6: 173, 7: 142,
          8: 113, 9: 76, 10: 55, 11: 22, 12: 13, 13: 7},
}

ig1 = itemgetter(1)
maxes = {k: max((t for t in subdict.items()), key=ig1)
         for k, subdict in nested.items()}
print(maxes)
output
{'a': (0, 106), 'b': (1, 649), 'c': (0, 39), 'd': (5, 203)}
We define ig1 outside the dictionary comprehension so that we don't call itemgetter(1) on every iteration of the outer loop.
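For completeness, here is a sketch of the same idea applied directly to a nested defaultdict like the one in the question (the outer keys and counts below are made-up sample data, not the OP's CSV):
from collections import defaultdict
from operator import itemgetter

d = defaultdict(lambda: defaultdict(int))
d['row_a'].update({0: 106, 2: 35, 3: 12})    # sample data
d['row_b'].update({0: 131, 1: 649, 2: 338})

for outer_key, inner in d.items():
    print(outer_key, max(inner.items(), key=itemgetter(1)))
# row_a (0, 106)
# row_b (1, 649)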

Related

Python Color Dataframe cells depending on values

I am trying to color the cells of a DataFrame based on their values. I have the following DataFrame:
df = pd.DataFrame({
    'Jugador': {1: 'M. Sanchez', 2: 'L. Ovalle', 3: 'K. Soto', 4: 'U. Kanu', 5: 'K. Abud'},
    'Equipo': {1: 'Houston Dash', 2: 'Tigres UANL', 3: 'Guadalajara', 4: 'Tigres UANL', 5: 'Cruz Azul'},
    'Edad': {1: 26, 2: 22, 3: 26, 4: 24, 5: 29},
    'Posición específica': {1: 'RAMF, RW', 2: 'LAMF, LW', 3: 'RAMF, RW, CF', 4: 'RAMF, CF, RWF', 5: 'RW, RAMF, LW'},
    'Minutos jugados': {1: 2053, 2: 3777, 3: 2287, 4: 1508, 5: 1436},
    'Offence': {1: 84, 2: 90, 3: 69, 4: 80, 5: 47},
    'Defense': {1: 50, 2: 36, 3: 64, 4: 42, 5: 86},
    'Passing': {1: 78, 2: 81, 3: 72, 4: 73, 5: 71},
    'Total': {1: 72, 2: 71, 3: 69, 4: 66, 5: 66}})
How can I color the Offence, Defense and Passing cells green if > 60, red if < 40, and yellow otherwise?
Use Styler.applymap with a custom function:
def styler(v):
    if v > 60:
        return 'background-color:green'
    elif v < 40:
        return 'background-color:red'
    else:
        return 'background-color:yellow'

df.style.applymap(styler, subset=['Offence', 'Defense', 'Passing'])
Alternative solution:
styler = lambda v: 'background-color:green' if v > 60 else 'background-color:red' if v < 40 else 'background-color:yellow'
df.style.applymap(styler, subset=['Offence','Defense','Passing'])
Another approach:
import numpy as np

def highlight(x):
    c1 = 'background-color:green'
    c2 = 'background-color:red'
    c3 = 'background-color:yellow'
    cols = ['Offence', 'Defense', 'Passing']
    # DataFrame with the same index and column names as the original, filled with empty strings
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # set the styles of the selected columns via boolean masks
    df1[cols] = np.select([x[cols] > 60, x[cols] < 40], [c1, c2], default=c3)
    return df1

df.style.apply(highlight, axis=None)
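A note on newer pandas versions (an addition, not part of the answers above): in pandas 2.1+ Styler.applymap has been renamed to Styler.map (applymap still works but is deprecated), so the elementwise solutions can also be written as:
df.style.map(styler, subset=['Offence', 'Defense', 'Passing'])   # pandas >= 2.1; use applymap on older versions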

Replace the keys of one dictionary with the values from another IF the keys of both dictionaries match

I have two dictionaries.
a = {3: 1, 14: 2, 4: 3, 1: 4, 5: 5, 16: 6, 2: 7, 6: 8, 15: 9, 10: 10, 7: 11, 9: 12, 8: 13, 12: 14, 13: 15, 11: 16}
b = {1: 29, 2: 63, 3: 29, 4: 32, 5: 46, 6: 29, 7: 24, 8: 63, 9: 67, 10: 46, 11: 29, 12: 24, 13: 67, 14: 2, 15: 63, 16: 2, 17: 42}
I want to assign the values of a to the values of b if the keys match, or some other similar result. I've been learning Python for about a week and I'm pretty sure the answer involves list comprehension, but I just can't get my head around some of the other answers on here, so a step-by-step for-loop style answer would be much appreciated. Thanks.
EDIT.
Sorry for not showing previous efforts. I was trying something like this (as one of the comments suggested) but I don't understand why it doesn't return what I need.
for k1, v1 in a.items():
    for k2, v2 in b.items():
        if k1 == k2:
            b[k2] = v1
Also this:
for k1, v1 in a.items():
    for k2, v2 in b.items():
        if k1 in b.items():
            b[k2] = a[v1]
For clarity, I've updated the dictionaries to have strings for keys.
a = {'2': 1, '3': 2, '4': 3, '5': 4, '1': 5, '6': 6, '7': 7, '8': 8, '9': 9, '10': 10, '11': 11, '12': 12, '13': 13, '14': 14, '15': 15, '16': 16}
b = {'1': 67, '2': 46, '3': 32, '4': 63, '5': 49, '6': 63, '7': 67, '8': 67, '9': 2, '10': 2, '11': 24, '12': 67, '13': 49, '14': 67, '15': 63, '16': 46, '17': 42}
I want to replace the strings in b with the integers from a (and do nothing if the strings don't match), i.e. no strings in the result; the strings are only there to associate the values. As far as I can tell (not far), and unless I'm mistaken (very likely), most answers provided so far don't do this: they seem to assume I want to replace v2 with v1, not k2 with v1. Apologies if that's wrong. The order of dictionary a can be jumbled as well, which I tried to represent in my first example, but in the second example I've only displaced key "1". In any case, the desired output from example two is:
new dict = {5: 67, 1: 46, 2: 32, 3: 63, 4: 49, 6: 63... 16: 46}
Hope that makes sense. Any further help greatly appreciated. Apologies again for noobishness.
EDIT2: A SOLUTION
Got what I needed from this:
new_dict = {}
for k1, v1 in a.items():
    for k2, v2 in b.items():
        if k1 == k2:
            new_dict[v1] = v2
I'm sure there's fat to be trimmed - critiques welcome. Sorry if this question is now shit show - happy to hear edit suggestions.
Iterate over the keys of a and, if a key also exists in b, set b's value at that key to a's value at that key:
for key in a.keys():
    if key in b:
        b[key] = a[key]
Hope it helped
You can loop over the dict a, check whether each key also exists in dict b, and if it does, assign a's value for that key to b:
for _a in a:
    if _a in b:
        b[_a] = a[_a]
a = {3: 1, 14: 2, 4: 3, 1: 4, 5: 5, 16: 6, 2: 7, 6: 8, 15: 9, 10: 10, 7: 11, 9: 12, 8: 13, 12: 14, 13: 15, 11: 16}
b = {1: 29, 2: 63, 3: 29, 4: 32, 5: 46, 6: 29, 7: 24, 8: 63, 9: 67, 10: 46, 11: 29, 12: 24, 13: 67, 14: 2, 15: 63, 16: 2, 17: 42}
a_list = list(a)
b_list = list(b)
for key in range(len(a_list)):
    if b_list[key] == a_list[key]:
        b[b_list[key]] = a[a_list[key]]
print(b)
outputs {1: 29, 2: 63, 3: 29, 4: 32, 5: 5, 6: 29, 7: 24, 8: 63, 9: 67, 10: 10, 11: 29, 12: 24, 13: 67, 14: 2, 15: 63, 16: 2, 17: 42}
not sure if this is the output you wanted
a = {3: 1, 14: 2, 4: 3, 1: 4, 5: 5, 16: 6, 2: 7, 6: 8, 15: 9, 10: 10, 7: 11, 9: 12, 8: 13, 12: 14, 13: 15, 11: 16}
b = {1: 29, 2: 63, 3: 29, 4: 32, 5: 46, 6: 29, 7: 24, 8: 63, 9: 67, 10: 46, 11: 29, 12: 24, 13: 67, 14: 2, 15: 63, 16: 2, 17: 42}
for key_a, value_a in a.items():
    for key_b, value_b in b.items():
        if key_a == key_b:
            b[key_b] = value_a
print(b)
Edit: Or more efficiently, as recommended by @Mark (and done by @marcusoft):
for key in a.keys():
    if key in b.keys():
        b[key] = a[key]
print(b)
Output:
{1: 4, 2: 7, 3: 1, 4: 3, 5: 5, 6: 8, 7: 11, 8: 13, 9: 12, 10: 10, 11: 16, 12: 14, 13: 15, 14: 2, 15: 9, 16: 6, 17: 42}
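As an aside on the EDIT2 solution above: the nested loop only ever matches when k1 == k2, so the same {v1: v2} result can be built in a single pass with a dict comprehension (a sketch using the a and b from the question):
new_dict = {v1: b[k1] for k1, v1 in a.items() if k1 in b}
# same result as the EDIT2 loop, but O(len(a)) instead of O(len(a) * len(b))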

Comparing pandas map and merge

I have the following df:
df = pd.DataFrame({'key': {
    0: 'EFG_DS_321', 1: 'EFG_DS_900', 2: 'EFG_DS_900', 3: 'EFG_Q_900',
    4: 'EFG_DS_1000', 5: 'EFG_DS_1000', 6: 'EFG_DS_1000', 7: 'ABC_DS_444',
    8: 'EFG_DS_900', 9: 'EFG_DS_900', 10: 'EFG_DS_321', 11: 'EFG_DS_900',
    12: 'EFG_DS_1000', 13: 'EFG_DS_900', 14: 'EFG_DS_321', 15: 'EFG_DS_321',
    16: 'EFG_DS_1000', 17: 'EFG_DS_1000', 18: 'EFG_DS_1000', 19: 'EFG_DS_1000',
    20: 'ABC_DS_444', 21: 'EFG_DS_900', 22: 'EFG_DAS_12345', 23: 'EFG_DAS_12345',
    24: 'EFG_DAS_321', 25: 'EFG_DS_321', 26: 'EFG_DS_12345', 27: 'EFG_Q_1000',
    28: 'EFG_DS_900', 29: 'EFG_DS_321'}})
and I have the following dict:
d = {'ABC_AS_1000': 123, 'ABC_AS_444': 321, 'ABC_AS_231341': 421, 'ABC_AS_888': 412,
     'ABC_AS_087': 4215, 'ABC_DAS_1000': 3415, 'ABC_DAS_444': 4215, 'ABC_DAS_231341': 3214,
     'ABC_DAS_888': 321, 'ABC_DAS_087': 111, 'ABC_Q_1000': 222, 'ABC_Q_444': 3214,
     'ABC_Q_231341': 421, 'ABC_Q_888': 321, 'ABC_Q_087': 41, 'ABC_DS_1000': 421,
     'ABC_DS_444': 421, 'ABC_DS_231341': 321, 'ABC_DS_888': 41, 'ABC_DS_087': 41,
     'EFG_AS_1000': 213, 'EFG_AS_900': 32, 'EFG_AS_12345': 1, 'EFG_AS_321': 3,
     'EFG_DAS_1000': 421, 'EFG_DAS_900': 321, 'EFG_DAS_12345': 123, 'EFG_DAS_321': 31,
     'EFG_Q_1000': 41, 'EFG_Q_900': 51, 'EFG_Q_12345': 321, 'EFG_Q_321': 321,
     'EFG_DS_1000': 41, 'EFG_DS_900': 51, 'EFG_DS_12345': 321, 'EFG_DS_321': 1}
I want to map d onto df, but given that the real data is very large and complicated, I'm trying to understand whether map or merge is better in terms of efficiency (running time).
first option:
a simple map
res = df['key'].map(d)
second option:
convert d into a DataFrame and perform a merge
d1 = pd.DataFrame.from_dict(d,orient='index',columns=['res'])
res = df.merge(d1,left_on='key',right_index=True)['res']
Any help will be much appreciated (or any better solutions of course:))
map will be faster than a merge: mapping a Series through a dict is essentially a per-value lookup, whereas merge has the extra overhead of constructing and joining against a second DataFrame.
If your goal is simply to assign a numerical category to each unique value in df['key'], you could use pandas.factorize, which should be a bit faster than map:
res = df['key'].factorize()[0] + 1
output (for a smaller example column, not the df above): array([1, 1, 1, 2, 2, 3, 3, 3])
test on 800k rows:
factorize 28.6 ms ± 153 µs
map 32.1 ms ± 110 µs
merge 68.6 ms ± 1.33 ms
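If you want to reproduce this kind of comparison on your own data, here is a minimal timing sketch with timeit (df, d and d1 are the names from the question; the concat line just inflates the 30-row sample to a realistic size):
import timeit
import pandas as pd

big = pd.concat([df] * 25_000, ignore_index=True)   # ~750k rows built from the sample df
d1 = pd.DataFrame.from_dict(d, orient='index', columns=['res'])

t_map = timeit.timeit(lambda: big['key'].map(d), number=10)
t_merge = timeit.timeit(lambda: big.merge(d1, left_on='key', right_index=True)['res'], number=10)
print(f"map:   {t_map / 10:.4f} s per run")
print(f"merge: {t_merge / 10:.4f} s per run")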

nested Dictionary print differently after sorting by inner dictionary values in python

I was trying to sort a nested dict by the inner dicts' values. The sorting went well, but when I checked the result, the original dict was printed when I just evaluated the variable (d2), while print(d2) gave me the correct, sorted result.
d2 = {1: {1: 4, 2: 5, 3: 6},
      2: {7: 13, 8: 14, 9: 15, 10: 16, 11: 17, 12: 18},
      3: {1: 1, 2: 9, 3: 4}}
import operator

# sorting by inner dict value
for keys in d2.keys():
    sorted_tuples = sorted(d2[keys].items(), key=operator.itemgetter(1), reverse=True)
    d2[keys] = {k: v for k, v in sorted_tuples}
print(d2)
d2
{1: {3: 6, 2: 5, 1: 4}, 2: {12: 18, 11: 17, 10: 16, 9: 15, 8: 14, 7: 13}, 3: {2: 9, 3: 4, 1: 1}}
{1: {1: 4, 2: 5, 3: 6},
2: {7: 13, 8: 14, 9: 15, 10: 16, 11: 17, 12: 18},
3: {1: 1, 2: 9, 3: 4}}
Why is the output different when I use d2 versus print(d2)?
Hi, friend! Did you use the pretty-print module (pprint) to display d2? That is the only way I was able to replicate your behavior. pprint sorts a dictionary by its keys before printing it, and that sorting can be disabled.
I originally (and wrongly) suspected the different output between d2 and print(d2) was a result of dictionaries being unordered collections of data; I suspected dict.__str__ and dict.__repr__ differed just enough. I would recommend an OrderedDict over a standard dictionary if you wish to rely on the order, even though Python has preserved dictionary insertion order since 3.7.
Below is my code and conclusions.
After initialization, d2 and print(d2) printed the same values:
❯ python
Python 3.7.12 (default, Sep 10 2021, 17:29:55)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> d2 = {1: {1: 4, 2: 5, 3: 6},
...       2: {7: 13, 8: 14, 9: 15, 10: 16, 11: 17, 12: 18},
...       3: {1: 1, 2: 9, 3: 4}}
>>> d2
{1: {1: 4, 2: 5, 3: 6}, 2: {7: 13, 8: 14, 9: 15, 10: 16, 11: 17, 12: 18}, 3: {1: 1, 2: 9, 3: 4}}
>>> print(d2)
{1: {1: 4, 2: 5, 3: 6}, 2: {7: 13, 8: 14, 9: 15, 10: 16, 11: 17, 12: 18}, 3: {1: 1, 2: 9, 3: 4}}
After sorting, d2 and print(d2) printed the same values.
>>> import operator
>>> for keys in d2.keys():
...     sorted_tuples = sorted(d2[keys].items(), key=operator.itemgetter(1), reverse=True)
...     d2[keys] = {k: v for k, v in sorted_tuples}
...
>>> print(d2)
{1: {3: 6, 2: 5, 1: 4}, 2: {12: 18, 11: 17, 10: 16, 9: 15, 8: 14, 7: 13}, 3: {2: 9, 3: 4, 1: 1}}
>>> d2
{1: {3: 6, 2: 5, 1: 4}, 2: {12: 18, 11: 17, 10: 16, 9: 15, 8: 14, 7: 13}, 3: {2: 9, 3: 4, 1: 1}}
However, while using the pretty print module, I was able to replicate your behavior.
>>> from pprint import pprint as pp
>>> pp(print(d2))
{1: {3: 6, 2: 5, 1: 4}, 2: {12: 18, 11: 17, 10: 16, 9: 15, 8: 14, 7: 13}, 3: {2: 9, 3: 4, 1: 1}}
>>> pp(d2)
{1: {1: 4, 2: 5, 3: 6},
2: {7: 13, 8: 14, 9: 15, 10: 16, 11: 17, 12: 18},
3: {1: 1, 2: 9, 3: 4}}
Once I disabled dictionary sorting in the pretty print module (note the extra import pprint, since only the pprint function was imported above), I was able to obtain your desired output.
>>> import pprint
>>> pprint.sorted = lambda x, key=None: x
>>> pp(d2)
{1: {3: 6, 2: 5, 1: 4},
2: {12: 18, 11: 17, 10: 16, 9: 15, 8: 14, 7: 13},
3: {2: 9, 3: 4, 1: 1}}
>>> pp(print(d2))
{1: {3: 6, 2: 5, 1: 4}, 2: {12: 18, 11: 17, 10: 16, 9: 15, 8: 14, 7: 13}, 3: {2: 9, 3: 4, 1: 1}}

Double header dataframe, sumif (possibly groupby?) with python

So here is an image of what I have and what I want to get: https://imgur.com/a/RyDbvZD
Basically, those are SUMIF formulas in Excel. I would like to recreate that in Python. I was trying pandas' groupby().sum(), but I have no clue how to group by two header rows like this, and then how to order the data.
Original dataframe:
df = pd.DataFrame({
    'Group': {0: 'Name', 1: 20201001, 2: 20201002, 3: 20201003, 4: 20201004, 5: 20201005,
              6: 20201006, 7: 20201007, 8: 20201008, 9: 20201009, 10: 20201010},
    'Credit': {0: 'Credit', 1: 65, 2: 69, 3: 92, 4: 18, 5: 58, 6: 12, 7: 31, 8: 29, 9: 12, 10: 41},
    'Equity': {0: 'Stock', 1: 92, 2: 62, 3: 54, 4: 52, 5: 14, 6: 5, 7: 14, 8: 17, 9: 54, 10: 51},
    'Equity.1': {0: 'Option', 1: 87, 2: 30, 3: 40, 4: 24, 5: 95, 6: 77, 7: 44, 8: 77, 9: 88, 10: 85},
    'Credit.1': {0: 'Credit', 1: 62, 2: 60, 3: 91, 4: 57, 5: 65, 6: 50, 7: 75, 8: 55, 9: 48, 10: 99},
    'Equity.2': {0: 'Option', 1: 61, 2: 91, 3: 38, 4: 3, 5: 71, 6: 51, 7: 74, 8: 41, 9: 59, 10: 31},
    'Bond': {0: 'Bond', 1: 4, 2: 62, 3: 91, 4: 66, 5: 30, 6: 51, 7: 76, 8: 6, 9: 65, 10: 73},
    'Unnamed: 7': {0: 'Stock', 1: 54, 2: 23, 3: 74, 4: 92, 5: 36, 6: 89, 7: 88, 8: 32, 9: 19, 10: 91},
    'Bond.1': {0: 'Bond', 1: 96, 2: 10, 3: 11, 4: 7, 5: 28, 6: 82, 7: 13, 8: 46, 9: 70, 10: 46},
    'Bond.2': {0: 'Bond', 1: 25, 2: 53, 3: 96, 4: 70, 5: 52, 6: 9, 7: 98, 8: 9, 9: 48, 10: 58},
    'Unnamed: 10': {0: float('nan'), 1: 63.0, 2: 80.0, 3: 17.0, 4: 21.0, 5: 30.0, 6: 78.0,
                    7: 23.0, 8: 31.0, 9: 72.0, 10: 65.0}})
What I want at the end:
df = pd.DataFrame({
    'Group': {0: 20201001, 1: 20201002, 2: 20201003, 3: 20201004, 4: 20201005,
              5: 20201006, 6: 20201007, 7: 20201008, 8: 20201009, 9: 20201010},
    'Credit': {0: 127, 1: 129, 2: 183, 3: 75, 4: 123, 5: 62, 6: 106, 7: 84, 8: 60, 9: 140},
    'Equity': {0: 240, 1: 183, 2: 132, 3: 79, 4: 180, 5: 133, 6: 132, 7: 135, 8: 201, 9: 167},
    'Stock': {0: 146, 1: 85, 2: 128, 3: 144, 4: 50, 5: 94, 6: 102, 7: 49, 8: 73, 9: 142},
    'Option': {0: 148, 1: 121, 2: 78, 3: 27, 4: 166, 5: 128, 6: 118, 7: 118, 8: 147, 9: 116}})
Any ideas on where to start with this, or anything else, would be appreciated.
Here you go. The first row seems to hold the real headers, so we first move it into the column names and set the index to Name:
df2 = df.rename(columns = df.loc[0]).drop(index = 0).set_index(['Name'])
Then we groupby by columns and sum
df2.groupby(df2.columns, axis=1, sort=False).sum().reset_index()
and we get
Name Credit Stock Option Bond
0 20201001 127.0 146.0 148.0 125.0
1 20201002 129.0 85.0 121.0 125.0
2 20201003 183.0 128.0 78.0 198.0
3 20201004 75.0 144.0 27.0 143.0
4 20201005 123.0 50.0 166.0 110.0
5 20201006 62.0 94.0 128.0 142.0
6 20201007 106.0 102.0 118.0 187.0
7 20201008 84.0 49.0 118.0 61.0
8 20201009 60.0 73.0 147.0 183.0
9 20201010 140.0 142.0 116.0 177.0
I realise the output is not exactly what you asked for but since we cannot see your SUMIF formulas, I do not know which columns you want to aggregate
Edit
Following up on your comment, I note that, as far as I can tell, the rules for aggregation are somewhat messy so that the same column is included in more than one output column (like Equity.1). I do not think there is much you can do with automation here, and you can replicate your SUMIF experience by directly referencing the columns you want to add. So I think the following gives you what you want
df = df.drop(index =0)
df2 = df[['Group']].copy()
df2['Credit'] = df['Credit'] + df['Credit.1']
df2['Equity'] = df['Equity'] + df['Equity.1']+ df['Equity.2']
df2['Stock'] = df['Equity'] + df['Unnamed: 7']
df2['Option'] = df['Equity.1'] + df['Equity.2']
df2
produces
Group Credit Equity Stock Option
-- -------- -------- -------- ------- --------
1 20201001 127 240 146 148
2 20201002 129 183 85 121
3 20201003 183 132 128 78
4 20201004 75 79 144 27
5 20201005 123 180 50 166
6 20201006 62 133 94 128
7 20201007 106 132 102 118
8 20201008 84 135 49 118
9 20201009 60 201 73 147
10 20201010 140 167 142 116
This also gives you control over which columns to include in the final output
If you want this more automated, then you need to do something about the labels of your columns, as you would want a unique label for each set of columns you want to aggregate; if the same input column is used in more than one calculation, it is probably easiest to just duplicate it with the right labels. One possible route is sketched below.
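A sketch of that more automated route, assuming the data is loaded from a file whose first two rows are the two header rows (the file name and the read call are assumptions, not from the question): reading both header rows as a column MultiIndex lets you group by either level.
import pandas as pd

raw = pd.read_excel('data.xlsx', header=[0, 1], index_col=0)   # hypothetical file; gives a column MultiIndex

# Sum columns that share the same second-level label (Credit / Stock / Option / Bond):
by_type = raw.groupby(level=1, axis=1).sum()

# Sum columns that share the same first-level label (Credit / Equity / Bond):
by_group = raw.groupby(level=0, axis=1).sum()

# Note: axis=1 groupby is deprecated in pandas 2.1+; an equivalent spelling is raw.T.groupby(level=1).sum().T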
