Normalization of a nested dictionary in Python

I am new to Python. I have a nested dictionary and I want to normalize the values of each inner dictionary so that they sum to 1. For example:
nested_dictionary={'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
And I would like to get the normalization as
Normalized_result={'D': {'D': '0.47', 'B': '0.24', 'C': '0.00', 'A': '0.24', 'K': '0.00', 'J': '0.04'}, 'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
I have seen the example in Normalizing dictionary values, which only works for a single dictionary, but I want to extend it to a nested one.
I have tried to flatten nested_dictionary and apply the normalization like this:
import flatdict
d = flatdict.FlatDict(nested_dictionary, delimiter='_')
dd = dict(d)
newDict = dict(zip(dd.keys(), [float(value) for value in dd.values()]))
def normalize(d, target=1.0):
    global factor
    raw = sum(d.values())
    print(raw)
    if raw == 0:
        factor = 0
        #print('ok')
    else:
        # print('kok')
        factor = target/raw
    return {key: value*factor for key, value in d.items()}
normalize(newDict)
And I get the result as
{'D_D': 0.2578125,
'D_B': 0.1328125,
'D_C': 0.0,
'D_A': 0.1328125,
'D_K': 0.0,
'D_J': 0.023437499999999997,
'A_A': 0.39062499999999994,
'A_K': 0.0,
'A_J': 0.06249999999999999}
But what I want is the Normalized_result shown above.
Thanks in advance.

nested_dictionary = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
In this example your dict values are of type str, so we first need to convert them to float:
nested_dictionary = dict([b, dict([a, float(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': 0.33, 'B': 0.17, 'C': 0.0, 'A': 0.17, 'K': 0.0, 'J': 0.03},
'A': {'A': 0.5, 'K': 0.0, 'J': 0.08}}
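The loop below is adapted from the answer you linked.
It goes through the inner dictionaries, calculates each one's factor and updates the values in place:
for d in nested_dictionary.values():
    total = sum(d.values())
    # Guard against an all-zero inner dict, which would otherwise raise ZeroDivisionError.
    factor = 1.0/total if total else 0.0
    for k in d:
        d[k] = d[k] * factor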
nested_dictionary
{'D': {'D': 0.47142857142857136,
'B': 0.24285714285714285,
'C': 0.0,
'A': 0.24285714285714285,
'K': 0.0,
'J': 0.04285714285714285},
'A': {'A': 0.8620689655172414, 'K': 0.0, 'J': 0.13793103448275865}}
If you need to convert the values back to str, format them like this:
nested_dictionary = dict([b, dict([a, "{:.2f}".format(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': '0.47',
'B': '0.24',
'C': '0.00',
'A': '0.24',
'K': '0.00',
'J': '0.04'},
'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}

This code does it:
def normalize(d, target=1.0):
    raw = sum(float(number) for number in d.values())
    factor = (target/raw if raw else 0)
    return {key: f'{float(value)*factor:.2f}' for key, value in d.items()}
{key: normalize(dct) for key, dct in nested_dictionary.items()}
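The flattened attempt in the question divides every value by the grand total 1.28 (0.70 for the 'D' entries plus 0.58 for the 'A' entries), whereas the expected Normalized_result divides each inner dictionary by its own total. The sketch below is a self-contained check of the per-inner-dictionary approach on the question's data; normalize_nested and its ndigits parameter are illustrative names, not taken from any answer here, and returning '0.00' for an all-zero inner dict is an assumption.

```python
def normalize_nested(nested, target=1.0, ndigits=2):
    """Normalize each inner dict separately; values may be numeric strings."""
    result = {}
    for outer_key, inner in nested.items():
        # Convert the string values to floats before summing.
        values = {k: float(v) for k, v in inner.items()}
        total = sum(values.values())
        # Assumption: an all-zero inner dict maps to zeros instead of dividing by zero.
        factor = target / total if total else 0.0
        # Format back to fixed-precision strings, matching the question's data.
        result[outer_key] = {k: f"{v * factor:.{ndigits}f}" for k, v in values.items()}
    return result


nested_dictionary = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
                     'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
normalized = normalize_nested(nested_dictionary)
print(normalized)
```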

1. Turn the string values in your inner dicts into floats.
2. Take one of the solutions from the duplicate, for example really_safe_normalise_in_place.
3. Use that solution on each inner dict.
Example:
d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}
for v in d.values():
    really_safe_normalise_in_place(v)
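The duplicate's really_safe_normalise_in_place is not reproduced here. As a hypothetical stand-in (an assumption about what such a function does, not the code from that answer), the sketch below defines normalise_in_place_sketch, which scales a dict's values in place so they sum to 1 and leaves an all-zero dict untouched; it can be called wherever the example above calls really_safe_normalise_in_place.

```python
def normalise_in_place_sketch(d):
    # Hypothetical stand-in, not the implementation from the linked duplicate.
    total = sum(d.values())
    if total == 0:
        return  # nothing sensible to scale, so leave the values as they are
    factor = 1.0 / total
    for key in d:
        d[key] *= factor  # rebinding values is safe while iterating the keys


d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
     'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}
for v in d.values():
    normalise_in_place_sketch(v)
print(d)
```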

Related

iterating over a list of dictionaries with a for loop

I have a variable that looks like this: it contains multiple lists, and each list holds multiple dictionaries. What I need to do now is:
combine the lists into one big list
if two dictionaries have the same key, combine them (keep one copy of the key and add their values)
I know I need to use a for loop, but how do I reference the dictionaries inside a list, and how do I reference the lists stored in the variable?
I tried doing something like this:
for list in bigram_lists:
    for list1 in bigram_lists:
        list.append(list1)
It raises the error 'dict' object has no attribute 'append'.
Any help would be appreciated.
import ast
x = "[{'a': 1850}, {'b': 397}, {'c': 811}, {'d': 990}, {'e': 3198}, {'f': 605}, {'g': 435}, {'h': 1339}, {'i': 1904}, {'j': 59}, {'k': 138}, {'l': 946}, {'m': 652}, {'n': 1691}, {'o': 1813}, {'p': 510}, {'q': 13}, {'r': 1469}, {'s': 1695}, {'t': 2322}, {'u': 516}, {'v': 285}, {'w': 353}, {'x': 49}, {'y': 393}, {'z': 23}] [{'a': 3815}, {'b': 716}, {'c': 1989}, {'d': 1904}, {'e': 5429}, {'f': 908}, {'g': 836}, {'h': 1902}, {'i': 3340}, {'j': 42}, {'k': 148}, {'l': 1818}, {'m': 1156}, {'n': 3782}, {'o': 3365}, {'p': 992}, {'q': 98}, {'r': 2683}, {'s': 3125}, {'t': 3708}, {'u': 1123}, {'v': 335}, {'w': 399}, {'x': 153}, {'y': 706}, {'z': 85}] [{'a': 5087}, {'b': 823}, {'c': 1949}, {'d': 2366}, {'e': 6904}, {'f': 1322}, {'g': 1128}, {'h': 2756}, {'i': 3754}, {'j': 138}, {'k': 346}, {'l': 2709}, {'m': 1618}, {'n': 4391}, {'o': 4675}, {'p': 1321}, {'q': 74}, {'r': 3681}, {'s': 3554}, {'t': 5438}, {'u': 1658}, {'v': 519}, {'w': 1012}, {'x': 128}, {'y': 718}, {'z': 53}]"
strs = x.replace(']','],')[:-1]
strs = "[" + strs + "]"
listOfLists = ast.literal_eval(strs)
finalDict = {}
for ls in listOfLists:
    for dct in ls:
        key = list(dct.keys())[0]  # each dict holds exactly one key
        if key in finalDict:
            finalDict[key] += dct[key]
        else:
            finalDict[key] = dct[key]
print(finalDict)
gives you
{'a': 10752, 'b': 1936, 'c': 4749, 'd': 5260, 'e': 15531, 'f': 2835, 'g': 2399, 'h': 5997, 'i': 8998, 'j': 239, 'k': 632, 'l': 5473, 'm': 3426, 'n': 9864, 'o': 9853, 'p': 2823, 'q': 185, 'r': 7833, 's': 8374, 't': 11468, 'u': 3297, 'v': 1139, 'w': 1764, 'x': 330, 'y': 1817, 'z': 161}
Working with x as a list of lists, I built one dictionary in which each key holds the sum of that key's values across all the lists (you can split it up later if you want):
result = {}
for sublist in x:
    for elem in sublist:
        for key, value in elem.items():
            if key not in result:
                result[key] = value
            else:
                result[key] += value
>>> print(result)
{'a': 10752, 'b': 1936, 'c': 4749, 'd': 5260, 'e': 15531, 'f': 2835, 'g': 2399, 'h': 5997, 'i': 8998, 'j': 239, 'k': 632, 'l': 5473, 'm': 3426, 'n': 9864, 'o': 9853, 'p': 2823, 'q': 185, 'r': 7833, 's': 8374, 't': 11468, 'u': 3297, 'v': 1139, 'w': 1764, 'x': 330, 'y': 1817, 'z': 161}
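If x is already a list of lists of single-key dicts, collections.Counter can do the key-wise summing: Counter.update adds the incoming values to existing counts instead of replacing them. A minimal sketch, using a shortened x (only keys 'a' to 'c') purely for illustration:

```python
from collections import Counter
from itertools import chain

# Shortened illustrative input: a list of lists of single-key dicts.
x = [[{'a': 1850}, {'b': 397}, {'c': 811}],
     [{'a': 3815}, {'b': 716}, {'c': 1989}],
     [{'a': 5087}, {'b': 823}, {'c': 1949}]]

totals = Counter()
# chain.from_iterable flattens the outer list, so each small dict is visited once.
for d in chain.from_iterable(x):
    totals.update(d)  # adds d's value to any existing count for that key

result = dict(totals)
print(result)  # {'a': 10752, 'b': 1936, 'c': 4749}
```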
Having corrected the x input as a list of lists:
x = [[{'a': 1850}, {'b': 397}, {'c': 811}, {'d': 990}, {'e': 3198},
      {'f': 605}, {'g': 435}, {'h': 1339}, {'i': 1904}, {'j': 59},
      {'k': 138}, {'l': 946}, {'m': 652}, {'n': 1691}, {'o': 1813},
      {'p': 510}, {'q': 13}, {'r': 1469}, {'s': 1695}, {'t': 2322},
      {'u': 516}, {'v': 285}, {'w': 353}, {'x': 49}, {'y': 393},
      {'z': 23}],
     [{'a': 3815}, {'b': 716}, {'c': 1989}, {'d': 1904}, {'e': 5429},
      {'f': 908}, {'g': 836}, {'h': 1902}, {'i': 3340}, {'j': 42},
      {'k': 148}, {'l': 1818}, {'m': 1156}, {'n': 3782}, {'o': 3365},
      {'p': 992}, {'q': 98}, {'r': 2683}, {'s': 3125}, {'t': 3708},
      {'u': 1123}, {'v': 335}, {'w': 399}, {'x': 153}, {'y': 706},
      {'z': 85}],
     [{'a': 5087}, {'b': 823}, {'c': 1949}, {'d': 2366}, {'e': 6904},
      {'f': 1322}, {'g': 1128}, {'h': 2756}, {'i': 3754}, {'j': 138},
      {'k': 346}, {'l': 2709}, {'m': 1618}, {'n': 4391}, {'o': 4675},
      {'p': 1321}, {'q': 74}, {'r': 3681}, {'s': 3554}, {'t': 5438},
      {'u': 1658}, {'v': 519}, {'w': 1012}, {'x': 128}, {'y': 718},
      {'z': 53}]]
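this code:
R = []
for ld in x:
    result = {}
    for d in ld:
        result.update(d)  # merge the single-key dicts of one list
    R.append(result)
D = dict.fromkeys(R[0].keys(), 0)
for d in R:
    for k in R[0].keys():
        D[k] += d[k]
will give you the answer you wanted in D. It assumes every inner list contains the same keys as the first one.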

Python functions for nested dictionary

Given a dictionary:
dic = {
2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
8: {'r': 0.1, 's': 0.2}}
I want the output to be a dictionary keyed by 'a', 'p', ..., where each value is the sum of that key's values across the nested dictionaries.
Expected output:
{'p': 0.225, 'a': 0.09811, ...}
Try the following:
dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311}, 7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},8:{'r':0.1, 's':0.2}}
res = {}
for d in dic.values():
    for k, v in d.items():
        res[k] = res.get(k, 0.0) + v
print(res) # {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09811, 'r': 0.28600000000000003, 's': 0.348, 'd': 0.145}
In particular, dict.get(key, default) returns dict[key] if key is present, and default otherwise.
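The same accumulation can be sketched with collections.defaultdict(float), which supplies 0.0 the first time a key is seen, so the get call is no longer needed:

```python
from collections import defaultdict

dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
       7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
       8: {'r': 0.1, 's': 0.2}}

res = defaultdict(float)  # a missing key starts at 0.0
for inner in dic.values():
    for key, value in inner.items():
        res[key] += value

res = dict(res)  # convert to a plain dict for display
print(res)
```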
See Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
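Use collections.Counter (adding Counters with + keeps only keys whose total is positive, which is fine here because all the values are positive):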
from collections import Counter
dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
8: {'r': 0.1, 's': 0.2}}
res = dict(sum([Counter(d) for d in dic.values()], Counter()))
print(res)
output:
{'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09811, 'r': 0.28600000000000003, 's': 0.348, 'd': 0.145}
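A caveat on the sum-of-Counters version: Counter's + operator keeps only keys whose total is positive, so a key whose values sum to zero or less silently disappears. That does not affect this question, where every value is positive. If it could matter for your data, Counter.update adds values without discarding anything; the sketch below uses made-up data to show the difference:

```python
from collections import Counter

# Made-up data where 'a' cancels out to zero and 'b' is negative.
data = {1: {'a': 1.5, 'b': -2.0}, 2: {'a': -1.5, 'c': 0.5}}

# sum() relies on Counter.__add__, which drops non-positive totals.
summed = dict(sum((Counter(d) for d in data.values()), Counter()))

# Counter.update adds values in place and keeps every key.
updated = Counter()
for d in data.values():
    updated.update(d)

print(summed)         # {'c': 0.5}
print(dict(updated))  # {'a': 0.0, 'b': -2.0, 'c': 0.5}
```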

Find the smallest three values in a nested dictionary

I have a nested dictionary (sample_dict) and, for each day, I need to find the three smallest values in ascending order and store the result in a new dictionary.
The sample_dict is as follows:
sample_dict ={ '2020-12-22': {'A': 0.0650,'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
The final dictionary (i.e. result) after selecting the smallest three for each date would look like:
Can someone suggest a solution using for loops?
Let's use heapq.nsmallest inside a for loop to select the smallest 3 items per sub-dictionary:
from operator import itemgetter
import heapq
for k, v in sample_dict.items():
    # Look ma, no `sorted`
    sample_dict[k] = dict(heapq.nsmallest(3, v.items(), key=itemgetter(1)))
print (sample_dict)
# {'2020-12-22': {'A': 0.065, 'C': 0.078, 'B': 0.292},
# '2020-12-23': {'H': 0.227731, 'B': 0.367, 'C': 0.489},
# '2020-12-24': {'C': 0.095, 'A': 0.363, 'Z': 0.3735},
# '2020-12-25': {'B': 0.484, 'C': 0.8366},
# '2020-12-26': {'Y': 5.366}}
This is fast because heapq.nsmallest selects the three smallest items without sorting each whole sub-dictionary. Note that it updates sample_dict in place rather than building a new dictionary.
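The question asks for a new dictionary built with for loops. A minimal sketch that leaves sample_dict untouched and fills a separate result dict (the name result is illustrative); heapq.nsmallest returns the pairs already ordered from smallest to largest value:

```python
import heapq
from operator import itemgetter

sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
               '2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G': 1.34235, 'H': 0.227731},
               '2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z': 0.3735},
               '2020-12-25': {'C': 0.8366, 'B': 0.4840},
               '2020-12-26': {'Y': 5.366}}

result = {}
for day, values in sample_dict.items():
    # The (key, value) pairs with the three smallest values, smallest first.
    smallest = heapq.nsmallest(3, values.items(), key=itemgetter(1))
    result[day] = dict(smallest)

print(result)
```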
Try using this dictionary comprehension, which sorts each sub-dictionary's items by value, keeps the first three and then orders them by key:
print({k: dict(sorted(sorted(v.items(), key=lambda x: x[1])[:3], key=lambda x: x[0])) for k, v in sample_dict.items()})
Output:
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078}, '2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731}, '2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735}, '2020-12-25': {'B': 0.484, 'C': 0.8366}, '2020-12-26': {'Y': 5.366}}
This should work for your purposes.
sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
results_dict = {day: {key: value for key, value in sorted(values.items(), key=lambda e: e[1])[:3]}
                for day, values in sample_dict.items()}
# Output (keys displayed in alphabetical order; a plain print lists them in ascending order of value)
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078},
'2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731},
'2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735},
'2020-12-25': {'B': 0.484, 'C': 0.8366},
'2020-12-26': {'Y': 5.366}}

Summarizing a dictionary into another one

I have a dictionary of dictionaries in Python, like this small example:
d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
Every sub-dictionary has items whose keys are one of the letters A, C, T or G, and whose values are absolute counts. For every sub-dictionary I want the percentage of each letter based on its value, and at the end I want a new dictionary shaped like the input but with percentages instead of absolute values. The expected output for the small example would be like this:
result = {1: {'A': 30.34, 'C': 22.16, 'T': 30, 'G': 17.5},
2: {'A': 30.78, 'C': 24.76, 'T': 27.06, 'G': 17.4},
3: {'A': 7.78, 'C': 68.15, 'T': 18.4, 'G': 5.67}}
I am trying to do that in Python using the following code:
values = dict.values()
freq = {}
for i in d.keys():
    freq[i] = d.values(i)/d.values
But it does not return what I expect. Do you know how to fix it?
The pandas solution
import pandas as pd
df = pd.DataFrame(d)
result = (100*(df/df.sum())).round(2).to_dict()
gives you
>>> print(result)
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
(You can omit round(2) if you wish to perform no rounding.)
Try building a collections.defaultdict() and adding the percentages as you iterate the original dictionary:
from collections import defaultdict
from pprint import pprint
d = {
1: {"A": 11472, "C": 8405, "T": 11428, "G": 6613},
2: {"A": 11678, "C": 9388, "T": 10262, "G": 6590},
3: {"A": 2945, "C": 25843, "T": 6980, "G": 2150},
}
percentages = defaultdict(dict)
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages[k1][k2] = round(v2 / total * 100, 2)
pprint(percentages)
Which gives:
defaultdict(<class 'dict'>,
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}})
Note: defaultdict is a subclass of dict, so you can treat it like a normal dictionary. If you really want to, you can call dict(percentages) to convert it to a regular dictionary.
Another way, slightly slower, is to use dict.setdefault():
percentages = {}
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages.setdefault(k1, {})[k2] = round(v2 / total * 100, 2)
pprint(percentages)
# {1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
# 2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
# 3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
You need some kind of nested iteration to go through your dictionary. Here it is with dictionary comprehensions:
totals = {sub: sum(d[sub].values()) for sub in d}
result = {sub: {base: d[sub][base] / totals[sub] * 100 for base in d[sub]} for sub in d}
with output:
{
1: {'A': 30.254760272166255, 'C': 22.166253494382616, 'T': 30.13872039664539, 'G': 17.44026583680574},
2: {'A': 30.79803787119574, 'C': 24.758689804314574, 'T': 27.063663695342584, 'G': 17.379608629147107},
3: {'A': 7.76675985020307, 'C': 68.15496597921832, 'T': 18.408143889445647, 'G': 5.6701302811329715}
}
You could use a nested dictionary comprehension:
{ k: { kk: round(100*vv/sum(v.values()),2) for kk, vv in v.items() } for k, v in d.items() }
#=> {1: {'A': 30.25, 'C': 22.17, 'T': 30.14, 'G': 17.44}, 2: {'A': 30.8, 'C': 24.76, 'T': 27.06, 'G': 17.38}, 3: {'A': 7.77, 'C': 68.15, 'T': 18.41, 'G': 5.67}}
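Every plain-Python approach above divides by the sub-dictionary's total, so a sub-dictionary whose counts are all zero would raise ZeroDivisionError. A minimal sketch with a guard; to_percentages is a hypothetical helper, reporting 0.0 for every letter in that case is an assumption you may want to change, and row 4 is made-up data added only to exercise the guard:

```python
def to_percentages(counts, ndigits=2):
    """Convert {outer: {letter: count}} into {outer: {letter: percent}}."""
    result = {}
    for key, inner in counts.items():
        total = sum(inner.values())
        if total == 0:
            # Assumption: report 0.0 for every letter when there is nothing to divide by.
            result[key] = {letter: 0.0 for letter in inner}
        else:
            result[key] = {letter: round(100 * n / total, ndigits) for letter, n in inner.items()}
    return result


d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
     2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
     3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150},
     4: {'A': 0, 'C': 0, 'T': 0, 'G': 0}}  # made-up all-zero row
print(to_percentages(d))
```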

Count value in dictionary relative to another value

I have a python dictionary like the one below:
{'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
....
}
For each key (A to D) I want to count the number of 'NaN' values, and also how many of those 'NaN' values come from a record whose 'E' is True.
So I would like to create a dictionary like this:
{'A': {'NaN': 0, 'E': 0},
'B': {'NaN': 1, 'E': 1},
'C': {'NaN': 2, 'E': 1},
'D': {'NaN': 2, 'E': 2}}
I have this code, which returns a dictionary with the count of NaN:
NaNs = {}
for k, v in dict.items():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i] = 0
for k, v in dict.items():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i] += 1
print(NaNs)
How can I add the count of E:True to it?
Ok, why don't you try this:
dict = {'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
}
NaNs = {}
for k, v in dict.items():
    for i in v:
        if i != 'E':
            NaNs[i] = {'NaN': 0, 'E': 0}
for k, v in dict.items():
    for i in v:
        # str().lower() also catches variants such as 'Nan'
        if str(v[i]).lower() == 'nan':
            NaNs[i]['NaN'] += 1
            if v['E'] == True:
                NaNs[i]['E'] += 1
print(NaNs)
I shouldn't really be going around calling variables dict and NaNs, but I tried to change your code as little as possible.
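For comparison, a one-pass Python 3 sketch of the same count. It avoids shadowing the built-in dict, creates each field's counters on first sight with dict.setdefault and, like the answer above, treats any capitalization of 'nan' as missing. The names records, counts and flag_key are illustrative:

```python
records = {'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
           'John': {'A': 250, 'B': '34', 'C': 98, 'D': 59, 'E': False},
           'Steve': {'A': 230, 'B': '45', 'C': 'NaN', 'D': 67, 'E': False},
           'Louis': {'A': 220, 'B': '37', 'C': 'NaN', 'D': 'Nan', 'E': True}}

flag_key = 'E'  # the boolean field whose True values are counted
counts = {}
for record in records.values():
    for field, value in record.items():
        if field == flag_key:
            continue
        # Create the per-field counters the first time the field is seen.
        field_counts = counts.setdefault(field, {'NaN': 0, flag_key: 0})
        if str(value).lower() == 'nan':
            field_counts['NaN'] += 1
            if record[flag_key]:
                field_counts[flag_key] += 1

print(counts)
# {'A': {'NaN': 0, 'E': 0}, 'B': {'NaN': 1, 'E': 1}, 'C': {'NaN': 2, 'E': 1}, 'D': {'NaN': 2, 'E': 2}}
```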
