Summarizing a dictionary into another one - python

I have a dictionary of dictionaries in Python, like this small example:
d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
Every sub-dictionary has the letters A, C, T and G as keys, and the values are absolute counts. For every sub-dictionary, I want the percentage of each letter relative to the total of that sub-dictionary. In the end I want a new dictionary with the same structure as the input, but with percentages in place of the absolute values. The expected output for the small example would look like this:
result = {1: {'A': 30.34, 'C': 22.16, 'T': 30, 'G': 17.5},
2: {'A': 30.78, 'C': 24.76, 'T': 27.06, 'G': 17.4},
3: {'A': 7.78, 'C': 68.15, 'T': 18.4, 'G': 5.67}}
I am trying to do that in python using the following code:
values = dict.values()
freq = {}
for i in d.keys()
freq[i] = d.values(i)/d.values
But it does not return what I expect. Do you know how to fix it?

The pandas solution
import pandas as pd
df = pd.DataFrame(d)
result = (100*(df/df.sum())).round(2).to_dict()
gives you
>>> print(result)
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
(You can omit round(2) if you wish to perform no rounding.)

Try building a collections.defaultdict() and adding the percentages as you iterate the original dictionary:
from collections import defaultdict
from pprint import pprint
d = {
1: {"A": 11472, "C": 8405, "T": 11428, "G": 6613},
2: {"A": 11678, "C": 9388, "T": 10262, "G": 6590},
3: {"A": 2945, "C": 25843, "T": 6980, "G": 2150},
}
percentages = defaultdict(dict)
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages[k1][k2] = round(v2 / total * 100, 2)
pprint(percentages)
Which gives:
defaultdict(<class 'dict'>,
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}})
Note: defaultdict is a subclass of dict, so you can treat it the same as a normal dictionary. If you really want to, you can call dict(percentages) to convert it to a regular dictionary.
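As a small illustration of that note, the sketch below uses hypothetical two-letter sample data (not the data from the question). It shows that a defaultdict passes an isinstance check against dict and that dict() gives a plain dictionary with the same contents. One difference worth knowing is that reading a missing key from a defaultdict silently creates it.

```python
from collections import defaultdict

# Hypothetical sample data, only for illustration.
percentages = defaultdict(dict)
percentages[1]["A"] = 60.0
percentages[1]["C"] = 40.0

is_dict = isinstance(percentages, dict)   # defaultdict subclasses dict
plain = dict(percentages)                 # plain dict with the same items

# Reading a missing key inserts an empty dict into the defaultdict ...
_ = percentages[99]
created = 99 in percentages
# ... while the plain dict made earlier is unaffected.
in_plain = 99 in plain

print(is_dict, plain, created, in_plain)
```

If that auto-creation is unwanted once the result is built, converting with dict() is probably the simplest way to switch it off.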
Another way, slightly slower, is to use dict.setdefault():
percentages = {}
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages.setdefault(k1, {})[k2] = round(v2 / total * 100, 2)
pprint(percentages)
# {1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
# 2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
# 3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}

You are going to need to nest in some way to go through your dictionary. Here it is with dictionary comprehensions:
totals = {sub: sum(d[sub].values()) for sub in d}
result = {sub: {base: d[sub][base] / totals[sub] * 100 for base in d[sub]} for sub in d}
with output:
{
1: {'A': 30.254760272166255, 'C': 22.166253494382616, 'T': 30.13872039664539, 'G': 17.44026583680574},
2: {'A': 30.79803787119574, 'C': 24.758689804314574, 'T': 27.063663695342584, 'G': 17.379608629147107},
3: {'A': 7.76675985020307, 'C': 68.15496597921832, 'T': 18.408143889445647, 'G': 5.6701302811329715}
}

You could use a nested dictionary comprehension:
{ k: { kk: round(100*vv/sum(v.values()),2) for kk, vv in v.items() } for k, v in d.items() }
#=> {1: {'A': 30.25, 'C': 22.17, 'T': 30.14, 'G': 17.44}, 2: {'A': 30.8, 'C': 24.76, 'T': 27.06, 'G': 17.38}, 3: {'A': 7.77, 'C': 68.15, 'T': 18.41, 'G': 5.67}}
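The one-liner above calls sum(v.values()) once for every inner key. A minimal sketch of the same idea as a reusable function, computing each total only once, might look like the following. The function name to_percentages, the ndigits parameter, and the choice to map an all-zero row to 0.0 are assumptions made for illustration, not part of the answers above.

```python
def to_percentages(counts, ndigits=2):
    """Return a new dict of dicts with each inner value as a percentage of its row total."""
    result = {}
    for key, row in counts.items():
        total = sum(row.values())      # computed once per sub-dictionary
        if total == 0:
            # Assumption: an all-zero row maps to 0.0 instead of raising ZeroDivisionError.
            result[key] = {name: 0.0 for name in row}
        else:
            result[key] = {name: round(100 * value / total, ndigits)
                           for name, value in row.items()}
    return result


d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
     2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
     3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
result = to_percentages(d)
print(result)
```

The input dictionary is left untouched, and the zero-total guard only matters if a sub-dictionary can contain nothing but zeros.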

Related

Remove values from nested dictionary

I've got a nested dictionary:
{'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}
Is there a simple way of removing all the nested zero values, so that the dictionary becomes:
{'a': {'m': 1}, 'b': {'x': 1}}
Dictionary comprehension:
>>> d = {'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}
>>> {key: {k:v for k,v in d[key].items() if v != 0} for key in d}
Output:
{'a': {'m': 1}, 'b': {'x': 1}}
One approach, that modifies the dictionary in-place:
d = {'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}
for dd in d.values():
    for k, v in list(dd.items()):
        if v == 0:
            dd.pop(k)
print(d)
Output
{'a': {'m': 1}, 'b': {'x': 1}}
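The list(dd.items()) call in the in-place version matters: it takes a snapshot of the pairs, so the dictionary can shrink while the loop runs. A small sketch of what happens without the snapshot follows; the dictionary named inner is hypothetical sample data.

```python
d = {'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}

# Without a snapshot, removing a key while iterating over the dict fails.
inner = {'n': 0, 'm': 1}      # hypothetical sample data
error = None
try:
    for k, v in inner.items():
        if v == 0:
            inner.pop(k)      # changes the dict's size while it is being iterated
except RuntimeError as exc:
    error = exc

# With list(...), the loop iterates over a separate list of pairs.
for dd in d.values():
    for k, v in list(dd.items()):
        if v == 0:
            del dd[k]

print(type(error).__name__)
print(d)
```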
You can use dict comprehension:
d = {'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}
result = {k: {k1: v1 for k1, v1 in v.items() if v1 != 0} for k, v in d.items()}
One-line solution:
d = {'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}
d_new = {k: {inner_k: inner_v for inner_k, inner_v in v.items() if inner_v != 0} for k, v in d.items()}
print(d_new) # Prints: {'a': {'m': 1}, 'b': {'x': 1}}
dict_1 = {'a': {'m': 1, 'n': 0}, 'b': {'m': 0, 'x': 1}}
new_dict_1 = {}
for key, inner in dict_1.items():
    for k, v in inner.items():
        if v != 0:
            # setdefault keeps earlier non-zero pairs instead of overwriting them
            new_dict_1.setdefault(key, {})[k] = v
new_dict_1
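You can also flatten the nested dict and then look for zero values:
Code:
import pandas as pd
# json_normalize flattens the keys to 'a.m', 'a.n', ...; k[0] and k[-1] only work for single-character keys
[d[k[0]].pop(k[-1]) for k,v in pd.json_normalize(d).to_dict(orient='records')[0].items() if v==0]
d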

I am getting a different value when printing and when appending the same variable to a list. Why is that?

The first code gives me the output I want but I want the dct to append to a list so I can use the values later. When I try to do that it gives me a different output. Why?
lst = [{'a' : 1, 'b' : 2, 'c': 3 },{'e' : 1, 'f' : 2, 'g': 3}]
e = 0
while e < len(lst):
    for k in lst[e]:
        dct = {}
        x = lst[e][k]
        for key, value in lst[e].items():
            lst[e][key] = (value - x)
        dct[k] = (lst[e])
        print(dct)
    e += 1
output(lst) = {'a': {'a': 0, 'b': 1, 'c': 2}}
{'b': {'a': -1, 'b': 0, 'c': 1}}
{'c': {'a': -2, 'b': -1, 'c': 0}}
{'e': {'e': 0, 'f': 1, 'g': 2}}
{'f': {'e': -1, 'f': 0, 'g': 1}}
{'g': {'e': -2, 'f': -1, 'g': 0}}
So the following is what I tried to do to save it in a list
e = 0
lst2 = []
while e < len(lst):
    for k in lst[e]:
        dct = {}
        x = lst[e][k]
        for key, value in lst[e].items():
            lst[e][key] = (value - x)
        dct[k] = (lst[e])
        lst2.append(dct)
    e += 1
print(lst2)
But the output when I print that list gives me the same value for every key in the different dictionaries.
Output(lst2)= [{'a': {'a': -2, 'b': -1, 'c': 0}},
{'b': {'a': -2, 'b': -1, 'c': 0}},
{'c': {'a': -2, 'b': -1, 'c': 0}},
{'e': {'e': -2, 'f': -1, 'g': 0}},
{'f': {'e': -2, 'f': -1, 'g': 0}},
{'g': {'e': -2, 'f': -1, 'g': 0}}]
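If you want to keep your existing code, change
dct[k] = (lst[e])
to
dct[k] = lst[e].copy()
Otherwise every dct holds a reference to the same inner dictionary lst[e], which the loop keeps mutating. Note that lst2.append(dct.copy()) is not enough: a shallow copy of dct still points at that same inner dictionary. (To understand why, read up on lists, references, and mutability.)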
Or, if you want to rewrite your code, you might use
list_ = [{'a' : 1, 'b' : 2, 'c': 3 },{'e' : 1, 'f' : 2, 'g': 3}]
result = {}
for d in list_:
    for key, value in d.items():
        result[key] = {k:d[k]-value for k in d}
which gives
>>> print(result)
{'a': {'a': 0, 'b': 1, 'c': 2},
'b': {'a': -1, 'b': 0, 'c': 1},
'c': {'a': -2, 'b': -1, 'c': 0},
'e': {'e': 0, 'f': 1, 'g': 2},
'f': {'e': -1, 'f': 0, 'g': 1},
'g': {'e': -2, 'f': -1, 'g': 0},
}
(and if you're a fan of code-golf, here's a one-liner:)
result = {key: {k:d[k]-value for k in d} for d in list_ for key,value in d.items()}
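The behaviour in the question comes from each dct holding a reference to lst[e] rather than a copy of it. A minimal sketch of that reference behaviour, using hypothetical names (inner, alias, shallow, snapshot, deep) that do not appear in the question:

```python
import copy

inner = {'a': 1, 'b': 2}

alias = {'k': inner}                  # stores a reference to inner
shallow = alias.copy()                # new outer dict, same inner object
snapshot = {'k': inner.copy()}        # copies the inner dict itself
deep = copy.deepcopy(alias)           # copies every level

inner['a'] = 100                      # mutate the original afterwards

print(alias['k']['a'])       # follows the mutation
print(shallow['k']['a'])     # follows the mutation as well
print(snapshot['k']['a'])    # keeps the old value
print(deep['k']['a'])        # keeps the old value
```

Printing inside the loop shows the inner dictionary as it was at that moment, while a list of references shows only its final state, which is why the two outputs in the question differ.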

Python functions for nested dictionary

Given a dictionary :
dic = {
2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
8: {'r': 0.1, 's': 0.2}}
I want the output to be a dictionary whose keys are 'a', 'p', ... and whose values are the sums of the corresponding values across the nested dictionaries.
Expected output:
{'p': 0.225, 'a': 0.09811, ....}
Try the following:
dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311}, 7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},8:{'r':0.1, 's':0.2}}
res = {}
for d in dic.values():
    for k, v in d.items():
        res[k] = res.get(k, 0.0) + v
print(res) # {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09811, 'r': 0.28600000000000003, 's': 0.348, 'd': 0.145}
In particular, dict.get(key, value) returns dict[key] if the latter is present, and value otherwise.
See Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
Use collections.Counter
from collections import Counter
dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
8: {'r': 0.1, 's': 0.2}}
res = dict(sum([Counter(d) for d in dic.values()], Counter()))
print(res)
output:
{'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09811, 'r': 0.28600000000000003, 's': 0.348, 'd': 0.145}
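One caveat with the Counter approach: adding Counters with + discards any entry whose running total is zero or negative, so it only matches the plain loop when all values are positive, as they are in the question. A small sketch with hypothetical data containing a negative value shows the difference, and uses Counter.update as an alternative that keeps such entries.

```python
from collections import Counter

# Hypothetical data with a negative value, only for illustration.
dic = {1: {'a': 0.5, 'b': -0.25}, 2: {'a': 0.25, 'b': 0.125}}

# "+" drops 'b' after the first step because its total is not positive,
# so the later 0.125 is counted on its own.
added = dict(sum((Counter(d) for d in dic.values()), Counter()))

# update() adds in place and keeps non-positive totals.
updated = Counter()
for d in dic.values():
    updated.update(d)
updated = dict(updated)

print(added)
print(updated)
```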

Find the smallest three values in a nested dictionary

I have a nested dictionary (sample_dict). For each day, I need to find the three smallest values (in ascending order) and store the result in a new dictionary.
The sample_dict is as follows:
sample_dict ={ '2020-12-22': {'A': 0.0650,'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
The final dictionary (i.e. result) after selecting the smallest three for each date would look like:
Can someone suggest a solution using for loops?
Let's use heapq.nsmallest inside a for loop to select the smallest 3 items per subdict:
from operator import itemgetter
import heapq
for k, v in sample_dict.items():
    # Look ma, no `sorted`
    sample_dict[k] = dict(heapq.nsmallest(3, v.items(), key=itemgetter(1)))
print (sample_dict)
# {'2020-12-22': {'A': 0.065, 'C': 0.078, 'B': 0.292},
# '2020-12-23': {'H': 0.227731, 'B': 0.367, 'C': 0.489},
# '2020-12-24': {'C': 0.095, 'A': 0.363, 'Z': 0.3735},
# '2020-12-25': {'B': 0.484, 'C': 0.8366},
# '2020-12-26': {'Y': 5.366}}
This is pretty fast because it does not need to fully sort the items of each sub-dictionary, and it updates sample_dict in place.
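Since the question asks for the result in a new dictionary, a variant that leaves sample_dict untouched may be closer to what is wanted. A minimal sketch, in which the names result, day and readings are chosen here for illustration:

```python
import heapq
from operator import itemgetter

sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
               '2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G': 1.34235, 'H': 0.227731},
               '2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z': 0.3735},
               '2020-12-25': {'C': 0.8366, 'B': 0.4840},
               '2020-12-26': {'Y': 5.366}}

result = {}
for day, readings in sample_dict.items():
    # nsmallest returns up to 3 (key, value) pairs in ascending order of value
    smallest = heapq.nsmallest(3, readings.items(), key=itemgetter(1))
    result[day] = dict(smallest)

print(result)
```

Days with fewer than three entries simply keep all of them, because nsmallest returns as many items as are available.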
Try using this dictionary comprehension, which takes the three smallest values and then orders them by key:
print({k: dict(sorted(sorted(v.items(), key=lambda x: x[1])[:3], key=lambda x: x[0])) for k, v in sample_dict.items()})
Output:
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078}, '2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731}, '2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735}, '2020-12-25': {'B': 0.484, 'C': 0.8366}, '2020-12-26': {'Y': 5.366}}
This should work for your purposes.
sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
results_dict = {day[0]:{sample[0]:sample[1] for sample in sorted(day[1].items(), key=lambda e: e[1])[:3]} for day in sample_dict.items()}
# Output
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078},
'2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731},
'2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735},
'2020-12-25': {'B': 0.484, 'C': 0.8366},
'2020-12-26': {'Y': 5.366}}

Count value in dictionary relative to another value

I have a python dictionary like the one below:
{'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
....
}
I want to count the number of 'NaN' values for each key, and also how many of those 'NaN' values occur in entries where 'E' is True.
So I would like to create a dictionary like this:
{'A': {'NaN': 0, 'E': 0},
'B': {'NaN': 1, 'E': 1},
'C': {'NaN': 2, 'E': 1},
'D': {'NaN': 2, 'E': 2}}
I have this code that returns a dictionary with the count of NaN
NaNs = {}
for k,v in dict.iteritems():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i]=0
for k,v in dict.iteritems():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i]+=1
print NaNs
How can I add the count of E:True to it?
Ok, why don't you try this:
dict = {'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
}
NaNs = {}
for k,v in dict.iteritems():
    for i in v:
        if i != 'E':
            NaNs[i]={'NaN': 0, 'E': 0}
for k,v in dict.iteritems():
    for i in v:
        if str(v[i]).lower() == 'nan':
            NaNs[i]['NaN']+=1
            if v['E'] == True:
                NaNs[i]['E']+=1
print NaNs
I shouldn't really be going around calling variables dict and NaNs, but I tried to change your code as little as possible.
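The snippets in this question use Python 2 syntax (iteritems and the print statement). A possible Python 3 sketch of the same counting logic follows, done in a single pass. The names people and nan_counts are chosen here for illustration, and the lower-casing follows the answer so that the 'Nan' spelling is counted too.

```python
people = {
    'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
    'John': {'A': 250, 'B': '34', 'C': 98, 'D': 59, 'E': False},
    'Steve': {'A': 230, 'B': '45', 'C': 'NaN', 'D': 67, 'E': False},
    'Louis': {'A': 220, 'B': '37', 'C': 'NaN', 'D': 'Nan', 'E': True},
}

nan_counts = {}
for record in people.values():
    for field, value in record.items():
        if field == 'E':
            continue                      # 'E' is the flag, not a counted column
        counts = nan_counts.setdefault(field, {'NaN': 0, 'E': 0})
        if str(value).lower() == 'nan':   # also matches the 'Nan' spelling
            counts['NaN'] += 1
            if record['E'] is True:
                counts['E'] += 1

print(nan_counts)
```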
