Count value in dictionary relative to another value - python

I have a python dictionary like the one below:
{'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
....
}
I want to count the number of 'NaN' in each value, and return that count with the number of 'NaN' that have the value 'E': True.
So I would like to create a dictionary like this:
{'A': {'NaN': 0, 'E': 0},
'B': {'NaN': 1, 'E': 1},
'C': {'NaN': 2, 'E': 1},
'D': {'NaN': 2, 'E': 2}}
I have this code that returns a dictionary with the count of NaN
NaNs = {}
for k,v in dict.iteritems():
for i in v:
if v[i] == 'NaN':
NaNs[i]=0
for k,v in dict.iteritems():
for i in v:
if v[i] == 'NaN':
NaNs[i]+=1
print NaN
How can I add the count of E:True to it?

Ok, why don't you try this:
dict = {'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
}
NaNs = {}
for k,v in dict.iteritems():
for i in v:
if i != 'E':
NaNs[i]={'NaN': 0, 'E': 0}
for k,v in dict.iteritems():
for i in v:
if str(v[i]).lower() == 'nan':
NaNs[i]['NaN']+=1
if v['E'] == True:
NaNs[i]['E']+=1
print NaNs
I shouldn't really be going around calling variables dict and NaNs, but I tried to change your code as little as possible.

Related

I am getting different value when printing and appending same variable to a list, Why is that?

The first code gives me the output I want but I want the dct to append to a list so I can use the values later. When I try to do that it gives me a different output. Why?
lst = [{'a' : 1, 'b' : 2, 'c': 3 },{'e' : 1, 'f' : 2, 'g': 3}]
e = 0
while e < len(lst):
for k in lst[e]:
dct = {}
x = lst[e][k]
for key, value in lst[e].items():
lst[e][key] = (value - x)
dct[k] = (lst[e])
print(dct)
e += 1
output(lst) = {'a': {'a': 0, 'b': 1, 'c': 2}}
{'b': {'a': -1, 'b': 0, 'c': 1}}
{'c': {'a': -2, 'b': -1, 'c': 0}}
{'e': {'e': 0, 'f': 1, 'g': 2}}
{'f': {'e': -1, 'f': 0, 'g': 1}}
{'g': {'e': -2, 'f': -1, 'g': 0}}
So the following is what I tried to do to save it in a list
e = 0
lst2 = []
while e < len(lst):
for k in lst[e]:
dct = {}
x = lst[e][k]
for key, value in lst[e].items():
lst[e][key] = (value - x)
dct[k] = (lst[e])
lst2.append(dct)
e += 1
print(lst2)
But the output when I print that list gives me the same value for every key in the different dictionaries.
Output(lst2)= [{'a': {'a': -2, 'b': -1, 'c': 0}},
{'b': {'a': -2, 'b': -1, 'c': 0}},
{'c': {'a': -2, 'b': -1, 'c': 0}},
{'e': {'e': -2, 'f': -1, 'g': 0}},
{'f': {'e': -2, 'f': -1, 'g': 0}},
{'g': {'e': -2, 'f': -1, 'g': 0}}]
If you want to use your existing code, change
lst2.append(dct)
to
lst2.append(dct.copy())
(and to understand why, read up on lists, references, and mutability.)
Or, if you want to rewrite your code, you might use
list_ = [{'a' : 1, 'b' : 2, 'c': 3 },{'e' : 1, 'f' : 2, 'g': 3}]
result = {}
for d in list_:
for key, value in d.items():
result[key] = {k:d[k]-value for k in d}
which gives
>>> print(result)
{'a': {'a': 0, 'b': 1, 'c': 2},
'b': {'a': -1, 'b': 0, 'c': 1},
'c': {'a': -2, 'b': -1, 'c': 0},
'e': {'e': 0, 'f': 1, 'g': 2},
'f': {'e': -1, 'f': 0, 'g': 1},
'g': {'e': -2, 'f': -1, 'g': 0},
}
(and if you're a fan of code-golf, here's a one-liner:)
result = {key: {k:d[k]-value for k in d} for d in list_ for key,value in d.items()}

Find the smallest three values in a nested dictionary

I have a nested dictionary ( i.e. sample_dict), where for each day, we need to find the smallest three values (in ascending manner), after which the result has to be stored in a new dictionary.
The sample_dict is as follows:
sample_dict ={ '2020-12-22': {'A': 0.0650,'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
The final dictionary (i.e. result) after selecting the smallest three for each date would look like:
Can someone suggest a solution using for loops.
Let's use heapq.nsmallest inside a dictionary comprehension to select the smallest 3 items per subdict:
from operator import itemgetter
import heapq
for k, v in sample_dict.items():
# Look ma, no `sorted`
sample_dict[k] = dict(heapq.nsmallest(3, v.items(), key=itemgetter(1)))
print (sample_dict)
# {'2020-12-22': {'A': 0.065, 'C': 0.078, 'B': 0.292},
# '2020-12-23': {'H': 0.227731, 'B': 0.367, 'C': 0.489},
# '2020-12-24': {'C': 0.095, 'A': 0.363, 'Z': 0.3735},
# '2020-12-25': {'B': 0.484, 'C': 0.8366},
# '2020-12-26': {'Y': 5.366}}
This is pretty fast because it does not need to sort the array, and updates sample_dict in-place.
Try using this dictionary comprehension:
print({k: dict(sorted(sorted(v.items(), key=lambda x: x[1]), key=lambda x: x[0])[:3]) for k, v in sample_dict.items()})
Output:
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078}, '2020-12-23': {'B': 0.367, 'C': 0.489, 'G': 1.34235}, '2020-12-24': {'A': 0.363, 'B': 0.396, 'C': 0.095}, '2020-12-25': {'B': 0.484, 'C': 0.8366}, '2020-12-26': {'Y': 5.366}}
This should work for your purposes.
sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
results_dict = {day[0]:{sample[0]:sample[1] for sample in sorted(day[1].items(), key=lambda e: e[1])[:3]} for day in sample_dict.items()}
# Output
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078},
'2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731},
'2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735},
'2020-12-25': {'B': 0.484, 'C': 0.8366},
'2020-12-26': {'Y': 5.366}}

Is there any way to sort this dictionaries by lowest value from keys?

I just wanna sort these dictionaries with some values from an input file.
def sortdicts():
listofs=[]
listofs=splitndict()
print sorted(listofs)
The splitndict() function has this output:
[{'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}, {'y': 5, 'x': 0}]
While the input is from another file and it's:
a 1
b 2
c 2
d 4
a 7
c 3
x 0
y 5
I used this to split the dictionary:
def splitndict():
listofd=[]
variablesRead=readfromfile()
splitted=[i.split() for i in variablesRead]
d={}
for lines in splitted:
if lines:
d[lines[0]]=int(lines[1])
elif d=={}:
pass
else:
listofd.append(d)
d={}
print listofd
return listofd
The output file should look like this:
[{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}
This output because :
It needs to be sorted by the lowest value from each dictionary key.
array = [{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
for the above array:
array = sorted(array, lambda element: min(element.values()))
where "element.values()" returns all values from dictionary and "min" returns the minimum of those values.
"sorted" passes each dictionary (an element) inside the lambda function one by one. and sorts on the basis of the result from the lambda function.
x = [{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
sorted(x, key=lambda i: min(i.values()))
Output is
[{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]

How can I return the values with the same length in a dictionary?

Currently, I have the following dictionary, with a key, and values of type dictionary as well:
db = {
'john' : {'a': 13, 'b': 64, 'c': 43},
'eric' : {'a': 63, 'b': 12},
'kek' : {'a': 43, 'b': 37, 'c': 83, 'd': 87},
'rick' : {'a': 77, 'b': 66, 'c': 44},
'alex' : {'a': 44, 'b': 99, 'c': 22}
}
How can I return a set of all the keys with the same number of items in the value part? In the dictionary above, the keys: john, rick, alex all have the same amount of keys. So the output would include these.
Expected Output:
same = {john, rick, alex}
Here is my code so far, i'm not sure how to store the current length:
db = {
'john': {'a': 13, 'b': 64, 'c': 43},
'eric': {'a': 63, 'b': 12},
'kek': {'a': 43, 'b': 37, 'c': 83, 'd': 87},
'rick': {'a': 77, 'b': 66, 'c': 44},
'alex': {'a': 44, 'b': 99, 'c': 22}
}
def maximum(db):
same = {}
for key, value in db.items():
for values in value:
if len(values) == 'something':
pass
maximum(db)
Try this, You can find your desires in the result:
from itertools import groupby
result = []
for i,j in groupby(sorted(db.items(), key = lambda x:len(x[1])), lambda x:len(x[1])):
result.append({i:[k[0] for k in j]})
then the result will be:
[{2: ['eric']}, {3: ['john', 'rick', 'alex']}, {4: ['kek']}]
keys are the len of items and values are those which has that length.
I have your output when I enter the following code:
def maximum(db):
for key, value in db.items():
if len(value) == 3:
print(key)
maximum(db)
Instead of print() enter whatever suits your needs of course.
Since OP says thats he requires the keys which have values of the same length, so that length must be the most frequent occuring length (ofcourse this is an assumption, but OP has not clarified that. There could be, for instance two sets ot values, having length either 2 or 3, so we should output keys corresponding to which value lengths? 2 or 3?)
db = {'john': {'a': 13, 'b': 64, 'c': 43},
'eric': {'a': 63, 'b': 12},
'kek': {'a': 43, 'b': 37, 'c': 83, 'd': 87},
'rick': {'a': 77, 'b': 66, 'c': 44},
'alex': {'a': 44, 'b': 99, 'c': 22} }
from collections import Counter
elem_len = [len(r) for k,r in db.items()]
c = Counter(elem_len)
most_common_length = c.most_common(1)[0][0]
print(most_common_length)
3
elements = [k for k,r in db.items() if len(r)==most_common_length]
print(elements)
['john', 'rick', 'alex']
Try this:
db = {51: {'a': 13, 'b': 64, 'c': 43}, 64: {'a': 63, 'b': 12}, 87: {'a': 43, 'b': 37, 'c': 83, 'd': 87}, 91: {'a': 77, 'b': 66, 'c': 44}, 99: {'a': 44, 'b': 99, 'c': 22} }
def maximum(db):
count_map = {}
for key, value in db.items():
if len(value.keys()) in count_map.keys():
count_map[len(value.keys())].append(key)
continue
count_map[len(value.keys())] = [key,]
countwise_list = list(count_map.values())
countwise_list.sort(key=(lambda x: len(x)), reverse=True)
return countwise_list[0]

Summarizing a dictionary into another one

I have a dictionary of dictionaries in python like this example:
small example:
d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
every sub-dictionary has items in which keys are one of these letters: A, C, T or G. and the values are absolute numbers. for every item I want to get the percentage of every letter based on its value. and at the end I want to make a new dictionary like the input example in which instead of absolute value there would be percentage. the expected output for the small example would be like this:
result = {1: {'A': 30.34, 'C': 22.16, 'T': 30, 'G': 17.5},
2: {'A': 30.78, 'C': 24.76, 'T': 27.06, 'G': 17.4},
3: {'A': 7.78, 'C': 68.15, 'T': 18.4, 'G': 5.67}}
I am trying to do that in python using the following code:
values = dict.values()
freq = {}
for i in d.keys()
freq[i] = d.values(i)/d.values
but it does not return what i expect. do you know how to fix it?
The pandas solution
import pandas as pd
df = pd.DataFrame(d)
result = (100*(df/df.sum())).round(2).to_dict()
gives you
>>> print(result)
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
(You can omit round(2) if you wish to perform no rounding.)
Try building a collections.defaultdict() and adding the percentages as you iterate the original dictionary:
from collections import defaultdict
from pprint import pprint
d = {
1: {"A": 11472, "C": 8405, "T": 11428, "G": 6613},
2: {"A": 11678, "C": 9388, "T": 10262, "G": 6590},
3: {"A": 2945, "C": 25843, "T": 6980, "G": 2150},
}
percentages = defaultdict(dict)
for k1, v1 in d.items():
total = sum(v1.values())
for k2, v2 in v1.items():
percentages[k1][k2] = round(v2 / total * 100, 2)
pprint(percentages)
Which gives:
defaultdict(<class 'dict'>,
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}})
Note: defaultdict() is a subclass of dict, so you can treat it the same as a normal dictionary. If you really want to, you can wrap dict(percentages) to convert it to a regular dictionary.
Another way, slightly slower, is to use dict.setdefault():
percentages = {}
for k1, v1 in d.items():
total = sum(v1.values())
for k2, v2 in v1.items():
percentages.setdefault(k1, {})[k2] = round(v2 / total * 100, 2)
pprint(percentages)
# {1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
# 2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
# 3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
You are going to need to nest in some way to go through your dictionary. Here's with dictionary comprehension:
totals = {sub: sum(d[sub].values()) for sub in d}
result = {sub: {base: d[sub][base] / totals[sub] * 100 for base in d[sub]} for sub in d}
with output:
{
1: {'A': 30.254760272166255, 'C': 22.166253494382616, 'T': 30.13872039664539, 'G': 17.44026583680574},
2: {'A': 30.79803787119574, 'C': 24.758689804314574, 'T': 27.063663695342584, 'G': 17.379608629147107},
3: {'A': 7.76675985020307, 'C': 68.15496597921832, 'T': 18.408143889445647, 'G': 5.6701302811329715}
}
You could use a nested dictionary comprehension:
{ k: { kk: round(100*vv/sum(v.values()),2) for kk, vv in v.items() } for k, v in d.items() }
#=> {1: {'A': 30.25, 'C': 22.17, 'T': 30.14, 'G': 17.44}, 2: {'A': 30.8, 'C': 24.76, 'T': 27.06, 'G': 17.38}, 3: {'A': 7.77, 'C': 68.15, 'T': 18.41, 'G': 5.67}}

Categories