finding probability of values in dictionary - python

I have a defaultdict which looks like this:
my_dict = defaultdict(dict, {"K": {"k": 2, "x": 1.0}, "S": {"_": 1.0, "s": 1}, "EH": {"e": 1.0}})
The keys are phonemes, and the values are themselves dictionaries mapping graphemes to the number of times each grapheme occurs.
The function should return another defaultdict containing the probabilities, which will look like this:
defaultdict(<class 'dict'>, {'EH': {'e': 1.0}, 'K': {'k': 0.6666666666666666, 'x': 0.3333333333333333}, 'S': {'_': 0.5, 's': 0.5}})
'e' remains the same, as 1.0/1.0 = 1.0. 'K' has values of 0.66666 and 0.33333 because 2/3 = 0.66666 and 1/3 = 0.33333. 'S' has values of 0.5 and 0.5, because 1/2 = 0.5 for each of them. The probabilities in each inner dict of the returned dict must always sum to one.
So far I have this:
from collections import defaultdict
my_dict = defaultdict(dict, {"K": {"k": 2, "x": 1.0}, "S": {"_": 1.0, "s": 1}, "EH": {"e": 1.0}})
def dict_probability(my_dict):
    return_dict = defaultdict(dict)
    for char in my_dict.values():
        ...  # (incomplete: this is where I'm stuck)

For each of your sub-dictionaries, you want to divide each value by the sum of that sub-dictionary's values:
my_dict = {"K": {"k": 2, "x": 1.0}, "S": {"_": 1.0, "s": 1}, "EH": {"e": 1.0}}
{k: {k1: v1 / sum(v.values()) for k1, v1 in v.items()} for k, v in my_dict.items()}
{'EH': {'e': 1.0},
'K': {'k': 0.6666666666666666, 'x': 0.3333333333333333},
'S': {'_': 0.5, 's': 0.5}}

example_dict = {"A": 1, "B": 2, "C": 3}
total = sum(example_dict.values())  # compute the total once, not on every iteration
prob_dict = {}
for k, v in example_dict.items():
    prob_dict[k] = v / total
print(prob_dict)
{'A': 0.16666666666666666, 'B': 0.3333333333333333, 'C': 0.5}
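Plugging the per-sub-dictionary division back into the dict_probability function the question started gives something like the sketch below. It keeps the defaultdict(dict) return type the question asks for, and it assumes every sub-dictionary has a non-zero total (an empty or all-zero one would raise ZeroDivisionError).

```python
from collections import defaultdict


def dict_probability(my_dict):
    """Return a defaultdict mapping each phoneme to its grapheme probabilities.

    Assumes every inner dict has a non-zero sum of counts.
    """
    return_dict = defaultdict(dict)
    for phoneme, counts in my_dict.items():
        total = sum(counts.values())  # total occurrences for this phoneme
        for grapheme, count in counts.items():
            # / is true division in Python 3, so int counts need no float()
            return_dict[phoneme][grapheme] = count / total
    return return_dict


my_dict = defaultdict(dict, {"K": {"k": 2, "x": 1.0}, "S": {"_": 1.0, "s": 1}, "EH": {"e": 1.0}})
print(dict_probability(my_dict))
```

Because each value is divided by its own sub-dictionary's total, the probabilities within each phoneme sum to one (up to floating-point rounding).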

Related

Python functions for nested dictionary

Given a dictionary:
dic = {
2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
8: {'r': 0.1, 's': 0.2}}
I want the output to be a dictionary whose keys are 'a', 'p', ... and whose values are the sums of that key's values across the nested dictionaries.
Expected output:
{'p': 0.225, 'a': 0.09811, ...}
Try the following:
dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311}, 7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},8:{'r':0.1, 's':0.2}}
res = {}
for d in dic.values():
    for k, v in d.items():
        res[k] = res.get(k, 0.0) + v
print(res) # {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09811, 'r': 0.28600000000000003, 's': 0.348, 'd': 0.145}
In particular, dict.get(key, default) returns dict[key] if key is present, and default otherwise.
See Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
Use collections.Counter
from collections import Counter
dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
8: {'r': 0.1, 's': 0.2}}
res = dict(sum([Counter(d) for d in dic.values()], Counter()))
print(res)
output:
{'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09811, 'r': 0.28600000000000003, 's': 0.348, 'd': 0.145}
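One caveat with sum(..., Counter()): Counter's + operator drops any key whose total is zero or negative, and sum() builds a new Counter at every step. A sketch that accumulates into a single Counter with update() instead, which adds values in place and keeps non-positive totals (sum_nested is a hypothetical helper name):

```python
from collections import Counter


def sum_nested(dic):
    """Sum each inner key's values across all inner dicts (hypothetical helper)."""
    total = Counter()
    for inner in dic.values():
        # update() adds values in place and, unlike Counter's + operator,
        # keeps keys whose running total is zero or negative
        total.update(inner)
    return dict(total)


dic = {2: {'p': 0.225, 'i': 0.159, 'e': 0.116, 'c': 0.098, 'a': 0.09311},
       7: {'r': 0.186, 's': 0.148, 'd': 0.145, 'a': 0.005},
       8: {'r': 0.1, 's': 0.2}}
print(sum_nested(dic))
```

For the positive values in this question both approaches give the same keys; the difference only shows up when values can cancel out or be negative.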

Appending (not replacing) items in a nested dictionary via a list of keys

This is a follow-up to the following question on SO:
Access nested dictionary items via a list of keys?
All the solutions given at the link above let you replace or create a value in the nested dict via a list of keys. But my requirement is to append (update) another dict to the value.
If my dict is as follows:
dataDict = {
"a":{
"r": 1,
"s": 2,
"t": 3
},
"b":{
"u": 1,
"v": {
"x": 1,
"y": 2,
"z": 3
},
"w": 3
}
}
And my list of keys is as follows:
maplist = ["b", "v"]
How can I append another dict, say "somevalue": {}, to dataDict['b']['v'] using my maplist?
So, after appending, the new dataDict would look as follows:
dataDict = {
"a":{
"r": 1,
"s": 2,
"t": 3
},
"b":{
"u": 1,
"v": {
"x": 1,
"y": 2,
"z": 3,
"somevalue": {}
},
"w": 3
}
}
The solutions at the aforementioned SO link change/create the value like this:
dataDict[mapList[-1]] = value
But we can't call the dict's update function like dataDict[mapList[-1]].update().
I checked SO and Google but had no luck. Could someone please help with this? Thanks!
This should work for you:
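def set_item(this_dict, maplist, key, value):
    for k in maplist:
        # create any missing intermediate dicts along the path
        if k not in this_dict:
            this_dict[k] = {}
        this_dict = this_dict[k]
    # add (or overwrite) the key in the innermost dict
    this_dict[key] = value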
dataDict = {
"a":{
"r": 1,
"s": 2,
"t": 3
},
"b":{
"u": 1,
"v": {
"x": 1,
"y": 2,
"z": 3
},
"w": 3
}
}
maplist = ["b", "v"]
new_key = "somevalue"
new_value = {}
set_item(dataDict, maplist, new_key, new_value)
print(dataDict)
Output:
{'a': {'r': 1, 's': 2, 't': 3}, 'b': {'u': 1, 'v': {'x': 1, 'y': 2, 'z': 3, 'somevalue': {}}, 'w': 3}}
You can use the setdefault() method.
>>> dataDict['b']['v'].setdefault('something', {})
{}
>>> print(dataDict)
{'a': {'r': 1, 's': 2, 't': 3},
'b': {'u': 1, 'v': {'x': 1, 'y': 2, 'z': 3, 'something': {}},
'w': 3}}
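That shows the idea. You can wrap it in a function:
def set_new_key(d: dict, x: list, y: str):
    # walk down the nested dicts following the keys in x
    for key in x:
        d = d[key]
    # add key y with an empty dict only if it is not already present
    d.setdefault(y, {})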
dataDict = {
"a":{
"r": 1,
"s": 2,
"t": 3
},
"b":{
"u": 1,
"v": {
"x": 1,
"y": 2,
"z": 3
},
"w": 3
}
}
mapList=['b','v']
set_new_key(dataDict,mapList,'something')
print(dataDict)
print()
set_new_key(dataDict,['a'],'another')
print(dataDict)
print()
set_new_key(dataDict,['b','v','something'],'inside something')
print(dataDict)
output:
{'a': {'r': 1, 's': 2, 't': 3}, 'b': {'u': 1, 'v': {'x': 1, 'y': 2, 'z': 3, 'something': {}}, 'w': 3}}
{'a': {'r': 1, 's': 2, 't': 3, 'another': {}}, 'b': {'u': 1, 'v': {'x': 1, 'y': 2, 'z': 3, 'something': {}}, 'w': 3}}
{'a': {'r': 1, 's': 2, 't': 3, 'another': {}}, 'b': {'u': 1, 'v': {'x': 1, 'y': 2, 'z': 3, 'something': {'inside something': {}}}, 'w': 3}}
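Since the question asks specifically about calling update(), another option is to walk the key list with functools.reduce and operator.getitem, then call update() on the inner dict that comes back. This is a sketch that assumes every key in the list already exists (get_by_path is a hypothetical name); unlike set_item, a missing key raises KeyError instead of being created.

```python
from functools import reduce
import operator


def get_by_path(root, keys):
    """Follow keys one level at a time, e.g. ["b", "v"] -> root["b"]["v"].

    Hypothetical helper; raises KeyError if any key on the path is missing.
    """
    return reduce(operator.getitem, keys, root)


dataDict = {
    "a": {"r": 1, "s": 2, "t": 3},
    "b": {"u": 1, "v": {"x": 1, "y": 2, "z": 3}, "w": 3},
}
maplist = ["b", "v"]

# The inner dict is returned by reference, so update() modifies dataDict itself
get_by_path(dataDict, maplist).update({"somevalue": {}})
print(dataDict)
```

Because update() accepts any mapping, the same call can merge several keys at once.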

Summarizing a dictionary into another one

I have a dictionary of dictionaries in Python, like this small example:
d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
Every sub-dictionary has items whose keys are one of the letters A, C, T or G, and whose values are absolute counts. For every item I want the percentage of each letter based on its value, and at the end I want a new dictionary shaped like the input, in which each absolute value is replaced by its percentage. The expected output for the small example would be like this:
result = {1: {'A': 30.34, 'C': 22.16, 'T': 30, 'G': 17.5},
2: {'A': 30.78, 'C': 24.76, 'T': 27.06, 'G': 17.4},
3: {'A': 7.78, 'C': 68.15, 'T': 18.4, 'G': 5.67}}
I am trying to do that in Python using the following code:
values = dict.values()
freq = {}
for i in d.keys():
    freq[i] = d.values(i)/d.values
but it does not return what I expect. Do you know how to fix it?
The pandas solution
import pandas as pd
df = pd.DataFrame(d)
result = (100*(df/df.sum())).round(2).to_dict()
gives you
>>> print(result)
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
(You can omit round(2) if you wish to perform no rounding.)
Try building a collections.defaultdict() and adding the percentages as you iterate the original dictionary:
from collections import defaultdict
from pprint import pprint
d = {
1: {"A": 11472, "C": 8405, "T": 11428, "G": 6613},
2: {"A": 11678, "C": 9388, "T": 10262, "G": 6590},
3: {"A": 2945, "C": 25843, "T": 6980, "G": 2150},
}
percentages = defaultdict(dict)
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages[k1][k2] = round(v2 / total * 100, 2)
pprint(percentages)
Which gives:
defaultdict(<class 'dict'>,
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}})
Note: defaultdict is a subclass of dict, so you can treat it the same as a normal dictionary. If you really want to, you can wrap it as dict(percentages) to convert it to a regular dictionary.
Another way, slightly slower (likely because the {} default is created on every call), is to use dict.setdefault():
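One practical difference to weigh when deciding whether to convert: reading a missing key on a defaultdict silently inserts a new empty inner dict, while a plain dict raises KeyError. A small sketch (the keys 4 and 5 are just examples of keys that were never set):

```python
from collections import defaultdict

percentages = defaultdict(dict)
percentages[1]["A"] = 30.25

# Reading a missing key on a defaultdict creates it as a side effect
_ = percentages[4]
print(4 in percentages)  # True: an empty dict was inserted

plain = dict(percentages)  # a regular dict copy of the current contents
print(type(plain).__name__)  # dict
try:
    plain[5]
except KeyError:
    print("a plain dict raises KeyError for a missing key")
```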
percentages = {}
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages.setdefault(k1, {})[k2] = round(v2 / total * 100, 2)
pprint(percentages)
# {1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
# 2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
# 3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
You'll need some kind of nesting to walk through your dictionary. Here it is with dictionary comprehensions:
totals = {sub: sum(d[sub].values()) for sub in d}
result = {sub: {base: d[sub][base] / totals[sub] * 100 for base in d[sub]} for sub in d}
with output:
{
1: {'A': 30.254760272166255, 'C': 22.166253494382616, 'T': 30.13872039664539, 'G': 17.44026583680574},
2: {'A': 30.79803787119574, 'C': 24.758689804314574, 'T': 27.063663695342584, 'G': 17.379608629147107},
3: {'A': 7.76675985020307, 'C': 68.15496597921832, 'T': 18.408143889445647, 'G': 5.6701302811329715}
}
You could use a nested dictionary comprehension:
{ k: { kk: round(100*vv/sum(v.values()),2) for kk, vv in v.items() } for k, v in d.items() }
#=> {1: {'A': 30.25, 'C': 22.17, 'T': 30.14, 'G': 17.44}, 2: {'A': 30.8, 'C': 24.76, 'T': 27.06, 'G': 17.38}, 3: {'A': 7.77, 'C': 68.15, 'T': 18.41, 'G': 5.67}}

Nested dictionary with lists to many dictionaries

I have a nested dictionary with lists, like this:
{
'a': 1,
'x':[
{'b': 1,
'c': [
{'z': 12},
{'z': 22},
]
},
{'b': 2,
'c': [
{'z': 10},
{'z': 33},
]
}
]
}
And I want to convert it to a list of flat dictionaries in this form:
[
{'a': 1, 'b': 1, 'z': 12},
{'a': 1, 'b': 1, 'z': 22},
{'a': 1, 'b': 2, 'z': 10},
{'a': 1, 'b': 2, 'z': 33},
]
Any idea how to achieve that?
For the reduced example tested below (one z per c list), the following produces:
[{'a': 1, 'b': 1, 'z': 12}, {'a': 1, 'b': 2, 'z': 10}]
Use at your own risk. The following was only tested on your example.
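from itertools import product
def flatten(D):
    # non-dict values are returned unchanged
    if not isinstance(D, dict):
        return D
    # scalar (non-list) items are copied into every output row
    base = [(k, v) for k, v in D.items() if not isinstance(v, list)]
    # each list value expands into all the flat rows produced by its dict elements
    lists = [[row for x in v for row in flatten(x)] for k, v in D.items() if isinstance(v, list)]
    l = []
    # one output row per combination of one row taken from each list
    for p in product(*lists):
        r = dict(base)
        for row in p:
            r.update(row)
        l.append(r)
    return l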
The following tests the code above:
d = {
'a': 1,
'x':[
{'b': 1,
'c': [
{'z': 12}
]
},
{'b': 2,
'c': [
{'z': 10}
]
}
]
}
print(flatten(d))
A possible solution is:
#!/usr/bin/env python3
d = {
'a': 1,
'x': [
{
'b': 1,
'c': [
{'z': 12}
]
},
{
'b': 2,
'c': [
{'z': 10}
]
}
]
}
res = [{"a": 1, "b": x["b"], "z": x["c"][0]["z"]} for x in d["x"]]
print(res)
This assumes that there is exactly one a (with a fixed value of 1) and one x element, and a is added to the comprehension manually. It also assumes each c list holds a single entry, since only x["c"][0] is read.
The other two elements (b and z) are taken from the x list with a list comprehension.
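Because it reads only x["c"][0], that comprehension drops the second z of each c list in the question's full data. A sketch that adds a second for clause to visit every entry of c, and reads a from the dict instead of hard-coding it, still assuming this exact a/x/b/c/z shape:

```python
d = {
    'a': 1,
    'x': [
        {'b': 1, 'c': [{'z': 12}, {'z': 22}]},
        {'b': 2, 'c': [{'z': 10}, {'z': 33}]},
    ],
}

# Outer loop over the x entries, inner loop over each entry's c list
res = [{"a": d["a"], "b": x["b"], "z": c["z"]} for x in d["x"] for c in x["c"]]
print(res)
# [{'a': 1, 'b': 1, 'z': 12}, {'a': 1, 'b': 1, 'z': 22}, {'a': 1, 'b': 2, 'z': 10}, {'a': 1, 'b': 2, 'z': 33}]
```

The for clauses run left to right like nested loops, so rows come out grouped by x entry.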
To learn more about how comprehensions work read the following:
Python Documentation - 5.1.4. List Comprehensions
Python: List Comprehensions
P.S. You are supposed to first show what you have tried so far and get help with that. Take a look at the SO rules before posting your next question.

Filter inner keys from 2 level nested dictionaries

I'm looking for the most elegant way to get this:
{'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
From this:
{'i_1': {'a': 33, 'b': 55, 't': 4}, 'i_2': {'a': 9, 'b': 11, 't': 0}}
Each inner dict can have many keys (a, b, ..., z).
For now I have this:
In [3]: {k:dict(a=d[k]['a'], t=d[k]['t']) for k in d.keys()}
Out[3]: {'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
but it's not very elegant
You can make your code a little bit more readable by using items instead of keys:
{k: dict(a=v['a'], t=v['t']) for k, v in d.items()}
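When the set of wanted inner keys grows, naming each one in dict(...) gets tedious. A sketch that filters every inner dict against a set of keys to keep (keep is an illustrative name); inner keys keep their original order, and a wanted key that is missing from some inner dict is simply skipped instead of raising KeyError:

```python
d = {'i_1': {'a': 33, 'b': 55, 't': 4}, 'i_2': {'a': 9, 'b': 11, 't': 0}}

keep = {'a', 't'}  # inner keys to retain; a set gives fast membership tests
filtered = {k: {ik: iv for ik, iv in v.items() if ik in keep} for k, v in d.items()}
print(filtered)  # {'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
```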
Here you go. This function takes a dict in the format you specified and a list of keys to remove from the inner dictionaries:
def remove_inner_keys(data: dict, inner_keys_to_remove: list) -> dict:
    result = dict()
    for outer_key in data.keys():
        partial_result = dict()
        for inner_key in data[outer_key]:
            if inner_key not in inner_keys_to_remove:
                partial_result[inner_key] = data[outer_key][inner_key]
        result[outer_key] = partial_result
    return result
Testing:
data = { 'i_1': { 'a': 33, 'b': 55, 't': 4 }, 'i_2': { 'a': 9, 'b': 11, 't': 0 } }
print(remove_inner_keys(data, ["b"]))
output:
{'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
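import copy
def foo(d):
    # iterate over a copy so keys can be popped from d during the loop
    d_copy = copy.deepcopy(d)
    for key in d_copy:
        print(key, d[key])  # debug output
        # recurse into nested dicts first
        if isinstance(d[key], dict):
            foo(d[key])
        # then drop 'b' at this level (modifies d in place)
        if key == 'b':
            d.pop(key)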
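foo removes 'b' in place at every nesting level and prints each key as it goes. If you would rather leave the input untouched, a recursive comprehension can build a filtered copy instead. In this sketch (remove_key_recursive is a hypothetical name) only nested dicts are descended into, not lists, and non-dict values are shared with the original rather than copied:

```python
def remove_key_recursive(d, unwanted):
    """Return a copy of d without `unwanted`, at every level of nested dicts."""
    return {
        k: remove_key_recursive(v, unwanted) if isinstance(v, dict) else v
        for k, v in d.items()
        if k != unwanted
    }


data = {'i_1': {'a': 33, 'b': 55, 't': 4}, 'i_2': {'a': 9, 'b': 11, 't': 0}}
print(remove_key_recursive(data, 'b'))  # {'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
print(data['i_1'])  # original unchanged: {'a': 33, 'b': 55, 't': 4}
```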
