Find the smallest three values in a nested dictionary - python

I have a nested dictionary (sample_dict). For each day, I need to find the three smallest values (in ascending order) and store the result in a new dictionary.
The sample_dict is as follows:
sample_dict ={ '2020-12-22': {'A': 0.0650,'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
The final dictionary (i.e. result) after selecting the smallest three for each date would look like:
Can someone suggest a solution using for loops?

Let's use heapq.nsmallest inside a for loop to select the smallest 3 items per subdict:
from operator import itemgetter
import heapq
for k, v in sample_dict.items():
    # Look ma, no `sorted`
    sample_dict[k] = dict(heapq.nsmallest(3, v.items(), key=itemgetter(1)))
print(sample_dict)
# {'2020-12-22': {'A': 0.065, 'C': 0.078, 'B': 0.292},
# '2020-12-23': {'H': 0.227731, 'B': 0.367, 'C': 0.489},
# '2020-12-24': {'C': 0.095, 'A': 0.363, 'Z': 0.3735},
# '2020-12-25': {'B': 0.484, 'C': 0.8366},
# '2020-12-26': {'Y': 5.366}}
This is pretty fast because heapq.nsmallest does not need to fully sort the items of each sub-dictionary, and the loop updates sample_dict in place.
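Since the question asks for a for loop and a new dictionary, the same idea can also be written without overwriting sample_dict. This is only a minimal sketch: the name result comes from the question, and the loop variable names day and readings are my own.

```python
import heapq
from operator import itemgetter

sample_dict = {
    '2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
    '2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G': 1.34235, 'H': 0.227731},
    '2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z': 0.3735},
    '2020-12-25': {'C': 0.8366, 'B': 0.4840},
    '2020-12-26': {'Y': 5.366},
}

result = {}
for day, readings in sample_dict.items():
    # nsmallest returns (key, value) pairs in ascending order of value.
    # Days with fewer than three entries simply return everything they have.
    result[day] = dict(heapq.nsmallest(3, readings.items(), key=itemgetter(1)))

print(result)
```

Because the pairs are inserted in ascending order of value, each inner dictionary keeps that order on Python 3.7+, and sample_dict itself is left untouched.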

Try using this dictionary comprehension, which picks the three smallest values first and then orders them by key:
print({k: dict(sorted(sorted(v.items(), key=lambda x: x[1])[:3], key=lambda x: x[0])) for k, v in sample_dict.items()})
Output:
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078}, '2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731}, '2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735}, '2020-12-25': {'B': 0.484, 'C': 0.8366}, '2020-12-26': {'Y': 5.366}}

This should work for your purposes.
sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
results_dict = {day[0]:{sample[0]:sample[1] for sample in sorted(day[1].items(), key=lambda e: e[1])[:3]} for day in sample_dict.items()}
# Output (each inner dictionary is in ascending order of value)
{'2020-12-22': {'A': 0.065, 'C': 0.078, 'B': 0.292},
'2020-12-23': {'H': 0.227731, 'B': 0.367, 'C': 0.489},
'2020-12-24': {'C': 0.095, 'A': 0.363, 'Z': 0.3735},
'2020-12-25': {'B': 0.484, 'C': 0.8366},
'2020-12-26': {'Y': 5.366}}

Related

Cartesian product of two dict in two lists in Python

Here is my code.
>>> a = [{'a': 1}, {'b': 2}]
>>> b = [{'c': 3}, {'d': 4}]
I want to show:
[{'a':1, 'c':3}, {'b':2, 'c':3}, {'a':1, 'd':4}, {'b':2, 'd':4}]
Is there a way I can do it only with list/dict comprehension?
A one line, no import solution can consist of a lambda function:
f = lambda d, c:[c] if not d else [i for k in d[0] for i in f(d[1:], {**c, **k})]
a = [{'a': 1}, {'b': 2}]
b = [{'c': 3}, {'d': 4}]
print(f([a, b], {}))
Output:
[{'a': 1, 'c': 3}, {'a': 1, 'd': 4}, {'b': 2, 'c': 3}, {'b': 2, 'd': 4}]
However, a much cleaner solution can include itertools.product:
from itertools import product
result = [{**j, **k} for j, k in product(a, b)]
Output:
[{'a': 1, 'c': 3}, {'a': 1, 'd': 4}, {'b': 2, 'c': 3}, {'b': 2, 'd': 4}]
You can try this.
a = [{'a': 1}, {'b': 2}]
b = [{'c': 3}, {'d': 4}]
d = [ {**i, **j} for i in a for j in b ]
print(d)
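Note that the answers above return the combinations grouped by the items of a, while the order shown in the question is grouped by the items of b. The sketch below shows both orders and a generalization to any number of lists. The helper name merge_product is hypothetical, not something from the answers.

```python
from itertools import product

def merge_product(*lists_of_dicts):
    """Merge one dict from each list, for every combination (hypothetical helper)."""
    merged = []
    for combo in product(*lists_of_dicts):
        row = {}
        for d in combo:
            row.update(d)  # if keys collide, the dict from the later list wins
        merged.append(row)
    return merged

a = [{'a': 1}, {'b': 2}]
b = [{'c': 3}, {'d': 4}]

# Same order as the answers above: the first list varies slowest.
print(merge_product(a, b))

# Order requested in the question: iterate b in the outer loop.
question_order = [{**i, **j} for j in b for i in a]
print(question_order)
```

Swapping the two for clauses is enough to change the grouping; the merged dictionaries themselves are identical in both versions.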

Is there any way to sort these dictionaries by their lowest value?

I just want to sort these dictionaries, which are built from values in an input file.
def sortdicts():
    listofs = splitndict()
    print(sorted(listofs))
The splitndict() function has this output:
[{'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}, {'y': 5, 'x': 0}]
While the input is from another file and it's:
a 1
b 2
c 2
d 4
a 7
c 3
x 0
y 5
I used this to split the dictionary:
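def splitndict():
    listofd = []
    variablesRead = readfromfile()
    splitted = [i.split() for i in variablesRead]
    d = {}
    for lines in splitted:
        if lines:
            # Non-empty line: "key value" pair for the current dictionary
            d[lines[0]] = int(lines[1])
        elif d == {}:
            pass
        else:
            # Blank line: the current dictionary is complete
            listofd.append(d)
            d = {}
    if d:
        # Keep the last group when the file does not end with a blank line
        listofd.append(d)
    print(listofd)
    return listofd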
The output file should look like this:
[{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
This is the expected output because the list needs to be sorted by the lowest value in each dictionary.
array = [{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
For the above array:
array = sorted(array, key=lambda element: min(element.values()))
where "element.values()" returns all values from the dictionary and "min" returns the minimum of those values.
"sorted" passes each dictionary (an element) to the lambda function one by one and sorts on the basis of the results. Note that the function has to be passed as the key argument.
x = [{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
sorted(x, key=lambda i: min(i.values()))
Output is
[{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
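One edge case the answers do not cover is an empty dictionary in the list, where min() raises a ValueError. A minimal sketch of one way to handle it, assuming empty dictionaries should sort last (that policy and the names records and lowest_value are my assumptions, not part of the question):

```python
records = [{'a': 1, 'b': 2}, {}, {'y': 5, 'x': 0}, {'a': 7, 'c': 3}]

def lowest_value(d):
    # min() raises ValueError on an empty sequence unless a default is given.
    # float('inf') pushes empty dictionaries to the end of the result.
    return min(d.values(), default=float('inf'))

ordered = sorted(records, key=lowest_value)
print(ordered)
```

The default argument of min() is available from Python 3.4 onwards.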

Summarizing a dictionary into another one

I have a dictionary of dictionaries in python like this example:
small example:
d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
Every sub-dictionary has items whose keys are one of the letters A, C, T or G, and whose values are absolute counts. For every item, I want the percentage of each letter relative to the total of its sub-dictionary. In the end I want a new dictionary shaped like the input example, with percentages instead of absolute values. The expected output for the small example would look like this:
result = {1: {'A': 30.34, 'C': 22.16, 'T': 30, 'G': 17.5},
2: {'A': 30.78, 'C': 24.76, 'T': 27.06, 'G': 17.4},
3: {'A': 7.78, 'C': 68.15, 'T': 18.4, 'G': 5.67}}
I am trying to do that in python using the following code:
values = dict.values()
freq = {}
for i in d.keys()
freq[i] = d.values(i)/d.values
But it does not return what I expect. Do you know how to fix it?
The pandas solution
import pandas as pd
df = pd.DataFrame(d)
result = (100*(df/df.sum())).round(2).to_dict()
gives you
>>> print(result)
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
(You can omit round(2) if you wish to perform no rounding.)
Try building a collections.defaultdict() and adding the percentages as you iterate the original dictionary:
from collections import defaultdict
from pprint import pprint
d = {
1: {"A": 11472, "C": 8405, "T": 11428, "G": 6613},
2: {"A": 11678, "C": 9388, "T": 10262, "G": 6590},
3: {"A": 2945, "C": 25843, "T": 6980, "G": 2150},
}
percentages = defaultdict(dict)
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages[k1][k2] = round(v2 / total * 100, 2)
pprint(percentages)
Which gives:
defaultdict(<class 'dict'>,
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}})
Note: defaultdict is a subclass of dict, so you can treat it the same as a normal dictionary. If you really want to, you can call dict(percentages) to convert it to a regular dictionary.
Another way, slightly slower, is to use dict.setdefault():
percentages = {}
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages.setdefault(k1, {})[k2] = round(v2 / total * 100, 2)
pprint(percentages)
# {1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
# 2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
# 3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
You are going to need to nest in some way to go through your dictionary. Here's with dictionary comprehension:
totals = {sub: sum(d[sub].values()) for sub in d}
result = {sub: {base: d[sub][base] / totals[sub] * 100 for base in d[sub]} for sub in d}
with output:
{
1: {'A': 30.254760272166255, 'C': 22.166253494382616, 'T': 30.13872039664539, 'G': 17.44026583680574},
2: {'A': 30.79803787119574, 'C': 24.758689804314574, 'T': 27.063663695342584, 'G': 17.379608629147107},
3: {'A': 7.76675985020307, 'C': 68.15496597921832, 'T': 18.408143889445647, 'G': 5.6701302811329715}
}
You could use a nested dictionary comprehension:
{ k: { kk: round(100*vv/sum(v.values()),2) for kk, vv in v.items() } for k, v in d.items() }
#=> {1: {'A': 30.25, 'C': 22.17, 'T': 30.14, 'G': 17.44}, 2: {'A': 30.8, 'C': 24.76, 'T': 27.06, 'G': 17.38}, 3: {'A': 7.77, 'C': 68.15, 'T': 18.41, 'G': 5.67}}
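The answers above all assume that every sub-dictionary has a non-zero total. The following sketch wraps the same calculation in a function, computes each total once, and guards against a zero total. The function name to_percentages and the choice to report 0.0 for an all-zero row are assumptions of mine, not part of the answers.

```python
def to_percentages(counts, ndigits=2):
    """Convert nested absolute counts to percentages per sub-dictionary."""
    result = {}
    for outer_key, inner in counts.items():
        total = sum(inner.values())  # computed once per sub-dictionary
        if total == 0:
            # Assumption: report 0.0 instead of raising ZeroDivisionError.
            result[outer_key] = {k: 0.0 for k in inner}
        else:
            result[outer_key] = {
                k: round(v / total * 100, ndigits) for k, v in inner.items()
            }
    return result

d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
     2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
     3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}

print(to_percentages(d))
```

Keep in mind that rounded percentages do not always add up to exactly 100.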

creating undirected graph from directed graph

I'm just too confused, and can't come up with proper way to do this:
I have this directed graph:
and have two dictionaries, which show outgoing and incoming scores
graph_to = {'a':{'b':2,'c':3},'b':{'a':1,'d':4}}
graph_from = {'a':{'b':1},'b':{'a':2},'c':{'a':3},'d':{'b':4}}
For example, in graph_to, node a goes to node b with score 2 and to node c with score 3; and in graph_from node a receives score 1 from node b.
I want to create undirected graph such that scores between two nodes are summed up. It should become this dictionary:
graph = {
'a':{'b':3,'c':3},
'b':{'a':3,'d':4},
'c':{'a':3},
'd':{'b':4}
}
You could try to make a collections.defaultdict() of collections.Counter() objects, and sum the edge counts as you iterate both graph dicts:
from collections import defaultdict
from collections import Counter
from pprint import pprint
graph_to = {'a':{'b':2,'c':3},'b':{'a':1,'d':4}}
graph_from = {'a':{'b':1},'b':{'a':2},'c':{'a':3},'d':{'b':4}}
undirected_graph = defaultdict(Counter)
def sum_edges(graph, result):
    for node, edges in graph.items():
        for edge in edges:
            result[node][edge] += edges[edge]
sum_edges(graph_to, undirected_graph)
sum_edges(graph_from, undirected_graph)
pprint(undirected_graph)
Which gives:
defaultdict(<class 'collections.Counter'>,
{'a': Counter({'b': 3, 'c': 3}),
'b': Counter({'d': 4, 'a': 3}),
'c': Counter({'a': 3}),
'd': Counter({'b': 4})})
Note: Counter and defaultdict are subclasses of dict, so you can treat them the same as normal dictionaries.
If you really want normal dictionaries in the final undirected graph, you can use either of these dict comprehensions:
dict((k, dict(v)) for k, v in undirected_graph.items())
# {'a': {'b': 3, 'c': 3}, 'b': {'a': 3, 'd': 4}, 'c': {'a': 3}, 'd': {'b': 4}}
{k: dict(v) for k, v in undirected_graph.items()}
# {'a': {'b': 3, 'c': 3}, 'b': {'a': 3, 'd': 4}, 'c': {'a': 3}, 'd': {'b': 4}}
Additionally, because each result[node] is a Counter, you can use Counter.update(), which adds counts instead of replacing them, to refactor sum_edges():
def sum_edges(graph, result):
    for node, edges in graph.items():
        result[node].update(edges)
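The refactored version only works because result[node] is a Counter. A small sketch of the difference between the two update methods (the variable names plain and counted are just for illustration):

```python
from collections import Counter

plain = {'b': 2, 'c': 3}
plain.update({'b': 1})       # dict.update overwrites the old value

counted = Counter({'b': 2, 'c': 3})
counted.update({'b': 1})     # Counter.update adds to the old count

print(plain)
print(counted)
```

With a plain dict in place of the Counter, the score from the second graph would replace the score from the first instead of being summed.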
If you prefer to do it by hand, here is a version with plain loops:
out_dict = {}
for key in graph_to:
    for sub_key in graph_to[key]:
        # Add the incoming score if there is one, otherwise add 0
        incoming = graph_from.get(key, {}).get(sub_key, 0)
        out_dict.setdefault(key, {})[sub_key] = graph_to[key][sub_key] + incoming
graph_from.update(out_dict)
print(graph_from)
Output:
{'a': {'b': 3, 'c': 3}, 'b': {'a': 3, 'd': 4}, 'c': {'a': 3}, 'd': {'b': 4}}

Filter Dictionary keys of multilevel dictionary

I have the following dict structure:
{12345: {2006: [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
I want to be able to filter based on the value of 'a' or 'b'.
For example, if 'a' is 1, then my filtered dict would look like:
{12345: {2006: [{'a': 1, 'b': 2}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
I have the following for loop which gets me down to where I have the inner dict's I want, but I am not sure how to put it back into a dict of the same structure.
d = {12345: {2006: [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
d_filter = {}
for item_code in d.keys():
    for year in d[item_code]:
        for item_dict in d[item_code][year]:
            if item_dict['a'] == 1:
                print(item_dict)  # how to put this back in d_filter?
producing:
{'a': 1, 'b': 2}
{'a': 1, 'b': 5}
{'a': 1, 'b': 9}
{'a': 1, 'b': 12}
I am guessing there is a better way to filter that I can not find, or something with dictionary comprehension that my small mind can not grasp.
Any help would be appreciated.
Here's a dictionary comprehension that does just that; dct is your initial dictionary:
d = {k: {ky: [d for d in vl if d['a']==1] for ky, vl in v.items()}
     for k, v in dct.items()}
print(d)
# {12345: {2006: [{'a': 1, 'b': 2}, {'a': 1, 'b': 5}]}, 12346: {2007: [{'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
You can change the inner filter (i.e. d['a']==1) to the dict key and/or value of your choice.
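To make the filter easy to swap, the condition can be passed in as a function. This is a sketch under my own naming: filter_items and predicate are hypothetical, and the structure is assumed to always be code -> year -> list of dicts, as in the question.

```python
def filter_items(data, predicate):
    """Keep only the innermost dicts for which predicate(item) is true."""
    return {
        item_code: {
            year: [item for item in items if predicate(item)]
            for year, items in years.items()
        }
        for item_code, years in data.items()
    }

dct = {12345: {2006: [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 1, 'b': 5}]},
       12346: {2007: [{'a': 2, 'b': 7}, {'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}

print(filter_items(dct, lambda item: item['a'] == 1))
print(filter_items(dct, lambda item: item['b'] > 4))
```

Years whose list ends up empty are kept with an empty list. Dropping them would need an extra condition in the middle comprehension.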
You could do something like this:
filtered = {
item_code: {
year: [item for item in items if item['a'] == 1]
for year, items in years.items()
}
for item_code, years in d.items()
}
Which results in:
{12345: {2006: [{'a': 1, 'b': 2}, {'a': 1, 'b': 5}]},
12346: {2007: [{'a': 1, 'b': 9}, {'a': 1, 'b': 12}]}}
