I have a variable that looks like this: it contains multiple lists, and each list holds multiple dictionaries. What I need to do now is:
combine the lists into one big list
if two dictionaries have the same key, combine them (keep one of the keys and add their values)
I know I need to use a for loop, but how do I reference the dictionaries inside a list, and how do I reference the lists stored in the variable?
I tried doing something like this:
for list in bigram_lists:
    for list1 in bigram_lists:
        list.append(list1)
It fails with the error that a dict object has no attribute append.
Help would be appreciated.
import ast
x = "[{'a': 1850}, {'b': 397}, {'c': 811}, {'d': 990}, {'e': 3198}, {'f': 605}, {'g': 435}, {'h': 1339}, {'i': 1904}, {'j': 59}, {'k': 138}, {'l': 946}, {'m': 652}, {'n': 1691}, {'o': 1813}, {'p': 510}, {'q': 13}, {'r': 1469}, {'s': 1695}, {'t': 2322}, {'u': 516}, {'v': 285}, {'w': 353}, {'x': 49}, {'y': 393}, {'z': 23}] [{'a': 3815}, {'b': 716}, {'c': 1989}, {'d': 1904}, {'e': 5429}, {'f': 908}, {'g': 836}, {'h': 1902}, {'i': 3340}, {'j': 42}, {'k': 148}, {'l': 1818}, {'m': 1156}, {'n': 3782}, {'o': 3365}, {'p': 992}, {'q': 98}, {'r': 2683}, {'s': 3125}, {'t': 3708}, {'u': 1123}, {'v': 335}, {'w': 399}, {'x': 153}, {'y': 706}, {'z': 85}] [{'a': 5087}, {'b': 823}, {'c': 1949}, {'d': 2366}, {'e': 6904}, {'f': 1322}, {'g': 1128}, {'h': 2756}, {'i': 3754}, {'j': 138}, {'k': 346}, {'l': 2709}, {'m': 1618}, {'n': 4391}, {'o': 4675}, {'p': 1321}, {'q': 74}, {'r': 3681}, {'s': 3554}, {'t': 5438}, {'u': 1658}, {'v': 519}, {'w': 1012}, {'x': 128}, {'y': 718}, {'z': 53}]"
strs = x.replace(']','],')[:-1]
strs = "[" + strs + "]"
listOfLists = ast.literal_eval(strs)
finalDict = {}
for ls in listOfLists:
    for dct in ls:
        if (list(dct.keys())[0]) in finalDict:
            finalDict[list(dct.keys())[0]] += dct[list(dct.keys())[0]]
        else:
            finalDict[list(dct.keys())[0]] = dct[list(dct.keys())[0]]
print(finalDict)
gives you
{'a': 10752, 'b': 1936, 'c': 4749, 'd': 5260, 'e': 15531, 'f': 2835, 'g': 2399, 'h': 5997, 'i': 8998, 'j': 239, 'k': 632, 'l': 5473, 'm': 3426, 'n': 9864, 'o': 9853, 'p': 2823, 'q': 185, 'r': 7833, 's': 8374, 't': 11468, 'u': 3297, 'v': 1139, 'w': 1764, 'x': 330, 'y': 1817, 'z': 161}
Working with x as a list of lists, I built a single dictionary in which each key holds the sum of that key's values across all the lists (you can split it up later if you want):
result = {}
for sublist in x:
    for elem in sublist:
        for key, value in elem.items():
            if key not in result:
                result[key] = value
            else:
                result[key] += value
>>> print(result)
{'a': 10752, 'b': 1936, 'c': 4749, 'd': 5260, 'e': 15531, 'f': 2835, 'g': 2399, 'h': 5997, 'i': 8998, 'j': 239, 'k': 632, 'l': 5473, 'm': 3426, 'n': 9864, 'o': 9853, 'p': 2823, 'q': 185, 'r': 7833, 's': 8374, 't': 11468, 'u': 3297, 'v': 1139, 'w': 1764, 'x': 330, 'y': 1817, 'z': 161}
Having corrected the x input as a list of lists:
x = [[{'a': 1850}, {'b': 397}, {'c': 811}, {'d': 990}, {'e':
3198}, {'f': 605}, {'g': 435}, {'h': 1339}, {'i': 1904}, {'j':
59}, {'k': 138}, {'l': 946}, {'m': 652}, {'n': 1691}, {'o':
1813}, {'p': 510}, {'q': 13}, {'r': 1469}, {'s': 1695}, {'t':
2322}, {'u': 516}, {'v': 285}, {'w': 353}, {'x': 49}, {'y': 393},
{'z': 23}],
[{'a': 3815}, {'b': 716}, {'c': 1989}, {'d': 1904}, {'e': 5429},
{'f': 908}, {'g': 836}, {'h': 1902}, {'i': 3340}, {'j': 42},
{'k': 148}, {'l': 1818}, {'m': 1156}, {'n': 3782}, {'o': 3365},
{'p': 992}, {'q': 98}, {'r': 2683}, {'s': 3125}, {'t': 3708},
{'u': 1123}, {'v': 335}, {'w': 399}, {'x': 153}, {'y': 706},
{'z': 85}],
[{'a': 5087}, {'b': 823}, {'c': 1949}, {'d': 2366}, {'e': 6904},
{'f': 1322}, {'g': 1128}, {'h': 2756}, {'i': 3754}, {'j': 138},
{'k': 346}, {'l': 2709}, {'m': 1618}, {'n': 4391}, {'o': 4675},
{'p': 1321}, {'q': 74}, {'r': 3681}, {'s': 3554}, {'t': 5438},
{'u': 1658}, {'v': 519}, {'w': 1012}, {'x': 128}, {'y': 718},
{'z': 53}]]
this:
R=[]
for ld in x:
    result = {}
    for d in ld:
        result.update(d)
    R.append(result)
D = dict.fromkeys(R[0].keys(), 0)
for d in R:
    for k in R[0].keys():
        D[k]+=d[k]
will give you the answer you wanted.
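As a more compact alternative, here is a minimal sketch using collections.Counter from the standard library. It assumes x has the same shape as the list of lists above; the data is shortened to three keys so the block runs on its own, and the name totals is only illustrative.

```python
from collections import Counter

# Shortened stand-in for the list of lists above (assumption: same shape).
x = [
    [{'a': 1850}, {'b': 397}, {'c': 811}],
    [{'a': 3815}, {'b': 716}, {'c': 1989}],
    [{'a': 5087}, {'b': 823}, {'c': 1949}],
]

totals = Counter()
for sublist in x:
    for d in sublist:
        # Counter.update adds to existing counts instead of overwriting them.
        totals.update(d)

print(dict(totals))
# {'a': 10752, 'b': 1936, 'c': 4749}
```

Counter is a dict subclass, so totals can be used like a normal dictionary afterwards.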
I am new to Python and I have a nested dictionary whose values I want to normalize. For example:
nested_dictionary={'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
And I would like to get the normalization as
Normalized_result={'D': {'D': '0.47', 'B': '0.24', 'C': '0.00', 'A': '0.24', 'K': '0.00', 'J': '0.04'}, 'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
I have seen the example in "Normalizing dictionary values", but it only handles a single dictionary and I want to apply it to a nested one.
I have tried to flatten nested_dictionary and apply the normalization as follows:
import flatdict
d = flatdict.FlatDict(nested_dictionary, delimiter='_')
dd=dict(d)
newDict = dict(zip(dd.keys(), [float(value) for value in dd.values()]))
def normalize(d, target=1.0):
    global factor
    raw = sum(d.values())
    print(raw)
    if raw==0:
        factor=0
        #print('ok')
    else:
        # print('kok')
        factor = target/raw
    return {key:value*factor for key,value in d.items()}
normalize(newDict)
And I get the result as
{'D_D': 0.2578125,
'D_B': 0.1328125,
'D_C': 0.0,
'D_A': 0.1328125,
'D_K': 0.0,
'D_J': 0.023437499999999997,
'A_A': 0.39062499999999994,
'A_K': 0.0,
'A_J': 0.06249999999999999}
But what I want is the Normalized_result as above
Thanks in advance.
nested_dictionary = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
In this example, your dict values are str type, so we need to convert to float:
nested_dictionary = dict([b, dict([a, float(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': 0.33, 'B': 0.17, 'C': 0.0, 'A': 0.17, 'K': 0.0, 'J': 0.03},
'A': {'A': 0.5, 'K': 0.0, 'J': 0.08}}
The loop below is adapted from the link you provided.
It goes through the inner dictionaries, calculates the factor for each one and updates the values in place.
for _, d in nested_dictionary.items():
    factor = 1.0/sum(d.values())
    for k in d:
        d[k] = d[k] * factor
nested_dictionary
{'D': {'D': 0.47142857142857136,
'B': 0.24285714285714285,
'C': 0.0,
'A': 0.24285714285714285,
'K': 0.0,
'J': 0.04285714285714285},
'A': {'A': 0.8620689655172414, 'K': 0.0, 'J': 0.13793103448275865}}
If you need to convert back to str, use the expression below:
nested_dictionary = dict([b, dict([a, "{:.2f}".format(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': '0.47',
'B': '0.24',
'C': '0.00',
'A': '0.24',
'K': '0.00',
'J': '0.04'},
'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
This code would do:
def normalize(d, target=1.0):
    raw = sum(float(number) for number in d.values())
    factor = (target/raw if raw else 0)
    return {key: f'{float(value)*factor:.2f}' for key, value in d.items()}
{key: normalize(dct) for key, dct in nested_dictionary.items()}
1. Turn the string values in your inner dicts into floats.
2. Take one of the solutions from the duplicate, for example really_safe_normalise_in_place.
3. Use the solution on each dict.
Example:
d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}
for v in d.values():
    really_safe_normalise_in_place(v)
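really_safe_normalise_in_place is defined in the linked duplicate and is not reproduced here. As a stand-in, the sketch below uses a hypothetical helper, normalise_in_place, which scales a dict's values so they sum to a target and leaves an all-zero dict untouched. It is an assumption about what such a helper could look like, not the duplicate's implementation.

```python
def normalise_in_place(d, target=1.0):
    """Hypothetical helper: scale the values of d in place so they sum to target."""
    total = sum(d.values())
    if total == 0:
        return  # nothing sensible to scale, so leave the dict unchanged
    factor = target / total
    for key in d:
        d[key] *= factor


d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
     'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}

# Step 1: convert the string values to floats.
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}

# Steps 2 and 3: normalise each inner dict separately.
for v in d.values():
    normalise_in_place(v)

# Optional: format back to two-decimal strings for comparison with the question.
formatted = {k: {kk: f"{vv:.2f}" for kk, vv in v.items()} for k, v in d.items()}
print(formatted)
```

Each inner dict is normalised on its own, so the values under 'D' and the values under 'A' each sum to 1.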
I have a nested dictionary (i.e. sample_dict) where, for each day, we need to find the three smallest values (in ascending order) and store the result in a new dictionary.
The sample_dict is as follows:
sample_dict ={ '2020-12-22': {'A': 0.0650,'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
The final dictionary (i.e. result) after selecting the smallest three for each date would look like:
Can someone suggest a solution using for loops?
Let's use heapq.nsmallest in a for loop to select the smallest 3 items per subdict:
from operator import itemgetter
import heapq
for k, v in sample_dict.items():
    # Look ma, no `sorted`
    sample_dict[k] = dict(heapq.nsmallest(3, v.items(), key=itemgetter(1)))
print(sample_dict)
# {'2020-12-22': {'A': 0.065, 'C': 0.078, 'B': 0.292},
# '2020-12-23': {'H': 0.227731, 'B': 0.367, 'C': 0.489},
# '2020-12-24': {'C': 0.095, 'A': 0.363, 'Z': 0.3735},
# '2020-12-25': {'B': 0.484, 'C': 0.8366},
# '2020-12-26': {'Y': 5.366}}
This is pretty fast because it does not need to fully sort each sub-dictionary's items, and it updates sample_dict in place.
Try using this dictionary comprehension, which takes the three smallest items by value and then orders them by key:
print({k: dict(sorted(sorted(v.items(), key=lambda x: x[1])[:3], key=lambda x: x[0])) for k, v in sample_dict.items()})
Output:
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078}, '2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731}, '2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735}, '2020-12-25': {'B': 0.484, 'C': 0.8366}, '2020-12-26': {'Y': 5.366}}
This should work for your purposes.
sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
'2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G':1.34235, 'H': 0.227731},
'2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z':0.3735},
'2020-12-25': {'C': 0.8366, 'B': 0.4840},
'2020-12-26': {'Y': 5.366}}
results_dict = {day[0]:{sample[0]:sample[1] for sample in sorted(day[1].items(), key=lambda e: e[1])[:3]} for day in sample_dict.items()}
# Output
{'2020-12-22': {'A': 0.065, 'B': 0.292, 'C': 0.078},
'2020-12-23': {'B': 0.367, 'C': 0.489, 'H': 0.227731},
'2020-12-24': {'A': 0.363, 'C': 0.095, 'Z': 0.3735},
'2020-12-25': {'B': 0.484, 'C': 0.8366},
'2020-12-26': {'Y': 5.366}}
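Since the question asked for plain for loops, here is a minimal sketch of the same idea without comprehensions. The names result, readings and smallest are illustrative; the approach (sort each day's items by value, keep the first three) is the same as in the answers above.

```python
sample_dict = {'2020-12-22': {'A': 0.0650, 'B': 0.2920, 'C': 0.0780, 'D': 1.28008, 'G': 3.122},
               '2020-12-23': {'B': 0.3670, 'C': 0.4890, 'G': 1.34235, 'H': 0.227731},
               '2020-12-24': {'A': 0.3630, 'B': 0.3960, 'C': 0.0950, 'Z': 0.3735},
               '2020-12-25': {'C': 0.8366, 'B': 0.4840},
               '2020-12-26': {'Y': 5.366}}

result = {}
for day, readings in sample_dict.items():
    # Sort the (key, value) pairs by value, smallest first.
    ordered = sorted(readings.items(), key=lambda item: item[1])
    smallest = {}
    # Slicing is safe even when a day has fewer than three entries.
    for name, value in ordered[:3]:
        smallest[name] = value
    result[day] = smallest

for day, smallest in result.items():
    print(day, smallest)
```

This builds a new dictionary and leaves sample_dict unchanged; within each day the entries are stored in ascending order of value.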
I have a dictionary of dictionaries in Python like this small example:
d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}
Every sub-dictionary has items whose keys are one of the letters A, C, T or G and whose values are absolute counts. For every sub-dictionary, I want to get the percentage of each letter based on its value, and at the end build a new dictionary shaped like the input but holding percentages instead of absolute values. The expected output for the small example would look like this:
result = {1: {'A': 30.34, 'C': 22.16, 'T': 30, 'G': 17.5},
2: {'A': 30.78, 'C': 24.76, 'T': 27.06, 'G': 17.4},
3: {'A': 7.78, 'C': 68.15, 'T': 18.4, 'G': 5.67}}
I am trying to do that in python using the following code:
values = dict.values()
freq = {}
for i in d.keys()
    freq[i] = d.values(i)/d.values
But it does not return what I expect. Do you know how to fix it?
The pandas solution
import pandas as pd
df = pd.DataFrame(d)
result = (100*(df/df.sum())).round(2).to_dict()
gives you
>>> print(result)
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
(You can omit round(2) if you wish to perform no rounding.)
Try building a collections.defaultdict() and adding the percentages as you iterate the original dictionary:
from collections import defaultdict
from pprint import pprint
d = {
1: {"A": 11472, "C": 8405, "T": 11428, "G": 6613},
2: {"A": 11678, "C": 9388, "T": 10262, "G": 6590},
3: {"A": 2945, "C": 25843, "T": 6980, "G": 2150},
}
percentages = defaultdict(dict)
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages[k1][k2] = round(v2 / total * 100, 2)
pprint(percentages)
Which gives:
defaultdict(<class 'dict'>,
{1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}})
Note: defaultdict is a subclass of dict, so you can treat it the same as a normal dictionary. If you really want to, you can call dict(percentages) to convert it to a regular dictionary.
Another way, slightly slower, is to use dict.setdefault():
percentages = {}
for k1, v1 in d.items():
    total = sum(v1.values())
    for k2, v2 in v1.items():
        percentages.setdefault(k1, {})[k2] = round(v2 / total * 100, 2)
pprint(percentages)
# {1: {'A': 30.25, 'C': 22.17, 'G': 17.44, 'T': 30.14},
# 2: {'A': 30.8, 'C': 24.76, 'G': 17.38, 'T': 27.06},
# 3: {'A': 7.77, 'C': 68.15, 'G': 5.67, 'T': 18.41}}
You are going to need some kind of nesting to go through your dictionary. Here's one way with dictionary comprehensions:
totals = {sub: sum(d[sub].values()) for sub in d}
result = {sub: {base: d[sub][base] / totals[sub] * 100 for base in d[sub]} for sub in d}
with output:
{
1: {'A': 30.254760272166255, 'C': 22.166253494382616, 'T': 30.13872039664539, 'G': 17.44026583680574},
2: {'A': 30.79803787119574, 'C': 24.758689804314574, 'T': 27.063663695342584, 'G': 17.379608629147107},
3: {'A': 7.76675985020307, 'C': 68.15496597921832, 'T': 18.408143889445647, 'G': 5.6701302811329715}
}
You could use a nested dictionary comprehension:
{ k: { kk: round(100*vv/sum(v.values()),2) for kk, vv in v.items() } for k, v in d.items() }
#=> {1: {'A': 30.25, 'C': 22.17, 'T': 30.14, 'G': 17.44}, 2: {'A': 30.8, 'C': 24.76, 'T': 27.06, 'G': 17.38}, 3: {'A': 7.77, 'C': 68.15, 'T': 18.41, 'G': 5.67}}
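One thing the comprehensions above have in common is that an empty or all-zero sub-dictionary would raise ZeroDivisionError. The sketch below wraps the calculation in a hypothetical helper, to_percentages, which computes each total once and returns zeros in that case. Returning zeros is an assumption made for illustration; raising an error or skipping the entry may suit your data better.

```python
def to_percentages(counts, ndigits=2):
    """Hypothetical helper: convert absolute counts to rounded percentages."""
    total = sum(counts.values())
    if total == 0:
        # Assumed behaviour: report 0.0 for every key rather than dividing by zero.
        return {key: 0.0 for key in counts}
    return {key: round(100 * value / total, ndigits) for key, value in counts.items()}


d = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
     2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
     3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}}

result = {k: to_percentages(v) for k, v in d.items()}
print(result)
```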
I have a python dictionary like the one below:
{'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
....
}
For each key (A, B, C, D), I want to count the number of 'NaN' values, and also count how many of those 'NaN' values occur in entries where 'E' is True.
So I would like to create a dictionary like this:
{'A': {'NaN': 0, 'E': 0},
'B': {'NaN': 1, 'E': 1},
'C': {'NaN': 2, 'E': 1},
'D': {'NaN': 2, 'E': 2}}
I have this code that returns a dictionary with the count of NaN
NaNs = {}
for k,v in dict.iteritems():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i]=0
for k,v in dict.iteritems():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i]+=1
print NaNs
How can I add the count of E:True to it?
Ok, why don't you try this:
dict = {'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
}
NaNs = {}
for k,v in dict.iteritems():
    for i in v:
        if i != 'E':
            NaNs[i]={'NaN': 0, 'E': 0}
for k,v in dict.iteritems():
    for i in v:
        if str(v[i]).lower() == 'nan':
            NaNs[i]['NaN']+=1
            if v['E'] == True:
                NaNs[i]['E']+=1
print NaNs
I shouldn't really be going around calling variables dict and NaNs, but I tried to change your code as little as possible.
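The code above is Python 2 (dict.iteritems() and the print statement). A rough Python 3 equivalent, as a sketch: the names data and nan_counts are illustrative replacements for dict and NaNs, and the two passes are folded into one with setdefault.

```python
data = {
    'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
    'John': {'A': 250, 'B': '34', 'C': 98, 'D': 59, 'E': False},
    'Steve': {'A': 230, 'B': '45', 'C': 'NaN', 'D': 67, 'E': False},
    'Louis': {'A': 220, 'B': '37', 'C': 'NaN', 'D': 'Nan', 'E': True},
}

nan_counts = {}
for record in data.values():
    for field, value in record.items():
        if field == 'E':
            continue  # 'E' is the flag, not a field to count
        # Create the per-field counters the first time a field is seen.
        counts = nan_counts.setdefault(field, {'NaN': 0, 'E': 0})
        # Case-insensitive match, so 'Nan' is counted as well as 'NaN'.
        if str(value).lower() == 'nan':
            counts['NaN'] += 1
            if record['E'] is True:
                counts['E'] += 1

print(nan_counts)
# {'A': {'NaN': 0, 'E': 0}, 'B': {'NaN': 1, 'E': 1}, 'C': {'NaN': 2, 'E': 1}, 'D': {'NaN': 2, 'E': 2}}
```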