How can I take a list of values (percentages):
example = [(1,100), (1,50), (2,50), (1,100), (3,100), (2,50), (3,50)]
and return a dictionary:
example_dict = {1:250, 2:100, 3:150}
and recalculate by dividing by sum(example_dict.values())/100:
final_dict = {1:50, 2:20, 3:30}
The methods I have tried for mapping the list of values to a dictionary result in the values being iterated over rather than summed.
Edit:
Since it was asked, here are some attempts (after just writing over old values) that went nowhere and demonstrate my 'noviceness' with Python:
{k: +=v if k==w[x][0] for x in range(0,len(w),1)}
invalid
for i in w[x][0] in range(0,len(w),1):
    for item in r:
        +=v  (don't know where I was going with that one)
invalid again.
Another similar one was also invalid, and there was nothing on Google, so then on to SO.
You could try something like this:
total = float(sum(v for k,v in example))
example_dict = {}
for k,v in example:
    example_dict[k] = example_dict.get(k, 0) + v * 100 / total
Use the Counter class:
from collections import Counter
totals = Counter()
for k, v in example: totals.update({k:v})
total = sum(totals.values())
final_dict = {k: 100 * v // total for k, v in totals.items()}
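For comparison, a minimal sketch of the same two steps (sum per key, then scale to percentages) using `collections.defaultdict`. The helper name `to_percentages` is illustrative, and it uses true division `/`, so percentages that are not whole numbers are not truncated the way `//` would truncate them.

```python
from collections import defaultdict

def to_percentages(pairs):
    """Sum the values for each key, then scale the sums so they add up to 100."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value  # accumulate instead of overwriting
    grand_total = sum(totals.values())
    return {key: value * 100 / grand_total for key, value in totals.items()}

example = [(1, 100), (1, 50), (2, 50), (1, 100), (3, 100), (2, 50), (3, 50)]
print(to_percentages(example))  # {1: 50.0, 2: 20.0, 3: 30.0}
```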
I am trying to write code to find the mean of the keys in my dict, weighted by the dict values. So, for example, for:
d = {1:2, 2:1, 3:2}
the expanded dict keys would be:
[1,1,2,3,3]
I've written the following code, which works for small data sets such as the above:
def get_mean_of_dict_keys(d: dict) -> float:
    nums_list = []
    for k, v in d.items():
        if not isinstance(v, int):
            raise TypeError("counts must be integers")
        nums_list.extend([k] * v)  # repeat each key v times
    mean = sum(nums_list) / len(nums_list)
    return mean
This gets me the values I want when the data set is small, but if the data set is something like:
d = {1:1_000_000_000_000_000, 2:2_000, 3:1_000_000_000_000_000}
I get an out of memory error which, now that I think about it, makes sense.
So how can I structure the above function in a way that will also handle those larger data sets? Thanks for your time.
You do not need to create a list; just keep two running variables, one holding the total sum and the other holding the number of elements:
def get_mean_of_dict_keys(d: dict) -> float:
    total = 0
    count = 0
    for k, v in d.items():
        total += k * v  # key k contributes v times to the sum
        count += v      # ...and v times to the element count
    mean = total / count
    return mean
print(get_mean_of_dict_keys({1: 2, 2: 1, 3: 2}))
Output
2.0
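Because only two integers are kept, the large dictionary from the question can be passed in directly without running out of memory. A quick check of that, with the same running-sum idea written as two generator expressions (the name `weighted_key_mean` is illustrative):

```python
def weighted_key_mean(d: dict) -> float:
    # Sum of key * count divided by the total count; no expanded list is built.
    return sum(k * v for k, v in d.items()) / sum(d.values())

big = {1: 1_000_000_000_000_000, 2: 2_000, 3: 1_000_000_000_000_000}
print(weighted_key_mean(big))  # 2.0
```

Python integers have arbitrary precision, so the sums stay exact and only the final division produces a float.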
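If you want the weighted mean, this is perfectly attainable with larger numbers. Divide the sum of the key * value products by the sum of the values (np.mean of the products alone would divide by the number of keys instead, giving 2666667333.3333335 here):
import numpy as np
d = {1:2000000000, 2:1000, 3:2000000000}
# int64 avoids overflow where NumPy's default integer is 32-bit
products = np.array([i*d[i] for i in d], dtype=np.int64)
weights = np.array(list(d.values()), dtype=np.int64)
print(products.sum() / weights.sum())
output
2.0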
breakdown
[i*d[i] for i in d]
# is equivalent to:
lst = []
for i in d:
    lst.append(i*d[i])
What you want to find is the weighted average.
Formula:
$\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$
Where,
X1..n are keys in your dictionary.
W1..n are values in your dictionary.
X̅ is weighted average.
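The formula translates directly into code. A minimal sketch with the keys as X and the values as W; the helper name `weighted_average` is illustrative, and raising on a zero total weight is an assumption about how you want that case handled:

```python
def weighted_average(xs, ws):
    """Return sum(W_i * X_i) / sum(W_i) for paired sequences xs and ws."""
    xs, ws = list(xs), list(ws)
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(w * x for x, w in zip(xs, ws)) / total_weight

d = {1: 2, 2: 1, 3: 2}
# Numerator: 1*2 + 2*1 + 3*2 = 10; denominator: 2 + 1 + 2 = 5
print(weighted_average(d.keys(), d.values()))  # 2.0
```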
Pure Python approach.
Using itertools.starmap with operator.mul
from itertools import starmap
from operator import mul
d = {1:2, 2:1, 3:2}
sum(starmap(mul, d.items()))/sum(d.values())
# 2.0
If you want to use NumPy, you can use np.average here:
import numpy as np
np.average([*d.keys()], weights=[*d.values()])
# 2.0
I have a dictionary like this:
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
How would you return a new dictionary with the elements that are not contained in the list stored under the highest key?
In this case:
d2 = {'v02':['elem_D'],'v01':["elem_E"]}
Thank you,
I prefer to compute differences with the built-in data type designed for it: sets.
It is also preferable to write loops rather than elaborate comprehensions. One-liners are clever, but understandable code that you can return to and understand is even better.
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
last = None
d2 = {}
# compares each key with the next key in sorted order (v01 vs v02, v02 vs v03)
for key in sorted(d):
    if last:
        diff = set(d[last]) - set(d[key])
        if diff:
            d2[last] = sorted(diff)
    last = key
print(d2)
{'v01': ['elem_E'], 'v02': ['elem_D']}
from collections import defaultdict
myNewDict = defaultdict(list)
max_value = max(d)  # the highest key, here 'v03'
for key in d:
    if key != max_value:
        for value in d[key]:
            if value not in d[max_value]:
                myNewDict[key].append(value)
print(myNewDict)
defaultdict(<class 'list'>, {'v02': ['elem_D'], 'v01': ['elem_E']})
You can get fancier with set operations by taking the set difference between the values in d[max_value] and each of the other keys, but first I think you should get comfortable working with dictionaries and lists.
One reason not to use sets is that the solution does not generalize: sets can only hold hashable objects. If your values are lists of lists, the members (sublists) are not hashable, so you can't use a set operation.
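To illustrate that caveat, a small sketch with hypothetical list-of-lists data: a plain `not in` test against a list compares items with `==` and works on unhashable sublists, while building a set from them raises `TypeError`.

```python
# Hypothetical data: the items are lists, so they are not hashable
data = {'v03': [[1, 2], [5]], 'v02': [[1, 2], [3]], 'v01': [[1, 2], [4]]}
top = max(data)       # highest key, 'v03'
highest = data[top]

# `in` on a list compares with ==, so unhashable sublists work here
diff = {k: [item for item in items if item not in highest]
        for k, items in data.items() if k != top}
print(diff)  # {'v02': [[3]], 'v01': [[4]]}

try:
    set(highest)  # a set needs hashable members
except TypeError as exc:
    print("set() fails:", exc)  # set() fails: unhashable type: 'list'
```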
Depending on your Python version (dict comprehensions need Python 2.7 or later), you may be able to get this done with only one line, using a dict comprehension:
>>> d2 = {k:[v for v in values if v not in d.get(max(d.keys()))] for k, values in d.items()}
>>> d2
{'v01': ['elem_E'], 'v02': ['elem_D'], 'v03': []}
This builds a copy of dict d in which each list is stripped of all items stored under the max key. The resulting dict is close to what you are going for.
If you don't want the empty list at key v03, filter the result through another dict comprehension:
>>> {k:v for k,v in d2.items() if len(v) > 0}
{'v01': ['elem_E'], 'v02': ['elem_D']}
EDIT:
In case your original dict has a very large key set (or the operation is needed frequently), you might also want to replace the expression d.get(max(d.keys())) with a list variable assigned beforehand. The expression is not pre-computed: because it sits in the inner comprehension's condition, it is re-evaluated for every element tested. Substituting it roughly halves the running time; the following runs 100,000 times in 1.5 secs on my machine, whereas the unsubstituted expression takes more than 3 seconds.
>>> bl = d.get(max(d.keys()))
>>> d2 = {k:v for k,v in {k:[v for v in values if not v in bl] for k, values in d.items()}.items() if len(v) > 0}
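Along the same lines, a sketch that looks up the highest key's items once, stores them in a set for constant-time membership tests (this assumes the items are hashable, which these strings are), and skips both the highest key and empty results in a single pass. Note that max() compares the keys as strings, which works for 'v01' to 'v03' but would put 'v10' before 'v9'.

```python
d = {'v03': ["elem_A", "elem_B", "elem_C"],
     'v02': ["elem_A", "elem_D", "elem_C"],
     'v01': ["elem_A", "elem_E"]}

top_key = max(d)           # 'v03' (string comparison)
blocked = set(d[top_key])  # evaluated once, not per element

d2 = {}
for key, values in d.items():
    if key == top_key:
        continue
    kept = [v for v in values if v not in blocked]  # keeps the original order
    if kept:
        d2[key] = kept

print(d2)  # {'v02': ['elem_D'], 'v01': ['elem_E']}
```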
I have two dictionaries created this way:
tr = defaultdict(list)
tr = { 'critic' : '2_critic',
       'major' : '3_major',
       'all' : ['2_critic','3_major']
     }
And the second one:
scnd_dict = defaultdict(list)
It contains values like this:
scnd_dict = {'severity': ['all']}
I want to have a third dict that will contain the key of scnd_dict and its corresponding value from tr.
This way, I will have :
third_dict = {'severity' : ['2_critic','3_major']}
I tried this, but it didn't work:
for (k,v) in scnd_dict.iteritems() :
    if v in tr:
        third_dict[k].append(tr[v])
Any help would be appreciated. Thanks.
Well...
from collections import defaultdict
tr = {'critic' : '2_critic',
      'major' : '3_major',
      'all' : ['2_critic','3_major']}
scnd_dict = {'severity': ['all']}
third_dict = {}
for k, v in scnd_dict.items():
    vals = []
    if isinstance(v, list):
        for i in v:
            vals.append(tr.get(i))
    else:
        vals.append(tr.get(v))
    if not vals:
        continue
    third_dict[k] = vals
print(third_dict)
Results:
>>>
{'severity': [['2_critic', '3_major']]}
This gets close to what you want, although the result has an extra level of nesting. But I question the logic of using defaultdicts here, or of having your index be part of a list...
If you use non-lists for scnd_dict, you can do the whole thing much more simply. Assuming scnd_dict looks like this: scnd_dict = {'severity': 'all'}:
d = dict((k, tr.get(v)) for k, v in scnd_dict.items())
# {'severity': ['2_critic', '3_major']}
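Your problem is that v is a list, not an item of a list, so if v in tr: does not do what you expect: a list is unhashable, so testing it against a dict's keys raises TypeError: unhashable type: 'list'. Change your code so that you iterate over the items in v. Assuming every value in tr is a list:
third_dict = {k: [t for m in ks for t in tr[m]] for k, ks in scnd_dict.items()}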
The second dict's value is a list, not a str, so the code below will work:
for k, v in scnd_dict.items():
    if v[0] in tr:
        third_dict[k] = tr[v[0]]
The problem is that the third dictionary does not know that the values are lists:
third_dict = {}
for k in scnd_dict:
    for v in scnd_dict[k]:
        print(v)
        for k2 in tr:
            if v == k2:
                if k not in third_dict:
                    third_dict[k] = tr[k2][:]  # slice copy, so += below cannot modify tr
                else:
                    third_dict[k] += tr[k2]
third_dict = {k: tr[v[0]] for k, v in scnd_dict.items() if v[0] in tr}
This
tr = defaultdict(list)
is a waste of time if you are just rebinding tr on the next line. Likewise for scnd_dict.
It's a better idea to make all the values of tr lists, even if they only have one item. That means fewer special cases to worry about later on.
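Following that advice, a minimal sketch where every value in tr is a list (the one-item lists for 'critic' and 'major' are the assumed change), so the lookup reduces to flattening. The helper name `resolve` is illustrative, and skipping unknown names via `lookup.get(name, [])` is an assumption about the behaviour you want.

```python
tr = {'critic': ['2_critic'],
      'major': ['3_major'],
      'all': ['2_critic', '3_major']}
scnd_dict = {'severity': ['all']}

def resolve(mapping, lookup):
    """Replace each list of names with the concatenated lists they map to."""
    # Names missing from `lookup` contribute nothing (assumed behaviour).
    return {k: [item for name in names for item in lookup.get(name, [])]
            for k, names in mapping.items()}

third_dict = resolve(scnd_dict, tr)
print(third_dict)  # {'severity': ['2_critic', '3_major']}
```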
In Python I currently have a dictionary with a composite key. In this dictionary there are multiple occurrences of these keys (the keys are comma-separated):
(A,B), (A,C), (A,B), (A,D), (C,A), (A,B), (C,A), (C,B), (C,B)
I already have something that totals the unique occurrences and counts the duplicates which gives me a print-out similar to this:
(A,B) with a count of 4, (A,C) with a count of 2, (B,C) with a count of 6, etc.
I would like to know how to code a loop that would give me the following:
Print out the first occurrence of the first part of the key and its associated values and counts.
Name: A:
    Type    Count
    B       4
    C       2
    Total   6
Name: B:
    Type    Count
    A       3
    B       2
    C       3
    Total   8
I know I need to create a loop that checks whether the first part of the key matches and then does the following, but I have no real idea how to approach or code this.
Here's a slightly slow algorithm that'll get it done:
import collections
def convert(myDict):
    keys = myDict.keys()
    answer = collections.defaultdict(dict)
    for key in keys:
        # keys are tuples, so compare their first elements directly
        for k in [k for k in keys if k[0] == key[0]]:
            answer[key[0]][k[1]] = myDict[k]
    return answer
Ultimately, I think what you're after is a trie
It's a little misleading to say that your dictionary has multiple occurrences of a given key. Python doesn't allow that. Instead, what you have are keys that are tuples. You want to unpack those tuples and rebuild a nested dictionary.
Here's how I'd do it:
import collections
# rebuild data structure
nested = collections.defaultdict(dict)
for k, v in myDict.items():
    k1, k2 = k  # unpack key tuple
    nested[k1][k2] = v
# print out data in the desired format (with totals)
for k1, inner in nested.items():
    print("%s\tType\tCount" % k1)
    total = 0
    for k2, v in inner.items():
        print("\t%s\t%d" % (k2, v))
        total += v
    print("\tTotal\t%d" % total)
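A self-contained sketch that produces the report layout from the question. The sample counts are hypothetical, taken from the desired output rather than from real data, and `build_report` is an illustrative name; it groups the tuple keys by their first element and sorts both levels so the output order is stable.

```python
from collections import defaultdict

def build_report(counts):
    """Group {(name, type): count} by name and return the report as a string."""
    nested = defaultdict(dict)
    for (name, kind), count in counts.items():
        nested[name][kind] = count

    lines = []
    for name in sorted(nested):
        lines.append(f"Name: {name}")
        lines.append("    Type    Count")
        for kind in sorted(nested[name]):
            lines.append(f"    {kind:<8}{nested[name][kind]}")
        lines.append(f"    {'Total':<8}{sum(nested[name].values())}")
    return "\n".join(lines)

# Hypothetical counts matching the example output in the question
counts = {('A', 'B'): 4, ('A', 'C'): 2,
          ('B', 'A'): 3, ('B', 'B'): 2, ('B', 'C'): 3}
print(build_report(counts))
```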
I need to modify a dictionary. I have a dictionary with integer values and want to replace each value with the fraction of the total of all values, eg.:
census={a:4, b:1, c:3}; turnIntoFractions(census) should then print {a:0.5, b:0.125, c:0.375}
I was thinking something like:
def turnIntoFractions:
    L=d.keys()
    total=sum(L)
    F=[]
    for count in L:
        f.append(float(count/float(total))
    return F
I'm kind of stuck, and it isn't working..
You can use a dictionary comprehension:
def turnIntoFractions(d):
    total = float(sum(d.values()))
    return {key:(value/total) for key,value in d.items()}
Your first problem is that you are doing the sum of the keys, not the values:
total = sum(d.values())
Now, you can just modify the dictionary inline, instead of putting it into a new list:
for key in d.keys():
    d[key] /= total  # or d[key] = d[key] / total
The code above goes through each key, retrieves the value, divides it by total, and then stores the result back into d[key].
If you want a new dictionary returned, instead of just modifying the existing one, you can just start out with e = d.copy(), then use e instead.
You seem to want to edit the dict in place, but your code returns a new object, which is actually better practice.
def turnIntoFractions(mydict):
    values = mydict.values()
    total = float(sum(values))
    result = {}
    for key, val in mydict.items():
        result[key] = val / total
    return result
Your code has the right idea, but also a few small mistakes.
Here's working code:
def turnIntoFractions(d):
    L=d.values()
    total=sum(L)
    f=[]
    for count in L:
        f.append(float(count/float(total)))
    return f
census={'a':4, 'b':1, 'c':3}
print(turnIntoFractions(census))
Note that Python is case sensitive, so f is not the same as F, and keys that are strings need to be quoted. Also note that this version returns a list of fractions in value order, not a dictionary.
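Use a dictionary comprehension (naming the total total rather than sum avoids shadowing the built-in):
total = float(sum(census.values()))
newDict = {k : (v / total) for k,v in census.items()}
For Python 2.6, which has no dictionary comprehensions:
newDict = dict((k, v / total) for (k, v) in census.items())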
The following Python code modifies the dictionary in place, replacing each value with its fraction of the total as a float:
def turnIntoFractions(mydict):
    total = sum(mydict.values())
    for key in mydict:
        mydict[key] = float(mydict[key]) / total
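A minimal sketch that returns a new dictionary instead. As assumptions about the edge cases, it returns an empty dict for empty input and raises ValueError when the values sum to zero; the optional `exact` flag (a hypothetical parameter) uses `fractions.Fraction` when you want exact results such as 1/8 instead of floats, and it requires integer counts.

```python
from fractions import Fraction

def to_fractions(counts, exact=False):
    """Return a new dict mapping each key to its share of the total."""
    if not counts:
        return {}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("values sum to zero")
    if exact:
        # Fraction(numerator, denominator) needs integer (Rational) inputs
        return {k: Fraction(v, total) for k, v in counts.items()}
    return {k: v / total for k, v in counts.items()}

census = {'a': 4, 'b': 1, 'c': 3}
print(to_fractions(census))              # {'a': 0.5, 'b': 0.125, 'c': 0.375}
print(to_fractions(census, exact=True))  # {'a': Fraction(1, 2), 'b': Fraction(1, 8), 'c': Fraction(3, 8)}
```

The original dictionary is left untouched, which makes the function safe to call on data you still need.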