Sum dictionary items based on rank - python

I can sum items in a list of dicts per key like so:
import collections
import functools

dict(
    functools.reduce(
        lambda x, y: x.update(y) or x,
        dict1,
        collections.Counter())
)
But given that
dict1 = [{'ledecky': 1, 'king': 2, 'vollmer': 3},
{'ledecky': 1, 'vollmer': 2, 'king': 3},
{'schmitt': 1, 'ledecky': 2, 'vollmer': 3}]
how could I sum their values according to medal value, given that:
medal_value = {1: 10.0, 2: 5.0, 3: 3.0}
Such that the final dict would yield:
{'ledecky': 25.0, 'king': 8.0, 'vollmer': 11.0, 'schmitt': 10.0}

The dictionary get() method works well here: for each name, we look up its running total (defaulting to 0 if the name has not been seen yet) and add the weighted value, using the rank stored in dict1 as the key into medal_value.
def calculate_points(results, medal_value):
    d = {}
    for item in results:
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        for key, value in item.items():
            d[key] = d.get(key, 0) + medal_value[value]
    return d
Sample output:
dict1 = [{'ledecky': 1, 'king': 2, 'vollmer': 3},
         {'ledecky': 1, 'vollmer': 2, 'king': 3},
         {'schmitt': 1, 'ledecky': 2, 'vollmer': 3}]
medal_value = {1: 10.0, 2: 5.0, 3: 3.0}
print(calculate_points(dict1, medal_value))
# {'ledecky': 25.0, 'king': 8.0, 'vollmer': 11.0, 'schmitt': 10.0}

Just define a lookup function to transform the original dict to a medal values dict:
def lookup(d):
    return {k: medal_value[v] for k, v in d.items()}
And apply this function to your update part of the expression:
dict(
    functools.reduce(
        lambda x, y: x.update(lookup(y)) or x,
        dict1,
        collections.Counter())
)
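For reference, here is a self-contained sketch of the reduce approach above. It relies on the fact that Counter.update() adds values rather than replacing them. The names results, add_scores and totals are chosen for illustration and are not from the original answer.

```python
import collections
import functools

results = [{'ledecky': 1, 'king': 2, 'vollmer': 3},
           {'ledecky': 1, 'vollmer': 2, 'king': 3},
           {'schmitt': 1, 'ledecky': 2, 'vollmer': 3}]
medal_value = {1: 10.0, 2: 5.0, 3: 3.0}


def lookup(d):
    # Translate each rank into its medal value.
    return {name: medal_value[rank] for name, rank in d.items()}


def add_scores(acc, d):
    # Counter.update() adds to existing values instead of overwriting them.
    acc.update(lookup(d))
    return acc


totals = dict(functools.reduce(add_scores, results, collections.Counter()))
print(totals)
# {'ledecky': 25.0, 'king': 8.0, 'vollmer': 11.0, 'schmitt': 10.0}
```

A named helper replaces the `x.update(y) or x` lambda trick only for readability; the behavior is the same.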

Related

How can I map and reduce my list of dictionaries with Python

I have this list of dictionaries:
[{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
I would like to map and reduce (or group) to have a result like this:
[
{
'topic_id': 1,
'count': 2,
'variance': 3.0,
'global_average': 6.5
},
{
'topic_id': 2,
'count': 1,
'variance': 5.0,
'global_average': 5.0
}
]
Something that calculates the variance (max average - min average) and also sums the count of items.
What I have already done:
At first I just tried to sum the counts by changing the structure of the dictionary so that the key is the topic_id and the value is the count. My result was:
result = sorted(dict(functools.reduce(operator.add, map(collections.Counter, data))).items(), reverse=True)
This was just a first try.
You could achieve this with some comprehensions, a map, and the mean function from the standard-library statistics module.
from statistics import mean

data = [
    {
        'topic_id': 1,
        'average': 5.0,
        'count': 1
    }, {
        'topic_id': 1,
        'average': 8.0,
        'count': 1
    }, {
        'topic_id': 2,
        'average': 5.0,
        'count': 1
    }
]
# a set of unique topic_id's
keys = set(i['topic_id'] for i in data)
# a list of lists of averages, one per topic_id
averages = [[i['average'] for i in data if i['topic_id'] == j] for j in keys]
# a map of tuples of (count, variance, average) for each topic_id
stats = map(lambda x: (len(x), max(x) - min(x), mean(x)), averages)
# finally reconstruct it back into a list
result = [
    {
        'topic_id': key,
        'count': count,
        'variance': variance,
        'global_average': average
    } for key, (count, variance, average) in zip(keys, stats)
]
print(result)
Returns
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5}, {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
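Note that the answer above reports len(x) as the count, which matches the expected output only because every 'count' in the sample is 1. Here is a minimal sketch that sums the 'count' field instead, assuming that is what "sum the count of items" means. The names groups and rows are illustrative.

```python
from collections import defaultdict
from statistics import mean

data = [
    {'topic_id': 1, 'average': 5.0, 'count': 1},
    {'topic_id': 1, 'average': 8.0, 'count': 1},
    {'topic_id': 2, 'average': 5.0, 'count': 1},
]

# Collect the rows for each topic_id in a single pass.
groups = defaultdict(list)
for row in data:
    groups[row['topic_id']].append(row)

result = [
    {
        'topic_id': topic_id,
        # Sum the 'count' field rather than counting rows.
        'count': sum(r['count'] for r in rows),
        'variance': max(r['average'] for r in rows) - min(r['average'] for r in rows),
        'global_average': mean(r['average'] for r in rows),
    }
    for topic_id, rows in groups.items()
]
print(result)
```

This also avoids rescanning data once per topic_id, which the nested comprehension does.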
Here is an attempt using itertools.groupby to group the data based on the topic_id:
import itertools

data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# groupby only groups consecutive items, so sort by topic_id first
by_topic = lambda x: x['topic_id']
grouper = itertools.groupby(sorted(data, key=by_topic), key=by_topic)
# holder for output
output = []
# iterate over grouper to calculate things
for key, group in grouper:
    # variables for calculations
    count = 0
    maxi = float('-inf')
    mini = float('inf')
    total = 0
    # one pass over each dictionary
    for g in group:
        avg = g['average']
        maxi = avg if avg > maxi else maxi
        mini = avg if avg < mini else mini
        total += avg
        count += 1
    # write to output
    output.append({'topic_id': key,
                   'count': count,
                   'variance': maxi - mini,
                   'global_average': total / count})
Giving this output:
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5},
 {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
Note that the 'variance' for the second group is 0.0 here instead of 5.0; this is different from your expected output, but I would guess this is what you want?
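One caveat with itertools.groupby is that it only groups consecutive items with equal keys, so the input has to be sorted by the grouping key. A small sketch of the difference, using a reordered copy of the sample data (the reordering is an assumption made for illustration):

```python
import itertools
from operator import itemgetter

# Same rows as the sample data, but with the two topic_id 1 rows separated.
data = [
    {'topic_id': 1, 'average': 5.0, 'count': 1},
    {'topic_id': 2, 'average': 5.0, 'count': 1},
    {'topic_id': 1, 'average': 8.0, 'count': 1},
]
by_topic = itemgetter('topic_id')

# Without sorting, topic_id 1 shows up as two separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(data, key=by_topic)]

# Sorting first brings equal keys together.
sorted_keys = [key for key, _ in
               itertools.groupby(sorted(data, key=by_topic), key=by_topic)]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```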
If you are willing to use pandas, this seems like an appropriate use case:
import pandas as pd
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# move to dataframe
df = pd.DataFrame(data)
# groupby and get all desired metrics
grouped = df.groupby('topic_id')['average'].describe()
grouped['variance'] = grouped['max'] - grouped['min']
# rename columns and remove unneeded ones
grouped = grouped.reset_index().loc[:, ['topic_id', 'count', 'mean', 'variance']].rename({'mean':'global_average'}, axis=1)
# back to list of dicts
output = grouped.to_dict('records')
output is:
[{'topic_id': 1, 'count': 2.0, 'global_average': 6.5, 'variance': 3.0},
{'topic_id': 2, 'count': 1.0, 'global_average': 5.0, 'variance': 0.0}]
You can also use the agg functionality of pandas DataFrames like this:
import pandas as pd

f = pd.DataFrame(data).set_index('topic_id')

def var(x):
    return x.max() - x.min()

out = f.groupby(level=0).agg(count=('count', 'sum'),
                             global_average=('average', 'mean'),
                             variance=('average', var))

Group all keys with the same value in a dictionary of sets

I am trying to transform a dictionary whose values are sets, some of them duplicated, into a dictionary in which each unique set appears once and the keys that shared it are joined together.
dic = {'a': {1, 2, 3}, 'b': {1, 2}, 'c': {1, 3, 2}, 'd': {1, 2, 3}}
Should be changed to
{'a-c-d': {1, 2, 3}, 'b': {1, 2}}
My attempt is below, but I think there has to be a better way.
def transform_dictionary(dic: dict) -> dict:
    dic = {k: frozenset(v) for k, v in dic.items()}
    key_list = list(dic.keys())
    value_list = list(dic.values())
    dict_transformed = {}
    for v_unique in set(value_list):
        sub_key_list = []
        for i, v in enumerate(value_list):
            if v == v_unique:
                sub_key_list.append(str(key_list[i]))
        dict_transformed['-'.join(sub_key_list)] = set(v_unique)
    return dict_transformed

print(transform_dictionary(dic))
You can "invert" the input dictionary into a dictionary mapping each frozenset to the list of keys that share it.
import collections

dic = {'a': {1, 2, 3}, 'b': {1, 2}, 'c': {1, 3, 2}, 'd': {1, 2, 3}}
keys_per_set = collections.defaultdict(list)
for key, value in dic.items():
    keys_per_set[frozenset(value)].append(key)
Then invert that dictionary mapping back into the desired form:
{'-'.join(keys): value for (value, keys) in keys_per_set.items()}
Output:
{'a-c-d': frozenset({1, 2, 3}), 'b': frozenset({1, 2})}
This will turn the values into frozensets, but you could "thaw" them with a set(value) in the final dict comprehension.
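Putting the two steps together, a sketch of the whole transformation might look like this. The function name group_keys_by_value is made up for illustration, and str(key) is an assumption so that non-string keys can be joined too.

```python
from collections import defaultdict


def group_keys_by_value(dic):
    """Join the keys that share the same set, keeping one copy of each set."""
    keys_per_set = defaultdict(list)
    for key, value in dic.items():
        # frozenset is hashable, so it can serve as a dictionary key.
        keys_per_set[frozenset(value)].append(str(key))
    # Invert back, thawing each frozenset into a regular set.
    return {'-'.join(keys): set(value) for value, keys in keys_per_set.items()}


dic = {'a': {1, 2, 3}, 'b': {1, 2}, 'c': {1, 3, 2}, 'd': {1, 2, 3}}
print(group_keys_by_value(dic))
```

Keys are joined in the order they appear in the input dictionary, so this yields 'a-c-d' for the sample.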
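Alternatively, using itertools.groupby:
from itertools import groupby

# Sort and group on the sorted elements of each set. Comparing sets directly
# only tests for subsets, which is not a reliable sort order.
sort_key = lambda k: sorted(dic[k])
dic_output = {'-'.join(keys): set(elems)
              for elems, keys in groupby(sorted(dic, key=sort_key), key=sort_key)}
Output
{'b': {1, 2}, 'a-c-d': {1, 2, 3}}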

Remove duplicate dictionaries from a list and subtract value of keys of duplicate element

I have a list of dicts, and I'd like to merge the dicts that share a key into one by subtracting their values.
For this list:
[{'chair': 4}, {'tv': 5}, {'chair': 3}, {'tv': 2}, {'laptop': 2}]
I'd like to return this:
[{'chair': 1}, {'tv': 3}, {'laptop': 2}]
You could do it like this, creating an intermediate dict for efficiency:
dicts_list = [{'chair': 4}, {'tv': 5}, {'chair': 3}, {'tv': 2}, {'laptop': 2}]
out = {}
for d in dicts_list:
    for key, val in d.items():
        if key in out:
            out[key] -= val
        else:
            out[key] = val
out_list = [{key: val} for key, val in out.items()]
print(out_list)
# [{'chair': 1}, {'tv': 3}, {'laptop': 2}]
But you might be interested in this intermediate dict as output:
print(out)
# {'chair': 1, 'tv': 3, 'laptop': 2}
defaultdict from collections might come in handy. This solution also covers cases where more than two dicts in the list share the same key.
from collections import defaultdict

ls = defaultdict(list)
d = [{'chair': 4}, {'tv': 5}, {'chair': 3}, {'tv': 2}, {'laptop': 2}]
# Creating a list of all values under one key
for dic in d:
    for k in dic:
        ls[k].append(dic[k])
print(ls)
# defaultdict(<class 'list'>, {'chair': [4, 3], 'tv': [5, 2], 'laptop': [2]})
# Guard against negative results: subtract the smaller values from the largest
for k in ls:
    ls[k].sort(reverse=True)
    ls[k] = ls[k][0] - sum(ls[k][1:])
print(ls)
# defaultdict(<class 'list'>, {'chair': 1, 'tv': 3, 'laptop': 2})
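The question does not say which value should be subtracted from which, and the two readings only agree when the larger value comes first. A sketch contrasting them, with hypothetical function names and a reordered sample chosen for illustration:

```python
def subtract_in_order(dicts):
    """Subtract later values from the first value seen for each key."""
    totals = {}
    for d in dicts:
        for key, value in d.items():
            totals[key] = totals[key] - value if key in totals else value
    return totals


def subtract_from_largest(dicts):
    """Subtract all other values from the largest value for each key."""
    grouped = {}
    for d in dicts:
        for key, value in d.items():
            grouped.setdefault(key, []).append(value)
    return {key: max(values) - (sum(values) - max(values))
            for key, values in grouped.items()}


# Here the smaller 'chair' value comes first, so the two readings differ.
sample = [{'chair': 3}, {'tv': 5}, {'chair': 4}, {'tv': 2}]
print(subtract_in_order(sample))      # {'chair': -1, 'tv': 3}
print(subtract_from_largest(sample))  # {'chair': 1, 'tv': 3}
```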
You can construct a defaultdict of lists, then use a list comprehension:
from collections import defaultdict

data = [{'chair': 4}, {'tv': 5}, {'chair': 3}, {'tv': 2}, {'laptop': 2}]
dd = defaultdict(list)
for d in data:
    k, v = next(iter(d.items()))
    dd[k].append(v)
# sum(v[1:]) is 0 for a single value, so no special case is needed
res = [{k: v[0] - sum(v[1:])} for k, v in dd.items()]
print(res)
# [{'chair': 1}, {'tv': 3}, {'laptop': 2}]
The following snippet uses only built-in types, with no imports:
a = [{'chair': 4}, {'tv': 5}, {'chair': 3}, {'tv': 2}, {'laptop': 2}]
print("Input:", a)
b = dict()
for element in a:
    for k, v in element.items():
        try:
            # you didn't specify the order of subtraction, so I'm taking
            # the absolute difference with abs() :)
            b[k] = abs(b[k] - v)
        except KeyError:
            # first time this key is seen
            b[k] = v
print("Output:", b)
# restore original structure
c = [{k: v} for k, v in b.items()]
print("Output:", c)
And demo:
Input: [{'chair': 4}, {'tv': 5}, {'chair': 3}, {'tv': 2}, {'laptop': 2}]
Output: {'chair': 1, 'tv': 3, 'laptop': 2}
Output: [{'chair': 1}, {'tv': 3}, {'laptop': 2}]
EDIT: Added the secondary output c to restructure b like a.

How to count unique key elements in a tuple in a defaultdict (python)?

I have the following dictionary, keys being tuples:
defaultdict(<class 'float'>, {('abc', 'xyz'): 1.0, ('abc', 'def'): 3.0, ('abc', 'pqr'): 1.0, ('pqr', 'xyz'): 1.0, ('pqr', 'def'): 1.0})
How do I count up the first key element and second key element,
so that I can get:
defaultdict(<class 'float'>, {'abc': 3.0, 'pqr': 2.0})
and
defaultdict(<class 'float'>, {'xyz': 2.0, 'def': 2.0, 'pqr': 1.0})
I am ignoring the values in the original dictionary and just counting up unique keys (first and second separately).
I want to do something like the following, but I get an error "'tuple' object has no attribute 'items'":
first_key_list =[j[0][0] for i in dictionary for j in i.items()]
new_dict = collections.defaultdict(float)
for i in first_key_list:
    new_dict[i] += 1
You're on the right track with your approach. But I'd recommend using a Counter object if you want to count things.
from collections import Counter
c1 = Counter(k[0] for k in d.keys())
c2 = Counter(k[1] for k in d.keys())
Truthfully, d.keys() is redundant here, since iteration is over the keys by default.
c1
Counter({'abc': 3, 'pqr': 2})
c2
Counter({'xyz': 2, 'def': 2, 'pqr': 1})
for i in dictionary for j in i.items() doesn't work because the outer loop yields the dictionary keys (the tuples), and tuples have no items() method.
Anyway, it seems that you're ignoring the values of your dictionary. Just use collections.Counter on the first part of the key:
d = {('abc', 'xyz'): 1.0, ('abc', 'def'): 3.0, ('abc', 'pqr'): 1.0, ('pqr', 'xyz'): 1.0, ('pqr', 'def'): 1.0}
import collections
d1 = collections.Counter(k[0] for k in d)
print(d1)
result:
Counter({'abc': 3, 'pqr': 2})
If you want floats, I suggest converting to float after counting, to avoid floating-point inaccuracy:
{k: float(v) for k, v in d1.items()}
Or in one line:
d1 = {k: float(v) for k, v in collections.Counter(k[0] for k in d).items()}
To keep keys as tuples:
d1 = {(k,): float(v) for k, v in collections.Counter(k[0] for k in d).items()}
For the second part, just use k[1] instead.
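As a compact sketch, both positions can be counted and converted to floats in one go. The names first, second and as_floats are illustrative only.

```python
from collections import Counter

d = {('abc', 'xyz'): 1.0, ('abc', 'def'): 3.0, ('abc', 'pqr'): 1.0,
     ('pqr', 'xyz'): 1.0, ('pqr', 'def'): 1.0}

# Iterating over a dict yields its keys, here the tuples.
first = Counter(k[0] for k in d)
second = Counter(k[1] for k in d)


def as_floats(counter):
    # Count with integers first, then convert once at the end.
    return {key: float(count) for key, count in counter.items()}


print(as_floats(first))   # {'abc': 3.0, 'pqr': 2.0}
print(as_floats(second))  # {'xyz': 2.0, 'def': 2.0, 'pqr': 1.0}
```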

finding probability of values in dictionary

I have a default dict which looks like this:
my_dict = defaultdict(dict, {"K": {"k": 2, "x": 1.0}, "S": {"_":1.0, "s":1}, "EH": {"e":1.0}})
The keys are phonemes, and each value is a dictionary mapping graphemes to the number of times each one occurs.
The function should return another default dict containing the probabilities, which will look like this:
defaultdict(<class 'dict'>, {'EH': {'e': 1.0}, 'K': {'k': 0.6666666666666666, 'x': 0.3333333333333333}, 'S': {'_': 0.5, 's': 0.5}})
'e' remains the same, as 1.0/1 = 1.0. 'K' has values of 0.66666 and 0.33333 because 2/3 = 0.66666 and 1/3 = 0.3333333. 'S' has values of 0.5 and 0.5, because 1/2=0.5 for each of them. The probabilities in the return dict must always sum to one.
So far I have this:
from collections import defaultdict

my_dict = defaultdict(dict, {"K": {"k": 2, "x": 1.0}, "S": {"_":1.0, "s":1}, "EH": {"e":1.0}})

def dict_probability(my_dict):
    return_dict = defaultdict(dict)
    for char in my_dict.values():
For each of your sub-dictionaries, you would like to divide each value by the sum of that sub-dictionary's values:
my_dict = {"K": {"k": 2, "x": 1.0}, "S": {"_":1.0, "s":1}, "EH": {"e":1.0}}
{k: {k1: v1 / sum(v.values()) for k1, v1 in v.items()} for k, v in my_dict.items()}
{'EH': {'e': 1.0},
'K': {'k': 0.6666666666666666, 'x': 0.3333333333333333},
'S': {'_': 0.5, 's': 0.5}}
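Since the question asks for a function that returns a defaultdict, the same normalization could be wrapped up as in the following sketch. It assumes Python 3 (true division) and that every sub-dictionary has a non-zero total. The parameter and variable names are illustrative.

```python
from collections import defaultdict


def dict_probability(counts):
    """Turn per-phoneme grapheme counts into per-phoneme probabilities."""
    probabilities = defaultdict(dict)
    for phoneme, graphemes in counts.items():
        # Compute the total once per phoneme instead of once per grapheme.
        total = sum(graphemes.values())
        for grapheme, count in graphemes.items():
            probabilities[phoneme][grapheme] = count / total
    return probabilities


my_dict = defaultdict(dict, {"K": {"k": 2, "x": 1.0},
                             "S": {"_": 1.0, "s": 1},
                             "EH": {"e": 1.0}})
probs = dict_probability(my_dict)
print(dict(probs))
```

Each sub-dictionary of the result sums to one, up to floating-point rounding.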
The same idea applied to a flat dictionary:
example_dict = {"A": 1, "B": 2, "C": 3}
total = sum(example_dict.values())
prob_dict = {}
for k, v in example_dict.items():
    prob_dict[k] = v / total
print(prob_dict)
# {'A': 0.16666666666666666, 'B': 0.3333333333333333, 'C': 0.5}
