Summing keys and values in a list of dictionaries (Python)

I have a list of dictionaries called "timebucket" :
[{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
I would like to take the two largest keys (0.99... and 0.98...) and average them, and also average their two corresponding values.
The expected output would look something like:
{ (avg. two largest keys) : (avg. values of two largest keys) }
I've tried:
import numpy as np
import heapq
[np.mean(heapq.nlargest(2, i.keys())) for i in timebucket]
but heapq doesn't work in this scenario, and I'm not sure how to keep the keys and values linked.

Doing this with numpy:
In []:
a = np.array([e for i in timebucket for e in i.items()])
a[a[:,0].argsort()][-2:].mean(axis=0)  # sort rows by key, then average the two rows with the largest keys
Out[]:
array([ 0.99084129, 0.00261179])
Though I suspect creating a better data-structure up front would probably be a better approach.
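One possible shape for that "better data structure" is a flat list of (key, value) pairs, which also lets heapq.nlargest keep each key linked to its value. This is only a minimal sketch, assuming the keys are unique across the dictionaries; the names pairs, top_two, avg_key and avg_val are made up for illustration:

```python
import heapq

timebucket = [{0.9711533363722904: 0.008296776727415599},
              {0.97163564816067838: 0.008153794130319884},
              {0.99212783984967068: 0.0022392112909864364},
              {0.98955473263127025: 0.0029843621053514003}]

# Flatten the one-item dictionaries into (key, value) pairs
pairs = [(k, v) for d in timebucket for k, v in d.items()]

# Rank the pairs by key only; each value travels with its key
top_two = heapq.nlargest(2, pairs, key=lambda kv: kv[0])

avg_key = sum(k for k, _ in top_two) / len(top_two)
avg_val = sum(v for _, v in top_two) / len(top_two)
result = {avg_key: avg_val}
print(result)
```

Passing key=... is what keeps heapq usable here: the comparison looks at the key alone, and the value is carried along in the same tuple.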

This gives you the average of 2 largest keys (keyave) and the average of the two corresponding values (valave).
The keys and values are put into a dictionary called newdict.
timebucket = [{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
keys = []
for time in timebucket:
    for x in time:
        keys.append(x)
result = {}
for d in timebucket:
    result.update(d)
largestkey = (sorted(keys)[-1])
ndlargestkey = (sorted(keys)[-2])
keyave = (float((largestkey)+(ndlargestkey))/2)
largestvalue = (result[(largestkey)])
ndlargestvalue = (result[(ndlargestkey)])
valave = (float((largestvalue)+(ndlargestvalue))/2)
newdict = {}
newdict[keyave] = valave
print(newdict)
#print(keyave)
#print(valave)
Output
{0.9908412862404705: 0.002611786698168918}

Here is a solution to your problem:
def dothisthing(mydict):  # define the function with a dictionary as the only parameter
    keylist = []  # create an empty list
    for key in mydict:  # iterate over the input dictionary
        keylist.append(key)  # add each key from the dictionary to the list
    keylist.sort(reverse=True)  # sort the list from highest to lowest
    toptwokeys = 0  # running sum of the two largest keys
    toptwovals = 0  # running sum of their values
    count = 0  # number of keys used so far
    for item in keylist:  # iterate over the sorted keys
        if count < 2:  # this limits the sums to the first 2 keys
            toptwokeys += item  # add the key
            toptwovals += mydict[item]  # add the value
            count += 1
    finaldict = {(toptwokeys/2): (toptwovals/2)}  # key and value are the averages for the 2 greatest keys
    return finaldict  # return the output dictionary
# call the function with your data as a single dictionary (not a list of dictionaries)
print(dothisthing({0.9711533363722904: 0.008296776727415599, 0.97163564816067838: 0.008153794130319884, 0.99212783984967068: 0.0022392112909864364, 0.98955473263127025: 0.0029843621053514003}))
I hope it helps

You can do it in a few lines without importing numpy:
Short solution
For the average of the two largest keys:
max_keys_average=sorted([keys for item in timebucket for keys,values in item.items()])[::-1][:2]
print(sum(max_keys_average)/len(max_keys_average))
output:
0.9908412862404705
For the average of their values:
max_values_average=[values for item in max_keys_average for item_1 in timebucket for keys,values in item_1.items() if item==keys]
print(sum(max_values_average)/len(max_values_average))
output:
0.002611786698168918
If you have trouble following the list comprehensions, here is a detailed solution:
Detailed Solution
First step: get all the keys of the dictionaries into one list.
Here is your timebucket list:
timebucket=[{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
Now let's store all the keys in one list:
keys_list=[]
for d in timebucket:
    for key,value in d.items():
        keys_list.append(key)
The next step is to sort this list in descending order and take its first two items (the two largest keys):
max_keys=sorted(keys_list)[::-1][:2]
Then take the sum of this new list and divide it by the length of the list:
print(sum(max_keys)/len(max_keys))
output:
0.9908412862404705
Now iterate over max_keys and over the dictionaries in timebucket; whenever a key matches, append its value to a list:
max_values=[]
for item in max_keys:
    for d in timebucket:
        for key, value in d.items():
            if item==key:
                max_values.append(value)
print(max_values)
Finally, take the sum and divide it by the length of max_values:
print(sum(max_values)/len(max_values))
Gives the output :
0.002611786698168918

This is an alternative solution to the problem:
In []:
import numpy as np
import time
def AverageTB(time_bucket):
    # list() is needed on Python 3, where items() returns a set-like view rather than a list
    tuples = [list(tb.items()) for tb in time_bucket]
    largest_keys = []
    largest_keys.append(max(tuples))
    tuples.remove(max(tuples))
    largest_keys.append(max(tuples))
    keys = [i[0][0] for i in largest_keys]
    values = [i[0][1] for i in largest_keys]
    return np.average(keys), np.average(values)
time_bucket = [{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
time_exe = time.time()
print('avg. (keys, values): {}'.format(AverageTB(time_bucket)))
print('time: {}'.format(time.time() - time_exe))
Out[]:
avg. (keys, values): (0.99084128624047052, 0.0026117866981689181)
time: 0.00037789344787

Related

Rearranging a dictionary based on a function-condition over its items

(In relation to this question I posed a few days ago)
I have a dictionary whose keys are strings, and whose values are sets of integers, for example:
db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}
I would like to have a procedure that joins the keys whose values satisfy a certain generic condition given in an external function. The new item will therefore have as its key the union of both keys (the order is not important). The value will be determined by the condition itself.
For example: given this condition function:
def condition(kv1: tuple, kv2: tuple):
    key1, val1 = kv1
    key2, val2 = kv2
    union = val1 | val2 #just needed for the following line
    maxDif = max(union) - min(union)
    newVal = set()
    for i in range(maxDif):
        auxVal1 = {pos - i for pos in val2}
        auxVal2 = {pos + i for pos in val2}
        intersection1 = val1.intersection(auxVal1)
        intersection2 = val1.intersection(auxVal2)
        print(intersection1, intersection2)
        if (len(intersection1) >= 3):
            newVal.update(intersection1)
        if (len(intersection2) >= 3):
            newVal.update({pos - i for pos in intersection2})
    if len(newVal)==0:
        return False
    else:
        newKey = "".join(sorted(key1+key2))
        return newKey, newVal
That is, the satisfying pair of items have at least 3 numbers in their values at the same distance (difference) between them. As said, if satisfied, the resulting key is the union of the two keys. And for this particular example, the value is the (minimum) matching numbers in the original value sets.
How can I smartly apply a function like this to a dictionary like db? Given the aforementioned dictionary, the expected result would be:
result = {"ab":{1,2,3}, "cde":{0,3,2}, "d":{18}}
Your "condition" in this case is more than a mere condition. It is actually a merging rule that identifies which values to keep and which to drop. This may or may not allow a generalized approach, depending on how the patterns and merge rules vary.
Given this, each merge operation could leave values in the original keys that may be merged with some of the remaining keys. Multiple merges can also occur (e.g. key "cde"). In theory the merging process would need to cover a power set of all keys which may be impractical. Alternatively, this can be performed by successive refinements using pairings of (original and/or merged) keys.
The merge condition/function:
db = {"a":{1,2,3}, "b":{5,6,7}, "c":{2,5,4}, "d":{8,11,10,18}, "e":{0,3,2}}
from itertools import product
from collections import Counter
# Apply condition and return a keep-set and a remove-set
# the keep-set will be empty if the matching condition is not met
def merge(A,B,inverted=False):
    minMatch = 3
    distances = Counter(b-a for a,b in product(A,B) if b>=a)
    delta = [d for d,count in distances.items() if count>=minMatch]
    keep = {a for a in A if any(a+d in B for d in delta)}
    remove = {b for b in B if any(b-d in A for d in delta)}
    if len(keep)>=minMatch: return keep,remove
    return None,None
print( merge(db["a"],db["b"]) ) # ({1, 2, 3}, {5, 6, 7})
print( merge(db["e"],db["d"]) ) # ({0, 2, 3}, {8, 10, 11})
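To see why the first call keeps {1, 2, 3}, it may help to look at the distance counts that merge builds internally. The sketch below only repeats that one step for db["a"] and db["b"]; the variable names mirror the function above, and the threshold 3 stands in for minMatch:

```python
from collections import Counter
from itertools import product

A = {1, 2, 3}   # db["a"]
B = {5, 6, 7}   # db["b"]

# Count how often each non-negative difference b - a occurs over all pairs
distances = Counter(b - a for a, b in product(A, B) if b >= a)

# Keep only the differences shared by at least 3 pairs
delta = [d for d, count in distances.items() if count >= 3]

print(distances[4])  # 3, from the pairs (1, 5), (2, 6) and (3, 7)
print(delta)         # [4]
```

Only the distance 4 is shared by three pairs, so every element of A has a partner in B at that offset, which is why both sets are returned whole.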
Merge Process:
# combine dictionary keys using a merging function/condition
def combine(D,mergeFunction):
    result = { k:set(v) for k,v in D.items() } # start with copy of input
    merging = True
    while merging: # keep merging until no more merges are performed
        merging = False
        for a,b in product(*2*[list(result.keys())]): # all key pairs
            if a==b: continue
            if a not in result or b not in result: continue # keys still there?
            m,n = mergeFunction(result[a],result[b]) # call merge function
            if not m : continue # if merged ...
            mergedKey = "".join(sorted(set(a+b))) # combine keys
            result[mergedKey] = m # add merged set
            if mergedKey != a: result[a] -= m; merging = True # clean/clear
            if not result[a]: del result[a] # original sets,
            if mergedKey != b: result[b] -= n; merging = True # do more merges
            if not result[b]: del result[b]
    return result

In list of lists, how to find average of values associated with inner lists?

I have a list like this:
l=[['Alex',12],['John',14],['Ross',24],['Alex',42],['John',24],['Alex',45]]
How should I process this list to get an output like this:
l=[['Alex',33],['John',19],['Ross',24]]
which is basically the average of the scores for each name?
Use pandas to group by name and calculate mean (l is your list):
import pandas as pd
df = pd.DataFrame(l,columns=['name','value'])
l = df.groupby('name').value.mean().reset_index().values.tolist()
df:
name value
0 Alex 12
1 John 14
2 Ross 24
3 Alex 42
4 John 24
5 Alex 45
output:
[['Alex', 33.0], ['John', 19.0], ['Ross', 24.0]]
l = [['Alex',12],['John',14],['Ross',24],['Alex',42],['John',24],['Alex',45]]
score_dict = {}
for l_score in l:
    name = l_score[0]
    score = l_score[1]
    if name in score_dict.keys():
        score_dict[name].append(score)
    else:
        score_dict[name] = [score]
ret_list = []
for k, v in score_dict.items():
    sum_l = sum(v)
    len_l = len(v)
    if len_l > 0:
        avg = float(sum_l)/float(len_l)
    else:
        avg = 0
    ret_list.append([k,avg])
print(ret_list)
this should return the following list :
[['Ross', 24.0], ['Alex', 33.0], ['John', 19.0]]
I did not use any package as there were no imports in your code sample. It can be simplified with numpy or pandas
Let's simplify the problem by constructing a new dict in which each key is a name (the first element of an inner list) and each value holds the running total and the count of scores for that name. Since keys are unique in Python dicts, this is easy. Afterwards we generate a new list from the constructed dict, and that list is our answer.
TheOriginalList=[['Alex',12],['John',14],['Ross',24],['Alex',42],['John',24],['Alex',45]]
aux_dict = {}
for inner_list in TheOriginalList:
    if not aux_dict.get(inner_list[0],None): #_1_
        aux_dict[inner_list[0]]=[inner_list[1],1] #_2_
    else:
        aux_dict[inner_list[0]][0]+= inner_list[1] #_3_
        aux_dict[inner_list[0]][1]+= 1 #_4_
final_list = []
for k,v in aux_dict.items(): #_5_
    final_list.append([k,v[0]/v[1]]) #_6_
Explanations:
#1: we try to get the key, which is the person's name. If it already exists in the dict, we get its value, a list of two ints [accumulated_score, counter], and go to the else branch (#3). If it does not exist, we enter #2.
#2: we add the key (the person's name) to the dict and set its value to a new list of two items [current_score, 1]. The 1 is a counter for the first score; we need it later to calculate the average.
#3: we get here because this person already exists in the dict, so we add the current score to the accumulated score.
#4: we increment the counter by 1.
#5: we iterate over the dict's keys and values, so each iteration gives us the key (the person's name) and the value (a list of two items: the total score and the number of scores).
#6: we construct the final list by appending a new list of two items: the person's name (the current key) at index 0 and the average, v[0]/v[1], at index 1.
Keep in mind that this code can raise exceptions in some cases; consider using try-except.
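The same group-then-average idea can be written more compactly with the standard library. This is only a sketch, assuming every inner list is a [name, score] pair with a numeric score; scores, grouped and averages are illustrative names, not taken from the answers above:

```python
from collections import defaultdict
from statistics import mean

scores = [['Alex', 12], ['John', 14], ['Ross', 24],
          ['Alex', 42], ['John', 24], ['Alex', 45]]

# Collect every score under its name; missing names start as an empty list
grouped = defaultdict(list)
for name, score in scores:
    grouped[name].append(score)

# One [name, average] pair per name, in first-seen order (Python 3.7+)
averages = [[name, mean(vals)] for name, vals in grouped.items()]
print(averages)
```

Because defaultdict creates the empty list on first access, there is no need for the explicit "is this name already in the dict" branch.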

Average values in multiple dictionaries?

I have 4 dictionaries with the symbol as the key and the LTP as the value. I want to create a new dictionary with the symbol as the key and the average of the LTP across the 4 dictionaries as the value.
first = {"MRF":40000,"RELIANCE":1000}
second = {"MRF":50000,"RELIANCE":2000}
third = {"MRF":30000,"RELIANCE":500}
fourth = {"MRF":60000,"RELIANCE":4000}
new = {"MRF":45000,"RELIANCE":1875} # this is the average of ltp
Kindly assist me with a way to do it ?
We can get this using the mean function from the statistics library and a list comprehension.
Here is the code.
Note: this assumes that the keys in all dictionaries are the same.
Note: I am using Python 3.x for the code below.
from statistics import mean
first = {"MRF":40000,"RELIANCE":1000}
second = {"MRF":50000,"RELIANCE":2000}
third = {"MRF":30000,"RELIANCE":500}
fourth = {"MRF":60000,"RELIANCE":4000}
dictionaryList = [first,second,third,fourth]
new = {}
for key in first.keys():
    new[key] = mean([d[key] for d in dictionaryList ])
print(new)
It produces exactly the result you needed:
{'MRF': 45000, 'RELIANCE': 1875}
first = {"MRF":40000,"RELIANCE":1000}
second = {"MRF":50000,"RELIANCE":2000}
third = {"MRF":30000,"RELIANCE":500}
fourth = {"MRF":60000,"RELIANCE":4000}
dicts = [first, second, third, fourth]
keys = first.keys()
new = {k: sum((d[k] for d in dicts)) / len(dicts) for k in first.keys()}
print(new) ## {'MRF': 45000.0, 'RELIANCE': 1875.0}
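Both answers above rely on every dictionary having the same keys. If that cannot be guaranteed, one option is to collect the values per symbol first and average whatever is present. A hedged sketch: the dictionary fifth is hypothetical (it is not in the question) and is there only to show a missing and an extra key:

```python
from collections import defaultdict
from statistics import mean

first = {"MRF": 40000, "RELIANCE": 1000}
second = {"MRF": 50000, "RELIANCE": 2000}
third = {"MRF": 30000, "RELIANCE": 500}
fourth = {"MRF": 60000, "RELIANCE": 4000}
# Hypothetical fifth dictionary: no RELIANCE entry, plus an extra TCS entry
fifth = {"MRF": 70000, "TCS": 3000}

# Gather the LTP values seen for each symbol
collected = defaultdict(list)
for d in (first, second, third, fourth, fifth):
    for symbol, ltp in d.items():
        collected[symbol].append(ltp)

# Average over the dictionaries that actually contain the symbol
averages = {symbol: mean(ltps) for symbol, ltps in collected.items()}
print(averages)
```

Note the design choice: a symbol missing from one dictionary is simply left out of that symbol's average rather than counted as zero, which may or may not be what you want.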

if two people have same score how to return both names connected by "and"

I'm calculating the average score of people from a two-dimensional list, and I want to know how to return two people who have the same score connected by "and", e.g. name and name.
My code:
def bestAverage(inputDict):
    dic = {}
    for i in inputDict:
        if i[0] in dic.keys():
            dic[i[0]].append(int(i[1]))
        else:
            dic[i[0]] = [int(i[1])]
    totle_score = 0
    print(dic)
    for key, value, in dic.items():
        for c in value:
            totle_score += int(c)
        Q = len(value)
        avrage = totle_score / Q
        dic[key]= [avrage]
    print(dic)
My input:
inputDict = [ ["Diane", 20],["Bion",25],["Jack","30"],["Diane","50"] ]
result = bestAverage(inputDict)
OUTCOME:
{'Diane': [35.0], 'Bion': [95.0], 'Jack': [125.0]}
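The OUTCOME above shows 95.0 for Bion and 125.0 for Jack because totle_score is never reset between names, so each average also includes the scores of the people processed before. A sketch of one way to compute per-name averages, assuming each record is a [name, score] pair whose score may be an int or a numeric string (average_scores is a hypothetical name):

```python
def average_scores(records):
    """Return {name: average score}, computed independently for each name."""
    grouped = {}
    for name, score in records:
        # int() handles scores given as strings, such as "30"
        grouped.setdefault(name, []).append(int(score))
    # sum() starts from zero for every name, so nothing carries over
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


inputDict = [["Diane", 20], ["Bion", 25], ["Jack", "30"], ["Diane", "50"]]
averages = average_scores(inputDict)
print(averages)  # {'Diane': 35.0, 'Bion': 25.0, 'Jack': 30.0}
```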
Using the sorted dictionary, you can get the dictionary you want.
Sorry, I think my code is a bit complicated.
dic = {'Diane': [35.0],
'Bion': [95.0],
'Jack': [125.0],
'Diane_2': [35.0],
'Bion_2':[95],
'Diane_3':[35.0],
'John':[10]}
import operator
# sort by score (not by name) so that equal scores end up next to each other
sorted_dic = sorted(dic.items(), key=operator.itemgetter(1))
new_dic = dict()
preKey = sorted_dic[0][0]
preValue = sorted_dic[0][1]
nms = preKey
for key,value in sorted_dic[1:]:
    if(value == preValue):
        nms += ' and ' + key
    else:
        new_dic[nms] = preValue
        preKey = key
        preValue = value
        nms = preKey
new_dic[nms] = preValue
print(new_dic)
OUTCOME (Python 3.7+, where dicts keep insertion order):
{'John': [10], 'Diane and Diane_2 and Diane_3': [35.0], 'Bion and Bion_2': [95.0], 'Jack': [125.0]}
Per the OP's question in the comments, this example now produces a final structure containing entries only for those scores that are shared by more than one person.
data = {'Diane': [35.0], 'Bion': [95.0], 'Jack': [125.0], 'Sam': [95.0]}
# Here, we create a dict of lists, where the keys are the scores, and the values
# are the names of each person who has that score. This will produce:
#
# {
#   35.0: ['Diane'],
#   95.0: ['Bion', 'Sam'],
#   125.0: ['Jack']
# }
collected = {}
# For each key (name) in the input dict...
for name in data:
    # Get the score value out of the array for this name
    val = data[name][0]
    # If we don't have an entry in our new dict for this score (no key in the dict of that
    # score value) then add that entry as the score for the key and an empty array for the value
    if val not in collected:
        collected[val] = []
    # Now that we're sure we have an entry for the score of the name we're processing, add
    # the name to the array for that score in the new dict
    collected[val].append(name)
# Now we just "flip" each entry in the 'collected' map to create a new dict. We create
# one entry in this dict for each entry in the 'collected' map, where each key is a
# single string where we've combined all of the names with the same score, separated
# by 'and', and each value is the score that those names had.
result = {}
# Now iterate over each of our keys, the unique scores, in our new 'collected' dict...
for val in collected:
    # We only want to create an entry in the new dict if the entry we're processing has more than
    # just one name in the list of names. So here, we check for that, and skip adding an entry to
    # the new dict if there is only one name in the list
    if len(collected[val]) == 1:
        continue
    # Combine the value of this entry, the list of names with a particular score, into a single string
    combinedNames = " and ".join(collected[val])
    # Add an entry to our 'result' dict with this combined name as the key and the score as the value
    result[combinedNames] = val
# Print each combined name string from the resulting structure
for names in result:
    print(names)
Output:
Bion and Sam
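If the goal is only the best average (the asker's function name bestAverage suggests this, but that is an assumption), the same idea can be narrowed to the top score. A minimal sketch, with best_names as a hypothetical helper that takes a plain name-to-score mapping rather than the one-element lists used above:

```python
def best_names(averages):
    """Return (names joined by ' and ', top score) for a {name: score} dict."""
    top = max(averages.values())
    # Every name whose score equals the maximum, in insertion order
    winners = [name for name, score in averages.items() if score == top]
    return " and ".join(winners), top


names, score = best_names({'Diane': 35.0, 'Bion': 95.0, 'Sam': 95.0})
print(names, score)  # Bion and Sam 95.0
```

With a single winner, join returns just that name, so no special case is needed.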

Python: Sum entries in list of tuples entries with case sensitive keys?

I have a list of tuples holding hashtags and frequencies for example:
[('#Example', 92002),
('#example', 65544)]
I want to sum the entries whose first element is the same string apart from letter case, keeping the spelling of the entry with the highest frequency. The above would be transformed to:
[('#Example', 157546)]
I've tried this so far:
import operator
res = []
for hashtag in hashtag_freq_list:
    if hashtag[0].lower() not in [res_entry[0].lower() for res_entry in res]:
        entries = [entry for entry in hashtag_freq_list if hashtag[0].lower() == entry[0].lower()]
        k = max(entries,key=operator.itemgetter(1))[0]
        v = sum([entry[1] for entry in entries])
        res.append((k,v))
I was just wondering if this could be approached in a more elegant way?
I would use a dictionary:
data = [('#example', 65544),('#Example', 92002)]
hashtable = {}
for i in data:
    # See if this thing exists regardless of casing
    if i[0].lower() not in hashtable:
        # Create a dictionary
        hashtable[i[0].lower()] = {
            'meta':'',
            'value':[]
        }
        # Copy the relevant information
        hashtable[i[0].lower()]['value'].append(i[1])
        hashtable[i[0].lower()]['meta'] = i[0]
    # If the value exists
    else:
        # Check if the number it holds is the max against
        # what was collected so far. If so, change meta
        if i[1] > max(hashtable[i[0].lower()]['value']):
            hashtable[i[0].lower()]['meta'] = i[0]
        # Append the value regardless
        hashtable[i[0].lower()]['value'].append(i[1])
# For output purposes
myList = []
# Build the tuples
for node in hashtable:
    myList.append((hashtable[node]['meta'],sum(hashtable[node]['value'])))
# Voila!
print(myList)
# [('#Example', 157546)]
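The same bookkeeping can be done in one pass while storing only three items per hashtag instead of the full list of counts. This is a sketch under the same assumptions as the answer above (a list of (hashtag, count) tuples); merge_case_insensitive is a hypothetical name:

```python
def merge_case_insensitive(pairs):
    """Sum counts of tags that differ only by case; keep the most frequent spelling."""
    merged = {}  # lowercased tag -> [best spelling, best single count, running total]
    for tag, count in pairs:
        entry = merged.get(tag.lower())
        if entry is None:
            merged[tag.lower()] = [tag, count, count]
        else:
            # Switch spelling only if this variant has a higher count on its own
            if count > entry[1]:
                entry[0], entry[1] = tag, count
            entry[2] += count
    return [(spelling, total) for spelling, _, total in merged.values()]


data = [('#example', 65544), ('#Example', 92002)]
print(merge_case_insensitive(data))  # [('#Example', 157546)]
```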
