I have the following nested dictionary, where each key is a resource ID (there are more than 100,000 IDs in total):
dict = {1: {'age':1,'cost':14,'score':0.3},
2: {'age':1,'cost':9,'score':0.5},
...}
I want to add to each resource the sum of the costs of all resources with a lower score than that resource. I can add a 'sum_cost' key initialised to 0 with the following code:
for id in dict:
    dict[id]['sum_cost'] = 0
This gives me the following:
dict = {1: {'age':1,'cost':14,'score':0.3, 'sum_cost':0},
2: {'age':1,'cost':9,'score':0.5,'sum_cost':0},
...}
Now I would like to use, ideally, a for loop (to keep the code easily readable) to assign to each sum_cost the sum of the costs of the IDs with a lower score than the given ID.
The ideal output is a dictionary where the 'sum_cost' of each ID equals the total cost of the IDs with a lower score than that ID:
dict = {1: {'age':1,'cost':14,'score':0.3, 'sum_cost':0},
2: {'age':1,'cost':9,'score':0.5,'sum_cost':21},
3: {'age':13,'cost':7,'score':0.4,'sum_cost':14}}
Is there a way to do this?
Notes:
Use sorted() to order the dictionary items by the 'score' key,
the dictionary get() method to read the values,
and a temporary variable to accumulate the running total for sum_cost.
Code:
dicts = {1: {'age': 1, 'cost': 14, 'score': 0.3, 'sum_cost': 0},
2: {'age': 1, 'cost': 9, 'score': 0.5, 'sum_cost': 0},
3: {'age': 13, 'cost': 7, 'score': 0.4, 'sum_cost': 0}}
sum_addition = 0
for key, values in sorted(dicts.items(), key=lambda x: x[1].get('score', None)):
    if dicts[key].get('score') is not None: #By default gives None when key is not available
        dicts[key]['sum_cost'] = sum_addition
        sum_addition += dicts[key]['cost']
        print key, dicts[key]
An even simpler version, following the advice of @BernarditoLuis and @Kevin Guan:
Code2:
dicts = {1: {'age': 1, 'cost': 14, 'score': 0.3, 'sum_cost': 0},
2: {'age': 1, 'cost': 9, 'score': 0.5, 'sum_cost': 0},
3: {'age': 13, 'cost': 7, 'score': 0.4, 'sum_cost': 0}}
sum_addition = 0
for key, values in sorted(dicts.items(), key=lambda x: x[1].get('score', None)):
    if dicts[key].get('score'): # get() returns None when the key is missing; a score of 0 is skipped as well
        dicts[key]['sum_cost'] = sum_addition
        sum_addition += dicts[key]['cost']
        print key, dicts[key]
Output:
1 {'sum_cost': 0, 'age': 1, 'cost': 14, 'score': 0.3}
3 {'sum_cost': 14, 'age': 13, 'cost': 7, 'score': 0.4}
2 {'sum_cost': 21, 'age': 1, 'cost': 9, 'score': 0.5}
What about using OrderedDict?
from collections import OrderedDict
origin_dict = {
1: {'age':1,'cost':14,'score':0.3},
2: {'age':1,'cost':9,'score':0.5},
3: {'age':1,'cost':8,'score':0.45}
}
# sort by score
sorted_dict = OrderedDict(sorted(origin_dict.items(), key=lambda x: x[1]['score']))
# now all you have to do is to count sum_cost successively starting from 0
sum_cost = 0
for key, value in sorted_dict.items():
    value['sum_cost'] = sum_cost
    sum_cost += value['cost']
print(sorted_dict)
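One caveat with the running-total loops above: resources that share the same score see each other's cost in the total, even though neither has a strictly lower score. Below is a minimal Python 3 sketch that handles such ties by processing equal scores together. It assumes every resource has both a 'score' and a 'cost' key; the function name add_sum_cost and the fourth sample resource are made up for illustration.

```python
from itertools import groupby


def add_sum_cost(resources):
    """Set 'sum_cost' on each resource to the total cost of all resources
    with a strictly lower score. Mutates and returns `resources`."""
    by_score = sorted(resources.items(), key=lambda item: item[1]['score'])
    running_total = 0
    # Handle all resources with the same score as one batch,
    # so that tied resources do not count each other's cost.
    for _, tied in groupby(by_score, key=lambda item: item[1]['score']):
        tied = list(tied)
        for _, resource in tied:
            resource['sum_cost'] = running_total
        running_total += sum(resource['cost'] for _, resource in tied)
    return resources


resources = {
    1: {'age': 1, 'cost': 14, 'score': 0.3},
    2: {'age': 1, 'cost': 9, 'score': 0.5},
    3: {'age': 13, 'cost': 7, 'score': 0.4},
    4: {'age': 2, 'cost': 5, 'score': 0.4},  # hypothetical tie with ID 3
}
add_sum_cost(resources)
for resource_id, resource in resources.items():
    print(resource_id, resource['sum_cost'])
```

The work is dominated by the single sort, so this should remain practical for a dictionary with more than 100,000 IDs.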
Related
I have several lists of dictionaries, where each dictionary contains a unique id value that is common among all lists. I'd like to combine them into a single list of dicts, where each dict is joined on that id value.
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
I tried doing something like the answer found at https://stackoverflow.com/a/42018660/7564393, but I'm getting very confused since I have more than 2 lists. Should I try using a defaultdict approach? More importantly, I am NOT always going to know the other values, only that the id value is present in all dicts.
You can use itertools.groupby():
from itertools import groupby
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = []
for _, values in groupby(sorted([*list1, *list2, *list3], key=lambda x: x['id']), key=lambda x: x['id']):
    temp = {}
    for d in values:
        temp.update(d)
    desired_output.append(temp)
Result:
[{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
# combine all lists
d = {} # id -> dict
for l in [list1, list2, list3]:
    for list_d in l:
        if 'id' not in list_d:
            continue
        id = list_d['id']
        if id not in d:
            d[id] = list_d
        else:
            d[id].update(list_d)
# dicts with same id are grouped together since id is used as key
res = [v for v in d.values()]
print(res)
You can first build a dict of dicts, then turn it into a list:
from itertools import chain
from collections import defaultdict
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
dict_out = defaultdict(dict)
for d in chain(list1, list2, list3):
    dict_out[d['id']].update(d)
out = list(dict_out.values())
print(out)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
itertools.chain lets you iterate over all the dicts contained in the 3 lists. We build a dict, dict_out, with the id as key and the merged dict under construction as value. This way, we can easily update the part already built with the small dict from the current iteration.
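The same idea can be wrapped in a small helper that accepts any number of lists. This is only a sketch: merge_by_id is a hypothetical name, and it assumes every dict carries the join key.

```python
from collections import defaultdict
from itertools import chain


def merge_by_id(*lists, key='id'):
    """Merge dicts from any number of lists that share the same `key` value."""
    merged = defaultdict(dict)
    # chain.from_iterable walks every dict of every list in turn.
    for d in chain.from_iterable(lists):
        merged[d[key]].update(d)  # KeyError if a dict lacks the join key
    return list(merged.values())


list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]

result = merge_by_id(list1, list2, list3)
print(result)
```

Because defaultdict(dict) starts each id with a fresh dict, the input dicts are not modified.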
Here is a function-based approach that does not use itertools (which is itself excellent for rapid development).
This solution works for any number of lists, since the function takes a variable number of arguments, and it lets the caller choose the type of the return value (list or dict).
By default it returns a list, as you want; if you pass as_list=False it returns a dictionary instead.
I used a dictionary internally because lookups by id are fast.
Have a look at the get_packed_list() function below.
get_packed_list()
def get_packed_list(*dicts_lists, as_list=True):
    output = {}
    for dicts_list in dicts_lists:
        for dictionary in dicts_list:
            _id = dictionary.pop("id")  # id() is a built-in function, so _id is used instead
            if _id not in output:
                # Create new id
                output[_id] = {"id": _id}
            for key in dictionary:
                output[_id][key] = dictionary[key]
            dictionary["id"] = _id  # put 'id' back afterwards (the dict is shared with the caller)
    if as_list:
        return [output[key] for key in output]
    return output  # dictionary
Test
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
output = get_packed_list(list1, list2, list3)
print(output)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
output = get_packed_list(list1, list2, list3, as_list=False)
print(output)
# {1: {'id': 1, 'value': 20, 'sum': 10, 'total': 30}, 2: {'id': 2, 'value': 21, 'sum': 11, 'total': 32}}
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
print(list1+list2+list3)
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
result = []
for i in range(0, len(list1)):
    final_dict = dict(list(list1[i].items()) + list(list2[i].items()) + list(list3[i].items()))
    result.append(final_dict)
print(result)
output : [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
I am struggling to create a nested dictionary with the following data:
Team, Group, ID, Score, Difficulty
OneTeam, A, 0, 0.25, 4
TwoTeam, A, 1, 1, 10
ThreeTeam, A, 2, 0.64, 5
FourTeam, A, 3, 0.93, 6
FiveTeam, B, 4, 0.5, 7
SixTeam, B, 5, 0.3, 8
SevenTeam, B, 6, 0.23, 9
EightTeam, B, 7, 1.2, 4
Once imported as a Pandas Dataframe, I turn each feature into these lists:
teams, group, id, score, diff.
Using the Stack Overflow answer "Create a complex dictionary using multiple lists", I can create the following dictionary:
{'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3},
'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}}
using the code:
{team: {'id': i, 'score': s, 'diff': d} for team, i, s, d in zip(teams, id, score, diff)}
But what I'm after is having 'Group' as the main key, then team, and then id, score and difficulty within the team (as above).
I have tried:
{g: {team: {'id': i, 'score': s, 'diff': d}} for g, team, i, s, d in zip(group, teams, id, score, diff)}
but this doesn't work and results in only one team per group within the dictionary:
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93}},
'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2}}}
Below is how the dictionary should look, but I'm not sure how to get there - any help would be much appreciated!
{'A': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25}},
'B': {'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3},
'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}}}
A dict comprehension may not be the best way of solving this if your data is stored in a table like this.
Try something like
from collections import defaultdict
groups = defaultdict(dict)
for g, team, i, s, d in zip(group, teams, id, score, diff):
    groups[g][team] = {'id': i, 'score': s, 'diff': d}
With defaultdict, if groups[g] already exists the new team is simply added as a key; if it does not, an empty dict is created automatically and the new team is inserted into it.
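To make the behaviour concrete, here is a self-contained run on the sample data from the question. The list named ids is used in place of id only to avoid shadowing the built-in, and the final dict() conversion is an optional extra.

```python
from collections import defaultdict

teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam',
         'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
ids = [0, 1, 2, 3, 4, 5, 6, 7]
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]

groups = defaultdict(dict)
for g, team, i, s, d in zip(group, teams, ids, score, diff):
    # groups[g] is created as an empty dict the first time g is seen.
    groups[g][team] = {'id': i, 'score': s, 'diff': d}

groups = dict(groups)  # optional: convert back to a plain dict
print(sorted(groups))        # ['A', 'B']
print(sorted(groups['A']))   # ['FourTeam', 'OneTeam', 'ThreeTeam', 'TwoTeam']
print(groups['B']['SixTeam'])  # {'id': 5, 'score': 0.3, 'diff': 8}
```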
Edit: you edited your question to say that your data is in a pandas DataFrame. In that case you can skip the step of turning the columns into lists. Instead you could, for example, do:
from collections import defaultdict
groups = defaultdict(dict)
for row in df.itertuples():
    groups[row.Group][row.Team] = {'id': row.ID, 'score': row.Score, 'diff': row.Difficulty}
If you absolutely want to use comprehension, then this should work:
z = list(zip(teams, group, id, score, diff))  # a list, so it can be iterated once per group
s = set(group)
d = {  # outer dict, one entry for each different group
    group: {  # inner dict, one entry per team, filtered by group
        team: {'id': i, 'score': s, 'diff': d}
        for team, g, i, s, d in z
        if g == group
    }
    for group in s
}
I added line breaks for clarity.
EDIT:
After the comment, to clarify my intention and out of curiosity, I ran a comparison:
from collections import defaultdict
import timeit
teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam', 'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
id = [0, 1, 2, 3, 4, 5, 6, 7]
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]
def no_comprehension():
    global group, teams, id, score, diff
    groups = defaultdict(dict)
    for g, team, i, s, d in zip(group, teams, id, score, diff):
        groups[g][team] = {'id': i, 'score': s, 'diff': d}
def comprehension():
    global group, teams, id, score, diff
    z = list(zip(teams, group, id, score, diff))  # a list, so every group sees all rows
    s = set(group)
    d = {group: {team: {'id': i, 'score': s, 'diff': d} for team, g, i, s, d in z if g == group} for group in s}
print("no comprehension:")
print(timeit.timeit(lambda: no_comprehension(), number=10000))
print("comprehension:")
print(timeit.timeit(lambda: comprehension(), number=10000))
Output:
no comprehension:
0.027287796139717102
comprehension:
0.028979241847991943
In terms of performance they look about the same. With my sentence above, I only meant to offer this as an alternative to the solution already posted by @JohnO.
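A caveat when running either variant on Python 3: zip() returns a one-shot iterator, so a comprehension that loops over the same zip object once per group only sees data on the first pass. The sketch below shows the effect and one way around it, materialising the rows with list() first. The shortened sample lists and variable names are illustrative only.

```python
teams = ['OneTeam', 'TwoTeam', 'FiveTeam', 'SixTeam']
group = ['A', 'A', 'B', 'B']

# A zip object can be consumed only once.
rows = zip(teams, group)
first_pass = [team for team, g in rows if g == 'A']
second_pass = [team for team, g in rows if g == 'B']  # rows is already exhausted
print(first_pass, second_pass)  # ['OneTeam', 'TwoTeam'] []

# Materialising the rows lets each group scan the full data.
rows = list(zip(teams, group))
by_group = {
    name: [team for team, g in rows if g == name]
    for name in sorted(set(group))
}
print(by_group)  # {'A': ['OneTeam', 'TwoTeam'], 'B': ['FiveTeam', 'SixTeam']}
```

Note that the comprehension still scans all rows once per group, whereas the defaultdict loop makes a single pass, so the gap may widen as the number of groups grows.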
I have an input list:
inlist = [{"id":123,"hour":5,"groups":"1"},{"id":345,"hour":3,"groups":"1;2"},{"id":65,"hour":-2,"groups":"3"}]
I need to group the dictionaries by their 'groups' value. After that, I need to add the minimum and maximum hour of each group as new keys in the grouped lists. The output should look like this:
outlist=[(1, [{"id":123, "hour":5, "min_group_hour":3, "max_group_hour":5}, {"id":345, "hour":3, "min_group_hour":3, "max_group_hour":5}]),
(2, [{"id":345, "hour":3, "min_group_hour":3, "max_group_hour":3}]),
(3, [{"id":65, "hour":-2, "min_group_hour":-2, "max_group_hour":-2}])]
So far I have managed to group the input list:
import itertools
from operator import itemgetter
new_list = []
for domain in inlist:
    for group in domain['groups'].split(';'):
        d = dict()
        d['id'] = domain['id']
        d['group'] = group
        d['hour'] = domain['hour']
        new_list.append(d)
for k, v in itertools.groupby(new_list, key=itemgetter('group')):
    print(int(k), max(list(v), key=itemgetter('hour')))
And output is
('1', [{'group': '1', 'id': 123, 'hour': 5}])
('2', [{'group': '2', 'id': 345, 'hour': 3}])
('3', [{'group': '3', 'id': 65, 'hour': -2}])
I don't know how to aggregate the values by group. Also, is there a more Pythonic way of grouping dictionaries by a key value that needs to be split?
Start by creating a dict that maps group numbers to dictionaries:
from collections import defaultdict
dicts_by_group = defaultdict(list)
for dic in inlist:
    groups = map(int, dic['groups'].split(';'))
    for group in groups:
        dicts_by_group[group].append(dic)
This gives us a dict that looks like
{1: [{'id': 123, 'hour': 5, 'groups': '1'},
{'id': 345, 'hour': 3, 'groups': '1;2'}],
2: [{'id': 345, 'hour': 3, 'groups': '1;2'}],
3: [{'id': 65, 'hour': -2, 'groups': '3'}]}
Then iterate over the grouped dicts and set the min_group_hour and max_group_hour for each group:
outlist = []
for group in sorted(dicts_by_group.keys()):
    dicts = dicts_by_group[group]
    min_hour = min(dic['hour'] for dic in dicts)
    max_hour = max(dic['hour'] for dic in dicts)
    dicts = [{'id': dic['id'], 'hour': dic['hour'], 'min_group_hour': min_hour,
              'max_group_hour': max_hour} for dic in dicts]
    outlist.append((group, dicts))
Result:
[(1, [{'id': 123, 'hour': 5, 'min_group_hour': 3, 'max_group_hour': 5},
{'id': 345, 'hour': 3, 'min_group_hour': 3, 'max_group_hour': 5}]),
(2, [{'id': 345, 'hour': 3, 'min_group_hour': 3, 'max_group_hour': 3}]),
(3, [{'id': 65, 'hour': -2, 'min_group_hour': -2, 'max_group_hour': -2}])]
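The two steps can also be combined into a single function. This is a sketch rather than a drop-in replacement: group_with_hour_range is a hypothetical name, and it assumes 'groups' is always a ';'-separated string of integers.

```python
from collections import defaultdict


def group_with_hour_range(records):
    """Group records by each group number listed in 'groups' and attach
    the minimum and maximum 'hour' of the group to every member."""
    by_group = defaultdict(list)
    for rec in records:
        for g in rec['groups'].split(';'):
            by_group[int(g)].append(rec)

    out = []
    for g in sorted(by_group):
        hours = [rec['hour'] for rec in by_group[g]]
        lo, hi = min(hours), max(hours)
        members = [
            {'id': rec['id'], 'hour': rec['hour'],
             'min_group_hour': lo, 'max_group_hour': hi}
            for rec in by_group[g]
        ]
        out.append((g, members))
    return out


inlist = [{"id": 123, "hour": 5, "groups": "1"},
          {"id": 345, "hour": 3, "groups": "1;2"},
          {"id": 65, "hour": -2, "groups": "3"}]
outlist = group_with_hour_range(inlist)
print(outlist)
```

New dicts are built for the output, so the records in inlist are left unchanged.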
IIUC, here is another way to do it in pandas:
import pandas as pd
data = [{"id":123,"hour":5,"group":"1"},{"id":345,"hour":3,"group":"1;2"},{"id":65,"hour":-2,"group":"3"}]
df = pd.DataFrame(data)
# Note: "1;2" is treated as a group of its own here; it is not split
# Minimum hour per group, in a column named min_hour
dfmi = df.groupby('group', as_index=False)['hour'].min()
dfmi.rename(columns={'hour':'min_hour'}, inplace=True)
# Maximum hour per group, in a column named max_hour
dfmx = df.groupby('group', as_index=False)['hour'].max()
dfmx.rename(columns={'hour':'max_hour'}, inplace=True)
# Merge min df with main df
df = df.merge(dfmi, on='group', how='outer')
# Merge max df with main df
df = df.merge(dfmx, on='group', how='outer')
# List of dictionaries
output = df.to_dict(orient='records')
# Dictionary of dictionaries
dict_out = df.to_dict(orient='index')
I'm a bit mentally stuck at something, that seems really simple at first glance.
I'm grabbing a list of ids to be selected and scores to sort them based on.
My current solution is the following:
ids = [1, 2, 3, 4, 5]
items = Item.objects.filter(pk__in=ids)
Now I need to add a score based ordering somehow so I'll build the following list:
scores = [
{'id': 1, 'score': 15},
{'id': 2, 'score': 7},
{'id': 3, 'score': 17},
{'id': 4, 'score': 11},
{'id': 5, 'score': 9},
]
ids = [score['id'] for score in scores]
items = Item.objects.filter(pk__in=ids)
So far so good - but how do I actually add the scores as some sort of aggregate and sort the queryset based on them?
Sort the scores list, and fetch the queryset using in_bulk().
scores = [
{'id': 1, 'score': 15},
{'id': 2, 'score': 7},
{'id': 3, 'score': 17},
{'id': 4, 'score': 11},
{'id': 5, 'score': 9},
]
sorted_scores = sorted(scores, key=lambda s: s['score']) # use reverse=True for descending order
ids = [score['id'] for score in sorted_scores]
items = Item.objects.in_bulk(ids)
Then generate a list of the items in the order you want:
items_in_order = [items[x] for x in ids]
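The ordering step itself does not depend on Django. In the sketch below a plain dict stands in for the mapping that in_bulk() returns (primary key to object), and the string values are placeholders for model instances. The "if ... in items" guard is an added assumption, covering ids that no longer exist in the database.

```python
scores = [
    {'id': 1, 'score': 15},
    {'id': 2, 'score': 7},
    {'id': 3, 'score': 17},
    {'id': 4, 'score': 11},
    {'id': 5, 'score': 9},
]

# Stand-in for Item.objects.in_bulk(ids): primary key -> object.
# ID 4 is left out on purpose to mimic a row that has been deleted.
items = {pk: 'Item %d' % pk for pk in (1, 2, 3, 5)}

# Highest score first.
sorted_scores = sorted(scores, key=lambda s: s['score'], reverse=True)
items_in_order = [items[s['id']] for s in sorted_scores if s['id'] in items]
print(items_in_order)  # ['Item 3', 'Item 1', 'Item 5', 'Item 2']
```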
I have two dictionaries:
first = {'id': 1, 'age': 23}
second = {'id': 4, 'out': 100}
I want the output dictionary to be
{'id': 5, 'age': 23, 'out':100}
I tried
>>> dict(first.items() + second.items())
{'age': 23, 'id': 4, 'out': 100}
but I am getting id as 4, and I want it to be 5.
You want to use collections.Counter:
from collections import Counter
first = Counter({'id': 1, 'age': 23})
second = Counter({'id': 4, 'out': 100})
first_plus_second = first + second
print(first_plus_second)
Output:
Counter({'out': 100, 'age': 23, 'id': 5})
And if you need the result as a true dict, just use dict(first_plus_second):
>>> print dict(first_plus_second)
{'age': 23, 'id': 5, 'out': 100}
If you want to add values from the second to the first, you can do it like this:
first = {'id': 1, 'age': 23}
second = {'id': 4, 'out': 100}
for k in second:
    if k in first:
        first[k] += second[k]
    else:
        first[k] = second[k]
print(first)
The above will output:
{'age': 23, 'id': 5, 'out': 100}
You can simply update the 'id' key afterwards:
result = dict(first.items() + second.items())
result['id'] = first['id'] + second['id']
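If the inputs should stay untouched, or if values may be zero or negative (Counter addition drops results that are not positive), the loop can be wrapped in a small function. This is a sketch with a hypothetical name, sum_dicts, assuming all values under a shared key support +.

```python
def sum_dicts(*dicts):
    """Return a new dict in which values under the same key are added together."""
    total = {}
    for d in dicts:
        for key, value in d.items():
            # First occurrence of a key is copied; later ones are added.
            total[key] = total[key] + value if key in total else value
    return total


first = {'id': 1, 'age': 23}
second = {'id': 4, 'out': 100}

combined = sum_dicts(first, second)
print(combined)                        # {'id': 5, 'age': 23, 'out': 100}
print(sum_dicts({'a': 1}, {'a': -1}))  # {'a': 0}
print(first)                           # {'id': 1, 'age': 23}
```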