I have a list of dictionaries:
data = [{"average": 2, "day": "2022-01-01", "name": "joe", "employee_id": 1},
{"average": 3, "day": "2022-01-02", "name": "joe", "employee_id": 1},
{"average": 9, "day": "2022-01-03", "name": "joe", "employee_id": 1},
{"sum": 13, "day": "2022-01-01", "name": "joe", "employee_id": 1},
{"sum": 15, "day": "2022-01-02", "name": "joe", "employee_id": 1},
{"sum": 0, "day": "2022-01-03", "name": "joe", "employee_id": 1},
{"average": 1, "day": "2022-01-01", "name": "bob", "employee_id": 2},
{"average": 3, "day": "2022-01-02", "name": "bob", "employee_id": 2},
{"sum": 9, "day": "2022-01-01", "name": "bob", "employee_id": 2},
{"sum": 8, "day": "2022-01-02", "name": "bob", "employee_id": 2}]
I want my output as:
output = [{"name": "joe", "employee_id": 1, "day": "2022-01-01", "average": 2, "sum": 13},
{"name": "joe", "employee_id": 1, "day": "2022-01-02", "average": 3, "sum": 15},
{"name": "joe", "employee_id": 1, "day": "2022-01-03", "average": 9, "sum": 0},
{"name": "bob", "employee_id": 2, "day": "2022-01-01", "average": 1, "sum": 9},
{"name": "bob", "employee_id": 2, "day": "2022-01-02", "average": 3, "sum": 8}]
The goal is to merge the records so that each combination of day, name, and employee_id yields a single dictionary.
I've tried:
output = {}
for item in data:
    if item["day"] not in output:
        output[item["day"]] = item
    else:
        output[item["day"]].update(item)
print(list(output.values()))
This gets the "average", "sum", and "day" together, but the result does not include all of the employees and their IDs.
Any help is appreciated
Use collections.defaultdict with dict as the default factory. Here, the 'day' and 'name' values of each dictionary together form the key:
>>> from collections import defaultdict
>>> defdict = defaultdict(dict)
>>> for mp in data:
...     defdict[mp['day'], mp['name']].update(mp)
...
>>> keys = ('name', 'employee_id', 'day', 'average', 'sum')
>>> [{k: mp[k] for k in keys} for mp in defdict.values()]
[{'name': 'joe', 'employee_id': 1, 'day': '2022-01-01', 'average': 2, 'sum': 13},
{'name': 'joe', 'employee_id': 1, 'day': '2022-01-02', 'average': 3, 'sum': 15},
{'name': 'joe', 'employee_id': 1, 'day': '2022-01-03', 'average': 9, 'sum': 0},
{'name': 'bob', 'employee_id': 2, 'day': '2022-01-01', 'average': 1, 'sum': 9},
{'name': 'bob', 'employee_id': 2, 'day': '2022-01-02', 'average': 3, 'sum': 8}]
For about 1.5 million records, this solution is still faster than pandas (the for loop finishes in less time than it takes just to convert the data into a DataFrame):
In [451]: random.seed(0)
...: names = [''.join(random.choices(string.ascii_lowercase, k=random.randrange(3, 7))) for _ in range(10000)]
...: dates = [str(datetime.date(2022, i, j)) for i in range(7, 10) for j in range(1, 31)]
...: keys = ['sum', 'average']
...:
...: data = [{k: random.randrange(10), 'day': date, 'name': name, 'employee_id': i}
...: for i, name in enumerate(names, 1)
...: for date in sorted(random.sample(dates, random.randrange(60, 90)))
...: for k in keys]
...:
In [452]: len(data)
Out[452]: 1492286
In [453]: %%timeit
...: defdict = defaultdict(dict)
...: for mp in data:
...:     defdict[mp['day'], mp['name']].update(mp)
...: keys = ('name', 'employee_id', 'day', 'average', 'sum')
...: [{k: mp[k] for k in keys} for mp in defdict.values()]
...:
...:
926 ms ± 6.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [454]: %%timeit
...: df = pd.DataFrame(data)
...: pd.merge(df.loc[df['average'].notna()][[ 'name','day','employee_id','average']],
...: df.loc[df['sum'].notna()][['name','day','employee_id','sum']],
...: how='outer'
...: ).to_dict(orient= 'records')
...:
...:
3.58 s ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [455]: %timeit pd.DataFrame(data)
1.26 s ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
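As a variation on the defaultdict loop timed above, here is a minimal sketch that keys on all three identifying fields (day, name, and employee_id) rather than two, and that tolerates a missing metric. The names records, merged, and fields are illustrative only, and the sample input is a shortened, hypothetical version of the question's data.

```python
from collections import defaultdict

# Hypothetical sample input: each record carries one metric ("average" or "sum").
records = [
    {"average": 2, "day": "2022-01-01", "name": "joe", "employee_id": 1},
    {"sum": 13, "day": "2022-01-01", "name": "joe", "employee_id": 1},
    {"average": 1, "day": "2022-01-01", "name": "bob", "employee_id": 2},
]

merged = defaultdict(dict)
for record in records:
    # A tuple of the three identifying fields is hashable, so it can be a dict key.
    key = (record["day"], record["name"], record["employee_id"])
    merged[key].update(record)

fields = ("name", "employee_id", "day", "average", "sum")
# .get() yields None when a metric never appeared for that key,
# instead of raising KeyError as mp[k] would.
result = [{f: row.get(f) for f in fields} for row in merged.values()]
print(result)
```

Here bob has no "sum" record, so his merged dictionary carries "sum": None rather than causing an error.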
Here is one way to do it with pandas:
import pandas as pd
df = pd.DataFrame(data)
# Filter with loc into two separate frames: rows where average is not null, and rows where sum is not null
# Merge the two frames (an outer join on their shared columns: name, day, employee_id)
# Finally, output as a list of dicts with orient='records'
pd.merge(df.loc[df['average'].notna()][['name', 'day', 'employee_id', 'average']],
         df.loc[df['sum'].notna()][['name', 'day', 'employee_id', 'sum']],
         how='outer'
         ).to_dict(orient='records')
[{'name': 'joe',
'day': '2022-01-01',
'employee_id': 1,
'average': 2.0,
'sum': 13.0},
{'name': 'joe',
'day': '2022-01-02',
'employee_id': 1,
'average': 3.0,
'sum': 15.0},
{'name': 'joe',
'day': '2022-01-03',
'employee_id': 1,
'average': 9.0,
'sum': 0.0},
{'name': 'bob',
'day': '2022-01-01',
'employee_id': 2,
'average': 1.0,
'sum': 9.0},
{'name': 'bob',
'day': '2022-01-02',
'employee_id': 2,
'average': 3.0,
'sum': 8.0}]
From the description given, the combination of "day", "name", and "employee_id" acts as a unique key to which the other two fields should be added. Each incoming dictionary has these fields, so we can use them as a key into a new dictionary, but we first need to convert them to something hashable, such as a JSON string produced with sorted keys so that the same fields always serialize identically.
from json import dumps
data = [{"average": 2, "day": "2022-01-01", "employee_id": 1, "name": "joe"},
{"average": 3, "day": "2022-01-02", "name": "joe", "employee_id": 1},
{"average": 9, "day": "2022-01-03", "name": "joe", "employee_id": 1},
{"sum": 13, "day": "2022-01-01", "name": "joe", "employee_id": 1},
{"sum": 15, "day": "2022-01-02", "name": "joe", "employee_id": 1},
{"sum": 0, "day": "2022-01-03", "name": "joe", "employee_id": 1},
{"average": 1, "day": "2022-01-01", "name": "bob", "employee_id": 2},
{"average": 3, "day": "2022-01-02", "name": "bob", "employee_id": 2},
{"sum": 9, "day": "2022-01-01", "name": "bob", "employee_id": 2},
{"sum": 8, "day": "2022-01-02", "name": "bob", "employee_id": 2}]
flattend_employee_summaries = dict()
for employee_summary in data:
    key = employee_summary.copy()
    if "average" in key:
        del key["average"]
        if dumps(key, sort_keys=True) not in flattend_employee_summaries:
            flattend_employee_summaries[dumps(key, sort_keys=True)] = employee_summary.copy()
        else:
            flattend_employee_summaries[dumps(key, sort_keys=True)]["average"] = employee_summary["average"]
    if "sum" in key:
        del key["sum"]
        if dumps(key, sort_keys=True) not in flattend_employee_summaries:
            flattend_employee_summaries[dumps(key, sort_keys=True)] = employee_summary.copy()
        else:
            flattend_employee_summaries[dumps(key, sort_keys=True)]["sum"] = employee_summary["sum"]
flattend_employee_summaries = [summary for summary in flattend_employee_summaries.values()]
print(f'{flattend_employee_summaries=}')
This has already been answered, and I suspect this is just a longer way of repeating Mechanic Pig's solution, which I'd recommend. For all of the solutions, I believe we are assuming there is only one average record per employee/day.
employees = dict()
for data_row in data:
    if data_row['employee_id'] not in employees:
        employees[data_row['employee_id']] = {data_row['day']: {'name': data_row.get('name', 0),
                                                                'average': data_row.get('average', 0),
                                                                'sum': data_row.get('sum', 0)
                                                                }
                                              }
    else:
        data_row_day = data_row['day']
        if data_row['day'] not in employees[data_row['employee_id']]:
            employees[data_row['employee_id']][data_row_day] = {'name': data_row.get('name', 0),
                                                                'average': data_row.get('average', 0),
                                                                'sum': data_row.get('sum', 0)
                                                                }
        else:
            current_sum = employees[data_row['employee_id']][data_row_day].get('sum', 0)
            employees[data_row['employee_id']][data_row_day].update({'sum': current_sum + data_row.get('sum', 0)})
employee_output = list()
for employee_id, employee_dates in employees.items():
    for employee_date, employee_details in employee_dates.items():
        employee_output.append({"name": employee_details['name'],
                                "employee_id": employee_id,
                                "day": employee_date,
                                "average": employee_details['average'],
                                "sum": employee_details['sum'],
                                })
employee_output would then contain:
[{'name': 'joe',
'employee_id': 1,
'day': '2022-01-01',
'average': 2,
'sum': 13},
{'name': 'joe',
'employee_id': 1,
'day': '2022-01-02',
'average': 3,
'sum': 15},
{'name': 'joe',
'employee_id': 1,
'day': '2022-01-03',
'average': 9,
'sum': 0},
{'name': 'bob',
'employee_id': 2,
'day': '2022-01-01',
'average': 1,
'sum': 9},
{'name': 'bob',
'employee_id': 2,
'day': '2022-01-02',
'average': 3,
'sum': 8}]
How can I sort a dictionary using the values of a key's dictionary?
Input:
myDict = {
"1":{
"VALUE1": 10,
"VALUE2": 5,
"VALUE3": 3
},
"2":{
"VALUE1": 5,
"VALUE2": 3,
"VALUE3": 1
},
"3":{
"VALUE1": 15,
"VALUE2": 2,
"VALUE3": 4
},
}
Expected output:
myDict = {
"3": {
"VALUE1": 15,
"VALUE2": 2,
"VALUE3": 4
},
"1": {
"VALUE1": 10,
"VALUE2": 5,
"VALUE3": 3
},
"2": {
"VALUE1": 5,
"VALUE2": 3,
"VALUE3": 1
},
}
The entries are now sorted in descending order by the value of the key VALUE1.
How would I get the expected output?
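Try sorting the items by the nested 'VALUE1' and rebuilding the dict (dicts preserve insertion order in Python 3.7+):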
newDict = dict(sorted(myDict.items(), key = lambda x: x[1]['VALUE1'], reverse=True))
newDict
{'3': {'VALUE1': 15, 'VALUE2': 2, 'VALUE3': 4},
'1': {'VALUE1': 10, 'VALUE2': 5, 'VALUE3': 3},
'2': {'VALUE1': 5, 'VALUE2': 3, 'VALUE3': 1}}
I am working with a nested data structure which needs to be flattened. The values need to be aggregated so that totals are produced across each level of the nested data. I'm trying to do this recursively, but it's not clear how best to achieve it.
The following is an example of the data I'm working with.
def get_result():
    return {
        "a1": {
            "b1": {
                "c1": {
                    "d1": 1,
                    "d2": 1,
                },
                "c2": {
                    "d3": 1,
                }
            },
            "b2": {
                "c3": {
                    "d4": 1
                }
            }
        },
        "a2": {}
    }
The data I'd like to produce would be as follows:
[
{
"key": "a1",
"total": 4
},
{
"key": "b1",
"total": 3
},
{
"key": "c1",
"total": 2
},
{
"key": "d1",
"total": 1
},
{
"key": "d2",
"total": 1
},
{
"key": "c2",
"total": 1
},
{
"key": "d3",
"total": 1
},
{
"key": "b2",
"total": 1
},
{
"key": "c3",
"total": 1
},
{
"key": "d4",
"total": 1
}
]
You can use recursion. The function below returns a mapping from each key to its total, together with the overall sum:
from collections import defaultdict
def agg(data):
    result = defaultdict(int)
    agg_sum = 0
    for k, v in data.items():
        if isinstance(v, dict):
            d, sub = agg(v)
            if sub:
                result.update(d)
                result[k] += sub
                agg_sum += sub
        else:
            result[k] += v
            agg_sum += v
    return result, agg_sum
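A minimal usage sketch for agg follows, assuming the keys are unique across the whole structure (a repeated key would be overwritten by result.update). The names nested, totals, grand_total, and rows are illustrative only. Because each level records its children before itself, the rows come out children-first rather than in the parent-first order shown in the question, and the empty "a2" is skipped since its total is 0.

```python
from collections import defaultdict

def agg(data):
    result = defaultdict(int)
    agg_sum = 0
    for k, v in data.items():
        if isinstance(v, dict):
            d, sub = agg(v)
            if sub:  # skip empty sub-dicts, whose total is 0
                result.update(d)
                result[k] += sub
                agg_sum += sub
        else:
            result[k] += v
            agg_sum += v
    return result, agg_sum

# Same data as the question's get_result().
nested = {
    "a1": {
        "b1": {"c1": {"d1": 1, "d2": 1}, "c2": {"d3": 1}},
        "b2": {"c3": {"d4": 1}},
    },
    "a2": {},
}

totals, grand_total = agg(nested)
# Convert the key -> total mapping into the list-of-dicts shape from the question.
rows = [{"key": k, "total": v} for k, v in totals.items()]
print(grand_total)
print(rows)
```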
You can use a recursive generator function for a shorter solution:
d = {'a1': {'b1': {'c1': {'d1': 1, 'd2': 1}, 'c2': {'d3': 1}}, 'b2': {'c3': {'d4': 1}}}, 'a2': {}}
def get_aggr(d):
    return d if not isinstance(d, dict) else sum(map(get_aggr, d.values()))
def aggr_keys(d):
    for a, b in d.items():
        yield {'key': a, 'total': get_aggr(b)}
        yield from (() if not isinstance(b, dict) else aggr_keys(b))
print(list(aggr_keys(d)))
Output (unlike the requested result, the empty 'a2' is included with a total of 0):
[{'key': 'a1', 'total': 4},
{'key': 'b1', 'total': 3},
{'key': 'c1', 'total': 2},
{'key': 'd1', 'total': 1},
{'key': 'd2', 'total': 1},
{'key': 'c2', 'total': 1},
{'key': 'd3', 'total': 1},
{'key': 'b2', 'total': 1},
{'key': 'c3', 'total': 1},
{'key': 'd4', 'total': 1},
{'key': 'a2', 'total': 0}]
Assume I have this:
[
{"name": "bob", "total": 1},
{"name": "alice", "total": 5},
{"name": "eve", "total": 2},
{"name": "bob", "total": 3},
{"name": "alice", "total": 2},
{"name": "alice", "total": 2},
]
I want to transform this list into:
[
{"name": "bob", "total": 4},
{"name": "alice", "total": 9},
{"name": "eve", "total": 2}
]
Currently, for each item in the first list, I walk through the whole second list to check whether the name already exists.
How can I achieve this with a lower complexity?
from collections import defaultdict
a = [
{"name": "bob", "total": 1},
{"name": "alice", "total": 5},
{"name": "eve", "total": 2},
{"name": "bob", "total": 3},
{"name": "alice", "total": 2},
{"name": "alice", "total": 2},
]
# sum the totals for each name
freq = defaultdict(lambda: 0)
for d in a:
    freq[d['name']] += d['total']
# build the result list
a = list()
for key, val in freq.items():
    a.append({'name': key, 'total': val})
print(a)
If you only have two pieces of information (name and total), I would suggest changing your schema a bit. Instead of a list of dictionaries, use a single dictionary where the keys are names and the values are totals:
>>> values = [
... {"name": "bob", "total": 1},
... {"name": "alice", "total": 5},
... {"name": "eve", "total": 2},
... {"name": "bob", "total": 3},
... {"name": "alice", "total": 2},
... {"name": "alice", "total": 2},
... ]
>>> from collections import defaultdict
>>> totals_by_name = defaultdict(int)
>>> for value in values:
...     totals_by_name[value["name"]] += value["total"]
...
>>> totals_by_name
defaultdict(<class 'int'>, {'bob': 4, 'alice': 9, 'eve': 2})
This can work even if you have more pieces of data that you want to look up by name (replace the integer value with a nested dictionary that stores the total as well as other data).
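A minimal sketch of that nested-dictionary variant follows. The "count" field and the name stats_by_name are hypothetical, chosen only to show a second piece of data stored alongside the total.

```python
from collections import defaultdict

values = [
    {"name": "bob", "total": 1},
    {"name": "alice", "total": 5},
    {"name": "eve", "total": 2},
    {"name": "bob", "total": 3},
    {"name": "alice", "total": 2},
    {"name": "alice", "total": 2},
]

# Each name maps to a small dict; "count" is a hypothetical extra field.
stats_by_name = defaultdict(lambda: {"total": 0, "count": 0})
for value in values:
    entry = stats_by_name[value["name"]]
    entry["total"] += value["total"]
    entry["count"] += 1

print(dict(stats_by_name))
```

The lambda factory builds a fresh inner dict for every new name, so entries never share state.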
You can use groupby from the itertools module:
from itertools import groupby
from operator import itemgetter
# itemgetter(foo) is roughly equivalent to lambda x: x[foo]
get_name = itemgetter('name')
get_total = itemgetter('total')
lst = [
{"name": "bob", "total": 1},
{"name": "alice", "total": 5},
{"name": "eve", "total": 2},
{"name": "bob", "total": 3},
{"name": "alice", "total": 2},
{"name": "alice", "total": 2},
]
grouped = groupby(sorted(lst, key=get_name), get_name)
new_list = [{'name': k, 'total': sum(get_total(x) for x in v)} for k, v in grouped]
groupby produces an iterator of (key, group) pairs, collecting consecutive dicts from the list that share the same value for the 'name' key. That is why the list is sorted by name first, which makes this approach O(n log n). Iterating over the pairs lets you sum the total values of each group to build the new list of dicts.
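To illustrate why the sort matters, this short sketch compares the group keys that groupby yields for the unsorted and the sorted list. The names unsorted_keys and sorted_keys are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

get_name = itemgetter("name")

lst = [
    {"name": "bob", "total": 1},
    {"name": "alice", "total": 5},
    {"name": "eve", "total": 2},
    {"name": "bob", "total": 3},
    {"name": "alice", "total": 2},
    {"name": "alice", "total": 2},
]

# groupby only merges adjacent items, so unsorted input repeats keys.
unsorted_keys = [k for k, _ in groupby(lst, get_name)]
sorted_keys = [k for k, _ in groupby(sorted(lst, key=get_name), get_name)]

print(unsorted_keys)  # ['bob', 'alice', 'eve', 'bob', 'alice']
print(sorted_keys)    # ['alice', 'bob', 'eve']
```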
Let's say,
your_data = [
{"name": "bob", "total": 1},
{"name": "alice", "total": 5},
{"name": "eve", "total": 2},
{"name": "bob", "total": 3},
{"name": "alice", "total": 2},
{"name": "alice", "total": 2},
]
You can simply use pandas to get the desired output.
import pandas as pd
df = pd.DataFrame(your_data)
df = df.groupby(by='name', as_index=False)['total'].sum()
result = df.to_dict(orient = 'records')
OUTPUT: [{'name': 'alice', 'total': 9}, {'name': 'bob', 'total': 4}, {'name': 'eve', 'total': 2}]
I have two lists containing dictionaries:
List1 = [{"Value": "Value1", "Start": 7.11, "End": 8},
{"Value": "Value2", "Start": 16.45, "End": 20}]
List2 = [{"From":7.11, "To": 8, "Result": 0},
{"From":16.45, "To": 20, "Result": 1}
]
I need to produce a list by correlating these two lists. The result should be:
Result = [{"Value": "Value1", "Start": 7.11, "End": 8, "Result": 0},
{"Value": "Value2", "Start": 16.45, "End": 20, "Result": 1}]
This almost seems like a simple table join in SQL.
How would I do it in Python?
Thanks!
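You can use a list comprehension with zip and dictionary unpacking (this pairs the two lists by position, so it assumes they are in the same order):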
List1 = [{"Value": "Value1", "Start": 7.11, "End": 8},
{"Value": "Value2", "Start": 16.45, "End": 20}]
List2 = [{"From":7.11, "To": 8, "Result": 0},
{"From":16.45, "To": 20, "Result": 1}
]
new_list = [{**a, **{'Result':b['Result']}} for a, b in zip(List1, List2)]
Output:
[{'Value': 'Value1', 'Start': 7.11, 'End': 8, 'Result': 0}, {'Value': 'Value2', 'Start': 16.45, 'End': 20, 'Result': 1}]
Since dictionary unpacking inside a dict literal ({**a, ...}) requires Python 3.5 or later, in Python 2 you can use dict.items instead:
new_list = [dict(a.items()+[('Result', b['Result'])]) for a, b in zip(List1, List2)]
Output:
[{'Start': 7.11, 'End': 8, 'Result': 0, 'Value': 'Value1'}, {'Start': 16.45, 'End': 20, 'Result': 1, 'Value': 'Value2'}]
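If the two lists are not guaranteed to be in the same order, a lookup keyed on the range is closer to the SQL join the question mentions. The following is a minimal sketch, assuming that Start/End match From/To exactly (including exact float equality) and that rows without a match should be dropped, as in an inner join. The names results_by_range and joined are illustrative, and List2 is deliberately reordered here to show that position no longer matters.

```python
List1 = [{"Value": "Value1", "Start": 7.11, "End": 8},
         {"Value": "Value2", "Start": 16.45, "End": 20}]
# Hypothetical reordering of List2 to show the join does not depend on position.
List2 = [{"From": 16.45, "To": 20, "Result": 1},
         {"From": 7.11, "To": 8, "Result": 0}]

# Index List2 by its (From, To) pair for constant-time lookups.
results_by_range = {(row["From"], row["To"]): row["Result"] for row in List2}

# Inner join: keep only List1 rows whose (Start, End) has a match in List2.
joined = [
    {**row, "Result": results_by_range[(row["Start"], row["End"])]}
    for row in List1
    if (row["Start"], row["End"]) in results_by_range
]
print(joined)
```

Building the index once keeps the whole join linear in the combined length of the two lists, instead of scanning List2 for every row of List1.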