How to group datetime by hour - python

I am trying to group a list of datetime objects by hour and return the count for each hour.
This is my list of datetime objects. I tried to use a loop to count how many datetime objects share the same hour, but I could not find a way to get the count.
The other references on Stack Overflow all store the dates as a column in a pandas DataFrame, which I do not want, because my datetimes are stored in a list.
I am hoping to get a list of hour_count objects that looks like this:
hour_count = [
    {
        "hour": datetime,
        "count": 2
    }
]
# code
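from datetime import datetime

hours = [
    datetime(2019, 1, 25, 1),
    datetime(2019, 1, 25, 1),
    datetime(2019, 1, 25, 2),
    datetime(2019, 1, 25, 3),
    datetime(2019, 1, 25, 4),
    datetime(2019, 1, 25, 4)
]
existed = []
for hour in hours:
    # does not work: `existed` holds dicts, so this check is always True,
    # and existed[hour.hour] indexes the list by hour (IndexError on the first pass)
    if hour.hour not in existed:
        existed.append({
            "hour": hour.hour,
            "count": existed[hour.hour] + 1
        })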

The simplest approach without pandas is to use collections.Counter:
from collections import Counter
counts = Counter(h.hour for h in hours)
print(counts)
# Counter({1: 2, 4: 2, 2: 1, 3: 1})
Now just reformat into your desired output using a list comprehension:
hour_count = [{"hour": h, "count": c} for h, c in counts.items()]
print(hour_count)
# [{'hour': 1, 'count': 2}, {'hour': 2, 'count': 1},
#  {'hour': 3, 'count': 1}, {'hour': 4, 'count': 2}]
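Note that `h.hour` keeps only the hour of day, so 1 a.m. on two different dates lands in the same bucket. If you want one bucket per calendar hour (closer to the `"hour": datetime` shape in the question), a minimal sketch is to truncate each datetime with `datetime.replace` before counting. The sample data below is made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data spanning two days
hours = [
    datetime(2019, 1, 25, 1),
    datetime(2019, 1, 25, 1, 45),
    datetime(2019, 1, 25, 2),
    datetime(2019, 1, 26, 1),
]

# Drop minutes/seconds/microseconds so each datetime maps to the start of its hour
counts = Counter(dt.replace(minute=0, second=0, microsecond=0) for dt in hours)

# Sort by bucket so the output is chronological
hour_count = [{"hour": h, "count": c} for h, c in sorted(counts.items())]
print(hour_count)
```

Here 2019-01-25 01:00 gets a count of 2, while 2019-01-26 01:00 stays a separate bucket.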

You can use pandas' DatetimeIndex to pull out the hour of each datetime in your list, then use numpy.unique with return_counts=True to count each unique hour.
from pprint import pprint
import numpy as np
import pandas as pd
hours = pd.DatetimeIndex(hours).hour
unique_hours, counts = np.unique(hours, return_counts=True)
# int() turns the NumPy integers into plain Python ints for cleaner output
hour_count = [{"hour": int(hour), "count": int(count)} for hour, count in zip(unique_hours, counts)]
pprint(hour_count)
Result
[{'count': 2, 'hour': 1},
{'count': 1, 'hour': 2},
{'count': 1, 'hour': 3},
{'count': 2, 'hour': 4}]

Related

Summing Values in Dictionary Based on Certain Values

I have a list of dictionaries that state a date as well as a price. It looks like this:
dict = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50}, {'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12}, {'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
I'd like to create a new list of dictionaries that sum all the Price values that are on the same date. So the output would look like this:
output_dict = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62}, {'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
How could I achieve this?
You can use Counter from the collections module:
from collections import Counter
c = Counter()
# `dict` is the list from the question (note that it shadows the built-in dict)
for v in dict:
    c[v['Date']] += v['Price']
output_dict = [{'Date': name, 'Price': count} for name, count in c.items()]
Output:
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62},
{'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
Alternatively, you can use the pandas library.
Install pandas with:
pip install pandas
Then the code would be:
import pandas as pd
output_dict = pd.DataFrame(dict).groupby('Date')['Price'].sum().to_dict()
Output (a single dict mapping each date to its total, not a list of dicts):
{Timestamp('2020-06-01 00:00:00'): 62, Timestamp('2020-06-02 00:00:00'): 60}
Another solution using itertools.groupby (groupby only merges adjacent items, so the list is sorted by date first):
import datetime
from itertools import groupby
dct = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50}, {'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12}, {'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
dct.sort(key=lambda item: item['Date'])
out = []
for k, g in groupby(dct, lambda item: item['Date']):
    out.append({'Date': k, 'Price': sum(v['Price'] for v in g)})
print(out)
Prints:
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62}, {'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
You can also use itertools.groupby, although I suspect a defaultdict will be faster, since it avoids the sort:
from itertools import groupby
from operator import itemgetter
# sort the dicts (the list from the question) by date so equal dates are adjacent
dicts = sorted(dicts, key=itemgetter("Date"))
# sum each group with itertools.groupby
result = [{"Date": key,
           "Price": sum(entry['Price'] for entry in value)}
          for key, value in groupby(dicts, key=itemgetter("Date"))]
print(result)
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62},
{'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
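itertools.groupby only merges *consecutive* items with equal keys, which is why the answer above sorts first. A small sketch with made-up, deliberately unsorted data shows what happens without the sort; `sum_by_date` is a hypothetical helper name used only for this illustration.

```python
import datetime
from itertools import groupby
from operator import itemgetter

# Unsorted on purpose: the two 2020-06-01 entries are not adjacent
rows = [
    {'Date': datetime.datetime(2020, 6, 1), 'Price': 50},
    {'Date': datetime.datetime(2020, 6, 2), 'Price': 60},
    {'Date': datetime.datetime(2020, 6, 1), 'Price': 12},
]

def sum_by_date(items):
    # Sum Price within each run of equal Date values
    return [{'Date': key, 'Price': sum(e['Price'] for e in grp)}
            for key, grp in groupby(items, key=itemgetter('Date'))]

unsorted_result = sum_by_date(rows)  # 2020-06-01 shows up as two separate groups
sorted_result = sum_by_date(sorted(rows, key=itemgetter('Date')))
print(unsorted_result)
print(sorted_result)
```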
Using defaultdict
import datetime
from collections import defaultdict
dct = [{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50},
{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12},
{'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
sum_up = defaultdict(int)
for v in dct:
    sum_up[v['Date']] += v['Price']
print([{"Date": k, "Price": v} for k, v in sum_up.items()])
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62}, {'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
This is a good use case for defaultdict. Let's say our list is my_dict:
import datetime
my_dict = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50},
{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12},
{'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
We can accumulate prices using a defaultdict like so:
from collections import defaultdict
new_dict = defaultdict(int)
for dict_ in my_dict:
    new_dict[dict_['Date']] += dict_['Price']
Then we convert this dict back into a list of dicts:
my_dict = [{'Date': date, 'Price': price} for date, price in new_dict.items()]

Python - group/merge dictionaries based on key/values identity

I have a list containing many dictionaries with same keys but different values.
What I would like to do is to group/merge dictionaries based on the values of some of the keys.
It's probably faster to show an example rather than trying to explain:
[{'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 3, 'C2': 15},
{'zone': 'B', 'weekday': 2, 'hour': 6, 'C1': 5, 'C2': 27},
{'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 7, 'C2': 12},
{'zone': 'C', 'weekday': 5, 'hour': 8, 'C1': 2, 'C2': 13}]
So, what I want to achieve is merging the first and third dictionary, since they have the same "zone", "hour" and "weekday", summing the values in C1 and C2:
[{'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 10, 'C2': 27},
{'zone': 'B', 'weekday': 2, 'hour': 6, 'C1': 5, 'C2': 27},
{'zone': 'C', 'weekday': 5, 'hour': 8, 'C1': 2, 'C2': 13}]
Any help here? :) I've been struggling with this for a couple of days. I have a bad, unscalable solution, but I'm sure there is something far more pythonic that I could put in place.
Thanks!
Sort then group by the relevant keys; iterate over the groups and create new dictionaries with summed values.
import operator
import itertools
keys = operator.itemgetter('zone','weekday','hour')
c1_c2 = operator.itemgetter('C1','C2')
# data is your list of dicts
data.sort(key=keys)
grouped = itertools.groupby(data,keys)
new_data = []
for (zone, weekday, hour), g in grouped:
    c1, c2 = 0, 0
    for d in g:
        c1 += d['C1']
        c2 += d['C2']
    new_data.append({'zone': zone, 'weekday': weekday,
                     'hour': hour, 'C1': c1, 'C2': c2})
That last loop could also be written as:
for (zone, weekday, hour), g in grouped:
    cees = map(c1_c2, g)
    c1, c2 = map(sum, zip(*cees))
    new_data.append({'zone': zone, 'weekday': weekday,
                     'hour': hour, 'C1': c1, 'C2': c2})
By using a defaultdict you can merge them in linear time.
from collections import defaultdict
res = defaultdict(lambda : defaultdict(int))
for d in dictionaries:
    res[(d['zone'], d['weekday'], d['hour'])]['C1'] += d['C1']
    res[(d['zone'], d['weekday'], d['hour'])]['C2'] += d['C2']
The drawback is that you need another pass to have the output as you've defined it.
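A minimal sketch of that extra pass, using the sample data from the question under the name `dictionaries` assumed by the answer: accumulate into the nested defaultdict, then unpack each key tuple back into the flat dict shape the question asks for.

```python
from collections import defaultdict

dictionaries = [
    {'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 3, 'C2': 15},
    {'zone': 'B', 'weekday': 2, 'hour': 6, 'C1': 5, 'C2': 27},
    {'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 7, 'C2': 12},
    {'zone': 'C', 'weekday': 5, 'hour': 8, 'C1': 2, 'C2': 13},
]

# First pass: accumulate the sums per (zone, weekday, hour) key
res = defaultdict(lambda: defaultdict(int))
for d in dictionaries:
    key = (d['zone'], d['weekday'], d['hour'])
    res[key]['C1'] += d['C1']
    res[key]['C2'] += d['C2']

# Second pass: unpack each key tuple back into the flat dict shape from the question
merged = [
    {'zone': zone, 'weekday': weekday, 'hour': hour, 'C1': sums['C1'], 'C2': sums['C2']}
    for (zone, weekday, hour), sums in res.items()
]
print(merged)
```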
I've gone ahead and written a slightly longer solution, using namedtuples as the dictionary keys:
from collections import namedtuple
zones = [{'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 3, 'C2': 15},
{'zone': 'B', 'weekday': 2, 'hour': 6, 'C1': 5, 'C2': 27},
{'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 7, 'C2': 12},
{'zone': 'C', 'weekday': 5, 'hour': 8, 'C1': 2, 'C2': 13}]
ZoneTime = namedtuple("ZoneTime", ["zone", "weekday", "hour"])
results = dict()
for zone in zones:
    zone_time = ZoneTime(zone['zone'], zone['weekday'], zone['hour'])
    if zone_time in results:
        results[zone_time]['C1'] += zone['C1']
        results[zone_time]['C2'] += zone['C2']
    else:
        results[zone_time] = {'C1': zone['C1'], 'C2': zone['C2']}
print(results)
This uses a namedtuple of (zone, weekday, hour) as the key to each dictionary. Then it's fairly trivial to either add to it if it already exists within results, or create a new entry in the dictionary.
You can definitely make this shorter and "smarter", but it may become less readable.
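To get the list-of-dicts output the question asks for, the namedtuple keys make the conversion short, because `_asdict()` turns each key back into a dict of field names. A minimal sketch: the `results` dict below is written out by hand in the shape the loop above produces, using the question's expected totals.

```python
from collections import namedtuple

ZoneTime = namedtuple("ZoneTime", ["zone", "weekday", "hour"])

# Same shape as `results` in the answer above: ZoneTime -> {'C1': ..., 'C2': ...}
results = {
    ZoneTime('A', 1, 12): {'C1': 10, 'C2': 27},
    ZoneTime('B', 2, 6): {'C1': 5, 'C2': 27},
    ZoneTime('C', 5, 8): {'C1': 2, 'C2': 13},
}

# _asdict() yields {'zone': ..., 'weekday': ..., 'hour': ...}; merge in the sums
merged = [{**key._asdict(), **sums} for key, sums in results.items()]
print(merged)
```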
Edit: Run Time Comparison
My original answer (see below) was not a good one, but I think I had a useful contribution by doing a little bit of run time analysis on the other answers so I've edited that portion and put it at the top. Here I include the three other solutions, along with the required transformations to produce the desired output. For completeness I also include a version using pandas, which assumes that the user is working with a DataFrame (transforming from list of dicts to data frame and back was not even close to worth it). Comparison times vary a little depending on the random data generated, but these are fairly representative:
>>> run_timer(100)
Times with 100 values
...with defaultdict: 0.1496697600000516
...with namedtuple: 0.14976404899994122
...with groupby: 0.0690777249999428
...with pandas: 3.3165711250001095
>>> run_timer(1000)
Times with 1000 values
...with defaultdict: 1.267153091999944
...with namedtuple: 0.9605341750000207
...with groupby: 0.6634409229998255
...with pandas: 3.5146895360001054
>>> run_timer(10000)
Times with 10000 values
...with defaultdict: 9.194478484000001
...with namedtuple: 9.157486462000179
...with groupby: 5.18553969300001
...with pandas: 4.704001281000046
>>> run_timer(100000)
Times with 100000 values
...with defaultdict: 59.644778522000024
...with namedtuple: 89.26688319799996
...with groupby: 93.3517027989999
...with pandas: 14.495209061999958
Takeaways:
working with pandas data frames pays off big time for large datasets
NOTE: I do not include conversion between list of dicts and data frame, which is definitely significant
otherwise the accepted solution (by wwii) wins for small to medium datasets, but for very large ones it may be the slowest
changing the sizes of the groups (e.g., by decreasing the number of zones) has a huge effect which is not examined here
Here is the script I used to generate the above.
import random
import pandas
from timeit import timeit
from functools import partial
from itertools import groupby
from operator import itemgetter
from collections import namedtuple, defaultdict
def with_pandas(df):
    return df.groupby(['zone', 'weekday', 'hour']).sum().reset_index()
def with_groupby(data):
    keys = itemgetter('zone', 'weekday', 'hour')
    # data is your list of dicts
    # note: this sorts in place, so after the first timeit run the input is already sorted
    data.sort(key=keys)
    grouped = groupby(data, keys)
    new_data = []
    for (zone, weekday, hour), g in grouped:
        c1, c2 = 0, 0
        for d in g:
            c1 += d['C1']
            c2 += d['C2']
        new_data.append({'zone': zone, 'weekday': weekday,
                         'hour': hour, 'C1': c1, 'C2': c2})
    return new_data
def with_namedtuple(zones):
    ZoneTime = namedtuple("ZoneTime", ["zone", "weekday", "hour"])
    results = dict()
    for zone in zones:
        zone_time = ZoneTime(zone['zone'], zone['weekday'], zone['hour'])
        if zone_time in results:
            results[zone_time]['C1'] += zone['C1']
            results[zone_time]['C2'] += zone['C2']
        else:
            results[zone_time] = {'C1': zone['C1'], 'C2': zone['C2']}
    return [
        {
            'zone': key[0],
            'weekday': key[1],
            'hour': key[2],
            **val
        }
        for key, val in results.items()
    ]
def with_defaultdict(dictionaries):
    res = defaultdict(lambda: defaultdict(int))
    for d in dictionaries:
        res[(d['zone'], d['weekday'], d['hour'])]['C1'] += d['C1']
        res[(d['zone'], d['weekday'], d['hour'])]['C2'] += d['C2']
    return [
        {
            'zone': key[0],
            'weekday': key[1],
            'hour': key[2],
            **val
        }
        for key, val in res.items()
    ]
def gen_random_vals(num):
    return [
        {
            'zone': random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
            'weekday': random.randint(1, 7),
            'hour': random.randint(0, 23),
            'C1': random.randint(1, 50),
            'C2': random.randint(1, 50),
        }
        for _ in range(num)
    ]
def run_timer(num_vals=1000, timeit_num=1000):
    vals = gen_random_vals(num_vals)
    df = pandas.DataFrame(vals)
    p_fmt = "\t...with %s: %s"
    times = {
        'defaultdict': timeit(stmt=partial(with_defaultdict, vals), number=timeit_num),
        'namedtuple': timeit(stmt=partial(with_namedtuple, vals), number=timeit_num),
        'groupby': timeit(stmt=partial(with_groupby, vals), number=timeit_num),
        'pandas': timeit(stmt=partial(with_pandas, df), number=timeit_num),
    }
    print("Times with %d values" % num_vals)
    for key, val in times.items():
        print(p_fmt % (key, val))
where
with_groupby uses the solution by wwii
with_namedtuple uses the solution by Jose Salvatierra
with_defaultdict uses the solution by abc
with_pandas uses the solution proposed by Alexander Cécile in comments
assumes data is already in a DataFrame and produces a DataFrame as result
Original answer:
Just for fun, here's a completely different approach using groupby. Granted, it's not the prettiest, but it should be fairly quick.
from itertools import groupby
from operator import itemgetter
from pprint import pprint
vals = [
    {'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 3, 'C2': 15},
    {'zone': 'B', 'weekday': 2, 'hour': 6, 'C1': 5, 'C2': 27},
    {'zone': 'A', 'weekday': 1, 'hour': 12, 'C1': 7, 'C2': 12},
    {'zone': 'C', 'weekday': 5, 'hour': 8, 'C1': 2, 'C2': 13}
]
ordered = sorted(
    [
        (
            (row['zone'], row['weekday'], row['hour']),
            row['C1'], row['C2']
        )
        for row in vals
    ]
)
def invert_columns(grp):
    return zip(*[g_row[1:] for g_row in grp])
merged = [
    {
        'zone': key[0],
        'weekday': key[1],
        'hour': key[2],
        **dict(
            zip(["C1", "C2"], [sum(col) for col in invert_columns(grp)])
        )
    }
    for key, grp in groupby(ordered, itemgetter(0))
]
pprint(merged)
which yields
[{'C1': 10, 'C2': 27, 'hour': 12, 'weekday': 1, 'zone': 'A'},
{'C1': 5, 'C2': 27, 'hour': 6, 'weekday': 2, 'zone': 'B'},
{'C1': 2, 'C2': 13, 'hour': 8, 'weekday': 5, 'zone': 'C'}]

Creating a complex nested dictionary from multiple lists in Python

I am struggling to create a nested dictionary with the following data:
Team, Group, ID, Score, Difficulty
OneTeam, A, 0, 0.25, 4
TwoTeam, A, 1, 1, 10
ThreeTeam, A, 2, 0.64, 5
FourTeam, A, 3, 0.93, 6
FiveTeam, B, 4, 0.5, 7
SixTeam, B, 5, 0.3, 8
SevenTeam, B, 6, 0.23, 9
EightTeam, B, 7, 1.2, 4
Once imported as a Pandas Dataframe, I turn each feature into these lists:
teams, group, id, score, diff.
Using this stack overflow answer Create a complex dictionary using multiple lists I can create the following dictionary:
{'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3},
'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}}
using the code:
{team: {'id': i, 'score': s, 'diff': d} for team, i, s, d in zip(teams, id, score, diff)}
But what I'm after is having 'Group' as the main key, then team, and then id, score and difficulty within the team (as above).
I have tried:
{g: {team: {'id': i, 'score': s, 'diff': d}} for g, team, i, s, d in zip(group, teams, id, score, diff)}
but this doesn't work and results in only one team per group within the dictionary:
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93}},
'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2}}}
Below is how the dictionary should look, but I'm not sure how to get there - any help would be much appreciated!
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
       'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
       'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
       'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}},
 'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
       'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
       'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
       'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3}}}
A dict comprehension may not be the best way of solving this if your data is stored in a table like this.
Try something like
from collections import defaultdict
groups = defaultdict(dict)
for g, team, i, s, d in zip(group, teams, id, score, diff):
    groups[g][team] = {'id': i, 'score': s, 'diff': d}
With defaultdict, if groups[g] already exists, the new team is added to it as a key; if it doesn't, an empty dict is created automatically and the new team is inserted into it.
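If you would rather end up with a plain dict (a defaultdict prints differently and silently creates entries on lookup), `dict.setdefault` does the same job. A minimal sketch using the sample lists from the question; `id` is renamed to `ids` here only to avoid shadowing the built-in.

```python
teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam',
         'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
ids = [0, 1, 2, 3, 4, 5, 6, 7]  # renamed from `id` to avoid shadowing the built-in
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]

groups = {}
for g, team, i, s, d in zip(group, teams, ids, score, diff):
    # setdefault returns the existing inner dict for g, or inserts and returns a new one
    groups.setdefault(g, {})[team] = {'id': i, 'score': s, 'diff': d}
print(groups)
```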
Edit: you edited your question to say that your data is in a pandas DataFrame. In that case you can skip turning the columns into lists and instead do, for example:
from collections import defaultdict
groups = defaultdict(dict)
for row in df.itertuples():
    groups[row.Group][row.Team] = {'id': row.ID, 'score': row.Score, 'diff': row.Difficulty}
If you absolutely want to use a comprehension, then this should work:
z = list(zip(teams, group, id, score, diff))  # a list, so it can be scanned once per group
s = set(group)
d = {  # outer dict, one entry for each different group
    group: ({  # inner dict, one entry per team, filtered by group
        team: {'id': i, 'score': s, 'diff': d}
        for team, g, i, s, d in z
        if g == group
    })
    for group in s
}
I added linebreaks for clarity
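One thing to watch with this comprehension: `zip` returns a one-shot iterator, so if it is created once and scanned inside a loop over groups, the first group consumes it and every later group comes up empty. Materializing it with `list(...)` avoids that. A small sketch with illustrative data:

```python
teams = ['OneTeam', 'TwoTeam', 'FiveTeam', 'SixTeam']
group = ['A', 'A', 'B', 'B']

pairs_iter = zip(teams, group)        # one-shot iterator
pairs_list = list(zip(teams, group))  # reusable list

# Scan the pairs once per group, as the nested comprehension does
from_iter = {g: [t for t, tg in pairs_iter if tg == g] for g in sorted(set(group))}
from_list = {g: [t for t, tg in pairs_list if tg == g] for g in sorted(set(group))}
print(from_iter)  # 'B' is empty: the 'A' pass exhausted the iterator
print(from_list)
```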
EDIT:
After the comment, to better clarify my intention and out of curiosity, I ran a comparison:
from collections import defaultdict
import timeit
teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam', 'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
id = [0, 1, 2, 3, 4, 5, 6, 7]
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]
def no_comprehension():
    global group, teams, id, score, diff
    groups = defaultdict(dict)
    for g, team, i, s, d in zip(group, teams, id, score, diff):
        groups[g][team] = {'id': i, 'score': s, 'diff': d}
def comprehension():
    global group, teams, id, score, diff
    # note: as timed, z is a one-shot zip iterator, so only the first group gets filled;
    # wrap it in list(...) for a correct result
    z = zip(teams, group, id, score, diff)
    s = set(group)
    d = {group: ({team: {'id': i, 'score': s, 'diff': d} for team, g, i, s, d in z if g == group}) for group in s}
print("no comprehension:")
print(timeit.timeit(lambda : no_comprehension(), number=10000))
print("comprehension:")
print(timeit.timeit(lambda : comprehension(), number=10000))
Output:
no comprehension:
0.027287796139717102
comprehension:
0.028979241847991943
They perform about the same. With my sentence above, I was just highlighting this as an alternative to the solution already posted by @JohnO.

Python - Inserting and Updating python dict simultaneously

So I have a list of dicts that looks like this:
[{
'field': {
'data': 'F1'
},
'value': F1Value1,
'date': datetime.datetime(2019, 3, 1, 0, 0)
}, {
'field': {
'data': 'F2'
},
'value': F2Value1,
'date': datetime.datetime(2019, 2, 5, 0, 0)
}, {
'field': {
'data': 'F2'
},
'value': F2Value2,
'date': datetime.datetime(2019, 2, 7, 0, 0)
}]
And I want an output that looks like this:
[
{
'F1': [
{
'value': F1Value1,
'date': datetime.datetime(2019, 3, 1, 0, 0)
}
]
},
{
'F2': [
{
'value': F2Value1,
'date': datetime.datetime(2019, 2, 5, 0, 0)
},
{
'value': F2Value2,
'date': datetime.datetime(2019, 2, 7, 0, 0)
},
]
}
]
That is, I want every field.data to be the key and have it append the value and date if it belongs to the same field.
Note: I want to do this WITHOUT using a for loop (apart from the loop to iterate through the list). I want to use python dict functions like update() and append() etc.
Any optimized solutions would be really helpful.
You can iterate through the list of dicts and use defaultdict from collections to collect the items under each unique key:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>>
>>> for items in x:
...     d[items['field']['data']].append({
...         'value': items['value'],
...         'date': items['date']
...     })
...
>>>
>>> import pprint
>>> pprint.pprint(x)
[{'date': datetime.datetime(2019, 3, 1, 0, 0),
'field': {'data': 'F1'},
'value': 'F1Value1'},
{'date': datetime.datetime(2019, 2, 5, 0, 0),
'field': {'data': 'F2'},
'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0),
'field': {'data': 'F2'},
'value': 'F2Value2'}]
>>>
>>> pprint.pprint(list(d.items()))
[('F1', [{'date': datetime.datetime(2019, 3, 1, 0, 0), 'value': 'F1Value1'}]),
('F2',
[{'date': datetime.datetime(2019, 2, 5, 0, 0), 'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0), 'value': 'F2Value2'}])]
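`d.items()` gives (field, entries) pairs, while the question asks for a list of single-key dicts. A minimal sketch of that last step, rebuilding `x` with string stand-ins for the question's `F1Value1`-style placeholders:

```python
import datetime
from collections import defaultdict

x = [
    {'field': {'data': 'F1'}, 'value': 'F1Value1', 'date': datetime.datetime(2019, 3, 1)},
    {'field': {'data': 'F2'}, 'value': 'F2Value1', 'date': datetime.datetime(2019, 2, 5)},
    {'field': {'data': 'F2'}, 'value': 'F2Value2', 'date': datetime.datetime(2019, 2, 7)},
]

d = defaultdict(list)
for items in x:
    d[items['field']['data']].append({'value': items['value'], 'date': items['date']})

# One single-key dict per field, in first-seen order
result = [{field: entries} for field, entries in d.items()]
print(result)
```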
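Use itertools.groupby (it only merges consecutive elements, so sort data by element['field']['data'] first if entries for the same field aren't already adjacent):
from itertools import groupby
from pprint import pprint
result = [{key: [{k: v for k, v in element.items() if k != 'field'}
                 for element in group]}
          for key, group in groupby(data, lambda element: element['field']['data'])]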
pprint(result)
Output:
[{'F1': [{'date': datetime.datetime(2019, 3, 1, 0, 0), 'value': 'F1Value1'}]},
{'F2': [{'date': datetime.datetime(2019, 2, 5, 0, 0), 'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0), 'value': 'F2Value2'}]}]
Only using dict, list, and set (note that this rescans the whole list once per field):
[
    {
        field_data: [
            {k: v for k, v in thing.items() if k != 'field'}
            for thing in things if thing['field']['data'] == field_data
        ]
    }
    for field_data in sorted(set(thing['field']['data'] for thing in things))
]

Aggregate values on lists of dicts based on key in python

I'm trying to get the aggregation of 2 different lists, where each element is a dictionary with 2 entries, month and value.
So the first list looks like this:
[{
'patient_notes': 5,
'month': datetime.date(2017, 1, 1)
}, {
'patient_notes': 5,
'month': datetime.date(2017, 2, 1)
}, {
'patient_notes': 5,
'month': datetime.date(2017, 5, 1)
}, {
'patient_notes': 5,
'month': datetime.date(2017, 7, 1)
}, {
'patient_notes': 5,
'month': datetime.date(2017, 8, 1)
}, {
'patient_notes': 5,
'month': datetime.date(2017, 12, 1)
}]
Second list is:
[{
'employee_notes': 4,
'month': datetime.date(2017, 2, 1)
}, {
'employee_notes': 4,
'month': datetime.date(2017, 3, 1)
}, {
'employee_notes': 4,
'month': datetime.date(2017, 4, 1)
}, {
'employee_notes': 4,
'month': datetime.date(2017, 8, 1)
}, {
'employee_notes': 4,
'month': datetime.date(2017, 9, 1)
}, {
'employee_notes': 4,
'month': datetime.date(2017, 10, 1)
}, {
'employee_notes': 4,
'month': datetime.date(2017, 12, 1)
}]
So I need to build a new list that contains the sum of both lists per month, something like this:
[{
'total_messages': 14,
'month': '2017-01-01'
}, {
'total_messages': 14,
'month': '2017-02-01'
}, {
'total_messages': 14,
'month': '2017-03-01'
}, {
'total_messages': 14,
'month': '2017-04-01'
}, {
'total_messages': 14,
'month': '2017-05-01'
}, {
'total_messages': 14,
'month': '2017-06-01'
}, {
'total_messages': 14,
'month': '2017-07-01'
}, {
'total_messages': 14,
'month': '2017-08-01'
}, {
'total_messages': 14,
'month': '2017-09-01'
}, {
'total_messages': 14,
'month': '2017-10-01'
}, {
'total_messages': 14,
'month': '2017-11-01'
}, {
'total_messages': 14,
'month': '2017-12-01'
}]
I first tried zip, but that only works if the two lists are the same size. Then I tried itertools.izip_longest, but that breaks when the lists are the same size but cover different months. I cannot simply aggregate those; I need to aggregate matching months only.
Counter would also be great for this, but I cannot change the key names in the original lists. Any ideas?
You can use defaultdict to create a counter. Go through each item in the first list and add the patient_notes value to the dictionary. Then go through the second list and add the employee_notes values.
Now you need to encode your new defaultdict back into a list in your desired format. You can use a list comprehension for that. I've sorted the list by month.
from collections import defaultdict
dd = defaultdict(int)
for d in my_list_1:
    dd[d['month']] += d['patient_notes']
for d in my_list_2:
    dd[d['month']] += d['employee_notes']
result = [{'total_messages': dd[k], 'month': k} for k in sorted(dd.keys())]
>>> result
[{'month': datetime.date(2017, 1, 1), 'total_messages': 5},
{'month': datetime.date(2017, 2, 1), 'total_messages': 9},
{'month': datetime.date(2017, 3, 1), 'total_messages': 4},
{'month': datetime.date(2017, 4, 1), 'total_messages': 4},
{'month': datetime.date(2017, 5, 1), 'total_messages': 5},
{'month': datetime.date(2017, 7, 1), 'total_messages': 5},
{'month': datetime.date(2017, 8, 1), 'total_messages': 9},
{'month': datetime.date(2017, 9, 1), 'total_messages': 4},
{'month': datetime.date(2017, 10, 1), 'total_messages': 4},
{'month': datetime.date(2017, 12, 1), 'total_messages': 9}]
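from collections import defaultdict
d_dict = defaultdict(int)
for item in l1 + l2:
    # every key except 'month' holds a count (patient_notes or employee_notes)
    d_dict[item['month']] += sum(v for k, v in item.items() if k != 'month')
[{'month': i.strftime("%Y-%m-%d"), 'total_messages': j} for i, j in sorted(d_dict.items())]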
Output:
[{'month': '2017-01-01', 'total_messages': 5},
{'month': '2017-02-01', 'total_messages': 9},
{'month': '2017-03-01', 'total_messages': 4},
{'month': '2017-04-01', 'total_messages': 4},
{'month': '2017-05-01', 'total_messages': 5},
{'month': '2017-07-01', 'total_messages': 5},
{'month': '2017-08-01', 'total_messages': 9},
{'month': '2017-09-01', 'total_messages': 4},
{'month': '2017-10-01', 'total_messages': 4},
{'month': '2017-12-01', 'total_messages': 9}]
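The question mentions that Counter would be a good fit, but the key names don't need to change for that: you can add each list's own count field into a `Counter` yourself. Because indexing a `Counter` with a missing key returns 0 without inserting it, it also makes it easy to list every month, as the desired output does (assuming you want all twelve months of 2017). A minimal sketch with shortened versions of the two lists:

```python
import datetime
from collections import Counter

patient = [
    {'patient_notes': 5, 'month': datetime.date(2017, 1, 1)},
    {'patient_notes': 5, 'month': datetime.date(2017, 2, 1)},
]
employee = [
    {'employee_notes': 4, 'month': datetime.date(2017, 2, 1)},
    {'employee_notes': 4, 'month': datetime.date(2017, 3, 1)},
]

totals = Counter()
for row in patient:
    totals[row['month']] += row['patient_notes']
for row in employee:
    totals[row['month']] += row['employee_notes']

# Every month of 2017; totals[m] is 0 for months missing from both lists.
# isoformat() renders a date as 'YYYY-MM-DD', matching the desired output.
result = [
    {'total_messages': totals[m], 'month': m.isoformat()}
    for m in (datetime.date(2017, month, 1) for month in range(1, 13))
]
print(result)
```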
