Summing Values in Dictionary Based on Certain Values - python

I have a list of dictionaries that state a date as well as a price. It looks like this:
import datetime
dict = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50}, {'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12}, {'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
I'd like to create a new list of dictionaries that sum all the Price values that are on the same date. So the output would look like this:
output_dict = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62}, {'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
How could I achieve this?

You can use Counter from the collections module:
from collections import Counter
c = Counter()
for v in dict:
    # accumulate each price under its date
    c[v['Date']] += v['Price']
output_dict = [{'Date': name, 'Price': count} for name, count in c.items()]
Output:
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62},
{'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
Alternatively, you can use the pandas library.
Install pandas with:
pip install pandas
Then the code would be:
import pandas as pd
# group rows by date and sum the Price column; the result is a dict keyed by Timestamp, not a list of dicts
output_dict = pd.DataFrame(dict).groupby('Date')['Price'].sum().to_dict()
Output:
{Timestamp('2020-06-01 00:00:00'): 62, Timestamp('2020-06-02 00:00:00'): 60}

Another solution uses itertools.groupby. It only groups consecutive items, so this relies on dct already being ordered by date:
import datetime
from itertools import groupby
dct = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50}, {'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12}, {'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
out = []
for k, g in groupby(dct, lambda row: row['Date']):
    # g yields the consecutive rows sharing date k
    out.append({'Date': k, 'Price': sum(v['Price'] for v in g)})
print(out)
Prints:
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62}, {'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
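Because itertools.groupby only merges consecutive elements with equal keys, the answer above works only when equal dates are adjacent. A minimal sketch of the difference, using a hypothetical unsorted list (unsorted_rows and sum_by_date are illustrative names, not from the question):

```python
import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the two June 1 entries are NOT adjacent
unsorted_rows = [
    {'Date': datetime.datetime(2020, 6, 1), 'Price': 50},
    {'Date': datetime.datetime(2020, 6, 2), 'Price': 60},
    {'Date': datetime.datetime(2020, 6, 1), 'Price': 12},
]

def sum_by_date(rows):
    """Sum 'Price' for each run of consecutive rows with the same 'Date'."""
    return [{'Date': k, 'Price': sum(r['Price'] for r in g)}
            for k, g in groupby(rows, key=itemgetter('Date'))]

# Without sorting, June 1 shows up as two separate groups
naive = sum_by_date(unsorted_rows)
print(len(naive))  # 3

# Sorting by the grouping key first makes equal dates contiguous
fixed = sum_by_date(sorted(unsorted_rows, key=itemgetter('Date')))
print(fixed)
```

Sorting costs O(n log n); the Counter and defaultdict answers avoid it because they accumulate into a dict keyed by date regardless of order.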

You can use itertools' groupby, although I suspect defaultdict will be faster:
from itertools import groupby
from operator import itemgetter
# dicts is the input list of dictionaries
# sort dicts by date so groupby sees equal dates consecutively
dicts = sorted(dicts, key=itemgetter("Date"))
# get the sum via itertools' groupby
result = [{"Date": key,
           "Price": sum(entry['Price'] for entry in value)}
          for key, value in groupby(dicts, key=itemgetter("Date"))]
print(result)
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62},
{'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]

Using defaultdict
import datetime
from collections import defaultdict
dct = [{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50},
{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12},
{'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
sum_up = defaultdict(int)
for v in dct:
    sum_up[v['Date']] += v['Price']
print([{"Date": k, "Price": v} for k, v in sum_up.items()])
[{'Date': datetime.datetime(2020, 6, 1, 0, 0), 'Price': 62}, {'Date': datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]

This is a good use case for defaultdict. Let's say our list of dicts is my_dict:
import datetime
my_dict = [{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 50},
{'Date':datetime.datetime(2020, 6, 1, 0, 0), 'Price': 12},
{'Date':datetime.datetime(2020, 6, 2, 0, 0), 'Price': 60}]
We can accumulate prices using a defaultdict like so:
from collections import defaultdict
new_dict = defaultdict(int)
for dict_ in my_dict:
    new_dict[dict_['Date']] += dict_['Price']
Then we just convert this dict back into a list of dicts:
my_dict = [{'Date': date, 'Price': price} for date, price in new_dict.items()]

Related

Removing an element of a list [duplicate]

This question already has answers here:
How to modify list entries during for loop?
(10 answers)
Closed 1 year ago.
I have a list of lists containing dictionaries:
[[{'event_id': 1, 'order_id': 1, 'item_id': 1, 'count': 1, 'return_count': 0, 'status': 'OK'},
{'event_id': 2, 'order_id': 1, 'item_id': 1, 'count': 1, 'return_count': 0, 'status': 'OK'}],
[{'order_id': 2, 'item_id': 1, 'event_id': 1, 'count': 3, 'return_count': 1, 'status': 'OK'},
{'order_id': 2, 'event_id': 2, 'item_id': 1, 'count': 3, 'return_count': 1, 'status': 'OK'},
{'order_id': 2, 'event_id': 1, 'item_id': 2, 'count': 4, 'return_count': 2, 'status': 'OK'}]]
For each item in a given order, I only need the dictionaries whose event_id is the maximum. So I wrote the following code:
for el in lst:
    for element in el:
        if element['event_id'] != max(x['event_id'] for x in el if element['item_id'] == x['item_id']):
            el.remove(element)
lst is the initial list.
For some reason, after running the code lst remains unchanged.
This isn't a one-liner, but it returns the dictionaries whose event_id equals the overall maximum across all orders (not the maximum per item):
dictlist = [
[{'event_id': 1, 'order_id': 1, 'item_id': 1, 'count': 1, 'return_count': 0, 'status': 'OK'},
{'event_id': 2, 'order_id': 1, 'item_id': 1, 'count': 1, 'return_count': 0, 'status': 'OK'}],
[{'order_id': 2, 'item_id': 1, 'event_id': 1, 'count': 3, 'return_count': 1, 'status': 'OK'},
{'order_id': 2, 'event_id': 2, 'item_id': 1, 'count': 3, 'return_count': 1, 'status': 'OK'},
{'order_id': 2, 'event_id': 1, 'item_id': 2, 'count': 4, 'return_count': 2, 'status': 'OK'}]]
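# use max_id rather than max, so the built-in max() is not shadowed
max_id = 0
parsed = []
# first pass: find the largest event_id anywhere in dictlist
for item in dictlist:
    for i in item:
        if i['event_id'] > max_id:
            max_id = i['event_id']
# second pass: collect every dict with that event_id
for item in dictlist:
    for dic in item:
        if dic['event_id'] == max_id:
            parsed.append(dic)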
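You're trying to remove elements from a list while iterating over it, which doesn't work reliably: each removal shifts the remaining elements left, so the loop skips the element right after the one you removed. It's also hard to tell exactly what you're trying to do; my suggestion would be something like this: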
newlst = []
for el in lst:
    max_event_id = max(element['event_id'] for element in el)
    max_event_element = next(element for element in el if element['event_id'] == max_event_id)
    newlst.append(max_event_element)
The expected result ends up in the newlst variable.
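To see why removing from the list you are iterating over misbehaves, here is a minimal sketch on plain integer lists (hypothetical data), along with two common fixes: iterating over a shallow copy, or building a new list:

```python
# Removing from the list being iterated: the element after each removal is skipped
skipped = [1, 1, 2]
for n in skipped:
    if n == 1:
        skipped.remove(n)
print(skipped)  # [1, 2]  (the second 1 was never visited)

# Fix 1: iterate over a shallow copy, mutate the original
copied = [1, 1, 2]
for n in copied[:]:
    if n == 1:
        copied.remove(n)
print(copied)  # [2]

# Fix 2: build a new list with a comprehension
kept = [n for n in [1, 1, 2] if n != 1]
print(kept)  # [2]
```

Building a new list is usually the clearer option, and it avoids the repeated O(n) cost of list.remove.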
Sort on "event_id" and keep only the max (last element):
result = [sorted(l, key=lambda x: x["event_id"])[-1] for l in lst]
If you want to keep all dictionaries with the max "event_id":
lsts = [[x for x in l if x["event_id"] == max(l, key=lambda y: y["event_id"])["event_id"]] for l in lst]
# flatten the list of lists
result = [item for sublist in lsts for item in sublist]
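The question asks, for each item within an order, for the dictionaries with the highest event_id; the answers above take a single maximum per order or an overall maximum instead. A sketch of a per-item filter, assuming lst has the nested shape shown in the question and that ties should all be kept (keep_latest_events is a hypothetical helper name). It builds new lists rather than removing from the ones being iterated:

```python
def keep_latest_events(orders):
    """For each order (a list of dicts), keep the dicts whose event_id is the
    maximum among dicts with the same item_id. The input is not modified."""
    result = []
    for order in orders:
        # First pass: highest event_id seen for each item_id in this order
        max_event = {}
        for row in order:
            item = row['item_id']
            if item not in max_event or row['event_id'] > max_event[item]:
                max_event[item] = row['event_id']
        # Second pass: keep only rows matching their item's maximum
        result.append([row for row in order
                       if row['event_id'] == max_event[row['item_id']]])
    return result

lst = [[{'event_id': 1, 'order_id': 1, 'item_id': 1, 'count': 1, 'return_count': 0, 'status': 'OK'},
        {'event_id': 2, 'order_id': 1, 'item_id': 1, 'count': 1, 'return_count': 0, 'status': 'OK'}],
       [{'order_id': 2, 'item_id': 1, 'event_id': 1, 'count': 3, 'return_count': 1, 'status': 'OK'},
        {'order_id': 2, 'event_id': 2, 'item_id': 1, 'count': 3, 'return_count': 1, 'status': 'OK'},
        {'order_id': 2, 'event_id': 1, 'item_id': 2, 'count': 4, 'return_count': 2, 'status': 'OK'}]]

filtered = keep_latest_events(lst)
for order in filtered:
    print([(d['order_id'], d['item_id'], d['event_id']) for d in order])
```

Each order is scanned twice, so this is linear in the number of dicts, unlike calling max() inside the comprehension for every element.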

Create nested dictionary from queryset

My queryset output is:
[{'ACCOUNT_NAME': 'MOHAMMAD FAWAD KHALID',
'ACCOUNT_SNO': 1810028081,
'ACTIVETRACKING': 1,
'CAMPAIGN_CODE': 'Testing',
'CAMPAIGN_DESCRIPTION': 'First Testing Campaign',
'CAMPAIGN_DOCS_ID': 121,
'CAMPAIGN_OBJECTIVE_ID': 2,
'CAMP_DETAIL_ID': 1462,
'CAMP_END': datetime.datetime(2020, 2, 1, 0, 0),
'CAMP_START': datetime.datetime(2020, 1, 1, 0, 0),
'CUSTOMER_EMAIL': 'm.fawadkhalid#gmail.com',
'DOCUMENT': 'App_download_urdu_1.html',
'ID': 61,
'ISACTIVE': 1,
'LAST_CYCLE': '2',
'MAILSTATUS_APP': 'D',
'MAILSUBJECT': 'MCBAH Testing Campaign',
'MOBILE_NO': '923000704342',
'OBJECTIVE': 'SIP Payment',
'TRACKINGCYCLE': 5}]
I need to convert the above list as follows:
[{'DATA': {'ACCOUNT_NAME': 'MOHAMMAD FAWAD KHALID',
'ACCOUNT_SNO': 1810028081,
'ACTIVETRACKING': 1,
'CAMPAIGN_CODE': 'Testing',
'CAMPAIGN_DESCRIPTION': 'First Testing Campaign',
'CAMPAIGN_DOCS_ID': 121,
'CAMPAIGN_OBJECTIVE_ID': 2,
'CAMP_DETAIL_ID': 1462,
'CAMP_END': datetime.datetime(2020, 2, 1, 0, 0),
'CAMP_START': datetime.datetime(2020, 1, 1, 0, 0),
'CUSTOMER_EMAIL': 'm.fawadkhalid#gmail.com',
'DOCUMENT': 'App_download_urdu_1.html',
'ISACTIVE': 1,
'LAST_CYCLE': '2',
'MAILSTATUS_APP': 'D',
'MAILSUBJECT': 'MCBAH Testing Campaign',
'MOBILE_NO': '923000704342',
'OBJECTIVE': 'SIP Payment',
'TRACKINGCYCLE': 5},
'ID': 61}]
I tried to convert it as follows, without success:
from collections import defaultdict
qr_dict = defaultdict(list)
for qr in query_result:
    qr_dict[qr.ID].append(qr.qr)
I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-86-f76fa64c419a> in <module>
3 qr_dict = defaultdict(list)
4 for qr in query_result:
----> 5 qr_dict[qr.ID].append(qr.qr)
6
7
AttributeError: 'dict' object has no attribute 'ID'
Use a list comprehension and dict.pop. The rows are plain dicts, so access keys with ['ID'] rather than .ID.
Example:
import datetime
query_result = [{'ID': 61, 'CAMP_DETAIL_ID': 1462, 'CAMP_START': datetime.datetime(2020, 1, 1, 0, 0), 'CAMP_END': datetime.datetime(2020, 2, 1, 0, 0), 'ISACTIVE': 1, 'ACTIVETRACKING': 1, 'TRACKINGCYCLE': 5, 'MAILSUBJECT': 'MCBAH Testing Campaign', 'CAMPAIGN_CODE': 'Testing', 'CAMPAIGN_DESCRIPTION': 'First Testing Campaign', 'ACCOUNT_SNO': 1810028081, 'CUSTOMER_EMAIL': 'm.fawadkhalid#gmail.com', 'MOBILE_NO': '923000704342', 'ACCOUNT_NAME': 'MOHAMMAD FAWAD KHALID', 'MAILSTATUS_APP': 'D', 'CAMPAIGN_OBJECTIVE_ID': 2, 'OBJECTIVE': 'SIP Payment', 'CAMPAIGN_DOCS_ID': 121, 'DOCUMENT': 'App_download_urdu_1.html', 'LAST_CYCLE': '2'}]
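# pop() removes 'ID' from each original dict and returns it, so query_result is modified;
# use 'DATA' instead of 'data' if you need the exact key from the question
qr_dict = [{'ID': i.pop('ID'), 'data': i} for i in query_result]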
Output:
[{'ID': 61,
'data': {'ACCOUNT_NAME': 'MOHAMMAD FAWAD KHALID',
'ACCOUNT_SNO': 1810028081,
'ACTIVETRACKING': 1,
'CAMPAIGN_CODE': 'Testing',
'CAMPAIGN_DESCRIPTION': 'First Testing Campaign',
'CAMPAIGN_DOCS_ID': 121,
'CAMPAIGN_OBJECTIVE_ID': 2,
'CAMP_DETAIL_ID': 1462,
'CAMP_END': datetime.datetime(2020, 2, 1, 0, 0),
'CAMP_START': datetime.datetime(2020, 1, 1, 0, 0),
'CUSTOMER_EMAIL': 'm.fawadkhalid#gmail.com',
'DOCUMENT': 'App_download_urdu_1.html',
'ISACTIVE': 1,
'LAST_CYCLE': '2',
'MAILSTATUS_APP': 'D',
'MAILSUBJECT': 'MCBAH Testing Campaign',
'MOBILE_NO': '923000704342',
'OBJECTIVE': 'SIP Payment',
'TRACKINGCYCLE': 5}}]
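Because dict.pop deletes 'ID' from the original rows, you may prefer a non-mutating version if query_result is used again later. A sketch using a dict comprehension; the sample row is shortened from the question's data for brevity, and the 'DATA' key follows the question's desired output:

```python
import datetime

# Shortened sample row (a subset of the question's fields)
query_result = [
    {'ID': 61, 'CAMP_DETAIL_ID': 1462,
     'CAMP_START': datetime.datetime(2020, 1, 1, 0, 0), 'LAST_CYCLE': '2'},
]

# Copy every field except 'ID' into 'DATA'; the original rows keep their 'ID'
nested = [{'DATA': {k: v for k, v in row.items() if k != 'ID'}, 'ID': row['ID']}
          for row in query_result]
print(nested)
```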

Python - Inserting and Updating python dict simultaneously

So I have a list of dicts that looks like this:
[{
'field': {
'data': 'F1'
},
'value': F1Value1,
'date': datetime.datetime(2019, 3, 1, 0, 0)
}, {
'field': {
'data': 'F2'
},
'value': F2Value1,
'date': datetime.datetime(2019, 2, 5, 0, 0)
}, {
'field': {
'data': 'F2'
},
'value': F2Value2,
'date': datetime.datetime(2019, 2, 7, 0, 0)
}]
And I want an output that looks like this:
[
{
'F1': [
{
'value': F1Value1,
'date': datetime.datetime(2019, 3, 1, 0, 0)
}
]
},
{
'F2': [
{
'value': F2Value1,
'date': datetime.datetime(2019, 2, 5, 0, 0)
},
{
'value': F2Value2,
'date': datetime.datetime(2019, 2, 7, 0, 0)
},
]
}
]
That is, I want each field.data value to become a key, with the value and date of every entry belonging to that field appended under it.
Note: I want to do this WITHOUT an extra for loop (apart from the loop that iterates through the list). I want to use dict and list methods like update() and append().
Any optimized solutions would be really helpful.
You could just iterate through the list of dicts and use defaultdict from collections to group the items under each unique key:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>>
>>> for items in x:
...     d[items['field']['data']].append({
...         'value': items['value'],
...         'date': items['date']
...     })
...
>>>
>>> import pprint
>>> pprint.pprint(x)
[{'date': datetime.datetime(2019, 3, 1, 0, 0),
'field': {'data': 'F1'},
'value': 'F1Value1'},
{'date': datetime.datetime(2019, 2, 5, 0, 0),
'field': {'data': 'F2'},
'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0),
'field': {'data': 'F2'},
'value': 'F2Value2'}]
>>>
>>> pprint.pprint(list(d.items()))
[('F1', [{'date': datetime.datetime(2019, 3, 1, 0, 0), 'value': 'F1Value1'}]),
('F2',
[{'date': datetime.datetime(2019, 2, 5, 0, 0), 'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0), 'value': 'F2Value2'}])]
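Since the question asks for dict methods such as update() and append(), dict.setdefault is another option: it returns the list already stored under a key, or inserts and returns the given empty list on first use, so one loop is enough. A sketch that also reshapes the result into the list of single-key dicts the question asks for; the quoted strings stand in for the question's F1Value1 and similar placeholders:

```python
import datetime

x = [
    {'field': {'data': 'F1'}, 'value': 'F1Value1', 'date': datetime.datetime(2019, 3, 1)},
    {'field': {'data': 'F2'}, 'value': 'F2Value1', 'date': datetime.datetime(2019, 2, 5)},
    {'field': {'data': 'F2'}, 'value': 'F2Value2', 'date': datetime.datetime(2019, 2, 7)},
]

grouped = {}
for item in x:
    # setdefault creates the list the first time a field appears, then reuses it
    grouped.setdefault(item['field']['data'], []).append(
        {'value': item['value'], 'date': item['date']})

# Reshape {'F1': [...], 'F2': [...]} into [{'F1': [...]}, {'F2': [...]}]
output = [{key: entries} for key, entries in grouped.items()]
print(output)
```

Dicts preserve insertion order (Python 3.7+), so fields appear in the order they are first seen, and unlike groupby this does not require entries for the same field to be adjacent.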
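Use itertools.groupby (it only groups consecutive elements, so this assumes entries for the same field are adjacent in data, as they are in the question):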
from itertools import groupby
from pprint import pprint
result = [{key: [{k: v for k, v in element.items() if k != 'field'}
for element in group]}
for key, group in groupby(data, lambda element: element['field']['data'])]
pprint(result)
Output:
[{'F1': [{'date': datetime.datetime(2019, 3, 1, 0, 0), 'value': 'F1Value1'}]},
{'F2': [{'date': datetime.datetime(2019, 2, 5, 0, 0), 'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0), 'value': 'F2Value2'}]}]
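Using only dict, list, and set comprehensions (note that this builds a single dict holding every field, wrapped in a list, and the key order follows the set, so it is not guaranteed):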
[
{
field_data :
[
{ k:v for k, v in thing.items() if k != 'field' }
for thing in things if thing['field']['data'] == field_data
]
for field_data in set(thing['field']['data'] for thing in things)
}
]

How to group datetime by hour

I am trying to group a list of datetime objects by hour and return the count for each hour.
My list contains many datetime objects. I tried using a loop to count how many of them have the same hour, but I could not find a way to get the count.
The other answers on Stack Overflow all store the dates as a column in pandas, which I do not want, because my datetimes are stored in a list.
I am hoping to get a list of hour_count objects that looks like this:
hour_count = [
{
"hour": datetime,
"count": 2
}
]
# code
from datetime import datetime
hours = [
    datetime(2019, 1, 25, 1),
    datetime(2019, 1, 25, 1),
    datetime(2019, 1, 25, 2),
    datetime(2019, 1, 25, 3),
    datetime(2019, 1, 25, 4),
    datetime(2019, 1, 25, 4)
]
existed = []
for hour in hours:
    if hour.hour not in existed:
        existed.append({
            "hour": hour.hour,
            "count": existed[hour.hour] + 1
        })
The simplest approach without pandas is to use collections.Counter:
from collections import Counter
counts = Counter(h.hour for h in hours)
print(counts)
#Counter({1: 2, 2: 1, 3: 1, 4: 2})
Now just reformat into your desired output using a list comprehension:
hour_count = [{"hour": h, "count": c} for h, c in counts.items()]
print(hour_count)
#[{'hour': 1, 'count': 2}, {'hour': 2, 'count': 1}, {'hour': 3, 'count': 1}, {'hour': 4, 'count': 2}]
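The question's desired output uses "hour": datetime, and h.hour alone merges the same hour on different days. A sketch, assuming you want one bucket per calendar hour, that truncates each datetime with datetime.replace before counting (the 01:30 and 2019-01-26 entries are hypothetical, added to show the difference):

```python
from collections import Counter
from datetime import datetime

hours = [
    datetime(2019, 1, 25, 1),
    datetime(2019, 1, 25, 1, 30),  # hypothetical: same hour, later minute
    datetime(2019, 1, 25, 2),
    datetime(2019, 1, 26, 1),      # hypothetical: same hour, different day
]

# Zero out minutes, seconds and microseconds so each value maps to the start of its hour
counts = Counter(h.replace(minute=0, second=0, microsecond=0) for h in hours)
hour_count = [{"hour": h, "count": c} for h, c in counts.items()]
for entry in hour_count:
    print(entry["hour"].isoformat(), entry["count"])
# 2019-01-25T01:00:00 2
# 2019-01-25T02:00:00 1
# 2019-01-26T01:00:00 1
```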
You can use pandas' DatetimeIndex to extract the hours from your list, and then numpy.unique to count each unique hour.
from pprint import pprint
import numpy as np
import pandas as pd
hours = pd.DatetimeIndex(hours).hour
unique_hours, counts = np.unique(hours, return_counts=True)
# convert numpy integers to plain ints so the output prints cleanly
hour_count = [{"hour": int(hour), "count": int(count)} for hour, count in zip(unique_hours, counts)]
pprint(hour_count)
Result
[{'count': 2, 'hour': 1},
{'count': 1, 'hour': 2},
{'count': 1, 'hour': 3},
{'count': 2, 'hour': 4}]

Grouping data on year

mydata = [{'date': datetime.datetime(2009, 1, 31, 0, 0), 'value': 14, 'year': u'2009'},
{'date': datetime.datetime(2009, 2, 28, 0, 0), 'value': 84, 'year': u'2009'},
{'date': datetime.datetime(2009, 3, 31, 0, 0), 'value': 77, 'year': u'2009'},
{'date': datetime.datetime(2009, 4, 30, 0, 0), 'value': 80, 'year': u'2009'},
{'date': datetime.datetime(2009, 5, 31, 0, 0), 'value': 6, 'year': u'2009'},
{'date': datetime.datetime(2009, 6, 30, 0, 0), 'value': 16, 'year': u'2009'},
{'date': datetime.datetime(2009, 7, 31, 0, 0), 'value': 16, 'year': u'2009'},
{'date': datetime.datetime(2009, 8, 31, 0, 0), 'value': 1, 'year': u'2009'},
{'date': datetime.datetime(2009, 9, 30, 0, 0), 'value': 9, 'year': u'2009'},
{'date': datetime.datetime(2008, 1, 31, 0, 0), 'value': 77, 'year': u'2008'},
{'date': datetime.datetime(2008, 2, 29, 0, 0), 'value': 60, 'year': u'2008'},
{'date': datetime.datetime(2008, 3, 31, 0, 0), 'value': 28, 'year': u'2008'},
{'date': datetime.datetime(2008, 4, 30, 0, 0), 'value': 9, 'year': u'2008'},
{'date': datetime.datetime(2008, 5, 31, 0, 0), 'value': 74, 'year': u'2008'},
{'date': datetime.datetime(2008, 6, 30, 0, 0), 'value': 70, 'year': u'2008'},
{'date': datetime.datetime(2008, 7, 31, 0, 0), 'value': 75, 'year': u'2008'},
{'date': datetime.datetime(2008, 8, 31, 0, 0), 'value': 7, 'year': u'2008'},
{'date': datetime.datetime(2008, 9, 30, 0, 0), 'value': 10, 'year': u'2008'},
{'date': datetime.datetime(2008, 10, 31, 0, 0), 'value': 54, 'year': u'2008'},
{'date': datetime.datetime(2008, 11, 30, 0, 0), 'value': 55, 'year': u'2008'},
{'date': datetime.datetime(2008, 12, 31, 0, 0), 'value': 40, 'year': u'2008'},
{'date': datetime.datetime(2007, 12, 31, 0, 0), 'value': 93, 'year': u'2007'},]
In mydata, I have a list of sequential monthly data. I wrote some code to group it by year:
partial_req_data = dict([(k,[f for f in v]) for k,v in itertools.groupby(mydata, key=lambda x : x.get('year'))])
Now I need efficient code to fill the missing months with {}, i.e. an empty dict. There are clumsy ways to do that, but I am looking for good ones.
required_data = {"2009": [{'date': datetime.datetime(2009, 1, 31, 0, 0), 'value': 14, 'year': u'2009' },
{'date': datetime.datetime(2009, 2, 28, 0, 0), 'value': 84, 'year': u'2009'},
{'date': datetime.datetime(2009, 3, 31, 0, 0), 'value': 77, 'year': u'2009'},
{'date': datetime.datetime(2009, 4, 30, 0, 0), 'value': 80, 'year': u'2009'},
{'date': datetime.datetime(2009, 5, 31, 0, 0), 'value': 6, 'year': u'2009'},
{'date': datetime.datetime(2009, 6, 30, 0, 0), 'value': 16, 'year': u'2009'},
{'date': datetime.datetime(2009, 7, 31, 0, 0), 'value': 16, 'year': u'2009'},
{'date': datetime.datetime(2009, 8, 31, 0, 0), 'value': 1, 'year': u'2009'},
{'date': datetime.datetime(2009, 9, 30, 0, 0), 'value': 9, 'year': u'2009'},
{}, {}, {}],
"2008": [{'date': datetime.datetime(2008, 1, 31, 0, 0), 'value': 77, 'year': u'2008'},
{'date': datetime.datetime(2008, 2, 29, 0, 0), 'value': 60, 'year': u'2008'},
{'date': datetime.datetime(2008, 3, 31, 0, 0), 'value': 28, 'year': u'2008'},
{'date': datetime.datetime(2008, 4, 30, 0, 0), 'value': 9, 'year': u'2008'},
{'date': datetime.datetime(2008, 5, 31, 0, 0), 'value': 74, 'year': u'2008'},
{'date': datetime.datetime(2008, 6, 30, 0, 0), 'value': 70, 'year': u'2008'},
{'date': datetime.datetime(2008, 7, 31, 0, 0), 'value': 75, 'year': u'2008'},
{'date': datetime.datetime(2008, 8, 31, 0, 0), 'value': 7, 'year': u'2008'},
{'date': datetime.datetime(2008, 9, 30, 0, 0), 'value': 10, 'year': u'2008'},
{'date': datetime.datetime(2008, 10, 31, 0, 0), 'value': 54, 'year': u'2008'},
{'date': datetime.datetime(2008, 11, 30, 0, 0), 'value': 55, 'year': u'2008'},
{'date': datetime.datetime(2008, 12, 31, 0, 0), 'value': 40, 'year': u'2008'}],
"2007": [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
{'date': datetime.datetime(2007, 12, 31, 0, 0), 'value': 93, 'year': u'2007'}]
}
import datetime
from itertools import groupby
from pprint import pprint
required_data={}
for k, g in groupby(mydata, key=lambda x: x.get('year')):
    partial = {}
    for datum in g:
        partial[datum.get('date').month] = datum
    required_data[k] = [partial.get(m, {}) for m in range(1, 13)]
pprint(required_data)
For each year k, partial is a dict whose keys are months.
The trick is to use partial.get(m,{}) since this will return the datum when it exists, or {} when it does not.
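The groupby version relies on mydata already being ordered by year. If that is not guaranteed, a sketch that preallocates twelve empty dicts per year and drops each record into its month slot works on input in any order (a shortened, reordered mydata is used for illustration):

```python
import datetime
from collections import defaultdict

mydata = [
    {'date': datetime.datetime(2009, 1, 31, 0, 0), 'value': 14, 'year': u'2009'},
    {'date': datetime.datetime(2007, 12, 31, 0, 0), 'value': 93, 'year': u'2007'},
    {'date': datetime.datetime(2009, 3, 31, 0, 0), 'value': 77, 'year': u'2009'},
]

# Each new year starts as a list of 12 separate empty dicts, one per month
required_data = defaultdict(lambda: [{} for _ in range(12)])
for datum in mydata:
    # months are 1-based, list indices are 0-based
    required_data[datum['year']][datum['date'].month - 1] = datum

required_data = dict(required_data)  # plain dict for printing or serialising
print(required_data['2009'][:3])
```

The comprehension `[{} for _ in range(12)]` is used instead of `[{}] * 12` so that each month gets its own dict object; the multiplied form would make all twelve slots refer to the same dict.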
