I have two lists of dictionaries named category and sub_category.
category = [{'cat_id':1,'total':300,'from':250},{'cat_id':2,'total':100,'from':150}]
sub_category = [{'id':1,'cat_id':1,'charge':30},{'id':2,'cat_id':1,'charge':20},{'id':3,'cat_id':2,'charge':30}]
I want to change the value of charge to 0 in sub_category whenever the matching category entry (the one with the same cat_id) has total >= from.
The expected result is:
sub_category = [{'id':1,'cat_id':1,'charge':0},{'id':2,'cat_id':1,'charge':0},{'id':3,'cat_id':2,'charge':30}]
I managed to get the result by using this:
for sub in sub_category:
    for cat in category:
        if cat['cat_id'] == sub['cat_id']:
            if cat['total'] >= cat['from']:
                sub['charge'] = 0
But I would like to know if there is a better way of doing this. Any help would be highly appreciated.
This is one approach: change category to a dict for easy lookup.
Ex:
category = [{'cat_id':1,'total':300,'from':250},{'cat_id':2,'total':100,'from':150}]
sub_category = [{'id':1,'cat_id':1,'charge':30},{'id':2,'cat_id':1,'charge':20},{'id':3,'cat_id':2,'charge':30}]
category = {i.pop('cat_id'): i for i in category}
for i in sub_category:
    if i['cat_id'] in category:
        if category[i['cat_id']]['total'] >= category[i['cat_id']]['from']:
            i['charge'] = 0
print(sub_category)
Output:
[{'cat_id': 1, 'charge': 0, 'id': 1},
{'cat_id': 1, 'charge': 0, 'id': 2},
{'cat_id': 2, 'charge': 30, 'id': 3}]
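Note that i.pop('cat_id') removes the cat_id key from the original category dicts as a side effect. If you need category unchanged, here is a minimal non-destructive sketch of the same lookup idea, starting again from the original lists:
# Build the lookup without mutating the original category dicts
lookup = {c['cat_id']: c for c in category}
for sub in sub_category:
    cat = lookup.get(sub['cat_id'])
    if cat is not None and cat['total'] >= cat['from']:
        sub['charge'] = 0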
Try this:
I think the way I did it may not be suitable in some cases, but I like to use list comprehensions, so just have a look.
category = [{'cat_id':1,'total':300,'from':250},{'cat_id':2,'total':100,'from':150}]
sub_category = [{'id':1,'cat_id':1,'charge':30},{'id':2,'cat_id':1,'charge':20},{'id':3,'cat_id':2,'charge':30}]
print([sub_cat if cat['cat_id'] == sub_cat['cat_id'] and cat['total'] >= cat['from'] and not sub_cat.__setitem__('charge', 0) else sub_cat for sub_cat in sub_category for cat in category])
Result: [{'id': 1, 'cat_id': 1, 'charge': 0}, {'id': 1, 'cat_id': 1, 'charge': 0}, {'id': 2, 'cat_id': 1, 'charge': 0}, {'id': 2, 'cat_id': 1, 'charge': 0}, {'id': 3, 'cat_id': 2, 'charge': 30}, {'id': 3, 'cat_id': 2, 'charge': 30}]
Each sub-category appears once per category because of the nested loops, but sub_category itself is updated in place as desired.
You can solve your problem using this approach:
target_categories = set([elem.get('cat_id') for elem in category if elem.get('total', 0) >= elem.get('from', 0)])
if None in target_categories:
    target_categories.remove(None)  # if there's no cat_id in one of the categories we will get None in target_categories. Remove it.
for elem in sub_category:
    if elem.get('cat_id') in target_categories:
        elem.update({'charge': 0})
Time comparison with another approach:
import numpy as np
size = 5000000
np.random.seed()
cat_ids = np.random.randint(50, size=(size,))
totals = np.random.randint(500, size=(size,))
froms = np.random.randint(500, size=(size,))
category = [{'cat_id': cat_id, 'total': total, 'from': from_} for cat_id, total, from_ in zip(cat_ids, totals, froms)]
sub_category = [{'id': 1, 'cat_id': np.random.randint(50), 'charge': np.random.randint(100)} for i in range(size)]
%%time
target_categories = set([elem.get('cat_id') for elem in category if elem.get('total', 0) >= elem.get('from', 0)])
if None in target_categories:
    target_categories.remove(None)  # if there's no cat_id in one of the categories we will get None in target_categories. Remove it.
for elem in sub_category:
    if elem.get('cat_id') in target_categories:
        elem.update({'charge': 0})
# Wall time: 3.47 s
%%time
category = {i.pop('cat_id'): i for i in category}
for i in sub_category:
    if i['cat_id'] in category:
        if category[i['cat_id']]['total'] >= category[i['cat_id']]['from']:
            i['charge'] = 0
# Wall time: 5.73 s
Solution:
# Input
category = [{'cat_id':1,'total':300,'from':250},{'cat_id':2,'total':100,'from':150}]
sub_category = [{'id':1,'cat_id':1,'charge':30},{'id':2,'cat_id':1,'charge':20},{'id':3,'cat_id':2,'charge':30}]
# Main code
for k in sub_category:
    # note: the inner list comprehension is re-evaluated on every iteration; see the hoisted-set sketch after the output
    if k["cat_id"] in [i["cat_id"] for i in category if i["total"] >= i["from"]]:
        k["charge"] = 0
print (sub_category)
# Output
[{'id': 1, 'cat_id': 1, 'charge': 0}, {'id': 2, 'cat_id': 1, 'charge': 0}, {'id': 3, 'cat_id': 2, 'charge': 30}]
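If category is large, the membership test can be computed once instead of on every iteration; a minimal sketch of that variant (the set name is only illustrative):
# Build the set of cat_ids whose charge should be zeroed, once
waived = {i["cat_id"] for i in category if i["total"] >= i["from"]}
for k in sub_category:
    if k["cat_id"] in waived:
        k["charge"] = 0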
I have several lists of dictionaries, where each dictionary contains a unique id value that is common among all lists. I'd like to combine them into a single list of dicts, where each dict is joined on that id value.
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
I tried doing something like the answer found at https://stackoverflow.com/a/42018660/7564393, but I'm getting very confused since I have more than 2 lists. Should I try using a defaultdict approach? More importantly, I am NOT always going to know the other values, only that the id value is present in all dicts.
You can use itertools.groupby():
from itertools import groupby
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = []
for _, values in groupby(sorted([*list1, *list2, *list3], key=lambda x: x['id']), key=lambda x: x['id']):
    temp = {}
    for d in values:
        temp.update(d)
    desired_output.append(temp)
Result:
[{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
# combine all lists
d = {}  # id -> dict
for l in [list1, list2, list3]:
    for list_d in l:
        if 'id' not in list_d:
            continue
        id = list_d['id']
        if id not in d:
            d[id] = list_d
        else:
            d[id].update(list_d)
# dicts with same id are grouped together since id is used as key
res = [v for v in d.values()]
print(res)
You can first build a dict of dicts, then turn it into a list:
from itertools import chain
from collections import defaultdict
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
dict_out = defaultdict(dict)
for d in chain(list1, list2, list3):
    dict_out[d['id']].update(d)
out = list(dict_out.values())
print(out)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
itertools.chain lets you iterate over all the dicts contained in the three lists. We build a dict dict_out with the id as key and the merged dict being built as value; this way, we can easily update the already-built part with the small dict from the current iteration.
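Since the question mentions several lists, the same idea extends directly to any number of them; a minimal sketch, where merge_on_id is just an illustrative helper name:
from itertools import chain
from collections import defaultdict

def merge_on_id(*lists):
    # Merge any number of lists of dicts that share an 'id' key
    merged = defaultdict(dict)
    for d in chain(*lists):
        merged[d['id']].update(d)
    return list(merged.values())

print(merge_on_id(list1, list2, list3))
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]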
Here is a functional approach without itertools (which is excellent for rapid development work).
This solution works for any number of lists, since the function takes a variable number of arguments, and it also lets you choose the return type (list or dict).
By default it returns a list, as you want; if you pass as_list=False it returns a dictionary instead.
I used a dictionary internally because lookups are fast.
Just have a look at the get_packed_list() function below.
get_packed_list()
def get_packed_list(*dicts_lists, as_list=True):
    output = {}
    for dicts_list in dicts_lists:
        for dictionary in dicts_list:
            _id = dictionary.pop("id")  # id() is a built-in function, so _id is preferred
            if _id not in output:
                # Create a new entry for this id
                output[_id] = {"id": _id}
            for key in dictionary:
                output[_id][key] = dictionary[key]
            dictionary["id"] = _id  # push the 'id' back, since the input dicts are modified in place
    if as_list:
        return [output[key] for key in output]
    return output  # dictionary
Test
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
output = get_packed_list(list1, list2, list3)
print(output)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
output = get_packed_list(list1, list2, list3, as_list=False)
print(output)
# {1: {'id': 1, 'value': 20, 'sum': 10, 'total': 30}, 2: {'id': 2, 'value': 21, 'sum': 11, 'total': 32}}
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
print(list1+list2+list3)
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
result = []
for i in range(0, len(list1)):
    # assumes all three lists are the same length and aligned by position
    final_dict = dict(list(list1[i].items()) + list(list2[i].items()) + list(list3[i].items()))
    result.append(final_dict)
print(result)
output: [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
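If the lists really are aligned by position like this, the same idea can be written more compactly with zip and dict unpacking; a minimal sketch (Python 3.5+):
result = [{**a, **b, **c} for a, b, c in zip(list1, list2, list3)]
print(result)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]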
So I have a list of dicts that looks like this:
[{
    'field': {
        'data': 'F1'
    },
    'value': F1Value1,
    'date': datetime.datetime(2019, 3, 1, 0, 0)
}, {
    'field': {
        'data': 'F2'
    },
    'value': F2Value1,
    'date': datetime.datetime(2019, 2, 5, 0, 0)
}, {
    'field': {
        'data': 'F2'
    },
    'value': F2Value2,
    'date': datetime.datetime(2019, 2, 7, 0, 0)
}]
And I want an output that looks like this:
[
    {
        'F1': [
            {
                'value': F1Value1,
                'date': datetime.datetime(2019, 3, 1, 0, 0)
            }
        ]
    },
    {
        'F2': [
            {
                'value': F2Value1,
                'date': datetime.datetime(2019, 2, 5, 0, 0)
            },
            {
                'value': F2Value2,
                'date': datetime.datetime(2019, 2, 7, 0, 0)
            },
        ]
    }
]
That is, I want each field.data value to become a key, with the value and date appended to its list when they belong to the same field.
Note: I want to do this WITHOUT using a for loop (apart from the loop to iterate through the list). I want to use Python dict methods like update() and append() etc.
Any optimized solutions would be really helpful.
You could just iterate through the list of dicts (here x is your input list) and use defaultdict from collections to group the items by a unique key:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>>
>>> for items in x:
... d[items['field']['data']].append({
... 'value': items['value'],
... 'date': items['date']
... })
...
>>>
>>> import pprint
>>> pprint.pprint(x)
[{'date': datetime.datetime(2019, 3, 1, 0, 0),
'field': {'data': 'F1'},
'value': 'F1Value1'},
{'date': datetime.datetime(2019, 2, 5, 0, 0),
'field': {'data': 'F2'},
'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0),
'field': {'data': 'F2'},
'value': 'F2Value2'}]
>>>
>>> pprint.pprint(list(d.items()))
[('F1', [{'date': datetime.datetime(2019, 3, 1, 0, 0), 'value': 'F1Value1'}]),
('F2',
[{'date': datetime.datetime(2019, 2, 5, 0, 0), 'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0), 'value': 'F2Value2'}])]
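If you need exactly the shape shown in the question (a list of single-key dicts rather than a defaultdict), a small follow-up sketch in the same session:
>>> result = [{key: values} for key, values in d.items()]
>>> # result is now [{'F1': [...]}, {'F2': [...]}], with the value/date dicts grouped per field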
Use itertools.groupby (it groups consecutive elements, so sort the list first if equal field values may not be adjacent; see the sketch after the output):
from itertools import groupby
from pprint import pprint
result = [{key: [{k: v for k, v in element.items() if k != 'field'}
                 for element in group]}
          for key, group in groupby(data, lambda element: element['field']['data'])]
pprint(result)
Output:
[{'F1': [{'date': datetime.datetime(2019, 3, 1, 0, 0), 'value': 'F1Value1'}]},
{'F2': [{'date': datetime.datetime(2019, 2, 5, 0, 0), 'value': 'F2Value1'},
{'date': datetime.datetime(2019, 2, 7, 0, 0), 'value': 'F2Value2'}]}]
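A minimal sketch of the sorted variant, assuming the input list is named data as above:
keyfunc = lambda element: element['field']['data']
result = [{key: [{k: v for k, v in element.items() if k != 'field'}
                 for element in group]}
          for key, group in groupby(sorted(data, key=keyfunc), keyfunc)]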
Only using dict, list, and set comprehensions (assuming the input list is bound to the name things):
[
    {
        field_data: [
            {k: v for k, v in thing.items() if k != 'field'}
            for thing in things if thing['field']['data'] == field_data
        ]
        for field_data in set(thing['field']['data'] for thing in things)
    }
]
I have a list of dictionaries:
AccountValues = [
{'portfolio_ref': 1, 'tag': 'FullInit', 'value': '20642.95', 'currency': 'USD', 'percent': 0.0},
{'portfolio_ref': 1, 'tag': 'FullMaint', 'value': '21350.54', 'currency': 'USD', 'percent': 0.0},
{'portfolio_ref': 1, 'tag': 'NetLiq', 'value': '70976.05', 'currency': 'USD', 'percent': 100.0} ]
The task, described in SQL terms: ORDER BY portfolio_ref ASC, percent DESC.
What I tried unsuccessfully:
sorted(AccountValues, key=lambda x: (x[1], -x[4]))
which gives me
KeyError: 1
Second attempt:
from operator import itemgetter
result = sorted(AccountValues, key=itemgetter('percent'))
which fails to sort on percentage.
You can use dict.__getitem__ or its syntactic sugar []:
res = sorted(AccountValues, key=lambda x: (x['portfolio_ref'], -x['percent']))
Remember that dictionaries are not indexable by integers. Historically (pre-3.6), they are not even ordered. Even in Python 3.7, you cannot directly extract the nth key or value.
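A small illustration of that point, using the first entry of your list:
d = AccountValues[0]
d['portfolio_ref']   # 1 -- access is by key, not by position
# d[1]               # would raise KeyError: 1, exactly like your first attempt
list(d.keys())[0]    # 'portfolio_ref' -- explicit conversion is needed to reach the "nth" key (3.7+ insertion order)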
Result:
print(res)
[{'portfolio_ref': 1, 'tag': 'NetLiq', 'value': '70976.05', 'currency': 'USD', 'percent': 100.0},
{'portfolio_ref': 1, 'tag': 'FullInit', 'value': '20642.95', 'currency': 'USD', 'percent': 0.0},
{'portfolio_ref': 1, 'tag': 'FullMaint', 'value': '21350.54', 'currency': 'USD', 'percent': 0.0}]
You just have to combine the things you did correctly: a tuple of sort keys and the proper way of referencing a dict entry:
>>> sorted(AccountValues, key=lambda x: (x["portfolio_ref"], -x["percent"]))
[{'tag': 'NetLiq', 'portfolio_ref': 1, 'value': '70976.05', 'percent': 100.0, 'currency': 'USD'},
{'tag': 'FullInit', 'portfolio_ref': 1, 'value': '20642.95', 'percent': 0.0, 'currency': 'USD'},
{'tag': 'FullMaint', 'portfolio_ref': 1, 'value': '21350.54', 'percent': 0.0, 'currency': 'USD'}]
Better yet, use itemgetter for the key function, keeping in mind that it cannot negate a value, so it sorts both keys ascending:
sorted(AccountValues, key=itemgetter("portfolio_ref", "percent"))
For percent descending, keep the negated lambda above or use a two-pass stable sort (see the sketch after this answer).
Your first attempt failed because x[1] and x[4] are not valid references into the dictionaries: you have to use the labels you originally gave, not relative positions.
Your second attempt is deficient because it sorts only on percent, and ascending at that: it is missing the portfolio_ref key and the descending direction.
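A minimal sketch of the two-pass variant, relying on Python's stable sort:
from operator import itemgetter

# Sort by the secondary key first (percent, descending), then by the primary key
# (portfolio_ref, ascending); the stable sort preserves the percent order within
# each portfolio_ref.
res = sorted(AccountValues, key=itemgetter('percent'), reverse=True)
res = sorted(res, key=itemgetter('portfolio_ref'))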
I have trouble adding up one value of a dictionary when conditions are met. For example, I have this list of dictionaries:
[{'plu': 1, 'price': 150, 'quantity': 2, 'stock': 5},
{'plu': 2, 'price': 150, 'quantity': 7, 'stock': 10},
{'plu': 1, 'price': 150, 'quantity': 6, 'stock': 5},
{'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
{'plu': 2, 'price': 150, 'quantity': 3, 'stock': 10}
]
Then output should look like this:
[{'plu': 1, 'price': 150, 'quantity': 8, 'stock': 5},
{'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
{'plu': 2, 'price': 150, 'quantity': 10, 'stock': 10}
]
Quantities should be summed only if plu and price are the same; other key/value pairs (e.g. stock) should be ignored. What is the most efficient way to do that?
Edit:
I tried:
import itertools as it
keyfunc = lambda x: x['plu']
groups = it.groupby(sorted(new_data, key=keyfunc), keyfunc)
x = [{'plu': k, 'quantity': sum(x['quantity'] for x in g)} for k, g in groups]
But it groups only on plu, and then I only get the quantity value when building an HTML table in Django; the other columns are empty.
You need to sort/group by the combined key, not just one key. The easiest/most efficient way to build that key is with operator.itemgetter. To preserve an arbitrary stock value, you'll need to use the group twice, so you'll have to convert it to a sequence:
import itertools as it
from operator import itemgetter

keyfunc = itemgetter('plu', 'price')
# Unpack key and listify g so it can be reused
groups = ((plu, price, list(g))
          for (plu, price), g in it.groupby(sorted(new_data, key=keyfunc), keyfunc))
x = [{'plu': plu, 'price': price, 'stock': g[0]['stock'],
      'quantity': sum(x['quantity'] for x in g)}
     for plu, price, g in groups]
Alternatively, if stock is guaranteed to be the same for each unique plu/price pair, you can include it in the key to simplify matters, so you don't need to listify the groups:
keyfunc = itemgetter('plu', 'price', 'stock')
groups = it.groupby(sorted(new_data, key=keyfunc), keyfunc)
x = [{'plu': plu, 'price': price, 'stock': stock,
      'quantity': sum(x['quantity'] for x in g)}
     for (plu, price, stock), g in groups]
Optionally, you could create getquantity = itemgetter('quantity') at top level (like the keyfunc) and change sum(x['quantity'] for x in g) to sum(map(getquantity, g)) which pushes work to the C layer in CPython, and can be faster if your groups are large.
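A minimal sketch of that variant, under the same setup as the first snippet:
import itertools as it
from operator import itemgetter

getquantity = itemgetter('quantity')  # defined once at top level, like keyfunc
keyfunc = itemgetter('plu', 'price')

groups = ((plu, price, list(g))
          for (plu, price), g in it.groupby(sorted(new_data, key=keyfunc), keyfunc))
x = [{'plu': plu, 'price': price, 'stock': g[0]['stock'],
      'quantity': sum(map(getquantity, g))}  # map pushes the per-item lookups to the C layer
     for plu, price, g in groups]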
The other approach is to avoid sorting entirely using collections.Counter (or collections.defaultdict(int), though Counter makes the intent more clear here):
from collections import Counter
grouped = Counter()
for plu, price, stock, quantity in map(itemgetter('plu', 'price', 'stock', 'quantity'), new_data):
    grouped[plu, price, stock] += quantity
then convert back to your preferred form with:
x = [{'plu': plu, 'price': price, 'stock': stock, 'quantity': quantity}
     for (plu, price, stock), quantity in grouped.items()]
This should be faster for large inputs, since it replaces O(n log n) sorting work with O(n) dict operations (which are roughly O(1) cost).
Using pandas will make this a trivial problem:
import pandas as pd
data = [{'plu': 1, 'price': 150, 'quantity': 2, 'stock': 5},
        {'plu': 2, 'price': 150, 'quantity': 7, 'stock': 10},
        {'plu': 1, 'price': 150, 'quantity': 6, 'stock': 5},
        {'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
        {'plu': 2, 'price': 150, 'quantity': 3, 'stock': 10}]
df = pd.DataFrame.from_records(data)
# df
#
#    plu  price  quantity  stock
# 0    1    150         2      5
# 1    2    150         7     10
# 2    1    150         6      5
# 3    1    200         4      5
# 4    2    150         3     10
new_df = df.groupby(['plu','price','stock'], as_index=False).sum()
new_df = new_df[['plu','price','quantity','stock']] # Optional: reorder the columns
# new_df
#
#    plu  price  quantity  stock
# 0    1    150         8      5
# 1    1    200         4      5
# 2    2    150        10     10
And finally, if you want to, port it back to a list of dicts (though I would argue pandas gives you a lot more functionality for handling the data):
new_data = new_df.to_dict(orient='records')
# new_data
#
# [{'plu': 1, 'price': 150, 'quantity': 8, 'stock': 5},
# {'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
# {'plu': 2, 'price': 150, 'quantity': 10, 'stock': 10}]