I have a set of data in the list of dict format like below:
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
I'm trying to group by 'name' and sum 'tea' and 'coffee' separately.
The final grouped data must be in this format:
grouped_data = [
{'name': 'A', 'tea':7, 'coffee':9},
{'name': 'B', 'tea':16, 'coffee':5},
]
I tried some steps:
from collections import Counter

c = Counter()
for v in data:
    c[v['name']] += v['tea']

my_data = [{'name': name, 'tea': tea} for name, tea in c.items()]
for e in my_data:
    print(e)
The above steps returned the following output:
{'name': 'A', 'tea': 7}
{'name': 'B', 'tea': 16}
I can only sum the key 'tea'; I'm not able to get the sum for the key 'coffee' as well. Can you please help me get to the grouped_data format?
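For reference, a minimal pure-Python sketch that extends the Counter attempt above to sum both keys at once (the defaultdict-of-Counter layout and the variable names are just one possible choice):

from collections import Counter, defaultdict

totals = defaultdict(Counter)          # name -> running totals of each drink
for row in data:
    totals[row['name']]['tea'] += row['tea']
    totals[row['name']]['coffee'] += row['coffee']

grouped_data = [{'name': name, **counts} for name, counts in totals.items()]
print(grouped_data)
# [{'name': 'A', 'tea': 7, 'coffee': 9}, {'name': 'B', 'tea': 16, 'coffee': 5}]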
Using pandas:
df = pd.DataFrame(data)
df
coffee name tea
0 6 A 5
1 3 A 2
2 1 B 7
3 4 B 9
g = df.groupby('name', as_index=False).sum()
g
name coffee tea
0 A 9 7
1 B 5 16
And the final step, g.to_dict:
d = g.to_dict('records')
d
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
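For reference, the same steps can be chained into one expression (a sketch using the same pandas calls as above; key order in the dicts may vary by pandas version):

grouped_data = (pd.DataFrame(data)
                  .groupby('name', as_index=False)
                  .sum()
                  .to_dict('records'))
print(grouped_data)
# e.g. [{'name': 'A', 'tea': 7, 'coffee': 9}, {'name': 'B', 'tea': 16, 'coffee': 5}]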
You can try this:
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
import itertools
final_data = [(a, list(b)) for a, b in itertools.groupby([i.items() for i in data], key=lambda x:dict(x)["name"])]
new_final_data = [{i[0][0]:sum(c[-1] for c in i if isinstance(c[-1], int)) if i[0][0] != "name" else i[0][-1] for i in zip(*b)} for a, b in final_data]
Output:
[{'tea': 7, 'coffee': 9, 'name': 'A'}, {'tea': 16, 'coffee': 5, 'name': 'B'}]
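Note that itertools.groupby only groups consecutive items, so this relies on the data already being sorted by 'name'. A more readable sketch that sorts first (group_and_sum is just an illustrative helper name):

import itertools

def group_and_sum(records):
    # groupby only merges consecutive items, so sort by name first
    records = sorted(records, key=lambda d: d['name'])
    out = []
    for name, group in itertools.groupby(records, key=lambda d: d['name']):
        group = list(group)  # materialize so we can sum it twice
        out.append({'name': name,
                    'tea': sum(d['tea'] for d in group),
                    'coffee': sum(d['coffee'] for d in group)})
    return out

print(group_and_sum(data))
# [{'name': 'A', 'tea': 7, 'coffee': 9}, {'name': 'B', 'tea': 16, 'coffee': 5}]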
Using pandas, this is pretty easy to do:
import pandas as pd
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
df = pd.DataFrame(data)
df.groupby(['name']).sum()
coffee tea
name
A 9 7
B 5 16
Here's one way to get it into your dict format:
gb = df.groupby(['name']).sum()

grouped_data = []
for idx in gb.index:
    d = {'name': idx}
    d = {**d, **{col: gb.loc[idx, col] for col in gb}}
    grouped_data.append(d)
grouped_data
Out[15]: [{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
But COLDSPEED got the native pandas solution with the as_index=False config...
import pandas as pd
df = pd.DataFrame(data)
df2 = df.groupby('name').sum().reset_index()
df2.to_dict('records')
Here is a method I created; you can pass in the key you want to group by:
def group_sum(key, list_of_dicts):
    d = {}
    for dct in list_of_dicts:
        if dct[key] not in d:
            d[dct[key]] = {}
        for k, v in dct.items():
            if k != key:
                if k not in d[dct[key]]:
                    d[dct[key]][k] = v
                else:
                    d[dct[key]][k] += v
    final_list = []
    for k, v in d.items():
        temp_d = {key: k}
        for k2, v2 in v.items():
            temp_d[k2] = v2
        final_list.append(temp_d)
    return final_list
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
grouped_data = group_sum("name",data)
print (grouped_data)
result:
[{'coffee': 5, 'name': 'B', 'tea': 16}, {'coffee': 9, 'name': 'A', 'tea': 7}]
This is probably slower than pandas when summing thousands of dicts, though I haven't measured it. It also doesn't maintain insertion order unless you use OrderedDict or Python 3.7+, where plain dicts preserve insertion order.
Related
Given the following dictionary:
dict1 = {'AA':['THISISSCARY'],
'BB':['AREYOUAFRAID'],
'CC':['DONOTWORRY']}
I'd like to update the values in the dictionary given the information in the following table
Table = pd.DataFrame({'KEY':['AA','AA','BB','CC'],
'POSITION':[2,4,9,3],
'oldval':['I','I','A','O'],
'newval':['X','X','U','I']})
that looks like this
KEY POSITION oldval newval
0 AA 2 I X
1 AA 4 I X
2 BB 9 A U
3 CC 3 O I
The end result should look like this:
dict1 = {'AA':['THXSXSSCARY'],
'BB':['AREYOUAFRUID'],
'CC':['DONITWORRY']}
Essentially, I'm using KEY and POSITION to find the location of the character in the dictionary value; then, if oldval matches the character in the dictionary, I replace it with newval.
I've been looking at the update function, where I'd convert my table to a dictionary, but I'm unsure how to apply it to my example.
First craft a nested Series/dictionary to map the key/position/newval, then use a dictionary comprehension:
s = (Table.groupby('KEY')
.apply(lambda d: d.set_index('POSITION')['newval'].to_dict())
)
out = {k: [''.join(s.get(k, {}).get(i, x) for i,x in enumerate(v[0]))]
for k,v in dict1.items()
}
Output:
{'AA': ['THXSXSSCARY'],
'BB': ['AREYOUAFRUID'],
'CC': ['DONITWORRY']}
Intermediate s:
KEY
AA {2: 'X', 4: 'X'}
BB {9: 'U'}
CC {3: 'I'}
dtype: object
You can use:
dict_df = Table.to_dict('records')
print(dict_df)
'''
[{'KEY': 'AA', 'POSITION': 2, 'oldval': 'I', 'newval': 'X'}, {'KEY': 'AA', 'POSITION': 4, 'oldval': 'I', 'newval': 'X'}, {'KEY': 'BB', 'POSITION': 9, 'oldval': 'A', 'newval': 'U'}, {'KEY': 'CC', 'POSITION': 3, 'oldval': 'O', 'newval': 'I'}]
'''

for i in list(dict1.keys()):
    for j in dict_df:
        if i == j['KEY']:
            mask = list(dict1[i][0])
            mask[j['POSITION']] = j['newval']
            dict1[i] = ["".join(mask)]

print(dict1)
# {'AA': ['THXSXSSCARY'], 'BB': ['AREYOUAFRUID'], 'CC': ['DONITWORRY']}
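Neither answer checks oldval before replacing. If that check matters, a small variation over the same to_dict('records') output could look like this (a sketch, starting from the original dict1; records is just a local name):

records = Table.to_dict('records')
for row in records:
    chars = list(dict1[row['KEY']][0])
    # only replace when the current character actually matches oldval
    if chars[row['POSITION']] == row['oldval']:
        chars[row['POSITION']] = row['newval']
        dict1[row['KEY']] = ["".join(chars)]
print(dict1)
# {'AA': ['THXSXSSCARY'], 'BB': ['AREYOUAFRUID'], 'CC': ['DONITWORRY']}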
{'BLOCKER': 'F', 'CRITICAL': 'E', 'MAJOR': 'D', 'MINOR': 'B', 'NO RISK': 'A'}
This is the dictionary, and it is stored inside a column called severity.
The dataframe I want is:
A B C D E F
NO RISK MINOR NA MAJOR CRITICAL BLOCKER
A straightforward solution:
input_map = {'BLOCKER': 'F', 'CRITICAL': 'E', 'MAJOR': 'D', 'MINOR': 'B', 'NO RISK': 'A'}
inv_map = {v: k for k, v in input_map.items()} # {'F': 'BLOCKER', 'E': 'CRITICAL', 'D': 'MAJOR', 'B': 'MINOR', 'A': 'NO RISK'}
pd.DataFrame({k: [inv_map.get(k, 'NA')] for k in 'ABCDEF'})
A B C D E F
0 NO RISK MINOR NA MAJOR CRITICAL BLOCKER
import pandas as pd
mdi = {"BLOCKER": ["F"], "CRITICAL": ["E"], "MAJOR": ["D"], "MINOR": ["B"], "NO RISK": ["A"]}
pd.DataFrame.from_records({v[0]: [k] for k, v in mdi.items()})
Output:
A B D E F
0 NO RISK MINOR MAJOR CRITICAL BLOCKER
You'll need to apply this yourself to your column in the source data.
If you have a Series of dictionaries as input, use:
import numpy as np

df2 = pd.json_normalize(df['severity'])
cols = df2.groupby(np.zeros(len(df2))).first().squeeze()
df2 = (df2
.notna().mul(cols.index)
.set_axis(cols, axis=1)
.rename_axis(columns=None)
)
Output:
F E D B A
0 BLOCKER CRITICAL MAJOR MINOR NO RISK
Used input:
dic = {'BLOCKER': 'F', 'CRITICAL': 'E', 'MAJOR': 'D', 'MINOR': 'B', 'NO RISK': 'A'}
df = pd.DataFrame({'severity': [dic]})
Example of output on an input with several rows:
F E D B A O
0 BLOCKER CRITICAL MAJOR MINOR NO RISK
1 BLOCKER CRITICAL MINOR NO RISK OTHER
This should do it (edited to swap keys/values):
def swap_dict(d):
    return pd.Series({value: key for key, value in d.items()})

df.severity.apply(swap_dict)
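If you also want the missing 'C' column filled with 'NA', as in the layout asked for, one option is to reindex the result. A sketch, assuming the df built in the "Used input" above:

out = (df.severity.apply(swap_dict)        # one row per dict, columns = severity codes
         .reindex(columns=list('ABCDEF'))  # force the full A..F column set
         .fillna('NA'))
print(out)
#          A      B   C      D         E        F
# 0  NO RISK  MINOR  NA  MAJOR  CRITICAL  BLOCKER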
What I have:
a=[{'name':'a','vals':1,'required':'yes'},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
What I tried:
[[i]+[j] for i in b for j in a if i['name']==j['name']]
What I got:
[[{'name': 'a', 'type': 'car'}, {'name': 'a', 'vals': 1}], [{'name': 'b', 'type': 'bike'}, {'name': 'b', 'vals': 2}]]
What I want:
[{'name': 'a', 'type': 'car','vals': 1},{'name': 'b', 'type': 'bike','vals': 2}]
Note:
I need to merge the dicts into one dict.
It should merge only those that have a common 'name' in both a and b.
I want a Python one-liner answer.
For Python 3, you can do this:
a=[{'name':'a','vals':1},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
print([{**i,**j} for i in b for j in a if i['name']==j['name']])
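If the lists get large, indexing one of them by name first avoids the nested loop. A rough sketch using the a and b defined just above (assumes 'name' is unique within b; b_by_name is just a local name):

b_by_name = {d['name']: d for d in b}            # index b by name once
merged = [{**b_by_name[d['name']], **d}          # merge only the common names
          for d in a if d['name'] in b_by_name]
print(merged)
# [{'name': 'a', 'type': 'car', 'vals': 1}, {'name': 'b', 'type': 'bike', 'vals': 2}]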
I am looking for the most efficient way to extract items from a list of dictionaries. I have a list of about 5k dictionaries. I need to extract those records/items for which grouping by a particular field gives more than a threshold number T of records. For example, if T = 2 and the dictionary key is 'id':
list = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}, {'name': 'bbc', 'id' : 2}]
The result should be:
list = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}]
i.e. all the records with an id such that there are at least 3 records with the same id.
l = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}, {'name': 'bbc', 'id' : 2}]
from collections import defaultdict
from itertools import chain
d = defaultdict(list)
T = 2
for dct in l:
    d[dct["id"]].append(dct)

print(list(chain.from_iterable(v for v in d.values() if len(v) > T)))
[{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]
If you want to keep them in groups, don't chain; just use each value:
[v for v in d.values() if len(v) > T] # itervalues for python2
[[{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]]
Avoid using list as a variable name, as it shadows the built-in list type; if you had a variable named list, the code above would break at d = defaultdict(list).
To start out, I would make a dictionary to group by your id:
control = {}
for d in list:
    control.setdefault(d['id'], []).append(d)

From here, all you have to do is check the length of each group in control to see if it's greater than your specified threshold.
Put it in a function like so:
def find_by_id(obj, threshold):
    control = {}
    for d in obj:
        control.setdefault(d['id'], []).append(d)
    for val in control.values():
        if len(val) > threshold:
            print(val)
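A variant of the same idea that counts the ids first with collections.Counter and then filters in one pass (a sketch; filter_by_count is an illustrative name, not from the answers above):

from collections import Counter

def filter_by_count(records, key, threshold):
    """Keep records whose key value occurs more than threshold times."""
    counts = Counter(r[key] for r in records)
    return [r for r in records if counts[r[key]] > threshold]

data = [{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1},
        {'name': 'c', 'id': 1}, {'name': 'bbc', 'id': 2}]
print(filter_by_count(data, 'id', 2))
# [{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]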
I want to generate all possible ways of using dicts, based on the values in them. To explain in code, I have:
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
I want to be able to pick (say) exactly 7 items from these dicts, and find all the possible ways I could do it.
So:
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = itertools.ifilter(lambda i: sum(i)==7, x)
would give me:
(0, 3, 4)
(1, 2, 4)
(1, 3, 3)
...
What I'd really like is:
({'name' : 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 2}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 3})
....
Any ideas on how to do this, cleanly?
Here it is:
import itertools
import operator
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
dcts = [a,b,c]
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = itertools.ifilter(lambda i: sum(i)==7, x)
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0] \
for dct,vval in zip(dcts, val)]) for val in y)
for zz in z:
    print zz
You can modify it to create copies of the dictionaries. If you need a new dict instance on every iteration, change the z line to:
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0] \
for dct,vval in zip(map(dict,dcts), val)]) for val in y)
An easy way is to generate new dicts:
names = [x['name'] for x in [a,b,c]]
ziped = map(lambda x: zip(names, x), y)
maped = map(lambda el: [{'name': name, 'picked': count} for name, count in el],
ziped)
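For reference, here is a Python 3 sketch of the whole pipeline (itertools.ifilter no longer exists in Python 3, so a plain comprehension does the filtering; the ranges follow the question's range(items) convention):

import itertools

dcts = [{'name': 'a', 'items': 3},
        {'name': 'b', 'items': 4},
        {'name': 'c', 'items': 5}]
target = 7

combos = itertools.product(*(range(d['items']) for d in dcts))
picks = [tuple({'name': d['name'], 'picked': n} for d, n in zip(dcts, combo))
         for combo in combos if sum(combo) == target]

for p in picks:
    print(p)
# ({'name': 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
# ...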