What I have:
a=[{'name':'a','vals':1,'required':'yes'},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
What I tried:
[[i]+[j] for i in b for j in a if i['name']==j['name']]
What I got:
[[{'name': 'a', 'type': 'car'}, {'name': 'a', 'vals': 1}], [{'name': 'b', 'type': 'bike'}, {'name': 'b', 'vals': 2}]]
What I want:
[{'name': 'a', 'type': 'car','vals': 1},{'name': 'b', 'type': 'bike','vals': 2}]
Note:
I need to merge the matching dicts into a single dict.
Only dicts whose 'name' appears in both a and b should be merged.
I want a Python one-liner.
For Python 3.5+ (which added the {**x, **y} unpacking syntax), you can do this:
a=[{'name':'a','vals':1},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
print([{**i,**j} for i in b for j in a if i['name']==j['name']])
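The comprehension above compares every pair of dicts, which is fine for short lists. For longer ones, here is a minimal sketch that indexes a by name first. It assumes each 'name' occurs at most once in a; the names a_by_name and merged are made up for illustration.

```python
a = [{'name': 'a', 'vals': 1}, {'name': 'b', 'vals': 2}, {'name': 'd', 'vals': 3}]
b = [{'name': 'a', 'type': 'car'}, {'name': 'b', 'type': 'bike'}, {'name': 'c', 'type': 'van'}]

# Index a by name (assumption: names are unique within a).
a_by_name = {d['name']: d for d in a}

# Keep only the entries of b whose name also occurs in a, merging both dicts.
merged = [{**i, **a_by_name[i['name']]} for i in b if i['name'] in a_by_name]
print(merged)
```

This is two statements rather than a strict one-liner, but each list is only walked once.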
I have a list of python dicts like this:
[{'name': 'A', 'score': 12},
{'name': 'B', 'score': 20},
{'name': 'C', 'score': 11},
{'name': 'D', 'score': 20},
{'name': 'E', 'score': 9}]
How do I select the three dicts with the highest score values? Expected: [D, B, A]
Sort using the score as a key, then take the last 3 elements. The result is in ascending order of score; pass reverse=True and slice [:3] instead if you want the highest first:
>>> sorted([{'name': 'A', 'score': 12},
...         {'name': 'B', 'score': 20},
...         {'name': 'C', 'score': 11},
...         {'name': 'D', 'score': 20},
...         {'name': 'E', 'score': 9}], key=lambda d: d['score'])[-3:]
[{'name': 'A', 'score': 12}, {'name': 'B', 'score': 20}, {'name': 'D', 'score': 20}]
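If the list is long and you only need a few items, heapq.nlargest avoids sorting everything. A minimal sketch, where scores and top3 are illustrative names; how ties (B and D here) are ordered is not something this example relies on:

```python
import heapq

scores = [{'name': 'A', 'score': 12},
          {'name': 'B', 'score': 20},
          {'name': 'C', 'score': 11},
          {'name': 'D', 'score': 20},
          {'name': 'E', 'score': 9}]

# Returns the 3 dicts with the largest 'score', highest first.
top3 = heapq.nlargest(3, scores, key=lambda d: d['score'])
print([d['name'] for d in top3])
```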
I have a set of data in list-of-dicts format, like below:
data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]
I'm trying to group by 'name' and sum 'tea' and 'coffee' separately.
The final grouped data must be in this format:
grouped_data = [
    {'name': 'A', 'tea':7, 'coffee':9},
    {'name': 'B', 'tea':16, 'coffee':5},
]
I tried the following:
from collections import Counter
c = Counter()
for v in data:
    c[v['name']] += v['tea']
my_data = [{'name': name, 'tea': tea} for name, tea in c.items()]
for e in my_data:
    print(e)
This returned the following output:
{'name': 'A', 'tea': 7}
{'name': 'B', 'tea': 16}
I can only sum the key 'tea' this way; I can't get the sum for the key 'coffee'. Can you please help me get the output in the grouped_data format?
Using pandas:
import pandas as pd
df = pd.DataFrame(data)
df
   coffee name  tea
0       6    A    5
1       3    A    2
2       1    B    7
3       4    B    9
g = df.groupby('name', as_index=False).sum()
g
  name  coffee  tea
0    A       9    7
1    B       5   16
And, the final step, df.to_dict:
d = g.to_dict('records')  # newer pandas versions reject the abbreviation 'r'
d
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
You can try this:
data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]
import itertools
final_data = [(a, list(b)) for a, b in itertools.groupby([i.items() for i in data], key=lambda x:dict(x)["name"])]
new_final_data = [{i[0][0]:sum(c[-1] for c in i if isinstance(c[-1], int)) if i[0][0] != "name" else i[0][-1] for i in zip(*b)} for a, b in final_data]
Output:
[{'tea': 7, 'coffee': 9, 'name': 'A'}, {'tea': 16, 'coffee': 5, 'name': 'B'}]
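Note that itertools.groupby only groups consecutive items, so the one-liner above relies on data already being ordered by name (and on every dict listing its keys in the same order). Here is a rough sketch that does not need either, using a defaultdict of Counter objects; totals is an illustrative name, and the rows are deliberately shuffled:

```python
from collections import Counter, defaultdict

data = [
    {'name': 'A', 'tea': 5, 'coffee': 6},
    {'name': 'B', 'tea': 7, 'coffee': 1},
    {'name': 'A', 'tea': 2, 'coffee': 3},
    {'name': 'B', 'tea': 9, 'coffee': 4},
]

totals = defaultdict(Counter)
for row in data:
    # Add every field except the grouping key to that name's running totals.
    totals[row['name']].update({k: v for k, v in row.items() if k != 'name'})

grouped_data = [{'name': name, **counts} for name, counts in totals.items()]
print(grouped_data)
```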
Using pandas, this is pretty easy to do:
import pandas as pd
data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]
df = pd.DataFrame(data)
gb = df.groupby(['name']).sum()
gb
      coffee  tea
name
A          9    7
B          5   16
Here's one way to get it into your dict format:
grouped_data = []
for idx in gb.index:
    d = {'name': idx}
    d = {**d, **{col: gb.loc[idx, col] for col in gb}}
    grouped_data.append(d)
grouped_data
Out[15]: [{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
But COLDSPEED got the native pandas solution with the as_index=False config...
import pandas as pd
df = pd.DataFrame(data)
# as_index=False keeps 'name' as a column, so it shows up in each record
df2 = df.groupby('name', as_index=False).sum()
df2.to_dict('records')
Here is a method I created, you can input the key you want to group by:
def group_sum(key,list_of_dicts):
    d = {}
    for dct in list_of_dicts:
        if dct[key] not in d:
            d[dct[key]] = {}
        for k,v in dct.items():
            if k != key:
                if k not in d[dct[key]]:
                    d[dct[key]][k] = v
                else:
                    d[dct[key]][k] += v
    final_list = []
    for k,v in d.items():
        temp_d = {key: k}
        for k2,v2 in v.items():
            temp_d[k2] = v2
        final_list.append(temp_d)
    return final_list
data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]
grouped_data = group_sum("name",data)
print(grouped_data)
result:
[{'coffee': 5, 'name': 'B', 'tea': 16}, {'coffee': 9, 'name': 'A', 'tea': 7}]
I suspect this is slower than pandas when summing thousands of dicts, but I haven't measured it. It also doesn't preserve the order of the groups unless you use collections.OrderedDict or Python 3.7+ (CPython 3.6 keeps insertion order too, but only as an implementation detail).
I hate to ask this but I can't figure it out and it's getting to me.
I have to write a function that takes a given dictionary d1, compares it to another dictionary d2, and then adds the resulting value to d2.
d1 is already in the format needed, so I don't have to worry about it.
d2 however, is a nested dictionary. It looks like this:
{'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
 '123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
 '234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}}
So d1 is in the format of the Responses key, and that's what I need from d2 to compare it to d1.
So to do that I isolate responses:
for key, i in d2.items():
    temp = i['Responses']
Now I need to run temp through a function with d1 that will output an integer. Then match that integer with the top-level key it came from and update a new k/v entry associated with it. But I don't know how to do this.
I've managed to update each top-level key with that compared value, but it only uses the first compared value for all the top-level keys. I can't figure out how to match the integer found to its key. This is what I have so far that works the best:
for i in d2:
    score = grade_student(d1,temp) #integer
    placement = {'Score': score}
    d2[i].update(placement)
You could just iterate over sub dictionaries in d2 and update them once you've called grade_student:
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
Here's a complete example:
import pprint
d1 = {}
d2 = {
    '345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
    '123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
    '234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}
}
# Dummy
def grade_student(x, y):
    return 1
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
pprint.pprint(d2)
Output:
{'123': {'ID': '123',
         'Name': 'foo',
         'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'a'},
         'Score': 1},
 '234': {'ID': '234',
         'Name': 'bar',
         'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'},
         'Score': 1},
 '345': {'ID': '345',
         'Name': 'xyzzy',
         'Responses': {'Q1': 'a', 'Q2': 'a', 'Q3': 'c', 'Q4': 'b'},
         'Score': 1}}
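The question never shows grade_student or d1, so the dummy returns 1 for everyone. To see each student get their own score, here is a sketch with a hypothetical scorer that counts responses matching an answer key; both the answer key in d1 and the body of grade_student are assumptions made up for illustration:

```python
def grade_student(answer_key, responses):
    """Hypothetical scorer: count the questions answered as in answer_key."""
    return sum(1 for q, correct in answer_key.items() if responses.get(q) == correct)

d1 = {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'}  # made-up answer key
d2 = {
    '345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
    '123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
    '234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}},
}

# Each sub-dictionary is graded from its own 'Responses', not from a leftover temp.
for student in d2.values():
    student['Score'] = grade_student(d1, student['Responses'])

scores = {student_id: student['Score'] for student_id, student in d2.items()}
print(scores)
```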
You don't have to iterate over them. Use the built-in update() method. Here is an example:
>>> A = {'cat':10, 'dog':5, 'rat':50}
>>> B = {'cat':5, 'dog':10, 'pig':20}
>>> A.update(B)  # merges B into A in place; on a key collision, B's value wins
>>> A
{'rat': 50, 'pig': 20, 'dog': 10, 'cat': 5}
>>> B
{'pig': 20, 'dog': 10, 'cat': 5}
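Since update() changes A in place, here is a small sketch of the non-mutating alternative for when you want to keep the original (Python 3.5+; merged is an illustrative name):

```python
A = {'cat': 10, 'dog': 5, 'rat': 50}
B = {'cat': 5, 'dog': 10, 'pig': 20}

# Build a new dict; later unpackings win on key collisions, so B's values are kept.
merged = {**A, **B}

print(merged)
print(A)  # A itself is left untouched
```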
I am looking for the most efficient way to extract items from a list of dictionaries. I have a list of about 5k dictionaries. I need to extract the records whose value for a particular field is shared by more than a threshold T of records. For example, with T = 2 and the dictionary key 'id':
list = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}, {'name': 'bbc', 'id' : 2}]
The result should be:
list = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}]
i.e. all the records whose id occurs in at least 3 records.
l = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}, {'name': 'bbc', 'id' : 2}]
from collections import defaultdict
from itertools import chain
d = defaultdict(list)
T = 2
for dct in l:
    d[dct["id"]].append(dct)
print(list(chain.from_iterable(v for v in d.values() if len(v) > T)))
[{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]
If you want to keep them in groups don't chain just use each value:
[v for v in d.values() if len(v) > T] # itervalues for python2
[[{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]]
Avoid using list as a variable name: it shadows the built-in list type, and with a variable named list the code above would break at d = defaultdict(list).
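If you only need the flat result in the original order, counting the ids first is another option. A minimal sketch, where records, counts and frequent are illustrative names:

```python
from collections import Counter

records = [{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1},
           {'name': 'c', 'id': 1}, {'name': 'bbc', 'id': 2}]
T = 2

# First pass: how many records share each id.
counts = Counter(r['id'] for r in records)

# Second pass: keep records whose id occurs more than T times, in input order.
frequent = [r for r in records if counts[r['id']] > T]
print(frequent)
```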
To start, I would build a dictionary that groups the records by id:
control = {}
for d in list:
    control.setdefault(d['id'],[]).append(d)
From here, all you have to do is check whether the length of each group in control is greater than your specified threshold.
Put it in a function like so:
def find_by_id(obj, threshold):
    control = {}
    for d in obj:
        control.setdefault(d['id'], []).append(d)
    for val in control.values():
        if len(val) > threshold:
            print(val)
I want to generate all possible ways of using dicts, based on the values in them. To explain in code, I have:
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
I want to be able to pick (say) exactly 7 items in total from these dicts, and to enumerate all the possible ways of doing so.
So:
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = itertools.ifilter(lambda i: sum(i)==7, x)
would give me:
(0, 3, 4)
(1, 2, 4)
(1, 3, 3)
...
What I'd really like is:
({'name' : 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 2}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 3})
....
Any ideas on how to do this, cleanly?
Here it is
import itertools
import operator
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
dcts = [a,b,c]
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
# itertools.ifilter exists only in Python 2; the built-in filter works on both
y = filter(lambda i: sum(i)==7, x)
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0] \
            for dct,vval in zip(dcts, val)]) for val in y)
for zz in z:
    print(zz)
Note that this sets 'picked' on a, b and c themselves, so every tuple it yields refers to the same three dict objects. If you need new dict instances on every iteration, change the z line to:
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0] \
            for dct,vval in zip(map(dict,dcts), val)]) for val in y)
An easy way is to generate new dicts:
names = [x['name'] for x in [a,b,c]]
ziped = map(lambda x: zip(names, x), y)
maped = map(lambda el: [{'name': name, 'picked': count} for name, count in el],
            ziped)
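For Python 3, the same idea can be written as a single comprehension. This is only a sketch; target and picks are illustrative names, and like the question's code it uses range(d['items']), so each dict can contribute at most items - 1:

```python
import itertools

a = {'name': 'a', 'items': 3}
b = {'name': 'b', 'items': 4}
c = {'name': 'c', 'items': 5}
dcts = [a, b, c]
target = 7

# Every combination of counts, one count per dict.
combos = itertools.product(*(range(d['items']) for d in dcts))

# Keep the combinations that sum to target and build fresh dicts for each.
picks = [
    tuple({'name': d['name'], 'picked': n} for d, n in zip(dcts, combo))
    for combo in combos
    if sum(combo) == target
]

for p in picks:
    print(p)
```

Because new dicts are built for every combination, a, b and c are never modified and the results can safely be stored in a list.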