I want to generate all possible ways of using dicts, based on the values in them. To explain in code, I have:
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
I want to pick (say) exactly 7 items in total from these dicts and list every possible way of doing so.
So:
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = filter(lambda i: sum(i) == 7, x)  # itertools.ifilter on Python 2
would give me:
(0, 3, 4)
(1, 2, 4)
(1, 3, 3)
...
What I'd really like is:
({'name' : 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 2}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 3})
....
Any ideas on how to do this, cleanly?
Here it is
import itertools
import operator
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
dcts = [a,b,c]
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
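y = filter(lambda i: sum(i) == 7, x)  # itertools.ifilter on Python 2
# operator.setitem returns None, so [dct, setitem(...)][0] sets 'picked' and yields dct
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0]
            for dct, vval in zip(dcts, val)]) for val in y)
for zz in z:
    print(zz)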
You can modify it to create copies of the dictionaries. As written, it mutates and yields the same three dict objects on every iteration. If you need a new dict instance on every iteration, change the z line to:
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0]
            for dct, vval in zip(map(dict, dcts), val)]) for val in y)
The easy way is to generate new dicts:
names = [x['name'] for x in [a, b, c]]
ziped = map(lambda x: zip(names, x), y)
maped = map(lambda el: [{'name': name, 'picked': count} for name, count in el],
            ziped)
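The approach above can also be written as a single generator that builds fresh dicts, so nothing in a, b or c is mutated. This is a minimal Python 3 sketch; the helper name picks is an assumption introduced for illustration, and it keeps the question's range(items) bounds as they are.

```python
import itertools

a = {'name': 'a', 'items': 3}
b = {'name': 'b', 'items': 4}
c = {'name': 'c', 'items': 5}
dcts = [a, b, c]


def picks(dcts, total):
    """Yield one tuple of new dicts per combination whose counts sum to total.

    `picks` is a hypothetical helper name, not part of the original answers.
    """
    ranges = [range(d['items']) for d in dcts]
    for counts in itertools.product(*ranges):
        if sum(counts) == total:
            # Build new dicts so the input dicts are left untouched.
            yield tuple({'name': d['name'], 'picked': n}
                        for d, n in zip(dcts, counts))


result = list(picks(dcts, 7))
for combo in result:
    print(combo)
```

Because every tuple holds its own dicts, the results can be collected into a list without each entry ending up with the same 'picked' values.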
Given the following dictionary:
dict1 = {'AA':['THISISSCARY'],
'BB':['AREYOUAFRAID'],
'CC':['DONOTWORRY']}
I'd like to update the values in the dictionary given the information in the following table
Table = pd.DataFrame({'KEY':['AA','AA','BB','CC'],
'POSITION':[2,4,9,3],
'oldval':['I','I','A','O'],
'newval':['X','X','U','I']})
that looks like this
  KEY  POSITION oldval newval
0  AA         2      I      X
1  AA         4      I      X
2  BB         9      A      U
3  CC         3      O      I
The end result should look like this:
dict1 = {'AA':['THXSXSSCARY'],
'BB':['AREYOUAFRUID'],
'CC':['DONITWORRY']}
Essentially, I'm using KEY and POSITION to locate the character in the dictionary; if it matches oldval, I replace it with newval.
I've been looking at the update function, where I'd convert my table to a dictionary, but I'm unsure how to apply it to my example.
First craft a nested Series/dictionary to map the key/position/newval, then use a dictionary comprehension:
s = (Table.groupby('KEY')
.apply(lambda d: d.set_index('POSITION')['newval'].to_dict())
)
out = {k: [''.join(s.get(k, {}).get(i, x) for i,x in enumerate(v[0]))]
for k,v in dict1.items()
}
Output:
{'AA': ['THXSXSSCARY'],
'BB': ['AREYOUAFRUID'],
'CC': ['DONITWORRY']}
Intermediate s:
KEY
AA {2: 'X', 4: 'X'}
BB {9: 'U'}
CC {3: 'I'}
dtype: object
you can use:
dict_df=Table.to_dict('records')
print(dict_df)
'''
[{'KEY': 'AA', 'POSITION': 2, 'oldval': 'I', 'newval': 'X'}, {'KEY': 'AA', 'POSITION': 4, 'oldval': 'I', 'newval': 'X'}, {'KEY': 'BB', 'POSITION': 9, 'oldval': 'A', 'newval': 'U'}, {'KEY': 'CC', 'POSITION': 3, 'oldval': 'O', 'newval': 'I'}]
'''
for i in list(dict1.keys()):
    for j in dict_df:
        if i == j['KEY']:
            mask=list(dict1[i][0])
            mask[j['POSITION']]=j['newval']
            dict1[i]=["".join(mask)]
print(dict1)
# {'AA': ['THXSXSSCARY'], 'BB': ['AREYOUAFRUID'], 'CC': ['DONITWORRY']}
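The question also asks that a character be replaced only when it matches oldval, which neither snippet above checks. A minimal sketch of that check follows, assuming the table has already been converted to a list of records (the shape Table.to_dict('records') prints above); the function name apply_edits is hypothetical.

```python
dict1 = {'AA': ['THISISSCARY'],
         'BB': ['AREYOUAFRAID'],
         'CC': ['DONOTWORRY']}

# Same rows as the table, in the shape Table.to_dict('records') returns.
records = [
    {'KEY': 'AA', 'POSITION': 2, 'oldval': 'I', 'newval': 'X'},
    {'KEY': 'AA', 'POSITION': 4, 'oldval': 'I', 'newval': 'X'},
    {'KEY': 'BB', 'POSITION': 9, 'oldval': 'A', 'newval': 'U'},
    {'KEY': 'CC', 'POSITION': 3, 'oldval': 'O', 'newval': 'I'},
]


def apply_edits(strings, records):
    """Return a new dict with each edit applied only where oldval matches.

    `apply_edits` is a hypothetical helper, not from the original answers.
    """
    out = {}
    for key, values in strings.items():
        chars = list(values[0])
        for rec in records:
            if rec['KEY'] != key:
                continue
            pos = rec['POSITION']
            # Skip edits that are out of range or whose oldval does not match.
            if 0 <= pos < len(chars) and chars[pos] == rec['oldval']:
                chars[pos] = rec['newval']
        out[key] = [''.join(chars)]
    return out


print(apply_edits(dict1, records))
```

Returning a new dict rather than mutating dict1 is a design choice of this sketch; assigning the result back to dict1 gives the in-place behaviour of the loop above.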
I have a set of data in the list of dict format like below:
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
I'm trying to group by 'name' and sum 'tea' and 'coffee' separately.
The final grouped data must be in this format:
grouped_data = [
{'name': 'A', 'tea':7, 'coffee':9},
{'name': 'B', 'tea':16, 'coffee':5},
]
I tried some steps:
from collections import Counter
c = Counter()
for v in data:
    c[v['name']] += v['tea']
my_data = [{'name': name, 'tea':tea} for name, tea in c.items()]
for e in my_data:
    print(e)
The above step returned the following output:
{'name': 'A', 'tea': 7}
{'name': 'B', 'tea': 16}
I can only sum the key 'tea'; I'm not able to get the sum for the key 'coffee'. Can you please help me get the grouped_data format?
Using pandas:
df = pd.DataFrame(data)
df
   coffee name  tea
0       6    A    5
1       3    A    2
2       1    B    7
3       4    B    9
g = df.groupby('name', as_index=False).sum()
g
  name  coffee  tea
0    A       9    7
1    B       5   16
And, the final step, df.to_dict:
d = g.to_dict('records')  # the 'r' abbreviation is no longer accepted by recent pandas
d
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
You can try this:
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
import itertools
final_data = [(a, list(b)) for a, b in itertools.groupby([i.items() for i in data], key=lambda x:dict(x)["name"])]
new_final_data = [{i[0][0]:sum(c[-1] for c in i if isinstance(c[-1], int)) if i[0][0] != "name" else i[0][-1] for i in zip(*b)} for a, b in final_data]
Output:
[{'tea': 7, 'coffee': 9, 'name': 'A'}, {'tea': 16, 'coffee': 5, 'name': 'B'}]
Using pandas, this is pretty easy to do:
import pandas as pd
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
df = pd.DataFrame(data)
gb = df.groupby(['name']).sum()
gb
      coffee  tea
name
A          9    7
B          5   16
Here's one way to get it into your dict format:
grouped_data = []
for idx in gb.index:
    d = {'name': idx}
    d = {**d, **{col: gb.loc[idx, col] for col in gb}}
    grouped_data.append(d)
grouped_data
Out[15]: [{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
But COLDSPEED got the native pandas solution with the as_index=False config...
import pandas as pd
df = pd.DataFrame(data)
df2 = df.groupby('name', as_index=False).sum()  # as_index=False keeps 'name' as a column
df2.to_dict('records')
Here is a method I created, you can input the key you want to group by:
def group_sum(key,list_of_dicts):
    d = {}
    for dct in list_of_dicts:
        if dct[key] not in d:
            d[dct[key]] = {}
        for k,v in dct.items():
            if k != key:
                if k not in d[dct[key]]:
                    d[dct[key]][k] = v
                else:
                    d[dct[key]][k] += v
    final_list = []
    for k,v in d.items():
        temp_d = {key: k}
        for k2,v2 in v.items():
            temp_d[k2] = v2
        final_list.append(temp_d)
    return final_list
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
grouped_data = group_sum("name",data)
print (grouped_data)
result:
[{'coffee': 5, 'name': 'B', 'tea': 16}, {'coffee': 9, 'name': 'A', 'tea': 7}]
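I'd guess this is slower than pandas when summing thousands of dicts, but I haven't measured it. The output order is also not guaranteed unless you use collections.OrderedDict or Python 3.7+, where plain dicts preserve insertion order (CPython 3.6 already does so as an implementation detail).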
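The same grouping can be written more compactly with the standard library. The following is a sketch rather than a benchmarked replacement: it assumes every non-key value is numeric, and the name group_sum_counter is made up for illustration.

```python
from collections import Counter, defaultdict

data = [
    {'name': 'A', 'tea': 5, 'coffee': 6},
    {'name': 'A', 'tea': 2, 'coffee': 3},
    {'name': 'B', 'tea': 7, 'coffee': 1},
    {'name': 'B', 'tea': 9, 'coffee': 4},
]


def group_sum_counter(key, rows):
    """Sum every non-key field per distinct value of `key`.

    Hypothetical helper; assumes all non-key values are numbers.
    """
    totals = defaultdict(Counter)
    for row in rows:
        # Counter.update adds the given values to the running totals.
        totals[row[key]].update({k: v for k, v in row.items() if k != key})
    return [{key: group, **sums} for group, sums in totals.items()]


grouped_data = group_sum_counter('name', data)
print(grouped_data)
```

On Python 3.7+ the groups come out in first-seen order, because both defaultdict and Counter are dict subclasses.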
I hate to ask this but I can't figure it out and it's getting to me.
I have to make a function that takes a given dictionary d1, compares it to another dictionary d2, and then adds the compared value to d2.
d1 is already in the format needed, so I don't have to worry about it.
d2 however, is a nested dictionary. It looks like this:
{'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
'123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
'234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}}
So d1 is in the format of the Responses key, and that's what I need from d2 to compare it to d1.
So to do that I isolate responses:
for key, i in d2.items():
    temp = i['Responses']
Now I need to run temp through a function with d1 that will output an integer. Then match that integer with the top-level key it came from and update a new k/v entry associated with it. But I don't know how to do this.
I've managed to update each top-level key with that compared value, but it only uses the first compared value for all the top-level keys. I can't figure out how to match the integer found to its key. This is what I have so far that works the best:
for i in d2:
    score = grade_student(d1,temp) #integer
    placement = {'Score': score}
    d2[i].update(placement)
You could just iterate over sub dictionaries in d2 and update them once you've called grade_student:
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
Here's a complete example:
import pprint
d1 = {}
d2 = {
'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
'123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
'234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}
}
# Dummy
def grade_student(x, y):
    return 1
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
pprint.pprint(d2)
Output:
{'123': {'ID': '123',
         'Name': 'foo',
         'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'a'},
         'Score': 1},
 '234': {'ID': '234',
         'Name': 'bar',
         'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'},
         'Score': 1},
 '345': {'ID': '345',
         'Name': 'xyzzy',
         'Responses': {'Q1': 'a', 'Q2': 'a', 'Q3': 'c', 'Q4': 'b'},
         'Score': 1}}
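To see per-student scores that actually differ, the dummy grader can be swapped for one that counts matching answers. This is only a sketch: the answer key in d1 and the body of grade_student are assumptions, since the question never shows either.

```python
# Assumed answer key; the question does not show the real d1.
d1 = {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'}

d2 = {
    '345': {'Name': 'xyzzy', 'ID': '345',
            'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
    '123': {'Name': 'foo', 'ID': '123',
            'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
    '234': {'Name': 'bar', 'ID': '234',
            'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}},
}


def grade_student(key, responses):
    """Hypothetical grader: count the questions answered as in the key."""
    return sum(1 for q, answer in key.items() if responses.get(q) == answer)


# Grade inside the loop so each score is computed from that student's responses.
for student in d2.values():
    student['Score'] = grade_student(d1, student['Responses'])

for student_id, student in d2.items():
    print(student_id, student['Score'])
```

The key point is that the grader is called inside the loop with that student's own 'Responses', rather than with a temp variable left over from an earlier loop.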
You don't have to iterate them. Use the built-in update() method. Here is an example
>>> A = {'cat':10, 'dog':5, 'rat':50}
>>> B = {'cat':5, 'dog':10, 'pig':20}
>>> A.update(B) #This will merge the dicts by keeping the values of B if collision
>>> A
{'rat': 50, 'pig': 20, 'dog': 10, 'cat': 5}
>>> B
{'pig': 20, 'dog': 10, 'cat': 5}
I have a Counter object in Python, which contains the following data:
{'a': 4, 'b': 1, 'e': 1}
I'd like to convert this to a JSON object with the following form:
[{'name':'a', 'value': 4} , {'name':'b', 'value': 1}, {'name':'e', 'value': 1}]
Is there any efficient way to do so?
You can use a list comprehension to convert the dictionary to a list of dictionaries. Example -
data = {'a': 4, 'b': 1, 'e': 1}
result = [{'name':key, 'value':value} for key,value in data.items()]
Demo -
>>> data = {'a': 4, 'b': 1, 'e': 1}
>>> result = [{'name':key, 'value':value} for key,value in data.items()]
>>> result
[{'name': 'a', 'value': 4}, {'name': 'b', 'value': 1}, {'name': 'e', 'value': 1}]
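If an actual JSON string is needed rather than a Python list, the list can be passed to json.dumps. A minimal sketch, assuming the data starts out as a collections.Counter as the question describes; the variable names counts and payload are made up for illustration.

```python
import json
from collections import Counter

counts = Counter({'a': 4, 'b': 1, 'e': 1})

# most_common() orders entries from highest to lowest count.
result = [{'name': key, 'value': value} for key, value in counts.most_common()]

# Serialize the list of dicts to a JSON string.
payload = json.dumps(result)
print(payload)
```

Using counts.items() instead of most_common() keeps insertion order rather than sorting by count.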
I am looking for the most efficient way to extract items from a list of dictionaries. I have a list of about 5k dictionaries. I need to extract the records whose group, when grouping by a particular field, contains more than a threshold T records. For example, if T = 2 and the dictionary key is 'id':
list = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}, {'name': 'bbc', 'id' : 2}]
The result should be:
list = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}]
i.e. all the records whose id is shared by at least 3 records.
l = [{'name': 'abc', 'id' : 1}, {'name': 'bc', 'id' : 1}, {'name': 'c', 'id' : 1}, {'name': 'bbc', 'id' : 2}]
from collections import defaultdict
from itertools import chain
d = defaultdict(list)
T = 2
for dct in l:
    d[dct["id"]].append(dct)
print(list(chain.from_iterable(v for v in d.values() if len(v) > T)))
[{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]
If you want to keep them in groups don't chain just use each value:
[v for v in d.values() if len(v) > T] # itervalues for python2
[[{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1}, {'name': 'c', 'id': 1}]]
Avoid using list as a variable name: it shadows the built-in list type, and with a variable named list the code above would break at d = defaultdict(list).
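If the records should stay in their original order rather than being regrouped by id, a two-pass variant with collections.Counter is one option. This is a sketch of an alternative technique, not the approach above; the names records, counts and kept are assumptions.

```python
from collections import Counter

records = [{'name': 'abc', 'id': 1}, {'name': 'bc', 'id': 1},
           {'name': 'c', 'id': 1}, {'name': 'bbc', 'id': 2}]
T = 2

# First pass: count how many records share each id.
counts = Counter(record['id'] for record in records)

# Second pass: keep records whose id occurs more than T times, in input order.
kept = [record for record in records if counts[record['id']] > T]
print(kept)
```

Both passes are linear in the number of records, so this should be comfortable for a list of about 5k dictionaries.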
To start out, I would make a dictionary that groups the records by id:
control = {}
for d in list:
    control.setdefault(d['id'],[]).append(d)
From here, all you have to do is check whether the length of each group in control is greater than your specified threshold.
Put it in a function like so:
def find_by_id(obj, threshold):
    control = {}
    for d in obj:
        control.setdefault(d['id'], []).append(d)
    for val in control.values():
        if len(val) > threshold:
            print(val)