Grouping an array of objects by key in Python

Suppose I have an array of objects.
arr = [
{'grade': 'A', 'name': 'James'},
{'grade': 'B', 'name': 'Tom'},
{'grade': 'A', 'name': 'Zelda'}
]
I want this result:
{
'A': [
{'grade': 'A', 'name': 'James'},
{'grade': 'A', 'name': 'Zelda'}
],
'B': [ {'grade': 'B', 'name': 'Tom'} ]
}

Use a dict and setdefault:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
arr2 = {}
for d in arr:
    t = arr2.setdefault(d['grade'], [])
    t.append(d)
>>> arr2
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}],
'B': [{'grade': 'B', 'name': 'Tom'}]}
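Here is a quick check of the setdefault behaviour quoted above. The first call for a key inserts the default and returns it. Later calls return the stored object and ignore the default, which is why appending to the returned list updates the dict in place. The `groups` dict and the names in it are only illustrative.

```python
groups = {}

# First call: 'A' is missing, so the new empty list is inserted and returned
groups.setdefault('A', []).append('James')

# Second call: 'A' exists, so the stored list is returned and the [] argument is ignored
groups.setdefault('A', []).append('Zelda')

# With no default argument, a missing key is inserted with the value None
missing = groups.setdefault('B')

print(groups)  # {'A': ['James', 'Zelda'], 'B': None}
```

One caveat: the default expression (here `[]`) is still evaluated on every call, even when the key already exists. This is harmless for an empty list, but it matters if the default is expensive to build.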

Using dict.setdefault we can do this:
import json
gradeList = [
{"grade": 'A', "name": 'James'},
{"grade": 'B', "name": 'Tom'},
{"grade": 'A', "name": 'Zelda'}
]
gradeDict = {}
for d in gradeList:
    gradeDict.setdefault(d["grade"], []).append(d)
print(json.dumps(gradeDict, indent=4))
Output:
{
"A": [
{
"grade": "A",
"name": "James"
},
{
"grade": "A",
"name": "Zelda"
}
],
"B": [
{
"grade": "B",
"name": "Tom"
}
]
}

You can use itertools.groupby:
>>> import itertools
>>> keyfunc = lambda item: item['grade']
>>> {k: list(v) for k, v in itertools.groupby(sorted(arr, key=keyfunc), keyfunc)}
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
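The sorted() call in that expression is essential, because itertools.groupby only merges *consecutive* items with equal keys. The sketch below compares grouping with and without sorting on the arr from the question. It uses operator.itemgetter('grade'), which does the same job as the lambda.

```python
import itertools
from operator import itemgetter

arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'},
]
keyfunc = itemgetter('grade')  # equivalent to lambda item: item['grade']

# Without sorting, 'A' forms two separate runs, and the second run
# overwrites the first one when the dict is built
unsorted_groups = {k: list(v) for k, v in itertools.groupby(arr, keyfunc)}

# Sorting by the same key first puts equal grades next to each other;
# sorted() is stable, so James still comes before Zelda
sorted_groups = {k: list(v) for k, v in itertools.groupby(sorted(arr, key=keyfunc), keyfunc)}

print(unsorted_groups)
print(sorted_groups)
```

Because of the sort, this approach costs O(n log n). The setdefault and defaultdict approaches make a single O(n) pass.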

I would use a pd.DataFrame and do it like this:
import pandas as pd
df = pd.DataFrame(arr)
for grade, group in df.groupby('grade'):
    print(group)
Instead of print(group), you can write each group wherever you need it. I suppose the result does not necessarily have to be a dict like the one you described.

I would do a simple loop like this:
arr = [{'grade': 'A', 'name': 'James'}, {'grade': 'B', 'name': 'Tom'}, {'grade': 'A', 'name': 'Zelda'}]
grouped_grades = {}
for item in arr:
    if item['grade'] not in grouped_grades:
        grouped_grades[item['grade']] = []
    grouped_grades[item['grade']].append(item)
print(grouped_grades)
Output:
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}

I think the easiest way is to use defaultdict. If you need an ordinary dict afterwards, convert the result by passing it to the constructor: dict(output).
from collections import defaultdict
output = defaultdict(list)
for item in arr:
    output[item['grade']].append(item)
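Here is a self-contained sketch of the defaultdict approach, including the dict(output) conversion. It also shows a pitfall that makes the conversion worthwhile: on a defaultdict, simply *reading* a missing key inserts it. The key 'C' is a made-up example of such a lookup.

```python
from collections import defaultdict

arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'},
]

# defaultdict(list) calls list() to create the value for a missing key,
# which behaves the same as defaultdict(lambda: [])
output = defaultdict(list)
for item in arr:
    output[item['grade']].append(item)

# Convert to an ordinary dict; missing keys raise KeyError again
plain = dict(output)

# Pitfall: reading a missing key on the defaultdict silently inserts it
print(output['C'])    # []
print('C' in output)  # True
print('C' in plain)   # False, because the copy was made before the lookup
```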

Related

Dataframe to Nested Dictionaries in Python

Having a bit of trouble here. I need to take a dataframe
import pandas as pd
region = ['A','A','A','B','B','B']
sub_region = ['1','2','2','3','3','4']
state = ['a','b','c','d','e','f']
pd.DataFrame({"region":region,"sub_region":sub_region,"state":state})
and convert into a nested dictionary with the following format:
[{name: "thing", children: [{name:"sub_thing",children:[{...}] }]}]
So I need a list of nested dictionaries where the key-value pairs are always name: "" and children: [{}], except that childless nodes have no children key in their dict. The final desired output would be:
[{"name":"A",
"children":[{"name":"1","children":[{"name":"a"}]},
{"name":"2","children":[{"name":"b"},{"name":"c"}]}]
},
{"name":"B",
"children":[{"name":"3","children":[{"name":"d"},{"name":"e"}]},
{"name":"4","children":[{"name":"f"}]}]
}
]
Assume a generalized framework where the number of levels can vary.
I don't think you can do better than looping through the rows of the dataframe, because I don't see a way to vectorize this process. Also, if the number of levels can vary within the same dataframe, the update function should be modified to handle NaN entries (e.g. changing if len(row) > 1 to if len(row) > 1 and pd.notna(row.iloc[1]); note that np.isnan would raise a TypeError on string entries like these).
That said, I believe that the following script should be satisfactory.
import pandas as pd
region = ['A','A','A','B','B','B']
sub_region = ['1','2','2','3','3','4']
state = ['a','b','c','d','e','f']
df = pd.DataFrame({"region":region,"sub_region":sub_region,"state":state})
ls = []
def update(row, ls):
    # Look for an existing node with this name at the current level...
    for d in ls:
        if d['name'] == row.iloc[0]:
            break
    else:
        # ...and create one if there is none
        ls.append({'name': row.iloc[0]})
        d = ls[-1]
    # Recurse into the remaining levels; 'children' is only created when needed
    if len(row) > 1:
        if 'children' not in d:
            d['children'] = []
        update(row.iloc[1:], d['children'])
for _, r in df.iterrows():
    update(r, ls)
print(ls)
The resulting list ls:
[{'name': 'A',
'children': [{'name': '1', 'children': [{'name': 'a'}]},
{'name': '2', 'children': [{'name': 'b'}, {'name': 'c'}]}]},
{'name': 'B',
'children': [{'name': '3', 'children': [{'name': 'd'}, {'name': 'e'}]},
{'name': '4', 'children': [{'name': 'f'}]}]}]
Here's a version where childless children have 'children':[] in their dict, which I find a bit more natural.
import pandas as pd
region = ['A','A','A','B','B','B']
sub_region = ['1','2','2','3','3','4']
state = ['a','b','c','d','e','f']
df = pd.DataFrame({"region":region,"sub_region":sub_region,"state":state})
ls = []
def update(row, ls):
    # Stop once every level of the row has been consumed
    if len(row) == 0:
        return
    for d in ls:
        if d['name'] == row.iloc[0]:
            break
    else:
        # Every new node starts with an empty children list
        ls.append({'name': row.iloc[0], 'children': []})
        d = ls[-1]
    update(row.iloc[1:], d['children'])
for _, r in df.iterrows():
    update(r, ls)
print(ls)
The resulting list ls:
[{'name': 'A',
'children': [{'name': '1', 'children': [{'name': 'a', 'children': []}]},
{'name': '2',
'children': [{'name': 'b', 'children': []},
{'name': 'c', 'children': []}]}]},
{'name': 'B',
'children': [{'name': '3',
'children': [{'name': 'd', 'children': []},
{'name': 'e', 'children': []}]},
{'name': '4', 'children': [{'name': 'f', 'children': []}]}]}]
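The first answer mentions dataframes where the number of levels varies from row to row. Below is a stdlib-only sketch of that case. It assumes that a shorter row is padded on the right with None or NaN, which is what pandas puts in empty cells, and it treats the first missing value as the end of the row. The helper names insert_path and is_missing are hypothetical, and the rows are made-up plain tuples. With the DataFrame from the question, you could feed it rows via `for r in df.itertuples(index=False): insert_path(r, tree)`.

```python
import math

def is_missing(value):
    # None and float NaN both mean "no level here"
    return value is None or (isinstance(value, float) and math.isnan(value))

def insert_path(path, nodes):
    """Add one row of level names to a list of {'name': ..., 'children': [...]} nodes."""
    # Keep the leading run of present values; the first missing cell ends the row
    levels = []
    for value in path:
        if is_missing(value):
            break
        levels.append(value)

    for depth, name in enumerate(levels):
        # Reuse a node with the same name at this level, or create one
        node = next((n for n in nodes if n['name'] == name), None)
        if node is None:
            node = {'name': name}
            nodes.append(node)
        # Only nodes with a deeper level below them get a 'children' key
        if depth < len(levels) - 1:
            nodes = node.setdefault('children', [])

rows = [
    ('A', '1', 'a'),
    ('A', '2', 'b'),
    ('A', '2', 'c'),
    ('B', '3', float('nan')),  # a shorter row with only two levels
]
tree = []
for row in rows:
    insert_path(row, tree)
print(tree)
```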

loop over a nested dictionary to create a new one

I've got a nested dictionary like this:
d={'a1': {'b': ['x', 1]}, 'a2': {'b1': ['x1', 2]}}
Expected result:
[
{
"measurements": "XXXXX",
"tags": {
"MPC": "b",
"host": "a1"
},
"time": "timexxxxx",
"fields": {
"x": 1
}
},
{
"measurements": "XXXXX",
"tags": {
"MPC": "b1",
"host": "a2"
},
"time": "timexxxxx",
"fields": {
"x1": 2
}
}
]
This is what I'm trying, but each entry overwrites the previous one:
metrics = {}
for k,v in d.items():
    metrics['measurements'] = "XXXXX"
    if isinstance(v,dict):
        for j,h in v.items():
            metrics['tags'] = {'MPC':j,'host':k}
            metrics['time'] = "timexxxxx"
            for value in h:
                metrics['fields'] = {j:h}
and I'm getting:
{'fields': {'b1': ['x1', 2]},
'measurements': 'XXXXX',
'tags': {'MPC': 'b1', 'host': 'a2'},
'time': 'timexxxxx'}
Could you give me some pointers on how to deal with this?
Thanks
See below:
import pprint
d = {'a1': {'b': ['x', 1]}, 'a2': {'b1': ['x1', 2]}}
data = []
for k, v in d.items():
    entry = {"measurements": "XXXXX"}
    entry['tags'] = {'MPC': list(v.keys())[0], "host": k}
    entry["time"] = "timexxxxx"
    values = list(v.values())
    entry["fields"] = {values[0][0]: values[0][1]}
    data.append(entry)
pprint.pprint(data)
Output:
[{'fields': {'x': 1},
'measurements': 'XXXXX',
'tags': {'MPC': 'b', 'host': 'a1'},
'time': 'timexxxxx'},
{'fields': {'x1': 2},
'measurements': 'XXXXX',
'tags': {'MPC': 'b1', 'host': 'a2'},
'time': 'timexxxxx'}]
This code can help you:
d={'a1': {'b': ['x', 1]}, 'a2': {'b1': ['x1', 2]}}
def convert(dictionary):
    return [
        {
            "measurements": "XXXXX",
            "tags": {
                "MPC": list(value.keys())[0],
                "host": key
            },
            "time": "timexxxxx",
            "fields": dict(value.values())
        } for key, value in dictionary.items()
    ]
print(convert(d))
Results in [{'measurements': 'XXXXX', 'tags': {'MPC': 'b', 'host': 'a1'}, 'time': 'timexxxxx', 'fields': {'x': 1}}, {'measurements': 'XXXXX', 'tags': {'MPC': 'b1', 'host': 'a2'}, 'time': 'timexxxxx', 'fields': {'x1': 2}}]
You can do it like this:
# Empty list
li = []
# Add items to the list
for i in range(2):
    d = {}
    d["measurements"] = "XXXXX"
    d["tags"] = {1: "x"}
    d["time"] = "timexxx"
    d["fields"] = {2: "y"}
    li.append(d)
# Print the list elements
for i in li:
    for key, value in i.items():
        print(key, ":", value)
    print()
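In the question's attempt, the same metrics dict is reused, so each iteration overwrites the previous values. The fix is to build a fresh dict per entry. The answers above use list(v.keys())[0], so they only pick up the first key of each inner dict. The sketch below loops over every inner item instead. It assumes each inner value is a two-element [field_name, field_value] list, as in the question. The function name to_points and its default arguments are hypothetical placeholders.

```python
import pprint

def to_points(nested, measurement="XXXXX", time="timexxxxx"):
    points = []
    for host, inner in nested.items():
        # Each inner item maps an MPC tag to a [field_name, field_value] pair
        for mpc, (field, value) in inner.items():
            # A fresh dict for every entry, so earlier entries are not overwritten
            points.append({
                "measurements": measurement,
                "tags": {"MPC": mpc, "host": host},
                "time": time,
                "fields": {field: value},
            })
    return points

d = {'a1': {'b': ['x', 1]}, 'a2': {'b1': ['x1', 2]}}
points = to_points(d)
pprint.pprint(points)
```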

Convert from path list to Flare json format?

I have data in python that looks like this:
[['a', 'b', 'c', 50],
['a', 'b', 'd', 100],
['a', 'b', 'e', 67],
['a', 'g', 'c', 12],
['q', 'k', 'c', 11],
['q', 'b', 'p', 11]]
where each element of the list is a complete hierarchical path, and the last element is the size of the path. To do a visualization in D3, I need the data to be in the flare data format - seen here:
https://github.com/d3/d3-hierarchy/blob/master/test/data/flare.json
So a short piece would look like this:
{
"name": "root",
"children": [
{
"name": "a",
"children": [
{
"name": "b",
"children": [
{"name": "c", "value": 50},
{"name": "d", "value": 100},
{"name": "e", "value": 67},
]
},
{
"name": "g",
"children": [
{"name": "c", "value": 12},
]
},
and so forth...
From what I've been looking up, I think the solution is recursive, and would use the json library on a Python dictionary, but I can't seem to get it to work. Any help is greatly appreciated.
Here's a solution using recursion:
def add_to_flare(n, flare):
    children = flare["children"]
    if len(n) == 2:
        children.append({"name": n[0], "value": n[1]})
    else:
        for c in children:
            if c["name"] == n[0]:
                add_to_flare(n[1:], c)
                return
        children.append({"name": n[0], "children": []})
        add_to_flare(n[1:], children[-1])
flare = {"name": "root", "children": []}
for i in data:
    add_to_flare(i, flare)
To display it nicely, we can use the json library:
import json
print(json.dumps(flare, indent=1))
{
"name": "root",
"children": [
{
"name": "a",
"children": [
{
"name": "b",
"children": [
{
"name": "c",
"value": 50
},
{
"name": "d",
"value": 100
},
{
"name": "e",
"value": 67
}
]
},
{
"name": "g",
"children": [
{
"name": "c",
"value": 12
}
]
}
]
},
{
"name": "q",
"children": [
{
"name": "k",
"children": [
{
"name": "c",
"value": 11
}
]
},
{
"name": "b",
"children": [
{
"name": "p",
"value": 11
}
]
}
]
}
]
}
Try this:
master = []
for each in your_list:
    head = master
    for i in range(len(each)):
        names = [e['name'] for e in head]
        if i == len(each) - 2:
            head.append({'name': each[i], 'value': each[i+1]})
            break
        if each[i] in names:
            head = head[names.index(each[i])]['children']
        else:
            head.append({'name': each[i], 'children': []})
            head = head[-1]['children']
Results:
[{'children': [{'children': [{'name': 'c', 'value': 50},
{'name': 'd', 'value': 100},
{'name': 'e', 'value': 67}],
'name': 'b'},
{'children': [{'name': 'c', 'value': 12}], 'name': 'g'}],
'name': 'a'},
{'children': [{'children': [{'name': 'c', 'value': 11}], 'name': 'k'},
{'children': [{'name': 'p', 'value': 11}], 'name': 'b'}],
'name': 'q'}]
Please note that name and children appear swapped in this output because pprint sorts dictionary keys by default; the resulting structure is the same.
Wrap it in a root node to get your target:
my_dict = {'name':'root', 'children': master}
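On the name/children ordering: pprint sorts dictionary keys unless told otherwise. On Python 3.8 and later, passing sort_dicts=False keeps the dict's insertion order. A small sketch with a made-up one-node dict:

```python
import pprint

node = {'name': 'a', 'children': []}

# Default: keys are sorted, so 'children' is printed before 'name'
print(pprint.pformat(node))                    # {'children': [], 'name': 'a'}

# Python 3.8+: keep the insertion order instead
print(pprint.pformat(node, sort_dicts=False))  # {'name': 'a', 'children': []}
```

json.dumps does not sort keys by default either, so printing the result with json.dumps(my_dict, indent=1) also shows name before children.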
Assuming your list of lists is stored in variable l, you can do:
o = []
for s in l:
    c = o
    for i, n in enumerate(['root'] + s[:-1]):
        for d in c:
            if n == d['name']:
                break
        else:
            c.append({'name': n})
            d = c[-1]
        if i < len(s) - 1:
            if 'children' not in d:
                d['children'] = []
            c = d['children']
        else:
            d['value'] = s[-1]
so that o[0] becomes:
{'children': [{'children': [{'children': [{'name': 'c', 'value': 50},
{'name': 'd', 'value': 100},
{'name': 'e', 'value': 67}],
'name': 'b'},
{'children': [{'name': 'c', 'value': 12}],
'name': 'g'}],
'name': 'a'},
{'children': [{'children': [{'name': 'c', 'value': 11}],
'name': 'k'},
{'children': [{'name': 'p', 'value': 11}],
'name': 'b'}],
'name': 'q'}],
'name': 'root'}
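Another option is to first build a nested dict keyed by name, which gives constant-time child lookups, and then convert it to Flare's name/children/value lists in one recursive pass. This is a sketch under two assumptions. First, a name is never used both as a leaf and as a branch under the same parent. Second, a repeated full path overwrites the earlier value rather than adding a second leaf, unlike the list-based answers above. The function name paths_to_flare is hypothetical.

```python
import json

def paths_to_flare(paths, root_name="root"):
    # Step 1: nested dict keyed by name; leaves hold the numeric value
    tree = {}
    for *names, value in paths:
        node = tree
        for name in names[:-1]:
            node = node.setdefault(name, {})
        node[names[-1]] = value

    # Step 2: turn each dict into a Flare node with a children list
    def convert(name, subtree):
        if isinstance(subtree, dict):
            return {"name": name,
                    "children": [convert(k, v) for k, v in subtree.items()]}
        return {"name": name, "value": subtree}

    return convert(root_name, tree)

data = [['a', 'b', 'c', 50],
        ['a', 'b', 'd', 100],
        ['a', 'b', 'e', 67],
        ['a', 'g', 'c', 12],
        ['q', 'k', 'c', 11],
        ['q', 'b', 'p', 11]]
flare = paths_to_flare(data)
print(json.dumps(flare, indent=1))
```

Dicts keep insertion order in Python 3.7+, so children appear in the order in which they first occur in the input.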

Creating an iterable of dictionaries from an iterable of tuples

Let's say we have the following data
all_values = (('a', 0, 0.1), ('b', 1, 0.5), ('c', 2, 1.0))
from which we want to produce a list of dictionaries like so:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
What's the most elegant way to do this in Python?
The best solution I've been able to come up with is
>>> import itertools
>>> zipped = list(zip(itertools.repeat(('name', 'location', 'value')), all_values))
>>> zipped
[(('name', 'location', 'value'), ('a', 0, 0.1)),
(('name', 'location', 'value'), ('b', 1, 0.5)),
(('name', 'location', 'value'), ('c', 2, 1.0))]
>>> dicts = [dict(zip(*e)) for e in zipped]
>>> dicts
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
It seems like a more elegant way to do this exists, probably using more of the tools in itertools.
How about:
In [8]: [{'location':l, 'name':n, 'value':v} for (n, l, v) in all_values]
Out[8]:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
or, if you prefer a more general solution:
In [12]: keys = ('name', 'location', 'value')
In [13]: [dict(zip(keys, values)) for values in all_values]
Out[13]:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
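Another standard-library option is collections.namedtuple. Its field names double as the dict keys, and _asdict() does the conversion. This is a sketch. The type name Row is arbitrary, and _asdict() returns an OrderedDict on Python versions before 3.8, which still compares equal to a plain dict.

```python
from collections import namedtuple

all_values = (('a', 0, 0.1), ('b', 1, 0.5), ('c', 2, 1.0))

# Field names in the same order as the elements of each tuple
Row = namedtuple('Row', ['name', 'location', 'value'])

# _make builds a Row from any iterable; _asdict turns it into a dict
dicts = [Row._make(values)._asdict() for values in all_values]
print(dicts)
```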

all possible combinations of dicts based on values inside dicts

I want to generate all possible ways of using dicts, based on the values in them. To explain in code, I have:
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
I want to be able to pick (say) exactly 7 items from these dicts, and get all the possible ways I could do it.
So:
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = filter(lambda i: sum(i)==7, x)
would give me:
(0, 3, 4)
(1, 2, 4)
(1, 3, 3)
...
What I'd really like is:
({'name' : 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 2}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 3})
....
Any ideas on how to do this, cleanly?
Here it is:
import itertools
import operator
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
dcts = [a,b,c]
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = filter(lambda i: sum(i)==7, x)
# operator.setitem returns None, so [dct, setitem(...)][0] sets 'picked' and yields dct itself;
# the same three dicts are reused, so each tuple only reflects the latest picks
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0]
            for dct, vval in zip(dcts, val)]) for val in y)
for zz in z:
    print(zz)
You can modify it to create copies of the dictionaries. If you need new dict instances on every iteration, change the z line to:
z = (tuple([[dct, operator.setitem(dct, 'picked', vval)][0]
            for dct, vval in zip(map(dict, dcts), val)]) for val in y)
An easier way is to generate new dicts:
names = [x['name'] for x in [a, b, c]]
# map is lazy in Python 3; wrap the result in list() to see it
zipped = map(lambda x: zip(names, x), y)
mapped = map(lambda el: [{'name': name, 'picked': count} for name, count in el],
             zipped)
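As a further option, the filtering and dict building can be combined in one generator that never mutates the input dicts. This is a sketch, and pick_combinations is a hypothetical name. It keeps the question's range(d['items']), which excludes d['items'] itself, so 'a' can be picked at most 2 times. Use range(d['items'] + 1) if taking all of a dict's items should be allowed.

```python
import itertools

a = {'name': 'a', 'items': 3}
b = {'name': 'b', 'items': 4}
c = {'name': 'c', 'items': 5}

def pick_combinations(dcts, total):
    # Same candidate counts as the question: range(d['items']) for each dict
    ranges = [range(d['items']) for d in dcts]
    for counts in itertools.product(*ranges):
        if sum(counts) == total:
            # Fresh dicts every time, so a, b and c are left untouched
            yield tuple({'name': d['name'], 'picked': n} for d, n in zip(dcts, counts))

for combo in pick_combinations([a, b, c], 7):
    print(combo)
```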
