Pandas MultiIndex (more than 2 levels) DataFrame to Nested Dict/JSON - python

This question is similar to this one, but I want to take it a step further. Is it possible to extend the solution to work with more levels? A MultiIndex DataFrame's .to_dict() method has some promising options, but most of them return entries keyed by tuples (e.g. ('A', 0, 0): 274.0) rather than nesting them in dictionaries.
For an example of what I'm looking to accomplish, consider this multiindex dataframe:
import pandas as pd

data = {
    0: {
        ('A', 0, 0): 274.0,
        ('A', 0, 1): 19.0,
        ('A', 1, 0): 67.0,
        ('A', 1, 1): 12.0,
        ('B', 0, 0): 83.0,
        ('B', 0, 1): 45.0
    },
    1: {
        ('A', 0, 0): 254.0,
        ('A', 0, 1): 11.0,
        ('A', 1, 0): 58.0,
        ('A', 1, 1): 11.0,
        ('B', 0, 0): 76.0,
        ('B', 0, 1): 56.0
    }
}
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']
df
# output:
            A                       B
            0           1           0
            0     1     0     1     0     1
entry1  274.0  19.0  67.0  12.0  83.0  45.0
entry2  254.0  11.0  58.0  11.0  76.0  56.0
You can imagine that we have many records here, not just two, and that the index names could be longer strings. How could you turn this into nested dictionaries (or directly to JSON) that look like this:
[
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
'B': {0: {0: 83.0, 1: 45.0}}},
'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
'B': {0: {0: 76.0, 1: 56.0}}}}
]
I'm thinking some amount of recursion could potentially be helpful, maybe something like this, but have so far been unsuccessful.

So, you really need to do two things here:
1) Call df.to_dict().
2) Convert the result to a nested dictionary.
df.to_dict(orient='index') gives you a dictionary with the index as keys; it looks like this:
>>> df.to_dict(orient='index')
{'entry1': {('A', 0, 0): 274.0,
('A', 0, 1): 19.0,
('A', 1, 0): 67.0,
('A', 1, 1): 12.0,
('B', 0, 0): 83.0,
('B', 0, 1): 45.0},
'entry2': {('A', 0, 0): 254.0,
('A', 0, 1): 11.0,
('A', 1, 0): 58.0,
('A', 1, 1): 11.0,
('B', 0, 0): 76.0,
('B', 0, 1): 56.0}}
Now you need to nest this. Here's a trick from Martijn Pieters to do that:
def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result
Putting this all together:
def df_to_nested_dict(df: pd.DataFrame) -> dict:
    d = df.to_dict(orient='index')
    return {k: nest(v) for k, v in d.items()}
Output:
>>> df_to_nested_dict(df)
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
'B': {0: {0: 83.0, 1: 45.0}}},
'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
'B': {0: {0: 76.0, 1: 56.0}}}}
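Since the question also asks about going directly to JSON, here is a minimal sketch of why the nesting step matters for that. It uses only the standard library and a hard-coded dictionary (the name flat is my own) standing in for one row of df.to_dict(orient='index'). json.dumps rejects tuple keys outright, while the nested version serializes fine, with the integer keys turned into strings.

```python
import json

def nest(d):
    # Same nesting trick as above: walk all key parts but the last.
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

# Hypothetical stand-in for one row of df.to_dict(orient='index')
flat = {('A', 0, 0): 274.0, ('A', 0, 1): 19.0, ('B', 0, 0): 83.0}

# Tuple keys are not valid JSON object keys, so this raises TypeError.
try:
    json.dumps(flat)
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False

# After nesting, every key is a str or an int, which json accepts.
nested_json = json.dumps(nest(flat))
print(tuple_keys_ok)
print(nested_json)
```

Keep in mind that reading the JSON back gives string keys ("0", "1"), not the original integers.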

I took the idea from the previous answer and modified it slightly.
1) Took the function nested_dict from Stack Overflow to create the dictionary:
from collections import defaultdict

def nested_dict(n, type):
    if n == 1:
        return defaultdict(type)
    else:
        return defaultdict(lambda: nested_dict(n-1, type))
2) Wrote the following function:
def df_to_nested_dict(df, type):
    # Get the number of levels in the row index
    temp = df.index.names
    lvl = len(temp)
    # Create the target dictionary
    new_nested_dict = nested_dict(lvl, type)
    # Convert the dataframe to a dictionary keyed by index tuples
    temp_dict = df.to_dict(orient='index')
    for x, y in temp_dict.items():
        dict_keys = ''
        # Process the individual items from the key
        for item in x:
            dkey = '[%r]' % (item,)  # %r also handles string labels, unlike %d
            dict_keys = dict_keys + dkey
        # Create a string and execute it
        dict_update = 'new_nested_dict%s = y' % dict_keys
        exec(dict_update)
    return new_nested_dict
It is the same idea, but done slightly differently.
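If you would rather avoid exec, the same defaultdict idea can be sketched with a plain loop. This is only an illustration under my own assumptions: the names tree, fill and to_plain are made up, and the input is a hard-coded tuple-keyed dict rather than a real DataFrame.

```python
from collections import defaultdict

def tree():
    # A defaultdict whose missing keys are themselves trees, at any depth.
    return defaultdict(tree)

def fill(flat):
    # Walk down the tree along each key tuple and store the value at the leaf.
    root = tree()
    for key, value in flat.items():
        node = root
        for part in key[:-1]:
            node = node[part]
        node[key[-1]] = value
    return root

def to_plain(node):
    # Convert nested defaultdicts back to ordinary dicts for printing or JSON.
    if isinstance(node, defaultdict):
        return {k: to_plain(v) for k, v in node.items()}
    return node

# Hypothetical tuple-keyed input, shaped like the MultiIndex keys above
flat = {(0, 0): 274.0, (0, 1): 19.0, (1, 0): 67.0}
plain = to_plain(fill(flat))
print(plain)
```

Unlike the string-building version, this does not need to know the number of levels up front.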

Related

How to count frequency of such list using basic libraries?

The list looks like this: each tuple holds an ASCII character and a number. I want to count the occurrences of each character for the values 0, 1 and 2.
So for A, something like {0: 10, 1: 2, 2: 12}, and likewise for the other characters.
[('P', 0),
('S', 2),
('R', 1),
('O', 1),
('J', 1),
('E', 1),
('C', 1),
('T', 1),
('G', 1),
('U', 1),
('T', 1),
('E', 1),
('N', 1)]
I have tried
char_freq = {c:[0,0,0] for c in string.ascii_uppercase}
also
for i in range(3):
    for x, i in a:
        print(x, i)
I want to count X for i where X is [A-Z]
It should give me a result like:
Character | 0  | 1 | 2
A         | 10 | 5 | 4
Although you don't supply enough example data to actually reproduce your desired output, I think this is what you're looking for:
from collections import Counter
import pandas as pd
l = [('P', 0),
('S', 2),
('R', 1),
('O', 1),
('J', 1),
('E', 1),
('C', 1),
('T', 1),
('G', 1),
('U', 1),
('T', 1),
('E', 1),
('N', 1)]
df = pd.DataFrame(l)
counts = df.groupby(0)[1].agg(Counter)
returns:
C {1: 1}
E {1: 2}
G {1: 1}
J {1: 1}
N {1: 1}
O {1: 1}
P {0: 1}
R {1: 1}
S {2: 1}
T {1: 2}
U {1: 1}
This will give you each ASCII character, along with each unique number and how many times that number occurs.
from collections import Counter
l = [('A', 1),
('A', 1),
('A', 2),
('A', 2),
('B', 1),
('B', 2),
('B', 3),
('B', 4)]
data = {}
for k, v in l:
    data[k] = [v] if k not in data else data[k] + [v]
char_freq = {k: dict(Counter(v)) for k, v in data.items()}
print(char_freq)
Outputs:
{'A': {1: 2, 2: 2}, 'B': {1: 1, 2: 1, 3: 1, 4: 1}}
Your code looks fine; you just have to make a small change to the char_freq variable to get the expected result:
import string

char_freq = {c: {0: 0, 1: 0, 2: 0} for c in string.ascii_uppercase}
for x, i in a:
    char_freq[x][i] += 1
To avoid having the whole alphabet in char_freq, you could use only the characters that actually occur:
char_freq = {c: {0: 0, 1: 0, 2: 0} for c in {t[0] for t in a}}
for x, i in a:
    char_freq[x][i] += 1
output:
{'O': {0: 0, 1: 1, 2: 0},
'T': {0: 0, 1: 2, 2: 0},
'N': {0: 0, 1: 1, 2: 0},
'G': {0: 0, 1: 1, 2: 0},
'U': {0: 0, 1: 1, 2: 0},
'E': {0: 0, 1: 2, 2: 0},
'J': {0: 0, 1: 1, 2: 0},
'R': {0: 0, 1: 1, 2: 0},
'C': {0: 0, 1: 1, 2: 0},
'S': {0: 0, 1: 0, 2: 1},
'P': {0: 1, 1: 0, 2: 0}}
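To get from these counts to the table layout shown in the question, one possible sketch is to count the (character, number) pairs directly with Counter and then read out one row per character. The shortened pairs list and the names pair_counts and table are my own, not from the question.

```python
from collections import Counter

# Shortened, hypothetical sample in the same shape as the question's list
pairs = [('P', 0), ('S', 2), ('R', 1), ('T', 1), ('E', 1), ('T', 1), ('E', 1)]

# Count each (character, number) pair; missing pairs read as 0.
pair_counts = Counter(pairs)
values = [0, 1, 2]

# One row of counts per character, in alphabetical order.
table = {
    char: [pair_counts[(char, n)] for n in values]
    for char in sorted({ch for ch, _ in pairs})
}

print("Character | 0 | 1 | 2")
for char, row in table.items():
    print(f"{char:<9} | " + " | ".join(str(n) for n in row))
```

Because a Counter returns 0 for keys it has never seen, there is no need to pre-fill the zeros.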

Convert a list of tuples to a dictionary

I have a list of lists of three-element tuples:
A = [[(72, 1, 2), (96, 1, 4)],
[(72, 2, 1), (80, 2, 4)],
[],
[(96, 4, 1), (80, 4, 2), (70, 4, 5)],
[(70, 5, 4)],
]
I need to convert it to a dictionary in this format (note that the second element in the tuple will be the key):
A_dict = { 1: {2:72, 4:96},
2: {1:72, 4:80},
3: {},
4: {1:96, 2:80, 5:70},
5: {4:70},
}
Is there a way to convert A to A_dict?
I tried this:
A_dict = {b:{a:c} for a,b,c in A}
but I got an error:
ValueError: not enough values to unpack (expected 3, got 2)
You can just do:
A_dict = {k+1: {t[2]: t[0] for t in l} for k, l in enumerate(A)}
>>> A_dict
{
1: {2: 72, 4: 96},
2: {1: 72, 4: 80},
3: {},
4: {1: 96, 2: 80, 5: 70},
5: {4: 70}
}
You can iterate over the indices of the list and build a dictionary for each inner list:
A_dict = {i + 1 : {v[2] : v[0] for v in A[i]} for i in range(len(A))}
will output:
{1: {2: 72, 4: 96},
2: {1: 72, 4: 80},
3: {},
4: {1: 96, 2: 80, 5: 70},
5: {4: 70}}
Strictly following your specification (the second element of each tuple is the key), the code would be:
A_dict = {A[i][0][1] : {v[2] : v[0] for v in A[i]} for i in range(len(A)) if len(A[i]) > 0}
But that will skip the third row: its list is empty, so there is no tuple from which to determine the key.
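One way to use the tuple's second element as the key and still keep the empty entry is to create the keys first and then fill them in a plain loop. This is a sketch that assumes the keys are exactly 1 to len(A) and that every tuple's second element falls in that range; the loop variable names are mine.

```python
A = [[(72, 1, 2), (96, 1, 4)],
     [(72, 2, 1), (80, 2, 4)],
     [],
     [(96, 4, 1), (80, 4, 2), (70, 4, 5)],
     [(70, 5, 4)],
     ]

# Assumption: the outer keys are 1..len(A), so empty rows still get an entry.
A_dict = {i: {} for i in range(1, len(A) + 1)}

# Two loops: the outer one walks the inner lists, the inner one unpacks tuples.
for row in A:
    for value, outer_key, inner_key in row:
        A_dict[outer_key][inner_key] = value

print(A_dict)
```

This also shows why the original attempt failed: iterating over A yields the inner lists, not the tuples, so the three-name unpacking was applied to a list of two tuples.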

Convert pandas multiindex dataframe to nested dictionary

I have a pandas multiindex dataframe that I'm trying to output as a nested dictionary.
import pandas as pd

# create the dataset
data = {'clump_thickness': {(0, 0): 274.0, (0, 1): 19.0, (1, 0): 67.0, (1, 1): 12.0, (2, 0): 83.0, (2, 1): 45.0, (3, 0): 16.0, (3, 1): 40.0, (4, 0): 4.0, (4, 1): 54.0, (5, 0): 0.0, (5, 1): 69.0, (6, 0): 0.0, (6, 1): 0.0, (7, 0): 0.0, (7, 1): 0.0, (8, 0): 0.0, (8, 1): 0.0, (9, 0): 0.0, (9, 1): 0.0}}
df = pd.DataFrame(data)
df.head()
#      clump_thickness
# 0 0            274.0
#   1             19.0
# 1 0             67.0
#   1             12.0
# 2 0             83.0
df is the dataframe that I want to output as a nested dictionary. The output I'm looking for is in the form -
{"0":
{
"0":274,
"1":19
},
"1":{
"0":67,
"1":12
},
"2":{
"0":83,
"1":45
},
"3":{
"0":16,
"1":40
},
"4":{
"0":4,
"1":54
},
"5":{
"0":0,
"1":69
}
}
Here the first index forms the keys of the outer most dictionary. For each key we have a dictionary stored whose keys are the values in the second index.
When I call df.to_dict(), the MultiIndex is returned as tuples instead of being nested. How do I achieve this?
This works for me:
d = {l: df.xs(l)['clump_thickness'].to_dict() for l in df.index.levels[0]}
Another solution, similar to the one in "DataFrame with MultiIndex to dict", but here you need to select the column to get a Series:
d = df.groupby(level=0).apply(lambda df: df.xs(df.name).clump_thickness.to_dict()).to_dict()
print (d)
{0: {0: 274.0, 1: 19.0},
1: {0: 67.0, 1: 12.0},
2: {0: 83.0, 1: 45.0},
3: {0: 16.0, 1: 40.0},
4: {0: 4.0, 1: 54.0},
5: {0: 0.0, 1: 69.0},
6: {0: 0.0, 1: 0.0},
7: {0: 0.0, 1: 0.0},
8: {0: 0.0, 1: 0.0},
9: {0: 0.0, 1: 0.0}}
Another option is to unstack the inner level first:
df.unstack().clump_thickness.apply(lambda x: x.to_dict(), axis=1).to_dict()
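The answers above return integer keys, while the output in the question uses string keys. If that matters, a minimal sketch is to do the nesting by hand and convert the keys on the way. The flat dict below is a hard-coded stand-in (my assumption) for what df['clump_thickness'].to_dict() returns, namely a dict keyed by (level 0, level 1) tuples.

```python
# Hypothetical stand-in for df['clump_thickness'].to_dict()
flat = {(0, 0): 274.0, (0, 1): 19.0, (1, 0): 67.0, (1, 1): 12.0}

nested = {}
for (outer, inner), value in flat.items():
    # str() turns the index labels into the string keys the question asks for.
    nested.setdefault(str(outer), {})[str(inner)] = value

print(nested)
```

The values stay floats here; converting them to integers would be a separate step.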

pandas - create key value pair from grouped by data frame

I have a data frame with three columns. I would like to create a dictionary after applying a groupby on the first and second columns. I can do this with for loops, but is there a pandas way of doing it?
DataFrame:
Col X  Col Y  Sum
A      a      3
A      b      2
A      c      1
B      p      5
B      q      6
B      r      7
After grouping by Col X and Col Y with df.groupby(['Col X','Col Y']).sum():
             Sum
Col X Col Y
A     a        3
      b        2
      c        1
B     p        5
      q        6
      r        7
The dictionary I want to create:
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Use a dictionary comprehension while iterating via a groupby object
{name: dict(zip(g['Col Y'], g['Sum'])) for name, g in df.groupby('Col X')}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
If you insisted on using to_dict somewhere, you could do something like this:
s = df.set_index(['Col X', 'Col Y']).Sum
{k: s.xs(k).to_dict() for k in s.index.levels[0]}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Keep in mind that the to_dict method is just using some comprehension under the hood. If you have a special use case that requires more than the orient options provide, there is no shame in constructing your own comprehension.
You can iterate over the MultiIndex series:
>>> s = df.set_index(['Col X', 'Col Y'])['Sum']
>>> {k: v.reset_index(level=0, drop=True).to_dict() for k, v in s.groupby(level=0)}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
#A to_dict() solution
d = df.groupby(['Col X','Col Y']).sum().reset_index().pivot(columns='Col X',values='Sum').to_dict()
Out[70]:
{'A': {0: 3.0, 1: 2.0, 2: 1.0, 3: nan, 4: nan, 5: nan},
'B': {0: nan, 1: nan, 2: nan, 3: 5.0, 4: 6.0, 5: 7.0}}
#if you need to get rid of the nans:
{k1:{k2:v2 for k2,v2 in v1.items() if pd.notnull(v2)} for k1,v1 in d.items()}
Out[73]: {'A': {0: 3.0, 1: 2.0, 2: 1.0}, 'B': {3: 5.0, 4: 6.0, 5: 7.0}}
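If the data can contain repeated (Col X, Col Y) pairs, it may be safer to keep the groupby sum and nest its result, since building the dict straight from the rows would keep only the last value for a repeated key. A minimal sketch, assuming pandas is installed and rebuilding the example frame by hand; the names totals and result are mine.

```python
import pandas as pd

# The example frame from the question, rebuilt by hand
df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'b', 'c', 'p', 'q', 'r'],
    'Sum': [3, 2, 1, 5, 6, 7],
})

# Series with a two-level MultiIndex: (Col X, Col Y) -> summed value
totals = df.groupby(['Col X', 'Col Y'])['Sum'].sum()

result = {}
for (x, y), total in totals.items():
    # int() converts the NumPy integer to a plain Python int
    result.setdefault(x, {})[y] = int(total)

print(result)
```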

Convert redundant array to dict (or JSON)?

Suppose I have an array:
[['a', 10, 1, 0.1],
['a', 10, 2, 0.2],
['a', 20, 2, 0.3],
['b', 10, 1, 0.4],
['b', 20, 2, 0.5]]
And I want a dict (or JSON):
{
    'a': {
        10: {1: 0.1, 2: 0.2},
        20: {2: 0.3}
    },
    'b': {
        10: {1: 0.4},
        20: {2: 0.5}
    }
}
Is there any good way or some library for this task?
In this example the array is just 4-column, but my original array is more complicated (7-column).
Currently I implement this naively:
import pandas as pd
df = pd.DataFrame(array)
grouped1 = df.groupby('column1')
for column1 in grouped1.groups:
    group1 = grouped1.get_group(column1)
    grouped2 = group1.groupby('column2')
    for column2 in grouped2.groups:
        group2 = grouped2.get_group(column2)
        ...
And defaultdict way:
d = defaultdict(lambda x: defaultdict(lambda y: defaultdict ... ))
for row in array:
    d[row[0]][row[1]][row[2]... = row[-1]
But I think neither is smart.
I would suggest this rather simple solution:
from functools import reduce
data = [['a', 10, 1, 0.1],
['a', 10, 2, 0.2],
['a', 20, 2, 0.3],
['b', 10, 1, 0.4],
['b', 20, 2, 0.5]]
result = dict()
for row in data:
    reduce(lambda v, k: v.setdefault(k, {}), row[:-2], result)[row[-2]] = row[-1]
print(result)
{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
An actual recursive solution would be something like this:
def add_to_group(keys: list, group: dict):
    if len(keys) == 2:
        group[keys[0]] = keys[1]
    else:
        add_to_group(keys[1:], group.setdefault(keys[0], dict()))

result = dict()
for row in data:
    add_to_group(row, result)
print(result)
Introduction
Here is a recursive solution. The base case is when you have a list of 2-element lists (or tuples), in which case, the dict will do what we want:
>>> dict([(1, 0.1), (2, 0.2)])
{1: 0.1, 2: 0.2}
For other cases, we will remove the first column and recurse down until we get to the base case.
The code:
from itertools import groupby

def rows2dict(rows):
    if len(rows[0]) == 2:
        # e.g. [(1, 0.1), (2, 0.2)] ==> {1: 0.1, 2: 0.2}
        return dict(rows)
    else:
        dict_object = dict()
        for column1, grouped_rows in groupby(rows, lambda x: x[0]):
            rows_without_first_column = [x[1:] for x in grouped_rows]
            dict_object[column1] = rows2dict(rows_without_first_column)
        return dict_object

if __name__ == '__main__':
    rows = [['a', 10, 1, 0.1],
            ['a', 10, 2, 0.2],
            ['a', 20, 2, 0.3],
            ['b', 10, 1, 0.4],
            ['b', 20, 2, 0.5]]
    dict_object = rows2dict(rows)
    print(dict_object)
Output
{'a': {10: {1: 0.1, 2: 0.2}, 20: {2: 0.3}}, 'b': {10: {1: 0.4}, 20: {2: 0.5}}}
Notes
We use the itertools.groupby generator to simplify grouping of similar rows based on the first column
For each group of rows, we remove the first column and recurse down
This solution assumes that every row has 2 or more columns. The result is unpredictable for rows that have 0 or 1 columns.
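One more caveat worth illustrating: itertools.groupby only groups adjacent items, so the rows need to be sorted by their leading columns first. The sketch below uses a compact version of rows2dict and a made-up unsorted input to show the difference.

```python
from itertools import groupby

def rows2dict(rows):
    # Compact version of the recursive function above.
    if len(rows[0]) == 2:
        return dict(rows)
    result = {}
    for first, group in groupby(rows, key=lambda r: r[0]):
        result[first] = rows2dict([r[1:] for r in group])
    return result

# Hypothetical input where the two 'a' rows are not adjacent
unsorted_rows = [['a', 10, 1, 0.1],
                 ['b', 10, 1, 0.4],
                 ['a', 20, 2, 0.3]]

# The second 'a' group overwrites the first one, so a row is lost.
lossy = rows2dict(unsorted_rows)

# Sorting first puts equal keys next to each other at every level.
complete = rows2dict(sorted(unsorted_rows))

print(lossy)
print(complete)
```

Sorting works here because each column holds values of a single comparable type; mixed types in one column would need a custom sort key.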
