Does anyone have a neat way of packing a dataframe including some columns which indicate hierarchy into a nested array?
Say I have the following data frame:
from pandas import DataFrame
df = DataFrame(
    {
        "var1": [1, 2, 3, 4, 9],
        "var2": [5, 6, 7, 8, 9],
        "group_1": [1, 1, 1, 1, 2],
        "group_2": [None, 1, 2, 1, None],
        "group_3": [None, None, None, 1, None],
    }
)
   var1  var2  group_1  group_2  group_3
0     1     5        1      NaN      NaN
1     2     6        1      1.0      NaN
2     3     7        1      2.0      NaN
3     4     8        1      1.0      1.0
4     9     9        2      NaN      NaN
The group_ columns show that the records on the 2nd and 3rd rows are children of the one on the first row. The 4th row is a child of the 2nd row, and the last row of the table has no children. I am looking to derive something like the following:
[
    {
        "var1": 1,
        "var2": 5,
        "children": [
            {
                "var1": 2,
                "var2": 6,
                "children": [{"var1": 4, "var2": 8, "children": []}],
            },
            {"var1": 3, "var2": 7, "children": []},
        ],
    },
    {"var1": 9, "var2": 9, "children": []},
]
You could try if the following recursive .groupby over the group_n columns works for you:
import pandas as pd

def nest_it(df, level=1):
    # Keys are created up front only to fix the item order of the output.
    record = {"var1": None, "var2": None, "children": []}
    for key, gdf in df.groupby(f"group_{level}", dropna=False):
        if pd.isna(key):
            # No deeper group value: this row is the record itself.
            record["var1"], record["var2"] = map(int, gdf.iloc[0, 0:2])
        elif level == 3:
            # Last group column: the row is a leaf child.
            var1, var2 = map(int, gdf.iloc[0, 0:2])
            record["children"].append({"var1": var1, "var2": var2, "children": []})
        else:
            record["children"].append(nest_it(gdf, level=level + 1))
    return record

result = nest_it(df)["children"]
While iterating over the (key, group) tuples from a (nested) df.groupby("group_n"), three things can happen:
The key is NaN, i.e. it's time to record the vars, and there aren't any more children.
The level is 3, i.e. the last group column is reached, so it's also time to wrap up, but this time as a child.
Otherwise (recursion): put the recursively retrieved children in the respective list.
Remark: I've only initialized the record dicts up front to get the item order as in your expected output.
Result for the sample:
[{'var1': 1,
'var2': 5,
'children': [{'var1': 2,
'var2': 6,
'children': [{'var1': 4, 'var2': 8, 'children': []}]},
{'var1': 3, 'var2': 7, 'children': []}]},
{'var1': 9, 'var2': 9, 'children': []}]
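If the number of group_ columns is not fixed, a non-recursive variant might also work. The following is only a sketch under a few assumptions: the rows are plain dicts (for example from df.to_dict("records")), missing levels only appear at the end of a row's group values, and every parent row is present in the data. The names nest_rows and is_missing are made up for illustration.

```python
import math

def is_missing(x):
    # Treat both None and float NaN as "no value at this level".
    return x is None or (isinstance(x, float) and math.isnan(x))

def nest_rows(rows, group_cols, value_cols):
    # Each row is identified by the tuple of its non-missing group values.
    keyed = [
        (tuple(row[c] for c in group_cols if not is_missing(row[c])), row)
        for row in rows
    ]
    # Shorter paths first (stable sort), so parents exist before children.
    keyed.sort(key=lambda pair: len(pair[0]))
    nodes, roots = {}, []
    for path, row in keyed:
        node = {c: row[c] for c in value_cols}
        node["children"] = []
        nodes[path] = node
        parent = nodes.get(path[:-1])  # the path minus its last level
        if parent is None:
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots

rows = [
    {"var1": 1, "var2": 5, "group_1": 1, "group_2": None, "group_3": None},
    {"var1": 2, "var2": 6, "group_1": 1, "group_2": 1, "group_3": None},
    {"var1": 3, "var2": 7, "group_1": 1, "group_2": 2, "group_3": None},
    {"var1": 4, "var2": 8, "group_1": 1, "group_2": 1, "group_3": 1},
    {"var1": 9, "var2": 9, "group_1": 2, "group_2": None, "group_3": None},
]
nested_rows = nest_rows(rows, ["group_1", "group_2", "group_3"], ["var1", "var2"])
```

For the sample rows this should give the same nesting as the expected output in the question.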
I have a list of dicts, say list1, like below:
[
{'subId': 0, 'mainIds': [0]},
{'subId': 3, 'mainIds': [0, 3, 4, 5], 'parameter': 'off', 'Info': 'true'}
]
I need to convert it to the format below:
[
{'mainId': 0, 'subIds':[0,3]},
{'mainId': 3, 'subIds': [3] },
{'mainId': 4, 'subIds': [3] },
{'mainId': 5, 'subIds': [3]}
]
What I have tried so far:
finalRes = []
for i in list1:
    subId = i['subId']
    for j in i['mainIds']:
        res = {}
        res['mainId'] = j
        res['subIds'] = []
        res['subIds'].append(subId)
        finalRes.append(res)
This gives something close to the required output, but the entries for mainId 0 are not merged. I need help getting the output mentioned above. Is there a popular name for this kind of operation (something like one-to-many to many-to-one)?
[
    {'mainId': 0, 'subIds': [0]},
    {'mainId': 0, 'subIds': [3]},
    {'mainId': 3, 'subIds': [3]},
    {'mainId': 4, 'subIds': [3]},
    {'mainId': 5, 'subIds': [3]}
]
This kind of join can be implemented easily with defaultdict:
from collections import defaultdict

# Collect, for every main id, the sub ids that reference it.
subs_by_main_id = defaultdict(list)
for entry in list1:
    sub_id = entry['subId']
    for main_id in entry['mainIds']:
        subs_by_main_id[main_id].append(sub_id)
result = [{'mainId': main_id, 'subIds': sub_ids}
          for main_id, sub_ids in subs_by_main_id.items()]
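To check the grouping against the sample from the question, it can be wrapped in a function. This is just a sketch; the function name group_sub_ids is hypothetical, and extra keys such as 'parameter' are simply ignored.

```python
from collections import defaultdict

def group_sub_ids(entries):
    # Map each main id to the list of sub ids that mention it.
    subs_by_main_id = defaultdict(list)
    for entry in entries:
        for main_id in entry['mainIds']:
            subs_by_main_id[main_id].append(entry['subId'])
    # Dicts keep insertion order, so main ids appear in first-seen order.
    return [{'mainId': main_id, 'subIds': sub_ids}
            for main_id, sub_ids in subs_by_main_id.items()]

list1 = [
    {'subId': 0, 'mainIds': [0]},
    {'subId': 3, 'mainIds': [0, 3, 4, 5], 'parameter': 'off', 'Info': 'true'},
]
grouped = group_sub_ids(list1)
# grouped[0] is {'mainId': 0, 'subIds': [0, 3]}
```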
Here's a solution using comprehensions and itertools.chain. Start by converting the lists to sets for fast membership tests, then build the result directly. It is not as efficient as the defaultdict solution, since it scans every set once per mainId.
from itertools import chain

# data is the input list of dicts (list1 in the question)
sets = {d['subId']: set(d['mainIds']) for d in data}
result = [
    {'mainId': i, 'subIds': [j for j, v in sets.items() if i in v]}
    for i in set(chain.from_iterable(sets.values()))
]
I have
l1 = [{"value": 1, "label": "One"}, {"value": 2, "label": "Two"}]
l2 = [{"value": 1, "label": "One"}, {"value": 2, "label": "Two"}]
l3 = [{"value": 1, "label": "One"}, {"value": 3, "label": "Three"}]
l4 = [{"value": 4, "label": "Four"}]
and I need something like this:
def foo(*lists):
    ...
that returns:
foo(l1, l2) -> [{"value": 1, "label": "One"}, {"value": 2, "label": "Two"}]
foo(l2, l3) -> [{"value": 1, "label": "One"}]
foo(l1, l2, l3) -> [{"value": 1, "label": "One"}]
foo(l1, l2, l3, l4) -> []
Edit (sorry I truncated part of the question):
The order in the output list doesn't matter.
I tried to use sets, but the dicts inside the lists are unhashable.
So I tried to convert the dicts to frozendict or tuple, but the key order in the input dicts should not be significant:
{"value": 1, "label": "One"} == {"label": "One", "value": 1}
l5 = [{"value": 1, "label": "One"}]
l6 = [{"label": "One", "value": 1}]
foo(l5, l6) -> [{"value": 1, "label": "One"}]
Thanks so much.
You can convert each list of dicts to a set of tuples of dict items so that you can use functools.reduce to perform set.intersection on all the sets, and then convert the resulting set of tuples to a list of dicts by mapping it to the dict constructor:
from functools import reduce

def intersection(*lists):
    return list(map(dict, reduce(set.intersection, ({tuple(d.items()) for d in l} for l in lists))))
so that with your sample input:
print(intersection(l1, l2))
print(intersection(l2, l3))
print(intersection(l1, l2, l3))
print(intersection(l1, l2, l3, l4))
would output:
[{'value': 1, 'label': 'One'}, {'value': 2, 'label': 'Two'}]
[{'value': 1, 'label': 'One'}]
[{'value': 1, 'label': 'One'}]
[]
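One caveat: tuple(d.items()) depends on the key order of each dict, so {"value": 1, "label": "One"} and {"label": "One", "value": 1} would not match. A possible variant, sketched here under the assumption that all dict values are hashable, uses frozenset(d.items()) instead. The name intersection_any_order is made up for this example, and like the version above it expects at least one list.

```python
from functools import reduce

def intersection_any_order(*lists):
    # frozenset ignores the order of the (key, value) pairs.
    sets = ({frozenset(d.items()) for d in l} for l in lists)
    return [dict(items) for items in reduce(set.intersection, sets)]

l5 = [{"value": 1, "label": "One"}]
l6 = [{"label": "One", "value": 1}]
common = intersection_any_order(l5, l6)
# common holds one dict equal to {"value": 1, "label": "One"}
```

The key order inside the returned dicts is not guaranteed, but dict equality does not depend on it.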
I have a dictionary of lists:
a = {
'a': [1, 2, 3],
'b': [1, 2, 4],
'c': [1, 2],
'd': [1, 2, 3, 4, 5],
'e': [3],
'f': [3, 7],
'g': [3, 3],
'h': [3, 3, 3, 3, 3],
'i': [3, 3, 3, 3, 4],
}
And I would like to create a hierarchical structure from this dictionary which groups the items in a similar manner (the exact structure does not matter, as long as the relation between elements is preserved):
         / \
       /     \
      e       c
     / \     / \
    f   g   a   b
       / \  |
      h   i d
The hierarchy goes as follows: array g is a prefix of arrays h and i, and therefore it is their ancestor. But e is a prefix of g, so e is an ancestor of g.
Here is my idea how to achieve this result.
Sort the dictionary based on the number of elements in the list, which I was able to achieve with s = sorted(a.items(), key=lambda e: len(e[1])). This will give me the following structure:
.
('e', [3])
('c', [1, 2])
('g', [3, 3])
('f', [3, 7])
('a', [1, 2, 3])
('b', [1, 2, 4])
('d', [1, 2, 3, 4, 5])
('h', [3, 3, 3, 3, 3])
Right now I can find the first parents by iterating through the elements and checking whether an element is a prefix of other elements, starting with the first one. e is a prefix of g, f, and h, and c is a prefix of a, b, and d. So these two elements are the parents.
I understand that I have to use recursion to descend into each parent and perform the same operation, but I was not able to come up with the right solution.
So does anyone know how to approach this problem? Or am I over-complicating things, and is there an easier way to achieve the solution?
P.S. This is not a homework assignment or interview question (although it might be). This is just my abstraction of a problem I am trying to solve.
Other people have already given the method; I'll just write some code here.
First sort:
t = sorted(a.items(), key=lambda x: x[1])
Then build the structure:
ret = {}

def build(ret, upv):
    # Consume entries from the sorted list t. Return the first entry that
    # does not have upv as a prefix, so that the caller can place it.
    if not t:
        return (None, None)
    k, v = t.pop(0)
    while k and v:
        if upv and v[:len(upv)] != upv:
            return (k, v)
        r = {}
        ret[k] = r
        k, v = build(r, v)
    return None, None

build(ret, None)
print(ret)
Given objects that have a list of children, an is_prefix function, and your sorted list of objects, I don't see why this wouldn't work:
for indx, potential_prefix in enumerate(your_list):
    # start at indx + 1 so an object is not compared with itself
    for potential_child in your_list[indx + 1:]:
        if is_prefix(potential_prefix, potential_child):
            potential_prefix.add_child(potential_child)
            # and optionally
            potential_child.add_parent(potential_prefix)
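Note that the loop above registers every prefix as a parent, grandparents included. If only the immediate parent is wanted, one option is to pick the longest proper prefix. The following sketch works directly on the dictionary from the question instead of on objects; is_prefix and parents_by_longest_prefix are hypothetical names, and lists that are exactly equal are not treated as parent and child.

```python
def is_prefix(prefix, seq):
    # True if seq starts with all elements of prefix.
    return seq[:len(prefix)] == prefix

def parents_by_longest_prefix(data):
    parent = {}
    for name, seq in data.items():
        best = None
        for other, cand in data.items():
            # Only strictly shorter lists can be proper prefixes.
            if len(cand) < len(seq) and is_prefix(cand, seq):
                if best is None or len(cand) > len(data[best]):
                    best = other
        parent[name] = best  # None marks a top-level element
    return parent

a = {
    'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2],
    'd': [1, 2, 3, 4, 5], 'e': [3], 'f': [3, 7],
    'g': [3, 3], 'h': [3, 3, 3, 3, 3], 'i': [3, 3, 3, 3, 4],
}
parents = parents_by_longest_prefix(a)
# parents['d'] is 'a', parents['h'] is 'g', parents['e'] is None
```

This compares every pair of lists, so it is quadratic in the number of entries; for small inputs that is usually fine.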
How about building the tree with a set of nested dictionaries, so that you'd access the e node by tree[3] and the h node by tree[3][3][3][3][3]:
from collections import defaultdict

def nested():
    # A defaultdict whose missing keys create further nested defaultdicts.
    return defaultdict(nested)

def build_tree(data):
    tree = nested()
    for name, path in data.items():
        d = tree
        for p in path:
            d = d[p]
        d["value"] = name
    return tree
Example output:
>>> a = {
'a': [1, 2, 3],
'b': [1, 2, 4],
'c': [1, 2],
'd': [1, 2, 3, 4, 5],
'e': [3],
'f': [3, 7],
'g': [3, 3],
'h': [3, 3, 3, 3, 3],
'i': [3, 3, 3, 3, 4],
}
>>> import json # for pretty printing, note that in python the keys are ints, not str
>>> print(json.dumps(build_tree(a), indent=4))
{
    "1": {
        "2": {
            "3": {
                "4": {
                    "5": {
                        "value": "d"
                    }
                },
                "value": "a"
            },
            "4": {
                "value": "b"
            },
            "value": "c"
        }
    },
    "3": {
        "7": {
            "value": "f"
        },
        "3": {
            "3": {
                "3": {
                    "3": {
                        "value": "h"
                    },
                    "4": {
                        "value": "i"
                    }
                }
            },
            "value": "g"
        },
        "value": "e"
    }
}
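The nested dictionary also contains intermediate nodes that have no name (for example the path [1]). To get back to named parent/child relations, one could walk the tree and skip the unnamed levels. This is a sketch only; the edges helper and the "root" label are assumptions added for illustration, and build_tree is repeated so the example runs on its own.

```python
from collections import defaultdict

def nested():
    return defaultdict(nested)

def build_tree(data):
    tree = nested()
    for name, path in data.items():
        d = tree
        for p in path:
            d = d[p]
        d["value"] = name
    return tree

def edges(node, parent="root"):
    # Yield (parent, child) name pairs; unnamed nodes pass their parent on.
    here = node.get("value", parent)
    if "value" in node:
        yield (parent, node["value"])
    for key, child in node.items():
        if key != "value":
            yield from edges(child, here)

a = {
    'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2],
    'd': [1, 2, 3, 4, 5], 'e': [3], 'f': [3, 7],
    'g': [3, 3], 'h': [3, 3, 3, 3, 3], 'i': [3, 3, 3, 3, 4],
}
pairs = set(edges(build_tree(a)))
```

The order in which the pairs are produced follows the insertion order of the dictionaries, which is why they are collected into a set here.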
Just sort the arrays in lexicographical order:
(c,[1,2]),
(a,[1,2,3]),
(d,[1,2,3,4,5]),
(b,[1,2,4]),
(e,[3]),
(g,[3,3]),
(h,[3,3,3,3,3]),
(i,[3,3,3,3,4]),
(f,[3,7])
Then the solution is pretty obvious:
root
|- c
|  |- a
|  |  `- d
|  `- b
`- e
   |- g
   |  |- h
   |  `- i
   `- f
You only need to track the path from the root to the previous line by prefix, which forms something like a stack. root has the empty list, so push it on the stack. c has the (empty) prefix of root, so root is the parent of c; push c on the stack. a has c, the top of the stack, as a prefix, so c is the parent of a; push a on the stack. d has a, the top of the stack, as a prefix, so a is the parent of d; push d on the stack. b does not have d as a prefix, so pop. The same holds for a, so pop again. Now c is on top and is a prefix of b, so c is the parent of b; push b on the stack. Continue in the same way.
In Erlang simply:
-module(tree_from_prefix).

-export([tree/1]).

is_prefix(_, []) -> true;
is_prefix([H|A], [H|B]) -> is_prefix(A, B);
is_prefix(_, _) -> false.

tree(L) ->
    tree(lists:keysort(2, L), [{root, []}]).

tree([], _) -> [];
tree([{X, L} = Record|T] = List, [{Parent, Prefix}|R] = Stack) ->
    case is_prefix(L, Prefix) of
        true -> [{Parent, X}|tree(T, [Record|Stack])];
        false -> tree(List, R)
    end.
And the result:
1> tree_from_prefix:tree([{e,[3]},{c,[1, 2]},{g,[3, 3]},{f,[3, 7]},{a,[1, 2, 3]},{b, [1, 2, 4]},{d,[1, 2, 3, 4, 5]},{h,[3, 3, 3, 3, 3]},{i,[3, 3, 3, 3, 4]}]).
[{root,c},
{c,a},
{a,d},
{c,b},
{root,e},
{e,g},
{g,h},
{g,i},
{e,f}]
In Python it will not be as elegant, but the same algorithm will work too.
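A rough Python rendering of the same stack idea might look like this. It is a sketch rather than a translation of the Erlang module: the function name tree_edges is made up, and the input is assumed to be the dictionary of lists from the question.

```python
def tree_edges(data):
    # Sort by the lists themselves, i.e. lexicographically.
    items = sorted(data.items(), key=lambda kv: kv[1])
    # The stack holds the path from the root to the previous element.
    stack = [("root", [])]
    edges = []
    for name, seq in items:
        # Pop until the top of the stack is a prefix of seq.
        # The root's empty list is a prefix of everything, so it stays.
        while seq[:len(stack[-1][1])] != stack[-1][1]:
            stack.pop()
        edges.append((stack[-1][0], name))
        stack.append((name, seq))
    return edges

a = {
    'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2],
    'd': [1, 2, 3, 4, 5], 'e': [3], 'f': [3, 7],
    'g': [3, 3], 'h': [3, 3, 3, 3, 3], 'i': [3, 3, 3, 3, 4],
}
edge_list = tree_edges(a)
# edge_list starts with ('root', 'c'), ('c', 'a'), ('a', 'd'), ('c', 'b')
```

After the sort, each element is pushed once and popped at most once, so the loop itself is linear in the number of elements apart from the prefix comparisons.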