Create a hierarchy from a dictionary of lists - python

I have a dictionary of lists:
a = {
'a': [1, 2, 3],
'b': [1, 2, 4],
'c': [1, 2],
'd': [1, 2, 3, 4, 5],
'e': [3],
'f': [3, 7],
'g': [3, 3],
'h': [3, 3, 3, 3, 3],
'i': [3, 3, 3, 3, 4],
}
And I would like to create a hierarchical structure from this dictionary that groups items in a similar manner (the exact structure does not matter, as long as the relation between elements is preserved):
         /    \
        /      \
       e        c
      / \      / \
     f   g    a   b
        / \   |
       h   i  d
The hierarchy goes as follows: array g is a prefix of arrays h and i, and therefore it is their ancestor. But e is a prefix of g, so e is an ancestor of g.
Here is my idea how to achieve this result.
Sort the dictionary based on the number of elements in the list, which I was able to achieve with s = sorted(a.items(), key=lambda e: len(e[1])). This will give me the following structure:
('e', [3])
('c', [1, 2])
('g', [3, 3])
('f', [3, 7])
('a', [1, 2, 3])
('b', [1, 2, 4])
('d', [1, 2, 3, 4, 5])
('h', [3, 3, 3, 3, 3])
('i', [3, 3, 3, 3, 4])
Right now I can find the first parents by iterating through the elements and checking whether an element is a prefix of other elements, starting with the first one. e is a prefix of g, f, h and i, and c is a prefix of a, b and d. So these two elements are the parents.
I understand that I have to use recursion to descend into each parent and perform the same operation, but I was not able to come up with a correct solution.
So does anyone know how to approach this problem? Or am I over-complicating things, and is there an easier way to achieve the solution?
P.S. This is not a homework assignment or an interview question (although it might look like one). It is just my abstraction of a problem I am trying to solve.

Other people have already described the method, so I'll just write some code here.
First sort:
t = sorted(a.items(), key=lambda x: x[1])
Then build the structure:
ret = {}
def build(ret, upv):
    # upv is the parent's list; returns the first item that does not belong under it
    if not t:
        return (None, None)
    k, v = t.pop(0)
    while k and v:
        if upv and v[:len(upv)] != upv:
            # not a descendant of the current parent, hand it back to the caller
            return (k, v)
        r = {}
        ret[k] = r
        k, v = build(r, v)
    return None, None
build(ret, None)
print(ret)

Given objects that each keep a list of children, an is_prefix function, and your sorted list of objects, I don't see why this wouldn't work:
for indx, potential_prefix in enumerate(your_list):
    # start at indx + 1 so an element is never compared with itself
    for potential_child in your_list[indx + 1:]:
        if is_prefix(potential_prefix, potential_child):
            potential_prefix.add_child(potential_child)
            # and optionally
            potential_child.add_parent(potential_prefix)
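To make the idea above concrete, here is a minimal sketch. The Node class, is_prefix and link are hypothetical names made up for illustration, not something from the question. It assumes the lists are sorted by length and attaches each element only to its longest prefix, so that grandparents do not also collect the grandchildren.

```python
class Node:
    """Hypothetical container used only for this illustration."""
    def __init__(self, name, path):
        self.name = name
        self.path = path
        self.parent = None
        self.children = []

def is_prefix(prefix, candidate):
    """True if `prefix` is a proper (strictly shorter) prefix of `candidate`."""
    return len(prefix) < len(candidate) and candidate[:len(prefix)] == prefix

def link(data):
    # Shortest paths first, so every possible parent precedes its descendants.
    nodes = sorted((Node(k, v) for k, v in data.items()), key=lambda n: len(n.path))
    for i, child in enumerate(nodes):
        # Walking backwards visits longer paths first, so the first match
        # is the longest prefix, i.e. the direct parent.
        for candidate in reversed(nodes[:i]):
            if is_prefix(candidate.path, child.path):
                child.parent = candidate
                candidate.children.append(child)
                break
    return {n.name: n for n in nodes}

data = {
    'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2], 'd': [1, 2, 3, 4, 5],
    'e': [3], 'f': [3, 7], 'g': [3, 3],
    'h': [3, 3, 3, 3, 3], 'i': [3, 3, 3, 3, 4],
}
nodes = link(data)
print(nodes['h'].parent.name)                      # g
print(sorted(c.name for c in nodes['e'].children)) # ['f', 'g']
```

This is quadratic in the number of lists, which should be fine for small inputs.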

How about building the tree as a set of nested dictionaries, so that you'd access the e node by tree[3] and the h node by tree[3][3][3][3][3]:
from collections import defaultdict
def nested():
    # a defaultdict whose missing keys are themselves nested defaultdicts
    return defaultdict(nested)
def build_tree(data):
    tree = nested()
    for name, path in data.items():
        d = tree
        for p in path:
            d = d[p]
        d["value"] = name
    return tree
Example output:
>>> a = {
'a': [1, 2, 3],
'b': [1, 2, 4],
'c': [1, 2],
'd': [1, 2, 3, 4, 5],
'e': [3],
'f': [3, 7],
'g': [3, 3],
'h': [3, 3, 3, 3, 3],
'i': [3, 3, 3, 3, 4],
}
>>> import json # for pretty printing, note that in python the keys are ints, not str
>>> print(json.dumps(build_tree(a), indent=4))
{
    "1": {
        "2": {
            "3": {
                "4": {
                    "5": {
                        "value": "d"
                    }
                },
                "value": "a"
            },
            "4": {
                "value": "b"
            },
            "value": "c"
        }
    },
    "3": {
        "7": {
            "value": "f"
        },
        "3": {
            "3": {
                "3": {
                    "3": {
                        "value": "h"
                    },
                    "4": {
                        "value": "i"
                    }
                }
            },
            "value": "g"
        },
        "value": "e"
    }
}
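With the tree in this shape, the ancestors of an element can be read off by walking down its path and collecting every "value" met on the way. The sketch below repeats nested and build_tree so it runs on its own; the ancestors helper is a hypothetical addition, not part of the answer above.

```python
from collections import defaultdict

def nested():
    return defaultdict(nested)

def build_tree(data):
    tree = nested()
    for name, path in data.items():
        d = tree
        for p in path:
            d = d[p]
        d["value"] = name
    return tree

def ancestors(tree, path):
    """Names stored above `path`, outermost first (hypothetical helper)."""
    found = []
    d = tree
    for p in path[:-1]:
        if p not in d:       # check first: indexing a defaultdict would create the key
            break
        d = d[p]
        if "value" in d:
            found.append(d["value"])
    return found

a = {
    'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2], 'd': [1, 2, 3, 4, 5],
    'e': [3], 'f': [3, 7], 'g': [3, 3],
    'h': [3, 3, 3, 3, 3], 'i': [3, 3, 3, 3, 4],
}
tree = build_tree(a)
print(ancestors(tree, a['h']))  # ['e', 'g']
print(ancestors(tree, a['d']))  # ['c', 'a']
```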

Just sort arrays in lexicographical order:
(c,[1,2]),
(a,[1,2,3]),
(d,[1,2,3,4,5]),
(b,[1,2,4]),
(e,[3]),
(g,[3,3]),
(h,[3,3,3,3,3]),
(i,[3,3,3,3,4]),
(f,[3,7])
Then the solution is fairly obvious.
root
 Lc
 |La
 ||Ld
 |Lb
 Le
  Lg
  |Lh
  |Li
  Lf
You only need to track the path from the parent by prefix, using the previous line. This forms something like a stack. The root has an empty list, so push it on the stack. c has the root's (empty) list as a prefix, so the root is the parent of c; push c on the stack. a has c, which is on top of the stack, as a prefix, so c is the parent of a; push a on the stack. d has a, which is on top of the stack, as a prefix, so a is the parent of d; push d on the stack. b does not have d (the top of the stack) as a prefix, so pop. The same holds for a, so pop again. Now c is on top and it is a prefix of b, so c is the parent of b; push b on the stack. Continue in the same way.
In Erlang simply:
-module(tree_from_prefix).
-export([tree/1]).
is_prefix(_, []) -> true;
is_prefix([H|A], [H|B]) -> is_prefix(A, B);
is_prefix(_, _) -> false.
tree(L) ->
    tree(lists:keysort(2, L), [{root, []}]).
tree([], _) -> [];
tree([{X, L} = Record|T] = List, [{Parent, Prefix}|R] = Stack) ->
    case is_prefix(L, Prefix) of
        true -> [{Parent, X}|tree(T, [Record|Stack])];
        false -> tree(List, R)
    end.
And the result:
1> tree_from_prefix:tree([{e,[3]},{c,[1, 2]},{g,[3, 3]},{f,[3, 7]},{a,[1, 2, 3]},{b, [1, 2, 4]},{d,[1, 2, 3, 4, 5]},{h,[3, 3, 3, 3, 3]},{i,[3, 3, 3, 3, 4]}]).
[{root,c},
{c,a},
{a,d},
{c,b},
{root,e},
{e,g},
{g,h},
{g,i},
{e,f}]
In Python it will not be so elegant, but the same algorithm will work too.
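For comparison, a rough Python sketch of the same stack idea might look like this. The function name tree_from_prefix and the 'root' sentinel are just borrowed from the Erlang version above, and the input is assumed to be a dict of lists as in the question.

```python
def tree_from_prefix(data):
    """Return (parent, child) pairs; 'root' is a sentinel with an empty list."""
    stack = [("root", [])]
    edges = []
    # Lexicographic order puts every list right after its prefixes.
    for name, path in sorted(data.items(), key=lambda kv: kv[1]):
        # Pop until the list on top of the stack is a prefix of the current one.
        # The root's empty list is a prefix of everything, so this terminates.
        while path[:len(stack[-1][1])] != stack[-1][1]:
            stack.pop()
        edges.append((stack[-1][0], name))
        stack.append((name, path))
    return edges

a = {
    'a': [1, 2, 3], 'b': [1, 2, 4], 'c': [1, 2], 'd': [1, 2, 3, 4, 5],
    'e': [3], 'f': [3, 7], 'g': [3, 3],
    'h': [3, 3, 3, 3, 3], 'i': [3, 3, 3, 3, 4],
}
edges = tree_from_prefix(a)
print(edges)
# [('root', 'c'), ('c', 'a'), ('a', 'd'), ('c', 'b'), ('root', 'e'),
#  ('e', 'g'), ('g', 'h'), ('g', 'i'), ('e', 'f')]
```

After the sort, each element is pushed once and popped at most once, so the loop itself is linear in the number of lists (ignoring the cost of the slice comparisons).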

Related

Visualising rack network

I don't know where to start with this (programmatically) so I will describe input and output.
I have dictionary like this:
racks = {
"Rack_01" : [1, 2, 3],
"Rack_02" : [3, 4, 5],
"Rack_03" : [1, 2, 4, 5],
}
So generally, rack names with cable names. If the same cable is present in the two racks, it means they are connected.
Of course I have like 20 racks, and around 140 cables. Maximum connection to one rack is around 40 cables.
I would like to have nodes with names of the racks and connections to be named as cable that is connecting them.
Similar to this (shape could be different, just symbolic representation):
As a starting point, here's a script to convert that dictionary into a networkx graph, with the nodes and edges labeled correctly. Each pair of nodes has the correct number of edges connecting them.
import numpy as np
import networkx as nx
from collections import defaultdict as dd
d = {
    "Rack_01" : [1, 2, 3],
    "Rack_02" : [3, 4, 5],
    "Rack_03" : [1, 2, 4, 5],
}
m = len(d)  # number of nodes
edge_set = set([i for v in d.values() for i in v])
n = len(edge_set)  # number of edges
edge_label = dict(enumerate(edge_set))
node_label = dict(enumerate(d))
# incidence matrix: rows are racks, columns are cables
inc_mat = np.array([[edge_label[j] in d[node_label[i]]
                     for j in range(n)]
                    for i in range(m)], dtype=int)
adj_mat = np.zeros((m, m), dtype=int)
nodes_to_edge = dd(list)
for k, col in enumerate(inc_mat.T):
    i, j = np.nonzero(col)[0]  # assumes every cable appears in exactly two racks
    adj_mat[[i, j], [j, i]] += 1
    nodes_to_edge[(i, j)].append(edge_label[k])
G = nx.from_numpy_array(adj_mat, parallel_edges=True, create_using=nx.MultiGraph)
for u, v, attrs in G.edges(data=True):
    attrs['label'] = nodes_to_edge[(u, v)].pop()
nx.relabel_nodes(G, node_label, copy=False)
From there, you could use the answer here to generate your visualization.
The result of print(G.edges(data=True)), for reference:
[('Rack_01', 'Rack_02', {'weight': 1, 'label': 3}), ('Rack_01', 'Rack_03', {'weight': 1, 'label': 2}), ('Rack_01', 'Rack_03', {'weight': 1, 'label': 1}), ('Rack_02', 'Rack_03', {'weight': 1, 'label': 5}), ('Rack_02', 'Rack_03', {'weight': 1, 'label': 4})]
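If the matrices are not needed, the labelled edges can also be derived directly from the dictionary by intersecting the cable lists of every pair of racks. This is only a sketch with a made-up helper name (labelled_edges); unlike the script above it does not assume that a cable appears in exactly two racks.

```python
from itertools import combinations

def labelled_edges(racks):
    """Return (rack_a, rack_b, cable) triples for every cable two racks share."""
    edges = []
    for (rack_a, cables_a), (rack_b, cables_b) in combinations(racks.items(), 2):
        # Cables present in both racks connect them; sorted for stable output.
        for cable in sorted(set(cables_a) & set(cables_b)):
            edges.append((rack_a, rack_b, cable))
    return edges

racks = {
    "Rack_01": [1, 2, 3],
    "Rack_02": [3, 4, 5],
    "Rack_03": [1, 2, 4, 5],
}
edges = labelled_edges(racks)
print(edges)
# [('Rack_01', 'Rack_02', 3), ('Rack_01', 'Rack_03', 1), ('Rack_01', 'Rack_03', 2),
#  ('Rack_02', 'Rack_03', 4), ('Rack_02', 'Rack_03', 5)]
```

Each triple could then be added to a networkx MultiGraph with add_edge(rack_a, rack_b, label=cable).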

Adding values in a list if they are the same, with a twist

I have a question (python) regarding adding values in a list if they share the same key in a first list. So for example we have:
lst1 = [A, A, B, A, C, D]
lst2 = [1, 2, 3, 4, 5, 6]
What I would like to know is how I can add the numbers in lst2 if the strings in lst1 are the same. The end result would thus be:
new_lst1 = [A, B, A, C, D]
new_lst2 = [3, 3, 4, 5, 6]
where
new_lst2[0] = 1+2
So values only get added when they are next to each other.
To make it more complicated, it should also be possible if we have this example:
lst3 = [A, A, A, B, B, A, A]
lst4 = [1, 2, 3, 4, 5, 6, 7]
for which the result has to be:
new_lst3 = [A, B, A]
new_lst4 = [6, 9, 13]
where
new_lst4[0] = 1 + 2 + 3, new_lst4[1] = 4 + 5, and new_lst4[2] = 6 + 7.
Thank you in advance!
For a bit of background:
I wrote code that searches Dutch online underground models and returns the underground data for a specific input location.
The data is made up of layers:
Layer1, Name: "BXz1", top_layer, bottom_layer, transitivity
Layer2, Name: "BXz2", top_layer, bottom_layer, transitivity
Layer3, Name: "KRz1", top_layer, bottom_layer, transitivity
etc..
BXz1 and BXz2 belong to the same main layer but are different sublayers. In terms of transitivity, I would like to combine them if they are next to each other.
So that way I would get:
Layer1+2, Name: BX, top_layer1, bottom_layer2, combined transitivity
Layer3, Name: "KRz1", top_layer, bottom_layer, transitivity
If you're not allowed to use libraries, you could do it with a simple loop using zip() to pair up the keys and values.
lst3 = ["A", "A", "A", "B", "B", "A", "A"]
lst4 = [1, 2, 3, 4, 5, 6, 7]
new_lst3,new_lst4 = lst3[:1],[0]   # initialize with first key
for k,n in zip(lst3,lst4):         # pair up keys and numbers
    if new_lst3[-1] != k:          # add new items if key changed
        new_lst3.append(k)
        new_lst4.append(0)
    new_lst4[-1] += n              # tally for current key
print(new_lst3) # ['A', 'B', 'A']
print(new_lst4) # [6, 9, 13]
If you're okay with libraries, groupby from itertools combined with an iterator on the keys will allow you to express it more concisely:
from itertools import groupby
tally = ((k,sum(n)) for i3 in [iter(lst3)]
for k,n in groupby(lst4,lambda _:next(i3)))
new_lst3,new_lst4 = map(list,zip(*tally))
The itertools.groupby function in the standard library provides the base functionality you need. It's then a matter of giving it the right key and tallying the count in each group.
Here's my implementation:
from itertools import groupby
def tally_by_group(keys, counts):
    groups = groupby(zip(keys, counts), key=lambda x: x[0])
    tallies = [
        (key, sum(count for _, count in group))
        for key, group in groups
    ]
    return tuple(list(l) for l in zip(*tallies))
Code explanation:
my first zip() creates (key, count) tuples from the two lists,
the groupby groups them by the first element of each tuple, i.e., by the key,
then I construct a list of (key, sum(count)) into tallies,
and finally unpack that back into two lists for the results.
Tests with your examples:
lst1 = ["A", "A", "B", "A", "C", "D"]
lst2 = [1, 2, 3, 4, 5, 6]
l1m, l2m = tally_by_group(lst1, lst2)
print(l1m)
print(l2m)
outputs:
['A', 'B', 'A', 'C', 'D']
[3, 3, 4, 5, 6]
And
lst3 = ["A", "A", "A", "B", "B", "A", "A"]
lst4 = [1, 2, 3, 4, 5, 6, 7]
l3m, l4m = tally_by_group(lst3, lst4)
print(l3m)
print(l4m)
outputs:
['A', 'B', 'A']
[6, 9, 13]

Nesting dictionary algorithm

Suppose I have the following dictionary:
{'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
I wish to write an algorithm which outputs the following:
{
    "a": 0,
    "b": 1,
    "c": {
        "c": 2,
        "c.1": 3
    },
    "d":{
        "d": 4,
        "d.1": {
            "d.1": 5,
            "d.1.2": 6
        }
    }
}
Note how the names are repeated inside the dictionary. And some have variable level of nesting (eg. "d").
I was wondering how you would go about doing this, or if there is a python library for this? I know you'd have to use recursion for something like this, but my recursion skills are quite poor. Any thoughts would be highly appreciated.
You can use a recursive function for this or just a loop. The tricky part is wrapping existing values into dictionaries if further child nodes have to be added below them.
def nested(d):
    res = {}
    for key, val in d.items():
        t = res
        # descend deeper into the nested dict
        for x in [key[:i] for i, c in enumerate(key) if c == "."]:
            if x in t and not isinstance(t[x], dict):
                # wrap leaf value into another dict
                t[x] = {x: t[x]}
            t = t.setdefault(x, {})
        # add actual key to nested dict
        if key in t:
            # already exists, go one level deeper
            t[key][key] = val
        else:
            t[key] = val
    return res
Your example:
d = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
print(nested(d))
# {'a': 0,
# 'b': 1,
# 'c': {'c': 2, 'c.1': 3},
# 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}
Nesting dictionary algorithm ...
how you would go about doing this,
sort the dictionary items
group the result by index 0 of the keys (first item in the tuples)
iterate over the groups
if there is more than one item in a group, make a key for the group and add the group items as the values.
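The steps above only describe one level of grouping. A possible sketch that applies them recursively is shown below; the function name nest and its depth parameter are assumptions made for this example, and the items are sorted by their dot-separated components so that each group stays contiguous for itertools.groupby.

```python
from itertools import groupby

def nest(items, depth=1):
    """Group sorted (key, value) pairs by the first `depth` key components."""
    result = {}
    prefix = lambda kv: ".".join(kv[0].split(".")[:depth])
    for head, grp in groupby(items, key=prefix):
        grp = list(grp)
        if len(grp) == 1:
            # A lone item stays a plain key/value pair.
            key, value = grp[0]
            result[key] = value
        else:
            inner = {}
            if grp[0][0] == head:
                # The group's own key is repeated inside the nested dict.
                inner[head] = grp[0][1]
                grp = grp[1:]
            inner.update(nest(grp, depth + 1))
            result[head] = inner
    return result

data = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
items = sorted(data.items(), key=lambda kv: kv[0].split("."))
result = nest(items)
print(result)
# {'a': 0, 'b': 1, 'c': {'c': 2, 'c.1': 3}, 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}
```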
Slightly shorter recursion approach with collections.defaultdict:
from collections import defaultdict
data = {'a': 0, 'b': 1, 'c': 2, 'c.1': 3, 'd': 4, 'd.1': 5, 'd.1.2': 6}
def group(d, p = []):
    _d, r = defaultdict(list), {}
    for n, [a, *b], c in d:
        _d[a].append((n, b, c))
    for a, b in _d.items():
        if (k:=[i for i in b if i[1]]):
            r['.'.join(p+[a])] = {**{i[0]:i[-1] for i in b if not i[1]}, **group(k, p+[a])}
        else:
            r[b[0][0]] = b[0][-1]
    return r
print(group([(a, a.split('.'), b) for a, b in data.items()]))
Output:
{'a': 0, 'b': 1, 'c': {'c': 2, 'c.1': 3}, 'd': {'d': 4, 'd.1': {'d.1': 5, 'd.1.2': 6}}}

Regarding dictionary value manipulation in python

CONTEXT:
The code is to be used for representing graphs for use in implementations of graph search algorithms (like Breadth-First Search).
I want to store the graph in form of a dictionary, where keys represent the nodes and each key has three corresponding values. First is a set of nodes with which the "key" shares an edge. Second is a Boolean flag for showing visited/not visited. Third is distance of the "key" from starting node.
"""
The 'test.txt' file contains the following:
1 2 3 4 5
2 1 3 4 5
3 1 2 5
4 1 2
5 1 2 3
"""
import math as m
def readGraph(path):
    a = {}
    file = open(path)
    data = file.readlines()
    for line in data:
        items = line.split()
        items = [int(i) for i in items]
        a[items[0]] = items[1:len(items) + 1], 0, m.inf
    return a
if __name__ == '__main__':
    G = readGraph('test.txt')
    print(G)
The dictionary (stored in 'G') for the given file is:
G = {1: ([2, 3, 4, 5], 0, inf), 2: ([1, 3, 4, 5], 0, inf), 3: ([1, 2, 5], 0, inf), 4: ([1, 2], 0, inf), 5: ([1, 2, 3], 0, inf)}
DOUBT:
Suppose now I want to change the second value of key 1, from 0 to 1.
Typing G[1] = G[1][0], 1, G[1][2] does not seem efficient.
Is there a better approach?
UPDATE:
I tried saving the dictionary entries as lists, but that is undesirable as it would change the format of dictionary, which I want to implement.
The following is a solution, but still I want to use the dictionary in its default form, with the elements of each key stored as tuple.
if __name__ == '__main__':
    G = readGraph('test.txt')
    print(G)
    G[1] = list(G[1])
    G[1][1] = 1
    print(G)
One way of doing this is to use a nested dictionary to store each node. Here is how the graph G would look:
G = {
    1: {
        'nodes': [2, 3, 4, 5],
        'is_visited': 0,
        'distance': float('inf')
    },
    2: {
        'nodes': [1, 3, 4, 5],
        'is_visited': 0,
        'distance': float('inf')
    }
}
Then you can read or update the values by key:
G[1]['is_visited'] = 1
You can store the values for each node as a list instead of a tuple:
a[items[0]] = [items[1:len(items) + 1], 0, m.inf]
and then just update the value you want directly:
G[1][1] = 1
Another option (thanks to Bilal for his suggestion of this approach) is nested dictionaries:
a[items[0]] = {"edges": items[1:len(items) + 1], "is_visited": 0, "dist": m.inf}
Then you could access the individual elements as:
G[1]["is_visited"] = 1
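Since the stated goal is breadth-first search, here is a small sketch of how the nested-dictionary layout could be used for it. The bfs function is an illustration written for this example, and the graph is built inline from the contents of test.txt rather than read from the file.

```python
import math
from collections import deque

def bfs(graph, start):
    """Fill in 'is_visited' and 'dist' for every node reachable from `start`."""
    graph[start]["is_visited"] = 1
    graph[start]["dist"] = 0
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]["edges"]:
            if not graph[neighbour]["is_visited"]:
                # Individual fields are updated in place, no tuple rebuilding needed.
                graph[neighbour]["is_visited"] = 1
                graph[neighbour]["dist"] = graph[node]["dist"] + 1
                queue.append(neighbour)
    return graph

adjacency = {1: [2, 3, 4, 5], 2: [1, 3, 4, 5], 3: [1, 2, 5], 4: [1, 2], 5: [1, 2, 3]}
G = {node: {"edges": edges, "is_visited": 0, "dist": math.inf}
     for node, edges in adjacency.items()}
bfs(G, 1)
distances = {node: info["dist"] for node, info in G.items()}
print(distances)  # {1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
```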

Traverse a dictionary recursively in Python?

What is the better way to traverse a dictionary recursively?
Can I do it with lambda or/and list comprehension?
I have:
[
    {
        "id": 1,
        "children": [
            {
                "id": 2,
                "children": []
            }
        ]
    },
    {
        "id": 3,
        "children": []
    },
    {
        "id": 4,
        "children": [
            {
                "id": 5,
                "children": [
                    {
                        "id": 6,
                        "children": [
                            {
                                "id": 7,
                                "children": []
                            }
                        ]
                    }
                ]
            }
        ]
    }
]
I want:
[1,2,3,4,5,6,7]
You can recursively traverse your dictionaries with this generic generator function:
def rec(current_object):
    if isinstance(current_object, dict):
        yield current_object["id"]
        for item in rec(current_object["children"]):
            yield item
    elif isinstance(current_object, list):
        for items in current_object:
            for item in rec(items):
                yield item
print(list(rec(data)))
# [1, 2, 3, 4, 5, 6, 7]
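On Python 3.3 or later, the nested loops in a generator like this can be shortened with yield from. A minimal sketch, assuming the input is always a list of dictionaries that have "id" and "children" keys (iter_ids is a name chosen for this example):

```python
def iter_ids(nodes):
    """Yield ids depth-first: each node's own id, then those of its children."""
    for node in nodes:
        yield node["id"]
        yield from iter_ids(node["children"])

data = [
    {"id": 1, "children": [{"id": 2, "children": []}]},
    {"id": 3, "children": []},
    {"id": 4, "children": [
        {"id": 5, "children": [
            {"id": 6, "children": [{"id": 7, "children": []}]}
        ]}
    ]},
]
ids = list(iter_ids(data))
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```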
The easiest way to do this will be with a recursive function:
recursive_function = lambda x: [x['id']] + [item for child in x['children'] for item in recursive_function(child)]
result = [item for topnode in whatever_your_list_is_called for item in recursive_function(topnode)]
My solution:
results = []
def function(lst):
    for item in lst:
        results.append(item.get('id'))
        function(item.get('children'))
function(l)
print(results)
# [1, 2, 3, 4, 5, 6, 7]
The dicter library can be useful. You can easily flatten or traverse the dictionary paths.
pip install dicter
import dicter as dt
# Example dict:
d = {'level_a': 1, 'level_b': {'a': 'hello world'}, 'level_c': 3, 'level_d': {'a': 1, 'b': 2, 'c': {'e': 10}}, 'level_e': 2}
# Walk through dict to get all paths
paths = dt.path(d)
print(paths)
# [[['level_a'], 1],
# [['level_c'], 3],
# [['level_e'], 2],
# [['level_b', 'a'], 'hello world'],
# [['level_d', 'a'], 1],
# [['level_d', 'b'], 2],
# [['level_d', 'c', 'e'], 10]]
The first column is the key path and the second column holds the values. In your case, you can take the last element of each path in the first column.
