How to convert a binary tree into a dictionary of levels - python

In Python, as in any other language, it is quite easy to traverse a binary tree in level order (i.e. BFS) using a queue data structure. Given an adjacency-list representation in Python and the root of a tree, I can traverse the tree in level order and print each level's elements in order. What I cannot do, however, is go from an adjacency-list representation to a level dictionary or something of the sort:
so for example I would like to go from
adjecency_list = {'A': {'B','C'}, 'C':{'D'}, 'B': {'E'}}
to
levels = {0: ['A'], 1: ['B','C'], 2: ['D','E']}
So far I have the following:
from queue import Queue          # on Python 2: from Queue import Queue
from collections import OrderedDict

q = Queue()
o = OrderedDict()
root = find_root(adjencency_list)            # Separate function, it works fine
height = find_height(root, adjencency_list)  # Again, works fine
q.put(root)
# Creating a level ordered adjacency list
# using a queue to keep track of pointers
while not q.empty():
    current = q.get()
    try:
        if current in adjencency_list:
            q.put(list(adjencency_list[current])[0])
            # Creating ad_list in level order
            if current in o:
                o[current].append(list(adjencency_list[current])[0])
            else:
                o[current] = [list(adjencency_list[current])[0]]
        if current in adjencency_list:
            q.put(list(adjencency_list[current])[1])
            # Creating ad_list in level order
            if current in o:
                o[current].append(list(adjencency_list[current])[1])
            else:
                o[current] = [list(adjencency_list[current])[1]]
    except IndexError:
        pass
All it does is place the adjacency list in the correct level order for the tree; if I printed at the start of the loop, it would print the level-order traversal. Nonetheless, it does not solve my problem. I am aware an adjacency list is not the best representation for a tree, but I am required to use it for the task I am doing.

A recursive way to create the level dictionary from your adjacency list would be -
def level_dict(adj_list, curr_elems, order=0):
    if not curr_elems:  # This check ensures that for an empty `curr_elems` list we return an empty dictionary
        return {}
    d = {}
    new_elems = []
    for elem in curr_elems:
        d.setdefault(order, []).append(elem)
        new_elems.extend(adj_list.get(elem, []))
    d.update(level_dict(adj_list, new_elems, order+1))
    return d
The starting input to the method would be the root element in a list, for example ['A'], and the initial level, which would be 0.
At each level, it takes the children of the elements at that level and creates a new list, and at the same time it builds up the level dictionary (in d).
Example/Demo -
>>> adjecency_list = {'A': {'B','C'}, 'C':{'D'}, 'B': {'E'}}
>>> def level_dict(adj_list,curr_elems,order=0):
...     if not curr_elems:
...         return {}
...     d = {}
...     new_elems = []
...     for elem in curr_elems:
...         d.setdefault(order,[]).append(elem)
...         new_elems.extend(adj_list.get(elem,[]))
...     d.update(level_dict(adj_list,new_elems,order+1))
...     return d
...
>>> level_dict(adjecency_list,['A'])
{0: ['A'], 1: ['C', 'B'], 2: ['D', 'E']}
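If you would rather avoid recursion, the same idea works iteratively with a plain breadth-first queue. A minimal sketch under the same adjacency-list assumptions; level_dict_iter is my own name, and sibling order within a level is arbitrary because the adjacency values are sets:

from collections import deque

def level_dict_iter(adj_list, root):
    # Breadth-first walk, recording the depth at which each node is first seen.
    levels = {}
    q = deque([(root, 0)])
    while q:
        node, depth = q.popleft()
        levels.setdefault(depth, []).append(node)
        for child in adj_list.get(node, []):
            q.append((child, depth + 1))
    return levels

# level_dict_iter(adjecency_list, 'A') gives {0: ['A'], 1: [...], 2: [...]},
# with the order inside each level depending on set iteration order.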


How can I convert a given list of lists to a tree structure in Python?

So I'm basically stuck on this problem. The problem gives us a parts inventory list where each component is made up of other components (in a tree-like relational manner). For example, the input list can be given as
[["A",[2,"B"],[1,"C"]],
["B",[1,"D"],[1,"E"]],
["D",20.],
["E",10.]
["C",40.]]
where A is made up of 2 Bs and 1 C; similarly, B is made up of 1 D and 1 E. The lists with a float as the last element simply give the unit price of the given basic part.
The problem is, I need to convert this structure to a tree representation which can be written as:
[1,"A",[2,"B",[1,"D",20.],[1,"E",10.]], [1,"C",40.]]
where we simply bury the children of each node as a list in a nested list structure. In order to achieve this, I tried a recursive-iterative algorithm but as we don't know how many children a node has or what the depth of the tree is, I wasn't able to do so.
Can you recommend a solution to this problem? Thanks in advance.
P.S.: There is no predefined order for the input list; its elements can be placed from bottom to top of the tree, or shuffled.
If your input structure stays the same, then you can try something like:
e = [["A", [2, "B"], [1, "C"]],
     ["B", [1, "D"], [1, "E"]],
     ["D", 20.],
     ["E", 10.],
     ["C", 40.]]

record = {}
for i in reversed(e):
    if len(i) == 2:
        record[i[0]] = i[1]
    else:
        # there are children
        temp = []
        for j in i[1:]:
            if isinstance(record[j[1]], list):
                temp.append([*j, *record[j[1]]])
            else:
                temp.append([*j, record[j[1]]])
        record[i[0]] = temp

root = e[0][0]
print([1, root, *record[root]])
output
[1, 'A', [2, 'B', [1, 'D', 20.0], [1, 'E', 10.0]], [1, 'C', 40.0]]
Otherwise, you can create a Tree structure and get the output.
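To illustrate that remark, here is a small sketch of my own of the "build it from the root" idea (build_nested and expand are hypothetical names, not part of the answer above): index every component by its name once, then expand recursively from the root. It reuses the e list defined above.

def build_nested(data, root):
    parts = {row[0]: row[1:] for row in data}   # index every component by its name
    def expand(name, qty):
        rest = parts[name]
        if isinstance(rest[0], float):          # basic part: [qty, name, price]
            return [qty, name, rest[0]]
        # composite part: recurse into each (count, child-name) pair
        return [qty, name] + [expand(child, count) for count, child in rest]
    return expand(root, 1)

print(build_nested(e, "A"))
# [1, 'A', [2, 'B', [1, 'D', 20.0], [1, 'E', 10.0]], [1, 'C', 40.0]]

Because it looks components up by name rather than by position, this sketch does not depend on the order of the input list either.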
You can leverage the fact that lists are handled by reference to perform the linking in one pass, by copying the child list references into the parent lists:
def buildTree(data):
    index = {c[0]: c for c in data}                  # used to access components by their id
    root = index.copy()                              # used to retain top level component(s)
    for component in data:
        if isinstance(component[1], float): continue     # no linking on leaf items
        for i, (_, prod) in enumerate(component[1:], 1): # expand children
            component[i][1:] = index[prod]               # embed child in parent
            root.pop(prod)                               # remove child from root
    return [[1, *rv] for rv in root.values()]            # output root item(s)
output:
data = [["A", [2, "B"], [1, "C"]],
        ["B", [1, "D"], [1, "E"]],
        ["D", 20.0],
        ["E", 10.0],
        ["C", 40.0]]
print(*buildTree(data))
# [1, 'A', [2, 'B', [1, 'D', 20.0], [1, 'E', 10.0]], [1, 'C', 40.0]]
Changing the order of the data does not change the result
data = [["D",20.0],
["E",10.0],
["B",[1,"D"],[1,"E"]],
["C",40.0],
["A",[2,"B"],[1,"C"]]]
print(*buildTree(data))
# [1, 'A', [2, 'B', [1, 'D', 20.0], [1, 'E', 10.0]], [1, 'C', 40.0]]
Note that if your data has multiple root items, the function will output them all. In this instance there is only one, so it printed a single root.
without dictionaries
If you are not allowed to use dictionaries, you can still use this approach but you'll have to do a sequential search through the data to find products by their id:
def buildTree(data):
    roots = data.copy()                                   # used to retain top level component(s)
    for component in data:
        if isinstance(component[1], float): continue      # no linking on leaf items
        for i, (_, prod) in enumerate(component[1:], 1):  # expand children
            child = next(c for c in data if c[0] == prod) # find child with id
            component[i][1:] = child                      # embed child in parent
            roots.remove(child)                           # remove child from root
    return [[1, *rv] for rv in roots]                     # output root items
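Usage is the same as before: print(*buildTree(data)) produces the same nested output; the trade-off is a sequential scan of data for each child instead of a dictionary lookup.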

consolidating list of sets

Given a list of sets (sets of strings, such as setlist = [{'this','is'},{'is','a'},{'test'}]), the idea is to pairwise join (union) sets that share strings in common. The snippet below takes the literal approach of testing pairwise overlap, joining, and starting anew using an inner-loop break.
I know this is the pedestrian approach, and it does take forever for lists of usable size (200K sets of between 2 and 10 strings).
Any advice on how to make this more efficient? Thanks.
j = 0
while True:
    if j == len(setlist):  # both for loops are done
        break  # while
    for i in range(0, len(setlist) - 1):
        for j in range(i + 1, len(setlist)):
            a = setlist[i]
            b = setlist[j]
            if not set(a).isdisjoint(b):   # ... then join them
                newset = set.union(a, b)   # ... new set
                del setlist[j]             # ... drop highest index
                del setlist[i]             # ... drop lowest index
                setlist.insert(0, newset)  # ... introduce consolidated set, which messes up i,j
                break                      # ... back to the top for fresh i,j
        else:
            continue
        break
As #user2357112 mentioned in the comments, this can be thought of as a graph problem. Every set is a vertex, and every word shared between two sets is an edge. You can then iterate over the vertices and run BFS (or DFS) from every unseen vertex to generate a connected component.
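A minimal sketch of that graph idea (the helper names here, consolidate_bfs and word_to_sets, are mine): index which sets contain each word, then BFS over sets, treating shared words as edges.

from collections import defaultdict, deque

def consolidate_bfs(setlist):
    # For every word, record which sets contain it (the implicit edges)
    word_to_sets = defaultdict(list)
    for i, s in enumerate(setlist):
        for w in s:
            word_to_sets[w].append(i)

    seen = [False] * len(setlist)
    result = []
    for start in range(len(setlist)):
        if seen[start]:
            continue
        # BFS over one connected component, unioning sets as we go
        component = set()
        q = deque([start])
        seen[start] = True
        while q:
            i = q.popleft()
            component |= setlist[i]
            for w in setlist[i]:
                for j in word_to_sets[w]:
                    if not seen[j]:
                        seen[j] = True
                        q.append(j)
        result.append(component)
    return result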
Another option is to use Union-Find. The advantage of union-find is that you don't need to construct a graph, and there's no degenerate case when all the sets have the same contents. Here's an example of it in action:
from collections import defaultdict

# Return the ancestor (root) of the given node
def ancestor(parent, node):
    if parent[node] != node:
        # Do path compression
        parent[node] = ancestor(parent, parent[node])
    return parent[node]

def merge(parent, rank, x, y):
    # Merge the sets that x & y belong to
    x = ancestor(parent, x)
    y = ancestor(parent, y)
    if x == y:
        return
    # Union by rank, merge smaller set into the larger one
    if rank[y] > rank[x]:
        x, y = y, x
    parent[y] = x
    rank[x] += rank[y]

def merge_union(setlist):
    # For every word, record which sets in setlist contain it
    words = defaultdict(list)
    for i, s in enumerate(setlist):
        for w in s:
            words[w].append(i)

    # Merge sets that share a word
    parent = list(range(len(setlist)))
    rank = [1] * len(setlist)
    for sets in words.values():
        it = iter(sets)
        merge_to = next(it)
        for x in it:
            merge(parent, rank, merge_to, x)

    # Construct the result by unioning the sets within a component;
    # use ancestor() rather than parent[] so chained merges all land in one component
    result = defaultdict(set)
    for merge_from in range(len(setlist)):
        result[ancestor(parent, merge_from)] |= setlist[merge_from]
    return list(result.values())

setlist = [
    {'this', 'is'},
    {'is', 'a'},
    {'test'},
    {'foo'},
    {'foobar', 'foo'},
    {'foobar', 'bar'},
    {'alone'}
]
print(merge_union(setlist))
Output:
[{'this', 'is', 'a'}, {'test'}, {'bar', 'foobar', 'foo'}, {'alone'}]

Cycle through parallel lists deleting matches, until no more matches exist

I have 3 parallel lists representing a 3-tuple (date, description, amount), and 3 new lists that I need to merge without creating duplicate entries. Yes, the lists have overlapping entries; however, these duplicate entries are not grouped together (it is not the case that all of the duplicates sit at indices 0 through x and all of the new entries from x through the end).
The problem I'm having is iterating the correct number of times to ensure all of the duplicates are caught. Instead, my code moves on with duplicates remaining.
for x in dates:
    MoveNext = 'false'
    while MoveNext == 'false':
        Reiterate = 'false'
        for a, b in enumerate(descriptions):
            if Reiterate == 'true':
                break
            if b in edescriptions:
                eindex = [c for c, d in enumerate(edescriptions) if d == b]
                for e, f in enumerate(eindex):
                    if Reiterate == 'true':
                        break
                    if edates[f] == dates[a]:
                        if eamounts[f] == amounts[a]:
                            del dates[a]
                            del edates[f]
                            del descriptions[a]
                            del edescriptions[f]
                            del amounts[a]
                            del eamounts[f]
                            Reiterate = 'true'
                            break
                        else:
                            MoveNext = 'true'
                    else:
                        MoveNext = 'true'
            else:
                MoveNext = 'true'
I don't know if it's a coincidence, but I'm currently getting exactly half of the new items deleted while the other half remain. In reality, far fewer than that should remain. That makes me think the for x in dates: loop is not iterating the correct number of times.
I suggest a different approach: Instead of trying to remove items from a list (or worse, several parallel lists), run through the input and yield only the data that passes your test --- in this case, data you haven't seen before. This is much easier with a single stream of input.
Your lists of data are crying out to be made into objects, since each piece (like the date) is meaningless without the other two... at least for your current purpose. Below, I start by combining each triplet into an instance of Record, a collections.namedtuple. They're great for this kind of use-once-and-throw-away work.
In the program below, build_records creates Record objects from your three input lists. dedup_records merges multiple streams of Record objects, using unique to filter out the duplicates. Keeping each function small (most of the main function is test data) makes each step easy to test.
#!/usr/bin/env python3

import collections
import itertools

Record = collections.namedtuple('Record', ['date', 'description', 'amount'])


def unique(records):
    '''
    Yields only the unique Records in the given iterable of Records.
    '''
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record
    return


def dedup_records(*record_iterables):
    '''
    Yields unique Records from multiple iterables of Records, preserving the
    order of first appearance.
    '''
    all_records = itertools.chain(*record_iterables)
    yield from unique(all_records)
    return


def build_records(dates, descriptions, amounts):
    '''
    Yields Record objects built from each date-description-amount triplet.
    '''
    for args in zip(dates, descriptions, amounts):
        yield Record(*args)
    return


def main():
    # Sample data
    dates_old = [
        '2000-01-01',
        '2001-01-01',
        '2002-01-01',
        '2003-01-01',
        '2000-01-01',
        '2001-01-01',
        '2002-01-01',
        '2003-01-01',
    ]
    dates_new = [
        '2000-01-01',
        '2001-01-01',
        '2002-01-01',
        '2003-01-01',
        '2003-01-01',
        '2002-01-01',
        '2001-01-01',
        '2000-01-01',
    ]
    descriptions_old = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
    descriptions_new = ['b', 'b', 'c', 'a', 'a', 'c', 'd', 'd']
    amounts_old = [0, 1, 0, 1, 0, 1, 0, 1]
    amounts_new = [0, 0, 0, 0, 1, 1, 1, 1]

    old = [dates_old, descriptions_old, amounts_old]
    new = [dates_new, descriptions_new, amounts_new]

    for record in dedup_records(build_records(*old), build_records(*new)):
        print(record)
    return


if '__main__' == __name__:
    main()
This reduces the 16 input Records to 11:
Record(date='2000-01-01', description='a', amount=0)
Record(date='2001-01-01', description='b', amount=1)
Record(date='2002-01-01', description='c', amount=0)
Record(date='2003-01-01', description='d', amount=1)
Record(date='2000-01-01', description='b', amount=0)
Record(date='2001-01-01', description='b', amount=0)
Record(date='2003-01-01', description='a', amount=0)
Record(date='2003-01-01', description='a', amount=1)
Record(date='2002-01-01', description='c', amount=1)
Record(date='2001-01-01', description='d', amount=1)
Record(date='2000-01-01', description='d', amount=1)
Note that the yield from ... syntax requires Python 3.3 or greater.
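If the surrounding code still needs the three parallel lists afterwards, the deduplicated Records unzip back easily. A small follow-up sketch, not part of the answer above, assuming it runs where old and new are in scope (for example inside main()):

merged = list(dedup_records(build_records(*old), build_records(*new)))
dates, descriptions, amounts = (list(col) for col in zip(*merged))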

dynamically nesting a python dictionary

I want to create a function that will create dynamic levels of nesting in a python dictionary.
e.g. if I call my function nesting, I want the outputs like the following:
nesting(1) : dict = {key1:<value>}
nesting(2) : dict = {key1:{key2:<value>}}
nesting(3) : dict = {key1:{key2:{key3:<value>}}}
and so on. I have all the keys and values before calling this function, but not before I start executing the code.
I have the keys stored in a variable 'm' where m is obtained from:
m=re.match(pattern,string)
The pattern is constructed dynamically in this case.
You can iterate over the keys like this:
def nesting(level):
    ret = 'value'
    for l in range(level, 0, -1):
        ret = {'key%d' % l: ret}
    return ret
Replace the range(...) fragment with the code which yields the keys in the desired order. So, if we assume that the keys are the captured groups, you should change the function as follows:
def nesting(match):  # `match' is a match object like your `m' variable
    ret = 'value'
    for key in match.groups():
        ret = {key: ret}
    return ret
Or use reversed(match.groups()) if you want to get the keys in the opposite order.
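A quick check of that version with a made-up pattern and string (each captured group becomes one key):

import re

m = re.match(r'(\w+)/(\w+)/(\w+)', 'key1/key2/key3')
print(nesting(m))
# {'key3': {'key2': {'key1': 'value'}}}   (use reversed(m.groups()) to put key1 outermost)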
def nesting(level, l=None):
    # assuming `m` is accessible in the function
    if l is None:
        l = level
    if level == 1:
        return {m[l-level]: 'some_value'}
    return {m[l-level]: nesting(level-1, l)}
For reasonable levels, this won't exceed the recursion depth. This is also assuming that the value is always the same and that m is of the form:
['key1', 'key2', ...]
An iterative form of this function can be written as such:
def nesting(level):
    # also assuming `m` is accessible within the function
    d = 'some_value'
    while level > 0:
        d = {m[level-1]: d}  # wrap from the innermost (last) key outwards
        level -= 1
    return d
Or:
def nesting(level):
    # also assuming `m` is accessible within the function
    d = 'some_value'
    for l in range(level, 0, -1):  # or xrange in Python 2
        d = {m[l-1]: d}  # l runs from `level` down to 1, so key1 ends up outermost
    return d
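With m of the assumed form, the recursive and both iterative variants build the nesting shown in the question; a quick check:

m = ['key1', 'key2', 'key3']
print(nesting(3))
# {'key1': {'key2': {'key3': 'some_value'}}}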

searching and adding a python list

I have a TList, which is a list of lists. I would like to add new items to the list if they are not already present; for instance, if item I is not present, add it to TList, otherwise skip it. Is there a more Pythonic way of doing this? Note: at first TList may be empty, and elements are added in this code. After adding Z, for example, TList = [[A,B,C],[D,F,G],[H,I,J],[Z,aa,bb]]. The other elements are based on calculations on Z.
item = 'C'  # for example, this item will be given by the user
TList = [[A, B, C], [D, F, G], [H, I, J]]

if not TList:
    ## do something
# check if item is not previously present in our TList and then add it to our TList
elif item not in zip(*TList)[0]:
    ## do something
Since it would appear that the first entry in each sublist is a key of some sort, and the remaining entries are somehow derived from that key, a dictionary might be a more suitable data structure:
vals = {'A': ['B', 'C'], 'D': ['F', 'G'], 'H': ['I', 'J']}

if 'Z' in vals:
    print 'found Z'
else:
    vals['Z'] = ['aa', 'bb']
#aix made a good suggestion to use a dict as your data structure; it seems to fit your use case well.
Consider wrapping up the membership check (i.e. 'does it exist?') and the calculation of the derived values ('aa' and 'bb' in your example) together:
class TList(object):
    def __init__(self):
        self.data = {}

    def __iter__(self):
        return iter(self.data)

    def set(self, key):
        if key not in self:
            self.data[key] = self.do_something(key)

    def get(self, key):
        return self.data[key]

    def do_something(self, key):
        print('Calculating values')
        return ['aa', 'bb']

    def as_old_list(self):
        return [[k, v[0], v[1]] for k, v in self.data.iteritems()]


t = TList()

## Add some values. If new, `do_something()` will be called
t.set('aval')
t.set('bval')
t.set('aval')  ## Note, do_something() is not called

## Get a value
t.get('aval')

## 'in' tests work
'aval' in t

## Give you back your old data structure
t.as_old_list()
If you need to keep the same data structure, something like this should work:
# create a set of already seen items (the first element of each sublist)
seen = set(zip(*TList)[0])

# now start adding new items
if item not in seen:
    seen.add(item)
    # add the new sublist to TList
Here is a method using sets and set.union:
a = set([1, 2, 3])
b = set([4, 5, 6])
c = set()
master = [a, b, c]

if 2 in set.union(*master):
    pass  # Found it, do something
else:
    pass  # Not in set, do something else
If the reason for testing for membership is simply to avoid adding an entry twice, note that a.add(12) adds an element to a set, but only once, which eliminates the need to test first. Thus the following:
>>> a=set()
>>> a.add(1)
>>> a
set([1])
>>> a.add(1)
>>> a
set([1])
If you need the set elsewhere as a list, simply say list(a) to get a as a list, or tuple(a) to get it as a tuple.
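Tying that back to the original TList, a small sketch under the same assumptions (derived_a and derived_b are placeholders for whatever calculation produces the other elements):

seen = set(zip(*TList)[0]) if TList else set()  # first element of each sublist
if item not in seen:
    seen.add(item)
    TList.append([item, derived_a, derived_b])  # derived_a/derived_b stand in for your calculated values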
