Depth of a generic tree - Python

I know how to recursively calculate the depth of a binary tree, but I'm not sure I know the best way to calculate the depth of a generic tree where any node n could have k children. I think I've implemented a correct solution below, but I'm wondering whether it can be optimized in any way.
def depth_binary(node):
    if node is None:
        return 0
    return max(depth_binary(node.left), depth_binary(node.right)) + 1
def depth_tree(node):
    if node is None:
        return 0
    max_val = 0
    for n in node.adjacent:
        d = depth_tree(n)
        if d > max_val:
            max_val = d
    return max_val + 1
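The loop is already linear in the number of nodes, so there is no asymptotic improvement available, but on Python 3.4+ it can be written more compactly with max() and a generator. A sketch, assuming node.adjacent holds only the node's children, as in the code above:
def depth_tree(node):
    if node is None:
        return 0
    # default=0 covers a leaf node, whose adjacency list is empty
    return 1 + max((depth_tree(child) for child in node.adjacent), default=0)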

Related

Height of BST +1 more than expected

I made a function that determines the height of a BST, but when the height of the tree is, e.g., 2, the result I get is 3, and so on. I don't know what I should change in my code. If you need the whole code to be able to answer, tell me and I'll copy it.
def maxDepth(self, node):
    if node is None:
        return 0
    else:
        # Compute the depth of each subtree
        lDepth = self.maxDepth(node.left)
        rDepth = self.maxDepth(node.right)
        # Use the larger one
        if lDepth > rDepth:
            return lDepth + 1
        else:
            return rDepth + 1
Instead of return 0, just return -1 and you'll get the desired height, smaller by 1. The corrected code is below:
def maxDepth(self, node):
    if node is None:
        return -1
    else:
        # Compute the depth of each subtree
        lDepth = self.maxDepth(node.left)
        rDepth = self.maxDepth(node.right)
        # Use the larger one
        if lDepth > rDepth:
            return lDepth + 1
        else:
            return rDepth + 1
Also, you can use the built-in max() function to make your code shorter:
def maxDepth(self, node):
    if node is None:
        return -1
    return max(self.maxDepth(node.left), self.maxDepth(node.right)) + 1
Note: the OP is correct, height should be edge-based, i.e. a tree consisting of a single node 5 should have a height of 0, and an empty tree (a None tree) has a height of -1. Two references support this:
The Wikipedia article on trees says that height is edge-based: "Conventionally, an empty tree (tree with no nodes, if such are allowed) has height −1."
Another is Cormen et al., Introduction to Algorithms, which uses the same edge-based convention.
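To make the convention concrete, here is a minimal sketch (the Node class and the standalone height function are just illustrative scaffolding, not from the question):
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def height(node):
    # edge-based height: empty tree -> -1, single node -> 0
    if node is None:
        return -1
    return max(height(node.left), height(node.right)) + 1

print(height(None))              # -1
print(height(Node(5)))           # 0
print(height(Node(5, Node(3))))  # 1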

How to count the number of unique numbers in sorted array using Binary Search?

I am trying to count the number of unique numbers in a sorted array using binary search. I need to find the boundary where one number changes to the next in order to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
    start = 0
    end = len(x) - 1
    count = 0
    # This is the current number we are looking for
    item = x[start]
    while start <= end:
        middle = (start + end) // 2
        if item == x[middle]:
            start = middle + 1
        elif item < x[middle]:
            end = middle - 1
            # when item is greater, change to the next number
            count += 1
            # if the number
    return count

unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit over O(n) is negligible, what is my binary search missing? It's confusing when I'm not looking for an actual item. How can I fix this?
Working code exploiting binary search (returns 3 for the given example).
As discussed in the comments, the complexity is about O(k*log(n)), where k is the number of unique items, so this approach works well when k is small compared with n, and might become worse than a linear scan when k ~ n.
def countuniquebs(A):
    n = len(A)
    t = A[0]
    l = 1
    count = 0
    while l < n - 1:
        r = n - 1
        # binary search for the first index whose value is greater than t
        while l < r:
            m = (r + l) // 2
            if A[m] > t:
                r = m
            else:
                l = m + 1
        count += 1
        if l < n:
            t = A[l]
    return count

print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
    length = end - start
    if length < 1:
        return 0
    if A[start] == A[end-1]:
        return 1
    if length < 3:
        return 2
    mid = start + length // 2
    # the two halves overlap at index mid, so exactly one value is double-counted
    return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1

A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A, 0, len(A)))
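As a sanity check, either version can be compared against the trivial linear count that they try to beat when k is small. This baseline uses itertools.groupby and assumes the A and countUniques defined just above:
from itertools import groupby

def count_unique_linear(A):
    # one pass over the sorted array: count runs of equal values
    return sum(1 for _ in groupby(A))

print(count_unique_linear(A), countUniques(A, 0, len(A)))  # 5 5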

Divide and Conquer: find the majority element in an array

I am working on a Python algorithm to find the most frequent element in a list.
def GetFrequency(a, element):
    return sum([1 for x in a if x == element])

def GetMajorityElement(a):
    n = len(a)
    if n == 1:
        return a[0]
    k = n // 2
    elemlsub = GetMajorityElement(a[:k])
    elemrsub = GetMajorityElement(a[k:])
    if elemlsub == elemrsub:
        return elemlsub
    lcount = GetFrequency(a, elemlsub)
    rcount = GetFrequency(a, elemrsub)
    if lcount > k:
        return elemlsub
    elif rcount > k:
        return elemrsub
    else:
        return None
I tried some test cases. Some of them pass, but some of them fail.
For example, [1,2,1,3,4] should return 1, but I get None.
The implementation follows the pseudocode here:
http://users.eecs.northwestern.edu/~dda902/336/hw4-sol.pdf
The pseudocode finds the majority item, which has to occur at least half the time; I just want to find the most frequent item.
Can I get some help?
Thanks!
I wrote an iterative version instead of the recursive one you're using in case you wanted something similar.
def GetFrequency(array):
    majority = len(array) // 2
    result_dict = {}
    while array:
        array_item = array.pop()
        if result_dict.get(array_item):
            result_dict[array_item] += 1
        else:
            result_dict[array_item] = 1
        if result_dict[array_item] > majority:
            return array_item
    return max(result_dict, key=result_dict.get)
This will iterate through the array and return the value as soon as one hits more than 50% of the total (being a majority). Otherwise it goes through the entire array and returns the value with the greatest frequency.
def majority_element(a):
    return max([(a.count(elem), elem) for elem in set(a)])[1]
EDIT
If there is a tie, the biggest value is returned, e.g. a = [1,1,2,2] returns 2. That might not be what you want, but it could be changed.
EDIT 2
The pseudocode you gave divides the array into indices 1 to k inclusive and k + 1 to n. Your code does 1 to k - 1 and k to the end; not sure it changes much, though. If you want to follow the algorithm you gave, you should do:
elemlsub = GetMajorityElement(a[:k+1]) # this slice is indices 0 to k
elemrsub = GetMajorityElement(a[k+1:]) # this one is k + 1 to n.
Also, still according to the pseudocode you provided, lcount and rcount should be compared to k + 1, not k:
if lcount > k + 1:
    return elemlsub
elif rcount > k + 1:
    return elemrsub
else:
    return None
EDIT 3
Some people in the comments highlighted that the provided pseudocode solves not for the most frequent item, but for the item that is present in more than 50% of positions. So your output for your example is indeed correct, and there is a good chance your code already works as is.
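If what you actually want is simply the most frequent element, with no "more than half" requirement, a one-pass collections.Counter does that directly. This is just a baseline for comparison, not the divide-and-conquer scheme from the pseudocode:
from collections import Counter

def most_frequent(a):
    # most_common(1) returns [(element, count)] for the highest count
    return Counter(a).most_common(1)[0][0] if a else None

print(most_frequent([1, 2, 1, 3, 4]))  # 1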
EDIT 4
If you want to return None when there is a tie, I suggest this:
def majority_element(a):
    n = len(a)
    if n == 1:
        return a[0]
    if n == 0:
        return None
    sorted_counts = sorted([(a.count(elem), elem) for elem in set(a)], key=lambda x: x[0])
    if len(sorted_counts) > 1 and sorted_counts[-1][0] == sorted_counts[-2][0]:
        return None
    return sorted_counts[-1][1]
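A quick check of the tie handling (expected outputs in the comments):
print(majority_element([1, 2, 1, 3, 4]))  # 1
print(majority_element([1, 1, 2, 2]))     # None (tie)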

How can I restrict a KDTree query to a subset of the nodes?

tl;dr
I need a way to find "Foreign Nearest Neighbors" using a KDTree or some other spatial data structure, i.e. find the nearest neighbor within a subset of the tree's nodes.
I built an MST algorithm that uses a KDTree to find nearest neighbors. However, eventually it needs to look beyond nearest neighbors to "nearest foreign neighbors" in order to connect distant nodes. My first approach simply increases the k-nn parameter iteratively until the query returns a node in the subset. I cache k because each time the function is called the breadth of its search expands, and there is no point in searching the previous k < k_cache again.
# module-level cache of the k used for each query point
k_cache = {}

def FNNd(kdtree, A, b):
    """
    kdtree -> nodes in subnet -> coord of b -> index of a
    returns nearest foreign neighbor a∈A of b
    """
    a = None
    b = cartesian_projection(b)
    k = k_cache[str(b)] if str(b) in k_cache else 2
    while a not in A:
        # scipy kdtree, where query -> [dist], [idx]
        _, nn = kdtree.query(b, k=k)
        a = nn[-1][k-1]
        k += 1
    k_cache[str(b)] = k - 1
    # return NN a ∈ A of b
    return a
However, this is quite 'hacky' and inefficient, so I was thinking I could implement a KDTree myself that stops traversing when doing so would lead into subtrees that don't include the restricted subset. Then the nearest neighbor in the subset would have to be in the left or right branch. After many attempts I can't seem to get this to actually work. Is there a flaw in my logic? A better way to do this? A better data structure?
Here's my KDTree:
import numpy as np

# distance(p, q): Euclidean distance helper, defined elsewhere (not shown in the post)

class KDTree(object):
    def __init__(self, data, depth=0, make_idx=True):
        self.n, self.k = data.shape
        if make_idx:
            # index the data
            data = np.column_stack((data, np.arange(self.n)))
        else:
            # subtract the indexed dimension in later calls
            self.k -= 1
        self.build(data, depth)

    def build(self, data, depth):
        if data.size > 0:
            # get the axis to pivot on
            self.axis = depth % self.k
            # sort the data
            s_data = data[np.argsort(data[:, self.axis])]
            # find the pivot point
            point = s_data[len(s_data) // 2]
            # point coord
            self.point = point[:-1]
            # point index
            self.idx = int(point[-1])
            # all nodes below this node
            self.children = s_data[np.all(s_data[:, :-1] != self.point, axis=1)]
            # branches
            self.left = KDTree(s_data[: len(s_data) // 2], depth + 1, False)
            self.right = KDTree(s_data[len(s_data) // 2 + 1:], depth + 1, False)
        else:
            # empty node
            self.axis = 0
            self.point = self.idx = self.left = self.right = None
            self.children = np.array([])

    def query(self, point, best=None):
        if self.point is None:
            return best
        if best is None:
            best = (self.idx, self.point)
        # check if current node is closer than best
        if distance(self.point, point) < distance(best[1], point):
            best = (self.idx, self.point)
        # continue traversing the tree
        best = self.near_tree(point).query(point, best)
        # traverse the away branch if the orthogonal distance is less than best
        if self.orthongonal_dist(point) < distance(best[1], point):
            best = self.away_tree(point).query(point, best)
        return best

    def orthongonal_dist(self, point):
        # distance from the query point to the splitting hyperplane
        orth_point = np.copy(point)
        orth_point[self.axis] = self.point[self.axis]
        return distance(point, orth_point)

    def near_tree(self, point):
        if point[self.axis] < self.point[self.axis]:
            return self.left
        return self.right

    def away_tree(self, point):
        if self.near_tree(point) == self.left:
            return self.right
        return self.left
[EDIT] Updated attempt; however, this doesn't guarantee that a result is returned.
def query_subset(self, point, subset, best=None):
    # if point in subset, update best
    if self.idx in subset:
        # if closer than current best, or best is None, update
        if best is None or distance(self.point, point) < distance(best[1], point):
            best = (self.idx, self.point)
    # dead end: backtrack up the tree
    if self.point is None:
        return best
    near = self.near_tree(point)
    far = self.away_tree(point)
    # which nodes are in the near branch
    if near.children.size > 1:
        near_set = set(np.append(near.children[:, -1], near.idx))
    else:
        near_set = {near.idx}
    # check the near branch if its nodes intersect with the queried subset,
    # otherwise move to the away branch
    if any(x in near_set for x in subset):
        best = near.query_subset(point, subset, best)
    else:
        best = far.query_subset(point, subset, best)
    # validate best, by ensuring a closer point doesn't exist just beyond the partition
    if best is not None:
        if self.orthongonal_dist(point) < distance(best[1], point):
            best = far.query_subset(point, subset, best)
    return best
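For comparison, a much simpler (if cruder) alternative is to rebuild a small cKDTree over only the subset's coordinates and run an ordinary query against it. This is only a sketch assuming scipy is available and that rebuilding per subset is acceptable; the names points and subset_idx below are illustrative, not taken from the code above:
import numpy as np
from scipy.spatial import cKDTree

def foreign_nearest(points, subset_idx, b):
    """Nearest neighbor of query point b among points[subset_idx] only."""
    subset_idx = np.fromiter(subset_idx, dtype=int)
    sub_tree = cKDTree(points[subset_idx])   # build over the subset only
    dist, local = sub_tree.query(b)          # ordinary NN query inside the subset
    return subset_idx[local], dist           # map the local index back to the original one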

Depth of a tree using DFS

I'm trying to write code that returns the depth of the deepest leaf in a tree with an arbitrary number of children per node, in Python, using DFS rather than BFS. It seems I'm close, but the following code still has some bug that I can't figure out (i.e. the returned depth is not correct). Any help?
A test tree would simply be: [[1,2,3],[4,5],[6],[7],[8],[],[],[],[]], where tree[i] lists the children of node i.
def max_depth_dfs(tree):  # DOESN'T WORK
    max_depth, curr_depth, Q = 0, 0, [0]
    visited = set()
    while Q != []:
        n = Q[0]
        more = [v for v in tree[n] if v not in visited]
        if not more:
            visited.add(n)
            curr_depth -= 1
            Q = Q[1:]
        else:
            curr_depth += 1
            max_depth = max(max_depth, curr_depth)
            Q = more + Q
    return max_depth
I found the bug!
if not more:
    visited.add(n)
    curr_depth -= 1
    Q = Q[1:]
When you visit node 4, curr_depth is equal to 2. Node 4 has no children, so you decrease curr_depth, and curr_depth is now equal to 1. However, the next node you visit is node 5, and the depth of node 5 is 2, not 1. Therefore, curr_depth doesn't record the correct depth of the node in the tree.
The following solution may be helpful.
def max_depth_dfs(tree):
    max_depth, curr_depth, Q = 0, 0, [0]
    visited = set()
    while Q != []:
        n = Q[0]
        max_depth = max(max_depth, curr_depth)
        if n in visited:
            curr_depth -= 1
            Q = Q[1:]
            continue
        # print(n, curr_depth)  # show the node and its depth in the tree
        visited.add(n)
        more = [v for v in tree[n]]
        if not more:
            Q = Q[1:]
        else:
            curr_depth += 1
            Q = more + Q
    return max_depth
I used try/except to distinguish branches from leaves. Update: no more exceptions :)
from collections.abc import Iterable

tree = [[1,2,3],[4,5, [1, 6]],[6],[7],[8],[],[],[],[]]

def max_depth(tree, level=0):
    if isinstance(tree, Iterable):
        # default=level keeps an empty branch from crashing max()
        return max((max_depth(item, level+1) for item in tree), default=level)
    else:  # leaf
        return level

print(max_depth(tree))
Here is the non-recursion version:
from collections.abc import Iterable

def max_depth_no_recur(tree):
    max_depth, node = 0, iter(tree)
    stack = [node]
    while stack:
        try:
            n = next(node)
        except StopIteration:
            # this branch is exhausted; the stack height is its depth
            if len(stack) > max_depth:
                max_depth = len(stack)
            node = stack.pop()
            continue
        if isinstance(n, Iterable):
            stack.append(node)
            node = iter(n)
    return max_depth
After taking into account all the good feedback I got from Alex and Adonis and refining the code, I currently have this version:
def max_depth_dfs(tree):  # correct
    max_depth, curr_depth, Q = 0, 0, [0]
    visited = set()
    while Q != []:
        n = Q[0]
        if n in visited:
            Q = Q[1:]
            curr_depth -= 1
            visited.remove(n)  # won't go back, save memory
            print('backtrack from', n)
            continue
        # proper place to print depth in sync with node id
        print('visiting', n, 'children=', tree[n], 'curr_depth=', curr_depth, 'Q=', Q,
              visited)  # `visited` holds only the current path, not the visited part of the tree
        if tree[n]:
            visited.add(n)  # if leaf, won't ever try to revisit
            Q = tree[n] + Q
            curr_depth += 1
            max_depth = max(max_depth, curr_depth)  # no need to check if depth decreases
        else:
            Q = Q[1:]  # leaf: won't revisit, will go to peer, if any, so don't change depth
            print('no children for', n)
    return max_depth
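A quick check with the test tree from the question (the prints show the traversal; the returned value counts edges from the root):
tree = [[1, 2, 3], [4, 5], [6], [7], [8], [], [], [], []]
print(max_depth_dfs(tree))  # 3  (deepest path: 0 -> 1 -> 4 -> 8)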
