How to find the kth smallest node in BST? (revisited) - python

I asked a similar question yesterday, but I have since reached a different solution from the one posted in the original question, so I am reposting with new code. I am now keeping track of the number of left and right children of each node. The code works fine for some cases, but it fails when finding the 6th smallest element. The problem is that I somehow need to carry the number of children down the tree. For example, for node 5, I need to carry over the rank of node 4, and I am not able to do that.
This is not homework. I am preparing for an interview, this is one of the classic questions, and I can't solve it.
class Node:
    """docstring for Node"""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.numLeftChildren = 0
        self.numRightChildren = 0
class BSTree:
    def __init__(self):
        # initializes the root member
        self.root = None
    def addNode(self, data):
        # creates a new node and returns it
        return Node(data)
    def insert(self, root, data):
        # inserts a new data
        if root == None:
            # if there isn't any data
            # adds it and returns
            return self.addNode(data)
        else:
            # enters into the tree
            if data <= root.data:
                root.numLeftChildren += 1
                # if the data is less than the stored one
                # goes into the left-sub-tree
                root.left = self.insert(root.left, data)
            else:
                # processes the right-sub-tree
                root.numRightChildren += 1
                root.right = self.insert(root.right, data)
            return root
    def getRankOfNumber(self, root, x, rank):
        if root == None:
            return 0
        if rank == x:
            return root.data
        else:
            if x > rank:
                return self.getRankOfNumber(root.right, x, rank+1+root.right.numLeftChildren)
            if x <= rank:
                return self.getRankOfNumber(root.left, x, root.left.numLeftChildren+1)
# main
btree = BSTree()
root = btree.addNode(13)
btree.insert(root, 3)
btree.insert(root, 14)
btree.insert(root, 1)
btree.insert(root, 4)
btree.insert(root, 18)
btree.insert(root, 2)
btree.insert(root, 12)
btree.insert(root, 10)
btree.insert(root, 5)
btree.insert(root, 11)
btree.insert(root, 8)
btree.insert(root, 7)
btree.insert(root, 9)
btree.insert(root, 6)
print(btree.getRankOfNumber(root, 8, rank=root.numLeftChildren+1))

You have the rank of a node. You need to find the rank of its left or right child. Well, how many nodes are between the node and its child?
      a
     / \
    /   \
   b     c
  / \   / \
 W   X Y   Z
Here's an example BST. Lowercase letters are nodes; uppercase are subtrees. The number of nodes between a and b is the number of nodes in X. The number of nodes between a and c is the number of nodes in Y. Thus, you can compute the rank of b or c from the rank of a and the size of X or Y.
rank(b) == rank(a) - size(X) - 1
rank(c) == rank(a) + size(Y) + 1
You had the c formula, but the wrong b formula.
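The two rank formulas can be turned into a lookup as in the minimal sketch below. It is not the code from the question: it assumes each node stores a single `size` field (the number of nodes in its subtree) instead of `numLeftChildren` and `numRightChildren`, and the names `size` and `kth_smallest` are made up for illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.size = 1  # assumed field: number of nodes in the subtree rooted here


def insert(root, data):
    """Insert data into the BST and return the (possibly new) subtree root."""
    if root is None:
        return Node(data)
    root.size += 1
    if data <= root.data:
        root.left = insert(root.left, data)
    else:
        root.right = insert(root.right, data)
    return root


def size(node):
    """Size of a subtree, treating a missing child as an empty subtree."""
    return node.size if node is not None else 0


def kth_smallest(root, k):
    """Return the k-th smallest value (1-based), or None if k is out of range."""
    if root is None or not 1 <= k <= root.size:
        return None
    node = root
    rank = size(node.left) + 1  # rank of the root
    while rank != k:
        if k < rank:
            node = node.left
            rank = rank - size(node.right) - 1  # rank(b) == rank(a) - size(X) - 1
        else:
            node = node.right
            rank = rank + size(node.left) + 1   # rank(c) == rank(a) + size(Y) + 1
    return node.data


values = [13, 3, 14, 1, 4, 18, 2, 12, 10, 5, 11, 8, 7, 9, 6]
root = None
for v in values:
    root = insert(root, v)
print(kth_smallest(root, 6))  # 6
```

Because the rank is carried down the tree and adjusted at every step, going left from a node deep in a right subtree still yields the correct global rank.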


Finding a way through a matrix using Python

I'm trying to solve a problem for my university homework. The task is to find the cheapest path through an NxN matrix where every point in the matrix stores a random integer between 0 and 9. The start is at 0,0 and the end is at N,N. The output should consist of the cheapest path as a list of tuples and the cost of the path (the sum of the values of each step).
I have tried using a tree where 0,0 is the root, its children are its neighbours in the matrix, the children of those children are their neighbours, and so on. Then I wanted to add up the nodes along every path that ends with N,N as the last child, but I didn't get the tree working in the first place. We haven't covered trees in our lectures yet, so I'm open to any other solution for this problem. Thank you :)
import random
import math
def Matrix_gen(n):
    # Generate a n*n matrix with random values
    matrix = []
    for i in range(n):
        matrix.append([])
        for j in range(n):
            matrix[i].append(random.randint(0, 9))
    return matrix
MATRIX = Matrix_gen(5)
def get_neighbour(i, j, matrix,):
    neighbours = []
    n = len(matrix) - 1
    for x in range(len(matrix)-1):
        for y in range(len(matrix)-1):
            if x != n:
                if matrix[x+1][y] == matrix[i][j]:
                    neighbours.append((x + 1, y))
            if x != 0:
                if matrix[x-1][y] == matrix[i][j]:
                    neighbours.append((x - 1, y))
            if y != n:
                if matrix[x][y + 1] == matrix[i][j]:
                    neighbours.append((x, y + 1))
            if y != 0:
                if matrix[x][y - 1] == matrix[i][j]:
                    neighbours.append((x, y - 1))
    if matrix[i][j] == matrix[n][n]:
        return []
    return neighbours
#create a class that stores a Tree
class Tree:
    def __init__(self, value, Children = []):
        self.value = value
        self.Children = Children
    #the root of the tree is the first element of the matrix
    def root(self):
        #add (0,0) as the value of the root
        self.value = (0,0)
        return self.value
    #add the neighbours of the root as the children of the root
    def add_children(self, matrix):
        #add the neighbours of the lowest node as the children of the lowest node until
        #a node has no neighbours
        while get_neighbour(self.value[0], self.value[1], matrix) != []:
            self.Children.append(get_neighbour(self.value[0], self.value[1], matrix))
            self.value = self.Children[-1]
        return self.Children
    #print the tree
    def print_tree(self):
        print(self.value)
        for i in self.Children:
            print(i)
        return
#Create the tree in the Class Tree
Tree = Tree((0,0))
Tree.add_children(MATRIX)
Tree.print_tree()
Please read the open letter to students before copying and pasting any of this. Seek help from your tutor if things are unclear.
Disclaimer: Because this is homework, this is (intentionally) not a complete answer. It works under the assumption that we are NOT allowed to move diagonally, and it only generates paths that go down and right. Allowing diagonal movements adds complexity to the path generation and is left as an exercise (the needed flexibility is there).
The code takes longer and longer as N grows, because of how the problem is defined. See combinations of paths on a grid, and the benchmark below.
I tried to keep the code readable and understandable. There are more compact and probably better-optimized ways to do this (happy to take comments, provided readability is maintained).
Let's start with a set of functions.
from itertools import permutations
import numpy as np
DOWN = 'D'
RIGHT = 'R'
def random_int_matrix(size: int) -> np.ndarray:
    """Generates a size x size matrix with random integers from 0 to 9"""
    mat = np.random.random((size, size)) * 10
    return mat.astype(int)
def find_all_paths(size: int) -> list:
    """Creates all possible paths going down and right"""
    return [gen_path(perm) for perm in permutations([DOWN] * (size-1) + [RIGHT] * (size-1))]
def gen_path(permutation: tuple) -> list:
    """Converts a sequence of DOWN/RIGHT moves into a list of (row, col) cells"""
    track = [(0, 0)]
    for entry in permutation:
        if entry == DOWN:
            track.append((track[-1][0] + 1, track[-1][1]))
        else:
            track.append((track[-1][0], track[-1][1] + 1))
    return track
def sum_track_values(mat: np.ndarray, track: list) -> int:
    """Computes the value sum for the given path"""
    return sum([mat[e[0], e[1]] for e in track])
OK, now we can run the program:
MATRIX_SIZE = 4
matrix = random_int_matrix(MATRIX_SIZE)
print('Randomly generated matrix:\n', matrix)
paths = find_all_paths(MATRIX_SIZE)
costs = np.array([sum_track_values(matrix, p) for p in paths])
min_idx = costs.argmin()
print('Best path:', paths[min_idx])
print('Costs:', costs[min_idx])
In my case the result was
Randomly generated matrix:
[[3 8 6 6]
[2 4 1 4]
[7 4 0 4]
[9 6 8 4]]
Best path: [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
Costs: 18
Small benchmark:
Runtime for N=1: 0.0000 sec (1 possible paths)
Runtime for N=2: 0.0000 sec (2 possible paths)
Runtime for N=3: 0.0001 sec (24 possible paths)
Runtime for N=4: 0.0016 sec (720 possible paths)
Runtime for N=5: 0.1344 sec (40,320 possible paths)
Runtime for N=6: 19.9810 sec (3,628,800 possible paths)
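The path counts in the benchmark are (2(N-1))!, because permutations() treats the repeated 'D' and 'R' entries as distinct and so yields many identical move sequences. As a hedged sketch (not part of the answer's code), the distinct sequences can be enumerated by choosing which steps are DOWN moves. The helper name `distinct_move_sequences` is made up for illustration.

```python
from itertools import combinations
from math import comb

DOWN = 'D'
RIGHT = 'R'


def distinct_move_sequences(size):
    """Yield each distinct DOWN/RIGHT move sequence for a size x size grid once."""
    steps = 2 * (size - 1)
    # Choose which of the steps are DOWN moves; all remaining steps are RIGHT moves.
    for down_positions in combinations(range(steps), size - 1):
        down = set(down_positions)
        yield tuple(DOWN if i in down else RIGHT for i in range(steps))


for n in range(1, 7):
    count = sum(1 for _ in distinct_move_sequences(n))
    # The number of distinct down/right paths is the binomial coefficient C(2(n-1), n-1).
    assert count == comb(2 * (n - 1), n - 1)
    print(n, count)
```

For N=6 this gives 252 distinct sequences instead of 3,628,800 permutations. Each sequence can be passed to gen_path unchanged, since that function only iterates over the moves.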

Height of BST +1 more than expected

I wrote a function that determines the height of a BST, but when the height of the tree is e.g. 2, the result I get is 3, and so on. I don't know what I should change in my code. If you need the whole code to answer, tell me and I'll post it.
def maxDepth(self, node):
    if node is None:
        return 0
    else:
        # Compute the depth of each subtree
        lDepth = self.maxDepth(node.left)
        rDepth = self.maxDepth(node.right)
        # Use the larger one
        if (lDepth > rDepth):
            return lDepth + 1
        else:
            return rDepth + 1
Return -1 instead of 0 for an empty subtree. Every result then becomes smaller by 1, which gives the height you expect. The corrected code is below:
def maxDepth(self, node):
    if node is None:
        return -1
    else:
        # Compute the depth of each subtree
        lDepth = self.maxDepth(node.left)
        rDepth = self.maxDepth(node.right)
        # Use the larger one
        if (lDepth > rDepth):
            return lDepth + 1
        else:
            return rDepth + 1
You can also use the built-in max() function to make the code shorter:
def maxDepth(self, node):
    if node is None:
        return -1
    return max(self.maxDepth(node.left), self.maxDepth(node.right)) + 1
Note: OP is correct. Height should be edge-based, i.e. a tree with a single node 5 should have a height of 0, and an empty tree (None) has a height of -1. There are two sources for this:
The Wikipedia article on trees says that height is edge-based and that, conventionally, an empty tree (a tree with no nodes, if such are allowed) has height −1.
The other is the well-known book Introduction to Algorithms by Cormen et al.
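To make the edge-based convention concrete, here is a minimal standalone sketch. The `Node` class and the free function `height` are assumptions for illustration, not the OP's classes.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Edge-based height: -1 for an empty tree, 0 for a single node."""
    if node is None:
        return -1
    return max(height(node.left), height(node.right)) + 1


single = Node(5)
chain = Node(5, left=Node(3, left=Node(1)))  # two edges from root to leaf

print(height(None))    # -1
print(height(single))  # 0
print(height(chain))   # 2
```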

distance between two nodes using breadth first search algorithm using Python

How can I get just the distance (number of edges) between any two nodes of a graph using the BFS algorithm?
To reduce the runtime, I do not want to store the path information as a list (as the code below does).
from collections import deque
def check_distance(self, start, end, max_distance):
    # each queue entry is a path, i.e. a list of nodes beginning at start
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return len(path)
        elif len(path) > max_distance:
            return False
        else:
            for adjacent in self.graph.get(node, []):
                queue.append(list(path) + [adjacent])
You can increase performance with two changes:
As you said, replace paths with distances. This will save memory, more so when the distances are large.
Maintain a set of already seen nodes. This will drastically cut the number of possible paths, especially when there are multiple edges per node. If you don't do this, then the algorithm will walk in circles and back-and-forth between nodes.
I would try something like this:
from collections import deque
class Foo:
    def __init__(self, graph):
        self.graph = graph
    def check_distance(self, start, end, max_distance):
        queue = deque([(start, 0)])
        seen = set()
        while queue:
            node, distance = queue.popleft()
            if node in seen or max_distance < distance:
                continue
            seen.add(node)
            if node == end:
                return distance
            for adjacent in self.graph.get(node, []):
                queue.append((adjacent, distance + 1))
graph = {}
graph[1] = [2, 3]
graph[2] = [4]
graph[4] = [5]
foo = Foo(graph)
assert foo.check_distance(1, 2, 10) == 1
assert foo.check_distance(1, 3, 10) == 1
assert foo.check_distance(1, 4, 10) == 2
assert foo.check_distance(1, 5, 10) == 3
assert foo.check_distance(2, 2, 10) == 0
assert foo.check_distance(2, 1, 10) == None
assert foo.check_distance(2, 4, 10) == 1

How to implement max_heapify in heap datastructure in Python

The following is a heap class. I am trying to sort the heap, but I have a problem with my max_heapify function. I inserted the values [10, 9, 7, 6, 5, 4, 3] and my heap sort prints the output shown below the class, followed by the output I expected.
Heap class
class Heap(object):
    def __init__(self):
        self.A = []
    def insert(self, x):
        self.A.append(x)
    def Max(self):
        """
        returns the largest value in an array
        """
        return max(self.A)
    def extractMax(self):
        """
        returns and remove the largest value from an array
        """
        x = max(self.A)
        self.A.remove(x)
        self.max_heapify(0)
        return x;
    def parent(self, i):
        """
        returns the parent index
        """
        i+=1
        i = int(i/2)
        return i
    def left(self, i):
        """
        returns the index of left child
        """
        i = i+1
        i = 2*i
        return i
    def right(self, i):
        """
        returns the index of right child
        """
        i+=1;
        i = 2*i + 1
        return i
    def heap_size(self):
        """
        returns the size of heap
        """
        return len(self.A)
    def max_heapify(self, i):
        """
        heapify the array
        """
        l = self.left(i)
        r = self.right(i)
        if(l < self.heap_size() and self.A[l] > self.A[i]):
            largest = l
        else:
            largest = i
        if(r < self.heap_size() and self.A[r] > self.A[largest]):
            largest = r
        if largest != i:
            temp = self.A[i]
            self.A[i] = self.A[largest]
            self.A[largest] = temp
            self.max_heapify(largest)
    def build_max_heap(self):
        n = len(self.A)
        n = int(n/2)
        for i in range(n, -1, -1):
            self.max_heapify(i)
    def heap_sort(self):
        """
        sorts the heap
        """
        while self.heap_size() > 0:
            self.build_max_heap()
            temp = self.A[0]
            n = len(self.A) - 1
            self.A[0] = self.A[n]
            self.A[n] = temp
            x = self.A.pop()
            print(x)
            self.max_heapify(0)
h = Heap()
h.insert(10)
h.insert(9)
h.insert(7)
h.insert(6)
h.insert(5)
h.insert(4)
h.insert(3)
h.heap_sort()
given output
10
7
6
5
4
3
9
expected output
10
9
7
6
5
4
3
It looks like you're trying to build a max-heap with the root at A[0]. If that's correct, then your left, right, and parent index calculations are not correct. You have:
def parent(self, i):
    """
    returns the parent index
    """
    i+=1
    i = int(i/2)
    return i
def left(self, i):
    """
    returns the index of left child
    """
    i = i+1
    i = 2*i
    return i
def right(self, i):
    """
    returns the index of right child
    """
    i+=1;
    i = 2*i + 1
    return i
So if i=0, the left child would be 2, and the right child would be 3. Worse, given i=3, parent will return 2. So you have the case where parent(right(i)) != i. That's never going to work.
The correct calculations are:
left = (2*i)+1
right = (2*i)+2
parent = (i-1)//2  # integer division
I don't know why your extractMax is calling max(self.A). You already know that the maximum element is at A[0]. To extract the maximum item, all you need to do is:
returnValue = save value at self.A[0]
take last item in the array and place at self.A[0]
decrease length of array
maxHeapify(0)
I've used pseudo-code because I'm not particularly comfortable with Python.
The loop inside your heap_sort method is far from optimal. You're calling self.build_max_heap on every iteration, which you don't need to do. If extractMax is working correctly, all you have to do is:
while self.heap_size() > 0:
    temp = self.extractMax()
    print(temp)
Now, if you want to sort the array in-place, so that self.A itself is sorted, that's a bit more tricky. But it doesn't look like that's what you're trying to do.
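Putting the corrected index calculations and the extract-max steps together, a Python version might look like the sketch below. It is a rewrite under assumptions, not the OP's class: the name `MaxHeap`, the constructor that heapifies an iterable, and the iterative sift-down are all choices made for this example.

```python
class MaxHeap:
    def __init__(self, items=()):
        self.A = list(items)
        # Build the heap bottom-up, starting from the last node that has a child.
        for i in range(len(self.A) // 2 - 1, -1, -1):
            self.max_heapify(i)

    @staticmethod
    def left(i):
        return 2 * i + 1

    @staticmethod
    def right(i):
        return 2 * i + 2

    @staticmethod
    def parent(i):
        return (i - 1) // 2

    def max_heapify(self, i):
        """Sift A[i] down until the subtree rooted at i is a max-heap."""
        n = len(self.A)
        while True:
            l, r, largest = self.left(i), self.right(i), i
            if l < n and self.A[l] > self.A[largest]:
                largest = l
            if r < n and self.A[r] > self.A[largest]:
                largest = r
            if largest == i:
                return
            self.A[i], self.A[largest] = self.A[largest], self.A[i]
            i = largest

    def extract_max(self):
        """Remove and return the largest item, which is the root at A[0]."""
        top = self.A[0]
        last = self.A.pop()  # take the last item and shrink the array
        if self.A:
            self.A[0] = last  # move it to the root, then restore the heap
            self.max_heapify(0)
        return top


heap = MaxHeap([3, 10, 5, 9, 4, 7, 6])
result = []
while heap.A:
    result.append(heap.extract_max())
print(result)  # [10, 9, 7, 6, 5, 4, 3]
```

With these formulas parent(left(i)) and parent(right(i)) both give back i, which is the property the original index functions were missing.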

How can I restrict a KDTree query to a subset of the nodes?

tl;dr
I need a way to find "Foreign Nearest Neighbors" using a KDTree or some other spatial data structure, i.e. find the nearest neighbor within a subset of the tree.
I built an MST algorithm that uses a KDTree to find nearest neighbors. Eventually, however, it needs to look beyond the nearest neighbors to "Nearest Foreign Neighbors" in order to connect distant nodes. My first approach simply increases the k-nn parameter iteratively until the query returns a node in the subset. I cache k because the breadth of the search expands each time the function is called, so there is no point in searching again with k < k_cache.
def FNNd(kdtree, A, b):
    """
    kdtree -> nodes in subnet -> coord of b -> index of a
    returns nearest foreign neighbor a∈A of b
    """
    a = None
    b = cartesian_projection(b)
    k = k_cache[str(b)] if str(b) in k_cache else 2
    while a not in A:
        #scipy kdtree where query -> [dist], [idx]
        _, nn = kdtree.query(b, k=k)
        a = nn[-1][k-1]
        k += 1
    k_cache[str(b)] = k-1
    #return NN a ∈ A of b
    return a
However, this is quite 'hacky' and inefficient, so I was thinking I could implement a KDTree myself that stops traversing when doing so would lead into a subtree that doesn't include any node of the restricted subset. The nearest neighbor in the subset would then have to be in the other (left or right) branch. After many attempts I can't seem to get this to work. Is there a flaw in my logic? Is there a better way to do this, or a better data structure?
Here's my KDTree:
class KDTree(object):
    def __init__(self, data, depth=0, make_idx=True):
        self.n, self.k = data.shape
        if make_idx:
            # index the data
            data = np.column_stack((data, np.arange(self.n)))
        else:
            # subtract the indexed dimension in later calls
            self.k -= 1
        self.build(data, depth)
    def build(self, data, depth):
        if data.size > 0:
            # get the axis to pivot on
            self.axis = depth % self.k
            # sort the data
            s_data = data[np.argsort(data[:, self.axis])]
            # find the pivot point
            point = s_data[len(s_data) // 2]
            # point coord
            self.point = point[:-1]
            # point index
            self.idx = int(point[-1])
            # all nodes below this node
            self.children = s_data[np.all(s_data[:, :-1] != self.point, axis=1)]
            # branches
            self.left = KDTree(s_data[: len(s_data) // 2], depth+1, False)
            self.right = KDTree(s_data[len(s_data) // 2 + 1: ], depth+1, False)
        else:
            # empty node
            self.axis=0
            self.point = self.idx = self.left = self.right = None
            self.children = np.array([])
    def query(self, point, best=None):
        if self.point is None:
            return best
        if best is None:
            best = (self.idx, self.point)
        # check if current node is closer than best
        if distance(self.point, point) < distance(best[1], point):
            best = (self.idx, self.point)
        # continue traversing the tree
        best = self.near_tree(point).query(point, best)
        # traverse the away branch if the orthogonal distance is less than best
        if self.orthongonal_dist(point) < distance(best[1], point):
            best = self.away_tree(point).query(point, best)
        return best
    def orthongonal_dist(self, point):
        orth_point = np.copy(point)
        orth_point[self.axis] = self.point[self.axis]
        return distance(point, self.point)
    def near_tree(self, point):
        if point[self.axis] < self.point[self.axis]:
            return self.left
        return self.right
    def away_tree(self, point):
        if self.near_tree(point) == self.left:
            return self.right
        return self.left
[EDIT] Updated attempt, however this doesn't guarantee a return
def query_subset(self, point, subset, best=None):
    # if point in subset, update best
    if self.idx in subset:
        # if closer than current best, or best is none update
        if best is None or distance(self.point, point) < distance(best[1], point):
            best = (self.idx, self.point)
    # Dead end backtrack up the tree
    if self.point is None:
        return best
    near = self.near_tree(point)
    far = self.away_tree(point)
    # what nodes are in the near branch
    if near.children.size > 1:
        near_set = set(np.append(near.children[:, -1], near.idx))
    else: near_set = {near.idx}
    # check the near branch, if its nodes intersect with the queried subset
    # otherwise move to the away branch
    if any(x in near_set for x in subset):
        best = near.query_subset(point, subset, best)
    else:
        best = far.query_subset(point, subset, best)
    # validate best, by ensuring closer point doesn't exist just beyond partition
    if best is not None:
        if self.orthongonal_dist(point) < distance(best[1], point):
            best = far.query_subset(point, subset, best)
    return best
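One possible way to get a guaranteed result, sketched here under assumptions rather than as a fix to the class above: walk the tree as in an ordinary nearest-neighbor query, but only let nodes whose index is in the subset become the best candidate, and start with an infinite best distance so the far branch is always explored until a candidate exists. The plain-tuple `Node`, `build` and `query_subset` below are hypothetical stand-ins written in pure Python, not the NumPy-based KDTree from the question. Separately, note that orthongonal_dist above builds orth_point but returns distance(point, self.point), so the pruning test there does not use the distance to the splitting plane.

```python
import math


class Node:
    def __init__(self, point, idx, axis, left, right):
        self.point, self.idx, self.axis = point, idx, axis
        self.left, self.right = left, right


def build(points, depth=0, indexed=None):
    """Build a k-d tree from a list of coordinate tuples; idx is the list position."""
    if indexed is None:
        indexed = list(enumerate(points))
    if not indexed:
        return None
    axis = depth % len(indexed[0][1])
    indexed = sorted(indexed, key=lambda item: item[1][axis])
    mid = len(indexed) // 2
    idx, point = indexed[mid]
    return Node(point, idx, axis,
                build(points, depth + 1, indexed[:mid]),
                build(points, depth + 1, indexed[mid + 1:]))


def query_subset(node, target, subset, best=(math.inf, None)):
    """Return (distance, idx) of the nearest point whose idx is in subset."""
    if node is None:
        return best
    # Only nodes in the subset may become the best candidate.
    if node.idx in subset:
        d = math.dist(node.point, target)
        if d < best[0]:
            best = (d, node.idx)
    # Signed distance from the target to this node's splitting plane.
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = query_subset(near, target, subset, best)
    # Cross the splitting plane only if a closer allowed point could lie beyond it.
    if abs(diff) < best[0]:
        best = query_subset(far, target, subset, best)
    return best


points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
tree = build(points)
# Index 1 is the nearest point overall, but it is not in the subset, so index 0 wins.
print(query_subset(tree, (0.9, 0.9), subset={0, 3}))
```

If the subset is empty, the function returns (math.inf, None). This sketch does not skip subtrees that contain no subset members; that pruning would need per-node membership information and is left out to keep the example short.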
