Calculate height of an arbitrary (non-binary) tree - python

I'm currently taking an online data structures course and this is one of the homework assignments; please guide me towards the answer rather than giving me the answer.
The prompt is as follows:
Task. You are given a description of a rooted tree. Your task is to compute and output its height. Recall that the height of a (rooted) tree is the maximum depth of a node, or the maximum distance from a leaf to the root. You are given an arbitrary tree, not necessarily a binary tree.
Input Format. The first line contains the number of nodes n. The second line contains integer numbers from −1 to n−1 parents of nodes. If the i-th one of them (0 ≤ i ≤ n−1) is −1, node i is the root, otherwise it’s 0-based index of the parent of i-th node. It is guaranteed that there is exactly one root. It is guaranteed that the input represents a tree.
Constraints. 1 ≤ n ≤ 10^5.
My current solution works, but is very slow when n > 10^2. Here is my code:
# python3
import sys
import threading
# In Python, the default limit on recursion depth is rather low,
# so raise it here for this problem. Note that to take advantage
# of bigger stack, we have to launch the computation in a new thread.
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
# returns all indices of item in seq
def listOfDupes(seq, item):
    start = -1
    locs = []
    while True:
        try:
            loc = seq.index(item, start+1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start = loc
    return locs
def compute_height(node, parents):
    if node not in parents:
        return 1
    else:
        return 1 + max(compute_height(i, parents) for i in listOfDupes(parents, node))
def main():
    n = int(input())
    parents = list(map(int, input().split()))
    print(compute_height(parents.index(-1), parents))
# start the thread only after main has been defined
threading.Thread(target=main).start()
Example input:
>>> 5
>>> 4 -1 4 1 1
This will yield a solution of 3, because the root is 1, 3 and 4 branch off of 1, then 0 and 2 branch off of 4 which gives this tree a height of 3.
How can I improve this code to get it under the time benchmark of 3 seconds? Also, would this have been easier in another language?

Python will be fine as long as you get the algorithm right. Since you're only looking for guidance, consider:
1) We know the depth of a node if and only if the depth of its parent is known; and
2) We're not interested in the tree's structure, so we can throw irrelevant information away.
The root node pointer has the value -1. Suppose that we replaced its children's pointers to the root node with the value -2, their children's pointers with -3, and so forth. The greatest absolute value of these is the height of the tree.
If we traverse the tree from an arbitrary node N(0) we can stop as soon as we encounter a negative value at node N(k), at which point we can replace each node with the value of its parent, less one. I.e., N(k-1) = N(k) - 1, N(k-2) = N(k-1) - 1, ..., N(0) = N(1) - 1. As more and more pointers are replaced by their depth, each traversal is more likely to terminate by encountering a node whose depth is already known. In fact, every pointer is followed only once before it is overwritten with a depth, so the algorithm runs in linear time overall.
So: load your data into an array, start with the first element and traverse the pointers until you encounter a negative value. Build another array of the nodes traversed as you go. When you encounter a negative value, use the second array to replace the original values in the first array with their depth. Do the same with the second element and so forth. Keep track of the greatest depth you've encountered: that's your answer.
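A minimal sketch of the steps above, assuming the parent list has already been parsed into a Python list of ints. The name tree_height is illustrative and not part of the assignment; depths are stored as negative numbers in a working copy, as the answer describes.

```python
def tree_height(parents):
    """Height of a tree given as a parent array (-1 marks the root)."""
    depth = list(parents)  # working copy; a negative entry means "depth known"
    best = 1
    for start in range(len(depth)):
        path = []  # nodes visited on this traversal whose depth is still unknown
        node = start
        while depth[node] >= 0:  # still a parent pointer: keep climbing
            path.append(node)
            node = depth[node]
        known = depth[node]  # negative value: -(depth of the node we stopped at)
        for visited in reversed(path):  # walk back down, one level deeper each step
            known -= 1
            depth[visited] = known
        best = max(best, -known)
    return best


print(tree_height([4, -1, 4, 1, 1]))  # the example input from the question
```

Because a pointer is overwritten as soon as it has been followed, no part of the tree is walked twice.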

The structure of this question looks like it would be better solved bottom up rather than top down. Your top-down approach spends time seeking, which is unnecessary, e.g.:
def height(tree):
    for n in tree:
        l = 1
        while n != -1:
            l += 1
            n = tree[n]
        yield l
In []:
tree = '4 -1 4 1 1'
max(height(list(map(int, tree.split()))))
Out[]:
3
Or if you don't like a generator:
def height(tree):
    d = [1]*len(tree)
    for i, n in enumerate(tree):
        while n != -1:
            d[i] += 1
            n = tree[n]
    return max(d)
In []:
tree = '4 -1 4 1 1'
height(list(map(int, tree.split())))
Out[]:
3
The above is brute force (O(n*h) for a tree of height h), as it doesn't reuse the parts of the tree you've already visited; it shouldn't be too hard to add that.

Your algorithm spends a lot of time searching the input for the locations of numbers. If you just iterate over the input once, you can record the locations of each number as you come across them, so you don't have to keep searching over and over later. Consider what data structure would be effective for recording this information.
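One possible structure for this is a list of child lists, filled in a single pass over the input. The sketch below assumes that representation and then walks the tree level by level instead of recursing; the function name height_via_children is made up for illustration.

```python
def height_via_children(parents):
    """Height of a tree given as a parent array, using per-node child lists."""
    children = [[] for _ in parents]
    root = None
    for node, parent in enumerate(parents):  # single pass over the input
        if parent == -1:
            root = node
        else:
            children[parent].append(node)

    height = 0
    frontier = [root]  # all nodes on the current level
    while frontier:
        height += 1
        frontier = [child for node in frontier for child in children[node]]
    return height


print(height_via_children([4, -1, 4, 1, 1]))
```

Each node is appended to one child list and visited on one level, so the work is linear in n and the recursion limit no longer matters.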

Related

Strictly Increasing Path in Grid with Python

At each step we can go to one of the left, right, up or down cells, but only if that cell is strictly greater than our current cell. (We cannot move diagonally.) We want to find all the paths that go from the top-left cell to the bottom-right cell.
[[1,4,3],
[5,6,7]]
In our example, these paths are 1->4->6->7 and 1->5->6->7.
How can I solve it recursively?
First, note that the number of such paths is exponential (in the size of the graph).
For example, look at the graph:
1 2 3 ... n
2 3 4 ... n+1
3 4 5 ... n+2
...
n (n+1) (n+2) ... 2n
Here every path consists of exactly n-1 moves right and n-1 moves down, in any order, which gives an exponential number of paths. That makes an "efficient" solution irrelevant (as you stated, you are looking for all paths).
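To put a number on that growth: in the grid above the value at row i, column j is i + j + 1, so only right and down moves increase it, and the valid paths are exactly the orderings of n-1 rights and n-1 downs. A small sketch that counts them (the helper name monotone_path_count is illustrative, and math.comb needs Python 3.8 or later):

```python
from math import comb


def monotone_path_count(n):
    """Number of right/down paths across an n x n grid: C(2n-2, n-1)."""
    return comb(2 * n - 2, n - 1)


for n in (2, 5, 10, 20):
    print(n, monotone_path_count(n))
```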
One approach to find all solutions, is using a variation of DFS, where you can go to a neighbor cell if and only if it's strictly higher.
It should look something like:
def Neighbors(grid, current_location):
    """Returns a list of up to 4 tuples neighboring the current location."""
    row, col = current_location
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates
            if 0 <= r < len(grid) and 0 <= c < len(grid[r])]
def PrintPaths(grid, current_path=None, current_location=(0, 0)):
    """Will print all paths with strictly increasing values.
    """
    # A list as default value is problematic, so use None and build it here.
    if current_path is None:
        current_path = [current_location]
    row, col = current_location
    # Stop clause: if we got to the bottom-right cell, print the current path.
    if row == len(grid) - 1 and col == len(grid[row]) - 1:
        print(current_path)
        return
    # Iterate over all valid neighbors:
    for nxt in Neighbors(grid, current_location):
        if grid[row][col] < grid[nxt[0]][nxt[1]]:
            # Pass a new list, so nothing has to be popped when the call returns.
            PrintPaths(grid, current_path + [nxt], nxt)

Recurrence relation and time complexity of finding next larger in Generic tree

Question: Given a generic tree and an integer n. Find and return the node with next larger element in the tree i.e. find a node with value just greater than n.
Although I was able to solve it in O(n) by removing the later for loop and doing the comparisons while calling the recursion, I am a bit curious about the time complexity of the following version of the code.
I came up with the recurrence relation T(n) = T(n-1) + (n-1) = O(n^2), where T(n-1) is the time taken by the children and + (n-1) is for finding the next larger (second for loop). Have I done it right, or am I missing something?
import sys
def nextLargestHelper(root, n):
    """
    root => reference to root node
    n => integer
    Returns node and value of node which is just larger not first larger than n.
    """
    # Special case
    if root is None:
        return None, None
    # list to store all values > n
    largers = list()
    # Induction step
    if root.data > n:
        largers.append([root, root.data])
    # Induction step and Base case; if no children do not call recursion
    for child in root.children:
        # Induction hypothesis; my function returns me node and value just larger than 'n'
        node, value = nextLargestHelper(child, n)
        # If larger found in subtree
        if node:
            largers.append([node, value])
    # Initialize node to none, and value as +infinity
    node = None
    value = sys.maxsize
    # traverse through all larger values and return the smallest value greater than n
    for item in largers: # structure of item is [Node, value]
        # this is why value is initialized to +infinity; so as it is true for first time
        if item[1] < value:
            node = item[0]
            value = item[1]
    return node, value
First of all, please use different characters for the O-notation and the input value.
You "touch" every node exactly once, so the result should be O(n). The one special feature of your algorithm is that it finds the minimum afterwards. You could fold this into your loop over the children for an easier recurrence estimation. As it is, you have to account for the minimum over the list in the recurrence as well.
Your recurrence equation should look more like T(n) = a*T(n/a) + c = O(n), since in each step you have a children, each forming a subtree of size (n-1)/a. In each step you have, next to some constant factors, also the computation of the minimum of a list with at most a + 1 elements (one per child, plus the root itself). You could write it as a*T(n/a) + a*c1 + c2, which is the same as a*T(n/a) + c. The actual formula would look more like this: T(n) = a*T((n-1)/a) + c, but the n-1 makes it harder to apply the master theorem.
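For comparison, here is a sketch of the single-pass variant mentioned in the question, where the minimum is tracked inside the loop over the children so that each node does a constant amount of work per child. The Node class is a hypothetical stand-in, since the question does not show its node type, and the name next_larger is made up for illustration.

```python
class Node:
    """Hypothetical generic-tree node: a value plus a list of children."""

    def __init__(self, data, children=None):
        self.data = data
        self.children = children or []


def next_larger(root, n):
    """Return the node holding the smallest value greater than n, or None."""
    if root is None:
        return None
    best = root if root.data > n else None
    for child in root.children:
        candidate = next_larger(child, n)
        # keep the candidate only if it beats the best value seen so far
        if candidate is not None and (best is None or candidate.data < best.data):
            best = candidate
    return best


tree = Node(10, [Node(5, [Node(21)]), Node(18), Node(12)])
print(next_larger(tree, 10).data)
```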

Count number of leaves in a tree - Failed edge case?

In an online assessment I was asked to count the number of leaves in a tree. The tree is given in parent-array representation, meaning the tree has n nodes with labels 0, 1, 2, .., n-1, and you are passed a length n array p, where p[i] returns the label of the parent of node i, except when i is the root of the tree in which case p[i] is -1.
I guess one thing to note is that the problem was as stated above, so there were no extra conditions such as e.g. it being a binary tree.
I thought this was a fairly straightforward problem, but the code that I submitted failed a "Small Tree Case" on the testing platform (which does not let you see the test cases). It passed the other tests, including a performance test on a large tree. I've thought about it for a while but I still cannot see what the flaw in my algorithm or handling of some edge case is.
def countLeaves(p):
    n = len(p)
    if p is None or n == 0 : return 0
    if n == 1 or n == 2 : return 1
    leaves = set(range(n))
    for i in range(n):
        if p[i] == -1: # i is root of tree with >1 node, can't be a leaf
            leaves.discard(i)
        else: # p[i] is parent of node i, can't be a leaf
            leaves.discard(p[i])
    return len(leaves)
In trying to fix the failed "Small tree case" I also tried returning None if p is None, returning None if n == 0, or both modifications together, but to no success. If anyone could point out what the error in my code may have been I would greatly appreciate it. Thank you.
I would try this:
def countLeaves(p):
    # check for None before calling len(), which would raise on None
    if p is None or len(p) < 2 : return 0
    n = len(p)
    leaves = set(range(n))
    for i in range(n):
        if p[i] == -1: # i is root of tree with >1 node, can't be a leaf
            leaves.discard(i)
        else: # p[i] is parent of node i, can't be a leaf
            leaves.discard(p[i])
    return len(leaves)
The only real change is that it considers trees with a single node to have no leaves.
According to Wolfram MathWorld:
A leaf of an unrooted tree is a node of vertex degree 1. Note that for a rooted or planted tree, the root vertex is generally not considered a leaf node, whereas all other nodes of degree 1 are.
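Since the test platform's convention for a single-node tree is unknown, one way to experiment is to make that choice explicit. The sketch below counts the nodes that never appear as a parent; the function name and the single_node_is_leaf flag are illustrative and not part of the original problem statement.

```python
def count_leaves(parents, single_node_is_leaf=False):
    """Count leaves in a parent-array tree; the lone-root convention is a flag."""
    if not parents:  # covers both None and an empty list
        return 0
    if len(parents) == 1:
        return 1 if single_node_is_leaf else 0
    has_child = set(parents)  # every value that occurs is some node's parent (or -1)
    return sum(1 for node in range(len(parents)) if node not in has_child)


print(count_leaves([4, -1, 4, 1, 1]))
print(count_leaves([-1]), count_leaves([-1], single_node_is_leaf=True))
```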

Simple Genetic Algorithm meeting local optimum for "Hello World"

My target was simple: using a genetic algorithm to reproduce the classic "Hello, World" string.
My code was based on this post. The code mainly contains 4 parts:
1) Generate the population, which has several different individuals.
2) Define the fitness and grade functions, which evaluate how good or bad an individual is by comparing it with the target.
3) Filter the population and keep len(pop)*retain individuals.
4) Add some other individuals and mutate randomly.
The parents' DNA is passed on to their children to make up the whole population.
I modified the code, and it now looks like this:
import numpy as np
import string
from operator import add
from random import random, randint
def population(GENSIZE,target):
    p = []
    for i in range(0,GENSIZE):
        individual = [np.random.choice(list(string.printable[:-5])) for j in range(0,len(target))]
        p.append(individual)
    return p
def fitness(source, target):
    fitval = 0
    for i in range(0,len(source)-1):
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)
def grade(pop, target):
    'Find average fitness for a population.'
    summed = reduce(add, (fitness(x, target) for x in pop))
    return summed / (len(pop) * 1.0)
def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
    graded = [ (fitness(x, target), x) for x in p]
    graded = [ x[1] for x in sorted(graded)]
    retain_length = int(len(graded)*retain)
    parents = graded[:retain_length]
    # randomly add other individuals to
    # promote genetic diversity
    for individual in graded[retain_length:]:
        if random_select > random():
            parents.append(individual)
    # mutate some individuals
    for individual in parents:
        if mutate > random():
            pos_to_mutate = randint(0, len(individual)-1)
            individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
    #
    parents_length = len(parents)
    desired_length = len(pop) - parents_length
    children = []
    while len(children) < desired_length:
        male = randint(0, parents_length-1)
        female = randint(0, parents_length-1)
        if male != female:
            male = parents[male]
            female = parents[female]
            half = len(male) / 2
            child = male[:half] + female[half:]
            children.append(child)
    parents.extend(children)
    return parents
GENSIZE = 40
target = "Hello, World"
p = population(GENSIZE,target)
fitness_history = [grade(p, target),]
for i in xrange(20):
    p = evolve(p, target)
    fitness_history.append(grade(p, target))
    # print p
for datum in fitness_history:
    print datum
But it seems that the result can't fit the target well.
I tried changing GENSIZE and the number of loops (more generations).
But the result always gets stuck. Sometimes increasing the number of loops helps to find an optimal solution. But when I change the number of loops to a much larger number, like for i in xrange(10000), I get an error like:
individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
ValueError: chr() arg not in range(256)
Anyway, how can I modify my code to get a good result?
Any advice would be appreciated.
The chr function in Python2 only accepts values in the range 0 <= i < 256.
You are passing:
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
So you need to check that the result of
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
is not going to be outside that range, and take corrective action before passing to chr if it is outside that range.
EDIT
A reasonable fix for the ValueError might be to take the amended value modulo 256 before passing to chr:
chr((ord(individual[pos_to_mutate]) + np.random.randint(-1, 1)) % 256)
There is another bug: the fitness calculation doesn't take the final element of the candidate list into account: it should be:
def fitness(source, target):
    fitval = 0
    for i in range(0,len(source)): # <- len(source), not len(source) -1
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)
Given that source and target must be of equal length, the function can be written as:
def fitness(source, target):
    return sum((ord(t) - ord(s)) ** 2 for (t, s) in zip(target, source))
The real question was, why doesn't the code provided evolve random strings until the target string is reached.
The answer, I believe, is it may, but will take a lot of iterations to do so.
Consider, in the blog post referenced in the question, each iteration generates a child which replaces the least fit member of the gene pool if the child is fitter. The selection of the child's parent is biased towards fitter parents, increasing the likelihood that the child will enter the gene pool and increase the overall "fitness" of the pool. Consequently the members of the gene pool converge on the desired result within a few thousand iterations.
In the code in the question, the probability of mutation is much lower, based on the initial conditions, that is the defaults for the evolve function.
Parents that are retained have only a 1% chance of mutating, and half of the time the "mutation" will not result in a change: np.random.randint(-1, 1) excludes its upper bound, so it only ever returns -1 or 0. This also means a character can only ever move downwards, never upwards.
Discarded parents are replaced by individuals created by merging two retained individuals. Since only 20% of parents are retained, the population can converge on a local minimum where each new child is effectively a copy of an existing parent, and so no diversity is introduced.
So apart from fixing the two bugs, the way to converge more quickly on the target is to experiment with the initial conditions and to consider changing the code in the question to inject more diversity, for example by mutating children as in the original blog post, or by extending the range of possible mutations.
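As one illustration of "extending the range of possible mutations", the sketch below replaces a gene with a fresh character drawn from the same pool the question uses for its initial population, so the result can never fall outside the valid range for chr. This is a swapped-in mutation scheme, not the one from the question or the blog post; the names ALPHABET and mutate and the per-gene rate parameter are assumptions for illustration, and the code is written for Python 3.

```python
import random
import string

ALPHABET = string.printable[:-5]  # same character pool the question samples from


def mutate(individual, rate, rng=random):
    """Return a copy of individual with each gene resampled at probability rate."""
    return [rng.choice(ALPHABET) if rng.random() < rate else gene
            for gene in individual]


child = list("Hfllo, Wprld")
print("".join(mutate(child, rate=0.1)))
```

Applying something like this to every newly created child, rather than only to the retained parents, is one way to keep injecting diversity.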

Recursing over each path in a tree

I'm stuck on a programming question involving a tree for a project.
The problem itself is only a subproblem of the larger question (but I won't post that here as it's not really relevant). Anyway, the problem is:
I'm trying to go over each path in the tree and calculate the associated value.
The situation is for instance like in this tree:
  a
b   b
Now the result I should get is the multiplications as follows:
leave1 = a * b
leave2 = a * (1-b)
leave3 = (1-a) * b
leave4 = (1-a) * (1-b)
And so the leaves on one level lower in the tree would basically be the results (note that they do not exist in reality, it's just conceptual).
Now, I want to do this recursively, but there are a few problems:
The values for a and b are generated during the traversal, but the value for b, for instance, should only be generated once. All values are either 0 or 1.
If you take the left child of a node A, you use the value A in the multiplication. For the right path you use the value 1-A.
Furthermore, the tree is always perfect, i.e. complete and balanced.
Now what I have (I program in Python, but it's more the algorithm in general I'm interested in with this question):
def f(n):
    if n == 1:
        return [1]
    generate value #(a, b or whatever one it is)
    g = f(n/2)
    h = scalarmultiply(value,g)
    return h.append(g - h)
Note that g and h are lists.
This code was given by one of my professors as possible help, but I don't think it does what I want. At least, it won't give me as a result a list h which has the result for each path. In particular, I don't think it differentiates between b and 1-b. Am I seeing this wrong, and how should I do this?
I'm not very experienced at programming, so try to explain it simply if you can :-)
Try something like this:
def f(element_count):
    if element_count == 1: #<-------------------------------A
        return [1,] #at the root node - it has to be 1 otherwise this is pointless
    current_value = get_the_value_however_you_need_to()#<----B
    result_so_far = f(element_count//2) #<-------------------C
    result = []
    for i in result_so_far:#<--------------------------------D
        # every partial product i splits into a left path and a right path
        result.append(i*current_value)
        result.append(i*(1-current_value))
    return result
Here's how it works:
Say you wanted to work with a three layer pyramid. Then the element_count would be the number of elements on the third layer so you would call f(4). The condition at A fails so we continue to B where the next value is generated. Now at C we call f(2).
The process in f(2) is similar, f(2) calls f(1) and f(1) returns [1,] to f(2).
Now we start working our way back to the widest part of the tree...
I'm not sure what your lecturer was getting at with the end of the function. The for loop does the multiplication you explained and builds up a list which is then returned
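To check any recursive version against the four products listed in the question, a non-recursive sketch can enumerate every left/right choice directly. It assumes the per-level values (a, b, ...) are already known and passed in as a list; the name path_products is made up for illustration.

```python
from itertools import product


def path_products(level_values):
    """One product per root-to-leaf path; going left uses v, going right uses 1 - v."""
    results = []
    for choices in product((True, False), repeat=len(level_values)):
        value = 1
        for take_left, v in zip(choices, level_values):
            value *= v if take_left else 1 - v
        results.append(value)
    return results


a, b = 1, 0
# order: a*b, a*(1-b), (1-a)*b, (1-a)*(1-b)
print(path_products([a, b]))
```

Each value is passed in once per level, so b is generated only once even though it labels two nodes.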
If I'm understanding correctly, you want to build up a binary tree like this:
      A
     / \
    /   \
   /     \
  B       C
 / \     / \
D   E   F   G
Where the Boolean values (1, 0, or their Python equivalents, True and False) of lower level nodes are calculated from the values of their parent and grandparent using the following rules:
D = A and B
E = A and not B
F = not A and C
G = not A and not C
That is, each node's right descendants calculate their values based on its inverse. You further stated that the tree is defined by a single root value (a) and another value that is used for both of the root's children (b).
Here's a function that will calculate the value of any node of such a tree. The tree positions are defined by an integer index in the same way a binary heap often is, with the parent of a node N being N//2 and its children being 2*N and 2*N+1 (with the root node being 1). It uses a memoization dictionary to avoid recomputing the same values repeatedly.
def tree_value(n, a, b, memo=None):
    if memo is None:
        memo = {1:a, 2:b, 3:b} # this initialization covers our base cases
    if n not in memo: # if our value is unknown, compute it
        parent, parent_dir = divmod(n, 2)
        parent_val = tree_value(parent, a, b, memo) # recurse
        grandparent, grandparent_dir = divmod(parent, 2)
        grandparent_val = tree_value(grandparent, a, b, memo) # recurse again
        if parent_dir: # we're a right child, so invert our parent's value
            parent_val = not parent_val
        if grandparent_dir: # our parent is grandparent's right child, so invert
            grandparent_val = not grandparent_val
        memo[n] = parent_val and grandparent_val
    return memo[n]
You could probably improve performance slightly by noticing that the grandparent's value will always be in the memo dict after the parent's value has been calculated, but I've left that out so the code is clearer.
If you need to efficiently compute many values from the same tree (rather than just one), you probably want to keep a permanent memo dictionary somewhere, perhaps as a value in a global dict, keyed by an (a, b) tuple.
