Recursing over each path in a tree

Recursing over each path in a tree - python

I'm stuck on a programming question involving a tree for a project.
The problem itself is only a subproblem of the larger question (but I won't post that here as its not really relevant). Anyone the problem is:
I'm trying to go over each path in the tree and calculate the associated value.
The situation is for instance like in this tree:
a
b b
Now the result i should get is the multiplications as follows:
leave1 = a * b
leave2 = a * (1-b)
leave3 = (1-a) * b
leave4 = (1-a) * (1-b)
And so the leaves on one level lower in the tree would basically be the results (note that they do not exist in reality, its just conceptual).
Now, I want to do this recursively, but there are a few problems:
The values for a and b are generated during the traversal, but the value for b for instance should only be generated 1 time. All values are either 0 or 1.
If taking the left child of a node A, you use the value A in the multiplication. the right path you use the value 1-A.
Furthermore, the tree is always perfect, i.e. complete and balanced.
Now what I have (I program in python, but its more the algorithm in general im interested in with this question):
def f(n):
if n == 1:
return [1]
generate value #(a, b or whatever one it is)
g = f(n/2)
h = scalarmultiply(value,g)
return h.append(g - h)
Note that g and h are lists.
This code was giving by one of my professors as possible help, but I don't think this does what I want. At least, it wont give me as result a list h which has the result for each path. Especially, I don't think it differentiates between b and 1-b. Am I seeing this wrong and how should I do this?
I'm not very experienced at programming, so try and explain easy if you can :-)

Try something like this:
def f(element_count):
if element_count == 1: #<-------------------------------A
return [1,] #at the root node - it has to be 1 otherwise this is pointless
current_value = get_the_value_however_you_need_to()#<----B
result_so_far = f(element_count/2) #<-------------------C
result = []
for i in result_so_far:#<--------------------------------D
result.append(i*current_value)
result.append(i*(1-current_value))
result.append((1-i)*current_value)
result.append((1-i)*(1-current_value))
return result
Here's how it works:
Say you wanted to work with a three layer pyramid. Then the element_count would be the number of elements on the third layer so you would call f(4). The condition at A fails so we continue to B where the next value is generated. Now at C we call f(2).
The process in f(2) is similar, f(2) calls f(1) and f(1) returns [1,] to f(2).
Now we start working our way back to the widest part of the tree...
I'm not sure what your lecturer was getting at with the end of the function. The for loop does the multiplication you explained and builds up a list which is then returned

If I'm understanding correctly, you want to build up a binary tree like this:
A
/ \
/ \
/ \
B C
/ \ / \
D E F G
Where the Boolean values (1, 0, or their Python equivalents, True and False) of lower level nodes are calculated from the values of their parent and grandparent using the following rules:
D = A and B
E = A and not B
F = not A and C
G = not A and not C
That is, each node's right descendents calculate their values based on it's inverse. You further stated that the tree is defined by a single root value (a) and another value that is used for both of the root's children (b).
Here's a function that will calculate the value of any node of such a tree. The tree positions are defined by an integer index in the same way a binary heap often is, with the parent of a node N being N//2 and it's children being 2*N and 2*N+1 (with the root node being 1). It uses a memoization dictionary to avoid recomputing the same values repeatedly.
def tree_value(n, a, b, memo=None):
if memo is None:
memo = {1:a, 2:b, 3:b} # this initialization covers our base cases
if n not in memo: # if our value is unknown, compute it
parent, parent_dir = divmod(n, 2)
parent_val = tree_value(parent, a, b, memo) # recurse
grandparent, grandparent_dir = divmod(parent, 2)
grandparent_val = tree_value(grandparent, a, b, memo) # recurse again
if parent_dir: # we're a right child, so invert our parent's value
parent_val = not parent_val
if grandparent_dir: # our parent is grandparent's right child, so invert
grandparent_val = not grandparent_val
memo[n] = parent_val and grandparent_val
return memo[n]
You could probably improve performance slightly by noticing that the grandparent's value will always be in the memo dict after the parent's value has been calculated, but I've left that out so the code is clearer.
If you need to efficiently compute many values from the same tree (rather than just one), you probably want to keep a permanent memo dictionary somewhere, perhaps as a value in a global dict, keyed by an (a, b) tuple.

Related

Get the depth of a node in a Ternary Tree Using The Indexes in Array Implementation of Trees

Let's assume we create a Ternary tree using an array implementation. The root is stored in the index 1, the left child is stored in index = 3(index)-1, middle child is stored in 3(index), and right child is stored in 3(index)+1. So for example. Assume the following Ternary Tree.
A
B C D
E F G H I J K L M
The array implementation would be [None, A, B, C, D, E, F, G, H, I, J, K, L, M]
If we take F for random, F is the middle child go B, and B is the left child of A. A has an index of 1, so B has an index of 2, so F has an index of 6.
My question is, how can I get the depth from the indexes. F has an index of 6 and a depth of 2. If this was a binary tree, the depth would simply be equal to int(math.log(index, 2)). What is the equation for the depth for a ternary tree, I can't think of any.

The number of nodes in level k is 3**k, for level 0 it's 1, for level 1 it's 3, for level 2 it's 9, and so on.
The total number of nodes in k levels is the sum of arithmetic progression:
sum(i in [0;k], 3**i) = (1 + 3**k)*(k+1) / 2
For a node index N you need to find such k that:
(1 + 3**k)*(k+1)/2 <= N < (1 + 3**(k+1))*(k+2) / 2
I would suggest binary search to find k given N, as it's a programming problem, not mathematical.
But you can also try to solve the equation by taking the natural logarithm of its parts, for N>1 it should keep the inequality sign. I'll leave this to you as an excersize, otherwise you can ask in the Mathematics stack exchange.

The formula is int(math.log(2*index - 1, 3))
You can derive it as follows:
Each level has a number of nodes that is a power of 3.
So the position of the last node in a level is a sum of such powers, which forms a geometric series. This means the index of the last node on level 𝑛 (zero based) is (3𝑛─1)/2. Solving this to 𝑛, and taking into account the dummy entry in the array, this gives the final solution at the top.

Recurrence relation and time complexity of finding next larger in Generic tree

Question: Given a generic tree and an integer n. Find and return the node with next larger element in the tree i.e. find a node with value just greater than n.
Although i was able to solve it is O(n) by removing the later for loop and doing comparisons while calling recursion. I am bit curious about time complexity of following version of code.
I came up with recurrence relation as T(n) = T(n-1) + (n-1) = O(n^2). Where T(n-1) is for time taken by children and + (n-1) for finding the next larger (second for loop). Have i done it right? or am i missing something?
def nextLargestHelper(root, n):
"""
root => reference to root node
n => integer
Returns node and value of node which is just larger not first larger than n.
"""
# Special case
if root is None:
return None, None
# list to store all values > n
largers = list()
# Induction step
if root.data > n:
largers.append([root, root.data])
# Induction step and Base case; if no children do not call recursion
for child in root.children:
# Induction hypothesis; my function returns me node and value just larger than 'n'
node, value = nextLargestHelper(child, n)
# If larger found in subtree
if node:
largers.append([node, value])
# Initialize node to none, and value as +infinity
node = None
value = sys.maxsize
# travers through all larger values and return the smallest value greater than n
for item in largers: # structure if item is [Node, value]
# this is why value is initialized to +infinity; so as it is true for first time
if item[1] < value:
node = item[0]
value = item[1]
return node, value

At first: please use different chacters for O-Notation and inputvalues.
You "touch" every node exactly once, so the result should be O(n). A bit special is your algorithm finding the minimum afterwards. You could include this in your go-though-all-children loop for an easier recurrence estimation. As it is, you have do a recurrence estimation for the minimum of the list as well.
Your recurrence equation should look more like T(n) = a*T(n/a) + c = O(n) since in each step you have a children forming a subtrees with size (n-1)/a. In each step you have next to some constant factors also the computation of the minimum of a list with at most a elements. You could write it as a*T(n/a) + a*c1 +c2 which is the same as a*T(n/a) + c. The actual formula would look more like this: T(n) = a*T((n-1)/a) + c but the n-1 makes it harder to apply the master theorem.

Calculate height of an arbitrary (non-binary) tree

I'm currently taking on online data structures course and this is one of the homework assignments; please guide me towards the answer rather than giving me the answer.
The prompt is as follows:
Task. You are given a description of a rooted tree. Your task is to compute and output its height. Recall that the height of a (rooted) tree is the maximum depth of a node, or the maximum distance from a leaf to the root. You are given an arbitrary tree, not necessarily a binary tree.
Input Format. The first line contains the number of nodes n. The second line contains integer numbers from −1 to n−1 parents of nodes. If the i-th one of them (0 ≤ i ≤ n−1) is −1, node i is the root, otherwise it’s 0-based index of the parent of i-th node. It is guaranteed that there is exactly one root. It is guaranteed that the input represents a tree.
Constraints. 1 ≤ n ≤ 105.
My current solution works, but is very slow when n > 102. Here is my code:
# python3
import sys
import threading
# In Python, the default limit on recursion depth is rather low,
# so raise it here for this problem. Note that to take advantage
# of bigger stack, we have to launch the computation in a new thread.
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
threading.Thread(target=main).start()
# returns all indices of item in seq
def listOfDupes(seq, item):
start = -1
locs = []
while True:
try:
loc = seq.index(item, start+1)
except:
break
else:
locs.append(loc)
start = loc
return locs
def compute_height(node, parents):
if node not in parents:
return 1
else:
return 1 + max(compute_height(i, parents) for i in listOfDupes(parents, node))
def main():
n = int(input())
parents = list(map(int, input().split()))
print(compute_height(parents.index(-1), parents))
Example input:
>>> 5
>>> 4 -1 4 1 1
This will yield a solution of 3, because the root is 1, 3 and 4 branch off of 1, then 0 and 2 branch off of 4 which gives this tree a height of 3.
How can I improve this code to get it under the time benchmark of 3 seconds? Also, would this have been easier in another language?

Python will be fine as long as you get the algorithm right. Since you're only looking for guidance, consider:
1) We know the depth of a node iif the depth of its parent is known; and
2) We're not interested in the tree's structure, so we can throw irrelevant information away.
The root node pointer has the value -1. Suppose that we replaced its children's pointers to the root node with the value -2, their children's pointers with -3, and so forth. The greatest absolute value of these is the height of the tree.
If we traverse the tree from an arbitrary node N(0) we can stop as soon as we encounter a negative value at node N(k), at which point we can replace each node with the value of its parent, less one. I.e, N(k-1) = N(k) -1, N(k-2)=N(k-1) - 1... N(0) = N(1) -1. As more and more pointers are replaced by their depth, each traversal is more likely to terminate by encountering a node whose depth is already known. In fact, this algorithm takes basically linear time.
So: load your data into an array, start with the first element and traverse the pointers until you encounter a negative value. Build another array of the nodes traversed as you go. When you encounter a negative value, use the second array to replace the original values in the first array with their depth. Do the same with the second element and so forth. Keep track of the greatest depth you've encountered: that's your answer.

The structure of this question looks like it would be better solved bottom up rather than top down. Your top-down approach spends time seeking, which is unnecessary, e.g.:
def height(tree):
for n in tree:
l = 1
while n != -1:
l += 1
n = tree[n]
yield l
In []:
tree = '4 -1 4 1 1'
max(height(list(map(int, tree.split()))))
Out[]:
3
Or if you don't like a generator:
def height(tree):
d = [1]*len(tree)
for i, n in enumerate(tree):
while n != -1:
d[i] += 1
n = tree[n]
return max(d)
In []:
tree = '4 -1 4 1 1'
height(list(map(int, tree.split())))
Out[]:
3
The above is brute force as it doesn't take advantage of reusing parts of the tree you've already visited, it shouldn't be too hard to add that.

Your algorithm spends a lot of time searching the input for the locations of numbers. If you just iterate over the input once, you can record the locations of each number as you come across them, so you don't have to keep searching over and over later. Consider what data structure would be effective for recording this information.

How do I locate the recursion conditions?

My code is as follows.
I tried coding out for each case first, so given n = 4, my code looks like this:
a = overlay_frac(0,blank_bb,scale(1/4,rune))
b = overlay_frac(1/4,blank_bb,scale(1/2,rune))
c = overlay_frac(1/2,blank_bb,scale(3/4,rune))
d = overlay_frac(3/4,blank_bb,scale(1,rune))
show (overlay(a,(overlay(b,(overlay(c,d))))))
My understanding is that the recursion pattern is:
a = overlay_frac((1/n)-(1/n),blank_bb,scale(1/n,rune))
b = overlay_frac((2/n)-(1/n),blank_bb,scale(2/n,rune))
c = overlay_frac((3/n)-(1/n),blank_bb,scale(3/n,rune))
d = overlay_frac((4/n)-(1/n),blank_bb,sale(4/n,rune))
Hence, the recursion pattern that I came up with is:
def tree(n,rune):
if n==1:
return rune
else:
for i in range(n+1):
return overlay(overlay_frac(1-(1/n),blank_bb,scale(i/n,rune)),tree(n-1,rune))
When I hardcode this, everything turns out just fine, but I suspect I'm not doing the recursion properly. Where have I gone wrong?

You are in fact trying to do an iteration within a recursive call. In stead of using loop, you can use an inner function to memorize your status. The coefficient you defined is actually changed with both n and i, but for a given n it changed with i only. The status you need to memorize with inner function is then i, which is the same as you looping through i.
You can still achieve your goal by doing so
def f(i, n):
return overlay_frac((i/n)-(1/n),blank_bb,scale(i/n,rune))
# for each iteration, you check if i is equal to n
# if yes, return the result (base case)
# otherwise, you apply next coefficient to the previous result
# you start with i = 0 and increase by one every iteration until i reach to n (base case)
# notice how similar this recursive call looks like a loop
# the only difference is the status are updated within the function call itself
# therefore you will not have the problem of earlier return
def recursion(n):
def iteration(i, out):
if i == n:
return out
else:
return iteration(i+1, overlay(f(n-1, n), out))
return iteration(0, f(n, n))
Here, n is assumed to be the times of overlay you want to apply. When n = 0, no function applied on the last coefficient f(n, n). When n = 1, the output would be overlay applied once on coefficient with i = n - 1 and coefficient with i = n.
This way avoids the earlier return inside your loop.
In fact you can omit the inner function by adding additional argument to your outer function. Then you need to assign the default initial i. The inner function is not really necessary here. The key is to use the function argument to memorize the status (variable i in this case).
def f(i, n):
return overlay_frac((i/n)-(1/n),blank_bb,scale(i/n,rune))
def recursion(n, i=0):
if i == n:
return f(n, n)
else:
return overlay(f(n-1, n), recursion(n, i+1))

Your first two code blocks don't correspond to the same operations. This would be equivalent to your first block (in Python 3).
def overlayer(n, rune):
def layer(k):
# Scale decreases linearly with k
return overlay_frac((1 - (k+1)/n), blank_bb, scale(1-k/n, rune))
result = layer(0)
for i in range(1, n):
# Overlay on top of previous layers
result = overlay(layer(i), result)
return result
show(overlayer(4, rune))

Let's look at your equations again:
a = overlay_frac(0,blank_bb,scale(1/4,rune))
b = overlay_frac(1/4,blank_bb,scale(1/2,rune))
c = overlay_frac(1/2,blank_bb,scale(3/4,rune))
d = overlay_frac(3/4,blank_bb,scale(1,rune))
show (overlay(a,(overlay(b,(overlay(c,d))))))
What you wrote as "recursion" is not a recursion formula. If you compare your formulas for the recursion with the ones you gave us, you can infer n=4 which makes no sense. For a recursion pattern you need to describe your inner variables as a manifestation of the same expression with only a different parameter. That is, you should replace:
f_n = overlay_frac((1/4)*(n-1),blank_bb,sale(n/4,rune))
such that f_1=a, f_2=b etc...
Then your recursion fomula that you want to calculate translates to:
show (overlay(f_1,(overlay(f_2,(overlay(f_3,f_4))))))
You can write the function f_n as f(n) (and maybe other paramters) in your code and then do
def recurse(n):
if n == 4:
return f(4)
else:
return overlay(f(n),recurse(n+1))
then call:
show( recurse (1))
You need to assert that n<5and integer, otherwise you'll end up in an infinity loop.
There may still be some mistake, but it should be along those lines. Once you've actually written it like this however, it (maybe) doesn't really make sense to do a recursion anyways. If you only want to do it for n_max=4, that is. Just call the function in one line by replacing a,b,c,d with f_1,f_2,f_3,f_4

Simple Genetic Algorithm meeting local optimum for "Hello World"

My target was simple, using genetic algorithm to reproduce the classical "Hello, World" string.
My code was based on this post. The code mainly contain 4 parts:
Generate the population which has serval different individual
Define the fitness and grade function which evaluate the individual good or bad based on the comparing with target.
Filter the population and leave len(pop)*retain individuals
Add some other individuals and mutate randomly
The parents's DNA will pass over to its children to comprise the whole population.
I modified the code and shows like this:
import numpy as np
import string
from operator import add
from random import random, randint
def population(GENSIZE,target):
p = []
for i in range(0,GENSIZE):
individual = [np.random.choice(list(string.printable[:-5])) for j in range(0,len(target))]
p.append(individual)
return p
def fitness(source, target):
fitval = 0
for i in range(0,len(source)-1):
fitval += (ord(target[i]) - ord(source[i])) ** 2
return (fitval)
def grade(pop, target):
'Find average fitness for a population.'
summed = reduce(add, (fitness(x, target) for x in pop))
return summed / (len(pop) * 1.0)
def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
graded = [ (fitness(x, target), x) for x in p]
graded = [ x[1] for x in sorted(graded)]
retain_length = int(len(graded)*retain)
parents = graded[:retain_length]
# randomly add other individuals to
# promote genetic diversity
for individual in graded[retain_length:]:
if random_select > random():
parents.append(individual)
# mutate some individuals
for individual in parents:
if mutate > random():
pos_to_mutate = randint(0, len(individual)-1)
individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
#
parents_length = len(parents)
desired_length = len(pop) - parents_length
children = []
while len(children) < desired_length:
male = randint(0, parents_length-1)
female = randint(0, parents_length-1)
if male != female:
male = parents[male]
female = parents[female]
half = len(male) / 2
child = male[:half] + female[half:]
children.append(child)
parents.extend(children)
return parents
GENSIZE = 40
target = "Hello, World"
p = population(GENSIZE,target)
fitness_history = [grade(p, target),]
for i in xrange(20):
p = evolve(p, target)
fitness_history.append(grade(p, target))
# print p
for datum in fitness_history:
print datum
But it seems that the result can't fit targetwell.
I tried to change the GENESIZE and loop time(more generation).
But the result always get stuck. Sometimes, enhance the loop time can help to find a optimum solution. But when I change the loop time to an much larger number like for i in xrange(10000). The result shows the error like:
individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
ValueError: chr() arg not in range(256)
Anyway, how to modify my code and get an good result.
Any advice would be appreciate.

The chr function in Python2 only accepts values in the range 0 <= i < 256.
You are passing:
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
So you need to check that the result of
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
is not going to be outside that range, and take corrective action before passing to chr if it is outside that range.
EDIT
A reasonable fix for the ValueError might be to take the amended value modulo 256 before passing to chr:
chr((ord(individual[pos_to_mutate]) + np.random.randint(-1, 1)) % 256)
There is another bug: the fitness calculation doesn't take the final element of the candidate list into account: it should be:
def fitness(source, target):
fitval = 0
for i in range(0,len(source)): # <- len(source), not len(source) -1
fitval += (ord(target[i]) - ord(source[i])) ** 2
return (fitval)
Given that source and target must be of equal length, the function can be written as:
def fitness(source, target):
return sum((ord(t) - ord(s)) ** 2 for (t, s) in zip(target, source))
The real question was, why doesn't the code provided evolve random strings until the target string is reached.
The answer, I believe, is it may, but will take a lot of iterations to do so.
Consider, in the blog post referenced in the question, each iteration generates a child which replaces the least fit member of the gene pool if the child is fitter. The selection of the child's parent is biased towards fitter parents, increasing the likelihood that the child will enter the gene pool and increase the overall "fitness" of the pool. Consequently the members of the gene pool converge on the desired result within a few thousand iterations.
In the code in the question, the probability of mutation is much lower, based on the initial conditions, that is the defaults for the evolve function.
Parents that are retained have only a 1% chance of mutating, and one third of the time the "mutation" will not result in a change (zero is a possible result of random.randint(-1, 1)).
Discard parents are replaced by individuals created by merging two retained individuals. Since only 20% of parents are retained, the population can converge on a local minimum where each new child is effectively a copy of an existing parent, and so no diversity is introduced.
So apart from fixing the two bugs, the way to converge more quickly on the target is to experiment with the initial conditions and to consider changing the code in the question to inject more diversity, for example by mutating children as in the original blog post, or by extending the range of possible mutations.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Recursing over each path in a tree - python

Related

Get the depth of a node in a Ternary Tree Using The Indexes in Array Implementation of Trees

Recurrence relation and time complexity of finding next larger in Generic tree

Calculate height of an arbitrary (non-binary) tree

How do I locate the recursion conditions?

Simple Genetic Algorithm meeting local optimum for "Hello World"

Categories

Resources