I have a recursive function which creates a json object
def add_to_tree(name, parent, start_tree):
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name":name, "parent":parent, "children":[]})
        else:
            add_to_tree(name, parent, x["children"])
It is called from another function
def caller():
    start_tree = [{"name":"root", "parent":"null", "children":[]}] # basic structure of the json object which holds the d3.js tree data
    for x in new_list:
        name = x.split('/')[-2]
        parent = x.split('/')[-3]
        add_to_tree(name, parent, start_tree)
new_list is a list which holds links in this form:
/root/A/
/root/A/B/
/root/A/B/C/
/root/A/D/
/root/E/
/root/E/F/
/root/E/F/G/
/root/E/F/G/H/
...
Everything is working fine except for the fact that the run time grows exponentially with the input size.
Normally new_list has ~500k links and the depth of these links can be more than 10, so there is a lot of looping and lookups involved in the add_to_tree() function.
Any ideas on how to make this faster?
You are searching your whole tree each time you add a new entry. This is hugely inefficient as your tree grows; you can easily end up with O(N^2) behaviour this way: for each new element you search the whole tree again.
You could use a dictionary mapping names to specific tree entries, for fast O(1) lookups; this lets you avoid having to traverse the tree each time. It can be as simple as treeindex[parent]. This'll take some more memory however, and you may need to handle the case where the parent is added after the children (using a queue).
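For example, here is a minimal, untested sketch of that index idea; build_tree is a name I'm introducing here, and it assumes names are globally unique and that every parent appears in the list before its children (as in your sample):

def build_tree(new_list):
    root = {"name": "root", "parent": "null", "children": []}
    treeindex = {"root": root}  # name -> tree entry, for O(1) parent lookups
    for link in new_list:
        parts = link.split('/')
        name, parent = parts[-2], parts[-3]
        node = {"name": name, "parent": parent, "children": []}
        treeindex[parent]["children"].append(node)  # no tree traversal needed
        treeindex[name] = node
    return [root]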
However, since your input list appears to be sorted, you could just process your list recursively or with a stack and take advantage of the fact that you have just found the parent. If your path is longer than the previous entry's, it'll be a child of that entry. If the path is equal in length or shorter, it'll be a sibling of the previous node or of one of that node's ancestors, so return or pop the stack.
For example, for these three elements:
/root/A/B/
/root/A/B/C/
/root/A/D/
/root/A/B/C/ does not have to search the tree from the root for /root/A/B/; that was the previously processed entry. It'll be the parent call in this recursion, or the top of the stack. Just add to that parent directly.
/root/A/D/ is a sibling of the previous entry's parent; its path is shorter than /root/A/B/C/, so return or pop that entry off the stack. Its length is equal to /root/A/B/'s, so it is a direct sibling; again, return or pop the stack. Now you'll be at the /root/A/ level, and /root/A/D/ is a child. Add it, and continue the process.
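A rough, untested sketch of the stack version (it assumes the links are sorted so that every prefix path appears before its extensions, as in the sample above, and that the root node is created up front):

def build_tree_sorted(new_list):
    root = {"name": "root", "parent": "null", "children": []}
    stack = [root]  # stack[i] holds the most recently seen node at depth i
    for link in new_list:
        parts = [p for p in link.split('/') if p]  # e.g. ['root', 'A', 'B']
        del stack[len(parts) - 1:]                 # pop back up to the parent level
        node = {"name": parts[-1], "parent": parts[-2], "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return [root]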
I have not tested this, but it looks like the loop does not stop when an insertion has been made, so every entry in new_list will cause a recursive search through all of the tree. This should speed it up:
def add_to_tree(name, parent, start_tree):
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name":name, "parent":parent, "children":[]})
            return True
        elif add_to_tree(name, parent, x["children"]):
            return True
    return False
It stops searching as soon as the parent is found.
That said, I think there is a bug in the approach. What if you have:
/root/A/B/C/
/root/D/B/E/
Your algorithm only looks at the last two path components, so it seems that both C and E will be placed under whichever B is found first. I think you will need to take all components into account and make your way down the tree element by element, as sketched below. That is also better for performance, since at each level you know which branch to take; the correct version will be much faster, because each insert only touches the nodes along its own path instead of the whole tree.
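A hedged sketch of that element-by-element descent (untested; add_path is a name I'm introducing, and unlike the original code it also creates intermediate nodes that are missing from the input):

def add_path(link, root):
    # Walk down from the root one path component at a time, so that
    # /root/A/B/C/ and /root/D/B/E/ end up under two different B nodes.
    node = root
    for part in [p for p in link.split('/') if p][1:]:  # skip the leading 'root'
        for child in node["children"]:
            if child["name"] == part:
                node = child
                break
        else:
            child = {"name": part, "parent": node["name"], "children": []}
            node["children"].append(child)
            node = child

Here root is the {"name": "root", ...} dictionary from start_tree[0].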
Related
Lately I've been working on some recursive problems in Python where I have to generate a list of possible configurations (e.g. the list of permutations of a given string, the list of substrings, etc.) using recursion. I'm having a very hard time finding the best practice for this, and also in understanding how to manage this sort of variable in recursion.
I'll give the example of the generate binary trees problem. I more-or-less know what I have to implement in the recursion:
If n=1, return just one node.
If n=3, return the only possible binary tree.
For n>3, create one node and then explore the possibilities: left node is childless, right node is childless, neither node is childless. Explore these possibilities recursively.
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive at the list of trees. Currently the practice I follow is to pass a list along in the function call (as an argument), and the function returns this list. But then the problem is that in case 3, when calling the recursive function to explore the possibilities for the nodes, it would be returning a list rather than appending nodes to the tree I am building. When I picture the recursion tree in my head, I imagine a "tree" variable that is unique to each of the leaves of the recursion, and these trees are added to a list which is returned by the "root" (i.e. first) call. But I don't know if that is possible. I thought of a global list with the recursive function not returning anything (just appending to it), but I believe the problem is that at each call the function would receive a copy of the variable.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better. I would also like to know if there is a "best practice" when it comes to that.
Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list
This is not the purest way to attack a recursive problem. It would be better if you can make the recursive function solve the subproblem without an extra parameter variable that it must use. The recursive function should just return a result as if it were the only call that was ever made (by the testing framework). So in the example, that recursive call should return a list of trees.
Alternatively the recursive function could be a sub-function that doesn't return a list, but yields the individual values (in this case: trees). The caller can then decide whether to pack that into a list or not. This is more pythonic.
As to the example problem, it is also important to identify some invariants. For instance, it is clear that there are no solutions when n is even. As to the recursive aspect: once you have decided to create a root, both its left and right subtrees will have an odd number of nodes. Of course, this is an observation that is specific to this problem, but it is important to look for such problem properties.
Finally, it is equally important to see if the same sub problems can reoccur multiple times. This surely is the case in the example problem: for instance, the left subtree may sometimes have the same number of nodes as the right subtree. In such cases memoization will improve efficiency (dynamic programming).
When the recursive function returns a list, the caller can then iterate that list to retrieve its elements (trees in the example) and use them to build an extended result that satisfies the caller's task. In the example case, that means that each tree taken from the recursively retrieved list is attached as a child to a new root. This new tree is then appended to a new list (not related to the one returned from the recursive call). This new list will in many cases be longer, although this depends on the type of problem.
To further illustrate the way to tackle these problems, here is a solution for the example problem, one which uses the main function for the recursive calls and applies memoization:
class Solution:
    memo = { 1: [TreeNode()] }

    def allPossibleFBT(self, n: int) -> List[Optional[TreeNode]]:
        # If we didn't solve this problem before...
        if n not in self.memo:
            # Create a list for storing the results (the trees)
            results = []
            # Before creating any root node,
            # decide the size of the left subtree.
            # It must be odd
            for num_left in range(1, n, 2):
                # Make the recursive call to get all shapes of the
                # left subtree
                left_shapes = self.allPossibleFBT(num_left)
                # The remainder of the nodes must be in the right subtree
                num_right = n - 1 - num_left  # The root also counts as 1
                right_shapes = self.allPossibleFBT(num_right)
                # Now iterate the results we got from recursion and
                # combine them in all possible ways to create new trees
                for left in left_shapes:
                    for right in right_shapes:
                        # We have a combination. Now create a new tree from it
                        # by putting a root node on top of the two subtrees:
                        tree = TreeNode(0, left, right)
                        # Append this possible shape to our results
                        results.append(tree)
            # All done. Save this for later re-use
            self.memo[n] = results
        return self.memo[n]
This code can be made more compact using list comprehension, but it may make the code less readable.
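For comparison, here is a sketch of the generator style mentioned earlier, which yields trees one by one instead of building lists. It is untested, drops the memoization for simplicity, gen_trees is a helper name I'm introducing, and TreeNode, List and Optional are assumed to be provided by the LeetCode environment, as above:

class Solution:
    def allPossibleFBT(self, n: int) -> List[Optional[TreeNode]]:
        # The public method just packs the generated trees into a list.
        return list(self.gen_trees(n))

    def gen_trees(self, n):
        if n == 1:
            yield TreeNode()
            return
        for num_left in range(1, n, 2):
            # A fresh generator is created for every combination we need.
            for left in self.gen_trees(num_left):
                for right in self.gen_trees(n - 1 - num_left):
                    yield TreeNode(0, left, right)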
Don't pass information into the recursive calls, unless they need that information to compute their local result. It's much easier to reason about recursion when you write without side effects. So instead of having the recursive call put its own results into a list, write the code so that the results from the recursive calls are used to create the return value.
Let's take a trivial example, converting a simple loop to recursion, and using it to accumulate a sequence of increasing integers.
def recursive_range(n):
    if n == 0:
        return []
    return recursive_range(n - 1) + [n]
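For instance, recursive_range(3) evaluates to recursive_range(2) + [3], which unwinds to (([] + [1]) + [2]) + [3], i.e. [1, 2, 3].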
We are using functions in the natural way: we put information in with the arguments, and get information out using the return value (rather than mutation of the parameters).
In your case:
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive to the list of trees.
So you know that you want to return a list of trees at the end of the process. So the natural way to proceed is to expect each recursive call to do that, too.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better.
The recursive calls return their lists of results for the sub-problems. You use those results to create the list of results for the current problem.
You don't need to think about how recursion is implemented in order to write recursive algorithms. You don't need to think about the call stack. You do need to think about two things:
What are the base cases?
How does the problem break down recursively? (Alternately: why is recursion a good fit for this problem?)
The thing is, recursion is not special. Making the recursive call is just like calling any other function that would happen to give you the correct answer for the sub-problem. So all you need to do is understand how solving the sub-problems helps you to solve the current one.
I have a script that is supposed to filter some elements out of an XML file. I did it like this because I knew exactly what the depth of the element is and how many children there are...
But can you please give me an example of how this can be done without knowing the depth of the nesting?
The code looks like this:
def Filter_Modules(folder_name, corresponding_list):
    for element in delta_root.iter('folder'):
        if element.attrib.get('name') == str(folder_name):
            corresponding_list.append(element)
            for child in element:
                corresponding_list.append(child)
                for ch in child:
                    corresponding_list.append(ch)
                    for c in ch:
                        corresponding_list.append(c)
All suggestions are welcome.
I understand that you want to put into corresponding_list all descendant elements of the folder element whose name attribute equals some string.
A good solution for that is to use a recursive function. (In general, recursion is a good approach for handling data structures like trees, graphs, ...)
The recursive function add_sub_tree appends an element to corresponding_list and then recursively calls itself on all its children. The children will also be appended to corresponding_list, and the function will recursively call itself to append all grand-children, and so on.
def Filter_Modules(folder_name, corresponding_list):
    def add_sub_tree(element):
        corresponding_list.append(element)
        for child in element:
            add_sub_tree(child)

    for element in delta_root.iter('folder'):
        if element.attrib.get('name') == str(folder_name):
            add_sub_tree(element)
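A hedged usage sketch, assuming delta_root is the parsed root of your document; 'delta.xml' and the folder name below are made-up examples:

import xml.etree.ElementTree as ET

delta_root = ET.parse('delta.xml').getroot()
result = []
Filter_Modules('my_folder', result)
print([el.tag for el in result])

(As a side note, ElementTree elements also offer element.iter(), which performs a similar recursive walk over all descendants for you.)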
I feel embarrassed asking this question, but I spent close to four hours trying to make sense of why this code works. My problem is that the code looks to me like it works when the longest path happens to be explored first, but not when a sub-optimal path is explored first. My guess is that the code works because, when a path that is not the longest is explored, the depth value resets and then the next path is explored??? Can someone please explain?
'''
For your reference:

class TreeNode:
    def __init__(self):
        self.children = []
'''

def find_height(root):
    global max
    max = 0
    if not root:
        return 0
    traverse(root, 0)
    return max - 1

def traverse(node, depth):
    global max
    depth += 1
    if depth > max:
        max = depth
    for child in node.children:
        traverse(child, depth)
The code is correct, and it is like you say, but maybe it helps to take a step back and look at it a bit more abstractly. The code essentially traverses the whole tree. It does not really matter in which order it does that. In this code it performs a depth-first traversal, probably because that is the easiest to implement and has a small memory footprint. But the order is not really relevant for understanding how it can return the length of the longest path.
Just note that the algorithm is guaranteed to visit each node and knows at which depth each node occurs. It should be clear that when you take the maximum of all the depths that you encounter during the traversal, you have found the length of the longest path from the root, or in other words, the height of the tree.
The following statement makes sure the code always updates the maximum depth in light of the current depth:
if depth > max:
    max = depth
I should maybe also highlight that depth is a local variable, which belongs to one execution context of the function traverse. So each execution of the function creates a new instance of that variable. When you return out of a recursive call, the execution comes back to where you have the previous version of this variable. A recursive call does not modify the value of depth at the place where that call was made, so it truly reflects the depth of the current node (after depth += 1 is executed).
In contrast, max is a global variable, and so there is only one version of that variable. And so the effect of max=depth persists also after a call of traverse terminates.
As a side note, many would say that modifying a global variable from inside a function is not ideal (it represents a side effect), and there are better ways to code this. The recursive function would better return the height of the subtree rooted in the node that is passed as argument. This also has the advantage that you don't need the depth argument, and you don't need a second function:
def find_height(root):
    if not root:
        return 0
    # the height of the tree is 1 more than that of the tallest subtree below it
    return 1 + max((find_height(child) for child in root.children), default=0)
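A quick check of this version on a tiny hand-built tree (note that it counts the nodes on the longest root-to-leaf path; the original find_height above returns max - 1, i.e. the number of edges, so subtract 1 if you need that convention):

a, b, c, d = TreeNode(), TreeNode(), TreeNode(), TreeNode()
a.children = [b, c]    # a has two children
c.children = [d]       # the longest path is a -> c -> d
print(find_height(a))  # prints 3: three nodes on that path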
I have tried to implement a BST. As of now it only adds keys according to the BST property (left: lower, right: bigger), though I implemented it in a different way.
This is how I think BSTs are supposed to be: Single Direction BST
How I have implemented my BST: Bi-Directional BST
The question is whether or not this is a correct implementation of a BST.
(The way I see it, with a double-sided BST it would be easier to search, delete and insert.)
import pdb

class Node:
    def __init__(self, value):
        self.value=value
        self.parent=None
        self.left_child=None
        self.right_child=None

class BST:
    def __init__(self,root=None):
        self.root=root

    def add(self,value):
        #pdb.set_trace()
        new_node=Node(value)
        self.tp=self.root
        if self.root is not None:
            while True:
                if self.tp.parent is None:
                    break
                else:
                    self.tp=self.tp.parent
            #the self.tp variable is now at the first (root) node.
            while True:
                if new_node.value >= self.tp.value :
                    if self.tp.right_child is None:
                        new_node.parent=self.tp
                        self.tp.right_child=new_node
                        break
                    elif self.tp.right_child is not None:
                        self.tp=self.tp.right_child
                        print("Going Down Right")
                        print(new_node.value)
                elif new_node.value < self.tp.value :
                    if self.tp.left_child is None:
                        new_node.parent=self.tp
                        self.tp.left_child=new_node
                        break
                    elif self.tp.left_child is not None:
                        self.tp=self.tp.left_child
                        print("Going Down Left")
                        print(new_node.value)
        self.root=new_node

newBST=BST()
newBST.add(9)
newBST.add(10)
newBST.add(2)
newBST.add(15)
newBST.add(14)
newBST.add(1)
newBST.add(3)
Edit: I have used while loops instead of recursion. Could someone please elaborate on why using while loops instead of recursion is a bad idea, in this particular case and in general?
BSTs with parent links are used occasionally.
The benefit is not that the links make it easier to search or update (they don't really), but that you can insert before or after any given node, or traverse forward or backward from that node, without having to search from the root.
It becomes convenient to use a pointer to a node to represent a position in the tree, instead of a full path, even when the tree contains duplicates, and that position remains valid as updates or deletions are performed elsewhere.
In an abstract data type, these properties make it easy, for example, to provide iterators that aren't invalidated by mutations.
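For instance, here is a hedged, untested sketch of an in-order successor helper that relies on the parent pointer; the field names follow the Node class in the question:

def successor(node):
    # Return the node with the next larger value, without searching from the root.
    if node.right_child is not None:
        node = node.right_child
        while node.left_child is not None:
            node = node.left_child
        return node
    # No right subtree: climb until we come up from a left child.
    while node.parent is not None and node is node.parent.right_child:
        node = node.parent
    return node.parent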
You haven't described how you gain anything with the parent pointer. An algorithm that cares about rewinding to the parent node will do so by crawling back up the call stack.
I've been there -- in my data structures class, I implemented my stuff with bi-directional pointers. When we got to binary trees, those pointers ceased to be useful. Proper use of recursion replaces the need to follow a link back up the tree.
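For example, a minimal sketch of a recursive insert that needs no parent pointer, reusing the Node class from the question above; the call stack is what remembers the way back up:

def insert(node, value):
    # Return the root of the subtree with the value added.
    if node is None:
        return Node(value)
    if value < node.value:
        node.left_child = insert(node.left_child, value)
    else:
        node.right_child = insert(node.right_child, value)
    return node

tree_root = None
for v in (9, 10, 2, 15, 14, 1, 3):
    tree_root = insert(tree_root, v)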
I do not have a computer science background. I am trying to learn coding by myself, and I'm doing it, partly, by solving the problems on LeetCode.
Anyway, there are problems that use linked lists, and I already found info saying that linked lists have to be simulated in Python. My problem is that I really cannot grasp what is behind a linked list. For instance, what kind of problems are they supposed to target?
And, in general, how do linked lists function? Any link to such info would be really helpful.
The recent problem I looked at on LeetCode asks you to swap every two adjacent nodes of a linked list and return its head. LeetCode offers the following solution, but I cannot figure out how it actually works.
# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def swapPairs(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        pre = self
        pre.next = head
        while pre.next and pre.next.next:
            a = pre.next
            b = a.next
            pre.next = b
            a.next = b.next
            b.next = a
            pre = a
        return self.next
As I said, I do not understand this solution. I tried to trace it with the example list 1->2->3->4, which should return the list 2->1->4->3.
All I managed was to make one pass through the loop, after which it seemed to me the loop should exit, but then what happens? How are the last two numbers swapped? And how does this code work at all if the list has only 2 elements? To me it seems impossible.
If you could just direct me to online literature that explains something like this, I would be most grateful.
Thanks.
A linked list acts almost the same as an array, but there are a few main differences. In a linked list, the memory used isn't (and almost never is) contiguous. So in an array, if you have 5 items and you look at the memory, all 5 items will be right next to each other (for the most part). Each 'item' in a linked list, however, has a pointer that points directly to the next item, removing the need for contiguous memory. So an array is a 'list' of items that exist contiguously in memory, and a linked list is a 'list' of objects that each hold an item and a pointer to the next item. This is a singly linked list, since traversal is only possible in one direction. There is also a doubly linked list, where each node has a pointer to the next node and another pointer to the previous node, allowing traversal from both directions.
https://www.cs.cmu.edu/~adamchik/15-121/lectures/Linked%20Lists/linked%20lists.html
The link will help you get familiar with visualizing how these linked lists work. I would focus on inserting before and after a node, as these operations should help you understand what your loop is doing.
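To make that concrete, here is a small sketch using the same ListNode shape as in the question (the values 1, 2, 3 and 99 are just made-up examples):

class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None

# Build 1 -> 2 -> 3, then insert 99 after the 2.
head = ListNode(1)
head.next = ListNode(2)
head.next.next = ListNode(3)

new = ListNode(99)
new.next = head.next.next   # 99 now points at 3
head.next.next = new        # 2 now points at 99; nothing else has to move

node = head
while node:                 # traverse by following the next pointers
    print(node.val)         # prints 1, 2, 99, 3
    node = node.next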
Linked lists don't "exist" as a builtin in Python, since the language already provides a flexible builtin list object. (Under the hood, CPython's list is actually implemented as a dynamic array of pointers, not as a linked list.)
The main feature of a linked list is that it is easily extendible, whereas an array has to be manually resized if you wish to expand it. Again, in Python these details are all abstracted away, so trying to work through an example of linked lists in Python is of limited value in my opinion, as you won't learn much from it.
You should be doing this in C to get an actual understanding of memory allocation and pointers.
That said, given your example, each ListNode contains a value (like an array element), but in addition it has a variable 'next' in which you store another ListNode object. That object, just like the first, has a value and a variable that stores yet another ListNode object. This can continue for as many objects as desired.
The way the code works is that when we say pre.next, this refers to the ListNode object stored there, and the next object after that is pre.next.next. This works because pre.next is a ListNode object, which has a variable next.
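To see how the loop above handles a list with only 2 elements, here is a hedged step-by-step trace of its single iteration (pre starts out as the dummy object placed in front of the list):

# pre -> 1 -> 2       head is the node 1
# a = pre.next        a is the node 1
# b = a.next          b is the node 2
# pre.next = b        pre -> 2, so 2 becomes the new head
# a.next = b.next     1 now points at what followed 2 (None)
# b.next = a          2 -> 1, giving pre -> 2 -> 1
# pre = a             pre moves to node 1; pre.next is None, so the loop stops
# return self.next    returns node 2, the head of 2 -> 1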
Again, read up on linked lists in C. If you plan to work in higher level languages, I would say you don't really need an understanding of linked lists, as these data structures come "free" with most high level languages.