Satisfying Properties of nodes after rotation in AVL trees - python

I was learning about self-balancing BSTs and AVL trees, and I am stuck at a particular case where x = z.
I have this example for better understanding:
As you can see, according to the BST properties all elements >= node x should be in the right subtree of node x, but in this case 3 ends up in the left subtree of node x, which violates the properties of a BST.
I may be wrong about something, since I am learning data structures on my own using online resources. It would be really helpful if you could answer this question and correct me if I am wrong about something.

Usually binary search trees do not have duplicate elements, so this problem is avoided.
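One common workaround, if you do need duplicates, is to store a count on each node so that equal keys never create a second node. A minimal sketch (my own illustration, not part of the original answer):

```python
class Node:
    """BST node that stores a count instead of duplicating keys."""
    def __init__(self, key):
        self.key = key
        self.count = 1          # number of times this key was inserted
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    else:
        root.count += 1         # duplicate: bump the count, no new node
    return root
```

With this layout a rotation never has to decide which side an equal key belongs on, because equal keys share a single node.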

Related

How to find the largest linked grouping of sub-graphs?

I want to determine the largest contiguous (if that’s the right word) graph consisting of a bunch of sub-graphs. I define two sub-graphs as being contiguous if any of the nodes between the two sub-graphs are linked.
My initial solution to this is very slow and cumbersome and stupid – just to look at each sub-graph, see if it’s linked to any of the other sub-graphs, and do the analysis for all of the sub-graphs to find the largest number of linked sub-graphs. That’s just me coming from a Fortran background. Is there a better way to do it – a pythonic way, even a graph theory way? I imagine this is a standard question in network science.
A good starting point to answer the kind of question you've asked is to look at a merge-find (or disjoint-set) approach (https://en.wikipedia.org/wiki/Disjoint-set_data_structure).
It offers an efficient algorithm (at least on an amortized basis) to identify which members of a collection of graphs are disjoint and which aren't.
Here are a couple of related questions that have pointers to additional resources about this algorithm (also known as "union-find"):
Union find implementation using Python
A set union find algorithm
You can get quite respectable performance by merging two sets using "union by rank" as summarized in the Wikipedia page (and the pseudocode provided therein):
For union by rank, a node stores its rank, which is an upper bound for its height. When a node is initialized, its rank is set to zero. To merge trees with roots x and y, first compare their ranks. If the ranks are different, then the larger rank tree becomes the parent, and the ranks of x and y do not change. If the ranks are the same, then either one can become the parent, but the new parent's rank is incremented by one. While the rank of a node is clearly related to its height, storing ranks is more efficient than storing heights. The height of a node can change during a Find operation, so storing ranks avoids the extra effort of keeping the height correct.
I believe there may be even more sophisticated approaches, but the above union-by-rank implementation is what I have used in the past.
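A minimal union-by-rank implementation with path compression, sketched from the Wikipedia pseudocode (the names are my own):

```python
from collections import Counter

class DisjointSet:
    """Union-find with union by rank and path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n     # upper bound on each root's tree height

    def find(self, x):
        # Path halving (a form of path compression): shortcut
        # every other node toward the root while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the lower-rank tree under the higher one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

# Example: sub-graphs 0-1, 1-2 and 3-4 are linked.
ds = DisjointSet(5)
for a, b in [(0, 1), (1, 2), (3, 4)]:
    ds.union(a, b)

# Size of the largest group of connected sub-graphs:
largest = max(Counter(ds.find(i) for i in range(5)).values())
```

For the original question, union every pair of linked sub-graphs and then take the largest set of elements sharing a root, as in the last line.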

Traversal method of DecisionTreeClassifier in sklearn

This class is used to build a decision tree. A lot of values within the tree's structure, including tree_.value and tree_.impurity, are saved as arrays without much indication of which node each value refers to. My deduction is that they use a preorder traversal, but I have no conclusive proof that this is how each array is constructed. Does anyone know where to find this information?
From tree.pyx:
cdef class Tree:
"""Array-based representation of a binary decision tree.
The binary tree is represented as a number of parallel arrays. The i-th
element of each array holds information about the node `i`. Node 0 is the
tree's root. You can find a detailed description of all arrays in
`_tree.pxd`. NOTE: Some of the arrays only apply to either leaves or split
nodes, resp. In this case the values of nodes of the other type are
arbitrary!
"""
So node 0 refers to the root. To add a node, the builder uses a splitter class on the leaves, which can either split the leaf with the greatest impurity improvement (best-first) or split depth-first. I haven't looked into the order in which nodes are added to the parallel arrays, but my guess is that they are added in the order in which they are created, which would be "similar" to a preorder traversal if the tree were ordered by impurity improvement.
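One way to check the ordering empirically (an illustration with hand-made arrays in the same parallel-array layout, not a real fitted tree) is to walk children_left/children_right in preorder and compare the visit order against the node ids:

```python
def preorder_ids(children_left, children_right):
    """Collect node ids in preorder from sklearn-style parallel arrays.
    A value of -1 (TREE_LEAF) marks a leaf."""
    order = []
    stack = [0]                      # node 0 is the root
    while stack:
        node = stack.pop()
        order.append(node)
        if children_left[node] != -1:
            # Push right first so the left child is visited first.
            stack.append(children_right[node])
            stack.append(children_left[node])
    return order

# Toy tree: root 0 with leaf 1 on the left and split node 2
# (children 3 and 4) on the right.
order = preorder_ids([1, -1, 3, -1, -1], [2, -1, 4, -1, -1])
```

If the preorder visit order equals 0..n-1, the arrays for that tree were laid out in preorder, which is what a depth-first builder produces.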

Random Forest implementation in Python

Hi all!
Could anybody give me advice on a Random Forest implementation in Python? Ideally I need something that outputs as much information about the classifiers as possible, especially:

1. which vectors from the training set are used to train each decision tree
2. which features are selected at random in each node of each tree, which samples from the training set end up in this node, which feature(s) are selected for the split, and which threshold is used for the split
I have found quite a few implementations; the most well-known one is probably from scikit, but it is not clear how to do (1) and (2) there (see this question). Other implementations seem to have the same problem, except the one from OpenCV, but it is in C++ (the Python interface does not cover all methods for Random Forests).
Does anybody know something that satisfies (1) and (2)? Alternatively, any idea how to improve scikit implementation to get the features (1) and (2)?
Solved: checked the source code of sklearn.tree._tree.Tree. It has good comments (which fully describe the tree):
children_left : int*
children_left[i] holds the node id of the left child of node i.
For leaves, children_left[i] == TREE_LEAF. Otherwise,
children_left[i] > i. This child handles the case where
X[:, feature[i]] <= threshold[i].
children_right : int*
children_right[i] holds the node id of the right child of node i.
For leaves, children_right[i] == TREE_LEAF. Otherwise,
children_right[i] > i. This child handles the case where
X[:, feature[i]] > threshold[i].
feature : int*
feature[i] holds the feature to split on, for the internal node i.
threshold : double*
threshold[i] holds the threshold for the internal node i.
You can get nearly all the information in scikit-learn. What exactly was the problem? You can even visualize the trees using dot.
I don't think you can find out which split candidates were sampled at random, but you can find out which were selected in the end.
Edit: Look at the tree_ property of the decision tree. I agree, it is not very well documented. There really should be an example to visualize the leaf distributions etc. You can have a look at the visualization function to get an understanding of how to get to the properties.
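As an illustration of how those parallel arrays fit together (using hand-made toy arrays, not a real fitted tree), you can recover every split rule like this:

```python
TREE_LEAF = -1

def describe_splits(feature, threshold, children_left, children_right):
    """Yield (node_id, rule) for every internal node of an
    sklearn-style parallel-array tree."""
    for i in range(len(feature)):
        if children_left[i] == TREE_LEAF:
            continue        # leaf: feature/threshold values are arbitrary
        yield i, "X[:, %d] <= %.2f" % (feature[i], threshold[i])

# Toy tree: node 0 splits on feature 2 at 0.5; nodes 1 and 2 are leaves.
rules = dict(describe_splits(
    feature=[2, -2, -2],
    threshold=[0.5, -2.0, -2.0],
    children_left=[1, TREE_LEAF, TREE_LEAF],
    children_right=[2, TREE_LEAF, TREE_LEAF],
))
```

With a real forest you would run this over `estimator.tree_` for each estimator in `estimators_`, which answers point (2) for the splits that were actually chosen.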

Voronoi Tessellation in Python

Node Assignment Problem
The problem I want to solve is to tessellate the given map with the blue nodes (source nodes) as input points. Once I am able to do this, I would like to see how many black nodes (demand nodes) fall within each cell and assign them to the blue node associated with that cell.
I would like to know if there is an easier way of doing this without using Fortune's algorithm. I came across a function in Mahotas called mahotas.segmentation.gvoronoi(image) (source), but I am not sure if this will solve my problem.
Also, please suggest a better way of doing this segmentation (other than Voronoi tessellation) if there is one. I am not sure if clustering algorithms would be a good choice. I am a programming newbie.
Here is an alternative approach to using Voronoi tessellation:
Build a k-d tree over the source nodes. Then for every demand node, use the k-d tree to find the nearest source node and increment a counter associated with that nearby source node.
The implementation of a k-d tree found at http://code.google.com/p/python-kdtree/ should be useful.
I've just been looking for the same thing and found this:
https://github.com/Softbass/py_geo_voronoi
There's not many points in your diagram. That suggests you can, for each demand node, just iterate through all the source nodes and find the nearest one.
Perhaps this:
def distance(a, b):
    return sum((xa - xb) ** 2 for (xa, xb) in zip(a, b))

def clusters(sources, demands):
    result = dict((source, []) for source in sources)
    for demand in demands:
        nearest = min(sources, key=lambda s: distance(s, demand))
        result[nearest].append(demand)
    return result
This code will give you a dictionary, mapping source nodes to a list of all demand nodes which are closer to that source node than any other.
This isn't particularly efficient, but it's very simple!
I think the spatial-index answer by https://stackoverflow.com/users/1062447/wye-bee (a k-d tree, for example) is the easiest solution to your problem.
Additionally, you also asked whether there is an easier alternative to Fortune's algorithm, and for that particular question I refer you to: Easiest algorithm of Voronoi diagram to implement?
You did not say why you wanted to avoid Fortune's algorithm. I assume you meant that you just didn't want to implement it yourself, but it has already been implemented in a script by Bill Simons and Carston Farmer so computing the voronoi diagram shouldn't be difficult.
Building on their script I made it even easier to use and uploaded it to PyPi under the name Pytess. So you could use the pytess.voronoi() function based on the blue points as input, returning the original points with their computed voronoi polygons. Then you would have to assign each black point through point-in-polygon testing, which you could base on http://geospatialpython.com/2011/08/point-in-polygon-2-on-line.html.
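For the point-in-polygon step, a minimal ray-casting test (a sketch of the standard technique, not the exact code from that link) could look like:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray
    from (x, y) crosses; an odd count means the point is inside.
    `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does edge (x1,y1)-(x2,y2) straddle the horizontal line at y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Running this test for each black point against every Voronoi polygon assigns each demand node to its cell.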
Run this code in Mathematica. It's spectacular! (Yes, I know it is not Python, but ...)
pts3 = RandomReal[1, {50, 3}];
ListDensityPlot[pts3,
InterpolationOrder -> 0, ColorFunction -> "SouthwestColors", Mesh -> All]

How to implement a state-space tree?

I'm trying to solve a knapsack-like problem from MIT OCW.
It's problem set 5.
I need to use a branch and bound algorithm to find the optimal states, so I need to implement a state-space tree.
I understand the idea of the algorithm, but I find it's not so easy to implement.
If I find a node where the budget is not enough, I should stop there.
Should I add an attribute to every tree node?
When I add a node, I should start from the node with the largest upper bound.
How can I find such a node? Do I need to traverse all the nodes before each insertion, or could I save some variable to help with that?
Do you have any ideas? Could you implement it in Python?
I hope I understood the problem correctly; if not, please correct me :)
(Sorry for the confusion arising from the two different meanings of "state".)
You can of course add the attribute to the node (it's part of the state!), since it's a very tiny amount of data. Mind that it is not mandatory to save it, since it is implicitly present in the rest of the state (given the states you have already chosen, you can compute it). Personally, I'd add the attribute, since there's no point in computing it many times.
On the second question: IIRC, when you add nodes you don't have to traverse ALL of the tree, but only the fringe (that is, the set of nodes which have no descendants, not to be confused with the deepest level of the tree).
Since you're looking for an upper bound (and since you're using only positive costs), there are three cases when you look for the node with the highest value:

1. on the last step you appended to the node which had the highest value, so the node you just added now has the highest value
2. on the last step the state you tried to add exceeded the budget, so you had to exclude that option; try to add another state
3. there are no more states to try to add to build a new node, so this branch can't go further; look at the fringe for the highest value among the other nodes
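One way to avoid searching the fringe by hand is to keep it in a heap ordered by each node's upper bound, so the node with the largest bound is always at the top. A best-first branch-and-bound sketch for the 0/1 knapsack (my own illustration, not the OCW solution; it assumes positive weights):

```python
import heapq

def knapsack_bb(values, weights, budget):
    """Best-first branch and bound for the 0/1 knapsack.
    The fringe is a heap keyed on an optimistic (fractional) upper bound."""
    # Sort items by value density so the fractional bound is valid.
    order = sorted(range(len(values)),
                   key=lambda j: values[j] / weights[j], reverse=True)
    values = [values[j] for j in order]
    weights = [weights[j] for j in order]
    n = len(values)

    def bound(i, value, weight):
        # Optimistic estimate: fill the remaining budget fractionally.
        for j in range(i, n):
            if weight + weights[j] <= budget:
                weight += weights[j]
                value += values[j]
            else:
                value += values[j] * (budget - weight) / weights[j]
                break
        return value

    best = 0
    # Fringe entries: (-upper_bound, next_item, value, weight).
    fringe = [(-bound(0, 0, 0), 0, 0, 0)]
    while fringe:
        neg_ub, i, value, weight = heapq.heappop(fringe)
        if -neg_ub <= best or i == n:
            continue        # this node cannot beat the best found so far
        if weight + weights[i] <= budget:       # branch: take item i
            taken = value + values[i]
            best = max(best, taken)
            heapq.heappush(fringe,
                           (-bound(i + 1, taken, weight + weights[i]),
                            i + 1, taken, weight + weights[i]))
        # branch: skip item i
        heapq.heappush(fringe,
                       (-bound(i + 1, value, weight), i + 1, value, weight))
    return best
```

Here the "budget exceeded" case simply never pushes the take-branch, and the pruning test `-neg_ub <= best` discards any fringe node whose bound can no longer win.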
