I'm trying to solve a knapsack-like problem from MIT OCW.
It's problem set 5.
I need to use a branch and bound algorithm to find the optimal states.
So I need to implement a state-space tree.
I understand the idea of the algorithm, but I find it not so easy to implement.
If I reach a node where the budget is no longer sufficient, I should stop there.
Should I add an attribute to every tree node?
When I add a node, I should start from the node with the largest upper bound.
How can I find such a node? Do I need to traverse all the nodes before adding each new one, or could I keep some variable to help with that?
Do you have any ideas? Could you implement it in Python?
I hope I understood the problem correctly; if not, please point me in the right direction :)
(sorry for the confusion arising from the two different meanings of "state")
You can of course add the attribute in the node (it's part of the state!), since it's a very tiny amount of data. Mind that it is not mandatory to save it though, since it is implicitly present in the rest of the state (given the states that you have already chosen, you can compute it). Personally, I'd add the attribute, since there's no point in calculating it many times.
On the second question: IIRC, when you add nodes, you don't have to traverse the WHOLE tree, but rather only the fringe (that is, the set of nodes which have no descendants - not to be confused with the deepest level of the tree).
Since you're looking for an upper bound (and since you're using only positive costs), there are three cases to consider when you are looking for the node with the highest value (a sketch of the bookkeeping follows the list):
on the last step you appended to the node which had the highest value, so the node you just added now has the highest value
on the last step, adding the state would have exceeded the budget, so you had to exclude that option; try to add another state
there are no more states left to add to build a new node, so this branch can't go any further; look through the fringe for the highest value among the other nodes
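A minimal sketch of that bookkeeping, using heapq as a priority queue keyed on the negated upper bound, so the node with the largest bound is always expanded next. The (value, weight) item layout and the fractional-relaxation bound are my own assumptions, not something taken from the problem set:

import heapq

def upper_bound(items, room, level, value):
    # optimistic bound: greedily fill the remaining room, allowing one fractional item
    for v, w in items[level:]:
        if w <= room:
            room -= w
            value += v
        else:
            value += v * room / w
            break
    return value

def knapsack_bb(items, capacity):
    # items are (value, weight) pairs, sorted by value density for a tighter bound
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    best = 0
    # fringe entries: (-bound, level, value, room); heapq pops the smallest element,
    # so storing the negated bound makes the highest-bound node come out first
    fringe = [(-upper_bound(items, capacity, 0, 0), 0, 0, capacity)]
    while fringe:
        neg_bound, level, value, room = heapq.heappop(fringe)
        if -neg_bound <= best or level == len(items):
            continue                              # prune: this branch can't beat the best so far
        v, w = items[level]
        if w <= room:                             # option 1: take item `level` (budget allows it)
            best = max(best, value + v)
            heapq.heappush(fringe, (-upper_bound(items, room - w, level + 1, value + v),
                                    level + 1, value + v, room - w))
        # option 2: skip item `level`
        heapq.heappush(fringe, (-upper_bound(items, room, level + 1, value),
                                level + 1, value, room))
    return best

print(knapsack_bb([(10, 5), (40, 4), (30, 6), (50, 3)], capacity=10))   # -> 90

Whenever a branch would exceed the budget, nothing is pushed for the "take" option, which is exactly the "stop here" behaviour you describe.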
I am starting a project that involves an allocation problem and, having explored a bit by myself, I find it pretty challenging to solve efficiently.
What I call here allocation problem is the following:
There is a set of available slots in 3D space, randomly sampled ([xspot_0, yspot_0, zspot_0], [xspot_1, yspot_1, zspot_1], etc.). These slots are identified with an ID and a position, and are fixed, so they will not change with time.
There are then mobile elements (the same number as the number of available slots, on the order of 250,000) which can go from spot to spot. They are identified with an ID and, at a given time step, by the spot they are in.
Each spot must have one and only one element at a given step.
At first, elements are ordered in the same way as spots: the first element (element_id=0) is in the first spot (spot_id=0), etc.
But then, these elements need to move, based on a motion vector that is defined for each spot, which is also fixed. For example, ideally at the first step, the first element should move from [xspot_0, yspot_0, zspot_0] to [xspot_0 + dxspot_0, yspot_0 + dyspot_0, zspot_0 + dzspot_0], etc.
Since spots were randomly sampled, the new target position might not exist among the spots. The goal is therefore to find a candidate slot for the next step that is as close as possible to the "ideal" position the element should be in.
On top of that first challenge, since this will probably be done in a loop, it is possible that the best candidate has already been assigned to another element.
Once all new slots are defined for each element (or each element is assigned to a new slot, depending on how you see it), we do it again, applying the same motion with the new order. This is repeated as many times as I need.
Now that I have defined the problem: the first thing I tried was a simple greedy allocation based on this information. However, if I pick the best candidate every time based on the distance to the target position, then, as I said, some elements find their best candidate already taken, so they fall back to the 2nd, 3rd, ... 20th, ... 100th candidate slot, which ends up far from the ideal position.
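Roughly, this greedy allocation looks like the following sketch (assuming numpy/scipy; `slots` and `targets` are hypothetical (N, 3) arrays of spot positions and ideal positions):

import numpy as np
from scipy.spatial import cKDTree

def greedy_allocate(slots, targets, k=50):
    tree = cKDTree(slots)
    _, candidates = tree.query(targets, k=k)      # k nearest spots for each element
    taken = np.zeros(len(slots), dtype=bool)
    assignment = np.full(len(targets), -1, dtype=int)
    for elem in range(len(targets)):
        for spot in candidates[elem]:
            if not taken[spot]:                   # the best still-free candidate wins
                taken[spot] = True
                assignment[elem] = spot
                break
        # if all k candidates are already taken, a wider fallback search is needed here
    return assignment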
Another technique I tried, without being entirely sure of what I was doing, was to build a probability distribution by taking the inverse exponential of the distance between each slot and the target position, then normalizing it to obtain probabilities (which feel arbitrary). I still do not get very good results, even for a single step.
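The weighting I mean is roughly this (a sketch for one element; the slot positions and the target position are made-up placeholders):

import numpy as np

slots = np.random.rand(250_000, 3)               # hypothetical slot positions
target = np.array([0.5, 0.5, 0.5])               # hypothetical ideal position for one element
dists = np.linalg.norm(slots - target, axis=1)   # distance from each slot to the ideal position
weights = np.exp(-dists)                         # "inverse exponential" of the distance
probs = weights / weights.sum()                  # normalize to a probability distribution
chosen_spot = np.random.choice(len(slots), p=probs)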
Therefore, I was wondering if someone knows how to solve this type of problem in a more accurate/more efficient way. For your information, I mainly use Python 3 for development.
Thank you!
I'm doing a Random Forest implementation (for classification), and I have some questions regarding the tree growing algorithm mentioned in literature.
When training a decision tree, there are 2 criteria to stop growing a tree:
a. Stop when there are no more features left to split a node on.
b. Stop when the node has all samples in it belonging to the same class.
Based on that,
1. Consider growing one tree in the forest. When splitting a node of the tree, I randomly select m of the M total features, and then from these m features I find that one feature with maximum information gain. After I've found this one feature, say f, should I remove this feature from the feature list, before proceeding down to the children of the node? If I don't remove this feature, then this feature might get selected again down the tree.
If I implement the algorithm without removing the feature selected at a node, then the only way to stop growing the tree is when the leaves of the tree become "pure". When I did this, I got the "maximum recursion depth" reached error in Python, because the tree couldn't reach that "pure" condition earlier.
The RF literature, even the papers written by Breiman, says that the tree should be grown to the maximum. What does this mean?
2. At a node split, after selecting the best feature to split on (by information gain), what should the threshold be for the split? One approach is to have no threshold and create one child node for every unique value of the feature; but I have continuous-valued features too, so that would mean creating one child node per sample!
Q1
You shouldn't remove features from M. Otherwise the tree will not be able to detect some types of relationships (e.g., linear relationships).
Maybe you can stop earlier: with your condition, the tree might go all the way down to leaves with only 1 sample, which has no statistical significance. So it's better to stop when, say, the number of samples at a leaf is <= 3.
Q2
For continuous features, maybe you can bin them into groups and use those to figure out a splitting point.
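For example, one common way to get a splitting point for a continuous feature is to try the midpoints between consecutive sorted values and keep the one with the best gain (a sketch; `information_gain(labels, mask)` stands in for whatever gain function you already use, and at least two distinct values are assumed):

import numpy as np

def best_threshold(values, labels, information_gain):
    vals = np.unique(values)                          # sorted unique feature values
    midpoints = (vals[:-1] + vals[1:]) / 2.0          # candidate thresholds
    # score each candidate by the gain of the split "value <= threshold"
    return max(midpoints, key=lambda t: information_gain(labels, values <= t))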
Hi all!
Could anybody give me an advice on Random Forest implementation in Python? Ideally I need something that outputs as much information about the classifiers as possible, especially:
1. which vectors from the training set are used to train each decision tree
2. which features are selected at random at each node in each tree, which samples from the training set end up in that node, which feature(s) are selected for the split, and which threshold is used for the split
I have found quite a few implementations; the most well-known one is probably from scikit-learn, but it is not clear how to do (1) and (2) there (see this question). Other implementations seem to have the same problem, except the one from OpenCV, but that is in C++ (its Python interface does not cover all methods for Random Forests).
Does anybody know something that satisfies (1) and (2)? Alternatively, any idea how to improve scikit implementation to get the features (1) and (2)?
Solved: I checked the source code of sklearn.tree._tree.Tree. It has good comments, which fully describe the tree:
children_left : int*
children_left[i] holds the node id of the left child of node i.
For leaves, children_left[i] == TREE_LEAF. Otherwise,
children_left[i] > i. This child handles the case where
X[:, feature[i]] <= threshold[i].
children_right : int*
children_right[i] holds the node id of the right child of node i.
For leaves, children_right[i] == TREE_LEAF. Otherwise,
children_right[i] > i. This child handles the case where
X[:, feature[i]] > threshold[i].
feature : int*
feature[i] holds the feature to split on, for the internal node i.
threshold : double*
threshold[i] holds the threshold for the internal node i.
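For example, with those arrays you can walk a fitted tree directly (a small sketch; the iris data is only there to make it runnable):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

t = clf.tree_
for i in range(t.node_count):
    if t.children_left[i] == -1:                  # TREE_LEAF == -1, so this is a leaf
        print("node %d: leaf, class counts %s" % (i, t.value[i]))
    else:
        print("node %d: split on X[:, %d] <= %.3f" % (i, t.feature[i], t.threshold[i]))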
You can get nearly all the information in scikit-learn. What exactly was the problem? You can even visualize the trees using dot.
I don't think you can find out which split candidates were sampled at random, but you can find out which were selected in the end.
Edit: Look at the tree_ property of the decision tree. I agree, it is not very well documented. There really should be an example to visualize the leaf distributions etc. You can have a look at the visualization function to get an understanding of how to get to the properties.
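If it helps, the dot visualization mentioned above is basically a one-liner (a sketch; the iris data is only there so the snippet runs on its own):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
export_graphviz(clf, out_file="tree.dot")   # render with: dot -Tpng tree.dot -o tree.png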
I have a tree. It has a flat bottom. We're only interested in the bottom-most leaves, but this is roughly how many leaves there are at the bottom...
2 x 1600 x 1600 x 10 x 4 x 1600 x 10 x 4
That's ~13,107,200,000,000 leaves. Because of that size (and because the calculation performed on each leaf seems unlikely ever to be optimised to take less than a second), I've given up on the idea of visiting every leaf.
So I'm thinking I'll build a 'smart' leaf crawler which inspects the most "likely" nodes first (based on results from the ones around it). So it's reasonable to expect the leaves to be evaluated in branches/groups of neighbours, but the groups will vary in size and distribution.
What's the smartest way to record which leaves have been visited and which have not?
You don't give a lot of information, but I would suggest tuning your search algorithm to help you keep track of what it's seen. If you had a global way of ranking leaves by "likelihood", you wouldn't have a problem since you could just visit leaves in descending order of likelihood. But if I understand you correctly, you're just doing a sort of hill climbing, right? You can reduce storage requirements by searching complete subtrees (e.g., all 1600 x 10 x 4 leaves in a cluster that was chosen as "likely"), and keeping track of clusters rather than individual leaves.
It sounds like your tree geometry is consistent, so depending on how your search works, it should be easy to merge your nodes upwards... e.g., keep track of level 1 nodes whose leaves have all been examined, and when all children of a level 2 node are in your list, drop the children and keep their parent. This might also be a good way to choose what to examine: If three children of a level 3 node have been examined, the fourth and last one is probably worth examining too.
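A sketch of that "merge upwards" bookkeeping, assuming each leaf is identified by a tuple of indices (one per level) and using the branching factors from the question:

branching = [2, 1600, 1600, 10, 4, 1600, 10, 4]
visited = set()          # holds path tuples; a short path covers its whole subtree

def mark_visited(path):
    visited.add(path)
    # collapse: if all siblings are present, replace them with their parent prefix
    while path:
        parent, depth = path[:-1], len(path) - 1
        siblings = {parent + (i,) for i in range(branching[depth])}
        if siblings <= visited:
            visited.difference_update(siblings)
            visited.add(parent)
            path = parent
        else:
            break

def is_visited(path):
    # a leaf counts as visited if any prefix of its path is in the set
    return any(path[:d] in visited for d in range(len(path) + 1))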
Finally, a thought: Are you really, really sure that there's no way to exclude some solutions in groups (without examining every individual one)? Problems like sudoku have an astronomically large search space, but a good brute-force solver eliminates large blocks of possibilities without examining every possible 9 x 9 board. Given the scale of your problem, this would be the most practical way to attack it.
It seems that you're looking for a quick and memory-efficient way to do a membership test. If so, and if you can cope with some false positives, go for a Bloom filter.
Bottom line: use Bloom filters in situations where your data set is really big AND all you need is to check whether a particular element exists in the set AND a small chance of false positives is tolerable.
Some implementations for Python should exist.
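If a third-party dependency is a problem, a minimal Bloom filter is easy to sketch in pure Python (the sizes here are arbitrary; tune them to your tolerated error rate):

import hashlib

class BloomFilter:
    def __init__(self, size_bits=8 * 1024 * 1024, num_hashes=7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # derive num_hashes bit positions from salted SHA-256 digests of the item
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
bf.add((12, 34, 56))                                  # e.g. a leaf identified by its path
print((12, 34, 56) in bf, (99, 99, 99) in bf)         # True, (almost certainly) False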
Hope this will help.
Maybe this is too obvious, but you could store your results in a similar tree. Since your computation is slow, the results tree should not grow out of hand too quickly. Then just look up if you have results for a given node.
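The simplest realisation of that lookup is probably a dictionary keyed by the leaf's path through the tree (a flat stand-in for the mirror tree):

results = {}                                    # path tuple -> computed result

def compute_or_lookup(path, expensive_fn):
    if path not in results:
        results[path] = expensive_fn(path)      # the slow per-leaf calculation runs only once
    return results[path]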
I have a bunch of objects, each with a level, a weight, and 0 or more connections to objects on the next level. I want to know how to get the "heaviest" path (the one with the biggest sum of weights).
I'd also love to know, of course, which books teach how to deal with graphs in a practical way.
Your graph is acyclic, right? (I presume so, since a node always points to a node on the next level.) If your graph can have arbitrary cycles, the problem of finding the largest path becomes NP-complete and brute-force search becomes the only solution.
Back to the problem - you can solve this by finding, for each node, the heaviest path that leads up to it. Since you already have a topological sort of your DAG (the levels themselves), it is straightforward to find the paths:
For each node, store the cost of the heaviest path that leads to it and the last node before it on that path. Initially, this is always empty (but a sentinel value, like a negative number for the cost, might simplify the code later)
For nodes in the first level, you already know the cost of the heaviest path that ends in them - it is zero (and the parent node is None)
For each level, propagate the path info to the next level - this is similar to the relaxation step of a normal shortest-distance algorithm:
for level in range(nlevels):
    for node in nodes[level]:
        for neighbour, edge_cost in edges[node]:      # outgoing edges of this node
            alt_cost = cost[node] + edge_cost
            if alt_cost > cost[neighbour]:            # maximise, so compare with >
                cost[neighbour] = alt_cost
                parent[neighbour] = node              # remember the path for reconstruction
My book recommendation is Steve Skiena's "Algorithm Design Manual". There's a nice chapter on graphs.
I assume that you can only go down to a lower level in the graph.
Notice how the graph forms a tree. Then you can solve this using recursion:
heaviest_path(node n) = value[n] + max(heaviest_path(children[n][0]), heaviest_path(children[n][1]), etc)
This can easily be optimized by using dynamic programming instead.
Start with the nodes at the lowest (deepest) level. Their heaviest_path is just their own value. Keep track of this in an array. Then calculate the heaviest_path for the next level up, then the next level up, and so on.
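A small sketch of that bottom-up pass (assuming `levels` lists the node ids from the deepest level up, and `value`/`children` are dicts matching the formula above):

def heaviest_path_values(levels, value, children):
    best = {}                                   # node -> weight of the heaviest path starting there
    for level in levels:                        # deepest level first, so children are already done
        for n in level:
            child_best = [best[c] for c in children[n]]
            best[n] = value[n] + (max(child_best) if child_best else 0)
    return best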
The method which I generally use to find the "heaviest" path is to negate the weights and then find the shortest path. There are good algorithms (http://en.wikipedia.org/wiki/Shortest_path_problem) for finding the shortest path (note that after negation the weights are negative, so you need something like Bellman-Ford rather than Dijkstra). But this method only holds as long as you do not have a positive-weight cycle in your original graph.
For graphs having positive-weight cycles, the problem of finding the "heaviest" path is NP-complete, and your algorithm to find the heaviest path will have non-polynomial time complexity.
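For completeness, a sketch of the negation trick with networkx (Bellman-Ford is used because the negated weights are negative; the tiny example graph is made up):

import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([("a", "b", 3), ("b", "c", 4), ("a", "c", 5)])

H = nx.DiGraph()                                        # the same graph with negated weights
H.add_weighted_edges_from((u, v, -w) for u, v, w in G.edges(data="weight"))

# Bellman-Ford handles negative edge weights (but not negative cycles)
path = nx.bellman_ford_path(H, "a", "c", weight="weight")
print(path)   # heaviest a -> c path in the original graph: ['a', 'b', 'c']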