I have a tree. It has a flat bottom. We're only interested in the bottom-most leaves, but this is roughly how many leaves there are at the bottom...
2 x 1600 x 1600 x 10 x 4 x 1600 x 10 x 4
That's ~13,107,200,000,000 leaves? Because of the size (the calculation performed on each leaf seems unlikely to be optimised to ever take less than one second) I've given up thinking it will be possible to visit every leaf.
So I'm thinking I'll build a 'smart' leaf crawler which inspects the most "likely" nodes first (based on results from the ones around it). So it's reasonable to expect the leaves to be evaluated in branches/groups of neighbours, but the groups will vary in size and distribution.
What's the smartest way to record which leaves have been visited and which have not?
You don't give a lot of information, but I would suggest tuning your search algorithm to help you keep track of what it's seen. If you had a global way of ranking leaves by "likelihood", you wouldn't have a problem since you could just visit leaves in descending order of likelihood. But if I understand you correctly, you're just doing a sort of hill climbing, right? You can reduce storage requirements by searching complete subtrees (e.g., all 1600 x 10 x 4 leaves in a cluster that was chosen as "likely"), and keeping track of clusters rather than individual leaves.
It sounds like your tree geometry is consistent, so depending on how your search works, it should be easy to merge your nodes upwards... e.g., keep track of level 1 nodes whose leaves have all been examined, and when all children of a level 2 node are in your list, drop the children and keep their parent. This might also be a good way to choose what to examine: If three children of a level 3 node have been examined, the fourth and last one is probably worth examining too.
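For illustration, here is a rough sketch of that "merge upwards" bookkeeping, assuming each level 1 cluster has an integer id and every level 2 parent has a fixed number of children (both assumptions are placeholders for your real tree geometry):

    CHILDREN_PER_PARENT = 4          # assumption, not from the question

    finished_level1 = set()          # ids of fully examined level 1 clusters
    finished_level2 = set()          # parents whose children are all finished

    def parent_of(cluster_id):
        return cluster_id // CHILDREN_PER_PARENT      # assumed id scheme

    def mark_cluster_done(cluster_id):
        finished_level1.add(cluster_id)
        parent = parent_of(cluster_id)
        siblings = {parent * CHILDREN_PER_PARENT + i
                    for i in range(CHILDREN_PER_PARENT)}
        if siblings <= finished_level1:               # all children examined
            finished_level1.difference_update(siblings)   # drop the children...
            finished_level2.add(parent)                   # ...and keep the parent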
Finally, a thought: Are you really, really sure that there's no way to exclude some solutions in groups (without examining every individual one)? Problems like sudoku have an astronomically large search space, but a good brute-force solver eliminates large blocks of possibilities without examining every possible 9 x 9 board. Given the scale of your problem, this would be the most practical way to attack it.
It seems that you're looking for a quick and memory-efficient way to do a membership test. If so, and if you can cope with some false positives, go for a Bloom filter.
Bottom line: use a Bloom filter in situations where your data set is really big, all you need is to check whether a particular element exists in the set, and a small chance of false positives is tolerable.
Implementations for Python exist.
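If you'd rather not pull in a package, here is a minimal hand-rolled sketch (num_bits and num_hashes are placeholders; size them for your expected number of insertions and acceptable false-positive rate):

    import hashlib

    class BloomFilter:
        """Minimal Bloom filter sketch: k positions derived from one SHA-256 digest.
        False positives are possible; false negatives are not."""

        def __init__(self, num_bits=10**8, num_hashes=7):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8 + 1)

        def _positions(self, key):
            digest = hashlib.sha256(repr(key).encode()).digest()
            h1 = int.from_bytes(digest[:8], "big")
            h2 = int.from_bytes(digest[8:16], "big")
            # Double hashing: derive k bit positions from two base hashes.
            return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

        def add(self, key):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, key):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    visited = BloomFilter()
    visited.add((3, 1599, 7, 2))          # e.g. a leaf's coordinate tuple
    print((3, 1599, 7, 2) in visited)     # True
    print((0, 0, 0, 0) in visited)        # almost certainly False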
Hope this will help.
Maybe this is too obvious, but you could store your results in a similar tree. Since your computation is slow, the results tree should not grow out of hand too quickly. Then just look up if you have results for a given node.
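For example, a flat dictionary keyed by the leaf's coordinate tuple does the same job (expensive_evaluation is a placeholder for your slow per-leaf calculation):

    results = {}                          # leaf coordinate tuple -> computed value

    def visit(leaf_coords):
        if leaf_coords in results:        # already computed, reuse it
            return results[leaf_coords]
        value = expensive_evaluation(leaf_coords)   # placeholder for the slow work
        results[leaf_coords] = value
        return value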
I want to determine the largest contiguous (if that’s the right word) graph consisting of a bunch of sub-graphs. I define two sub-graphs as being contiguous if any of the nodes between the two sub-graphs are linked.
My initial solution to this is very slow and cumbersome and stupid – just to look at each sub-graph, see if it’s linked to any of the other sub-graphs, and do the analysis for all of the sub-graphs to find the largest number of linked sub-graphs. That’s just me coming from a Fortran background. Is there a better way to do it – a pythonic way, even a graph theory way? I imagine this is a standard question in network science.
A good starting point to answer the kind of question you've asked is to look at a merge-find (or disjoint-set) approach (https://en.wikipedia.org/wiki/Disjoint-set_data_structure).
It offers an efficient algorithm (at least on an amortized basis) to identify which members of a collection of graphs are disjoint and which aren't.
Here are a couple of related questions that have pointers to additional resources about this algorithm (also known as "union-find"):
Union find implementation using Python
A set union find algorithm
You can get quite respectable performance by merging two sets using "union by rank" as summarized in the Wikipedia page (and the pseudocode provided therein):
For union by rank, a node stores its rank, which is an upper bound for its height. When a node is initialized, its rank is set to zero. To merge trees with roots x and y, first compare their ranks. If the ranks are different, then the larger rank tree becomes the parent, and the ranks of x and y do not change. If the ranks are the same, then either one can become the parent, but the new parent's rank is incremented by one. While the rank of a node is clearly related to its height, storing ranks is more efficient than storing heights. The height of a node can change during a Find operation, so storing ranks avoids the extra effort of keeping the height correct.
I believe there may be even more sophisticated approaches, but the above union-by-rank implementation is what I have used in the past.
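For reference, here is a minimal sketch of that structure in Python, with union by rank as described above plus path halving in find (a common variant of path compression):

    class DisjointSet:
        """Union-find with union by rank and path halving."""

        def __init__(self, items):
            self.parent = {x: x for x in items}
            self.rank = {x: 0 for x in items}

        def find(self, x):
            # Path halving: point each visited node at its grandparent.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            if self.rank[rx] < self.rank[ry]:
                rx, ry = ry, rx              # make rx the higher-rank root
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1

    # Usage sketch: union the endpoints of every edge, then group nodes by their
    # root; the largest group is the largest contiguous set of sub-graphs.
    ds = DisjointSet(range(6))
    for a, b in [(0, 1), (1, 2), (3, 4)]:
        ds.union(a, b)
    print(ds.find(0) == ds.find(2))   # True: 0 and 2 ended up in the same set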
I've run into an interesting problem, where I need to make a many-to-many hash with a minimized number of entries. I'm working in python, so that comes in the form of a dictionary, but this problem would be equally applicable in any language.
The data initially comes in as input of one key to one entry (representing one link in the many-to-many relationship).
So like:
A-1, B-1, B-2, B-3, C-2, C-3
A simple way of handling the data would be linking them one to many:
A: 1
B: 1,2,3
C: 2,3
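For concreteness, assuming the input arrives as (key, value) pairs, building that baseline could look like this:

    from collections import defaultdict

    pairs = [("A", 1), ("B", 1), ("B", 2), ("B", 3), ("C", 2), ("C", 3)]

    one_to_many = defaultdict(set)
    for key, value in pairs:
        one_to_many[key].add(value)
    # -> {'A': {1}, 'B': {1, 2, 3}, 'C': {2, 3}}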
However, the number of entries is the primary computational cost for a later process, as a file will need to be generated and sent over the internet for each entry (that is a whole other story), and there would most likely be thousands of entries in the one-to-many implementation.
Thus a more optimized hash would be:
[A, B]: 1
[B, C]: 2,3
This table would be discarded after use, so maintainability is not a concern; the only concern is the time complexity of reducing the entries (the time it takes the algorithm to reduce the entries must not exceed the time the algorithm would save over the baseline one-to-many table).
Now, I'm pretty sure that at least someone has faced this problem, this seems like a problem straight out of my Algorithms class in college. However, I'm having trouble finding applicable algorithms, as I can't find the right search terms. I'm about to take a crack at making an algorithm for this from scratch, but I figured it wouldn't hurt to ask around to see if people can't identify this as a problem commonly solved by a modified [insert well-known algorithm here].
I personally think it's best to start by creating a one-to-many hash and then examining subsets of the values in each entry, creating an entry in the solution hash for the maximum identified set of shared values. But I'm unsure how to guarantee a smaller number of subsets than just the one-to-many baseline implementation.
Let's go back to your unoptimised dictionary of letters to sets of numbers:
A: 1
B: 1,2,3
C: 2,3
There's a - in this case 2-branch - tree of refactoring steps you could do:
                      A:1  B:1,2,3  C:2,3
                     /                    \
       factor using set 2,3           factor using set 1
                   /                        \
      A:1  B:1  B,C:2,3              A,B:1  B:2,3  C:2,3
                 /                            \
      factor using set 1              factor using set 2,3
               /                                \
     A,B:1  B,C:2,3                     A,B:1  B,C:2,3
In this case at least, you arrive at the same result regardless of which factoring you do first, but I'm not sure if that would always be the case.
Doing an exhaustive exploration of the tree sounds expensive, so you might want to avoid that, but if we could pick the optimal path, or at least a likely-good path, it would come at relatively low computational cost. Rather than branching at random, my gut instinct is that it would be faster and closer to optimal if you made the largest-set factoring change possible at each point in the tree. For example, considering the two-branch tree above, you'd prefer the initial 2,3 factoring over the initial 1 factoring, because 2,3 is the larger set (size two). Making the more dramatic refactorings first suggests you'll need fewer refactoring steps before you reach a stable result.
What that amounts to is iterating the sets from largest towards smallest (it doesn't matter which order you iterate over same-length sets), looking for refactoring opportunities.
Much like a bubble-sort, after each refactoring the approach would be "I made a change; it's not stable yet; let's repeat". Restart by iterating from second-longest sets towards shortest sets, checking for optimisation opportunities as you go.
(I'm not sure about Python specifically, but in general set comparisons can be expensive, so you might want to maintain, for each set, a value that is the XOR of the hashed values of its elements. That's easy and cheap to update when a few set elements change, and a trivial comparison can tell you two large sets are unequal, saving comparison time; it won't tell you whether sets are equal though: multiple sets could have the same XOR-of-hashes value.)
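Putting the above together, here is a rough sketch of that largest-set-first factoring loop (reduce_table and the tuple-key convention are my own choices for the sketch, not an established API):

    def reduce_table(one_to_many):
        """Greedy sketch: take the largest remaining value set, find every key
        whose set contains it, and factor that shared set out into one merged
        entry. Repeat until a full pass makes no change."""
        table = {(k,): set(v) for k, v in one_to_many.items()}
        changed = True
        while changed:
            changed = False
            # Examine candidate sets from largest to smallest.
            for key, values in sorted(table.items(), key=lambda kv: -len(kv[1])):
                shared = set(values)                     # copy before mutating
                if not shared:
                    continue
                owners = [k for k, v in table.items() if shared <= v]
                if len(owners) > 1:
                    merged = tuple(sorted({name for k in owners for name in k}))
                    for k in owners:
                        table[k] = table[k] - shared
                        if not table[k]:
                            del table[k]
                    table[merged] = table.get(merged, set()) | shared
                    changed = True
                    break        # the table changed; rescan, bubble-sort style
        return table

    print(reduce_table({"A": {1}, "B": {1, 2, 3}, "C": {2, 3}}))
    # -> {('B', 'C'): {2, 3}, ('A', 'B'): {1}}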
I am trying to make an AI following the alpha-beta pruning method for tic-tac-toe. I need to make checking a win as fast as possible, as the AI will go through many different possible game states. Right now I have thought of 2 approaches, neither of which is very efficient.
Create a large tuple covering every possible 4-in-a-row win condition, and loop through that.
Using for loops, check horizontally, vertically, diagonally facing left, and diagonally facing right. This seems like it would be much slower than #1.
How would someone recommend doing it?
From your question, it's a bit unclear how your approaches would be implemented. But from the alpha-beta pruning, it seems as if you want to look at a lot of different game states, and in the recursion determine a "score" for each one.
One very important observation is that recursion ends once a 4-in-a-row has been found. That means that at the start of a recursion step, the game board does not have any 4-in-a-row instances.
Using this, we can intuitively see that the new piece placed in said recursion step must be a part of any 4-in-a-row instance created during the recursion step. This greatly reduces the search space for solutions from a total of 69 (21 vertical, 24 horizontal, 12+12 diagonals) 4-in-a-row positions to a maximum of 13 (3 vertical, 4 horizontal, 3+3 diagonal).
This should be the baseline for your second approach. It will require a maximum of 52 (13*4) checks for a naive implementation, or 25 (6+7+6+6) checks for a faster algorithm.
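As a sketch of that reduced check, assuming the board is a list of lists and you know the row and column of the piece that was just placed:

    def wins(board, row, col, player):
        """Check only the lines through the piece just placed at (row, col).
        board is assumed to be a list of lists with board[row][col] == player."""
        rows, cols = len(board), len(board[0])
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):    # the four axes
            count = 1                                       # the new piece itself
            for sign in (1, -1):                            # walk both directions
                r, c = row + sign * dr, col + sign * dc
                while 0 <= r < rows and 0 <= c < cols and board[r][c] == player:
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= 4:
                return True
        return False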
Now it's pretty hard to beat 25 boolean checks for this win-check I'd say, but I'm guessing that your #1 approach trades some extra memory-usage to enable less calculation per recursion step. The simplest way of doing this would be to store 8 integers (single byte is fine for this application) which represent the longest chains of same-color chips that can be found in any of the 8 directions.
Using this, a check for win can be reduced to 8 boolean checks and 4 additions. Simply get the chain lengths on opposite sides of the newly placed chip, check if they're the same color as the chip, and if they are, add their lengths and add 1 (for the newly placed chip).
From this calculation, it seems as if your #1 approach might be the most efficient. However, it has a much larger overhead of maintaining the data structure, and uses more memory, something that should be avoided unless you can pass by reference. Also (assuming that boolean checks and additions are similar in speed) the much harder approach only wins by a factor of 2, even when ignoring the overhead.
I've made some simplifications, and some explanations maybe weren't crystal clear, but ask if you have any further questions.
I'm attempting to write a program which finds the 'pits' in a list of integers.
A pit is any integer x where x is less than or equal to the integers immediately preceding and following it. If the integer is at the start or end of the list it is only compared on the inward side.
For example in:
[2,1,3] 1 is a pit.
[1,1,1] all elements are pits.
[4,3,4,3,4] the elements at 1 and 3 are pits.
I know how to work this out by taking a linear approach and walking along the list, but I am curious about how to apply divide and conquer techniques to do this comparatively quickly. I am quite inexperienced and am not really sure where to start; I feel like something similar to a binary tree could be applied?
If it's pertinent, I'm working in Python 3.
Thanks for your time :).
Without any additional information on the distribution of the values in the list, it is not possible to achieve an algorithmic complexity lower than O(n), where n is the number of elements in the list.
Logically, if the dataset is random, like Brownian noise, a pit can occur anywhere, so every element has to be examined (a full 1:1 sampling) in order to correctly find every pit.
Even if you only want to find the single lowest pit in the sequence, that cannot be achieved in sub-linear time without compromising the correctness of the results.
Optimizations can be considered, such as parallelization or skipping the values neighbouring a pit, but the overall complexity stays the same.
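For reference, the linear walk itself is short; a minimal sketch:

    def find_pits(xs):
        """Linear walk: an element is a pit if it is <= each neighbour it has."""
        pits = []
        for i, x in enumerate(xs):
            left_ok = i == 0 or x <= xs[i - 1]
            right_ok = i == len(xs) - 1 or x <= xs[i + 1]
            if left_ok and right_ok:
                pits.append(i)
        return pits

    print(find_pits([2, 1, 3]))        # [1]
    print(find_pits([1, 1, 1]))        # [0, 1, 2]
    print(find_pits([4, 3, 4, 3, 4]))  # [1, 3]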
I need to make a program in Python that chooses cars from an array filled with the masses of 10 cars. The idea is that it fills a barge, which can hold ~8 tonnes, as effectively as possible so that minimum capacity is left unused. My idea is that it generates combinations of the masses and chooses the one closest to the maximum weight, but since I'm new to algorithms, I don't have a clue how to do it.
I'd solve this exercise with dynamic programming. You should be able to get the optimal solution in O(m*n) operations (n being the number of cars, m being the total mass).
That will only work, however, if the masses are all integers.
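For what it's worth, here is a minimal sketch of that dynamic-programming idea, assuming integer masses in whole tonnes (best_load and the example masses are made up for illustration):

    def best_load(masses, capacity):
        """0/1 knapsack sketch: reachable[w] is True if some subset of the cars
        seen so far weighs exactly w; choice[w] remembers one such subset."""
        reachable = [True] + [False] * capacity
        choice = [[] for _ in range(capacity + 1)]
        for i, m in enumerate(masses):
            for w in range(capacity, m - 1, -1):   # descend so each car is used once
                if reachable[w - m] and not reachable[w]:
                    reachable[w] = True
                    choice[w] = choice[w - m] + [i]
        best = max(w for w in range(capacity + 1) if reachable[w])
        return best, choice[best]

    masses = [2, 1, 3, 2, 1, 2, 4, 1, 2, 2]   # made-up masses in tonnes
    print(best_load(masses, 8))               # (8, [0, 1, 2, 3])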
In general you have a binary linear programming problem. Those are very hard in general (NP-complete).
However, both ways lead to algorithms which I wouldn't consider to be beginner material. You might be better off with trial and error (as you suggested) or simply trying every possible combination.
This is a 1D bin-packing problem. It's NP-hard, so no known efficient algorithm is guaranteed to find the optimal solution, but a greedy algorithm gets close. You might want to try my bin-packing solver at phpclasses.org (bin-packing).
If I have an unweighted and undirected graph and each node is connected to each other node, then I have (n^2-n)/2 pairs of nodes and overall n^2-n possibilities/combinations:
1,2,3,4,5,...,64
2,1,X,X,X,...,X
3,X,1,X,X,...,X
4,X,X,1,X,...,X
5,X,X,X,1,...,X
.,X,X,X,X,1,.,X
.,X,X,X,X,X,1,X
64,X,X,X,X,X,X,1
Isn't this the same with 10 cars (45 pairs of cars and 90 combinations/possibilities)? Did I forget something? Where is the error?
A problem like you have here is similar to the classic traveling salesman problem, which asks for the most efficient way for a salesman to visit a list of cities. The difference is that it is conceivable that you might not need every car to fill the barge, whereas the salesman must visit every city. But the problem is similar. The brute-force way to solve the problem is to investigate every possible combination of cars, from 1 car to all 10. We will assume that it is valid to have any number of each car (i.e. if car 2 is a Ford Focus, you could have three Ford Foci). This is easy to change if the car list is an exact list of specific cars, however, and you can use only 1 of each.
Now, this quickly begins to consume a lot of time. As the number of cars goes up, the number of possible combinations of cars goes up geometrically, which means that with a number smaller than you expect, it will take longer to run the program than there is time left in your life. 10 should be manageable, though (it turns out to be a little over 180,000 combinations, or 1,023 if you can only have one of each item).
The first thing is to define the weight of each car and the maximum weight the barge can carry.
weights = [1, 2, 1, 3, 1, 2, 2, 4, 1, 2]
capacity = 8
Now we need some way to find each possible combination. Python's itertools module has a function that will give us every combination of a given length, but we want all lengths. So we will write one loop that goes from 1 to 10 and calls itertools.combinations_with_replacement for each length. We can then find out the total weight of each combination, and if it is higher than any weight we have already found, yet still within capacity, we will remember it as the best we have found so far.
The only real trick here is that we don't want combinations of the weights -- we want combinations of the indexes of the weights, because at the end we want to know which cars to put on the barge, not their weights. So combinations_with_replacement(range(10), ...) rather than combinations_with_replacement(weights, ...). Inside the loop you will want to get the weight of each car in the combination with weights[i] to sum it up.
I originally had code here, but took it out since this is homework. :-) (It wasn't originally tagged as such, but I should have known -- I blame the time change.)
If you wanted to allow only one of each car, you'd use itertools.combinations instead of combinations_with_replacement.
A shortcut is possible since you mention elsewhere that cars weigh from 1-2 tonnes. This means that you know you will need at least 4 of them (4 * 2 = 8), so you can skip all the combinations of 1-3 cars. However, this wouldn't generalize well if the prof changed the parameters on you.