For a data structures & algorithms class in college we have to implement an algorithm presented in a paper. The paper can be found here.
So I fully implemented the algorithm, though with some errors still left (but that's not really why I'm asking this question; if you want to see how I implemented it so far, you can find it here).
The real reason I'm asking a question on Stack Overflow is the second part of the assignment: we have to try to improve the algorithm. I had a few ways in mind, but all of them sound good in theory and won't really do well in practice:
Draw a line between the source and end node, find the node closest to the middle of that line, and divide the "path" in two recursively. The base case would be a graph small enough that a single Dijkstra run can do the computation. This isn't really an adjustment to the current algorithm, and with some thinking it becomes clear it wouldn't give an optimal solution.
Try to give the algorithm a sense of direction by giving higher priority to edges that point towards the end node. This also won't be optimal.
So now I'm out of ideas and hoping that someone here could give me a little hint for a possible adjustment. It doesn't really have to improve the algorithm; I think the main reason they asked us to do this is so we don't just implement the algorithm from the paper without understanding what's behind it.
(If Stack Overflow isn't the right place to ask this question, my apologies :) )
A short description of the algorithm:
The algorithm tries to select which nodes look promising. By promising I mean that they have a good chance of lying on a shortest path. How promising a node is is represented by its 'reach'. The reach of a vertex on a path is the minimum of its distances to the start and to the end of that path. The reach of a vertex in a graph is the maximum of its reaches over all shortest paths.
To determine whether a node is added to the priority queue in Dijkstra's algorithm, a test() function is added. test() returns true if the reach of a vertex v in the graph is larger than or equal to the weight of the path from the origin to v at the time v is to be inserted into the priority queue, or if the reach of v in the graph is larger than or equal to the Euclidean distance from v to the end vertex.
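For concreteness, here is a minimal sketch of how that pruning test might sit inside Dijkstra's algorithm. The graph representation, the precomputed reach values, and the euclidean() helper are my own assumptions for illustration, not taken from the paper:

```python
import heapq
import math

def euclidean(a, b, pos):
    # pos maps a vertex to its (x, y) coordinate -- assumed available
    (ax, ay), (bx, by) = pos[a], pos[b]
    return math.hypot(ax - bx, ay - by)

def test(v, dist_to_v, target, reach, pos):
    # Insert v only if its reach doesn't rule it out for a shortest s-t path.
    return (reach[v] >= dist_to_v or
            reach[v] >= euclidean(v, target, pos))

def reach_dijkstra(graph, source, target, reach, pos):
    # graph: {u: [(v, weight), ...]}; reach: precomputed reach per vertex
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf) and test(v, nd, target, reach, pos):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf
```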
Harm De Weirdt
Your best bet in cases like this is to think like a researcher. Research in general, and computer science research specifically, is about incremental improvement: one person shows that they can compute something faster using Dijkstra's algorithm, and then later they, or someone else, show that they can compute the same thing a little faster using A*. It's a series of small steps.
That said, the best place to look for ways to improve an algorithm presented in a paper is its future directions section. This paper gives you a little to work on in that direction, but your gold mine in this case lies in sections 5 and 6. There are multiple places where the authors admit that different approaches are possible. Try researching some of these approaches; this should lead you either to a real improvement to the algorithm, or at least to an arguable one.
Best of luck!
Related
I have a rather simple problem to define, but I have not found a simple answer so far.
I have two graphs (i.e. sets of vertices and edges) which are identical. Each of them has independently labelled vertices. Look at the example below:
How can the computer detect, without prior knowledge of it, that 1 is identical to 9, 2 to 10 and so on?
Note that in the case of symmetry there may be several possible one-to-one pairings which give complete equivalence, but finding just one of them is sufficient for me.
This is in the context of a Python implementation. Does someone have a pointer towards a simple algorithm publicly available on the Internet? The problem sounds simple, but I lack the mathematical knowledge to come up with it myself or to find the proper keywords.
EDIT: Note that I also have atom types (i.e. labels) for each graph, as well as the full distance matrix for the two graphs to align. However, the positions may be similar but not exactly equal.
This is known as the graph isomorphism problem, and it is probably very hard, although the exact details of how hard are still the subject of research.
(But things look better if your graphs are planar.)
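If a practical answer matters more than the theory, a minimal sketch using networkx could look like this. I'm assuming here that the graphs fit in memory and that the atom types are stored as a node attribute; both are my assumptions, not from the question:

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Two identical triangles with different vertex labels and an atom type per vertex.
g1 = nx.Graph([(1, 2), (2, 3), (3, 1)])
g2 = nx.Graph([(9, 10), (10, 11), (11, 9)])
nx.set_node_attributes(g1, {1: "C", 2: "O", 3: "H"}, "atom")
nx.set_node_attributes(g2, {9: "C", 10: "O", 11: "H"}, "atom")

# Only allow pairings between vertices carrying the same atom type.
matcher = isomorphism.GraphMatcher(
    g1, g2, node_match=isomorphism.categorical_node_match("atom", None))

if matcher.is_isomorphic():
    print(matcher.mapping)  # one valid pairing, e.g. {1: 9, 2: 10, 3: 11}
```

Any one mapping it finds satisfies the "just finding one of them is sufficient" requirement.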
So, after searching a bit, I think I found a solution that works most of the time for a moderate computational cost. It is a kind of genetic algorithm which uses a bit of randomness, but it seems practical enough for my purposes. I haven't had any aberrant configuration with my samples so far, even though it is theoretically possible for that to happen.
Here is how I proceeded:
1. Determine the complete set of 2-paths, 3-paths and 4-paths.
2. Determine vertex types using both atom type and surrounding topology, creating an "identity card" for each vertex.
3. Do the following ten times:
4. Start with a random candidate set of pairings that complies with the allowed vertex types.
5. Evaluate how many of the 2-paths, 3-paths and 4-paths correspond between the two pairings, scoring one point for each corresponding vertex (also using the atom type as an additional descriptor).
6. Evaluate all other shortlisted candidates for a given vertex by permuting the pairings for this candidate with its other positions in the same way.
7. Sort the scores in descending order.
8. For each score, check whether the configuration is among the excluded configurations; if it is not, take it as the new configuration and add it to the excluded configurations.
9. If the score is perfect (i.e. all of the 2-paths, 3-paths and 4-paths correspond), stop the loop and calculate the sum of absolute differences between the distance matrices of the two graphs under the selected pairing; otherwise go back to 4.
10. Stop this process after it has been done ten times.
11. Check the differences between the distance matrices and take the pairing associated with the minimal sum of absolute differences.
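A minimal sketch of steps 1, 2 and the path scoring in step 5 is below. The adjacency-dict representation and the helper names are my own assumptions for illustration, not part of the procedure above:

```python
def identity_card(g, atom, v):
    """Vertex type from atom type plus surrounding topology (step 2)."""
    return (atom[v], tuple(sorted(atom[n] for n in g[v])))

def k_paths(g, k):
    """All simple paths on k vertices, as tuples (step 1)."""
    paths = [(v,) for v in g]
    for _ in range(k - 1):
        paths = [p + (n,) for p in paths for n in g[p[-1]] if n not in p]
    return set(paths)

def score(pairing, paths_a, paths_b):
    """Count paths of graph A that map onto paths of graph B (step 5)."""
    return sum(1 for p in paths_a if tuple(pairing[v] for v in p) in paths_b)

# g: adjacency dict {vertex: set_of_neighbours}; atom: {vertex: atom_type}
g1 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
g2 = {9: {10, 11}, 10: {9, 11}, 11: {9, 10}}
atoms = {1: "C", 2: "O", 3: "H", 9: "C", 10: "O", 11: "H"}

paths_a = k_paths(g1, 2) | k_paths(g1, 3)
paths_b = k_paths(g2, 2) | k_paths(g2, 3)
print(score({1: 9, 2: 10, 3: 11}, paths_a, paths_b))  # perfect score here
```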
I'm learning about memoization and although I have no trouble implementing it when it's needed in Python, I still don't know how to identify situations where it will be needed.
I know it's needed when there are overlapping subcalls, but how do you identify whether that is the case? Some recursive functions seem to go deep before an overlapping call is made. How would you identify this in a coding interview? Would you draw out all the calls (even if some of them go 5-6 levels deep in an O(2^n) brute-force solution)?
Cases like the Fibonacci sequence make sense, because the overlap happens immediately (the fib(i-1) call will almost immediately overlap with the fib(i-2) call). But for other cases, like the knapsack problem, I still can't wrap my head around how anyone can identify during an interview that memoization should be used. Is there a quick way to check for overlap?
I hope my question makes sense. If someone could point me to a good resource, or give me clues to look for, I would really appreciate it. Thanks in advance.
In order to reach the "memoization" solution, you first need to identify the following two properties in the problem:
Overlapping subproblems
Optimal substructure
Looks like you do understand (1). For (2):
Optimal substructure: a problem S of size n has optimal substructure if the optimal solution to S can be calculated by looking only at the optimal solutions of its subproblems of size < n (not at all solutions to those subproblems), and doing so still yields an optimal solution to S.
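To make the overlap in (1) something you can check rather than guess at, a small counting experiment helps. This is my own illustration, not from the linked answer:

```python
from collections import Counter

calls = Counter()

def fib(n):
    calls[n] += 1  # record every argument we recurse on
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(20)
print(sum(calls.values()))  # 21891 calls in total
print(calls[10])            # fib(10) alone is recomputed 89 times

# Any argument with a count > 1 is an overlapping subproblem -- exactly
# what memoization (e.g. functools.lru_cache) collapses to a single call.
```

The same trick works for knapsack: count how often each (item_index, remaining_capacity) pair recurs, and the overlap becomes visible without drawing the whole call tree.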
For more detail, please take a look at the following answer:
https://stackoverflow.com/a/59461413/12346373
My code itself doesn't have an issue; this is more of a general question about code that already works. I solve a maze using breadth-first search, and I'm looking to study the node expansion, the space complexity, and the time complexity, which for BFS is O(b^d).
I'm not used to studying a program once it's finished, and I was wondering whether there are any specific methods that work best. I know the code itself will give me a runtime, but I was wondering if there are any library functions that could help, or maybe a function I could implement that would give me better quantitative results.
I have the ability to run the test on multiple different mazes (I even have a maze creator), but I'm asking for something (anything) more quantitative than just running this code on three or four different mazes and using the auto-output. I'm also using PyCharm, which I'm unfamiliar with: are there ways the IDE can formalize this information?
For the maze problem (without any assumptions or knowledge of the network learnt beforehand), the A* algorithm is the state of the art. Its complexities are:
Worst-case time complexity: O(|E|) = O(b^d)
Space complexity: O(|V|) = O(b^d)
For fairly large networks (such as road networks), practical algorithms exist because we can pre-compute some paths a priori. They achieve lower time complexity (traded off against larger space complexity) once the pre-processing (or learning) has been done.
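To get the quantitative numbers the question asks for, a minimal sketch using only the standard library could instrument the BFS directly. The grid representation and counter names here are my own assumptions:

```python
import time
import tracemalloc
from collections import deque

def bfs_instrumented(grid, start, goal):
    # grid: set of walkable (row, col) cells; 4-connected moves
    tracemalloc.start()
    t0 = time.perf_counter()
    frontier, seen = deque([start]), {start}
    expansions = max_frontier = 0
    while frontier:
        max_frontier = max(max_frontier, len(frontier))
        cell = frontier.popleft()
        expansions += 1
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in grid and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"expansions": expansions,      # empirical stand-in for O(b^d) nodes
            "max_frontier": max_frontier,  # empirical stand-in for space usage
            "seconds": elapsed,
            "peak_bytes": peak}
```

Running this over your generated mazes and tabulating expansions against maze size gives something far more quantitative than wall-clock time alone.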
I'm looking for an algorithm that will connect a large number of geographic coordinates (100-1000), creating the shortest possible route between them, starting anywhere and finishing anywhere else. I'm working in Python.
I have researched the available algorithms, and my problem is similar to the Traveling Salesman Problem, except that TSP requires a defined starting point and returns to that same point at the end. I will be taking an Uber to whatever the starting point is, and from whatever the ending point is back home. What I want is to cover all points while walking as little as possible; I don't care where the route starts or ends.
Prim's and Kruskal's algorithms seem to find good starting and ending points, but they create a tree, not an optimized walking route as TSP does.
Prim's algorithm:
Kruskal's algorithm:
Desired outcome based on Prim/Kruskal, but using TSP logic (example drawn by hand, not optimized):
If you don't need a productionized solution, write Python to dump your distance matrix in TSPLIB format with an extra city (representing the place that you will Uber to/from) at distance zero from every other city. Then feed it to (e.g.) Concorde or LKH.
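A minimal sketch of that dump follows; the file name and the integer scaling are my assumptions (TSPLIB expects integer edge weights):

```python
def write_tsplib(dist, path="walk.tsp", scale=1000):
    # dist: n x n matrix of walking distances between the real points.
    # Append a dummy city at distance 0 to everything: an optimal closed
    # tour through it corresponds to an optimal open path over the points.
    n = len(dist)
    full = [row[:] + [0] for row in dist] + [[0] * (n + 1)]
    with open(path, "w") as f:
        f.write(f"NAME: walk\nTYPE: TSP\nDIMENSION: {n + 1}\n"
                "EDGE_WEIGHT_TYPE: EXPLICIT\n"
                "EDGE_WEIGHT_FORMAT: FULL_MATRIX\nEDGE_WEIGHT_SECTION\n")
        for row in full:
            f.write(" ".join(str(round(d * scale)) for d in row) + "\n")
        f.write("EOF\n")
```

Once Concorde or LKH returns a tour, delete the dummy city from it; the remaining sequence is the open walking route, and its two ends are where you Uber to and from.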
Prim and Kruskal are algorithms for finding a spanning tree. You're trying to solve a well-known variant of the TSP (the Traveling Salesman Problem) in which you do not return to your starting point.
The location of your home is immaterial, by your definition. Your problem is to visit every location with the least distance traveled, without returning to your starting point. This is covered well in "the literature".
The "quick strike" solution is to take any standard TSP solution and remove its longest segment. This is a good heuristic, although it doesn't guarantee the optimal solution.
To begin with, this might be more of a conceptual question than a programming one. I won't paste my code below, but if you really want to see it you can take a look at this similar code link. The code is not the important thing here, so don't get confused trying to understand all the physics (electrostatics).
Let's say I want to solve this problem. I know the values on the boundaries, and I guess an initial solution on the rectangular grid inside these boundaries; see the figure below.
Potential=10 on boundaries; potential=0 inside
I know that eventually the field will converge to the same potential as the boundaries: if I let the program iterate long enough, the whole area on the inside will also become yellow. The figure below shows an intermediate step towards equilibrium:
Now, in the project I am working on, I am supposed to stop the simulation when the accuracy of the simulation reaches 1%. Is there a general definition of accuracy in cases like this, when working with a two-dimensional grid? There are several grid nodes, all with different values; are they all supposed to be within 1% of equilibrium (all yellow)?
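For illustration, here is a minimal sketch of the relaxation loop being described, with one common stopping rule; the rule itself is an assumption on my part, not a general definition: stop when the largest change between sweeps, relative to the boundary potential, drops below 1%:

```python
import numpy as np

def relax(n=50, boundary=10.0, tol=0.01):
    # Grid with fixed boundary potential and a zero initial guess inside.
    phi = np.zeros((n, n))
    phi[0, :] = phi[-1, :] = phi[:, 0] = phi[:, -1] = boundary
    while True:
        new = phi.copy()
        # Jacobi update: each interior node becomes the mean of its neighbours.
        new[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                  phi[1:-1, :-2] + phi[1:-1, 2:])
        # One possible "1% accuracy" criterion: the maximum change per
        # sweep, normalised by the boundary value.
        if np.max(np.abs(new - phi)) / boundary < tol:
            return new
        phi = new
```

Note that "1% accuracy" could equally well mean a maximum 1% deviation of every node from the known equilibrium value, which is a stricter condition; which convention applies is worth confirming with whoever set the assignment.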
(This might be the wrong forum, I am aware of that.)
Best regards