I know Dijkstra's algorithm is a popular solution for the "shortest path" problem; however, it seems to backfire when timetables come into play.
Say I have this graph with the following weights (time to get from one point to another):
A-----C        A->C: 10
 \--B--/       A->B: 5
               B->C: 5
If you throw it into Dijkstra, it'll return the route A->C. That's fine until you consult a timetable which says the route A->C only exists within a certain time frame. You could easily remove the A->C edge if the requested time falls outside the range when that edge is usable, but the data set I'm working with has a bunch of other ways to get from A to C with other, higher costs. Not to mention: what if you want to get from Z to Y, which requires going from A to C in the middle? It doesn't seem like an ideal solution.
Is there a better way, other than Dijkstra, to create a shortest path while also keeping a timetable in mind? Or should the algorithm be modified to consider two weights when finding the optimal path?
If it matters, I'm using python.
[edit]
The timetable is a basic table that says a train (in my case) leaves point A at, say, 12:00, then leaves station B at 12:05, then leaves C at 12:10. When it doesn't stop at B, B's column is empty, so A will have 08:00 and C will have 08:10:
A       B       C
08:00           08:10
12:00   12:05   12:10
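In code, that timetable could be represented as, say, a list of dicts (a hypothetical representation, with None marking a station the train skips):

timetable = [
    {"A": "08:00", "B": None,    "C": "08:10"},
    {"A": "12:00", "B": "12:05", "C": "12:10"},
]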
One way could be creating a set of trees denoting all simple paths between two given nodes and just selecting the shortest one among those that do not contain a deprecated edge. You can find all paths by adapting Dijkstra's algorithm or another algorithm like DFS or BFS. Note that finding all paths between two nodes is considered a hard problem, but depending on your needs and the type of graphs you're dealing with, you can build what you want. You can also read this post on the matter: Find all paths between two graph nodes. I.e. you can work with a limited set of paths (the top N shortest, if using Dijkstra).
Now, to optimize the step of finding out whether an edge is deprecated, I suggest keeping a dictionary with all edge ids (or names) as keys and their deprecation timestamps as values, then filtering the dictionary by comparing the values with now().timestamp() and removing the matched items from the dictionary after each find. Also note that before you start filtering, you should check whether the edge exists in the dictionary at all (to prevent the algorithm from running the filter multiple times for duplicate edges).
The code could look like the following:
from datetime import datetime

def filter_edge(u_id):
    # True if this edge has a deprecation timestamp that is still in the future
    if u_id in deprecation:
        time_stamp = deprecation[u_id]
        if time_stamp > datetime.now().timestamp():
            return True
    return False
And the path validation could look like this:
def validate_path(path):
    return not any(filter_edge(edge.id) for edge in path)
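To tie the two pieces together, here is a hedged sketch reusing filter_edge above, assuming a networkx graph whose edges carry an "id" attribute matching the keys of the deprecation dictionary: enumerate the simple paths from shortest to longest and return the first one whose edges are all still valid.

from itertools import islice
import networkx as nx

def first_valid_path(g, source, target, n=10):
    # shortest_simple_paths yields paths in order of increasing length (Yen's algorithm)
    for path in islice(nx.shortest_simple_paths(g, source, target, weight="weight"), n):
        edge_ids = (g.edges[u, v]["id"] for u, v in zip(path, path[1:]))
        if not any(filter_edge(eid) for eid in edge_ids):
            return path
    return None  # none of the N shortest paths is currently valid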
I have a problem that is rather simple to define, but I have not found a simple answer so far.
I have two graphs (i.e. sets of vertices and edges) which are identical. Each of them has independently labelled vertices. Look at the example below:
How can the computer detect, without prior knowledge of it, that 1 is identical to 9, 2 to 10 and so on?
Note that in the case of symmetry, there may be several possible one to one pairings which give complete equivalence, but just finding one of them is sufficient to me.
This is in the context of a Python implementation. Does someone have a pointer towards a simple algorithm publicly available on the Internet? The problem sounds simple, but I simply lack the mathematical knowledge to come up with it myself or to find the proper keywords for the information.
EDIT: Note that I also have atom types (i.e. labels) for each graph, as well as the full distance matrix for the two graphs to align. However, the positions may be similar but not exactly equal.
This is known as the graph isomorphism problem, and it is probably very hard, although the exact details of how hard are still a subject of research.
(But things look better if your graphs are planar.)
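In practice, networkx ships a VF2-based matcher that returns one such pairing when it exists. A minimal sketch (the "atom" node attribute is an assumption based on the question's edit):

import networkx as nx
from networkx.algorithms import isomorphism

g1 = nx.Graph([(1, 2), (2, 3), (3, 1)])
g2 = nx.Graph([(9, 10), (10, 11), (11, 9)])
nx.set_node_attributes(g1, {1: "C", 2: "C", 3: "O"}, "atom")
nx.set_node_attributes(g2, {9: "C", 10: "C", 11: "O"}, "atom")

# only allow pairings between vertices with the same atom type
gm = isomorphism.GraphMatcher(
    g1, g2, node_match=isomorphism.categorical_node_match("atom", None))
if gm.is_isomorphic():
    print(gm.mapping)  # one valid vertex pairing, e.g. {1: 9, 2: 10, 3: 11}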
So, after searching for it a bit, I think I found a solution that works most of the time for moderate computational cost. It is a kind of genetic algorithm which uses a bit of randomness, but it seems practical enough for my purposes. I haven't had any aberrant configuration with my samples so far, even though that is theoretically possible.
Here is how I proceeded:
1. Determine the complete set of 2-paths, 3-paths and 4-paths.
2. Determine vertex types using both atom type and surrounding topology, creating an "identity card" for each vertex.
3. Do the following ten times:
4. Start with a random candidate set of pairings complying with the allowed vertex types.
5. Evaluate how many of the 2-paths, 3-paths and 4-paths correspond between the two pairings, scoring one point for each correspondence (also using the atom type as an additional descriptor; a scoring sketch follows this list).
6. Evaluate all other shortlisted candidates for a given vertex by permuting the pairings for this candidate with its other positions in the same way.
7. Sort the scores in descending order.
8. For each score, check whether the configuration is among the excluded configurations; if it is not, take it as the new configuration and add it to the excluded configurations.
9. If the score is perfect (i.e. all of the 2-paths, 3-paths and 4-paths correspond), stop the loop and calculate the sum of absolute differences between the distance matrices of the two graphs under the selected pairing; otherwise go back to 4.
10. Stop this process after it has run ten times.
11. Check the differences between the distance matrices and take the pairing associated with the minimal sum of absolute differences.
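As an illustration of the scoring in step 5, a minimal sketch (the names are mine, k-paths are assumed to be stored as vertex tuples, and for simplicity it scores one point per matching path rather than per vertex):

def score(pairing, kpaths1, kpaths2):
    # a path of graph 1 counts if its image under the candidate pairing
    # is also a path of graph 2, in either orientation
    targets = set(kpaths2) | {tuple(reversed(p)) for p in kpaths2}
    return sum(tuple(pairing[v] for v in p) in targets for p in kpaths1)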
I have a graph/network that obviously consists of some nodes and some edges. Each edge has a weight attached to it, or in this case a cost. Each edge also has a distance attached to it AND a type. So basically the weight/cost is pre-calculated from the distance of the edge, along with some other metrics, for both types of edges.
However, in my case I would like some additional cost to be added for, say, every 100 distance, but only for one type of edge. I'm not even certain whether it is possible, in algorithms such as Dijkstra's, to add cost/distance depending on the sum of the previous steps in the path.
I know I could just divide the cost across each distance unit and thus get a rough estimate. The problem there would be the edge cases, where the cost would be almost double at distance 199 compared to adding the cost at exactly each 100 distance, i.e. adding it at 100 and at 200.
But maybe there are other ways to get around this ?
I think you cannot implement this using Dijkstra, because you would violate the invariant that is needed for correctness (see e.g. Wikipedia). In each step, Dijkstra builds on this invariant, which more or less states that all "already found paths" are optimal, i.e. shortest. To show that it does not hold in your case of "additional cost by edge type and covered distance", let's look at a counterexample:
Counterexample against Usage of Dijkstra
Assume we have two types of edges, a first type (->) and a second type (=>). The second type incurs an additional cost of 10 once the total distance covered on it exceeds 10. Now take the following graph, with the following edges:
start -1-> u_1
u_1 -1-> u_2
u_2 -1-> u_3
...
u_6 -1-> u_7
u_7 -1-> v
start =7=> v
v =4=> end
When we play that through with Dijkstra (I skip all intermediate steps) with start as the start node and end as the target, we will first retrieve the path start=7=>v. This path has a length of 7, which is shorter than the "detour" start-1->u_1-1->...-1->u_7-1->v with its length of 8. However, in the next step, we have to take the edge v=4=>end, which brings the first path to a total of 21 (11 original + 10 penalty). The detour path now becomes the shorter one, with a length of 12 = 8 + 4 (no penalty).
In short, Dijkstra is not applicable - even if you modify the algorithm to take the "already found path" into account for retrieving the cost of next edges.
Alternative?
Maybe you can build your algorithm around a variant of Dijkstra which retrieves multiple (suboptimal) solutions. First, you would need to extend Dijkstra so that it takes the already found path into account: in the relaxation step, replace cost = weight(v, u, e) with cost = weight(v, u, e, paths[v]) and write a suitable function that calculates the penalty based on the previous path and the considered edge. Afterwards, remove edges from your original optimal solution and iterate the procedure to find a new alternative shortest path. However, I see no easy way of selecting which edge to remove from the graph (besides those of your penalty type), and the runtime complexity is probably awful.
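As a hedged sketch of that weight hook, applied to the question's "extra cost every 100 distance, for one edge type only" (the edge attribute names and the penalty size are assumptions):

def weight(v, u, e, path_so_far, penalty=10, interval=100):
    # e and the edges of path_so_far are assumed dicts with
    # "cost", "distance" and "type" entries
    base = e["cost"]
    if e["type"] != "special":
        return base
    covered = sum(edge["distance"] for edge in path_so_far
                  if edge["type"] == "special")
    # how many full 100-distance marks does appending this edge cross?
    crossings = (covered + e["distance"]) // interval - covered // interval
    return base + penalty * crossings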
Given a graph g and a set of N nodes my_nodes = [n1, n2, n3, ...], how can I check if there's a path that contains all N nodes?
Checking among all_simple_paths for paths that contain all nodes in my_nodes becomes computationally cumbersome as the graph grows
The search above can be limited to paths between pairwise couples of my_nodes. This reduces the complexity only to a small degree, and it requires a lot of Python looping, which is quite slow.
Is there a faster solution to the problem?
You may try a greedy algorithm here: start the path-finding check from all the nodes to find and explore your graph step by step. I can't provide a real sample, but the pseudo-code should look something like this:
Start n path stubs from all your n nodes to find
For all these path stubs adjust them by all the neighbors which weren't checked before
If you have some intersection between path stubs, then you get a new one, which contains more of your needed nodes than before
If after merging the stub paths you have the one which covers all needed nodes, you're done
If there are still some additional nodes to add to the path, you continue with second step again
If there are no nodes left in the graph, the path doesn't exist
This algorithm has complexity O(E + N), because you're visiting the edges and nodes in non-recursive fashion.
However, in the case of a directed graph the "merge" will be a bit more complicated, yet it can still be done; in that case, though, the worst scenario may take a lot of time.
Update:
As you say that the graph is directed, the above approach wouldn't work well. In this case you may simplify your task like this:
Find the strongly connected components in the graph (I suggest implementing this yourself, e.g. with Kosaraju's algorithm; the complexity is O(E + N)). You can use a NetworkX method for this if you want an out-of-the-box solution.
Create the condensation of the graph based on the information from step 1, saving the information about which component can be reached from which. Again, there is a NetworkX method for this.
Now you can easily say, which nodes from your set are in the same component, so a path containing all of them definitely exists.
After that, all you need to check is the connectivity between the different components for your nodes. For example, you can take the topological sort of the condensation and do the check in linear time again, as in the sketch below.
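A hedged sketch of these steps with NetworkX (this checks the necessary condition that each needed component can reach the next one in topological order):

import networkx as nx

def may_have_covering_path(g, my_nodes):
    cond = nx.condensation(g)            # DAG of strongly connected components
    comp_of = cond.graph["mapping"]      # original node -> component id
    needed = {comp_of[n] for n in my_nodes}
    # the needed components must be visitable in topological order
    order = [c for c in nx.topological_sort(cond) if c in needed]
    return all(nx.has_path(cond, a, b) for a, b in zip(order, order[1:]))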
I'm using Django to develop a small scheduling web application where people are assigned certain times to meet with their superiors. Employees are stored as models, with a one-to-many relation to a model representing the time ranges and days of the week when they are free. For instance:
Bob: (W 9:00, 9:15), (W 9:15, 9:30), ... (W 15:00, 15:20)
Sarah: (Th 9:05, 9:20), (F 9:20, 9:30), ... (Th 16:00, 16:05)
...
Mary: (W 8:55, 9:00), (F 13:00, 13:35), ... etc
My program allows a basic schedule setup, where employers can choose to view the first N possible schedules with the fewest gaps between meetings, under the condition that they meet all their employees at least once during that week. I am currently generating all possible permutations of meetings and filtering out schedules where meeting times overlap. Is there a way to generate the first N schedules out of M possible ones, without going through all M possibilities?
Clarification: We are trying to get the minimum sum of gaps for any given day, summed over all days.
I would use a search algorithm, like A*, to do this. Each node in the graph represents a person's available time slot, and a path from one node to another means that node_a and node_b are both in the partial schedule.
Another solution would be to create a graph in which the nodes are each person's availability times and there is an edge from node_a to node_b if the person associated with node_a is not the same as the person associated with node_b. The weight of each edge is the amount of time between the times associated with the two nodes.
After creating this graph, you could generate a variant of a minimum spanning tree from the graph. The variant would differ from MSTs in that:
you'll only add a node to the MST if the person associated with the node is not already in the MST.
you finish creating the MST when all persons are in the MST.
The minimum spanning tree generated would represent a single schedule.
To generate other schedules, remove all the edges from the graph which are found in the schedule you just created and then create a new minimum spanning tree from the graph with the removed edges.
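A minimal sketch of that MST variant, grown Prim-style (the slot format and the seeding choice are assumptions; times are minutes since the start of the week):

def mst_schedule(slots):
    # slots: list of (person, start, end) availability tuples
    people = {p for p, _, _ in slots}
    seed = min(slots, key=lambda s: s[1])   # start from the earliest slot
    chosen, seen = [seed], {seed[0]}
    while seen != people:
        candidates = [s for s in slots if s[0] not in seen
                      and all(s[2] <= c[1] or s[1] >= c[2] for c in chosen)]
        if not candidates:
            return None   # dead end; a real search would backtrack here
        # the gap to the nearest chosen meeting plays the role of the edge weight
        nxt = min(candidates,
                  key=lambda s: min(max(s[1] - c[2], c[1] - s[2]) for c in chosen))
        chosen.append(nxt)
        seen.add(nxt[0])
    return sorted(chosen, key=lambda s: s[1])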
In general, scheduling problems are NP-hard, and while I can't figure out a reduction for this problem to prove it such, it's quite similar to a number of other well-known NP-complete problems. There may be a polynomial-time solution for finding the minimum gap for a single day (though I don't know it off hand, either), but I have less hopes for needing to solve it for multiple days. Unfortunately, it's a complicated problem, and there may not be a perfectly elegant answer. (Or, I'm going to kick myself when someone posts one later.)
First off, I'd say that if your dataset is reasonably small and you've been able to compute all possible schedules fairly quickly, you may just want to stick with that solution. All others will be approximations and could possibly end up running slower if the constant factor of their running time is large. (That factor doesn't grow with the size of the dataset, so it matters relatively less for a large dataset.)
The simplest approximation would be to just use a greedy heuristic. It will almost assuredly not find the optimum schedules, and may take a long time to find a solution if most of the schedules are overlapping, and there are only a few that are even valid solutions - but I'm going to assume that this is not the case for employee times.
Start with an arbitrary schedule, choosing one timeslot for each employee at random. In each iteration, pick one employee and change his timeslot to the best possible time with respect to the rest of the current schedule. Repeat this process until you're satisfied with the result - when it isn't improving quickly enough anymore or has taken too long. You probably don't want to repeat until no improving change exists at all, since this process will likely loop for most data.
It's not a great heuristic, but it should produce some reasonable schedules, and it has a lot of room for adjustment. You may want to always try to switch overlapping times first before any others, or you may want to flip the employee who currently contributes to the largest gap, or maybe eliminate certain slots that you've already tried. You may sometimes want to allow a move to a less optimal solution in the hope that you're at a local minimum and want to get out of it - some randomness can also help with this. Make sure you always keep track of the best solution you've seen so far.
To generate more schedules, the most obvious thing would be to just start the process over with a different random schedule. Or, maybe flip a few arbitrary times from the previous solution you found, and repeat from there.
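A minimal sketch of that local search, under the simplifying assumptions that times are minutes since the week's start and that overlaps are discouraged with a heavy penalty rather than a hard constraint:

import random

def total_gap(schedule):
    # schedule: {employee: (start, end)}; sum the gaps between consecutive meetings
    times = sorted(schedule.values())
    gap = 0
    for (_, e1), (s2, _) in zip(times, times[1:]):
        gap += 10_000 if s2 < e1 else s2 - e1   # penalise overlapping meetings
    return gap

def improve(schedule, availability, rounds=100):
    best, best_cost = dict(schedule), total_gap(schedule)
    for _ in range(rounds):
        emp = random.choice(list(schedule))
        for slot in availability[emp]:          # try every slot for this employee
            trial = dict(schedule)
            trial[emp] = slot
            cost = total_gap(trial)
            if cost < best_cost:
                best, best_cost = trial, cost
        schedule = dict(best)                   # keep the best move found so far
    return best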
Edit: This is all fairly related to genetic algorithms, and you may want to use some of the ideas I presented here in a GA.
I have a bunch of objects with a level, a weight, and zero or more connections to objects of the next levels. I want to know how to get the "heaviest" path (the one with the biggest sum of weights).
I'd also love to know of course, what books teach me how to deal with graphs in a practical way.
Your graph is acyclic, right? (I presume so, since a node always points to a node on the next level.) If your graph can have arbitrary cycles, the problem of finding the largest path becomes NP-complete and brute-force search becomes the only solution.
Back to the problem: you can solve it by finding, for each node, the heaviest path that leads up to it. Since you already have a topological sort of your DAG (the levels themselves), it is straightforward to find the paths:
For each node, store the cost of the heaviest path that leads to it and the last node before it on said path. Initially, this is always empty (though a sentinel value, like a negative number for the cost, might simplify the code later).
For nodes in the first level, you already know the cost of the heaviest path that ends in them - it is zero (and the parent node is None)
For each level, propagate the path info to the next level - this is similar to a normal algo for shortest distance:
for level in range(nlevels):
    for node in nodes[level]:
        cost = cost_to[node]  # cost of the heaviest path found to this node
        for (neighbour, edge_cost) in edges[node]:
            alt_cost = cost + edge_cost
            if alt_cost > cost_to[neighbour]:  # note '>': we maximise, not minimise
                cost_to[neighbour] = alt_cost
                parent[neighbour] = node
My book recommendation is Steve Skiena's "Algorithm Design Manual". There's a nice chapter on graphs.
I assume that you can only go down to a lower level in the graph.
Notice how the graph forms a tree. Then you can solve this using recursion:
heaviest_path(n) = value[n] + max(heaviest_path(c) for c in children[n])
This can easily be optimized by using dynamic programming instead.
Start with the nodes at the lowest level. Their heaviest_path is just their own value. Keep track of this in an array. Then calculate the heaviest_path for the next level up, then the level after that, etc.
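A quick top-down version with memoization (value and children are assumed dicts keyed by node; the tiny graph is just for illustration):

from functools import lru_cache

value = {"a": 3, "b": 5, "c": 2, "d": 7}
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

@lru_cache(maxsize=None)
def heaviest_path(n):
    # leaves have no children, so max() needs a default of 0
    return value[n] + max((heaviest_path(c) for c in children[n]), default=0)

print(heaviest_path("a"))  # 3 + 5 + 7 = 15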
The method which I generally use to find the 'heaviest' path is to negate the weights and then find the shortest path. There are good algorithms (http://en.wikipedia.org/wiki/Shortest_path_problem) for finding the shortest path; note that after negation the weights are negative, so you need one that tolerates negative edges, such as Bellman-Ford, rather than Dijkstra. This method holds good as long as you do not have a positive-weight cycle in your original graph.
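A hedged sketch of this trick with NetworkX:

import networkx as nx

def heaviest_path(g, source, target):
    h = nx.DiGraph()
    h.add_weighted_edges_from((u, v, -w) for u, v, w in g.edges(data="weight"))
    # raises nx.NetworkXUnbounded if the original graph has a positive-weight cycle
    return nx.bellman_ford_path(h, source, target)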
For graphs that do have positive-weight cycles, the problem of finding the 'heaviest' path is NP-complete, and your algorithm to find it will have non-polynomial time complexity.