Variant of Dijkstra - no repeat groups - python

I'm trying to write an optimization process based on Dijkstra's algorithm to find the optimal path, with a slight variation: the path may not choose two items from the same group/family.
Brute-force traversal of all possible paths would blow up combinatorially, which is why I'm attempting (hopefully) to use Dijkstra's algorithm, but I'm struggling to add in the no-repeat-groups logic.
Think of it like a traveling salesman problem, except I want to travel from New York to Los Angeles, have an interesting route (never visiting two similar cities from the same group), and minimize my fuel costs. There are approximately 15 days and 40 cities, but for defining my program I've pared it down to 4 cities and 3 days.
Valid paths don't have to visit every group; they just can't visit two cities in the same group. {XL,L,S} is a valid solution, but {XL,L,XL} is not valid because it visits the XL group twice. All valid solutions will be the same length (15 days, i.e. edges) but can use any combination of groups (without duplicating groups) and need not use them all (since there are 15 days but 40 different city groups).
Here's a picture I put together to illustrate a valid & invalid route: (FYI - groups are horizontal rows in the matrix)
**Day 1**
G1->G2 # $10
G3->G4 # $30
etc...
**Day 2**
G1->G3 # $50
G2->G4 # $10
etc...
**Day 3**
G1->G4 # $30
G2->G3 # $50
etc...
The optimal path would be G1->G2->G3; however, a standard Dijkstra solution returns G1-
I found and tweaked this example code online, and I name my nodes with the syntax D[day#][group#] so I can quickly check which day and group a node belongs to by slicing out the third character (index 2).
## Based on code found here: https://raw.githubusercontent.com/nvictus/priority-queue-dictionary/0eea25fa0b0981558aa780ec5b74649af83f441a/examples/dijkstra.py

import pqdict

def dijkstra(graph, source, target=None):
    """
    Computes the shortest paths from a source vertex to every other vertex in
    a graph.
    """
    # The entire main loop is O( (m+n) log n ), where n is the number of
    # vertices and m is the number of edges. If the graph is connected
    # (i.e. the graph is in one piece), m normally dominates over n, making
    # the algorithm O(m log n) overall.
    dist = {}
    pred = {}
    predGroups = {}

    # Store distance scores in a priority queue dictionary
    pq = pqdict.PQDict()
    for node in graph:
        if node == source:
            pq[node] = 0
        else:
            pq[node] = float('inf')

    # Remove the head node of the "frontier" edge from pqdict: O(log n).
    for node, min_dist in pq.iteritems():
        # Each node in the graph gets processed just once.
        # Overall this is O(n log n).
        dist[node] = min_dist
        if node == target:
            break

        # Updating the score of any edge's node is O(log n) using pqdict.
        # There is _at most_ one score update for each _edge_ in the graph.
        # Overall this is O(m log n).
        for neighbor in graph[node]:
            if neighbor in pq:
                new_score = dist[node] + graph[node][neighbor]
                # This is my attempt at tracking whether we've already used a
                # node in this group/family. The group designator is the 3rd
                # character of the node name (index 2) for quick access.
                try:
                    groupToAdd = node[2]
                    alreadyVisited = predGroups.get(groupToAdd, False)
                except:
                    alreadyVisited = False
                    groupToAdd = 'S'
                # Solves OK with this line
                if new_score < pq[neighbor]:
                # Errors out with this version of the line:
                # if new_score < pq[neighbor] and not alreadyVisited:
                    pq[neighbor] = new_score
                    pred[neighbor] = node
                    # Store this group in the "visited" dict to prevent
                    # future duplication
                    predGroups[groupToAdd] = groupToAdd
                    print predGroups
                    #print node[2]

    return dist, pred

def shortest_path(graph, source, target):
    dist, pred = dijkstra(graph, source, target)
    end = target
    path = [end]
    while end != source:
        end = pred[end]
        path.append(end)
    path.reverse()
    return path

if __name__ == '__main__':
    # A simple edge-labeled graph using a dict of dicts
    graph = {'START': {'D11': 1, 'D12': 50, 'D13': 3, 'D14': 50},
             'D11': {'D21': 5},
             'D12': {'D22': 1},
             'D13': {'D23': 50},
             'D14': {'D24': 50},
             'D21': {'D31': 3},
             'D22': {'D32': 5},
             'D23': {'D33': 50},
             'D24': {'D34': 50},
             'D31': {'END': 3},
             'D32': {'END': 5},
             'D33': {'END': 50},
             'D34': {'END': 50},
             'END': {'END': 0}}
    dist, path = dijkstra(graph, source='START')
    print dist
    print path
    print shortest_path(graph, 'START', 'END')
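One way to make the constraint work with Dijkstra (a sketch added here for illustration, not code from the post) is to fold the set of groups used so far into the search state. A single shared predGroups dict breaks Dijkstra's optimal-substructure assumption, because whether a group is "already visited" depends on the path taken, not on the node alone. Here group_of is a hypothetical helper that maps a node name to its group (returning None for terminal nodes like START/END):

import heapq
from itertools import count

def dijkstra_no_repeat_groups(graph, source, target, group_of):
    """Dijkstra over an expanded state space: a state is (node, frozenset of
    groups used so far), so paths reaching the same node via different group
    sets are tracked independently."""
    tiebreak = count()  # avoids comparing frozensets when priorities tie
    dist = {(source, frozenset()): 0}
    heap = [(0, next(tiebreak), source, frozenset())]
    while heap:
        d, _, node, groups = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get((node, groups), float('inf')):
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            g = group_of(neighbor)
            if g is not None and g in groups:
                continue  # this move would repeat a group
            new_groups = groups | {g} if g is not None else groups
            state = (neighbor, new_groups)
            new_d = d + weight
            if new_d < dist.get(state, float('inf')):
                dist[state] = new_d
                heapq.heappush(heap, (new_d, next(tiebreak), neighbor, new_groups))
    return None

# e.g., with the graph from the question:
# dijkstra_no_repeat_groups(graph, 'START', 'END',
#                           lambda n: n[2] if n.startswith('D') else None)

The cost is that the number of states grows with the number of distinct group sets, which is exactly what makes the general problem hard. Note also that in the toy graph above every day-to-day edge stays within one group digit, so the constrained search on that particular graph would return None; the sketch is meant for graphs whose edges cross groups.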


What is best: Global Variable or Parameter in this python function?

I have a question about the following code, but I guess it applies to many recursive functions.
This function computes the maximum path and its length for a DAG, given the graph, a source node, and an end node.
To keep track of already-computed distances across recursions I use the "max_distances_and_paths" variable, updating it on each recursion.
Is it better to keep it as a function parameter (passed in and returned across recursions) or to use a global variable initialized outside the function?
And how can I avoid having this parameter returned when calling the function externally (i.e. it has to be passed along across recursions, but I don't care about its value externally)?
Is there a better way than doing LongestPath(G, source, end)[0:2]?
Thanks
# For a DAG, computes the maximum distance and the corresponding path (node
# sequence, ordered in reverse). Recursively computes the paths and distances
# to nodes which are adjacent to the end node and selects the maximum one.
# It will return a single maximum path (and its distance) even if there are
# different paths with the same max distance.
# Input: {Node 1: adj nodes directed to Node 1, ..., Node N: adj nodes directed to Node N}
# Example: {'g': ['r'], 'k': ['g', 'r']}
def LongestPath(G, source, end, max_distances_and_paths=None):
    if max_distances_and_paths is None:
        max_distances_and_paths = {}
    max_path = [end]
    distances_list = []
    paths_list = []
    # return max_distance and max_path from source to the current "end" if
    # already computed (i.e. present in the dictionary tracking maximum
    # distances and corresponding paths)
    if end in max_distances_and_paths:
        return max_distances_and_paths[end][0], max_distances_and_paths[end][1], max_distances_and_paths
    # base case, when end node equals source node
    if source == end:
        max_distance = 0
        return max_distance, max_path, max_distances_and_paths
    # if there are no adjacent nodes directed to the end node (and it is not
    # the source node, previous case), the path is disconnected
    if len(G[end]) == 0:
        return 0, [0], {"": []}
    # for each adjacent node pointing to the end node, recursively compute its
    # max distance to the source node and add one to get the distance to the
    # end node. Recursively collect the nodes included in the path
    for t in G[end]:
        sub_distance, sub_path, max_distances_and_paths = LongestPath(G, source, t, max_distances_and_paths)
        paths_list += [[end] + sub_path]
        distances_list += [1 + sub_distance]
    # compute max distance
    max_distance = max(distances_list)
    # access the same index where max_distance is, in the list of paths, to
    # retrieve the path corresponding to the max distance
    index = [i for i, x in enumerate(distances_list) if x == max_distance][0]
    max_path = paths_list[index]
    # update the dictionary tracking maximum distances and corresponding
    # paths from the source node to the current end node
    max_distances_and_paths.update({end: [max_distance, max_path]})
    # return computed max distance, corresponding path, and tracker
    return max_distance, max_path, max_distances_and_paths
Global variables are generally avoided for several reasons (see Why are global variables evil?). I would recommend passing the parameter in this case. However, you could define a larger function housing your recursive function. Here's a quick example I wrote for a factorial:
def a(m):
    def b(m):
        if m < 1:
            return 1
        return m * b(m - 1)
    n = b(m)
    m = m + 2
    return n, m

print(a(6))
This will give (720, 8), which shows that even if you use the same variable name in your recursive function, the one you passed into the enclosing function does not change. In your case, you would just return n, as in my example; I returned a modified m as well only to show that although both a and b take m as input, Python keeps them separate.
In general I would say avoid the use of global variables. They make your code harder to read and often more difficult to debug once your codebase gets more complex, so steering clear of them is good practice.
I would use a helper function to initialise your recursion.
def longest_path_helper(G, source, end, max_distances_and_paths=None):
    max_distance, max_path, max_distances_and_paths = LongestPath(
        G, source, end, max_distances_and_paths
    )
    return max_distance, max_path, max_distances_and_paths
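If the goal is also to avoid writing LongestPath(G, source, end)[0:2] at the call site, the helper can simply drop the tracker before returning; a minimal sketch built on the LongestPath above:

def longest_path(G, source, end):
    """Public entry point: runs the recursion and hides the tracker."""
    max_distance, max_path, _ = LongestPath(G, source, end)
    return max_distance, max_path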
On a side note, the Python convention is to write function names in lowercase with words separated by underscores, while CapitalizedWords without underscores are used for classes. So it would be more Pythonic to use def longest_path():

Unweighted directed graph distances

Let's say I have an unweighted directed graph. I was wondering if there is a way to store all the distances between a starting node and all the remaining nodes of the graph. I know Dijkstra's algorithm is an option, but I'm not sure it would be the best one, since I'm working with a pretty big graph (~100k nodes) and it is unweighted. My thoughts so far were to perform a BFS and store all the distances along the way. Is this a feasible approach?
Finally, since I'm pretty new on graph theory, could someone maybe point me in the right direction for a good Python implementation of this kind of problem?
Definitely feasible, and pretty fast if your data structure contains a list of end nodes for each starting node, indexed on the starting-node identifier.
Here's an example using a dictionary for edges: {startNode: list of end nodes}
from collections import deque

maxDistance = 0

def getDistances(origin, edges):
    global maxDistance
    maxDistance = 0
    distances = {origin: 0}      # {endNode: distance from origin}
    toLink = deque([origin])     # start at origin (distance = 0)
    while toLink:
        start = toLink.popleft()            # previous end, will chain to next
        dist = distances[start] + 1         # new nodes found from here are at +1
        for end in edges[start]:            # next end nodes
            if end in distances: continue   # new ones only
            distances[end] = dist           # record distance
            toLink.append(end)              # will link onward from there
            maxDistance = max(maxDistance, dist)
    return distances
This does one iteration per node (excluding unreachable nodes) and uses fast dictionary access to follow links to newly discovered nodes.
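For example, on a small hand-built graph (hypothetical data; note that every node must appear as a key, even sinks like 'D'):

edges = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(getDistances('A', edges))   # {'A': 0, 'B': 1, 'C': 1, 'D': 2}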
Using some random test data (10 million edges) ...
import random
from collections import defaultdict

print("loading simulated graphs")
vertexCount = 100000
edgeCount = vertexCount * 100
edges = defaultdict(set)
edgesLoaded = 0
minSpan = 1  # vertexCount//2
while edgesLoaded < edgeCount:
    start = random.randrange(vertexCount)
    end = random.randrange(vertexCount)
    if abs(start - end) > minSpan and end not in edges[start]:
        edges[start].add(end)
        edgesLoaded += 1
print("loaded!")
print("loaded!")
Performance:
# starting from a randomly selected node
origin = random.choice(list(edges.keys()))

from timeit import timeit
t = timeit(lambda: getDistances(origin, edges), number=1)
print(f"{t:.2f} seconds for", edgeCount, "edges", "max distance =", maxDistance)

# 3.06 seconds for 10000000 edges, max distance = 4

How to find the optimal path for a graph with weighted edges using depth first search method?

I am trying to solve "Problem Set 2: Fastest Way to Get Around MIT" from MIT Course Number 6.0002:
In this problem set you will solve a simple optimization problem on a graph. Specifically, you will find the shortest route from one building to another on the MIT campus given that you wish to constrain the amount of time you spend walking outdoors (in the cold). [...]
Problem 3: Find the Shortest Path using Optimized Depth First Search
In our campus map problem, the total distance traveled on a path is equal to the sum of all total distances traveled between adjacent nodes on this path. Similarly, the distance spent outdoors on the path is equal to the sum of all distances spent outdoors on the edges in the path.
Depending on the number of nodes and edges in a graph, there can be multiple valid paths from one node to another, which may consist of varying distances. We define the shortest path between two nodes to be the path with the least total distance traveled . You are trying to minimize the distance traveled while not exceeding the maximum distance outdoors.
How do we find a path in the graph? Work off the depth-first traversal algorithm covered in lecture to discover each of the nodes and their children nodes to build up possible paths. Note that you’ll have to adapt the algorithm to fit this problem. [...]
Problem 3b: Implement get_best_path
Implement the helper function get_best_path. Assume that any variables you need have been set correctly in directed_dfs. Below is some pseudocode to help get you started.
if start and end are not valid nodes:
    raise an error
elif start and end are the same node:
    update the global variables appropriately
else:
    for all the child nodes of start
        construct a path including that node
        recursively solve the rest of the path, from the child node to the end node
    return the shortest path
I can't figure out what I am doing wrong in the algorithm to find the shortest path using the depth-first search method.
I tried unweighted edges and it works fine for that, but when I try weighted edges it does not return the shortest path.
def get_best_path(digraph, start, end, path, max_dist_outdoors, best_dist,
                  best_path):
    """
    Finds the shortest path between buildings subject to constraints.

    Parameters:
        digraph: Digraph instance
            The graph on which to carry out the search
        start: string
            Building number at which to start
        end: string
            Building number at which to end
        path: list composed of [[list of strings], int, int]
            Represents the current path of nodes being traversed. Contains
            a list of node names, total distance traveled, and total
            distance outdoors.
        max_dist_outdoors: int
            Maximum distance spent outdoors on a path
        best_dist: int
            The smallest distance between the original start and end node
            for the initial problem that you are trying to solve
        best_path: list of strings
            The shortest path found so far between the original start
            and end node.

    Returns:
        a list of building numbers (in strings), [n_1, n_2, ..., n_k],
        where there exists an edge from n_i to n_(i+1) in digraph,
        for all 1 <= i < k, and the distance of that path.
        If there exists no path that satisfies the max_total_dist and
        max_dist_outdoors constraints, then return None.
    """
    # TODO
    # put the first node in the path on each recursion
    path[0] = path[0] + [start]
    # if start and end nodes are the same, then return the path
    if start == end:
        return tuple(path[0])
    # create a node from the start point name
    start_node = Node(start)
    # for each edge starting at that start node, call the function recursively
    # if the destination node is not already in the path, and if best_dist has
    # not been found yet or is greater than the total distance of the current
    # path
    for an_edge in digraph.get_edges_for_node(start_node):
        # get the destination node for the edge
        a_node = an_edge.get_destination()
        # update the total distance traveled so far
        path[1] = path[1] + an_edge.get_total_distance()
        # update the distance spent outside
        path[2] = path[2] + an_edge.get_outdoor_distance()
        # if the node is not in the path
        if str(a_node) not in path[0]:
            # if best_dist is None or greater than the distance of the current path
            if path[1] < best_dist and path[2] < max_dist_outdoors:
                new_path = get_best_path(digraph, str(a_node), end, [path[0], path[1], path[2]], max_dist_outdoors, best_dist, best_path)
                if new_path != None:
                    best_dist = path[1]
                    print('best_dist', best_dist)
                    best_path = new_path
    return best_path
def get_best_path(digraph, start, end, path, max_dist_outdoors, best_dist, best_path):
    start_node = Node(start)
    end_node = Node(end)
    path[0] = path[0] + [start]
    if len(path[0]) > 1:
        dist, out_dist = get_distances_for_node(digraph, Node(path[0][-2]), start_node)
        path[1] = path[1] + dist
        path[2] = path[2] + out_dist
    if not digraph.has_node(start_node) and not digraph.has_node(end_node):
        raise ValueError('The graph does not contain these nodes')
    elif start_node == end_node:
        return (path[0], path[1])
    else:
        for an_edge in digraph.get_edges_for_node(start_node):
            next_node = an_edge.get_destination()
            if str(next_node) not in path[0]:
                expected_dist = path[1] + an_edge.get_total_distance()
                expected_out_dist = path[2] + an_edge.get_outdoor_distance()
                if expected_dist < best_dist and expected_out_dist <= max_dist_outdoors:
                    #print('best_path_check', path[1], best_dist)
                    new_path = get_best_path(digraph, str(next_node), end, [path[0], path[1], path[2]], max_dist_outdoors, best_dist, best_path)
                    #print(new_path)
                    if new_path[0] != None:
                        best_path = new_path[0]
                        best_dist = new_path[1]
        return (best_path, best_dist)

def get_distances_for_node(digraph, src, dest):
    for an_edge in digraph.get_edges_for_node(src):
        if an_edge.get_destination() == dest:
            return an_edge.get_total_distance(), an_edge.get_outdoor_distance()
I was able to solve my problem using this function, but I am not sure whether it is the best solution.
Hope this helps.

Prohibitively slow execution of function compute_resilience in Python

The idea is to compute the resilience of a network represented as an undirected graph of the form
{node: (set of its neighbors) for each node in the graph}.
The function removes nodes from the graph in random order, one by one, and calculates the size of the largest remaining connected component.
The helper function bfs_visited() returns the set of nodes that are still connected to a given node.
How can I improve the implementation of the algorithm in Python 2, preferably without changing the breadth-first search in the helper function?
from collections import deque

def bfs_visited(graph, node):
    """undirected graph {Vertex: {neighbors}}
    Returns the set of all nodes visited by the algorithm"""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def cc_visited(graph):
    """undirected graph {Vertex: {neighbors}}
    Returns a list of sets of connected components"""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited
        #print(node, remaining_nodes)
    return connected_components

def largest_cc_size(ugraph):
    """returns the size (an integer) of the largest connected component
    in the ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]

def compute_resilience(ugraph, attack_order):
    """
    input: a graph {V: N}
    returns a list whose k+1th entry is the size of the largest cc after
    the removal of the first k nodes
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))
    return res
I received this tremendously great answer from Gareth Rees, which covers the question completely.
Review
The docstring for bfs_visited should explain the node argument.
The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.
In bfs_visited the lines:
queue = deque()
queue.append(node)
can be simplified to:
queue = deque([node])
The function largest_cc_size builds a list of pairs:
res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
res.sort()
return res[-1][0]
But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:
res = [len(ccc) for ccc in cc_visited(ugraph)]
res.sort()
return res[-1]
Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:
if ugraph:
    return max(map(len, cc_visited(ugraph)))
else:
    return 0
If you are using Python 3.4 or later, this can be further simplified using the default argument to max:
return max(map(len, cc_visited(ugraph)), default=0)
This is now so simple that it probably doesn't need its own function.
This line:
remaining_nodes = set(graph.keys())
can be written more simply:
remaining_nodes = set(graph)
There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:
for node in remaining_nodes:
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
        connected_components.append(visited)
    remaining_nodes = remaining_nodes - visited
It looks as if the intention of the code is to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:
for node in remaining_nodes:
only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:
remaining_nodes = remaining_nodes - visited
this has no effect on the nodes being iterated over.
You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:
remaining_nodes.difference_update(visited)
but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:
while remaining_nodes:
    node = remaining_nodes.pop()
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
        connected_components.append(visited)
    remaining_nodes.difference_update(visited)
Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.
There is now no need for the test:
if visited not in connected_components:
since each component is produced exactly once.
In compute_resilience the first line is:
res = [len(ugraph)]
but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:
res = [largest_cc_size(ugraph)]
For each node in attack order, compute_resilience calls:
res.append(largest_cc_size(ugraph))
But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)
In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.
This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.
def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
      component they belong to
    - mapping from canonical nodes to connected components
    """
    canonical = {}
    components = {}
    while nodes:
        node = nodes.pop()
        component = bfs_visited(graph, node)
        components[node] = component
        nodes.difference_update(component)
        for n in component:
            canonical[n] = node
    return canonical, components

def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.
    """
    # Take a copy of the graph so that we can destructively modify it.
    graph = {node: set(neighbours) for node, neighbours in graph.items()}

    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
        # Find connected component containing node.
        component = components.pop(canonical.pop(node))

        # Remove node from graph.
        for neighbor in graph[node]:
            graph[neighbor].remove(node)
        graph.pop(node)
        component.remove(node)

        # Component may have been split by removal of node, so search
        # it for new connected components and update data structures
        # accordingly.
        canon, comp = connected_components(graph, component)
        canonical.update(canon)
        components.update(comp)

        yield largest()
In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.
I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.
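For illustration only (this sketch is mine, not part of the answer above), the standard trick is a max-heap of component sizes with lazy deletion: removals just leave stale entries behind, and stale entries are discarded when they surface at the top of the heap:

import heapq

class LargestTracker:
    """Tracks (key, size) pairs and reports the largest current size in
    amortized O(log k) per operation, using lazy deletion."""
    def __init__(self):
        self._heap = []   # entries are (-size, key); may contain stale entries
        self._size = {}   # the authoritative current size for each key
    def set(self, key, size):
        self._size[key] = size
        heapq.heappush(self._heap, (-size, key))
    def remove(self, key):
        self._size.pop(key, None)  # its heap entry becomes stale
    def largest(self):
        while self._heap:
            size, key = -self._heap[0][0], self._heap[0][1]
            if self._size.get(key) == size:
                return size                # entry is current
            heapq.heappop(self._heap)      # stale entry; discard and retry
        return 0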
Performance comparison
Here's a useful function for making test cases:
from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge chosen
    independently with probability p.
    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph
Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in the review above.
>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434
So the revised code is about 200 times faster on this test case.

Convert graph to have outdegree 1 (except extra zero weight edges)

I am reading graphs such as http://www.dis.uniroma1.it/challenge9/data/rome/rome99.gr from http://www.dis.uniroma1.it/challenge9/download.shtml in Python, for example using this code:
#!/usr/bin/python
from igraph import *
fname = "rome99.gr"
g = Graph.Read_DIMACS(fname, directed=True )
(I need to change the line "p sp 3353 8870" to "p max 3353 8870" to get this to work using igraph.)
I would like to convert the graph to one where all nodes have outdegree 1 (not counting the extra zero-weight edges we are allowed to add) while still preserving all shortest paths. That is, a path between two nodes should be a shortest path in the original graph if and only if the corresponding path is a shortest path in the converted graph. I will explain this a little more after an example.
One way to do this I was thinking is to replace each node v by a little linear subgraph with v.outdegree(mode=OUT) nodes. In the subgraph the nodes are connected in sequence by zero weight edges. We then connect nodes in the subgraph to the first node in other little subgraphs we have created.
I don't mind using igraph or networkx for this task but I am stuck with the syntax of how to do it.
For example, if we start with graph G:
I would like to convert it to graph H:
As the second graph has more nodes than the first, we need to define what we mean by its having the same shortest paths as the first graph. I only consider paths between nodes labelled with plain letters or nodes labelled X1. In other words, in this example a path can't start or end in A2 or B2. We also merge all versions of a node when considering a path, so a path A1->A2->D in H is regarded as the same as A->D in G.
This is how far I have got. First I add the zero-weight edges to the new graph:
h = Graph(g.ecount(), directed=True)

# Connect the nodes with zero-weight edges
gtoh = [0] * g.vcount()
i = 0
for v in g.vs:
    gtoh[v.index] = i
    if v.degree(mode=OUT) > 1:
        for j in xrange(v.degree(mode=OUT) - 1):
            h.add_edge(i, i + 1, weight=0)
            i = i + 1
    i = i + 1
Then I add the main edges:
# Now connect the nodes to the relevant "head" nodes.
for v in g.vs:
    h_v_index = gtoh[v.index]
    i = 0
    for neighbour in g.neighbors(v, mode=OUT):
        h.add_edge(gtoh[v.index] + i, gtoh[neighbour],
                   weight=g.es[g.get_eid(v.index, neighbour)]["weight"])
        i = i + 1
Is there a nicer/better way of doing this? I feel there must be.
The following code should work in igraph and Python 2.x; basically it does what you proposed: it creates a "linear subgraph" for every single node in the graph, and connects exactly one outgoing edge to each node in the linear subgraph corresponding to the old node.
#!/usr/bin/env python
from igraph import Graph
from itertools import izip

def pairs(l):
    """Given a list l, returns an iterable that yields pairs of the form
    (l[i], l[i+1]) for all possible consecutive pairs of items in l"""
    return izip(l, l[1:])

def convert(g):
    # Get the old vertex names from g
    if "name" in g.vertex_attributes():
        old_names = map(str, g.vs["name"])
    else:
        old_names = map(str, xrange(g.vcount()))

    # Get the outdegree vector of the old graph
    outdegs = g.outdegree()

    # Create a mapping from old node IDs to the ID of the first node in
    # the linear subgraph corresponding to the old node in the new graph
    new_node_id = 0
    old_to_new = []
    new_names = []
    for old_node_id in xrange(g.vcount()):
        old_to_new.append(new_node_id)
        new_node_id += outdegs[old_node_id]
        old_name = old_names[old_node_id]
        if outdegs[old_node_id] <= 1:
            new_names.append(old_name)
        else:
            for i in xrange(1, outdegs[old_node_id] + 1):
                new_names.append(old_name + "." + str(i))

    # Add a sentinel element to old_to_new just to make our job easier
    old_to_new.append(new_node_id)

    # Create the edge list of the new graph and the weights of the new edges
    new_edgelist = []
    new_weights = []

    # 1) Create the linear subgraphs
    for new_node_id, next_new_node_id in pairs(old_to_new):
        for source, target in pairs(range(new_node_id, next_new_node_id)):
            new_edgelist.append((source, target))
            new_weights.append(0)

    # 2) Create the new edges based on the old ones
    for old_node_id in xrange(g.vcount()):
        new_node_id = old_to_new[old_node_id]
        for edge_id in g.incident(old_node_id, mode="out"):
            neighbor = g.es[edge_id].target
            new_edgelist.append((new_node_id, old_to_new[neighbor]))
            new_node_id += 1
            print g.es[edge_id].source, g.es[edge_id].target, g.es[edge_id]["weight"]
            new_weights.append(g.es[edge_id]["weight"])

    # Return the graph
    vertex_attrs = {"name": new_names}
    edge_attrs = {"weight": new_weights}
    return Graph(new_edgelist, directed=True, vertex_attrs=vertex_attrs,
                 edge_attrs=edge_attrs)
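As a quick hypothetical check (graph, names, and weights invented for this example), node A below has outdegree 2, so it is split into A.1 and A.2 joined by a zero-weight edge:

g = Graph([(0, 1), (0, 2), (1, 2)], directed=True,
          vertex_attrs={"name": ["A", "B", "C"]},
          edge_attrs={"weight": [1, 4, 2]})
h = convert(g)
print(h.vs["name"])    # ['A.1', 'A.2', 'B', 'C']
print(h.es["weight"])  # [0, 1, 4, 2] -- the 0 is A's internal edge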
