K-th order neighbors in graph - Python networkx - python

I have a directed graph in which I want to efficiently find a list of all K-th order neighbors of a node. K-th order neighbors are defined as all nodes which can be reached from the node in question in exactly K hops.
I looked at networkx and the only function relevant was neighbors. However, this just returns the order 1 neighbors. For higher order, we need to iterate to determine the full set. I believe there should be a more efficient way of accessing K-th order neighbors in networkx.
Is there a function which efficiently returns the K-th order neighbors, without incrementally building the set?
EDIT: In case there exist other graph libraries in Python which might be useful here, please do mention those.

You can use:
nx.single_source_shortest_path_length(G, node, cutoff=K)
where G is your graph object.
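Note that this call returns every node within K hops together with its shortest-path distance, so the nodes at exactly K hops still have to be filtered out. A minimal sketch, assuming a small made-up digraph and assuming "K-th order" means shortest-path distance exactly K:

```python
import networkx as nx

# Hypothetical example graph: 0 -> 1 -> 2 -> 3, plus a shortcut 0 -> 2.
G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (0, 2)])
K = 2

# Maps each node within K hops of the source to its shortest-path distance.
dist = dict(nx.single_source_shortest_path_length(G, 0, cutoff=K))

# Keep only the nodes whose shortest-path distance is exactly K.
kth_order = {n for n, d in dist.items() if d == K}
print(kth_order)
```

Here node 2 is reachable in two hops via 1, but its shortest distance is 1 because of the shortcut, so it is not reported as a 2nd-order neighbor.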

For NetworkX the best method is probably to build the set of neighbors at each k. You didn't post your code but it seems you probably already have done this:
import networkx as nx
def knbrs(G, start, k):
    nbrs = set([start])
    for l in range(k):
        nbrs = set((nbr for n in nbrs for nbr in G[n]))
    return nbrs
if __name__ == '__main__':
    G = nx.gnp_random_graph(50,0.1,directed=True)
    print(knbrs(G, 0, 3))

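Yes, you can get a k-order ego_graph of a node:
subgraph = nx.ego_graph(G,node,radius=k)
The neighbors are then the nodes of the subgraph (which contains the node itself and every node within k hops, not only those at exactly k hops):
neighbors= list(subgraph.nodes())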

I had a similar problem, except that I had a digraph and needed to keep the edge-attribute dictionary. This mutual-recursion solution preserves the edge attributes if you need that.
import networkx as nx
def neighbors_n(G, root, n):
    E = nx.DiGraph()
    def n_tree(tree, n_remain):
        neighbors_dict = G[tree]
        # copy each outgoing edge, together with its 'rel' attribute, into E
        for neighbor, relations in neighbors_dict.items():
            E.add_edge(tree, neighbor, rel=relations['rel'])
        neighbors = list(neighbors_dict.keys())
        n_forest(neighbors, n_remain=(n_remain - 1))
    def n_forest(forest, n_remain):
        if n_remain <= 0:
            return
        # Explicit loop: map() is lazy in Python 3, so calling it purely for
        # its side effects would never run n_tree.
        for tree in forest:
            n_tree(tree, n_remain=n_remain)
    n_forest([root], n)
    return E

You can solve your problem using a modified BFS algorithm. When you store a node in the queue, store its level (distance from the root) as well. When you finish processing the node (all neighbours visited, node marked as black) you can add it to the list of nodes of its level. Here is an example based on this simple implementation:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from collections import defaultdict
from collections import deque
kth_step = defaultdict(list)
class BFS:
    def __init__(self, node, edges, source):
        self.node = node
        self.edges = edges
        self.source = source
        self.color = ['W' for i in range(0, node)]  # W for White
        self.graph = [[False for i in range(0, node)] for j in range(0, node)]
        self.queue = deque()
        # Start BFS algorithm
        self.construct_graph()
        self.bfs_traversal()
    def construct_graph(self):
        for u, v in self.edges:
            self.graph[u][v], self.graph[v][u] = True, True
    def bfs_traversal(self):
        # the stored level is the level of the node's children
        self.queue.append((self.source, 1))
        self.color[self.source] = 'B'  # B for Black
        kth_step[0].append(self.source)
        while len(self.queue):
            u, level = self.queue.popleft()
            if level > 5:  # limit searching there
                return
            for v in range(0, self.node):
                if self.graph[u][v] == True and self.color[v] == 'W':
                    self.color[v] = 'B'
                    kth_step[level].append(v)
                    self.queue.append((v, level+1))
'''
0 -- 1---7
|    |
|    |
2----3---5---6
|
|
4
'''
node = 8  # 8 nodes from 0 to 7
edges = [(0,1),(1,7),(0,2),(1,3),(2,3),(3,5),(5,6),(2,4)]  # bi-directional edges
source = 0  # set first node (0) as source
bfs = BFS(node, edges, source)
for key, value in kth_step.items():
    print(key, value)
Output:
$ python test.py
0 [0]
1 [1, 2]
2 [3, 7, 4]
3 [5]
4 [6]
I don't know networkx, and I didn't find a ready-to-use algorithm in Graph Tool either. I believe this problem isn't common enough to have its own function. I also think it would be overcomplicated, inefficient and redundant to store lists of k-th neighbours for every node in a graph instance, so such a function would probably have to iterate over nodes anyway.

As proposed previously, the following solution gives you all secondary neighbors (neighbors of neighbors) and lists all neighbors once (the solution is based on BFS):
{n: path for n, path in nx.single_source_shortest_path(G, 'a', cutoff=2).items() if len(path)==3}
Another solution, which is slightly faster (6.68 µs ± 191 ns vs. 13.3 µs ± 32.1 ns, measured with timeit), takes into account that in undirected graphs the neighbor of a neighbor can be the source again:
def k_neighbors(G, source, cutoff):
    neighbors = {}
    neighbors[0] = {source}
    for k in range(1, cutoff+1):
        neighbors[k] = set()
        for node in neighbors[k-1]:
            neighbors[k].update(set(G.neighbors(node)))
    return neighbors
k_neighbors(B, 'a', 2) #dict keyed with level until `cutoff`, in this case 2
Both solutions give you the source itself as 0th-order neighbor.
So it depends on your context which one to prefer.
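The difference between the two comes down to what "k-th order" means: the shortest-path version keeps nodes whose distance is exactly k, while the level-by-level set expansion keeps every node reachable by some walk of k edges. A small sketch on a plain adjacency dict (the graph and both helper names are made up for illustration):

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency dict: a - b - c
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}

def at_distance(adj, source, k):
    """Nodes whose shortest-path distance from source is exactly k (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # no need to expand beyond depth k
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {n for n, d in dist.items() if d == k}

def by_walks(adj, source, k):
    """Nodes reachable by some walk of exactly k edges (set expansion)."""
    level = {source}
    for _ in range(k):
        level = {v for u in level for v in adj[u]}
    return level

print(at_distance(adj, 'a', 2))  # only 'c'
print(by_walks(adj, 'a', 2))     # 'a' and 'c': the walk a -> b -> a returns to the source
```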

Related

Find local shortest path with greedy best first search algorithm

Recently I took a test in the theory of algorithms. I had a normal best first search algorithm (code below).
from queue import PriorityQueue
# Filling adjacency matrix with empty arrays
vertices = 14
graph = [[] for i in range(vertices)]
# Function for adding edges to graph
def add_edge(x, y, cost):
    graph[x].append((y, cost))
    graph[y].append((x, cost))
# Function For Implementing Best First Search
# Gives output path having the lowest cost
def best_first_search(source, target, vertices):
    visited = [0] * vertices
    pq = PriorityQueue()
    pq.put((0, source))
    print("Path: ")
    while not pq.empty():
        u = pq.get()[1]
        # Displaying the path having the lowest cost
        print(u, end=" ")
        if u == target:
            break
        for v, c in graph[u]:
            if not visited[v]:
                visited[v] = True
                pq.put((c, v))
    print()
if __name__ == '__main__':
    # The nodes shown in above example(by alphabets) are
    # implemented using integers add_edge(x,y,cost);
    add_edge(0, 1, 1)
    add_edge(0, 2, 8)
    add_edge(1, 2, 12)
    add_edge(1, 4, 13)
    add_edge(2, 3, 6)
    add_edge(4, 3, 3)
    source = 0
    target = 2
    best_first_search(source, target, vertices)
It outputs Path: 0 1 0 2 (path sum: 8), which is correct.
My teacher suggested that I rework the code so that it looks for the locally minimal path, i.e. Path: 0 1 2 (path sum: 13).
I need to greedily take the shortest edge from the current node to an unvisited node, and I don't really understand how to do it right.
Since this is homework, I won't spell out the entire code for you.
For best-first search, you don't need a priority queue. You just need to track which nodes you have visited, and which node you are currently at. While your current node is not the target node, find the shortest edge that leads to an unvisited node, and set your current node to the node at the other end of that edge.

Dijkstra algorithm not working even though passes the sample test cases

So I have followed Wikipedia's pseudocode for Dijkstra's algorithm as well as Brilliant's: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Pseudocode https://brilliant.org/wiki/dijkstras-short-path-finder/. Here is my code, which doesn't work. Can anyone point out the flaw in my code?
# Uses python3
from queue import Queue
n, m = map(int, input().split())
adj = [[] for i in range(n)]
for i in range(m):
    u, v, w = map(int, input().split())
    adj[u-1].append([v, w])
    adj[v-1].append([u, w])
x, y = map(int, input().split())
x, y = x-1, y-1
q = [i for i in range(n, 0, -1)]
#visited = set()
# visited.add(x+1)
dist = [float('inf') for i in range(len(adj))]
dist[x] = 0
# print(adj[visiting])
while len(q) != 0:
    visiting = q.pop()-1
    for i in adj[visiting]:
        u, v = i
        dist[u-1] = dist[visiting]+v if dist[visiting] + \
            v < dist[u-1] else dist[u-1]
    # print(dist)
if dist[y] != float('inf'):
    print(dist[y])
else:
    print(-1)
Your algorithm is not implementing Dijkstra's algorithm correctly. You are just iterating over all nodes in their input order and updating the distance to the neighbors based on the node's current distance. But that latter distance is not guaranteed to be the shortest distance, because you iterate some nodes before their "turn". Dijkstra's algorithm specifies a particular order of processing nodes, which is not necessarily the input order.
The main ingredient that is missing from your algorithm is a priority queue. You did import Queue from queue, but never use it. It also lacks the marking of nodes as visited, a concept which you seem to have started implementing but then commented out.
The outline of the algorithm on Wikipedia explains the use of this priority queue in the last step of each iteration:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
There is currently no mechanism in your code that selects the unvisited node with the smallest distance. Instead it picks the next node based on the order in the input.
To correct your code, please consult the pseudocode that is available on that same Wikipedia page; I would advise going for the variant with a priority queue.
In Python you can use heapq for performing the actions on the priority queue (heappush, heappop).
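A minimal sketch of the priority-queue variant with heapq, assuming a 0-based adjacency list and non-negative edge weights (the function name and the small test graph are illustrative, not taken from the question):

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source; adj[u] is a list of (v, weight) pairs."""
    dist = [float('inf')] * len(adj)
    dist[source] = 0
    heap = [(0, source)]  # entries are (tentative distance, node)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)  # unvisited node with the smallest distance
        if u in visited:
            continue  # stale entry: u was already finalized with a smaller distance
        visited.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical undirected graph with 4 nodes (0-based).
adj = [[] for _ in range(4)]
for u, v, w in [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]:
    adj[u].append((v, w))
    adj[v].append((u, w))

print(dijkstra(adj, 0))  # [0, 3, 1, 8]
```

Instead of decreasing a key inside the heap, this sketch pushes a new entry and skips outdated ones when they are popped, which is a common way to work around heapq having no decrease-key operation.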

Creating lists of mutual neighbor elements

Say, I have a set of unique, discrete parameter values, stored in a variable 'para'.
para=[1,2,3,4,5,6,7,8,9,10]
Each element in this list has 'K' number of neighbors (given: each neighbor ϵ para).
EDIT: This 'K' is obviously not the same for each element.
And to clarify the actual size of my problem: I need a neighborhood of close to 50-100 neighbors on average, given that my para list is around 1000 elements large.
NOTE: A neighbor of an element, is another possible 'element value' to which it can jump, by a single mutation.
neighbors_of_1 = [2,4,5,9] #contains all possible neighbors of 1 (i.e para[0])
Question: How can I define each of the other element's
neighbors randomly from 'para', but, keeping in mind the previously
assigned neighbors/relations?
eg:
neighbors_of_5=[1,3,7,10] #contains all possible neighbors of 5 (i.e para[4])
NOTE: '1' has been assigned as a neighbor of '5', keeping the values of 'neighbors_of_1' in mind. They are 'mutual' neighbors.
I know the inefficient way of doing this would be, to keep looping through the previously assigned lists and check if the current state is a neighbor of another state, and if True, store the value of that state as one of the new neighbors.
Is there a cleaner/more pythonic way of doing this? (By maybe using the concept of linked-lists or any other method? Or are lists redundant?)
This solution does what you want, I believe. It is not the most efficient, as it generates quite a bit of extra elements and data, but the run time was still short on my computer and I assume you won't run this repeatedly in a tight, inner loop?
import itertools as itools
import random
# Generating a random para variable:
#para=[1,2,3,4,5,6,7,8,9,10]
para = list(range(10000))
random.shuffle(para)
para = para[:1000]
# Generate all pairs in para (in random order)
pairs = [(a,b) for a, b in itools.product(para, para) if a < b]
random.shuffle(pairs)
K = 50 # average number of neighbors
N = len(para)*K//2 # total connections
# Generating a neighbors dict, holding all the neighbors of an element
neighbors = dict()
for elem in para:
    neighbors[elem] = []
# append the neighbors to each other
for pair in pairs[:N]:
    neighbors[pair[0]].append(pair[1])
    neighbors[pair[1]].append(pair[0])
# sort each neighbor list
for neighbor in neighbors.values():
    neighbor.sort()
I hope you understand my solution. Otherwise feel free to ask for a few pointers.
Neighborhood can be represented by a graph. If A being a neighbor of B does not necessarily imply that B is a neighbor of A, the graph is directed; otherwise it is undirected. I'm guessing you want an undirected graph since you want to "keep in mind the relationship between the nodes".
Besides the obvious choice of using a third-party library for graphs, you can solve your issue by using a set of edges between the graph vertices. An edge can be represented by the pair of its two endpoints. Since edges are undirected, either use a tuple (A,B) such that A < B, or use a frozenset((A,B)).
Note that there are considerations about which neighbor to pick randomly in the middle of the algorithm, such as discouraging picks of nodes that already have a lot of neighbors, to avoid going over your limits.
Here is a sketch of what I'd do.
import random
# assumes len(para) is well above the target neighbor counts (around 1000 elements)
edges = set()
arities = [0 for p in para]
for i in range(len(para)):
    p = para[i]
    arity = arities[i]
    n = random.randrange(50, 100)
    # edges still to add for this node, counting those assigned earlier
    k = n - arity
    while k > 0:
        # favor nodes with few neighbors; the +1 avoids dividing by zero
        w = [1 / (x + 1) for x in arities]
        #note: test what random scheme suits you best
        j = random.choices(range(len(para)), weights=w)[0]
        if j == i:
            continue  # no self-loops
        #note: I'm storing the vertex indices in the edges rather than the nodes.
        #But if the nodes are unique, you could store the nodes.
        e = frozenset((i, j))
        if e not in edges:
            edges.add(e)
            #instead of arities, you could keep a list of lists of the neighbours;
            #arities[i] would then be len(neighbors[i])
            arities[i] += 1
            arities[j] += 1
            k -= 1
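Once such an edge set exists, the mutual neighbor lists the question asks for can be derived from it in one pass. A small sketch, assuming the edges store element values (rather than indices) as frozenset pairs; the sample data is made up:

```python
from collections import defaultdict

para = [1, 2, 3, 4, 5]
# Hypothetical undirected edges, each stored once as a frozenset of two elements.
edges = {frozenset((1, 2)), frozenset((1, 5)), frozenset((3, 5))}

neighbors = defaultdict(set)
for edge in edges:
    a, b = edge
    # Register the relation in both directions so neighbors are always mutual.
    neighbors[a].add(b)
    neighbors[b].add(a)

# Sorted neighbor list for every element, including those without neighbors.
neighbors_of = {p: sorted(neighbors[p]) for p in para}
print(neighbors_of[1])  # [2, 5]
print(neighbors_of[5])  # [1, 3]
```

Because each relation is stored once and expanded in both directions, there is no need to loop over previously assigned lists to keep them consistent.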

Networkx: Find all minimal cuts consisting of only nodes from one set in a bipartite graph

In the networkx python package, is there a way to find all node cuts of minimal size consisting of only nodes from one set in a bipartite graph? For example, if the two sides of a bipartite graph are A and B, how might I go about finding all minimal node cuts consisting of nodes entirely from set B? The following code I have works but it's extremely slow:
from itertools import combinations
from networkx.algorithms import connectivity
def get_one_sided_cuts(G, A, B):
    #get all cuts that consist of nodes exclusively from B which disconnect
    #nodes from A
    one_sided_cuts = []
    seen = []
    l = list(combinations(A, 2))
    for x in l:
        s = x[0]
        t = x[1]
        cut = connectivity.minimum_st_node_cut(G, s, t)
        if set(cut).issubset(B) and (cut not in seen):
            one_sided_cuts.append(cut)
            seen.append(cut)
    #find minimum cut size
    cur_min = float("inf")
    for i in one_sided_cuts:
        if len(i) < cur_min:
            cur_min = len(i)
    one_sided_cuts = [x for x in one_sided_cuts if len(x) == cur_min]
    return one_sided_cuts
Note that this actually only checks if there is a minimal cut which, if removed, would disconnect two nodes in A only. If your solution does this (instead of finding a cut that will separate any two nodes) that's fine too. Any ideas on how to do this more efficiently?
As stated in the comment, there are a couple of interpretations of “all node cuts of minimal size consisting of only nodes from one set in a bipartite graph”. It means either:
1. All node cuts of minimum size when restricting cuts to be in one set of the bipartite graph, or
2. All node cuts in an unconstrained sense (consisting of nodes from A or B) that happen to completely lie in B.
From your code example you are interested in 2. According to the docs, there is a way to speed up this calculation, and from profile results it helps a bit. There are auxiliary structures built, per graph, to determine the minimum node cuts. Each node is replaced by 2 nodes, additional directed edges are added, etc. according to the Algorithm 9 in http://www.cse.msu.edu/~cse835/Papers/Graph_connectivity_revised.pdf
We can reuse these structures instead of reconstructing them inside a tight loop:
Improvement for Case 2:
from itertools import combinations
from networkx.algorithms.connectivity import (
    build_auxiliary_node_connectivity, minimum_st_node_cut)
from networkx.algorithms.flow import build_residual_network
from networkx.algorithms.flow import edmonds_karp
def getone_sided_cuts_Case2(G, A, B):
    # build auxiliary networks once, outside the loop
    H = build_auxiliary_node_connectivity(G)
    R = build_residual_network(H, 'capacity')
    # get all cuts that consist of nodes exclusively from B which disconnect
    # nodes from A
    one_sided_cuts = []
    seen = []
    l = list(combinations(A,2))
    for x in l:
        s = x[0]
        t = x[1]
        cut = minimum_st_node_cut(G, s, t, auxiliary=H, residual=R)
        if set(cut).issubset(B):
            if cut not in seen:
                one_sided_cuts.append(cut)
                seen.append(cut)
    # Find minimum cut size
    cur_min = float('inf')
    for i in one_sided_cuts:
        if len(i) < cur_min:
            cur_min = len(i)
    one_sided_cuts = [x for x in one_sided_cuts if len(x) == cur_min]
    return one_sided_cuts
For profiling purposes, you might use the following, or one of the built-in bipartite graph generators in Networkx:
import random
from itertools import product
import networkx as nx
def create_bipartite_graph(size_m, size_n, num_edges):
    G = nx.Graph()
    edge_list_0 = list(range(size_m))
    edge_list_1 = list(range(size_m,size_m+size_n))
    all_edges = []
    G.add_nodes_from(edge_list_0, bipartite=0)
    G.add_nodes_from(edge_list_1, bipartite=1)
    # every possible edge between the two sides, then a random sample of them
    all_edges = list(product(edge_list_0, edge_list_1))
    num_all_edges = len(all_edges)
    edges = [all_edges[i] for i in random.sample(range(num_all_edges), num_edges)]
    G.add_edges_from(edges)
    return G, edge_list_0, edge_list_1
Using %timeit, the second version runs about 5-10% faster.
For Case 1, the logic is a little more involved. We need to consider minimal cuts consisting only of nodes inside B. This requires changing minimum_st_node_cut in the following way. Then replace all occurrences of minimum_st_node_cut with rest_minimum_st_node_cut in your solution or in the Case 2 solution I gave above, noting that the new function also requires the sets A and B to be specified:
import networkx as nx
from networkx.algorithms.connectivity import minimum_st_edge_cut
from networkx.algorithms.flow import edmonds_karp
def rest_build_auxiliary_node_connectivity(G,A,B):
    directed = G.is_directed()
    H = nx.DiGraph()
    # split every node into an 'A' (in) and a 'B' (out) copy joined by a unit edge
    for node in A:
        H.add_node('%sA' % node, id=node)
        H.add_node('%sB' % node, id=node)
        H.add_edge('%sA' % node, '%sB' % node, capacity=1)
    for node in B:
        H.add_node('%sA' % node, id=node)
        H.add_node('%sB' % node, id=node)
        H.add_edge('%sA' % node, '%sB' % node, capacity=1)
    edges = []
    for (source, target) in G.edges():
        edges.append(('%sB' % source, '%sA' % target))
        if not directed:
            edges.append(('%sB' % target, '%sA' % source))
    H.add_edges_from(edges, capacity=1)
    return H
def rest_minimum_st_node_cut(G, A, B, s, t, auxiliary=None, residual=None, flow_func=edmonds_karp):
    if auxiliary is None:
        H = rest_build_auxiliary_node_connectivity(G, A, B)
    else:
        H = auxiliary
    if G.has_edge(s,t) or G.has_edge(t,s):
        return []
    kwargs = dict(flow_func=flow_func, residual=residual, auxiliary=H)
    # raise the capacity of the internal edge of each node in A
    for node in [x for x in A if x not in [s,t]]:
        edge = ('%sA' % node, '%sB' % node)
        num_in_edges = len(H.in_edges(edge[0]))
        H[edge[0]][edge[1]]['capacity'] = num_in_edges
    edge_cut = minimum_st_edge_cut(H, '%sB' % s, '%sA' % t,**kwargs)
    node_cut = set([n for n in [H.nodes[node]['id'] for edge in edge_cut for node in edge] if n not in A])
    return node_cut - set([s,t])
We then have, for example:
In [1]: G = nx.Graph()
# A = [0,1,2,3], B = [4,5,6,7]
In [2]: G.add_edges_from([(0,4),(0,5),(1,6),(1,7),(4,1),(5,1),(6,3),(7,3)])
In [3]: minimum_st_node_cut(G, 0, 3)
{1}
In [4]: rest_minimum_st_node_cut(G,A,B,0,3)
{6, 7}
Finally, note that the minimum_st_node_cut() function returns an empty cut if the two nodes are adjacent ([] in the version above). Sometimes the convention is to return a set of n-1 nodes in this case, i.e. all nodes except the source or sink. Anyway, with the empty-cut convention, and since your original solution to Case 2 loops over node pairs in A, you will likely get [] as a return value for most configurations, unless no nodes in A are adjacent, say.
EDIT
The OP encountered a problem with bipartite graphs for which the sets A, B contained a mix of integers and str types. It looks to me like the build_auxiliary_node_connectivity converts those str nodes to integers causing collisions. I rewrote things above, I think that takes care of it. I don't see anything in the networkx docs about this, so either use all integer nodes or use the rest_build_auxiliary_node_connectivity() thing above.

How to Analyze DAG Time Complexity?

I am learning about topological sort, and graphs in general. I implemented a version below using DFS, but I am having trouble understanding why the Wikipedia page says this is O(|V|+|E|), how to analyze its time complexity, and what the difference is between |V|+|E| and n^2 in general.
Firstly, I have two for loops, so logic says it would be O(n^2). But isn't it also true that any DAG (or tree) has n-1 edges and n vertices? How is this any different from n^2 if we can drop the "-1" as an insignificant value?
graph = {
    1:[4, 5, 7],
    2:[3,5,6],
    3:[4],
    4:[5],
    5:[6,7],
    6:[7],
    7:[]
}
from collections import defaultdict
def topological_sort(graph):
    ordered, marked = [], defaultdict(int)
    while len(ordered) < len(graph):
        for vertex in graph:
            if marked[vertex]==0:
                visit(graph, vertex, ordered, marked)
    return ordered
def visit(graph, n, ordered, marked):
    if marked[n] == 1:
        # raising a plain string is a TypeError in Python 3
        raise ValueError('Not a DAG')
    marked[n] = 1
    for neighbor in graph.get(n):
        if marked[neighbor]!=2:
            visit(graph, neighbor, ordered, marked)
    marked[n] = 2
    ordered.insert(0, n)
def main():
    print(topological_sort(graph))
main()
A proper implementation runs in O(|V| + |E|) time because it goes through every edge and every vertex at most once. That is the same as O(|V|^2) for a complete (or almost complete) graph, but it is much better when the graph is sparse.
Your implementation is O(|V|^2), not O(|V| + |E|). These two nested loops:
while len(ordered) < len(graph):
    for vertex in graph:
        if marked[vertex]==0:
            visit(graph, vertex, ordered, marked)
do 1 + 2 + ... + |V| = O(|V|^2) iterations in the worst case (for instance, for an empty graph). You can easily fix this by getting rid of the outer loop. It's that simple: just remove the while loop, you don't need it.
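A sketch of a linear-time version, assuming every vertex appears as a key of the dict. Each vertex is entered once and each edge is examined once; finished vertices are appended and the list is reversed at the end, since `list.insert(0, ...)` is itself linear in the length of the list. The state names are made up for readability:

```python
def topological_sort(graph):
    """DFS-based topological sort; graph maps each vertex to its successors."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {v: UNSEEN for v in graph}
    postorder = []

    def visit(v):
        if state[v] == IN_PROGRESS:
            raise ValueError('Not a DAG')  # reached a vertex still on the DFS stack
        if state[v] == DONE:
            return
        state[v] = IN_PROGRESS
        for w in graph[v]:
            visit(w)
        state[v] = DONE
        postorder.append(v)  # constant time, unlike insert(0, v)

    for v in graph:  # a single pass over the vertices, no outer while loop
        visit(v)
    postorder.reverse()
    return postorder

graph = {1: [4, 5, 7], 2: [3, 5, 6], 3: [4], 4: [5], 5: [6, 7], 6: [7], 7: []}
order = topological_sort(graph)
print(order)
```

The recursion depth grows with the longest path in the graph, so very deep graphs may need an explicit stack instead.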
