Calculate the Laplacian matrix of a graph object in NetworkX - python

I am writing my own function that calculates the Laplacian matrix for any directed graph, and am struggling with filling the diagonal entries of the resulting matrix. The following equation is what I use to calculate entries of the Laplacian matrix, where e_ij represents an edge from node i to node j.
I am creating graph objects with NetworkX (https://networkx.org/). I know NetworkX has its own Laplacian function for directed graphs, but I want to be 100% sure I am using a function that carries out the correct computation for my purposes. The code I have developed thus far is shown below, for the following example graph:
import networkx as nx
import numpy as np
# Create a simple example of a directed weighted graph
G = nx.DiGraph()
G.add_nodes_from([1, 2, 3])
G.add_weighted_edges_from([(1, 2, 1), (1, 3, 1), (2, 1, 1), (2, 3, 1), (3, 1, 1), (3, 2, 1)])
# Put node, edge, and weight information into Python lists
node_list = []
for item in G.nodes():
    node_list.append(item)
edge_list = []
weight_list = []
for item in G.edges():
    weight_list.append(G.get_edge_data(item[0],item[1])['weight'])
    item = (item[0]-1,item[1]-1)  # shift node labels 1..n to indices 0..n-1
    edge_list.append(item)
print(edge_list)
> [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
# Fill in the non-diagonal entries of the Laplacian
num_nodes = len(node_list)
num_edges = len(edge_list)
J = np.zeros(shape = (num_nodes,num_nodes))
for x in range(num_edges):
    i = edge_list[x][0]
    j = edge_list[x][1]
    J[i,j] = weight_list[x]
I am struggling to figure out how to fill in the diagonal entries. edge_list is a list of tuples. To perform the computation in the above equation for L(G), I need to loop through the second entries of each tuple, store the first entry into a temporary list, sum over all the elements of that temporary list, and finally store the negative of the sum in the correct diagonal entry of L(G).
Any suggestions would be greatly appreciated, especially if there are steps above that can be done more efficiently or elegantly.

I adjusted the networkx.laplacian_matrix function for undirected graphs a little bit:
import networkx as nx
import scipy.sparse
G = nx.DiGraph()
G.add_nodes_from([1, 2, 3])
G.add_weighted_edges_from([(1, 2, 1), (1, 3, 1), (2, 1, 1), (2, 3, 1), (3, 1, 1), (3, 2, 1)])
nodelist = list(G)
# Note: to_scipy_sparse_matrix was removed in NetworkX 3.0 (to_scipy_sparse_array replaces it)
A = nx.to_scipy_sparse_matrix(G, nodelist=nodelist, weight="weight", format="csr")
n, m = A.shape
diags = A.sum(axis=0) # 1 = outdegree, 0 = indegree
D = scipy.sparse.spdiags(diags.flatten(), [0], m, n, format="csr")
print((A - D).todense())
# [[-2 1 1]
# [ 1 -2 1]
# [ 1 1 -2]]
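To make the axis choice concrete, here is a small dense NumPy sketch on an asymmetric graph. It is not the NetworkX implementation; the matrix A and the names L_in and L_out are assumptions made up for illustration. Summing over axis 0 gives weighted in-degrees and axis 1 gives out-degrees, and the two conventions produce different matrices once the graph is not symmetric.

```python
import numpy as np

# Hypothetical 3-node example: A[i, j] is the weight of the edge i -> j.
A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 0.0, 0.0]])

in_degree = A.sum(axis=0)    # column sums: total weight arriving at each node
out_degree = A.sum(axis=1)   # row sums: total weight leaving each node

# Same sign convention as the answer above: adjacency minus degree matrix.
L_in = A - np.diag(in_degree)
L_out = A - np.diag(out_degree)

print(L_in)
print(L_out)
```

With the in-degree convention every column of the result sums to zero; with the out-degree convention every row does.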

I will deviate a little from your method, since I prefer to work with Numpy if possible :P.
In the following snippet, I generate test data for a network of n=10 nodes; that is, I generate an array of tuples V to populate with random nodes, and also a (n,n) array A with the values of the edges between nodes. Hopefully the code is somewhat self-explanatory and is correct (let me know otherwise):
from random import sample
import numpy as np
# Number and list of nodes
n = 10
nodes = list(np.arange(n)) # random.sample needs list
# Test array of linked nodes
# V[i] is a tuple with all nodes the i-node connects to.
V = np.zeros(n, dtype = tuple)
for i in range(n):
    nv = np.random.randint(5) # Random number of edges from node i
    # To avoid self-loops (do not know if it is your case - comment out if necessary)
    itself = True
    while itself:
        cnodes = sample(nodes, nv) # samples nv elements from the nodes list w/o repetition
        itself = i in cnodes
    V[i] = cnodes
# Test matrix of weighted edges (from i-node to j-node)
A = np.zeros((n,n))
for i in range(n):
    for j in range(n):
        if j in V[i]:
            A[i,j] = np.random.random()*5
# Laplacian of network
J = np.copy(A) # This already sets the non-diagonal elements
for i in range(n):
    J[i,i] = - np.sum(A[:,i]) - A[i,i]

Thank you all for your suggestions! I agree that numpy is the way to go. As a rudimentary solution that I will optimize later, this is what I came up with:
def Laplacian_all(edge_list,weight_list,num_nodes,num_edges):
    J = np.zeros(shape = (num_nodes,num_nodes))
    for x in range(num_edges):
        i = edge_list[x][0]
        j = edge_list[x][1]
        J[i,j] = weight_list[x]
    for i in range(num_nodes):
        temp = []
        for x in range(num_edges):
            if i == edge_list[x][1]:
                temp.append(weight_list[x])
        temp_sum = -1*sum(temp)
        J[i,i] = temp_sum
    return J
I have yet to test this on different graphs, but this was what I was hoping to figure out for my immediate purposes.
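As one possible way to optimize this later, the two loops can be replaced by array indexing. This is only a sketch under the same conventions as the function above (0-based node indices, each diagonal entry equal to minus the summed weight of incoming edges) and it additionally assumes there are no duplicate edges; the name laplacian_from_edges is made up for illustration.

```python
import numpy as np

def laplacian_from_edges(edge_list, weight_list, num_nodes):
    """Hypothetical vectorized variant of Laplacian_all (assumes no duplicate edges)."""
    edges = np.asarray(edge_list, dtype=int).reshape(-1, 2)
    J = np.zeros((num_nodes, num_nodes))
    # Off-diagonal entries: J[i, j] = weight of the edge i -> j.
    J[edges[:, 0], edges[:, 1]] = weight_list
    # Diagonal entries: minus the column sum, i.e. minus the total incoming weight.
    np.fill_diagonal(J, -J.sum(axis=0))
    return J

edge_list = [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
weight_list = [1, 1, 1, 1, 1, 1]
print(laplacian_from_edges(edge_list, weight_list, 3))
```

For the example graph this prints the same matrix as the NetworkX-based answer, with -2 on the diagonal and 1 elsewhere.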

Related

Generating graph from a batch of adjacency matrices

I am trying to train a network for generating adjacency matrix for graphs. In the training process, for a single graph I use
import networkx as nx
import numpy as np
adj = np.asarray([[0,1,0,0],[1,0,1,0],[0,0,0,1], [0,0,1,0]])
G = nx.from_numpy_matrix(adj)
for transforming adjacency to graph. However, while training the network, I need to do this with a batch of matrices and it seems that networkx cannot do this. Is there a package that can handle the following:
import networkx as nx
import numpy as np
adjs = []
adjs.append(np.asarray([[0,1,0,0],[1,0,1,0],[0,0,0,1], [0,0,1,0]]))
adjs.append(np.asarray([[0,1,0,1],[1,0,0,0],[0,0,0,1], [1,0,1,0]]))
adjs = np.asarray(adjs)
G = nx.from_numpy_matrix(adjs)
You can map the nx.from_numpy_matrix function over all the adjacency matrices in adjs. Something like this:
import networkx as nx
import numpy as np
adjs = []
adjs.append(np.asarray([[0,1,0,0],[1,0,1,0],[0,0,0,1], [0,0,1,0]]))
adjs.append(np.asarray([[0,1,0,1],[1,0,0,0],[0,0,0,1], [1,0,1,0]]))
adjs = np.asarray(adjs)
graph_list = list(map(lambda adj_matrix:nx.from_numpy_matrix(adj_matrix), adjs))
Now, graph_list is simply a list of NetworkX graphs.
for idx, graph in enumerate(graph_list):
    print("Printing information for Graph at index:", idx)
    print(graph.nodes())
    print(graph.edges())
# Output:
# Printing information for Graph at index: 0
# [0, 1, 2, 3]
# [(0, 1), (1, 2), (2, 3)]
# Printing information for Graph at index: 1
# [0, 1, 2, 3]
# [(0, 1), (0, 3), (2, 3)]
You can view the code here as well.
Reference:
Python Map Tutorial

How to segment a matrix by neighbouring values?

Suppose I have a matrix like this:
m = [0, 1, 1, 0,
1, 1, 0, 0,
0, 0, 0, 1]
And I need to get the coordinates of the same neighbouring values (but not diagonally):
So the result would be a list of lists of coordinates in the "matrix" list, starting with [0,0], like this:
r = [[[0,0]],
     [[0,1], [0,2], [1,0], [1,1]],
     [[0,3], [1,2], [1,3], [2,0], [2,1], [2,2]],
     [[2,3]]]
There must be a way to do that, but I'm really stuck.
tl;dr: We take an array of zeros and ones and use scipy.ndimage.label to convert it to an array of zeros and [1,2,3,...]. We then use np.where to find the coordinates of each element with value > 0. Elements that have the same value end up in the same list.
scipy.ndimage.label interprets non-zero elements of a matrix as features and labels them. Each unique feature in the input gets assigned a unique label. Features are e.g. groups of adjacent elements (or pixels) with the same value.
import numpy as np
from scipy.ndimage import label
# make dummy data
arr = np.array([[0,1,1,0], [1,1,0,0], [0,0,0,1]])
#initialise list of features
r = []
Since OP wanted all features, that is groups of zero and non-zero pixels, we use label twice: First on the original array, and second on 1 - original array. (For an array of zeros and ones, 1 - array just flips the values).
Now, label returns a tuple containing the labelled array (which we are interested in) and the number of features that it found in that array (which we could use, but when I coded this, I chose to ignore it). So, we are interested in the first element of the tuple returned by label, which we access with [0]:
a = label(arr)[0]
b = label(1-arr)[0]
Now we check which unique pixel values label has assigned, so we want the set of values in a and b, respectively. In order for set() to work, we need to flatten both arrays, which we do with .ravel(). We have to subtract {0} in both cases, because for both a and b we are interested in only the non-zero values.
So, having found the unique labels, we loop through these values and use np.where to find where in the array a given value is located. np.where returns a tuple of arrays. The first element of this tuple holds all the row-coordinates for which the condition was met, and the second element holds the column-coordinates.
So, we can use zip(*...) to unpack the two containers of length n into n containers of length 2. This means that we go from a list of all row-coords plus a list of all column-coords to a list of all row-column coordinate pairs for which the condition is met. Finally, in Python 3, zip returns a lazy iterator, which we can evaluate by calling list() on it. The resulting list is then appended to our list of coordinates, r.
for x in set(a.ravel())-{0}:
    r.append(list(zip(*np.where(a==x))))
for x in set(b.ravel())-{0}:
    r.append(list(zip(*np.where(b==x))))
print(r)
[[(0, 1), (0, 2), (1, 0), (1, 1)],
[(2, 3)],
[(0, 0)],
[(0, 3), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2)]]
That said, we can speed up this code slightly by making use of the fact that label returns the number of features it assigned. This allows us to avoid the set command, which can take time on large arrays:
a, num_a = label(arr)
for x in range(1, num_a+1): # range from 1 to the highest label
    r.append(list(zip(*np.where(a==x))))
A solution with only standard libraries:
from pprint import pprint
m = [0, 1, 1, 0,
1, 1, 0, 0,
0, 0, 0, 1]
def is_neighbour(x1, y1, x2, y2):
    return (x1 in (x2-1, x2+1) and y1 == y2) or \
           (x1 == x2 and y1 in (y2+1, y2-1))
def is_value_touching_group(val, groups, x, y):
    for d in groups:
        if d['color'] == val and any(is_neighbour(x, y, *cell) for cell in d['cells']):
            return d
def check(m, w, h):
    groups = []
    for i in range(h):
        for j in range(w):
            val = m[i*w + j]
            touching_group = is_value_touching_group(val, groups, i, j)
            if touching_group:
                touching_group['cells'].append( (i, j) )
            else:
                groups.append({'color':val, 'cells':[(i, j)]})
    final_groups = []
    while groups:
        current_group = groups.pop()
        for c in current_group['cells']:
            touching_group = is_value_touching_group(current_group['color'], groups, *c)
            if touching_group:
                touching_group['cells'].extend(current_group['cells'])
                break
        else:
            final_groups.append(current_group['cells'])
    return final_groups
pprint( check(m, 4, 3) )
Prints:
[[(2, 3)],
[(0, 3), (1, 3), (1, 2), (2, 2), (2, 0), (2, 1)],
[(0, 1), (0, 2), (1, 1), (1, 0)],
[(0, 0)]]
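A breadth-first flood fill is another standard-library way to get the same groups while visiting each cell only once. The sketch below is not the code above; it assumes the same flat row-major list m with width w and height h, and the function name segment is made up for illustration.

```python
from collections import deque

def segment(m, w, h):
    """Group cells of equal value that touch horizontally or vertically."""
    seen = set()
    groups = []
    for start in range(w * h):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cells = []
        while queue:
            idx = queue.popleft()
            i, j = divmod(idx, w)          # row, column of the flat index
            cells.append((i, j))
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                nidx = ni * w + nj
                # Bounds are checked first, so nidx is only used when valid.
                if (0 <= ni < h and 0 <= nj < w and nidx not in seen
                        and m[nidx] == m[idx]):
                    seen.add(nidx)
                    queue.append(nidx)
        groups.append(sorted(cells))
    return groups

m = [0, 1, 1, 0,
     1, 1, 0, 0,
     0, 0, 0, 1]
print(segment(m, 4, 3))
```

Groups come out in the order of their first cell, which matches the ordering of the expected result r in the question.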
This version returns the groups as lists stored under each value as the key.
import numpy as np
import math
def get_keys(old_dict):
    new_dict = {}
    for key, value in old_dict.items():
        if value not in new_dict.keys():
            new_dict[value] = []
            new_dict[value].append(key)
        else:
            new_dict[value].append(key)
    return new_dict
def is_neighbor(a,b):
    if a==b:
        return True
    else:
        distance = abs(a[0]-b[0]), abs(a[1]-b[1])
        return distance == (0,1) or distance == (1,0)
def collate(arr):
    arr2 = arr.copy()
    ret = []
    for a in arr:
        for i, b in enumerate(arr2):
            if set(a).intersection(set(b)):
                a = list(set(a+b))
        ret.append(a)
    for clist in ret:
        clist.sort()
    return [list(y) for y in set([tuple(x) for x in ret])]
def get_groups(d):
    for k,v in d.items():
        ret = []
        for point in v:
            matches = [a for a in v if is_neighbor(point, a)]
            ret.append(matches)
        d[k] = collate(ret)
    return d
a = np.array([[0,1,1,0],
[1,1,0,0],
[0,0,1,1]])
d = dict(np.ndenumerate(a))
d = get_keys(d)
d = get_groups(d)
print(d)
Result:
{
0: [[(0, 3), (1, 2), (1, 3)], [(0, 0)], [(2, 0), (2, 1)]],
1: [[(2, 2), (2, 3)], [(0, 1), (0, 2), (1, 0), (1, 1)]]
}

Spanning Tree list from edge list in Python

I am trying to figure out how to print a Spanning Tree list from a given list of edges. For example, if I read in:
0 1
2 1
0 2
1 3
I want to print out a Spanning Tree list of:
[[1], [0,2,3], [1], [1]]
I know how to create an adjacency list using the code:
n = int(input("Enter number of vertices: "))
adjList = [[] for i in range(n)]
with open("graph.txt") as edges:
    for line in edges:
        line = line.replace("\n", "").split(" ")
        adjList[int(line[0])].append(int(line[1]))
        adjList[int(line[1])].append(int(line[0]))
print(adjList)
But creating a Spanning Tree is a different story. Given that the Spanning Tree would be unweighted, I am not sure if I need to use some version of Prim's Algorithm here?
Any help is appreciated!
This implementation is based on the networkx package (documentation for it is here). Note that there's a few different spanning trees that I think you can get for that set of connections. This code will represent the spanning tree by its collection of edges, but I'll see if I can modify it to represent the tree the way you prefer.
import networkx as nx
def spanning_tree_from_edges(edges):
    graph = nx.Graph()
    for n1, n2 in edges:
        graph.add_edge(n1, n2)
    spanning_tree = nx.minimum_spanning_tree(graph)
    return spanning_tree
if __name__ == '__main__':
    edges = [(0, 1), (2, 1), (0, 2), (1, 3)]
    tree = spanning_tree_from_edges(edges)
    print(sorted(tree.edges()))
Output
[(0, 1), (0, 2), (1, 3)]
This alternative seems to represent the tree as the collection of node connections (i.e. in the format you want). Note that this collection is different because it's a different spanning tree to what you get:
if __name__ == '__main__':
    edges = [(0, 1), (2, 1), (0, 2), (1, 3)]
    tree = spanning_tree_from_edges(edges)
    print([list(nx.all_neighbors(tree, n)) for n in tree.nodes()])
Output
[[1, 2], [0, 3], [0], [1]]
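Since the graph is unweighted, Prim's algorithm is not required: any breadth-first or depth-first traversal of a connected graph yields a spanning tree. Here is a minimal sketch without networkx, assuming the vertices are numbered 0..n-1 and the graph is connected; the function name bfs_spanning_tree is made up for illustration.

```python
from collections import deque

def bfs_spanning_tree(n, edges, root=0):
    """Return a spanning tree as an adjacency list, built by BFS from root."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    tree = [[] for _ in range(n)]
    visited = [False] * n
    visited[root] = True
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not visited[v]:
                # The first time v is reached, the edge (u, v) joins the tree.
                visited[v] = True
                tree[u].append(v)
                tree[v].append(u)
                queue.append(v)
    return tree

edges = [(0, 1), (2, 1), (0, 2), (1, 3)]
print(bfs_spanning_tree(4, edges))          # [[1, 2], [0, 3], [0], [1]]
print(bfs_spanning_tree(4, edges, root=1))  # [[1], [0, 2, 3], [1], [1]]
```

Which spanning tree you get depends on the starting vertex: starting from vertex 1 reproduces the list given in the question.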
[Figures: starting graph; generated spanning tree]

How to index a Cartesian product

Suppose that the variables x and theta can take the possible values [0, 1, 2] and [0, 1, 2, 3], respectively.
Let's say that in one realization, x = 1 and theta = 3. The natural way to represent this is by a tuple (1,3). However, I'd like to instead label the state (1,3) by a single index. A 'brute-force' method of doing this is to form the Cartesian product of all the possible ordered pairs (x,theta) and look it up:
import numpy as np
import itertools
N_x = 3
N_theta = 4
np.random.seed(seed = 1)
x = np.random.choice(range(N_x))
theta = np.random.choice(range(N_theta))
def get_box(x, N_x, theta, N_theta):
    states = list(itertools.product(range(N_x),range(N_theta)))
    inds = [i for i in range(len(states)) if states[i]==(x,theta)]
    return inds[0]
print(x, theta)
box = get_box(x, N_x, theta, N_theta)
print(box)
This gives (x, theta) = (1,3) and box = 7, which makes sense if we look it up in the states list:
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
However, this 'brute-force' approach seems inefficient, as it should be possible to determine the index beforehand without looking it up. Is there any general way to do this? (The number of states N_x and N_theta may vary in the actual application, and there might be more variables in the Cartesian product).
If you always store your states lexicographically and the possible values for x and theta are always the complete range from 0 to some maximum, as your example suggests, you can use the formula
index = x * N_theta + theta
where (x, theta) is one of your tuples.
This generalizes in the following way to higher dimensional tuples: If N is a list or tuple representing the ranges of the variables (so N[0] is the number of possible values for the first variable, etc.) and p is a tuple, you get the index into a lexicographically sorted list of all possible tuples using the following snippet:
index = 0
skip = 1
for dimension in reversed(range(len(N))):
    index += skip * p[dimension]
    skip *= N[dimension]
This might not be the most Pythonic way to do it but it shows what is going on: You think of your tuples as a hypercube where you can only go along one dimension, but if you reach the edge, your coordinate in the "next" dimension increases and your traveling coordinate resets. The reader is advised to draw some pictures. ;)
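The loop above computes a row-major (lexicographic) flat index, which is the same computation NumPy exposes as np.ravel_multi_index. As an illustration only, the sketch below wraps the loop in a function and adds the inverse mapping from index back to tuple; the names flat_index and unflatten are made up, and the comparison against itertools.product is just a sanity check.

```python
import itertools

def flat_index(p, N):
    """Index of tuple p in the lexicographically sorted list of all states."""
    index = 0
    skip = 1
    for dimension in reversed(range(len(N))):
        index += skip * p[dimension]
        skip *= N[dimension]
    return index

def unflatten(index, N):
    """Inverse of flat_index: recover the tuple from its flat index."""
    p = []
    for size in reversed(N):
        index, coord = divmod(index, size)
        p.append(coord)
    return tuple(reversed(p))

N = [3, 4]
states = list(itertools.product(*[range(n) for n in N]))
print(flat_index((1, 3), N))  # 7
print(unflatten(7, N))        # (1, 3)
```

The last variable changes fastest, so its stride is 1, and each earlier variable's stride is the product of the sizes of all the variables after it.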
I think it depends on the data you have. If they are sparse, the best solution is a dictionary, which works for tuples of any dimension.
import itertools
import random
n = 100
m = 100
l1 = [i for i in range(n)]
l2 = [i for i in range(m)]
a = {}
prod = [element for element in itertools.product(l1, l2)]
for i in prod:
a[i] = random.randint(1, 100)
A very good source about the performance is in this discussion.
For the sake of completeness I'll include my implementation of Julian Kniephoff's solution, get_box3, with a slightly adapted version of the original implementation, get_box2:
# 'Brute-force' method
def get_box2(p, N):
    states = list(itertools.product(*[range(n) for n in N]))
    return states.index(p)
# 'Analytic' method
def get_box3(p, N):
    index = 0
    skip = 1
    for dimension in reversed(range(len(N))):
        index += skip * p[dimension]
        skip *= N[dimension]
    return index
p = (1,3,2) # Tuple characterizing the total state of the system
N = [3,4,3] # List of the number of possible values for each state variable
print("Brute-force method yields %s" % get_box2(p, N))
print("Analytical method yields %s" % get_box3(p, N))
Both the 'brute-force' and 'analytic' method yield the same result:
Brute-force method yields 23
Analytical method yields 23
but I expect the 'analytic' method to be faster. I've changed the representation to p and N as suggested by Julian.

Check if some elements in a matrix are cohesive

I have to write a very little Python program that checks whether some group of coordinates are all connected together (by a line, not diagonally). The next 2 pictures show what I mean. In the left picture all colored groups are cohesive, in the right picture not:
I've already made this piece of code, but it doesn't seem to work and I'm quite stuck, any ideas on how to fix this?
def cohesive(container):
    co = container.pop()
    container.add(co)
    return connected(co, container)
def connected(co, container):
    done = {co}
    todo = set(container)
    while len(neighbours(co, container, done)) > 0 and len(todo) > 0:
        done = done.union(neighbours(co, container, done))
    return len(done) == len(container)
def neighbours(co, container, done):
    output = set()
    for i in range(-1, 2):
        if i != 0:
            if (co[0] + i, co[1]) in container and (co[0] + i, co[1]) not in done:
                output.add((co[0] + i, co[1]))
            if (co[0], co[1] + i) in container and (co[0], co[1] + i) not in done:
                output.add((co[0], co[1] + i))
    return output
this is some reference material that should return True:
cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)})
and this should return False:
cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)})
Both tests work, but when I try to test it with different numbers the functions fail.
You can just take an element and attach its neighbors while it is possible.
def dist(A,B):return abs(A[0]-B[0]) + abs(A[1]-B[1])
def grow(K,E):return {M for M in E for N in K if dist(M,N)<=1}
def cohesive(E):
    K={min(E)} # an element
    L=grow(K,E)
    while len(K)<len(L) : K,L=L,grow(L,E)
    return len(L)==len(E)
grow(K,E) returns the neighborhood of K within E (including K itself).
In [1]: cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)})
Out[1]: True
In [2]: cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)})
Out[2]: False
Usually, to check if something is connected, you use a disjoint-set data structure; the more efficient variations include weighted quick union and weighted quick union with path compression.
Here's an implementation, http://algs4.cs.princeton.edu/15uf/WeightedQuickUnionUF.java.html which you can modify to your needs. Also, the implementation found in the book "The Design and Analysis of Computer Algorithms" by A. Aho allows you to specify the name of the group that you add 2 connected elements to, so I think that's the modification you're looking for. (It just involves using 1 extra array which keeps track of group numbers.)
As a side note, since disjoint sets usually apply to arrays, don't forget that you can represent an N by N matrix as an array of size N*N.
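As a rough illustration of the disjoint-set approach in Python (not a port of the linked Java class), the sketch below uses union by size with path halving and joins each coordinate with its lower and right neighbours. It assumes a non-empty collection of (row, column) tuples; the names cohesive_uf, find, and union are made up for this example.

```python
def cohesive_uf(coords):
    """Return True if all coordinates form one 4-connected group."""
    coords = set(coords)
    parent = {c: c for c in coords}
    size = {c: 1 for c in coords}

    def find(c):
        # Walk up to the root, halving the path along the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra          # attach the smaller tree under the larger one
        size[ra] += size[rb]

    for (x, y) in coords:
        # Checking two directions is enough: the other two are covered
        # when the neighbouring cell itself is processed.
        for nb in ((x + 1, y), (x, y + 1)):
            if nb in coords:
                union((x, y), nb)

    return len({find(c) for c in coords}) == 1

print(cohesive_uf({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)}))  # True
print(cohesive_uf({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)}))  # False
```

Using a dictionary keyed by coordinate avoids having to flatten the matrix into an array of size N*N first.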
EDIT: just realized that it wasn't clear to me what you were asking at first, and I realized that you also mentioned that diagonal components aren't connected, in that case the algorithm is as follows:
0) Check if all elements refer to the same group.
1) Iterate through the array of pairs that represent coordinates in the matrix in question.
2) For each pair make a set of pairs that satisfies the following formula:
|entry.x - otherEntry.x| + |entry.y - otherEntry.y|=1.
'entry' refers to the element that the outer for loop is referring to.
3) Check if all of the sets overlap. That can be done by "unioning" the sets you're looking at, at the end if you get more than 1 set, then the elements are not cohesive.
The complexity is O(n^2 + n^2 * log(n)).
Example:
(0,4), (1,2), (1,4), (2,2), (2,3)
0) check that they are all in the same group:
all of them belong to group 5.
1) make sets:
set1: (0,4), (1,4)
set2: (1,2), (2,2)
set3: (0,4), (1,4) // here we suppose that sets are sorted, other than that it
should be (1,4), (0,4)
set4: (1,2), (2,2), (2,3)
set5: (2,2), (2,3)
2) check for overlap:
set1 overlaps with set3, so we get:
set1' : (0,4), (1,4)
set2 overlaps with set4 and set 5, so we get:
set2' : (1,2), (2,2), (2,3)
as you can see set1' and set2' don't overlap, hence you get 2 disjoint sets that are in the same group, so the answer is 'false'.
Note that this is inefficient, but I have no idea how to do it more efficiently, but this answers your question.
The logic in your connected function seems wrong. You make a todo variable, but then never change its contents. You always look for neighbours around the same starting point.
Try this code instead:
def connected(co, container):
    done = {co}
    todo = {co}
    while len(todo) > 0:
        co = todo.pop()
        n = neighbours(co, container, done)
        done = done.union(n)
        todo = todo.union(n)
    return len(done) == len(container)
todo is a set of all the points we are still to check.
done is a set of all the points we have found to be 4-connected to the starting point.
I'd tackle this problem differently... if you're looking for five exactly, that means:
Every coordinate in the line has to be neighbouring another coordinate in the line, because anything less means that coordinate is disconnected.
At least three of the coordinates have to be neighbouring another two or more coordinates in the line, because anything less and the groups will be disconnected.
Hence, you can just get the coordinate's neighbours and check whether both conditions are fulfilled.
Here is a basic solution:
def cells_are_connected(connections):
    return all(c > 0 for c in connections)
def groups_are_connected(connections):
    return len([1 for c in connections if c > 1]) > 2
def cohesive(coordinates):
    connections = []
    for x, y in coordinates:
        neighbours = [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]
        connections.append(len([1 for n in neighbours if n in coordinates]))
    return cells_are_connected(connections) and groups_are_connected(connections)
print(cohesive([(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)])) # True
print(cohesive([(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)])) # False
No need for a general-case solution or union logic. :) Do note that it's specific to the five-in-a-line problem, however.
