How to visualize communities from a list in igraph python - python

I have a community list as the following list_community.
How do I edit the code below to make the community visible?
import igraph as ig
list_community = [['A', 'B', 'C', 'D'],['E','F','G'],['G', 'H','I','J']]
list_nodes = ['A', 'B', 'C', 'D','E','F','G','H','I','J']
tuple_edges = [('A','B'),('A','C'),('A','D'),('B','C'),('B','D'), ('C','D'),('C','E'),
               ('E','F'),('E','G'),('F','G'),('G','H'),
               ('G','I'), ('G','J'),('H','I'),('H','J'),('I','J'),]
# Make a graph
g_test = ig.Graph()
g_test.add_vertices(list_nodes)
g_test.add_edges(tuple_edges)
# Plot
layout = g_test.layout("kk")
g_test.vs["name"] = list_nodes
visual_style = {}
visual_style["vertex_label"] = g_test.vs["name"]
visual_style["layout"] = layout
ig.plot(g_test, **visual_style)
I would like a plot that visualizes the community as shown below.
I can also do this by using a module other than igraph.
Thank you.

In igraph you can use a VertexCover to draw polygons around clusters (as also suggested by Szabolcs in his comment). You have to supply the option mark_groups when plotting the cover, optionally with a palette. See the documentation for more detail.
In order to construct the VertexCover, you first have to make sure you get integer indices for each node in the graph you created. You can do that using g_test.vs.find.
clusters = [[g_test.vs.find(name=v).index for v in cl] for cl in list_community]
cover = ig.VertexCover(g_test, clusters)
After that, you can simply draw the cover like
ig.plot(cover,
        mark_groups=True,
        palette=ig.RainbowPalette(3))
resulting in the following picture
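The name-to-index step can also be done with a plain dictionary instead of one find call per name. This is a minimal sketch in plain Python, assuming vertex indices follow the order of list_nodes (which should hold when the vertices are added to an empty graph with add_vertices(list_nodes), as in the question). The name name_to_index is illustrative and not part of igraph.

```python
list_community = [['A', 'B', 'C', 'D'], ['E', 'F', 'G'], ['G', 'H', 'I', 'J']]
list_nodes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

# Assumption: vertex i of the graph corresponds to list_nodes[i]
name_to_index = {name: i for i, name in enumerate(list_nodes)}

# Same structure as the `clusters` list built with g_test.vs.find
clusters = [[name_to_index[v] for v in cl] for cl in list_community]
print(clusters)

# Vertices that belong to more than one community; this overlap is why a
# cover (overlapping groups) is used here rather than a strict partition
seen, overlapping = set(), set()
for cl in clusters:
    overlapping |= seen & set(cl)
    seen |= set(cl)
print(sorted(list_nodes[i] for i in overlapping))
```

The resulting clusters list has the same shape as the one passed to ig.VertexCover above.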

Here is a script that roughly achieves what you're looking for. I had to handle single-node and two-node communities separately, but for communities with more than two nodes this draws a polygon through the nodes.
I had some trouble with matplotlib not accounting for overlapping edges and faces of polygons, which meant the choice was between (1) not having the polygon surround the nodes or (2) having an extra outline just inside the edge of the polygon, because matplotlib overlaps the widened edge with the fill of the polygon. I left a comment on how to change the code from option (2) to option (1).
I also blatantly borrowed a convenience function from this post to sort the nodes of each polygon into the order that matplotlib's plt.fill() needs to fill it properly.
Option 1:
Option 2:
Full code:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
def sort_xy(x, y):
    # Sort points by angle around their centroid so plt.fill() gets a simple polygon
    x0 = np.mean(x)
    y0 = np.mean(y)
    r = np.sqrt((x-x0)**2 + (y-y0)**2)
    angles = np.where((y-y0) > 0, np.arccos((x-x0)/r), 2*np.pi-np.arccos((x-x0)/r))
    mask = np.argsort(angles)
    x_sorted = x[mask]
    y_sorted = y[mask]
    return x_sorted, y_sorted
G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots(figsize=(8, 10))
nx.draw(G, pos=pos, with_labels=True)
communities = nx.community.louvain_communities(G)
alpha = 0.5
edge_padding = 10
# plt.get_cmap is used here because cm.get_cmap was removed in Matplotlib 3.9
colors = plt.get_cmap('viridis', len(communities))
for i, comm in enumerate(communities):
    if len(comm) == 1:
        # Single node: draw a circle around it
        cir = plt.Circle((pos[comm.pop()]), edge_padding / 100, alpha=alpha, color=colors(i))
        ax.add_patch(cir)
    elif len(comm) == 2:
        # Two nodes: draw a thick line between them
        comm_pos = {k: pos[k] for k in comm}
        coords = [a for a in zip(*comm_pos.values())]
        x, y = coords[0], coords[1]
        plt.plot(x, y, linewidth=edge_padding, linestyle="-", alpha=alpha, color=colors(i))
    else:
        # Three or more nodes: fill the polygon through the sorted node positions
        comm_pos = {k: pos[k] for k in comm}
        coords = [a for a in zip(*comm_pos.values())]
        x, y = sort_xy(np.array(coords[0]), np.array(coords[1]))
        plt.fill(x, y, alpha=alpha, facecolor=colors(i),
                 edgecolor=colors(i), # set to None to remove edge padding
                 linewidth=edge_padding)
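The vertex ordering that sort_xy performs can be checked on a tiny example. The sketch below is not the borrowed function itself: it swaps in math.atan2 for the arccos branch, which gives the same counter-clockwise order around the centroid but may start at a different vertex. The name sort_by_angle is illustrative.

```python
import math

def sort_by_angle(points):
    """Order 2D points counter-clockwise around their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # atan2 returns an angle in (-pi, pi]; the modulo maps it into [0, 2*pi)
    return sorted(points,
                  key=lambda p: math.atan2(p[1] - cy, p[0] - cx) % (2 * math.pi))

# Corners of a unit square listed in a "bow-tie" order that would
# self-intersect if filled as given
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
ordered = sort_by_angle(square)
print(ordered)
```

After sorting, consecutive points are neighbours on the perimeter, which is what a fill routine needs to avoid a self-crossing outline.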

Related

Update network graph plotting in same figure and change node color according to node degree

I'm trying to update a graph plot every 5-7 seconds, so I tried using matplotlib to refresh the figure. But with this approach, the same graph is plotted again and again.
For better understanding: I have two files. One is for node creation and is changed every 10 seconds by another program; the other is for edge creation and is static. From these files I'm trying to create one dynamic graph like this - https://drive.google.com/file/d/1snFITs4jvW5H8JSF-3pqFE2F1XjX88PD/view?usp=sharing
My network code is -
fig = plt.figure()
net = fig.add_subplot(111)
def update(it):
    with open('node.csv', 'r') as nodecsv:
        nodereader = csv.reader(nodecsv)
        nodes = [n for n in nodereader][1:]
        node_names = [n[0] for n in nodes]
    with open('edge.csv', 'r') as edgecsv:
        edgereader = csv.reader(edgecsv)
        edges = [tuple(e) for e in edgereader][1:]
    g = nx.Graph()
    g.add_nodes_from(node_names)
    g.add_edges_from(edges)
    print(nx.info(g))
    # * ******************** Node Color Part **************************** *
    nx.draw(g,pos=pos,node_size=node_size,node_color=color, linewidths=2,**options)
ani = animation.FuncAnimation(fig, update, interval=1000)
plt.show()
Another thing: I'm trying to apply node colour according to a node attribute, which is working, but I also want it to reflect node degree. That is, if the attribute colour is blue, the degree should be applied separately in blue; if it is green, separately in green; and so on.
My node colour code -
node_status = {}
for node in nodes:
    node_status[node[0]] = node[1]
nx.set_node_attributes(g,node_status,'node_status')
color = []
for n in g.nodes():
    #print(n,g.nodes[n]['node_status'])
    if g.nodes[n]['node_status'] == 'A': color.append("blue")
    if g.nodes[n]['node_status'] == 'B': color.append("yellow")
    if g.nodes[n]['node_status'] == 'C': color.append("red")
    if g.nodes[n]['node_status'] == 'D': color.append("green")
    if g.nodes[n]['node_status'] == 'E': color.append("pink")
betCent = nx.betweenness_centrality(g, normalized=True, endpoints=True)
#node_color = [20000.0 * g.degree(v) for v in g]
node_size = [v * 10000 for v in betCent.values()]
Thanks in advance.

Networkx apparently scrambling color list python [duplicate]

I managed to produce the graph correctly, but with some more testing I noticed inconsistent results for the following two pieces of code:
colors = [h.edge[i][j]['color'] for (i,j) in h.edges_iter()]
widths = [h.edge[i][j]['width'] for (i,j) in h.edges_iter()]
nx.draw_circular(h, edge_color=colors, width=widths)
This approach results in consistent output, while the following produces the wrong color/size relative to the order of the edges:
colors = list(nx.get_edge_attributes(h,'color').values())
widths = list(nx.get_edge_attributes(h,'width').values())
nx.draw_circular(h, edge_color=colors, width=widths)
However, it looks to me as though both snippets rely on the function call returning the attributes in the order of the edges. Why the different results?
It looks a bit clumsy to me to access attributes with h[][][]; is it possible to access them with dot notation, e.g. edge.color for edge in h.edges()?
Or did I miss anything?
The order of the edges passed to the drawing functions is important. If you don't specify it (using the edgelist keyword), you'll get the default order of G.edges(). It is safest to pass the parameter explicitly, like this:
import networkx as nx
G = nx.Graph()
G.add_edge(1,2,color='r',weight=2)
G.add_edge(2,3,color='b',weight=4)
G.add_edge(3,4,color='g',weight=6)
pos = nx.circular_layout(G)
edges = G.edges()
colors = [G[u][v]['color'] for u,v in edges]
weights = [G[u][v]['weight'] for u,v in edges]
nx.draw(G, pos, edgelist=edges, edge_color=colors, width=weights)
This results in an output like this:
Dictionaries are the underlying data structure used for NetworkX graphs, and as of Python 3.7+ they maintain insertion order.
This means that we can safely use nx.get_edge_attributes to retrieve edge attributes since we are guaranteed to have the same edge order in every run of Graph.edges() (which is internally called by get_edge_attributes).
So when plotting, we can directly set attributes such as edge_color and width from the result returned by get_edge_attributes. Here's an example:
G = nx.Graph()
G.add_edge(0,1,color='r',weight=2)
G.add_edge(1,2,color='g',weight=4)
G.add_edge(2,3,color='b',weight=6)
G.add_edge(3,4,color='y',weight=3)
G.add_edge(4,0,color='m',weight=1)
colors = list(nx.get_edge_attributes(G,'color').values())
weights = list(nx.get_edge_attributes(G,'weight').values())
pos = nx.circular_layout(G)
nx.draw(G, pos,
        edge_color=colors,
        width=weights,
        with_labels=True,
        node_color='lightgreen')
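One way to make the pairing between edges and their attributes explicit is to read both in a single pass, so no assumption about two separate calls agreeing on order is needed. This is a minimal sketch, assuming NetworkX 2.x or later, where G.edges(data=True) yields (u, v, attribute_dict) triples. The list names are illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_edge(1, 2, color='r', weight=2)
G.add_edge(2, 3, color='b', weight=4)
G.add_edge(3, 4, color='g', weight=6)

# Collect each edge together with its own attributes
edgelist, colors, widths = [], [], []
for u, v, attrs in G.edges(data=True):
    edgelist.append((u, v))
    colors.append(attrs['color'])
    widths.append(attrs['weight'])

print(edgelist, colors, widths)

# The three lists can then be passed together, for example:
# nx.draw(G, nx.circular_layout(G), edgelist=edgelist,
#         edge_color=colors, width=widths)
```

Because every entry of colors and widths was read from the same triple as the matching entry of edgelist, the three lists stay aligned however the graph was built.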
If you want to avoid setting edge colors and widths by hand, you may also find this function helpful:
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
def rgb_to_hex(rgb):
    return '#%02x%02x%02x' % rgb
adjacency_matrix = np.array([[0, 0, 0.5], [1, 0, 1], [1, 0.5, 0]])
n_graphs = 5
fig, axs = plt.subplots(1, n_graphs, figsize=(19,2.5))
for graph in range(n_graphs):
    pos = {0: (1, 0.9), 1: (0.9, 1), 2: (1.1, 1)}
    # build a directed graph from the adjacency matrix
    # (from_numpy_array replaces from_numpy_matrix, which was removed in NetworkX 3.0)
    gr = nx.from_numpy_array(adjacency_matrix, create_using=nx.DiGraph)
    weights = nx.get_edge_attributes(gr, "weight")
    # adding nodes
    all_rows = range(0, adjacency_matrix.shape[0])
    for n in all_rows:
        gr.add_node(n)
    # getting edges
    edges = gr.edges()
    # weight and color of edges
    scaling_factor = 4 # to emphasise differences
    alphas = [weights[edge] * scaling_factor for edge in edges]
    # gray level per edge: heavier weight -> darker line
    colors = [rgb_to_hex((int(255 * (1 - weights[edge])),) * 3) for edge in edges]
    # draw graph
    nx.draw(gr,
            pos,
            ax=axs[graph],
            edgecolors='black',
            node_color='white',
            node_size=2000,
            labels={0: "A", 1: "B", 2: "C"},
            font_weight='bold',
            linewidths=2,
            with_labels=True,
            connectionstyle="arc3,rad=0.15",
            edge_color=colors,
            width=alphas)
plt.tight_layout()
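The weight-to-colour mapping used in that loop can be looked at in isolation. This is a small sketch in plain Python, assuming weights lie in [0, 1]. The helper weight_to_gray is an illustrative name that is not in the answer above.

```python
def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as '#rrggbb'."""
    return '#%02x%02x%02x' % rgb

def weight_to_gray(weight):
    """Map a weight in [0, 1] to a gray level: 0 gives white, 1 gives black."""
    level = int(255 * (1 - weight))
    return rgb_to_hex((level, level, level))

for w in (0, 0.5, 1):
    print(w, weight_to_gray(w))
```

Heavier edges therefore come out darker, and the scaled width in the answer reinforces the same ordering.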

How to put colours in dendrograms of matplotlib - scipy in python?

I have the following code to perform hierarchical clustering on data:
Z = linkage(data,method='weighted')
plt.subplot(2,1,1)
dendro = dendrogram(Z)
leaves = dendro['leaves']
print(leaves)
plt.show()
However, in the dendrogram all the clusters have the same color (blue). Is there a way to use different colors according to the similarity between clusters?
Look at the documentation. It looks like you could pass the link_color_func keyword or the color_threshold keyword to get different colors.
Edit:
The default behavior of the dendrogram coloring scheme is, given a color_threshold = 0.7*max(Z[:,2]) to color all the descendent links below a cluster node k the same color if k is the first node below the cut threshold; otherwise, all links connecting nodes with distances greater than or equal to the threshold are colored blue [from the docs].
What the hell does this mean? Well, if you look at a dendrogram, different clusters are linked together. The "distance" between two clusters is the height of the link between them. The color_threshold is the height below which new clusters will be given different colors. If all your clusters are blue, then you need to raise your color_threshold. For example,
In [48]: mat = np.random.rand(10, 10)
In [49]: z = linkage(mat, method="weighted")
In [52]: d = dendrogram(z)
In [53]: d['color_list']
Out[53]: ['g', 'g', 'b', 'r', 'c', 'c', 'c', 'b', 'b']
In [54]: plt.show()
I can check what the default color_threshold is by
In [56]: 0.7*np.max(z[:,2])
Out[56]: 1.0278719020096947
If I lower the color_threshold, I get more blue because more links have distances greater than the new color_threshold. You can see this visually because all the links above 0.9 are now blue:
In [64]: d = dendrogram(z, color_threshold=.9)
In [65]: d['color_list']
Out[65]: ['g', 'b', 'b', 'r', 'b', 'b', 'b', 'b', 'b']
In [66]: plt.show()
If I increase the color_threshold to 1.2, the links below 1.2 will no longer be blue. Additionally, the cyan and red links will merge into a single color because their parent link is below 1.2:
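The effect of the threshold can also be checked without drawing anything. This is a minimal sketch, assuming a SciPy version whose dendrogram accepts no_plot and above_threshold_color. The data is random, so it is not the matrix from the session above, and count_above is an illustrative helper. Passing "grey" as the above-threshold colour is only a device to make those links easy to count.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
mat = rng.random((10, 10))
z = linkage(mat, method="weighted")
heights = z[:, 2]  # height (distance) of each link

def count_above(threshold):
    """Number of links drawn in the above-threshold colour."""
    d = dendrogram(z, no_plot=True, color_threshold=threshold,
                   above_threshold_color="grey")
    return d["color_list"].count("grey")

for frac in (0.5, 0.7, 1.1):
    thr = frac * heights.max()
    print(f"threshold={thr:.3f}: {count_above(thr)} of {len(heights)} links above")
```

Raising the threshold past the tallest link leaves no link in the above-threshold colour, while lowering it moves more links into that colour.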
The following code will produce a dendrogram with a different color for each leaf cluster. If, while merging clusters, it encounters two clusters with different colors, it falls back to the default color dflt_col = "tab:blue".
Note: the link_matrix function is a plain copy of the one from the AgglomerativeClustering example in scikit-learn.
Explaining everything it does would take a long time, so print any step that is unclear to you.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform, pdist
from matplotlib.pyplot import cm
from sklearn.cluster import AgglomerativeClustering
import matplotlib.colors as clrs
def link_matrix(model, **kwargs):
    # Build the linkage matrix from a fitted model, as in the standard scikit-learn documentation
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1 # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count
    Z = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)
    return Z
def assign_link_colors(model):
    n_clusters = len(model.Z)
    scl_map_to_hex = mpl.cm.ScalarMappable(cmap = "jet").to_rgba(np.unique(model.labels_), norm = True) #colors.to_hex()
    col = [clrs.to_hex(rgb) for rgb in scl_map_to_hex]
    dic_labels = {s:[c, idx] for s, c, idx in zip(np.arange(len(model.feature_names_in_), dtype = int), model.feature_names_in_, model.labels_, )}
    model.dict_idx_name_cl = {k: v for k, v in sorted(dic_labels.items(), key=lambda item: item[1][1])}
    dflt_col = "tab:blue" # Unclustered blue
    model.dict_colors = {x:col[model.dict_idx_name_cl[x][1]] for x in model.dict_idx_name_cl}
    link_cols = {}
    for i, i_cl in enumerate(model.Z[:,:2].astype(int)): # select only the first two columns
        c1, c2 = (link_cols[x] if x > n_clusters else model.dict_colors[x] for x in i_cl)
        # Choice of coloring assignment: if same color --> ok; if no leaf, dft ("undefined") color
        if c1 == c2:
            tmp_cl = c1
        elif min(i_cl) <= n_clusters: # select the leaf color
            tmp_cl = model.dict_colors[min(i_cl)]
        else:
            tmp_cl = dflt_col
        link_cols[i+1+n_clusters] = tmp_cl
    #print(f'-link_cols: {link_cols}',)
    return link_cols
def mod_2_dendrogram(model, **kwargs):
    # This style was named 'seaborn-whitegrid' before Matplotlib 3.6
    plt.style.use('seaborn-v0_8-whitegrid')
    plt.figure(figsize=(int(.5 * len(model.feature_names_in_)), 7))
    print(f'-0.7*max(Z[:,2]): {0.7*max(model.Z[:,2])}',)
    # Plot the corresponding dendrogram
    ddata = dendrogram(model.Z, #count_sort = "descending",
                       **kwargs)
    # Plot distances on the dendrogram
    # plot cluster points & distance labels
    y_lim = model.dist_thr
    for i, d, c in zip(ddata['icoord'], ddata['dcoord'], ddata['color_list']):
        x = sum(i[1:3])/2
        y = d[1]
        if y > y_lim:
            plt.plot(x, y, 'o', c=c, markeredgewidth=0)
            plt.annotate(np.round(y,2), (x, y), xytext=(0, -5),
                         textcoords='offset points',
                         va='top', ha='center', fontsize=9)
    plt.axhline(y=model.dist_thr, color='orange', alpha = 0.7, linestyle='--', label = f"threshold: {int(model.dist_thr)}")
    plt.title(f'Agglomerative Dendrogram with n_clust: {model.n_clusters_}')
    plt.xlabel('Clusters')
    plt.ylabel('Distance')
    plt.legend()
    return ddata
Now, the running example:
import string
import pandas as pd
np.random.seed(0)
dist = np.random.randint(1e4, size = (10,10))
np.fill_diagonal(dist, 0)
dist = pd.DataFrame(dist, columns = list(string.ascii_lowercase)[:dist.shape[0]])
dist_thr = 1.5e3
model = AgglomerativeClustering(distance_threshold = dist_thr, n_clusters=None, linkage = "single", metric = "precomputed",)
model.dist_thr = dist_thr
model = model.fit(dist)
model.Z = link_matrix(model)
link_cols = assign_link_colors(model)
_ = mod_2_dendrogram(model, labels = dist.columns,
link_color_func = lambda x: link_cols[x])
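The colour propagation at the heart of assign_link_colors can be sketched without any plotting or clustering library. The version below is a simplified illustration and not the function above: it only applies the rule "same colour on both children, keep it; otherwise use the default" and leaves out the leaf-colour preference. It assumes the SciPy convention that leaves are numbered 0 to n-1 and the link created by merge i gets id n + i. The name propagate_link_colors is illustrative.

```python
def propagate_link_colors(merges, leaf_colors, default="tab:blue"):
    """Return {link_id: color} for a sequence of (child_a, child_b) merges."""
    n = len(leaf_colors)
    colors = dict(enumerate(leaf_colors))  # colour of every known node id
    link_colors = {}
    for i, (a, b) in enumerate(merges):
        link_id = n + i
        # Keep the shared colour, or fall back to the default on a clash
        color = colors[a] if colors[a] == colors[b] else default
        colors[link_id] = color
        link_colors[link_id] = color
    return link_colors

# Four leaves: two red ones, one green, one blue
merges = [(0, 1), (2, 3), (4, 5)]
link_cols = propagate_link_colors(merges, ["red", "red", "green", "blue"])
print(link_cols)
```

A dictionary of this shape can be handed to dendrogram through link_color_func=lambda k: link_cols[k], as in the running example.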

Shape recognition with numpy/scipy (perhaps watershed)

My goal is to trace drawings that have a lot of separate shapes in them and to split these shapes into individual images. It is black on white. I'm quite new to numpy,opencv&co - but here is my current thought:
scan for black pixels
black pixel found -> watershed
find watershed boundary (as polygon path)
continue searching, but ignore points within the already found boundaries
I'm not very good at these kind of things, is there a better way?
First I tried to find the rectangular bounding box of the watershed results (this is more or less a collage of examples):
import sys
from numpy import *
import numpy as np
from scipy import ndimage
np.set_printoptions(threshold=sys.maxsize) # print arrays in full; np.nan is rejected by current NumPy
a = np.zeros((512, 512)).astype(np.uint8) #unsigned integer type needed by watershed
y, x = np.ogrid[0:512, 0:512]
m1 = ((y-200)**2 + (x-100)**2 < 30**2)
m2 = ((y-350)**2 + (x-400)**2 < 20**2)
m3 = ((y-260)**2 + (x-200)**2 < 20**2)
a[m1+m2+m3]=1
markers = np.zeros_like(a).astype(int16)
markers[0, 0] = 1
markers[200, 100] = 2
markers[350, 400] = 3
markers[260, 200] = 4
res = ndimage.watershed_ift(a.astype(uint8), markers)
unique(res)
B = argwhere(res.astype(uint8))
(ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1
tr = a[ystart:ystop, xstart:xstop]
print(tr)
Somehow, when I use the original array (a) then argwhere seems to work, but after the watershed (res) it just outputs the complete array again.
The next step could be to find the polygon path around the shape, but the bounding box would be great for now!
Please help!
@Hooked has already answered most of your question, but I was in the middle of writing this up when he answered, so I'll post it in the hope that it's still useful...
You're trying to jump through a few too many hoops. You don't need watershed_ift.
You can use scipy.ndimage.label to differentiate separate objects in a boolean array and scipy.ndimage.find_objects to find the bounding box of each object.
Let's break things down a bit.
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
def draw_circle(grid, x0, y0, radius):
    ny, nx = grid.shape
    y, x = np.ogrid[:ny, :nx]
    dist = np.hypot(x - x0, y - y0)
    grid[dist < radius] = True
    return grid
# Generate 3 circles...
# (the builtin bool is used because the np.bool alias was removed in NumPy 1.24)
a = np.zeros((512, 512), dtype=bool)
draw_circle(a, 100, 200, 30)
draw_circle(a, 400, 350, 20)
draw_circle(a, 200, 260, 20)
# Label the objects in the array.
labels, numobjects = ndimage.label(a)
# Now find their bounding boxes (This will be a tuple of slice objects)
# You can use each one to directly index your data.
# E.g. a[slices[0]] gives you the original data within the bounding box of the
# first object.
slices = ndimage.find_objects(labels)
#-- Plotting... -------------------------------------
fig, ax = plt.subplots()
ax.imshow(a)
ax.set_title('Original Data')
fig, ax = plt.subplots()
ax.imshow(labels)
ax.set_title('Labeled objects')
fig, axes = plt.subplots(ncols=numobjects)
for ax, sli in zip(axes.flat, slices):
    ax.imshow(labels[sli], vmin=0, vmax=numobjects)
    tpl = 'BBox:\nymin:{0.start}, ymax:{0.stop}\nxmin:{1.start}, xmax:{1.stop}'
    ax.set_title(tpl.format(*sli))
fig.suptitle('Individual Objects')
plt.show()
Hopefully that makes it a bit clearer how to find the bounding boxes of the objects.
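The same two calls can be tried on an array small enough to read by eye. This is a minimal sketch, assuming the default connectivity of ndimage.label. The 6x6 array and the names boxes and crops are made up for illustration.

```python
import numpy as np
from scipy import ndimage

a = np.zeros((6, 6), dtype=bool)
a[0:2, 0:2] = True   # first shape: a 2x2 block
a[3:5, 2:5] = True   # second shape: a 2x3 block

labels, numobjects = ndimage.label(a)
slices = ndimage.find_objects(labels)

# (ymin, ymax, xmin, xmax) per object; the max values are exclusive
boxes = [(ys.start, ys.stop, xs.start, xs.stop) for ys, xs in slices]
print(numobjects, boxes)

# One cropped mask per shape. Comparing against the label number keeps a
# neighbouring shape out of the crop when bounding boxes overlap.
crops = [labels[sl] == i for i, sl in enumerate(slices, start=1)]
print([c.shape for c in crops])
```

Each entry of crops is a standalone boolean image of one shape, which is the "split into individual images" step asked about in the question.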
Use the ndimage module from scipy. The function label assigns a unique label to each connected block of pixels that passes a threshold. This identifies the unique clusters (shapes). Starting with your definition of a:
from scipy import ndimage
image_threshold = .5
label_array, n_features = ndimage.label(a>image_threshold)
# Plot the resulting shapes
import pylab as plt
plt.subplot(121)
plt.imshow(a)
plt.subplot(122)
plt.imshow(label_array)
plt.show()
