Creating fixed set of nodes using networkx in python


I have a problem concerning graph diagrams. I have 30 nodes (points). I want to construct an adjacency matrix such that each set of ten nodes sits at one vertex of a triangle. So let's say there is a group of 10 nodes at each of the vertices A, B and C of a triangle ABC.
Two of the vertex sets should have only 10 edges (basically each node within a cluster is connected to another one). Let's say the groups at A and B have 10 edges within the group. The third vertex set should have 11 edges (10 for each node and one node connecting with two nodes, so 11 edges in that group). Let's say the one at C has 11 edges in it.
All three clusters would have one edge between each pair of them to form a triangle. That is, connect the group at A with the group at B with one edge, B with C with one edge, and C with A with one edge.
Later on I would add one more edge between B and C, represented as a dotted line in the attached figure. The points at a vertex can be in a circle or any other formation as long as they represent a group.
How do I create an adjacency matrix for such a thing? I actually know how to create the adjacency matrix for such a graph, as it is just a binary symmetric matrix (undirected graph), but the problem is that when I try to plot that adjacency matrix, the layout pulls a node from one group closer to the group that node is connected to. So let's say I connect one node at vertex A with one node at vertex B by an edge. This edge would depict the side AB of the triangle. But when I draw it using networkx, those two connected nodes from the two different groups end up close together and look like part of one group. How do I keep them as separate groups?
Please note I am using the networkx library for Python, which helps plot the adjacency matrix.
EDIT:
The code I am trying to use, after the inspiration below:
import networkx as nx
G=nx.Graph()
# Creating three separate groups of nodes (10 nodes each)
node_clusters = [range(1,11), range(11,21) , range(21,31)]
# Adding edges between each set of nodes in each group.
for x in node_clusters:
    for y in x:
        if(y!=x[-1]):
            G.add_edge(y,y+1,len=2)
        else:
            G.add_edge(y,x[0],len=2)
# Adding three inter group edges separately:
for x in range(len(node_clusters)):
    if(x<2):
        G.add_edge(node_clusters[x][-1],node_clusters[x+1][0],len=8)
    else:
        G.add_edge(node_clusters[x][-1],node_clusters[0][0],len=8)
nx.draw_graphviz(G, prog='neato')
Gives the following error:
--> 260 '(not available for Python3)')
261 if root is not None:
262 args+="-Groot=%s"%root
ImportError: ('requires pygraphviz ', 'http://networkx.lanl.gov/pygraphviz ', '(not available for Python3)')
My Python version is 2, not 3, and I am using the Anaconda distribution.
EDIT2:
I used Marius's code but used the following to plot instead:
graph_pos=nx.spring_layout(G,k=0.20,iterations=50)
nx.draw_networkx(G,graph_pos)
It has completely destroyed the whole graph and shows this:

I was able to get something going fairly quickly just by hacking away at this. All you need to do is put together tuples representing each edge. You can also set some arbitrary lengths on the edges to get a decent approximation of your desired layout:
import networkx
import string
all_nodes = string.ascii_letters[:30]
a_nodes = all_nodes[:10]
b_nodes = all_nodes[10:20]
c_nodes = all_nodes[20:]
all_edges = []
for node_set in [a_nodes, b_nodes, c_nodes]:
    # Link each node to the next
    for i, node in enumerate(node_set[:-1]):
        all_edges.append((node, node_set[i + 1], 2))
    # Finish off the circle
    all_edges.append((node_set[0], node_set[-1], 2))
joins = [(a_nodes[0], b_nodes[0], 8), (b_nodes[-1], c_nodes[0], 8), (c_nodes[-1], a_nodes[-1], 8)]
all_edges += joins
# One extra edge for C:
all_edges.append((c_nodes[0], c_nodes[5], 5))
G = networkx.Graph()
for edge in all_edges:
    G.add_edge(edge[0], edge[1], len=edge[2])
networkx.draw_graphviz(G, prog='neato')
Try something like networkx.to_numpy_matrix(G) if you then want to export it as an adjacency matrix (newer networkx releases provide networkx.to_numpy_array(G) for this instead).
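One way to keep the three groups visually separate, independent of the layout program, is to compute the node positions yourself and hand them to the drawing call. The sketch below is only an illustration: cluster_positions, the triangle corner coordinates and the radius are made-up names and values, not part of the code above.

```python
import math
import string


def cluster_positions(groups, centers, radius=1.0):
    """Place each group's nodes on a small circle around that group's center."""
    pos = {}
    for nodes, (cx, cy) in zip(groups, centers):
        for i, node in enumerate(nodes):
            angle = 2 * math.pi * i / len(nodes)
            pos[node] = (cx + radius * math.cos(angle),
                         cy + radius * math.sin(angle))
    return pos


all_nodes = string.ascii_letters[:30]
groups = [all_nodes[:10], all_nodes[10:20], all_nodes[20:]]
# Hypothetical corners A, B, C of the triangle; any well-separated points work.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
pos = cluster_positions(groups, centers)
```

The resulting dictionary can then be passed as the pos argument, e.g. networkx.draw(G, pos=pos), so the placement no longer depends on which nodes happen to be joined by an edge.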

Related

Algorithm design: Find CCs containing specific nodes from a large edge list

Given: Large edge list, with about 90-100 million edges, 2 million nodes. These edges compose a giant graph with thousands of connected components.
I also have a list of nodes of interest. I want to extract the edges of only the CCs containing these nodes.
Here's a tiny example: consider a small graph with 3 CCs inside it.
import networkx as nx
edge_list = [(1,2), (3,2), (1,1), (4,5), (5,6), (4,6), (7,8), (8,9), (9,7)] #three CCs
## The commands below are for visualization only
G = nx.Graph()
G.add_edges_from(edge_list)
nx.draw(G, with_labels=True)
And I have nodes of interest: NOI = [1,5]
I want a function that does the following:
CC_list = find_ccs(edge_list, NOI)
print(CC_list)
#output: [[(1,2), (3,2), (1,1)],
# [(4,5), (5,6), (4,6)]]
Notice that only the edges of the two CCs that contain the NOIs are returned. I want to do this on a massive scale.
I'm open to any solution using Networkx, StellarGraph, or, most preferably PySpark. I can read the edge list using pyspark and once I have the filtered edge list, I can convert it to a CC using whatever library.
NOTE: The entire graph itself is too large to build fully using networkx or stellargraph. Thus, I have to take in an edge list and only extract the ccs I want. The above example with networkx is just for visualization purposes.

Pick a node of interest. Run BFS from it to find the connected component that contains it. Each time you add a node to the CC, check whether it is a node of interest and, if so, remove it from the set of nodes still to be processed. Repeat until you've found all the CCs containing nodes of interest.
Running time is O(V+E), where V & E are the nodes and edges of CCs that contain nodes of interest, not the larger graph.
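A minimal in-memory sketch of this idea, in plain Python, might look as follows. It assumes the edge list fits in memory as a list of tuples; note that building the adjacency map and collecting the edges at the end each take one pass over the whole edge list, while the BFS itself only touches the components of interest. The name find_ccs is taken from the question; everything else is illustrative.

```python
from collections import defaultdict, deque


def find_ccs(edge_list, nodes_of_interest):
    # One pass over all edges to build an undirected adjacency map.
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)

    remaining = set(nodes_of_interest)   # nodes of interest not yet reached
    component_of = {}                    # node -> index of its component
    n_components = 0
    while remaining:
        start = remaining.pop()
        component_of[start] = n_components
        queue = deque([start])
        while queue:                     # plain BFS over one component
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component_of:
                    component_of[neighbor] = n_components
                    remaining.discard(neighbor)
                    queue.append(neighbor)
        n_components += 1

    # One more pass to group the original edges by component.
    components = [[] for _ in range(n_components)]
    for u, v in edge_list:
        if u in component_of:
            components[component_of[u]].append((u, v))
    return components
```

The order of the returned components follows the order in which nodes of interest are popped from the set, so it should not be relied on.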

How to get a random component subgraph with given number of edges from the existing component graph?

I would like to make a connected subgraph from the edges of an existing connected graph in networkx. Both graphs will be undirected.
For example, I want my new graph to have 100 edges from the existing one and be connected. The existing one has about 2 million edges and is connected.
My current approach is below:
def get_random_component(number_of_edges):
    G_def = nx.Graph()
    G_copy = nx.Graph()
    G_iter = nx.Graph()
    G_copy.add_edges_from(G.edges)
    for i in range(number_of_edges):
        G_iter.clear()
        G_iter.add_edges_from(G_copy.edges)
        currently_found_edge = random.choices(list(G_iter.edges), k=1)[0]
        while (G_def.has_edge(*currently_found_edge) or (not G_def.has_node(currently_found_edge[0])
                and not G_def.has_node(currently_found_edge[1]))):
            G_iter.remove_edge(*currently_found_edge)
            currently_found_edge = random.choices(list(G_iter.edges), k=1)[0]
        G_def.add_edge(*currently_found_edge)
        G_copy.remove_edge(*currently_found_edge)
    return G_def
but it is very time-consuming. Is there a better way to find a random connected subgraph with a given number of edges?

Yes. First, when you're asking for algorithm help, post your algorithm in an easily-readable form. Code is fine, but only if you use meaningful variable names: def, copy, and iter don't mean much.
Your posted algorithm goes through a lot of failure pain in that while loop, especially with your given case of building a 100-edge component from a graph of 2e6 edges. Unless the graph is heavily connected, you will spin a lot for each new edge.
Instead of flailing through the graph, construct a connected subgraph. Let's call it SG. Also, assume that G below is a copy of the original graph that we can mutate as desired.
new_node = a random node of G.
legal_move = new_node.edges() # A set of edges you can traverse from nodes in SG
for _ in range(100):
    if legal_move is empty:
        # You have found a connected component
        # with fewer than 100 edges.
        # SG is the closure of that component.
        # Break out of this loop, subtract SG from G, and start over.
    new_edge = # a random edge from legal_move (see note below)
    subtract SG from legal_move (don't consider any edges already used).
    add new_edge to SG
    One node of new_edge is already in SG; if the other is not ...
        Add other node to SG
        Add other.edges to legal_move (except for new_edge)
note on random choice:
    You have to define your process for "random". One simple way is to
    choose a random edge from legal_move. Another way is to choose a
    random node from those in legal_move, and then choose a random edge
    from that node. Your growth patterns will differ, depending on the
    degree of each node.
The process above will be much faster for most graphs.
Since each edge references both of its nodes, and each node maintains a list of its edges, the exploration and update phases will be notably faster.
Coding is left as an exercise for the student. :-)
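One possible way to code the growth process above, as a minimal sketch rather than a definitive implementation: it assumes the graph is available as a plain dict mapping each node to a set of neighbours, has no self-loops, and uses the "random edge from the frontier" notion of randomness. The function name and the seed parameter are made up for illustration. Unlike the pseudocode, it does not start over when the component runs out of edges; it simply returns what it found.

```python
import random


def random_connected_edges(adjacency, n_edges, seed=None):
    """Grow a connected set of up to n_edges edges from a random start node.

    adjacency: dict mapping node -> set of neighbours (undirected graph).
    Returns a set of frozenset({u, v}) edges.
    """
    rng = random.Random(seed)
    start = rng.choice(list(adjacency))
    in_subgraph = {start}
    chosen = set()
    # Frontier: edges with at least one endpoint already in the subgraph.
    frontier = [frozenset((start, nb)) for nb in adjacency[start]]
    while frontier and len(chosen) < n_edges:
        # Swap-and-pop removes a uniformly random frontier entry in O(1).
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        edge = frontier.pop()
        if edge in chosen:
            continue  # the same edge can enter the frontier from both ends
        chosen.add(edge)
        for node in edge:
            if node not in in_subgraph:
                in_subgraph.add(node)
                frontier.extend(
                    frozenset((node, nb)) for nb in adjacency[node]
                    if frozenset((node, nb)) not in chosen
                )
    return chosen
```

Because every frontier edge touches a node that is already in the subgraph, each accepted edge keeps the result connected, and no time is spent rejecting edges from unrelated parts of the graph.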

Cycles in a highly connected directed graph - networkx

I have a moderately sized directed graph consisting of around 3000 nodes and 260000 edges that I have built in networkx. The network is mostly transitive, i.e. if a is directed to b, and b is directed to c, then a is directed to c. I am trying to use the simple_cycles algorithm from the networkx package to obtain a list of every cycle in that network (i.e. any violation of transitivity).
To do this I run
l = nx.simple_cycles(G)
cycle_list = list(l)
where G is the network.
I am running into the issue that the second line never runs to completion (I've left it running for 24 hours). When I apply the algorithm to a subset of 2100 nodes of the original network, it takes around 4 seconds to run.
Any idea where the bottleneck is and what I can do to fix it so it runs quickly?
Update: Creation method
df = pd.read_csv('epsilon_djordje.csv')
edges = [tuple([df['i'][x],df['j'][x]]) if df['f.i.j'][x] > 0 else tuple([df['j'][x],df['i'][x]]) if df['f.i.j'][x] < 0 else tuple([0,0]) for x in range(0,len(df))]
edges = list(set(edges))
edges.remove(tuple([0,0]))
G = nx.DiGraph(edges)
As a reference:
df['i'] is a column of strings (which correspond to the nodes).
df['j'] is a column of strings ( which correspond to the nodes).
df['f.i.j'] is a column of floats (which determine the direction of the edges between two nodes).

Python NumPy vectorization

I'm trying to code what is known as the List Right Heuristic for the unweighted vertex cover problem. The background is as follows:
Vertex Cover Problem: In the vertex cover problem, we are given an undirected graph G = (V, E) where V is the set of vertices and E is the set of Edges. We need to find the smallest set V' which is a subset of V such that V' covers G. A set V' is said to cover a graph G if all the edges in the graph have at least one vertex in V'.
List Right Heuristic: The algorithm is very simple. Given a list of vertices V = [v1, v2, ... vn] where n is the number of vertices in G, vi is said to be a right neighbor of vj if i > j and vi and vj are connected by an edge in the graph G. We initialize a cover C = {} (empty set) and scan V from right to left. At any point, say the current vertex being scanned is u. If u has at least one right neighbor not in C, then u is added to C. The entire list V is scanned just once.
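For reference, the description above can be written down for a single graph in plain Python, without any vectorization. This is only a sketch of the heuristic as stated; the function name and the convention that vertices are numbered 0 to n-1 in list order are assumptions made for the example.

```python
def list_right_cover(n_vertices, edges):
    """List Right heuristic for one graph; vertices are 0..n_vertices-1 in list order."""
    # right_neighbors[j] holds every i > j that shares an edge with j.
    right_neighbors = [set() for _ in range(n_vertices)]
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        if lo != hi:
            right_neighbors[lo].add(hi)

    cover = set()
    # Scan from right to left; add u if some right neighbor is still uncovered.
    for u in range(n_vertices - 1, -1, -1):
        if any(v not in cover for v in right_neighbors[u]):
            cover.add(u)
    return cover
```

On the path 0-1-2-3, for instance, vertex 3 has no right neighbor, vertex 2 is added because of 3, vertex 1 is skipped because 2 is already in the cover, and vertex 0 is added because of 1.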
I'm solving this for multiple graphs (with same vertices but different edges) at once.
I coded the List Right Heuristic in python. I was able to vectorize it to solve multiple graphs at once, but I was unable to vectorize the original for loop. I'm representing the graph using an Adjacency matrix. I was wondering if it can be further vectorized. Here's my code:
import numpy as np
import numpy.matlib

def list_right_heuristic(population: np.ndarray, adj_matrix: np.ndarray):
    adj_matrices = np.matlib.repmat(adj_matrix,population.shape[0], 1).reshape((population.shape[0], *adj_matrix.shape))
    for i in range(population.shape[0]):
        # Remove covered vertices from the graph. Delete corresponding edges
        adj_matrices[i, np.outer(population[i], population[i]).astype(bool)] = 0
    vertex_covers = np.zeros(shape=population.shape, dtype=population.dtype)
    for index in range(population.shape[-1] - 1, -1, -1):
        # Get num of intersecting elements (for each row) in right neighbors and vertex_covers
        inclusion_rows = np.sum(((1 - vertex_covers) * adj_matrices[..., index])[..., index + 1:], axis=-1).astype(bool)
        # Only add vertices to cover for rows which have at least one right neighbor not in vertex cover
        vertex_covers[inclusion_rows, index] = 1
    return vertex_covers
I have p graphs that I'm trying to solve simultaneously, where p=population.shape[0]. Each graph has the same vertices but different edges. The population array is a 2D array where each row indicates the vertices of the graph G that are already in the cover. I'm only trying to find the vertices which are not in the cover. For this reason, I set all rows and columns of the vertices in the cover to 0, i.e., I delete the corresponding edges. The heuristic should then theoretically only return vertices not already in the cover.
So in the first for loop, I just set the corresponding rows and columns in the adjacency matrix to 0 (all elements in those rows and columns will be zero). Next, I go through the 2D array of vertices from right to left and find the number of right neighbors in each row that are not in vertex_covers. For this, I first find the vertices not in the cover (1 - vertex_covers) and then multiply that with the corresponding columns in adj_matrices (or rows, since the adjacency matrix is symmetric) to get the neighbors of the vertex being scanned. Then I sum all elements to the right of it. If this value is greater than 0, there is at least one right neighbor not in vertex_covers.
For one, am I doing this correctly?
And is there any way to vectorize the second for loop (or the first, for that matter) or to speed up the code in general? I'm calling this function thousands of times in some other code for large graphs (with 1000+ vertices). Any help would be appreciated.

You can use np.einsum to perform many complex operations between indices. In your case, the first loop can be performed this way:
adj_matrices[np.einsum('ij, ik->ijk', population, population).astype(bool)] = 0
It took me some time to understand how einsum works. I found this SO question very helpful.
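To see what that subscript string does, here is a small self-contained check on a toy population array (the values are made up). It compares the einsum call with the per-row np.outer loop it replaces.

```python
import numpy as np

# Toy stand-in for the population array: 2 graphs, 3 vertices each.
population = np.array([[1, 0, 1],
                       [0, 1, 1]])

# Batched outer product: batched[i, j, k] = population[i, j] * population[i, k]
batched = np.einsum('ij,ik->ijk', population, population)

# The same result computed one row at a time with np.outer.
looped = np.stack([np.outer(row, row) for row in population])
```

Both arrays have shape (2, 3, 3) and hold the same values, so the boolean mask built from the einsum result selects the same entries as the loop over np.outer.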
BTW, your code gave me the following syntax error:
SyntaxError: can use starred expression only as assignment target
and I had to re-write the first line of the function as:
adj_matrices = np.matlib.repmat(adj_matrix,population.shape[0],
                                1).reshape((population.shape[0],) + adj_matrix.shape)

NetworkX graph: creating nodes with ordered list

I am completely new to graphs. I have a 213 x 213 distance matrix. I have been trying to visualize the distance matrix using networkx, and my idea is that far-apart nodes will appear as separate clusters when the graph is plotted. So I am creating a graph with nodes representing column indices. I need to add edges in a certain order, so I need to keep track of the nodes and their labels in order to label them afterwards.
Here is the code:
import networkx as nx
G = nx.Graph()
G.add_nodes_from(time_pres) ##time_pres is the list of labels that I want specific node to have
for i in range(212):
    for j in range(i+1, 212):
        color = ['green' if j == i+1 else 'red'][0]
        edges.append((i,j, dist[i,j], 'green')) ##This thing requires allocation of distance as per the order in dist matrix
        G.add_edge(i,j, dist = dist[i,j], color = 'green')
The way I am doing it right now, the nodes get a number as their id, which does not correspond to the index of the labels in time_pres.

I can answer the question you seem to be asking, but this won't be the end of your troubles. Specifically, I'll show you where you go wrong.
So, we assume that the variable time_pres is defined as follows
time_pres = [('person1', '1878'), ('person2', '1879'), etc]
Then,
G.add_nodes_from(time_pres)
Creates the nodes with labels ('person1', '1878'), ('person2', '1879'), etc. These nodes are held in a dictionary, with keys the label of the nodes and values any additional attributes related to each node. In your case, you have no attributes. You can also see this from the manual online, or if you type help(G.add_nodes_from).
You can even see the label of the nodes by typing either of the following lines.
G.nodes() # either this
G.node.keys() # or this
This will print a list of the labels, but since they come from a dictionary, they may not be in the same order as time_pres. You can refer to the nodes by their labels. They don't have any additional id numbers, or anything else.
Now, for adding an edge. The manual says that either of the two nodes will be added if it is not already in the graph. So, when you do
G.add_edge(i, j, dist = dist[i,j], color = 'green')
where i and j are numbers, they are added to the graph, since they don't already exist among the graph's labels. So you end up adding the nodes i and j and the edge between them. Instead, you want to do
G.add_edge(time_pres[i], time_pres[j], dist = dist[i,j], color = 'green')
This will add an edge between the nodes time_pres[i] and time_pres[j]. As far as I understand, this is your aim.
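The difference is easy to check on a toy graph. The snippet below is only an illustration with three made-up labels; it counts the nodes after adding an edge by index and after adding it by label.

```python
import networkx as nx

# Made-up labels in the same (name, year) form as above.
time_pres = [('person1', '1878'), ('person2', '1879'), ('person3', '1880')]

by_index = nx.Graph()
by_index.add_nodes_from(time_pres)
# 0 and 1 are not existing labels, so two extra nodes are created.
by_index.add_edge(0, 1)

by_label = nx.Graph()
by_label.add_nodes_from(time_pres)
# The labels already exist, so only the edge is added.
by_label.add_edge(time_pres[0], time_pres[1])

print(by_index.number_of_nodes(), by_label.number_of_nodes())
```

The first graph ends up with five nodes (three labels plus the integers 0 and 1), while the second keeps the original three.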
However, you seem to expect that when you draw the graph, the distance between nodes time_pres[i] and time_pres[j] will be decided by the attribute dist=dist[i,j] in G.add_edge(). In fact, the position of a node is decided by a tuple holding the x and y positions of the node. From the manual for nx.draw():
pos : dictionary, optional
A dictionary with nodes as keys and positions as values. If not specified a spring layout positioning will be computed. See networkx.layout for functions that compute node positions.
If you don't define the node positions, they are computed by a spring layout rather than taken from your dist attribute. In your case, you would need a dictionary like
pos = {('person1', '1878'): (23, 10),
('person2', '1879'): (18, 11),
etc}
The coordinates of nodes i and j would then have to be chosen so that the distance between them equals dist[i,j]. You would have to figure out these coordinates, but since you haven't made it clear exactly how you derived the matrix dist, I can't say anything about it.
