Random deletion of edges in graph - python

I am trying to delete edges from a graph in a random process, as a function of p, where p ranges from 0 to 1. In the first iteration, 0.1 (10%) of the edges are deleted at random. In the second iteration, 20% of the remaining edges are deleted, and so on.
My error occurs when an edge that has already been deleted is picked by the random function again.
My attempt:
import networkx as nx
import random
import numpy as np

graph = nx.fast_gnp_random_graph(20, 0.3)
p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
for i in p_values:
    print(i)

for i in p_values:
    array = []
    n = nx.number_of_edges(graph)
    edges = list(graph.edges)
    no_edges_del = int(n*i)
    print(no_edges_del)
    for j in range(no_edges_del):
        chosen_edge = random.choice(edges)
        print(chosen_edge)
        print(chosen_edge[0])
        graph.remove_edge(chosen_edge[0], chosen_edge[1])
    GC = nx.number_of_nodes(max(nx.connected_component_subgraphs(graph), key=len))
    array.append(GC/n)
Error:
Traceback (most recent call last):
  File "1.py", line 26, in <module>
    graph.remove_edge(chosen_edge[0], chosen_edge[1])
  File "D:\anaconda\lib\site-packages\networkx\classes\graph.py", line 1011, in remove_edge
    raise NetworkXError("The edge %s-%s is not in the graph" % (u, v))
networkx.exception.NetworkXError: The edge 14-15 is not in the graph

You get the list of edges before the inner for loop starts. You need to remove each edge from this list as it is removed from the graph, so that it isn't chosen again in a later iteration.
Alternatively, get the list of edges from the graph on each iteration, just before you choose the one to remove.
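Both suggestions can be sketched as follows. This is a minimal sketch, assuming the same `fast_gnp_random_graph(20, 0.3)` setup as the question; the function names `remove_fraction_keep_list` and `remove_fraction_fresh_list` and the `rng` parameter are illustrative, not from the original code.

```python
import random
import networkx as nx


def remove_fraction_keep_list(graph, p, rng=random):
    """Delete int(p * edge count) random edges, keeping the candidate list in sync."""
    edges = list(graph.edges)
    for _ in range(int(p * len(edges))):
        chosen_edge = rng.choice(edges)
        edges.remove(chosen_edge)        # this edge can never be offered again
        graph.remove_edge(*chosen_edge)


def remove_fraction_fresh_list(graph, p, rng=random):
    """Same effect, but re-read the graph's current edges before every pick."""
    for _ in range(int(p * graph.number_of_edges())):  # count fixed once, up front
        chosen_edge = rng.choice(list(graph.edges))
        graph.remove_edge(*chosen_edge)


p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
graph = nx.fast_gnp_random_graph(20, 0.3, seed=1)
for p in p_values:
    remove_fraction_keep_list(graph, p)
print(graph.number_of_edges())
```

The first version is closest to the question's code; the second avoids the bookkeeping at the cost of rebuilding the edge list on every pick.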

A solution may be the following.
At each iteration you consider the current percentage p and remove p*number_of_remaining_edges edges.
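import random
import networkx as nx

g = nx.fast_gnp_random_graph(20, 0.3)
p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
for p in p_values:
    # list(...) is needed: random.sample no longer accepts set-like views such as g.edges() (Python 3.11+)
    g.remove_edges_from(random.sample(list(g.edges()), k=int(p*g.number_of_edges())))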
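The question's loop also measures the giant component with `nx.connected_component_subgraphs`, which was removed in networkx 2.4. A minimal sketch of the same measurement with `nx.connected_components` (which yields sets of nodes), combined with the removal step above. Dividing by the node count, rather than by the edge count `n` used in the question, is an assumption about what is intended; `giant_component_fraction` is a hypothetical helper name.

```python
import random
import networkx as nx


def giant_component_fraction(graph):
    """Share of nodes in the largest connected component (0.0 for an empty graph)."""
    if graph.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(graph), key=len)  # a set of nodes
    return len(largest) / graph.number_of_nodes()


p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
g = nx.fast_gnp_random_graph(20, 0.3, seed=7)
fractions = []
for p in p_values:
    # remove p * (remaining edges), as in the answer
    g.remove_edges_from(random.sample(list(g.edges()), k=int(p * g.number_of_edges())))
    fractions.append(giant_component_fraction(g))
print(fractions)
```

Because edges are only ever removed, the fractions can shrink or stay the same from one iteration to the next, but never grow.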

Related

Python DGL increasing node degree after removals

I'm studying the effects of node removals on model training accuracy, using several heuristics, the CORA dataset and the DGL graph library. The issue appears when I remove nodes in decreasing order of degree, so that nodes with higher degree are removed first. I extract the graph's degree array, which is indexed by node ID, and reverse-argsort it. This gives the IDs of the nodes with the largest degrees, in decreasing order.
Finally, I remove the desired number of nodes from the graph and return the modified graph.
After a few iterations, I noticed that the largest degree present tends to increase. That should not happen with my algorithm, since I reverse-argsort the degree indices, slice, and remove.
I've inserted some prints inside the code to show progress and how the degrees change over time. To avoid having to clone the code, I saved the output inside the repository.
Here is the minimal reproducible example: github repository
import math
import random
import secrets
import time
import numpy as np
import torch
import dgl
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import pdb
from dgl.data import CoraGraphDataset
import gnn

def remove_nodes(g, total):
    degreeArray = g.in_degrees().numpy()
    print('Mean of degrees: ', degreeArray.sum()/len(degreeArray))
    print("Size of degree array: ", len(degreeArray))
    print("__________")
    # sort indexes and reverse, to get greater degrees first
    sortIndexes = np.argsort(degreeArray)[::-1].copy()
    #print("Sorted indexes: ", sortIndexes.tolist())
    # 2nd step: get degree value info
    debug_sorted_degrees = np.array(degreeArray)[sortIndexes]
    # indexes and degrees of 10 to be removed nodes
    degreeDict = list(zip(sortIndexes, debug_sorted_degrees))[0:10]
    #print("DegreeDict: ", degreeDict)
    # take all degrees in graph to dataframe and group by degree
    hist = pd.DataFrame(debug_sorted_degrees)
    hist.columns = ['degrees in graph, grouped']
    y = hist.groupby("degrees in graph, grouped").size()
    print("number of nodes to be removed in round: ", total)
    print(y)
    # slice the desired number of nodes from sorted indexes
    nodes = sortIndexes[0:total].copy()
    #print(nodes.tolist())
    removedNodesSearchedInGraph = g.in_degrees(torch.tensor(nodes)).numpy().tolist()
    maiorGrau = max(removedNodesSearchedInGraph)
    menorGrau = min(removedNodesSearchedInGraph)
    print("\nSorted degree removals: ")
    print(removedNodesSearchedInGraph[0:total], sep='\t')
    print(f"Largest degree removed: {maiorGrau}")
    print(f"Smallest degree removed: {menorGrau}")
    g.remove_nodes(torch.tensor(nodes, dtype=torch.int64), store_ids=True)
    return g, nodes

dataset = CoraGraphDataset()[0]
precision = []
trainingEpochs = 60
nodeRemovalsPerRound = 50
for i in range(7):
    print(f"\n______________ITERATION #{i}______________________")
    g, removedNodes = remove_nodes(dataset, nodeRemovalsPerRound)
    currentPrecision = gnn.train(dataset, trainingEpochs)
    precision.append(currentPrecision)

for i in range(len(precision)):
    print(f"Precision of iteration {i+1}: {precision[i]}")
And this is the output I get from running the code
After the first iteration, it starts removing the lowest-degree nodes, not the highest-degree ones.
What is it that I'm missing?
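The selection step described in the question, reduced to plain NumPy (no DGL, made-up degree values), looks roughly like this. It also shows a general property of array deletion: once entries are removed, the remaining positions shift, so position 2 in the result below refers to original node 4. Whether DGL re-indexes nodes the same way after `remove_nodes`, and what `store_ids=True` keeps, should be checked against the DGL documentation; the `ids` array and the helper names here are hypothetical bookkeeping, not part of the question's code.

```python
import numpy as np


def top_k_by_degree(degrees, k):
    """Positions of the k largest degrees, largest first (argsort is ascending)."""
    return np.argsort(degrees)[::-1][:k]


def drop_positions(degrees, ids, positions):
    """Delete the chosen positions from the degree array and the parallel ID array."""
    keep = np.ones(len(degrees), dtype=bool)
    keep[positions] = False
    return degrees[keep], ids[keep]


degrees = np.array([3, 9, 1, 7, 5])
ids = np.arange(len(degrees))          # original node IDs, tracked separately
chosen = top_k_by_degree(degrees, 2)   # positions 1 and 3 (degrees 9 and 7)
degrees, ids = drop_positions(degrees, ids, chosen)
print(degrees, ids)                    # [3 1 5] [0 2 4]
```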

Networkx - problem while working big data

I am trying to triangulate a very large point set, using Delaunay from scipy.spatial for the triangulation and networkx to get the node adjacency relations. My code works well on small data sets, but when I feed it about 2 million points I always get the following error:
raise NetworkXError(f"The node {n} is not in the graph.") from e
NetworkXError: The node 1 is not in the graph.
It seems as if my graph stores only the first node and nothing more. From my research, networkx should be well suited to large data sets.
Here is my code :
import numpy as np
import networkx as nx
import scipy.spatial

points = np.genfromtxt('las1.xyz', delimiter=';')
xy = points[:, 0:2]
z = points[:, 2]
delTri = scipy.spatial.Delaunay(xy)
edges = set()
for n in range(delTri.nsimplex):
    edge = sorted([delTri.vertices[n, 0], delTri.vertices[n, 1]])
    edges.add((edge[0], edge[1]))
    edge = sorted([delTri.vertices[n, 0], delTri.vertices[n, 2]])
    edges.add((edge[0], edge[1]))
    edge = sorted([delTri.vertices[n, 1], delTri.vertices[n, 2]])
    edges.add((edge[0], edge[1]))
pts_neigh = {}
graph = nx.Graph(list(edges))
for i in range(len(xy)):
    pts_neigh[i] = list(graph.neighbors(i))
I can still get the edge list from my networkx graph, but it seems to fail when constructing the nodes.
I will be so grateful for your help.
Although it's possible to instantiate a graph with specific data, the syntax can be a bit complex. An easier option is to explicitly add edges from a list:
graph = nx.Graph()
graph.add_edges_from(list(edges))
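If the error persists after switching to add_edges_from, one possible cause (an assumption, not confirmed in the question) is that some input points never appear in any triangle. SciPy's Delaunay documentation notes that, unless the Qhull option "QJ" is passed, Qhull does not guarantee that every input point becomes a vertex, and omitted points are listed in the `coplanar` attribute; exact duplicate x,y pairs, common in LiDAR data, are a typical case. Such indices never become graph nodes, so `graph.neighbors(i)` raises for them. A minimal sketch that registers every point first and uses `simplices`; the helper name `delaunay_neighbours` is hypothetical.

```python
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay


def delaunay_neighbours(xy):
    """Map every input point index to the sorted indices of its Delaunay neighbours."""
    tri = Delaunay(xy)
    graph = nx.Graph()
    # Register all points first, so points Qhull left out of the
    # triangulation still exist as (isolated) nodes.
    graph.add_nodes_from(range(len(xy)))
    s = tri.simplices  # (nsimplex, 3) array of point indices
    # the three edges of every triangle, built in one vectorised step
    pairs = np.vstack([s[:, [0, 1]], s[:, [0, 2]], s[:, [1, 2]]])
    graph.add_edges_from(pairs.tolist())  # tolist() gives plain Python ints
    return {i: sorted(graph.neighbors(i)) for i in range(len(xy))}


# a triangle with one interior point: every point touches every other
xy = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
print(delaunay_neighbours(xy))
```

Building the edge array with NumPy also avoids a Python-level loop over millions of simplices.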

Running a program until networkx.check_planarity() returns True

I have the following function that generates a graph.
import numpy as np
import networkx as nx
import random

def random_graph(vertices, connectivity):
    # Creates random symmetric graph
    arr = np.random.randint(0, 10, (vertices, vertices))
    sym = (arr + arr.T)
    # removing self loops with fixing diagonal
    np.fill_diagonal(sym, 0)
    # connectivity of graph -> 0 for highest connections, 9 for least connections
    mat = (sym > connectivity).astype(int)
    # convert to dictionary
    G = {k: [i for i, j in enumerate(v) if j == 1] for k, v in enumerate(mat)}
    return G
I call that function with a randomly generated connectivity. Unfortunately, the generated graph is not always planar, so I want to keep calling the function until I get a planar graph. Planarity is checked with the networkx function check_planarity. The function check_planarity returns a boolean: if it's True, the graph is planar; if False, the graph is not planar and I need to generate a new graph until one is planar. This is what I have tried:
while True:
    random_connectivity = random.randint(4, 9)
    G = random_graph(5, random_connectivity)
    g = nx.Graph(G)
    planar = nx.check_planarity(g)
    if planar:
        break
    else:
        pass
As you can see from the loop above, first I generate random_connectivity, then I create the graph with that connectivity, and then I check its planarity with networkx. If it is not planar, the loop runs again. Unfortunately it doesn't work: it often returns a non-planar graph. I also tried unpacking planar and checking the first element of the unpacked list, but that doesn't work either:
planar_check = [x[0] for x in planar]
When trying the line of code above I get the following traceback:
Traceback (most recent call last):
File "c:\Users\besan\OneDrive\Desktop\LAM da correggere\test.py", line 64, in <module>
planar_check = [x[0] for x in planar]
File "c:\Users\besan\OneDrive\Desktop\LAM da correggere\test.py", line 64, in <listcomp>
planar_check = [x[0] for x in planar]
TypeError: 'bool' object is not subscriptable
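Solved by checking the first element of the return value directly instead of iterating over it. check_planarity actually returns a tuple (is_planar, certificate), and a non-empty tuple is always truthy, so testing the whole tuple with "if planar" succeeded even for non-planar graphs: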
planar = nx.check_planarity(g)[0]
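A complete version of the loop, as a minimal sketch: it reuses the question's generator unchanged and unpacks the tuple returned by check_planarity. The wrapper name `random_planar_graph` and the `max_tries` guard are hypothetical additions so the loop cannot run forever.

```python
import random
import numpy as np
import networkx as nx


def random_graph(vertices, connectivity):
    # same generator as in the question
    arr = np.random.randint(0, 10, (vertices, vertices))
    sym = arr + arr.T
    np.fill_diagonal(sym, 0)
    mat = (sym > connectivity).astype(int)
    return {k: [i for i, j in enumerate(v) if j == 1] for k, v in enumerate(mat)}


def random_planar_graph(vertices, max_tries=1000):
    for _ in range(max_tries):
        g = nx.Graph(random_graph(vertices, random.randint(4, 9)))
        # check_planarity returns (bool, PlanarEmbedding or None)
        is_planar, _embedding = nx.check_planarity(g)
        if is_planar:
            return g
    raise RuntimeError("no planar graph found within max_tries attempts")


g = random_planar_graph(5)
print(nx.check_planarity(g)[0])  # True
```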

Python, Generating Random Graphs with Graph-tool

So I'm trying to generate a random directed graph such that each vertex has 3 in-nodes and 1 out-node. But graph-tool seems to get stuck in the deg_sampler() function.
from graph_tool.all import *

def deg_sampler():
    return 1, 2

g = random_graph(1000, deg_sampler, verbose=True)
I get this output after running the code (it hangs, so I interrupted it with Ctrl+C):
adding vertices: 1000 of 1000 (100%)
fixing average degrees. Total degree difference: 1000^C
Traceback (most recent call last):
File "code.py", line 6, in <module>
g = random_graph(1000,deg_sampler,verbose=True)
File "/usr/lib/python2.7/dist-packages/graph_tool/generation/__init__.py", line 384, in random_graph
_get_rng(), verbose, True)
File "/usr/lib/python2.7/dist-packages/graph_tool/generation/__init__.py", line 379, in <lambda>
sampler = lambda i: deg_sampler()
KeyboardInterrupt
The degree sampler function should return the in- and out-degrees of a node. In your implementation, each node has an in-degree of 1 and an out-degree of 2. It is, of course, impossible to construct a graph with this degree sequence, since the average in- and out-degrees must be identical (every edge adds one to some node's in-degree and one to some node's out-degree). The same holds for the 3-in, 1-out combination described in the question. This is why the algorithm gets stuck in the "fixing average degrees" phase.
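The constraint can be checked without graph-tool. This sketch swaps in networkx's `directed_configuration_model`, which rejects in- and out-degree sequences whose sums differ; it illustrates the arithmetic and is not graph-tool's algorithm. In graph-tool, a sampler returning equal in- and out-degrees (for example `return 2, 2`) would satisfy the same condition, but that is an assumption about the intended fix and is not tested here. `degree_sequences_compatible` is a hypothetical helper.

```python
import networkx as nx

N = 1000


def degree_sequences_compatible(in_seq, out_seq):
    # Every directed edge adds 1 to one in-degree and 1 to one out-degree,
    # so the two sums (and hence the averages) must match.
    return sum(in_seq) == sum(out_seq)


print(degree_sequences_compatible([1] * N, [2] * N))  # False (the sampler in the question)
print(degree_sequences_compatible([3] * N, [1] * N))  # False (3 in, 1 out)
print(degree_sequences_compatible([2] * N, [2] * N))  # True

# a MultiDiGraph in which every node has in-degree 2 and out-degree 2
G = nx.directed_configuration_model([2] * N, [2] * N, seed=0)
```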

Fix position of subset of nodes in NetworkX spring graph

Using Networkx in Python, I'm trying to visualise how different movie critics are biased towards certain production companies. To show this in a graph, my idea is to fix the position of each production-company-node to an individual location in a circle, and then use the spring_layout algorithm to position the remaining movie-critic-nodes, such that one can easily see how some critics are drawn more towards certain production companies.
My problem is that I can't seem to fix the initial positions of the production-company nodes. Sure, I can fix their positions, but then they are just random, and I don't want that: I want them in a circle. I can calculate the positions of all nodes and afterwards set the positions of the production-company nodes, but this defeats the purpose of using a spring_layout algorithm, and I end up with something wacky like:
Any ideas on how to do this right?
Currently my code does this:
def get_coordinates_in_circle(n):
    return_list = []
    for i in range(n):
        theta = float(i)/n*2*3.141592654
        x = np.cos(theta)
        y = np.sin(theta)
        return_list.append((x, y))
    return return_list

G_pc = nx.Graph()
G_pc.add_edges_from(edges_2212)
fixed_nodes = []
for n in G_pc.nodes():
    if n in production_companies:
        fixed_nodes.append(n)
pos = nx.spring_layout(G_pc, fixed=fixed_nodes)
circular_positions = get_coordinates_in_circle(len(dps_2211))
i = 0
for p in pos.keys():
    if p in production_companies:
        pos[p] = circular_positions[i]
        i += 1
colors = get_node_colors(G_pc, "gender")
nx.draw_networkx_nodes(G_pc, pos, cmap=plt.get_cmap('jet'), node_color=colors, node_size=50, alpha=0.5)
nx.draw_networkx_edges(G_pc, pos, alpha=0.01)
plt.show()
To create a graph and set a few positions:
import networkx as nx
G=nx.Graph()
G.add_edges_from([(1,2),(2,3),(3,1),(1,4)]) #define G
fixed_positions = {1:(0,0),2:(-1,2)}#dict with two of the positions set
fixed_nodes = fixed_positions.keys()
pos = nx.spring_layout(G,pos=fixed_positions, fixed = fixed_nodes)
nx.draw_networkx(G,pos)
Your problem appears to be that you calculate the positions of all the nodes before you set the positions of the fixed nodes.
Move pos = nx.spring_layout(G_pc,fixed=fixed_nodes) to after you set pos[p] for the fixed nodes, and change it to pos = nx.spring_layout(G_pc,pos=pos,fixed=fixed_nodes)
The dict pos stores the coordinates of each node. You should have a quick look at the documentation. In particular,
pos : dict or None, optional (default=None)
    Initial positions for nodes as a dictionary with node as keys and values as a list or tuple. If None, then use random initial positions.
fixed : list or None, optional (default=None)
    Nodes to keep fixed at initial position.
You're telling it to keep those nodes fixed at their initial positions, but you haven't told it what those initial positions should be. So I would expect it to pick a random initial position and hold it fixed. However, when I test this, I run into an error: if I tell (my version of) networkx to hold nodes 1 and 2 fixed without giving their positions, I get an error (shown at the bottom of this answer). So I'm surprised your code runs.
For some other improvements to the code using list comprehensions:
def get_coordinates_in_circle(n):
    thetas = [2*np.pi*(float(i)/n) for i in range(n)]
    return_list = [(np.cos(theta), np.sin(theta)) for theta in thetas]
    return return_list

G_pc = nx.Graph()
G_pc.add_edges_from(edges_2212)
circular_positions = get_coordinates_in_circle(len(dps_2211))
#it's not clear to me why you don't define circular_positions after
#fixed_nodes with len(fixed_nodes) so that they are guaranteed to
#be evenly spaced.
fixed_nodes = [n for n in G_pc.nodes() if n in production_companies]
pos = {}
for i, p in enumerate(fixed_nodes):
    pos[p] = circular_positions[i]
colors = get_node_colors(G_pc, "gender")
pos = nx.spring_layout(G_pc,pos=pos, fixed=fixed_nodes)
nx.draw_networkx_nodes(G_pc, pos, cmap=plt.get_cmap('jet'), node_color=colors, node_size=50, alpha=0.5)
nx.draw_networkx_edges(G_pc,pos, alpha=0.01)
plt.show()
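A small self-contained version of the fix, with made-up data: the companies "A", "B", "C", the critics "c1" to "c4" and the helper `circle_positions` are hypothetical stand-ins for `production_companies`, `edges_2212` and `get_coordinates_in_circle`. The circle is sized from the fixed nodes themselves, as the comment in the code above suggests.

```python
import numpy as np
import networkx as nx


def circle_positions(nodes, radius=1.0):
    """Evenly space the given nodes on a circle centred at the origin."""
    n = len(nodes)
    return {node: (radius * np.cos(2 * np.pi * i / n), radius * np.sin(2 * np.pi * i / n))
            for i, node in enumerate(nodes)}


G = nx.Graph()
G.add_edges_from([("A", "c1"), ("A", "c2"), ("B", "c2"), ("B", "c3"),
                  ("C", "c3"), ("C", "c4"), ("A", "c4")])
companies = [n for n in G.nodes() if n in {"A", "B", "C"}]

initial = circle_positions(companies)  # positions for the fixed nodes only
pos = nx.spring_layout(G, pos=initial, fixed=companies, seed=42)
# Critics have no entry in `initial`, so they start at random positions and are
# moved by the layout; the companies stay exactly where `initial` put them.
```

Since the layout is not rescaled when `fixed` is given, the companies keep their exact circle coordinates, and the critics settle relative to them.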
Here's the error I see:
import networkx as nx
G=nx.Graph()
G.add_edge(1,2)
pos = nx.spring_layout(G, fixed=[1,2])
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-4-e9586af20cc2> in <module>()
----> 1 pos = nx.spring_layout(G, fixed=[1,2])
.../networkx/drawing/layout.pyc in fruchterman_reingold_layout(G, dim, k, pos, fixed, iterations, weight, scale)
253 # We must adjust k by domain size for layouts that are not near 1x1
254 nnodes,_ = A.shape
--> 255 k=dom_size/np.sqrt(nnodes)
256 pos=_fruchterman_reingold(A,dim,k,pos_arr,fixed,iterations)
257 if fixed is None:
UnboundLocalError: local variable 'dom_size' referenced before assignment
