How to assign graph label for graph in pytorch geometric? - python

Question: How can we assign a graph-level label to a graph created in PyTorch Geometric?
Example: Let us say we create an undirected graph in PyTorch Geometric and now want to label the whole graph according to its class (a numerical value is fine). How can we assign such a class label to the graph so that it can be used for graph classification tasks? Furthermore, how can we collect many labelled graphs to form our dataset?
Code: (to be run in Google Colab)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import torch
from torch.nn import Linear
import torch.nn.functional as F
torch.__version__
# install pytorch geometric
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cpu.html
from torch_geometric.nn import GCNConv
from torch_geometric.utils.convert import to_networkx, from_networkx
# Make the networkx graph
G = nx.Graph()
# Add some cars
G.add_nodes_from([
('Ford', {'y': 0, 'Name': 'Ford'}),
('Lexus', {'y': 1, 'Name': 'Lexus'}),
('Peugot', {'y': 2, 'Name': 'Peugot'}),
('Mitsubushi', {'y': 3, 'Name': 'Mitsubishi'}),
('Mazda', {'y': 4, 'Name': 'Mazda'}),
])
# Relabel the nodes
remapping = {x[0]: i for i, x in enumerate(G.nodes(data = True))}
G = nx.relabel_nodes(G, remapping, copy=True)
# Add some edges --> A = [(0, 1, 0, 1, 1), (1, 0, 1, 1, 0), (0, 1, 0, 0, 1), (1, 1, 0, 0, 0), (1, 0, 1, 0, 0)] as the adjacency matrix
G.add_edges_from([
(0, 1), (0, 3), (0, 4),
(1, 2), (1, 3),
(2, 1), (2, 4),
(3, 0), (3, 1),
(4, 0), (4, 2)
])
# Convert the graph into PyTorch geometric
pyg_graph = from_networkx(G)
Now how could we give this graph the label 0 (e.g. for the class "cars")? And if we did that for lots of graphs, how could we group them together to form a dataset?
Thanks

The pyg_graph object has type torch_geometric.data.Data.
Inspecting the source code of the Data class, you can see that it defines the dunder methods __setattr__ and __setitem__.
Thanks to __setattr__, you can assign the label with the line
pyg_graph.label = 0
or you can instead use __setitem__ doing
pyg_graph["label"] = 0
The two notations perform the same action internally, so they can be used interchangeably.
To create a batch of graphs and labels, you can simply do
from torch_geometric.data import Batch
batch = Batch.from_data_list([pyg_graph, pyg_graph])
>>> batch.label
tensor([0, 0])
PyG takes care of batching all attributes automatically; integer attributes such as label are collected into a single tensor with one entry per graph.

Related

Generate grid of coordinate tuples

Assume a d-dimensional integer grid, containing n^d (n >= 1) points.
I am trying to write a function that takes the number of domain points n and the number of dimensions d and returns a set that contains all the coordinate points in the grid, as tuples.
Example: intGrid (n=2, dim=2) should return the set:
{(0,0), (0,1), (1,0), (1,1)}
Note: I cannot use numpy or any external imports.
Python has a good set of built-in (standard-library) modules that provide most of the basic functionality you will need to get things done, and they ship with Python, so they are not external imports.
One such module is itertools, which contains all sorts of functions related to iteration and combinatorics. The function you need is product, which you can use as below:
from itertools import product
def grid(n, dim):
    return set(product(range(n), repeat=dim))
print(grid(2, 2))
# {(0, 0), (0, 1), (1, 0), (1, 1)}
print(grid(2, 3))
# {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}
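If you are curious what product does here, or want to avoid even the standard library, the same set can be built by extending partial tuples one dimension at a time. This is a minimal sketch; grid_manual is a hypothetical name, not part of the original answer.

```python
def grid_manual(n, dim):
    """All dim-tuples with coordinates in range(n), built without any imports."""
    points = [()]  # start with the single empty tuple (a 0-dimensional grid)
    for _ in range(dim):
        # extend every partial tuple by each possible value of the next coordinate
        points = [p + (i,) for p in points for i in range(n)]
    return set(points)

print(sorted(grid_manual(2, 2)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each pass multiplies the number of points by n, so the result has n**dim tuples, matching the n^d in the question.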

Visualize distance matrix as a graph

For the following distance matrix:
∞, 1, 2
∞, ∞, 1
∞, ∞, ∞
I need to visualise the following graph:
[Image: how the graph should look]
I tried the following code:
import networkx as nx
import numpy as np
import string
dt = [('len', float)]
A = np.array([ (0, 1, None, 3, None),
(2, 0, 4, 1, None),
(5, None, 0, 3, None),
(None, None, None, 0, None),
(None, None, None, 2, 0),
])*10
A = A.view(dt)
G = nx.from_numpy_matrix(A)
G = nx.drawing.nx_agraph.to_agraph(G)
G.node_attr.update(color="red", style="filled")
G.edge_attr.update(color="blue", width="2.0")
G.draw('out.png', format='png', prog='neato')
but I cannot find a way to input infinity (∞) to show that there is no connection. I tried None, -1, and even ∞, but nothing works, so if anyone has an idea how I can visualise this distance matrix, please let me know.
It's not immediately obvious if this is what you are after, but one option is to use np.inf to denote infinity. The snippet below replaces np.inf with 0; because NetworkX treats zero entries of the matrix as missing edges, those edges are dropped. Whether this makes sense will depend on the context:
import networkx as nx
import numpy as np
A = np.array(
[
(0, 1, np.inf),
(2, 0, 4),
(5, np.inf, 0),
],
dtype="float",
)
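# replace np.inf with 0: zero entries are not turned into edges
A[A == np.inf] = 0
# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
G = nx.from_numpy_array(A, create_using=nx.DiGraph)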
G = nx.drawing.nx_agraph.to_agraph(G)
G.node_attr.update(color="red", style="filled")
G.edge_attr.update(color="blue", width="0.3")
G.draw("out.png", format="png", prog="neato")
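Alternatively, the distance matrix from the question can be turned into a directed graph directly, adding an edge only where the distance is finite. This sketch uses math.inf for ∞ and stores each distance in a len edge attribute, on the assumption that you want Graphviz's neato to use it as the preferred edge length (as the question's dtype hints); rename it to weight if you only need NetworkX.

```python
import math
import networkx as nx

INF = math.inf
# Distance matrix from the question: D[i][j] is the distance from i to j, INF means "no edge"
D = [
    [INF, 1, 2],
    [INF, INF, 1],
    [INF, INF, INF],
]

G = nx.DiGraph()
G.add_nodes_from(range(len(D)))  # keep every node, even one with no outgoing edges
G.add_weighted_edges_from(
    ((i, j, d) for i, row in enumerate(D) for j, d in enumerate(row) if not math.isinf(d)),
    weight="len",  # edge attribute name that neato reads as preferred length
)

print(sorted(G.edges(data="len")))
# [(0, 1, 1), (0, 2, 2), (1, 2, 1)]
```

The resulting G can then be passed to nx.drawing.nx_agraph.to_agraph and drawn as in the snippet above (this requires pygraphviz); to_agraph copies edge attributes such as len.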

How to retain node ordering when converting graph from networkx to pytorch geometric?

Question: How to retain the node ordering/labels when converting a graph from networkx to pytorch geometric?
Code: (to be run in Google Colab)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import torch
from torch.nn import Linear
import torch.nn.functional as F
torch.__version__
# install pytorch geometric
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cpu.html
from torch_geometric.nn import GCNConv
from torch_geometric.utils.convert import to_networkx, from_networkx
# Make the networkx graph
G = nx.Graph()
# Add some cars
G.add_nodes_from([
('Ford', {'y': 0, 'Name': 'Ford'}),
('Lexus', {'y': 1, 'Name': 'Lexus'}),
('Peugot', {'y': 2, 'Name': 'Peugot'}),
('Mitsubushi', {'y': 3, 'Name': 'Mitsubishi'}),
('Mazda', {'y': 4, 'Name': 'Mazda'}),
])
# Relabel the nodes
remapping = {x[0]: i for i, x in enumerate(G.nodes(data = True))}
G = nx.relabel_nodes(G, remapping, copy=False)
# Add some edges --> A = [(0, 1, 0, 1, 1), (1, 0, 1, 1, 0), (0, 1, 0, 0, 1), (1, 1, 0, 0, 0), (1, 0, 1, 0, 0)] as the adjacency matrix
G.add_edges_from([
(0, 1), (0, 3), (0, 4),
(1, 2), (1, 3),
(2, 1), (2, 4),
(3, 0), (3, 1),
(4, 0), (4, 2)
])
# Convert the graph into PyTorch geometric
pyg_graph = from_networkx(G)
pyg_graph.edge_index
When I print the edge indices in the last line of the code, I get a different answer each time I run it. Most importantly, I want to consistently get the correct answer, in which each node's numbering is retained from networkx. A typical (incorrect) output looks like this:
tensor([[0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
[4, 2, 4, 2, 3, 0, 1, 1, 4, 0, 1, 3]])
The form of this edge index tensor is:
the first list contains the node ids of the source node
the second list contains the node ids of the target node
For the node ids to be retained, we would expect node 0 to appear three times in the first (source) list instead of just twice.
Is there any way for me to force PyTorch Geometric to copy over the node ids?
Thanks
[EDIT] One possible workaround is the following code, which produces edge index and edge weight tensors for PyTorch Geometric:
# Create a dictionary of the mappings from company --> node id
mapping_dict = {x: i for i, x in enumerate(list(G.nodes()))}
# Get the number of nodes
num_nodes = len(mapping_dict)
# Now create a source, target, and edge list for PyTorch geometric graph
edge_source_list = []
edge_target_list = []
edge_weight_list = []
# iterate through all the edges
for e in G.edges():
    # first element of tuple is appended to source edge list
    edge_source_list.append(mapping_dict[e[0]])
    # last element of tuple is appended to target edge list
    edge_target_list.append(mapping_dict[e[1]])
    # add the edge weight to the edge weight list
    edge_weight_list.append(1)
# now create full edge lists for pytorch geometric - undirected edges need to be defined in both directions
full_source_list = edge_source_list + edge_target_list # full source list
full_target_list = edge_target_list + edge_source_list # full target list
full_weight_list = edge_weight_list + edge_weight_list # full edge weight list
print(len(edge_source_list), len(edge_target_list), len(full_source_list))
# now convert these to torch tensors
edge_index_tensor = torch.LongTensor( np.concatenate([ [np.array(full_source_list)], [np.array(full_target_list)]] ))
edge_weight_tensor = torch.FloatTensor(np.array(full_weight_list))
It seems this issue was resolved in the comments (the solution proposed by @Sparky05 is to use copy=True, which is the default for nx.relabel_nodes), but below is the explanation for why the node order changes.
When copy=False is passed, nx.relabel_nodes re-adds the nodes to the graph in the iteration order of a set built from the keys of the mapping dict. The keys here are strings, and Python randomizes string hashes per process by default, so that order can differ from run to run. The relevant lines in the code are:
def _relabel_inplace(G, mapping):
    old_labels = set(mapping.keys())
    new_labels = set(mapping.values())
    if len(old_labels & new_labels) > 0:
        ...  # code for overlapping label sets skipped
    else:
        # non-overlapping label sets
        nodes = old_labels
    # ... lines skipped ...
    for old in nodes:  # this is now in the set order
        ...
Using a set changes the order of the nodes, so to preserve the order, the non-overlapping case should be handled as:
else:
    # non-overlapping label sets
    nodes = mapping.keys()
A related pull request has been submitted to NetworkX.
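A minimal NetworkX-only check of the copy=True behaviour on the question's graph: the relabelled nodes keep their original insertion order, and listing each undirected edge in both directions (the layout PyG's edge_index uses for undirected graphs) makes node 0 appear three times as a source. This does not call from_networkx itself; the source/target lists are only an illustration.

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["Ford", "Lexus", "Peugot", "Mitsubushi", "Mazda"])

# Map each name to its position in the current node order
remapping = {name: i for i, name in enumerate(G.nodes())}
# copy=True (the default) builds a new graph, adding nodes in the original order
G = nx.relabel_nodes(G, remapping, copy=True)
# Same undirected edges as the question (its duplicates such as (2, 1) collapse in a Graph)
G.add_edges_from([(0, 1), (0, 3), (0, 4), (1, 2), (1, 3), (2, 4)])

# Each undirected edge appears once per direction in the directed view
directed_edges = list(G.to_directed().edges())
source = [u for u, v in directed_edges]
target = [v for u, v in directed_edges]

print(list(G.nodes()))  # [0, 1, 2, 3, 4]
print(source.count(0))  # 3
```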

How can I randomly permute the nodes of a graph with python in networkx?

I want to permute the nodes of a graph while keeping the network structure intact. I think this can be done with relabel_nodes, but how can I create a mapping that permutes the nodes? Currently I am rebuilding the graph with a shuffled set of nodes, which doesn't seem the most efficient approach:
import networkx as nx
import random
n=10
nodes=[]
for i in range(0,n):
    nodes.append(i)
G=nx.gnp_random_graph(n,.5)
newG=nx.empty_graph(n)
shufflenodes=nodes
random.shuffle(shufflenodes)
for i in range(0,n-1):
    for j in range(i+1,n):
        if(G.has_edge(i,j)):
            newG.add_edge(shufflenodes[i],shufflenodes[j])
Anyone have any ideas how to speed this up?
You can build a random mapping and pass it to relabel_nodes.
Code:
import random
import networkx as nx
# create a random mapping old label -> new label
node_mapping = dict(zip(G.nodes(), sorted(G.nodes(), key=lambda k: random.random())))
# build a new graph
G_new = nx.relabel_nodes(G, node_mapping)
Example:
>>> G.nodes()
NodeView((0, 1, 2, 3, 4))
>>> G.edges()
EdgeView([(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)])
>>> node_mapping
{0: 2, 1: 0, 2: 3, 3: 4, 4: 1}
>>> G_new.nodes()
NodeView((2, 0, 3, 4, 1))
>>> G_new.edges()
EdgeView([(2, 0), (2, 3), (2, 4), (0, 3), (4, 1)])
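An equivalent way to build the permutation is to shuffle a copy of the node list with random.shuffle and zip it with the original order. The sketch below does that on a seeded random graph (the seed and size are arbitrary choices) and checks that every edge is carried over, i.e. that the structure is intact.

```python
import random
import networkx as nx

G = nx.gnp_random_graph(10, 0.5, seed=1)

nodes = list(G.nodes())
shuffled = nodes[:]        # shuffle a copy so `nodes` keeps the original order
random.shuffle(shuffled)
node_mapping = dict(zip(nodes, shuffled))  # old label -> new label

G_new = nx.relabel_nodes(G, node_mapping)  # copy=True by default, G is unchanged

# Every edge (u, v) of G must appear as (mapping[u], mapping[v]) in G_new
assert all(G_new.has_edge(node_mapping[u], node_mapping[v]) for u, v in G.edges())
print(nx.is_isomorphic(G, G_new))  # True
```

Either way, relabel_nodes does the edge copying for you, so there is no need for the double loop over node pairs.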

Graph isomorphism with constraints on the edges using networkx

I would like to define my own notion of isomorphism for two graphs. I want to check whether two graphs are isomorphic given that each edge has an attribute, essentially the order in which that edge was placed. I wonder if one can use the method:
networkx.is_isomorphic(G1,G2, edge_match=some_callable)
by defining a suitable function some_callable().
For example, the following graphs are isomorphic, because you can relabel the nodes to obtain one from another.
Namely, relabel [2<->3].
But, the following graphs are not isomorphic.
There is no way to obtain one from another by re-labeling the nodes.
Here you go: this is exactly what the edge_match option is for. I'll create three graphs. The first two are isomorphic (even though the weights have different attribute names; I've written the comparison function to account for that). The third is not isomorphic.
import networkx as nx
G1 = nx.Graph()
G1.add_weighted_edges_from([(0,1,0), (0,2,1), (0,3,2)], weight = 'aardvark')
G2 = nx.Graph()
G2.add_weighted_edges_from([(0,1,0), (0,2,2), (0,3,1)], weight = 'baboon')
G3 = nx.Graph()
G3.add_weighted_edges_from([(0,1,0), (0,2,2), (0,3,2)], weight = 'baboon')
def comparison(D1, D2):
    # for an edge u,v in the first graph and x,y in the second graph,
    # this tests whether the attribute 'aardvark' of edge u,v is the
    # same as the attribute 'baboon' of edge x,y
    return D1['aardvark'] == D2['baboon']
nx.is_isomorphic(G1, G2, edge_match = comparison)
# True
nx.is_isomorphic(G1, G3, edge_match = comparison)
# False
Here is an answer to the specific problem in the question, using the very same graphs. Note that I'm using networkx.MultiGraph and treating the 'ordering' attribute as the order in which the edges are placed.
import networkx as nx
G1,G2,G3,G4=nx.MultiGraph(),nx.MultiGraph(),nx.MultiGraph(),nx.MultiGraph()
G1.add_weighted_edges_from([(0, 1, 0), (0, 2, 1), (0, 3, 2)], weight='ordering')
G2.add_weighted_edges_from([(0, 1, 0), (0, 3, 1), (0, 2, 2)], weight='ordering')
G3.add_weighted_edges_from([(0, 1, 0), (0, 1, 1), (2, 3, 2)], weight='ordering')
G4.add_weighted_edges_from([(0, 1, 0), (2, 3, 1), (0, 1, 2)], weight='ordering')
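def comparison(D1, D2):
    # For a MultiGraph, D1 and D2 map edge keys to attribute dicts, one per parallel edge,
    # so compare the 'ordering' values of all parallel edges, not only the one with key 0
    return sorted(d['ordering'] for d in D1.values()) == sorted(d['ordering'] for d in D2.values())
nx.is_isomorphic(G1,G2, edge_match=comparison)
# True
nx.is_isomorphic(G3,G4, edge_match=comparison)
# False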
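NetworkX also ships ready-made edge matchers in networkx.algorithms.isomorphism, so you may not need to write the comparison yourself. A sketch using categorical_multiedge_match, which compares the 'ordering' values across all parallel edges between a node pair (None is the default used when an edge lacks the attribute); for simple Graphs, categorical_edge_match is the counterpart.

```python
import networkx as nx
from networkx.algorithms import isomorphism as iso

G1, G2, G3, G4 = nx.MultiGraph(), nx.MultiGraph(), nx.MultiGraph(), nx.MultiGraph()
G1.add_weighted_edges_from([(0, 1, 0), (0, 2, 1), (0, 3, 2)], weight='ordering')
G2.add_weighted_edges_from([(0, 1, 0), (0, 3, 1), (0, 2, 2)], weight='ordering')
G3.add_weighted_edges_from([(0, 1, 0), (0, 1, 1), (2, 3, 2)], weight='ordering')
G4.add_weighted_edges_from([(0, 1, 0), (2, 3, 1), (0, 1, 2)], weight='ordering')

# Built-in edge matcher for multigraphs on the 'ordering' attribute
em = iso.categorical_multiedge_match('ordering', None)

print(nx.is_isomorphic(G1, G2, edge_match=em))  # True
print(nx.is_isomorphic(G3, G4, edge_match=em))  # False
```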
