When I generate random geometric graphs using NetworkX in Python, the resulting graphs are not always connected. To change this, I would like to determine the separate subgraphs and find the node in each respective subgraph that is closest to a node in the largest subgraph, so that I can connect the two.
Preferably I would like to be able to also determine the second closest and the third and so on.
Is there a utility in NetworkX that does this? If not, do you know the most efficient way, mathematically, to solve it? (Maybe randomly select two points in each graph and then run a k-d tree algorithm. The remaining problem is that, at least for the smaller subgraph, I would need to run the algorithm for all nodes?)
Would be great if you could advise me whether there is something existing in networkx that gets the job done or tell me the most efficient way to implement such a routine.
I would like to determine the separate subgraphs
This is the problem of finding the connected components of a graph, that is, the maximal sets of vertices that are all reachable from each other but not reachable from any vertex outside the set. NetworkX provides this directly as nx.connected_components(G).
Here is the pseudo code for the algorithm
LOOP
    CONSTRUCT empty current set
    SELECT V arbitrary vertex
    add V to current set
    remove V from graph
    LOOP // while set is growing
        added_to_set = false
        LOOP V over vertices in graph
            LOOP Vset over current set
                IF Vset connected to V
                    add V to current set
                    remove V from graph
                    added_to_set = true
                    break;
        IF added_to_set == false
            break; // the set is maximal
    ADD current set to list of sets
    IF graph has no remaining vertices
        OUTPUT sets found
        STOP
For a C++ implementation of this see code at https://github.com/JamesBremner/PathFinder2/blob/dbd6ff06edabd6a6d35d5eb10ed7972dc2d779a6/src/cPathFinder.cpp#L483
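For readers working in Python, here is a minimal sketch of the same idea. It is not a translation of the linked C++ code: it uses a breadth-first search instead of the nested loops, and the adjacency-dictionary input format and the function name are assumptions made for illustration.

```python
from collections import deque

def connected_components(adjacency):
    """Return the connected components of an undirected graph, largest first.

    `adjacency` maps each node to an iterable of its neighbours
    (a hypothetical input format chosen for this sketch).
    """
    unvisited = set(adjacency)
    components = []
    while unvisited:
        start = unvisited.pop()          # arbitrary vertex not yet assigned
        component = {start}
        queue = deque([start])
        while queue:                     # grow the set until nothing new is reachable
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)

graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
components = connected_components(graph)
```

Each vertex and edge is visited once, so the running time grows linearly with the size of the graph.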
find the node in each respective subgraph that is closest to a node in
the largest subgraph
Probably the best and certainly simplest is to calculate the distance between every pair of nodes in the subgraphs, keeping the closest pair.
If you were satisfied with an approximate answer and if the subgraphs do not overlap then you could
calculate center of gravity of largest subgraph
compare distances of subgraph nodes to center of gravity ( instead of with every node in largest subgraph )
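The exact, all-pairs version can be sketched in a few lines of Python. The function name and the `positions` dictionary (node to (x, y) coordinates) are assumptions for illustration. Because the candidates are sorted, the same call also gives the second closest pair, the third, and so on.

```python
import math

def closest_pairs(positions, component, largest, k=1):
    """Return the k closest (distance, node, node_in_largest) triples, nearest first.

    `positions` maps node -> (x, y); `component` and `largest` are sets of nodes.
    """
    candidates = [
        (math.dist(positions[u], positions[v]), u, v)
        for u in component
        for v in largest
    ]
    return sorted(candidates)[:k]

positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
             3: (2.0, 3.0), 4: (6.0, 0.0)}
largest = {0, 1, 2}
small = {3, 4}
ranked = closest_pairs(positions, small, largest, k=2)
```

This costs one distance computation per pair of nodes, which is fine when the small components have few nodes. For large components, building a k-d tree over the largest component (for example with scipy.spatial.cKDTree) and querying it once per node of the small component would probably scale better.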
Related
I have a list of cities (nodes) plotted in a 2D plane each given by an X,Y coordinate.
I now want to add roads (edges) to it, but the roads cannot intersect. I want to create the most number of roads possible. By count, not by total length.
In more general graph theory parlance, I think I want the maximum number of edges (or regions?? maybe it's the same thing), where edges do not intersect in 2-dimensions, for a given set of Nodes at X,Y points.
From a brief look at NetworkX, it seems that it generates graphs by making "nodes", but nodes can be "anywhere" and you cannot force nodes to be at a certain location with respect to each other (they have abstracted too far!).
Edit: the question "networkx add_node with specific position" suggests that you can plot them in a given location. @Stef thanks!!
Am I thinking about the problem correctly?
Can I visualize using some python package my Nodes/edges, where this package can automatically calculate the proper edges given a set of nodes?
Is automatically finding the maximum number of non-intersecting edges a thing (and what is this called so I can find out more about it?)
Very possibly similar to this question, but this question wasn't really answered and from 8 years ago (Algorithm for finding minimal cycle basis of planar graph)
I am looking to generate a random planar graph in python with around 20 vertices. I checked out this planar graph generator but two problems emerged:
The algorithm in the aforementioned GitHub project seems like overkill for generating a random planar graph that doesn't have that many edges
Because it's meant to generate massive graphs, that algorithm is very complex, and therefore also a bit clunky and difficult to use
With that said, is there a simpler way to randomly generate a relatively small planar graph in python?
Create required number of nodes
Assign random x,y locations to the nodes.
WHILE nodes with no connected edges
    Select N a random node with no edge
    LOOP select M a different node at random
        IF edge N-M does NOT intersect previous edges
            Add N-M edge to graph
            BREAK out of LOOP
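The pseudocode above could be sketched in Python roughly as follows. All names are hypothetical. The crossing test assumes the points are in general position (no three collinear), which holds with probability one for random floating-point coordinates, and the candidate partners are tried in shuffled order rather than drawn with replacement so that the inner loop always ends.

```python
import random

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)

def segments_cross(a, b, c, d):
    """True if segment a-b crosses segment c-d; edges that share an endpoint do not count."""
    if len({a, b, c, d}) < 4:
        return False
    return (orientation(a, b, c) != orientation(a, b, d)
            and orientation(c, d, a) != orientation(c, d, b))

def random_planar_graph(n, seed=None):
    """Place n nodes at random and give every node at least one non-crossing edge."""
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n)}
    edges = []
    order = list(pos)
    rng.shuffle(order)
    for node in order:
        if any(node in edge for edge in edges):
            continue  # this node already received an edge earlier
        others = [m for m in pos if m != node]
        rng.shuffle(others)
        for other in others:
            if not any(segments_cross(pos[node], pos[other], pos[u], pos[v])
                       for u, v in edges):
                edges.append((node, other))
                break
    return pos, edges

positions, edges = random_planar_graph(20, seed=1)
```

The result is planar by construction, since an edge is only added when it crosses none of the edges already present. It is usually sparse and not necessarily connected; running further rounds over random node pairs would add more edges.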
[Image: traffic map]
Traffic map contains straight segments of two types. The ones with arrows can only go one way in the direction of the arrow and those without arrows can go in two directions. Calculate the number of ways to go from A to B without any repeated lines?
How can this math problem be solved? I don't know where to start.
It's a graph theory problem. Treat every junction as a node and every segment as an edge of the graph. In general the graph is directed; a segment that can be travelled in both directions is simply two edges, one in each direction, between the same two nodes.
The algorithm you need to implement is DFS (depth-first search).
The idea is as follows:
Start the DFS traversal from source.
Keep storing the visited vertices in an array or HashMap say ‘path[]’.
If the destination vertex is reached, print contents of path[].
The important thing is to mark the vertices currently in path[] as visited, so that the traversal doesn't go around a cycle.
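A minimal Python sketch of these steps is shown below. The road map in it is made up for illustration and is not the one from the image. Note that, as described above, it forbids revisiting a junction; if only repeated segments are forbidden, the set of visited vertices would have to be replaced by a set of used edges.

```python
def all_simple_paths(graph, source, target):
    """Yield every path from source to target that visits no vertex twice."""
    path = [source]
    on_path = {source}

    def dfs(node):
        if node == target:
            yield list(path)
            return
        for neighbour in graph.get(node, ()):
            if neighbour not in on_path:
                path.append(neighbour)
                on_path.add(neighbour)
                yield from dfs(neighbour)
                # backtrack so the vertex can be used by other paths
                on_path.remove(neighbour)
                path.pop()

    yield from dfs(source)

# Hypothetical map: A-C is two-way, the other segments are one-way.
roads = {"A": ["C", "D"], "C": ["A", "D", "B"], "D": ["B"], "B": []}
paths = list(all_simple_paths(roads, "A", "B"))
number_of_ways = len(paths)
```

Counting the yielded paths gives the number of ways; printing them instead reproduces the "print contents of path[]" step.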
I'm trying to write a script which will let me mangle (edit, cut, change) some big network files from a command line interface. One of the things I'm trying to do is isolate a subnetwork from a larger network based on searching for matches in node labels.
So basically I'd have a networkx graph with maybe 7000 nodes and corresponding edges with various labels. Then I'd match a string, eg "Smith" to the nodes. I'd get a match of maybe 30 nodes (label:"John Smith", label:"Peter Smith", etc). I'd then like to make a new networkx network containing those 30 nodes, and the edges they have, and the nodes those edges connect to, up to a depth of n, or optionally until all the nodes and edges are found.
My current code is rubbish, so maybe I'll try to write some pseudocode:
for node in networkx_network:
    if searched_string in node:
        new_network.add(node.subnetwork(depth=n))
I've spent days googling for a solution, and maybe subgraph, or neighbors, or connected_components is the right thing to do, but I can't wrap my head around how to do it.
single_source_shortest_path has an optional cutoff argument. With it, you can tell networkx to find paths only to nodes within a certain distance of a given node. It's a bit of overkill because there's a lot of other information in those paths you don't need. If you then just take the keys of the resulting dict of paths, you have all nodes reachable within that distance, and G.subgraph gives you the graph containing all of these nodes and the edges between them.
By looking at the source code for this, and removing the effort taken to track the actual paths, you can make it much more efficient if needed. But as it stands, the following works:
import networkx as nx
import matplotlib.pyplot as plt

G = nx.fast_gnp_random_graph(100000, 0.00002)  # sample graph
base = range(3)  # arbitrarily choose to start from nodes 0, 1, and 2
depth = 3  # look for nodes within distance 3

# The keys of each returned dict are the nodes reachable within `depth` steps.
foundset = {key for source in base
            for key in nx.single_source_shortest_path(G, source, cutoff=depth)}
H = G.subgraph(foundset)
nx.draw_networkx(H)
plt.savefig('tmp.png')
Try snowball sampling?
Start with the set of nodes you found whose labels contain your keyword.
Look for all their neighbors and add them to the set.
Look for all the neighbors' neighbors and add the new ones to the set.
Repeat this process n times.
At the end you will have a set of all the nodes you want; then use the subgraph function to get a subgraph of all the nodes in your final set.
This may not be the most efficient solution, but it should work.
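The snowball steps can be sketched in plain Python as follows. The adjacency dictionary, the labels, and the function name are made up for illustration; with a networkx graph the resulting set would be passed to G.subgraph.

```python
def snowball(adjacency, seeds, depth):
    """Return the seeds plus every node within `depth` steps of a seed."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        # neighbours of the current frontier that have not been seen yet
        frontier = {nb for node in frontier for nb in adjacency[node]} - found
        if not frontier:
            break  # nothing new: the whole reachable part has been found
        found |= frontier
    return found

adjacency = {
    "John Smith": ["Ann Lee"],
    "Ann Lee": ["John Smith", "Bob Ray"],
    "Bob Ray": ["Ann Lee", "Cy Dow"],
    "Cy Dow": ["Bob Ray"],
    "Peter Smith": [],
}
seeds = [name for name in adjacency if "Smith" in name]
nodes = snowball(adjacency, seeds, depth=2)
```

Passing a depth at least as large as the number of nodes covers the "until all the nodes and edges are found" case, because the loop stops as soon as the frontier is empty.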
I'm trying to draw a networkx graph with weighted edges, but right now I'm having some difficulty.
As the title suggests, this graph is really huge:
Number of Nodes: 103362
Number of Edges: 1419671
And when I try to draw this graph with the following code:
pos = nx.spring_layout(G)
nx.draw(G, node_color='#A0CBE2',edge_color='#BB0000',width=2,edge_cmap=plt.cm.Blues,with_labels=False)
plt.savefig("edge_colormap.png") # save as png
plt.show() # display
(This is just me testing functionality, not my desired end result). I get the error:
ValueError: array is too big.
It's triggered from the spring_layout algorithm. Any idea what's causing this? Even when I use a different pos algorithm I get the same error, how can I avoid it (if I can)?
On another note, I want to colour the edges based on their weight. As you can see there are a lot of edges and probably a wide range of weights, what is the best way to do this?
Thanks for your patience.
EDIT FROM MY COMMENT:
I'm trying to investigate the density of the data I have. Basically I am looking at 50,000 matches each containing 10 players, and whenever two players meet in a game I +1 to the weight of the edge between them. The idea is that my end result will show me the strength of my data set. In my mind I want the heaviest edges at the centre and as we move out from the centre the data is less densely connected.
The problem lies in the spring_layout approach. With this many nodes it will take a while to calculate where all of them should go with respect to one another. I would suggest either figuring out the x,y positions yourself or plotting much smaller subgraphs (fewer than 5000 nodes, or your computer might be a bit sluggish for a while).
Here is what 1000 nodes from an erdos_renyi_graph (randomly chosen edges) looks like.
It pulled off 2 nodes to highlight.
Next is what 1500 looks like
It got a little more detail. Now with 7-8 interesting nodes.
There isn't much to be gained from putting so many edges and so many nodes on one graph. And if you don't like the output, you would need to run the whole thing again.
To get the x,y positions of each node, take a look at this question: "in NetworkX show graph with nodes at exact (x,y) position. Result is rotated".
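One way to get a "much smaller subgraph" that still reflects the weights is to keep only the heaviest edges and scale their weights to the range 0 to 1 for colouring. The sketch below is plain Python with hypothetical names and made-up data; the scaled values could then be passed as edge_color together with edge_cmap to nx.draw_networkx_edges.

```python
import heapq

def heaviest_edges(weighted_edges, k):
    """Return the k edges with the largest weight as (u, v, weight) triples."""
    return heapq.nlargest(k, weighted_edges, key=lambda edge: edge[2])

def normalise(weights):
    """Scale weights linearly to the range [0, 1] for use with a colormap."""
    lowest, highest = min(weights), max(weights)
    span = (highest - lowest) or 1  # avoid dividing by zero when all weights are equal
    return [(w - lowest) / span for w in weights]

edges = [("a", "b", 5), ("b", "c", 1), ("a", "c", 3), ("c", "d", 9)]
top = heaviest_edges(edges, 2)
shades = normalise([weight for _, _, weight in top])
```

Whether a linear scale is appropriate depends on the data; with a very wide range of weights, a logarithmic scale may separate the colours better.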