Tracking cycles while adding random edges to a sparse graph - python

Scenario: I have a graph, represented as a collection of nodes (0...n). There are no edges in this graph.
To this graph, I connect nodes at random, one at a time. An alternative way of saying this would be that I add random edges to the graph, one at a time.
I do not want to create simple cycles in this graph.
Is there a simple and/or very efficient way to track the creation of cycles as I add random edges? With a graph traversal it is easy, since we only need to track the two end nodes of a single path. But in this situation we have any number of paths that we need to track, and sometimes these paths combine into a larger path, and we need to track that too.
I have tried several approaches, which mostly come down to maintaining a list of "outer nodes" and a set of the nodes internal to them, and updating them whenever I add an edge that passes through them. But it becomes extremely convoluted, especially if I ever remove an edge from the graph.
I have tried to find algorithms or discussions of this, but I can't really find anything. I know I can do a BFS to check for cycles, but it is horribly inefficient to run a BFS after every single edge addition.

Possible solution I came up with while in the shower.
What I will do is maintain a list of size n, representing how many times that node has been on an edge.
When I add an edge (i,j), I will increment list[i] and list[j].
If, after an edge addition, list[i] > 1 and list[j] > 1, I will do a DFS starting from that edge.
I realized I don't need to BFS the whole graph; I only need to DFS from the last added edge, and I only need to do it if that edge at least has the potential to be in a cycle (both its nodes show up more than once).
I doubt it is optimal... maybe some kind of list of disjoint sets would be better. But this is way better than anything I was thinking of before.
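A minimal sketch of that idea, assuming the growing graph is kept in an adjacency list; the names degree, adj, creates_cycle and try_add_edge are mine, not from the post:
degree = [0] * n                 # how many times each node has been on an edge
adj = [[] for _ in range(n)]     # adjacency list of the growing graph

def creates_cycle(i, j):
    # DFS from i looking for j along the edges that already exist;
    # if j is reachable, adding (i, j) would close a cycle.
    stack, seen = [i], {i}
    while stack:
        u = stack.pop()
        if u == j:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def try_add_edge(i, j):
    # Only search when both endpoints already touch at least one edge.
    if degree[i] > 0 and degree[j] > 0 and creates_cycle(i, j):
        return False
    adj[i].append(j)
    adj[j].append(i)
    degree[i] += 1
    degree[j] += 1
    return True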

If you keep track of the connected components of your graph, you can test for every edge you insert whether the involved nodes are already in the same component. If they are, then the edge you are inserting will introduce a cycle to your graph.
Have a look at this post that seems to give some good references on how to do this: https://cstheory.stackexchange.com/questions/2548/is-there-an-online-algorithm-to-keep-track-of-components-in-a-changing-undirecte
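A minimal sketch of that connected-components test with a union-find (disjoint-set) structure; the class and names below are illustrative, not taken from the linked post:
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: walk toward the root, compressing as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # a and b were already connected
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

ds = DisjointSet(n)
# ds.union(i, j) returns False exactly when edge (i, j) would create a cycle,
# so only add the edge when it returns True.
Each check is nearly constant amortized time. Note that union-find on its own does not handle edge removals; the linked post covers fully dynamic connectivity for that case.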

Related

Programming Combination Math (Using Python)

[Traffic map image]
The traffic map contains straight segments of two types: those with arrows can only be traversed one way, in the direction of the arrow, and those without arrows can be traversed in both directions. Calculate the number of ways to go from A to B without repeating any segment.
How do I solve this math problem? I don't know what to do right now!
It's a graph theory problem. In your problem, try to consider every junction as a node and the segments as the edges of the graph. In general your graph is a directed graph, where a segment that can be traversed in both directions is simply two opposite edges between the same two nodes.
The algorithm you need to implement is DFS (depth-first search).
The idea is as follows:
Start the DFS traversal from the source.
Keep storing the visited vertices in an array or hash map, say path[].
If the destination vertex is reached, print the contents of path[].
The important thing is to also mark the vertices currently in path[] as visited, so that the traversal doesn't go around a cycle.
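A sketch of these steps as a backtracking DFS that counts the simple paths; the adjacency-list representation and the name count_paths are mine:
def count_paths(adj, source, dest, visited=None):
    # Backtracking DFS: count paths from source to dest with no repeated vertex.
    if visited is None:
        visited = set()
    if source == dest:
        return 1
    visited.add(source)
    total = 0
    for nxt in adj.get(source, []):
        if nxt not in visited:
            total += count_paths(adj, nxt, dest, visited)
    visited.remove(source)
    return total

# A two-way segment becomes a pair of opposite directed edges, e.g.:
# adj = {'A': ['C'], 'C': ['A', 'B'], 'B': ['C']}
# count_paths(adj, 'A', 'B')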

octree vs graph data structure

I need to implement in Python a data structure in which each node represents a rectangle in the plane.
The operations that I need from the data structure are:
1) Split a node, which splits its rectangle into 4 rectangles of the same size (in the end I expect to get something like going from A to B in this example).
2) Get all neighbor rectangles (for some computation).
Up to now I have thought of two options, neither of them optimal. The first is some kind of octree/quadtree, which makes the splitting very easy, but I'm not sure how to find all the neighbor rectangles. The second is a graph, which makes finding the neighbors very easy but makes it difficult to split a node.
I haven't managed to come up with an elegant solution that does both, and I would appreciate suggestions, even better if they are already implemented in a Python library.

How to isolate a subnetwork from a graph, including nodes and edges up to depth n?

I'm trying to write a script which will let me mangle (edit, cut, change) some big network files from a command line interface. One of the things I'm trying to do is isolate a subnetwork from a larger network based on searching for matches in node labels.
So basically I'd have a networkx graph with maybe 7000 nodes and corresponding edges with various labels. Then I'd match a string, eg "Smith" to the nodes. I'd get a match of maybe 30 nodes (label:"John Smith", label:"Peter Smith", etc). I'd then like to make a new networkx network containing those 30 nodes, and the edges they have, and the nodes those edges connect to, up to a depth of n, or optionally until all the nodes and edges are found.
My current code is rubbish, so maybe I'll try to write some pseudocode:
for node in networkx_network:
    if searched_string in node:
        new_network.add(node.subnetwork(depth=n))
I've spent days googling for a solution, and maybe subgraph, or neighbors, or connected_components is the right thing to do, but I can't wrap my head around how to do it.
single_source_shortest_path has an optional cutoff argument. By including it you can tell networkx to find paths to all nodes within a certain distance of a given node. It's a bit of overkill, because those paths carry a lot of other information you don't need. But if you then just take the keys of the resulting dict of paths, you have all nodes reachable within that distance, and networkx can give you the subgraph containing all of these nodes and the edges between them.
By looking at the source code for this, and removing the effort taken to track the actual paths, you can make it much more efficient if needed. But as it stands, the following works:
import networkx as nx
import pylab as py

G = nx.fast_gnp_random_graph(100000, 0.00002)   # sample graph
base = range(3)   # arbitrarily choose to start from nodes 0, 1, and 2
depth = 3         # look for nodes within distance 3
foundset = {key for source in base
            for key in nx.single_source_shortest_path(G, source, cutoff=depth).keys()}
H = G.subgraph(foundset)
nx.draw_networkx(H)
py.savefig('tmp.png')
Try snowball sampling?
Start with the set of nodes whose labels contain your keyword.
Look up all their neighbors and add them to the set.
Then look up the neighbors' neighbors and add the new ones to the set.
Iterate this process n times.
At the end you will have the set of all the nodes you want; then use the subgraph function to get the subgraph induced by your final set.
This may not be the most efficient solution, but it should work.
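A rough sketch of that expansion with networkx, assuming G is the full graph, seed_nodes is the set of matched nodes, and n is the desired depth:
nodes = set(seed_nodes)               # nodes whose labels matched the keyword
frontier = set(seed_nodes)
for _ in range(n):                    # expand the set n times
    next_frontier = set()
    for node in frontier:
        next_frontier.update(G.neighbors(node))
    frontier = next_frontier - nodes  # keep only the newly discovered nodes
    nodes |= frontier
    if not frontier:                  # stop early once nothing new appears
        break
H = G.subgraph(nodes)                 # induced subgraph on the accumulated set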

Drawing massive networkx graph: Array too big

I'm trying to draw a networkx graph with weighted edges, but right now I'm having some difficulty.
As the title suggests, this graph is really huge:
Number of Nodes: 103362
Number of Edges: 1419671
And when I try to draw this graph with the following code:
import networkx as nx
import matplotlib.pyplot as plt

pos = nx.spring_layout(G)
nx.draw(G, pos, node_color='#A0CBE2', edge_color='#BB0000', width=2,
        edge_cmap=plt.cm.Blues, with_labels=False)
plt.savefig("edge_colormap.png")  # save as png
plt.show()  # display
(This is just me testing functionality, not my desired end result). I get the error:
ValueError: array is too big.
It's triggered by the spring_layout algorithm. Any idea what's causing this? Even when I use a different layout algorithm I get the same error. How can I avoid it (if I can)?
On another note, I want to colour the edges based on their weight. As you can see there are a lot of edges and probably a wide range of weights, what is the best way to do this?
Thanks for your patience.
EDIT FROM MY COMMENT:
I'm trying to investigate the density of the data I have. Basically I am looking at 50,000 matches each containing 10 players, and whenever two players meet in a game I +1 to the weight of the edge between them. The idea is that my end result will show me the strength of my data set. In my mind I want the heaviest edges at the centre and as we move out from the centre the data is less densely connected.
The problem lies in the spring_layout approach. With this many nodes it will take a while to calculate where all of them should go with respect to one another. I would suggest either figuring out the x,y positions yourself or plotting much smaller subgraphs (<5000 nodes, or your computer might be sluggish for a while).
Here is what 1000 nodes from an erdos_renyi_graph (randomly chosen edges) looks like (plot not shown): it picks out 2 nodes to highlight.
Here is what 1500 looks like: a little more detail, now with 7-8 interesting nodes.
There isn't much to be gained by putting so many edges and nodes on one plot. And if you don't like the output, you have to re-run the whole layout again.
To set the x,y positions of each node yourself, take a look at this question: "NetworkX show graph with nodes at exact (x,y) position. Result is rotated".
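A hedged sketch along these lines: keep only a manageable subgraph, lay it out, and map edge weights to colours. The 'weight' attribute name and the threshold of 20 are assumptions, not from the question:
import networkx as nx
import matplotlib.pyplot as plt

# Keep only the heaviest edges so spring_layout stays tractable.
heavy = [(u, v, d) for u, v, d in G.edges(data=True) if d.get('weight', 0) >= 20]
H = nx.Graph(heavy)

pos = nx.spring_layout(H)   # feasible on a few thousand nodes
weights = [H[u][v]['weight'] for u, v in H.edges()]
nx.draw(H, pos, node_color='#A0CBE2', edge_color=weights,
        edge_cmap=plt.cm.Blues, width=2, with_labels=False)
plt.savefig("edge_colormap.png")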

Fast processing

In python and igraph I have many nodes with high degree. I always need to consider the edges from a node in order of their weight. It is slow to sort the edges each time I visit the same node. Is there some way to persuade igraph to always give the edges from a node in weight sorted order, perhaps by some preprocessing?
As far as I understand, you won't have access to the C backend from Python. What about storing the sorted edges in an attribute of the vertices, e.g. in g.vs["sortedOutEdges"]?
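A rough sketch of that caching idea in python-igraph; g is assumed to be an existing weighted igraph.Graph and some_vertex is a placeholder:
# One-off preprocessing: cache each vertex's incident out-edges sorted by weight.
# Re-run this whenever edges or weights change.
g.vs["sortedOutEdges"] = [
    sorted(g.incident(v.index, mode="out"), key=lambda eid: g.es[eid]["weight"])
    for v in g.vs
]

# Later visits reuse the cached order instead of re-sorting:
for eid in g.vs[some_vertex]["sortedOutEdges"]:
    edge = g.es[eid]
    # ... process edge in increasing weight order ...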
