In Python with igraph, I have many nodes with high degree. I always need to consider the edges from a node in order of their weight, and it is slow to re-sort the edges each time I visit the same node. Is there some way to persuade igraph to always give the edges from a node in weight-sorted order, perhaps with some preprocessing?
As far as I understand, you won't have access to the C backend from Python. What about storing the sorted edges in an attribute of the vertices, e.g. in g.vs["sortedOutEdges"]?
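A minimal sketch of that idea, assuming an undirected graph with a "weight" edge attribute (the graph here is a random stand-in); the cache must be rebuilt if edges or weights change:

import random
import igraph as ig

g = ig.Graph.Erdos_Renyi(n=100, m=500)
g.es["weight"] = [random.random() for _ in range(g.ecount())]

# One-off preprocessing: cache each vertex's incident edge IDs, sorted by
# weight. For a directed graph, use g.incident(v.index, mode="out").
g.vs["sortedOutEdges"] = [
    sorted(g.incident(v.index), key=lambda eid: g.es[eid]["weight"])
    for v in g.vs
]

# Later visits reuse the cached order instead of re-sorting:
for eid in g.vs[0]["sortedOutEdges"]:
    e = g.es[eid]
    print(e.tuple, e["weight"])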
I am looking to generate a random planar graph in python with around 20 vertices. I checked out this planar graph generator but two problems emerged:
The algorithm in the aforementioned GitHub project seems overkill for generating a random planar graph that doesn't have that many edges.
Because it's meant to generate massive graphs, that algorithm is very complex, and therefore also a bit clunky and difficult to use.
With that said, is there a simpler way to randomly generate a relatively small planar graph in python?
Create required number of nodes
Assign random x,y locations to the nodes.
WHILE there are nodes with no connected edges:
    select a random node N with no edges
    LOOP: select a different node M at random
        IF edge N-M does NOT intersect any previously added edge:
            add edge N-M to the graph
            BREAK out of LOOP
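A rough Python sketch of this rejection approach (all names are illustrative). The intersection test only detects proper crossings, so edges may still touch at shared endpoints, which is fine for planarity; unlike the pseudocode's endless LOOP, each node here tries every candidate partner once and gives up if all of them cross, so the function always terminates:

import random

def segments_cross(p1, p2, p3, p4):
    # True if segments p1-p2 and p3-p4 properly cross (shared endpoints
    # are allowed; collinear overlaps are not detected).
    def ccw(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    d1, d2 = ccw(p3, p4, p1), ccw(p3, p4, p2)
    d3, d4 = ccw(p1, p2, p3), ccw(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0

def random_planar_graph(n, seed=None):
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = []
    no_edge = set(range(n))          # nodes with no connected edges yet
    while no_edge:
        a = no_edge.pop()
        others = [b for b in range(n) if b != a]
        rng.shuffle(others)          # try candidate partners in random order
        for b in others:
            if all(not segments_cross(pos[a], pos[b], pos[u], pos[v])
                   for u, v in edges):
                edges.append((a, b))
                no_edge.discard(b)
                break                # found a non-crossing edge for a
    return pos, edges

pos, edges = random_planar_graph(20, seed=42)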
[Image: traffic map]
The traffic map contains straight segments of two types: segments with an arrow can only be traversed one way, in the direction of the arrow, and segments without an arrow can be traversed in both directions. Calculate the number of ways to go from A to B without repeating any segment.
How should I solve this math problem? I don't know what to do right now!
It's a graph theory problem. In your problem, consider every junction as a node and the segments as the edges of the graph. In general your graph is a directed graph, where each two-way segment is simply a pair of opposite edges between the same two nodes.
The algorithm you need to implement is DFS (depth-first search).
The idea is as following:
Start the DFS traversal from source.
Keep storing the visited vertices in an array or hash map, say path[].
If the destination vertex is reached, count that route (or print the contents of path[]).
The important thing is to also mark the segments (edges) used in the current path as visited, so that the traversal never reuses a line and cannot loop forever around a cycle.
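A minimal sketch of that DFS, counting A-to-B routes that never reuse a segment. The adjacency encoding is an assumption: each two-way segment becomes two directed entries sharing one edge ID, each one-way segment just one; a route is counted when it first reaches the target:

def count_routes(adj, source, target):
    # adj maps node -> list of (neighbor, edge_id).
    used = set()
    def dfs(node):
        if node == target:
            return 1
        total = 0
        for nxt, eid in adj[node]:
            if eid not in used:
                used.add(eid)       # this segment may not be reused...
                total += dfs(nxt)
                used.remove(eid)    # ...until we backtrack
        return total
    return dfs(source)

# Tiny example: A-C two-way (edge 0), C-B one-way (edge 1), A-B one-way (edge 2).
adj = {
    'A': [('C', 0), ('B', 2)],
    'C': [('A', 0), ('B', 1)],
    'B': [],
}
print(count_routes(adj, 'A', 'B'))  # 2 routes: A-B and A-C-B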
Scenario: I have a graph, represented as a collection of nodes (0...n). There are no edges in this graph.
To this graph, I connect nodes at random, one at a time. An alternative way of saying this would be that I add random edges to the graph, one at a time.
I do not want to create simple cycles in this graph.
Is there a simple and/or very efficient way to track the creation of cycles as I add random edges? With a single graph traversal it is easy, since we only need to track the two end nodes of one path. But in this situation we have any number of paths to track, and sometimes those paths combine into a larger path that we then need to track too.
I have tried several approaches, which mostly come down to maintaining a list of "outer nodes" and a set of nodes internal to them, then walking through and updating the list whenever I add an edge. But it becomes extremely convoluted, especially if I remove an edge from the graph.
I have tried to search out algorithms or discussions on this, and I can't really find anything. I know I can do a BFS to check for cycles, but it's horribly inefficient to BFS after every single edge addition.
Possible solution I came up with while in the shower.
What I will do is maintain a list of size n, where list[k] counts how many times node k has appeared in an edge.
When I add an edge (i,j), I will increment list[i] and list[j].
If after an edge addition, list[i] > 1, and list[j] > 1, I will do a DFS starting from that edge.
I realized I don't need to BFS the whole graph; I only need to DFS from the last added edge, and only when it at least has the potential to be in a cycle (both of its nodes already appear in another edge).
I doubt it is optimal; maybe some kind of disjoint-set structure would be better. But this is way better than anything I was thinking of before.
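A rough sketch of that heuristic, written as a check before the edge is inserted (names are illustrative). The DFS only runs when both endpoints already touch at least one edge, matching the degree-counting idea above:

def would_create_cycle(adj, degree, i, j):
    # A fresh endpoint (degree 0) can never close a cycle.
    if degree[i] < 1 or degree[j] < 1:
        return False
    # Otherwise, edge (i, j) closes a cycle exactly when j is already
    # reachable from i in the (so far acyclic) graph.
    stack, seen = [i], {i}
    while stack:
        node = stack.pop()
        if node == j:
            return True
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return False

adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
degree = {0: 1, 1: 2, 2: 1, 3: 0}
print(would_create_cycle(adj, degree, 0, 2))  # True: 0 and 2 already connected
print(would_create_cycle(adj, degree, 0, 3))  # False: node 3 is fresh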
If you keep track of the connected components of your graph, you can test for every edge you insert whether the involved nodes are already in the same component. If they are, then the edge you are inserting will introduce a cycle to your graph.
Have a look at this post that seems to give some good references on how to do this: https://cstheory.stackexchange.com/questions/2548/is-there-an-online-algorithm-to-keep-track-of-components-in-a-changing-undirecte
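A minimal union-find (disjoint-set) sketch of that component test, for the insert-only case; deletions need the dynamic-connectivity machinery discussed in the linked post:

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        # Merge the sets of a and b; return False if they were already in
        # the same component, i.e. adding edge (a, b) would create a cycle.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:  # union by size
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

# Usage: reject any random edge whose endpoints share a component.
ds = DisjointSet(10)
assert ds.union(0, 1)      # fine
assert ds.union(1, 2)      # fine
assert not ds.union(0, 2)  # would close a cycle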
I'm trying to write a script which will let me mangle (edit, cut, change) some big network files from a command line interface. One of the things I'm trying to do is isolate a subnetwork from a larger network based on searching for matches in node labels.
So basically I'd have a networkx graph with maybe 7000 nodes and corresponding edges, with various labels. Then I'd match a string, e.g. "Smith", against the nodes. I'd get a match of maybe 30 nodes (label: "John Smith", label: "Peter Smith", etc.). I'd then like to make a new networkx network containing those 30 nodes, the edges they have, and the nodes those edges connect to, up to a depth of n, or optionally until all connected nodes and edges are found.
My current code is rubbish, so maybe I'll try to write some pseudocode:
for node in networkx_network:
if searched_string in node:
new_network.add(node.subnetwork(depth=n))
I've spent days googling for a solution, and maybe subgraph, or neighbors, or connected_components is the right thing to do, but I can't wrap my head around how to do it.
single_source_shortest_path has an optional cutoff argument. By including it, you can tell networkx to find paths to all nodes within a certain distance of a given node. It's a bit of overkill, because the paths carry a lot of other information you don't need. But if you take just the keys of the resulting dict of paths, you have all nodes reachable within that distance, and networkx can then build the subgraph containing those nodes and the edges between them.
By looking at the source code for this, and removing the effort taken to track the actual paths, you can make it much more efficient if needed. But as it stands, the following works:
import networkx as nx
import matplotlib.pyplot as plt

G = nx.fast_gnp_random_graph(100000, 0.00002)  # sample graph
base = range(3)  # arbitrarily choose to start from nodes 0, 1, and 2
depth = 3        # look for nodes within distance 3
# Union the keys of each shortest-path dict: every node reachable from a
# base node within `depth` hops.
foundset = {key for source in base
            for key in nx.single_source_shortest_path(G, source, cutoff=depth)}
H = G.subgraph(foundset)
nx.draw_networkx(H)
plt.savefig('tmp.png')
Try snowball sampling?
Start with the set of nodes you found that match your keyword.
Look up all their neighbors and add them to the set.
Then look up all the neighbors' neighbors and add the new ones to the set.
Iterate this process n times.
At the end you will have the set of all the nodes you want; then use the subgraph function to get the subgraph induced by your final set, as in the sketch below.
This may not be the most efficient solution, but it should work.
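A minimal sketch of that snowball loop, assuming a networkx graph G and a set of matched seed nodes (all names are illustrative):

import networkx as nx

def snowball(G, seeds, depth):
    # Grow the seed set outward by `depth` rings of neighbors.
    selected = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        # Neighbors of the current frontier that we haven't collected yet.
        frontier = {nbr for node in frontier
                    for nbr in G.neighbors(node)} - selected
        if not frontier:
            break
        selected |= frontier
    return G.subgraph(selected)

# e.g. seeds = [n for n in G if "Smith" in str(n)]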
I'm trying to draw a networkx graph with weighted edges, but right now I'm having some difficulty.
As the title suggests, this graph is really huge:
Number of Nodes: 103362
Number of Edges: 1419671
And when I try to draw this graph with the following code:
pos = nx.spring_layout(G)
nx.draw(G, pos, node_color='#A0CBE2', edge_color='#BB0000', width=2, edge_cmap=plt.cm.Blues, with_labels=False)
plt.savefig("edge_colormap.png")  # save as png
plt.show()  # display
(This is just me testing functionality, not my desired end result). I get the error:
ValueError: array is too big.
It's triggered by the spring_layout algorithm. Any idea what's causing this? Even when I use a different layout algorithm I get the same error. How can I avoid it (if I can)?
On another note, I want to colour the edges based on their weight. As you can see, there are a lot of edges and probably a wide range of weights. What is the best way to do this?
Thanks for your patience.
EDIT FROM MY COMMENT:
I'm trying to investigate the density of the data I have. Basically I am looking at 50,000 matches each containing 10 players, and whenever two players meet in a game I +1 to the weight of the edge between them. The idea is that my end result will show me the strength of my data set. In my mind I want the heaviest edges at the centre and as we move out from the centre the data is less densely connected.
The problem lies in the spring_layout approach. With this many nodes it will take a long time to calculate where all of them should go with respect to one another. With this many nodes I would suggest either figuring out the x,y positions yourself or plotting much smaller subgraphs (fewer than 5000 nodes, or your computer might be a bit sluggish for a while).
Here is what 1000 nodes from an erdos_renyi_graph (randomly chosen edges) looks like.
It pulled out 2 nodes to highlight.
Next is what 1500 nodes looks like.
It shows a little more detail, now with 7-8 interesting nodes.
There isn't much to be gained by putting this many nodes and edges on one plot. And if you don't like the output, you have to re-run the whole layout again.
To get the x,y positions of each node, take a look at this question: "in NetworkX show graph with nodes at exact (x,y) position. Result is rotated".
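On the edge-colouring part of the question, a minimal sketch: lay out a much smaller subgraph and pass the numeric weights as the edge colours, letting edge_cmap map them. The graph and weights here are stand-ins for the real data:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.gnm_random_graph(2000, 6000)           # stand-in graph
for u, v in G.edges():
    G[u][v]['weight'] = abs(u - v) % 10       # stand-in weights

H = G.subgraph(list(G.nodes())[:500])         # a manageable subgraph
pos = nx.spring_layout(H)                     # feasible at this size
weights = [H[u][v]['weight'] for u, v in H.edges()]

# Numeric edge_color values are mapped through edge_cmap.
nx.draw(H, pos, node_color='#A0CBE2', node_size=20, with_labels=False,
        edge_color=weights, edge_cmap=plt.cm.Blues, width=1)
plt.savefig("edge_colormap.png")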