Is it possible to position nodes in a networkx graph so that nodes sharing a certain (single) attribute are clustered near each other?
For example, if the nodes represent people and each has an attribute 'age', how can I make it so that people of the same age are near each other when I draw the graph? Is this possible?
You can specify the x,y coordinates of each node yourself, so if you have some idea of how you want it to look, it can be programmed. You could try a spring layout, but on its own it isn't even going to be hit or miss; it's going to be mostly misses. The way to attempt it is to connect nodes with the same attribute value to each other (people of the same age get one or more edges between them), so that the layout pulls them together.
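A minimal sketch of the first idea, setting the coordinates yourself. It assumes every node carries an 'age' attribute (any hashable attribute works). Each distinct value gets its own centre on a large circle, each group is laid out separately with nx.spring_layout, and the result is shifted to that centre. The helper name group_layout and the radius/group_scale values are illustrative choices, not part of NetworkX.

```python
import math
import networkx as nx

def group_layout(G, attr, radius=10.0, group_scale=1.0, seed=42):
    """Return {node: (x, y)} with nodes sharing G.nodes[n][attr] placed together.

    Hypothetical helper: one circle position per attribute value, then a
    spring layout of each group's own subgraph around that position.
    """
    # Collect the nodes for each attribute value (missing values become None)
    groups = {}
    for node, value in G.nodes(data=attr):
        groups.setdefault(value, []).append(node)

    pos = {}
    # Sort by repr so placement is deterministic even for mixed value types
    values = sorted(groups, key=repr)
    for k, value in enumerate(values):
        # Centre of this group on a large circle
        angle = 2 * math.pi * k / len(values)
        cx, cy = radius * math.cos(angle), radius * math.sin(angle)
        # Lay out only this group's subgraph, scaled into [-group_scale, group_scale]
        sub = G.subgraph(groups[value])
        sub_pos = nx.spring_layout(sub, scale=group_scale, seed=seed)
        for node, (x, y) in sub_pos.items():
            pos[node] = (cx + x, cy + y)
    return pos
```

Pass the returned dictionary to nx.draw_networkx(G, pos=pos). Laying out each group on its own keeps the groups apart no matter how many edges run between them; the trade-off is that those cross-group edges are ignored when placing the groups.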
The only way I see this working well with large amounts of data is to use a tool called Gephi and arrange the nodes by hand based on node data etc. It's like a Photoshop for network graphs.
I would suggest yet another approach. Create an extra attribute for your nodes that corresponds to a range of values of the attribute you want to use for grouping. For example, if your attribute is age, create ranges 18-30, 31-40, etc. Save the result in GraphML format and load the network in NodeXL (which is no longer free, but you can buy it for a small fee).
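A short sketch of the preparation step in NetworkX, assuming the numeric attribute is called 'age'. The bucket boundaries, the helper names and the new attribute name 'age_group' are illustrative; the GraphML file is then what you would open in NodeXL.

```python
import networkx as nx

# Illustrative inclusive bucket boundaries, following the 18-30, 31-40 example
AGE_BUCKETS = [(18, 30), (31, 40), (41, 50), (51, 120)]

def age_bucket(age):
    """Return a label such as '18-30' for an age, or 'other' if no bucket matches."""
    for low, high in AGE_BUCKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"

def add_age_group(G, src="age", dst="age_group"):
    """Store the bucket label as a string attribute on every node that has `src`."""
    for node, age in G.nodes(data=src):
        if age is not None:
            G.nodes[node][dst] = age_bucket(age)
```

Usage: call add_age_group(G), then nx.write_graphml(G, "people.graphml"). Storing the bucket as a string keeps it a categorical value that grouping tools can use directly.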
In NodeXL you can group the nodes by an attribute, and it lays out the groups so that nodes belonging to the same group are close to each other. You can also choose how the nodes within a group are laid out from a list of layout options.
Related
Our network graph is noisy and needs to be understood, broken up into separate clusters, and generally cleaned up. My general question is how people do that; my specific question is about simply deleting an object.
I'd like to design a mini-app that hands a PyVis network graph to a user and lets them delete (drop, remove) edges or nodes, preferably with a clean button or keyboard shortcut that acts on the currently selected objects.
I can't find a command or example in the PyVis documentation to drop/delete/remove a node or edge, and I can't find any question, let alone an answer, on StackOverflow tagged [pyvis] for how to do this. (I can manage lists and queues etc. in Python; this is about altering a visualized graph in real time without having to secretly rebuild the whole thing.)
Am I missing something obvious? Didn't anyone else ever want to do this?
But I can't find any documentation on how to delete an existing node or edge from a visualized network. It looks like I'd have to capture input, figure out the X, Y coordinates and the current list of objects, back way up, remove what I want gone from the lists, and regenerate the whole display. Seriously?
That's right: dropping nodes and edges doesn't exist in pyvis. I used NetworkX for removing nodes and edges. In my opinion it is better suited for manipulating the graph, although its support for visualization is minimal.
Pyvis has a from_nx function for importing a graph from NetworkX. Since you want to do this in real time, however, the problem is that pyvis simply lacks this feature. With a pure pyvis approach you could destroy the node object (which is part of the Network). If you want users to do it in the interface by clicking, I suggest combining pyvis with NetworkX, which can help you check conditions and supply the IDs of the nodes to remove.
I hope my answer helps. I also posted some code below for people interested in converting NetworkX to pyvis.
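import networkx as nx
from pyvis.network import Network

# Placeholder data: replace with your own node IDs and edges
list_with_node_ids = [1, 2, 3, 4]
list_with_nodes_to_remove = [4]
child_node_id, parent_node_id = 2, 1

foo = nx.DiGraph()
foo.add_nodes_from(list_with_node_ids)
foo.add_edge(child_node_id, parent_node_id)
# Remove unwanted nodes in NetworkX; their incident edges are removed too
foo.remove_nodes_from(list_with_nodes_to_remove)

bar = Network('500px', '500px', directed=True)
bar.from_nx(foo)  # import the cleaned NetworkX graph into pyvis
bar.show('bar.html')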
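To "check conditions and feed these node IDs" on the NetworkX side, here is a sketch that collects the IDs of nodes failing a test and removes them from a copy of the graph. The rule used here (total degree below min_degree) and the name prune_low_degree are hypothetical stand-ins for whatever cleanup criterion you need. The pruned graph would then go through from_nx again, so this is still a rebuild of the pyvis view, not an in-place edit of the rendered page.

```python
import networkx as nx

def prune_low_degree(G, min_degree=1):
    """Return (pruned copy of G, removed node IDs) for nodes with degree < min_degree."""
    # Hypothetical cleanup rule: drop weakly connected "noise" nodes.
    # On a DiGraph, degree is in-degree plus out-degree.
    to_remove = [node for node, deg in G.degree() if deg < min_degree]
    H = G.copy()                 # leave the original graph untouched
    H.remove_nodes_from(to_remove)  # also removes the edges of those nodes
    return H, to_remove
```

Returning the removed IDs as well makes it easy to show the user what a cleanup step would delete before re-rendering.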
Hi, I'm quite new to networks. I've been trying to write code that reads all the .edges and .nodes files in a folder and generates a GraphML file so I can visualize it in another program. I also need to add some colours to my nodes, but when I tried it I got: KeyError: 29
I was running a loop through my array of nodes to add the color of each node.
Here's the part of the code where I try to add the color attribute. The nodes will be coloured with 4 different colours: the best fitness, the worst, the top 10% and the bottom 10%.
for i in range(len(nodes)):
if nodes[i]==top:
NetGraph.node[i]['color']='r'
Hope you can help me! Cheers
If you are trying to "merge" relationship data stored in a number of different .nodes and .edges files into one graph, it is possible that, as the files are read from disk, you come across a node that has not yet been added to the graph. Looking up a node that is not in the graph is exactly what raises a KeyError.
In general, I feel that more information is required to give a more meaningful answer. For example: what is the format of the .nodes and .edges files? What is in the top variable? (Is it a list, or a single number representing a threshold?)
However, based on what is mentioned so far in this question, here are a few pointers:
Try to build the graph first and colour it later. This might appear inefficient if you already have the fitness data, but it is the easiest way to get to a working piece of code.
Make sure that your node IDs are indeed integers, that is, that each node is known in the graph by its index value: for example 2, 3, 5, etc. instead of "Paris", "London", "Berlin", etc. (i.e. string node IDs). Even with integer IDs, range(len(nodes)) only produces 0 to len(nodes) - 1, which matches the graph's IDs only if they are exactly those numbers. If your IDs are anything else, the loop is better written as: for aNode in G.nodes(data=True):. This iterates over (node ID, attribute dictionary) pairs, one for each node in the graph.
If top is a single variable, it doesn't make sense to compare the node's ID with the top threshold. It would be like saying: if 22 (a node ID) is equal to 89 (some measure of fitness), apply the red colour to the node. If top is a list containing all the nodes considered top nodes, the condition should be: if nodes[i] in top:.
You seem to have skipped an indentation below the if (?). For the statement that assigns the colour to run only when the condition is True, it needs to be indented one more level (4 spaces) to the right.
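The expression to assign the colour is correct in older NetworkX releases. G.node was removed in NetworkX 2.4, so with current versions write NetGraph.nodes[i]['color'] = 'r' instead.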
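Putting these pointers together, here is a sketch that colours an already-built graph by iterating over the graph's own node IDs instead of range(len(nodes)), so it cannot hit a KeyError for IDs that are not 0 to n - 1. It assumes fitness is a dict mapping every node ID to a fitness value where higher is better; the function name and the colour codes other than the question's 'r' are illustrative.

```python
import networkx as nx

def colour_by_fitness(G, fitness):
    """Set a 'color' attribute from a {node_id: fitness} dict (higher is better)."""
    # Rank the graph's real node IDs by fitness, best first
    ranked = sorted(G.nodes, key=lambda n: fitness[n], reverse=True)
    cut = max(1, len(ranked) // 10)         # size of the top/bottom 10% groups
    top, bottom = set(ranked[:cut]), set(ranked[-cut:])
    for node, data in G.nodes(data=True):   # real IDs, not range(len(...))
        if node == ranked[0]:
            data["color"] = "r"             # best fitness
        elif node == ranked[-1]:
            data["color"] = "k"             # worst fitness
        elif node in top:                   # membership test: "in", not "=="
            data["color"] = "g"             # rest of the top 10%
        elif node in bottom:
            data["color"] = "b"             # rest of the bottom 10%
```

Nodes outside the four groups get no colour attribute here; add a final else branch if your visualization software needs a default.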
Please note that NetworkX will attempt to write every node and edge attribute it comes across in a graph to the chosen format. For more information on this, please see the response to this question. Therefore, once you are satisfied with the structure of a given graph (G), you can simply call networkx.write_graphml(G, 'mygraph.graphml') to save it to disk (where networkx is the name of the module). The networkx.write_* functions will export a complete version of your graph (G) to a number of different formats, or raise an exception if a data type cannot be serialised properly.
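A small sketch of the export step that also reads the file back, which is a quick way to confirm the colour attribute survived. The helper name is illustrative. read_graphml returns node IDs as strings unless you pass node_type.

```python
import networkx as nx

def export_graphml(G, path, node_type=str):
    """Write G to GraphML at `path` and return the graph read back from disk."""
    # write_graphml raises NetworkXError for attribute types GraphML can't store
    nx.write_graphml(G, path)
    # node_type converts the node IDs read from the file (default: str)
    return nx.read_graphml(path, node_type=node_type)
```

Comparing the returned graph with the original (same nodes, edges and attributes) is a cheap sanity check before loading the file into other software.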
I hope this helps. Happy to amend the response if more details are provided later on.
I'm trying to write a script which will let me mangle (edit, cut, change) some big network files from a command line interface. One of the things I'm trying to do is isolate a subnetwork from a larger network based on searching for matches in node labels.
So basically I'd have a networkx graph with maybe 7000 nodes and corresponding edges with various labels. Then I'd match a string, eg "Smith" to the nodes. I'd get a match of maybe 30 nodes (label:"John Smith", label:"Peter Smith", etc). I'd then like to make a new networkx network containing those 30 nodes, and the edges they have, and the nodes those edges connect to, up to a depth of n, or optionally until all the nodes and edges are found.
My current code is rubbish, so maybe I'll try to write some pseudocode:
for node in networkx_network:
    if searched_string in node:
        new_network.add(node.subnetwork(depth=n))
I've spent days googling for a solution, and maybe subgraph, or neighbors, or connected_components is the right thing to do, but I can't wrap my head around how to do it.
single_source_shortest_path has an optional cutoff argument. With it, you can tell networkx to find only the paths to nodes within a certain distance of a given node. It's a bit of overkill, because those paths contain a lot of information you don't need. If you then take just the keys of the resulting dictionary of paths, you have all the nodes reachable within that distance, and G.subgraph gives you the graph containing those nodes and the edges between them.
By looking at the source code for this, and removing the effort taken to track the actual paths, you can make it much more efficient if needed. But as it stands, the following works:
import networkx as nx
import matplotlib.pyplot as plt

G = nx.fast_gnp_random_graph(100000, 0.00002)  # sample graph
base = range(3)  # arbitrarily choose to start from nodes 0, 1, and 2
depth = 3  # look for nodes within path length 3

# Union of all nodes reachable from any source within `depth` hops
foundset = {key
            for source in base
            for key in nx.single_source_shortest_path(G, source, cutoff=depth)}
H = G.subgraph(foundset)

nx.draw_networkx(H)
plt.savefig('tmp.png')
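For the original "Smith" use case, here is a sketch that first matches the search string against a node attribute (assumed to be called label, as in the question) and then expands from every match. It swaps in nx.single_source_shortest_path_length, which returns only distances rather than full paths, so it is a lighter variant of the approach above. Passing depth=None means no cutoff, i.e. everything reachable. The function name is hypothetical.

```python
import networkx as nx

def label_subnetwork(G, needle, depth=None, attr="label"):
    """Subgraph of nodes whose `attr` contains `needle`, plus all nodes within `depth` hops."""
    # 1. Seed nodes: case-sensitive substring match on the label attribute
    seeds = [n for n, label in G.nodes(data=attr) if label and needle in label]
    # 2. Expand from each seed; cutoff=None finds the whole reachable set
    found = set()
    for seed in seeds:
        found.update(nx.single_source_shortest_path_length(G, seed, cutoff=depth))
    # 3. Induced subgraph: the found nodes and every edge between them
    return G.subgraph(found).copy()
```

On a DiGraph this follows outgoing edges only; pass G.to_undirected(as_view=True) if you want both directions. For a single seed, nx.ego_graph(G, seed, radius=depth) builds the same kind of subgraph.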
Try snowball sampling?
Start with the set of nodes you found that contain your keyword.
Look for all their neighbors and add them to the set.
Look for all the neighbors' neighbors and add the new ones to the set.
Repeat this process n times.
At the end you will have a set of all the nodes you want; then use the subgraph function to get a subgraph of all the nodes in your final set.
This may not be the most efficient solution, but it should work.
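The steps above can be sketched as a frontier expansion: each round only looks at the neighbours of the nodes added in the previous round, and it stops early once nothing new turns up. The function name snowball is illustrative; seeds would be the keyword matches.

```python
import networkx as nx

def snowball(G, seeds, n):
    """Subgraph of the nodes reachable from `seeds` in at most n rounds of expansion."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(n):
        # Neighbours of the current frontier that have not been seen yet
        new = {nbr for node in frontier for nbr in G.neighbors(node)} - found
        if not new:  # nothing left to add: stop early
            break
        found |= new
        frontier = new
    return G.subgraph(found).copy()
```

On a DiGraph, G.neighbors returns successors only, so the expansion follows edge direction.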
I am doing agent-based modeling and currently have this set up in Python, but I can switch to Java if necessary.
I have a Twitter dataset (11 million nodes and 85 million directed edges), and I have set up a dictionary/hashmap where the key is a specific user A and the value is a list of all of A's followers (people who follow user A). The "nodes" are just unique integer ID numbers; there is no other data. I want to visualize this data through some method of clustering. Not every individual node has to be visualized, but I want the n nodes with the most followers to be shown clearly, with the area around each of them representing all the people who follow it. I'm modeling the spread of something through the network, so the nodes and the areas around them need to change colors. Ideally it would be a continuous visualization, but I don't mind just taking snapshots at every ith iteration.
Additionally, I was thinking of having the clusters be separated such that:
if person A and person B both have enough followers to be visualized individually, and A and B are connected (one follows the other, or maybe both ways), then both are visualized, but they are visually separated from each other despite being connected, so that the visualization is clearer.
Anyway, I was wondering whether there is a package in Python (preferably) or Java that would let me do this semi-easily.
Gephi has a very nice GUI and an associated Java toolkit. You can experiment with visual layout in the GUI until you have everything looking the way you like and then code up your own version using the toolkit.
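If you go the Gephi route, one possible way to get 11 million nodes down to something drawable is to pre-select the most-followed accounts in Python and export a much smaller graph. A sketch, assuming followers is the dict described in the question (user ID mapped to a list of follower IDs); the function name, n and the optional 'state' attribute used to colour each snapshot are hypothetical. nx.write_gexf writes a GEXF file, which Gephi can open.

```python
import heapq
import networkx as nx

def top_follower_graph(followers, n, state=None):
    """Directed follower -> user graph restricted to the n most-followed users.

    followers: {user_id: [follower_id, ...]}
    state: optional {user_id: label}, e.g. whether the user has been "infected"
    """
    # The n users with the longest follower lists
    top = heapq.nlargest(n, followers, key=lambda u: len(followers[u]))
    keep = set(top)
    G = nx.DiGraph()
    for user in top:
        # add_node on an existing node just updates its attributes
        G.add_node(user, followers=len(followers[user]))
        if state is not None and user in state:
            G.nodes[user]["state"] = state[user]
        # Keep only edges between two top users so the picture stays readable
        for f in followers[user]:
            if f in keep:
                G.add_edge(f, user)
    return G
```

For snapshots, you could call nx.write_gexf(G, f"snapshot_{i}.gexf") at every ith iteration of the model and colour by the 'state' attribute in Gephi; the follower count attribute can drive node size.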
I have an XML file which contains different nodes of data that I randomly generated. What I want to do is run through each node and create a tree out of it. My customized software uses the XML data to draw these nodes and their connections visually.
There are no criteria for which node connects to which; given 500 nodes, I want to be able to generate a tree with decently complex breadth and depth.
I'm coding this in Python using a customized library that draws diagrams with JGraphX, so there's no point in showing the exact code. But assume that I have the following 3 functions:
getXY_coord(a), gets the XY coordinates of the node on the diagram
connectNodes(a,b), connects node a with b
getAllNodes(), returns list of all nodes on diagram
How would I approach making this complex tree? It doesn't even have to be visually organized, a node can connect to another node on the opposite side of the diagram, as long as the connections themselves are complex.
The only thing I managed to pull off was to shuffle the list of nodes and connect nodes that are adjacent in the list. That isn't what I want, however.
I suggest looking at Minimum Spanning Tree algorithms like Prim's algorithm.
The networkx module will do this for you - see the documentation.
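One way to apply this to the random case: treat every pair of nodes as a candidate connection with a random weight, and take the minimum spanning tree with Prim's algorithm. The result connects all nodes with no cycles, and its shape is random rather than just chaining neighbours in a list. This is a sketch; the function name is hypothetical, the nodes must be hashable, and the random-weight trick gives a random spanning tree, not a uniformly sampled one.

```python
import random
import networkx as nx

def random_spanning_tree_edges(nodes, seed=None):
    """Edges of a random tree over `nodes`: Prim MST of a complete graph with random weights."""
    rng = random.Random(seed)
    G = nx.complete_graph(nodes)  # every pair of nodes is a candidate connection
    for u, v in G.edges():
        G[u][v]["weight"] = rng.random()
    T = nx.minimum_spanning_tree(G, algorithm="prim")
    return list(T.edges())
```

With the question's (hypothetical here) helpers, the drawing step would be: for a, b in random_spanning_tree_edges(getAllNodes()): connectNodes(a, b). For 500 nodes the complete graph has 124,750 candidate edges, which NetworkX handles comfortably.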