Package to visualize clustering in Python or Java?

I am doing agent-based modeling and currently have this set up in Python, but I can switch over to Java if necessary.
I have a dataset on Twitter (11 million nodes and 85 million directed edges), and I have set up a dictionary/hashmap so that the key is a specific user A and its value is a list of all the followers (people that follow user A). The "nodes" are actually just the integer ID numbers (unique), and there is no other data. I want to be able to visualize this data through some method of clustering. Not all individual nodes have to be visualized, but I want the nodes with the n most followers to be visualized clearly, and the surrounding area around that node would represent all the people who follow it. I'm modeling the spread of something throughout the map, so I need the nodes and areas around the nodes to change colors. Ideally, it would be a continuous visualization, but I don't mind it just taking snapshots at every ith iteration.
Additionally, I was thinking of having the clusters be separated such that:
if person A and person B have enough followers to be visualized individually, and person A and B are connected (one follows the other or maybe even both ways), then they are both visualized, but are visually separated from each other despite being connected so that the visualization is clearer.
Anyway, I was wondering whether there is a package in Python (preferably) or Java that would make this reasonably easy.

Gephi has a very nice GUI and an associated Java toolkit. You can experiment with visual layout in the GUI until you have everything looking the way you like and then code up your own version using the toolkit.
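To experiment in the Gephi GUI with only the most-followed accounts, the follower dictionary from the question could first be reduced to a small edge list. The following is a minimal sketch, not a definitive implementation: the function names `top_n_users` and `write_edge_list` are made up for illustration, and it assumes the dictionary maps each user ID to a list of follower IDs, as described in the question. It writes a CSV with `Source` and `Target` columns, which Gephi can import as an edge list.

```python
import csv
import heapq
import io


def top_n_users(followers, n):
    """Return the n user IDs with the most followers (hypothetical helper)."""
    return heapq.nlargest(n, followers, key=lambda user: len(followers[user]))


def write_edge_list(followers, users, out):
    """Write edges between the selected users as Source,Target rows.

    An edge goes from the follower to the user being followed.
    """
    keep = set(users)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for user in users:
        for follower in followers[user]:
            if follower in keep:
                writer.writerow([follower, user])


# Tiny made-up example: key = user, value = list of that user's followers.
followers = {1: [2, 3, 4], 2: [1], 3: [], 4: [1, 2]}
top = top_n_users(followers, 2)
buffer = io.StringIO()
write_edge_list(followers, top, buffer)
print(buffer.getvalue())
```

For the real dataset, `out` would be a file opened for writing rather than an in-memory buffer.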

Related

How to represent grid environment as a graph?

I would like to run an experiment in which I give two kinds of input to reinforcement learning agents. The first input would be a grid-like environment representing a room maze with some walls and a reward in one of the rooms. The second would be a graph representation of the same maze.
I'm still stuck at planning the experiment, because I'm not sure how to build a proper graph structure from a 2D grid environment. For example, I was thinking about looking for algorithms that can find modules in the graph (e.g. modules could correspond to rooms).
Do you have any recommendations for building the graph? And which RL algorithm might work well on both representations?

A maze grid is often represented in a text file as "ASCII art", with * marking the cells that are on the path from the start to the end.
It is fairly straightforward to write code to read this text file and create a graph with links between adjacent cells. If there is a wall between two cells, the link has "infinite" cost; if there is a door, the link cost can be small or zero. From this graph, standard path-finding code will find the path through the maze.
The result can be visualized with graphviz: "cnrm" is the node representing the cell at row m, column n, and the path is highlighted in red.
You can look at C++ code for doing this at https://github.com/JamesBremner/PathFinder
specifically the code to read the ascii art and generate the graph is in https://github.com/JamesBremner/PathFinder/blob/main/src/cMaze.cpp
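The same idea can be sketched in Python. This is a simplified illustration rather than a port of the C++ code linked above: the maze format ('#' for a wall, 'S' for the start, 'E' for the end) and the function names are assumptions made for this example, and instead of giving wall links an "infinite" cost it leaves wall cells out of the graph and uses an unweighted breadth-first search.

```python
from collections import deque

# Hypothetical maze format: '#' = wall, 'S' = start, 'E' = end, ' ' = open cell.
MAZE = """\
#######
#S#   #
# # # #
#   #E#
#######
"""


def parse_maze(text):
    """Build an adjacency-list graph of the open cells, keyed by (row, col)."""
    graph = {}
    start = end = None
    for r, row in enumerate(text.splitlines()):
        for c, ch in enumerate(row):
            if ch == "#":
                continue  # walls are not nodes in this simplified version
            graph[(r, c)] = []
            if ch == "S":
                start = (r, c)
            elif ch == "E":
                end = (r, c)
    # Link each open cell to its open neighbours (up, down, left, right).
    for (r, c) in graph:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = (r + dr, c + dc)
            if neighbour in graph:
                graph[(r, c)].append(neighbour)
    return graph, start, end


def shortest_path(graph, start, end):
    """Breadth-first search; returns the list of cells from start to end, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        for neighbour in graph[cell]:
            if neighbour not in previous:
                previous[neighbour] = cell
                queue.append(neighbour)
    return None


graph, start, end = parse_maze(MAZE)
path = shortest_path(graph, start, end)
print(path)
```

The `graph` dictionary is the graph representation of the maze; the same structure could be handed to a graph library or to an agent that works on nodes and edges rather than on the grid.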

Interactive Scatter plots based on user input

I need to create a specific scatterplot using 3 data columns, x, y, z (height, weight, ID number), as the basic inputs.
I have height, weight, and a unique identifier for each of ~2000 individuals in the set. I want the user to be able to highlight “their location” precisely within the scatterplot of all the data points. To do that among 2000 data points, they’ll need to type their unique ID into a text box and submit it, which should alter the graph as follows:
a) “accent” the data point for the entered ID (e.g., change that individual's point to red while the other data points remain gray)
b) as a sort of “tool tip”, show the exact values of height, weight and ID number in a readable “box” somewhere in or near the graph area, preferably in some open area of the graph’s “state space”. This is mainly to let them note their recorded values in our dataset. (Yes, they presumably know their own height and weight… but imagine they’d like to check whether we have mis-entered their values in the dataset.)
I figure there’s an interactive graph package that allows this filter-by-typed-input-value-z option, but I have only seen filter options with a small number of predefined categories. For example, what I have seen permits a drop-down box to filter data based on Z. The problem is that my drop-down for ID would have as many values as data points, and that’s 1000s… so unwieldy compared to my text box idea.
I would like to do this in R or in a package that makes it easy (I’m mainly a stats user, so my programming skills are limited to writing basic batch programs, .do files, with canned procedures). A non-R package that will easily let me create, edit and slap this into a webpage would certainly do.

multi colored edges python

I'm testing an algorithm that finds a shortest path between two given vertices in a graph and returns a list of vertices after each turn. It actually returns three paths: one is a shortest path in the graph, and the other two are extra paths that are also important for us and are used in further shortest-path calculations. On each turn the weights of the graph edges change, so every turn we get a new triple of lists (paths).
I would like to visualize how these paths evolve by drawing the graph (it is actually a grid that represents a city, e.g. New York), with each kind of path shown in its own colour, so on each turn there would be a grid with three coloured paths. Since the paths are different on every turn, the picture will change. What is the best way to represent this?
One more question: sometimes an edge belongs to two or even all three of these paths, and I'd like to show that, so it would be nice to be able to colour such an edge with two or three colours at once. Ideally it would look like two or three thinner edges running side by side, but I could only find examples where several lines of different colours are placed one after another along the edge. Is there a way to do it the first way?
I'm sorry for being long-winded, but I've never dealt with graphics in Python and I desperately need help. Thanks!

If you want to show the image in a GUI, it depends on the GUI toolkit that you want to use. In the Tkinter toolkit that comes with most Python distributions, you could use the Canvas widget. There are several tutorials online. Most GUI toolkits have similar functionality, but it may go by different names.
If you want to save an image to a file, there are many graphics libraries you could use, depending on what kind of format you want to save it to.
For example the Python bindings to the Cairo library can save a picture as PDF or SVG vector formats.
The Pillow library on the other hand supports many bitmap formats.
There are many others: matplotlib, agg and gd are just some examples.
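Whichever drawing library is used, the "thinner edges side by side" effect from the question comes down to shifting each coloured line sideways, perpendicular to the edge. The sketch below only does that geometry and does not open a window; the function name `parallel_segments` and its parameters are hypothetical and chosen for this example.

```python
import math


def parallel_segments(p1, p2, colors, width=2.0):
    """Return one shifted copy of the segment p1-p2 per colour.

    Each result is (start, end, colour). The copies are spaced `width`
    apart and centred on the original edge.
    """
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return []
    # Unit vector perpendicular to the edge direction.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    count = len(colors)
    segments = []
    for i, color in enumerate(colors):
        offset = (i - (count - 1) / 2) * width
        start = (x1 + nx * offset, y1 + ny * offset)
        end = (x2 + nx * offset, y2 + ny * offset)
        segments.append((start, end, color))
    return segments


# An edge shared by all three paths, drawn as three side-by-side lines.
segments = parallel_segments((0, 0), (10, 0), ["red", "green", "blue"], width=2.0)
for start, end, color in segments:
    print(color, start, end)
```

With the Tkinter Canvas mentioned above, each tuple could be drawn with `canvas.create_line(x1, y1, x2, y2, fill=color, width=width)`; redrawing the canvas on every turn would then show how the paths change.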

Positioning networkx nodes by shared node attributes

Is it possible to position nodes in a networkx graph so that nodes sharing a certain (single) attribute are clustered near each other?
For example, if the nodes represent people and each has an attribute 'age', how can I make it so that people of the same age are near each other when I draw the graph? Is this possible?

You can specify the x,y coordinates of each node, so if you have some idea of how you want it to look, it can be programmed. You could try a spring layout, but the results will be hit or miss, and mostly miss. One way to attempt it is to connect nodes with the same attribute value to each other (for example, give people of the same age one or more edges between them) so that the layout pulls them together.
The only way I see this working well with large amounts of data is to use a tool called Gephi and arrange the graph by hand based on node data. It's like a Photoshop for network graphs.
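The first suggestion, assigning the coordinates yourself, could look roughly like the sketch below. It is only an illustration: the function name `positions_by_attribute` and the circle-based arrangement are assumptions for this example, not something networkx provides. It takes a plain dictionary of node to attribute value (in networkx, `nx.get_node_attributes(G, 'age')` returns one) and produces a dictionary of node to (x, y), which is the form that `nx.draw(G, pos=...)` accepts.

```python
import math
from collections import defaultdict


def positions_by_attribute(node_attrs, group_radius=10.0, node_radius=1.0):
    """Place each attribute group on a big circle and its members on a small one."""
    groups = defaultdict(list)
    for node, value in node_attrs.items():
        groups[value].append(node)

    pos = {}
    values = sorted(groups)
    for g, value in enumerate(values):
        # Centre of this group on the large circle.
        angle = 2 * math.pi * g / len(values)
        cx = group_radius * math.cos(angle)
        cy = group_radius * math.sin(angle)
        members = groups[value]
        for m, node in enumerate(members):
            # Members are spread evenly around the group centre.
            a = 2 * math.pi * m / len(members)
            pos[node] = (cx + node_radius * math.cos(a),
                         cy + node_radius * math.sin(a))
    return pos


ages = {"ann": 30, "bob": 30, "cat": 45, "dan": 45}
pos = positions_by_attribute(ages)
for node, (x, y) in pos.items():
    print(node, round(x, 2), round(y, 2))
```

The two radii control how tightly each group is packed and how far apart the groups sit; they would need tuning for larger graphs.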

I would suggest yet another approach. Create an extra attribute for your nodes that corresponds to a range of values of the attribute you want to use for the grouping. For example, if your attribute is age, then create ranges 18-30, 31-40, etc. Save the result in GraphML format and load the network with NodeXL (which is no longer free, but you can buy it for a small fee).
In NodeXL you can group the nodes by an attribute, and it lays out the groups so that nodes belonging to the same group are placed close to each other. You can also choose, from a list of layout options, how the nodes within a group are laid out.
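Creating the range attribute is a small binning step. The following sketch shows one way to do it for integer ages; the function name `range_label` and the particular boundaries are assumptions for this example.

```python
import bisect


def range_label(value, lower_bounds):
    """Map an integer value to a label such as '18-30', given sorted lower bounds."""
    i = bisect.bisect_right(lower_bounds, value) - 1
    if i < 0:
        return "<{}".format(lower_bounds[0])
    if i == len(lower_bounds) - 1:
        return "{}+".format(lower_bounds[-1])
    return "{}-{}".format(lower_bounds[i], lower_bounds[i + 1] - 1)


bounds = [18, 31, 41]  # ranges 18-30, 31-40, 41 and over
ages = {"ann": 25, "bob": 31, "cat": 67, "dan": 12}
age_ranges = {person: range_label(age, bounds) for person, age in ages.items()}
print(age_ranges)
```

In networkx, a dictionary like `age_ranges` could be attached with `nx.set_node_attributes(G, age_ranges, "age_range")` and the graph saved with `nx.write_graphml(G, path)` before loading it into NodeXL.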

Server Side Google Markers Clustering - Python/Django

After experimenting with a client-side approach to clustering large numbers of Google markers, I decided that it won't work for my project (a social network with 28,000+ users).
Are there any examples of clustering the coordinates on the server side, preferably in Python/Django?
The way I would like this to work is to gradually index the markers based on their proximity (radius) and zoom level.
In other words, when a new user registers, he/she is automatically assigned to a certain 'group' of markers that are close to each other, thus increasing that group's counter. What's being sent from the server is just a small number of 'groups'. Only when the zoom level/scale of the map is 1:1 are the actual users shown on the map.
That way the client side will only have to deal with 10-50 markers per request/zoom level.

This is a paid service that uses server-side clustering, but I'm not sure how it works. I'm guessing that they just use your data to generate the markers to be shown at each zoom level.
Update: This tutorial demonstrates a basic server-side clustering function. It's written in PHP for the Static Maps API, but you could use it as a starting point.

You might want to take a look at the DBSCAN and OPTICS pages on Wikipedia; these look very suitable for clustering places on a map. There is also a page about cluster analysis that lists the possible algorithms you can use; most would be trivial to implement in the language of your choice.
With 28k+ points, you might want to skip Django and jump into C/C++ directly, and certainly don't expect this to be calculated in real time in response to web requests.
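To give a feel for how little code DBSCAN needs, here is a small pure-Python sketch. It is an illustration under simplifying assumptions, not a production implementation: it treats latitude and longitude as flat x, y coordinates with plain Euclidean distance, and it compares every pair of points, so it is quadratic in the number of points and would be slow for 28k markers without a spatial index. The names `dbscan`, `eps` and `min_pts` follow the usual description of the algorithm.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if math.hypot(px - qx, py - qy) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)
```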

One way to do it would be to define a grid with a unit size based on the zoom level. For example, you could collect all the items into grid cells by truncating lat,lon to one decimal place. An example cell is 42.2x73.4, so a point at 42.2003x73.4021 falls in that grid cell. With truncation, that cell spans 42.2 to 42.3 in latitude and 73.4 to 73.5 in longitude.
If there are one or more points in a grid cell, you place a marker in the center of that cell.
You then hook up the zoomend event, change your grid size accordingly, and redraw the markers.
http://code.google.com/apis/maps/documentation/reference.html#GMap2.zoomend
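On the server, the grid approach could be sketched as follows. This is a minimal example under assumptions: the function name `grid_cluster` is made up, the cell size is passed in directly (how it is derived from the zoom level is left open), and each cell is identified by flooring the coordinates to a multiple of the cell size.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Group (lat, lon) points into square cells and return one marker per cell.

    Each marker is placed at the centre of its cell and carries the number
    of points that fell into the cell.
    """
    counts = defaultdict(int)
    for lat, lon in points:
        key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[key] += 1

    markers = []
    for (row, col), count in sorted(counts.items()):
        markers.append({
            "lat": (row + 0.5) * cell_size,
            "lon": (col + 0.5) * cell_size,
            "count": count,
        })
    return markers


points = [(42.2003, 73.4021), (42.2511, 73.4999), (10.05, 20.05)]
markers = grid_cluster(points, cell_size=0.1)
for marker in markers:
    print(marker)
```

A view could call this with a cell size chosen for the requested zoom level and return only the resulting markers, so the client never receives the full list of points.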

You can try my server-side clustering django app:
https://github.com/biodiv/anycluster
It provides k-means and grid clustering.
