I have installed the networkx and matplotlib packages. How can I generate a power-law graph with a given degree correlation, i.e. a graph with a high or low degree of homophily?
Have you looked at the examples on the Networkx site? This example might help you get started.
There are also a number of functions within Networkx which generate random graphs and will probably be helpful. Have a look at the random_powerlaw_tree(....) function detailed in the graph generators section of the documentation.
networkx.generators.barabasi_albert_graph will generate a graph according to the Barabasi-Albert model, which will have a power-law degree distribution.
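For example, a minimal sketch (the parameters n=1000 and m=3 are arbitrary choices) that generates a Barabasi-Albert graph and then checks the degree correlation with nx.degree_assortativity_coefficient:

```python
import networkx as nx

# Barabasi-Albert preferential attachment: 1000 nodes, each new node
# attaches to 3 existing nodes, giving a power-law degree distribution
G = nx.barabasi_albert_graph(n=1000, m=3)

# Degree assortativity measures degree correlation (homophily):
# r > 0 means high-degree nodes tend to link to high-degree nodes, r < 0 the opposite
r = nx.degree_assortativity_coefficient(G)
print(f"degree assortativity: {r:.3f}")
```

To push the assortativity up or down you could then rewire the graph, e.g. with repeated double-edge swaps that are only accepted when they move r in the desired direction.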
I have a graph: the Email-Eu network, which is available here.
The dataset contains a graph of around 1005 nodes, along with the edges that form this giant graph. It also has ground-truth labels for the nodes and their corresponding communities (departments): each node belongs to one of 42 departments.
I want to run a community detection algorithm on the graph to find the corresponding department for each node. My main objective is to find the nodes in the largest community.
So, first I need to find the 42 departments (communities), then find the nodes in the biggest one of them.
I started with the Girvan-Newman algorithm to find the communities. The beauty of Girvan-Newman is that it is easy to implement: at each step I find the edge with the highest betweenness and remove it, until I am left with the 42 departments (communities) I want.
I am struggling to find other community detection algorithms that give me the option of specifying how many communities/partitions to break my graph into.
Is there any community detection function/technique that lets me specify how many communities I need to uncover from my graph? Any ideas are very much appreciated.
I am using Python and NetworkX.
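For reference, a minimal sketch of the Girvan-Newman loop described above, using the girvan_newman generator built into networkx (the edge-list file name is a placeholder for wherever the dataset is saved; note that this is quite slow on a graph of ~1005 nodes):

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Placeholder path: point this at the downloaded Email-Eu edge list
G = nx.read_edgelist("email-Eu-core.txt", nodetype=int)

# girvan_newman yields successively finer partitions, one edge-removal
# round at a time; stop once we reach 42 communities
partitions = girvan_newman(G)
partition = next(partitions)
while len(partition) < 42:
    partition = next(partitions)

biggest = max(partition, key=len)
print(f"{len(partition)} communities; largest has {len(biggest)} nodes")
```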
A (very) partial answer (and solution) to your question is to use the Fluid Communities algorithm, implemented in Networkx as asyn_fluidc.
Note that it works on connected, undirected, unweighted graphs, so if your graph has n connected components, you should run it n times. In fact, this could be a significant issue, since you would need some preliminary knowledge of each component to choose the corresponding k.
Anyway, it is worth a try.
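A minimal sketch of that approach, assuming the same placeholder edge-list file and, as a simplification of the per-component issue above, running k=42 on the largest connected component only:

```python
import networkx as nx
from networkx.algorithms.community import asyn_fluidc

# Placeholder path: point this at the downloaded Email-Eu edge list
G = nx.read_edgelist("email-Eu-core.txt", nodetype=int)

# asyn_fluidc requires a connected graph, so restrict to the largest
# connected component (other components would each need their own k)
largest_cc = max(nx.connected_components(G), key=len)
H = G.subgraph(largest_cc).copy()

communities = list(asyn_fluidc(H, k=42, seed=0))

biggest = max(communities, key=len)
print(f"{len(communities)} communities; largest has {len(biggest)} nodes")
```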
You may want to try pysbm. It is based on networkx and implements different variants of stochastic block models and inference methods.
If you consider switching from networkx to a different Python-based graph package, you may want to look at graph-tool, where you would be able to use the stochastic block model for the clustering task. Another noteworthy package is igraph; you may want to look at How to cluster a graph using python igraph.
The approaches directly available in networkx are rather old-fashioned. If you aim for state-of-the-art clustering methods, you may consider spectral clustering or Infomap. The selection depends on your intended usage of the inferred communities. The task of inferring ground truth from a network falls under an (approximate) No-Free-Lunch theorem, i.e. (roughly) no algorithm exists that returns "better" communities than every other algorithm, if we average the results over all possible inputs.
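If going outside networkx is acceptable, one concrete way to fix the number of communities up front is scikit-learn's spectral clustering on the adjacency matrix. A minimal sketch (the karate-club graph and n_clusters=2 are stand-ins; for your graph you would use n_clusters=42):

```python
import networkx as nx
from sklearn.cluster import SpectralClustering

G = nx.karate_club_graph()          # stand-in for your graph
adj = nx.to_numpy_array(G)          # dense adjacency matrix

# affinity="precomputed" treats adj as a similarity matrix, and
# n_clusters fixes the number of communities in advance
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(adj)        # labels[i] is node i's community
print(labels)
```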
I am not entirely sure of my answer, but maybe you can try this. Are you aware of label propagation? The main idea is that you have some nodes in the graph which are labelled, i.e. they belong to a community, and you want to assign labels to the other, unlabelled nodes in your graph. LPA will spread these labels across the graph and give you a list of nodes and the communities they belong to. These communities will be the same ones that your labelled set of nodes belong to.
So I think you can control the number of communities you extract from the graph by controlling the number of communities you initialise at the beginning. But it is also possible that, after LPA converges, some of the communities you initialised vanish from the graph due to the graph structure and the randomness of the algorithm. There are, however, many variants of LPA where you can control this randomness. I believe this page of sklearn talks about it.
You can read about LPA here and also here
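As far as I know, the LPA variants built into networkx do not accept seed labels, so here is a minimal hand-rolled sketch of the seeded variant described above; the seeded_lpa helper and the karate-club seeds are purely illustrative assumptions, not a library API:

```python
import random
from collections import Counter
import networkx as nx

def seeded_lpa(G, seeds, max_iter=100, rng_seed=0):
    """Spread the labels in `seeds` (node -> community id) to all other nodes."""
    rng = random.Random(rng_seed)
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        nodes = list(G.nodes())
        rng.shuffle(nodes)              # random update order, as in classic LPA
        for n in nodes:
            if n in seeds:              # seed labels stay fixed
                continue
            neigh = [labels[m] for m in G.neighbors(n) if m in labels]
            if not neigh:
                continue                # no labelled neighbours yet
            majority = Counter(neigh).most_common(1)[0][0]
            if labels.get(n) != majority:
                labels[n] = majority
                changed = True
        if not changed:                 # converged
            break
    return labels

# Usage: seed one node per desired community, then propagate
G = nx.karate_club_graph()
labels = seeded_lpa(G, seeds={0: "A", 33: "B"})
print(Counter(labels.values()))
```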
I've been getting familiar with the pymatgen package and need to make phase diagrams. There's a quick tutorial on this web page that goes through how to make a ternary diagram, but I actually want to make a much simpler one of a pure substance.
I have in mind something like this. I've gone through the documentation and done a lot of Google searches but haven't been able to find what I'm looking for. Perhaps it's possible to combine the data from pymatgen with a plotting package like matplotlib?
T-P phase diagrams like those show phase stability with pressure and temperature as the independent variables. The data on the Materials Project was calculated using density functional theory (DFT) at a temperature of 0 K and a pressure of 0 Pa. Unfortunately, it is therefore not possible to create a T-P phase diagram from the MP data.
I have two questions.
1) I have an array like [1,2,3,4,5,5,3,1], and I don't know which distribution it follows. Can I use scipy.stats to calculate the PMF and CDF automatically?
2) Is scipy.stats just a library of distributions? If I want to analyse data, do I have to find a distribution or define one, and manually calculate things like the PMF? Am I understanding correctly?
Well, scipy.stats is not a library for telling you the distribution of your data and calculating the PMF and CDF automatically. It's a library for easing your tasks while estimating the probability distribution. You have to explore your data and find which distribution fits it with the least error, which is the real task here; scipy.stats helps you achieve this, so you don't have to reinvent the wheel, as they say, by writing all the mathematical functions again and again.
To answer the question in your comment: suppose you have a dataset. To get an insight and a starting point for your analysis, plot the data in a histogram (which also shows the distribution of the data); then you can plot different candidate distributions on the same plot using scipy.stats to get a feel for the best fit.
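For instance, a minimal sketch of that histogram-plus-overlay workflow, using the array from the question (the normal distribution here is just one arbitrary candidate; in practice you would overlay several):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.array([1, 2, 3, 4, 5, 5, 3, 1])

# Fit a candidate distribution to the data (here: normal, as an example)
mu, sigma = stats.norm.fit(data)

# Histogram of the data with the fitted density overlaid
plt.hist(data, bins="auto", density=True, alpha=0.5, label="data")
x = np.linspace(data.min(), data.max(), 200)
plt.plot(x, stats.norm.pdf(x, mu, sigma),
         label=f"norm fit (mu={mu:.2f}, sigma={sigma:.2f})")
plt.legend()
plt.show()
```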
Check out this answer, it might help ya...
https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best
So I have a 2D vector field {u(x,y,t), v(x,y,t)} representing the velocities of an unsteady flow at different instants in time. I don't have an analytical description of the flow, just the two components u and v over time.
I am aware of matplotlib.quiver and the answer to this question, which suggests using this for plotting streamlines.
Now I want to also plot a couple of pathlines and streaklines of the vector field.
Is there any tool that is capable of doing this (preferably a Python package)? This seems to be a common task but I couldn't find anything and don't want to waste time on reinventing the wheel.
Currently, there is no functionality in matplotlib to plot streaklines. However, Tom Flannaghan's streamline plotting utility has been improved and merged into the codebase. It will be available in matplotlib version 1.2, which is to be released in the next few weeks.
At present, your best bet is to solve the streakline ODE from the Wikipedia page you linked to. If you want to use Python to do this, you can use scipy.integrate.odeint. This is exactly what matplotlib.axes.streamplot currently does for streamlines.
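A minimal sketch of that approach for a pathline, assuming the field is sampled on a regular (t, y, x) grid; the toy flow below is a hypothetical stand-in for your u and v data:

```python
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import RegularGridInterpolator

# Hypothetical sampled field: u, v with shape (nt, ny, nx) on grids ts, ys, xs
ts = np.linspace(0, 5, 50)
ys = np.linspace(-2, 2, 40)
xs = np.linspace(-2, 2, 40)
T, Y, X = np.meshgrid(ts, ys, xs, indexing="ij")
u = -Y * np.cos(T)   # toy unsteady rotation, stands in for your data
v = X * np.cos(T)

# Interpolators let the integrator query velocity between grid points;
# fill_value=None allows mild extrapolation at the domain edges
u_i = RegularGridInterpolator((ts, ys, xs), u, bounds_error=False, fill_value=None)
v_i = RegularGridInterpolator((ts, ys, xs), v, bounds_error=False, fill_value=None)

def velocity(pos, t):
    """Right-hand side of the pathline ODE: dx/dt = u, dy/dt = v."""
    x, y = pos
    return [u_i((t, y, x))[0], v_i((t, y, x))[0]]

# Pathline: follow a single particle released at (1, 0) at t = 0
times = np.linspace(0, 5, 200)
path = odeint(velocity, [1.0, 0.0], times)   # shape (200, 2): x(t), y(t)

# A streakline at time t* would instead integrate particles released from
# the same point at a series of earlier times up to t*, then connect
# their positions at t*.
```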
I want to create a graph using networkx which has positive or negative degree correlation.
Like a graph for a social network or citations in academic papers etc.
Can you suggest some function for this?
If you are talking about producing a visual graph (diagram), you could look at using matplotlib to generate it. I'm not sure there will be a single function that does what you want (there isn't enough detail in the question), but it's a comprehensive library used in many projects to achieve complex graphing-related tasks.
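For instance, a minimal sketch of drawing a networkx graph with matplotlib (the Barabasi-Albert graph here just stands in for whatever graph you end up generating):

```python
import matplotlib.pyplot as plt
import networkx as nx

G = nx.barabasi_albert_graph(200, 2)   # stand-in for your graph
pos = nx.spring_layout(G, seed=42)     # force-directed node positions
nx.draw(G, pos, node_size=30, width=0.5)
plt.show()
```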