I have a temporal multilayer network in which I want to find communities using modularity maximization.
I was wondering if there is a Python equivalent of MATLAB's GenLouvain for maximizing modularity in community detection?
A preliminary search yielded this library, but the corresponding GitHub repository is gone:
https://pypi.org/project/louvain/
There are several other algorithms, such as the Leiden algorithm (https://www.nature.com/articles/s41598-019-41695-z), that maximize modularity and have Python implementations (https://github.com/vtraag/leidenalg), but I am trying to explore my options at the moment and run different solvers on the supra-modularity matrix I have. So I want to start with the good old GenLouvain and then compare different solvers that have Python implementations.
Does anyone have any suggestions?
It's been a while since I posted this question, but as noted above I found that leidenalg is the best Python implementation for Multilayer Modularity Maximization (MMM); its improvement is an intermediate refinement step that handles arbitrarily disconnected communities appropriately.
Here's a piece of code I use to implement MMM with the configuration null model:
import leidenalg as la

def leiden(G, interslice, resolution):
    ## G: list of igraph graphs, one per time slice, as explained in the leidenalg documentation
    ## interslice: float, interlayer edge weight for the diagonal coupling of consecutive layers
    ## resolution: float, spatial resolution parameter
    layers, interslice_layer, G_full = la.time_slices_to_layers(G, interslice_weight=interslice)
    partitions = [la.RBConfigurationVertexPartition(H,
                                                    weights='weight',
                                                    resolution_parameter=resolution)
                  for H in layers]
    ## the resolution parameter below has to be 0 to recover the original multilayer modularity equation
    interslice_partition = la.RBConfigurationVertexPartition(interslice_layer,
                                                             weights='weight',
                                                             resolution_parameter=0)
    optimiser = la.Optimiser()
    ## diff is the total quality improvement; the memberships are stored in the partition objects
    diff = optimiser.optimise_partition_multiplex(partitions + [interslice_partition])
    return partitions, interslice_partition
Having said that, MMM is not the greatest approach to dynamic community detection. There are other temporal community detection tools in the literature, based on random walks, stochastic block models, and tensor factorization methods, that one should check out.
There is now a Python reimplementation of GenLouvain called PyGenStability available here. It optimizes Markov Stability (a generalization of Modularity) with the Louvain or Leiden algorithms.
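For reference, here is a minimal sketch of calling it, assuming the package exposes a run() function that accepts a scipy sparse adjacency matrix (check the project documentation for the exact signature and constructor options; the random adjacency below is just a stand-in):

import numpy as np
import scipy.sparse as sp
import pygenstability as pgs

A = np.random.rand(50, 50)
A = ((A + A.T) > 1.0).astype(float)        # random symmetric adjacency as a stand-in for your network
results = pgs.run(sp.csr_matrix(A))        # scans a range of Markov scales and optimizes each one
print(results)                             # community assignments and stability values per scale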
I currently use CPMVertexPartition from leidenalg to analyse my networks. By varying the resolution parameter I can find the maximum modularity Q.
My network is dense (beta approx. > 10), and modularity can improve up to 0.5.
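In case it helps, a minimal sketch of that resolution sweep, assuming an igraph graph with a 'weight' edge attribute (the random graph below is just a stand-in for your network):

import igraph as ig
import leidenalg as la

G = ig.Graph.Erdos_Renyi(n=200, p=0.05)        # stand-in for your dense network
G.es['weight'] = 1.0

best_q, best_partition = -1.0, None
for r in [0.01, 0.05, 0.1, 0.5, 1.0]:          # resolution values to scan
    part = la.find_partition(G, la.CPMVertexPartition,
                             weights='weight', resolution_parameter=r)
    q = G.modularity(part.membership, weights='weight')   # modularity of that partition
    if q > best_q:
        best_q, best_partition = q, part
print(best_q, len(best_partition))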
Related
I have been trying to implement reinforcement learning algorithms in Python using different variants like Q-learning, Deep Q-Network, Double DQN and Dueling Double DQN. Consider a cart-pole example: to evaluate the performance of each of these variants, I can think of plotting the sum of rewards against the number of episodes (attaching a picture of the plot) and of looking at the actual graphical output showing how well the pole stays stable while the cart is moving.
But these two evaluations do not really explain quantitatively which variant is better. I am new to reinforcement learning and trying to understand if there are other ways to compare different variants of RL models on the same problem.
I am referring to the colab link https://colab.research.google.com/github/ageron/handson-ml2/blob/master/18_reinforcement_learning.ipynb#scrollTo=MR0z7tfo3k9C for the code on all the variants of cart pole example.
You can find the answer in research papers about those algorithms, because when a new algorithm is proposed we usually need experiments to show evidence that it has an advantage over other algorithms.
The most commonly used evaluation method in research papers on RL algorithms is the average return (note: not reward; return is the accumulated reward, like the score in a game) over timesteps, and there are many ways to average the return, e.g. averaging with respect to different hyperparameters, or, as in the Soft Actor-Critic paper's comparative evaluation, averaging with respect to different random seeds (model initializations):
Figure 1 shows the total average return of evaluation rollouts during
training for DDPG, PPO, and TD3. We train five different instances of
each algorithm with different random seeds, with each performing one
evaluation rollout every 1000 environment steps. The solid curves
corresponds to the mean and the shaded region to the minimum and
maximum returns over the five trials.
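A minimal sketch of that protocol, assuming you have already logged per-seed evaluation returns (the returns array below is a hypothetical placeholder):

import numpy as np
import matplotlib.pyplot as plt

returns = np.random.randn(5, 100).cumsum(axis=1)       # stand-in for logged evaluation returns, one row per seed
steps = np.arange(returns.shape[1]) * 1000             # one evaluation rollout every 1000 environment steps

mean = returns.mean(axis=0)
plt.plot(steps, mean, label='mean over 5 seeds')
plt.fill_between(steps, returns.min(axis=0), returns.max(axis=0), alpha=0.3)   # min/max band over the trials
plt.xlabel('environment steps')
plt.ylabel('average return')
plt.legend()
plt.show()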
And we usually want to compare the performance of many algorithms not only on one task but on a diverse set of tasks (i.e. a benchmark), because algorithms may have some form of inductive bias that makes them better on some kinds of tasks but worse on others, e.g. the Phasic Policy Gradient paper's experimental comparison to PPO:
We report results on the environments in Procgen Benchmark
(Cobbe et al., 2019). This benchmark was designed to be highly
diverse, and we expect improvements on this benchmark to transfer well
to many other RL environments.
I am working on a project with Wasserstein GANs, and more specifically with an implementation of the improved version of Wasserstein GANs. I have two theoretical questions about WGANs regarding their stability and training process. Firstly, the loss function is notoriously correlated with the quality of the generated samples (as stated here). Is there some extra bibliography that supports that argument?
Secondly, during my experimental phase, I noticed that training my architecture using wGANs is much faster than using a simple version of GANs. Is that a common behavior? Is there also some literature analysis about that?
Furthermore, one question about the continuous functions that are guaranteed by using the Wasserstein loss. I am having some issues understanding this concept in practice: what does it mean that the normal GAN loss is not a continuous function?
You can check the Inception Score and the Frechet Inception Distance for now. And also here. The problem is that, since GANs do not have a unified objective function (there are two networks), there is no agreed way of evaluating and comparing GAN models. Instead, people devise metrics that relate the image distribution and the generator distribution.
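For reference, a minimal numpy sketch of the Frechet Inception Distance formula, given Inception activations for real and generated images (the activation arrays below are hypothetical placeholders):

import numpy as np
from scipy import linalg

def frechet_distance(act_real, act_fake):
    mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
    c1 = np.cov(act_real, rowvar=False)
    c2 = np.cov(act_fake, rowvar=False)
    covmean = linalg.sqrtm(c1.dot(c2))
    if np.iscomplexobj(covmean):          # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean)

act_real = np.random.randn(1000, 2048)    # stand-in for Inception pool3 features of real images
act_fake = np.random.randn(1000, 2048)    # stand-in for features of generated images
print(frechet_distance(act_real, act_fake))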
WGAN could be faster due to having a more stable training procedure than a vanilla GAN (Wasserstein metric, weight clipping, and gradient penalty, if you are using it). I don't know if there is a literature analysis of speed, and WGAN may not always be faster than a simple GAN. WGAN cannot find the best Nash equilibrium like a GAN.
Think of two distributions, p and q. If these distributions overlap, i.e. their domains overlap, then the KL or JS divergence is differentiable. The problem arises when p and q don't overlap. As in the WGAN paper's example, say we have two pdfs on a 2D space, P = (0, Z) and Q = (K, Z), where K is different from 0 and Z is sampled from a uniform distribution. If you try to take the derivative of the KL/JS divergence of these two pdfs, you cannot: these divergences behave like a binary indicator function (equal or not), and we cannot take the derivative of such functions. However, if we use the Wasserstein loss or Earth-Mover distance, we can, since we are approximating it as a distance between two points in space. Short story: the normal GAN loss function is continuous iff the distributions overlap; otherwise it is discrete.
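A small numeric illustration of this point, reduced to one dimension for simplicity: when the supports are disjoint, the JS divergence saturates at its maximum no matter how far apart the distributions are, while the Wasserstein distance keeps growing with the shift K:

import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

z = np.random.uniform(0, 1, 10000)
for K in [0.5, 2.0, 10.0]:
    p_samples, q_samples = z, z + K + 1.0          # supports are disjoint once shifted past the overlap
    bins = np.linspace(-1, K + 3, 200)
    p_hist, _ = np.histogram(p_samples, bins=bins, density=True)
    q_hist, _ = np.histogram(q_samples, bins=bins, density=True)
    print(K,
          jensenshannon(p_hist, q_hist) ** 2,          # stays at its maximum (log 2 in nats) for every K
          wasserstein_distance(p_samples, q_samples))  # grows linearly with the shift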
Hope this helps
The most common way to stabilize the training of a WGAN is to replace the weight clipping used in the original WGAN with a gradient penalty (WGAN-GP). This technique seems to outperform the original WGAN. The paper that describes what GP is can be found here:
https://arxiv.org/pdf/1704.00028.pdf
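For reference, a minimal PyTorch sketch of the gradient penalty term described in that paper (the critic and batch shapes are hypothetical; adapt them to your model):

import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)   # per-sample mixing coefficient
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True) # points between real and fake samples
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True, retain_graph=True)[0]
    grads = grads.view(grads.size(0), -1)
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()   # penalize gradient norms away from 1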
Also, if you need any help with how to implement this, you can check a nice repository that I have found here:
https://github.com/kochlisGit/Keras-GAN
There are also other tricks that you can use to improve the overall quality of your generated images, described in the repository. For example:
Add random Gaussian noise to the inputs of the discriminator, decaying over time.
Random/Adaptive Data Augmentations
Separate fake/real batches
etc.
I'm using TensorFlow 1.4 and the Estimator framework.
I'd like to use tf.contrib.estimator.multi_head to create one head (in a model that has one other head) that summarizes the result of N softmax_cross_entropy_with_logits. The N probability distributions are each defined over the same set of classes, but are independent distributions. The summary loss score I'd like to compute is simply the sum of squares of the softmax cross entropy losses.
I can almost use a tf.contrib.estimator.regression_head to compute the summary if I fake a labels vector of N zeros, as the mean squared error against a zero vector is proportional to the sum of the squares of the softmax losses. But this seems kludgy and I'd like a more direct approach.
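For what it's worth, a quick numpy check of the equivalence being relied on here, namely that the mean squared error against a zero label vector is just the sum of squared losses divided by N:

import numpy as np

losses = np.random.rand(7)                      # stand-in for the N softmax cross-entropy losses
sum_of_squares = np.sum(losses ** 2)
mse_vs_zeros = np.mean((losses - np.zeros_like(losses)) ** 2)
assert np.isclose(mse_vs_zeros, sum_of_squares / len(losses))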
It seems that I will need to create my own subclass of _Head implemented in tensorflow.python.estimator.canned.head, and will have to implement the method create_loss which is documented as to be used by framework developers.
Before I start down this path, I'd like to hear if there are alternate approaches I should consider.
I originally started development with Keras and at one time had a multihead model using the functional API. I wonder if perhaps I should return to using Keras, and then create my model_fn using the tf.keras.estimator.model_to_estimator. All things being equal I would prefer to code in pure Tensorflow idioms, but perhaps Keras is the easiest path forward.
I want to evolve a neural network using a genetic algorithm in order to approximate mathematical functions (linear, cubic, sine, tanh, etc). The requirement is that the NN should be evolved in terms of topology, weights and activation function of the neurons. The evaluation function is one that will calculate the error between the output of the NN and the function that should be approximated (mean squared error).
My main concern is that I want to be able to have absolute control over the representation/encoding/genome as I want to evolve layers of neurons, their weights and their activation functions at the same time, so I have to come up with a chromosome that incorporates all of these things.
What tools would you recommend? I'm looking at DEAP for evolving the neural network, and I was thinking of Pylearn2 for implementing the best configuration the GA ends up with.
The state of the art for neural nets is Torch7, which is written in Lua. This is what Facebook AI and Google DeepMind use (as well as my lab). But I guess Lua is not the best language to implement the sampling algorithms.
The alternative state-of-the-art framework is written in Python and is called Theano, but I have a strong preference for Torch. However, both of them might be too much for your task.
An easy alternative could also be Autograd, an automatic numpy differentiation library (https://github.com/HIPS/autograd), which lets you easily train your neural net and have absolute control over it. There are several examples in their documentation.
Finally, genetic algorithms are closely related to Monte Carlo methods, and there are many resources around on implementing such things in Python (e.g. https://codereview.stackexchange.com/questions/41004/organising-code-for-a-genetic-algorithm-in-python or GitHub).
DEAP sounds like a good and popular choice, and PyEvolve is outdated.
Edit: I forgot to mention Keras (http://keras.io/), which actually might be another good alternative and is Theano-based.
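If you do go with DEAP, here is a minimal sketch of evolving a flat weight vector for a tiny fixed-topology network (1-8-1, tanh hidden layer) to fit sin(x), with MSE as the fitness. Evolving the topology and activation functions as well would need a richer genome; this only shows the DEAP plumbing:

import random
import numpy as np
from deap import algorithms, base, creator, tools

X = np.linspace(-np.pi, np.pi, 100)
Y = np.sin(X)
N_WEIGHTS = 8 + 8 + 8 + 1                   # weights and biases of the 1-8-1 net

def forward(weights, x):
    w1 = np.array(weights[0:8]); b1 = np.array(weights[8:16])
    w2 = np.array(weights[16:24]); b2 = weights[24]
    hidden = np.tanh(np.outer(x, w1) + b1)  # shape (100, 8)
    return hidden.dot(w2) + b2

def evaluate(individual):
    return (np.mean((forward(individual, X) - Y) ** 2),)   # DEAP expects a tuple

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)
toolbox = base.Toolbox()
toolbox.register("attr_float", random.uniform, -1.0, 1.0)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_float, n=N_WEIGHTS)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxBlend, alpha=0.5)
toolbox.register("mutate", tools.mutGaussian, mu=0.0, sigma=0.2, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=100)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.3, ngen=50, verbose=False)
best = tools.selBest(pop, k=1)[0]
print(evaluate(best))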
You might also look at Blocks. It's a library built on Theano that manages the computation graph via annotations of variables. It allows richer models (i.e. more complex, more recurrence, more of those weird neural-stack-like structures you daydreamed about) than libraries like Keras, Pylearn2, and Lasagne, although those libraries might be better if you only want to 'evolve' variations of the classical MLP architecture.
I'm trying to use scikit-learn to do some machine learning on natural language data. I've got my corpus transformed into bag-of-words vectors (which take the form of a sparse CSR matrix) and I'm wondering if there's a supervised dimensionality reduction algorithm in sklearn capable of taking high-dimensional, supervised data and projecting it into a lower dimensional space which preserves the variance between these classes.
The high-level problem description is that I have a collection of documents, each of which can have multiple labels on it, and I want to predict which of those labels will get slapped on a new document based on the content of the document.
At its core, this is a supervised, multi-label, multi-class problem using a sparse representation of BoW vectors. Is there a dimensionality reduction technique in sklearn that can handle that sort of data? Are there other sorts of techniques people have used in working with supervised, BoW data in scikit-learn?
Thanks!
I am a bit confused by your question. In my experience, dimensionality reduction is never really supervised... but it seems that what you want is some sort of informed feature selection, which is impossible to do before the classification is done. In other words, you cannot know which features are more informative before your classifier is trained and validated.
However, reducing the size and complexity of your data is always good, and you have various ways to do it with text data. The applicability and performance depends on the type of vectors you have (frequency counts, tfidf) and you will always have to determine the number of dimensions (components) you want in your output. The implementations in scikit-learn are mostly in the decomposition module.
The most popular method in Natural Language Processing is Singular Value Decomposition (SVD), which is at the core of Latent Semantic Analysis (LSA, also LSI). Staying with scikit-learn, you can simply apply TruncatedSVD() on your data. A similar method is Non-negative matrix factorization, implemented in scikit-learn as NMF().
An increasingly popular approach uses transformation by random projections, Random Indexing. You can do this in scikit-learn with the functions in random_projection.
As someone pointed out in another answer, Latent Dirichlet Allocation is also an alternative, although it is much slower and computationally more demanding than the methods above. Besides, it is at the time of writing unavailable in scikit-learn.
If all you want is to simplify your data in order to feed it to a classifier, I would suggest SVD with n_components between 100 and 500, or random projection with n_components between 500 and 2000 (common values from the literature).
If you are interested in using the reduced dimensions as some sort of classification/clustering already (people call this topic extraction, although you are really not extracting topics, rather latent dimensions), then LDA might be a better option. Beware, it is slow and it only takes pure frequency counts (no tfidf). And the number of components is a parameter that you have to determine in advance (no estimation possible).
Returning to your problem, I would make a scikit-learn pipeline with a vectorizer, dimensionality reduction options and a classifier, and would carry out a massive parameter search. In this way, you will see what gives you the best results with the label set you have.
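For example, a minimal sketch of such a pipeline and grid search, shown with a single-label target for simplicity (the synthetic corpus is a placeholder; for the multi-label case, binarize the labels and wrap the classifier in OneVsRestClassifier):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
vocab = ["word%d" % i for i in range(1000)]
docs = [" ".join(rng.choice(vocab, size=50)) for _ in range(200)]   # stand-in for your raw documents
labels = rng.randint(0, 3, size=200)                                # stand-in single-label targets

pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('svd', TruncatedSVD()),
    ('clf', LogisticRegression()),
])
param_grid = {
    'svd__n_components': [50, 100],     # try a few output dimensionalities
    'clf__C': [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(docs, labels)
print(search.best_params_, search.best_score_)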
You can use latent Dirichlet allocation (here's the wiki) to discover the topics in your documents. For the assignment of a label to a document, you can use the conditional probability distribution of a document label given the distribution over the topics in your document. If you already have labels for your documents, then you just need to learn the CPD, which is trivial. Unfortunately, scikit-learn does not have an LDA implementation, but gensim does.
PS: Here's another paper that may help. If you're not very well versed in statistical inference/learning or machine learning, I suggest that you start here (note: it still assumes a high level of mathematical maturity).
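A minimal gensim sketch of that approach (the toy documents below are placeholders for your corpus):

from gensim import corpora, models

raw_documents = ["the cat sat on the mat", "dogs and cats are pets",
                 "stocks fell on the market", "the market rallied today"]   # stand-in corpus
texts = [doc.lower().split() for doc in raw_documents]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(tokens) for tokens in texts]
lda = models.LdaModel(corpus, num_topics=5, id2word=dictionary, passes=5)
doc_topics = lda[corpus[0]]     # list of (topic_id, probability) pairs for the first document
print(doc_topics)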
Several existing scikit modules do something similar to what you asked for.
Linear Discriminant Analysis is probably closest to what you asked for. It finds a projection of the data that maximizes the distance between the class centroids relative to the projected variances.
Cross decomposition includes methods like Partial Least Squares which fit linear regression models for multidimensional targets via a projection through a lower-dimensional intermediate space. It is a lot like a single-hidden-layer neural net without the sigmoids.
These are linear regression methods, but you could apply a 0-1 encoding to your target signal and use these models anyway.
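For example, a minimal sketch of PLS with binarized multi-label targets (X and the label sets below are placeholders; note that PLSRegression needs a dense array, which is fine for a modest corpus):

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import MultiLabelBinarizer

X = np.random.rand(200, 1000)                                   # stand-in for your (densified) BoW matrix
label_sets = [{np.random.randint(0, 10)} for _ in range(200)]   # stand-in per-document label sets
Y = MultiLabelBinarizer().fit_transform(label_sets)             # 0-1 encoding of the labels

pls = PLSRegression(n_components=10)                            # tune n_components on real data
pls.fit(X, Y)
X_reduced = pls.transform(X)                                    # documents projected into the latent space
print(X_reduced.shape)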
You could use an L1-regularized classifier like LogisticRegression or SGDClassifier to do feature selection. RandomizedLogisticRegression combines this with bootstrapping to get a more stable feature set.
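A minimal sketch of the L1 route using SelectFromModel (a simpler, non-bootstrapped variant of the idea above; the sparse matrix and single-label targets below are placeholders):

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

X = sparse_random(200, 5000, density=0.01, format='csr')   # stand-in for a sparse BoW matrix
y = np.random.randint(0, 3, size=200)                      # stand-in single-label targets

l1_clf = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
selector = SelectFromModel(l1_clf)
X_selected = selector.fit_transform(X, y)                  # keeps only features with non-zero coefficients
print(X.shape, '->', X_selected.shape)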
Try Isomap. There's a super simple built-in function for it in scikit-learn. Even if it doesn't have some of the preservation properties you're looking for, it's worth a try.
Use a multi-layer neural net for classification. If you want to see what the representation of the input looks like in the reduced dimension, look at the activations of the hidden layer. The hidden-layer representation is, by construction, optimised to distinguish between the classes, since that is what is directly optimised when the weights are learned.
You should remember to use a softmax activation on the output layer, and something non-linear on the hidden layer (tanh or sigmoid).
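A minimal scikit-learn sketch of that idea, recovering the hidden-layer representation from the fitted weights (X and y below are placeholders; MLPClassifier applies a softmax output automatically for multi-class targets):

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(300, 200)                  # stand-in for your BoW/tfidf matrix
y = np.random.randint(0, 4, size=300)         # stand-in single-label targets

mlp = MLPClassifier(hidden_layer_sizes=(50,), activation='tanh', max_iter=500)
mlp.fit(X, y)
hidden = np.tanh(X.dot(mlp.coefs_[0]) + mlp.intercepts_[0])   # the learned 50-dimensional representation
print(hidden.shape)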