Fisher information for a posterior distribution MCMC - python

The method I am using requires me to calculate the Fisher information of the posterior distribution (with respect to all hyperparameters). What I have at the moment is a Monte Carlo sample from the posterior distribution. I believe I can approximate the Fisher information by the sample mean of the second derivatives, or by the sample covariance of the first derivatives, of the log-posterior, but I am looking for a more efficient way.
It was also suggested to me to use optim(..., Hessian=TRUE), but I do not see how an optimisation routine could help.
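Since the question mentions Python, here is one hedged sketch: by the Bernstein-von Mises heuristic, the posterior covariance approximates the inverse of the observed Fisher information, so inverting the sample covariance of the MCMC draws gives a crude estimate with no optimisation routine at all. The toy draws below stand in for a real MCMC chain; the dimensions and numbers are made up purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for MCMC output: rows are draws, columns are hyperparameters.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0.0, 1.0], [[0.5, 0.1], [0.1, 0.2]], size=5000)

# Posterior sample covariance; by the Bernstein-von Mises heuristic its inverse
# is a crude estimate of the observed Fisher information.
posterior_cov = np.cov(samples, rowvar=False)
fisher_info_estimate = np.linalg.inv(posterior_cov)
print(fisher_info_estimate)
```

This only uses the draws you already have, but it relies on the posterior being approximately Gaussian; if it is strongly skewed or multimodal, the covariance-of-scores approach from the question is safer.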

Related

Estimating parameters of binomial distribution to use as machine learning features

I'm working with genetic data in which alleles were observed n times in t number of chromosomes sequenced. In other words, n successes in t trials.
I want to include an estimate of each allele's frequency as a feature in a machine learning algorithm. I can of course get a point estimate with n/t, but I want to represent the confidence of that point estimate -- i.e. something about the likelihood of that estimate.
Now, I believe the negative binomial (or just binomial) distribution would be the right one to use, but:
How can I estimate the parameters of the distribution in Python?
What representation of the distribution would be ideal as a feature for classical (non-NN) machine learning? A conservative estimate might be the 95% CI upper bound, but how would I calculate that, and is there a better way to featurize the distribution than just taking that one value?
Thanks!
I suppose that all of the required information you need can be calculated by means of standard statistical methods, without applying machine learning.
The MLE of the parameter p of your binomial distribution Bin(t, p) is just n/t, as you correctly suggested. If you want a confidence interval rather than a point estimate, one way to obtain it is the Wald method:
$$\hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{t}}$$
where z is the 1 - 0.5α quantile of a standard normal distribution. You can find more possibilities via the following link, depending on your modelling assumptions: Binomial confidence intervals.
The 95% CI for $\hat{p}$ can be calculated as indicated above with z = 1.96.
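As a sketch, the Wald interval above is straightforward to compute with scipy.stats.norm (the function name wald_interval is just illustrative, not from any library):

```python
import math
from scipy.stats import norm

def wald_interval(n, t, alpha=0.05):
    """Wald confidence interval for a binomial proportion (n successes in t trials)."""
    p_hat = n / t
    z = norm.isf(alpha / 2)  # 1 - alpha/2 quantile, e.g. 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / t)
    return p_hat - half_width, p_hat + half_width

print(wald_interval(20, 100))  # roughly (0.122, 0.278)
```

Note that the Wald interval is known to behave poorly for p near 0 or 1 or for small t, which is exactly where the linked alternatives (Wilson, Clopper-Pearson, ...) are worth considering.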
As for the feature engineering for the machine learning algorithm: since your parametric distribution basically depends only on one estimated parameter p (except for t which is given), you can use it directly as a feature for the unique distribution representation. It is also possible to add CI or variance as additional features of course. Everything depends on what exactly you are going to learn and what is your final objective/criterion is.
Binoculars implements many methods for calculating binomial confidence intervals. (PS: I am the author of Binoculars.)
pip install bincoulars
If N=(total chromosomes sequenced) and p=(number of times allele is observed / N), you can estimate the confidence interval straightforwardly:
from binoculars import binomial_confidence
N, p = 100, 0.2
binomial_confidence(p, N)
# (0.1307892803998113, 0.28628125447599173)

Limiting density of discrete points (LDDP) in python

Shannon's entropy from information theory measures the uncertainty or disorder in a discrete random variable's empirical distribution, while differential entropy measures it for a continuous r.v. The classical definition of differential entropy was found to be wrong, however, and was corrected with the limiting density of discrete points (LDDP). Does scipy or another library compute the LDDP? How can I estimate the LDDP in Python?
Since LDDP is equivalent to the negative KL-divergence from your density function m(x) to your probability distribution p(x), you might be able to use one of the many implementations of KL-divergence, for example from scipy.stats.entropy.
An appropriate procedure (assuming you have finite support) is to approximate the continuous distribution with a discrete one by sampling over its support, and then calculate the KL divergence.
If this is not possible, then the only option I can think of is to use numerical (or possibly analytic?) integration methods, of which you should have plenty. An easy first step would be to try Monte Carlo methods.
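Putting those two suggestions together, a minimal sketch might look like the following. The choice of a standard normal p(x) and a uniform invariant measure m(x) on a finite support is purely illustrative:

```python
import numpy as np
from scipy.stats import entropy, norm, uniform

# Discretise both densities over a common grid covering the (finite) support.
grid = np.linspace(-5, 5, 2001)
p = norm.pdf(grid)                        # the distribution of interest, p(x)
m = uniform.pdf(grid, loc=-5, scale=10)   # the invariant measure / density m(x)

# entropy(p, m) normalises both arrays and computes KL(p || m);
# the LDDP is its negative.
lddp = -entropy(p, m)
print(lddp)
```

For this example the analytic value is $-\mathrm{KL}(p \| m) = h(p) - \log 10 \approx -0.88$, so the discretised estimate can be checked against it; a finer grid tightens the agreement.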

Find underlaying normal distribution of random vectors

I am trying to solve a statistics-related real-world problem with Python and am looking for input on my ideas. I have N random vectors from an m-dimensional normal distribution. I have no information about the mean or the covariance matrix of the underlying distribution; in fact, even that it is a normal distribution is only an assumption, albeit a very plausible one. I want to compute an approximation of the mean vector and covariance matrix of the distribution. The number of random vectors is on the order of 100 to 300, and the dimensionality of the normal distribution is somewhere between 2 and 5. The calculation should ideally take no more than 1 minute on a standard home computer.
I am currently considering three approaches and would welcome suggestions for other approaches, or preferences among these three:
Fitting: Make a multi-dimensional histogram of all random vectors and fit a multi-dimensional normal distribution to the histogram. Problem with this approach: the covariance matrix has many entries, which could be a problem for the fitting process.
Invert the cumulative distribution function: Make a multi-dimensional histogram as an approximation of the density of the random vectors, then integrate it to obtain a multi-dimensional cumulative distribution function. In one dimension this is invertible, and one could use the CDF to generate random numbers distributed like the original distribution. Problem: in the multi-dimensional case the CDF is not invertible(?), and I don't know whether this approach still works.
Bayesian: Use Bayesian statistics with some normal distribution as a prior and update it for every observation. The result should always again be a normal distribution. Problem: I think this is computationally expensive. Also, I don't want the later updates to have more impact on the resulting distribution than the earlier ones.
Also, maybe there is a library that already implements this task? I did not find exactly this in NumPy or SciPy; maybe someone has an idea where else to look?
If the simple estimates described in the section Parameter estimation of the wikipedia article on the multivariate normal distribution are sufficient for your needs, you can use numpy.mean to compute the mean and numpy.cov to compute the sample covariance matrix.
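A minimal sketch of those two calls, with made-up data standing in for your N random vectors:

```python
import numpy as np

# Toy data: 200 draws from a hypothetical 3-dimensional normal distribution.
rng = np.random.default_rng(42)
true_mean = np.array([1.0, -2.0, 0.5])
true_cov = np.array([[1.0, 0.3, 0.0],
                     [0.3, 2.0, 0.5],
                     [0.0, 0.5, 1.5]])
vectors = rng.multivariate_normal(true_mean, true_cov, size=200)

# Standard estimates: sample mean and sample covariance matrix.
mean_estimate = np.mean(vectors, axis=0)
cov_estimate = np.cov(vectors, rowvar=False)  # rows are observations
print(mean_estimate)
print(cov_estimate)
```

With 100 to 300 vectors in 2 to 5 dimensions this runs in microseconds, far within the 1-minute budget, so none of the heavier approaches should be necessary unless the normality assumption fails.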

Studentized range statistic (q*) in Python Scipy

I am wondering if it is possible to find the Studentized range statistic (q*) in Python Scipy lib as an input into Tukey's HSD calculation, similar to interpolating a table such as this (http://cse.niaes.affrc.go.jp/miwa/probcalc/s-range/srng_tbl.html#fivepercent) or pulling from a continuous distribution.
I have found some guidance here (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.tukeylambda.html#scipy.stats.tukeylambda), but I am lost on how to input the df (degrees of freedom) or k (number of sample groups).
I am looking for something like the critical F or critical t statistic, which can be obtained via
scipy.stats.f.isf(alpha, df-between, df-within)
or
scipy.stats.t.isf(alpha, df).
from statsmodels.stats.libqsturng import psturng, qsturng
provides the tail probability (psturng, essentially a survival function) and the quantile function (qsturng, the inverse of the CDF).
It was written by Roger Lew as a package for interpolating the distribution of the studentized range statistic and was incorporated in statsmodels for use in tukeyhsd.
Until now it has only been used internally in statsmodels, so you would have to check the limitations and explanation in libqsturng.
As reference, statsmodels has a tukeyhsd function and a MultipleComparison class.
http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html
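A short sketch of how qsturng and psturng might be used to get the critical value; the specific numbers (k = 3 groups, df = 10) are just an example:

```python
from statsmodels.stats.libqsturng import psturng, qsturng

k, df, alpha = 3, 10, 0.05  # number of groups, error degrees of freedom, level

# qsturng takes the cumulative probability, so use 1 - alpha for the upper
# critical value, analogous to scipy.stats.t.isf(alpha, df).
q_crit = qsturng(1 - alpha, k, df)
print(q_crit)  # tabulated value is about 3.88

# psturng goes the other way: tail probability of an observed statistic.
p_tail = psturng(q_crit, k, df)
print(p_tail)
```

Since libqsturng interpolates tabulated values, expect small deviations from printed tables, and check its documented ranges of k and df before relying on it.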

Sampling methods

Can you help me out with these questions? I'm using Python.
Sampling Methods
Sampling (or Monte Carlo) methods form a general and useful set of techniques that use random numbers to extract information about (multivariate) distributions and functions. In the context of statistical machine learning, we are most often concerned with drawing samples from distributions to obtain estimates of summary statistics such as the mean value of the distribution in question.
When we have access to a uniform (pseudo-)random number generator on the unit interval (rand in Matlab or runif in R), we can use the transformation sampling method described in Bishop Sec. 11.1.1 to draw samples from more complex distributions. Implement the transformation method for the exponential distribution
$$p(y) = \lambda \exp(-\lambda y), \quad y \geq 0$$
using the expressions given at the bottom of page 526 in Bishop. (Slice sampling involves augmenting z with an additional variable u and then drawing samples from the joint (z, u) space.)
The crucial point of sampling methods is how many samples are needed to obtain a reliable estimate of the quantity of interest. Let us say we are interested in estimating the mean, which is
$$\mu_y = 1/\lambda$$
in the above distribution, we then use the sample mean
$$b_y = \frac{1}{L} \sum_{\ell=1}^{L} y^{(\ell)}$$
of the L samples as our estimator. Since we can generate as many sample sets of size L as we want, we can investigate how this estimate converges on average to the true mean. To do this properly we need to take the absolute difference
$$|\mu_y - b_y|$$
between the true mean $\mu_y$ and the estimate $b_y$,
averaged over many, say 1000, repetitions, for several values of $L$, say 10, 100, 1000.
Plot the expected absolute deviation as a function of $L$.
Can you plot some transformed value of expected absolute deviation to get a more or less straight line and what does this mean?
I'm new to this kind of statistical machine learning and really don't know how to implement it in Python. Can you help me out?
There are a few shortcuts you can take. Python has some built-in methods to do sampling, mainly in the Scipy library. I can recommend a manuscript that implements this idea in Python (disclaimer: I am the author), located here.
It is part of a larger book, but this isolated chapter deals with the more general Law of Large Numbers + convergence, which is what you are describing. The paper deals with Poisson random variables, but you should be able to adapt the code to your own situation.
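For reference, a minimal sketch of the exercise itself: the inverse CDF of the exponential distribution is $y = -\ln(1-u)/\lambda$, which is the expression Bishop refers to, and the sample sizes and repetition count below follow the question:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.0
true_mean = 1.0 / lam  # mean of the exponential distribution

def exp_samples(L):
    """Transformation method: map uniform draws through the inverse CDF."""
    u = rng.uniform(size=L)
    return -np.log(1.0 - u) / lam

# Expected absolute deviation of the sample mean for several sample sizes L,
# averaged over 1000 repetitions each.
Ls = [10, 100, 1000]
deviations = []
for L in Ls:
    devs = [abs(true_mean - exp_samples(L).mean()) for _ in range(1000)]
    deviations.append(float(np.mean(devs)))
print(deviations)
```

Plotting the logarithm of the expected absolute deviation against log L should give a roughly straight line with slope about -1/2, reflecting the usual $L^{-1/2}$ Monte Carlo convergence of the sample mean, which is the "more or less straight line" the question asks about.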
