Defining priors and marginalizing over priors in pymc - python

I am going through the tutorial on the Markov chain Monte Carlo (MCMC) process with the pymc library. I am a newbie with pymc and am trying to set up my own MCMC process. I have run into a couple of questions I couldn't find proper answers to in the pymc tutorial:
First: how can we define priors with pymc and then marginalize over those priors during the chain process?
My second question is about the Dirichlet distribution: how is this distribution related to the prior information in MCMC, and how should it be defined?

I recommend following the PyMC user's guide. It explicitly shows you how to specify your model (including priors). With MCMC, you end up with the marginal posteriors of all quantities in the model, so you don't need to do any explicit marginalization over priors.
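As a minimal sketch of what specifying a prior looks like (using the current PyMC API; the Bernoulli data and the Beta(1, 1) prior below are hypothetical, not from the original question):

import pymc as pm

# Hypothetical binary observations
data = [1, 0, 1, 1, 0, 1, 1, 1]

with pm.Model() as model:
    # Prior on the success probability
    p = pm.Beta("p", alpha=1, beta=1)
    # Likelihood of the observed data
    y = pm.Bernoulli("y", p=p, observed=data)
    # The trace of p drawn here is already its marginal posterior,
    # so no separate marginalization step is required
    trace = pm.sample(2000, tune=1000)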
The Dirichlet is often used as a prior to multinomial probabilities in Bayesian models. The values of the Dirichlet parameters can be used to encode prior information, typically in terms of a notional number of prior events corresponding to each element of the multinomial. For example, a Dirichlet with a vector of ones as the parameters is just a generalization of a Beta(1,1) prior to multinomial quantities.
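As a hedged sketch of that Dirichlet-multinomial setup in PyMC (the three-category counts below are made up for illustration):

import numpy as np
import pymc as pm

# Hypothetical observed counts for three categories
counts = np.array([12, 7, 21])

with pm.Model() as model:
    # Dirichlet(1, 1, 1): the multinomial analogue of a Beta(1, 1) prior
    theta = pm.Dirichlet("theta", a=np.ones(3))
    y = pm.Multinomial("y", n=counts.sum(), p=theta, observed=counts)
    trace = pm.sample(2000, tune=1000)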

Related

How to provide custom gradients to HMC sampler in tensorflow-probability?

I am trying to use the built-in HMC sampler of tensorflow-probability to generate samples from the posterior. According to the documentation, one has to provide the (possibly unnormalized) log density of the posterior as the target_log_prob_fn callable, and tensorflow-probability automatically computes its gradient (with respect to the parameters to be inferred) to perform the Hamiltonian MCMC updates.
However, for my application the likelihood and the gradient of the resulting posterior are computed outside of TensorFlow (they involve the solution of a partial differential equation, which I can compute efficiently with another Python library). So I was wondering: is there a way to pass target_log_prob_fn the (unnormalized) log density of the posterior together with its gradient, so that the HMC sampler uses the gradients I provide to perform the MCMC updates?
I found a related question over here, but it does not exactly answer my question.
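One possible approach, sketched here only as an assumption rather than a confirmed answer, is to wrap the externally computed log density and gradient with tf.custom_gradient so that the HMC kernel's automatic differentiation picks up the user-supplied gradient; external_log_prob and external_grad below are hypothetical stand-ins for the PDE-based computation:

import tensorflow as tf
import tensorflow_probability as tfp

# Hypothetical stand-ins for the log density and gradient computed
# outside TensorFlow (e.g. via an external PDE solver)
def external_log_prob(x_np):
    return float(-0.5 * (x_np ** 2).sum())

def external_grad(x_np):
    return -x_np

@tf.custom_gradient
def target_log_prob_fn(x):
    # Evaluate the external log density through py_function
    logp = tf.py_function(lambda v: external_log_prob(v.numpy()), [x], tf.float32)
    logp.set_shape([])

    def grad(dy):
        # Return the externally computed gradient instead of autodiff
        g = tf.py_function(lambda v: external_grad(v.numpy()), [x], tf.float32)
        g.set_shape(x.shape)
        return dy * g

    return logp, grad

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob_fn,
    step_size=0.1,
    num_leapfrog_steps=5)

samples = tfp.mcmc.sample_chain(
    num_results=100,
    current_state=tf.zeros([2]),
    kernel=kernel,
    trace_fn=None)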

Distributed Lag Model in Python

I have had a quick look for a distributed lag model in statsmodels but can't find one. The closest I found is the VAR model. Can I transform a VAR model into a distributed lag model, and how? It would also be great to know of other packages that already have a distributed lag model, so please let me know if there are any.
Thanks!
If you are using a finite distributed lag model, just use OLS or FGLS, with the lagged predictors forming the covariate matrix, and some parameterized model of autocorrelation (if using FGLS).
If your target variable is vector-valued, then the same advice applies and it just becomes a multiple regression problem, with a separate regression for each component of the output, and possibly additional covariance structure if there is correlation between error terms across components of the target.
It does not appear there is a standard statistics package in Python that implements this directly, likely because it would boil down to FGLS in almost any practical situation.
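As a hedged sketch of that approach with statsmodels (the simulated series, the two-lag structure, and the coefficients are all hypothetical):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data for a finite distributed lag regression of y_t on x_t, x_{t-1}, x_{t-2}
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200), name="x")

df = pd.DataFrame({"x": x, "x_lag1": x.shift(1), "x_lag2": x.shift(2)}).dropna()
df["y"] = (1.0 + 0.5 * df["x"] + 0.3 * df["x_lag1"] + 0.1 * df["x_lag2"]
           + rng.normal(scale=0.5, size=len(df)))

X = sm.add_constant(df[["x", "x_lag1", "x_lag2"]])
ols_res = sm.OLS(df["y"], X).fit()
print(ols_res.params)

# If the errors are autocorrelated, GLSAR provides a simple FGLS-style alternative
gls_res = sm.GLSAR(df["y"], X, rho=1).iterative_fit()
print(gls_res.params)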

Scikit-learn: why does KernelDensity.score() provide a different result than GaussianMixture.score()?

Comparing a parametric statistical modelling method (Gaussian mixture modelling, GMM) with a nonparametric one (kernel density estimation, KDE), I wanted to use the Akaike Information Criterion (AIC) to take into account the parametric components used in Gaussian mixture modelling.
Scikit-learn's GaussianMixture class contains an .aic(X) method that returns the AIC directly; the KernelDensity class does not. While programming the algorithm I noticed that the .score(X) method works differently for the two classes.
GaussianMixture.score(X) returns:
log_likelihood : float
Log likelihood of the Gaussian mixture given X.
KernelDensity.score(X) returns:
logprob : float
Total log-likelihood of the data in X.
While not directly a problem for my code, I was wondering whether this was done on purpose and, if so, what that purpose would be. It led to misleading results at first, before I found the difference, which is why I am posting this on Stack Overflow in case other people run into the same issue.
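For reference, a small sketch of the difference (in current scikit-learn releases GaussianMixture.score returns the mean per-sample log-likelihood while KernelDensity.score returns the total; the data and hyperparameters below are arbitrary):

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

X = np.random.default_rng(0).normal(size=(500, 1))

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
kde = KernelDensity(bandwidth=0.3).fit(X)

# score_samples puts both models on the same per-sample scale
print(gmm.score(X), gmm.score_samples(X).mean())   # mean log-likelihood
print(kde.score(X), kde.score_samples(X).sum())    # total log-likelihood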

Weighted Linear Regression- R to Python - Statsmodels

I'm attempting to translate R code into Python and running into trouble trying to replicate the R lm{stats} function, which accepts a 'weights' argument so that weights can be used in the fitting process.
My ultimate goal is simply to run a weighted linear regression in Python using the statsmodels library.
Searching through the statsmodels issues, I've located "caseweights in linear models" (#743) and "SUMM/ENH rare events, unbalanced sample, matching, weights" (#2701), which make me think this may not be possible with statsmodels.
Is it possible to add weights to GLM models in statsmodels, or alternatively, is there a better way to run a weighted linear regression in Python?
WLS provides weights for the linear model, where the weights are interpreted as inverse variances in the result statistics.
http://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.WLS.html
The unreleased version of statsmodels has frequency weights for GLM, but no variance weights.
see freq_weights in http://www.statsmodels.org/dev/generated/statsmodels.genmod.generalized_linear_model.GLM.html
(There are many open issues to expand the types of weights and adding weights to other models, but those are not available yet.)
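A hedged sketch of both options with simulated data (the weights, the sample size, and the Gaussian family below are illustrative choices):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = sm.add_constant(x)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)
w = rng.uniform(0.5, 2.0, size=100)   # weights treated as inverse variances

# Weighted least squares
wls_res = sm.WLS(y, X, weights=w).fit()
print(wls_res.params)

# GLM with frequency weights (available in newer statsmodels releases)
freq = rng.integers(1, 4, size=100)
glm_res = sm.GLM(y, X, family=sm.families.Gaussian(), freq_weights=freq).fit()
print(glm_res.params)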

Lasso Generalized linear model in Python

I would like to fit a generalized linear model with negative binomial link function and L1 regularization (lasso) in python.
Matlab provides the nice function:
lassoglm(X,y, distr)
where distr can be poisson, binomial etc.
I had a look at both statsmodels and scikit-learn, but I did not find any ready-to-use function or example that could point me towards a solution.
In Matlab it seems they minimize this:
min (1/N * Deviance(β0,β) + λ * sum(abs(β)) )
where deviance depends on the link function.
Is there a way to implement this easily with scikit-learn or statsmodels, or should I go for cvxopt?
statsmodels has had a fit_regularized method for the discrete models, including NegativeBinomial, for some time:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.NegativeBinomial.fit_regularized.html
which doesn't have a docstring (I just noticed). The docstring for Poisson has the same information: http://statsmodels.sourceforge.net/devel/generated/statsmodels.discrete.discrete_model.Poisson.fit_regularized.html
and there should be some examples available in the documentation or the unit tests.
It uses an interior-point algorithm with either scipy's slsqp or, optionally, cvxopt if it is installed. Compared to steepest descent or coordinate descent methods, this is only appropriate when the number of features/explanatory variables is not too large.
Coordinate descent with an elastic net penalty for GLM is in a work-in-progress pull request and will most likely be available in statsmodels 0.8.
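As a hedged sketch of that route (the simulated design, the true coefficients, and the penalty weight alpha=1.0 are arbitrary choices):

import numpy as np
import statsmodels.api as sm

# Simulated overdispersed count data
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 5)))
beta = np.array([0.5, 1.0, 0.0, 0.0, -0.5, 0.0])
mu = np.exp(X @ beta)
size = 2.0                                # NB dispersion: var = mu + mu**2 / size
y = rng.negative_binomial(size, size / (size + mu))

model = sm.NegativeBinomial(y, X)
# L1-penalized fit; alpha is the penalty weight (scalar or per-parameter array)
res = model.fit_regularized(method="l1", alpha=1.0, trim_mode="auto")
print(res.params)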
