How does pymc represent the prior distribution and likelihood function?

How does pymc represent the prior distribution and likelihood function? - python

If pymc implements the Metropolis-Hastings algorithm to come up with samples from the posterior density over the parameters of interest, then in order to decide whether to move to the next state in the markov chain it must be able to evaluate something proportional to the posterior density for all given parameter values.
The posterior density is proportion to the likelihood function based on the observed data times the prior density.
How are each of these represented within pymc? How does it calculate each of these quantities from the model object?
I wonder if anyone can give me a high level description of the approach or point me to where I can find it.

To represent the prior, you need an instance of the Stochastic class, which has two primary attributes:
value : the variable's current value
logp : the log probability of the variable's current value given the values of its parents
You can initialize a prior with the name of the distribution you are using.
To represent the likelihood, you need a so-called Data Stochastic. That is, an instance of class Stochastic whose observed flag is set to True. The value of this variable cannot be changed and it will not be sampled. Again, you can initialize the likelihood with the name of the distribution you are using (but don't forget to set the observed flag to True).
Say we have the following setup:
import pymc as pm
import numpy as np
import theano.tensor as t
x = np.array([1,2,3,4,5,6])
y = np.array([0,1,0,1,1,1])
We can run a simple logistic regression with the following:
with pm.Model() as model:
#Priors
b0 = pm.Normal("b0", mu=0, tau=1e-6)
b1 = pm.Normal("b1", mu=0, tau=1e-6)
#Likelihood
z = b0 + b1 * x
yhat = pm.Bernoulli("yhat", 1 / (1 + t.exp(-z)), observed=y)
# Sample from the posterior
trace = pm.sample(10000, pm.Metropolis())
Most of the above came from Chris Fonnesbeck's iPython notebook here.

Related

Posterior predictive check in PyMC3, discrete distributions

I want to implement a basic model of capture and recapture in PyMC3 (you capture 100 animals and mark them, then you liberate them and recapture 100 of them after they have mixed and annotate how many are marked). This is my code:
import numpy as np
import pymc3 as pm
import arviz as az
# Datos:
K = 100 #marked animals in first round
n = 100 #captured animals in second round
obs = 10 #observed marked in second round
with pm.Model() as my_model:
N = pm.DiscreteUniform("N", lower=K, upper=10000)
likelihood = pm.HyperGeometric('likelihood', N=N, k=K, n=n, observed=obs)
trace = pm.sample(10000)
print(pm.summary(trace))
print(trace['N'])
ppc = pm.sample_posterior_predictive(trace, 100, var_names=["N"])
data_ppc = az.from_pymc3(trace=trace, posterior_predictive=ppc) #create inference data
az.plot_ppc(data_ppc, figsize=(12, 6))
But I obtain the error in plot_ppc that 'var names: "[\'likelihood\'] are not present" in dataset'. Also the warning posterior predictive variable N's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.
What is happening and what can I do to obtain a posterior predictive check plot?

The root of all the problems is in this line ppc = pm.sample_posterior_predictive(trace, 100, var_names=["N"]).
By using var_names=["N"] you are indicating PyMC to "sample" only the variable N which is actually a latent variable that was sampled while sampling the posterior in the pm.sample call. Doing this in the pm.sample_posterior_predictive call is indicating PyMC to not sample the observed variable (likelihood in this case) and to just copy the samples for N to the posterior predictive too. You will see that data_ppc is an InferenceData object with multiple groups, N is already in the posterior group (like it is in the trace object).
By using 100 (aka samples=100 as a positional argument) you are indicating PyMC to draw posterior predictive samples only for the first 100 draws of the first chain. This is a bad idea, so ArviZ prints a warning when converting to InferenceData. You should generate one posterior predictive sample per posterior sample, only generating samples for a subset of the posterior if posterior predictive sampling were very slow.
My recommendation, which also applies as a general rule, is to trust PyMC defaults unless you have a reason not to or want things to give the same result with multiple versions. We update the defaults from time to time to try and keep them coherent with best practices which are updated and improve over the time. You should therefore do: ppc = pm.sample_posterior_predictive(trace). PyMC will default to sampling the likelihood variable only and to generate one sample per posterior draw.

Incorporate known data moments into GPFlow fitting

I have recently been working with gpflow, in-particular Gaussian process regression, to model a process for which I have access to approximated moments for each input. I have a vector of input values X of size (N,1) and a vector of responses Y of size (N,1). However, I also know, for each (x,y) pair, an approximation of the associated variance, skewness, kurtosis and so on for the particular y value.
From this, I know properties that inform me of appropriate likelihoods to use for each data point.
In the simplest case, I just assume all likelihoods are Gaussian, and specify the variance at each point. I've created a minimal example of my code by adapting the tutorial on: https://nbviewer.jupyter.org/github/GPflow/GPflow/blob/develop/doc/source/notebooks/advanced/varying_noise.ipynb#Demo-2:-grouped-noise-variances.
import numpy as np
import gpflow
def generate_data(N=100):
X = np.random.rand(N)[:, None] * 10 - 5 # Inputs, shape N x 1
F = 2.5 * np.sin(6 * X) + np.cos(3 * X) # Mean function values
groups = np.arange( 0, N, 1 ).reshape(-1,1)
NoiseVar = np.array([i/100.0 for i in range(N)])[groups]
Y = F + np.random.randn(N, 1) * np.sqrt(NoiseVar) # Noisy data
return X, Y, groups, NoiseVar
# Get data
X, Y, groups, NoiseVar = generate_data()
Y_data = np.hstack([Y, groups])
# Generate one likelihood per data-point
likelihood = gpflow.likelihoods.SwitchedLikelihood( [gpflow.likelihoods.Gaussian(variance=NoiseVar[i]) for i in range(Y.shape[0])])
# model construction (notice that num_latent is 1)
kern = gpflow.kernels.Matern52(input_dim=1, lengthscales=0.5)
model = gpflow.models.VGP(X, Y_data, kern=kern, likelihood=likelihood, num_latent=1)
# Specify the likelihood as non-trainable
model.likelihood.set_trainable(False)
# build the natural gradients optimiser
natgrad_optimizer = gpflow.training.NatGradOptimizer(gamma=1.)
natgrad_tensor = natgrad_optimizer.make_optimize_tensor(model, var_list=[(model.q_mu, model.q_sqrt)])
session = model.enquire_session()
session.run(natgrad_tensor)
# update the cache of the variational parameters in the current session
model.anchor(session)
# Stop Adam from optimising the variational parameters
model.q_mu.trainable = False
model.q_sqrt.trainable = False
# Create Adam tensor
adam_tensor = gpflow.train.AdamOptimizer(learning_rate=0.1).make_optimize_tensor(model)
for i in range(200):
session.run(natgrad_tensor)
session.run(adam_tensor)
# update the cache of the parameters in the current session
model.anchor(session)
print(model)
The above code works for a gaussian likelihood, and known variances. Inspecting my real data, I see that it is skewed very often and as a result, I want to use non-gaussian likelihoods to model it, but am unsure how to specify these other likelihood parameters given what I know.
So my question is: Given this setup, how can I adapt my code so far to include non-Gaussian likelihoods at each step, in-particular specifying and fixing their parameters based on my known variances, skewness, kurtosis and so on associated with each individual y value?

Firstly, you will need to choose which non-Gaussian likelihood you use. GPflow includes various ones in likelihoods.py. You then need to adapt the line
likelihood = gpflow.likelihoods.SwitchedLikelihood(
[gpflow.likelihoods.Gaussian(variance=NoiseVar[i]) for i in range(Y.shape[0])]
)
to give a list of your non-Gaussian likelihoods.
Which likelihood can take advantage of your skewness and kurtosis information is a statistical question. Depending on what you come up with, you may need to implement your own likelihood class, which can be done by inheriting from Likelihood. You should be able to follow some other examples from likelihoods.py.

How is the p value calculated for multiple variables in linear regression?

I am wondering how the p value is calculated for various variables in a multiple linear regression. I am sure upon reading several resources that <5% indicates the variable is significant for the model. But how is the p value calculated for each and every variable in the multiple linear regression?
I tried to see the statsmodels summary using the summary() function. I can just see the values. I didn't find any resource on how p value for various variables in a multiple linear regression is calculated.
import statsmodels.api as sm
nsample = 100
x = np.linspace(0, 10, 100)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
e = np.random.normal(size=nsample)
X = sm.add_constant(X)
y = np.dot(X, beta) + e
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
This question has no error but requires an intuition on how p value is calculated for various variables in a multiple linear regression.

Inferential statistics work by comparison to known distributions. In the case of regression, that distribution is typically the t-distribution
You'll notice that each variable has an estimated coefficient from which an associated t-statistic is calculated. x1 for example, has a t-value of -0.278. To get the p-value, we take that t-value, place it on the t-distribution, and calculate the probability of getting a value as extreme as the t-value you calculated. You can gain some intuition for this by noticing that the p-value column is called P>|t|
An additional wrinkle here is that the exact shape of the t-distribution depends on the degrees of freedom
So to calculate a p-value, you need 2 pieces of information: the t-statistic and the residual degrees of freedom of your model (97 in your case)
Taking x1 as an example, you can calculate the p-value in Python like this:
import scipy.stats
scipy.stats.t.sf(abs(-0.278), df=97)*2
0.78160405761659357
The same is done for each of the other predictors using their respective t-values

Get the distribution when a point is sampled from a mixture in PyMC3

I have a model with a pm.NormalMixture(), and when I sample from the normal mixture, I also want to know which of the mixed distributions that point is being sampled from.
import numpy as np
import pymc3 as pm
obs = np.concatenate([np.random.normal(5,1,100),
np.random.normal(10,2,200)])
with pm.Model() as model:
mu = pm.Normal('mu', 10, 10, shape=2)
sd = pm.Normal('sd', 10, 10, shape=2)
x = pm.NormalMixture('x', mu=mu, sd=sd, observed=obs)
I sample from that model, then use that trace to sample from the posterior predictive distribution, and what I want to know is for each x in the posterior predictive trace, which of the two normal distributions being sampled from it belongs to. Is that possible in PyMC3 without doing it manually?

This example demonstrates how posterior predictive checks (PPCs) work. The gist of a PPC is that you first draw random samples from the trace. The trace is essentially always multivariate, and in your model a single sample would be defined by the vector (mu[i,0], mu[i,1], sd[i,0], sd[i,1]). Then, for each trace sample, generate random numbers from the distribution specified for the likelihood with its parameter values equal to those from the trace samples. In your case, this would be NormalMixture(mu[i,:], sd[i,:]). In your model, x is the likelihood function, not an individual point of the trace.
Some practical notes:
You haven't specified a weighting variable, so I'm assuming by default it forces the normal distributions to be weighted equally (I haven't tested this).
The odds of a given point coming from one distribution or the other is just the ratio between the probability densities at that point.
Check out this for recommendations on how to choose priors. For example, your SD prior is placing a lot of weight on very large SDs, which would bias your results, especially for smaller datasets.
Good luck!

Posterior Predictive Check on PyMC3 Deterministic Variable

TL; DR
What's the right way to do posterior predictive checks on pm.Deterministic variables that take stochastics (rendering the deterministic also stochastic) as input?
Too Short; Didn't Understand
Say we have a pymc3 model like this:
import pymc3 as pm
with pm.Model() as model:
# Arbitrary, trainable distributions.
dist1 = pm.Normal("dist1", 0, 1)
dist2 = pm.Normal("dist2", dist1, 1)
# Arbitrary, deterministic theano math.
val1 = pm.Deterministic("val1", arb1(dist2))
# Arbitrary custom likelihood.
cdist = pm.DensityDistribution("cdist", logp(val1), observed=get_data())
# Arbitrary, deterministic theano math.
val2 = pm.Deterministic("val2", arb2(val1))
I may be misunderstanding, but my intention is for the posteriors of dist1 and dist2 to be sampled, and for those samples to fed into the deterministic variables. Is the posterior predictive check only possible on observed random variables?
It's straightforward to get posterior predictive samples from dist2 and other random variables using pymc3.sampling.sample_ppc, but the majority of my model's value is derived from the state of val1 and val2, given those samples.
The problem arises in that pm.Deterministic(.) seems to return a th.TensorVariable. So, when this is called:
ppc = pm.sample_ppc(_trace, vars=[val1, val2])["val1", "val2"]
...and pymc3 attempts this block of code in pymc3.sampling:
410 for var in vars:
--> 411 ppc[var.name].append(var.distribution.random(point=param,
412 size=size))
...it complains because a th.TensorVariable obviously doesn't have a .distribution.
So, what is the right way to carry the posterior samples of stochastics through deterministics? Do I need to explicitly create a th.function that takes stochastic posterior samples and calculates the deterministic values? That seems silly given the fact that pymc3 already has the graph in place.

Yes, I was misunderstanding the purpose of .sample_ppc. You don't need it for unobserved variables because those have samples in the trace. Observed variables aren't sampled from, because their data is observed, thus you need sample_ppc to generate samples.
In short, I can gather samples of the pm.Deterministic variables from the trace.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.