Bayesian Correlation with PyMC3

I'm trying to convert this example of Bayesian correlation for PyMC2 to PyMC3, but get completely different results. Most importantly, the mean of the multivariate Normal distribution quickly goes to zero, whereas it should be around 400 (as it is for PyMC2). Consequently, the estimated correlation quickly goes towards 1, which is wrong as well.
The full code is available in this notebook for PyMC2 and in this notebook for PyMC3.
The relevant code for PyMC2 is
def analyze(data):
    # priors might be adapted here to be less flat
    mu = pymc.Normal('mu', 0, 0.000001, size=2)
    sigma = pymc.Uniform('sigma', 0, 1000, size=2)
    rho = pymc.Uniform('r', -1, 1)

    @pymc.deterministic
    def precision(sigma=sigma, rho=rho):
        ss1 = float(sigma[0] * sigma[0])
        ss2 = float(sigma[1] * sigma[1])
        rss = float(rho * sigma[0] * sigma[1])
        return np.linalg.inv(np.mat([[ss1, rss], [rss, ss2]]))

    mult_n = pymc.MvNormal('mult_n', mu=mu, tau=precision, value=data.T, observed=True)

    model = pymc.MCMC(locals())
    model.sample(50000, 25000)
My port of the above code to PyMC3 is as follows:
def precision(sigma, rho):
    C = T.alloc(rho, 2, 2)
    C = T.fill_diagonal(C, 1.)
    S = T.diag(sigma)
    return T.nlinalg.matrix_inverse(T.nlinalg.matrix_dot(S, C, S))

def analyze(data):
    with pm.Model() as model:
        # priors might be adapted here to be less flat
        mu = pm.Normal('mu', mu=0., sd=0.000001, shape=2, testval=np.mean(data, axis=1))
        sigma = pm.Uniform('sigma', lower=1e-6, upper=1000., shape=2, testval=np.std(data, axis=1))
        rho = pm.Uniform('r', lower=-1., upper=1., testval=0)
        prec = pm.Deterministic('prec', precision(sigma, rho))
        mult_n = pm.MvNormal('mult_n', mu=mu, tau=prec, observed=data.T)
    return model

model = analyze(data)
with model:
    trace = pm.sample(50000, tune=25000, step=pm.Metropolis())
The PyMC3 version runs, but clearly does not return the expected result. Any help would be highly appreciated.

The call signature of pymc.Normal is
In [125]: pymc.Normal?
Init signature: pymc.Normal(self, *args, **kwds)
Docstring:
N = Normal(name, mu, tau, value=None, observed=False, size=1, trace=True, rseed=True, doc=None, verbose=-1, debug=False)
Notice that the third positional argument of pymc.Normal is tau, not the standard deviation, sd.
Therefore, since the pymc code uses
mu = Normal('mu', 0, 0.000001, size=2)
The corresponding pymc3 code should use
mu = pm.Normal('mu', mu=0., tau=0.000001, shape=2, ...)
or
mu = pm.Normal('mu', mu=0., sd=math.sqrt(1/0.000001), shape=2, ...)
since tau = 1/sigma**2.
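As a quick sanity check (plain Python, outside either model), the precision of 0.000001 corresponds to a standard deviation of 1000, i.e. the very flat prior the pymc code intended:

    import math

    tau = 0.000001
    sd = math.sqrt(1 / tau)
    print(sd)  # 1000.0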
With this one change, your pymc3 code produces (something like) the expected PyMC2 result, with the posterior mean of mu near 400.

Related

Coding Custom Likelihood Pymc3

I am struggling to implement a linear regression in pymc3 with a custom likelihood.
I previously posted this question on CrossValidated and it was recommended to post it here, as the question is more code-oriented (closed post here).
Suppose you have two independent variables x1, x2 and a target variable y, as well as an indicator variable called delta.
- When delta is 0, the likelihood function is standard least squares.
- When delta is 1, the least-squares contribution enters only when the target variable is greater than the prediction.
Example snippet of observed data:
x_1   x_2   delta   observed_target
 10     1       0               100
 20     2       0                50
  5    -1       1               200
 10    -2       1               100
Does anyone know how this can be implemented in pymc3? As a starting point...
model = pm.Model()
with model as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal takes sd/sigma, not beta
    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    mu = (intercept + beta_0*x_1 + beta_1*x_2)
In case this is helpful, from reading the docs it looks like something along these lines might work, but I have not been able to test it and it was too long to pop into a comment.
model = pm.Model()
with model as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal takes sd/sigma, not beta
    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    delta = pm.Data('delta', df['delta'])  # Or whatever this column is
    target = pm.Data('target', df['observed_target'])
    ypred = (intercept + beta_0*x_1 + beta_1*x_2)  # Intermediate result
    target_ge_ypred = pm.math.ge(target, ypred)  # Compare target to intermediate result
    zero = pm.math.constant(0)  # Use this if delta==1 and target<ypred
    # EDIT: Check delta
    alternate = pm.math.switch(target_ge_ypred, ypred, zero)  # Alternative result
    mu = pm.math.switch(pm.math.eq(delta, zero), ypred, alternate)  # Actual result wanted?
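Note that a likelihood tying mu to the observations is still needed before sampling; a minimal completion sketch, assuming Gaussian noise with the std prior defined above (the obs name is mine):

    # Hypothetical continuation, inside the same `with ttf_model:` block:
    obs = pm.Normal('obs', mu=mu, sd=std, observed=target)
    trace = pm.sample(1000, tune=1000)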

How to create Non-Central Student’s T distribution and what priors to use with the distribution?

I have been working with the following link,
Fitting empirical distribution to theoretical ones with Scipy (Python)?
I have been running my data through the code from the link and found that the best-fitting distribution for my data is the Non-Central Student's T distribution. I couldn't find the distribution in the pymc3 package, so I decided to look at scipy to understand how the distribution is formed. I created a custom distribution and I have a few questions:

1. Is my approach to creating the distribution right?
2. How can I implement the custom distribution into models?
3. Regarding the prior distributions, do I use the same steps as for normal-distribution priors (mu and sigma), combined with HalfNormal priors for the degrees of freedom and the noncentrality value?
My custom distribution:
import numpy as np
import theano.tensor as tt
from scipy import stats
from scipy.special import hyp1f1, nctdtr
import warnings
from pymc3.theanof import floatX
from pymc3.distributions.dist_math import bound, gammaln
from pymc3.distributions.continuous import assert_negative_support, get_tau_sigma
from pymc3.distributions.distribution import Continuous, draw_values, generate_samples

class NonCentralStudentT(Continuous):
    """
    Parameters
    ----------
    nu: float
        Degrees of freedom, also known as normality parameter (nu > 0).
    mu: float
        Location parameter.
    sigma: float
        Scale parameter (sigma > 0). Converges to the standard deviation as nu
        increases. (Only required if lam is not specified.)
    lam: float
        Scale parameter (lam > 0). Converges to the precision as nu increases.
        (Only required if sigma is not specified.)
    """
    def __init__(self, nu, nc, mu=0, lam=None, sigma=None, sd=None, *args, **kwargs):
        super().__init__(*args, **kwargs)  # the duplicated super() call was removed
        if sd is not None:
            sigma = sd
            warnings.warn("sd is deprecated, use sigma instead", DeprecationWarning)
        self.nu = nu = tt.as_tensor_variable(floatX(nu))
        self.nc = nc = tt.as_tensor_variable(floatX(nc))
        lam, sigma = get_tau_sigma(tau=lam, sigma=sigma)
        self.lam = lam = tt.as_tensor_variable(lam)
        self.sigma = self.sd = sigma = tt.as_tensor_variable(sigma)
        self.mean = self.median = self.mode = self.mu = mu = tt.as_tensor_variable(mu)
        self.variance = tt.switch((nu > 2) * 1, (1 / self.lam) * (nu / (nu - 2)), np.inf)
        assert_negative_support(lam, 'lam (sigma)', 'NonCentralStudentT')
        assert_negative_support(nu, 'nu', 'NonCentralStudentT')
        assert_negative_support(nc, 'nc', 'NonCentralStudentT')
    def random(self, point=None, size=None):
        """
        Draw random values from Non-Central Student's T distribution.

        Parameters
        ----------
        point: dict, optional
            Dict of variable values on which random values are to be
            conditioned (uses default point if not specified).
        size: int, optional
            Desired size of random sample (returns one sample if not
            specified).

        Returns
        -------
        array
        """
        nu, nc, mu, lam = draw_values([self.nu, self.nc, self.mu, self.lam], point=point, size=size)
        return generate_samples(stats.nct.rvs, nu, nc, loc=mu, scale=lam ** -0.5,
                                dist_shape=self.shape, size=size)
    def logp(self, value):
        """
        Calculate log-probability of Non-Central Student's T distribution at specified value.

        Parameters
        ----------
        value: numeric
            Value(s) for which log-probability is calculated. If the log probabilities for
            multiple values are desired the values must be provided in a numpy array or
            theano tensor.

        Returns
        -------
        TensorVariable
        """
        nu = self.nu
        nc = self.nc
        mu = self.mu
        lam = self.lam
        n = nu * 1.0
        nc = nc * 1.0
        x2 = value * value
        ncx2 = nc * nc * x2
        fac1 = n + x2
        trm1 = n / 2. * tt.log(n) + gammaln(n + 1)
        trm1 -= n * tt.log(2) + nc * nc / 2. + (n / 2.) * tt.log(fac1) + gammaln(n / 2.)
        Px = tt.exp(trm1)
        valF = ncx2 / (2 * fac1)
        # NOTE: scipy's hyp1f1 operates on NumPy arrays, not theano tensors,
        # so these calls will fail once the graph is compiled (see Edit 2).
        trm1 = tt.sqrt(2) * nc * value * hyp1f1(n / 2 + 1, 1.5, valF)
        trm1 /= np.asarray(fac1 * tt.gamma((n + 1) / 2))
        trm2 = hyp1f1((n + 1) / 2, 0.5, valF)
        trm2 /= np.asarray(np.sqrt(fac1) * tt.gamma(n / 2 + 1))
        Px *= trm1 + trm2
        # NOTE: Px is a density, not a log-density; a correct logp would return tt.log(Px).
        return bound(Px, lam > 0, nu > 0, nc > 0)
    def logcdf(self, value):
        """
        Compute the log of the cumulative distribution function for Non-Central Student's T
        distribution at the specified value.

        Parameters
        ----------
        value: numeric
            Value(s) for which log CDF is calculated. If the log CDF for multiple
            values are desired the values must be provided in a numpy array or theano tensor.

        Returns
        -------
        TensorVariable
        """
        nu = self.nu
        nc = self.nc
        # NOTE: nctdtr is a NumPy-level function and returns the CDF, not its log.
        return nctdtr(nu, nc, value)
My Custom model:
with pm.Model() as model:
    # Prior distributions for unknown model parameters:
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.Normal('sigma', 0, 10)
    nc = pm.HalfNormal('nc', sigma=10)
    nu = pm.HalfNormal('nu', sigma=1)

    # Observed data comes from the likelihood (sampling distribution) of the
    # observations; the custom distribution should go here in place of this
    # Beta placeholder:
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=data)

    # draw 5000 posterior samples
    trace = pm.sample(draws=5000, tune=2000, chains=3, cores=1)

    # Obtaining posterior predictive samples:
    post_pred = pm.sample_posterior_predictive(trace, samples=3000)

print(post_pred['observed_data'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace))
print(pm.stats.summary(data=post_pred))
Edit 1:
I redesigned the custom model to include the custom distribution. However, I keep getting errors from the equations used to compute the likelihood, or sometimes Theano locks up and the code just freezes. Find my code below:
with pm.Model() as model:
    # Prior distributions for unknown model parameters:
    mu = pm.Normal('mu', mu=0, sigma=1)
    sd = pm.HalfNormal('sd', sigma=1)
    nc = pm.HalfNormal('nc', sigma=10)
    nu = pm.HalfNormal('nu', sigma=1)

    # Custom distribution:
    # observed_data = pm.DensityDist('observed_data', NonCentralStudentT, observed=data_list)

    # Observed data comes from the likelihood (sampling distribution) of the observations:
    observed_data = NonCentralStudentT('observed_data', mu=mu, sd=sd, nc=nc, nu=nu, observed=data_list)

    # draw 5000 posterior samples
    trace_S = pm.sample(draws=5000, tune=2000, chains=3, cores=1)

    # Obtaining posterior predictive samples:
    post_pred_S = pm.sample_posterior_predictive(trace_S, samples=3000)

print(post_pred_S['observed_data'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace_S))
print(pm.stats.summary(data=post_pred_S))
Edit 2:
I am looking online for a way to convert the function to Theano. The only implementation I found is from the following GitHub link: hyp1f1 function GitHub. Will this be enough to convert the function to Theano? Also, is it okay to use NumPy arrays with Theano?
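Regarding NumPy arrays with Theano: plain arrays are fine as constants and as inputs to compiled functions, since Theano wraps them automatically; what does not work is calling NumPy/SciPy functions such as hyp1f1 on symbolic tensors. A toy illustration (names are mine):

    import numpy as np
    import theano
    import theano.tensor as tt

    a = np.array([1.0, 2.0, 3.0])             # a plain NumPy array...
    x = tt.vector('x')
    f = theano.function([x], (x + a).sum())   # ...is wrapped as a constant automatically
    print(f(np.zeros(3)))                     # 6.0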
Also, I thought of another way, but I am not sure whether it can be implemented. I looked into the nct function in scipy, whose documentation says the following:

If Y is a standard normal random variable and V is an independent chi-square random variable (chi2) with k degrees of freedom, then

    X = (Y + c) / sqrt(V / k)

has a non-central Student's t distribution on the real line. The degrees of freedom parameter k (denoted df in the implementation) satisfies k > 0 and the noncentrality parameter c (denoted nc in the implementation) is a real number.

The probability density above is defined in the "standardized" form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, nct.pdf(x, df, nc, loc, scale) is identically equivalent to nct.pdf(y, df, nc) / scale with y = (x - loc) / scale.
So, I thought of using normal and chi-square random variables as priors, together with the degrees-of-freedom variable mentioned before in the code, plugged into the equation from SciPy. Will that be enough to get the distribution?
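If it helps, that construction can be written down directly with stock pymc3 distributions; a minimal sketch under that reading of the scipy docs (all names are mine, and this only builds the variate as a Deterministic, it is not yet a likelihood over observed data):

    import pymc3 as pm

    with pm.Model() as nct_model:
        nu = pm.HalfNormal('nu', sigma=1)        # degrees of freedom k
        nc = pm.HalfNormal('nc', sigma=10)       # noncentrality c
        loc = pm.Normal('loc', mu=0, sigma=1)    # shift, scipy's loc
        scale = pm.HalfNormal('scale', sigma=1)  # scipy's scale
        y = pm.Normal('y_std', mu=0, sigma=1)    # standard normal Y
        v = pm.ChiSquared('v', nu=nu)            # independent chi-square V with k dof
        # X = (Y + c) / sqrt(V / k), then shifted and scaled as in scipy's nct
        x = pm.Deterministic('x', loc + scale * (y + nc) / pm.math.sqrt(v / nu))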
Edit 3:
I managed to run the code in the link about fitting empirical distributions and found that the second-best fit was the Student's t distribution, so I will be using that. Thank you for your help. I just have a side question: I ran my model with the Student's t distribution, but I got these warnings:

There were 52 divergences after tuning. Increase target_accept or reparameterize. The acceptance probability does not match the target. It is 0.7037574708196309, but should be close to 0.8. Try to increase the number of tuning steps. The number of effective samples is smaller than 10% for some parameters.

I am confused about these warnings. Do you have any idea what they mean? I know they won't stop my code from running, but can I reduce the divergences? And regarding the effective samples, do I need to increase the number of draws in the sampling call?

Pymc3 python function to deterministic

In this notebook from Bayesian Methods for Hackers, they create a Deterministic variable from a python function as such:
# from code line 9 in the notebook
@pm.deterministic
def lambda_(tau=tau, lambda_1=lambda_1, lambda_2=lambda_2):
    out = np.zeros(n_count_data)
    out[:tau] = lambda_1  # lambda before tau is lambda1
    out[tau:] = lambda_2  # lambda after (and including) tau is lambda2
    return out
I'm trying to recreate this experiment almost exactly, but apparently @pm.deterministic is not a thing in pymc3. Any idea how I would do this in pymc3?
This model is translated in the PyMC3 port of "Probabilistic Programming and Bayesian Methods for Hackers" as
with pm.Model() as model:
    # Recall count_data is the variable that holds our txt counts
    alpha = 1.0 / count_data.mean()

    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)

    # These two lines do what the deterministic function did above
    idx = np.arange(n_count_data)  # Index
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

    observation = pm.Poisson("obs", lambda_, observed=count_data)
    trace = pm.sample()
Note that we are just using pm.math.switch (which is an alias for theano.tensor.switch) to compute lambda_. There is also pm.Deterministic, but it is not needed here.
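A quick way to see what switch does is to evaluate a small expression eagerly, outside any model (a hypothetical toy example):

    import numpy as np
    import pymc3 as pm

    idx = np.arange(5)
    # Elementwise select: 1.0 where the condition holds, 10.0 elsewhere
    lambda_ = pm.math.switch(2 > idx, 1.0, 10.0)
    print(lambda_.eval())  # [ 1.  1. 10. 10. 10.]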

CAR model from pymc2 to PyMC3

I'm still a noob in PyMC3, so the question might be naive, but I don't know how to translate this pymc2 code into pymc3. In particular, it's not clear to me how to translate the R function.
beta = pymc.Normal('beta', mu=0, tau=1.0e-4)
s = pymc.Uniform('s', lower=0, upper=1.0e+4)
tau = pymc.Lambda('tau', lambda s=s: s**(-2))

### Intrinsic CAR
@pymc.stochastic
def R(tau=tau, value=np.zeros(N)):
    # Calculate mu based on average of neighbors
    mu = np.array([sum(W[i]*value[A[i]])/Wplus[i] for i in xrange(N)])
    # Scale precision to the number of neighbors
    taux = tau*Wplus
    return pymc.normal_like(value, mu, taux)

@pymc.deterministic
def M(beta=beta, R=R):
    return [np.exp(beta + R[i]) for i in xrange(N)]

obsvd = pymc.Poisson("obsvd", mu=M, value=Y, observed=True)
model = pymc.Model([s, beta, obsvd])
Code from https://github.com/Youki/statistical-modeling-for-data-analysis-with-python/blob/945c13549a872d869e33bc48082c42efc022a07b/Chapter11/Chapter11.rst, and http://glau.ca/?p=340
Can you help me? Thanks
In PyMC3, you can implement the CAR model using the scan function of Theano. There is a sample code in their documentation. There are two implementations for CAR in the linked document. Here is the first one [Source]:
from theano import scan

floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) distribution

    Parameters
    ----------
    a : list of adjacency information
    w : list of weight information
    tau : precision at each location
    """
    def __init__(self, w, a, tau, *args, **kwargs):
        super(CAR, self).__init__(*args, **kwargs)
        self.a = a = tt.as_tensor_variable(a)
        self.w = w = tt.as_tensor_variable(w)
        self.tau = tau * tt.sum(w, axis=1)
        self.mode = 0.

    def get_mu(self, x):
        def weight_mu(w, a):
            a1 = tt.cast(a, 'int32')
            return tt.sum(w * x[a1]) / tt.sum(w)

        mu_w, _ = scan(fn=weight_mu,
                       sequences=[self.w, self.a])
        return mu_w

    def logp(self, x):
        mu_w = self.get_mu(x)
        tau = self.tau
        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x))
with pm.Model() as model1:
    # Vague prior on intercept
    beta0 = pm.Normal('beta0', mu=0.0, tau=1.0e-5)
    # Vague prior on covariate effect
    beta1 = pm.Normal('beta1', mu=0.0, tau=1.0e-5)

    # Random effects (hierarchical) prior
    tau_h = pm.Gamma('tau_h', alpha=3.2761, beta=1.81)
    # Spatial clustering prior
    tau_c = pm.Gamma('tau_c', alpha=1.0, beta=1.0)

    # Regional random effects
    theta = pm.Normal('theta', mu=0.0, tau=tau_h, shape=N)
    mu_phi = CAR('mu_phi', w=wmat, a=amat, tau=tau_c, shape=N)

    # Zero-centre phi
    phi = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))

    # Mean model
    mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*aff + theta + phi))

    # Likelihood
    Yi = pm.Poisson('Yi', mu=mu, observed=O)

    # Marginal SD of heterogeneity effects
    sd_h = pm.Deterministic('sd_h', tt.std(theta))
    # Marginal SD of clustering (spatial) effects
    sd_c = pm.Deterministic('sd_c', tt.std(phi))
    # Proportion spatial variance
    alpha = pm.Deterministic('alpha', sd_c / (sd_h + sd_c))

    trace1 = pm.sample(1000, tune=500, cores=4,
                       init='advi',
                       nuts_kwargs={"target_accept": 0.9,
                                    "max_treedepth": 15})
The M function is written here as:
mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*aff + theta + phi))
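For reference, a minimal illustration of theano.scan, the looping construct the CAR class above relies on (toy data and names are mine): it applies a function to each row of a sequence, here computing a weighted mean of x per row of w, much like get_mu does.

    import numpy as np
    import theano
    import theano.tensor as tt

    w = tt.matrix('w')
    x = tt.vector('x')
    # x is picked up from the enclosing scope as an implicit non-sequence
    mu, _ = theano.scan(fn=lambda w_i: tt.sum(w_i * x) / tt.sum(w_i),
                        sequences=[w])
    f = theano.function([w, x], mu)
    print(f(np.eye(3), np.array([1.0, 2.0, 3.0])))  # [1. 2. 3.]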

Why does this hierarchical Poisson model not match true params from generated data?

I am trying to fit a hierarchical Poisson regression to estimate time_delay per group and globally. I am confused as to whether pymc automatically applies a log link function to mu, or whether I have to do so explicitly:
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    beta = pm.Gamma('beta', alpha=1, beta=1)
    a = pm.Gamma('a', alpha=alpha, beta=beta, shape=n_participants)
    mu = a[participants_idx]
    y_est = pm.Poisson('y_est', mu=mu, observed=messages['time_delay'].values)

    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
The below traceplot shows estimates for a. You can see group estimates between 0 and 750.
My confusion begins when I plot the hyper-parameter gamma distribution using the means of alpha and beta as its parameters. The below distribution shows support between 0 and 5, approximately. This doesn't match my expectation given the group estimates for a above. What does a represent? Is it log(a) or something else?
Thanks for any pointers.
Adding example using fake data as requested in comments: This example has just a single group, so it should be easier to see if the hyper parameter could plausibly produce the Poisson distribution of the group.
test_data = []
model = []
for i in np.arange(1):
    # between 1 and 100 messages per conversation
    num_messages = np.random.uniform(1, 100)
    avg_delay = np.random.gamma(15, 1)
    for j in np.arange(num_messages):
        delay = np.random.poisson(avg_delay)
        test_data.append([i, j, delay, i])
    model.append([i, avg_delay])

model_df = pd.DataFrame(model, columns=['conversation_id', 'synthetic_mean_delay'])
test_df = pd.DataFrame(test_data, columns=['conversation_id', 'message_id', 'time_delay', 'participants_str'])
test_df.head()
# Estimate parameters of model using test data

# convert categorical variables to integer
le = preprocessing.LabelEncoder()
test_participants_map = le.fit(test_df['participants_str'])
test_participants_idx = le.fit_transform(test_df['participants_str'])
n_test_participants = len(test_df['participants_str'].unique())

with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    beta = pm.Gamma('beta', alpha=1, beta=1)
    a = pm.Gamma('a', alpha=alpha, beta=beta, shape=n_test_participants)
    mu = a[test_participants_idx]

    y = test_df['time_delay'].values
    y_est = pm.Poisson('y_est', mu=mu, observed=y)

    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
I don't see how the below hyper parameters could produce a Poisson distribution with a parameter between 13 and 17.

ANSWER: pymc uses different parameters than scipy to represent Gamma distributions. scipy uses alpha and scale, whereas pymc uses alpha and beta, where beta is the rate, i.e. beta = 1/scale. The below model works as expected:
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    scale = pm.Gamma('scale', alpha=1, beta=1)
    a = pm.Gamma('a', alpha=alpha, beta=1.0/scale, shape=n_test_participants)
    #mu = T.exp(a[test_participants_idx])
    mu = a[test_participants_idx]

    y = test_df['time_delay'].values
    y_est = pm.Poisson('y_est', mu=mu, observed=y)

    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
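A quick way to convince yourself of the parameterization difference (a hypothetical check, outside the model) is to compare log-densities from scipy and pymc3 directly:

    import numpy as np
    from scipy import stats
    import pymc3 as pm

    alpha, scale = 15.0, 1.0
    x = np.linspace(5.0, 25.0, 5)

    # scipy's Gamma takes shape `a` and `scale`; pymc3's takes shape `alpha`
    # and rate `beta`, with beta = 1/scale
    sp_logpdf = stats.gamma.logpdf(x, a=alpha, scale=scale)
    pm_logpdf = pm.Gamma.dist(alpha=alpha, beta=1.0/scale).logp(x).eval()
    print(np.allclose(sp_logpdf, pm_logpdf))  # True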
