I am struggling to implement a linear regression in pymc3 with a custom likelihood.
I previously posted this question on CrossValidated, where it was recommended to post it here since the question is more code-oriented (closed post here).
Suppose you have two independent variables x1, x2 and a target variable y, as well as an indicator variable called delta.
When delta is 0, the likelihood is the standard least-squares one.
When delta is 1, the observation contributes its least-squares term only when the target variable is greater than the prediction, and contributes nothing otherwise.
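In other words, the per-observation log-likelihood (up to additive constants) is:

log L_i = -(y_i - ŷ_i)² / (2σ²)   if 𝛿_i = 0, or if 𝛿_i = 1 and y_i > ŷ_i
log L_i = 0                        if 𝛿_i = 1 and y_i ≤ ŷ_i

where ŷ_i is the model's prediction for observation i.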
Example snippet of observed data:
x_1   x_2   𝛿   observed_target
 10     1   0               100
 20     2   0                50
  5    -1   1               200
 10    -2   1               100
Does anyone know how this can be implemented in pymc3? As a starting point...
model = pm.Model()

with model as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal takes sd (or tau), not beta

    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])

    mu = intercept + beta_0*x_1 + beta_1*x_2  # was beta_0*x_0 + beta_1*x_1; x_0 is undefined
In case this is helpful: from reading the docs, it looks like something along these lines might work, but I have not been able to test it, and it was too long to fit in a comment.
model = pm.Model()

with model as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal takes sd (or tau), not beta

    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    delta = pm.Data('delta', df['delta'])  # Or whatever this column is called
    target = pm.Data('target', df['observed_target'])

    ypred = intercept + beta_0*x_1 + beta_1*x_2     # Intermediate result
    target_ge_ypred = pm.math.ge(target, ypred)     # Compare target to intermediate result
    zero = pm.math.constant(0)                      # Use this if delta==1 and target<ypred

    # EDIT: Check delta
    alternate = pm.math.switch(target_ge_ypred, ypred, zero)        # Alternative result
    mu = pm.math.switch(pm.math.eq(delta, zero), ypred, alternate)  # Actual result wanted?
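Also untested, but another route worth sketching (my own idea, not from the docs) is to skip the observed variable entirely and add the masked log-likelihood through pm.Potential, so that observations with delta==1 and target below the prediction contribute nothing:

import pymc3 as pm

with pm.Model() as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)

    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    delta = pm.Data('delta', df['delta'])
    target = pm.Data('target', df['observed_target'])

    ypred = intercept + beta_0*x_1 + beta_1*x_2

    # Per-observation Normal log-density of the target around the prediction
    logp = pm.Normal.dist(mu=ypred, sd=std).logp(target)

    # Keep the term when delta==0, or when delta==1 and target > prediction
    keep = pm.math.switch(pm.math.eq(delta, 0), 1.0, pm.math.gt(target, ypred))
    pm.Potential('loglike', pm.math.sum(keep * logp))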
I am trying to implement ERGMs with PyMC.
I've found this, this, this and this, but these resources are a bit dated.
I have an NxN matrix for each network statistic (density, triangles, istar2, istar3 & distance). Each cell in each matrix indicates how the presence of that potential edge would change that statistic, holding the rest of the network constant. am is the adjacency matrix of graph G (nx.to_numpy_array(G)).
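As a concrete (hypothetical) sketch of one such change-statistic matrix: density is the simplest case, since toggling any single edge changes the edge count by exactly one, so every cell holds the same constant.

import networkx as nx
import numpy as np

am = nx.to_numpy_array(G)  # adjacency matrix of the observed graph G
n = am.shape[0]

# Toggling any potential edge (i, j) changes density by 1/(n*(n-1)),
# regardless of the rest of the network (directed-graph convention assumed).
density = np.full((n, n), 1.0 / (n * (n - 1)))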
My model looks like this:
with pm.Model() as model:
    density = pm.ConstantData("density", density)
    triangles = pm.ConstantData("triangles", triangles)
    istar2 = pm.ConstantData("istar2", istar2)
    istar3 = pm.ConstantData("istar3", istar3)
    distance = pm.ConstantData("distance", distance)

    β_density = pm.Normal('β_density', mu=0, sigma=100)
    β_triangles = pm.Normal('β_triangles', mu=0, sigma=100)
    β_istar2 = pm.Normal('β_istar2', mu=0, sigma=100)
    β_istar3 = pm.Normal('β_istar3', mu=0, sigma=100)
    β_distance = pm.Normal('β_distance', mu=0, sigma=100)

    μ = β_density*density + β_triangles*triangles + β_istar2*istar2 + β_istar3*istar3 + β_distance*distance
    θ = pm.Deterministic('θ', pm.math.sigmoid(μ))
    y = pm.Bernoulli('y', p=θ, observed=am)

    trace = pm.sample(
        draws=500,
        tune=1000,
        cores=1,
    )
Am I doing this correctly?
I have a pandas DataFrame df with observed weights and heights (df.weight and df.height respectively).
I am trying to use linear regression (predicting weight from height) with PyMC3 to find an 89% compatibility interval for the weight of a 140cm tall individual.
This is my setup so far
with pm.Model() as q1_model:
    # specify the model
    alpha = pm.Normal('alpha', mu=45, sd=100)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.Uniform('sigma', lower=0, upper=50)
    weight = pm.Normal('weight', mu=alpha + beta * df.height_c, sd=sigma, observed=df.weight)

    # find the posterior distribution of the weight
    trace = pm.sample(1000)
where df.height_c is df.height - df.height.mean(). But I'm not sure how to get the interval. I tried:
pm.hdi(trace.alpha + (140 - df.height.mean()) * trace.beta, 0.89)
which gives [35.0, 36.6], but I think this is just an interval for the mean weight mu of a 140cm person, not for the weight itself. The actual interval is supposedly meant to be [29.1, 42.8], confirming my suspicion here.
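For what it's worth, a sketch of what I think is needed (simulating an individual's weight from the likelihood for each posterior draw, rather than only computing the posterior of mu; names as in the model above):

import numpy as np

mu_140 = trace.alpha + (140 - df.height.mean()) * trace.beta  # posterior of the mean weight at 140cm
weight_140 = np.random.normal(mu_140, trace.sigma)            # one simulated individual per posterior draw
pm.hdi(weight_140, 0.89)                                      # should be close to the expected [29.1, 42.8]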
I'm still a noob in PyMC3, so the question might be naive, but I don't know how to translate this pymc2 code into pymc3. In particular, it's not clear to me how to translate the function R (the intrinsic CAR part).
beta = pymc.Normal('beta', mu=0, tau=1.0e-4)
s = pymc.Uniform('s', lower=0, upper=1.0e+4)
tau = pymc.Lambda('tau', lambda s=s: s**(-2))

### Intrinsic CAR
@pymc.stochastic
def R(tau=tau, value=np.zeros(N)):
    # Calculate mu based on average of neighbors
    mu = np.array([sum(W[i]*value[A[i]])/Wplus[i] for i in xrange(N)])

    # Scale precision to the number of neighbors
    taux = tau*Wplus
    return pymc.normal_like(value, mu, taux)

@pymc.deterministic
def M(beta=beta, R=R):
    return [np.exp(beta + R[i]) for i in xrange(N)]

obsvd = pymc.Poisson("obsvd", mu=M, value=Y, observed=True)
model = pymc.Model([s, beta, obsvd])
Code from https://github.com/Youki/statistical-modeling-for-data-analysis-with-python/blob/945c13549a872d869e33bc48082c42efc022a07b/Chapter11/Chapter11.rst, and http://glau.ca/?p=340
Can you help me? Thanks
In PyMC3, you can implement the CAR model using Theano's scan function. There is sample code in the PyMC3 documentation, which contains two implementations of CAR. Here is the first one [Source]:
import pymc3 as pm
import theano.tensor as tt
from theano import scan

floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution


class CAR(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) distribution

    Parameters
    ----------
    a : list of adjacency information
    w : list of weight information
    tau : precision at each location
    """
    def __init__(self, w, a, tau, *args, **kwargs):
        super(CAR, self).__init__(*args, **kwargs)
        self.a = a = tt.as_tensor_variable(a)
        self.w = w = tt.as_tensor_variable(w)
        self.tau = tau*tt.sum(w, axis=1)
        self.mode = 0.

    def get_mu(self, x):
        def weight_mu(w, a):
            a1 = tt.cast(a, 'int32')
            return tt.sum(w*x[a1])/tt.sum(w)

        mu_w, _ = scan(fn=weight_mu,
                       sequences=[self.w, self.a])

        return mu_w

    def logp(self, x):
        mu_w = self.get_mu(x)
        tau = self.tau
        return tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x))


with pm.Model() as model1:
    # Vague prior on intercept
    beta0 = pm.Normal('beta0', mu=0.0, tau=1.0e-5)
    # Vague prior on covariate effect
    beta1 = pm.Normal('beta1', mu=0.0, tau=1.0e-5)

    # Random effects (hierarchical) prior
    tau_h = pm.Gamma('tau_h', alpha=3.2761, beta=1.81)
    # Spatial clustering prior
    tau_c = pm.Gamma('tau_c', alpha=1.0, beta=1.0)

    # Regional random effects
    theta = pm.Normal('theta', mu=0.0, tau=tau_h, shape=N)
    mu_phi = CAR('mu_phi', w=wmat, a=amat, tau=tau_c, shape=N)

    # Zero-centre phi
    phi = pm.Deterministic('phi', mu_phi - tt.mean(mu_phi))

    # Mean model
    mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*aff + theta + phi))

    # Likelihood
    Yi = pm.Poisson('Yi', mu=mu, observed=O)

    # Marginal SD of heterogeneity effects
    sd_h = pm.Deterministic('sd_h', tt.std(theta))
    # Marginal SD of clustering (spatial) effects
    sd_c = pm.Deterministic('sd_c', tt.std(phi))
    # Proportion spatial variance
    alpha = pm.Deterministic('alpha', sd_c/(sd_h + sd_c))

    trace1 = pm.sample(1000, tune=500, cores=4,
                       init='advi',
                       nuts_kwargs={"target_accept": 0.9,
                                    "max_treedepth": 15})
The pymc2 M function (elementwise np.exp(beta + R)) is written here as:
mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*aff + theta + phi))
I'm trying to convert this example of Bayesian correlation from PyMC2 to PyMC3, but I get completely different results. Most importantly, the mean of the multivariate Normal distribution quickly goes to zero, whereas it should be around 400 (as it is for PyMC2). Consequently, the estimated correlation quickly goes towards 1, which is wrong as well.
The full code is available in this notebook for PyMC2 and in this notebook for PyMC3.
The relevant code for PyMC2 is
def analyze(data):
    # priors might be adapted here to be less flat
    mu = pymc.Normal('mu', 0, 0.000001, size=2)
    sigma = pymc.Uniform('sigma', 0, 1000, size=2)
    rho = pymc.Uniform('r', -1, 1)

    @pymc.deterministic
    def precision(sigma=sigma, rho=rho):
        ss1 = float(sigma[0] * sigma[0])
        ss2 = float(sigma[1] * sigma[1])
        rss = float(rho * sigma[0] * sigma[1])
        return np.linalg.inv(np.mat([[ss1, rss], [rss, ss2]]))

    mult_n = pymc.MvNormal('mult_n', mu=mu, tau=precision, value=data.T, observed=True)

    model = pymc.MCMC(locals())
    model.sample(50000, 25000)
My port of the above code to PyMC3 is as follows:
import numpy as np
import pymc3 as pm
import theano.tensor as T

def precision(sigma, rho):
    C = T.alloc(rho, 2, 2)
    C = T.fill_diagonal(C, 1.)
    S = T.diag(sigma)
    return T.nlinalg.matrix_inverse(T.nlinalg.matrix_dot(S, C, S))

def analyze(data):
    with pm.Model() as model:
        # priors might be adapted here to be less flat
        mu = pm.Normal('mu', mu=0., sd=0.000001, shape=2, testval=np.mean(data, axis=1))
        sigma = pm.Uniform('sigma', lower=1e-6, upper=1000., shape=2, testval=np.std(data, axis=1))
        rho = pm.Uniform('r', lower=-1., upper=1., testval=0)
        prec = pm.Deterministic('prec', precision(sigma, rho))
        mult_n = pm.MvNormal('mult_n', mu=mu, tau=prec, observed=data.T)
    return model

model = analyze(data)
with model:
    trace = pm.sample(50000, tune=25000, step=pm.Metropolis())
The PyMC3 version runs, but clearly does not return the expected result. Any help would be highly appreciated.
The call signature of pymc.Normal is
In [125]: pymc.Normal?
Init signature: pymc.Normal(self, *args, **kwds)
Docstring:
N = Normal(name, mu, tau, value=None, observed=False, size=1, trace=True, rseed=True, doc=None, verbose=-1, debug=False)
Notice that the third positional argument of pymc.Normal is tau, not the standard deviation, sd.
Therefore, since the pymc code uses
mu = Normal('mu', 0, 0.000001, size=2)
The corresponding pymc3 code should use
mu = pm.Normal('mu', mu=0., tau=0.000001, shape=2, ...)
or
mu = pm.Normal('mu', mu=0., sd=math.sqrt(1/0.000001), shape=2, ...)
since tau = 1/sigma**2. (As written, sd=0.000001 made your prior on mu essentially a point mass at zero, which is why the posterior mean collapsed to zero and, in turn, why the estimated correlation was pushed toward 1.)
With this one change, your pymc3 code produces (something like) the expected result.
I am trying to fit a hierarchical Poisson regression to estimate time_delay per group and globally. I am confused as to whether pymc automatically applies a log link function to mu, or whether I have to do so explicitly:
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    beta = pm.Gamma('beta', alpha=1, beta=1)

    a = pm.Gamma('a', alpha=alpha, beta=beta, shape=n_participants)

    mu = a[participants_idx]
    y_est = pm.Poisson('y_est', mu=mu, observed=messages['time_delay'].values)

    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
The below traceplot shows estimates for a. You can see group estimates between 0 and 750.
My confusion begins when I plot the hyperparameter gamma distribution using the posterior means of alpha and beta as its parameters. The resulting distribution has support between 0 and 5, approximately. This doesn't fit my expectation from the group estimates for a above. What does a represent? Is it log(a) or something else?
Thanks for any pointers.
Adding an example using fake data, as requested in the comments. This example has just a single group, so it should be easier to see whether the hyperparameter could plausibly produce the Poisson distribution of the group.
test_data = []
model = []
for i in np.arange(1):
    # between 1 and 100 messages per conversation
    num_messages = np.random.uniform(1, 100)
    avg_delay = np.random.gamma(15, 1)
    for j in np.arange(num_messages):
        delay = np.random.poisson(avg_delay)
        test_data.append([i, j, delay, i])
    model.append([i, avg_delay])

model_df = pd.DataFrame(model, columns=['conversation_id', 'synthetic_mean_delay'])
test_df = pd.DataFrame(test_data, columns=['conversation_id', 'message_id', 'time_delay', 'participants_str'])
test_df.head()
# Estimate parameters of model using test data
# convert categorical variables to integer
le = preprocessing.LabelEncoder()
test_participants_map = le.fit(test_df['participants_str'])
test_participants_idx = le.fit_transform(test_df['participants_str'])
n_test_participants = len(test_df['participants_str'].unique())
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    beta = pm.Gamma('beta', alpha=1, beta=1)

    a = pm.Gamma('a', alpha=alpha, beta=beta, shape=n_test_participants)

    mu = a[test_participants_idx]

    y = test_df['time_delay'].values
    y_est = pm.Poisson('y_est', mu=mu, observed=y)

    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
I don't see how the below hyperparameter could produce a Poisson distribution with a parameter between 13 and 17.
ANSWER: pymc parameterizes the Gamma distribution differently than scipy. scipy uses alpha and scale, whereas pymc uses alpha and beta (the rate, i.e. beta = 1/scale). The below model works as expected:
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    scale = pm.Gamma('scale', alpha=1, beta=1)

    a = pm.Gamma('a', alpha=alpha, beta=1.0/scale, shape=n_test_participants)

    #mu = T.exp(a[test_participants_idx])
    mu = a[test_participants_idx]

    y = test_df['time_delay'].values
    y_est = pm.Poisson('y_est', mu=mu, observed=y)

    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
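As a quick sanity check of that parameter correspondence (a standalone snippet, not part of the model above): scipy's gamma(a=alpha, scale=scale) has mean alpha*scale, while pm.Gamma(alpha=alpha, beta=beta) has mean alpha/beta, so beta = 1.0/scale makes the two match.

import scipy.stats

alpha, scale = 15.0, 1.0
sp = scipy.stats.gamma(a=alpha, scale=scale)
print(sp.mean(), sp.var())            # 15.0 15.0  (mean = alpha*scale, var = alpha*scale**2)

beta = 1.0 / scale                    # pymc's rate parameter
print(alpha / beta, alpha / beta**2)  # 15.0 15.0  (pm.Gamma mean = alpha/beta, var = alpha/beta**2)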