What are the values I now print as sigma_ab, and how can I calculate the 95% confidence interval?
for g in all:
    c0 = 5
    c2 = 0.2
    c3 = 0.7
    start = g['y'].iloc[0]
    p0 = np.array([c0, c2, c3]), # Construct initial guess array
    popt, pcov = curve_fit(
        model, g['x'], g['y'],
        absolute_sigma=True, maxfev=100000
    )
    sigma_ab = np.sqrt(np.diagonal(pcov))
    n = g.name
    print(n+' Estimated parameters: \n', popt)
    print(n + ' Approximated errors: \n', sigma_ab)
These are the estimated parameters
[0.24803625 0.06072472 0.46449578]
This is sigma_ab, but I don't know exactly what it represents. I would like to calculate the upper and lower limits of the mean at a 95% confidence level.
[1.32778766 0.64261562 1.47915215]
Your sigma_ab (sqrt of the diagonal elements of the covariance) will be the 1-sigma (68.3%) uncertainties. If the distribution of your uncertainties is strictly Gaussian (often a good but not perfect assumption, so maybe "a decent starting estimate"), then the 2-sigma (95.5%) uncertainties will be twice those values.
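As a rough sketch of that scaling (assuming Gaussian uncertainties, and reusing the popt and sigma_ab values printed in the question), an exact two-sided 95% interval uses a multiplier of about 1.96 rather than 2. The names z, lower and upper are only illustrative:

```python
from statistics import NormalDist

# Values copied from the output shown in the question
popt = [0.24803625, 0.06072472, 0.46449578]
sigma_ab = [1.32778766, 0.64261562, 1.47915215]

# Two-sided 95% multiplier for a Gaussian (about 1.96)
z = NormalDist().inv_cdf(0.975)

# Symmetric interval around each fitted parameter
lower = [p - z * s for p, s in zip(popt, sigma_ab)]
upper = [p + z * s for p, s in zip(popt, sigma_ab)]

for p, lo, hi in zip(popt, lower, upper):
    print(f"{p:.4f}  95% CI: [{lo:.4f}, {hi:.4f}]")
```

With these particular numbers every interval spans zero, because each uncertainty is larger than the parameter it belongs to.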
If you want a more detailed measure (and one that doesn't assume symmetric uncertainties), you might find lmfit and its Model class helpful. By default (and when possible) it will report 1-sigma uncertainties from the covariance, which is fast, and usually pretty good. It can also explicitly bracket and find 1-, 2-, 3-sigma uncertainties, positive and negative separately.
You didn't give a very complete example, so it's a little hard to tell what your model function is doing. If you have a model function like:
def modelfunc(x, amp, cen, sigma):
    return amp * np.exp(-(x-cen)*(x-cen)/sigma**2)
you could use
import numpy as np
import lmfit
def modelfunc(x, amp, cen, sigma):
    return amp * np.exp(-(x-cen)*(x-cen)/sigma**2)
x = np.linspace(-10.0, 10.0, 201)
y = modelfunc(x, 3.0, 0.5, 1.1) + np.random.normal(scale=0.1, size=len(x))
model = lmfit.Model(modelfunc)
params = model.make_params(amp=5., cen=0.2, sigma=1)
result = model.fit(y, params, x=x)
print(result.fit_report())
# now calculate explicit 1-, 2, and 3-sigma uncertainties:
ci = result.conf_interval(sigmas=[1,2,3])
lmfit.printfuncs.report_ci(ci)
which will print out
[[Model]]
Model(modelfunc)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 21
# data points = 201
# variables = 3
chi-square = 1.93360112
reduced chi-square = 0.00976566
Akaike info crit = -927.428077
Bayesian info crit = -917.518162
[[Variables]]
amp: 2.97351225 +/- 0.03245896 (1.09%) (init = 5)
cen: 0.48792611 +/- 0.00988753 (2.03%) (init = 0.2)
sigma: 1.10931408 +/- 0.01398308 (1.26%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
C(amp, sigma) = -0.577
99.73% 95.45% 68.27% _BEST_ 68.27% 95.45% 99.73%
amp : -0.09790 -0.06496 -0.03243 2.97351 +0.03255 +0.06543 +0.09901
cen : -0.03007 -0.01991 -0.00992 0.48793 +0.00991 +0.01990 +0.03004
sigma: -0.04151 -0.02766 -0.01387 1.10931 +0.01404 +0.02834 +0.04309
which gives explicitly calculated uncertainties, and shows that - for this case - the very fast estimates of the 1-sigma uncertainties are very good, and 2-sigma is pretty close to 2x the 1-sigma values. Like, you shouldn't really trust past the 2nd significant digit anyway...
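Finally, in your example you are not actually passing your initial values (p0) to curve_fit, which illustrates a very serious flaw in curve_fit: when p0 is omitted, it silently starts every parameter at 1 instead of raising an error.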
I am struggling to implement a linear regression in pymc3 with a custom likelihood.
I previously posted this question on CrossValidated & it was recommended to post here as the question is more code orientated (closed post here)
Suppose you have two independent variables x1, x2 and a target variable y, as well as an indicator variable called delta.
When delta is 0, the likelihood function is standard least squares
When delta is 1, the likelihood function is the least squares contribution only when the target variable is greater than the prediction
Example snippet of observed data:
x_1   x_2   𝛿   observed_target
10    1     0   100
20    2     0   50
5     -1    1   200
10    -2    1   100
Does anyone know how this can be implemented in pymc3? As a starting point...
model = pm.Model()
with model as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal takes sd (sigma), not beta
    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    mu = (intercept + beta_0*x_1 + beta_1*x_2)
In case this is helpful, from reading the docs it looks like something along these lines might work, but I have not been able to test it and it was too long to pop into a comment.
model = pm.Model()
with model as ttf_model:
    intercept = pm.Normal('param_intercept', mu=0, sd=5)
    beta_0 = pm.Normal('param_x1', mu=0, sd=5)
    beta_1 = pm.Normal('param_x2', mu=0, sd=5)
    std = pm.HalfNormal('param_std', sd=0.5)  # HalfNormal takes sd (sigma), not beta
    x_1 = pm.Data('var_x1', df['x1'])
    x_2 = pm.Data('var_x2', df['x2'])
    delta = pm.Data('delta', df['delta']) # Or whatever this column is
    target = pm.Data('target', df['observed_target'])
    ypred = (intercept + beta_0*x_1 + beta_1*x_2) # Intermediate result
    target_ge_ypred = pm.math.ge(target, ypred) # Compare target to intermediate result
    zero = pm.math.constant(0) # Use this if delta==1 and target<ypred
    # EDIT: Check delta
    alternate = pm.math.switch(target_ge_ypred, ypred, zero) # Alternative result
    mu = pm.math.switch(pm.math.eq(delta, zero), ypred, alternate) # Actual result wanted?
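To have something to test a pymc3 version against, the rule described in the question can be written out in plain Python. This is only a sketch of one reading of that rule (squared residual always when delta is 0, and only when the target exceeds the prediction when delta is 1). The function name censored_sse and the coefficient values are made up for illustration; the rows are the example data from the question.

```python
def censored_sse(rows, intercept, beta_1, beta_2):
    """Sum of squared residuals under the delta rule described in the question."""
    total = 0.0
    for x1, x2, delta, target in rows:
        pred = intercept + beta_1 * x1 + beta_2 * x2
        resid = target - pred
        # delta == 0: always count the residual
        # delta == 1: count it only when the target is above the prediction
        if delta == 0 or resid > 0:
            total += resid ** 2
    return total

# (x_1, x_2, delta, observed_target) rows from the question
rows = [(10, 1, 0, 100), (20, 2, 0, 50), (5, -1, 1, 200), (10, -2, 1, 100)]

# Hypothetical coefficients, chosen only to exercise both branches
sse = censored_sse(rows, intercept=0.0, beta_1=10.0, beta_2=0.0)
print(sse)
```

The fourth row contributes nothing here because its target does not exceed the prediction while delta is 1.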
I have been working with the following link,
Fitting empirical distribution to theoretical ones with Scipy (Python)?
I applied the code from the link to my data and found that the best-fitting distribution is the Non-Central Student's T distribution. I couldn't find this distribution in the pymc3 package, so I looked at scipy to understand how it is defined. I created a custom distribution and have a few questions:
I would like to know if my approach to creating the distribution is right?
How can I implement the custom distribution into models?
Regarding the priors, do I follow the same steps as for a normal distribution (priors on mu and sigma), combined with HalfNormal priors for the degrees of freedom and the noncentrality parameter?
My custom distribution:
import numpy as np
import theano.tensor as tt
from scipy import stats
from scipy.special import hyp1f1, nctdtr
import warnings
from pymc3.theanof import floatX
from pymc3.distributions.dist_math import bound, gammaln
from pymc3.distributions.continuous import assert_negative_support, get_tau_sigma
from pymc3.distributions.distribution import Continuous, draw_values, generate_samples
class NonCentralStudentT(Continuous):
    """
    Parameters
    ----------
    nu: float
        Degrees of freedom, also known as normality parameter (nu > 0).
    mu: float
        Location parameter.
    sigma: float
        Scale parameter (sigma > 0). Converges to the standard deviation as nu increases. (only required if lam is not specified)
    lam: float
        Scale parameter (lam > 0). Converges to the precision as nu increases. (only required if sigma is not specified)
    """

    def __init__(self, nu, nc, mu=0, lam=None, sigma=None, sd=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        super(NonCentralStudentT, self).__init__(*args, **kwargs)
        if sd is not None:
            sigma = sd
            warnings.warn("sd is deprecated, use sigma instead", DeprecationWarning)
        self.nu = nu = tt.as_tensor_variable(floatX(nu))
        self.nc = nc = tt.as_tensor_variable(floatX(nc))
        lam, sigma = get_tau_sigma(tau=lam, sigma=sigma)
        self.lam = lam = tt.as_tensor_variable(lam)
        self.sigma = self.sd = sigma = tt.as_tensor_variable(sigma)
        self.mean = self.median = self.mode = self.mu = mu = tt.as_tensor_variable(mu)
        self.variance = tt.switch((nu > 2) * 1, (1 / self.lam) * (nu / (nu - 2)), np.inf)
        assert_negative_support(lam, 'lam (sigma)', 'NonCentralStudentT')
        assert_negative_support(nu, 'nu', 'NonCentralStudentT')
        assert_negative_support(nc, 'nc', 'NonCentralStudentT')

    def random(self, point=None, size=None):
        """
        Draw random values from Non-Central Student's T distribution.

        Parameters
        ----------
        point: dict, optional
            Dict of variable values on which random values are to be
            conditioned (uses default point if not specified).
        size: int, optional
            Desired size of random sample (returns one sample if not
            specified).

        Returns
        -------
        array
        """
        nu, nc, mu, lam = draw_values([self.nu, self.nc, self.mu, self.lam], point=point, size=size)
        return generate_samples(stats.nct.rvs, nu, nc, loc=mu, scale=lam ** -0.5, dist_shape=self.shape, size=size)

    def logp(self, value):
        """
        Calculate log-probability of Non-Central Student's T distribution at specified value.

        Parameters
        ----------
        value: numeric
            Value(s) for which log-probability is calculated. If the log probabilities for multiple
            values are desired the values must be provided in a numpy array or theano tensor

        Returns
        -------
        TensorVariable
        """
        nu = self.nu
        nc = self.nc
        mu = self.mu
        lam = self.lam
        n = nu * 1.0
        nc = nc * 1.0
        x2 = value * value
        ncx2 = nc * nc * x2
        fac1 = n + x2
        trm1 = n / 2. * tt.log(n) + gammaln(n + 1)
        trm1 -= n * tt.log(2) + nc * nc / 2. + (n / 2.) * tt.log(fac1) + gammaln(n / 2.)
        Px = tt.exp(trm1)
        valF = ncx2 / (2 * fac1)
        trm1 = tt.sqrt(2) * nc * value * hyp1f1(n / 2 + 1, 1.5, valF)
        trm1 /= np.asarray(fac1 * tt.gamma((n + 1) / 2))
        trm2 = hyp1f1((n + 1) / 2, 0.5, valF)
        trm2 /= np.asarray(np.sqrt(fac1) * tt.gamma(n / 2 + 1))
        Px *= trm1 + trm2
        return bound(Px, lam > 0, nu > 0, nc > 0)

    def logcdf(self, value):
        """
        Compute the log of the cumulative distribution function for Non-Central Student's T distribution
        at the specified value.

        Parameters
        ----------
        value: numeric
            Value(s) for which log CDF is calculated. If the log CDF for multiple
            values are desired the values must be provided in a numpy array or theano tensor.

        Returns
        -------
        TensorVariable
        """
        nu = self.nu
        nc = self.nc
        return nctdtr(nu, nc, value)
My Custom model:
with pm.Model() as model:
    # Prior Distributions for unknown model parameters:
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.Normal('sigma', 0, 10)
    nc = pm.HalfNormal('nc', sigma=10)
    nu = pm.HalfNormal('nu', sigma=1)
    # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
    # => (input custom distribution)
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=data)
    # draw 5000 posterior samples
    trace = pm.sample(draws=5000, tune=2000, chains=3, cores=1)
    # Obtaining Posterior Predictive Sampling:
    post_pred = pm.sample_posterior_predictive(trace, samples=3000)
print(post_pred['observed_data'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace))
print(pm.stats.summary(data=post_pred))
Edit 1:
I redesigned the model to include the custom distribution. However, I keep getting errors from the equations used to compute the likelihood, or sometimes theano locks up and the code just freezes. My code is below:
with pm.Model() as model:
    # Prior Distributions for unknown model parameters:
    mu = pm.Normal('mu', mu=0, sigma=1)
    sd = pm.HalfNormal('sd', sigma=1)
    nc = pm.HalfNormal('nc', sigma=10)
    nu = pm.HalfNormal('nu', sigma=1)
    # Custom distribution:
    # observed_data = pm.DensityDist('observed_data', NonCentralStudentT, observed=data_list)
    # Observed data is from a Likelihood distributions (Likelihood (sampling distribution) of observations):
    observed_data = NonCentralStudentT('observed_data', mu=mu, sd=sd, nc=nc, nu=nu, observed=data_list)
    # draw 5000 posterior samples
    trace_S = pm.sample(draws=5000, tune=2000, chains=3, cores=1)
    # Obtaining Posterior Predictive Sampling:
    post_pred_S = pm.sample_posterior_predictive(trace_S, samples=3000)
print(post_pred_S['observed_data'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace_S))
print(pm.stats.summary(data=post_pred_S))
Edit 2:
I have been looking online for a way to convert the function to theano. The only definition of the function I found is at the following GitHub link: hyp1f1 function GitHub
Will this be enough to convert the function into theano?
In addition, is it okay to use NumPy arrays with theano?
I also thought of another approach, but I am not sure whether it can be implemented. I looked at the nct distribution in scipy, whose documentation says the following:
If Y is a standard normal random variable and V is an independent
chi-square random variable ( chi2 ) with k degrees of freedom, then
X=(Y+c) / sqrt(V/k)
has a non-central Student’s t distribution on the real line. The
degrees of freedom parameter k (denoted df in the implementation)
satisfies k>0 and the noncentrality parameter c (denoted nc in the
implementation) is a real number.
The probability density above is defined in the “standardized” form.
To shift and/or scale the distribution use the loc and scale
parameters. Specifically, nct.pdf(x, df, nc, loc, scale) is
identically equivalent to nct.pdf(y, df, nc) / scale with y = (x -
loc) / scale .
So I thought of using only normal and chi2 random variables as priors, together with the degrees-of-freedom variable mentioned earlier, and combining them through the equation from the SciPy documentation. Will that be enough to get the distribution?
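The construction quoted from the scipy documentation can at least be checked numerically outside pymc3. Below is a minimal sketch in plain Python that draws X = (Y + c) / sqrt(V / k) and compares the sample mean with the known mean of the noncentral t (valid for k > 1). The values of k and c and the helper name sample_nct are illustrative only, and this says nothing about how well such a construction would sample inside a pymc3 model.

```python
import math
import random

def sample_nct(k, c, rng):
    """One draw of X = (Y + c) / sqrt(V / k) with Y ~ N(0, 1), V ~ chi2(k)."""
    y = rng.gauss(0.0, 1.0)
    # A chi-square with k degrees of freedom is a Gamma with shape k/2 and scale 2
    v = rng.gammavariate(k / 2.0, 2.0)
    return (y + c) / math.sqrt(v / k)

rng = random.Random(0)
k, c = 10.0, 1.0  # illustrative degrees of freedom and noncentrality

draws = [sample_nct(k, c, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)

# Mean of the noncentral t distribution, defined for k > 1
theory_mean = c * math.sqrt(k / 2.0) * math.gamma((k - 1) / 2.0) / math.gamma(k / 2.0)

print(sample_mean, theory_mean)
```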
Edit 3:
I managed to run the code from the link about fitting empirical distributions and found that the second-best fit was the Student's t distribution, so I will use that. Thank you for your help. I have one side question: I ran my model with the Student's t distribution and got these warnings:
There were 52 divergences after tuning. Increase target_accept or
reparameterize. The acceptance probability does not match the target.
It is 0.7037574708196309, but should be close to 0.8. Try to increase
the number of tuning steps. The number of effective samples is smaller
than 10% for some parameters.
I am confused by these warnings. Do you have any idea what they mean? I know they won't stop my code from running, but can I reduce the divergences? And regarding the effective samples, do I need to increase the number of samples in the pm.sample call?
I'm trying to recreate a plot from An Introduction to Statistical Learning and I'm having trouble figuring out how to calculate the confidence interval for a probability prediction. Specifically, I'm trying to recreate the right-hand panel of this figure (figure 7.1) which is predicting the probability that wage>250 based on a degree 4 polynomial of age with associated 95% confidence intervals. The wage data is here if anyone cares.
I can predict and plot the predicted probabilities fine with the following code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures
wage = pd.read_csv('../../data/Wage.csv', index_col=0)
wage['wage250'] = 0
wage.loc[wage['wage'] > 250, 'wage250'] = 1
poly = PolynomialFeatures(degree=4)
age = poly.fit_transform(wage['age'].values.reshape(-1, 1))
logit = sm.Logit(wage['wage250'], age).fit()
age_range_poly = poly.fit_transform(np.arange(18, 81).reshape(-1, 1))
y_proba = logit.predict(age_range_poly)
plt.plot(age_range_poly[:, 1], y_proba)
But I'm at a loss as to how the confidence intervals of the predicted probabilities are calculated. I have thought about bootstrapping the data many times to get the distribution of probabilities for each age but I know there is an easier way which is just beyond my grasp.
I have the estimated coefficient covariance matrix and the standard errors associated with each estimated coefficient. How would I go about calculating the confidence intervals as shown in the right-hand panel of the figure above given this information?
Thanks!
You can use the delta method to find an approximate variance for the predicted probability. Namely,
var(proba) = np.dot(np.dot(gradient.T, cov), gradient)
where gradient is the vector of derivatives of predicted probability by model coefficients, and cov is the covariance matrix of coefficients.
The delta method is proven to work asymptotically for all maximum likelihood estimates. However, if you have a small training sample, asymptotic methods may not work well, and you should consider bootstrapping.
Here is a toy example of applying delta method to logistic regression:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# generate data
np.random.seed(1)
x = np.arange(100)
y = (x * 0.5 + np.random.normal(size=100,scale=10)>30)
# estimate the model
X = sm.add_constant(x)
model = sm.Logit(y, X).fit()
proba = model.predict(X) # predicted probability
# estimate confidence interval for predicted probabilities
cov = model.cov_params()
gradient = (proba * (1 - proba) * X.T).T # matrix of gradients for each observation
std_errors = np.array([np.sqrt(np.dot(np.dot(g, cov), g)) for g in gradient])
c = 1.96 # multiplier for confidence interval
upper = np.maximum(0, np.minimum(1, proba + std_errors * c))
lower = np.maximum(0, np.minimum(1, proba - std_errors * c))
plt.plot(x, proba)
plt.plot(x, lower, color='g')
plt.plot(x, upper, color='g')
plt.show()
It draws the following nice picture:
For your example the code would be
proba = logit.predict(age_range_poly)
cov = logit.cov_params()
gradient = (proba * (1 - proba) * age_range_poly.T).T
std_errors = np.array([np.sqrt(np.dot(np.dot(g, cov), g)) for g in gradient])
c = 1.96
upper = np.maximum(0, np.minimum(1, proba + std_errors * c))
lower = np.maximum(0, np.minimum(1, proba - std_errors * c))
plt.plot(age_range_poly[:, 1], proba)
plt.plot(age_range_poly[:, 1], lower, color='g')
plt.plot(age_range_poly[:, 1], upper, color='g')
plt.show()
and it would give the following picture
Looks pretty much like a boa-constrictor with an elephant inside.
You could compare it with the bootstrap estimates:
preds = []
for i in range(1000):
boot_idx = np.random.choice(len(age), replace=True, size=len(age))
model = sm.Logit(wage['wage250'].iloc[boot_idx], age[boot_idx]).fit(disp=0)
preds.append(model.predict(age_range_poly))
p = np.array(preds)
plt.plot(age_range_poly[:, 1], np.percentile(p, 97.5, axis=0))
plt.plot(age_range_poly[:, 1], np.percentile(p, 2.5, axis=0))
plt.show()
Results of delta method and bootstrap look pretty much the same.
Authors of the book, however, go the third way. They use the fact that
proba = np.exp(np.dot(x, params)) / (1 + np.exp(np.dot(x, params)))
and calculate the confidence interval for the linear part, and then transform it with the inverse logit (logistic) function
xb = np.dot(age_range_poly, logit.params)
std_errors = np.array([np.sqrt(np.dot(np.dot(g, cov), g)) for g in age_range_poly])
upper_xb = xb + c * std_errors
lower_xb = xb - c * std_errors
upper = np.exp(upper_xb) / (1 + np.exp(upper_xb))
lower = np.exp(lower_xb) / (1 + np.exp(lower_xb))
plt.plot(age_range_poly[:, 1], upper)
plt.plot(age_range_poly[:, 1], lower)
plt.show()
So they get the diverging interval:
These methods produce such different results because they assume that different quantities (the predicted probability and the log-odds) are normally distributed. Namely, the delta method assumes the predicted probabilities are normal, while the book assumes the log-odds are normal. In fact, neither is normal in finite samples; both converge to normal in infinite samples, but their variances converge to zero at the same time. Maximum likelihood estimates are invariant to reparametrization, but their estimated distribution is not, and that's the problem.
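The difference is easy to see at a single point. The following sketch uses a made-up linear predictor xb and standard error se_xb (not values from the wage data) and builds both intervals: the delta-method one on the probability scale, and the book's one on the log-odds scale mapped back through the logistic function.

```python
import math

def expit(t):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical values for one observation
xb, se_xb, c = -3.0, 1.0, 1.96
p = expit(xb)

# Delta method: d p / d(xb) = p * (1 - p), so se(p) is about p * (1 - p) * se(xb)
se_p = p * (1.0 - p) * se_xb
delta_lo, delta_hi = p - c * se_p, p + c * se_p

# Book's approach: interval on the log-odds, then transform the endpoints
logit_lo, logit_hi = expit(xb - c * se_xb), expit(xb + c * se_xb)

print(f"p = {p:.4f}")
print(f"delta method: [{delta_lo:.4f}, {delta_hi:.4f}]")
print(f"log-odds:     [{logit_lo:.4f}, {logit_hi:.4f}]")
```

For these inputs the delta-method lower bound falls below zero, which is why the code above clips with np.maximum and np.minimum, while the log-odds interval stays inside (0, 1) and is asymmetric around p.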
Here is an instructive and efficient method to calculate the standard errors ('se') of the fit ('mean_se') and single observations ('obs_se') on top of a statsmodels Logit().fit() object ('fit'), identical to the method in the book ISLR and the last method from the answer by David Dale:
fit_mean = fit.model.exog.dot(fit.params)
fit_mean_se = ((fit.model.exog*fit.model.exog.dot(fit.cov_params())).sum(axis=1))**0.5
fit_obs_se = ( ((fit.model.endog-fit_mean).std(ddof=fit.params.shape[0]))**2 + \
fit_mean_se**2 )**0.5
A figure similar to the one in the book ISLR
The shaded regions represent the 95% confidence intervals for the fit and single observations.
Ideas for improvement are most welcome.
I am trying to fit a hierarchical Poisson regression to estimate time_delay per group and globally. I am confused as to whether pymc automatically applies a log link function to mu or do I have to do so explicitly:
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    beta = pm.Gamma('beta', alpha=1, beta=1)
    a = pm.Gamma('a', alpha=alpha, beta=beta, shape=n_participants)
    mu = a[participants_idx]
    y_est = pm.Poisson('y_est', mu=mu, observed=messages['time_delay'].values)
    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
The below traceplot shows estimates for a. You can see group estimates between 0 and 750.
My confusion begins when I plot the hyper parameter gamma distribution by using the mean for alpha and beta as parameters. The below distribution shows support between 0 and 5 approx. This doesn't fit my expectation whilst looking at the group estimates for a above. What does a represent? Is it log(a) or something else?
Thanks for any pointers.
Adding example using fake data as requested in comments: This example has just a single group, so it should be easier to see if the hyper parameter could plausibly produce the Poisson distribution of the group.
test_data = []
model = []
for i in np.arange(1):
    # between 1 and 100 messages per conversation
    num_messages = np.random.uniform(1, 100)
    avg_delay = np.random.gamma(15, 1)
    for j in np.arange(num_messages):
        delay = np.random.poisson(avg_delay)
        test_data.append([i, j, delay, i])
    model.append([i, avg_delay])
model_df = pd.DataFrame(model, columns=['conversation_id', 'synthetic_mean_delay'])
test_df = pd.DataFrame(test_data, columns=['conversation_id', 'message_id', 'time_delay', 'participants_str'])
test_df.head()
# Estimate parameters of model using test data
# convert categorical variables to integer
le = preprocessing.LabelEncoder()
test_participants_map = le.fit(test_df['participants_str'])
test_participants_idx = le.fit_transform(test_df['participants_str'])
n_test_participants = len(test_df['participants_str'].unique())
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    beta = pm.Gamma('beta', alpha=1, beta=1)
    a = pm.Gamma('a', alpha=alpha, beta=beta, shape=n_test_participants)
    mu = a[test_participants_idx]
    y = test_df['time_delay'].values
    y_est = pm.Poisson('y_est', mu=mu, observed=y)
    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
I don't see how the below hyper parameter could produce a poisson distribution with parameter between 13 and 17.
ANSWER: pymc and scipy parameterize the Gamma distribution differently. scipy uses alpha (shape) and scale, whereas pymc uses alpha (shape) and beta (rate), with beta = 1/scale. The below model works as expected:
with pm.Model() as model:
    alpha = pm.Gamma('alpha', alpha=1, beta=1)
    scale = pm.Gamma('scale', alpha=1, beta=1)
    a = pm.Gamma('a', alpha=alpha, beta=1.0/scale, shape=n_test_participants)
    #mu = T.exp(a[test_participants_idx])
    mu = a[test_participants_idx]
    y = test_df['time_delay'].values
    y_est = pm.Poisson('y_est', mu=mu, observed=y)
    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.Metropolis(start=start)
    trace = pm.sample(20000, step, start=start, progressbar=True)
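To see how much the two conventions differ, here is a small sketch using only the standard library. random.gammavariate takes a shape and a scale, so a rate has to be inverted before it is passed in. The shape of 15 and the second parameter of 0.5 are arbitrary illustrative values, not estimates from the model above.

```python
import random

rng = random.Random(1)
shape, value = 15.0, 0.5  # illustrative numbers only
n = 20_000

# Reading `value` as a scale: mean = shape * scale
as_scale = [rng.gammavariate(shape, value) for _ in range(n)]

# Reading `value` as a rate: mean = shape / rate
# (gammavariate expects a scale, so pass 1 / rate)
as_rate = [rng.gammavariate(shape, 1.0 / value) for _ in range(n)]

mean_scale = sum(as_scale) / n
mean_rate = sum(as_rate) / n

print(f"value as scale: mean is about {mean_scale:.2f} (theory {shape * value})")
print(f"value as rate:  mean is about {mean_rate:.2f} (theory {shape / value})")
```

The same pair of numbers gives means that differ by a factor of 1/value squared, which is the kind of mismatch that makes a plotted hyperprior look inconsistent with the group estimates.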