Stochastic Indexing in PyMC3

I'm fairly new to pymc3, and I'm trying to understand how to work random variables into models in different ways. I would like to fit the following (contrived) model, but I cannot find any support for it in the documentation.
I tried the following, but numpy does not allow such indexing:
import numpy
import pymc3

seq = numpy.arange(10, y_train.size)
basic_model = pymc3.Model()
with basic_model:
    alpha = pymc3.Normal('alpha', mu=0, sd=1)
    beta = pymc3.Normal('beta', mu=0, sd=1)
    gamma = pymc3.DiscreteUniform('gamma', lower=1, upper=10)
    # fails here: a numpy array cannot be indexed by the random variable gamma
    mu = pymc3.Deterministic('mu', alpha + beta*y_train[seq-gamma])
    # sigma is assumed to be defined elsewhere (e.g. a HalfNormal prior)
    y = pymc3.Normal('y', mu=mu, sd=sigma, observed=y_train[11:])
    map_estimate = pymc3.find_MAP(model=basic_model)
    step = pymc3.Metropolis()
    trace = pymc3.sample(10000, step, start=map_estimate, progressbar=True)

You need to convert the numpy array to a Theano tensor first; unlike numpy arrays, Theano tensors can be indexed by random variables:
import theano.tensor as tt
tt.as_tensor_variable(y_train)[seq-gamma]
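Applied to the model above, only the Deterministic line needs to change; something like:
y_train_t = tt.as_tensor_variable(y_train)
mu = pymc3.Deterministic('mu', alpha + beta*y_train_t[seq-gamma])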

Related

Summarise the posterior of a single parameter from an array with arviz

I am estimating a model using the pyMC3 library in python. In my "real" model, there are four parameter arrays, two of which have over 170,000 parameters in them. Summarising this array of parameters is too computationally intensive on my computer. I have been trying to figure out if the summary function in arviz will allow me to only summarise one (or a small number) of parameters in the array. Below is a reprex where the same problem is present, though the model is a lot simpler. In the linear regression model below, the parameter array b has three parameters in it b[0], b[1], b[2]. I would like to know how to get the summary for just b[0] and b[1] or alternatively for just a single parameter, e.g., b[0].
import pandas as pd
import pymc3 as pm
import arviz as az
d = pd.read_csv("https://quantoid.net/files/mtcars.csv")
mpg = d['mpg'].values
hp = d['hp'].values
weight = d['wt'].values
with pm.Model() as model:
    b = pm.Normal("b", mu=0, sigma=10, shape=3)
    sig = pm.HalfCauchy("sig", beta=2)
    mu = pm.Deterministic('mu', b[0] + b[1]*hp + b[2]*weight)
    like = pm.Normal('like', mu=mu, sigma=sig, observed=mpg)
    fit = pm.fit(10000, method='advi')
    samp = fit.sample(1500)
with model:
    smry = az.summary(samp, var_names=["b"])
It looked like the coords argument to the summary() function would do it, but after googling around and finding a few examples, like the one here with plot_posterior() instead of summary(), I was unable to get something to work. In particular, I tried the following in the hopes that it would return the summary for b[0] and b[1].
with model:
    smry = az.summary(samp, var_names=["b"], coords={"b_dim_0": range(1)})
or this to return the summary of b[0]:
with model:
    smry = az.summary(samp, var_names=["b"], coords={"b_dim_0": [0]})
I suspect I am missing something simple (I'm an R user who dabbles occasionally with Python). Any help is greatly appreciated.
(BTW, I am using Python 3.8.0, pyMC3 3.9.3, arviz 0.10.0)
To use coords for this, you need to update to the development version of ArviZ (it will still report 0.11.2 but contains the code from GitHub) or to any release newer than 0.11.2. Up to and including 0.11.2, the coords argument of summary was not used to subset the data (as it is in all the plotting functions); it was only taken into account when the input was not already InferenceData, in which case it was passed on to the converter.
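With a new enough ArviZ, the call already tried in the question should therefore work as intended; a sketch for b[0] and b[1] (b_dim_0 is ArviZ's default name for the unnamed dimension of b):
az.summary(samp, var_names=["b"], coords={"b_dim_0": [0, 1]})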
With older versions, you need to use xarray to subset the data before passing it to summary, so you have to explicitly convert the trace to InferenceData beforehand. In the example above it would look like:
with model:
    ...
    samp = fit.sample(1500)
    idata = az.from_pymc3(samp)

az.summary(idata.posterior[["b"]].sel({"b_dim_0": [0]}))
Moreover, you may also want to tell summary to compute only a subset of the stats/diagnostics, as shown in the docstring examples.
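For instance, assuming the kind argument (accepted by recent ArviZ versions), something like this would skip the MCMC diagnostics and report only the summary statistics:
az.summary(idata, var_names=["b"], kind="stats")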

Generating random matrices for VAR(p) in PyMC3

I am trying to build a simple VAR(p) model using pymc3, but I'm getting some cryptic errors about incompatible dimensions. I suspect the issue is that I'm not properly generating random matrices. Here is an attempt at VAR(1); any help would be welcome:
import numpy
import pymc3

# generate some data
y_full = numpy.zeros((2, 100))
t = numpy.linspace(0, 2*numpy.pi, 100)
y_full[0, :] = numpy.cos(5*t) + numpy.random.randn(100)*0.02
y_full[1, :] = numpy.sin(6*t) + numpy.random.randn(100)*0.01
y_obs = y_full[:, 1:]
y_lag = y_full[:, :-1]
with pymc3.Model() as model:
    beta = pymc3.MvNormal('beta', mu=numpy.ones(4), cov=numpy.ones((4, 4)), shape=4)
    mu = pymc3.Deterministic('mu', beta.reshape((2, 2)).dot(y_lag))
    y = pymc3.MvNormal('y', mu=mu, cov=numpy.eye(2), observed=y_obs)
The last line should be
y = pymc3.MvNormal('y', mu=mu.T, cov=numpy.eye(2), observed=y_obs.T)
MvNormal interprets the last dimension as the dimension of the multivariate normal vectors. Because of how numpy indexing works, y_obs as written is a length-2 array of length-99 vectors (y_lag[i].shape == (99,)), whereas MvNormal expects each 2-dimensional observation to be a row, hence the transposes.
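Putting the pieces together, a sketch of the corrected model block (note that I also swap the prior covariance numpy.ones((4, 4)), which is singular, for numpy.eye(4), since MvNormal needs a positive definite covariance; that is a separate fix from the transposes):
with pymc3.Model() as model:
    beta = pymc3.MvNormal('beta', mu=numpy.ones(4), cov=numpy.eye(4), shape=4)
    mu = pymc3.Deterministic('mu', beta.reshape((2, 2)).dot(y_lag))
    # rows are now observations: both sides have shape (99, 2)
    y = pymc3.MvNormal('y', mu=mu.T, cov=numpy.eye(2), observed=y_obs.T)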

Conditional Probability Using Pymc3

My question is how to use the PyMC3 package to build conditional probability models.
I have a set of data a_observed, b_observed, c_observed, and I want to find the relations between them. I suspect that a, b, and c are all normally distributed, that b depends on a, and that c depends on a and b. I need to find the parameters.
So far I have:
import pymc3 as pm

with pm.Model() as model:
    # define priors
    muA = pm.Uniform('muA', lower=0, upper=24)
    muB = pm.Uniform('muB', lower=0, upper=24)
    muC = pm.Uniform('muC', lower=0, upper=24)
    sigmaA = pm.Uniform('sigmaA', lower=0, upper=1000)
    sigmaB = pm.Uniform('sigmaB', lower=0, upper=1000)
    sigmaC = pm.Uniform('sigmaC', lower=0, upper=1000)
    distributionA = pm.Normal('a', mu=muA, sd=sigmaA, observed=a_observed)
    distributionB = pm.Normal('b', mu=muB, sd=sigmaB, observed=b_observed)
    distributionC = pm.Normal('c', mu=muC, sd=sigmaC, observed=c_observed)
    start = pm.find_MAP()
    step = pm.Slice()
Now I want a to be independent, with b | a and c | a, b. What is the best way to express this in PyMC3? I've seen lambda functions used at http://healthyalgorithms.com/2011/11/23/causal-modeling-in-python-bayesian-networks-in-pymc/, but that approach specifies the conditional probabilities directly.
Also, I want to know how easy it is to expand the model to more than three variables with more complicated dependencies. Thanks!
Please have a look at the following question: Simple Bayesian Network via Monte Carlo Markov Chain ported to PyMC3. In the associated gist I ported the PyMC2 example that you reference above to PyMC3. The key is to use pm.Deterministic() and pm.math.switch(). I hope this helps.
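To give a flavour of the pattern (a minimal sketch of my own, not the gist itself; the linear links, the learned slope parameters, and the fixed sd=1 noise terms are all assumptions): each child's mean is a deterministic expression in its parents, and pm.math.switch plays the same role when a parent is discrete.
import pymc3 as pm

with pm.Model() as bn:
    # root node: a has no parents
    muA = pm.Uniform('muA', lower=0, upper=24)
    a = pm.Normal('a', mu=muA, sd=1, observed=a_observed)
    # b | a: the mean of b is a deterministic function of a
    beta_ab = pm.Normal('beta_ab', mu=0, sd=10)
    mu_b = pm.Deterministic('mu_b', beta_ab * a)
    b = pm.Normal('b', mu=mu_b, sd=1, observed=b_observed)
    # c | a, b: conditioned on both parents
    beta_ac = pm.Normal('beta_ac', mu=0, sd=10)
    beta_bc = pm.Normal('beta_bc', mu=0, sd=10)
    mu_c = pm.Deterministic('mu_c', beta_ac*a + beta_bc*b)
    c = pm.Normal('c', mu=mu_c, sd=1, observed=c_observed)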

confidence interval for the data itself using lmfit in python

Here is the link for the LMFIT implementation of the confidence intervals of parameters: http://lmfit.github.io/lmfit-py/confidence.html
Here is the code I am using:
import lmfit
import numpy as np

# x = np.linspace(1, 10, 250)
# np.random.seed(0)
# y = 1. - np.exp(-(x)/10.) + 0.1*np.random.randn(len(x))
pars = lmfit.Parameters()
pars.add_many(('n', 1.), ('tau', 3.))

# def residual(pars, data=None):
def residual(pars):
    v = pars.valuesdict()
    # if data is None:
    #     return 1.0 - np.exp(-(x**v['n'])/v['tau'])
    return 1.0 - np.exp(-(x**v['n'])/v['tau']) - y

# create Minimizer
mini = lmfit.Minimizer(residual, pars)
# first solve with Nelder-Mead
out1 = mini.minimize(method='Nelder')
out2 = mini.minimize(method='leastsq', params=out1.params)
lmfit.report_fit(out2.params, min_correl=0.5)
ci, trace = lmfit.conf_interval(mini, out2, sigmas=[0.95],
                                trace=True, verbose=False)
lmfit.printfuncs.report_ci(ci)
It is a bit difficult to understand the title confidence interval for the data itself using lmfit in python (there is no data), or the first sentence I am doing curve fitting using lmfit package (you need data to fit).
I think what you are asking for is a way to get extreme values for the model function that best matches your data. If so, would it work to evaluate your function with all combinations of parameter values best +/- delta (where delta could be any uncertainly level you like), and take the extreme values of the model function? That's not very automated, but shouldn't be too hard.
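A minimal sketch of that brute-force approach (the model and best-fit values mirror the question; the delta values are made up for illustration and would come from the reported parameter uncertainties in practice):
import itertools
import numpy as np

def model(x, n, tau):
    return 1.0 - np.exp(-(x**n) / tau)

x = np.linspace(1, 10, 250)
best = {'n': 1.0, 'tau': 3.0}      # best-fit values, e.g. from out2.params
delta = {'n': 0.05, 'tau': 0.2}    # assumed uncertainty levels

# evaluate the model at every corner of the box best +/- delta
corners = itertools.product(*[(best[k] - delta[k], best[k] + delta[k])
                              for k in ('n', 'tau')])
curves = np.array([model(x, n, tau) for n, tau in corners])
lower, upper = curves.min(axis=0), curves.max(axis=0)  # pointwise extremes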

Define different bounds for a multidimensional stochastic variable in pymc

I'm having an issue with defining bounds for a multidimensional stochastic variable.
Here is a dummy example to explain my problem.
If I want a 3-dimensional discrete uniform on [0, 100]:
import pymc as mc
from numpy import empty

truth = mc.DiscreteUniform("bin1", lower=0, upper=100, value=[50, 50, 50], size=3)

@mc.deterministic(plot=False)
def unfold(truth=truth):
    out = empty(3)
    for r in xrange(3):
        out[r] = truth[r]
    return out

data = [5, 10, 30]
unfolded = mc.Poisson('unfolded', mu=unfold, value=data, observed=True, size=3)
model = mc.Model([unfolded, unfold, truth])
mcmc = mc.MCMC(model)
mcmc.use_step_method(mc.AdaptiveMetropolis, truth)
mcmc.sample(10000, 1000, 10)
this will sample a DiscreteUniform for 3 bins with the same range (0 to 100) for each bin.
Now, I have tried several things to define a different range for each bin, but cannot get it to work. I tried arrays of DiscreteUniform and arrays of bounds (upper, lower), but those obviously do not work.
Does anyone have any idea how to define a different range for each of the bins of a stochastic variable?
To define different ranges and initial values, you need to call the stochastic constructor N times to create a list of variables, and then use the Container constructor to make the list PyMC-readable:
bin1 = mc.DiscreteUniform("bin1", lower=0, upper=100, value=50, size=1)
bin2 = mc.DiscreteUniform("bin2", lower=0, upper=40, value=20, size=1)
bin3 = mc.DiscreteUniform("bin3", lower=10, upper=50, value=30, size=1)
truth = mc.Container([bin1, bin2, bin3])
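The rest of the model should then work unchanged, since the Container can be indexed like the plain list it wraps (truth[r] inside the unfold deterministic).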
