Generating predictions from inferred parameters in pymc3 - python

I've run into a common problem that I'm wondering if someone can help with. I often would like to use pymc3 in two modes: training (i.e. actually running inference on parameters) and evaluation (i.e. using inferred parameters to generate predictions).
In general, I'd like a posterior over predictions, not just point-wise estimates (that's part of the benefit of the Bayesian framework, no?). When your training data is fixed, this is typically accomplished by adding a simulated variable of the same form as the observed variable. For example,
from pymc3 import *

basic_model = Model()
with basic_model:
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=10)
    beta = Normal('beta', mu=0, sd=10, shape=2)
    sigma = HalfNormal('sigma', sd=1)

    # Expected value of outcome (X1, X2 are the training predictors)
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

    # Simulated variable of the same form as the observed variable
    Y_sim = Normal('Y_sim', mu=mu, sd=sigma, shape=len(X1))

    start = find_MAP()
    step = NUTS(scaling=start)
    trace = sample(2000, step, start=start)
But what if my data changes? Say I want to generate predictions based on new data, but without running inference all over again. Ideally, I'd have a function like predict_posterior(X1_new, X2_new, 'Y_sim', trace=trace) or even predict_point(X1_new, X2_new, 'Y_sim', vals=trace[-1]) that would simply run the new data through the theano computation graph.
I suppose part of my question relates to how pymc3 implements the theano computation graph. I've noticed that the function model.Y_sim.eval seems similar to what I want, but it requires Y_sim as an input and seems just to return whatever you give it.
I imagine this process is extremely common, but I can't seem to find any way to do it. Any help is greatly appreciated. (Note also that I have a hack to do this in pymc2; it's more difficult in pymc3 because of theano.)

Note: This functionality is now incorporated into the core code as the pymc3.sample_ppc method (later renamed sample_posterior_predictive). Check out the docs for more info.
Based on this link (dead as of July 2017) sent to me by twiecki, there are a couple of tricks to solve my issue. The first is to put the training data into a shared theano variable. This allows us to change the data later without invalidating the theano computation graph.
import theano

X1_shared = theano.shared(X1)
X2_shared = theano.shared(X2)
Next, build the model and run the inference as usual, but using the shared variables.
with basic_model:
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=10)
    beta = Normal('beta', mu=0, sd=10, shape=2)
    sigma = HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1_shared + beta[1]*X2_shared

    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

    start = find_MAP()
    step = NUTS(scaling=start)
    trace = sample(2000, step, start=start)
Finally, there's a function under development (it will likely eventually get added to pymc3) that allows you to predict posteriors for new data.
from collections import defaultdict

import numpy as np
import pymc3 as pm

def run_ppc(trace, samples=100, model=None):
    """Generate posterior predictive samples from a model given a trace."""
    if model is None:
        # pm.modelcontext(None) grabs the model from the current context
        model = pm.modelcontext(model)
    ppc = defaultdict(list)
    # Draw `samples` random points from the trace
    for idx in np.random.randint(0, len(trace), samples):
        param = trace[idx]
        for obs in model.observed_RVs:
            ppc[obs.name].append(obs.distribution.random(point=param))
    return ppc
Next, pass in the new data that you want to run predictions on:
X1_shared.set_value(X1_new)
X2_shared.set_value(X2_new)
Finally, you can generate posterior predictive samples for the new data.
ppc = run_ppc(trace, model=basic_model, samples=200)
The variable ppc is a dictionary with one key per observed variable in the model. So, in this case, ppc['Y_obs'] contains a list of arrays, each of which is generated using a single set of parameters from the trace.
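For instance, here is a minimal sketch (my own addition, assuming numpy and a 95% interval) of how you might summarize those samples into a pointwise predictive mean and interval:
import numpy as np

# Stack the per-draw arrays into shape (n_draws, n_points)
y_samples = np.asarray(ppc['Y_obs'])
y_mean = y_samples.mean(axis=0)                              # pointwise predictive mean
y_lo, y_hi = np.percentile(y_samples, [2.5, 97.5], axis=0)   # 95% predictive interval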
Note that you can even modify the parameters extracted from the trace. For example, I had a model with a GaussianRandomWalk variable and wanted to generate predictions into the future. While you could let pymc3 sample into the future (i.e. let the random walk variable diverge), I just wanted to use a fixed value of the coefficient, corresponding to the last inferred value. This logic can be implemented in the run_ppc function.
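As a rough sketch (the variable name 'walk' is a hypothetical placeholder, not from the original model), the override inside run_ppc's loop could look like:
# Inside run_ppc's loop, before drawing the predictive sample:
# pin the random-walk variable to its last inferred value
# ('walk' is a hypothetical variable name for the GaussianRandomWalk).
param = dict(trace[idx])
param['walk'] = np.full_like(param['walk'], param['walk'][-1])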
It's also worth mentioning that the run_ppc function is extremely slow. It takes about as much time as running the actual inference. I suspect this has to do with some inefficiency related to how theano is used.
EDIT: The link originally included seems to be dead.

The answer above from @santon is correct; I am just adding to it.
You no longer need to write your own run_ppc method: pymc3 provides the sample_posterior_predictive method, which does the same thing.
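As a minimal sketch (assuming the shared-variable setup from the accepted answer), usage would look something like this:
import pymc3 as pm

X1_shared.set_value(X1_new)
X2_shared.set_value(X2_new)

with basic_model:
    ppc = pm.sample_posterior_predictive(trace, samples=200)

# ppc['Y_obs'] then holds the posterior predictive draws for the new data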

Related

Get parameter estimates from logistic regression model using pycaret

I am training and tuning a model in pycaret such as:
from pycaret.classification import *
clf1 = setup(data = train, target = 'target', feature_selection = True, test_data = test, remove_multicollinearity = True, multicollinearity_threshold = 0.4)
# create model
lr = create_model('lr')
# tune model
tuned_lr = tune_model(lr)
# optimize threshold
optimized_lr = optimize_threshold(tuned_lr)
I would like to get the parameters estimated for the features in the logistic regression, so I can understand the effect size of each feature on the target. The object optimized_lr has a method optimized_lr.get_params(), but that returns the hyperparameters of the model. I am not interested in my tuning decisions; I am very interested in the actual fitted parameters of the logistic regression.
How can I get them in pycaret? (I could easily get them using other packages such as statsmodels, but I want to know how in pycaret.)
How about:
for f, c in zip(optimized_lr.feature_names_in_, optimized_lr.coef_[0]):
    print(f, c)
To get the coefficients, use this code:
tuned_lr.feature_importances_   # this will give you the coefficients
get_config('X_train').columns   # this will give you the names of the columns
Now we can create a dataframe so that we can clearly see how each feature relates to the target:
import pandas as pd

Coeff = pd.DataFrame({"Feature": get_config('X_train').columns.tolist(),
                      "Coefficients": tuned_lr.feature_importances_})
print(Coeff)  # the coefficients alongside the names of the respective columns. Hope it helps.

prediction interval for arma-garch models in python

Is there a way to measure the accuracy of an ARMA-GARCH model in Python using a prediction interval (alpha=0.05)? I fitted an ARMA-GARCH model on log returns and used some classical metrics such as RMSE, MSE (out-of-sample), AIC (in-sample), check on residuals and so on. I would like to add a prediction interval as another measurement of accuracy based on my ARMA-GARCH model predictions. I used the armagarch library (https://github.com/iankhr/armagarch).
I have already read up on prediction intervals but am not sure how to apply them to an ARMA-GARCH model.
Searching online, I found this formula: Estimator ± 1.96 * Standard Error (for a 95% interval).
So far so good, but my model output has several standard errors, one for each parameter in the ARMA and GARCH parts. Which one do I have to use? Is there a single standard error for the whole model?
I would be really happy if anyone could help.
[figure: ARMA-GARCH model output]
So far I created an ARMA(2,2)-GARCH(1,1) model:
# final test of function
import pandas as pd
import armagarch as ag

# framework definitions
data = pd.DataFrame(data)
meanMdl = ag.ARMA(order={'AR': 2, 'MA': 2})
volMdl = ag.garch(order={'p': 1, 'q': 1})
distMdl = ag.normalDist()
model = ag.empModel(data, meanMdl, volMdl, distMdl)
model_fit = model.fit()
After fitting the model, I defined the prediction length and received two arrays (mean and variance) as output, which I reshaped to the correct length:
import numpy as np

# first array is the mean, second is the variance
pred = model.predict(nsteps=len(df_test))

# correct the shapes
df_pred_mean = pd.DataFrame(np.reshape(pred[0], (len(df_test), 1)))
df_pred_variance = pd.DataFrame(np.reshape(pred[1], (len(df_test), 1)))
So far so good; now I would like to implement a prediction interval.
I gather that one has to use the ARMA prediction ± 1.96 (for 95%) times the GARCH prediction at each step. I implemented it for the upper and lower bounds; only the upper bound is shown below (the lower bound, shown after it, is the same but with -1.96 at the end of the formula).
# upper bound
df_all["upper bound"] = df_all["pred_Mean"] + df_all["pred_Variance"]*1.96
When I check the bounds against the actual log returns I trained the model on, they come out completely wrong. Now I'm unsure whether my main approach is wrong or whether the problem is the model I used, i.e. the package.
[figure: prediction interval vs. actual log return]

Does using the parameter (say w_mu) vs. the parameter's data (w_mu.data) make a difference while calculating the posterior?

I am trying to implement Bayes by Backprop. While calculating the posterior, if I use the parameter as an input vs. parameter.data as the input, the resulting accuracy changes drastically.
# Normal here is presumably torch.distributions.Normal
self.w_post = Normal(self.w_mu.data, torch.log(1 + torch.exp(self.w_rho)))
self.b_post = Normal(self.b_mu.data, torch.log(1 + torch.exp(self.b_rho)))
self.log_post = self.w_post.log_prob(self.w).sum() + self.b_post.log_prob(self.b).sum()
This works, while the next block doesn't.
self.w_post = Normal(self.w_mu, torch.log(1+torch.exp(self.w_rho)))
self.b_post = Normal(self.b_mu, torch.log(1+torch.exp(self.b_rho)))
self.log_post = self.w_post.log_prob(self.w).sum() + self.b_post.log_prob(self.b).sum()
Since w_post and b_post aren't parameters themselves, why does this affect my result? This snippet of code lives in the forward function of a custom-defined linear layer.
Also, the value of log_post does not change through the epochs. Could it have something to do with the seed?
I think both versions can be applied, and the difference somehow only comes down to the seed. After a few iterations, both models converge.

How to define the policy in the case of continuous action space that sum up to 1?

I am currently working on a continuous state-action space problem using policy gradient methods.
The environment's action space is defined as ratios that have to sum to 1 at each timestep. Hence, a Gaussian policy doesn't seem suitable in this case.
What I did instead is tweak the softmax policy (to make sure the policy network's output sums to 1), but I had a hard time determining the loss function to use, and eventually its gradient, in order to update the network parameters.
So far, I have tried a discounted-return-weighted mean squared error, but the results aren't satisfactory.
Are there any other policies that can be used in this particular case? Or are there any ideas on which loss function to use?
Here is the implementation of my policy network (inside my agent class) in TensorFlow.
def policy_network(self):
    self.input = tf.placeholder(tf.float32,
                                shape=[None, self.input_dims],
                                name='input')
    self.label = tf.placeholder(tf.float32, shape=[None, self.n_actions], name='label')

    # discounted return
    self.G = tf.placeholder(tf.float32, shape=[None], name='G')

    with tf.variable_scope('layers'):
        l1 = tf.layers.dense(
            inputs=self.input,
            units=self.l1_size,
            activation=tf.nn.relu,
            kernel_initializer=tf.contrib.layers.xavier_initializer())
        l2 = tf.layers.dense(
            inputs=l1,
            units=self.l2_size,
            activation=tf.nn.relu,
            kernel_initializer=tf.contrib.layers.xavier_initializer())
        l3 = tf.layers.dense(
            inputs=l2,
            units=self.n_actions,
            activation=None,
            kernel_initializer=tf.contrib.layers.xavier_initializer())
        self.actions = tf.nn.softmax(l3, name='actions')

    with tf.variable_scope('loss'):
        base_loss = tf.reduce_sum(tf.square(self.actions - self.label))
        loss = base_loss * self.G

    with tf.variable_scope('train'):
        self.train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)
Off the top of my head, you may want to try a 2D Gaussian, i.e. a multivariate Gaussian. https://en.wikipedia.org/wiki/Gaussian_function
For example, you could predict the 4 parameters (x_0, x_1, sigma_0, sigma_1) of a 2D Gaussian, from which you could generate a pair of numbers on the 2D Gaussian plane, say (2, 1.5); then you could use softmax to produce the desired action: softmax([2, 1.5]) = [0.62245933, 0.37754067].
You could then calculate the probability of that pair of numbers under the 2D Gaussian, which you could use to compute the negative log probability, advantage, etc., to build the loss function and update the gradients.
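Here is a minimal sketch of that idea in PyTorch (the concrete parameter values are placeholders of my own, not from the original post; a diagonal covariance is assumed):
import torch
from torch.distributions import Normal

x_mu = torch.tensor([2.0, 1.5])      # predicted centers (x_0, x_1)
x_sigma = torch.tensor([0.5, 0.5])   # predicted spreads (sigma_0, sigma_1)

dist = Normal(x_mu, x_sigma)         # two independent Gaussians = axis-aligned 2D Gaussian
z = dist.sample()                    # a pair of numbers on the 2D plane
action = torch.softmax(z, dim=0)     # ratios that sum to 1
log_prob = dist.log_prob(z).sum()    # joint log-probability for the policy-gradient loss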
Have you thought of using the Dirichlet distribution? Your network can output concentration parameters alpha > 0, which you can then use to generate samples that sum to one. Both PyTorch and TF support this distribution, and you can both sample from it and get the log-prob of a sample. In that case, in addition to getting your sample, since it is a probability distribution you also get a sense of its variance, which can serve as a measure of the agent's confidence. For an action of 3 dimensions, alpha = {1, 1, 1} basically means your agent has no preference, while alpha = {100, 1, 1} implies it is very certain that most of the weight should go to the first dimension.
Edit based on the comment:
Vanilla REINFORCE would have a hard time optimizing the policy when you use a Dirichlet distribution. The problem is that in vanilla policy gradient you can control how fast you change your policy in the network-parameter space through gradient clipping, adaptive learning rates, etc., but what matters most is controlling the rate of change in probability space, and some network parameters change the probabilities much more than others. So even though you limit the delta of your network parameters via the learning rate, you may still change the variance of your Dirichlet distribution a lot, and from the network's point of view this makes sense: to maximize the log-prob of its actions, the network may focus more on reducing the variance than on shifting the mode of the distribution, which later hurts both exploration and learning a meaningful policy. One way to alleviate this problem is to limit the rate of change of the policy distribution by constraining the KL divergence between the new policy distribution and the old one. TRPO and PPO are two ways to address this issue and solve the constrained optimization problem.
It is also probably good to make sure that in practice alpha > 1. You can achieve this easily by applying softplus, ln(1 + exp(x)) + 1, to your neural network outputs before feeding them into your Dirichlet distribution. Also monitor the gradients reaching your layers and make sure they exist.
You may also want to add the entropy of the distribution to your objective function to ensure enough exploration and to prevent distributions with very low variance (very high alphas). A sketch combining these points follows.
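A minimal PyTorch sketch of the Dirichlet policy with the softplus + 1 trick and an entropy bonus (the raw outputs, advantage, and entropy coefficient are stand-in values, not from the original post):
import torch
import torch.nn.functional as F
from torch.distributions import Dirichlet

raw = torch.randn(3, requires_grad=True)  # stand-in for the policy network's raw outputs
alpha = F.softplus(raw) + 1.0             # ln(1 + exp(x)) + 1 keeps every alpha > 1
dist = Dirichlet(alpha)

action = dist.sample()                    # non-negative 3-vector summing to 1
advantage = torch.tensor(2.0)             # placeholder advantage estimate
entropy_coef = 0.01                       # hypothetical entropy-bonus weight

# REINFORCE-style loss with an entropy bonus to keep the variance from collapsing
loss = -dist.log_prob(action) * advantage - entropy_coef * dist.entropy()
loss.backward()                           # gradients flow back through alpha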

Is there a way to generate variables from pymc3?

If I have a model like the one below, how do I access the theano function in order to get the value(s) for my model I'm fitting?
This is quite a basic model and so I could just calculate with the raw function for my variables. However, I intend to generate pymc3 models dynamically where some variables are reused/fixed/bounded etc.
I know I can access the theano function from model.makefn([expected]) but this will rely on transformed arguments like sigma_log_ instead of sigma.
Ideally, I'm looking for something like model.evaluate([expected], alpha=1, beta=2)
Is there such a method?
Thanks
from pymc3 import *

def function(a, b):
    # do something
    ...

basic_model = Model()
with basic_model:
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=10)
    beta = Normal('beta', mu=0, sd=10, shape=2)
    sigma = HalfNormal('sigma', sd=1)

    # Expected value of outcome
    expected = Deterministic('expected', function(alpha, beta))

    # Likelihood (sampling distribution) of observations
    # (mu=function in the original was presumably a typo for mu=expected)
    Y_obs = Normal('Y_obs', mu=expected, sd=sigma, observed=Y)
The typical approach here would be to first sample from the model's posterior distribution with something like
with model:
trace = pm.sample(N_SAMPLES)
then use the samples to approximate the posterior expected value of your function.
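A minimal sketch of that last step (assuming the Deterministic is named 'expected' as in the question; pymc3 records deterministic variables in the trace, one value per draw):
import numpy as np  # for clarity; the trace values are already numpy arrays

post_draws = trace['expected']        # shape: (N_SAMPLES, ...) draws of the deterministic
post_mean = post_draws.mean(axis=0)   # Monte Carlo estimate of E[expected | data]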
