I'm quite new to probabilistic programming and pymc3...
Currently, I want to implement the Kennedy-O’Hagan framework in pymc3.
The setup is according to the paper of Kennedy and O'Hagan as follows:
We have n observations zi of the form
zi = f(xi , theta) + g(xi) + ei,
where xi are known imputs and theta are unknown calibration parameters and ei are iid error terms. We also have m model evaluations yj of the form
yj = f(x'j, thetaj), where both x'j (different than xi above) and thetaj are known. Therefore, the data consists of all zi and yj. In the paper, Kennedy-O'Hagan model f, g using gaussian processes:
f ~ GP{m1 (.,.), Sigma1[(.,.),(.,.)] }
g ~ GP{m2 (.), Sigma2[(.),(.)] }
Among other things, the goal is to get posterior samples for the unknow calibration parameters theta.
What I've done so far:
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
from multiprocessing import freeze_support
import sys
import theano
import theano.tensor as tt
from mpl_toolkits.mplot3d import Axes3D
import pyDOE
from scipy.stats.distributions import uniform
def physical_system(x):
return 0.65 * x / (1 + x / 5)
def observation(x):
return physical_system(x[:]) + np.random.normal(0,0.01,len(x))
def computational_system(input):
return input[:,0]*input[:,1]
if __name__ == "__main__":
freeze_support()
# observations with noise
x_obs = np.linspace(0,4,10)
y_real = physical_system(x_obs[:])
y_obs = observation(x_obs[:])
# computation model
N = 60
design = pyDOE.lhs(2, samples=N, criterion='center')
left = [-0.2,-0.2]; right = [4.2,1.2]
for i in range(2):
design[:,i] = uniform(loc=left[i],scale=right[i]-left[i]).ppf(design[:,i])
x_comp = design[:,0][:,None]; t_comp = design[:,1][:,None]
input_comp = np.hstack((x_comp,t_comp))
y_comp = computational_system(input_comp)
x_obs_shared = theano.shared(x_obs[:, None])
with pm.Model() as model:
noise = pm.HalfCauchy('noise',beta=5)
ls_1 = pm.Gamma('ls_1', alpha=1, beta=1, shape=2)
cov = pm.gp.cov.ExpQuad(2,ls=ls_1)
f = pm.gp.Marginal(cov_func=cov)
# train the gp f with data from computer model:
f_0 = f.marginal_likelihood('f_0', X=input_comp, y=y_comp, noise=noise)
trace = pm.sample(500, pm.Metropolis(), chains=4)
burned_trace = trace[300:]
Until here, everything is fine. My GP f is trained according the computer model.
Now, I want to test if I could fit this trained GP to my observed data:
#gp f is now trained to data from computer model
#now I want to fit this trained gp to observed data and find posterior for theta
with model:
sd = pm.Gamma('eta', alpha=1, beta=1)
theta = pm.Normal('theta', mu=0, sd=sd)
sigma = pm.Gamma('sigma', alpha=1, beta=1)
input_1 = tt.concatenate([x_obs_shared, tt.tile(theta, len(x_obs[:,None]), ndim=2).T], axis=1)
f_1 = gp1.conditional('f_1', Xnew=input_1, shape=(10,))
y_ = pm.Normal('y_', mu=f_1,sd=sigma, observed=y_obs)
step = pm.Metropolis()
trace_ = pm.sample(30000, step,start=pm.find_MAP(), chains=4)
Is this formulation correct? I get very unstable results...
The full formulation according KOH should be something like this:
with pm.Model() as model:
theta = pm.Normal('theta', mu=0, sd=10)
noise = pm.HalfCauchy('noise',beta=5)
ls_1 = pm.Gamma('ls_1', alpha=1, beta=1, shape=2)
cov = pm.gp.cov.ExpQuad(2,ls=ls_1)
gp1 = pm.gp.Marginal(cov_func=cov)
gp2 = pm.gp.Marginal(cov_func=cov)
gp = gp1 + gp2
input_1 = tt.concatenate([x_obs_shared, tt.tile(theta, len(x_obs), ndim=2).T], axis=1)
f_0 = gp1.marginal_likelihood('f_0', X=input_comp, y=y_comp, noise=noise)
f_1 = gp1.marginal_likelihood('f_1', X=input_1, y=y_obs, noise=noise)
f = gp.marginal_likelihood('f', X=input_1, y=y_obs, noise=noise)
Could somebody give me some advise how to formulate the KOH properly with pymc3? I am desperate... Would appreciate any help. Thank you!
You might have found the solution but if not, that's a good one (Guidelines for the Bayesian calibration of building energy models)
Related
I am triyng to use scipy curve_fit function to fit a gaussian function to my data to estimate a theoretical power spectrum density. While doing so, the curve_fit function always return the initial parameters (p0=[1,1,1]) , thus telling me that the fitting didn't work.
I don't know where the issue is. I am using python 3.9 (spyder 5.1.5) from the anaconda distribution on windows 11.
here a Wetransfer link to the data file
https://wetransfer.com/downloads/6097ebe81ee0c29ee95a497128c1c2e420220704110130/86bf2d
Here is my code below. Can someone tell me what the issue is, and how can i solve it?
on the picture of the plot, the blue plot is my experimental PSD and the orange one is the result of the fit.
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.constants as cst
File = np.loadtxt('test5.dat')
X = File[:, 1]
Y = File[:, 2]
f_sample = 50000
time=[]
for i in range(1,len(X)+1):
t=i*(1/f_sample)
time= np.append(time,t)
N = X.shape[0] # number of observation
N1=int(N/2)
delta_t = time[2] - time[1]
T_mes = N * delta_t
freq = np.arange(1/T_mes, (N+1)/T_mes, 1/T_mes)
freq=freq[0:N1]
fNyq = f_sample/2 # Nyquist frequency
nb = 350
freq_block = []
# discrete fourier tansform
X_ft = delta_t*np.fft.fft(X, n=N)
X_ft=X_ft[0:N1]
plt.figure()
plt.plot(time, X)
plt.xlabel('t [s]')
plt.ylabel('x [micro m]')
# Experimental power spectrum on both raw and blocked data
PSD_X_exp = (np.abs(X_ft)**2/T_mes)
PSD_X_exp_b = []
STD_PSD_X_exp_b = []
for i in range(0, N1+2, nb):
freq_b = np.array(freq[i:i+nb]) # i-nb:i
psd_b = np.array(PSD_X_exp[i:i+nb])
freq_block = np.append(freq_block, (1/nb)*np.sum(freq_b))
PSD_X_exp_b = np.append(PSD_X_exp_b, (1/nb)*np.sum(psd_b))
STD_PSD_X_exp_b = np.append(STD_PSD_X_exp_b, PSD_X_exp_b/np.sqrt(nb))
plt.figure()
plt.loglog(freq, PSD_X_exp)
plt.legend(['Raw Experimental PSD'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.figure()
plt.loglog(freq_block, PSD_X_exp_b)
plt.legend(['Experimental PSD after blocking'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
kB = cst.k # Boltzmann constant [m^2kg/s^2K]
T = 273.15 + 25 # Temperature [K]
r = (2.8 / 2) * 1e-6 # Particle radius [m]
v = 0.00002414 * 10 ** (247.8 / (-140 + T)) # Water viscosity [Pa*s]
gamma = np.pi * 6 * r * v # [m*Pa*s]
Do = kB*T/gamma # expected value for D
f3db_o = 50000 # expected value for f3db
fc_o = 300 # expected value pour fc
n = np.arange(-10,11)
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
PSD_theo=[]
for i in range(0,len(x)):
# print(i)
psd_theo=np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)
** 2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
PSD_theo= np.append(PSD_theo,psd_theo)
return PSD_theo
popt, pcov = curve_fit(theo_spectrum_lorentzian_filter, freq_block, PSD_X_exp_b, p0=[1, 1, 1], sigma=STD_PSD_X_exp_b, absolute_sigma=True, check_finite=True,bounds=(0.1, 10), method='trf', jac=None)
D_, fc_, f3db_ = popt
D1 = D_*Do
fc1 = fc_*fc_o
f3db1 = f3db_*f3db_o
print('Diffusion constant D = ', D1, ' Corner frequency fc= ',fc1, 'f3db(diode,eff)= ', f3db1)
I believe I've successfully fitted your data. Here's the approach I took.
First, I plotted your model (with popt=[1, 1, 1]) and the data you had. I noticed your data was significantly lower than the model. Then I started fiddling with the parameters. I wanted to push the model upwards. I did that by multiplying popt[0] by increasingly large values. I ended up with 1E13 as a ballpark value. Note that I have no idea if this is physically possible for your model. Then I jury-rigged your fitting function to multiply D_ by 1E13 and ran your code. I got this fit:
So I believe it's a problem of 1) inappropriate starting values and 2) inappropriate bounds. In your position, I would revise this model, check if there's any problems with units and so on.
Here's what I used to try to fit your model:
plt.figure()
plt.loglog(freq_block[:170], PSD_X_exp_b[:170], label='Exp')
plt.loglog(freq_block[:170],
theo_spectrum_lorentzian_filter(
freq_block[:170],
1E13*popt[0], popt[1], popt[2]),
label='model'
)
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.legend()
I limited the data to point 170 because there were some weird backwards values that made me uncomfortable. I would recheck them if I were you.
Here's the model code I used. I didn't change the curve_fit call (except to limit x to :170.
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
PSD_theo=[]
D_ = 1E13*D_ # I only changed here
for i in range(0,len(x)):
psd_theo=np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)
** 2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
PSD_theo= np.append(PSD_theo,psd_theo)
return PSD_theo
I noticed the KL-Divergence term KL(Q(x)||P(x)) is computed differently when using
mean(Q(x)*(log Q(x) - log P(x)))
vs
torch.distributions.kl_divergence(Q, P)
where
Q = torch.distributions.Normal(some mean, some sigma)
P = torch.distributions.Normal(0, 1)
and when I plot the KL-divergence losses, I get this two similar but different plots:
here
Can anyone point out what is causing this difference?
The full code is below:
import numpy as np
import torch
import torch.distributions as dist
import matplotlib.pyplot as plt
def kl_1(log_qx, log_px):
"""
inputs: [B, z_dim] torch
"""
return (log_qx.exp() * (log_qx-log_px)).mean()
# ground-truth (target) P(x)
P = dist.Normal(0, 1)
mus = np.arange(-5, 5, 0.1)
sigma = 1
N = 100
kls = {"1": [], "2": []}
for mu in mus:
# prediction (current) Q(x)
Q = dist.Normal(mu, sigma)
# sample from Q
qx = Q.sample((N,))
# log prob
log_qx = Q.log_prob(qx)
log_px = P.log_prob(qx)
# kl 1
kl1 = kl_1(log_qx, log_px)
kls['1'].append(kl1.numpy())
# kl 2
kl2 = dist.kl_divergence(Q, P)
kls['2'].append(kl2.numpy())
plt.figure()
plt.scatter(mus, kls['1'], label="Q*(logQ-logP)")
plt.scatter(mus, kls['2'], label="kl_divergence")
plt.xlabel("mean of Q(x)")
plt.ylabel("computed KL Divergence")
plt.legend()
plt.show()
You have the sample weighted by the probability density if you are computing the expected value from an integral on dx. If you are using a sample from the given distribution then you approximate the expected value as the mean directly, that corresponds to integration on d cq(x) thus d cq(x) = q(x) dx, where cq(x) is the cumulative probability function, and q(x) id the probability density funciton of the variable Q.
import numpy as np
import torch
import torch.distributions as dist
import matplotlib.pyplot as plt
def kl_1(log_qx, log_px):
"""
inputs: [B, z_dim] torch
"""
return (log_qx-log_px).mean()
# ground-truth (target) P(x)
P = dist.Normal(0, 1)
mus = np.arange(-5, 5, 0.1)
sigma = 1
N = 100
kls = {"1": [], "2": []}
for mu in mus:
# prediction (current) Q(x)
Q = dist.Normal(mu, sigma)
# sample from Q
qx = Q.sample((N,))
# log prob
log_qx = Q.log_prob(qx)
log_px = P.log_prob(qx)
# kl 1
kl1 = kl_1(log_qx, log_px)
kls['1'].append(kl1.numpy())
# kl 2
kl2 = dist.kl_divergence(Q, P)
kls['2'].append(kl2.numpy())
plt.figure()
plt.scatter(mus, kls['1'], label="Q*(logQ-logP)")
plt.scatter(mus, kls['2'], label="kl_divergence")
plt.xlabel("mean of Q(x)")
plt.ylabel("computed KL Divergence")
plt.legend()
I am a learner of data science and machine learning. I have written a code for gradient descent optimization of linear regression cost function without using builtin python library. However, just to confirm whether my code is correct and verify results, I have also implemented the same using builtin python library.
The coefficient and intercept values I obtained through my code are not matching with the coefficient and intercept values obtained using builtin python module. Kindly suggest what is the error in my way of gradient descent optimization of linear regression?
my method:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()
sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
#generating column of ones
X0 = np.ones(len(Data)).reshape(-1,1)
#print(X0.shape)
X = Data.drop(['Y'], axis = 1).values
X_new = np.concatenate((X0,X), axis = 1)
#print(X_new)
#print(X_new.shape)
Y = Data.loc[:,['Y']].values
#print(Y)
#print(Y.shape)
# initial theta
theta =np.random.randint(low=0, high=1, size= X_new.shape[1]).reshape(-1,1)
#print(theta.shape)
J_history = []
theta_history = [list(theta.flatten())]
#gradient descent implementation
iterations = 1000
alpha = 0.01
m = len(Y)
for iter in range(1,iterations):
H = X_new.dot(theta)
loss = (H-Y)
J = loss/(2*m)
J_history.append(J)
G = X_new.T.dot(loss)/m
theta_new = theta - alpha*G
theta_history.append(list(theta_new.flatten()))
theta = theta_new
# collecting costs (J) and coefficients (theta_0,theta_1)
theta_history.pop()
J_history = [i[0] for i in J_history]
params = pd.DataFrame()
params['J']=J_history
for i in range(len(theta_history[0])):
params['theta_'+str(i)]=[k[i] for k in theta_history]
idx = params[params['J']==min(params['J'])].index
values = params.iloc[idx[0]][1:params.shape[1]].tolist()
print('intercept: {}, coeff: {}'.format(values[0],values[1]))
using builtin library:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()
sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
model = SGDRegressor(loss = 'squared_loss', learning_rate = 'constant', eta0 = 0.01, max_iter= 1000)
model.fit(Data['X'].values.reshape(-1,1), Data['Y'].values.reshape(-1,1))
print('coeff: {}, intercept: {}'.format(model.coef_, model.intercept_))
First of all I appreciate your effort to understand and implement by yourself the SGD algorithm.
Now, back to your code. There are some minor errors that need to be corrected:
Your Js are not scalars but numpy.arrays but the way you're using them implies that they're assumed to be scalars hence the error raised when your code is executed.
After running your chain, you must take the theta who has the lowest error and this error is actually J^2 and not J as J may be negative as well.
The scikit learn SGDRegressor that you're actually using is, as its name suggests, stochastic by definition and given the small size of your dataset you need to run it many times and average its estimates if you want to get something reliable from it.
Your learning rate 0.01 seems to be a little big
When those changes are made, I get from your code a "comparable" results with SGDRegressor.
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()
sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
#generating column of ones
X0 = np.ones(len(Data)).reshape(-1,1)
#print(X0.shape)
X = Data.drop(['Y'], axis = 1).values
X_new = np.concatenate((X0,X), axis = 1)
#print(X_new)
#print(X_new.shape)
Y = Data.loc[:,['Y']].values
#print(Y)
#print(Y.shape)
# initial theta
theta =np.random.randint(low=0, high=1, size= X_new.shape[1]).reshape(-1,1)
#print(theta.shape)
J_history = []
theta_history = [list(theta.flatten())]
#gradient descent implementation
iterations = 2000
alpha = 0.001
m = len(Y)
for iter in range(1,iterations):
H = X_new.dot(theta)
loss = (H-Y)
J = loss/(2*m)
J_history.append(J[0]**2)
G = X_new.T.dot(loss)/m
theta_new = theta - alpha*G
theta_history.append(list(theta_new.flatten()))
theta = theta_new
theta_history.pop()
J_history = [i[0] for i in J_history]
# collecting costs (J) and coefficients (theta_0,theta_1)
params = pd.DataFrame()
params['J']=J_history
for i in range(len(theta_history[0])):
params['theta_'+str(i)]=[k[i] for k in theta_history]
idx = params[params['J']== params['J'].min()].index
values = params.iloc[idx[0]][1:params.shape[1]].tolist()
print('intercept: {}, coeff: {}'.format(values[0],values[1]))
#> intercept: 0.654041555750147, coeff: 1.2625626277290982
Now let's see the scikit learn model
from sklearn.linear_model import SGDRegressor
intercepts = []
coefs = []
for _ in range(500):
model = SGDRegressor(loss = 'squared_loss', learning_rate = 'constant', eta0 = 0.01, max_iter= 1000)
model.fit(Data['X'].values.reshape(-1,1), Data['Y'].values.reshape(-1))
intercepts.append(model.intercept_)
coefs.append(model.coef_)
intercept = np.concatenate(intercepts).mean()
coef = np.vstack(coefs).mean(0)
print('intercept: {}, coeff: {}'.format( intercept, coef))
#> intercept: 0.6912403374422401, coeff: [1.24932246]
I'm still a noob in PyMC3, so the question might me naive, but I don't know how to translate this pymc2 code in pymc3. In particular it's not clear to me how to translate the R function.
beta = pymc.Normal('beta', mu=0, tau=1.0e-4)
s = pymc.Uniform('s', lower=0, upper=1.0e+4)
tau = pymc.Lambda('tau', lambda s=s: s**(-2))
### Intrinsic CAR
#pymc.stochastic
def R(tau=tau, value=np.zeros(N)):
# Calculate mu based on average of neighbors
mu = np.array([sum(W[i]*value[A[i]])/Wplus[i] for i in xrange(N)])
# Scale precision to the number of neighbors
taux = tau*Wplus
return pymc.normal_like(value, mu, taux)
#pymc.deterministic
def M(beta=beta, R=R):
return [np.exp(beta + R[i]) for i in xrange(N)]
obsvd = pymc.Poisson("obsvd", mu=M, value=Y, observed=True)
model = pymc.Model([s, beta, obsvd])
Code from https://github.com/Youki/statistical-modeling-for-data-analysis-with-python/blob/945c13549a872d869e33bc48082c42efc022a07b/Chapter11/Chapter11.rst, and http://glau.ca/?p=340
Can you help me? Thanks
In PyMC3, you can implement the CAR model using the scan function of Theano. There is a sample code in their documentation. There are two implementations for CAR in the linked document. Here is the first one [Source]:
from theano import scan
floatX = "float32"
from pymc3.distributions import continuous
from pymc3.distributions import distribution
class CAR(distribution.Continuous):
"""
Conditional Autoregressive (CAR) distribution
Parameters
----------
a : list of adjacency information
w : list of weight information
tau : precision at each location
"""
def __init__(self, w, a, tau, *args, **kwargs):
super(CAR, self).__init__(*args, **kwargs)
self.a = a = tt.as_tensor_variable(a)
self.w = w = tt.as_tensor_variable(w)
self.tau = tau*tt.sum(w, axis=1)
self.mode = 0.
def get_mu(self, x):
def weigth_mu(w, a):
a1 = tt.cast(a, 'int32')
return tt.sum(w*x[a1])/tt.sum(w)
mu_w, _ = scan(fn=weigth_mu,
sequences=[self.w, self.a])
return mu_w
def logp(self, x):
mu_w = self.get_mu(x)
tau = self.tau
return tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x))
with pm.Model() as model1:
# Vague prior on intercept
beta0 = pm.Normal('beta0', mu=0.0, tau=1.0e-5)
# Vague prior on covariate effect
beta1 = pm.Normal('beta1', mu=0.0, tau=1.0e-5)
# Random effects (hierarchial) prior
tau_h = pm.Gamma('tau_h', alpha=3.2761, beta=1.81)
# Spatial clustering prior
tau_c = pm.Gamma('tau_c', alpha=1.0, beta=1.0)
# Regional random effects
theta = pm.Normal('theta', mu=0.0, tau=tau_h, shape=N)
mu_phi = CAR('mu_phi', w=wmat, a=amat, tau=tau_c, shape=N)
# Zero-centre phi
phi = pm.Deterministic('phi', mu_phi-tt.mean(mu_phi))
# Mean model
mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*aff + theta + phi))
# Likelihood
Yi = pm.Poisson('Yi', mu=mu, observed=O)
# Marginal SD of heterogeniety effects
sd_h = pm.Deterministic('sd_h', tt.std(theta))
# Marginal SD of clustering (spatial) effects
sd_c = pm.Deterministic('sd_c', tt.std(phi))
# Proportion sptial variance
alpha = pm.Deterministic('alpha', sd_c/(sd_h+sd_c))
trace1 = pm.sample(1000, tune=500, cores=4,
init='advi',
nuts_kwargs={"target_accept":0.9,
"max_treedepth": 15})
The M function is written here as:
mu = pm.Deterministic('mu', tt.exp(logE + beta0 + beta1*aff + theta + phi))
Say I try to estimate the slope of a simple y= m * x problem using the following data:
x_data = np.array([0,1,2,3])
y_data = np.array([0,1,2,3])
Clearly the slope is 1. However, when I run this in PyMC I get 10
slope = pm.Uniform('slope', lower=0, upper=20)
#pm.deterministic
def y_gen(value=y_data, x=x_data, slope=slope, observed=True):
return slope * x
model = pm.Model([slope])
mcmc = pm.MCMC(model)
mcmc.sample(100000, 5000)
# This returns 10
final_guess = mcmc.trace('slope')[:].mean()
but it should be 1!
Note: The above is with PyMC2.
You need to define a likelihood, try this:
import pymc as pm
import numpy as np
x_data = np.linspace(0,1,100)
y_data = np.linspace(0,1,100)
slope = pm.Normal('slope', mu=0, tau=10**-2)
tau = pm.Uniform('tau', lower=0, upper=20)
#pm.deterministic
def y_gen(x=x_data, slope=slope):
return slope * x
like = pm.Normal('likelihood', mu=y_gen, tau=tau, observed=True, value=y_data)
model = pm.Model([slope, y_gen, like, tau])
mcmc = pm.MCMC(model)
mcmc.sample(100000, 5000)
# This returns 10
final_guess = mcmc.trace('slope')[:].mean()
It returns 10 because you're just sampling from your uniform prior and 10 is the expected value of that.
You need to set value=y_data, observed=True for the likelihood. Also, a minor point, you don't need to instantiate a Model object. Just pass your nodes (or a call to locals()) to MCMC.