Resetting kernel hyperparameter values in GPflow - python

My use case is this: I have a function that takes in a kernel of the user's choice, then iterates through every date in the dataset and uses Gaussian Process Regression to estimate the model with the specified kernel. However, since I'm pointing to the kernel object, I need to reset it to its default values before I run the next iteration.
import gpflow

class WrapperClass(object):
    def __init__(self, kernel):
        super().__init__()
        self.kernel = kernel

    def fit(self, X, y):
        m = gpflow.models.GPR(X, y, self.kernel)  # I need to reset the kernel here
        # some code later

def some_function(Xs, ys, ts, f):
    for t in ts:
        X = Xs.loc[t]  # pandas dataframe
        y = ys.loc[t]  # pandas
        f.fit(X, y)

k1 = gpflow.kernels.RBF(1)
k2 = gpflow.kernels.White(0.1)
k = k1 + k2
f = WrapperClass(k)
some_function(Xs, ys, ts, f)
I've found the method read_trainables() on the kernel so one strategy is to save the settings the user has provided, but there doesn't seem to be any way to set them?
In [7]: k1.read_trainables()
Out[7]: {'Sum/rbf/lengthscales': array(1.), 'Sum/rbf/variance': array(1.)}
Cheers,
Steve

You can set the parameters of Parameterized objects (models, kernels, likelihoods etc) using assign(): k1.assign(k1.read_trainables()) (or some other dict of path-value pairs). You might as well create a new kernel object, though!
Note that each time you create new parameterized objects (this applies both to kernels and models, as in your fit() method), you add operations to the TensorFlow graph, which can slow down graph computation significantly if the graph grows a lot. You probably want to look into manually handling tf.Graph() and tf.Session() to keep them distinct for each model. (See the notebooks on session handling and further tips and tricks in the new GPflow documentation.)
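A minimal sketch of that reset pattern, assuming the GPflow 1.x API where read_trainables() and assign() are available (the WrapperClass name is taken from the question):

import gpflow

class WrapperClass(object):
    def __init__(self, kernel):
        super().__init__()
        self.kernel = kernel
        # snapshot of the user-supplied hyperparameters, taken once at construction
        self.initial_values = kernel.read_trainables()

    def fit(self, X, y):
        # restore the saved hyperparameters before building the next GPR model
        self.kernel.assign(self.initial_values)
        m = gpflow.models.GPR(X, y, self.kernel)
        # ... optimize m as before ...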

Related

Should we scale before the KElbowVisualizer method for clustering in python

I know that we need to scale the data before any clustering.
But I want to ask whether the KElbowVisualizer method does the scaling by itself, or whether I should scale the data before giving it to the method.
I already searched the documentation of this method but did not find an answer; please share it with me if you find it. Thank you.
I looked at the implementation of KElbowVisualizer in yellowbrick/cluster/elbow.py on GitHub and I haven't found any code under the fit function (line 306) that scales the X variables.
# https://github.com/DistrictDataLabs/yellowbrick/blob/main/yellowbrick/cluster/elbow.py
# ...
def fit(self, X, y=None, **kwargs):
    """
    Fits n KMeans models where n is the length of ``self.k_values_``,
    storing the silhouette scores in the ``self.k_scores_`` attribute.
    The "elbow" and silhouette score corresponding to it are stored in
    ``self.elbow_value`` and ``self.elbow_score`` respectively.
    This method finishes up by calling draw to create the plot.
    """
    self.k_scores_ = []
    self.k_timers_ = []
    self.kneedle = None
    self.knee_value = None

    if self.locate_elbow:
        self.elbow_value_ = None
        self.elbow_score_ = None

    for k in self.k_values_:
        # Compute the start time for each model
        start = time.time()

        # Set the k value and fit the model
        self.estimator.set_params(n_clusters=k)
        self.estimator.fit(X, **kwargs)

        # Append the time and score to our plottable metrics
        self.k_timers_.append(time.time() - start)
        self.k_scores_.append(self.scoring_metric(X, self.estimator.labels_))
    # ...
So you may need to scale your data (the X features) yourself before passing them to KElbowVisualizer().fit().
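For example, a minimal sketch of scaling first with scikit-learn's StandardScaler (the data here is synthetic, just to illustrate the order of operations):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from yellowbrick.cluster import KElbowVisualizer

# synthetic features on very different scales
X = np.random.rand(200, 4) * np.array([1, 10, 100, 1000])

# scale first; KElbowVisualizer will not do it for you
X_scaled = StandardScaler().fit_transform(X)

visualizer = KElbowVisualizer(KMeans(), k=(2, 10))
visualizer.fit(X_scaled)
visualizer.show()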

How to implement coordinate descent using tensorflow?

For example, for a simple linear model y = wx + b, where x and y are the input and output respectively and w and b are the training parameters, I am wondering: in every epoch, how can I update b first and then update w?
TensorFlow might not be the best tool for this. You can do it using plain Python.
And if you need to do the regression with a more complex function, scikit-learn might be a more appropriate library.
Regardless of the tool, you can do batch gradient descent or stochastic gradient descent.
But first you need to define a "cost function". This function basically tells you how far away from the true values you are, for example least mean squares (LMS); it takes the prediction from your model and the true value, and drives the adjustment of the training parameters.
This is the function that is optimized by BGD or SGD during training.
Here is an example I wrote to understand what is happening. It is not the optimal solution, but it will give you an idea of what is going on.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

tips = sns.load_dataset("tips")

alpha = 0.00005
thetas = np.array([1., 1.])

def h(thetas, x):
    #print(f'theta 0: {thetas[0]}')
    #print(f'theta 1: {thetas[1]}')
    #print(f'h=m:{thetas[0] + (thetas[1]*x[1])}')
    return thetas[0] + (thetas[1]*x[1])

for i in zip(tips.total_bill, tips.tip):
    x = np.array([1, i[0]])
    y = i[1]
    for index, theta in enumerate(thetas):
        #print(f'theta in: {thetas[index]}')
        #print(f'error: {thetas[index] + alpha*(y - h(thetas, x))*x[index]}')
        thetas[index] = thetas[index] + alpha*(y - h(thetas, x))*x[index]
        #print(f'theta out: {thetas[index]}')
    #print(thetas)

print(thetas)

xplot = np.linspace(min(tips.total_bill), max(tips.total_bill), 100, endpoint=True)
xp = [[1, x] for x in xplot]
yp = [h(thetas, xi) for xi in xp]

plt.scatter(tips.total_bill, tips.tip)
plt.plot(xplot, yp, 'o', color='orange')
plt.show()
Not really possible. TF's backprop calculates gradients across all variables based on the values of the other variables at the time of the forward pass. If you want to alternate between training w and b, you would unfreeze w and freeze b (set it to trainable=False), run forward and backward propagation, then freeze w and unfreeze b, and run forward and backward propagation again. I don't think that would run very fast, since TF isn't really designed to switch the trainable flag on every mini-batch.
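If you do want to stay in TensorFlow, here is a minimal sketch of the alternating update idea, assuming TensorFlow 2.x eager mode and synthetic data (the names x, y_true, w, b, opt are mine, not from the question). Rather than toggling the trainable flag, it simply takes the gradient with respect to one variable at a time:

import tensorflow as tf

# synthetic data for y = 3x + 2
x = tf.random.normal([100, 1])
y_true = 3.0 * x + 2.0

w = tf.Variable(0.0)
b = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

def loss():
    return tf.reduce_mean(tf.square(y_true - (w * x + b)))

for epoch in range(100):
    # step 1: update b with w held fixed (gradient taken w.r.t. b only)
    with tf.GradientTape() as tape:
        l = loss()
    opt.apply_gradients([(tape.gradient(l, b), b)])

    # step 2: update w with b held fixed (gradient taken w.r.t. w only)
    with tf.GradientTape() as tape:
        l = loss()
    opt.apply_gradients([(tape.gradient(l, w), w)])

print(w.numpy(), b.numpy())  # should move toward 3 and 2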

using scipy curve_fit with dask/xarray

I'm trying to use scipy.optimize.curve_fit on a large latitude/longitude/time xarray using dask.distributed as computing backend.
The idea is to run an individual data fitting for every (latitude, longitude) using the time series.
All of this runs fine outside xarray/dask. I tested it using the time series of a single location passed as a pandas dataframe. However, if I try to run the same process on the same (latitude, longitude) directly on the xarray, the curve_fit operation returns the initial parameters.
I am performing this operation using xr.apply_ufunc like so (here I'm providing only the code that is strictly relevant to the problem):
# function to perform the fit
def _fit_rti_curve(data, data_rti, fit, loc=False):
    fit_func, linearize, find_init_params = _get_fit_functions(fit)
    # remove nans
    x, y = _filter_nodata(data_rti, data)
    # remove outliers
    x, y = _filter_for_outliers(x, y, linearize=linearize)
    # find a first guess for the maximum achievable value
    yscale = np.max(y) * 1.05
    # find a first guess for the other parameters
    # here loc can be manually passed if you have a good estimation
    init_parms = find_init_params(x, y, yscale, loc=loc, linearize=linearize)
    # fit the curve and return parameters
    parms = curve_fit(fit_func, x, y, p0=init_parms, maxfev=10000)
    parms = parms[0]
    return parms

# shell around _fit_rti_curve
def find_rti_func_parms(data, rti, fit):
    # sort and fit highest n values
    top_data = np.sort(data)
    top_data = top_data[-len(rti):]
    # convert to float64 if needed
    top_data = top_data.astype(np.float64)
    rti = rti.astype(np.float64)
    # run the fit
    parms = _fit_rti_curve(top_data, rti, fit, loc=0)  # TODO maybe add function to allow a free loc
    return parms

# call for the apply_ufunc
# `fit` is a string that defines the distribution type
# `rti` is an array for the x values
parms_data = xr.apply_ufunc(
    find_rti_func_parms,
    xr_obj,
    input_core_dims=[['time']],
    output_core_dims=[[fit + ' parameters']],
    output_sizes={fit + ' parameters': len(signature(fit_func).parameters) - 1},
    vectorize=True,
    kwargs={'rti': return_time_interval, 'fit': fit},
    dask='parallelized',
    output_dtypes=['float64'],
)
My guess would be that it is a problem related to threading, or at least to some shared memory space that is not properly passed between workers and the scheduler.
However, I am just not knowledgeable enough to test this within dask.
Any idea on this problem?
You should have a look at this issue: https://github.com/pydata/xarray/issues/4300
I had the same problem and solved it using apply_ufunc. It is not optimized, since it has to perform rechunking operations, but it works!
I've created a GitHub Gist for it: https://gist.github.com/clausmichele/8350e1f7f15e6828f29579914276de71
This previous answer might be helpful? It's using numpy.polyfit but I think the general approach should be similar.
Applying numpy.polyfit to xarray Dataset
Also, I haven't tried it but xr.polyfit() just got merged recently! Could also be something to look into. http://xarray.pydata.org/en/stable/generated/xarray.DataArray.polyfit.html#xarray.DataArray.polyfit
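For reference, here is a minimal self-contained sketch of curve_fit under xr.apply_ufunc with a dask-backed DataArray, using synthetic exponential-decay data rather than the asker's _fit_rti_curve pipeline (the decay and fit_pixel functions are mine; it assumes dask is installed and a recent xarray where output_sizes is passed via dask_gufunc_kwargs):

import numpy as np
import xarray as xr
from scipy.optimize import curve_fit

def decay(t, a, tau):
    return a * np.exp(-t / tau)

def fit_pixel(y, t):
    # y is the 1-D time series of a single (lat, lon) location
    popt, _ = curve_fit(decay, t, y, p0=[1.0, 1.0], maxfev=10000)
    return popt

time = np.linspace(0.1, 5, 50)
data = xr.DataArray(
    2.0 * np.exp(-time / 1.5)[None, None, :] + 0.05 * np.random.rand(4, 3, 50),
    dims=("lat", "lon", "time"),
)

params = xr.apply_ufunc(
    fit_pixel,
    data.chunk({"lat": 2}),   # dask-backed input, chunked over non-core dims only
    input_core_dims=[["time"]],
    output_core_dims=[["param"]],
    vectorize=True,
    dask="parallelized",
    output_dtypes=[float],
    dask_gufunc_kwargs={"output_sizes": {"param": 2}},
    kwargs={"t": time},
)
print(params.compute())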

Is it possible to provide sklearns KernelRidge a callable kernel that provides a Gram Matrix?

I want to run a kernel ridge regression in Python using the sklearn.kernel_ridge.KernelRidge function with a custom kernel (the Wendland kernel), which is not implemented in sklearn, so I have to provide a callable (I want to avoid the 'precomputed' option in order to keep it in line with my other models). The problem is that the callable has to return a float, so it is called once for each pair of data points, which makes training really slow.
Looking at a model with a similar setup, i.e. SVM.SVR, one provides a callable kernel function that returns the whole kernel matrix at once, which makes it much faster.
So my question is: is there a way to make the KernelRidge function accept a callable that provides the Gram matrix in one step, in order to speed up the process? Are there other alternatives?
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import check_pairwise_arrays, euclidean_distances

def Wendland_kernel(eps=None):
    # Kernel I want to use and am allowed to use with SVM.SVR
    def Wendland_gram_intern(X, Y=None, eps=eps):
        X, Y = check_pairwise_arrays(X, Y)
        if eps is None:
            eps = 1.0 / X.shape[1]
        K = euclidean_distances(X, Y, squared=False)
        K = 1 - eps*K
        return np.maximum(K, 0)**2
    return Wendland_gram_intern

def Wendland_single(eps=None):
    # Kernel I have to use
    def Wendland_single_intern(x1, y1, eps=eps):
        K = np.linalg.norm(x1 - y1)
        K = 1 - eps*K
        return np.maximum(K, 0)**2
    return Wendland_single_intern

X = np.random.random((10, 2))
y = np.random.normal(size=(10,))

clf = KernelRidge(kernel=Wendland_single(eps=2.5))
clf.fit(X, y)
print(clf.predict([[0.5, 0.5]]))
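(For comparison, a minimal sketch of the 'precomputed' alternative mentioned above, reusing the Wendland_kernel Gram-matrix callable from the snippet: you pass the Gram matrices to fit() and predict() yourself.)

import numpy as np
from sklearn.kernel_ridge import KernelRidge

gram = Wendland_kernel(eps=2.5)

X = np.random.random((10, 2))
y = np.random.normal(size=(10,))

clf = KernelRidge(kernel='precomputed')
clf.fit(gram(X, X), y)               # n_train x n_train Gram matrix

X_test = np.array([[0.5, 0.5]])
print(clf.predict(gram(X_test, X)))  # n_test x n_train Gram matrix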

How to define General deterministic function in PyMC

In my model, I need to obtain the value of my deterministic variable from a set of parent variables using a complicated python function.
Is it possible to do that?
Following is a pyMC3 code which shows what I am trying to do in a simplified case.
import numpy as np
import pymc as pm

# Predefine values on two parameter Grid (x,w) for a set of i values (1,2,3)
idata = np.array([1, 2, 3])
size = 20
gridlength = size*size
Grid = np.empty((gridlength, 2+len(idata)))
for x in range(size):
    for w in range(size):
        # A silly version of my real model evaluated on grid.
        Grid[x*size+w, :] = np.array([x, w] + [(x**i + w**i) for i in idata])

# A function to find the nearest value in Grid and return its product with third variable z
def FindFromGrid(x, w, z):
    return Grid[int(x)*size+int(w), 2:] * z

# Generate fake Y data with error
yerror = np.random.normal(loc=0.0, scale=9.0, size=len(idata))
ydata = Grid[16*size+12, 2:]*3.6 + yerror  # ie. True x=16, w=12 and z=3.6

with pm.Model() as model:
    # Priors
    x = pm.Uniform('x', lower=0, upper=size)
    w = pm.Uniform('w', lower=0, upper=size)
    z = pm.Uniform('z', lower=-5, upper=10)

    # Expected value
    y_hat = pm.Deterministic('y_hat', FindFromGrid(x, w, z))

    # Data likelihood
    ysigmas = np.ones(len(idata))*9.0
    y_like = pm.Normal('y_like', mu=y_hat, sd=ysigmas, observed=ydata)

    # Inference...
    start = pm.find_MAP()  # Find starting value by optimization
    step = pm.NUTS(state=start)  # Instantiate MCMC sampling algorithm
    trace = pm.sample(1000, step, start=start, progressbar=False)  # draw 1000 posterior samples using NUTS sampling

print('The trace plot')
fig = pm.traceplot(trace, lines={'x': 16, 'w': 12, 'z': 3.6})
fig.show()
When I run this code, I get an error at the y_hat stage, because the int() function inside FindFromGrid(x,w,z) needs an integer, not a FreeRV.
Finding y_hat from a pre-calculated grid is important because my real model for y_hat does not have an analytical form.
I have tried OpenBUGS earlier, but I found out here that it is not possible to do this in OpenBUGS. Is it possible in PyMC?
Update
Based on an example on the pyMC GitHub page, I found that I need to add the following decorator to my FindFromGrid(x,w,z) function.
@pm.theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
This seems to solve the issue mentioned above, but I cannot use the NUTS sampler anymore since it needs gradients, and Metropolis does not seem to be converging.
Which step method should I use in a scenario like this?
You found the correct solution with as_op.
Regarding the convergence: are you using pm.Metropolis() instead of pm.NUTS() by any chance? One reason it might not converge is that Metropolis() by default samples in the joint space, while Gibbs-within-Metropolis is often more effective (and was the default in pymc2). Having said that, I just merged this: https://github.com/pymc-devs/pymc/pull/587, which changes the default behavior of the Metropolis and Slice samplers to be non-blocked (i.e. within-Gibbs). Other samplers like NUTS, which are primarily designed to sample the joint space, still default to blocked. You can always set this explicitly with the kwarg blocked=True.
Anyway, update pymc to the most recent master and see if convergence improves. If not, try the Slice sampler.
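A minimal sketch of how these pieces fit together, assuming the Theano-era PyMC API used in the question (the as_op decorator path and the blocked kwarg come from the question and the answer above; Grid, size and ydata are the objects defined in the question):

import pymc as pm
import theano.tensor as t

# wrap the grid lookup so it can take random variables as inputs
@pm.theano.compile.ops.as_op(itypes=[t.dscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
def FindFromGrid(x, w, z):
    return Grid[int(x)*size + int(w), 2:] * z

with pm.Model() as model:
    x = pm.Uniform('x', lower=0, upper=size)
    w = pm.Uniform('w', lower=0, upper=size)
    z = pm.Uniform('z', lower=-5, upper=10)
    y_hat = pm.Deterministic('y_hat', FindFromGrid(x, w, z))
    y_like = pm.Normal('y_like', mu=y_hat, sd=9.0, observed=ydata)

    # non-blocked Metropolis (Gibbs-within-Metropolis), as suggested above,
    # or swap in the Slice sampler instead
    step = pm.Metropolis(blocked=False)
    # step = pm.Slice()
    trace = pm.sample(5000, step=step)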
