Gaussian process regression - explain behaviour

Gaussian process regression - explain behaviour - python

I'm looking into GP regression, but I'm getting some behaviour that I do not understand.
Basically, I wanted to show convergence for GP on the osciallatory Genz function (basically a period wave), which led me to this picture Gp convergence, sorry for the missing labels (x axis: num samples, y axis: relative error measure in 2000 points)
This is OK, but I was curious why it took so long before the error started to drop. Plotting the resulting GP fit I got this (busy) plot GP fit is orange, true function is blue. What I don't understand is what happens up until it starts to capture the true function. I assumed it had something to do with the kernel. The plot here uses a RBF kernel with length_scale = 1 (I also tried both higher and lower values, but got the same results).
I kind of expected it to have a more smooth behaviour even if it couldn't capture the true model.
So, to my question: why do I see this "spikey" behaviour? And can I do something to change it (kernel-wise or other)?
kernel = RBF(length_scale = 1, length_scale_bounds = (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)
def genz(x, method = 'default'):
d = x.shape[1]
a = 10/d
w = 1/2
num_points = x.shape[0]
funcval = np.empty([1,num_points])
for i in range(num_points):
funcval[0,i] = np.cos(2 * np.pi * w + np.sum(a * x[i,:]))
return funcval

It seems like the optimized length scale is very small compared to its domain space. I also felt very weird when I was digging into this library; changing some hyperparameters and the number of optimization didn't work for me as well. It might be helpful to change your kernel function to matern with changing the gamma value but not very much. If you really want to customize as you want, I might recommend you to use gpytorch similar to torch implementation or the GPML matlab toolbox.

Related

lmfit Stepped functions and Step size

I want to fit a 2D shape in an image. In the past, I have successfully done this using lmfit in Python and wrapping the 2D function/data to 1D. On that occasion, the 2D model was a smooth function (a ring with a gaussian profile). Now I am trying to do the same but with a "non-smooth function" and it is not working as expected.
This is what I am trying to do (guessed and fitted are the same):
I have shifted the guessed parameters in purpose to easily see if it moves as expected, and nothing happens.
I have noticed that if instead of a swiss flag I use a 2D gaussian, which is a smooth function, this works fine (see MWE below):
So I guess the problem is related to the fact that the Swiss flag function is not smooth. I have tried to make it smooth by adding a gaussian filter (blur) but it still did not work, even though the swiss flag plot became very blurred.
After some time I came to the thought that maybe the step size that is using lmfit (o whoever is in the background) is too small to produce any change in the swiss flag. I would like to try to increase the step size to 1, but I don't know exactly how to do that.
This is my MWE (sorry, it is still quite long):
import numpy as np
import myplotlib as mpl # https://github.com/SengerM/myplotlib
import lmfit
def draw_swiss_flag(fig, center, side, **kwargs):
fig.plot(
np.array(2*[side] + 2*[side/2] + 2*[-side/2] + 2*[-side] + 2*[-side/2] + 2*[side/2] + 2*[side]) + center[0],
np.array([0] + 2*[side/2] + 2*[side] + 2*[side/2] + 2*[-side/2] + 2*[-side] + 2*[-side/2] + [0]) + center[1],
**kwargs,
)
def swiss_flag(x, y, center: tuple, side: float):
# x, y numpy arrays.
if x.shape != y.shape:
raise ValueError(f'<x> and <y> must have the same shape!')
flag = np.zeros(x.shape)
flag[(center[0]-side/2<x)&(x<center[0]+side/2)&(center[1]-side<y)&(y<center[1]+side)] = 1
flag[(center[1]-side/2<y)&(y<center[1]+side/2)&(center[0]-side<x)&(x<center[0]+side)] = 1
return flag
def gaussian_2d(x, y, center, side):
return np.exp(-(x-center[0])**2/side**2-(y-center[1])**2/side**2)
def wrapper_for_lmfit(x, x_pixels, y_pixels, function_2D_to_wrap, *params):
pixel_number = x # This is the pixel number in the data array
# x_pixels and y_pixels are the number of pixels that the image has. This is needed to make the mapping.
if (pixel_number > x_pixels*y_pixels - 1).any():
raise ValueError('pixel_number (x) > x_pixels*y_pixels - 1')
x = np.array([int(p%x_pixels) for p in pixel_number])
y = np.array([int(p/x_pixels) for p in pixel_number])
return function_2D_to_wrap(x, y, *params)
data = np.genfromtxt('data.txt') # Read data
data -= data.min().min()
data = data/data.max().max()
guessed_center = (data.sum(axis=0).argmax()+11, data.sum(axis=1).argmax()+11) # I am adding 11 in purpose.
guessed_side = 19
model = lmfit.Model(lambda x, xc, yc, side: wrapper_for_lmfit(x, data.shape[1], data.shape[0], swiss_flag, (xc,yc), side))
params = model.make_params()
params['xc'].set(value = guessed_center[0], min = 0, max = data.shape[1])
params['yc'].set(value = guessed_center[1], min = 0, max = data.shape[0])
params['side'].set(value = guessed_side, min = 0)
fit_results = model.fit(data.ravel(), params, x = [i for i in range(len(data.ravel()))])
mpl.manager.set_plotting_package('matplotlib')
fit_plot = mpl.manager.new(
title = 'Data vs fit',
aspect = 'equal',
)
fit_plot.colormap(data)
draw_swiss_flag(fit_plot, guessed_center, guessed_side, label = 'Guessed')
draw_swiss_flag(fit_plot, (fit_results.params['xc'],fit_results.params['yc']), fit_results.params['side'], label = 'Fitted')
swiss_flag_plot = mpl.manager.new(
title = 'Swiss flag plot',
aspect = 'equal',
)
xx, yy = np.meshgrid(np.array([i for i in range(data.shape[1])]), np.array([i for i in range(data.shape[0])]))
swiss_flag_plot.colormap(
z = swiss_flag(xx, yy, center = (fit_results.params['xc'],fit_results.params['yc']), side = fit_results.params['side']),
)
mpl.manager.show()
and this is the content of data.txt.

It seems your code is all fine. The issue is, as you already guessed, that the algorithm used by lmfit is not dealing well with non-smooth data.
By default lmfit uses a leas squares method. Let's change it to method 'differential_evolution' instead.
params['side'].set(value=guessed_side, min=0, max=len(data))
fit_results = model.fit(data.ravel(), params,
x=[i for i in range(len(data.ravel()))],
method='differential_evolution'
)
Note that I needed to add some finite value for the max value to prevent a "differential_evolution requires finite bound for all varying parameters" message.
After switching to the evolutionary algorithm, the fit now looks like this:

All the fitting algorithms in lmfit (and scipy.optimize for that matter), and including the "global optimizers" really work on continuous variables (double precision). When trying to find the optimal parameter values, most of the algorithms will make a very small step (at the ~1.e-7 level) in the value to determine the derivative which will then be used to make the next guess of the optimal values.
The problem you're seeing is that your model function uses the parameter value as discrete values - as the index of an array using int(). If a small change is made to the parameter value, no change in the result will be detected - the algorithm will decide that the fit result does not depend on small changes to that value.
The so-called "global solvers" like differential evolution, basin-hopping, shgo, take the view that the derivative approach can lead to "false minima" and so will "spray parameter space" with lots of candidate values and then use different strategies to refine the best of those results to find the optimal values. Generally speaking, these are much slower to run (OTOH runtime is cheap!) and very good for problems where there may be multiple "minima" and you really want to find the best of these, or where getting a decent guess of starting values is very hard.
For your problem, it is pretty clear that you can guess starting values (the center pixels must be on the image, say, so maybe guess "the middle"), and it seems likely from the image that there are not a lot of false minima that might be found. That means that the expense of a global solver might not be needed.
Another approach would be to allow your shaped object to be centered at any continuous center in the image, and not only at integer pixels. Of course, you do have to map that to the discrete image, but it doesn't need to fully on/off. Using a sigmoidal functions like scipy.special.erf() and erfc() will allow you to still have a transition from "on" to "off", but with a small but finite width, bleeding into adjacent pixels. And that would be enough to allow a fit to find a continuous (and so, sub-pixel!) value for the center position. In 1-d, that might look like::
from scipy.special import erf
def smoothed_window(x, edge1, edge2, width):
return (erf((x-edge1)/width) + erf((edge2-x)/width))/2.0
For integer x values, a width of 0.5 (that is, half a pixel) will almost certainly allow a fit to find sub-integer values for edge1 and edge2. (Aside: either force the width parameter to be fixed or force it to be positive, eithr in the code or at the Parameter level).
I have not tried to extend that to your more complicated "swiss flag" function, but it should be possible and also work for fitting center values.

Can a variable be used as 'observed' data in a PyMC3 model?

I am new to the Bayesian world and PyMC3, and am struggling with a simple model setup. Specifically, how to deal with a setup where the 'observed' data are themselves modified by the random variables? As an example, lets' say I have a collection of 2d points [Xi, Yi] that form an arc about a circle whose central point [Xc,Yc], I don't know. However, I expect that the distances between the points and the circle center, Ri, should be normally distributed, about a known radius, R. I therefore initially thought I could assign Xc and Yc uniform priors (on some arbitrarily large range) and then re-calculate Ri within the model and assign Ri as the 'observed' data to get posterior estimates on Xc and Yc:
import pymc3 as pm
import numpy as np
points = np.array([[2.95, 4.98], [3.28, 4.88], [3.84, 4.59], [4.47, 4.09], [2.1,5.1], [5.4, 1.8]])
Xi = points[:,0]
Yi = points[:,1]
#known [Xc,Yc] = [2.1, 1.8]
R = 3.3
with pm.Model() as Cir_model:
Xc = pm.Uniform('Xc', lower=-20, upper=20)
Yc = pm.Uniform('Yc', lower=-20, upper=20)
Ri = pm.math.sqrt((Xi-Xc)**2 + (Yi-Yc)**2)
y = pm.Normal('y', mu=R, sd=1.0, observed=Ri)
samples = pm.fit(random_seed=2020).sample(1000)
pm.plot_posterior(samples, var_names=['Xc'])
pm.plot_posterior(samples, var_names=['Yc']);
While this code runs and gives me something, it clearly isn't working properly, which isn't surprising because it didn't seem right to be feeding a variable (Ri) in as 'observed' data. However, while I know there is something seriously wrong with my model setup (and my understanding more generally), I can't seem to recognize it. Any help greatly appreciated!

This model is actually doing fine, but there are a few things you might improve:
Using a variable as an observation is not great, in that you should think about what it is doing to the distribution you are fitting. It will fit a distribution, but you should think about whether you are double-counting variables in a prior and a likelihood. That doesn't matter so much for this toy model though!
You are using pm.fit(...), which uses variational inference, but MCMC is fine here, so replacing that whole line with samples = pm.sample() works.
The points you provide are almost exactly on a circle -- the empirical standard deviation is around 0.004, but standard deviation you supply in the liklihood is 1: around 250x the true value! Sampling from the model as-is allows for the center of the points to be in two different places:
If you change the likelihood to y = pm.Normal('y', mu=R, sd=0.01, observed=Ri), you still get two possible centers, though there's a little more mass near the true center:
Finally, you could take an approach where you put a prior on the scale, and also learn that, which happily feels the most principled and gives results closest to the true ones. Here's the model:
with pm.Model():
Xc = pm.Uniform('Xc', lower=-20, upper=20)
Yc = pm.Uniform('Yc', lower=-20, upper=20)
Ri = pm.math.sqrt((Xi-Xc)**2 + (Yi-Yc)**2)
obs_sd = pm.HalfNormal('obs_sd', 1)
y = pm.Normal('y', mu=R, sd=obs_sd, observed=Ri)
samples = pm.sample()
and here's the output:

Bad quality of Viterbi Algorithm (HMM)

I've been trying to get into hidden Markov models and the Viterbi algorithm recently. I found a library called hmmlearn (http://hmmlearn.readthedocs.io/en/latest/tutorial.html) to help me generate a state sequence for two states (with Gaussian emissions). Then I wanted to re-determine the state sequence using Viterbi. My code works, but predicts approximately 5% of the states wrong (depending on the means and variances of the Gaussian emissions). The hmmlearn library has a .predict method which also uses Viterbi to determine the state sequence.
My problem now is that the Viterbi algorithm by hmmlearn is much better than my hand-written one (error rate is lower than 0.5% compared to my 5%). I couldn't find any major problem in my code, so I'm not sure why this is the case. Below is my code where I first generate the state and observation sequence Z and X, predict Z with hmmlearn and finally predict it with my own code:
# Import libraries
import numpy as np
import scipy.stats as st
from hmmlearn import hmm
# Generate a sequence
model = hmm.GaussianHMM(n_components = 2, covariance_type = "spherical")
model.startprob_ = pi
model.transmat_ = A
model.means_ = obs_means
model.covars_ = obs_covars
X, Z = model.sample(T)
## Predict the states from generated observations with the hmmlearn library
Z_pred = model.predict(X)
# Predict the state sequence with Viterbi by hand
B = np.concatenate((st.norm(mean_1,var_1).pdf(X), st.norm(mean_2,var_2).pdf(X)), axis = 1)
delta = np.zeros(shape = (T, 2))
psi = np.zeros(shape= (T, 2))
### Calculate starting values
for s in np.arange(2):
delta[0, s] = np.log(pi[s]) + np.log(B[0, s])
psi = np.zeros((T, 2))
### Take everything in log space since values get very low as t -> T
for t in range(1,T):
for s_post in range(0, 2):
delta[t, s_post] = np.max([delta[t - 1, :] + np.log(A[:, s_post])], axis = 1) + np.log(B[t, s_post])
psi[t, s_post] = np.argmax([delta[t - 1, :] + np.log(A[:, s_post])], axis = 1)
### Backtrack
states = np.zeros(T, dtype=np.int32)
states[T-1] = np.argmax(delta[T-1])
for t in range(T-2, -1, -1):
states[t] = psi[t+1, states[t+1]]
I'm not sure if I have a big error in my code or if hmmlearn just uses a more refined Viterbi algorithm. I have noticed by looking into the falsely predicted states that the impact of the emission probability B seems to be too big as it causes the states to change too frequently even if the transition probability to go to the other state is really low.
I'm rather new to python so please excuse my ugly coding. Thanks in advance for any tips you might have!
Edit: As you can see in the code, I'm stupid and used variances instead of the standard deviation to determine the emission probabilities. After fixing this, I get the same result as the implemented Viterbi algorithm.

Using scipy optimize for MLE estimate and curve fitting

I randomly generated 1000 data points using the weights I know are true for the normal distribution. Now I am trying to minimize the -log likelihood function to estimate the values of sig^2 and the weights. I sort of get the process conceptually, but when I try to code it I'm just lost.
This is my model:
p(y|x, w, sig^2) = N(y|w0+w1x+...+wnx^n, sig^2)
I've been googling for a while now and I've learned the scipy.stats.optimize.minimize function is good for this, but I can't get it to work right. Every solution I have tried has worked for the example I got the solution from, but I'm unable to extrapolate it to my problem.
x = np.linspace(0, 1000, num=1000)
data = []
for y in x:
data.append(np.polyval([.5, 1, 3], y))
#plot to confirm I do have a normal distribution...
data.sort()
pdf = stats.norm.pdf(data, np.mean(data), np.std(data))
plt.plot(test, pdf)
plt.show()
#This is where I am stuck.
logLik = -np.sum(stats.norm.logpdf(data, loc=??, scale=??))
I have found that the equation error(w) = .5*sum(poly(x_n, w) - y_n)^2 is relevant for minimizing the error of the weights, which therefore maximizes my likelihood for the weights, but I don't understand how to code this... I have found a similar relationship for sig^2, but have the same problem. Can somebody clarify how to do this to help my curve fitting? Maybe go as far to post psuedo code I can use?

Yes, implementing likelihood fitting with minimize is tricky, I spend a lot of time on it. Which is why I wrapped it. If I may shamelessly plug my own package symfit, your problem can be solved by doing something like this:
from symfit import Parameter, Variable, Likelihood, exp
import numpy as np
# Define the model for an exponential distribution
beta = Parameter()
x = Variable()
model = (1 / beta) * exp(-x / beta)
# Draw 100 samples from an exponential distribution with beta=5.5
data = np.random.exponential(5.5, 100)
# Do the fitting!
fit = Likelihood(model, data)
fit_result = fit.execute()
I have to admit I don't exactly understand your distribution, since I don't understand the role of your w, but perhaps with this code as an example, you'll know how to adapt it.
If not, let me know the full mathematical equation of your model so I can help you further.
For more info check the docs. (For a more technical description of what happens under the hood, read here and here.)

I think there's an issue with your setup. With maximum likelihood, you obtain the parameters that maximize the probability of observing your data (given a certain model). Your model seems to be:
where epsilon is N(0, sigma).
So you maximize it:
or equivalently take logs to get:
The f in this case is the log-normal probability density function which you can get with stats.norm.logpdf. You should then use scipy.minimize to maximize an expression that will be the summation of stats.norm.logpdf evaluated at each of the i points, from 1 to your sample size.
If I've understood you correctly, your code is missing having a y vector plus an x vector! Show us a sample of those vectors and I can update my answer to include a sample code for estimating MLE with that date.

Fourier smoothing of data set

I am following this link to do a smoothing of my data set.
The technique is based on the principle of removing the higher order terms of the Fourier Transform of the signal, and so obtaining a smoothed function.
This is part of my code:
N = len(y)
y = y.astype(float) # fix issue, see below
yfft = fft(y, N)
yfft[31:] = 0.0 # set higher harmonics to zero
y_smooth = fft(yfft, N)
ax.errorbar(phase, y, yerr = err, fmt='b.', capsize=0, elinewidth=1.0)
ax.plot(phase, y_smooth/30, color='black') #arbitrary normalization, see below
However some things do not work properly.
Indeed, you can check the resulting plot :
The blue points are my data, while the black line should be the smoothed curve.
First of all I had to convert my array of data y by following this discussion.
Second, I just normalized arbitrarily to compare the curve with data, since I don't know why the original curve had values much higher than the data points.
Most importantly, the curve is like "specular" to the data point, and I don't know why this happens.
It would be great to have some advices especially to the third point, and more generally how to optimize the smoothing with this technique for my particular data set shape.

Your problem is probably due to the shifting that the standard FFT does. You can read about it here.
Your data is real, so you can take advantage of symmetries in the FT and use the special function np.fft.rfft
import numpy as np
x = np.arange(40)
y = np.log(x + 1) * np.exp(-x/8.) * x**2 + np.random.random(40) * 15
rft = np.fft.rfft(y)
rft[5:] = 0 # Note, rft.shape = 21
y_smooth = np.fft.irfft(rft)
plt.plot(x, y, label='Original')
plt.plot(x, y_smooth, label='Smoothed')
plt.legend(loc=0)
plt.show()
If you plot the absolute value of rft, you will see that there is almost no information in frequencies beyond 5, so that is why I choose that threshold (and a bit of playing around, too).
Here the results:

From what I can gather you want to build a low pass filter by doing the following:
Move to the frequency domain. (Fourier transform)
Remove undesired frequencies.
Move back to the time domain. (Inverse fourier transform)
Looking at your code, instead of doing 3) you're just doing another fourier transform. Instead, try doing an actual inverse fourier transform to move back to the time domain:
y_smooth = ifft(yfft, N)
Have a look at scipy signal to see a bunch of already available filters.
(Edit: I'd be curious to see the results, do share!)

I would be very cautious in using this technique. By zeroing out frequency components of the FFT you are effectively constructing a brick wall filter in the frequency domain. This will result in convolution with a sinc in the time domain and likely distort the information you want to process. Look up "Gibbs phenomenon" for more information.
You're probably better off designing a low pass filter or using a simple N-point moving average (which is itself a LPF) to accomplish the smoothing.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Gaussian process regression - explain behaviour - python

Related

lmfit Stepped functions and Step size

Can a variable be used as 'observed' data in a PyMC3 model?

Bad quality of Viterbi Algorithm (HMM)

Using scipy optimize for MLE estimate and curve fitting

Fourier smoothing of data set

Categories

Resources