Linear regression for calibrating on a 2D grid - numpy - python

I am running a laser experiment where I'm trying to measure some features, but I also have to deal with a linear background. This background is both a constant (whatever light I measure when the laser is off) as well as a multiplicative scale factor. This background cannot be determined analytically, so I need to do an mx+b fit on the data. But, I need to do this on every point in the field of view.
The way I'd do it would be to take calibration images at a range of uniform brightnesses, and then run a regression, assigning a unique m_ij and b_ij to every point. I could probably do this in a for loop, but that seems like it'd be insanely slow for an image that's on the order of 1 mpx.
I found a solution here that used np.vander. I've tried using that, but (a) don't quite understand what I'm doing with it, and (b) it doesn't work with curve_fit. I could use np.linalg.lstsq, but it doesn't allow me to assign yerr corresponding to the noise of the images.
My current non-working example:
def fit_many_with_error(x, y, order=2, xerrs=None, yerrs=None):
'''
arguments:
x: [N]
y: [N x S]
where:
N - # of measurements per pixel
S - # pixels
returns [`order` x S]
'''
def f(x, m, b):
return m * x + b
A = np.vander(x, N=order)
B = np.vander(y, N=order)
params = curve_fit(f, A, B, sigma=None)
return params
params = fit_many_with_error(xvals, yvals)
Which gives me ValueError: operands could not be broadcast together with shapes (20,) (20,200,100)

Related

Interpolating a function over a grid with different input sizes

I have a function f(u,v,w) which I would like to interpolate using a scipy function (with linear interpolation). This is easy enough.
When I run the interpolation step, I simply do the following (interpolating over a u,v,w grid):
u = np.linspace(-1,1,100)
v = np.linspace(-2,2,50)
w = np.linspace(3,8,30)
values_grid = np.zeros((len(u),len(v),len(w)))
count = 0
for i in range(len(u)):
for j in range(len(w)):
for k in range(len(w)):
values_grid[i,j,k] = f(u[i],v[j],w[k])
from scipy.interpolate import RegularGridInterpolator
my_interpolating_function = RegularGridInterpolator((u, v, w), values_grid, method='linear',bounds_error=False,fill_value=-999)
This is fine for many cases. However, when I want to evaluate this interpolation function it seems like I am required to use inputs which have shape [(Number of input samples) x (Dimension of Samples)]. E.g:
func_input = np.vstack([u_samps,v_samps,w_samps].T # E.g. shape is 500,3
output = my_interpolating_function(func_input)) # Has output shape 500
This works fine. The issue is that I would like to evaluate this function over a grid where the samples have the following shape
shape(u_samps) = 500
shape(v_samps) = (100,100)
shape(w_samps) = (100,100)
Meaning I would like to evaluate
my_interpolating_function([u_samps, v_samps, w_samps])
and get out an array which has shape (500,100,100) (so the interpolation is evaluated for all 500 u_samps over the v_samps and w_samps grids). I can flatten the v_samps and w_samps array, but then I have to make several (hundreds) copies of u_samps to get the inputs into the correct format. So is there any way to have an interpolation function that can take the inputs above (u_samps, v_samps, w_samps with the specified shapes) and get out an array with shape (500,100,100) efficiently?
Any help greatly appreciated, I have been stuck on this problem and it's really holding up my progress! The end goal is to use this function in a statistical likelihood which needs to be sampled with MCMC, so speed is pretty important (and making hundreds of copies of massive arrays is very slow)

lmfit Stepped functions and Step size

I want to fit a 2D shape in an image. In the past, I have successfully done this using lmfit in Python and wrapping the 2D function/data to 1D. On that occasion, the 2D model was a smooth function (a ring with a gaussian profile). Now I am trying to do the same but with a "non-smooth function" and it is not working as expected.
This is what I am trying to do (guessed and fitted are the same):
I have shifted the guessed parameters in purpose to easily see if it moves as expected, and nothing happens.
I have noticed that if instead of a swiss flag I use a 2D gaussian, which is a smooth function, this works fine (see MWE below):
So I guess the problem is related to the fact that the Swiss flag function is not smooth. I have tried to make it smooth by adding a gaussian filter (blur) but it still did not work, even though the swiss flag plot became very blurred.
After some time I came to the thought that maybe the step size that is using lmfit (o whoever is in the background) is too small to produce any change in the swiss flag. I would like to try to increase the step size to 1, but I don't know exactly how to do that.
This is my MWE (sorry, it is still quite long):
import numpy as np
import myplotlib as mpl # https://github.com/SengerM/myplotlib
import lmfit
def draw_swiss_flag(fig, center, side, **kwargs):
fig.plot(
np.array(2*[side] + 2*[side/2] + 2*[-side/2] + 2*[-side] + 2*[-side/2] + 2*[side/2] + 2*[side]) + center[0],
np.array([0] + 2*[side/2] + 2*[side] + 2*[side/2] + 2*[-side/2] + 2*[-side] + 2*[-side/2] + [0]) + center[1],
**kwargs,
)
def swiss_flag(x, y, center: tuple, side: float):
# x, y numpy arrays.
if x.shape != y.shape:
raise ValueError(f'<x> and <y> must have the same shape!')
flag = np.zeros(x.shape)
flag[(center[0]-side/2<x)&(x<center[0]+side/2)&(center[1]-side<y)&(y<center[1]+side)] = 1
flag[(center[1]-side/2<y)&(y<center[1]+side/2)&(center[0]-side<x)&(x<center[0]+side)] = 1
return flag
def gaussian_2d(x, y, center, side):
return np.exp(-(x-center[0])**2/side**2-(y-center[1])**2/side**2)
def wrapper_for_lmfit(x, x_pixels, y_pixels, function_2D_to_wrap, *params):
pixel_number = x # This is the pixel number in the data array
# x_pixels and y_pixels are the number of pixels that the image has. This is needed to make the mapping.
if (pixel_number > x_pixels*y_pixels - 1).any():
raise ValueError('pixel_number (x) > x_pixels*y_pixels - 1')
x = np.array([int(p%x_pixels) for p in pixel_number])
y = np.array([int(p/x_pixels) for p in pixel_number])
return function_2D_to_wrap(x, y, *params)
data = np.genfromtxt('data.txt') # Read data
data -= data.min().min()
data = data/data.max().max()
guessed_center = (data.sum(axis=0).argmax()+11, data.sum(axis=1).argmax()+11) # I am adding 11 in purpose.
guessed_side = 19
model = lmfit.Model(lambda x, xc, yc, side: wrapper_for_lmfit(x, data.shape[1], data.shape[0], swiss_flag, (xc,yc), side))
params = model.make_params()
params['xc'].set(value = guessed_center[0], min = 0, max = data.shape[1])
params['yc'].set(value = guessed_center[1], min = 0, max = data.shape[0])
params['side'].set(value = guessed_side, min = 0)
fit_results = model.fit(data.ravel(), params, x = [i for i in range(len(data.ravel()))])
mpl.manager.set_plotting_package('matplotlib')
fit_plot = mpl.manager.new(
title = 'Data vs fit',
aspect = 'equal',
)
fit_plot.colormap(data)
draw_swiss_flag(fit_plot, guessed_center, guessed_side, label = 'Guessed')
draw_swiss_flag(fit_plot, (fit_results.params['xc'],fit_results.params['yc']), fit_results.params['side'], label = 'Fitted')
swiss_flag_plot = mpl.manager.new(
title = 'Swiss flag plot',
aspect = 'equal',
)
xx, yy = np.meshgrid(np.array([i for i in range(data.shape[1])]), np.array([i for i in range(data.shape[0])]))
swiss_flag_plot.colormap(
z = swiss_flag(xx, yy, center = (fit_results.params['xc'],fit_results.params['yc']), side = fit_results.params['side']),
)
mpl.manager.show()
and this is the content of data.txt.
It seems your code is all fine. The issue is, as you already guessed, that the algorithm used by lmfit is not dealing well with non-smooth data.
By default lmfit uses a leas squares method. Let's change it to method 'differential_evolution' instead.
params['side'].set(value=guessed_side, min=0, max=len(data))
fit_results = model.fit(data.ravel(), params,
x=[i for i in range(len(data.ravel()))],
method='differential_evolution'
)
Note that I needed to add some finite value for the max value to prevent a "differential_evolution requires finite bound for all varying parameters" message.
After switching to the evolutionary algorithm, the fit now looks like this:
All the fitting algorithms in lmfit (and scipy.optimize for that matter), and including the "global optimizers" really work on continuous variables (double precision). When trying to find the optimal parameter values, most of the algorithms will make a very small step (at the ~1.e-7 level) in the value to determine the derivative which will then be used to make the next guess of the optimal values.
The problem you're seeing is that your model function uses the parameter value as discrete values - as the index of an array using int(). If a small change is made to the parameter value, no change in the result will be detected - the algorithm will decide that the fit result does not depend on small changes to that value.
The so-called "global solvers" like differential evolution, basin-hopping, shgo, take the view that the derivative approach can lead to "false minima" and so will "spray parameter space" with lots of candidate values and then use different strategies to refine the best of those results to find the optimal values. Generally speaking, these are much slower to run (OTOH runtime is cheap!) and very good for problems where there may be multiple "minima" and you really want to find the best of these, or where getting a decent guess of starting values is very hard.
For your problem, it is pretty clear that you can guess starting values (the center pixels must be on the image, say, so maybe guess "the middle"), and it seems likely from the image that there are not a lot of false minima that might be found. That means that the expense of a global solver might not be needed.
Another approach would be to allow your shaped object to be centered at any continuous center in the image, and not only at integer pixels. Of course, you do have to map that to the discrete image, but it doesn't need to fully on/off. Using a sigmoidal functions like scipy.special.erf() and erfc() will allow you to still have a transition from "on" to "off", but with a small but finite width, bleeding into adjacent pixels. And that would be enough to allow a fit to find a continuous (and so, sub-pixel!) value for the center position. In 1-d, that might look like::
from scipy.special import erf
def smoothed_window(x, edge1, edge2, width):
return (erf((x-edge1)/width) + erf((edge2-x)/width))/2.0
For integer x values, a width of 0.5 (that is, half a pixel) will almost certainly allow a fit to find sub-integer values for edge1 and edge2. (Aside: either force the width parameter to be fixed or force it to be positive, eithr in the code or at the Parameter level).
I have not tried to extend that to your more complicated "swiss flag" function, but it should be possible and also work for fitting center values.

Generating 3D Gaussian Data [duplicate]

This question already has answers here:
Generating 3D Gaussian distribution in Python
(2 answers)
Closed 2 years ago.
I'm trying to generate a 3D distribution, where x, y represents the surface plane, and z is the magnitude of some value, distributed over a range.
I'm looking at numpy's multivariate_normal, but it only lets me get a number of samples. I'd like the ability to specify some x, y coordinate, and get back what the z value should be; so I'd be able to query gp(x, y) and get back a z value that adheres to some mean and covariance.
Perhaps a more illustrative (toy) example: assume I have some temperature distribution that can be modeled as a gaussian process. So I might have a mean temperature of 20 at (0, 0), and some covariance [[1, 0], [0, 1]]. I'd like to be able to create a model that I can then query at different x, y locations to get the temperature at that position (so, at (5, 5) I might get back something like 7 degrees).
How to best accomplish this?
I assume that your data can be copied to a single np.array, which I will refer to as X in my code, with shape X.shape = (n,2), where n is the number of data points you have and you can have n = 1, if you wish to test a single point at a time. 2, of course, refers to the 2D space spanned by your coordinates (x and y) base. Then:
def estimate_gaussian(X):
return X.mean(axis=0), np.cov(X.T)
def mva_gaussian( X, mu, sigma2 ):
k = len(mu)
# check if sigma2 is a vector and, if yes, use as the diagonal of the covariance matrix
if sigma2.ndim == 1 :
sigma2 = np.diag(sigma2)
X = X - mu
return (2 * np.pi)**(-k/2) * np.linalg.det(sigma2)**(-0.5) * \
np.exp( -0.5 * np.sum( np.multiply( X.dot( np.linalg.inv(sigma2) ), X ), axis=1 ) ).reshape( ( X.shape[0], 1 ) )
will do what you want - that is, given data points you will get the value of the gaussian function at those points (or a single point). This is actually a generalized version of what you need, as this function can describe a multivariate gaussian. You seem to be interested in the k = 2 case and a diagonal covariance matrix sigma2.
Moreover, this is also a probability distribution - which you say you don't want. We don't have enough info to know what exactly it is you're trying to fit to (i.e. what you expect the three parameters of the gaussian function to be. Usually, people are interested in a normal distribution). Nevertheless, you can simply change the parameters in the return statement of the mva_gaussian function according to your needs and ignore the estimate gaussian function if you don't want a normalized distribution (although a normalized function would still give you what you seek - a real valued temperature - as long as you know the normalization process - which you do :-) ).
You can create a multivariate normal using scipy.stats.multivariate_normal.
>>> import scipy.stats
>>> dist = scipy.stats.multivariate_normal(mean=[2,3], cov=[[1,0],
[0,1]])
Then to find p(x,y) you can use pdf
>>> dist.pdf([2,3])
0.15915494309189535
>>> dist.pdf([1,1])
0.013064233284684921
Which represents the probability (which you called z) given any [x,y]

Reducing difference between two graphs by optimizing more than one variable in MATLAB/Python?

Suppose 'h' is a function of x,y,z and t and it gives us a graph line (t,h) (simulated). At the same time we also have observed graph (observed values of h against t). How can I reduce the difference between observed (t,h) and simulated (t,h) graph by optimizing values of x,y and z? I want to change the simulated graph so that it imitates closer and closer to the observed graph in MATLAB/Python. In literature I have read that people have done same thing by Lavenberg-marquardt algorithm but don't know how to do it?
You are actually trying to fit the parameters x,y,z of the parametrized function h(x,y,z;t).
MATLAB
You're right that in MATLAB you should either use lsqcurvefit of the Optimization toolbox, or fit of the Curve Fitting Toolbox (I prefer the latter).
Looking at the documentation of lsqcurvefit:
x = lsqcurvefit(fun,x0,xdata,ydata);
It says in the documentation that you have a model F(x,xdata) with coefficients x and sample points xdata, and a set of measured values ydata. The function returns the least-squares parameter set x, with which your function is closest to the measured values.
Fitting algorithms usually need starting points, some implementations can choose randomly, in case of lsqcurvefit this is what x0 is for. If you have
h = #(x,y,z,t) ... %// actual function here
t_meas = ... %// actual measured times here
h_meas = ... %// actual measured data here
then in the conventions of lsqcurvefit,
fun <--> #(params,t) h(params(1),params(2),params(3),t)
x0 <--> starting guess for [x,y,z]: [x0,y0,z0]
xdata <--> t_meas
ydata <--> h_meas
Your function h(x,y,z,t) should be vectorized in t, such that for vector input in t the return value is the same size as t. Then the call to lsqcurvefit will give you the optimal set of parameters:
x = lsqcurvefit(#(params,t) h(params(1),params(2),params(3),t),[x0,y0,z0],t_meas,h_meas);
h_fit = h(x(1),x(2),x(3),t_meas); %// best guess from curve fitting
Python
In python, you'd have to use the scipy.optimize module, and something like scipy.optimize.curve_fit in particular. With the above conventions you need something along the lines of this:
import scipy.optimize as opt
popt,pcov = opt.curve_fit(lambda t,x,y,z: h(x,y,z,t), t_meas, y_meas, p0=[x0,y0,z0])
Note that the p0 starting array is optional, but all parameters will be set to 1 if it's missing. The result you need is the popt array, containing the optimal values for [x,y,z]:
x,y,z = popt
h_fit = h(x,y,z,t_meas)

Fast 3D interpolation of atmospheric data in Numpy/Scipy

I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
def interpLevel(grid,value,data,interp='linear'):
"""
Interpolate 3d data to a common z coordinate.
Can be used to calculate the wind/pv/whatsoever values for a common
potential temperature / pressure level.
grid : numpy.ndarray
The grid. For example the potential temperature values for the whole 3d
grid.
value : float
The common value in the grid, to which the data shall be interpolated.
For example, 350.0
data : numpy.ndarray
The data which shall be interpolated. For example, the PV values for
the whole 3d grid.
kind : str
This indicates which kind of interpolation will be done. It is directly
passed on to scipy.interpolate.interp1d().
returns : numpy.ndarray
A 2d array containing the *data* values at *value*.
"""
ret = np.zeros_like(data[0,:,:])
for yIdx in xrange(grid.shape[1]):
for xIdx in xrange(grid.shape[2]):
# check if we need to flip the column
if grid[0,yIdx,xIdx] > grid[-1,yIdx,xIdx]:
ind = -1
else:
ind = 1
f = interpolate.interp1d(grid[::ind,yIdx,xIdx], \
data[::ind,yIdx,xIdx], \
kind=interp)
ret[yIdx,xIdx] = f(value)
return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however if your grid is rectilinear (such that each vertical column has the same set of z-coordinates) then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course).For a grid your size, this will be extremely fast (as in tiny fractions of a second)
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data be uniformly spaced. It will still produce smooth results, but it won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights doing something like follows:
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx), weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
This is quite an old question but the best way to do this nowadays is to use MetPy's interpolate_1d funtion:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
There is a new implementation of Numba accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)

Categories