How to pass independent variable as parameter to lmfit minimize - python

I am new to statistics.
I have a data set y = f(x) for some values of x. I want to fit this data to a function so that, for every point in y, I can calculate the corresponding value of x.
Suppose the model I want to fit is something like:
def func(x, a, b, c):
    return a + b*x/c
Now, to use the minimize function, I have to define parameters:
params = Parameters()
params.add('a', value=10)
params.add('b', value=1)
params.add('c', value=2)
result = minimize(func, params, args=(x, y))
My question is: what if I want to make my x variable a parameter and pass it as such?
Basically, when I pass x as a variable I am passing an array that corresponds to specific points in my data set. However, I want to treat x as a parameter because I want to find the value of x for certain points of the data y.

Parameters and fitting variables are scalar floating-point numbers. That is to say, they have one value that can take a continuous range of values.
Do you mean that you want every element of x to be independently varied in the fit?
It is common to use minimization methods to fit data: find the set of values for the variables (say, your a, b, and c) so that y - func(x, a, b, c) is as small as possible.
Your incomplete code snippet (to be clear, it is always better to include a complete example) doesn't do that -- it doesn't pass y into func.
More importantly, you seem to be looking to "find the value of x for certain points of data y". That doesn't quite make sense to me... Maybe clean up and clarify the question?
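For reference, here is a minimal corrected version of the snippet above, with an objective function that actually takes and uses y (a sketch, not the asker's original code; the synthetic data are made up):

import numpy as np
from lmfit import Parameters, minimize

def residual(params, x, y):
    # objective for lmfit.minimize: return the residual y - model
    a = params['a'].value
    b = params['b'].value
    c = params['c'].value
    return y - (a + b * x / c)

x = np.linspace(0, 10, 50)
y = 10 + 1 * x / 2 + np.random.normal(0, 0.1, x.size)  # synthetic data

params = Parameters()
params.add('a', value=10)
params.add('b', value=1)
params.add('c', value=2)

result = minimize(residual, params, args=(x, y))

Note that b and c only enter the model as the ratio b/c, so the fit cannot determine both of them independently.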

It's not a one-step process.
I achieved it by using the Model class in lmfit.
Consider this.
I have a model function, call it MF, which computes model samples corresponding to the raw data.
I have raw data: Raw_Data (say, for example, coming from sensors).
Then I have certain parameters: (x, y, z, samples).
Now I consider that samples is an independent variable.
My goal is to estimate one of the parameters.
First I have to create a Model instance:
mod = Model(MF, independent_vars=['samples'])
Then you set the initial parameters using the Parameters() class and add initial values:
fit_params = Parameters()
fit_params.add('x', value=170)
fit_params.add('y', value=120)
fit_params.add('z', value=110)
Then you fit the model:
result = mod.fit(Raw_Data, params=fit_params, samples=your_samples)
The output is a ModelResult object.
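Putting those steps together, a minimal runnable sketch (the exponential form of MF and the synthetic data here are hypothetical stand-ins, not the original code):

import numpy as np
from lmfit import Model, Parameters

def MF(samples, x, y, z):
    # hypothetical model function; 'samples' is the independent variable
    return x * np.exp(-samples / y) + z

samples = np.arange(100)
Raw_Data = MF(samples, 165.0, 125.0, 112.0) + np.random.normal(0, 1.0, samples.size)

mod = Model(MF, independent_vars=['samples'])

fit_params = Parameters()
fit_params.add('x', value=170)
fit_params.add('y', value=120)
fit_params.add('z', value=110)

result = mod.fit(Raw_Data, params=fit_params, samples=samples)
print(result.fit_report())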

Related

How to find length of parameter input of function?

I'm trying to use Scipy's ODR to fit various different curves to data. These curves have to be given as an ODR Model, which is defined by a function. This function has two arguments: p and x. p is a list of the parameters that will be optimised. Example:
from scipy.odr import ODR, Model, RealData

def f(p, x):
    m, c = p
    return m*x + c

model = Model(f)
data = RealData(xdata, ydata)
odr_setup = ODR(data, model, beta0=[0, 0], partol=0.001)
odr_result = odr_setup.run()
My problem: the length of p depends on the function I am fitting. If the lengths of beta0 and p don't match, it raises a ValueError. To get round this, I have constructed some nested try-except statements to find the matching length. Is there a more elegant way of achieving the same thing?
Ideally I would like something like ODR(data, model, beta0=[0]*n_params, partol=0.001). How would I find n_params? Or is there a better way?
Thanks!
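One possible workaround (a sketch, not from the original thread; fit_with_odr and the n_params attribute are hypothetical names, just for illustration): attach the expected parameter count to each candidate function and build beta0 from it, which avoids the nested try-except entirely:

import numpy as np
from scipy.odr import ODR, Model, RealData

def linear(p, x):
    m, c = p
    return m * x + c
linear.n_params = 2  # record how many parameters this model expects

def quadratic(p, x):
    a, b, c = p
    return a * x**2 + b * x + c
quadratic.n_params = 3

def fit_with_odr(f, xdata, ydata):
    model = Model(f)
    data = RealData(xdata, ydata)
    # size beta0 from the function's own declared parameter count
    odr_setup = ODR(data, model, beta0=[0.0] * f.n_params, partol=0.001)
    return odr_setup.run()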

Parametric Polymorphism Problem: Using function with single float parameter with an array of float parameters

To clarify what I mean, my issue is with a simulated annealing problem where I want to find the theta that gives me the max area of a shape:
def Area(theta):
    # returns area
    ...

def SimAnneal(space, func, T):
    # space is some linspace
    # func is some function that takes in a theta and outputs an area
    # T = separate temperature parameter that is not relevant for this problem
    # returns the maximum area from the given thetas
    ...
In this scenario, simulated annealing starts by choosing a random starting “theta”. My goal is to use the setup above as shown below. It should be noted that the input for Area() is a single theta, but my hope was that there is some way to make ? a “potential” list of thetas that the next function, SimAnneal(), can choose from.
x = np.linspace(0, 100, 1000)
func = Area(?)
T = 4
SimAnneal(x, func, T)
What should I put into ? in order for SimAnneal to output correctly.
In other words, is there a ? that can satisfy the condition of being a single float parameter but carry all the possible float parameters in some linspace?
You can use np.vectorize to apply a func that takes a single value, as follows:
import numpy as np

def Area(theta):
    pass

def SimAnneal(space, func, T):
    applied_space = np.vectorize(func)(space)

x = np.linspace(0, 100, 1000)
T = 4
SimAnneal(x, Area, T)
Note that np.vectorize won't actually give you performance improvements that we see with actual vectorization. It is instead a convenient interface that exactly fits your need: Applying a func that takes a single value to a bunch of values (your space).
Alternatively, you can move the np.vectorize call outside of SimAnneal like this:
def SimAnneal(space, func, T):
    applied_space = func(space)

x = np.linspace(0, 100, 1000)
func = np.vectorize(Area)
T = 4
SimAnneal(x, func, T)
This is closer to your original example.
First, there is no data type that is both a float and a collection. Additionally, you want to pass the area function directly into the SimAnneal function rather than the return value of a call to it, as you currently have:
SimAnneal(x, area, T)
From a design standpoint, it would make more sense to leave the area function as is, taking a single float as a parameter. That said, it is relatively simple to run a single function over a list and store the outputs keyed by the theta that created them, using a dictionary comprehension. In the example below, thetas is the list of thetas you want to choose from:
areas = {i: area(i) for i in thetas}
From there you can then search through the new dictionary to find the theta that produced the greatest area:
max_theta = list(areas.keys())[0]  # start with the first theta
for theta, theta_area in areas.items():
    if theta_area > areas[max_theta]:
        max_theta = theta
# max_theta now holds the theta that produced the greatest area
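As a follow-up, the same search can be written more compactly with the built-in max, assuming the areas dict built above:

# theta whose area is largest; areas.get returns the area for each key
max_theta = max(areas, key=areas.get)
max_area = areas[max_theta]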

How do I improve a Gaussian/Normal fit in Python 3.X by using a running median?

I have an array of 100x100 data points, where I'm trying to perform a Gaussian fit to each column of 100 values in the array. I then want the parameters of the Gaussian found by using the fit of the first column to be the initial parameters of the starting point for the next column to use. Let's say I start with the initial parameters of 1000, 0, and 1, and the fit finds values of 800, 3, and 1.5. I then want the fitter to use these three parameters as initial values for the next column.
My code is:
x = np.linspace(-50, 50, 100)
Gauss_Model = models.Gaussian1D(amplitude=1000., mean=0, stddev=1.)
Fitting_Model = fitting.LevMarLSQFitter()
Fit_Data = []
for i in range(0, Data_Array.shape[0]):
    Fit_Data.append(Fitting_Model(Gauss_Model, x, Data_Array[:, i]))
Right now it uses the same initial values for every fit. Does anyone know how to perform such a running median/mean for a Gaussian fitting method? Would really appreciate any help or being pointed in the right direction, thanks!
I'm not familiar with the specific library you are using, but if you can get your fitted parameters out with something like fit_data[-1].amplitude or fit_data[-1].mean, then you could modify your loop to use something like:
for i in range(0, data_array.shape[0]):
    if fit_data:  # true if not an empty list
        Gauss_Model = models.Gaussian1D(amplitude=fit_data[-1].amplitude,
                                        mean=fit_data[-1].mean,
                                        stddev=fit_data[-1].stddev)
    fit_data.append(Fitting_Model(Gauss_Model, x, Data_Array[:, i]))
basically checking whether you have already fit a model, and if you have, use the most recent fitted amplitude, mean, and standard deviation as the starting point for your next Gauss_Model.
A thought: this might speed up your fitting, but it shouldn't result in a "better" fit to the 100 data points in each fit operation. Each resulting model is probably the best fit to the data it was presented. If you want to estimate the error in the parameters of your model, you can use the fact that, for two independent normal distributions A ~ N(m_a, v_a) and B ~ N(m_b, v_b), the distribution A + B will have mean m_a + m_b and variance v_a + v_b. Thus, the average of your n fitted means is distributed as N(sum(means)/n, sum(variances)/n**2). Basically, you can say that your true mean is centered at the mean of your means, with standard deviation sqrt(sum(variances))/n (which reduces to the familiar stddev/sqrt(n) when all fits share the same stddev).
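To make that combination rule concrete, a small sketch with made-up numbers (in practice the means and stddevs would come from your fitted models):

import numpy as np

means = np.array([3.0, 2.8, 3.1])    # hypothetical fitted means, one per column
stddevs = np.array([1.5, 1.4, 1.6])  # hypothetical fitted standard deviations

n = len(means)
combined_mean = means.sum() / n
# variance of an average of independent normals: sum(variances) / n**2
combined_std = np.sqrt((stddevs**2).sum()) / n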
I also cannot tell what library you are using, and the details of how to do this probably depend on the details of how that library stores the fitted values. I can say that for lmfit (https://lmfit.github.io/lmfit-py/) we struggled with this sort of usage and arrived at a design that makes what you are trying to do pretty easy. With lmfit, you might compose this problem as:
import numpy as np
from lmfit.models import GaussianModel

x = np.linspace(-50, 50, 100)
# get Data_Array from somewhere....

# create a model for a Gaussian
Gauss_Model = GaussianModel()

# make a set of parameters, setting initial values
params = Gauss_Model.make_params(amplitude=1000, center=0, sigma=1.0)

Fit_Results = []
for i in range(Data_Array.shape[1]):
    result = Gauss_Model.fit(Data_Array[:, i], params, x=x)
    Fit_Results.append(result)
    # update `params` with the current best-fit params for the next column
    params = result.params
Note that this works because lmfit is careful that Model.fit() will not alter the input parameters, and will put the resulting best-fit parameters for each fit in result.params.
And, if you decide you do want to have all columns use the original initial values, just comment out that last params = result.params.
Lmfit has a lot more bells and whistles, but I hope that helps you do what you need.

Calculate maximum likelihood using PyMC3

There are cases when I'm not actually interested in the full posterior of a Bayesian inference, but simply the maximum likelihood (or maximum a posteriori for suitably chosen priors), and possibly its Hessian. PyMC3 has functions to do that, but find_MAP seems to return the model parameters in transformed form depending on the prior distribution on them. Is there an easy way to get the untransformed values from these? The output of find_hessian is even less clear to me, but it's most likely in the transformed space too.
Maybe the simpler solution is to pass the argument transform=None, to keep PyMC3 from applying the transformation, and then use find_MAP.
Here is an example for a simple model.
import numpy as np
import pymc3 as pm

data = np.repeat((0, 1), (3, 6))
with pm.Model() as normal_aproximation:
    p = pm.Uniform('p', 0, 1, transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mean_q = pm.find_MAP()
    std_q = ((1/pm.find_hessian(mean_q))**0.5)[0]
print(mean_q['p'], std_q)
Have you considered using ADVI?
I came across this once more and found a way to get the untransformed values from the transformed ones, just in case somebody else needs this as well. The gist of it is that the untransformed values are essentially Theano expressions that can be evaluated given the transformed values. PyMC3 helps here a little by providing the Model.fn() function, which creates such an evaluation function accepting values by name. Now you only need to supply the untransformed variables of interest to the outs argument. A complete example:
import numpy as np
import pymc3 as pm

data = np.repeat((0, 1), (3, 6))
with pm.Model() as normal_aproximation:
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    map_estimate = pm.find_MAP()

# create a function that evaluates p, given the transformed values
evalfun = normal_aproximation.fn(outs=p)
# create name:value mappings for the free variables (i.e. the transformed values)
inp = {v: map_estimate[v.name] for v in normal_aproximation.free_RVs}
# now use that input mapping to evaluate p
p_estimate = evalfun(inp)
outs can also receive a list of variables; evalfun will then output the values of the corresponding variables in the same order.
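For instance, a small sketch of the list form, reusing the model above:

# outs as a list (here just [p]); results come back in the same order
evalfun = normal_aproximation.fn(outs=[p])
(p_estimate,) = evalfun(inp)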

forecast results for a specific vector x using statsmodels linear regressions

I've successfully built a model using OLS on a large data set.
results = smf.ols(formula='ind ~ Age + C(County) + C(Class)', data=df).fit()
I'd like to implement a method that allows the user to input a vector X and have it return a y based on the regression. I looked into "predict" and "forecast" features of statsmodels but it didn't seem to be what I was looking for.
So for example, what I'd like to do is:
## Although the following is wrong, it shows what I'm trying to do:
def forecast_y(X):
    return results.forecast(X)

## example:
print(forecast_y([1, 3, 4]))
# the model should return: 4.53
If I'm getting that right, you don't want the in-sample prediction of y, which is why you don't want to use the predict method; instead, you just want to be able to plug in arbitrary x values and get a value for y based on the fitted coefficients?
If that's the case, continuing from your example:
params = results.params  # vector of your coefficients
arbitrary_x = np.array([.5, .5, .5, ...])  # whatever x values you want to test, with the constant first
assert len(params) == len(arbitrary_x)
arbitrary_y = (params * arbitrary_x).sum()
I'll leave understanding the implications of this to the reader, but do use with care.
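To make that concrete, a worked sketch with made-up coefficients (these are stand-ins, not values from the fitted model above):

import numpy as np

params = np.array([1.0, 0.5, 2.0])       # stand-in for results.params: intercept, then two slopes
arbitrary_x = np.array([1.0, 3.0, 4.0])  # constant first, then the regressor values
assert len(params) == len(arbitrary_x)
arbitrary_y = (params * arbitrary_x).sum()
print(arbitrary_y)  # 1.0 + 0.5*3 + 2.0*4 = 10.5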
