Is there a more intelligent function than scipy.optimize.curve_fit in Python?
I also need to define a function to fit data with.
I've spent ages trying to fit data with it. I can only fit basic functions; fitting two lines with a piecewise function seems impossible when the y-axis has low values like 0.01-0.05 and the x-axis has values like 20-60.
I know I have to plug in initial values, but it still takes too much time and sometimes it does not work.
EDIT
I added a graph of the data I fitted, where you can see the effect of changing the bounds in scipy.optimize.curve_fit.
The function I fit with is this one:
def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a],
                        [lambda x: a*x + b + d, lambda x: c*(x + b/a) + d])
The initial conditions are the same every time, and I think they are close enough:
p0=[-0.001,0.2,0.005,0.]
because the parameter values from the best fit are:
[-0.00411946 0.19895546 0.00817832 0.00758401]
Bounds are:
No bounds;
bounds=([-1.,0.,0.,0.],[0.,1.,1.,1.])
bounds=([-0.5,0.01,0.0001,0.],[-0.001,0.5,0.5,1.])
bounds=([-0.1,0.01,0.0001,0.],[-0.001,0.5,0.1,1.])
bounds=([-0.01,0.1,0.0001,0.],[-0.001,0.5,0.1,1.])
starting with no bounds and ending with the best bounds.
Still, I think this takes too much time, and curve_fit should be able to do better. This way I almost have to specify the solution myself; it feels like I am doing the fitting by tweaking parameters, rather than curve_fit doing the fitting.
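For completeness, the full call I am describing looks like this (a sketch where x and y are assumed to hold the data arrays, using the p0 above and the last set of bounds):

import numpy as np
from scipy.optimize import curve_fit

def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a],
                        [lambda x: a*x + b + d, lambda x: c*(x + b/a) + d])

p0 = [-0.001, 0.2, 0.005, 0.]
bounds = ([-0.01, 0.1, 0.0001, 0.], [-0.001, 0.5, 0.1, 1.])
popt, pcov = curve_fit(abslines, x, y, p0=p0, bounds=bounds)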
Without knowing exactly which regression algorithm Python uses, it is quite impossible to give a definitive answer. The calculation is probably iterative and requires initial guesses, which are probably derived from the specified bounds. So the bounds have an indirect effect on the convergence and the results.
I suggest trying a simpler algorithm (not iterative, no initial guess required) from this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The code is easy to write in any computer language. I suppose this can be done with Python as well.
The piecewise function to be fitted is the two-segment linear model y = p1*x + q1 for x < a1, and y = p2*x + q2 for x >= a1.
The parameters to be computed are a1, p1, q1, p2 and q2.
The result is shown on the next figure, with the approximate values of the parameters.
Thus no bounds need to be specified, and consequently there are no problems related to bounds.
NOTE: The method is based on fitting a convenient integral equation, as shown in the paper referenced above. The numerical computation of the integral is subject to deviations if the number of points is too small. In the present case there are a large number of points, so even though the data is scattered, this is a favourable case for the practical application of the method.
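For comparison, here is a sketch of a different non-iterative approach in Python: a brute-force breakpoint search with two ordinary linear least-squares fits. This is not the paper's integral-equation method, but it also needs no initial guess and no bounds (it assumes x is sorted):

import numpy as np

def fit_two_lines(x, y):
    # try every interior breakpoint, fit each side by linear least squares,
    # and keep the split with the smallest total squared error
    best = None
    for k in range(2, len(x) - 1):  # at least two points per segment
        p1 = np.polyfit(x[:k], y[:k], 1)
        p2 = np.polyfit(x[k:], y[k:], 1)
        err = (np.sum((np.polyval(p1, x[:k]) - y[:k])**2)
               + np.sum((np.polyval(p2, x[k:]) - y[k:])**2))
        if best is None or err < best[0]:
            best = (err, p1, p2, x[k])
    return best  # (error, left-line coeffs, right-line coeffs, breakpoint)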
1. The algorithms behind curve_fit expect differentiable functions, so things can go wrong when it is given a non-differentiable one.
2. For a more powerful interface to curve fitting, have a look at lmfit.
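A minimal sketch of lmfit's Model interface, with a hypothetical Gaussian model (x and y are assumed data arrays):

import numpy as np
from lmfit import Model

def gaussian(x, amp, cen, wid):
    return amp * np.exp(-(x - cen)**2 / wid)

gmodel = Model(gaussian)                          # wrap the model function
result = gmodel.fit(y, x=x, amp=5, cen=5, wid=1)  # initial guesses as keywords
print(result.fit_report())                        # best-fit values and uncertainties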
I am trying to fit a data set, which may fit a Gaussian or a Lorentzian, with scipy's optimize.curve_fit function.
I am getting the error:
"OptimizeWarning: Covariance of the parameters could not be estimated
warnings.warn('Covariance of the parameters could not be estimated',"
The data set looks like a single broad peak, which may fit a Gaussian.
My code is:
def gaussian(x, a, b, c, d):
    func = a*np.exp(-((x - b)**2)/c) + d
    return func

def lorentzian(x, a, b, c):
    func = a/(((x - b)**2 + a**2)*np.pi) + c
    return func
x, y_data = np.loadtxt('in 0.6 out 0.6.dat', unpack=True)
popt, pcov = curve_fit(lorentzian, x, y_data)
thank you!
You're getting this error because the fitting algorithm couldn't find an appropriate solution: no matter where it moved the parameters, the fit quality didn't change. If you provide an initial guess, you're more likely to reach the solution. Given that the function parameters are relatively easy to read off the curve, you could provide most of them. For example, the center (which you called b) is around 545.5. Wikipedia also has a relationship for the value at the maximum for a slightly different form of your equation, one which lacks the c parameter that shifts the curve upwards. Providing the guess p0 = (0.1, 545.5, 0) and bounds of (0, 1E10), you get something much closer to your result, yet still unsatisfactory (next time, please provide the data array; I had to use a point extractor to plot this).
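In code, that call would look something like this (a sketch reusing the lorentzian defined in the question):

from scipy.optimize import curve_fit

# initial guess for (a, b, c) plus a common lower/upper bound on all parameters
popt, pcov = curve_fit(lorentzian, x, y_data,
                       p0=(0.1, 545.5, 0), bounds=(0, 1e10))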
Now, notice how you're supposed to reach a maximum value of 40, yet that seems unattainable by your model. I took the liberty of normalizing your model, simply by dividing it by its maximum value, and tried to fit again. I don't know if this is the appropriate normalization, but it's just to illustrate the difference. This time, the result is much more satisfactory:
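For concreteness, one way to write that normalized model (assuming "its maximum value" means the peak of the bare lorentzian, which is 1/(pi*a) at x = b):

def lorentzian_norm(x, a, b, c):
    # the bare lorentzian a/(((x-b)**2 + a**2)*pi) divided by its peak
    # value 1/(pi*a), so the maximum is rescaled to 1 before adding c
    return a**2 / ((x - b)**2 + a**2) + c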
Yet I think a lorentzian is a bit too narrow for your curve (especially evident if you set c to 0), which looks much more like a gaussian. (Given that you provided its definition but didn't use it, I guess you were probably going to try it anyway.)
Note how I didn't have to normalize y.
In summary:
Provide an initial guess to your fitting algorithm, and bounds if possible.
Always plot your models and data to see what's going on.
Be aware of the limits of your models: which values they can and can't reach. Use this to evaluate whether a fit is even possible in the first place.
When I assign different initial values to p0, the fitted parameters vary by two orders of magnitude. The question is how to find the right ones, given that the purpose of the fitting is to find the right parameters.
I have two variables and two unknown parameters to be found by a suitable method.
p0 = [0.5, 0.3]  # initial guess
c, cov = curve_fit(titrition, (xp, xd), y, p0)
yp = titrition((xp, xd), c[0], c[1])
This isn't so much a coding question as a very specific numerics question. curve_fit is simply trying to minimize a cost function to fit your function to the data as well as it can. I don't know what the function titrition looks like, but I suspect that optimizing it is non-convex. In other words, curve_fit is probably finding a different local optimum depending on the initial guess you supply. This is where you must use some sort of physical intuition to supply the best guess you can for the initial conditions.
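If no single good guess is available, a pragmatic fallback is a multi-start search: run the fit from several random starting points and keep the best result. A sketch reusing the question's titrition, xp, xd and y (the parameter ranges here are pure assumptions):

import numpy as np
from scipy.optimize import curve_fit

best_c, best_resid = None, np.inf
rng = np.random.default_rng(0)
for _ in range(50):
    p0 = rng.uniform([0.0, 0.0], [1.0, 1.0])  # assumed plausible ranges
    try:
        c, cov = curve_fit(titrition, (xp, xd), y, p0=p0)
    except RuntimeError:  # this start failed to converge; try the next one
        continue
    resid = np.sum((titrition((xp, xd), *c) - y)**2)
    if resid < best_resid:
        best_c, best_resid = c, resid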
I have a polynomial function for which I would like to find all local extrema. I can evaluate the polynomial via P(x) and its derivative via d_P(x).
My first thought was to use minimize_scalar; however, this does not seem able to take advantage of the fact that I can evaluate the derivative. Alternatively, I can use the more general minimize function and provide the gradient.
Is there a rule of thumb about which method will work better, or is this something where I should test both and see which works? Since the function I am optimizing is a polynomial (well behaved), I wonder if it really matters which I use, but if someone has more background on this, that would be great.
In particular, P(x) is the (unique) polynomial of degree n which alternately attains the values 1 and -1 on a set of n-1 points.
Here is a sample of P(x), scaled so that P(0)=1. Note that the y-axis is plotted on a symlog scale.
Since you have a continuous scalar function, the documentation suggests minimize_scalar as the more straightforward approach. Since it doesn't use gradient information, you won't have trouble with noise/discontinuities/discreteness in your objective. If you use minimize in conjunction with a gradient-based method, by contrast, you will have trouble with convergence when there is noise, discontinuity, or discreteness.
If the objective function is first-order continuous, then both minimize and minimize_scalar should yield the same solution for a given bound.
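A sketch of the two approaches side by side, using a stand-in polynomial (the question's actual P and d_P aren't shown):

import numpy as np
from scipy.optimize import minimize, minimize_scalar

P = np.polynomial.Polynomial([1.0, -3.0, 0.5, 2.0])  # stand-in polynomial
d_P = P.deriv()                                      # its exact derivative

# bracketed, derivative-free search within a bound
res_a = minimize_scalar(P, bounds=(-2.0, 2.0), method='bounded')

# gradient-based search from an initial guess, supplying the derivative
res_b = minimize(lambda v: P(v[0]), x0=[0.0],
                 jac=lambda v: np.array([d_P(v[0])]))
print(res_a.x, res_b.x)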
I have written Python (2.7.3) code in which I aim to create a weighted sum of 16 data sets and compare the result to some expected value. My problem is to find the weighting coefficients which will produce the best fit to the model. To do this, I have been experimenting with scipy's optimize.minimize routines, but have had mixed results.
Each of my individual data sets is stored as a 15x15 ndarray, so their weighted sum is also a 15x15 array. I define my own 'model' of what the sum should look like (also a 15x15 array), and quantify the goodness of fit between my result and the model using a basic least squares calculation.
R = np.sum(np.abs(model/np.max(model) - myresult)**2)
'myresult' is produced as a function of some set of parameters 'wts'. I want to find the set of parameters 'wts' which will minimise R.
To do so, I have been trying this:
res = minimize(get_best_weightings, wts, bounds=bnds, method='SLSQP', options={'disp': True, 'eps': 100})
Where my objective function is:
def get_best_weightings(wts):
    # split the 32-element parameter vector into real and imaginary weights
    wts_tr = wts[0:16]
    wts_ti = wts[16:32]
    for i, j in enumerate(portlist):
        originalwtsr[j] = wts_tr[i]
        originalwtsi[j] = wts_ti[i]
    realwts = originalwtsr
    imagwts = originalwtsi
    myresult = make_weighted_beam(realwts, imagwts, 1)
    # least-squares difference between the normalized model and the result
    R = np.sum((np.abs(modelbeam/np.max(modelbeam) - myresult))**2)
    return R
The input wts is an ndarray of shape (32,), and the output R is a scalar, which should get smaller as my fit gets better. By my understanding, this is exactly the sort of problem ("Minimization of scalar function of one or more variables.") that scipy.optimize.minimize is designed for (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.minimize.html).
However, when I run the code, although the optimization routine seems to iterate over different values of all the elements of wts, only a few of them seem to 'stick'; i.e., all but four of the values are returned with the same values as my initial guess. To illustrate, I plot my initial guess for wts (in blue) and the optimized values (in red). You can see that for most elements the two lines overlap.
Image:
http://imgur.com/p1hQuz7
Changing just these few parameters is not enough to get a good answer, and I can't understand why the other parameters aren't also being optimised. I suspect that maybe I'm not understanding the nature of my minimization problem, so I'm hoping someone here can point out where I'm going wrong.
I have experimented with a variety of minimize's built-in methods (I am by no means committed to SLSQP, or certain that it's the most appropriate choice), and with a variety of step sizes eps. The bounds I am using for my parameters are all (-4000, 4000). I only have scipy version 0.11, so I haven't tested the basinhopping routine for finding the global minimum (that needs 0.12). I have looked at scipy.optimize.brute, but haven't tried implementing it yet; I thought I'd check whether anyone can steer me in a better direction first.
Any advice appreciated! Sorry for the wall of text and the possibly (probably?) idiotic question. I can post more of my code if necessary, but it's pretty long and unpolished.
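For reference, here is a minimal self-contained version of the same kind of setup with random stand-in data (the real data sets and make_weighted_beam are too long to post), which at least lets the SLSQP call be tested in isolation:

import numpy as np
from scipy.optimize import minimize

# stand-in data: 16 random 15x15 data sets and a target built from known weights
rng = np.random.RandomState(0)
datasets = rng.normal(size=(16, 15, 15))
true_wts = rng.uniform(-1.0, 1.0, size=16)
model = np.tensordot(true_wts, datasets, axes=1)

def objective(wts):
    # weighted sum of the data sets, compared to the normalized model
    myresult = np.tensordot(wts, datasets, axes=1)
    return np.sum(np.abs(model/np.max(model) - myresult)**2)

res = minimize(objective, x0=np.zeros(16), method='SLSQP',
               bounds=[(-4000, 4000)] * 16)
print(res.success, res.fun)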
I have a graph between two functions, f and g.
I know it follows a power law with an exponential cutoff.
f(x) = x**(-alpha)*e**(-lambda*x)
How do I find the value of exponent alpha?
If you have sufficiently close x points (for example one every 0.1), you can try the following:
ln(f(x)) = -alpha ln(x) - lambda x
ln(f(x))' = - alpha / x - lambda
So depending on where you have your points:
If you have a lot of points near 0, you can try:
h(x) = x ln(f(x))' = -alpha - lambda x
So the limit of the function h(x) as x goes to 0 is -alpha.
If you have large values of x, the function x -> ln(f(x))' tends toward -lambda as x goes to infinity, so you can guess lambda and use pwdyson's expression.
If you don't have close x points, the numerical derivative will be very noisy, so I would try to guess lambda as the limit of -ln(f(x))/x for large x's...
If you don't have large values of x, but do have a large number of x's, you can try minimizing
sum_i (ln(y_i) + alpha ln(x_i) + lambda x_i)^2
over both alpha and lambda (I guess it would be more precise than fitting the initial expression)...
It is a simple least-squares regression (numpy.linalg.lstsq will do the job).
So you have plenty of methods; the one to choose really depends on your inputs.
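A sketch of that linearized least-squares fit, with synthetic data standing in for the real measurements:

import numpy as np

# synthetic data for illustration: alpha = 1.5, lambda = 0.3, plus noise
rng = np.random.default_rng(1)
x = np.linspace(0.5, 10.0, 200)
y = x**(-1.5) * np.exp(-0.3*x) * rng.lognormal(sigma=0.05, size=x.size)

# linear model in the logs: ln(y) = -alpha*ln(x) - lambda*x
A = np.column_stack([-np.log(x), -x])
(alpha, lam), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
print(alpha, lam)  # should come out close to 1.5 and 0.3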
The usual and general way of doing what you want is to perform a non-linear regression (even though, as noted in another response, it is possible to linearize the problem). Python can do this quite easily with the help of the SciPy package, which is used by many scientists.
The routine you are looking for is its least-squares optimization routine (scipy.optimize.leastsq). Once you wrap your head around the way this general optimization procedure works (see the example), you will probably find many other opportunities to use it. Basically, you calculate the list of differences between your measurements and their ideal values f(x), and you ask SciPy to find the parameters that make these differences as small as possible, so that your data fits the model as well as possible. This then gives you the parameter you are looking for.
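A minimal sketch of that approach for this model (x and y are assumed data arrays; the initial guess is arbitrary):

import numpy as np
from scipy.optimize import leastsq

def residuals(params, x, y):
    # differences between the measurements and the model x**(-alpha)*e**(-lambda*x)
    alpha, lam = params
    return y - x**(-alpha) * np.exp(-lam*x)

params, ier = leastsq(residuals, [1.0, 0.1], args=(x, y))
alpha, lam = params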
It sounds like you might be trying to fit a power-law to a distribution with an exponential cutoff at the low end due to incompleteness - but I may be reading too far into your problem.
If that is the problem you're dealing with, this website (and accompanying publication) addresses the issue: http://tuvalu.santafe.edu/~aaronc/powerlaws/. I wrote the python implementation of the power-law fitter on that page; it is linked from there.
If you know that the points follow this law exactly, then invert the equation and plug in an x and its corresponding f(x) value:
import math
# note: "lambda" is a reserved word in Python, so use another name, e.g. lam
alpha = -(lam*x + math.log(f(x))) / math.log(x)
But if the points do not exactly fit the equation, you will need to do some sort of regression to determine alpha.
EDIT: OK, so they don't fit exactly. This is getting beyond a Python question, but there may be something in numpy that can handle it. Here is a numpy linear regression recipe, but your equation can't be rearranged into a linear form, so you'll have to look into non-linear regression.