I was trying to fit an A*cos(wt)*cos(Ot) function to this dataset:
[image: dataset]
using the scipy curve_fit function, but it either fails (doesn't find a fit) or the fit is not good.
Here is my code:
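import numpy as np
from scipy.optimize import curve_fit

def PEND(x, A, O, w):  # the function I want to fit
    y = A*np.cos(O*x)*np.cos(w*x)
    return y

# t and x are the time and signal arrays of the dataset (not shown)
Guess = [2.1, 4.39822971502571, 0.029]
parameters, covariance = curve_fit(PEND, xdata=t, ydata=x, p0=Guess, bounds=([1.9, 0.1, 0.001], [2.2, 20, 1]))
A = parameters[0]
O = parameters[1]
w = parameters[2]
xfit = PEND(t, A, O, w)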
Result:
[1.9 4.40327678 0.02658705]
I have tried changing the guess, the variables I fit, the function, the bounds, etc. many times, and the best I got was with the code above, resulting in:
[image: resulting fit]
[image: close-up]
As you can see, the fit is not satisfactory. I do know the model is not perfect, as the amplitude falls off gradually, but my problem doesn't change whether I fit the whole dataset or only the first third of it. As you can also see, the second frequency goes down by quite a lot in the fit, which is weird, as my guess was a bit too low to begin with. Also, the amplitude goes down to its lower bound, and if I do not set the bounds as I do, it goes down to basically zero, while the frequencies barely change. I believe that the program tries to fit A too much and doesn't fit the frequencies at all. If I take my best guess of the amplitude and exclude it from the fit, I get the "runtime exceeded, no fit found" error.
What can I do to fit this well?
I am trying to fit a data set, which may follow a Gaussian or a Lorentzian, with the scipy.optimize curve_fit function.
I am getting the error:
"OptimizeWarning: Covariance of the parameters could not be estimated
warnings.warn('Covariance of the parameters could not be estimated',"
The data set looks like this:
[image: plot of the data set]
which, as you can see, may fit a Gaussian.
My code is:
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, b, c, d):
    func = a*np.exp(-((x-b)**2)/c) + d
    return func

def lorentzian(x, a, b, c):
    func = a/(((x-b)**2 + a**2)*np.pi) + c
    return func

x, y_data = np.loadtxt('in 0.6 out 0.6.dat', unpack=True)
popt, pcov = curve_fit(lorentzian, x, y_data)
thank you!
You're getting this warning because the fitting algorithm couldn't find an appropriate solution: no matter where it moved the parameters, the fit quality didn't change. If you provide an initial guess, you're more likely to reach the solution. Given that the function parameters are relatively easy to read off by glancing at the curve, you could provide most of them. For example, the center (which you called b) is around 545.5. Wikipedia also has a relationship for the value at the maximum for a slightly different form of your equation, which lacks the c parameter that shifts the curve upwards. Providing the guess p0 = (0.1, 545.5, 0) and bounds of (0, 1E10), you get something much closer to your results, yet still unsatisfactory (next time, provide the data array; I had to use a point extractor to plot this).
Now, notice how you're supposed to reach a maximum value of 40, yet that seems unattainable by your model. I took the liberty of normalizing your model simply by dividing it by its maximum value and trying to fit again. I don't know if this is the appropriate normalization, but this is just to illustrate the difference. This time, the result is much more satisfactory:
Yet I think a Lorentzian is a bit too narrow for your curve (especially evident if you set c to 0), which looks much more like a Gaussian (given that you provided its definition but didn't use it, I guess you would probably have tried it later anyway).
Note how I didn't have to normalize y.
In summary:
Provide an initial guess to your fitting algorithm, and bounds if possible.
Always plot your models and data to see what's going on.
Be aware of the limits of your models: what values they can or can't reach. Use this to evaluate whether a fit is even possible in the first place.
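As a rough illustration of the summary above, here is a minimal sketch on synthetic data (the real data array was not posted, so the numbers below are invented stand-ins). It reuses the `gaussian` model from the question and derives the initial guess and the bounds from the data itself; the rules used to build `p0` are an assumption, not the only reasonable choice.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, b, c, d):
    # Same form as in the question: amplitude a, center b, width c, offset d
    return a * np.exp(-((x - b) ** 2) / c) + d

# Synthetic stand-in data (assumed values, peak of about 40 near 545.5)
rng = np.random.default_rng(0)
x = np.linspace(540.0, 551.0, 200)
y = gaussian(x, 40.0, 545.5, 2.0, 1.0) + rng.normal(0.0, 0.5, x.size)

# Initial guess read off the data instead of left at curve_fit's defaults
d0 = y.min()                              # baseline
a0 = y.max() - d0                         # peak height above the baseline
b0 = x[np.argmax(y)]                      # position of the maximum
c0 = (0.1 * (x.max() - x.min())) ** 2     # crude width guess
p0 = (a0, b0, c0, d0)

# Loose bounds: positive amplitude and width, center inside the data range
bounds = ([0.0, x.min(), 1e-6, -np.inf],
          [np.inf, x.max(), np.inf, np.inf])

popt, pcov = curve_fit(gaussian, x, y, p0=p0, bounds=bounds)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
print(popt, perr)
```

Plotting `gaussian(x, *popt)` on top of `y` is then the quickest check that the model can actually reach the data.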
I want to fit part of my data with Python, using lmfit (it is not a must!). I'd like the range of data to be fitted to be dynamic, meaning that two fitting parameters define the part of my data to be fitted (let's call them the lower and upper boundaries).
The reason is that I have many data sets. In each of them the fitting range varies, and I cannot define a model that fits the whole range of data. On the other hand, I cannot go through each data set and define the fitting range by hand.
Is it possible at all? I thought of multiplying my model by a pulse function, but that would have to affect the original data as well, and as far as I understand I cannot tell lmfit to multiply the data by it. So I am out of ideas!
The number of observations (data points or length of the residual array) returned by the model function or function to be minimized has to be the same throughout an individual fit. Of course, this can change between successive fits. So, you could try multiple fits for each data set with different ranges, perhaps set based on the previous fit.
I think that your idea of using "where the fit is bad" to determine where not to fit is somewhat suspect, and you would want to make sure it does not lead to absurd results. If, for example, the range were automatically reduced so much that Ndata = Nvariables+1, you could probably get a very low chi-square compared to Ndata = 100*Nvariables.
Without knowing the particulars, I think you would be better off coming up with criteria for selecting the data range that depend on the data alone, and not on a fit to it.
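To make the last point concrete, here is a small sketch in which the fitting range is chosen from the data alone and then held fixed during the fit. The helper `select_range`, its fractional thresholds, and the synthetic ramp are all assumptions for illustration; the right criterion depends on what the real data look like.

```python
import numpy as np

def select_range(y, low_frac=0.1, high_frac=0.9):
    """Hypothetical helper: boolean mask of the points whose value lies
    between two fractions of the data's own min-max span."""
    span = y.max() - y.min()
    lo = y.min() + low_frac * span
    hi = y.min() + high_frac * span
    return (y > lo) & (y < hi)

# Synthetic stand-in: flat, then a linear ramp, then flat again
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 201)
y = np.clip(2.0 * x - 4.0, 0.0, 10.0) + rng.normal(0.0, 0.05, x.size)

# The mask is computed once, before fitting, so the number of
# observations stays constant throughout the fit.
mask = select_range(y)
slope, intercept = np.polyfit(x[mask], y[mask], 1)
print(mask.sum(), slope, intercept)
```

The same mask could be applied to the arrays handed to lmfit or curve_fit; running it over many data sets only requires that the selection rule, not the range, be the same for all of them.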
I am trying to fit a function to a part of the following graph:
I want to find the time at which the signal starts increasing exponentially. To do this, I fit an exponential curve, multiplied by a Heaviside step function, to the data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fit(x, a, b, c, d, e):
    return np.heaviside(x-a, 0.5)*b*np.exp(c*x-d) + e

parameter, covariance = curve_fit(fit, fitx, fity)
x = np.linspace(min(fitx), max(fitx), 1000)
plt.plot(fitx, fity)
plt.plot(x, fit(x, *parameter), 'b-', label='fit')
plt.show()
The result is somehow a straight line
When I fit only the exponential part I get the following graph:
I'd expect a straight line at the x-axis, followed by the exponential graph in image 2. Does anybody know where I went wrong?
The most likely situation is that you have an issue with convergence of the parameters. In most cases, this convergence problem is due to bad starting points for the parameters.
Since it works as expected without the Heaviside function, my guess would be that you should give a reasonable starting point for your parameter a in the curve_fit function call.
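A minimal sketch of that suggestion, on synthetic data since the original data were not posted. Two things here are assumptions of the sketch rather than part of the question: the onset guess `a0` is taken as the first point that rises well above the baseline noise, and `d` is folded into `b`, because b*exp(c*x - d) equals (b*exp(-d))*exp(c*x) and the two cannot be determined separately.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit(x, a, b, c, e):
    # Step at x = a times an exponential, plus a constant baseline e
    return np.heaviside(x - a, 0.5) * b * np.exp(c * x) + e

# Synthetic stand-in data (assumed values)
rng = np.random.default_rng(2)
fitx = np.linspace(0.0, 10.0, 400)
fity = fit(fitx, 5.0, np.exp(-4.0), 0.8, 0.2) + rng.normal(0.0, 0.02, fitx.size)

# Baseline level and noise from the first 10% of the record
n_base = fitx.size // 10
e0 = np.median(fity[:n_base])
noise = np.std(fity[:n_base])

# Onset guess: midpoint between the last quiet point and the first
# point that clearly exceeds the baseline
first = np.argmax(fity > e0 + 8.0 * noise)
a0 = 0.5 * (fitx[first - 1] + fitx[first])

# Rate and prefactor guesses from a straight-line fit to the log of the
# signal after the onset
rising = fitx > a0
c0, log_b0 = np.polyfit(fitx[rising], np.log(fity[rising] - e0), 1)

p0 = (a0, np.exp(log_b0), c0, e0)
# The covariance may come back as inf with a warning: a hard step gives
# the optimizer essentially no gradient with respect to a.
parameter, covariance = curve_fit(fit, fitx, fity, p0=p0)
print(parameter)
```

Because a hard step offers little or no gradient in `a`, the fitted onset will likely stay close to `a0`, which is why a data-driven starting value matters so much for this model.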
You say that you want to find "the time the signal starts increasing exponentially", but the signal plotted does not increase exponentially. In fact, it decreases (at least going in increasing time, left to right) and looks peak-like. Do you mean that you want to fit some function to that drop?
I'd guess that a Gaussian might work well. Using a step function might be OK too, but probably won't fit well above t=1e-8 or so.
You didn't include data or complete code, so it's hard to give a concrete example. But you might find the lmfit package helpful here. It has a builtin Step Model that can use a linear or error function or logistic curve. See http://lmfit.github.io/lmfit-py/builtin_models.html#step-like-models. This might be close to what you're trying to do.
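For readers without lmfit installed, a smooth error-function step can also be sketched with scipy alone. This is a stand-in for the idea, not lmfit's own code; the function `erf_step`, its parameter names, and the synthetic data (in arbitrary time units, with a drop rather than a rise) are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def erf_step(x, amplitude, center, sigma, offset):
    # Smooth step from `offset` to `offset + amplitude` around `center`
    return offset + 0.5 * amplitude * (1.0 + erf((x - center) / sigma))

# Synthetic stand-in data: a drop of 3 units centered at t = 6
rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 300)
signal = erf_step(t, -3.0, 6.0, 0.7, 4.0) + rng.normal(0.0, 0.05, t.size)

# Starting values estimated from the data
offset0 = signal[:20].mean()                  # level before the step
amplitude0 = signal[-20:].mean() - offset0    # height of the step (negative here)
half = offset0 + 0.5 * amplitude0
center0 = t[np.argmin(np.abs(signal - half))] # where the signal crosses half height
sigma0 = 0.1 * (t.max() - t.min())            # crude width guess

popt, pcov = curve_fit(erf_step, t, signal,
                       p0=(amplitude0, center0, sigma0, offset0))
print(popt)
```

Unlike a hard Heaviside step, this form is differentiable in `center`, so the optimizer can actually move the step position during the fit.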
Is there a more intelligent function than scipy.optimize.curve_fit in Python?
I also need to define a function to fit data with.
I've spent ages trying to fit data with it. I can fit only basic functions, and fitting two lines with a piecewise function is impossible when the y-axis has low values like 0.01-0.05 and the x-axis has values like 20-60.
I know I have to plug in initial values, but it still takes too much time and sometimes it does not work.
EDIT
I added a graph showing the data I fitted, where you can see the effect of changing the bounds in scipy.optimize.curve_fit.
The function I fit with is this one:
def abslines(x, a, b, c, d):
    return np.piecewise(x, [x < -b/a, x >= -b/a], [lambda x: a*x+b+d, lambda x: c*(x+b/a)+d])
The initial conditions are the same every time, and I think they are close enough:
p0=[-0.001,0.2,0.005,0.]
because the values of parameters from best fit are:
[-0.00411946 0.19895546 0.00817832 0.00758401]
Bounds are:
No bounds;
bounds=([-1.,0.,0.,0.],[0.,1.,1.,1.])
bounds=([-0.5,0.01,0.0001,0.],[-0.001,0.5,0.5,1.])
bounds=([-0.1,0.01,0.0001,0.],[-0.001,0.5,0.1,1.])
bounds=([-0.01,0.1,0.0001,0.],[-0.001,0.5,0.1,1.])
The graph starts with no bounds and ends with the best bounds.
Still, I think that this takes too much time and that curve_fit should be able to find the fit better. This way I almost have to specify the function myself, and it seems like I am doing the fitting by changing parameters rather than curve_fit doing it.
Without knowing exactly which regression algorithm Python uses, it is impossible to give a definitive answer. The calculation is probably iterative and requires initial guesses, which are probably derived from the specified bounds. So the bounds have an indirect effect on the convergence and the results.
I suggest trying a simpler algorithm (not iterative, no initial guess) from this paper: https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The code is easy to write in any programming language. I suppose this can be done in Python as well.
The piecewise function to be fitted is :
The parameters to be computed are a1, p1, q1, p2 and q2.
The result is shown on the next figure, with the approximate values of the parameters.
As a result, no bounds need to be specified, and consequently there are no problems related to bounds.
NOTE: The method is based on fitting a convenient integral equation, as shown in the paper referenced above. The numerical evaluation of the integral is subject to deviations if the number of points is too small. In the present case, there are a large number of points, so even though they are scattered, this is a favourable case for the practical application of this method.
1. The algorithms behind curve_fit expect differentiable functions, so it can go south if given a non-differentiable one.
For a more powerful interface to curve fitting, have a look at lmfit.
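One way around the non-differentiable breakpoint, sketched here as an assumption and not as the integral-equation method of the paper linked above, is to scan the breakpoint over a grid and solve an ordinary linear least-squares problem at each trial value. With x0 = -b/a, the `abslines` model is a*(x - x0) + d to the left of x0 and c*(x - x0) + d to the right, which is linear in (a, c, d) once x0 is fixed. The helper name `fit_two_lines`, the grid, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_two_lines(x, y, candidates):
    """Hypothetical helper: for each trial breakpoint x0, solve the linear
    least-squares problem for (a, c, d) in
        y = a*min(x - x0, 0) + c*max(x - x0, 0) + d
    and keep the breakpoint with the smallest residual sum of squares."""
    best = None
    for x0 in candidates:
        design = np.column_stack([np.minimum(x - x0, 0.0),
                                  np.maximum(x - x0, 0.0),
                                  np.ones_like(x)])
        coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
        rss = np.sum((design @ coef - y) ** 2)
        if best is None or rss < best[0]:
            best = (rss, x0, coef)
    _, x0, (a, c, d) = best
    return a, c, d, x0

# Synthetic stand-in data with the same scales as in the question
rng = np.random.default_rng(4)
x = np.linspace(20.0, 60.0, 200)
true_a, true_c, true_d, true_x0 = -0.00412, 0.00818, 0.00758, 48.3
y = (true_a * np.minimum(x - true_x0, 0.0)
     + true_c * np.maximum(x - true_x0, 0.0)
     + true_d + rng.normal(0.0, 0.002, x.size))

a_fit, c_fit, d_fit, x0_fit = fit_two_lines(x, y, np.linspace(25.0, 55.0, 301))
b_fit = -a_fit * x0_fit  # back to the question's parameterization
print(a_fit, b_fit, c_fit, d_fit)
```

No initial guess or bounds are needed; the only tuning choice is the grid of candidate breakpoints, and the result could also serve as `p0` for a final curve_fit refinement.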
I am aware of the existence of this, and this on this topic. However, I would like to finalize on an actual implementation in Python this time.
My only problem is that the elbow point seems to be changing from different instantiations of my code. Observe the two plots shown in this post. While they appear to be visually similar, the value of the elbow point changed significantly. Both the curves were generated from an average of 20 different runs. Even then, there is a significant shift in the value of the elbow point. What precautions can I take to make sure that the value falls within a certain bound?
My attempt is shown below:
import collections

def elbowPoint(points):
    secondDerivative = collections.defaultdict(lambda: 0)
    for i in range(1, len(points) - 1):
        secondDerivative[i] = points[i+1] + points[i-1] - 2*points[i]
    # dict views have no .index() in Python 3, so convert to a list first
    values = list(secondDerivative.values())
    max_index = values.index(max(values))
    elbow_point = max_index + 1
    return elbow_point
points = [0.80881476685027154, 0.79457906121371058, 0.78071124401504677, 0.77110686192601441, 0.76062373158581287, 0.75174963969985187, 0.74356408965979193, 0.73577573557299236, 0.72782434749305047, 0.71952590556748364, 0.71417942487824781, 0.7076502559300516, 0.70089375208028415, 0.69393584640497064, 0.68550490458450741, 0.68494440529025913, 0.67920157634796108, 0.67280267176628761]
max_point = elbowPoint(points)
It sounds like your actual concern is how to smooth your data, as it contains noise? In that case, perhaps you should fit a curve to the data first, then find the elbow of the fitted curve.
Whether this will work depends on the source of the noise, and on whether the noise is important for your application. By the way, you may want to see how sensitive your fit is to your data by checking how it changes (or hopefully doesn't) when a point is omitted from the fit. (Obviously, with a high enough polynomial degree you will always get a good fit to a specific set of data, but you are presumably interested in the general case.)
I have no idea if this approach is acceptable; intuitively, though, I'd think that sensitivity to small errors is bad. Ultimately, by fitting a curve you are saying that the underlying process is, in the ideal case, modelled by the curve, and that any deviation from the curve is error/noise.
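A rough sketch of both ideas, using the `points` from the question. Several choices here are assumptions: the cubic degree, the helper name `elbow_of_fit`, and the elbow criterion itself, which is taken as the point of the fitted curve farthest from the straight line joining its endpoints (a common heuristic, swapped in because the second derivative of a smooth convex fit tends to peak at the very first point).

```python
import numpy as np

points = np.array([
    0.80881476685027154, 0.79457906121371058, 0.78071124401504677,
    0.77110686192601441, 0.76062373158581287, 0.75174963969985187,
    0.74356408965979193, 0.73577573557299236, 0.72782434749305047,
    0.71952590556748364, 0.71417942487824781, 0.7076502559300516,
    0.70089375208028415, 0.69393584640497064, 0.68550490458450741,
    0.68494440529025913, 0.67920157634796108, 0.67280267176628761])
idx = np.arange(points.size)

def elbow_of_fit(x, y, x_eval, degree=3):
    """Hypothetical helper: fit a polynomial to (x, y), evaluate it on
    x_eval, and return the index of the evaluated point farthest from the
    chord joining the first and last evaluated points."""
    coeffs = np.polyfit(x, y, degree)
    smooth = np.polyval(coeffs, x_eval)
    chord = np.interp(x_eval, [x_eval[0], x_eval[-1]], [smooth[0], smooth[-1]])
    return int(np.argmax(np.abs(chord - smooth)))

# Elbow of the curve fitted to all points
elbow = elbow_of_fit(idx, points, idx)

# Leave-one-out check: refit with each point omitted and see how far
# the elbow moves
loo_elbows = [elbow_of_fit(np.delete(idx, k), np.delete(points, k), idx)
              for k in range(points.size)]
print(elbow, min(loo_elbows), max(loo_elbows))
```

If the leave-one-out elbows stay within a point or two of each other, the estimate is reasonably robust to noise in individual points; a wide spread suggests the curve has no sharply defined elbow to begin with.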